题名

考量資源擴增取捨之多核心即時系統排程

并列篇名

Many-Core Real-Time Task Scheduling with Resource Augmentation Trade-off Considerations

DOI

10.6342/NTU201700250

作者

鄭聖威

关键词

即時系統 ; 分群多核心平台 ; 可程式化運算核心 ; 同時考量記憶體分配與工作排程 ; 資源擴充 ; 近似演算法 ; Real-Time System ; Many-Core Island-Based Platform ; Reconfigurable Processor ; Joint Memory Allocation and Task Scheduling ; Resource Augmentation ; Approximation Algorithm

期刊名称

國立臺灣大學資訊工程學系學位論文

卷期/出版年月

2017年

学位类别

博士

导师

郭大維

内容语文

英文

英文摘要

Following the success of multi-core architectures in power consumption reduction and thermal dissipation improvement, many-core architectures draw a lot of attention for the next development trend. On a multi-island platform, homogeneous cores are grouped into islands, each of which is equipped with a scratchpad memory module (referred to as local memory). In addition, a reconfigurable processor which consists of a reconfigurable fabric and general-purpose processors becomes popular due to the ability to accelerate a wide variety of applications. However, the demand to schedule real-time tasks on these emerging architectures poses a challenge. In this dissertation, we seek efficient solutions to schedule real-time tasks with memory preallocation, interference minimization, and data prefetching concerns. We first show the NP-hardness and the inapproximability to schedule real-time tasks on a multi-island platform. Despite the inapproximability, positive results can still be found when different cases of the problem are investigated. We employ the idea of resource augmentation, and aim to capture the relation between the speedup factor and the resource augmentation factor. Being aware of such relation, we can help system developers to understand the trade-off between hardware costs and performance. For example, when the technique of resource augmentation is considered, this dissertation develops an algorithm with (2γ−1)/(γ−1) speedup factor and (γ + 1) memory augmentation factor for the memory preallocation problem on a multi-island platform, where γ represents the trade-off between CPU utilization and local memory space. In addition, resources, such as memory banks and buses, are shared among all cores within an island for power, performance, and cost reasons. The interference on the shared local memory also poses a challenge on real-time analysis, but can be alleviated if task data partition onto local memory is applied with care. We employ symmetric real-time analysis and rectify the analysis to adapt to the considered platform. We propose an algorithm with (5+ 3(2γ+1)/γ) speedup factor and (γ +1) memory augmentation factor for the interference minimization problem on a single-island platform, where γ > 0. Lastly, on a reconfigurable processor, tasks are required to configure their accelerators on a capacity-limited reconfigurable fabric before their execution. To improve system performance, task sequencing and task prefetching are important techniques to minimize the makespan of a task set. We aim at deriving a theoretical analysis that can help system developers quantify the performance of the resulted makespan with respect to the fabric capacity on a reconfigurable processor. We adopt Johnson’s rule and Post-Swapping algorithms to generate a task sequence, and identify capacity blocking and prefetching blocking when configuring accelerators on a reconfigurable processor. Our analysis yields an asymptotic approximation bound (4max{ω,0.5} − 1)OP T + ωrB for makespan minimization problem on a reconfigurable processor, where ωrB is the maximum accelerator configuration time among the tasks.

主题分类 基礎與應用科學 > 資訊科學
電機資訊學院 > 資訊工程學系
参考文献
  1. [2] T. P. Baker. Stack-based Scheduling for Real time Processes. Real-Time System,pages 67–99, 1991.
    連結:
  2. [3] F. Barat and R. Lauwereins. Reconfigurable instruction set processors: a survey. In RSP, 2000.
    連結:
  3. [4] S. Baruah and N. Fisher. A Dynamic Programming Approach to Task Partitioning upon Memory-constrained Multiprocessors. In Proc. of IEEE RTCSA, 2004.
    連結:
  4. [5] S. Baruah and N. Fisher. The Partitioned Multiprocessor Scheduling of Sporadic Task Systems. In Proc. of IEEE RTSS, pages 321–329, 2005.
    連結:
  5. [6] Sanjoy K Baruah, Aloysius K Mok, and Louis E Rosier. Preemptively scheduling hard-real-time sporadic tasks on one processor. In Real-Time Systems Symposium, 1990. Proceedings., 11th, pages 182–190. IEEE, 1990.
    連結:
  6. [7] Luis Angel D. Bathen and Nikil D. Dutt. SPMCloud: Towards the Single-Chip Embedded ScratchPad Memory-Based Storage Cloud. ACM Trans. on DAES, pages 22:1–22:45, 2014.
    連結:
  7. [8] L. Bauer, A. Grudnitsky, M. Shafique, and J. Henkel. Pats: A performance aware task scheduler for runtime reconfigurable processors. In IEEE FCCM, 2012.
    連結:
  8. [9] L. Bauer, M. Shafique, and J. Henkel. Run-time instruction set selection in a transmutable embedded processor. In ACM/IEEE DAC, 2008.
    連結:
  9. [11] Enrico Bini and Giorgio C. Buttazzo. Measuring the Performance of Schedulability Tests. Real-Time System, pages 129–154, 2005.
    連結:
  10. [12] Enrico Bini and Giorgio C Buttazzo. Measuring the performance of schedulability tests. Real-Time Systems, 30(1-2):129–154, 2005.
    連結:
  11. [14] C.-W. Chang, J.-J. Chen, T.-W. Kuo, and H. Falk. Real-Time Partitioned Scheduling on Multi-Core Systems with Local and Global Memories. In Proc. of IEEE/ACM ASP-DAC, pages 467–472, 2013.
    連結:
  12. [15] C.-W. Chang, J.-J. Chen, W. Munawar, T.-W. Kuo, and H. Falk. Partitioned Scheduling for Real-Time Tasks on Multiprocessor Embedded Systems with Programmable Shared SRAMs. In Proc. of ACM EMSOFT, pages 153–162, 2012.
    連結:
  13. [17] Jian-Jia Chen. Task set synthesis with cost minimization for sporadic real-time tasks. In IEEE 34th Real-Time Systems Symposium, (RTSS), Vancouver, BC, Canada, December 3-6, 2013, pages 350–359, 2013.
    連結:
  14. [18] Sheng-Wei Cheng, Che-Wei Chang, Jian-Jia Chen, Tei-Wei Kuo, and Pi-Cheng Hsiu. Many-core real-time task scheduling with scratchpad memory.
    連結:
  15. [20] R. I. Davis and A. Burns. A Survey of Hard Real-Time Scheduling for Multiprocessor Systems. ACM Computing Surveys, pages 35:1–35:44, 2011.
    連結:
  16. [21] Robert I Davis, Attila Zabos, and Alan Burns. Efficient exact schedulability tests for fixed priority real-time systems. IEEE Transactions on Computers, 57(9):1261–1276, 2008.
    連結:
  17. [24] H. Falk and J. C. Kleinsorge. Optimal Static WCET-Aware Scratchpad Allocation of Program Code. In Proc. of ACM DAC, pages 732–737, 2009.
    連結:
  18. [26] Dongrui Fan, Hao Zhang, Da Wang, Xiaochun Ye, Fenglong Song, Guojie Li, and Ninghui Sun. Godson-T: An Efficient Many-Core Processor Exploring Thread Level Parallelism. IEEE Micro, pages 38–47, 2012.
    連結:
  19. [27] M. J. Feeley, W. E. Morgan, E. P. Pighin, and et al. Implementing Global Memory Management in a Workstation Cluster. In Proc. of ACM SOSP, pages 201–212, 1995.
    連結:
  20. [28] N. Fisher, J. H. Anderson, and S. Baruah. Task Partitioning upon MemoryConstrained Multiprocessors. In Proc. of IEEE RTCSA, pages 416–421, 2005.
    連結:
  21. [29] Paul C Gilmore and Ralph E Gomory. Sequencing a one state-variable machine: A solvable case of the traveling salesman problem. Operations Research, 12(5):655–679, 1964.
    連結:
  22. [30] Artjom Grudnitsky, Lars Bauer, and Jぴorg Henkel. Morp: Makespan optimization for processors with an embedded reconfigurable fabric. In ACM/SIGDA FPGA, 2014.
    連結:
  23. [31] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. Mibench: A free, commercially representative embedded benchmark suite. In IEEE WWC, 2001.
    連結:
  24. [32] J. Howard, S. Dighe, Y. Hoskote, S.Vangal, and et al. A 48-Core IA-32 MessagePassing Processor with DVFS in 45nm CMOS. In Proc. of IEEE ISSCC, pages 108–109, 2010.
    連結:
  25. [33] Wen-Hung Huang, Jian-Jia Chen, and Jan Reineke. Mirror: symmetric timing analysis for real-time tasks on multicore platforms with shared resources. In Proceedings of the 53rd Annual Design Automation Conference, page 158. ACM, 2016.
    連結:
  26. [34] Selmer Martin Johnson. Optimal two-and three-stage production schedules with setup times included. Naval research logistics quarterly, 1(1):61–68, 1954.
    連結:
  27. [35] Lei Ju, Samarjit Chakraborty, and Abhik Roychoudhury. Accounting for Cacherelated Preemption Delay in Dynamic Priority Schedulability Analysis. In Proc. of IEEE DATE, pages 1623–1628, 2007.
    連結:
  28. [36] B. Kalyanasundaram and K. Pruhs. Speed is as Powerful as Clairvoyance. J. ACM, pages 617–643, 2000.
    連結:
  29. [37] Bala Kalyanasundaram and Kirk Pruhs. Speed is as powerful as clairvoyance. Journal of the ACM (JACM), 47(4):617–643, 2000.
    連結:
  30. [38] S. Kato, N. Yamasaki, and Y. Ishikawa. Semi-partitioned Scheduling of Sporadic Task Systems on Multiprocessors. In Proc. of IEEE ECRTS, pages 249–258, 2009.
    連結:
  31. [39] T. Kelter, H. Falk, P. Marwedel, and et al. Bus-Aware Multicore WCET Analysis through TDMA Offset Bounds. In Proc. of IEEE ECRTS, pages 3–12, 2011.
    連結:
  32. [40] Hyoseung Kim, Dionisio De Niz, Bjぴorn Andersson, Mark Klein, Onur Mutlu, and Ragunathan Rajkumar. Bounding and reducing memory interference in cots-based multi-core systems. Real-Time Systems, 52(3):356–395, 2016.
    連結:
  33. [41] R. Kirner and P. Puschner. Classification of WCET Analysis Techniques. In Proc. of IEEE ISORC, pages 190–199, 2005.
    連結:
  34. [42] H. Leontyev and J. H. Anderson. Generalized Tardiness Bounds for Global Multiprocessor Scheduling. In Proc. of IEEE RTSS, pages 413–422, 2007.
    連結:
  35. [43] C. L. Liu and J. W. Layland. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. J. ACM, pages 46–61, 1973.
    連結:
  36. [45] Mirko Loghi, Massimo Poncino, and Luca Benini. Cache coherence tradeoffs in shared-memory mpsocs. ACM Trans. Embed. Comput. Syst., 5(2):383–407, May 2006.
    連結:
  37. [46] Malardalen WCET Research Group. WCET Benchmarks. http://www.mrtc.mdh.se/projects/wcet, 2012.
    連結:
  38. [49] D. Melpignano, L. Benini, E. Flamand, and et al. Platform 2012, a Many-Core Computing Accelerator for Embedded SoCs: Performance Evaluation of Visual Analytics Applications. In Proc. of ACM DAC, pages 1137–1142, 2012.
    連結:
  39. [50] G. Memik, W. H. Mangione-Smith, and W.-D. Hu. NetBench: A Benchmarking Suite for Network Processors. In Proc. of IEEE/ACM ICCAD, 2001.
    連結:
  40. [53] M. Niemeier, A. Wiese, and S. Baruah. Partitioned Real-time Scheduling on Heterogeneous Shared-Memory Multiprocessors. In Proc. of IEEE ECRTS, pages 115–124, 2011.
    連結:
  41. [54] P. R. Panda, N. D. Dutt, and A. Nicolau. On-chip vs. Off-chip Memory: the Data Partitioning Problem in Embedded Processor-Based Systems. ACM Trans. on DAES, pages 682–704, 2000.
    連結:
  42. [55] M. Paolieri, E. Qui nones, F. J. Cazorla, and et al. Hardware Support for WCET Analysis of Hard Real-Time Multicore Systems. In Proc. of ACM ISCA, pages 57–68, 2009.
    連結:
  43. [56] M. Paolieri, E. Qui nones, F. J. Cazorla, and M. Valero. An Analyzable Memory Controller for Hard Real-Time CMPs. Proc. of IEEE LES, pages 86–90, 2009.
    連結:
  44. [58] Michael L Pinedo. Scheduling: theory, algorithms, and systems. Springer Science & Business Media, 2012.
    連結:
  45. [59] C. Pitter and M. Schoeberl. A real-time Java chip-multiprocessor. ACM Trans. on TECS, pages 9:1–9:34, 2010.
    連結:
  46. [61] Vincenzo Rana, Srinivasan Murali, David Atienza, Marco Domenico Santambrogio, Luca Benini, and Donatella Sciuto. Minimization of the reconfiguration latency for the mapping of applications on fpga-based systems. In ACM/IEEE CODES+ISSS, 2009.
    連結:
  47. [62] S. Baruah and N. Fisher. The Partitioned Multiprocessor Scheduling of DeadlineConstrained Sporadic Task Systems. In IEEE Trans. on Computers, pages 918–923, 2006.
    連結:
  48. [63] V. Suhendra, T. Mitra, A. Roychoudhury, and Ting Chen. WCET centric data allocation to scratchpad memory. In Proc. of IEEE RTSS, pages 223–232, 2005.
    連結:
  49. [64] T. Ungerer, F. J. Cazorla, P. Sainrat, and et al. MERASA: Multicore Execution of Hard Real-Time Applications Supporting Analyzability. IEEE Micro, pages 66–75, 2010.
    連結:
  50. [1] UTDSP Benchmark Suite. http://www.eecg.toronto.edu/corinna/DSP/infrastructure/UTDSP.tar.gz, 2012.
  51. [10] Luca Benini. Programming heterogeneous many-core platforms in nanometer technology: the p2012 experience. Presentation in the ARTIST Summer School, Autrans, France, 2010.
  52. [13] S. Y. Borkar, P. Dubey, K. C. Kahn, and et al. Platform 2015: Intel Processor and Platform Evolution for The Next Decade. Intelligence/sigart Bulletin, 2005.
  53. [16] Che-Wei Chang, Jian-Jia Chen, Tei-Wei Kuo, and Heiko Falk. Real-time task scheduling on island-based multi-core platforms. IEEE Transactions on Parallel and Distributed Systems, 26(2):538–550, 2015.
  54. [19] Louise H Crockett, Ross A Elliot, Martin A Enderwitz, and Robert W Stewart. The Zynq Book: Embedded Processing with the Arm Cortex-A9 on the Xilinx Zynq-7000 All Programmable Soc. Strathclyde Academic Media, 2014.
  55. [22] B. D. de Dinechin, R. Ayrignac, P. E. Beaucamps, P. Couvert, B. Ganne, P. G. de Massas, F. Jacquet, S. Jones, N. M. Chaisemartin, F. Riss, and T. Strudel. A clustered manycore processor architecture for embedded and accelerated applications. In High Performance Extreme Computing Conference (HPEC), 2013 IEEE, pages 1–6, Sept 2013.
  56. [23] Benoふıt Dupont de Dinechin, Duco van Amstel, Marc Poulhi`es, and Guillaume Lager. Time-critical computing on a single-chip massively parallel processor. In Proceedings of the Conference on Design, Automation & Test in Europe, DATE ’14, pages 97:1–97:6, 3001 Leuven, Belgium, Belgium, 2014. European Design and Automation Association.
  57. [25] Heiko Falk, Paul Lokuciejewski, and Henrik Theiling. Design of a WCET-Aware C Compiler. In Proc. of IEEE/ACM/IFIP Workshop on ESTMED, 2006.
  58. [44] Jane W.-S. Liu. Real-Time Systems. 1st edition, 2000.
  59. [47] T. G. Mattson, R. F. Van der Wijngaart, M. Riepen, and et al. The 48-core SCC Processor: the Programmer’s View. In The Journal of Supercomputing, pages 1–11, 2010.
  60. [48] Alessandra Melani, Marko Bertogna, Vincenzo Bonifaci, Alberto MarchettiSpaccamela, and Giorgio Buttazzo. Memory-processor co-scheduling in fixed priority systems. In Proceedings of the 23rd International Conference on Real Time and Networks Systems, pages 87–96. ACM, 2015.
  61. [51] Jぴorg Mische, Stefan Metzlaff, and Theo Ungerer. Distributed Memory on Chip – Bringing Together Low Power and Real-Time. In Proc. of the Workshop on RePP, 2014.
  62. [52] A. K. Mok. Fundamental Design Problems of Distributed Systems for the HardReal-Time Environment. Technical report, 1983.
  63. [57] Rodolfo Pellizzoni, Andreas Schranzhofer, Jian-Jia Chen, Marco Caccamo, and Lothar Thiele. Worst case delay analysis for memory interference in multicore systems. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 741–746. European Design and Automation Association, 2010.
  64. [60] P. Puschner and A. Burns. A Review of Worst-Case Execution-Time Analysis (editorial). Real-Time Systems, pages 115–128, 2000.
  65. [65] Pavel G. Zaykov, Georgi K. Kuzmanov, and Georgi N. Gaydadjiev. Reconfigurable multithreading architectures: A survey. In IEEE SAMOS, 2009.
  66. [66] V. Zivojnovic, J. M. Velarde, C. Schlager, and H. Meyr. DSPstone: A DSP-Oriented Benchmarking Methodology. In Proc. of ICSPAT, 1994.