Configurable Hardware Support for Single Processor Real-Time Systems


Abstract

Embedded systems are included in a variety of products within different technical areas, such as industrial automation, consumer electronics, the automotive industry, and communication and multimedia systems. Products ranging from trains and airplanes to microwave ovens and washing machines are controlled by embedded systems. Programmable devices constitute a part of these embedded systems. Today, a programmable device can include a complete system containing building blocks that are connected with each other via programs written in a hardware description language. The programmable devices can be programmed and changed over and over again, and this flexibility makes it possible, before final implementation, to explore how these building blocks can best be designed in relation to system requirements.

This thesis describes a further development of a building block for programmable devices that handles real-time functionality in embedded systems. The building block is implemented in a non-traditional way, i.e., the implementation is written using both a hardware description language and traditional software languages. This brings certain benefits, such as increased performance, improved predictability and lower memory consumption. A non-traditional implementation also has its drawbacks: for example, extensions and adjustments can be hard to handle, since modifications are required in both hardware and software.

After examining how the usability of the new building block could be improved, the configurability of the block was extended. This enables customization and makes it possible to use the block within a wider spectrum of applications. It is also possible to reduce the size and cost of the final product, since resource consumption can be optimized.

Furthermore, a mathematical model estimating resource consumption for configurable real-time functionality has been developed. The model enables distinct trade-off comparisons and provides guidance for system designers when considering what type of real-time operating system to use in a certain design.

Swedish Summary (Svensk sammanfattning)

Embedded systems are found in many different technical areas, such as industrial automation, consumer electronics, the automotive industry, and communication and media systems. Products ranging from trains and airplanes to microwave ovens and washing machines are controlled by embedded systems. Programmable devices constitute a part of these embedded systems. Today, a device can contain a complete system made up of building blocks that are connected by programs written in a hardware description language. The devices can be programmed and changed over and over again, and this flexibility makes it possible to explore, before final implementation, how these building blocks can best be designed in relation to the requirements of the system.

The thesis describes a further development of a building block for programmable devices that handles real-time functionality in embedded systems. The building block is implemented in a non-traditional way, that is, the implementation is divided between a hardware description language and traditional software. This brings certain advantages, such as increased performance, predictability and a smaller software footprint in memory. A non-traditional implementation can, however, be harder to use and modify.

After examining how the use of such a non-traditional building block can be made easier, its configurability was extended. This makes it possible to adapt the building block as needed, so that it can be used in a wider range of applications. It also makes it possible to optimize resource usage, which reduces the final product cost, since a smaller and more cost-effective programmable device can be chosen.

In addition, a mathematical model has been developed. The model estimates the resource usage of building blocks for configurable real-time functionality in programmable devices. By being able to predict resource usage, the system designer gets an overview of resource consumption and thereby a better basis for decisions when choosing a real-time system and a programmable device.


Acknowledgements

This work has been supported by the KK Foundation (KKS), RealFast Intellectual Property AB and Prevas AB. I would like to thank my supervisors at Mälardalen University, Lars Asplund and Kristina Lundqvist. I also want to thank Lennart Lindh, Mälardalen University, and Anders Näslund and Claes Brisby, Prevas AB.

Susanna Nordström
Västerås, February 2008


List of Publications

The following papers are included in this Licentiate thesis. (A Licentiate degree is a Swedish degree halfway between MSc and PhD.)

A. Configurability and Hardware Support for Real-Time Operating Systems - A State of the Art Report, Susanna Nordström, MRTC Technical Report, Mälardalen University, Västerås, Sweden, February 2008.

B. Application Specific Real-Time Microkernel in Hardware, Susanna Nordström, Lennart Lindh, Lars Johansson and Tobias Skoglund, In Proceedings of 14th IEEE-NPSS Real-Time Conference, Stockholm, Sweden, June 2005.

C. Configurable Hardware/Software Support for Single Processor Real-Time Kernels, Susanna Nordström and Lars Asplund, In Proceedings of International Symposium on System-On-Chip Conference, Tampere, Finland, November 2007.

D. Model for Resource Usage Estimation of Configurable Real-Time Kernels in Hardware and Software, Susanna Nordström and Lars Asplund, Submitted to Real-Time Systems Journal, February 2008.

Comments on my contribution

A. I am the sole author of this paper.

B. The prototype implementation and measurements presented in this paper were performed in a Master’s thesis project by Johansson and Skoglund. I compiled and further analyzed the results of the thesis, and wrote the paper in discussion with Lindh.

C. I performed the configuration implementations, built the systems from which the results were extracted, and carried out the measurements. I wrote the paper, in discussion with, and under supervision of, Asplund.

D. I built the systems for the experiments and carried out all measurements, constructed the resource usage estimation model, and performed the model validation and analysis. I wrote the paper, in discussion with, and under supervision of, Asplund.

Contents

I Thesis

1 Introduction
  1.1 Outline
2 Background and Motivation
  2.1 Real-Time Operating Systems
  2.2 Hardware Support
  2.3 The Real-Time Unit (RTU)
  2.4 Configurability
  2.5 Modeling Resource Usage Estimation
3 Problem Description
  3.1 Research Method
4 Related Work
  4.1 Previous Research on the Real-Time Unit
  4.2 Hardware Support and Configuration
  4.3 Models and Estimations
5 Summary of Papers
  5.1 Paper A (Chapter 7)
  5.2 Paper B (Chapter 8)
  5.3 Paper C (Chapter 9)
  5.4 Paper D (Chapter 10)
6 Summary and Conclusions
  6.1 Future work
Bibliography

II Included Papers

7 Paper A: Configurability and Hardware Support for Real-Time Operating Systems - A State of the Art Report
  7.1 Introduction
  7.2 Configuration
  7.3 Hardware Support for Real-Time Operating Systems
    7.3.1 Real-Time Unit (RTU)
    7.3.2 The δ Hardware/Software RTOS Framework and the Configurable Hardware Scheduler
    7.3.3 Co-Scheduler2
    7.3.4 Real-Time Task Manager (RTM)
    7.3.5 Operating System Coprocessor (OSC)
    7.3.6 The Silicon OS in the TRON project
    7.3.7 F-Timer
  7.4 Summary
  Bibliography
8 Paper B: Application Specific Real-Time Microkernel in Hardware
  8.1 Introduction
  8.2 Published Work
  8.3 Implementation
    8.3.1 Task management
    8.3.2 Semaphore and flag management
    8.3.3 Time management
  8.4 Experimental Results
  8.5 Conclusions
  Bibliography
9 Paper C: Configurable Hardware/Software Support for Single Processor Real-Time Kernels
  9.1 Introduction
  9.2 Configuration
    9.2.1 Real-Time Unit (RTU)
    9.2.2 MicroC/OS-II
  9.3 Footprint
  9.4 Conclusions
  Bibliography
10 Paper D: Model for Resource Usage Estimation of Configurable Real-Time Kernels in Hardware and Software
  10.1 Introduction
    10.1.1 Configuration
    10.1.2 Motivation
    10.1.3 Related Work
    10.1.4 Organization
  10.2 Theory
    10.2.1 Real-Time Kernel Memory Resource Usage Model
    10.2.2 Real-Time Kernel FPGA Area Resource Usage Model
  10.3 Experiments
    10.3.1 Experimental Setup
    10.3.2 Experimental Procedure
    10.3.3 Model Validation
    10.3.4 Real-Time Kernel Memory Resource Usage Results
    10.3.5 Real-Time Kernel FPGA Area Resource Usage Results
  10.4 Results and Discussion
  10.5 Summary and Conclusions
  10.6 Future Work
  Bibliography

I Thesis

Chapter 1. Introduction

"The monolithic idea (the chip) occurred to Robert Noyce in the depth of winter - or at least in the mildly chilly season that passes for winter in the sunny valley of San Francisco Bay that is known today, because of that idea, as Silicon Valley." (From the book "The Chip: How Two Americans Invented the Microchip and Launched a Revolution", by T.R. Reid, ISBN 0-375-75828-3.)

The monolithic idea, bringing electronic components into a chip, was the start of the integrated circuit era that began in the late 1950s and has since paved the way for the development of a large set of different integrated circuit designs. Technology advancements have made it possible to integrate more and more electronic components into a chip. Today it is possible to include a whole system in one chip, i.e., a system-on-chip (SoC). A system, in this sense, is a set of components that together constitute a whole, performing a certain assignment. A system-on-chip may contain one or more processors as well as a variety of functionality for communication, calculation, timing etc., all depending on what product the system-on-chip is to be used in.

System-on-chip technology is included in a variety of products in different technical areas, e.g., industrial automation, consumer electronics, the automotive industry, and communication and multimedia systems. Products ranging from trains and airplanes to washing machines and mobile phones are controlled by system-on-chip solutions.

There is a large collection of devices that a system-on-chip can be implemented in. One of them is the Field Programmable Gate Array (FPGA). Most FPGAs can be reprogrammed by a designer a practically unlimited number of times after the circuit is manufactured. This means that the functionality implemented in the FPGA can be changed for bug fixes or upgrades during design, if and when it is necessary.

An example of a product containing an FPGA implemented system-on-chip solution is a digital set-top box. A digital set-top box is in general connected to a TV and receives and decodes broadcasts from mainly satellite, cable, broadband and terrestrial television. The set-top box can also provide video, audio, internet webpages and video games, depending on supplier. It is too expensive for a set-top box manufacturer to support reception of all broadcast alternatives in one box. However, most components in a set-top box remain the same, regardless of the received broadcast. Here, the flexibility of an FPGA is useful. An FPGA in a set-top box is often used as "glue" between other components in the system, for example when implemented as the interface between the receiver section and the main processing section (as shown in Fig. 1.1). This means that the rest of the system remains the same, while the FPGA is programmed differently depending on which broadcast type the set-top box is to receive [1]. Another example of the flexibility of an FPGA is that, since a set-top box is connected to a network, an FPGA can be reprogrammed with upgrades after delivery to the customer.

Figure 1.1: A possible placement of FPGA technology in the receiving end in the internal architecture of a digital set-top box [1].

The functionality in a system-on-chip is divided into components, which are created using a hardware description language (HDL). System-on-chip designers may construct these components themselves, use general purpose pre-designed components provided by FPGA manufacturers, or buy components from Intellectual Property (IP) component suppliers.

This thesis presents further development and analysis regarding one such component. The component implements real-time operating system functionality (described in Section 2.1), and is called the Real-Time Unit (RTU). The RTU has been developed via previous research at Mälardalen Real-Time Research Centre (MRTC) [2, 3]. Conventional real-time operating systems on the market today are implemented using a software programming language, whereas the RTU is mainly implemented using a hardware description language.

This thesis presents the work of providing customization possibilities to the RTU in order to increase its usability. By customization we mean that the component provides a set of functionality that the system designer can choose to enable, disable, and quantify. The customized (i.e., configured) component occupies a varying amount of area in the FPGA depending on configuration. In order to address the need for early awareness of the required FPGA area, a mathematical model for estimating the size of the consumed area according to configuration has been constructed. The mathematical model is applicable for estimating the size of both a conventional software implemented, and an unconventional hardware implemented, real-time operating system. The model enables distinct trade-off comparisons and provides guidance for system designers when considering what type of real-time operating system to use in a certain system-on-chip design.

1.1 Outline

The thesis is divided into two parts. Part I begins with the introduction above and continues with background and motivation in Chapter 2, where concepts such as real-time operating systems, hardware support for real-time operating systems, configurability and FPGA resource usage estimation modeling are presented. The problem description, with research questions and research methodology, is given in Chapter 3, followed by related work in Chapter 4. Chapter 5 contains summaries of the publications included in this thesis. Chapter 6 presents conclusions and a brief description of future work. Part II, Chapters 7 to 10, contains the publications this thesis is based upon.


Chapter 2. Background and Motivation

This chapter describes concepts used throughout the thesis, such as real-time operating systems, hardware support for real-time operating systems, configurability and FPGA resource usage estimation modeling.

2.1 Real-Time Operating Systems

This section describes the basic concepts of real-time operating systems (RTOS) that will be used throughout the thesis.

A real-time system is a system that reacts to events in the environment and executes functions based on these within a precise time. In these systems, time is a vital parameter and the behavior of the system is only considered correct if the correct result is presented within a specified time limit [4]. An airbag in a car exemplifies the importance of reacting to an event in a limited time frame. The airbag cannot be triggered too late, nor too early. Additionally, the airbag exemplifies how a real-time system reacts to events (input) and produces a result (output). The sensor detecting a collision is input to the real-time system and causes it to react. The reaction triggers the correct signals in order to get the airbag to deploy on time.

A real-time operating system (RTOS) is an operating system that is implemented for real-time systems in order to simplify design, execution and maintenance of real-time systems and applications. An RTOS provides the system designers with a pre-designed programming interface with services that simplify the construction of an RTOS application [5]. An RTOS application is divided into tasks (also called processes or threads) that are executed by the processor. Each task has a special assignment to perform when it is running (i.e., executed). A task can be executed periodically (periodic task), or be activated by an event (aperiodic task). Each task is assigned a priority level that decides which task is to be executed if several tasks are ready to run at the same time. The RTOS scheduler controls the execution order of the tasks according to the implemented scheduling algorithm.

Tasks may need to communicate with each other or exchange information. An RTOS provides mechanisms for communication and synchronization between tasks, also called interprocess communication (IPC). Which IPC mechanisms are provided is RTOS specific, but semaphores, flags, mailboxes, message queues and mutexes are a few examples. Semaphores and flags will be mentioned in the following chapters of the thesis. A semaphore is a data structure administered by the RTOS which is used for synchronization between tasks that use resources. A resource can be a certain memory area, an input/output port etc. Semaphores are used to protect resources that can only be used by one task at a time [4]. Flags are used when tasks need to synchronize on multiple events. A flag may be set or cleared when a certain event occurs, and a task may wait for one or several flags to be set, or cleared, in order to react [5].

2.2 Hardware Support

FPGA technology has made it possible to implement established software algorithms in hardware, i.e., using hardware description languages such as VHDL (VHSIC Hardware Description Language; VHSIC: Very High Speed Integrated Circuits) or Verilog [6]. The term hardware support describes functionality that has been implemented with a hardware description language (HDL) instead of a software programming language executed by a Central Processing Unit (CPU).

Like any other hardware component implemented in HDL, hardware support for RTOS executes in parallel to the CPU. This is the motivation for moving functionality from software to hardware: the concurrency of the hardware relieves pressure on the CPU and enhances performance [7]. Consequently, hardware support for RTOS is conventional RTOS functionality that has been implemented in hardware instead of software. Which RTOS functionality benefits the most, in terms of performance, from being implemented in hardware has been a research topic for several research groups around the world since the beginning of the 1990s. An overview of this research is presented in Chapter 4, describing both in-house research regarding the RTU and other research on hardware support. To summarize, research on hardware support has shown the following benefits of having real-time kernel functionality implemented in hardware:

• Hardware is deterministic. Therefore the hardware support improves the predictability of the time behavior of the system. The gap between the minimum and maximum service call time is decreased.

• Hardware is concurrent. A hardware supported RTOS can utilize this hardware characteristic, enabling fast response.

• In software real-time kernels, the CPU has to execute code for clock-tick administration and task waiting queues. When clock-tick administration is implemented in hardware, the CPU load is reduced because no CPU execution time for clock-tick administration is required. Nor does the CPU need to be involved in task queue handling when the scheduling algorithm is implemented in hardware and administered in parallel to the CPU. This results in reduced system overhead and faster service call response time.

• The memory footprint is reduced since functionality is placed in hardware instead of software.

A disadvantage compared to software solutions is, naturally, that hardware support occupies FPGA area, whereas a conventional software implemented RTOS occupies none. Furthermore, using hardware support requires the purchase of an FPGA and an associated development tool. A more detailed example of how a hardware supported RTOS is implemented is presented in Section 2.3, describing the version of the RTU that this work originates from.

2.3 The Real-Time Unit (RTU)

The RTU has been a topic for research since the first article, describing its creation and proof of concept, was published in 1991 [8]. Since then the RTU has been further developed, has had different architectures, and has been used in a variety of projects. This is described in Section 4.1, which presents a brief overview of in-house work on the RTU. It has also been developed into a commercial product as an intellectual property component, first released by the company RealFast Intellectual Property AB and later taken over by Prevas AB [9].

The RTU architecture used in the work of this thesis is a uni-processor solution that originates from the commercial Sierra RTOS (version 4.01) provided by Prevas AB [9]. The RTU is mainly implemented in the hardware description language VHDL, with a small software part, the RTU driver, implemented in C and assembler. The RTU driver is the application programmer's interface (API) that utilizes the hardware implemented functionality, i.e., service call functions that communicate with the hardware part. The hardware part together with the software driver makes up a small complete RTOS. The service calls provide conventional RTOS functionality such as task management (creation, starting, blocking etc.), external interrupt management (initiation, waiting etc.), interprocess communication (binary semaphores, flagbit operations etc.) and timing management (delay and periodic start of tasks etc.). The RTU version described in this section provides 16 tasks, 8 priority levels, 16 semaphores, 4 flagbits, 8 external interrupts and a maximum delay value of 1 024 RTOS clock-ticks.

Figure 2.1: Simplified view of the RTU in a system. (RTU hw - RTU hardware part, RTU sw - RTU software driver).

The RTU in this version depends on a general purpose CPU in order to execute its software driver. The RTU is connected to the CPU via the CPU bus and via an internal CPU interrupt, as shown in Fig. 2.1. The CPU bus is used for communication between the software RTU driver and the memory mapped registers in the RTU hardware. The internal CPU interrupt is used by the scheduler in order to notify the CPU that execution is to change to another task.
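The division of labour between the RTU driver and the RTU hardware can be illustrated with a small model. The Python sketch below is not the Sierra or RTU driver: the opcode values, the bit layout of the command word, and all class and function names are assumptions made purely for illustration. Only the limits (16 tasks, 8 priority levels, a maximum delay of 1 024 clock-ticks) are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List

# Limits of the RTU version described in this section.
MAX_TASKS = 16
MAX_PRIORITIES = 8
MAX_DELAY_TICKS = 1024

# Hypothetical opcodes; the real command encoding is not given in the text.
OP_TASK_CREATE = 0x1
OP_TASK_DELAY = 0x2


@dataclass
class RtuHardwareModel:
    """Stand-in for the hardware part: records words written to a command register."""
    command_log: List[int] = field(default_factory=list)

    def write_command(self, opcode: int, arg0: int, arg1: int) -> None:
        # Invented layout: opcode in bits 24 and up, arg0 in bits 12-23, arg1 in bits 0-11.
        self.command_log.append((opcode << 24) | (arg0 << 12) | arg1)


class RtuDriver:
    """Software driver: thin service calls that range-check and then write a register."""

    def __init__(self, hw: RtuHardwareModel) -> None:
        self.hw = hw

    def task_create(self, task_id: int, priority: int) -> None:
        if not 0 <= task_id < MAX_TASKS:
            raise ValueError("task id out of range")
        if not 0 <= priority < MAX_PRIORITIES:
            raise ValueError("priority out of range")
        self.hw.write_command(OP_TASK_CREATE, task_id, priority)

    def task_delay(self, task_id: int, ticks: int) -> None:
        if not 0 <= task_id < MAX_TASKS:
            raise ValueError("task id out of range")
        if not 1 <= ticks <= MAX_DELAY_TICKS:
            raise ValueError("delay out of range")
        self.hw.write_command(OP_TASK_DELAY, task_id, ticks)


hw = RtuHardwareModel()
driver = RtuDriver(hw)
driver.task_create(task_id=3, priority=2)
driver.task_delay(task_id=3, ticks=100)
print([hex(word) for word in hw.command_log])
```

The point of the sketch is that the driver itself holds no queues or timers; each service call only validates its arguments and hands one command word to the hardware part.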

Figure 2.2: RTU architecture [9]. (GBI - Generic Bus Interface, TDBI - Technology Dependent Bus Interface).

The RTU hardware part consists of the internal components presented in Fig. 2.2. The RTU hardware receives its instructions from the service calls over the Technology Dependent Bus Interface (TDBI). The instructions are decoded and routed from the Generic Bus Interface (GBI) to the subcomponent that will handle the instruction.

The IPC mechanisms available in the RTU are binary semaphores and flags (subcomponent "Sem" in Fig. 2.2). The component includes task queues for synchronization according to semaphore and flag events.

The timer component handles service calls associated with delay and periodic start of tasks (subcomponent "Timers" in Fig. 2.2). It also includes functionality for the RTOS clock-tick. In a conventional software RTOS, the RTOS has to check the task delay queues, decrease each task’s timer and re-schedule a task if a timer has expired. This procedure has to be performed at certain time intervals, and each time the CPU is interrupted and has to execute software code for handling it. In the RTU, this procedure and the task queues are handled in the hardware part of the RTU, completely relieving the CPU of this work. The CPU is only interrupted if the delay queue checking results in a context switch.

External interrupts (subcomponent "Irq" in Fig. 2.2) are directly connected to the hardware part of the RTU, as shown in Fig. 2.1 and 2.2. The hardware implemented functionality of this process is to receive external interrupts and inform the scheduler of the event. Instead of invoking an interrupt service routine, an interrupt triggers a task that is set to be waiting for that particular external event. The task is then scheduled according to priority like any other task. The component includes task queues for handling the external interrupt waiting process.

The scheduler (subcomponent "Scheduler" in Fig. 2.2) includes a hardware implemented scheduling algorithm, which in this version is pre-emptive and executes in parallel to the CPU. The task queues are implemented in hardware as well. The scheduler notifies the CPU when a context switch is about to occur by triggering an internal interrupt (shown in Fig. 2.1). A context switch (or task switch) is the process of saving the current task’s context (the current values of the CPU registers) and then restoring the context of the next task to be run [5]. In the RTU, the context switch is implemented in software (assembler), but the preparation of finding the next task to be executed is performed in hardware by the scheduler. The information about which task to switch to is transferred to the RTU driver in connection with triggering the interrupt to the CPU.

The RTU can be used in two main ways: stand-alone, as a complete small RTOS, or together with another software RTOS. Earlier versions of the RTU used together with another RTOS have been presented in [10, 11] (described in Section 4.1). Further, Paper B in this thesis presents an example where the RTU is used together with the software implemented MicroC/OS-II [12]. The RTU has been implemented to replace the scheduling and part of the task management, semaphore handling and flag handling in MicroC/OS-II. The motivation is to achieve acceleration when parts of a software RTOS are replaced by the RTU hardware implemented functionality. The integration of the RTU is carried out at service call level, which makes it possible to preserve the programmer’s interface of the software RTOS.
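The interplay between clock-tick handling, external interrupts and the pre-emptive scheduler described above can be sketched as a small behavioural model. This is a minimal Python sketch, not the RTU's VHDL implementation: the class names, the convention that a lower number means higher priority, and the tie-break on task id are all assumptions. The return value of each step models whether the internal CPU interrupt would be raised, i.e., whether a context switch is needed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    task_id: int
    priority: int          # assumption: lower number = higher priority
    delay: int = 0         # remaining clock-ticks; 0 means not delayed
    blocked: bool = False  # waiting for a semaphore, flag or external interrupt

    @property
    def ready(self) -> bool:
        return self.delay == 0 and not self.blocked


class SchedulerModel:
    """Behavioural stand-in for the timer, interrupt and scheduler subcomponents."""

    def __init__(self) -> None:
        self.tasks: Dict[int, Task] = {}
        self.running: Optional[int] = None

    def add(self, task: Task) -> None:
        self.tasks[task.task_id] = task

    def _select(self) -> Optional[int]:
        # Pick the ready task with the highest priority (ties broken by task id).
        ready = [t for t in self.tasks.values() if t.ready]
        if not ready:
            return None
        return min(ready, key=lambda t: (t.priority, t.task_id)).task_id

    def reschedule(self) -> bool:
        """Return True only when the CPU would be interrupted for a context switch."""
        next_task = self._select()
        if next_task != self.running:
            self.running = next_task
            return True
        return False

    def tick(self) -> bool:
        # Clock-tick administration: decrease every pending delay, then reschedule.
        for t in self.tasks.values():
            if t.delay > 0:
                t.delay -= 1
        return self.reschedule()

    def external_interrupt(self, task_id: int) -> bool:
        # An external interrupt releases the task waiting for it; no ISR is modelled.
        self.tasks[task_id].blocked = False
        return self.reschedule()


sched = SchedulerModel()
sched.add(Task(0, priority=5))
sched.add(Task(1, priority=1, delay=2))
sched.add(Task(2, priority=0, blocked=True))
sched.reschedule()                          # task 0 is the only ready task
interrupts = [sched.tick(), sched.tick()]   # only the second tick wakes task 1
print(interrupts, sched.running)
```

In this model most ticks return False, which mirrors the statement that the CPU is left undisturbed unless the delay queue check actually changes which task should run.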
2.4 Configurability In today’s conventional software implemented RTOS, a lot of functionality is available in order to satisfy a large variety of customer requirements. For this reason, many RTOS are configurable in order to only include functionality that will be used in the application. Configuration possibilities make it possible not to use more system resources than absolutely necessary, motivated by decreased memory usage, an important matter in resource restricted environments such as embedded systems and FPGA designs. RTOS configuration has impact on several important aspects in the area of embedded systems such as performance issues, power consumption and resource usage (memory and FPGA area). Another kind of configuration, not to be confused with RTOS.

(34) 2.4 Configurability. 13. configuration, is the possibility to re-configure FPGA designs at run time. The scope of this thesis is within the area of examining the impact of RTOS configuration on resources usage when configuration is performed at compile time (pre-runtime). Configurability was originally discussed in the area of complete operating systems, but since RTOS are becoming more widely used in embedded systems and FPGA designs, the need to adapt the real-time system to the limited embedded environment has increased the interest in configuring RTOS as well. Configurability broadens the use of a particular RTOS since it can be used in a wider context when it provides both limited and extended functionality. By being able to use as small amount of resources as possible, a more cost effective target FPGA can be used, which decreases the end product cost. Further, configurability is a possibility for a system to grow if new system requirements arise over time, extending system lifetime. There is no solid definition of RTOS, or OS, configurability. Other terms, such as scalability, modularity, flexibility, adaptability and customization have been used to describe the ability to configure an RTOS to include or exclude functionality [13, 14, 15, 16, 17, 18]. We use the term RTOS configurability to describe the ability for a system designer to enable desired functionality, and disable undesired functionality, among functionality provided by an RTOS at compile time. We define an RTOS to be configurable if the system designer can configure it only by changing the values of the RTOS configuration parameters. The configuration must be possible to perform without changing the source code files. To what extent an RTOS is configurable, and exactly how this is accomplished, is RTOS dependent. Pre-runtime RTOS configuration is performed by defining the services needed at compile time. 
This is accomplished by setting flags or, in the case of libraries, by having the linker include the services used by the application [13]. The programmer can use a specific configuration file, or an RTOS-specific GUI provided in the system development tool that generates the configuration file automatically from user input. Fig. 2.3 and 2.4 show a simplified view of the idea of RTOS configuration. RTOS configuration may concern enabling/disabling of RTOS functionality regarding:
• Tasks
• Scheduling priorities
• IPC such as semaphores, flags, message queues, message boxes and mutexes
• Timing functionality
• Stack usage
• Memory allocation
• Processor endianness
• Processor data width (8, 16, 32 or 64 bits)
• Error checking
• Debugging possibilities

Figure 2.3: Simplified figure of a non-configurable RTOS.

Figure 2.4: Simplified figure of a configurable RTOS.

Configuration can be performed in three categories:
1. Enabling or disabling of a certain functionality. An example is the ability to exclude message box functionality completely: code for creating and using message boxes is not included in compilation and is hence not part of the final software footprint in memory. Another example is the possibility to disable a subset of a certain functionality, e.g., to disable counting semaphores but still be able to use binary semaphores.

2. Enabling and disabling options may be applicable at function or service call level within a certain functionality. If, for example, an RTOS provides a service call for reading the status of a semaphore that the system designer does not want to use, that particular service call may be disabled even though the other semaphore service calls are enabled.
3. Configuration of the number of items of a certain functionality (e.g., number of semaphores, tasks or scheduling priorities). The number can be chosen within a certain range, where zero may mean that the functionality is excluded completely from use and from the software footprint.

Even within these three categories, the granularity of configuration varies for each particular RTOS, affecting the resource usage to different extents. When an RTOS is partly implemented in hardware, like the RTU (presented earlier in Section 2.3), not only memory usage motivates configuration; the occupied FPGA area has to be considered as well. A disadvantage of using hardware support for RTOS is, naturally, the occupation of FPGA area. Configurability addresses this issue by making it possible to reduce the FPGA area footprint. Pre-runtime configuration of hardware components is accomplished by using generic VHDL design. Generic design makes it possible to pass information into a design description of a component by setting generic parameters, e.g., the size of an input port [19]. In this way, the size can be varied according to system requirements. As with software implementations, the programmer may set the generic parameters through a specific GUI provided in the development tool or in a specific configuration file. Generic design can be applied to internal components as well as to the interface of an outer component in a component hierarchy.
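The three configuration categories can be illustrated with a small sketch. It is written in Python for brevity and is not taken from the RTU or from any particular RTOS: the parameter names, the service call names and the upper limits are all hypothetical. It only shows how a set of configuration parameters could determine which service calls end up in the footprint.

```python
from dataclasses import dataclass

# Assumed upper bounds for the category 3 parameters (illustrative only).
LIMITS = {"num_tasks": 64, "num_semaphores": 64, "num_priorities": 64}


@dataclass(frozen=True)
class RtosConfig:
    # Category 1: a whole functionality, or a subset of it, on or off.
    message_boxes: bool = False
    counting_semaphores: bool = False
    # Category 2: a single service call on or off.
    sem_status_call: bool = False
    # Category 3: number of items; zero excludes the functionality.
    num_tasks: int = 8
    num_semaphores: int = 4
    num_priorities: int = 8


def included_services(cfg):
    """Return the hypothetical service calls included for this configuration."""
    for name, upper in LIMITS.items():
        value = getattr(cfg, name)
        if not 0 <= value <= upper:
            raise ValueError(f"{name}={value} is outside the range 0..{upper}")

    services = {"task_create", "task_delete"} if cfg.num_tasks > 0 else set()
    if cfg.num_semaphores > 0:
        services |= {"sem_take", "sem_release"}  # binary semaphores
        if cfg.counting_semaphores:
            services.add("sem_count_init")
        if cfg.sem_status_call:
            services.add("sem_status")
    if cfg.message_boxes:
        services |= {"mbox_post", "mbox_pend"}
    return services
```

With the default values, `included_services(RtosConfig())` contains the task and binary semaphore calls only. Setting `num_semaphores=0` removes every semaphore call, including the status call, regardless of the other flags.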
In a hardware/software real-time kernel like the RTU, configuring the hardware part of the kernel means configuring the number of tasks, semaphores, flags, external interrupts and scheduling priority levels, and the data width of the CPU interface. The software part of the RTU, the RTU driver, is configured similarly. However, it is quite a complex task to keep the hardware part and the software part consistent when both are configurable. Each supported configuration and each configurable functionality of the hardware part has to be matched by the software part down to bit manipulation level, i.e., the communication interface between the RTU hardware part and software part requires special attention in order not to cause faulty behavior. The configurability modifications performed on the RTU are presented in Paper C (Chapter 9) and further analyzed with a mathematical model in Paper D (Chapter 10).

2.5 Modeling Resource Usage Estimation

This section is an introduction to an approach for modeling resource usage estimation for RTOS in FPGA designs. The results and analysis of the model are presented in Paper D (Chapter 10). A model is a set of assumptions describing how a system behaves, usually expressed in the form of mathematical or logical relationships. A mathematical model represents a system in terms of logical and quantitative relationships described with equations. The equations are manipulated in order to examine how the model reacts to stimuli [20]. If the model is valid, a certain set of stimuli makes the model produce a result that is very close to what the system it represents would have produced. Paper D in this thesis (Chapter 10) proposes a mathematical model for estimating how many resources an RTOS consumes in an FPGA. The need for estimating resource usage arises when an RTOS is implemented to be configurable (discussed in Section 2.4). A mathematical model is suitable for describing how the amount of resources consumed by an RTOS varies between configurations, and is an adequate tool for comparing resource usage between different types of RTOS. The proposed model estimates the resource usage of a certain RTOS configuration. The model consists of a set of equations that each represent the resources consumed by a subset of an RTOS. Together, the equations estimate the total amount of resources consumed by an RTOS. If the RTOS is implemented in software, the model estimates the number of bytes consumed, and if the RTOS is implemented in hardware, the model estimates the number of logic cells (LC, see footnote 4) consumed. The model estimate depends on the input stimuli. The input stimuli of the proposed model are a set of input parameters representing a subset of the values of configurable features in an RTOS (e.g., number of tasks, semaphores etc.).
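The structure described above, a set of equations that each cover a subset of the RTOS and together give the total, can be sketched as follows. This is a minimal illustration and not the model of Paper D: the linear form of each equation, the subsystem names and every coefficient are assumptions chosen only to make the example concrete.

```python
# Hypothetical coefficients per subsystem: (fixed cost, cost per configured item).
# The unit is bytes for a software RTOS or logic cells for a hardware part.
COEFFS = {
    "tasks": (400, 60),
    "semaphores": (150, 20),
    "flags": (100, 10),
}
KERNEL_BASE = 1000  # assumed cost that is present in every configuration


def subsystem_usage(name, count):
    """One equation of the sketch: usage of a single subsystem.

    A count of zero is taken to exclude the subsystem completely.
    """
    if count == 0:
        return 0
    fixed, per_item = COEFFS[name]
    return fixed + per_item * count


def estimate_usage(config):
    """Total estimate: base cost plus the sum of the subsystem equations."""
    return KERNEL_BASE + sum(
        subsystem_usage(name, count) for name, count in config.items()
    )
```

For example, `estimate_usage({"tasks": 8, "semaphores": 4, "flags": 0})` evaluates 1000 + (400 + 8·60) + (150 + 4·20) + 0 = 2110 units under these assumed coefficients.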
Since a model is a simplified view of a real system, results can be obtained from the model more rapidly than from the real system. The user only needs to set the values of the model input parameters in the pre-defined equations to obtain an estimate. In a real system, the user would first need to purchase and install the hardware/software design tool and the RTOS source code with prerequisite components such as a CPU and memory, configure the RTOS before compilation and then finally read the resource usage result. If two RTOS are to be compared, this process would need to be repeated. Even though it is sometimes possible to obtain time-limited editions of design tools and RTOS for free, the installation and start-up process takes quite a lot of time. With the proposed model, estimates of resource usage can be obtained in advance, and the model is constructed to make RTOS comparisons as fair as possible. Once the resource usage behavior is captured by a model, the effects of different configurations can be shown rapidly and with little input effort.

Footnote 4: A logic cell is a basic building block in FPGA area and is the smallest unit of logic in FPGAs provided by Altera [21].

The proposed model is useful when a fast and accurate estimate of resource usage is required, mainly in the following three respects. First, accurate estimates are important in the planning stage of designing a real-time system application. The amount of resources consumed by a certain RTOS configuration must be known before implementation in order to decide which type of FPGA to implement the system on, and thereby to estimate product cost. Regarding hardware support, resource usage concerns both memory and FPGA area. A design's consumption of FPGA area varies depending on which FPGA it is implemented in, and this needs to be considered as well when performing resource usage estimations. The model can estimate resource usage for different configurations of the RTOS functionality that affects the amount of resources required. An example is the number of tasks and priority levels the RTOS is configured to provide, which greatly affects the amount of resources required. By estimating the resource usage of the RTOS and its occupied part of the application or FPGA area, the required kernel resource usage becomes known and it becomes less complicated to predict resource usage for the rest of the application and other system components in both hardware and software.
Second, the model can estimate the impact of modifications to an existing real-time system application. Early estimates of resource usage enable trade-offs during the design phase and are useful when examining the impact of extending an existing system, which is important if available resources are few. A modification could, for example, imply an increased number of tasks, and the model can be used to estimate how the consumed hardware and software resources are affected by the modification, compared to the original configuration. Third, the results of the model can be used for analyzing differences between a software implemented and a hardware/software implemented RTOS. When introducing new types of RTOS, such as hardware supported RTOS, it is desirable to compare their resource usage with that of conventional software implemented RTOS. Since hardware supported RTOS are not commonly used products on the market, it is necessary to know the resource usage of different RTOS configurations in order to see whether the cost in FPGA area can be afforded. This is because a conventional RTOS solution does not consume any FPGA area at all. A model describing the resource usage of both hardware supported RTOS and conventional software RTOS is an adequate method for analyzing differences in resource usage for trade-off decisions. The results from such a model can give a clearer view of how hardware support is beneficial or unfavorable regarding resource usage. Resource usage estimation is a good basis for optimization decisions regarding hardware support design and future development. Further, since it is for the most part possible to implement system requirements in several ways, estimates are useful when examining options in configurations of RTOS functionality. It should be possible to know the resource usage at different configurations of a planned system in order to make trade-offs before implementation and before deciding which RTOS to use and purchase. In addition to producing valuable results, a model makes comparison analyses feasible, since the results are produced very rapidly compared to methods involving manipulations of real systems.

In order to be able to draw conclusions from the results of a model, it has to be proven to be credible. This is accomplished with model validation, a procedure concerned with determining whether the model is an accurate representation of the system under study [20]. If a model is valid, it produces results that are very close to what the real system would have produced. The proposed model was validated against real system results at four different sets of configurations, i.e., four different combinations of the configuration input parameters.
The real system results were obtained from two different systems, one with the hardware/software implemented RTU and one with the software implemented MicroC/OS-II. The model validation showed the model to have an average estimation error of less than 1.0% for memory usage and an average estimation error of 3.9% for FPGA area usage, as described more thoroughly in Paper D (Chapter 10). Since the average estimation error is less than 10%, the model is considered accurate.
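As an illustration of how such a validation figure can be computed, the sketch below takes the estimation error of one configuration to be the absolute difference between estimated and measured usage relative to the measured usage, and averages it over the validated configurations. This definition is an assumption made for the example; the exact definition used in the validation is given in Paper D. The numbers are made up and are not results from the thesis.

```python
def estimation_error(estimated, measured):
    """Relative estimation error in percent for one configuration (assumed definition)."""
    return abs(estimated - measured) / measured * 100.0


def average_estimation_error(pairs):
    """Mean of the relative errors over (estimated, measured) pairs."""
    errors = [estimation_error(est, meas) for est, meas in pairs]
    return sum(errors) / len(errors)


# Made-up logic cell counts for four configurations, not results from Paper D.
pairs = [(980, 1000), (2060, 2000), (2900, 3000), (4160, 4000)]
average = average_estimation_error(pairs)
is_accurate = average < 10.0  # the accuracy threshold used in the text above
```

The individual errors here are 2.0%, 3.0%, about 3.3% and 4.0%, giving an average of roughly 3.1%, which falls below the 10% threshold.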

Chapter 3. Problem Description

Existing hardware support for RTOS and the benefits associated with such hardware/software partitioning have been discussed. With that as background, we describe our main research problem, with focus on how to improve the possibilities to take advantage of the benefits associated with hardware support. Since RTOS solutions with hardware support are implemented in both hardware and software, they have been criticized for not being as flexible and customizable as traditional software solutions regarding changes, extensions and adaptability. This applies to stand-alone RTOS with hardware support as well as to hardware support ported to another existing RTOS for acceleration purposes, as described earlier. This problem is addressed by the first research question:

Question 1 (Q1): How can customization of hardware support for RTOS and its software driver (the RTU) increase and simplify its use?

Q1 is to some extent addressed in Paper B, which describes an implementation where the RTU accelerates service calls in another RTOS (MicroC/OS-II) and which has been an important base for the work presented in Paper C, which provides the answer to Q1. Applying configurability to hardware support will affect the resource usage in both memory and FPGA area. When comparing a hardware supported RTOS with a traditional software RTOS regarding resource usage, a disadvantage is that hardware support occupies FPGA area whereas a software RTOS occupies none. On the other hand, hardware support occupies less memory since functionality is placed in the hardware part instead. This needs to be established and is addressed in the second question:

Question 2 (Q2): What are the resource usages in memory and FPGA area for configurable hardware supported RTOS (the RTU) and a conventional software RTOS, at different configurations?

Further, if an embedded system project group plans to use hardware support in a future system, it is of great value to know the actual resource usage of a traditional software RTOS or an RTOS with hardware support in advance, i.e., before implementation. When state-of-the-art technology like RTOS with hardware support is to be introduced in industrial embedded systems, the system designers must have a clear view of the actual resource usage involved, compared to conventional software RTOS solutions. This is addressed in the third question:

Question 3 (Q3): How can a model be constructed for resource usage estimation of real-time kernels, regardless of whether they are implemented in software or hardware/software?

Q2 is answered in Paper C and further analyzed in Paper D, which also provides the answer to Q3.

3.1 Research Method

In order to obtain experimental results that could answer the research questions presented above, several systems were built. In the work presented in Paper B (Chapter 8), the RTU was ported to the commercial software implemented MicroC/OS-II for service call response time measurements. This required two equal systems, one with MicroC/OS-II only and one with MicroC/OS-II ported to the RTU driver. The systems were built with tools from FPGA vendor Xilinx, the Xilinx Embedded Development Kit (EDK) with the soft core MicroBlaze CPU [22]. Overviews of the system setups are shown in Fig. 3.1 and 3.2. After the experiments in Paper B, the design tools, FPGA and CPU were exchanged for corresponding equipment from FPGA vendor Altera [21]. The change of CPU required a complete porting of the RTU driver.
Bus communication, interrupt connection and the context switch routine in the RTU driver were adapted to function on the soft core NiosII/e CPU (economy settings of the NiosII CPU were used). The systems were built in the Altera QuartusII, SOPC Builder and NiosII IDE development tools. The systems shown in Fig. 3.3 and 3.4 were used during the configuration development of the RTU hardware part and software driver, as well as during the construction and validation of the mathematical model for resource usage estimation, in order to obtain results to answer research questions Q2 and Q3, presented in Paper C (Chapter 9) and Paper D (Chapter 10).

Figure 3.1: Setup of the MicroC/OS-II system with the MicroBlaze CPU in a Xilinx Virtex-II FPGA [22, 23].

Figure 3.2: Setup of the MicroC/OS-II/RTU system with the MicroBlaze CPU in a Xilinx Virtex-II FPGA [22, 23].

Figure 3.3: Setup of the MicroC/OS-II system with the NiosII/e CPU in an Altera Stratix FPGA [21].

Figure 3.4: Setup of the RTU system with the NiosII/e CPU in an Altera Stratix FPGA [21].

Chapter 4. Related Work

This chapter presents related work regarding academic research in the area of hardware support for real-time operating systems and how it relates to this thesis. Since in-house research on hardware support has been performed for a long time at Mälardalen Real-Time Research Centre (MRTC), a brief summary of what has been done is given in Section 4.1. Section 4.2 is an overview of external related work on hardware support for real-time operating systems, discussed with focus on configurability. Further examples of hardware support for RTOS are presented in a state-of-the-art report in Chapter 7 of this thesis. Related work regarding models and resource usage estimation in FPGA designs is discussed in Section 4.3.

4.1 Previous Research on the Real-Time Unit

This section presents research performed on hardware support at Mälardalen Real-Time Research Centre (MRTC). The research presented in this section concerns earlier versions of the RTU and originates from the work of Lindh et al. [8, 24].

Realizing a Real-Time Kernel for Single Processor Systems

Obtaining absolute timing determinism is one of the main reasons given for implementing the kernel in hardware [8, 24]. The first prototype of the kernel, FASTCHART, was a microprocessor with an integrated hardware kernel in one chip. This enabled context switches in hardware registers in one clock cycle. In the second prototype, FASTHARD, the kernel was implemented as a separate unit on a stand-alone chip, adjusted to a CPU [25]. FASTHARD was part of a small real-time system consisting of a CPU, main memory and I/O ports. The system bus and an interrupt line connected the CPU and FASTHARD. Eight external interrupts were connected; it contained seven registers and task queues implemented in Random Access Memory (RAM). Further developments have been based on the second approach.

Multiprocessor Real-Time Kernel in Hardware

When the hardware kernel was introduced into multiprocessor systems it was called the Real-Time Unit (RTU). The first version with support for multiprocessor systems was called RTU94, followed by RTU95. Both RTU94 and RTU95 could handle three CPUs (see Fig. 4.1). More functionality and improvements were added. The number of function calls was increased to include semaphores, event flags and watchdogs, and a real-time clock for continuous supervision of time was added. This means that all administration of resources that have any kind of time dependency has to be supported by the RTU, e.g., distribution of CPU time, semaphores, flags and sorting of queues.

Figure 4.1: RTU in multiprocessor system architecture [26].

To support a multiprocessor environment, the RTU was implemented to consist of one scheduler for each CPU, and tasks could be initialized to execute on a fixed CPU (local task) or on any CPU (global task). There was one ready queue for each CPU and one queue for the global tasks. Each scheduler checks both the local and the global queues [3]. Since the RTU has knowledge of the load

of each CPU, it can be used for dynamic load balancing. In [26] the RTU was presented as a co-processor in multiprocessor environments. In [27, 28, 29] the RTU was used in a research project on multiprocessor systems called Scalable Architecture for Real-Time Applications (SARA). The SARA system is based on the idea of incorporating as many parts of a real-time operating system into hardware as possible. The scalability of the SARA system could be used in the transition from a single processor system to a multiprocessor system. The RTU handled the scheduling of the system. In [27] the RTU based dynamic scheduling decisions on extra observability in the form of load information from bus monitors.

Benchmarking and Comparisons

In [10, 11], the RTU is used together with a pure software RTOS and handles the scheduling activities. This results in the RTU accelerating the software RTOS to some extent. Fig. 4.2 illustrates the RTU's placement in the system.

Figure 4.2: Overview of a software and hardware implemented real-time kernel solution [26].

In [10], the real-time kernel is called "booster" and its functionality is reduced to consist merely of scheduling. It is used together with an RTOS implemented in software. A benchmark of a model of a telecommunication application running in three different systems was performed. The systems were:
1. A processor supervised by a commercial single processor RTOS.
2. A processor supervised by an RTOS with the booster (the RTU).
3. Two processors supervised by an RTOS with the booster (the RTU).

Application response time and RTOS overhead for clock tick administration were measured with and without data located in locally or globally accessed memory, or cache memory. The conclusions were that a real-time kernel in hardware (the RTU) decreases the application response time, that a fast memory system decreases the difference between using and not using a hardware kernel, and that the clock tick administration overhead is zero when using a hardware kernel. In [11], the real-time kernel in hardware, here referred to as the Real-Time Unit (RTU), was used in a performance comparison with two other RTOS: a pure software Atlanta RTOS, and a hardware/software RTOS composed of part of the Atlanta RTOS interfaced to System-on-a-Chip Lock Cache hardware (SoCLC). The SoCLC is hardware support that accelerates software locks and semaphores in a software RTOS [30]. All systems contained three processors running a database application with many different task-level synchronization scenarios. A framework for generating the three system configurations was used. In measurements of the average-case simulation time, the RTU system showed the best performance, a 50% speed-up in the case with 6 tasks and a 36% speed-up with 30 tasks, compared to the pure software RTOS. The RTU system also had the best performance when the number of clock cycles spent on communication, context switches and computation was measured. In [31] a performance comparison between the real-time kernel in hardware and a corresponding kernel in software, in a multiprocessor system, was made. The software kernel was implemented especially for this comparison, using almost the same API as the hardware kernel. The speed-up achieved with the hardware kernel was 2.6 times.
Other results were that the time for creating tasks in the software kernel increases with the number of tasks, while it is constant in the hardware kernel. This is because of list management, which grows with the number of tasks in a software kernel. It was discovered that the software kernel was faster when tasks were created on a master node, since it can benefit from the system cache in this case, while the hardware kernel suffers from Peripheral Component Interconnect (PCI) bus access latencies.

Realizing Special Purpose Hardware Components Utilizing the Real-Time Kernel in Hardware

Having the kernel implemented in hardware makes it possible to create other hardware components that benefit from being integrated with the kernel in different useful respects. The Multiprocess Application Monitor (MAMon) is a non-intrusive monitor for RTOS kernel activities that gives observability into the execution of a single- or multiprocessor system supporting the real-time kernel in hardware [32]. MAMon is an integrated solution for on-chip monitoring of system-level events in real-time systems. The observability comes from a probe unit, integrated at the HDL level of the hardware kernel, that detects and collects events regarding process execution, communication, synchronization and I/O interrupt activities. Collected events are time stamped and transferred to a separate computer system hosting an event database and a set of monitoring application tools that show the results graphically. A hardware implementation of asynchronous IPC in an RTU based architecture is presented in [33]. It was investigated how performance and message flow in a message-intense system could be increased by adding functionality such as message priority to the IPC functions and implementing it in an RTU architecture. This resulted in an IPC-RTU, the ordinary RTU with an augmented instruction set. The IPC implementation supported message priority, priority inheritance on message arrival and task time-out on message send/receive. The IPC administration, such as sorting of message queues, was placed in the RTU. The conclusions were that it is possible to implement IPC in hardware but that the design becomes too big to fit into one FPGA.

Hardware Kernel Energy Consumption

In order to study the RTU's impact on system energy consumption, an energy characterization of the RTU was performed in [34]. The results showed that the power consumption is independent of which function the RTU performs and that power consumption during idle periods is approximately the same as during system calls.
The conclusion was that the RTU needs to be power optimized, using techniques such as gated clocking, in order to beat a software based RTOS.

4.2 Hardware Support and Configuration

Related work in the area of hardware support for RTOS combined with configuration has been performed by Mooney et al. A configurable hardware scheduler is presented in [35]. In the context of this scheduler, configuration means re-configuration, i.e., the scheduling mode can be changed at run time, and the scheduler provides three scheduling disciplines: priority-based, rate monotonic and earliest deadline first. Other functionality of the hardware support, namely the number of tasks, external interrupts and timer resolution, is however only configurable at compile time and cannot be changed at run time. The hardware support is configured with a GUI (Graphical User Interface) customization tool that sets the hardware and software parameters and generates the associated hardware files in Verilog and software driver files in C [36]. In addition to configuration of the hardware support, the processor type is selectable with the GUI tool. As in the RTU, the number of tasks, external interrupts and timer resolution are configurable at compile time, but the RTU has further configurability options. Regarding scheduling, the RTU provides a fixed scheduling algorithm and is not configurable at run time. Further, the results of the pre-runtime configuration methods are similar. The configurable hardware scheduler GUI tool generates files according to user input that, after compilation, result in a customized solution of the provided functionality. The RTU provides a software and a hardware configuration file for user parameter input, which also result in a customized solution after compilation.

Related work in the area of hardware support for RTOS combined with resource usage estimation has been performed by Nakano et al. The Industrial TRON (µITRON) is a subproject of The Real-time Operating System Nucleus project (TRON). In [37, 38] a solution that consists of a hardware part, called "Silicon TRON", and a software part, called ITRON, is presented. The Silicon TRON together with the ITRON is called a "Silicon OS". The hardware support part implements task scheduling, semaphores, flags and external interrupt management.
Configurability is not described specifically, but variations in the number of tasks, semaphores, flags and timer resolution are reported when FPGA area resource usage is analyzed. The size of the Silicon OS in memory is one third of that of an equivalent software RTOS, and FPGA area is examined for variations in the number of tasks from 3 to 16 with varied numbers of semaphores and flags. The hardware/software partitioning, provided functionality and connection to a general purpose CPU are very similar to the RTU. However, the focus of the Silicon OS is performance rather than resource usage. The Silicon OS FPGA area resource usage is only presented for variations in the number of tasks, and an estimation formula is not presented. Nor is the comparison with a software RTOS analyzed further.

4.3 Models and Estimations

The work of Xu et al. and Kulkarni et al. relates to the model for RTOS resource usage estimation proposed in this thesis (in Chapter 10). Their work concerns estimation models for FPGA area resource usage, which is related to how the proposed model estimates hardware support, but it has not been applied to hardware/software real-time kernels. Xu et al. [39] present an approach for estimating area, timing and wiring effects for look-up table (LUT) based FPGAs based on a model, applicable to Xilinx FPGAs [22]. Regarding area, an algorithm takes a netlist as input and predicts resource usage of internal FPGA elements at several stages, with an average estimation error of 2.0%. Kulkarni et al. present a compile-time area estimation technique for their own compiler for the high-level language SA-C (Single Assignment C). The SA-C code is eventually translated to VHDL and compiled for FPGA-based reconfigurable computing systems. An estimation model is developed to aid the compiler in selecting optimizations that affect the resource usage in LUTs during compile time, available before translation into VHDL. The algorithms used in the model are less complex and faster than commercial synthesis tools. Experimental results show that they reach an accuracy of 2.5% [40].


Chapter 5. Summary of Papers

This chapter presents summaries of the papers included in this thesis.

5.1 Paper A (Chapter 7)

Susanna Nordström, Configurability and Hardware Support for Real-Time Operating Systems - A State of the Art Report, MRTC Technical Report, Mälardalen University, Västerås, Sweden, February 2008.

Summary: This state-of-the-art report describes the concepts of hardware support and configurability followed by an overview of research performed in the area of hardware support for real-time operating systems. The prototypes within the different research projects on hardware support are compared and discussed regarding provided functionality, hardware/software partitioning of provided functionality and extent of configurability. The comparisons are compiled into a summary of similarities and distinctions between them.

5.2 Paper B (Chapter 8)

Susanna Nordström, Lennart Lindh, Lars Johansson and Tobias Skoglund, Application Specific Real-Time Microkernel in Hardware, In Proceedings of 14th IEEE-NPSS Real-Time Conference, Stockholm, Sweden, June 2005 (Poster presentation).

Summary: The paper presents how the RTU hardware support is ported to the commercial software implemented MicroC/OS-II at software driver level. This means that the RTU replaces parts of MicroC/OS-II, i.e., scheduling, partial task management, semaphore handling and flag handling, which are then performed in hardware. A benchmark of the service call response time of the MicroC/OS-II/RTU prototype showed that the response time was reduced in 75% of the measurements. The reduced response time of the prototype varied from 27% to 93% of the original MicroC/OS-II service call response time. The results pointed out a few cases where the RTU limited the implementation because it was not configurable in its current version. MicroC/OS-II does not support several tasks at the same priority level, and since the RTU only supported 8 priority levels, the MicroC/OS-II/RTU prototype handled only 8 tasks. Similarly, the RTU supported only 4 flag bits and was extended to handle 8 flag bits, but MicroC/OS-II could handle more. Further, the delay service call limited the MicroC/OS-II/RTU prototype, since the RTU could delay at most 1 024 clock ticks whereas MicroC/OS-II could delay 65 534 clock ticks. In the prototype, this difference was compensated for with repeated calls to the RTU delay call in order to achieve a delay longer than 1 024 clock ticks, which was not efficient performance-wise. There were also differences in functionality, where MicroC/OS-II provided more functionality. The results from this paper showed that it would be possible to reduce service call response time to a greater extent by having the RTU ported to another RTOS, if the RTU was modified. The results from this paper gave guidance on how to accomplish that.
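The delay workaround mentioned in the summary can be sketched as follows. The limit of 1 024 clock ticks per RTU delay call is taken from the text; how the prototype actually issued the repeated calls is not described here, so the function below is a hypothetical illustration that only computes the sequence of per-call delays.

```python
RTU_MAX_DELAY = 1024  # maximum clock ticks per RTU delay call, as stated above


def split_delay(ticks, max_per_call=RTU_MAX_DELAY):
    """Split a requested delay into the per-call delays needed to cover it."""
    if ticks < 0:
        raise ValueError("delay must be non-negative")
    full_calls, remainder = divmod(ticks, max_per_call)
    return [max_per_call] * full_calls + ([remainder] if remainder else [])
```

A delay of 2 500 clock ticks becomes three calls of 1 024, 1 024 and 452 ticks, and the longest MicroC/OS-II delay of 65 534 clock ticks requires 64 calls. Each extra call adds driver overhead, which is consistent with the remark that the workaround was not efficient.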

5.3 Paper C (Chapter 9)

Susanna Nordström and Lars Asplund, Configurable Hardware/Software Support for Single Processor Real-Time Kernels, In Proceedings of International Symposium on System-On-Chip Conference, Tampere, Finland, November 2007 (Poster presentation).

Summary: This paper presents how the RTU hardware part and software driver were modified in order to achieve configurability. Configurability limits and hardware and software footprints were compared with those of the software-implemented MicroC/OS-II. Four configuration settings in which both the RTU and MicroC/OS-II provide equal functionality were identified in order to make the comparisons as fair as possible.

The results show that the RTU memory footprint was 24% to 38% of the size of the MicroC/OS-II footprint. In FPGA area, the smallest configuration of the RTU occupied only 50% of the logic cells used by the largest RTU configuration. The footprint results were compared with the cost (in USD) of the consumed resources in FPGAs of different sizes.

5.4 Paper D (Chapter 10)

Susanna Nordström and Lars Asplund, Model for Resource Usage Estimation of Configurable Real-Time Kernels in Hardware and Software, Submitted to Real-Time Systems Journal, February 2008.

Summary: When the RTU is configurable, it becomes important to be able to estimate its resource usage, in both hardware and software, at different configurations. This paper presents the construction of a model for estimating the resource usage of configurable real-time operating systems implemented in software or in hardware/software. The model was validated on the commercial, software-implemented MicroC/OS-II and on the hardware/software-implemented RTU, at different configurations of functionality. The model is shown to be accurate: the average estimation error is 0.2% for memory resource usage and 3.9% for FPGA area resource usage.

The usefulness of the model is motivated. When state-of-the-art architectures, such as real-time operating systems with hardware support, are introduced in industrial systems, the model gives the system designer a clear view of the differences in resource usage compared with conventional software solutions, and it supports the choice of real-time operating system during the design phase of a system. Further, the paper shows how the hierarchical component-based structure of the model enables low-level analysis of internal modules in hardware/software trade-offs in real-time operating systems.

The model is also used to analyze the decrease in memory usage due to hardware support. The decrease is most visible in the size of the constant part of the real-time operating system driver, where the MicroC/OS-II memory resource usage is 339% to 356% larger than that of the corresponding RTU software part.
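The reported accuracy figures are average estimation errors. The summary does not define the exact metric, so the following minimal sketch assumes a mean absolute relative error between model estimates and measured values. The function names are hypothetical, and the inputs in the usage note are illustrative and are not data from the paper.

```python
def relative_error_percent(estimated, measured):
    """Absolute estimation error as a percentage of the measured value."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(estimated - measured) / abs(measured) * 100.0


def average_estimation_error(pairs):
    """Mean of the per-configuration relative errors.

    `pairs` is a non-empty list of (estimated, measured) tuples,
    one tuple per validated configuration.
    """
    if not pairs:
        raise ValueError("at least one (estimated, measured) pair is required")
    errors = [relative_error_percent(est, meas) for est, meas in pairs]
    return sum(errors) / len(errors)
```

For example, two configurations estimated at 102 and 196 units but measured at 100 and 200 units each have a 2% error, so `average_estimation_error([(102, 100), (196, 200)])` evaluates to 2.0.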


Chapter 6. Summary and Conclusions

We have modified and extended the Real-Time Unit (RTU) and made it configurable for increased usability. In addition, we have constructed a model for resource usage estimation of configurable software and hardware/software RTOS. The model was validated against results obtained from two actual systems, one with the hardware/software-implemented RTU and one with the software-implemented MicroC/OS-II. The validation showed the model to be accurate, with an average estimation error of 0.2% for memory resource usage and 3.9% for FPGA area resource usage.

The hardware part of the model can be applied to estimate resource usage in different circuits even though the resource usage result is FPGA dependent. This is possible because a circuit factor (Cx) adapts the result of the model to the circuit in use. The hierarchical module-based structure of the resource usage model enables low-level analysis of internal modules in hardware/software trade-offs in RTOS design.

Configurability of an RTOS with hardware support, like the RTU, increases the scope of usage for this kind of hardware/software implementation in the following ways:

• Configurability contributes to decreasing project cost, since a low-cost FPGA is more likely to suffice when RTOS functionality can be configured to include only what the application requires.

• The scope of usage of hardware support becomes greater when a variety of configuration requirements are supported. System lifetime is prolonged when the hardware-supported RTOS in an existing system can be extended in number of features and functionality, and does not have to be exchanged for another, more extensive RTOS.

• When hardware support is used for acceleration, i.e., ported to another RTOS, configurability makes the hardware support more adaptable to other RTOS, which facilitates the porting process.

A model for resource usage estimation contributes to improving the usability of hardware support for RTOS in several respects:

• In the design phase of a real-time system application, it is important to know the resource usage of the system components at different possible configurations. Since a hardware-supported RTOS differs from conventional software RTOS solutions, it must be possible to ascertain how the resource usage is divided between hardware and software before building the system. Most RTOS application designers are familiar only with conventional software RTOS and need to see the differences clearly in order to feel secure when using unconventional technology. Resource usage is one of these differences, and the proposed model can estimate resource usage for different configurations accurately and rapidly.

• When planning an extension of an existing real-time system application, a resource usage estimation model is an appropriate aid for determining what impact a more extensive configuration may have on resource usage. If the resource usage exceeds the amount of available resources, a larger FPGA may be required, which affects the product cost in large volumes.

• Since a hardware-supported RTOS differs in resource usage from a conventional software RTOS, it is desirable to be able to compare the differences in order to decide which kind of RTOS constitutes the most profitable solution for a planned system. It is important to determine whether the benefits of hardware support are affordable: what is the cost in FPGA area of a certain configuration of the RTOS, and how much memory is saved when using a hardware-supported RTOS compared with a conventional one? The model is an important tool for answering these questions.
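The role of the circuit factor and the check against available FPGA resources can be sketched as follows. This chapter does not give the form of the model, so the sketch assumes that the hardware estimate is a sum of per-module logic-cell estimates scaled multiplicatively by Cx. The module names, cell counts, device names, and device sizes in the usage note are hypothetical placeholders, not values from the thesis.

```python
def estimate_logic_cells(module_cells, circuit_factor=1.0):
    """Sum per-module estimates and scale by the circuit factor (assumed multiplicative).

    `module_cells` maps a module name to its estimated logic cells
    for the chosen configuration.
    """
    return circuit_factor * sum(module_cells.values())


def smallest_fitting_device(estimate, devices):
    """Return the name of the smallest device that can hold the estimate.

    `devices` maps a device name to its available logic cells.
    Returns None when no device is large enough.
    """
    candidates = [(cells, name) for name, cells in devices.items() if cells >= estimate]
    return min(candidates)[1] if candidates else None
```

With hypothetical modules `{"scheduler": 1200, "semaphores": 400, "flags": 300}` and a circuit factor of 1.1, the estimate is about 2 090 cells, so a device with 2 000 cells is too small and the next larger one would be selected. Removing a module from the dictionary models a leaner configuration that may fit the cheaper device.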

6.1 Future work

This section discusses directions for future work from the perspective of the work presented in this thesis.

Regarding configuration effects, the possibility of discarding an RTU internal hardware component completely, if the functionality it provides is not configured to be used, could be examined. This is possible in software but somewhat more complex in hardware programming and in the current RTU implementation. Further, since this thesis has analyzed resource usage in software and hardware/software-implemented RTOS for different configurations, the analysis could be complemented with benchmarks in order to analyze what impact configuration has on performance.

The scope of the proposed model could be extended to include resource usage estimation for additional RTOS, as well as estimation of all service calls and functionality of an RTOS. The hardware part of the model can be made applicable to additional FPGAs from other FPGA vendors, or to other types of circuits. The proposed circuit factor, which adapts the model estimations to different FPGAs, could be examined further. Moreover, the work presented in this thesis, as well as the suggestions for future work, could be continued in multiprocessor systems.


