A research on debugging tools’ platform independency


HANS BRICKNER

Master’s Thesis at ENEA

Supervisor: Barbro Claesson

Supervisor: Detlef Scholle

Supervisor: Kathrin Dannmann

Supervisor: Francesco Robino

Examiner: Ingo Sander

TRITA-ICT-EX-2011:108


Abstract

Debugging of embedded systems is costly and time consuming, but imperative to system design. There are many different requirements on embedded systems, and complying with these requirements has led to many different kinds and configurations of embedded systems. The vast array of embedded systems and the ever-increasing complexity of the systems make debugging a growing challenge.

Different domains impose different requirements on the systems, and as embedded systems pervade our society, new requirements are introduced. ENEA’s products target various domains, such as the telecom, medical and automotive domains. Targeting these three domains means that ENEA’s products need to comply with varied and stringent requirements. A debugging tool used in a tool chain like the one developed in the iFEST project, as is intended for ENEA’s debugging tool Optima, needs to support various debugging methods, ranging from software debugging methods to methods utilizing embedded hardware for debugging.

The need for debugging tools in a tool chain to support various debugging methods led to this master’s thesis, at and for ENEA. This thesis investigates methods to debug embedded systems in order to define enhancements of the debugging tool Optima that enable Optima to debug various systems and to debug all systems in ENEA’s target domains.

The thesis is divided into two parts: a pre-study and a development part. The pre-study covers debugging of embedded systems by studying articles and ENEA’s operating system OSE and debugging tool Optima. Conclusions drawn from the study of methods to debug embedded systems and the study of the debugging support in Optima show that Optima needs to utilize embedded hardware for debugging. Updates for Optima that enable utilization of embedded hardware for debugging are designed and implemented in the development part of the thesis.

Hardware debugging facilities in the development platform targeted in this thesis are not fully incorporated. The lack of hardware debugging support in the platform makes it infeasible to debug the platform from software running on the processor core, and thus infeasible to improve Optima’s debugging capability by enhancing standard OSE debugging facilities to utilize embedded hardware for debugging. An external debugging tool, JTAG, is required to access the embedded debugging hardware in the target platform and to enable non-intrusive debugging.


Referat

Debugging of embedded systems is both costly and time consuming, yet extremely important in system design. Many different requirements are placed on embedded systems, and meeting all of them has led to different configurations of embedded systems. The enormous number of embedded systems and their increasing complexity make debugging a growing challenge. Different domains impose different requirements on the systems, and as embedded systems become more common in our society, new requirements are introduced. ENEA has products for different domains, for example the telecommunications, medical and automotive industries. Having products for these three domains means that ENEA’s products must fulfil varied and stringent requirements. A debugging tool that is to be used in a tool chain like the one developed in the iFEST project, as is intended for ENEA’s debugging tool Optima, must support different debugging methods, ranging from software methods to methods that utilize embedded hardware for debugging.

The requirement that a debugging tool in the tool chain supports different debugging methods led to this master’s thesis, for and at ENEA. The thesis investigates different debugging methods in order to define improvements of the debugging tool Optima that enable Optima to debug different systems and to debug all systems in the domains where ENEA has products.

The thesis is divided into two parts: a pre-study and a development part. The pre-study covers debugging of embedded systems by studying articles and by studying ENEA’s operating system OSE and the debugging tool Optima. Conclusions from the studies of methods for debugging embedded systems and of the debugging support available in Optima show that Optima needs to utilize embedded hardware for debugging. Updates for Optima that enable the use of embedded hardware for debugging are designed and implemented in the development part of the thesis.

Hardware facilities for debugging are not fully included in the development platform used in this thesis. The lack of hardware support for debugging makes it impossible to debug with software executing on the processor core. An external debugging tool, a JTAG, is required to access the hardware in the development platform and to enable debugging that does not affect the system.


List of Figures

1.1 Enable Optima to debug systems in target domains

2.1 The evolution of embedded systems

3.1 Illustrating message passing

3.2 Domains for which ENEA offers solutions

4.1 Approaches for connecting the debugging interface with the processors on target: (a) Direct connect (b) Agent based

5.1 Process states and the state transitions [1]

5.2 Connection between Optima and the runtime monitor

8.1 Illustration of demonstration application

8.2 Performance Monitor Control Register [2]

8.3 Excerpt from demonstration scenario 1

9.1 Debug features in the ARM11 processor

9.2 Functional block diagram of the Embedded Trace Macrocell in the i.MX31 platform

9.3 Functional block diagram of the Embedded Trace Buffer in the i.MX31 platform

9.4 Functional block diagram of the Embedded Cross Trigger in the i.MX31 platform

9.5 Functional block diagram of the Smart Direct Memory Access in the i.MX31 platform

9.6 Example of CoreSight

9.7 Embedded Cross Trigger topology, i.MX52

10.1 Use case test application

10.2 Illustration of design 1

10.3 Embedded debugging modules in the i.MX31 platform, targeted debugging modules are encircled [3]

10.4 Illustration of design 2

10.5 TRACE32-ICD PowerDebug [4]

List of Tables

3.1 Table summarizing requirements for debugging of embedded systems

4.1 Table summarizing methods to debug embedded systems that summarizes chapter 4

6.1 Table summarizing requirements for debugging of embedded systems and Optima’s capability to meet those requirements

8.1 Average cycle count and cache misses for demonstration scenarios

9.1 Table summarizing the embedded debugging support in the i.MX platforms

E.1 Demonstration measurements table

E.2 Summary of demonstration measurements table


Chapter 1

Introduction

1.1 Background

ENEA and KTH are part of an international research project, iFEST¹. iFEST is a larger Artemis² project of three years’ duration, now in its second year. The goal of iFEST is to develop a framework for establishing and maintaining a tool chain for the development of complex industrial embedded systems [5]. To maintain a tool chain over a long period of time, i.e. a system’s lifetime of several decades, the tool chain must allow tools to be interchanged and make it easy to integrate different tools in the chain. Establishing a tool chain where tools can be interchanged requires that tool configurations are identified and specified. ENEA³ targets several domains, including the telecom, medical and automotive domains. ENEA has interests in iFEST regarding test and debugging tool specification.

1.2 Purpose

The purpose of this master’s thesis is to contribute to the framework developed in iFEST regarding debugging tools’ platform independence.

1.3 Problem statement

The project will investigate possible methods to debug embedded systems. Debugging support for various systems, independent of real-time requirements and independent of system configuration, would render the debugging tool target-system independent, which could contribute to a platform-independent tool chain.

¹ http://www.artemis-ifest.eu/

² http://www.artemis-ju.eu/

³ http://www.enea.com/


1.3.1 Problem

Rendering Optima⁴ compliant with the debugging methods required to meet the requirements for embedded systems in various domains is essential for a tool in a tool chain and for the tool to be future-proof.

1.3.2 Goal

Specify enhancements of Optima to support essential debugging methods for embedded systems in target domains.

1.4 Tasks to achieve the goal

Conduct research on debugging methods for embedded systems, and a study of Optima and OSE⁵. Derive updates for Optima from the research and the study that will enhance Optima’s debugging capabilities. Verify the improvements of Optima with a use case and demonstrate the improved functionality. Figure 1.1 illustrates the goal of this thesis, i.e. an improvement of Optima’s debugging capability.

Figure 1.1. Enable Optima to debug systems in target domains

⁴ ENEA Optima is a profiling and development tool for the OSE RTOS. http://www.enea.com/Templates/Product____27017.aspx

⁵ ENEA OSE: Multicore Real-Time Operating System (RTOS). http://www.enea.com/Templates/Product____27035.aspx


1.5 Delimitations

The work is delimited to the debugging tool Optima and the operating system OSE. OSE is the only operating system that will be studied. The project time is 20 weeks.

1.6 Methods

The project is divided into two parts, a pre-study part and a development part.

The pre-study part consists of research on debugging methods in accordance with the tasks, and a study of a use case. An analytical method will be used to extract methods to debug various kinds of embedded systems. The research includes analyses of OSE and Optima. Furthermore, the research includes the study of scientific articles and earlier work at ENEA. In the development part of the project, software updates for Optima will be developed, implemented and verified with a use case. The software updates are developed in accordance with the conclusions from the pre-study. The updated tools will be demonstrated to show that the update is successful.


Chapter 2

Debugging embedded systems

The usage of embedded systems is increasing, and the requirements on reliability, quality of service and development time are high. These high requirements put high demands on system verification and validation. A considerable amount of development time and effort is spent on testing and debugging. Debugging embedded systems is challenging, especially systems with requirements on real-time behaviour and systems with multiple cores. It is necessary to understand the behaviour of an embedded system in order to verify and debug the system. A good debugging tool should provide the system engineer with methods to extract information regarding the system’s behaviour and methods to locate and rectify bugs. There are several methods to debug embedded systems; methods to debug embedded systems in the domains that ENEA targets are presented in this report.

The report begins with an introduction to debugging of embedded systems, followed by an overview of requirements and difficulties with debugging of embedded systems. Then follows a chapter discussing methods to resolve the mentioned debugging issues and suitable methods to debug various systems. After the chapters describing debugging of embedded systems follow chapters introducing OSE, Optima and debugging of OSE with Optima. A chapter covering the design and implementation of enhancing software updates for Optima, based on conclusions from the pre-study, follows before the chapter presenting the conclusions drawn motivating necessary enhancements of Optima. The report ends with a chapter addressing further work.

2.1 Introduction

Hopkins and McDonald-Maier [6] define support for debugging as:

"Debug support can be defined as the strategy of placing access points within a system . . . , so that its internal nodes become observable and controllable from outside with the intention of improving the overall system development process".

According to Hopkins and McDonald-Maier, support for debugging should not be confused with hardware/software co-design, design for testability, modeling, simulation and formal verification. Test infrastructure aims to diagnose system design errors, whereas support for debugging aims at removing bugs from the overall system, including software defects [6].

2.2 Requirements for debugging embedded systems

There is a vast array of different embedded systems and a vast number of functionalities that all need to be verified and, possibly, debugged. Despite the wide differences between embedded systems, one thing is common to all of them: it is a non-trivial task to debug them. The difficulties mainly lie in the uniqueness of the systems due to custom solutions and in the stringent requirements on functionality, especially the requirements for hard real-time systems. Verification takes a substantial part of the time in the development process of embedded systems. Efficient debugging methods are thus desired to shorten the verification process. Hopkins and McDonald-Maier specify four fundamental requirements for efficient support for debugging [6]:

• The support for debugging should not significantly change the device’s behaviour.

• Infrastructure for external observation of the internal system state and other critical nodes.

• External access to control the system state and resources including complex peripherals.

• Limited cost impact on the SoC in terms of device pins or chip area.

Essential concepts of debugging are run-control, real-time trace and intrusiveness. These concepts are introduced here and addressed in more detail in a later chapter.

2.2.1 Run-control and real-time trace

Testing and debugging is divided into two main categories: run control and real-time trace [7, 8, 9]. Run-control features include setting breakpoints and stopping and starting the execution of code. Many control features, like reading a CPU’s internal registers, require that the processor is halted. Real-time trace tracks a system’s behaviour.

Real-time trace is important for embedded systems where halting the execution cannot be tolerated when tracking the system’s real-time behaviour.
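The difference from run control can be illustrated with a minimal sketch of a trace buffer, written here in Python purely for illustration. The class `TraceBuffer` and the function `traced_loop` are hypothetical names and not part of OSE or Optima. The sketch assumes a fixed-size circular buffer in which new events overwrite the oldest ones, so that the traced code is never halted; note that recording from software, as done here, is itself intrusive, whereas hardware trace logic would capture the events without instrumenting the code.

```python
from collections import deque


class TraceBuffer:
    """Fixed-size circular buffer: the newest events overwrite the oldest."""

    def __init__(self, capacity):
        self._events = deque(maxlen=capacity)

    def record(self, timestamp, event):
        # Recording never blocks and never halts the traced program.
        self._events.append((timestamp, event))

    def dump(self):
        # The buffer is read out afterwards, oldest event first.
        return list(self._events)


def traced_loop(buffer, iterations):
    """Toy workload that emits one trace event per iteration."""
    total = 0
    for i in range(iterations):
        total += i
        buffer.record(i, f"iteration total={total}")
    return total


if __name__ == "__main__":
    buf = TraceBuffer(capacity=3)
    traced_loop(buf, 5)
    for timestamp, event in buf.dump():
        print(timestamp, event)
```

With a capacity of three events and five iterations, only the three most recent events survive, which is the usual trade-off of a bounded trace memory.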


2.2.2 Requirements on intrusiveness

Systems with hard real-time requirements demand non-intrusive debugging methods, whereas systems with soft requirements for timing could be debugged with intrusive methods. Non-intrusive debugging does not interfere with the system’s execution or the system’s behaviour and does not alter the code under test, which intrusive methods do. Methods that are regarded as intrusive for some systems might be regarded as non-intrusive for other systems.

2.3 The trends in embedded system design and methods to debug embedded systems

The trend in embedded system design moves from systems on printed circuit boards, where each core is a single chip on the board and the chips are connected by buses to form an embedded system, towards embedded systems with all system nodes embedded in a single chip.

These systems, composed of several single-core chips, were debugged and tested by attaching physical probes to the pins of the target chip and to the critical buses in the system. This method of debugging became infeasible as the number of pins increased, pin spacing decreased and surface-mount technology was introduced.

Although debugging with probes became infeasible, the demand for debugging methods remained, which called for new methods to cope with these new technologies. The all-electrical access method was developed, based on boundary scan, to provide the same level of observability as the old probe method. As the systems and the cores in them became more complicated, boundary-scan methods were no longer enough. In-circuit emulators were developed to provide internal information and a non-intrusive method of debugging. The introduction of multicore systems further complicated debugging. To meet the requirements of debugging multicore systems while maintaining industry standards for debugging, hardware manufacturers started to design for debug, providing deeply embedded support for debugging. Figure 2.1 depicts the evolution of embedded systems and the development of countermeasures to overcome the difficulties imposed by the development.


Figure 2.1. The evolution of embedded systems


Chapter 3

Difficulties with debugging embedded systems

Verification of a system is the act of testing a system to determine if the system meets its requirements. If the verification of a system fails, the cause of the failure needs to be identified and remedied, i.e. the system needs to be debugged. Verification and debugging of embedded systems are very costly, time consuming and very important. The development of an embedded system requires verification in every design stage, from hardware design to application implementation. There are challenges with debugging that need to be overcome in all the design phases, and different challenges for systems designed for different domains. Verification and debugging methods suitable for a system in one domain might not be suitable, or might even be harmful, for systems in another domain. Debugging of embedded systems is challenging not only because different methods are required but also because a profound understanding of the system and the application’s behaviour is required. This makes verification and debugging perhaps the most time-consuming and costly task in embedded system development, and it is the reason why developers choose platforms depending on the debugging support provided. Modern systems range from a single custom system on-chip with limited communication possibilities to large distributed systems comprising several nodes, where each node is a system on-chip, communicating over a network. All critical nodes of an embedded system must be made observable to enable efficient debugging.

3.1 System on-chip, SoC

The challenges of debugging embedded systems grew when moving from systems on printed circuit boards to systems on-chip. Moving the system’s cores and the cores’ data and instruction buses into a single chip removed the connection points to the system’s critical nodes. Traditional host-target communication with systems on PCB was realized with physical connections to a processor’s pins and the external data and instruction buses. A SoC’s internal nodes must be observable to allow debugging of the system [6]. The loss of external access points from moving all cores and buses into a single chip led to new electrical test methods like JTAG, which provides a test interface analogous to the traditional probe method for systems on printed circuit boards. As embedded system cores develop and the performance and complexity of on-chip communication increase, additional debugging difficulties are introduced.

3.2 Pipelines and caches

Two features that increase a system’s performance and make debugging more difficult are pipelines and cache memories. Cache memories for both data and instructions reduce a core’s memory access time but make the system less predictable and less deterministic. Deep pipelines increase the difficulties with internal observability and controllability. Systems with high demands on deterministic behaviour might not incorporate caches. It is, according to MacNamee and Heffernan, essential to incorporate on-chip logic to enable debugging of modern microcontrollers [8]. Systems with multiple cores and caches require cache protocols, which impose complex bus transactions that are difficult to intercept. An advanced on-chip bus structure or an on-chip network further complicates the debugging of the system.

3.3 Multicore and multiprocessor systems

The requirements for low-power devices led to multicore SoC designs [6]. A multicore system is a system with multiple nodes that perform some tasks and with one core that is a processor, e.g. a RISC processor. The cores connected to the processor core in a multicore system are special-purpose architectures like hardware accelerators or DSPs, in contrast to a multiprocessor system, where multiple cores are processors. Both multicore and multiprocessor systems can be on-chip architectures or distributed systems with nodes on a PCB or even nodes on different PCBs.

Embedded systems with multicore solutions impose additional challenges with debugging compared to single-core systems: multicore and multiprocessor systems enable parallel execution, which creates concurrency issues. The cores in multicore and multiprocessor systems communicate during normal operation, which requires additional communication protocols compared to single-core systems. Advanced protocols for communication are needed because both data and instructions are transferred to and from the processor and the cores, or between the processors in the system. This intercommunication usually implies memory accesses, which makes memory resources shared resources and a bottleneck in large systems. Debugging of multiprocessor architectures is further complicated by incorporating caches and cache protocols. Each processor accessing its cache influences all other processors’ caches, which causes coherency problems in the system. One of the coherency problems that might arise is false sharing. False sharing occurs if one core in a system exclusively uses an address that resides in the same cache block as addresses altered by other cores. When an address in that cache block is altered, all caches in the system storing that cache block invalidate it, forcing the core accessing the exclusive address to a cache miss. It is desirable to monitor the intercommunication to identify the cause of a system bug [10]. Tang and Qiang list five requirements that should be fulfilled for successful debugging of multicore systems [10]:

• Concurrent debug access to multiple embedded cores

• Inter-core transaction tracing and analysis

• Real-time tracing of debug components

• Cross-triggering among debug components

• Low design-for-debug (DfD) overhead in terms of area, routing and device pins

The intercommunication in a multicore or multiprocessor system puts high demands on the chip’s bus structure. In modern high-performance systems, a simple bus or even a more complicated fabric structure might not suffice to handle the intercommunication. In order to solve this issue, some modern chips are designed with an on-chip network.
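The false sharing effect described earlier in this section can be made concrete with a small simulation. The sketch below is only an illustration under simplifying assumptions: a write-invalidate protocol, two cores, caches of unlimited size and a block size of 32 bytes. The names `Core`, `access` and `run` are hypothetical and do not model any particular platform.

```python
BLOCK_SIZE = 32  # bytes per cache block (assumed value)


class Core:
    """A core with a private cache that only tracks which blocks are valid."""

    def __init__(self):
        self.valid_blocks = set()
        self.misses = 0


def access(cores, core_id, address, write):
    """Perform one memory access and apply write-invalidate coherency."""
    block = address // BLOCK_SIZE
    core = cores[core_id]
    if block not in core.valid_blocks:
        core.misses += 1              # block must be fetched again
        core.valid_blocks.add(block)
    if write:
        # A write invalidates the block in every other cache.
        for other_id, other in enumerate(cores):
            if other_id != core_id:
                other.valid_blocks.discard(block)


def run(addr_core0, addr_core1, rounds):
    """Core 0 reads its own address, core 1 writes its own address."""
    cores = [Core(), Core()]
    for _ in range(rounds):
        access(cores, 0, addr_core0, write=False)
        access(cores, 1, addr_core1, write=True)
    return cores[0].misses


if __name__ == "__main__":
    print(run(0, 4, 100))   # same cache block: prints 100
    print(run(0, 64, 100))  # different cache blocks: prints 1
```

Although core 0 never shares its data with core 1, it misses on every read when the two addresses fall in the same block, and only once when they do not. This is the kind of behaviour that is hard to find without observing cache misses or the coherency traffic.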

3.4 Network on-chip, NoC

Network on-chip is a communication infrastructure based on packet switching between on-chip routers. The cores or nodes on-chip are connected to a router. Networks on-chip enable a flexible and scalable communication backbone for complex embedded systems with requirements on high-speed core communication [11]. Debugging support for NoC imposes additional challenges on conventional debugging techniques based on bus monitoring, because multiple nodes in the network can send messages concurrently [12]. In contrast to traditional bus communication, networks on-chip use routing algorithms, so not all packets pass all nodes in the network, which makes it difficult to intercept the communication. Systems built with an on-chip network may transmit both data and instructions over the network, and there is too much data and too many instructions transmitted in the network to log. This requires that the traffic in the network is filtered for efficient monitoring.
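The need for filtering can be sketched as follows. The example is a hypothetical illustration and not a description of any real NoC monitor: `Packet`, `make_filter` and `monitor` are assumed names, and the log capacity stands in for a limited trace memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source: int
    destination: int
    payload: bytes


def make_filter(sources=None, destinations=None):
    """Build a predicate that keeps only packets of interest.

    A value of None means that the field is not filtered on.
    """
    def matches(packet):
        if sources is not None and packet.source not in sources:
            return False
        if destinations is not None and packet.destination not in destinations:
            return False
        return True
    return matches


def monitor(packets, packet_filter, log_capacity):
    """Log matching packets until the log is full; count the lost ones."""
    log = []
    dropped = 0
    for packet in packets:
        if not packet_filter(packet):
            continue
        if len(log) < log_capacity:
            log.append(packet)
        else:
            dropped += 1
    return log, dropped


if __name__ == "__main__":
    # Every node sends one packet to every other node in a 4-node network.
    traffic = [Packet(s, d, b"x") for s in range(4) for d in range(4) if s != d]
    print(monitor(traffic, make_filter(), 4)[1])                  # prints 8
    print(monitor(traffic, make_filter(destinations={2}), 4)[1])  # prints 0
```

Without a filter, the small log overflows and most packets are lost; restricting the monitor to the traffic towards one node keeps the relevant packets within the available capacity.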

3.5 Message passing communication

Message passing is a common method for communication in multicore and multiprocessor systems. The operating system targeted in this master’s thesis, OSE, incorporates a version of a message passing protocol for communication between processes. Message passing is therefore covered in the report as a separate section.

Implementations using message passing are usually built with multiple software levels. The levels in the message passing API impose a performance loss that is not acceptable in systems utilizing high-level parallelism on a multiprocessor chip. There are two paradigms for communication between processors in a multicore system: symmetric multiprocessing, also called shared memory, and the message passing interface. Communication with symmetric multiprocessing requires that both the sending processor and the receiving processor access a shared memory, whereas processors sending data directly to the receiver use a message passing interface, MPI [13]. Direct message passing between processors utilizes the on-chip infrastructure for communication, but this requires that the infrastructure supports direct message passing between processors. Furthermore, a scratchpad memory or a dedicated memory connected to each of the sending and the receiving nodes is required to support direct communication between processors.

Message passing in multiprocessor embedded systems is commonly implemented on top of a shared memory, according to Francesco et al. [14]. Producers and consumers communicate via FIFO buffers that reside in a shared memory. The producer and the consumer are synchronized with semaphores and atomic memory operations to prevent the communication buffer from underflowing or overflowing. When a processor performs an atomic memory operation, the processor is granted memory access by an arbiter and locks the bus. When the bus is locked by one processor, the arbiter prohibits other processors from utilizing the bus until the bus is released. Atomic memory operations do not scale well as the number of communication channels grows. This is because of the bus locking mechanism: in larger systems, more nodes compete for access to the shared memory, which then becomes a bottleneck in the system.

Figure 3.1 illustrates message passing on top of shared memory and direct message passing.

Direct communication requires that the message is intercepted or that the debugging tool has access to the memory connected to the cores in the system. Communication over shared memory requires message interception or memory access.

Acquiring the message from shared or dedicated memory must be done while the data is valid, i.e. before the message is overwritten. Acquisition of valid messages requires that the debugging tool is notified of any message passing event.
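Message passing on top of shared memory, and the notification that a debugging tool needs, can be sketched with threads standing in for processors. The sketch is an assumption-laden illustration rather than the scheme of Francesco et al. or of OSE: `SharedFifo` and `run_demo` are hypothetical names, a `threading.Lock` stands in for the locked bus, and the `on_message` hook represents the debugging tool being notified while the message is still valid in the buffer.

```python
import threading


class SharedFifo:
    """Bounded FIFO in 'shared memory', synchronized with two semaphores."""

    def __init__(self, capacity, on_message=None):
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0                                  # next slot to read
        self._tail = 0                                  # next slot to write
        self._free = threading.Semaphore(capacity)      # prevents overflow
        self._used = threading.Semaphore(0)             # prevents underflow
        self._lock = threading.Lock()                   # stands in for the bus lock
        self._on_message = on_message                   # debug hook (assumed)

    def send(self, message):
        self._free.acquire()                # wait for a free slot
        with self._lock:
            self._slots[self._tail] = message
            self._tail = (self._tail + 1) % self._capacity
            if self._on_message is not None:
                # The debugger sees the message before the slot can be reused.
                self._on_message(message)
        self._used.release()                # signal that a message is available

    def receive(self):
        self._used.acquire()                # wait for a message
        with self._lock:
            message = self._slots[self._head]
            self._head = (self._head + 1) % self._capacity
        self._free.release()                # the slot may now be overwritten
        return message


def run_demo(count, capacity=2):
    """One producer and one consumer exchange `count` messages."""
    captured = []
    fifo = SharedFifo(capacity, on_message=captured.append)
    received = []

    def consumer():
        for _ in range(count):
            received.append(fifo.receive())

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(count):
        fifo.send(i)
    worker.join()
    return received, captured


if __name__ == "__main__":
    received, captured = run_demo(10)
    print(received == captured)  # prints True
```

Because the hook runs inside the locked section, every message is captured before the consumer can free the slot. The same lock also shows why the approach scales poorly: all producers, consumers and the monitor are serialized on it.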


Figure 3.1. Illustrating message passing.

3.5.1 Message Passing Interface, MPI

Message Passing Interface is a good communication model for distributed and parallel systems. MPI is a standard that provides mechanisms for point-to-point and collective communications [15, 16]. The basic point-to-point communication operations in MPI are the send and receive functions. The message is passed between processes via message buffers, and point-to-point communication can be either blocking or non-blocking. Collective communication is communication involving a group or groups of processes. The collective operations have a syntax consistent with point-to-point communication [16]. A message passing system consists of a set of tasks or processes that communicate or are synchronized over a communication channel. Message passing interfaces are good in systems with low computation power, since such systems may not be powerful enough to handle complex communication mechanisms. A custom API with lightweight communication is ideal for these kinds of systems. Although MPI is a standard for message passing in distributed systems, it is not always suitable for embedded systems with small computation power or limited memory, because many MPI implementations’ APIs are memory consuming. MPI is not available in systems built on special-purpose hardware, because such a system does not run a proper operating system with support for communication layers, and porting the API to the special-purpose hardware is too costly [17]. Agbaria et al. propose a lightweight implementation of MPI (LMPI) for distributed embedded systems where nodes are not capable of running a full MPI implementation. The LMPI implementation consists of two types of nodes, server and client. The client is a core that does not support a full MPI implementation and a server is a core that does. The client communicates with the server via TCP/IP or a simple PCI bus. This implementation enables MPI for distributed systems comprising nodes with low computation power and limited memory [17]. Mahr et al. argue for a reduced API that can be easily configured to support only the low-level network communication and the necessary MPI functions. This method targets embedded systems built on FPGAs and other systems where on-chip memory is a limited resource [13].

3.6 Software for embedded systems

Application software for embedded systems can be divided into two categories: those with real-time requirements and those without. Applications with real-time requirements have requirements on timing in addition to the requirements on functionality, whereas non-real-time applications only have requirements on functionality. Real-time systems are divided into systems with hard real-time requirements and systems with soft real-time requirements. Hard real-time systems have high requirements on timing and are the most challenging systems to verify and to debug. When verifying hard real-time systems, it must be verified that all deadlines are met and, of course, that all requirements on functionality are met. Soft real-time systems have lower requirements on timing than hard real-time systems, and some missed deadlines can be acceptable. The difference in requirements between soft and hard real-time systems allows for different debugging methods with different levels of intrusiveness.
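The distinction between hard and soft timing requirements can be expressed as a check on measured job timings. The following is a minimal sketch under stated assumptions: each job is a tuple of release time, finish time and relative deadline in the same time unit, and the tolerated miss ratio for a soft real-time system is a made-up parameter. The function names are hypothetical.

```python
def missed_deadlines(jobs):
    """Return the jobs whose response time exceeds their relative deadline.

    Each job is a tuple (release_time, finish_time, relative_deadline).
    """
    return [job for job in jobs if job[1] - job[0] > job[2]]


def meets_timing(jobs, hard, tolerated_miss_ratio=0.0):
    """Hard systems tolerate no misses; soft systems tolerate a given ratio."""
    jobs = list(jobs)
    misses = len(missed_deadlines(jobs))
    if hard:
        return misses == 0
    return misses <= tolerated_miss_ratio * len(jobs)


if __name__ == "__main__":
    # Response times are 4, 6, 4 and 3 against a deadline of 5: one miss.
    jobs = [(0, 4, 5), (10, 16, 5), (20, 24, 5), (30, 33, 5)]
    print(meets_timing(jobs, hard=True))                              # prints False
    print(meets_timing(jobs, hard=False, tolerated_miss_ratio=0.25))  # prints True
```

The same measurements are a failure for a hard real-time system and acceptable for a soft one, which is also why a debugging method that adds a little delay may be usable for the latter but not for the former.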

When designing an application with real-time requirements, a usual approach is to do a pre-runtime schedule. Scheduling a real-time system with static priorities pre-runtime is quite straightforward compared to scheduling a dynamic real-time system pre-runtime. Scheduling a real-time system on a multicore platform complicates the pre-runtime scheduling even further, because the execution order of the tasks in some real-time systems is not known pre-runtime, and in parallel systems it is not known on which core a particular task, or a particular job of a task, is executed. Suárez et al. argue that static analyses are unsuitable for real-time systems because static scheduling techniques render high performance embedded systems over-dimensioned. In static analyses of real-time tasks running on embedded systems, the time constraints for a task are derived from the time between an input to the system and the system’s response to that event. Static analyses require that the time from input to response is measured or estimated for pre-runtime scheduling. Inaccurate estimations and measurements of a task’s execution time usually render scheduling of high performance systems, especially on a parallel or distributed platform, over-dimensioned [18]. Other disadvantages of pre-runtime scheduling that make systems over-dimensioned are the complex synchronization of the tasks in the system and simplified scheduling models that compensate for synchronization anomalies to fulfill real-time constraints [18].
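As an illustration of how pessimistic execution-time estimates lead to over-dimensioning, the sketch below applies the classic Liu and Layland utilization bound for rate-monotonic scheduling on a single core. The test is not taken from [18]; it is a well-known sufficient test used here only as an example of a pre-runtime analysis. The task set and the 40 % safety margin are hypothetical values.

```python
def rm_utilization_bound(n):
    """Liu and Layland bound for n periodic tasks under rate-monotonic scheduling."""
    return n * (2 ** (1.0 / n) - 1)


def utilization(tasks):
    """tasks: list of (execution_time, period) pairs in the same time unit."""
    return sum(c / t for c, t in tasks)


def passes_rm_test(tasks):
    """Sufficient (not necessary) pre-runtime schedulability test."""
    return utilization(tasks) <= rm_utilization_bound(len(tasks))


# Hypothetical task set: (measured execution time, period) in milliseconds.
measured = [(1.0, 5.0), (2.0, 10.0), (6.0, 30.0)]

# The same tasks with a hypothetical 40 % safety margin on every estimate.
padded = [(c * 1.4, t) for c, t in measured]

print(passes_rm_test(measured))  # utilization 0.60, below the bound for n = 3
print(passes_rm_test(padded))    # utilization about 0.84, above the bound
```

With the measured times the task set passes the test, while the padded estimates make the same task set fail it. A designer relying on the static test alone would then choose a faster processor than the workload actually needs.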

Since static scheduling is not sufficient to guarantee the behaviour of a system and to guarantee that timing requirements are met, debugging of real-time systems requires that the systems are run on the target and that a sufficient debugging tool monitors the system. Debugging of real-time systems requires that the debugging tool does not interfere with programme execution [9], because a tool that interferes with programme execution might change the execution order, which changes the system’s behaviour and imposes tool-related timing problems. Tools for debugging real-time operating systems must be able to detect task switches along with the status of the data structures used by the real-time operating system [8]. According to MacNamee and Heffernan, it is essential that tools that are aware of the real-time operating system support both data and programme tracing and detect task switches [8].

Real-time requirements on embedded systems render methods that are useful for non-real-time embedded applications unsuitable. Debugging using breakpoints is inappropriate in some real-time applications where stopping the system suddenly can be harmful for the system or make the system’s behaviour unsafe [19]. Debugging a controller for heavy machinery is an example where uncontrolled behaviour caused by a breakpoint is unsafe. Starting and stopping a real-time system is not always desirable because the time-dependent behaviour of the system will be disrupted and lost [6]. A real-time system’s behaviour must therefore be monitored during runtime. Hard real-time systems do not allow intrusion imposed by debugging tools when monitored, whereas soft real-time systems might allow for some intrusion as long as the system’s overall behaviour is not altered. Other time-related problems, like starvation and system failures due to memory fragmentation, require monitoring over time.
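The difference between hard and soft deadlines when evaluating a runtime trace can be sketched as follows. The record layout (release time, finish time, relative deadline) and the tolerated miss ratio are assumptions made for this example; they do not describe the trace format of any particular tool.

```python
def check_deadlines(jobs, hard, max_miss_ratio=0.0):
    """Evaluate a trace of completed jobs.

    jobs: list of (release, finish, relative_deadline) in the same time unit.
    hard: True for a hard real-time system, where any miss is a failure.
    max_miss_ratio: tolerated fraction of missed deadlines for a soft system.
    Returns (ok, misses), where misses lists the offending jobs.
    """
    misses = [job for job in jobs if job[1] - job[0] > job[2]]
    if hard:
        return (len(misses) == 0, misses)
    ratio = len(misses) / len(jobs) if jobs else 0.0
    return (ratio <= max_miss_ratio, misses)


# Hypothetical trace: the second job needs 6 time units but has a deadline of 5.
trace = [(0, 4, 5), (10, 16, 5), (20, 23, 5), (30, 34, 5)]

print(check_deadlines(trace, hard=True))
print(check_deadlines(trace, hard=False, max_miss_ratio=0.25))
```

The same trace is rejected for a hard real-time system and accepted for a soft real-time system that tolerates one missed deadline in four.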

3.7 Domains ENEA targets

The many different and, in many cases, unique requirements on embedded systems have led to a vast number of unique systems. The uniqueness of the systems means that each system has to be verified against an almost unique set of requirements. Verifying and debugging all sets of requirements calls for several verification and debugging methods. Offering solutions for as many domains as ENEA does imposes great challenges with debugging, and high requirements on a debugging tool if the tool, as intended for a module in a tool chain, should be able to debug the execution behaviour and functionality of all applications, both OSE and other vendors’ software, in all domains. Figure 3.2 depicts the domains ENEA offers solutions for, divided into two categories: domains with hard and domains with soft real-time requirements. The figure, although it is coarse, also gives an introduction to the requirements on debugging methods and the challenges with debugging associated with each category.

Figure 3.2. Domains for which ENEA offers solutions

The domains are divided into the two categories depending on the temporal requirements that the real-time requirements impose. The consequence of a missed deadline in a system with hard real-time requirements might be fatal; that is not the case with a missed deadline in a system with soft real-time requirements. Systems where failure might be fatal are found in the domains listed in the category with hard real-time requirements. Systems with hard real-time requirements must have their temporal behaviour verified, which is very difficult. The challenges with verifying temporal behaviour are the reason why multicore systems are associated with soft real-time requirements in this report.

3.8 Summary

Table 3.1 summarizes the specific requirements for enabling debugging of different embedded system designs and system features presented in chapter 3.

Design / Feature                        Specific requirement to enable debugging of design or feature   Tag
System on-chip                          Internal access to critical nodes and buses                     Req01
Pipeline and caches                     Access to the pipeline status                                   Req02
Multicore and multiprocessor systems    Non-intrusive debugging methods due to concurrency              Req03
                                        Multiple core access                                            Req04
                                        Communication interception                                      Req05
Network on-chip                         Communication interception and filtering                        Req06
Message passing                         On time communication interception                              Req07
Software for embedded systems           Execution monitoring                                            Req08

Table 3.1. Table summarizing requirements for debugging of embedded systems

Chapter 4

Methods to debug embedded systems

When a verification attempt of a system fails, the system needs to be debugged. Depending on the stage of the development process in which an error occurs, debugging of the system requires different methods for locating and eradicating the source of the problem. This report targets software application development, i.e. the application runs on target hardware or on emulated hardware that runs on a desktop computer. Different methods to debug embedded systems are investigated, and suitable debugging methods for handling different system requirements are mapped in this chapter. Different methods are good for targeting different areas of a design; some methods are good at detecting communication problems, others at detecting timing issues. There are trade-offs to all debugging methods, often between versatility on the one hand and cost or intrusiveness on the other. The two sections that follow cover the main categories of debugging methods, namely run control and real-time trace.

4.1 Run control

Run control can be implemented in both hardware and software. Hardware designers add three functions to the chip design for embedded systems that enable run control [7]: breakpoints, execution control and internal-state access. Breakpoints are programmable halt instructions that interrupt execution at a determined point in time. According to [7] there are two control approaches in execution control, stopping and halting. The difference between stopping and halting is that a system’s clocks are disabled when stopped, and unaffected when halted. Disabling all clocks freezes the target, allowing for observation of embedded memories, the state of internal processes and the stages in the pipeline. Stopping the target during debugging is sometimes the only way to locate the source of a problem in a system that is malfunctioning. However, there are disadvantages to stopping a system. The phase relationship between function clocks in systems with multiple clocks might be lost, which makes the system more complicated to resume. Another disadvantage with stopping a system, especially a real-time system where execution order is unknown, is that the execution order will be disrupted, further complicating resumption of the system.

When a processor in a multiprocessor system encounters a breakpoint instruction, its dedicated on-chip hardware sends an interrupt to the other processors in the system. The interrupted processors enter their debugging handling routines upon the interrupt signal. A halted system allows for single stepping of functional execution. Since the system clocks are not disabled when halting a system, resumption of execution is easier, which is an advantage over stopping a system.

When a system’s execution has been interrupted, the system can provide internal state access by either a structural or a functional method. A structural method to access internal state information is to utilize scan chains. Functional access methods allow read and write instructions in the system controlled by debugging software via a test interface. Both methods require the system to be stopped [7].

When run control is implemented in software and basic hardware support for debugging instructions is not implemented, additional instructions are added to the code under test. When such an additional instruction is executed, a software interrupt is raised or the exception handler is invoked. Software implementations of run control have no control over the system’s clock, so when a system is halted all nodes enter their dedicated debugging interrupt handler or raise a special exception.
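The principle of a software breakpoint, where an instruction in the code under test is replaced by a trapping instruction, can be sketched on a toy accumulator machine. The machine, its opcodes and the `TRAP` marker are hypothetical and exist only for this example; they do not describe how OSE or any real processor implements breakpoints.

```python
TRAP = "TRAP"  # hypothetical opcode standing in for a breakpoint instruction


class ToyTarget:
    """A tiny accumulator machine; the program is a list of (opcode, operand)."""

    def __init__(self, program):
        self.program = list(program)
        self.pc = 0
        self.acc = 0
        self.saved = {}      # address -> original instruction replaced by TRAP
        self.halted_at = []  # (address, accumulator) seen by the debug handler

    def set_breakpoint(self, addr):
        # Patch the code under test and remember the original instruction.
        self.saved[addr] = self.program[addr]
        self.program[addr] = (TRAP, None)

    def debug_handler(self, addr):
        # Stands in for the software interrupt or exception handler.
        self.halted_at.append((addr, self.acc))
        # Restore the original instruction so that execution can resume.
        self.program[addr] = self.saved.pop(addr)

    def run(self):
        while self.pc < len(self.program):
            op, arg = self.program[self.pc]
            if op == TRAP:
                self.debug_handler(self.pc)
                continue  # re-execute the restored instruction
            if op == "ADD":
                self.acc += arg
            elif op == "MUL":
                self.acc *= arg
            self.pc += 1
        return self.acc


target = ToyTarget([("ADD", 2), ("MUL", 5), ("ADD", 1)])
target.set_breakpoint(1)
print(target.run(), target.halted_at)
```

The sketch also shows why the method is intrusive: the code that runs with the breakpoint set is not the code that will run in the final system.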

Software implemented run control suffers from the same drawbacks as hardware implemented run control regarding halting the target. Since software implemented run control lacks the ability to control the system clock, targets cannot be stopped as they can with hardware implemented run control. Internal access to the system is limited to memory access, which limits the access to system state information. The operating system used in this thesis, OSE, uses software methods for run control.

Hardware run control is usually implemented using a serial interface connected to on-chip debugging logic. Two interfaces allowing run control are mentioned in this thesis, JTAG and NEXUS. Both software and hardware implementations of run control are covered in more detail in later sections of this chapter.

A traditional method for debugging embedded systems is post-mortem analysis. When doing post-mortem analyses the system is stopped in response to an event or at a predefined address location. A drawback of post-mortem analysis is that the source of the bug might be lost by the time the system is halted. Monitoring the system while it is running helps overcome this problem according to Hopkins and McDonald-Maier [6].

4.2 Real-time trace

Real-time tracing is a useful method for profiling and debugging of embedded systems that can be implemented in both software and hardware. As the name real-time trace implies, this method is applicable to running targets. Tracing enables gathering of information over time, which makes it possible to capture information on the events that lead to system failure, events that might not show up during post-mortem analyses. A drawback of tracing is the large volume of information gathered from the system and that transmission of the gathered information to the host is relatively slow. The problem with low bandwidth between the host and the target can be mitigated with a trace buffer embedded in the target. Modern systems with on-chip trace facilities repack and compress the trace data to enable real-time tracing of high speed systems. Another approach to deal with large trace volumes is to increase the bandwidth of the host-target link.
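The role of an embedded trace buffer in front of a slow host link can be sketched with a fixed-size ring buffer. The class and method names are hypothetical, and the overwrite-oldest policy is one possible design choice rather than the behaviour of a specific on-chip trace unit.

```python
from collections import deque


class TraceBuffer:
    """Fixed-size on-target buffer; the oldest event is overwritten when full."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)
        self.dropped = 0  # number of events lost because the host was too slow

    def record(self, timestamp, event):
        if len(self.events) == self.events.maxlen:
            self.dropped += 1  # the oldest entry is about to be overwritten
        self.events.append((timestamp, event))

    def drain(self, max_events):
        """Hand at most max_events to the host link, oldest first."""
        out = []
        while self.events and len(out) < max_events:
            out.append(self.events.popleft())
        return out


buf = TraceBuffer(capacity=3)
for t in range(5):
    buf.record(t, "e%d" % t)
print(buf.dropped, buf.drain(2))
```

Counting dropped events makes it visible to the host when the buffer is too small or the link too slow for the event rate of the target.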

It is essential to trace all parts of a system, including peripherals, to capture the system’s behaviour. In modern systems it is important to trace the communication between different cores, and the communication between the cores and the peripherals, to capture timing bugs. Tracing intercommunication is further complicated by complex bus structures and networks on chip. In order to capture concurrency-related bugs in multiprocessor systems, data and programme traces need to be synchronized [19]. Due to the complexity of debugging large modern systems, the trend is now to design for debug rather than design for test [19].

Real-time tracing complements run control with timing information of internal signals. A selection of internal signal traces is sent from the chip to the debugging tool via dedicated device pins. An advantage of tracing is that the internal signals can be observed over extensive periods of time. A disadvantage is that the number of internal signals exceeds the number of available dedicated pins, which restricts the number of internal signals that can be observed in real-time. Trace data can be sent directly through the device pins or through an internal trace buffer via a debugging interface. Depending on the trade-offs between internal level of observation, hardware costs and software overhead, the capturing and output can be either hardwired, weakly programmable or fully programmable. Fully programmable capturing is the approach with the highest level of observation and flexibility. However, this method has high hardware costs and imposes software overhead that might be too intrusive. Hardwiring the selected signals imposes no software overhead and has low hardware cost, but restricts the level of observation [7].

Tracing captures the state of a system and monitors how the system’s state evolves over time. Tracing gives the test engineer insight into complex systems and provides information on the system’s behaviour. There are a few difficulties with tracing large distributed systems; three are mentioned in [20]. The first is that capturing non-deterministic events in large systems requires monitoring for long periods of time, and long-time monitoring requires minimal overhead and a small impact on the system to be feasible. The second issue is timing; there is no global clock in distributed systems, so time synchronization is needed. The third issue is the scale of the analysis; tracing techniques may generate millions of events for contemporary systems, and future systems may generate even more. Future distributed systems, comprising more nodes and with higher event rates, will further increase the difficulty of tracing.
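The need for time synchronization can be illustrated by merging per-node traces onto a common reference time. The sketch assumes that a constant clock offset per node is already known, for example from a synchronization protocol, and it ignores clock drift. The node names, offsets and events are hypothetical, and the approach is not taken from [20].

```python
import heapq


def merge_traces(traces, offsets):
    """Merge per-node traces into one trace ordered by reference time.

    traces: {node: [(local_time, event), ...]}, each list sorted by local_time.
    offsets: {node: value added to local_time to obtain reference time}.
    Returns a list of (reference_time, node, event).
    """
    streams = [
        [(t + offsets[node], node, event) for t, event in events]
        for node, events in traces.items()
    ]
    # Each stream is already sorted, so a k-way merge is sufficient.
    return list(heapq.merge(*streams))


# Hypothetical traces: the clock of node B is 90 time units behind node A.
traces = {
    "A": [(100, "send"), (130, "ack_received")],
    "B": [(20, "receive"), (25, "ack_sent")],
}
offsets = {"A": 0, "B": 90}

for record in merge_traces(traces, offsets):
    print(record)
```

Without the offsets, the receive event on node B would appear to happen before the send event on node A, which is the kind of false ordering that makes unsynchronized traces misleading.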

Real-time tracing differs from traditional run control debugging techniques that add instructions to the code under test [9]. Traditional tracing methods send tracing data to the host via a serial port. The sending of tracing data is non-intrusive but has a limited bandwidth, which limits the amount of information sent to the debugging tool [9]. Techniques that compensate for the low bandwidth are to compress the tracing data and, in large designs, even to embed a DSP for compressing tracing data.

Increased clock frequencies, caches and deep pipelines make real-time tracing more difficult. Higher clock frequencies require higher bandwidth to the external debugging tool and larger on-chip buffers. Retrieving information from the system under test via a low bandwidth interface like JTAG enables gathering of information about the system’s execution, but requires on-chip trace buffers to compensate for the low bandwidth. Comparing the trace data with the code under test allows tracking of the programme execution, which gives insight into the system’s behaviour.

Gathering enough trace data from large high speed systems to monitor the system’s behaviour is not feasible because of the amount of data needed and the buffer size required to buffer all data. MacNamee and Heffernan, and Moore and Moya, address an alternative method to gather enough tracing data from large systems that solves the problem with a low bandwidth connection to the host and limited on-chip buffers for tracing data. Instead of gathering all kinds of data from the target, this method only monitors activities on the address, data and control buses. Besides requiring less data to be captured, this method is also non-intrusive.

When bus information is combined with a model of the system, and the code under test is used as a map, it is not necessary to monitor all instructions executed, e.g. when the system operates out of cache. This tracing technique accomplishes tracing by modelling the system’s behaviour and complementing the model with information from the monitored buses. The debugging unit uses fewer clock cycles when it is not executing all instructions, which enables real-time tracing of the system and accommodates tracing of high frequency systems [8, 9].
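The idea of using the code under test as a map can be sketched in a much simplified form: if the static control flow of the programme is known, only the outcomes of conditional branches need to be observed on the target, and the full execution path can be reconstructed on the host. The code-map format and the addresses below are hypothetical, and the sketch is not the method described in [8, 9].

```python
def reconstruct_flow(code_map, entry, branch_outcomes):
    """Rebuild the executed address sequence from observed branch outcomes.

    code_map: {addr: ("seq", next_addr)
                     | ("branch", taken_addr, not_taken_addr)
                     | ("end",)}
    branch_outcomes: iterable of booleans observed on the target, in order.
    """
    outcomes = iter(branch_outcomes)
    path, addr = [], entry
    while True:
        path.append(addr)
        node = code_map[addr]
        if node[0] == "end":
            return path
        if node[0] == "seq":
            addr = node[1]
        else:
            # Conditional branch: consume one observed outcome.
            addr = node[1] if next(outcomes) else node[2]


# Hypothetical programme: a loop body at address 1 and a loop branch at 2.
code_map = {
    0: ("seq", 1),
    1: ("seq", 2),
    2: ("branch", 1, 3),
    3: ("end",),
}

print(reconstruct_flow(code_map, 0, [True, True, False]))
```

Three observed branch outcomes are enough to recover a path of eight executed addresses, which indicates why combining a model with sparse observations reduces the amount of trace data.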

As mentioned earlier, tracing is a non-intrusive debugging technique that allows the user, with a proper tool, to inspect variables, do post-mortem analyses and dynamically attach to running programmes. These features help the programmer to increase productivity by reducing the time required to understand the behaviour of the system [21]. Beynon et al. suggest five non-intrusive techniques to improve debugging of multi-threaded and distributed parallel systems. These techniques can be realized with free commodity tools such as the GNU debugger, GDB [21]. However, debugging techniques based on software are regarded as intrusive [22] for some systems, especially hard real-time systems. Depending on the system’s configuration, the intrusiveness of software methods can have a practically non-existent influence on the programme execution, and the methods can thus be regarded as non-intrusive, as argued by Beynon et al. [21].

Tracing is a good method to capture the behaviour of a system that incorporates message passing protocols. The behaviour of these kinds of systems is heavily influenced by the communication between tasks or, in multicore systems, the communication between tasks running on different cores. The only way to monitor message transmission in the system is to monitor the system during runtime. Monitoring communication during runtime requires that the on-chip debugging support monitors the data buses or that a process in the system intercepts messages and transmits the messages to the debugging tool. Message passing interception in OSE is implemented in software with a process executing on the target.
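Software interception of message passing can be sketched as a wrapper around the send operation that forwards a copy of every message to the debugging tool. The `Mailbox` class and its methods are hypothetical stand-ins for a message-passing channel; they are not the OSE API.

```python
class Mailbox:
    """Hypothetical stand-in for a message-passing channel between processes."""

    def __init__(self):
        self.queues = {}

    def send(self, sender, receiver, payload):
        self.queues.setdefault(receiver, []).append((sender, payload))

    def receive(self, receiver):
        return self.queues[receiver].pop(0)


class InterceptingMailbox(Mailbox):
    """Forwards a copy of every message to the host before delivering it."""

    def __init__(self, forward_to_host):
        super().__init__()
        self.forward_to_host = forward_to_host  # callable taking one record
        self.seq = 0  # sequence number that preserves the send order

    def send(self, sender, receiver, payload):
        self.seq += 1
        self.forward_to_host((self.seq, sender, receiver, payload))
        super().send(sender, receiver, payload)


host_log = []
mailbox = InterceptingMailbox(host_log.append)
mailbox.send("sensor", "controller", 42)
print(mailbox.receive("controller"), host_log)
```

Because the forwarding runs on the target in the sending path, this kind of interception adds execution time to every send, which is the intrusion discussed for software methods in this chapter.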

4.3 Software simulation and hardware emulation

The fact that verification and debugging are very time consuming and costly makes it desirable to start verification and debugging as early in the development process as possible. In development projects where the hardware platform is not accessible for some reason, or during development of the hardware platform, software can in some cases be simulated with a software model emulating the target hardware. Some embedded systems might need custom hardware, which renders testing and debugging on the platform impossible at an early stage in the development process because the hardware is under development. In these cases a reliable platform for simulations is desirable. However, platforms for simulation are neither fast nor accurate according to Hopkins and McDonald-Maier [19]. The inaccurate behaviour is a major drawback of simulating a system’s hardware during application development.

The fact that the hardware models are not totally accurate can cause new bugs to show up when the software is run on the intended target, or cause the system to behave incorrectly on the intended target due to modifications made in order to make the system run on the emulated hardware. Debugging software on the intended hardware requires different debugging methods than those used during simulation, because some design problems only occur when the software executes on a chip in the intended environment [7]. The discrepancy between an emulator and the actual hardware is that the actual hardware often has a higher system clock frequency than the emulator and that physical phenomena influence all execution on the actual hardware, whereas hardware emulation only simulates the physical phenomena. The desire for short design time restricts test coverage and the level of detail in the tests, i.e. trade-offs between simulation time and the level of detail make pre-silicon verification incomplete. Increasing the level of detail in the simulations increases the execution time, thus making it necessary to reduce the number of use cases tested [12].

4.4 Software based debugging support with basic hardware support

Software monitoring adds monitoring routines to the code under test. A simple example of software debugging is to add print functions in the code under test to monitor e.g. variables. A better approach to software instrumentation is to use a debugger. A common debugger is the GNU debugger, which defines software monitoring routines. Software based debugging can be platform independent and some processors support software monitoring with special debugging instructions. Debugging instructions make the system enter debug mode via an interrupt handler or an exception handler. Software monitor routines provide reliable means for observation and control once the system is halted but have substantial impact on the system’s behaviour. Furthermore, when a system is considered bug free the monitoring instructions are removed from the code, which changes the system’s behaviour. Software monitoring is useful in systems that are not operating at full capacity and in systems with soft deadlines [6]. Software instrumentation provides debugging features like breakpoints, stop and start at the cost of software overhead. More advantages and disadvantages of software instrumentation are addressed throughout this chapter, and systems where software instrumentation is insufficient are discussed. The target operating system in this thesis, OSE, supports GDB; a short introduction to GDB therefore follows.

4.4.1 The GNU Project Debugger, GDB

GDB is a general-purpose software debugger developed by the GNU project. GDB provides debugging possibilities for various operating systems, processors and programming languages. GDB allows the user to monitor a programme while it executes, or provides information on what happened before a system crash. GDB can do four things to aid debugging [23]:

• Start the programme, specifying anything that might affect its behaviour.

• Make your programme stop on specified conditions.

• Examine what has happened, when the programme has stopped.

• Change things in the programme, so that it is possible to experiment with correcting the effects of one bug and go on to learn about another.

In order to debug using GDB the code under test needs to be compiled for debug. The debugging information is stored in an object file.

4.5 Debugging using device test features

Debugging using device test features is a scan-chain based method; scan-chains pass through the processor’s control and data paths. This method reuses design-for-testability facilities with scan-enabled flip-flops (FFs) and thereby imposes minimal extra hardware cost. The scan-enabled FFs enable retrieval of information on the FFs’ state and thereby information from control and data paths in the processor. Drawbacks of scan-chain based debugging are that the scan-chains are normally only accessible in testing mode and only use a single scan path. One method to overcome this is to multiplex the scan-chains and make them accessible via a single scan-chain interface. A problem with reading the scan-enabled FFs is that reading the FFs might change the state of an FF. This is solved by replicating the FFs with shadow FFs that capture the state of the original FFs. This method avoids state corruption but imposes large hardware costs, and for large systems the extra hardware cost due to many critical data paths renders this method infeasible [6].
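The state corruption problem and the shadow flip-flop remedy can be illustrated with a bit-level model of a scan chain. The model is a simplification made for this example: the scan input is assumed to be tied to zero, and shifting is modelled as moving list elements rather than clocking real hardware.

```python
class ScanChain:
    """Bit-level model of a scan chain with optional shadow flip-flops."""

    def __init__(self, flip_flops):
        self.ffs = list(flip_flops)       # functional flip-flop values (0 or 1)
        self.shadow = [0] * len(self.ffs)

    @staticmethod
    def _shift_out(register):
        bits = []
        for _ in range(len(register)):
            bits.append(register.pop())   # the last FF drives the scan output
            register.insert(0, 0)         # scan input assumed tied to 0
        return bits

    def shift_out_destructive(self):
        """Shift the functional FFs themselves; their state is lost."""
        return self._shift_out(self.ffs)

    def capture(self):
        """Copy the functional state into the shadow FFs in one step."""
        self.shadow = list(self.ffs)

    def shift_out_shadow(self):
        """Shift out the shadow copy; the functional FFs are untouched."""
        return self._shift_out(self.shadow)


chain = ScanChain([1, 0, 1, 1])
chain.capture()
print(chain.shift_out_shadow(), chain.ffs)
print(chain.shift_out_destructive(), chain.ffs)
```

The shadow register doubles the number of storage elements in the chain, which corresponds to the hardware cost that makes the approach infeasible for large designs.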

On-chip emulation is a data capturing tool within the system on-chip that emulates the target system. This capturing tool extends basic run control with trigger functions for breakpoints and watchpoints. The on-chip emulator operates in parallel with the system’s normal functionality and has low impact on the system [6]. The on-chip emulator gathers information from the system by monitoring data and instruction buses. The on-chip emulator can be configured to be accessed via a test interface. Accessing on-chip debugging support with test interfaces is covered in more detail later in this chapter.

4.6 In-Circuit Emulator, ICE

During development of an embedded system a node or a core in the system can be replaced with an in-circuit emulator, ICE, which is a specially produced chip that provides the same functionality as the target circuit, but with extra connectivity at the cost of a larger footprint due to additional pins. The extra pins are connected to internal signals. The advantages of this method are the increased level of observation and that the method is non-intrusive. This means that data can be gathered from the system during runtime without altering the system’s behaviour. The drawbacks of this method are that the ICE has a different footprint than the target circuit and that the method requires two PCB designs, which require manufacturing of an additional mask set. Besides these drawbacks, designs with a large gate count have too many critical observation points to be hardwired to external pins, or even so many observation points that multiplexing them together before they are wired to external pins is infeasible. This is the same disadvantage as with reusing test features based on scan-chains for large designs. High frequencies and different frequency planes further complicate real-time observation using ICEs [6].

4.6.1 ICE implementation

An ICE provides the same functionality as the target circuit, e.g. a microcontroller. The ICE provides extra support for development, testing, debugging and maintenance. Features like single stepping, breakpoints and tracing are supported by the ICE. In systems on PCBs the target circuit was replaced with an ICE chip during development. When SoCs were introduced, the option to replace a node in the system with an ICE was rendered impossible. To meet the need for testing and debugging support, manufacturers started to embed ICEs in modern chips [22]. Reference [22] addresses several techniques to implement ICEs in hardware and software. Furthermore, Kao et al. have divided ICE operations into two debugging modes, background debugging mode and foreground debugging mode. Background debugging mode detects trigger conditions and activates foreground debugging mode. The system runs normally during background debugging mode, in contrast to foreground debugging mode where the system is halted. When the system is switched to foreground debugging mode it is possible to communicate with the host computer, to read out debugging information from the target and to send debugging instructions to the target [22]. How these two modes are implemented in software and hardware is addressed in more detail in the following sections.

4.6.2 Background debug mode

Background debug mode detects trigger conditions such as breakpoints and timer interrupts. When a trigger condition is detected the processor enters foreground debug mode.

If background debug mode is implemented in software, a special version of the target software is created with special instructions inserted in the code, as when using GDB. When a processor executes one of these special instructions a software interrupt is raised, and the programme enters the exception handler and accordingly enters foreground debug mode. Conditional checking is supported in software instrumentation at the cost of performance overhead. The performance overhead and the intrusion are the drawbacks of software instrumentation. This method is intrusive because the debugging version of the code is different from the system code without the extra debugging instructions, i.e. code that is bug free with software instrumentation cannot be guaranteed to be bug free once the debugging instructions are removed. The intrusive nature of the method and the fact that it cannot detect timing-related bugs make this method inappropriate for real-time systems. Other disadvantages of software emulation are that it requires a large amount of memory, which is costly in embedded systems, and that it takes longer to detect and handle breakpoints than with hardware implemented background debug functionality. The main advantage of software instrumentation is that it can be utilized for almost all microprocessors. Other advantages are that software emulation is flexible, easy to modify and has a smooth transition between background and foreground debug mode without manipulation of the system clock [22].

Implementing background debug mode in hardware is a simple concept but requires careful design to achieve good timing balance in the design. Good timing is imperative for keeping a halted processor in a stable state and to accommodate instruction parallelism and pipelining. This is to keep the logical sequence maintained and to avoid false conditions. Hardware support for background debug mode is usually implemented with comparators monitoring various system buses. The comparators’ registers can be programmed to trigger on different conditions. Hardware trigger conditions are more sophisticated than those supported by software methods. When the hardware comparator detects a trigger condition it stops the system’s core clock, raises an exception and enters foreground debug mode. The advantages of hardware support for background debug mode are that trigger conditions can be monitored in real-time, making this method suitable for real-time systems, and that system status that cannot be accessed with software methods can be accessed [22].
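A programmable comparator that monitors a bus can be modelled as a value register and a mask register that are compared against every bus cycle. The register layout, the write-only qualifier and the example addresses are assumptions made for this sketch and do not describe a particular debug unit.

```python
class BusComparator:
    """Triggers when (address & mask) equals (value & mask) on a monitored bus."""

    def __init__(self, value, mask, write_only=False):
        self.value = value
        self.mask = mask
        self.write_only = write_only  # optional qualifier on the access type

    def matches(self, address, is_write):
        if self.write_only and not is_write:
            return False
        return (address & self.mask) == (self.value & self.mask)


def first_trigger(comparator, bus_cycles):
    """bus_cycles: list of (address, is_write). Returns the index of the first
    cycle that satisfies the trigger condition, or None if there is none."""
    for index, (address, is_write) in enumerate(bus_cycles):
        if comparator.matches(address, is_write):
            return index
    return None


# Hypothetical watchpoint: any write to the 256-byte region at 0x2000.
watch = BusComparator(value=0x2000, mask=0xFF00, write_only=True)
cycles = [(0x1000, False), (0x2004, False), (0x20F0, True), (0x3000, True)]
print(first_trigger(watch, cycles))
```

In hardware the comparison is done in parallel with normal execution, so unlike the loop in this model it costs the monitored system no execution time.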

4.6.3 Foreground debug mode

When the system enters foreground debug mode it enables interactions with the host computer, configuring of the background debug mode, switching back to background debug mode and resumption of program execution [22].
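The interplay between the two modes can be summarized as a small state machine. The class, the command names and the use of a programme counter as the only trigger source are hypothetical simplifications; they are not the implementation described in [22].

```python
class DebugModes:
    """Toy model of a target that alternates between the two debug modes."""

    def __init__(self, breakpoints):
        self.breakpoints = set(breakpoints)
        self.mode = "background"
        self.pc = 0

    def step(self):
        """Execute one instruction in background mode and check for triggers."""
        if self.mode != "background":
            raise RuntimeError("target is halted in foreground mode")
        self.pc += 1
        if self.pc in self.breakpoints:
            self.mode = "foreground"  # trigger detected: halt the target

    def host_command(self, command, arg=None):
        """Host interaction is only possible while in foreground mode."""
        if self.mode != "foreground":
            raise RuntimeError("host commands need foreground mode")
        if command == "read_pc":
            return self.pc
        if command == "set_breakpoint":
            self.breakpoints.add(arg)  # reconfigure background mode
            return None
        if command == "resume":
            self.mode = "background"
            return None
        raise ValueError("unknown command: %r" % command)


target = DebugModes(breakpoints=[3])
for _ in range(3):
    target.step()
print(target.mode, target.host_command("read_pc"))
target.host_command("set_breakpoint", 5)
target.host_command("resume")
```

After the resume command the target runs in background mode again until the newly configured breakpoint at address 5 is reached.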

Software foreground debug mode is implemented as e.g. a monitor task or process running in the system. The monitor process is part of the system and requires memory space. The monitor process communicates debugging information to the host via a communication channel and receives commands such as setting breakpoints or single stepping. The advantages of implementing foreground debug mode in software are that software is flexible and easy to modify, as for background debug mode implemented in software. The disadvantages are similar to those for background debug mode: the method is intrusive, consumes memory and is relatively slow compared to hardware implemented foreground debug mode [22].

The advantages of implementing foreground debug mode in hardware are that the debugging circuit is independent of the core and does not intrude on the programme as a software implementation does. Moreover, the system’s core clock is stopped when the system enters foreground debug mode, and the system is then controlled by the test clock. Controlling the system with the test clock enables faster debugging, because the debugging circuit is simpler than the core and can thus run at a higher frequency. The disadvantage is that it influences the hardware design phase and often requires a dedicated interface for debugging, at the cost of chip area and pins [22].

4.6.4 FDM communication channels

Software implementations of foreground debug mode communicate via external I/O buses. The advantage of communication via the external I/O bus is that it is easy to implement and that facilities for off-chip communication can be integrated in the system. The off-chip communication in OSE is facilitated by TCP/IP, which is a part of the system and thus requires no extra components or functionality for off-chip communication. The disadvantage of utilizing the system’s communication channel for off-chip communication is that the channel becomes a resource shared between the system and the host-target communication, so the system communication might be blocked by the host-target communication. Hardware interfaces communicate via a dedicated testing or debugging interface. This interface does not interfere with on-chip communication but requires additional pins [22]. It is common to reuse existing boundary scan testing ports to access the embedded ICE. Advantages of reusing a test port like the JTAG interface are that it is an adopted industry standard, requires few extra pins and includes an external test clock [6].
