
Degree Project in Information and Communication Technology,
Second Cycle, 30 credits
Stockholm, Sweden 2020

A Camera System for the KTH MIST Satellite

VOLKAN COSKUN


Abstract

The KTH MIST satellite will include a Camera System that captures pictures of Earth from space. Due to the low bandwidth of the downlink, the satellite may complete several orbits around the Earth before a picture is fully transmitted to the ground.

The picture is processed and the raw pixels are stored in an on-board DRAM. A problem is that the pixel-data is subject to bit-flips: FPGAs and DRAMs in general are sensitive to ionizing radiation, which can corrupt data. This will degrade the performance of the system unless protective measures are taken.

The Camera System is connected to the Single Event Upset Detector (SEUD) experiment, so that the SEUD experiment has something to compute while waiting for SEU events to happen. Another important topic in this research is power consumption: how much power the Camera System consumes, and how it can be minimized. The goal is to find the most power-consuming areas in the block design and optimize them.

The self-healing SEUD architecture consists of two COTS FPGAs, one ARTIX-7 and one SmartFusion2 chip. The ARTIX-7 chip is sensitive to radiation and is used as a sensor platform for capturing SEUs as bit-flips happen in its configuration memory. The SmartFusion2 chip, on the other hand, is less sensitive to radiation due to its flash-based configuration memory. It also contains an ARM Cortex-M3 and is used to supervise the ARTIX-7 chip and perform tasks like re-programming, restarting, and overseeing it, as well as storing images and communicating with the main On-Board Computer (OBC) of the MIST satellite. Pictures are stored in a local DRAM memory that is protected from SEUs through Triple Modular Redundancy (TMR) and scrubbing.

In this thesis, an OV7670 camera module is used for capturing pictures and saving them in a local memory. Two Camera System implementations are investigated and implemented using the Nexys Video board as a target platform. The 2nd Camera System fulfills the throughput requirement and generates 640x480x3 = 921.6 kB of pixel data per frame.

The pixel stream is forwarded by a VDMA unit to other components for storage or further processing. The setup has been validated through simulations and by connecting the VDMA to HDMI and VGA screens. The implementation had a total on-chip power consumption of 1.172 W (including the MicroBlaze processor), of which the I/O used for debugging consumed 0.3 W (29 percent of the total power consumption). Hence, the power consumption of the final Camera System is below 1 W, which is the power budget allotted for the whole SEUD experiment. A combination of RTL code along with AXI4-compliant IPs was concluded to be the best choice for the 2nd Camera System.


Sammanfattning

The KTH MIST satellite includes a camera system that shall take pictures of the Earth from space. Due to low bandwidth, the satellite will circle the Earth several times before the transfer is complete. The picture is processed in the meantime, and the raw pixel-data is stored in a DRAM. One problem is that the FPGAs and DRAMs on board are exposed to ionizing radiation, which can cause bit-flips and corrupt data, affecting the system's performance unless protective measures are taken.

The camera system is connected to the SEUD (Single Event Upset Detector) experiment, so that the SEUD has something to compute while it waits for SEU events. Another important area addressed in this thesis is how much power the camera system consumes and how it can be minimized.

The goal is to find the most power-consuming areas and optimize them.

The self-healing SEUD experiment consists of two COTS FPGAs, an ARTIX-7 chip and a SmartFusion2 chip. The ARTIX-7 chip is sensitive to radiation and is used as a sensor platform to detect SEUs by finding and fixing bit-flips in its configuration memory. The SmartFusion2 chip, on the other hand, is less sensitive to radiation thanks to its flash-based configuration memory. It also has an ARM Cortex-M3, which is used to supervise, reprogram, and restart the ARTIX-7 chip, as well as to communicate with the MIST satellite's central computer, the OBC (On Board Computer).

The pictures are stored in a local DRAM memory that is protected from SEUs through Triple Modular Redundancy (TMR) and scrubbing.

The camera module used in this thesis to take pictures and store them in a local memory is an OV7670 sensor. Two implementations of the camera system are investigated and implemented on a Nexys Video board. The second camera system meets the required performance and generates 640x480x3 = 921.6 kB of pixel data per frame.

The pixel stream is forwarded by a VDMA unit to other components for storage. The camera system was validated through simulations and in hardware by connecting the VDMA to HDMI and VGA screens. The design had a total power consumption of 1.172 W (including the MicroBlaze processor), of which the I/O used for debugging consumed 0.3 W (29 percent of the total power consumption). The power consumption of the final camera system is thus less than 1 W, which is the power budget allotted for the entire SEUD experiment. A combination of RTL code together with AXI4-compliant IPs was found to be the best solution for the second camera system.


Acknowledgments

I would like to thank my examiner Johnny Öberg for giving me the opportunity to work with the SEUD project. Furthermore, I would like to thank Kalle Ngo for helping and guiding me during the thesis, and all my teachers and professors who have helped me during my time as a student and directed me in matters pertaining to career and private life. I have had a great time doing projects at Makerspace together with wonderful students at KTH. Lastly and most importantly, if it were not for my fiancée, I would not have had the strength to finalize my thesis; I would like to thank her the most for being so loyal and supportive during my thesis.


Contents

1 Introduction
  1.1 SEUD Experiment
  1.2 SEUD Architecture
  1.3 Camera System
  1.4 Problem Description
  1.5 Research Objectives

2 Background
  2.1 Space Environment
  2.2 FPGA In Space Environments
  2.3 Problematic Clocking Issues

3 The SEUD Camera System
  3.1 Board preference
  3.2 Initial Camera System
    3.2.1 Pixel handler
    3.2.2 Camera controller
    3.2.3 Clock Manager
    3.2.4 Application Manager
    3.2.5 Block Memory generator
    3.2.6 VGA generator
  3.3 2nd Camera System
    3.3.1 AXI4 Pixel-Handler
    3.3.2 Video Direct Memory Access (VDMA)
    3.3.3 MicroBlaze
    3.3.4 Video Timing Controller (VTC)
    3.3.5 Dynamic Clock Generator (DCG)
    3.3.6 AXI4-Stream to Video Out
    3.3.7 RGB to DVI Video Encoder
    3.3.8 Memory Interface Generator 7 (MIG7)

4 Experiments
  4.1 Test And Validation
  4.2 Results

5 Conclusions

6 Future Work

A Appendix A - Space environment
  A.1 Radiation Belts
  A.2 Radiation Effects on Digital Electronics
  A.3 Soft Errors explained in detail
    A.3.1 Single Event Upset (SEU)
    A.3.2 Multiple Bit Upset (MBU)
    A.3.3 Single Event Functional Interrupt (SEFI)
  A.4 Hard Error

B Appendix B - Miscellaneous
  B.1 DRAM Memory transfers
  B.2 Video Graphics Array (VGA) explained
  B.3 AXI4 protocol


1 Introduction

1.1 SEUD Experiment

Currently, the KTH Space Center is working on a student satellite project called MIST. The MIST satellite project is divided into several sub-projects containing interesting research experiments, such as the Propulsion experiment, RATEX-J, Piezo LEGS, CUBES, SiC in Space, MoreBac, the SEUD experiment, and a Camera unit. The SEUD experiment is the most relevant experiment for this thesis; its goal is to sense bit-flips on Commercial Off The Shelf (COTS) FPGAs. The SEUD is a self-healing architecture that measures bit-flips in memory and corrects them. Its second objective is to show that processing can be done safely despite the bit-flips. While Camera Systems can be used for applications such as targeting locations, recognizing objects, and calculating distances, anomalies related to power management and system reliability still remain a problem.

The Camera System is directly connected to the SEUD experiment. Its purpose is to capture pictures of Earth. The satellite may complete several orbits around the Earth before the picture is transmitted down to Earth (due to low bandwidth on the down-link). While processing and saving the picture seem solvable through engineering, the pixel-data is still subject to ionizing radiation, since radiation hits sensitive nodes in electrical hardware. As a consequence, unforeseeable behavior occurs and slows down the system response, which results in non-deterministic behavior.

Currently, the SEUD prototype consists of an ARTIX-7 chip, which works both as a sensor and as the controller of system functionalities, self-recovery, and data collection, and a flash-based Microsemi SmartFusion2 FPGA, which works as a supervisor for the ARTIX-7 chip and is responsible for communicating with the On Board Computer (OBC) of the MIST satellite. While the ARTIX-7 is sensitive to SEUs in its configuration memory, the flash-RAM in the Microsemi SmartFusion2 FPGA is quite insensitive to radiation. It is expected that the ARTIX-7 is affected by an SEU every ten minutes, while the SmartFusion2 suffers from a hit every two years, which is longer than the expected lifetime of the satellite.

There are three operational soft cores in the ARTIX-7 chip and one ARM core in the SmartFusion2 chip. Even though the flash-based configuration memory is not sensitive to SEUs, the registers in the ARM core and the memories still are.

The memory in the SEUD architecture is built from transistors that store build-ups of charge representing binary information, either a one or a zero, which can be changed when ionizing radiation hits the transistors in the DRAM.

The goal of SEUD is to count the number of soft errors pertaining to Single Event Effects (SEEs) that happen in space, to see how frequent they are, and to mitigate them. Soft errors are resolvable and can be mitigated with scrubbing techniques. Hard errors, like latch-up effects, are more difficult to handle since they cause permanent damage to the hardware. However, the ARTIX-7 is reportedly immune to latch-up effects, making them unlikely to happen, so concerns about managing hard errors are put aside.


1.2 SEUD Architecture

The self-healing SEUD architecture consists of two COTS FPGAs, the ARTIX-7 and the SmartFusion2 chip. Further, two SDR SDRAMs and three flash memories are used to store important data. The SmartFusion2 chip contains an ARM Cortex-M3 and a radiation-hardened flash memory, which supervises and performs tasks like re-programming, restarting, and overseeing the ARTIX-7 chip. Also, the communication link between the SEUD and the OBC is provided through the SmartFusion2 chip. The ARTIX-7, on the other hand, works as a sensor platform that captures SEUs when bit-flips happen in the memory. The ARTIX-7 chip is susceptible to radiation and thus has a high rate of SEUs.

Figure 1: The SEUD architecture

In this thesis, an OV7670 camera is used to capture a picture and save it in memory. Currently, the system has three operational soft cores on the ARTIX-7 chip. The SmartFusion2 will save the image in the SDRAM and transmit it to the OBC. The goal of SEUD is to find and correct SEUs in space and maintain safe system functionality even when exposed to radiation. To show that this is possible, the image compression is done on the FPGA that is sensitive to radiation. TMR is used to mask and scrub corrupted bits: values are triplicated, saved into the memory, and voted on. The healing core is used to detect soft errors in the FPGA and mitigate them with a scrubbing algorithm. Connected to the SEUD, the 0.3-megapixel OV7670 camera peripheral is the main sensor generating the pixels that are stored in the SDRAM. The idea is to use the MicroBlaze soft processor to compress the image from the memory and thereafter send it to Earth.


In Space applications, self-healing techniques are desirable in order to repair faults caused by ionizing radiation. In [1] the authors mentioned that they successfully implemented a TMR algorithm that was able to detect multi-bit flips in a data word. The self-healing SEUD architecture was later enhanced in [1], which proposed a new SEUD design containing a Single-Data-Rate (SDR) DRAM. The SEUD architecture was generated using the NoC System Generator (NSG) [2]. The NSG tool abstracts away the detailed programming and debugging process for every processor in the system.

Designing reusable and adaptive hardware has always been a demand for efficient FPGA designs that aim to balance production cost, development time, and performance. This makes SoC an attractive choice for Space applications [3].

It is also feasible to mitigate faults using SoC, since it provides HW and SW reusability, along with IP cores, HAL (Hardware Abstraction Layer) drivers, and the AXI4 protocol. The AXI4 protocol provides burst traffic for applications that focus on performance, since burst transfers increase data throughput.

Importantly, in [4] and [5], the authors demonstrated a working adaptable self-healing architecture that covered many FT aspects. However, the authors also concluded that conventional hardware-redundancy schemes like TMR are overused nowadays and have to be enhanced or replaced with improved fault-mitigating techniques like TMR-with-spare.

The multi-core platform is generated using an XML configuration file. As time passes, the number of processors in multi-core platforms keeps growing, paving the way for “sea-of-cores” designs. Since complexity increases when trying to integrate a huge multi-core platform with thousands of processors, overseeing the platform is cumbersome: resources such as processors and memory controllers have to be supervised. The NSG tool is therefore used to generate the multi-core platform needed for the SEUD architecture.

To sum up, the SDR SDRAM used in [1] draws too much power, and a better memory had to be utilized in order to increase the overall transaction speed for the pixel data. Therefore, in this thesis, a Double Data Rate 3 (DDR3) SDRAM is used.


1.3 Camera System

The initial design in this thesis uses a controller that stores 12-bit RGB values (each considered one pixel) in BRAM storage. The values are thereafter read and sent to the VGA generator, which shows the picture on a VGA screen, thus validating the initial design. The low-cost camera is programmed to process 307200 (640x480) pixels. The uniqueness of the initial design is that it did not use any AXI4 (Advanced eXtensible Interface 4) IP cores and was mainly designed using VHDL code targeting the Zedboard.

Further, simulations were used to validate critical data-paths and the whole pixel data-flow. The VTPG (Video Test Pattern Generator) IP core was used for testing and debugging various configurations, such as video colour, quality, edges, colour bars, and motion performance, since it eases internal and external block validation.

The 2nd camera architecture in this thesis is far more complex than the initial camera design. The challenge here is to adjust the VHDL code to the AXI4 protocol specification. The Nexys Video board was used to validate the 2nd camera architecture. The board has 33650 logic slices and ten clock management tiles, where each clock tile has a Phase-Locked Loop (PLL) with a maximum clock speed of 450 MHz [6]. The Zedboard used in this thesis has a Cortex-A9 and reaches memory speeds of 533 MHz (1000 Mbps) [7]. The Nexys Video board, on the other hand, uses a data rate of 800 Mbps (450 MHz). On the Zedboard, the DDR3 memory is connected to a hard memory controller in the Processor Subsystem (PS). The Nexys Video board, on the other hand, gives more freedom of choice in memory controllers.

In order to speed up the process, a large chunk of data can easily be transferred using a Direct Memory Access (DMA) engine. For example, it is more efficient to transfer data chunks from main memory to other I/O peripherals with the DMA rather than letting the processor do the job, since the DMA operates without involving the processor. However, the DMA consists of hardware resources, which implies that it consumes more power when used. Additionally, tasks pertaining to providing memory addresses, generating control signals, incrementing the memory addresses for consecutive words, and keeping track of the number of transfers are handled by the DMA. While there are huge advantages in not letting the processor intervene in the transfers, the DMA must be checked occasionally to ensure its operability.

The 2nd camera architecture has a Video Direct Memory Access (VDMA) unit that sends 24-bit RGB data to a DDR3 SDRAM and controls the pixel-data flow.

Timing parameters are generated using the Video Timing Controller (VTC) IP. The VTC core contains control registers that configure important timing parameters in order to synchronize input and output signals, as needed for the HDMI screen. Also, an RGB to DVI encoder was used to encode the RGB values so that they can be presented on the HDMI screen. Most of the provided AXI4 IP cores have AXI4-compliant slave and master interfaces, making it easy to connect various IP cores.

Simulation was used as well in order to validate each IP core in the 2nd camera architecture. Initially, a smaller picture (10x10) was tested, simply to see whether the pixel-data could be traced. In contrast to the BRAM used in the custom VHDL design, the MIG7 controller IP was used to transfer the pixel-data to the DDR3 memory. Unlike the BRAM used in the initial architecture, the MIG7 controller could not be simulated at all, but was validated using an HDMI screen.

Clock-switching data paths are scrutinized as well in order to find the most power-consuming areas in the Camera Systems. The Camera System has to control the clock sources in order to save power, because the switching activity of clocks increases the power consumption. The power consumption of both the initial design and the 2nd design is discussed and evaluated. The Camera Systems are evaluated based on efficiency, power consumption, and how relevant they are for the MIST satellite.

Another factor that can affect the power consumption is the picture size, i.e., the number of pixels to process. Clearly, better picture quality means more pixels to process and save in memory, which requires more clock cycles and time. Since the goal is to send down a high-definition picture to Earth, compression algorithms might not be the best choice, since they remove pixel-data and consume more power. The power budget is aimed to be below 1 W. By enabling and disabling power-consuming data-paths, power can be saved, which in turn increases battery life. Currently, estimates of the dynamic power indicate that an operating frequency of 12 MHz would consume about half the power budget.
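To first order, such estimates rest on the standard CMOS switching-power relation, under which dynamic power scales linearly with clock frequency (the symbols below are the textbook ones, not values taken from this design):

\[
P_{dyn} = \alpha \, C \, V_{dd}^{2} \, f
\]

where \(\alpha\) is the switching activity factor, \(C\) the switched capacitance, \(V_{dd}\) the supply voltage, and \(f\) the clock frequency. Lowering \(f\), or gating a clock so that \(\alpha\) drops to zero on an idle data-path, removes the corresponding share of the dynamic power, which is why disabling unused clocks saves power.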

Matters related to maximum clock rates for hardware peripherals, skews, and the number of clocks are important when designing the Camera System. All in all, almost all modern FPGAs have some sort of clock-skew management resources for routed clocks, such as dedicated clock inputs or low-skew buffers. In this thesis, the Nexys Video board will be used to design the Camera System. The Nexys Video board uses Clock Management Tiles (CMTs), each containing two distinct clocking primitives: the Mixed-Mode Clock Manager (MMCM) and the Phase-Locked Loop (PLL) [6]. The PLL is a lightweight version containing a subset of the MMCM's functionality. The 7-series devices provide up to 24 CMT tiles, frequency synthesizers, jitter filters, and low-skew clocks. The clock buffers shall also be placed in the same region as the MMCM in order to decrease clock skew.

Hardware redundancy schemes provide fault tolerance and re-configurability during run-time, and can be useful when designing the Camera System to be used on the MIST satellite. Hardware redundancy is commonly more costly than a non-redundant system, but it is safer. Also, fault detection schemes prevent numerous errors from occurring. TMR is one fault-masking scheme that is commonly used to mask faults in a memory frame. The idea is to integrate the TMR scheme with the Camera System, since the goal is to find the corrupted bits (bit-flips due to radiation) in the memory and mitigate them using a scrubbing algorithm. TMR is used to triplicate the raw data and save all three copies into the memory; the triplicated values can thereafter be compared in order to find the faulty data.

The error can then be masked out and tracked, considering that the probability of three faults happening simultaneously at the same location in different copies of the data is extremely low. The rate of scanning has to be higher than the rate of bit-flips in the memory, i.e., the memory has to be scanned in a smart way in order to save power.
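As an illustration of the masking step, the two-out-of-three vote over the stored copies can be written as a few lines of combinational VHDL. This is a generic sketch of the technique, not the SEUD project's actual implementation; the entity name, port widths, and error flag are chosen here for illustration:

library ieee;
use ieee.std_logic_1164.all;

-- Bitwise 2-out-of-3 majority voter for TMR-protected data.
-- A bit is voted '1' when at least two of the three copies agree on '1'.
entity tmr_voter is
  generic (WIDTH : integer := 12);  -- e.g. one 12-bit RGB pixel
  port (
    copy_a     : in  std_logic_vector(WIDTH-1 downto 0);
    copy_b     : in  std_logic_vector(WIDTH-1 downto 0);
    copy_c     : in  std_logic_vector(WIDTH-1 downto 0);
    voted      : out std_logic_vector(WIDTH-1 downto 0);
    error_seen : out std_logic  -- high when any copy disagrees
  );
end entity;

architecture rtl of tmr_voter is
  constant ALL_ZERO : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
  signal   disagree : std_logic_vector(WIDTH-1 downto 0);
begin
  -- Per-bit majority function: (a and b) or (a and c) or (b and c).
  voted <= (copy_a and copy_b) or (copy_a and copy_c) or (copy_b and copy_c);
  -- Flag any mismatch so a scrubber can rewrite the corrupted copy.
  disagree   <= (copy_a xor copy_b) or (copy_a xor copy_c);
  error_seen <= '0' when disagree = ALL_ZERO else '1';
end architecture;

The error_seen flag is what a scrubbing routine would act on: the voted word is written back over all three copies, restoring full redundancy before a second upset can accumulate.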

Blind scrubbing combined with dynamic partial reconfiguration is one way to handle failures in the FPGA configuration memory. However, since the FPGA is in write mode when blind scrubbing is used, it is more vulnerable to SEEs [8]. For instance, if the configuration memory is affected by upsets during a write operation, the data can change and mislead the CPU with wrong information. In contrast, when the FPGA is in read mode, the default information in the FPGA configuration memory is known, which simplifies validation of its integrity.

1.4 Problem Description

The Camera System has to consider radiation while the satellite orbits the Earth. Sporadic upsets have to be handled using fault-recovery techniques. The Camera System is supposed to be integrated into the SEUD architecture, which contains peripherals like memories, processors, and IPs, and it shall withstand radiation effects in Space as well.

Currently, the design attempts to reduce power consumption by minimizing the system clock frequency, but the integration of the Camera System into the SEUD necessitates the evaluation of multiple clock domains and/or gated-clock solutions. Further, the SEUD design has a power budget that is constrained to less than 1 W.

The goal in this thesis is to store 640x480 pixels using a VGA camera. The estimated data stream rate is 640 x 480 x 8 x 3 = 7,372,800 bits per frame, i.e., roughly 221 Mbps at 30 frames/second. The design shall capture one frame from a camera sensor, controlled by the SEUD ARTIX-7 FPGA, and buffer it to an off-chip SDRAM. Furthermore, the clocks in the system have to be managed such that they meet the bandwidth requirements.
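For reference, the arithmetic behind the stream rate (a plain restatement of the figures above):

\[
640 \times 480 \times 3 \times 8 = 7{,}372{,}800 \ \text{bits per frame}, \qquad
7{,}372{,}800 \times 30 \ \text{fps} \approx 221 \ \text{Mbps}.
\]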

1.5 Research Objectives

This thesis has the following objectives:

• Develop and examine multiple clock domains or gated clock solutions for the Camera System.

• Develop a suitable data path region that buffers high-speed data (2 Gbps) to an external SDRAM, working together with an adaptive clock designed to meet the bandwidth requirements.

• Design and integrate the developed IPs into the NoC and make the Camera System work on an FPGA.


2 Background

2.1 Space Environment

Figure 2: The difference between LEO and GEO

Modern electronics used in Space are susceptible to radiation, which can lead to anomalies pertaining to system degradation. The radiation intensity varies depending on how far from the Earth the satellite orbits. As described in appendix A, satellites orbiting the Earth are usually placed in Low Earth Orbit (LEO) or Geosynchronous Equatorial Orbit (GEO). A hit on susceptible hardware does not necessarily mean that the hardware breaks; instead, it makes signals change state.

Traditionally, DRAM devices (see appendix B.1) are made of transistors that store build-ups of charge representing binary information. In connection with the Camera System, focus is put on how the memories and the Camera System react to ionizing radiation hitting memory cells. Also, single-event strikes near the storage capacitor or the source of the access transistor are other soft errors that are relevant to study in order to increase the reliability of the Camera System.

SEEs are categorized into different sorts of errors, such as soft errors and hard errors. Soft errors are non-destructive, i.e., the design still functions but with degraded performance. Among soft errors, three fundamental types, SEUs, MBUs, and SEFIs, are the most common failures causing the hardware to malfunction (see appendix A.3). Hard errors, on the other hand, cause severe or permanent damage to electronic devices and are out of scope of this work.

2.2 FPGA In Space Environments

Old satellite designs launched into Space had limited on-board processing capacity, and time was spent on capturing pictures, analyzing them, and sending them down to Earth. This was done using processors, since FPGAs were not available at that time. In [9] the authors concluded that radiation-hardened FPGAs and image-based pipeline architectures are promising and can be used in Space applications, since they offer immense processing speed. Also, the authors concluded that off-chip SRAMs are crucial for storing data, since they reduce the design footprint and enable more complex and larger designs.

Bits located near each other have a higher chance of being affected by an MBU. In [10], the authors discussed that the physical layout of the configuration memory (in FPGAs) is important to study when analyzing Multiple Bit Upsets (MBUs). The purpose is to investigate and analyze MBUs in adjacent memory bits and memory structures.

While many researchers mention reliability as a major concern when designing FT hardware architectures, power consumption still remains a great challenge. Technology scaling, as mentioned in [11], decreases the power consumption but makes circuits more sensitive to MBUs. The authors concluded that the worst-case scenario has to be studied well before the satellite is launched into Space.

Radiation tests for satellites are usually done in-house and are expensive. This is also an active research area. In [12], a randomized test system was developed in order to test electronic devices exposed to ionizing radiation. The authors concluded that their fault-injection mechanism was cheap and flexible. Further, fault-injection testing showed that the SEUD can reliably detect and correct two SEUs per second. The correction frequency was later improved in [13]. However, in this thesis, the focus is mainly on testing the functionality of the Camera System, i.e., whether it can transfer pixels efficiently and save them into DRAM.

2.3 Problematic Clocking Issues

Multiple clocks used in larger designs can give various outcomes; therefore, critical data-paths have to be investigated in detail. According to [14], matters pertaining to maximum clock rates, skews, the number of clocks, and asynchronous clock designs are important to investigate in FPGAs in general. Anomalies related to system degradation have to be checked when designing a Camera System, since unwanted propagation effects can slow down the output, resulting in non-deterministic behavior. Decreasing the number of muxes in multiple clock domains is one method that eliminates cross-talk noise between clocks, as mentioned in [14].

All in all, almost all modern FPGAs have some sort of clock-skew management resource for routed clocks, such as dedicated clock inputs that are fed into low-skew buffers. For instance, the PLL [6] clock resource on the Nexys Video board uses low-skew buffers. This is in order to prevent it from outputting corrupted data due to propagation effects. A reliable system usually consists of hardware logic placed on the global nets, where clocks drive as many gates as possible, whereas smaller hardware logic is placed on the local nets. This stabilizes the design and minimizes the chance of unwanted signals appearing between two flip-flops driven by the PLL clock. Propagation effects are therefore excluded from this investigation, since the authors in [15] also mentioned that the ASYNC_REG attribute enables synchronization chains, input of asynchronous data, optimizations, clock placement, and clock routing, thus eliminating unwanted propagation behavior.


Controlling multiple clock domains with various frequency levels is important, since power consumption can be greatly reduced with various power-saving techniques. In [16], the authors designed a Clock Gating Generator (CGG) that considered metastability for their asynchronous clock-domain architecture.

While extensive research shows that TMR decreases the failure rate for synchronous clock domains, the authors in [17] argued that asynchronous clock domains with hardware redundancy schemes like TMR might not give the same failure rate, since asynchronous clock architectures cause metastability.

Therefore, the authors proposed an enhanced TMR-based synchronizer that reduced the metastability rate in asynchronous clock domains. The SEUD is a synchronous design, so this area will not be investigated in this thesis. However, for future designs, metastability shall be considered when selecting a new memory structure for the SEUD project.

A structural clock gating method was developed in [18]. The task was to control the clocks in the system. Trivially, this is done by writing to a clock status register, which deactivates the clock domain, thus saving power. Another important and yet simple method used to manipulate clock domains is presented in [19], where a clock manager is designed. Its main task is to check for relevant data-paths and disable or enable clocks. Also, the authors discussed the importance of gating, enabling, or blocking clocks and concluded that their technique had several advantages compared to existing power-saving techniques for SRAM-based FPGAs. Since adaptivity and manipulation of multiple clocks are significant when designing the Camera System, the authors in [20] proposed an Adaptive Clock Management (ACM) method, which enhanced performance.

The idea is to oversee the instruction stream flow and make decisions that achieve maximum speed-up. The authors finally came up with a hybrid Asynchronous Clock Manager (ACM) combining Clock Stretching (CS) and Clock Multiplexing (CM). However, since its decisions are mainly based on performance metrics, it is not suitable for Space applications, as it does not consider ionizing radiation. Additionally, in [21] the authors provided a clock domain translation scheme that prevented clocks from jittering. However, the authors later concluded that a better jitter-eliminating circuit could be used to increase signal integrity and accuracy.

Video data with image dimension issues can be scaled or cropped. Applications that process continuous streaming data usually use three frame buffers: one for reading, one for writing, and a third to improve synchronization, avoid stuttering, and decrease the screen tearing that can be seen on HDMI or VGA screens.


3 The SEUD Camera System

This section describes the Camera Systems for the SEUD architecture. As a start, a comparison between two FPGA boards is presented, i.e., the Zedboard and the Nexys Video board. The initial SEUD camera design uses the Zedboard, which has a larger community than the Nexys Video board; this made development much easier, since there are a lot of camera examples.

The 2nd Camera System for SEUD uses the Nexys Video board along with Xilinx's own IP cores. The IP cores are AXI4-compliant and offer a great trade-off between resource utilization and power consumption through configuration settings for each IP block. Also, any AXI4-compliant memory controller can be used together with the AXI4 blocks without affecting the modularity of the system.

Blocks marked in green are developed by the student, whereas blocks marked in yellow are either taken from a tutorial [22] or are Xilinx's own AXI4-compliant IPs.

3.1 Board preference

In this section two FPGA boards are compared and discussed in order to explain why the Nexys Video was chosen for developing the 2nd SEUD Camera System.

The Zedboard contains a dual-core Cortex-A9 and is based on the Xilinx Zynq-7000 All Programmable SoC. Further, it combines 85000 series-7 Programmable Logic (PL) cells and can be used for various applications. Also, onboard oscillators of 33.333 MHz (PS) and 100 MHz (PL) are used for clocking various applications.

Further, an HDMI output and a VGA connector are provided, along with a 512 MB DDR3 memory system [7].

The Nexys Video board, on the other hand, contains 33650 logic slices, each with four 6-input LUTs and eight flip-flops. A total of ten clock management tiles are provided, where each clock tile has a Phase-Locked Loop (PLL), reaching a clock speed of 450 MHz. The Zedboard reaches higher memory speeds, 533 MHz (1000 Mbps), whereas the Nexys Video has a data rate of 800 Mbps (450 MHz). The Zedboard's DDR3 is also connected to a hard memory controller in the Processor Subsystem (PS). The Nexys Video board and the SEUD architecture use the same ARTIX-7 chip. However, the Nexys Video board does not have a hard memory controller, which gives room for designing a custom memory controller. Therefore, in this thesis, the Nexys Video is chosen, since it provides the freedom to add or remove functionalities like a TMR scheme when saving important pixel-data. Also, the Nexys Video board supports soft cores on the ARTIX-7 chip, which makes it similar to the SEUD board.


3.2 Initial Camera System

Figure 3: Initial Camera System

The camera design, outlined in figure 3, consists of seven blocks, which are used to capture a picture of 640x480 = 307200 pixels and save it into the memory.

The application manager block controls the behaviour of the system, i.e., it decides when to capture a picture and save it into memory. It is also used to control the clocks in the initial Camera System. The application manager block knows when a frame has transferred all its pixel-data; it will thereafter send a new Enable signal that enables the VGA clock only, in order to start the VGA generator that presents the 640x480 picture on the VGA screen.

The clock manager block depends on the Clocking Wizard block and the application manager block. It controls the clocks in the camera design and is used to enable the pixel clock for the camera sensor, the VGA clock for the VGA generator, and the I2C clock for the Camera Controller. In this way, only the relevant clocks are turned on at any time, thus saving power.

The camera is pre-programmed with a set of instructions in ROM. The Camera Controller block initializes the registers through the SCCB interface. These instructions are camera-specific configurations used to set the pixel format, the picture size, etc.

The Pixel-Handler block interfaces the OV7670 camera and captures 8 bits per pixel-clock cycle, so a complete pixel is retrieved every second clock cycle. The values are thereafter converted to 12-bit RGB pixel-data and saved into the BRAM.

The VGA generator block is used to test the initial Camera System. Its task is to read the pixels from the BRAM memory and present the 640x480 picture on the VGA screen.


3.2.1 Pixel handler

Before explaining how the pixel handler block works, the OV7670 camera has to be understood, since the Pixel Handler block interfaces the OV7670 camera sensor.

Figure 4: The OV7670 camera

The OV7670 module is a low-power, low-cost, small-sized camera chip containing a VGA camera with an image processor. The sensor can operate at a speed of 30 frames per second (FPS) in VGA format. Further, features such as exposure control, gamma, white balance, color saturation, and hue control are provided in order to control the image quality.

The OV7670 camera has various configuration formats: a full-frame configuration, which helps capture high-resolution pictures and reduce noise; a sub-sampling configuration, which is used for image scaling; and a windowed 8-bit configuration, which represents 8-bit RGB values in windowed mode with different sizes. Table 1 describes all the pins of the OV7670 camera.

Pin     Type          Description
VDD     Supply        Power supply
GND     Supply        Ground level
SDIOC   Input         SCCB clock
SDIOD   Input/Output  SCCB data
VSYNC   Output        Vertical synchronization
HREF    Output        Horizontal synchronization
PCLK    Output        Pixel clock
D0-D7   Output        Video parallel output
RESET   Input         Reset (active low)
PWDN    Input         Power down (active high)

Table 1: Pin map for OV7670

A pixel is a numerical value that consists of color values such as Red, Green, and Blue [23], hence the name RGB pixels. The formats vary in bit size; the most common ones are 5:6:5, 5:5:5, and 4:4:4. The pixel handler is programmed to receive the 5:6:5 format and convert it to 12-bit RGB data.

The YCbCr (4:2:2) format, also known as chroma sub-sampling, is a compression method that aims to reduce the color information in a signal. It is used to reduce the bandwidth without noticeably affecting the quality of the picture [24]. However, this format was not used in this thesis, since the RGB format is simpler to work with. Further, the image size can be configured and varies from 640x480 down to 40x30 [25].

The pixel-handler block gets the pixel-data from the OV7670 camera, capturing 16 bits over two clock cycles and converting them to 12-bit RGB pixel-data. The camera is restricted to transferring 8 bits every PCLK cycle, so two clock cycles are needed to capture 2 bytes, i.e., one 16-bit RGB (5:6:5) pixel. Since 12-bit RGB pixel-data is enough to present a picture on the screen, four bits are skipped. Figure 5 shows the bits to be removed from the 16-bit data: positions 15, 10, 9, and 4 are skipped, since they are not needed for the 4:4:4 RGB format.

Figure 5: The 16-bit RGB (5:6:5) value from the OV7670 camera sensor is converted to 12-bit RGB (4:4:4) in order to be presented on the VGA screen
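A minimal VHDL sketch of this repacking, dropping bit positions 15, 10, 9, and 4 as described above (the entity and signal names are illustrative, not taken from the thesis sources):

library ieee;
use ieee.std_logic_1164.all;

-- Combinational repacking of a 16-bit RGB (5:6:5) pixel into 12-bit
-- RGB (4:4:4) by dropping bit positions 15, 10, 9 and 4, as in figure 5.
entity rgb565_to_rgb444 is
  port (
    pixel_565 : in  std_logic_vector(15 downto 0);
    pixel_444 : out std_logic_vector(11 downto 0)
  );
end entity;

architecture rtl of rgb565_to_rgb444 is
begin
  pixel_444 <= pixel_565(14 downto 11)  -- red:   keep 4 of 5 bits
             & pixel_565(8 downto 5)    -- green: keep 4 of 6 bits
             & pixel_565(3 downto 0);   -- blue:  keep 4 of 5 bits
end architecture;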

As can be seen from figure 6, the first state initializes the parameters described in table 2. When VSYNC is low and HREF is high, the byte counter is incremented by one, indicating that one byte has been processed. When the byte counter equals two, the WriteEnable signal is set to 1 and a write operation is issued to the BRAM controller, i.e., in this case writing the converted 12-bit RGB value to the BRAM. There is a total of 640*480 = 307200 addresses for the 12-bit RGB pixels captured from the OV7670 camera. The HREF signal indicates when new pixel-data on the horizontal row has arrived, and the VSYNC signal indicates when a new frame is available, i.e., when 480 lines have been processed.


Figure 6: Behavioral diagram of the pixel handler

All the parameters in table 2 are set to 0 when the VSYNC signal is high, thus indicating that a new frame is ready to be processed.

Pin           Type    Description
PCLK          Input   OV7670 clock
DATA          Input   OV7670 8-bit data
HREF          Input   Horizontal synchronization
VSYNC         Input   Vertical synchronization
BRAM address  Output  BRAM pixel address
BRAM pixel    Output  BRAM pixel data
WE            Output  Write enable

Table 2: Pins for the pixel handler block

3.2.2 Camera controller

The Serial Camera Control Bus (SCCB) is OmniVision's own protocol used to send configuration information. The most essential pins are SIOC, SIOD, and SCCBE. Usually, a three-wire serial bus is used to configure the camera, but in this thesis the two-wire serial bus is used, since the OV7670 comes in the reduced package. Data transfers are categorized into 3-phase write transmission cycles, 2-phase write transmission cycles, and 2-phase read transmissions. Further, in the two-wire implementation, the SCCBE signal is not used and is held low. This ensures that only one master module can operate with the slave device, which is the camera chip itself.

The 3-phase write transmission is initiated by the master device; one byte of data is written to a specific slave. In the first phase, an ID address is written that specifies the slave device, in this case the OV7670 camera. Subsequently, a Sub-Address is written, which identifies the register address to be modified. Lastly, the data is written to the respective register.

Figure 7: 3-phase write

The 2-phase write transmission is usually used together with the 2-phase read transmission. Initially, a 2-phase write transmission has to be issued in order to send the slave device ID address, followed by the Sub-Address, which in this case is the register address. Thereafter, a 2-phase read transmission is issued with the device ID address, and the data is read from the register address.

Figure 8: 2-phase-write and 2-phase-read

The OV7670 controller uses a ROM containing instructions that are sent through the SCCB interface, which is based on the I2C protocol. These instructions are camera-specific configurations used to set the pixel format, such as selecting the (5:6:5) RGB picture format and the 640x480 picture size, and to initialize the control signals, such as the HREF signal that indicates when rows are available and the VSYNC signal that indicates when a new frame is available. Table 3 below shows the registers that are used to configure the OV7670 camera.

Register Name      Address  Data  Description
Common Control 7   0x12     0x80  Resets internal registers
Common Control 7   0x12     0x04  Sets RGB format
CLKRC              0x11     0x00  Disables clock option
Common Control 3   0x0C     0x00  Resets scaling settings
Common Control 14  0x3E     0x00  Disables PCLK scaling
Common Control 15  0x40     0x10  Sets RGB to 5:6:5
Common Control 1   0x04     0x00  Disables the CCIR656 format
TSLB               0x3A     0x04  Sets saturation level (auto)
Common Control 9   0x14     0x30  Sets AGC ceiling to 16x
MTX1               0x4f     0x40  Sets MTX coefficient 1
MTX2               0x50     0x34  Sets MTX coefficient 2
MTX3               0x51     0x00  Sets MTX coefficient 3
MTX4               0x52     0x3d  Sets MTX coefficient 4
MTX5               0x53     0xa7  Sets MTX coefficient 5
MTX6               0x54     0xE4  Sets MTX coefficient 6
MTXS               0x58     0x9E  Sets MTX sign and auto contrast mode
Common Control 13  0x3D     0xC0  Sets GAMMA and UV auto adjust
HSTART             0x17     0x11  Sets HREF start high 8 bits
HSTOP              0x18     0x61  Sets HREF end high 8 bits
HREF               0x32     0xA4  Sets edge offset
VSTRT              0x19     0x03  Sets VSYNC start high 8 bits
VSTOP              0x1A     0x7b  Sets VSYNC end high 8 bits
VREF               0x03     0x0a  Sets VSYNC low two bits
MVFP               0x1E     0x30  Sets flip and mirror mode
Common Control 12  0x3C     0x80  Enables HREF
GFIX               0x69     0x30  Sets FGC for R,G,B format

Table 3: Instructions in the ROM
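To illustrate how such a configuration table is typically held in hardware, the register/value pairs can be stored as a constant array in VHDL and walked through, one 3-phase write per entry, by the SCCB state machine. This sketch shows only the first few rows of table 3; the package, type, and constant names are illustrative, not the thesis's actual code:

library ieee;
use ieee.std_logic_1164.all;

package ov7670_rom_pkg is
  -- One ROM entry: 8-bit register address concatenated with 8-bit data.
  subtype rom_entry_t is std_logic_vector(15 downto 0);
  type rom_t is array (natural range <>) of rom_entry_t;

  -- First rows of table 3; the controller issues one 3-phase SCCB write
  -- per entry until the end of the array is reached.
  constant OV7670_INIT : rom_t := (
    x"1280",  -- Common Control 7: reset internal registers
    x"1204",  -- Common Control 7: set RGB format
    x"1100",  -- CLKRC: disable clock option
    x"0C00",  -- Common Control 3: reset scaling settings
    x"3E00",  -- Common Control 14: disable PCLK scaling
    x"4010"   -- Common Control 15: set RGB to 5:6:5
  );
end package;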

3.2.3 Clock Manager

The clock manager block in figure 3 enables and disables the relevant clocks in the Camera System. The clock manager block is responsible for controlling the PCLK clock, the VGA clock, and the I2C clock. These clocks were created using the Clocking Wizard IP core. The PCLK is used to process the pixels from the camera. The I2C clock is used to configure the camera with the correct register settings. The VGA clock is used to generate the picture on the VGA screen. The frequencies in table 4 are used to control the initial Camera System:

As can be seen from figure 9, two states are used to control the clocks: the first state enables the PCLK clock and the I2C clock in order to retrieve the pixel data, while the second state is used to process the picture from the BRAM and present it on a VGA screen.


Clock  Description   Frequency
VGA    VGA clock     25.175 MHz
PCLK   OV7670 clock  24 MHz
I2C    I2C clock     100 kHz
XCLK   XCLK clock    48 MHz

Table 4: Clocks in the Camera System

Figure 9: The clock management state machine
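On 7-series parts, one low-risk way to implement this kind of gating is to route each clock through a BUFGCE buffer and let the state machine drive the enable inputs, instead of gating clocks with ordinary logic. The sketch below follows that approach for the two states of figure 9; the thesis does not state which primitive its clock manager uses, so the entity and signal names here are illustrative:

library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

-- Gate the three camera clocks with BUFGCE cells. 'capture_mode' comes
-- from the application manager: '1' = capture a picture, '0' = display it.
entity clock_manager is
  port (
    pclk_in      : in  std_logic;  -- from the Clocking Wizard
    i2c_clk_in   : in  std_logic;
    vga_clk_in   : in  std_logic;
    capture_mode : in  std_logic;
    pclk_out     : out std_logic;
    i2c_clk_out  : out std_logic;
    vga_clk_out  : out std_logic
  );
end entity;

architecture rtl of clock_manager is
  signal display_mode : std_logic;
begin
  display_mode <= not capture_mode;
  -- Capture state: pixel clock and I2C clock run, VGA clock is stopped.
  u_pclk : BUFGCE port map (I => pclk_in,    CE => capture_mode, O => pclk_out);
  u_i2c  : BUFGCE port map (I => i2c_clk_in, CE => capture_mode, O => i2c_clk_out);
  -- Display state: only the VGA clock runs, saving dynamic power.
  u_vga  : BUFGCE port map (I => vga_clk_in, CE => display_mode, O => vga_clk_out);
end architecture;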

3.2.4 Application Manager

Figure 10: The application management state machine

The application management block controls the Camera System and is also used to debug it on a behavioral level. The application manager is responsible for checking whether there is a picture in the memory. This is done with a flag named pictureInMemory. If this flag is set, two of the clocks are disabled, since only the VGA clock is needed to present the picture in memory. As can be seen in figure 10, the state GetNewPicture applies when there is no picture in the memory; the two clocks that process the picture, the I2C clock and the PCLK clock, are then enabled. The state PresentPicture, on the other hand, applies when there is a picture in the memory, i.e., when the newPicture flag is low. A GPIO switch is used to capture a new picture.
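A compact VHDL sketch of that control flow, using the state names from figure 10 but otherwise illustrative signal names:

library ieee;
use ieee.std_logic_1164.all;

entity application_manager is
  port (
    clk, reset        : in  std_logic;
    picture_in_memory : in  std_logic;  -- set once a full frame is stored
    take_new_picture  : in  std_logic;  -- the GPIO switch
    capture_mode      : out std_logic   -- '1' enables PCLK/I2C, '0' the VGA clock
  );
end entity;

architecture rtl of application_manager is
  type state_t is (GET_NEW_PICTURE, PRESENT_PICTURE);
  signal state : state_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        state <= GET_NEW_PICTURE;
      else
        case state is
          when GET_NEW_PICTURE =>
            -- Keep capturing until a complete frame sits in the BRAM.
            if picture_in_memory = '1' then
              state <= PRESENT_PICTURE;
            end if;
          when PRESENT_PICTURE =>
            -- Show the stored frame until the switch requests a new one.
            if take_new_picture = '1' then
              state <= GET_NEW_PICTURE;
            end if;
        end case;
      end if;
    end if;
  end process;

  capture_mode <= '1' when state = GET_NEW_PICTURE else '0';
end architecture;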


3.2.5 Block Memory generator

Each BRAM on the Zedboard is a 36 Kb RAM that can be used for high-speed memory interfaces [7]. Additionally, each BRAM contains two consecutive blocks of 18 Kb RAM. The connections are cascaded, thus widening the memory size and decreasing the timing penalty. The BRAM control modules come in various forms, such as the synchronous Dual-Port, Single-Port, or Simple Dual-Port block. Readers who want more information are encouraged to read [7].

The most essential block is the BRAM controller, which is configured to store 12-bit RGB values; the BRAM was sized to 640*480 = 307200 frame addresses of 12 bits each.

3.2.6 VGA generator

As can be seen from figure 3, the design is validated by using a VGA generator that takes the pixel values from the memory and presents them on a VGA screen.

Readers can consult B.2 in order to get a deeper understanding of pixels and VGA.

The VGA generator is driven by the VGA clock and is used to generate the RGB pixels on the VGA screen. The 12-bit RGB values are converted into 4-bit output signals: vgaRed, vgaGreen, and vgaBlue. These values are presented on the VGA screen together with the vgaVsync signal, which indicates when the last line is done, and the vgaHsync signal, which indicates when a new line can be processed. The table below shows all the signals of the VGA generator block; a small sketch of the channel split follows after it.

Pin               Description
vgaRed            R-value
vgaGreen          G-value
vgaBlue           B-value
vgaHsync          Signal used to indicate when a row is done
vgaVsync          Signal used to indicate when a frame is done
EN                Enable signal used to read from the BRAM
BRAM-frame-addr   Pixel address
BRAM-frame-pixel  Pixel data

Table 5: Signal description for the VGA generator block
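The 12-bit to three-times-4-bit conversion described above is a plain slicing operation. A sketch using the port names of table 5, assuming red occupies the most significant nibble (the bit ordering is an assumption, not stated in the thesis):

library ieee;
use ieee.std_logic_1164.all;

-- Split one 12-bit RGB (4:4:4) BRAM word into the three 4-bit VGA channels.
entity rgb444_split is
  port (
    BRAM_frame_pixel          : in  std_logic_vector(11 downto 0);
    vgaRed, vgaGreen, vgaBlue : out std_logic_vector(3 downto 0)
  );
end entity;

architecture rtl of rgb444_split is
begin
  vgaRed   <= BRAM_frame_pixel(11 downto 8);
  vgaGreen <= BRAM_frame_pixel(7 downto 4);
  vgaBlue  <= BRAM_frame_pixel(3 downto 0);
end architecture;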

3.3 2nd Camera System

The 2nd Camera System in figure 11 is much more complex and uses the AXI4 specification for communication between the blocks. Readers interested in how the AXI4 protocol works are advised to read B.3. The system still captures 8 bits every clock cycle but sends out 24-bit RGB pixels to the VDMA block. The most important block, the AXI4 Pixel-Handler, captures 16 bits and converts them to 15-bit RGB. The 5:5:5 RGB data is then further converted into 24-bit RGB pixel-data, i.e., the R, G, and B bits are copied and duplicated in order to obtain 24-bit RGB (8:8:8) pixel-data. This is a technique to make the picture clearer by adding more bits to the pixel-data.

Figure 11: The 2nd Camera System

The output values of the AXI4 Pixel-Handler are adjusted according to the AXI4 protocol specification, since they have to be compatible with the VDMA. The VDMA transfers the pixel-data from the AXI4 Pixel-Handler to the DDR3 main memory, and it does so faster than the MicroBlaze soft processor, which has a maximum frequency of 100 MHz.

The VDMA is a register-based peripheral and can be configured to run in different modes. The VDMA is used to store data into the DDR3 memory, read data from the DDR3 memory, and stream data to the AXI4-Stream to Video Out block.

The AXI4-Stream to Video Out block is used to configure timing parameters so that the data can be presented on the HDMI screen. Together with the VTC block, the picture size can be chosen in order to debug and validate the picture.

A Dynamic Clock Generator is used to provide the clock speeds needed to present the data on the HDMI screen. Serializers are used to send the data 10x faster than the pixel clock. The serial clock in the Dynamic Clock Generator is five times faster than the pixel clock and drives the serialization that controls the rate of data sent to the HDMI.

The RGB to DVI Video Encoder takes the 24-bit RGB pixel-data and encodes it into video data along with the pixel clock and synchronization signals.

The RGB data is encoded according to the DVI specification for source devices.

3.3.1 AXI4 Pixel-Handler

The AXI4 Pixel-Handler is used to capture 8-bit sensor data from the OV7670 camera and convert it to 24-bit RGB data that is sent to the VDMA peripheral.

The communication between the AXI4 Pixel-Handler and the VDMA must follow the AXI4 protocol.

Figure 12: Flow diagram showing how the pixels are captured and sent on to the VDMA

Initially, all the parameters are initialized to 0 before data is captured from the OV7670 camera. The VSync signal indicates when a new frame is ready to be processed from the OV7670 camera sensor. The HREF signal indicates when 8 bits are ready to be processed, and a byte counter indicates when two bytes of data have been processed. Whenever 16 bits have been captured, they are converted to 15-bit (5:5:5) RGB data. Additionally, since the VDMA transfers 24 bits, extra bits are added: the pixel-data is expanded to 24-bit RGB. Figure 13 illustrates how the R, G, and B bits are copied and duplicated in order to obtain 24-bit RGB (8:8:8) pixel-data.

Figure 13: 16-bit RGB converted to 24-bit RGB

By replicating the last bits of each R, G, and B value, the captured picture can be made to look better, as shown in figure 13. The R value's last bit is replicated three times, the G value's two times, and the B value's three times. Even though the picture seems clearer with the extra bits, they are still false information. Perhaps padding with zeros instead of copying the last bit would give a better result, since it is a constant value; however, due to the limited time, this could not be tested.
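A VHDL sketch of the expansion. Note one inconsistency in the text: a 5:5:5 intermediate would need three extra copies per channel, while the stated replication counts (three for R, two for G, three for B) match a 5:6:5 source. The sketch follows the replication counts and figure 13's 16-bit-to-24-bit framing, and the names are illustrative:

library ieee;
use ieee.std_logic_1164.all;

-- Expand a 16-bit RGB (5:6:5) pixel to 24-bit RGB (8:8:8) by replicating
-- the least significant bit of each channel: three copies for R and B,
-- two for G, matching the replication counts given in the text.
entity rgb565_to_rgb888 is
  port (
    pixel_565 : in  std_logic_vector(15 downto 0);
    pixel_888 : out std_logic_vector(23 downto 0)
  );
end entity;

architecture rtl of rgb565_to_rgb888 is
  alias r5 : std_logic_vector(4 downto 0) is pixel_565(15 downto 11);
  alias g6 : std_logic_vector(5 downto 0) is pixel_565(10 downto 5);
  alias b5 : std_logic_vector(4 downto 0) is pixel_565(4 downto 0);
begin
  pixel_888 <= r5 & r5(0) & r5(0) & r5(0)  -- red:   5 bits + 3 copies
             & g6 & g6(0) & g6(0)          -- green: 6 bits + 2 copies
             & b5 & b5(0) & b5(0) & b5(0); -- blue:  5 bits + 3 copies
end architecture;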

The conversion to AXI4 data is illustrated in figure 14. The OV7670_AXI4_tuser signal is used to indicate the start of a frame and is controlled by a pulse signal; it is used together with the VDMA in order to indicate the first pixel-data to the VDMA. The OV7670_AXI4_tlast signal is used to indicate the last pixel-data in a line. The End Of Line state indicates that there are still pixels left, i.e., the whole frame is still being processed. The OV7670_AXI4_tvalid signal indicates when a pixel is sent to the VDMA: when the signal is high, the pixel-data is sent to the VDMA; when the signal is low, new pixel-data is processed and prepared for the next clock cycle.

Figure 14: Flow diagram for assigning addresses to pixels using SOF (Start Of Frame) and EOL (End Of Line)
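A sketch of how the three sideband signals are typically driven for a 640-pixel line. Only the process body is shown; the surrounding entity, the x/y position counters, and the pixel_ready handshake are assumed here and are not the thesis's actual code:

-- Drive the AXI4-Stream video sideband signals for a 640x480 frame:
-- tuser marks the first pixel of a frame (SOF), tlast the last pixel of
-- each line (EOL), and tvalid qualifies every pixel beat.
process (pclk)
begin
  if rising_edge(pclk) then
    if pixel_ready = '1' then              -- a converted 24-bit pixel is ready
      m_axis_tdata  <= pixel_888;
      m_axis_tvalid <= '1';
      if x = 0 and y = 0 then              -- Start Of Frame pulse
        m_axis_tuser <= '1';
      else
        m_axis_tuser <= '0';
      end if;
      if x = 639 then                      -- End Of Line
        m_axis_tlast <= '1';
      else
        m_axis_tlast <= '0';
      end if;
    elsif m_axis_tready = '1' then
      m_axis_tvalid <= '0';                -- beat accepted; wait for next pixel
    end if;
  end if;
end process;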

3.3.2 Video Direct Memory Access (VDMA)

Figure 15: VDMA

Currently, the VDMA is configured to use a single frame buffer, since only one picture is processed and saved. It provides high bandwidth, which is good for increasing the throughput, and can be used to transfer data chunks between different peripherals. The VDMA provides up to 32 frame buffers in a 32-bit address space. The data-width can vary from 8 to 1024 bits. Furthermore, the VDMA supports asynchronous clock domains and dynamic clock frequency changes. Genlock is used to synchronize writes and reads of frames. Register sets can be used to program the VDMA so that it operates in different modes, such as interrupt mode or polling mode. Additionally, error codes can be used to see when the VDMA is non-functional or when incorrect configurations are used [26].

Write and read burst sizes are set to 32 beats, which means that four bytes are transferred 32 times with only one address, thus increasing the throughput. The VDMA, along with the MicroBlaze soft processor and the MIG7 controller, are the key blocks controlling the data stream generated by the pixel-handler block. The VDMA peripheral is set to take the 24-bit RGB (8:8:8) pixels generated by the OV7670 capture stream block. Further, the VDMA is programmed to receive 640x480 = 307200 pixels. These pixels are thereafter sent to the MIG7 controller. The VDMA either reads data from the memory, writes to the memory, or directs the data to the debug HDMI IP cores.

As can be observed in table 6, the following registers were used to configure the VDMA through the MicroBlaze:

Register                     Address  Value       Description
S2MM VDMACR (write channel)  0x30     0x0001008B  Bits 0-7 enable the run/stop, circular/park, and genlock settings; bit 16 defines the number of frames to be processed.
MM2S VDMACR (read channel)   0x00     0x0001008B  Bits 0-7 enable the run/stop, circular/park, and genlock settings; bit 16 defines the number of frames to be processed.
S2MM Start Address 1         0xAC     0x8E000000  Indicates the start address of an image buffer.
MM2S Start Address 1         0x5C     0x8E000000  Indicates the start address of an image buffer.
S2MM HSIZE                   0xA4     0x0000000A  Defines the number of bytes per line, which is 10.
S2MM VSIZE                   0xA0     0x0000001E  Indicates the number of vertical lines; has to be 3 times bigger, i.e., 30 in this case.
MM2S HSIZE                   0x54     0x0000000A  Defines the number of bytes per line, which is 10.
MM2S VSIZE                   0x50     0x0000001E  Indicates the number of vertical lines; has to be 3 times bigger, i.e., 30 in this case.
S2MM FRMDLY STRIDE           0xA8     0x0000001E  Stride value; has to be larger than S2MM HSIZE.
MM2S FRMDLY STRIDE           0x58     0x0000001E  Stride value; has to be larger than MM2S HSIZE.

Table 6: Register set used to configure the VDMA IP block

3.3.3 MicroBlaze

As can be seen from figure 11, the final Camera System contains both RTL blocks and Xilinx's own IPs. The most essential block in the design is the MicroBlaze soft processor. MicroBlaze is a 32-bit soft processor with a Reduced Instruction Set Computer (RISC) architecture. The processor is optimized for FPGA implementations. It is generally known to be low-power and high-performance: it offers a speed of up to 100 MHz, a 32-bit instruction set, general-purpose registers, and a 32-bit address bus that can be extended to 64 bits.

The processor is used to configure each AXI4-compliant IP in the block design. Additionally, a UART is provided to debug the IP cores through the MicroBlaze [27]. The main role of the MicroBlaze is to configure the UART in order to debug each peripheral, configure the VDMA to control the flow of data, configure the VTC to generate the correct synchronization signals, and configure the DCG to generate the correct clocks for the HDMI screen. Figure 16 shows how the MicroBlaze configures each peripheral.


Figure 16: MicroBlaze configuration

3.3.4 Video Timing Controller (VTC)

The VTC converts the 640x480 picture into a 1280x720 image, which then generates pixels for the HDMI (1280x720) screen used for debugging. Timing parameters are generated by the VTC, which is Xilinx's own IP core. Furthermore, the VTC contains a video timing detector, a video timing generator, and an interrupt controller. The VTC can both detect and generate timing parameters such as vertical and horizontal pulses, polarity, blanking timing, and active video pixels [28]. The VTC is AXI4-Lite compliant and can detect up to 8192 clocks by 8192 lines [28]. The input format generated by other IP cores can be modified to various video formats. It has six inputs: vertical blank, vertical synchronization, horizontal blank, horizontal synchronization, active video, and polarity [28].

Figure 17: Video Timing Controller
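A minimal sketch of how the VTC generator could be set up for 720p from the MicroBlaze, using Xilinx's bare-metal v_tc driver (the device ID constant is a placeholder for the one generated in xparameters.h):

    #include "xvtc.h"
    #include "xstatus.h"

    int configure_vtc(void)
    {
        XVtc vtc;
        XVtc_Config *cfg;
        XVtc_Timing timing;

        /* Look up and initialize the VTC instance. */
        cfg = XVtc_LookupConfig(XPAR_VTC_0_DEVICE_ID);
        if (cfg == NULL)
            return -1;
        if (XVtc_CfgInitialize(&vtc, cfg, cfg->BaseAddress) != XST_SUCCESS)
            return -1;

        /* Derive 1280x720 timing parameters and load the generator. */
        XVtc_ConvVideoMode2Timing(&vtc, XVTC_VMODE_720P, &timing);
        XVtc_SetGeneratorTiming(&vtc, &timing);

        /* Latch the new register values and start generating syncs. */
        XVtc_RegUpdateEnable(&vtc);
        XVtc_EnableGenerator(&vtc);
        return 0;
    }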

(30)

3.3.5 Dynamic Clock Generator (DCG)

The Dynamic Clock Generator IP generates important clocks such as the pixel clock and the serial clock used by the RGB to DVI Video Encoder IP [29]. The IP takes a reference clock as input and generates the two output clocks. The serial clock is five times faster than the pixel clock and drives the serialization of the data sent to the HDMI screen, thus controlling the output data rate.
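As a worked example of this 5x relation (the 74.25 MHz value is the standard 1280x720@60 pixel clock, not a figure taken from the design files):

    #include <stdio.h>

    int main(void)
    {
        /* 74.25 MHz is the standard 1280x720@60 pixel clock. */
        const double pixel_clk_mhz  = 74.25;
        /* TMDS sends 10 bits per pixel per lane using DDR, hence 5x. */
        const double serial_clk_mhz = 5.0 * pixel_clk_mhz;

        printf("pixel clock : %.2f MHz\n", pixel_clk_mhz);   /* 74.25  */
        printf("serial clock: %.2f MHz\n", serial_clk_mhz);  /* 371.25 */
        return 0;
    }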

3.3.6 AXI4-Stream to Video Out

The AXI4-Stream to Video Out IP core is designed to work together with the VTC and thus acts as a bridge between AXI4-compliant video processing cores. Data is directed into the AXI4-Stream to Video Out IP and automatically synchronized with the video timing signals. Asynchronous clock boundary crossing between the video clock domain and the AXI4-Stream clock domain is handled within the core. The data width can be selected from 8 to 256 bits [30].

Figure 18: AXI4-Stream to Video Out

3.3.7 RGB to DVI Video Encoder

The RGB to DVI Video Encoder is used to encode the RGB values so that they can be presented on an HDMI screen. It takes 24-bit RGB pixels, a pixel clock, and synchronization signals as input. The output is a TMDS interface carrying the encoded video to the HDMI screen. The supported resolutions range from 800x600 to 1920x1080 [29]. The IP core uses 8 bits per color channel, a horizontal synchronization signal, a vertical synchronization signal, and a video data enable signal. These parameters have to be synchronized to the pixel clock.


Figure 19: RGB to DVI encoder

3.3.8 Memory Interface Generator 7 (MIG7)

The MIG7 controller is used to send data to the on-board DDR3 memory and is Xilinx's own AXI4-compliant IP. It receives data from the VDMA and forwards it to the DDR3 memory. The VDMA has one read and one write channel, which are used to read from and write to the DDR3 memory. The pixel data stored in the DDR3 memory is then presented on the HDMI screen. Readers interested in how memory transfers work can read appendix B.1 to get a better overview of how DDR memories work in general.
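Because the MIG7 exposes the DDR3 as a plain AXI4 memory-mapped slave, the MicroBlaze can also sanity-check the memory with direct reads and writes. A minimal sketch follows; the base address 0x80000000 is an assumption and should be checked against the actual Vivado address map.

    #include "xil_io.h"
    #include "xil_cache.h"

    /* Hypothetical DDR3 base address; verify in the Vivado address map. */
    #define DDR3_BASE 0x80000000U

    int ddr3_smoke_test(void)
    {
        /* Write a known pattern through the MIG7... */
        Xil_Out32(DDR3_BASE, 0x00FF0000U);   /* one red RGB(8:8:8) pixel */
        Xil_DCacheFlush();                   /* push it out to the DDR3  */

        /* ...and read it back, confirming MicroBlaze -> MIG7 -> DDR3. */
        Xil_DCacheInvalidateRange(DDR3_BASE, 4);
        return (Xil_In32(DDR3_BASE) == 0x00FF0000U) ? 0 : -1;
    }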


4 Experiments

4.1 Test And Validation

Figure 20: The simulation design

A BRAM was generated using the Block Memory Generator and connected to the output of the VDMA in order to confirm and validate the Camera System through simulation. The MIG7 controller, needed to validate the VDMA against a DRAM, was not simulated due to the limited time and technical problems encountered during the thesis. Since memory IPs such as the MIG7 DRAM controller and the BRAM use the same protocol interface (AXI4), validation against one memory should yield the same result for any memory having an AXI4 slave interface.

Important blocks such as the VDMA have to work correctly, i.e., data transfers shall map to the correct address space in memory. Therefore, the VDMA should be tested rigorously before being included in the design. The VDMA was programmed according to the use case in [26] and was adjusted slightly to fit the simulation. The VTPG core lets the user program features such as video system color, quality, edge, and motion performance [31]. It also provides color bars, which are helpful when validating and debugging an architecture in hardware or simulation.

For instance, a 10x10 picture can be sent to test the Camera System. Formats such as RGB, YUV 444, YUV 422, and YUV 420 are provided, and the VTPG supports 8, 10, 12, and 16 bits per color component. The video format and frame size are configurable. The VTPG is meant to be used together with the VTC in order to generate the correct timing parameters [31]; a configuration sketch follows below.
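A minimal sketch of how the VTPG could be set up for a 10x10 test frame over AXI4-Lite is shown below. The base address and register offsets are placeholders for illustration only; the documented register map is found in [31].

    #include "xil_io.h"

    /* Hypothetical AXI4-Lite offsets, for illustration only; the real
     * offsets are listed in the VTPG product guide [31]. */
    #define VTPG_BASE    0x44A00000U  /* placeholder base address */
    #define VTPG_CTRL    0x00U        /* control: bit 0 = start   */
    #define VTPG_HEIGHT  0x10U        /* active frame height      */
    #define VTPG_WIDTH   0x18U        /* active frame width       */
    #define VTPG_PATTERN 0x20U        /* test pattern select      */

    static void vtpg_start_10x10(void)
    {
        Xil_Out32(VTPG_BASE + VTPG_HEIGHT,  10);  /* 10 lines           */
        Xil_Out32(VTPG_BASE + VTPG_WIDTH,   10);  /* 10 pixels per line */
        Xil_Out32(VTPG_BASE + VTPG_PATTERN,  1);  /* e.g. solid color   */
        Xil_Out32(VTPG_BASE + VTPG_CTRL,  0x81);  /* start, auto-restart */
    }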

The validation design in figure 20 uses a VTPG to generate RGB (8:8:8) pixel data. For debugging and ease of validation, the colour red (FF0000) was used since it is easy to trace. A MicroBlaze processor core was used to configure the VTPG, the VDMA, the UART, the BRAM, and the GPIOs. The idea is to first test a smaller picture (a 10x10 image), and


thereafter test a bigger picture (a 640x480 image). The pixel data is thereafter directed towards the BRAM memory controller. Further, no addressing is

Figure 21: 10x10 picture using the VDMA

needed, since the VTPG sends streaming data (using the AXI4-Stream protocol) to the VDMA. The VDMA assigns addresses to the RGB (8:8:8) values and sends the data directly to the BRAM controller. The VDMA maps every pixel into a 32-bit value, thus packing the pixel bytes into words; a sketch of this packing follows below.
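As an illustration of this packing (a sketch; the exact byte order depends on the VDMA stream configuration), the red test pixel FF0000 becomes the 32-bit word 0x00FF0000:

    #include <stdint.h>
    #include <stdio.h>

    /* Pack an RGB(8:8:8) pixel into one 32-bit word, upper byte unused. */
    static uint32_t pack_rgb888(uint8_t r, uint8_t g, uint8_t b)
    {
        return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
    }

    int main(void)
    {
        /* The red test pixel used in the simulation: FF0000. */
        printf("0x%08X\n", pack_rgb888(0xFF, 0x00, 0x00));  /* 0x00FF0000 */
        return 0;
    }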

Figure 22: VTPG generating 10x10 picture

Also, the order of the values can still be traced, since the hex value FF indicates the start of a pixel. Figure 21 gives a clearer example of how a 10x10 image is sent to the BRAM. As can be observed in figure 22, TUSER indicates the start and end of a frame, the TLAST signal indicates


the end of a row, and TVALID indicates each valid row that contains 10 pixels.

Figure 23 illustrates how the picture is saved and then read out again from the

Figure 23: The BRAM receives the 10x10 pixels

BRAM, thus validating the whole data-path, i.e., from the VTPG to the BRAM.

Figure 24: Zoomed example showing 10 pixels

By scrutinizing the data stream in figure 24, each pixel value can be validated until WLAST goes high. Further, as can be seen from the addresses, accesses are byte-addressable. In total, 10 pixels can be found in the data stream on the WDATA line. The last pixel has a WSTRB value of three (binary 0011), indicating that the valid byte lanes start from the LSB. This means that the hex value FF0000 is valid and saved correctly into the memory.


Figure 25: The whole 640x480 picture

The bigger picture was validated as well, as can be seen in figure 25, where 640x480 = 307200 pixels were sent to the BRAM and validated. Once the simulation design turned out to be valid and working, the 2nd Camera System was validated using hardware as well. As can be seen from figure 26, the data pins for the OV7670 camera are not connected, thus validating the path from the MicroBlaze to the HDMI screen. In other words, the MicroBlaze is used to generate the color pattern, which is written directly into the VDMA and thereafter into the streaming IPs, thus confirming the data path before using the camera sensor.

To validate the AXI4 pixel-handler block in figure 11, which generates the RGB (8:8:8) pixel data, white (FFFFFF) was used in order to confirm the design in hardware.

Figure 26: Random pixel colours from the MicroBlaze

Figure 27: White pixel colours from the MicroBlaze

As can be seen from figure 27, the values are passed through the VDMA into the streaming IPs, showing the correct HDMI picture; note also that the D0-D7 pins are not connected in this example. However, the path where the pixels are stored by the VDMA and saved into the memory is still not validated. The MicroBlaze is used to send a white color bar to the HDMI screen, as presented in figure 27. This is done by using a library that has the ability


to send various color bars.
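A minimal sketch of how such a solid color-bar write could look is given below. The frame-buffer address matches the start address in table 6, while the fill routine itself is a hypothetical stand-in for the library call.

    #include "xil_io.h"
    #include "xil_cache.h"

    #define FRAME_BUF 0x8E000000U  /* start address from table 6 */
    #define FRAME_PIX (640 * 480)  /* pixels per frame           */

    /* Fill the frame buffer with one solid color, one 32-bit word per pixel. */
    static void fill_frame(u32 rgb)
    {
        for (u32 i = 0; i < FRAME_PIX; i++)
            Xil_Out32(FRAME_BUF + 4 * i, rgb);
        Xil_DCacheFlush();  /* push the pixels out to the DDR3 via the MIG7 */
    }

Calling fill_frame(0x00FFFFFF) would reproduce the white test shown in figure 27.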

As can be observed in figure 28, the MM2S channel in the VDMA IP retrieves the pixel values FFFFFF from the DDR3 SDRAM. Further, ARVALID indicates when addresses are ready to be processed. The RVALID signal along with RREADY indicates when a successful read has occurred on the bus. Finally, the last data beat being read is confirmed through RLAST.

Figure 28: MM2S read channel.

The ILA (Integrated Logic Analyzer) was used to validate the design loaded into the hardware. In this way, sequences of write and read transactions can be monitored, which is useful when validating the design.

Also, as can be seen in figure 29, the S2MM channel (in the VDMA) successfully writes FFFFFF into the DDR3 SDRAM. Scrutinizing the AXI4 transactions in figure 30 shows that the value FFFFFFFF (white) is sent, but this alone is not sufficient and does not validate the whole flow of pixel data from the camera to the DDR3 SDRAM memory. Debugging the DDR3 signals through the ILA is not possible, since they are external and Vivado did not allow them to be selected for the ILA. Therefore, the AXI4 interfaces were the only signals that could be validated, since they are internal and explicitly exposed to developers. MicroBlaze-specific operations are visible as well, since important configurations are read from the memory (heap, stack, and I/D cache information).

Scrutinizing figure 31, the white pixel data (a continuous run of FF bytes) can clearly be validated. However, a better way to validate the design would be to use bit patterns such as all zeros, all ones, and the alternating patterns 1010...10 and 0101...01, thus exercising every bit lane. This could not be done due to the limited time span; a sketch of such a test is given below.
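The following sketch shows how such a pattern test might be run from the MicroBlaze against the frame-buffer region; the base address comes from table 6, while the test length is a placeholder.

    #include "xil_io.h"
    #include "xil_cache.h"

    #define TEST_BASE 0x8E000000U  /* frame-buffer region from table 6 */
    #define TEST_LEN  256U         /* number of 32-bit words to test   */

    /* Walk the classic patterns through memory and verify each readback. */
    static int pattern_test(void)
    {
        static const u32 patterns[] = {
            0x00000000U, 0xFFFFFFFFU, 0xAAAAAAAAU, 0x55555555U
        };
        for (unsigned p = 0; p < 4; p++) {
            for (u32 i = 0; i < TEST_LEN; i++)
                Xil_Out32(TEST_BASE + 4 * i, patterns[p]);
            Xil_DCacheFlush();
            Xil_DCacheInvalidateRange(TEST_BASE, 4 * TEST_LEN);
            for (u32 i = 0; i < TEST_LEN; i++)
                if (Xil_In32(TEST_BASE + 4 * i) != patterns[p])
                    return -1;  /* stuck or coupled bit detected */
        }
        return 0;  /* all four patterns verified */
    }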

Also, the ILA debugger validates that the data is written correctly into the MIG7 controller. As can be seen in figure 32, 128-bit words are used to write into the


Figure 29: S2MM write channel

Figure 30: Zoomed example illustrating the S2MM channel

DDR3 SDRAM controller, where the pixel data is sent as bursts. A part of the whole transfer being acknowledged can also be seen in figure 32.


Figure 31: AXI memory interconnect reading from the MIG7

Figure 32: AXI memory interconnect writing to the MIG7

4.2 Results

The custom VHDL design mainly utilizes BRAM and MMCM resources, which are the most used compared to the other resource types. As can be observed from figure 33, most of the resources are not utilized in the custom RTL-based design, which is good since the design is small and leaves room for more blocks that can enhance the throughput of the Camera System.

The custom VHDL design does not use any AXI4 blocks, which can be an advantage if a custom communication protocol is to be designed. However, designing such a protocol would be time consuming. To save time, Xilinx's AXI4 blocks can be used together with the custom VHDL blocks, depending on the time budget. Xilinx provides


Figure 33: Utilization of resources for the custom VHDL architecture

various performance boosting IP cores that can be used to ease development and validation.

Since the MIG7 DRAM controller was too complicated to simulate, hardware was used to validate the 2nd Camera System. Since the AXI4 protocol is supported by almost all of Xilinx's IP cores, the earlier simulations still demonstrate that the pixel stream from the camera sensor is transferred through the VDMA and reaches the memory.

The 2nd Camera System was tested partially in order to confirm the following peripherals:

• AXI4-Stream to Video Out.

• RGB to DVI Video Encoder.

• VTC.

• Dynamic Clock Generator.

As can be observed in figure 34, I/O along with PLL, MMCM, and LUT resources are the most used in this design, which is expected for a design focused on debugging and validation.

As can be observed in figure 35, the total on-chip power is low, i.e., approximately 0.239 W. The MMCM clocks are the resources that consume the most power, since most of the switching activity comes from the clocks that drive the critical data paths.

The power consumption of the final Camera System is shown in figure 36.

It had a total on-chip power consumption of 1.172 W (including the MicroBlaze). Most of the power is consumed by resources such as clocks, Phasers, MMCMs, and I/Os. The I/O consumes 0.300 W, which is 29 percent of the total power consumption. The power consumption of the debugging part of the design is not critical, since it will be removed from the final design. It


Figure 34: Resource utilization for the 2nd Camera System

Figure 35: Power consumption for the custom VHDL architecture

contains the AXI4-Stream to Video Out, the RGB to DVI Video Encoder, the VTC, and the Dynamic Clock Generator. Removing these modules would save power, since the I/O resources alone consumed 0.300 W, thus bringing the power consumption below the 1 W budget.

Clock gating was not implemented in the 2nd Camera System, since the gating logic would have had to be AXI4 compliant, which increases the complexity and is inefficient in general. The custom VHDL design has a working clock-gating module controlling the clocks. However, due to limited time, the gated-clock scheme was never ported to the final Camera System.

The custom VHDL design could easily be validated through hardware since


Figure 36: Power consumption for the 2nd camera architecture

AXI4 IP’s were not used. This resulted in lesser IP cores in the Camera System.

However, the validation of the 2nd Camera System became harder, since modules such as the MIG7 could not be simulated at all. On the other hand, the AXI4-compliant IPs were more convenient to use when designing the 2nd Camera System, since the handshaking signals could be taken into account, giving the possibility to validate every read and write transaction.

The MIG7 controller was not simulated because it was too complex, so a BRAM controller was used instead to validate the test Camera System. This worked well, since internal key blocks such as the VDMA were validated.

Even though the BRAM has a limited amount of memory, the address range could be increased in the simulation.

Figure 37: A blurry picture presented on a VGA screen

The 2nd Camera System generates 640x480 x 3 = 921.6 kB of pixel data per frame, which is sent into the VDMA. The picture quality was bad since the register

References
