Design and verification of a USB 3.0 readout system for Timepix3 hybrid pixel detectors


Master’s thesis in Electronics Engineering


Till Dreier


MID SWEDEN UNIVERSITY Department of Electronics Design (EKS)

Examiner: Göran Thungström, goran.thungstrom@miun.se
Supervisor: David Krapohl, david.krapohl@miun.se
Author: Till Dreier, tidr1400@student.miun.se

Degree programme: Master-by-Research (M.Sc.) in Electronics, 120 credits
Main field of study: Electronics Engineering

Semester, year: Spring, 2018


Abstract

This thesis describes the design, implementation, and verification of a USB 3.0 readout system for Timepix3 detectors. Timepix3 is a hybrid pixel detector consisting of a 256x256 pixel matrix with a 55 µm pitch and a timing resolution of 1.56 ns. It measures energy and time simultaneously and uses an event-driven data stream with a maximum data rate of 5.12 Gb/s, or up to 85 million hits per second. Our aim is to implement a readout system that allows processing algorithms to be added to the firmware, reducing the amount of data and the post-processing time. USB 3.0 was selected as the interface because it provides a sufficient data rate and is present on all modern computers. Furthermore, the selected Opal Kelly XEM6310 development board provides a framework that handles the communication between the FPGA and the host computer and comprises FPGA components and an API.

In collaboration with the University of Glasgow, we implemented a hardware adapter board that connects the development board to the detector chip board and converts the detector’s output signals from SLVS to LVDS. Moreover, we implemented FPGA firmware consisting of a detector interface, a USB interface, and a core including a processing interface. A multi-platform desktop library, which is used to configure the readout system and handles high-speed data streaming, was implemented in C/C++ using Qt. Data analysis and verification are conducted using custom-built Python scripts.

Simulations of the firmware showed the designed behaviour. The firmware and library were verified by configuring the detector, reading back the configuration, and taking measurements with an americium source. An equalisation as well as a global and a per-pixel energy calibration were carried out successfully. Moreover, the system has been used to create and correct an X-ray image. Furthermore, the USB 3.0 data streaming performance was evaluated, showing that the system can sustain a stream of around 380 MB/s.

The proposed readout system has been implemented and was verified in simulation and in experiments with X-ray radiation. USB 3.0 data streaming performed better than anticipated, reaching higher speeds than stated by Opal Kelly. Furthermore, the firmware and the library function as intended. The hardware adapter requires some changes to accommodate higher data speeds, and the data channels have to be moved to different pins to allow synchronisation to an external clock.


This thesis describes the development and verification of a USB 3.0-based readout system for Timepix3 detectors. Timepix3 is a hybrid pixel detector consisting of a 256x256 pixel matrix with a pixel pitch of 55 µm and a timing resolution of 1.56 ns. The detector measures energy and time information simultaneously and uses an event-driven data stream with a maximum data rate of 5.12 Gb/s, corresponding to approximately 85 million hits per second. Our aim is for this system to allow data processing in the FPGA, thereby reducing the amount of data and the post-processing time. We chose USB 3.0 because it provides a sufficient data rate and is available on all modern computers. In addition, an XEM6310 board from Opal Kelly was chosen as the development platform. Opal Kelly offers a framework that handles the communication between the FPGA and the computer. The framework includes FPGA components and an API.

The adapter board, developed in collaboration with the University of Glasgow, connects our development board to the detector board and converts the detector’s output signals from SLVS to LVDS. In addition, we implemented FPGA firmware consisting of a detector interface, a USB interface, and a core with a data processing interface. A multi-platform library was developed and implemented in C/C++ using Qt. The library is used to configure the readout system, configure the detector, and handle data streaming to and from the detector. Data analysis and verification were carried out using custom-built Python tools.

Simulations of the firmware showed the expected behaviour. The firmware and the library were verified by configuring the detector, reading back the configuration, and taking measurements with an americium source. An equalisation, a global energy calibration, and a per-pixel calibration were also performed. In addition, the system has been used to take X-ray images. Analysis of the USB 3.0 data streaming showed that the library and the readout system can sustain a stream of up to 380 MB/s from the FPGA to the computer.

The described readout system was implemented and verified in simulation and experimentally using radiation sources. USB 3.0 data streaming performed better than expected and reached higher speeds than stated by Opal Kelly. The firmware and the library work as expected. The adapter board works but requires some changes to allow higher data rates. In addition, the data channels must be moved to different pins in order to synchronise them to an external clock.


Acknowledgements

I would like to thank my supervisor, David Krapohl, for his guidance and support during my studies, Dima Maneuski for his help during the development and the discussions we had, and Najeem Lawal for his help concerning FPGA development, especially at the beginning of this project.

Of particular importance for this thesis has been the work of Jan Oliver Schöwerling. We spent a significant amount of time on optimising the developed system. Without his help, this work could not have been achieved at this level of quality. Moreover, I would also like to thank Xiaotian Li for his help soldering our prototypes after I messed it up. A special thanks goes to Oliver Keller from CERN, with whom I could always discuss issues and ideas during the development and analysis process, and to Giulio Crevatin from Diamond Light Source, who helped me out a few times when I got stuck.

I would also like to thank Xavier Llopart and Tuomas Poikela, whom I annoyed a few times about issues with the Timepix3 chip. Furthermore, I want to thank all the others whom I met at the Medipix meeting in Glasgow and at the iWoRiD 2018 conference for the really interesting and informative discussions, especially Stuart George from NASA, Val O’Shea from the University of Glasgow, and Ian Horsewell from Diamond Light Source. Lastly, I would like to thank all the others in the radiation sensor systems research group at Mid Sweden University, the rest of the Medipix collaboration for their support and interest, and Florian Gebben for proofreading this thesis.


1 Introduction

1.1 Background and problem motivation

Timepix3 detectors are highly versatile radiation pixel detectors for a variety of applications from X-ray imaging to particle tracking [1] and can be used in industrial, medical, and research environments [1, 2]. Using the detector requires a readout system that connects it to a host computer or runs it as an embedded system. In order to utilise the detector’s 5.12 Gbps output data stream [3], an FPGA-based system needs to be implemented. Furthermore, the readout system also needs to provide a high-speed interface to the host computer, such as USB 3.0, 10 Gb Ethernet, or Thunderbolt 3. Additionally, other on-board functions, such as data pre-processing and decoding, might be desired as well. The main challenge of this thesis is the design and implementation of a multi-purpose readout system that is easy to use, does not require special hardware to connect to, and is capable of running pre-processing within the detector’s data stream.

The selection of possible system-to-host interfaces is limited by several factors, most importantly the maximum data rate. Another issue is the availability of the interface in computers: USB 3.0 is present on all modern computers and laptops, while 10 Gb Ethernet is still uncommon, especially in laptops, and Thunderbolt 3 is present only in new high-end computers. Furthermore, the availability of prototyping hardware and software is limited. There is no prototyping hardware that combines an FPGA with a Thunderbolt 3 output and also provides sufficient IO pins, nor is there FPGA or desktop software to realise data streaming. In the case of 10 Gb Ethernet, open FPGA cores are available. However, prototyping hardware is hard to find. Lastly, USB 3.0 prototyping hardware with the necessary FPGA modules and software support, sufficient IO pins, and an acceptable form factor can be found. However, the complexity of the USB 3.0 protocol is much higher than that of 10 Gb Ethernet, while Thunderbolt 3 has one of the most complex protocols available. USB 3.0, the only one of the three interfaces that is available almost everywhere, is the best choice at this point.

1.2 High-level problem statement

The purpose of this project is to design, implement, and use a readout system for Timepix3 hybrid pixel detectors. It should be usable without specialised equipment like a dedicated computer, provide reasonable data rates, and allow for pre-processing. Weighing the available interfaces against the detector’s requirements and limitations results in the selection of an FPGA board with an integrated USB 3.0 interface, implemented via an ARM co-processor, with provided FPGA and desktop software to handle USB 3.0 data streaming.

This project will implement an FPGA-based readout system, consisting of a detector interface with 8 data channels, 1 input channel, 2 input clocks, 6 control signals, 1 analogue output, and 1 analogue input, as required by the detector. A USB interface will be implemented using the FPGA modules provided with the prototyping board. Furthermore, a firmware core consisting of additional control and management mechanisms, data pre-processing, and a processing interface will be implemented. On the host side, a multi-platform C/C++ library will be implemented using Qt, which is supposed to configure and control the readout system as well as receive and save a stream of data from the system. In order to connect the prototyping board to the detector, an adapter card is designed in collaboration with the University of Glasgow. Lastly, in parallel to the firmware and software development, analysis tools are implemented in Python to verify detector outputs and to analyse and visualise data.

1.3 Scope

This project includes the design, implementation, and verification of the FPGA firmware, which consists of a detector interface, a USB interface, a firmware core with flow control and other control functions, pre-processing, and a processing interface. It also includes the design and implementation of a multi-platform C/C++ library written with Qt that handles the FPGA firmware, an adapter board to connect the detector to the readout system, and analysis tools written in Python.

The firmware is designed to be adaptable to other Medipix family detectors and supports a single detector at a time. Processing algorithms can be implemented in the FPGA firmware.

Furthermore, the system-to-host interface is built to regulate itself to allow for maximum bandwidth usage.

The adapter board is built to allow configuring the detector and reading out data from currently up to 7 channels. Furthermore, the readout clock can either be provided to the detector or be generated by the detector and provided back to the firmware. The detector’s data output channels have to be converted from Scalable Low Voltage Signalling (SLVS) to Low Voltage Differential Signalling (LVDS) to be usable by the FPGA.

Interactions between the readout system firmware and the host computer are controlled via the implemented library. The library is used to create commands, which are sent to the detector, and to prepare the commands for the readout system, i.e. filling up the packet to fit the requirements of the USB interface and converting the endianness of the data. It also configures the detector’s periphery blocks, pixel matrix, and other settings. Receiving and saving data as well as controlling the data streaming are also handled by the library.
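The packet preparation mentioned above can be illustrated with a short Python sketch. It is not the library's C/C++ implementation: the function name `prepare_command`, the 16-byte transfer block size, and the choice to append zero bytes at the end of the packet are assumptions made purely for illustration.

```python
def prepare_command(command: int, length_bits: int, block_bytes: int = 16) -> bytes:
    """Serialise a command as big-endian bytes and pad it for a USB transfer.

    block_bytes is an assumed transfer granularity; the real value depends
    on the USB interface of the readout system.
    """
    if length_bits <= 0 or length_bits % 8 != 0:
        raise ValueError("command length must be a positive multiple of 8 bits")
    if command < 0 or command >= 1 << length_bits:
        raise ValueError("command does not fit into the given length")

    # Most significant byte first (big endian).
    raw = command.to_bytes(length_bits // 8, byteorder="big")

    # Fill up with zero bytes to the next multiple of the block size.
    padding = (-len(raw)) % block_bytes
    return raw + bytes(padding)
```

Under these assumptions, a 48 bit command occupies the first 6 bytes of a single 16-byte block and the remaining 10 bytes are zero.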

The functionality of the hardware, firmware, and library is verified by configuring the detector, verifying the configuration, and taking measurements. Changing the detector’s configuration and reading it back already requires almost all modules to work together properly. Receiving data puts considerably more load on the system. Following the initial tests, the readout system is used to create an equalisation and a calibration for a detector. In addition, X-ray imaging is tested as well.

1.4 Detailed problem statement

The goal of this thesis is to design, implement, and verify a USB 3.0-based readout system for Timepix3 detectors. In order to achieve this, the task was split up into 4 parts:

• design of an adapter card

• design and implementation of the FPGA firmware

• design and implementation of a multi-platform C/C++ library

• verification of the whole system by measurements and analysing data using Python

Design and implementation of the firmware was split into 2 main parts: implementation of a detector interface and implementation of a USB interface. Consequently, a flow control has to be added between the detector and the readout system. Pre-processing and decoding of detector data have to be implemented as well. As a first step, the firmware was simulated to verify each individual module and, later, the full system, split into a receiver chain, which simulates the path from data input from the detector to data output via USB, and a transmitter chain, which simulates the path from data input via USB to transmission of this data to the detector. Furthermore, the firmware has to synchronise its data input channels with the detector, and deserialise and decode incoming data. In summary, the following parts have to be implemented for the firmware:

• a transmitter serialising data to the detector

• a receiver module consisting of up to 8 channels with synchronisation, decoding, and deserialisation

• a USB receiver module obtaining commands from a host and forwarding those to the detector transmitter

• a USB transmitter handling high speed data streaming between readout system and a host

• a simulation of the readout system

• additional control modules like a flow control and modules to handle the detector’s control pins

• pre-processing modules decoding data packets from the detector

The multi-platform library acts as the counterpart to the firmware. It implements a class to create and prepare commands that the firmware has to transmit to the detector. This class has to create the payload, add the packet header, fill up the packet for the USB transfer, and convert the data packet to big endian prior to transmitting it to the firmware. In addition, a class has to be implemented to manage the readout system and to transmit, receive, and save data. This class is also responsible for managing the data streaming. The detector configuration has to be handled as well. For this task, another class is implemented. In summary, the following tasks have to be solved:

• implement a class, which can generate and prepare commands for the detector

• implement a class to configure the detector

• implement a class to manage the readout system and to transmit, receive, and save data

To verify that the readout system is functioning properly, it is used to configure the detector, read back the settings, and take a few simple measurements to verify that data passes through the firmware and the USB interface. Furthermore, the readout system is used to equalise and calibrate a detector. In summary, the verification is done by the following:

• configure the detector and verify the configuration

• verify transmitter, receiver, flow control, and pre-processing by simple measurements

• create an equalisation for a detector

• create an energy calibration for a detector and for every pixel

• create an X-ray image and apply flat field correction

1.5 Outline

Chapter 2 introduces the necessary theory for this thesis. It covers Timepix3 detectors, the detector’s custom serial protocol, different effects that occur in the detector, correction methods, 8b/10b data decoding, and the differential signalling standards used by the FPGA and the detector. Furthermore, the related work section compares our approach to other readout systems with respect to their hardware and software selections, purpose, and firmware design, and takes a look at on-board processing of pixel detector data.

In Chapter 3, the selected prototyping board is introduced, the design approaches are discussed, and the evaluation data and measurement setups are shown.

Chapter 4 covers the design of the adapter board and the implementation of the FPGA firmware, library, and analysis tools. The firmware design is divided into the detector interface, the USB interface, and the firmware core, consisting of a flow control module, pre-processing modules, a PLL, and additional control modules. Furthermore, the library design is divided into handling the readout system firmware, creating commands for the detector, and configuring the detector.

The simulation of the readout system is covered in Chapter 5. This chapter simulates the transmitter (USB data input to data output to the detector) and receiver (data input from the detector to data output via USB) chains.

In Chapter 6, the readout system’s functionality (firmware, library, and hardware) is verified by measurements. Moreover, the detector is equalised and calibrated, and imaging with flat field correction is tested.

The results shown in Chapters 5 and 6 are analysed and discussed in Chapter 7. This chapter takes a deeper look into the results obtained by the experiments, interprets them and thereby verifies the functionality of the readout system. Also, issues and limitations are discussed.

Finally, Chapter 8 concludes the project. Social and ethical aspects are discussed and future possibilities and improvements are outlined. Consequently, the goals outlined in the detailed problem statement are verified.

1.6 Contributions

The following people contributed to this project: Jan Oliver Schöwerling and Dima Maneuski. Jan Oliver helped to restructure and optimise modules in the FPGA firmware and also implemented an optimised USB data streaming algorithm. The optimised streaming algorithm consists of an additional firmware module, which is controlled from the library and lets the library pick the streaming parameters instead of having the user select them. Dima created the layout of the adapter board using Altium Designer. The layout is based on a previous analysis, in which the ports of the detector chip board were mapped to the IO connectors of the prototyping board.

2 Theory and Related Work

2.1 Timepix3

2.1.1 Timepix3 detector

Timepix3 [3] is a hybrid pixel detector and part of the Medipix family [4–6]. These detectors are designed at CERN for the Medipix Collaboration [4], an international cooperation of universities and research institutes [1]. The original purpose of Medipix family detectors was to act like a camera for radiation that is fast enough to record particles hitting the detector, thereby permitting high-resolution, high-contrast, and almost noise-free X-ray imaging [4]. Hybrid pixel detectors have a broad field of possible applications besides high energy physics [4] such as particle tracking [7, 8], medical imaging [2, 9–13], radiation therapy [14], spectroscopy [15, 16], dosimetry [15, 17, 18], astronomy [19, 20], environmental monitoring [21, 22], material science [1], monitoring at large facilities (Large Hadron Collider (LHC), synchrotrons, etc.) [23–25], and more. Timepix3 belongs to the third generation of Medipix detectors and is currently the most advanced and versatile hybrid pixel detector available.

(a) Pixel matrix bump bonded to the ASIC (b) Side view of a single pixel

Figure 2.1: Build up of the pixel matrix and individual pixels [1]

Figure 2.1a shows the detector consisting of a semiconductor sensor, which is bump-bonded to a readout Application-Specific Integrated Circuit (ASIC) [1, 26]. Individual pixels (Figure 2.1b) consist of a p-n junction to which either electrons or holes are pulled, due to an applied electric field in the semiconductor. Every pixel consists of over 1000 radiation-hard transistors [3, 26]. Depending on the desired energy resolution, polarity, and general detector sensitivity, different semiconductor materials can be bonded to the ASIC, such as Silicon (Si), Cadmium Telluride (CdTe), Cadmium Zinc Telluride (CZT), or Gallium Arsenide (GaAs) [11, 27–29].

A distinctive feature of Timepix3 is that it measures energy and time simultaneously via Time-over-Threshold (ToT) and Time-of-Arrival (ToA). It can also measure a more precise time stamp in ToA-only mode, or count photons combined with an integrated energy (iToT), which is typically used for imaging purposes [3, 26, 27, 30]. The detector sends out data either sequentially with suppressed zeros or as an unordered, event-driven stream of sparse data, i.e. only pixels that sensed a charge transmit data, which reduces the required bandwidth by removing the need to transfer full frames [3, 31]. This allows the detector to work with practically no dead-time. When measuring in sequential mode, the detector can provide from around 1300 [31] up to 2000 [29] frames per second. The pixel matrix provides a fast Time-of-Arrival (fToA) timing resolution of 1.56 ns [3] with an individual pixel dead-time of around 475 ns [31], while the ToA time stamp and the ToT counter run at a frequency of 40 MHz, which limits the energy and time resolution. With 8 output channels running at a maximum readout clock of 320 MHz with Double Data Rate (DDR), the 256 by 256 pixel matrix has a theoretical output bandwidth of 5.12 Gbps, or around 85 million hits per second [3, 27, 30, 31].
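The bandwidth figures quoted above can be reproduced with a short calculation. The following Python sketch is only a plausibility check: it assumes that every hit is transmitted as one 48 bit packet that grows to 60 bits on the wire due to 8b/10b encoding (see Section 2.1.2), and it ignores synchronisation characters. All names are chosen for this example.

```python
CHANNELS = 8                # number of detector output channels
READOUT_CLOCK_HZ = 320e6    # maximum readout clock
BITS_PER_CLOCK = 2          # Double Data Rate: one bit per clock edge
PACKET_BITS = 48            # size of one output packet
# Assumption: 8b/10b turns every 8 payload bits into 10 line bits.
ENCODED_PACKET_BITS = PACKET_BITS * 10 // 8


def line_rate_bps(channels: int = CHANNELS, clock_hz: float = READOUT_CLOCK_HZ) -> float:
    """Raw bit rate summed over all output channels."""
    return channels * clock_hz * BITS_PER_CLOCK


def max_hit_rate(channels: int = CHANNELS, clock_hz: float = READOUT_CLOCK_HZ) -> float:
    """Upper bound on hits per second if each hit costs one encoded packet."""
    return line_rate_bps(channels, clock_hz) / ENCODED_PACKET_BITS


if __name__ == "__main__":
    print(f"line rate: {line_rate_bps() / 1e9:.2f} Gbps")
    print(f"hit rate:  {max_hit_rate() / 1e6:.1f} million hits per second")
```

Under these assumptions the line rate is 8 x 320 MHz x 2 = 5.12 Gbps, and dividing by 60 bits per hit gives roughly 85 million hits per second, which matches the values stated in the text.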

The detector derives a rectangular discriminator signal from the charge, which is then synchronised to the 40 MHz pixel clock. Based on the synchronised discriminator signal, the ToT and ToA counters are determined. Figure 2.2 illustrates the concept of the ToT counter, ToA time stamp, and the fToA counter. For the ToT counter, a Linear-Feedback Shift Register (LFSR) is used, which starts on the rising edge of the global clock if the discriminator is high. This causes a delay between the point at which the discriminator goes high and the next rising edge of the pixel clock.

Moreover, the ToA time stamp is based on a free-running Gray counter latched into a register. The value of the Gray counter is taken when the ToA clock signal goes high, i.e. at the rising edge of the global clock after the discriminator signal went high. Both the ToT and the ToA counter run at 40 MHz. Finally, the 640 MHz fToA counter measures the time between the point where the original discriminator goes high and the next rising edge of the pixel clock [3, 26].
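As an illustration of how the three counters relate to physical time, the following Python sketch converts already decoded counter values into nanoseconds. It assumes that the ToA value marks the clock edge following the hit and that the fToA value counts 640 MHz periods back from that edge to the actual arrival, as described above. The function names are chosen for this example and are not taken from the analysis tools of this thesis.

```python
TOA_PERIOD_NS = 25.0      # one period of the 40 MHz clock
FTOA_PERIOD_NS = 1.5625   # one period of the 640 MHz fToA clock


def arrival_time_ns(toa: int, ftoa: int) -> float:
    """Combine the coarse ToA time stamp with the fine fToA counter.

    The fToA counter measures the time from the discriminator going high
    to the next rising clock edge, so it is subtracted from the coarse time.
    Both inputs are expected to be decoded counter values.
    """
    return toa * TOA_PERIOD_NS - ftoa * FTOA_PERIOD_NS


def tot_duration_ns(tot: int) -> float:
    """Time-over-Threshold expressed in nanoseconds (40 MHz steps)."""
    return tot * TOA_PERIOD_NS
```

For example, a decoded ToA value of 100 and an fToA value of 4 correspond to 2500 ns - 6.25 ns = 2493.75 ns. Since 25 ns / 1.5625 ns = 16, a 4 bit fToA counter covers exactly one period of the 40 MHz clock.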

Figure 2.2: Concept of Time-over-Threshold (ToT), Time-of-Arrival (ToA), and fast-Time-of-Arrival (fToA). Signals shown over time: charge with threshold level (THL), pixel clock, discriminator, synchronised discriminator, ToT clock, ToA clock, and fToA clock

To connect to the standard detector chip board, a Very-High-Density Cable Interconnect (VHDCI) connector is used. It maps all required channels, signals, and clocks to one interface to which a readout system can be connected (Figure 2.3). The Detector_bias port does not have to be used, since there is an external connector on the chip board as well. Furthermore, the different DAC_Out pins allow up to 4 detectors to be connected via a single VHDCI connector. Moreover, the detector chip board also contains voltage stabilisers providing a constant voltage of 1.5 V to the detector [26].

Figure 2.3: VHDCI connector on the detector’s chip board [26]

The detector’s ASIC is interfaced using a readout system. It includes a number of control blocks using a proprietary serial protocol and additional control signals. Figure 2.4 shows the floor plan of the detector divided into three parts (bottom to top): IO pads connecting to a readout system, the ASIC periphery, and the pixel matrix [26]. The periphery modules handle different parts of the detector’s functions, e.g. the Slow Control and Command Decoder handle inputs via commands or control signals, while the Output Block provides data outputs [26]. A detailed description of the Timepix3 serial protocol can be found in Section 2.1.2.

Additionally, the detector expects a number of different control signals. These are differential inputs provided either as LVDS or SLVS signals (see Section 2.3.3) that trigger certain detector events. The control signals are EnableIn, EnablePowerPulsing, Reset, Shutter, T0_Sync, and ExternalTestPulse. Furthermore, the detector has an input for an external DAC, as well as an analogue output. EnableIn is used to enable the detector’s input block, i.e. to start sampling the input data channel. EnablePowerPulsing starts providing power to the pixel matrix when set to high. Reset performs a reset of the detector’s internal registers and the pixel matrix, i.e. it causes the detector to initialise to default values. The Shutter signal is used to start and stop data acquisition from the detector, like the shutter of a camera. T0_Sync is required to start the detector’s internal timer, which is used to generate the ToA time stamp. ExternalTestPulse can be used to provide test pulses from an external source instead of using the internal test pulse generator [26].

The detector requires two differential clocks to be provided: the system clock (ClkIn40), which can be either a 40 MHz or an 80 MHz clock, and the secondary readout clock, which can have a frequency of up to 320 MHz. However, generating and providing such a fast clock signal is highly prone to errors. Therefore, the detector can use a slower provided clock (such as 40 MHz) to generate its readout clock using an internal Phase Locked Loop (PLL), which can be provided back to the readout system using the ClkOut or the PLLOut pads. Alternatively, the 8b/10b encoding standard can be utilised to recover the clock from the encoded data signals (see Section 2.3.2) [3, 26].

Inputs to the detector are sent via the DataIn channel. The channel is synchronous to ClkIn40, uses Single Data Rate (SDR), and is generally not encoded. For the detector to be able to receive a command, its input block has to be started by toggling the EnableIn control signal to high and back to low. Outputs from the detector are sent out on up to 8 independent channels. By default, only channel 0 is enabled and the output is synchronous to the provided readout clock (ClkInRefPLL by default). Furthermore, all outputs are 8b/10b encoded (see Section 2.3.2) and use Double Data Rate (DDR) by default. It is important to note that output data packets are never split up between channels: every individual packet is transmitted on a single channel [3, 26]. The detector’s protocol is described in detail in Section 2.1.2.

Figure 2.4: Timepix3 detector floor-plan consisting of IO pads, ASIC and sensor [26]

2.1.2 Timepix3 protocol

The Timepix3 detector implements a proprietary serial protocol with multiple independent output channels. All communication with the detector is handled using commands and a few differential control signals to trigger certain events (as described in Section 2.1.1). Following the start-up sequence shown in Figure 2.9, only the EnableIn signal needs to be toggled before attempting to send a command to the detector. Command inputs to the chip are not encoded and can vary from 48 bits up to 393,520 bits, whereas command outputs always have a size of 48 bits [26].

To allow for a rudimentary flow control between readout systems and the detector, the detector’s protocol supports simple handshaking. The detector always responds to a command: a successful command is acknowledged with an End-of-Command (EoC) packet, a wrong command packet is issued if the detector’s command decoder could not find a valid header, and an other chip command packet is issued if the provided ChipID is a local ID and does not match the detector. Furthermore, optional acknowledge commands can be enabled, which cause the chip to issue an acknowledgement as soon as the header has been successfully deciphered [26]. Functionality such as error correction and re-sending of commands has to be implemented in the readout system if desired.

By default, all commands the detector sends out are 8b/10b encoded and are sent LSB first. Decoding the data automatically flips the bits to MSB-first format. The detector breaks the packets down into 6 blocks that are individually 8b/10b encoded (see Figure 2.5) and transmitted via one of the independent DataOut channels. For synchronisation, the detector fills up all enabled channels with K.28.5 control characters (see Section 2.3.2) and additionally issues one after every 8th full data packet to force synchronisation between detector and readout system.

Figure 2.5: Packet bit order and encoding [26]

Figure 2.6 shows the typical detector behaviour when sending a command: an optional acknowledgement is triggered after deciphering the command header, then the requested data are transmitted, encapsulated in their own 48 bit command, and the command is terminated with an EoC packet. It can be seen that every input command starts with an 8 bit presync pattern, which can either be the global 0xAA to address every detector connected to the DataIn channel or the local 0xE4 when addressing a specific detector. Following the presync pattern, the ChipID can be specified. However, this is only needed when using the local presync; otherwise, the ChipID should be set to all zeros. After presync and ChipID, the 8 bit command header is sent. Based on this header, the detector knows how much more data to expect as payload. There are commands that do not require any payload at all, commands affecting the periphery generally require a 16 bit payload, and commands aimed at the pixel matrix require another 256 bits as column mask and possibly more data depending on the number of selected columns. For commands that provide multiple outputs, such as reading the configuration of the pixel matrix (PCR), the detector will issue one command per pixel with the desired information encoded in the command’s payload [3, 26].

Figure 2.6: Structure of a periphery command to the detector [26]
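The command layout described above can be summarised in a small Python sketch. Several details are assumptions made for illustration: the ChipID width of 32 bits is inferred from the minimum command size of 48 bits (48 - 8 - 8), the 6 bits per pixel are chosen such that the largest command reproduces the stated 393,520 bits, and the fields are packed with the presync pattern in the most significant bits. The function names are hypothetical, and no real command header values are implied.

```python
from typing import Optional

PRESYNC_GLOBAL = 0xAA          # addresses every detector on DataIn
PRESYNC_LOCAL = 0xE4           # addresses one detector via its ChipID
PRESYNC_BITS = 8
CHIP_ID_BITS = 32              # assumption: 48 bit minimum - presync - header
HEADER_BITS = 8
PERIPHERY_PAYLOAD_BITS = 16
COLUMN_MASK_BITS = 256
PIXELS = 256 * 256
PIXEL_ENTRY_BITS = 6           # assumption: matches the 393,520 bit maximum


def build_periphery_command(header: int, payload: int,
                            chip_id: Optional[int] = None) -> int:
    """Pack presync | ChipID | header | payload into one 64 bit integer."""
    if not 0 <= header < 1 << HEADER_BITS:
        raise ValueError("header must fit into 8 bits")
    if not 0 <= payload < 1 << PERIPHERY_PAYLOAD_BITS:
        raise ValueError("payload must fit into 16 bits")
    if chip_id is None:
        # Global presync: the ChipID field is set to all zeros.
        presync, chip_id = PRESYNC_GLOBAL, 0
    else:
        if not 0 <= chip_id < 1 << CHIP_ID_BITS:
            raise ValueError("ChipID must fit into 32 bits")
        presync = PRESYNC_LOCAL

    command = presync
    for value, width in ((chip_id, CHIP_ID_BITS),
                         (header, HEADER_BITS),
                         (payload, PERIPHERY_PAYLOAD_BITS)):
        command = (command << width) | value
    return command


def max_matrix_command_bits() -> int:
    """Largest command: all columns selected, one entry per pixel."""
    base = PRESYNC_BITS + CHIP_ID_BITS + HEADER_BITS
    return base + COLUMN_MASK_BITS + PIXELS * PIXEL_ENTRY_BITS
```

With these assumptions, the smallest command (no payload) has 8 + 32 + 8 = 48 bits and the largest has 48 + 256 + 65,536 x 6 = 393,520 bits, which is consistent with the command sizes given at the beginning of this section.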


Figure 2.7 shows the structure of pixel data packets with the Voltage-Controlled Oscillator (VCO) enabled. If the VCO is disabled, the lowest 4 bits are replaced with a hit counter. Output packets can always be identified via their unique 4 bit headers. However, control commands, such as EoC, use a longer header to specify the control packet. Periphery outputs use an 8 bit header, which identifies the periphery module [26].

Figure 2.7: Structure of detector output commands for pixel data [26]
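To illustrate how such a packet can be taken apart in the analysis, the following Python sketch splits a 48 bit pixel data packet into its fields. The field widths used here (4 bit header, 16 bit pixel address, 14 bit ToA, 10 bit ToT, 4 bit fToA or hit counter, from the most to the least significant bit) are an assumption for the combined ToA and ToT mode and should be checked against Figure 2.7 and the manual [26]. The values returned are still encoded.

```python
from typing import Dict, Tuple

PACKET_BITS = 48

# Assumed field layout from the most significant to the least significant bit.
FIELDS: Tuple[Tuple[str, int], ...] = (
    ("header", 4),
    ("address", 16),
    ("toa", 14),
    ("tot", 10),
    ("ftoa", 4),   # replaced by a hit counter if the VCO is disabled
)


def split_pixel_packet(packet: int) -> Dict[str, int]:
    """Split a 48 bit packet into raw (still encoded) field values."""
    if not 0 <= packet < 1 << PACKET_BITS:
        raise ValueError("packet must be a 48 bit value")
    fields = {}
    shift = PACKET_BITS
    for name, width in FIELDS:
        shift -= width
        fields[name] = (packet >> shift) & ((1 << width) - 1)
    return fields
```

Keeping the layout in a single table makes it straightforward to adapt the sketch to other operation modes, in which the same 48 bits are divided differently.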

Generally, a column or double column is selected by setting the bit at the desired position to 0, while setting it to 1 unselects the column. However, this logic is inverted when selecting columns for test pulses [26]. Figure 2.8 shows how the pixel matrix is masked, starting with the highest column at the MSB. The same logic is used when reading data sequentially using double columns and tokens. In this case, two columns are combined into a double column and a number of readout tokens has to be supplied as well. The number of tokens is derived from the number of bits that are set to 1 in the TokenSelect register, ignoring its MSB, i.e. at most 127 tokens can be selected. Furthermore, the manual describes the optimal number of tokens depending on the available bandwidth [26].

Figure 2.8: Masking columns by using a 0 to select a column [26]
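The masking rules above can be illustrated with a small Python sketch. The function names and the 128 bit width of the TokenSelect register are assumptions made for this example (the width is inferred from the maximum of 127 tokens); the 0-selects convention, the inverted test pulse logic, and the ignored MSB follow the text.

```python
N_COLUMNS = 256


def column_mask(selected, n_columns=N_COLUMNS, invert=False):
    """Build a mask with the highest column at the MSB; a 0 selects a column."""
    all_ones = (1 << n_columns) - 1
    mask = all_ones                      # start with every column deselected
    for col in selected:
        if not 0 <= col < n_columns:
            raise ValueError("column index out of range")
        mask &= ~(1 << col)              # clear the bit to select the column
    if invert:                           # test pulse masks use 1 = selected
        mask ^= all_ones
    return mask


def token_count(token_select, width=128):
    """Count the bits set to 1 while ignoring the MSB of the register."""
    without_msb = token_select & ((1 << (width - 1)) - 1)
    return bin(without_msb).count("1")
```

For a hypothetical 4 column matrix, `column_mask([0], n_columns=4)` gives 0b1110, and the test pulse variant `column_mask([0], n_columns=4, invert=True)` gives 0b0001.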

Configuring the pixel matrix’ Pixel Configuration Register (PCR) or the Column Test Pulse Register (CTPR) requires additional caution, because their entries per pixel are smaller than one byte. Both registers have to be transferred in compressed form. The compression violates the byte endianness, so that the detector receives corrupted data, although reading the data back and decoding them yields seemingly valid results. To avoid this, the individual register entries (one per pixel) have to be inverted prior to compression and inverted again for verification when reading the registers back from the detector [26].

Pixel data obtained from the detector are generally encoded. ToT, iToT, PC and the 4 bit hit counter are Linear-Feedback Shift Register (LFSR) encoded, whereas ToA data are encoded using a Gray counter. Furthermore, the pixel addresses are encoded in super pixel layout, where a super pixel consists of 2 columns with 4 pixels each. These data fields have to be decoded to provide accurate data [3, 26, 32]. This can be done either on-board the readout system or during analysis on a computer, which is much more time consuming, especially with large amounts of data.
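Two of these decoding steps can be sketched in Python. The Gray conversion is the generic textbook algorithm. The address field layout (7 bit double column, 6 bit super pixel, 3 bit pixel) is an assumption made for this example and should be checked against the detector manual. LFSR decoding is left out, because it needs the polynomials from the manual, typically in the form of a lookup table.

```python
def gray_to_binary(gray):
    """Convert a Gray-coded integer to plain binary."""
    binary = gray
    shift = gray >> 1
    while shift:
        binary ^= shift
        shift >>= 1
    return binary


def decode_address(addr):
    """Map a 16 bit super pixel address to (x, y) matrix coordinates.

    Assumed layout: 7 bit double column | 6 bit super pixel | 3 bit pixel.
    """
    dcol = (addr >> 9) & 0x7F
    spix = (addr >> 3) & 0x3F
    pix = addr & 0x07
    x = 2 * dcol + pix // 4   # a double column is two columns wide
    y = 4 * spix + pix % 4    # a super pixel is four pixels high
    return x, y
```

Under the assumed layout, every 16 bit address maps to a unique position in the 256x256 matrix, which is a quick sanity check for the decoder.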

Prior to using the detector, a minimal start-up sequence has to be executed, resetting the detector and sending an initial configuration to it (see Figure 2.9). After providing clocks and resetting the detector, the order of commands does not matter. However, for the chip to be able to receive a command, its internal command decoder has to be started by toggling the EnableIn pins to high and back to low. The command decoder will start sampling the DataIn channel on the falling edge of EnableIn. After configuring the registers and the pixel matrix (PCR), EnablePowerPulsing has to go high to start powering the pixel matrix [3, 26, 30]. Furthermore, it should be noted that changing the configuration of the detector’s output block causes the module to reset, i.e. anything connected to the detector’s output channels will lose synchronisation and might miss EoC packets.

Figure 2.9: Default start-up sequence of the detector [26]

2.2 Physical phenomenons and compensation

2.2.1 Charge sharing

Charge sharing is a phenomenon where a charge is deposited into several pixels, thus creating a cluster. There are several possible causes: photons and ionising radiation can deposit charge in several pixels, or the effect can be caused by cross-talk within the chip or capacitive effects between the pixels. This effect leads to lower ToT counts in multiple pixels (Figure 2.10). In most cases, charge sharing is an unwanted side effect that should be minimised, e.g. by increasing the voltage of the electric field in the sensor material [1] or by summing up the clusters into a single pixel in pre-processing [11]. However, there are some cases where charge sharing is beneficial, such as clustering of alpha particles, time-walk correction (see Section 2.2.7) [33], or improving the resolution of objects, e.g. when measuring with alpha particles or gamma radiation, which cause a lot of charge sharing. Alpha particles will always spread their charge into several pixels due to their size, which makes it easy to identify them simply by shape.

Figure 2.10: Concept of charge sharing between pixels [1]

Considering the calibration of the detector, hits affected by charge sharing are basically unusable data, because the wanted energy is spread between several pixels, resulting in a cluster of lower values. However, the cluster can be summed up into a single pixel after the data acquisition to recover the original value; in the Medipix3 detector, this summing is an integrated function [5]. The higher the energy of the used radiation, the more often charge sharing will occur [11]. Another effect can occur when using radiation with an energy above the characteristic energy of the detector material. This will cause the detector material to emit fluorescence photons, which are detected as well [1].
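Summing a cluster into a single pixel can be sketched as a flood fill over neighbouring hits. The following Python example is a minimal sketch, not the method used in the thesis: it assumes that all hits passed in belong to the same time window, treats the 8 surrounding pixels as neighbours, and assigns the sum to the pixel with the highest value. Summing calibrated energies rather than raw ToT counts is usually preferable, since the ToT response is not linear at low energies.

```python
def sum_clusters(hits):
    """hits: dict {(x, y): value}. Return one (x, y, summed_value) per cluster.

    The summed value is placed at the pixel with the highest value.
    """
    remaining = dict(hits)
    clusters = []
    while remaining:
        seed, value = remaining.popitem()
        members = [(seed, value)]
        stack = [seed]
        while stack:
            x, y = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (x + dx, y + dy)
                    if neighbour in remaining:
                        members.append((neighbour, remaining.pop(neighbour)))
                        stack.append(neighbour)
        (cx, cy), _ = max(members, key=lambda m: m[1])
        clusters.append((cx, cy, sum(v for _, v in members)))
    return clusters
```

For example, three touching hits with values 30, 20, and 5 would be merged into one entry with value 55 at the position of the 30 hit, while an isolated hit is returned unchanged.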

2.2.2 fToA propagation delay

The fast-Time-of-Arrival (fToA) time stamp runs at 640 MHz (compared to 40 MHz for ToA) and is distributed through the pixel matrix via double columns using different clock phases generated by the detector’s PLL. Due to the high clock speed, propagation delays are expected.

The detector iterates through the available clock phases and drives one double column per clock phase, resulting in the pattern shown in Figure 2.11. Based on the observed pattern, the propagation delay can be compensated depending on the pixel address [26, 34].

Figure 2.11: Propagation delay pattern of fToA in the pixel matrix observed using test pulses [34]

2.2.3 Detector equalisation

(a) Equalisation (b) Noise behaviour of the individual pixels

Figure 2.12: Equalisation of a Timepix3 detector (figure by David Krapohl)

A typical method to create an equalisation of the pixel matrix is using noise. This is done by setting the individual pixel trim bits first to 0 and then to 15, placing the detector in a dark room without any radiation sources, and iterating through all global thresholds in photon counting mode. Due to differences in the pixels’ analogue electronics, each pixel will show its own characteristic noise behaviour. The goal of an equalisation is to adjust the trim bits in every pixel, thus minimising the threshold differences, which will result in a distribution similar to Figure 2.12a. To ensure that the individual pixels do not influence each other and to reduce the amount of data generated at the same time, a masking grid is created. For every possible grid, all global thresholds are scanned using the lowest and highest possible pixel adjustment bits (0 and 15). These data are used to calculate the individual pixel trim bits, which are then verified using the same setup. In order to visualise the differences between the individual pixels, their individual noise behaviour can be plotted (Figure 2.12b). Furthermore, the data obtained from this measurement can be used to turn off (mask) pixels that produce noise outside the expected threshold area (noisy pixels) [35, 36].
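One common way to derive the trim bits from the two scans is a linear interpolation between the noise peak positions measured at trim 0 and trim 15. The Python sketch below assumes this approach and uses the midpoint between the two mean peak positions as the target threshold. Both choices, as well as the function names, are assumptions for illustration and are not taken from the equalisation code referenced in the text.

```python
def trim_bits(peak_trim0, peak_trim15, target, max_trim=15):
    """Interpolate the trim value that moves a pixel's noise peak to target."""
    span = peak_trim15 - peak_trim0
    if span == 0:
        return 0
    trim = round((target - peak_trim0) / span * max_trim)
    return min(max(trim, 0), max_trim)   # clamp to the 4 bit trim range


def equalise(peaks_trim0, peaks_trim15):
    """peaks_trim0/15: dict {(x, y): threshold of the noise peak} per scan."""
    mean0 = sum(peaks_trim0.values()) / len(peaks_trim0)
    mean15 = sum(peaks_trim15.values()) / len(peaks_trim15)
    target = (mean0 + mean15) / 2        # assumed target: midpoint of the scans
    return {
        pixel: trim_bits(peaks_trim0[pixel], peaks_trim15[pixel], target)
        for pixel in peaks_trim0
    }
```

Pixels whose interpolated value is clamped at 0 or 15 cannot be moved onto the target and are natural candidates for masking.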

2.2.4 Detector calibration

To be able to relate the ToT counts to the deposited energy, an energy calibration is required.

The relationship between ToT and energy can be approximated using:

$$f(x) = ax + b + \frac{c}{x - t} \tag{2.1}$$

Where x is the ToT value and a, b, c, and t are fitting parameters. When using non-silicon detectors, resolving lower energies might not be possible. In that case, only the linear part of the calibration curve can be used, which can be approximated by a linear equation (Equation 2.2), where x is the ToT value, m is the slope, and b is the intercept:

f (x) = mx + b (2.2)

In order to calibrate a dataset, the inverse function of the calibration curve is used to convert the ToT value x to energy. The parameters calculated using the fit (Equations 2.2 or 2.1) are inserted into Equation 2.3:

$$f^{-1}(x) = \frac{b - x}{m} \tag{2.3}$$

It has to be noted that, depending on the obtained parameters, Equation 2.3 might return negative values. In this case, the absolute value of the function should be used. Timepix detectors show a non-linear relationship for lower energies and a linear one for higher energies (see Figure 2.13a). Furthermore, every individual pixel shows a slightly different behaviour. To compensate for this, a per-pixel calibration can be done by creating a fit for every individual pixel. The fitting parameters inserted into the inverse function are then selected depending on the pixel coordinates [1, 28, 36].
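The linear case can be sketched in plain Python. The example assumes that the line of Equation 2.2 is fitted through pairs of known energy and measured ToT peak position, which is an assumption about how the fit is set up. The conversion follows Equation 2.3 with the absolute value applied as described above. The function names and the numbers in the usage note are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def tot_to_energy(tot, m, b):
    """Equation 2.3 with the absolute value applied."""
    return abs((b - tot) / m)


def calibrate_hit(x, y, tot, params):
    """params: dict {(x, y): (m, b)} holding one fit per pixel."""
    m, b = params[(x, y)]
    return tot_to_energy(tot, m, b)
```

With synthetic peak positions of 25, 45, and 65 ToT counts at 10, 20, and 30 keV, `fit_line` returns m = 2 and b = 5, and a ToT value of 25 converts back to 10 keV.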

One popular method to create a calibration (see Figure 2.13a) is to use the Kα peaks of fluorescence materials, i.e. materials responding to radiation with a known and clear characteristic energy.

(a) Calibration curve (b) Fluorescence peaks

Figure 2.13: Demonstration of a detector calibration [27, 28]


Figure 2.13b shows measured Kα peaks for different fluorescence materials. It is important to only use hits without charge sharing, since shared hits typically show up as noise at lower ToT counts. Moreover, a calibration can be done for every individual pixel to achieve the best possible energy resolution [27, 28, 36]. Alternatively, it is also possible to create a calibration using the detector’s integrated test pulses [27].

2.2.5 Spectroscopy

Figure 2.14: Am-241 spectrum taken with a Timepix3 [27]

Every individual pixel in the pixel matrix can be used as a spectrometer. Using a Timepix detector for spectroscopy requires a precise energy calibration (see Section 2.2.4) to deliver accurate results, due to the differences between the individual pixels. Figure 2.14 shows an Am-241 spectrum recorded by different pixels using a Timepix3 detector. On the left side, the spectrum recorded by different pixels is shown and on the right side, the same spectrum after a per-pixel energy calibration is shown. [27]

2.2.6 Flat field correction

Flat field correction is an efficient method to reduce systematic noise in X-ray images, thus increasing the Signal-to-Noise Ratio (SNR) [11, 37]. This technique creates a coefficient matrix with a unique correction factor for every pixel, which then is applied to the X-ray image. The correction coefficient for an individual pixel is calculated using Equation 2.4:

$$f(x) = \frac{\mathrm{Count}(x)}{\mathrm{Count}_{\mathrm{Mean}}} \tag{2.4}$$

Where x is a pixel, Count(x) is the number of hits for that pixel, and Count_Mean is the average number of hits per pixel. Figure 2.15 shows an example of a flat field correction, calculated using Equation 2.4, applied to an X-ray image of a fly taken with a first generation Medipix detector [38].

In order to create a flat field correction, two measurements are taken besides the actual X-ray image. The bright field is measured by irradiating the detector without any object between the X-ray source and the detector. For the dark field, the same setup is used without radiation. However, with threshold-based detectors, such as Medipix and Timepix, it is not necessary to measure the dark field, since these detectors do not record any data below their threshold [37, 38].
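A minimal Python sketch of the correction: the coefficient matrix follows Equation 2.4 and is computed from the bright field. Applying the correction by dividing the image by the coefficients is an assumption of this example, as is the handling of dead pixels; the function names are hypothetical.

```python
def flat_field_coefficients(bright):
    """bright: 2D list of hit counts from the bright field (Equation 2.4)."""
    counts = [c for row in bright for c in row]
    mean = sum(counts) / len(counts)
    return [[c / mean for c in row] for row in bright]


def apply_flat_field(image, coefficients):
    """Divide each pixel by its coefficient; dead pixels (0) are set to 0."""
    return [
        [(value / k if k else 0.0) for value, k in zip(image_row, k_row)]
        for image_row, k_row in zip(image, coefficients)
    ]
```

A pixel that counts 10 % fewer hits than average in the bright field gets a coefficient of 0.9, so its value in the corrected image is scaled up accordingly.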

(a) raw X-ray image (b) image with applied flat field correction

Figure 2.15: Demonstration of flat field correction using an X-ray image of a fly [38]

2.2.7 Time-walk correction

Time-walk is a phenomenon caused by the fact that the rise time of a signal is always the same regardless of its amplitude (height) (Figure 2.16a). This means that, depending on the energy recorded in a pixel, the signal needs more or less time to reach the threshold, thus shortening or prolonging the detected signal. The lower the detected energy, the longer the delay until the signal crosses the threshold and is thereby detected. Since Timepix3 can measure ToT and ToA simultaneously, it is possible to correct the time-walk effect by using charge sharing in the pixel matrix [33]. For this approach, a source with a known energy is used (such as Am-241).

Only clusters with two to four hits, a total energy matching the source (59.5 keV for Am-241), and one pixel having a specific reference energy are considered. The remaining pixels in the cluster are then related to the chosen reference, resulting in a distribution of time differences for the remaining energies. These distributions can be fitted using a Gaussian function (Figure 2.16b) [27, 39]. The time-walk can then be corrected using Equation 2.5:

$$\Delta t = \frac{c}{(E - E_0)^{\sigma}} \tag{2.5}$$

(a) Time-walk concept (b) Time-walk correction

Figure 2.16: Time-walk concept [27, 33]


where ∆t is the time-walk in ns, E₀ the detector threshold in keV, E the energy of the individual pixel in keV, and c and σ are centre and width parameters obtained by applying a Gaussian fit to every energy bin [33].
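Equation 2.5 translates directly into code. The Python sketch below evaluates the delay and subtracts it from the measured ToA, which assumes that the time-walk always delays a hit. The function names and the parameter values in the usage note are hypothetical and only serve as an illustration.

```python
def time_walk_ns(energy_kev, c, sigma, threshold_kev):
    """Equation 2.5: time-walk in ns for a hit of the given energy."""
    if energy_kev <= threshold_kev:
        raise ValueError("energy must be above the detector threshold")
    return c / (energy_kev - threshold_kev) ** sigma


def correct_toa(toa_ns, energy_kev, c, sigma, threshold_kev):
    """Shift a hit back in time by its energy-dependent delay."""
    return toa_ns - time_walk_ns(energy_kev, c, sigma, threshold_kev)
```

With the hypothetical parameters c = 20, σ = 1, and a threshold of 3 keV, a 7 keV hit is delayed by 5 ns, so a measured ToA of 100 ns is corrected to 95 ns.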

2.3 Standards and interfaces

2.3.1 USB 3.0 via the FrontPanel framework

FrontPanel is a framework provided by Opal Kelly for their prototyping boards (see Figure 2.17). It connects an FPGA to a computer via USB 3.0 while hiding most of the complexity of the USB protocol. The prototyping boards contain a Cypress FX3, an ARM processor specialised in USB 3.0, which is handled entirely by FrontPanel, so that the FPGA appears to be connected directly to the computer. To achieve this, FrontPanel provides pre-compiled FPGA modules called Endpoints and a C/C++ Application Programming Interface (API). The framework interconnects these Endpoints with the API via the FX3 processor [40].

Figure 2.17: Overview of the FrontPanel framework [40]

On the FPGA side, four different Endpoints are provided (Wires, Triggers, Pipes, and a Register Bridge), together with a module to tie everything together. All these modules work in both directions and serve different purposes. Triggers provide synchronous signals that can go high for a clock cycle and are therefore suitable as start signals or status indicators. Wires provide slow asynchronous data transfers, e.g. for reset signals or cases where more than a single bit is required. The Register Bridge works like a standard Wishbone-type register interface: it provides 32 bit addresses with 32 bit data fields, allowing detailed configuration and status information to be exchanged between the FPGA and the host. Furthermore, there are two types of Pipes, regular Pipes and Block-Throttled Pipes. The latter can be used to better control USB 3.0 transfers and therefore allow for a higher data rate with less USB protocol overhead. However, the USB 3.0 protocol is a master-slave protocol, i.e. the host computer is always the master and has to ask for a specific amount of data with certain pipe settings. The FPGA, on the other hand, is the slave and is therefore responsible for providing the exact amount of data the master asked for. This means that additional logic has to be implemented on the FPGA side to make sure that the right amount of data is provided to the host (see Section 4.2.2) [40].

On the PC side, the API provides the counterparts for the FPGA Endpoints. Since the host is always the master, it is in control of all the FPGA’s functionality. The API can be integrated into any C/C++, Python, Java, C#, or Matlab programme [40].
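Because the host has to request an exact amount of data, some bookkeeping is needed before each block-throttled transfer. The Python sketch below shows only this host-side arithmetic and does not call the FrontPanel API. The function name, the idea that the firmware reports the number of buffered bytes, and the power-of-two block size check are assumptions made for illustration.

```python
def plan_transfer(bytes_available, block_size, max_blocks):
    """Return the number of bytes the host should request (whole blocks only)."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size is assumed to be a power of two")
    if bytes_available < 0 or max_blocks < 0:
        raise ValueError("sizes must not be negative")
    blocks = min(bytes_available // block_size, max_blocks)
    return blocks * block_size
```

If the firmware reported 5000 buffered bytes and the block size were 1024 bytes, the host would request 4096 bytes and leave the remainder for the next transfer.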

2.3.2 8b/10b data encoding

8b/10b encoding is a standard technique for reliable high-speed data transceivers. Each byte of data is encoded as 10 bit to ensure that the data signals frequently transition between zero and one and that the number of ones and zeros on the data lines is balanced. This principle is called DC balance. Due to the recurring signal transitions, it also allows a clock signal to be recovered from the data signals. In detail, 8b/10b encoding incorporates two sub-encodings: 5b/6b and 3b/4b. Figure 2.18 shows how data are encoded: the LSBs of the un-encoded data are 5b/6b encoded and take the MSBs’ places in the encoded data, while the MSBs of the un-encoded data are 3b/4b encoded and moved to the LSBs’ position. For most bit combinations there are two different encodings, regular and inverted, which are selected according to the Running Disparity (RD) to ensure DC balance. Furthermore, the 8b/10b standard also includes control characters, called comma symbols. These comma symbols are used to synchronise transmitter and receiver or to provide control functionality [41, 42]. However, Timepix3 only implements the unique K.28.5 symbol for synchronisation [3, 26].

Figure 2.18: Concept of 8b/10b encoding [41]
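A few properties of the encoding can be checked with a short Python sketch. It names a byte according to the 5b/3b split, computes the disparity of a 10 bit code group, and searches a bit string for the two K.28.5 code groups to find the word boundary. Representing the bit stream as a string of '0' and '1' in transmission order is a simplification for illustration, and the function names are hypothetical.

```python
# The two K.28.5 code groups; the running disparity decides which one is sent.
K28_5 = ("0011111010", "1100000101")


def symbol_name(byte):
    """Name a data byte as D.x.y: x from the 5 LSBs, y from the 3 MSBs."""
    return "D.{}.{}".format(byte & 0x1F, byte >> 5)


def disparity(code):
    """Number of ones minus number of zeros in a 10 bit code group."""
    return code.count("1") - code.count("0")


def find_alignment(bits):
    """Return the offset of the first K.28.5 in a bit string, or -1."""
    offsets = [i for i in (bits.find(k) for k in K28_5) if i >= 0]
    return min(offsets) if offsets else -1
```

The byte 0xBC splits into x = 28 and y = 5, and the two K.28.5 code groups have disparities of +2 and -2, which is how alternating between them keeps the line DC balanced.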

2.3.3 LVDS and SLVS signalling

Figure 2.19: Concept of LVDS and SLVS signalling [43]

Low Voltage Differential Signalling (LVDS) is a signal transmission standard widely used in high-speed communication technologies. The electrical principle is rather simple (see Figure 2.19): a current flows through a termination resistor between the differential inputs of the receiver, and the polarity of the resulting differential voltage determines whether a zero or a one is detected. LVDS is used in many modern communication standards such as HDMI, DisplayPort, Ethernet, etc. [43, 44] and also to connect periphery modules on a Printed Circuit Board (PCB), e.g. connecting external RAM to an FPGA [45, 46].

While LVDS typically works at a voltage level of 1.2 V with a 200 mV swing, Scalable Low Voltage Signalling (SLVS) uses a much lower voltage level of 200 mV with a swing of 100 mV, thus reducing the power consumption and also the maximum possible transmission range, but significantly increasing the data rate [26]. However, the principles of LVDS and SLVS are identical. Dealing with SLVS signalling usually requires a converter, because most FPGAs do not fully support the standard yet [41, 46].

2.4 Related work

Currently, there are several different readout systems available for Timepix3. Some are commercially available, such as SPIDR [32, 47], AdvaDAQ [33], and Katherine [31]. Other institutes and universities built their own proprietary readout systems, e.g. Diamond Light Source with MERLIN [24] for Medipix3, an adapted version for Timepix3 [25, 34], and a new system using 48 Medipix3 detectors [48], or DESY with a readout system using 36 Gallium Arsenide (GaAs) Medipix3 detectors [29]. Both systems were designed to be used at a synchrotron. Another approach was taken with the GEFE project, which aims to design a radiation-hard and open base-board for pixel detector readout [49].

All current readout systems use FPGAs to communicate with the detector or several detectors at once. However, the chosen interfaces and purposes vary. The SPIDR system is available in two different versions, using either a Virtex-7 or, for the compact version, an Artix-7 FPGA, with 1 or 10 Gb Ethernet as interface [32, 47]. On the other hand, the MERLIN system is based on a Virtex-5 FPGA programmed via LabView and also uses Ethernet as interface [34].

The AdvaDAQ system uses USB 3.0 [33] and can be seen as the successor of the FITPix readout system [50]. We also selected USB 3.0 as interface for several reasons. First of all, there was no USB readout system available when we started this project. Moreover, USB 3.0 provides a satisfactory data rate and, unlike 10 Gb Ethernet, is present on all modern computers. The Katherine system takes another approach: it was designed using an Altera Cyclone System on Chip (SoC) combining an FPGA with an ARM processor capable of running Linux [31]. This approach allows for easy on-board data processing using C/C++ instead of VHDL and also makes it possible to use the system independently from a computer or to include it into an existing network infrastructure. However, the downside of this approach is that it only provides a 1 Gb Ethernet interface [31]. The limited availability of high speed interfaces is a common issue with current SoC development boards and is the main reason we decided not to use a SoC at this point.

As mentioned before, all current readout systems require the use of an FPGA or a SoC containing an FPGA [25, 31–34, 47, 49]. However, Dima Maneuski from the University of Glasgow has shown that it is possible to interface Timepix1 detectors using only an ARM processor [51]. Moreover, Oliver Keller from CERN is currently working on a simple readout system for educational purposes using Timepix3 together with a specialised processor allowing precise IO timing [52]. However, in order to utilise the high data rate of Timepix3, interfacing it with an FPGA is unavoidable. Furthermore, there are not many interfaces that support sufficient data rates to properly utilise the detector. Possible interfaces would be fiber-based 10 Gb Ethernet as used by SPIDR [32], USB 3.0 as used by us and AdvaDAQ [33], or High-Definition Multimedia Interface (HDMI), which is present in the SPIDR system [47]. Other possibilities would be Thunderbolt 3, for which no prototyping hardware is available, or utilising more than a single interface [48]. Using 1 Gb Ethernet significantly limits the possible data rate, which could be compensated with on-board data processing [31, 53]. Considering these interface choices, only 1 Gb Ethernet or USB 3.0 (5 Gb/s) allow readout systems that can be used without specialised measurement computers or even servers [29, 48].

Concerning the FPGA firmware, the designers typically do not publish it. However, they show some details in their publications. SPIDR, for instance, uses a modular approach where the firmware consists of interchangeable blocks, which fits well with their philosophy of creating a modular and adaptable system [32, 47]. We decided to follow a similar approach and create logical blocks using standard interfaces, to make it easier to adapt our readout system. On the other hand, the Katherine system allows users to access the ARM processor and implement their own control and processing algorithms on the readout system [31]. Typically, data can only be processed after acquisition [25, 32, 33]. However, all Timepix3 readout systems implement decoders for 8b/10b encoded raw data and deserialisers to pack pixel packets together [31–33, 54]. Furthermore, most systems also include data decoders, which decode LFSR or Gray encoded pixel data, saving time in post-processing [31, 32]. We decided to take the same approach and added decoders for pixel data (LFSR, Gray, and addresses), as well as 8b/10b decoders and deserialisation to create pixel packets inside the readout system, similar to the design shown by the MERLIN system [24]. This is required anyway to prepare data for on-board processing.

We want our system to be able to support on-board real-time processing, which was shown to be possible by Calliope-Louisa Sotiropoulou et al. and used for the ATLAS project [55–57]. The authors used a pixel detector similar to Timepix3 [58] and managed to cluster data from four detectors simultaneously on a single Spartan-6 FPGA. On-board processing has also been attempted by Kraus et al. for the FITPix system to remove pixels without data from frames, a feature now implemented in the detector itself [54]. However, the most usable approach to on-board processing is provided by the Katherine system. The system uses a SoC, which allows processing algorithms to be implemented in C/C++ [31].

There are different approaches to provide access to the readout systems. AdvaDAQ and Katherine are distributed with a desktop software [31, 33, 59]. We are using the same approach as SPIDR, providing a library, thus requiring the user to add their own application or script on top [32]. The library approach offers a higher flexibility, while providing a ready-made software suite is more user friendly, as long as the software is maintained. However, providing a library also makes it possible to integrate the readout system into other tools, such as ImageJ [51] or the Jupyter notebook [52].

3 Methodology

3.1 Components, tools and software

For prototyping, the XEM6310 development board (Figure 3.1) from Opal Kelly was selected [40]. The board consists of a Xilinx Spartan-6 150 FPGA and the Cypress FX3, a specialised ARM processor for USB 3.0 applications. Opal Kelly delivers a framework called FrontPanel (see Section 2.3.1), which hides the complexity of USB 3.0 and allows the FPGA to be interfaced directly from a computer, using so-called Endpoints and a provided C/C++ API. This removes the requirement to implement USB 3.0 in the FPGA or to produce a custom solution using an FX3 processor.

Figure 3.1: The selected XEM6310 development board from Opal Kelly [40]

Table 3.1 lists the software tools used during this project. FPGA development had to be conducted in Xilinx ISE, because Spartan-6 FPGAs cannot be programmed using the newer Xilinx Vivado environment. ISE includes the ISIM simulator, which was used to simulate the design. Further, it also includes an IP Core generator, which allows standard components, such as FIFOs, PLLs, etc., to be generated.

Table 3.1: Software and tools used for this project

Software/Tool      Version   Purpose
Xilinx ISE         14.7      FPGA firmware development and simulation
FrontPanel         4.5.6     USB 3.0 framework and API
Qt Creator         4.6       C++ library development
Qt                 5.10      Multi-platform C++ framework
Python             3.6.5     Data analysis and visualisation
Anaconda           1.7       Scientific Python distribution
Root               6.12      Data analysis framework from CERN
Altium Designer    17.1      PCB design software

The multi-platform software library is developed in C++ using Qt. For this project, features from the newer C++ standards, such as C++11, C++14, and C++17, are used. Qt is a framework that provides data types, macros, and functions that are guaranteed to compile to an equivalent data type on every supported operating system. Furthermore, Qt also allows user interfaces to be created in C++, which also work on every platform.

Communication between the FPGA and the library is handled by the FrontPanel framework, which provides pre-compiled VHDL (and Verilog) components and a C/C++ API. The pre-compiled modules can be instantiated in the firmware and are then interfaced via the API. The API is available for Windows, Linux, macOS, and ARM-based platforms. The framework hides the complexity of the USB 3.0 protocol, which is handled by an on-board ARM processor. Since the FrontPanel framework handles this processor, it appears as if the FPGA can be interfaced directly via the API.

Data analysis is performed mostly in Python using the Anaconda distribution from Continuum Analytics. Most of the data handling is implemented using the NumPy package. Plots are created using the Matplotlib and Seaborn packages. Clustering and data processing have been implemented using the SciPy and SciKit-Learn packages. Moreover, there are packages available for HDF5 file handling (PyTables), multiprocessing (multiprocessing), advanced data analysis (Pandas), etc. Time consuming functions can also be implemented in C via the Python API or the Cython package. Furthermore, the Root framework is used to create a detector equalisation using the C++ bindings, because the code is available and known to work. However, the same functionality is also implemented in Python.

For the adapter design, Altium Designer is used, which is one of the most common PCB design tools. It provides advanced features to handle differential signals, which is useful since almost all signals are differential signals. Furthermore, an Altium design can be converted to KiCAD in order to have it available as an open source design.

3.2 Approach

This project consists of the development of a readout system and its application, which was divided into 4 parts:

• development of a hardware adapter board interfacing the detector

• development of FPGA firmware interfacing the detector and providing data streaming via USB 3.0

• development of a multi-platform software library to control the readout system

• development of analysis tools to analyse and verify the readout system

The individual parts all depend on each other to some degree. For instance, the firmware requires the adapter board to be able to test the communication with the detector, and the library to test USB streaming. The firmware is simulated in order to verify its functionality. Opal Kelly provides VHDL procedures, which can be used to simulate the USB interface. However, the exact detector behaviour cannot be simulated, so the simulation is based on the detector’s manual. Moreover, the analysis tools are developed in parallel to the other parts.

The hardware adapter’s purpose is to connect the detector chip board, using a VHDCI connector, to the readout system prototyping board, using two SAMTEC connectors. It has to provide differential data channels, clocks, and control signals. Furthermore, it also needs to implement an ADC. The main problem is to convert the detector’s output signals from SLVS to LVDS as required by the FPGA. The chosen components are listed in Table 3.2. Additionally, an amplifier is connected to the detector’s DAC output, and resistors, capacitors, and beads are added for the power supply.

Table 3.2: Hardware components on the adapter board

Component        Manufacturer            Purpose
SN65LVDT125      Texas Instruments       4 channel SLVS to LVDS level converter
MCP3204-CI/SL    Microchip Technology    Multi-channel ADC with SPI interface
BTE/BSE-040      Samtec                  80 pin high density connector
VHDCI            -                       68 pin high density connector

The firmware cannot work on its own. It depends on data from the detector and the library. For simple testing, additional modules are added to the firmware, which are triggered and monitored via the FrontPanel API, which allows a basic interface to be created in XML. Furthermore, the functionality can be simulated. However, the firmware depends on the adapter and the library working properly. First of all, the adapter has to be tested and verified, in order to finish the detector interface. Afterwards, the focus shifts towards the USB interface, which has to be worked on in conjunction with the library. Lastly, the internal firmware control modules have to be implemented. These include, e.g., a flow control module handling data transmission between the host and the detector, and pre-processing modules, such as data decoders.

The software library serves several purposes: it handles the USB communication with the firmware, saves data received from it, configures the firmware and the detector, and prepares commands for the detector. Configuring the detector requires commands with different parameters compressed into the command’s payload. The idea is to handle the creation of the command payloads within the library, so that the user does not have to handle the bit shifting. Another goal is to let the library handle the configuration of high speed USB data streaming automatically. Therefore, an algorithm is implemented that allows the library to dynamically adjust the settings of the USB data transfer from the firmware to the library. Moreover, the library should also be able to configure the detector and save the used settings. Therefore, an additional class is implemented, which handles the configuration settings for the detector and is also able to save and load them.

The analysis tools are intended to process and visualise the obtained data in order to verify them. Therefore, functions to load, decode, and format pixel data are implemented. For a more advanced analysis, clustering and filtering are implemented. Furthermore, the analysis tools also should be able to equalise the detector’s pixel matrix using noise, calculate an energy calibration, globally and per-pixel, and do a time walk correction. The main challenges are handling the size of the data and reducing the processing time. In order to reduce the RAM utilisation, files can be loaded and processed sequentially.
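Sequential loading can be sketched with a Python generator that reads a file chunk by chunk. The example assumes that the raw file is a plain sequence of 48 bit packets stored as 6 bytes each in big-endian order. This layout is an assumption for illustration and not necessarily the file format used by the library. The function name is hypothetical.

```python
import io

PACKET_BYTES = 6  # one 48 bit packet; the on-disk layout is an assumption


def read_packets(stream, packets_per_chunk=4096):
    """Yield lists of packets (as integers) without loading the whole file.

    The stream is expected to be buffered, e.g. a file opened in 'rb' mode.
    """
    chunk_bytes = PACKET_BYTES * packets_per_chunk
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        usable = len(chunk) - len(chunk) % PACKET_BYTES  # drop a truncated tail
        packets = [
            int.from_bytes(chunk[i:i + PACKET_BYTES], "big")
            for i in range(0, usable, PACKET_BYTES)
        ]
        if packets:
            yield packets


# Usage with an in-memory stream standing in for a file on disk.
demo = io.BytesIO(bytes.fromhex("4ad8b200f5c6" "27b8aee74569"))
chunks = list(read_packets(demo, packets_per_chunk=1))
```

Each chunk can be decoded and reduced (e.g. clustered or histogrammed) before the next one is read, so the peak memory use depends on the chunk size rather than on the file size.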

3.3 Evaluation

3.3.1 Simulation setup

The firmware is simulated using the integrated ISIM tool in Xilinx ISE. Opal Kelly provides a set of VHDL procedures that mimic the timing behaviour of the real USB endpoints (see Section 2.3.1). Furthermore, the simulation is split into 2 parts: transmitter chain and receiver chain.

The transmitter chain covers data inputs via USB to serialisation of data to the detector. On the other hand, the receiver chain covers data outputs from the detector to transmission via USB.

Custom procedures were added to interact with the firmware. Moreover, verifiable dummy data are generated to be used in the simulation (see Tables 3.3a and b). The expected behaviour is described in Section 4.2. Furthermore, the Firmware Core is not included in this simulation, since those components were built to work with the actual detector. Simulating the detector’s behaviour is highly complex and therefore extremely prone to errors. However, the modules inside the Firmware Core have been tested and verified independently.

(a) Transmitter chain test data

No.  Dummy data       No.  Dummy data
1    0x4a_d8_b2_00    9    0x51_d6_e4_bf
2    0x27_b8_ae_e7    10   0xfd_76_f8_d6
3    0x10_51_ee_6a    11   0x50_b4_62_02
4    0xe0_28_16_6a    12   0x0a_89_ec_d6
5    0x91_43_bc_5f    13   0x02_7d_c7_ea
6    0xe4_be_70_26    14   0x3b_20_08_6d
7    0x36_31_6a_c7    15   0x80_ed_b2_2e
8    0x76_b9_a0_34    16   0x37_da_36_24

(b) Receiver chain test data

No.  Dummy data
1    0x4a_d8_b2_00_f5_c6
2    0x27_b8_ae_e7_45_69
3    0x10_51_ee_6a_3b_88
4    0xe0_28_16_6a_dd_bf
5    0x91_43_bc_5f_cf_71
6    0xe4_be_70_26_e8_21
7    0x36_31_6a_c7_d5_dd
8    0x76_b9_a0_34_0e_77

Table 3.3: Simulation test data for transmitter and receiver simulations
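Checking simulation output against the dummy data can be sketched as a word-by-word comparison. The helper names `parse_word` and `find_mismatches` are hypothetical, and how the simulated words are exported from the simulator is not covered here. The example uses the first eight transmitter words and the eight receiver words of Table 3.3; in this table, the upper 32 bits of each receiver word equal the transmitter word with the same number.

```python
TX_WORDS = [
    "0x4a_d8_b2_00", "0x27_b8_ae_e7", "0x10_51_ee_6a", "0xe0_28_16_6a",
    "0x91_43_bc_5f", "0xe4_be_70_26", "0x36_31_6a_c7", "0x76_b9_a0_34",
]
RX_WORDS = [
    "0x4a_d8_b2_00_f5_c6", "0x27_b8_ae_e7_45_69", "0x10_51_ee_6a_3b_88",
    "0xe0_28_16_6a_dd_bf", "0x91_43_bc_5f_cf_71", "0xe4_be_70_26_e8_21",
    "0x36_31_6a_c7_d5_dd", "0x76_b9_a0_34_0e_77",
]


def parse_word(text):
    """Convert an underscore-grouped hex string such as 0x4a_d8 to an integer."""
    return int(text, 16)


def find_mismatches(expected, observed):
    """Return (index, expected, observed) for every differing word.

    A length difference is reported as one extra entry with None values.
    """
    mismatches = [
        (i, e, o)
        for i, (e, o) in enumerate(zip(expected, observed))
        if e != o
    ]
    if len(expected) != len(observed):
        mismatches.append((min(len(expected), len(observed)), None, None))
    return mismatches


tx = [parse_word(w) for w in TX_WORDS]
rx = [parse_word(w) for w in RX_WORDS]
upper_32_bits = [word >> 16 for word in rx]
```

An empty list returned by `find_mismatches` indicates that the observed words match the expected ones in both value and order.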

3.3.2 Calibration setup

The setup used to acquire calibration data is shown in Figure 3.2. A fluorescence material is placed in front of the X-ray source at a 45° angle; the detector is placed below and shielded from the source by a lead plate. Furthermore, the X-ray source is configured to emit radiation with an energy of twice the Kα energy of the fluorescence material. The source’s tube current is adjusted based on the desired amount of data. Moreover, the detector is configured in ToT+ToA mode.
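The rule for the source energy can be written as a small calculation. The materials in the table below are illustrative assumptions, since this section does not name the fluorescence materials used; the energies are rounded literature values for the Kα1 lines, and the function name `source_energy_kev` is hypothetical.

```python
# Approximate K-alpha1 line energies in keV (rounded literature values).
# The choice of materials is an assumption made for illustration only.
K_ALPHA_KEV = {"Cu": 8.05, "Mo": 17.48, "Ag": 22.16}


def source_energy_kev(material, factor=2.0):
    """Return the source energy as a multiple (default: twice) of the K-alpha energy."""
    return factor * K_ALPHA_KEV[material]


for material in K_ALPHA_KEV:
    print(material, source_energy_kev(material), "keV")
```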

Figure 3.2: Measurement setup for calibration with fluorescence (labelled parts: X-ray source, fluorescence material, detector, detector board, readout system, lead shielding; marked angle: 90°)

3.3.3 Time-walk correction setup

Time-walk correction requires clusters of 2-4 pixels with a known energy. Therefore, an Am-241 source is placed above the detector at a distance of approximately 5 cm, with a paper filter for alpha particles in between (Figure 3.3). The detector is configured in ToT+ToA mode and the measurement is run for as long as possible.
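Selecting clusters of 2-4 pixels can be sketched as follows. This is a simplified illustration and not the clustering implemented in the analysis tools: it groups hits purely by 8-connected pixel adjacency and ignores the time of arrival, which a real clustering step would also have to take into account. The function names are hypothetical.

```python
def cluster_hits(hits):
    """Group (x, y) pixel hits into clusters of 8-connected neighbours."""
    remaining = set(hits)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = [seed]
        stack = [seed]
        while stack:
            x, y = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (x + dx, y + dy)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        cluster.append(neighbour)
                        stack.append(neighbour)
        clusters.append(sorted(cluster))
    return clusters


def select_by_size(clusters, min_size=2, max_size=4):
    """Keep only clusters whose pixel count lies in [min_size, max_size]."""
    return [c for c in clusters if min_size <= len(c) <= max_size]


# Example: one 3-pixel cluster, one single hit, and one 5-pixel track.
hits = [(10, 10), (10, 11), (11, 11), (50, 50),
        (100, 100), (101, 100), (102, 100), (103, 100), (104, 100)]
selected = select_by_size(cluster_hits(hits))
```

In this example only the 3-pixel cluster is kept; the single hit and the 5-pixel track fall outside the 2-4 pixel range.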

Figure 3.3: Measurement setup for time-walk correction data acquisition. (a) Side view: readout system, detector, detector board, americium source, filter (paper). (b) Top view: detector board, detector, filter (e.g. paper), source.

3.3.4 Imaging setup

For imaging, an object is placed between the source and the detector. An X-ray tube with a small focal spot has to be used, and the object should be placed as close to the detector as possible to minimise any magnification and distortion of the object. The setup is shown in Figure 3.4. To improve the quality of the image, flat field correction can be applied (see Section 2.2.6). The bright field can be measured using the same setup without any object. Furthermore, imaging can be done either in ToT+ToA mode, which keeps the spectral information of the object, or in photon counting mode.
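A simplified form of flat field correction can be sketched as a per-pixel division by the bright field, rescaled by the bright-field mean. This sketch assumes that no dark-field term is needed and may differ from the formulation in Section 2.2.6; the function name `flat_field_correct` is hypothetical, and images are represented as plain nested lists to keep the example self-contained.

```python
def flat_field_correct(image, bright_field):
    """Divide each pixel by the bright field and rescale by the bright-field mean.

    Pixels with a non-positive bright-field value are set to 0.0.
    """
    values = [v for row in bright_field for v in row]
    mean_bright = sum(values) / len(values)
    corrected = []
    for image_row, bright_row in zip(image, bright_field):
        corrected.append([
            pixel * mean_bright / bright if bright > 0 else 0.0
            for pixel, bright in zip(image_row, bright_row)
        ])
    return corrected


# 2x2 example: the right column is twice as sensitive as the left column.
bright = [[100, 200], [100, 200]]
image = [[50, 100], [100, 200]]
corrected = flat_field_correct(image, bright)
```

After the correction, both pixels in each row have the same value, so the column-wise sensitivity difference seen in the bright field is removed.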


Abstract

The proposed readout system has been implemented and was verified in simulation and in experiments with X-ray radiation. USB 3.0 data streaming performed better than anticipated, reaching higher speeds than stated by Opal Kelly. Furthermore, the firmware and the library function as intended. The hardware adapter requires some changes to accommodate higher data speeds, and the data channels have to be moved to different pins to allow synchronisation to an external clock.

Sammanfattning

The described readout system was implemented and verified in simulation and experimentally using radiation sources. Data streaming over USB 3.0 performed better than expected and reached higher speeds than those stated by Opal Kelly. The firmware and the library work as expected. The adapter board works but requires some changes to allow higher data rates. In addition, the data channels must be moved to different inputs in order to synchronise them to an external clock.

Acknowledgements

I would like to thank my supervisor, David Krapohl, for his guidance and support during my studies, Dima Maneuski for his help during the development and the discussions we had, and Najeem Lawal for his help concerning FPGA development, especially in the beginning of this project.

I would also like to thank Xavier Llopart and Tuomas Poikela, whom I annoyed a few times about issues with the Timepix3 chip. Furthermore, I want to thank all the others whom I met at the Medipix meeting in Glasgow and at the iWoRiD 2018 conference for the really interesting and informative discussions, especially Stuart George from NASA, Val O’Shea from the University of Glasgow, and Ian Horsewell from Diamond Light Source. Lastly, I would like to thank all the others in the radiation sensor systems research group at Mid Sweden University, the rest of the Medipix collaboration for their support and interest, and Florian Gebben for proofreading this thesis.

Contents

Abstract
Sammanfattning
Acknowledgements
Contents

1 Introduction
1.1 Background and problem motivation
1.2 High-level problem statement
1.3 Scope
1.4 Detailed problem statement
1.5 Outline
1.6 Contributions

2 Theory and Related Work
2.1 Timepix3
2.1.1 Timepix3 detector
2.1.2 Timepix3 protocol
2.2 Physical phenomenons and compensation
2.2.1 Charge sharing
2.2.2 fToA propagation delay
2.2.3 Detector equalisation
2.2.4 Detector calibration
2.2.5 Spectroscopy
2.2.6 Flat field correction
2.2.7 Time-walk correction
2.3 Standards and interfaces
2.3.1 USB 3.0 via the FrontPanel framework
2.3.2 8b/10b data encoding
2.3.3 LVDS and SLVS signalling
2.4 Related work

3 Methodology
3.1 Components, tools and software
3.2 Approach
3.3 Evaluation
3.3.1 Simulation setup
3.3.2 Calibration setup
3.3.3 Time-walk correction setup
3.3.4 Imaging setup

4 Implementation
4.1 Hardware design
4.2 Firmware design
4.2.1 Detector interface
4.2.2 USB interface
4.2.3 System clocks
4.2.4 Pin driver
4.2.5 Shutter control
4.2.6 Filter core
4.2.7 Flow control
4.2.8 Pre-processing
4.2.9 Processing interface
4.2.10 Counters and debug modules
4.3 Library design
4.3.1 Packet creator