
An FPGA implementation of neutrino track detection for the IceCube telescope


Master's thesis in Computer Engineering by

Carl Wernhoff

LiTH-ISY-EX–10/4174–SE


Master's thesis in Computer Engineering carried out at Linköpings Tekniska Högskola

by Carl Wernhoff
LiTH-ISY-EX--10/4174--SE

Linköping 2010

Supervisors: Christian Bohm, Per Olof Hulth, Stockholms Universitet
Examiner: Olle Seger, Linköpings Tekniska Högskola


Presentation date

2010-03-10

Publication date (electronic version)

2010-03-24

Institutionen för systemteknik / Department of Electrical Engineering

URL for electronic version

http://www.ep.liu.se

Title of publication

An FPGA implementation of neutrino track detection for the IceCube telescope

Author

Carl Wernhoff

Abstract

The IceCube telescope is built within the ice at the geographical South Pole in the middle of the Antarctica continent. The purpose of the telescope is to detect muon neutrinos, the muon neutrino being an elementary particle with minuscule mass coming from space.

The detector consists of some 5000 DOMs registering photon hits (light). A muon neutrino traveling through the detector might give rise to a track of photons making up a straight line, and by analyzing the hit output of the DOMs, looking for tracks, neutrinos and their direction can be detected.

When processing the output, triggers are used. Triggers are calculation- efficient algorithms used to tell if the hits seem to make up a track - if that is the case, all hits are processed more carefully to find the direction and other properties of the track.

The Track Engine is an additional trigger, specialized to trigger on low- energy events (few track hits), which are particularly difficult to detect. Low-energy events are of special interest in the search for Dark Matter.

An algorithm for triggering on low-energy events has been suggested. Its main idea is to divide time into overlapping time windows, find all possible pairs of hits in each time window, calculate the spherical coordinates θ and ϕ of the position vectors of the hits of the pairs, histogram the angles, and look for peaks in the resulting 2d-histogram. Such peaks would indicate a straight line of hits and, hence, a track.

It is not believed that a software implementation of the algorithm would be fast enough. The Master's Thesis project has had the aim of developing an FPGA implementation of the algorithm.

Such an FPGA implementation has been developed. Extensive tests on the design have yielded positive results showing that it is fully functional. The design can be synthesized to about 180 MHz, making it possible to handle an incoming hit rate of about 6 MHz, giving a margin of more than a factor of two over the expected average hit rate of 2.6 MHz.

Keywords

FPGA, IceCube, neutrino, telescope, South Pole, trigger, Track Engine

Language: Other (English)
Number of pages: 87
Type of publication: Examensarbete (Master's thesis)
ISBN (licentiate thesis): —
ISRN: LiTH-ISY-EX--10/4174--SE
Series title (licentiate thesis): —


Contents

1 Abstract 1

2 Terms and abbreviations 2

3 Introduction 4

3.1 Background . . . 4

3.1.1 The neutrino elementary particles . . . 4

3.1.2 The IceCube telescope . . . 5

3.1.3 Triggering and the Data Acquisition System . . . 6

3.1.4 The Track Engine . . . 7

3.2 Aim . . . 7

3.3 Method . . . 8

3.3.1 Research and reference material . . . 8

3.3.2 FPGA implementation . . . 8

4 Analysis 9

4.1 The need of the Track Engine . . . 9

4.1.1 Hits . . . 9

4.1.2 Local coincidence hits . . . 9

4.1.3 Read-out . . . 10

4.1.4 Triggering with the Track Engine . . . 10

4.2 Premises . . . 10

4.2.1 Interfaces to existing systems . . . 10

4.2.2 IceCube Coordinate System . . . 12

4.3 Track Engine algorithm . . . 12

4.3.1 The idea . . . 13

4.3.2 A 2-dimensional example . . . 15

4.3.3 The algorithm . . . 15

4.4 Performance . . . 17

4.4.1 Hit frequency and pair frequency (fh and fp) . . . 17

4.4.2 Number of hits and pairs (nh and np) in a time window . . . 19

4.4.3 Highest achievable number of hits per time window . . . 20

4.5 Placing pairs in angle bins . . . 20


4.5.1 Simple binning . . . 20

4.5.2 The binning problem . . . 21

4.5.3 Suggested binning . . . 22

4.5.4 Analytical expressions for the suggested binning . . . 23

4.6 Time sorting . . . 26

4.7 Implementation options . . . 26

4.7.1 Software (PCs) . . . 26

4.7.2 Logic (FPGA) . . . 28

4.7.3 Comments . . . 28

4.8 Borderline events . . . 28

5 The implementation 30

5.1 Full-TE system overview . . . 30

5.2 Introduction to FPGAs . . . 31

5.3 Hardware used . . . 32

5.3.1 The FPGA . . . 32

5.3.2 The embedded processor . . . 32

5.3.3 The evaluation board . . . 33

5.4 Communication within TE, data rates . . . 33

5.4.1 Currently used encoding of hits in IceCube . . . 33

5.4.2 Suggested encodings, resulting data rates . . . 34

5.5 The VHDL code . . . 35

5.5.1 Code amount . . . 36

5.5.2 VHDL code conventions used . . . 37

5.6 Internal representations . . . 38

5.6.1 DOM IDs . . . 38

5.6.2 Time stamps . . . 38

5.6.3 Position coordinates, lengths . . . 39

5.7 TE core and its units . . . 39

5.7.1 Input and output ports, other signal types . . . 39

5.7.2 System overview . . . 40

5.7.3 The preBuffer unit . . . 43

5.7.4 The pairProd unit . . . 43

5.7.5 The speedCrit unit . . . 47

5.7.6 The angles unit . . . 49

5.7.7 The preHistBuffer unit . . . 55

5.7.8 The histStage unit . . . 57

5.8 Comments on the size of TWbuffer . . . 62

5.8.1 Motivation . . . 63

5.8.2 Consequences of larger TWbuffer . . . 65

5.8.3 Behavior when TWbuffer is full . . . 65

5.9 BlockRAM usage . . . 65

5.10 Synthesis . . . 66


5.10.2 Area, logic utilization . . . 66

5.11 Testing with “TEtest” . . . 66

5.11.1 Testing of the histogram unit . . . 71

5.11.2 Testing the whole TE core design (system unit) . . . . 71

5.11.3 “Problematic track” problem . . . 74

6 Results and suggestions 79

6.1 Realizing the TE algorithm in hardware . . . 79

6.2 Realizing the TE algorithm in software . . . 79

6.3 Interfacing the TE to existing systems . . . 79

6.4 Possible improvements of the TE . . . 80

6.4.1 String geometry and coordinate system . . . 80

6.4.2 Rectangular-shaped bins . . . 80

6.5 Borderline events-problem . . . 80


Chapter 1

Abstract

The IceCube telescope is built within the ice at the geographical South Pole in the middle of the Antarctica continent. The purpose of the telescope is to detect muon neutrinos, the muon neutrino being an elementary particle with minuscule mass coming from space.

The detector consists of some 5000 DOMs registering photon hits (light). A muon neutrino traveling through the detector might give rise to a track of photons making up a straight line, and by analyzing the hit output of the DOMs, looking for tracks, neutrinos and their direction can be detected.

When processing the output, triggers are used. Triggers are calculation-efficient algorithms used to tell if the hits seem to make up a track—if that is the case, all hits are processed more carefully to find the direction and other properties of the track.

The Track Engine is an additional trigger, specialized to trigger on low-energy events (few track hits), which are particularly difficult to detect. Low-energy events are of special interest in the search for Dark Matter.

An algorithm for triggering on low-energy events has been suggested. Its main idea is to divide time into overlapping time windows, find all possible pairs of hits in each time window, calculate the spherical coordinates θ and ϕ of the position vectors of the hits of the pairs, histogram the angles, and look for peaks in the resulting 2d-histogram. Such peaks would indicate a straight line of hits and, hence, a track.

It is not believed that a software implementation of the algorithm would be fast enough. The Master’s Thesis project has had the aim of developing an FPGA implementation of the algorithm.

Such an FPGA implementation has been developed. Extensive tests on the design have yielded positive results showing that it is fully functional. The design can be synthesized to about 180 MHz, making it possible to handle an incoming hit rate of about 6 MHz, giving a margin of more than a factor of two over the expected average hit rate of 2.6 MHz.


Chapter 2

Terms and abbreviations

ALU Arithmetic Logic Unit, the unit of a CPU performing binary arithmetic and logical operations.

Bin A “box” on the unit sphere. The unit sphere is divided into a couple of hundred bins.

BlockRAM or BRAM Resources of RAM memory (volatile working-memory) within the FPGA chip, available for the logic to use.

CPU Central Processing Unit, common term for a processor.

DAQ Data Acquisition System, a fundamental part of the IceCube detector, which, whenever a trigger has triggered, analyzes hits that occurred within the time window triggered on.

DOM Digital Optical Module, the photon-detecting units of the detector which along with cabling make up the strings. The DOMs consist of a PMT and electronics to handle data acquisition and communications with the systems on the surface.

Event Common name for hits caused by physics events such as traveling particles, as opposed to hits caused by PMT noise.

FIFO buffer First In, First Out buffer, a buffer storing elements and outputting them in the order they were inputted.

FPGA Field Programmable Gate Array, an integrated circuit configured by the designer after manufacturing to implement digital circuits.

ICL IceCube Laboratory, a building at the surface above the IceCube detector housing hardware and computer equipment for communicating with the DOMs of the detector and for data analysis.


LC, Local Coincidence A possible attribute of a hit, meaning that the DOM over or under the DOM registering the hit also registered a hit simultaneously.

LUT Look-Up Table, a data structure used to replace calculations with looking-up pre-calculated values.

Neutrino Common name for the elementary particles electron neutrino, muon neutrino and tau neutrino. Neutrinos are electrically neutral, are able to pass through ordinary matter, travel with almost the speed of light and have a minuscule (but non-zero) mass.

PMT Photomultiplier Tube, a very sensitive detector of light, able to detect individual photons. A PMT is part of each DOM in the detector.

RAM Random Access Memory, volatile working-memory.

ROM Read-only Memory, a memory that can only be read and not written to. For ROM in FPGAs, BlockRAM resources are used.

String The vertical structures of cabling and DOMs making up the detector.

VHDL VHSIC Hardware Description Language (VHSIC is an acronym for Very High Speed Integrated Circuit), a language for describing digital logic.


Chapter 3

Introduction

3.1 Background

IceCube is a neutrino telescope being built on Antarctica. The IceCube project is a large international collaboration of researchers and universities. In Sweden, the physics department at Stockholm University and a research group at Uppsala University are involved in the project.

The project is mainly funded by the National Science Foundation (US), but is also partly funded by Knut and Alice Wallenberg Foundation and the Swedish Polar Research Secretariat.

3.1.1 The neutrino elementary particles

The neutrino elementary particles are created in various nuclear reactions, such as those that take place in the sun and in other parts of space. Their mass is close to zero and they travel with approximately the speed of light. There are three kinds of neutrinos; electron, muon and tau neutrinos, each of them also having an antiparticle. IceCube is most sensitive to the muon neutrinos.

Neutrinos seldom interact with other particles. This property makes them hard to detect, but it also makes them particularly interesting since their travel through space is not affected by magnetic fields and since they even travel through matter. Every second, billions of neutrinos pass through the human body.

By learning more about neutrinos, their origins and attributes, our knowledge of the universe and of such things as supernova explosions, gamma-ray bursts and black holes grows. We might also be given clues to what Dark Matter is, and what its properties are.


Figure 3.1: The Amundsen-Scott South Pole Station. (Photo from the IceCube Collaboration.)

Figure 3.2: The IceCube telescope with an illustrated track. Colored bulbs represent detected photon hits. (Graphics from the IceCube Collaboration.)

3.1.2 The IceCube telescope

IceCube is a neutrino telescope being built within the ice close to the geographical South Pole in Antarctica. An American research station, the Amundsen-Scott South Pole Station (Figure 3.1), is situated right beside the detector, and it contains the base camp for building and maintaining the detector.

In the (very seldom) event of a muon neutrino colliding with a proton or a neutron in a water molecule, a muon is produced, and that muon will have roughly the same direction as the neutrino had. When traveling through ice, the muon will radiate a so-called Cherenkov Cone of photons. It is these photons that are detected in IceCube. Since they make up a track, the direction of the track can be found. Figure 3.2 shows the detector during a muon track event.

The photons are detected by Photo Multiplier Tubes (PMTs), which are built into units called Digital Optical Modules (DOMs) (Figure 3.3). The detector will consist of 5200 DOMs when it is completed, spread out in a volume of 1 km3 of ice, at depths of 1500 to 2500 meters below the surface.



Figure 3.3: A DOM just before deployment. (Photo from the IceCube Collaboration.)

Figure 3.4: The ICL. (Photo from the IceCube Collaboration.)

DOMs and their cabling make up vertical strings. On a string, the DOMs are spaced 17 meters2 (vertically), and the strings are spaced about 125 meters (laterally).

Currently, a large part of the strings has been deployed in the ice, with the last strings to be deployed in the 2010/2011 winter season. The detector is already running with the deployed DOMs, which are producing hits giving scientific data.

3.1.3 Triggering and the Data Acquisition System3

Hit data is collected by the Data Acquisition System (DAQ), housed in the IceCube Laboratory (ICL) (Figure 3.4), a building at the surface. The DAQ is implemented in software running on a large farm of industrial PCs. Neutrinos are looked for by searching for photon hits seeming to build up straight tracks.

A full analysis of all hits in the detector is impossible to perform continuously in real-time due to the large number of hits. Software programs known as triggers are used to make a rough estimate of whether the hits in the detector, at any single moment, might be of interest or not. Only if they seem to be are all hits read out from the detector and an analysis of those hits performed.

The current triggers only look at a few percent of the hits: the highest-quality ones4.

2 Six so-called Deep Core strings in the middle of the detector, not yet deployed, will have less vertical spacing for increased resolution.

3 A more detailed discussion of these issues and of the need of a Track Engine can be found in Section 4.1.


The main strategy for triggering is currently to look at the multiplicity (number) of such hits. Only if it is above a defined threshold does the detector trigger, and an analysis of all hits within this time window is performed.

Many muons traversing the detector are not caused by neutrinos. There are two ways of filtering out those tracks: first, one can regard only the upward-going tracks5, and, second, one can regard only the tracks that start in the detector.

When the detector is completed, it is expected that some 50 000 neutrinos will be found each year.

3.1.4 The Track Engine

Triggering with the strategy of applying a multiplicity condition to the number of the highest-quality hits is simple, and possible to perform continuously in real-time, which is necessary. However, the existing triggers are typically not very good at triggering on dim tracks, that is, tracks with few photon hits, caused by low-energy muons.

The suggested Track Engine is a trigger that will take all hits into account, and that will not only consider the multiplicity, but also analyze whether the hits seem to make up a straight track. The challenge is to be able to perform such calculations in real-time for the very large flow of hit data at hand.

A rather simple algorithm has been suggested, but it has been believed that a software implementation will still be too slow to run continuously in real-time, at least if running on a single PC. Therefore, the Track Engine will either have to be implemented in hardware, or some distributed solution using several PCs running software will have to be used.

3.2 Aim

There has been one primary and two secondary aims of the Master’s Thesis project. The primary aim has been to

• make an FPGA design that implements as large a part of the Track Engine as possible,

and the secondary aims, to

• evaluate both options of realizing the Track Engine in software (PCs) or hardware (FPGA)

• investigate how the Track Engine should interface to the existing systems of the IceCube telescope.

4 Those hits being the local coincidence hits. This will be further described in Section 4.1.

5 Neutrinos are the only particles able to travel through the whole Earth; hence, upward-going muons must be caused by neutrinos.


3.3 Method

3.3.1 Research and reference material

There was no complete documentation of e.g. the DAQ. Although such systems are in use and have been for some time, they are still under development, which is one cause of the difficulty of finding detailed documentation. For details, the developers were contacted personally.

3.3.2 FPGA implementation

The design is coded in VHDL, a hardware description language (HDL). The ModelSim software has been used for simulation and Xilinx’s ISE has been used for synthesis.

For the VHDL coding, the design flow used in the beginning was typically

Coding → Test bench coding → Compiling for simulation → Running simulation → Synthesis

but as the work evolved, a work flow of

Coding → Compiling for simulation → Synthesis → Test bench coding → Compiling for simulation → Running simulation

appeared to be more convenient. Synthesizing before simulating makes sense since the timing performance and resource usage results tell if the chosen design strategy is at all possible, whereas simulation runs typically reveal errors in details of the design. However, before synthesis, the code was compiled in the simulation tool to reveal syntactical errors.


Chapter 4

Analysis

4.1 The need of the Track Engine

A more detailed discussion on the need of the Track Engine follows.

4.1.1 Hits

When a PMT in a DOM detects a photon, it is said that there is a hit in the DOM. When a hit occurs, the DOM will store both the waveform of the PMT signal and the exact time when the hit occurred. The hit will be stored in a buffer so that it can be sent up to the surface later.

The PMTs are said to have a noise rate of 500 Hz, meaning that each PMT and hence each DOM in the ice registers a hit about 500 times per second. Only a small fraction of the hits corresponds to photons originating from neutrino events.

With 5200 DOMs, 500 Hz noise rate for each DOM and waveforms stored for each hit, it can be understood that the total raw data rate produced in the detector is very large.

4.1.2 Local coincidence hits

A simple and rough measure of the quality of a hit is the local coincidence (LC). A hit is an LC hit if there is another simultaneous1 hit in any of the neighboring DOMs; neighboring meaning the nearest DOM up or down along the string2. As can be understood, LC-hits appear in pairs.

If a hit is an LC-hit, it is much more likely to be the result of a physics event of interest than random noise. About 2% of the hits in the telescope are LC-hits.

1 “Simultaneous” being defined by a limit on the difference of the hit times.

2 There is support for loosening this condition by considering DOMs further away (e.g. two DOMs up or down the string) as neighboring DOMs.


4.1.3 Read-out

When a trigger triggers, it issues a trigger signal, causing the DAQ to perform a read-out of all hits (also non-LC hits) in the detector within the time window triggered on. The hits are then analyzed in order to find out whether they make up a track or not and, if so, what its direction and possibly other properties are.

Data to be stored is stored on tape, flown to the McMurdo base on the coast of the continent, shipped to the US and finally made accessible at the University of Wisconsin. Part of the data is sent to the University of Wisconsin over a satellite link directly from the Amundsen-Scott South Pole Station.

4.1.4 Triggering with the Track Engine

As mentioned, the existing triggers are likely to miss dim tracks caused by low-energy muons.

The Track Engine is an additional trigger suggested by Prof. Dave Nygren, Berkeley Laboratories (during 2008 a guest researcher at Stockholm University).

By a simple algorithmic approach it shall be possible to trigger also on the very dim tracks. This will make it possible to detect neutrinos of much lower energies than is otherwise possible. Finding those neutrinos is especially interesting in the search for Dark Matter.

4.2 Premises

4.2.1 Interfaces to existing systems

In the ICL, computers known as stringHubs communicate with the DOMs. The stringHubs are industrial PCs equipped with a number of so-called DOR (DOM Read-out) cards that handle the low-level communication with the DOMs.

Besides acting as the interface to the DOMs, the stringHubs also have the important task of re-calculating the time stamps of all hits from DOM time to IceCube time.

When detecting a hit, the DOM will tag the hit with a time stamp according to its internal clock. The clocks within the DOMs will, however, be slightly out of phase, and the stringHubs therefore have a system of querying the DOMs about their internal time. By keeping track of the offset from the global time used, called the IceCube time, the stringHub can correct the time stamps it receives. This process is known as the time transformation.

The DAQ is the system used for analyzing hits and storing away the result. An overview of the detector, the stringHubs and the DAQ can be found in Figure 4.1.


Figure 4.1: The detector, string hubs and DAQ. Current layout (above) and with the TE added (below).


Figure 4.2: The definition of zenith (θ) and azimuth (ϕ)

From the above discussion it is found that the Track Engine will have to interface to the stringHubs, in order to receive the hits, and to the DAQ, which it should provide the trigger signal to. The trigger signal should contain the time window of interest, within which the hits will be read out. The stringHub computers have an unused ethernet port that the Track Engine could use to receive hits through.

The DAQ is typically interfaced to by ethernet as well.

4.2.2 IceCube Coordinate System

A cartesian coordinate system known as the IceCube Coordinate System is used within the IceCube collaboration. The coordinate system is most important when it comes to the exact location of each DOM.

The coordinate system has its origin in the middle of the detector with the positive z-axis pointing up.

The zenith (θ) and azimuth (ϕ) angles are defined according to Figure 4.2. These definitions are the same as for θ and ϕ in (a common definition of) the spherical coordinate system. The intervals used for the angles are:

θ ∈ [0, π[ ϕ ∈ [0, 2π[
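As an illustration (not part of the thesis), the conversion from a direction vector to these angles can be sketched in Python as follows; the function name and the exact sign conventions for the pair direction are my own choices.

# Sketch (my own, not thesis code): zenith and azimuth of a direction vector
# under the IceCube Coordinate System convention (z-axis up, theta in [0, pi),
# phi in [0, 2*pi)).
import math

def zenith_azimuth(dx, dy, dz):
    """Return (theta, phi) in radians for the direction (dx, dy, dz)."""
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.acos(max(-1.0, min(1.0, dz / r)))  # zenith: angle from the +z axis
    phi = math.atan2(dy, dx) % (2 * math.pi)        # azimuth: angle in the xy-plane
    return theta, phi

# A direction along +x lies in the horizontal plane: theta = pi/2, phi = 0.
print(zenith_azimuth(1.0, 0.0, 0.0))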

4.3 Track Engine algorithm

The Track Engine algorithm was suggested by Prof. David Nygren. Its aim is to determine whether the hits in the detector during some time window are random noise or whether they seem to make up a straight line – a track, originating from a muon.


Figure 4.3: Time windows

4.3.1 The idea

The maximum lifetime of a track within the detector should depend on the velocity of the particle and the size of the detector. A time window of width

Tw = 5 µs

has been suggested. The time window is to be slid every 1 µs. This will lead to a division of the hits into (overlapping) time windows according to Figure 4.3.

For each time window, all possible pairs of hits are studied. All such pairs are considered candidates of belonging to a track.

With only noise hits, for the number of hits in a time window nh,tw, we expect nh,tw = fh·Tw = 13 on average.
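To make the window division concrete, here is a small sketch (my own illustration, not from the thesis) of which overlapping windows a given hit falls into, using Tw = 5 µs and a 1 µs slide; the window numbering is an assumption.

# Assumed numbering: window k covers [k*T_SLIDE, k*T_SLIDE + T_W) microseconds.
import math

T_W = 5.0      # window width in microseconds
T_SLIDE = 1.0  # slide step in microseconds

def windows_for_hit(t_hit):
    """Indices k of all time windows containing a hit at time t_hit (in us)."""
    k_max = math.floor(t_hit / T_SLIDE)               # last window starting at or before the hit
    k_min = math.floor((t_hit - T_W) / T_SLIDE) + 1   # first window still covering the hit
    return list(range(max(k_min, 0), k_max + 1))

print(windows_for_hit(7.3))   # a hit at 7.3 us falls in the windows starting at 3..7 us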

Some pairs can be immediately excluded because of the geometrical distance between them (between the DOMs that detected them), the difference of the time stamps and the expected speed of the particle. If these quantities do not make sense together, the pair is thrown away. Currently, the limits [0.3c, 1.0c] are used for the implied speed v, c being the speed of light. Although the muon travels with the speed of light, the spread of the Cherenkov Cone and possible refraction in the ice before the photon reaches the PMT cause the lower limit of 0.3c to make sense.

For the remaining pairs, it needs to be determined whether they seem to make up a track or not.

For each pair the zenith (θ) and azimuth (ϕ) angles are calculated. Now, if all the hits are random noise, no θ or ϕ angle should be more frequent than any other. However, if some of the hits are hits within a track, the θ and ϕ for the track should be more common than the other angles. Histogramming the angles is a good way of seeing how they are distributed. This idea is shown in an example in the section below.

Figure 4.4: θ and ϕ angles histogrammed in a 2-dimensional histogram. A peak indicating a track can be seen at (θ, ϕ) ≈ (130°, 250°).

A 2-dimensional histogram over θ and ϕ is created. θ and ϕ are discretized, leading to a number of so-called bins. Figure 4.4 shows what such a histogram might look like3. It can be understood that building the 2d-histogram is the same as keeping track of the number of pairs for each and all of the bins.

For all pairs, their bin is determined, and when a complete time window of pairs has been processed, there is for each bin information on how many pairs were placed in that bin. One additional piece of information is also necessary for each bin: whether at least one of the pairs placed in the bin had at least one LC-tagged hit.

When the whole time window of pairs has been processed, all bins with at least nhist.thresh.4 pairs, with at least one pair with one LC-tagged hit, are considered to correspond to a detected track.

There may be none, one or several detected tracks in a time window. If there was at least one, a trigger signal to the DAQ is sent for that time window (see further details below).

3 In reality, the binning is not as indicated in the figure. The binning actually used is described in Section 4.5.


4.3.2 A 2-dimensional example

The idea of the TE algorithm is shown in a 2-dimensional example in Figure 4.5.

The direction of a possible track is called ϕ, defined as the angle between the track and a horizontal line.5

With only noise, no significant peaks appear in the histogram. With a (weak) track signature as in the middle graph, a peak around 25° appears in the histogram. Also more than one track signature in each time window can be detected, as shown in the rightmost graph.

For the graphs, 13 noise hits and 13 hits for each track signature were used, with the number of noise hits thus corresponding to nh,tw.

The track signatures used were:

l1: y = 0.5x + 0.2

l2: y = −0.7x + 1

Further, we have

ϕ1 = arctan(0.5) ≈ 27◦

ϕ2 = arctan(−0.7) + 180◦ ≈145◦

which corresponds to the peaks in the histograms.

Note: The detector is, very roughly, 1 km × 1 km × 1 km. The unit square in Figure 4.5 can be thought of as being 1 km × 1 km. The standard deviation σ used when spreading the track hits around the ideal lines l1 and l2 was σ = 0.05. 0.05 km = 50 m, comparable to the optical absorption length in the ice, λabs ≈ 100 m, and to the effective scattering length of photons in the ice, λeff ≈ 25 m.6

4.3.3 The algorithm

The Track Engine algorithm is applied to all hits within each time window, and, written out point by point, it is as follows:

1. Find all pairwise combinations of the hits.

5 The interval used for ϕ is [0, π], which means that no information on the direction of the track, which would require an interval of [0, 2π], is at hand. In the real case, the direction of the track is important.

6 λabs and λeff from “The IceCube Data Acquisition System: Signal Capture, Digitization, and Timestamping”.


[Figure 4.5: three panels – “noise”, “noise and track signature l1” and “noise and track signatures l1 and l2” – each with its ϕ histogram (phi [deg] vs. multiplicity); 13 noise samples, 13 line samples per track, (k1, m1, k2, m2) = (0.5, 0.2, −0.7, 1), (ϕ1, ϕ2) ≈ (26.6°, 145.0°), (σ1, σ2) = (0.05, 0.07).]


2. For all pairs, find the geometrical distance l between the DOMs for the two hits and the time difference ∆t between the two time stamps. Find the implied velocity v and discard all pairs not conforming to

v := l/∆t ∈ [0.3c, 1.0c] ,

c being the speed of light.

3. Find the zenith angle θ of all pairs.

4. Find the azimuth angle ϕ of all pairs.

5. Find the bin of the (θ, ϕ) combination for all pairs.

6. Build a histogram over the bins for all pairs in the time window—or, equivalently—for each bin, count how many pairs were assigned to that bin.

7. For all bins containing at least nhist.thresh. pairs, of which at least one pair has one LC-tagged hit, consider this to correspond to a detected track.

8. If there was at least one detected track in the time window, send out a trigger signal from the Track Engine.

The trigger signal is a packet of data containing:

• The start and end times of the time window
• The total number of pairs in the time window
• For all detected tracks (but maximum 10):
  – The bin number (corresponding to intervals for θ and ϕ)
  – The number of pairs in the bin
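The following behavioral sketch summarizes steps 1–7 for one time window (my own Python rendering, analogous to the thesis's Matlab simulations, not the VHDL design; dom_xyz, bin_of and N_HIST_THRESH are placeholders for the DOM coordinate table, the binning of Section 4.5 and the histogram threshold).

import math

C = 0.2998                     # speed of light in m/ns
V_MIN, V_MAX = 0.3 * C, 1.0 * C
N_HIST_THRESH = 4              # placeholder value; the real threshold is a design parameter

def process_window(hits, dom_xyz, bin_of):
    """hits: time-sorted list of (dom_id, t_ns, lc_flag).
    Returns a list of (bin, number_of_pairs) for every detected track."""
    counts, has_lc = {}, {}
    for i in range(len(hits)):
        for j in range(i + 1, len(hits)):                   # step 1: all unordered pairs
            (d1, t1, lc1), (d2, t2, lc2) = hits[i], hits[j]
            p1, p2 = dom_xyz[d1], dom_xyz[d2]
            l = math.dist(p1, p2)
            dt = abs(t2 - t1)
            if dt == 0 or not (V_MIN <= l / dt <= V_MAX):   # step 2: speed criterion
                continue
            dx, dy, dz = p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2]
            theta = math.acos(max(-1.0, min(1.0, dz / l)))  # step 3: zenith
            phi = math.atan2(dy, dx) % (2 * math.pi)        # step 4: azimuth
            b = bin_of(theta, phi)                          # step 5: bin of the pair
            counts[b] = counts.get(b, 0) + 1                # step 6: histogram
            has_lc[b] = has_lc.get(b, False) or lc1 or lc2
    # step 7: bins over threshold that contain at least one pair with an LC-tagged hit
    return [(b, n) for b, n in counts.items() if n >= N_HIST_THRESH and has_lc[b]]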

4.4 Performance

4.4.1 Hit frequency and pair frequency (fh and fp)

From the algorithm we can see that the Track Engine calculations are centered around pairs. We want to determine what rate of pairs we will have to be able to handle for a realization of the algorithm.

Call the rate of pairs fp. Call the rate of hits (earlier called the noise rate) fh.


Consider the time window as a buffer always containing all hits not too old. The pairs are produced by pairing each incoming hit with all the hits in the time window. The time window contains fh·Tw hits. The number of hits entering every second to be paired with those hits is fh. Hence we have

fp = fh²·Tw .   (4.1)

For example, with some reasonable numerical values,

fh = 5200 DOMs · 500 Hz/DOM = 2.6 MHz

Tw = 5 µs

we get

fp ≈30 MHz ,

that is, 30 million pairs each second to process for the expected average noise rate.

We also want to determine the highest achievable hit frequency. We hence need the relation between fh and the clock frequency fc, and that relation is given by the cycle efficiency ηcyc, explained in Section 5.7.4 on page 44, through the expression

fp = ηcyc·fc .   (4.2)

Taking (4.2), substituting fp according to (4.1) and ηcyc according to (5.1) (page 44) with nextra = 4 (see footnote 7) and solving for fh yields the (positive) solution

fh(fc) = −nextra/(2Tw) + √( nextra²/(4Tw²) + fc/Tw ) .   (4.3)

(4.3) is plotted in Figure 4.6. Since the design is intended to be clocked at 180 MHz, we can see that roughly fh = 6 MHz will be the highest achievable hit frequency. This is more than twice the expected average hit frequency 2.6 MHz.
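Expression (4.3) can also be evaluated numerically; the small sketch below (my own, with nextra = 4 as in the text) prints the achievable hit frequency for the intended 180 MHz clock.

import math

def f_h_max(f_c, T_w=5e-6, n_extra=4):
    """Highest sustainable hit frequency (Hz) for clock frequency f_c (Hz), per (4.3)."""
    return -n_extra / (2 * T_w) + math.sqrt(n_extra**2 / (4 * T_w**2) + f_c / T_w)

print(f_h_max(180e6) / 1e6)   # achievable hit frequency in MHz at f_c = 180 MHz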

It should be remembered that Expression (4.1) is only valid when TWbuffer is not full, since fh·Tw (the number of hits in TWbuffer) has a maximum value of 31 (see Section 5.8 on page 62). Also, fh·Tw in (5.1) has the same maximum value. In other words, the above is only valid for

fh·Tw ≤ 31  ⟺  fh ≤ 31/Tw = 6.2 MHz

which is true for the larger part of Figure 4.6.

7 The value of 4 for nextra can be understood through the state machine in Figure 5.7.


Figure 4.6: Achievable hit frequency vs. the clock frequency of the design.

4.4.2 Number of hits and pairs (nh and np) in a time window

We seek the relation between the number of hits and the number of pairs for one single time window. To find such a relation, we must assume a steady hit frequency, that is, we must assume several consecutive time windows with approximately the same number of hits.

We call the number of hits per time window nh. nh = fh·Tw will be true for a steady fh according to the previous section. A hit is not paired with itself, leading to nh(nh − 1) possible pairs. However, the pairing is only done in “one direction” (“all possible un-ordered pairs are produced”), e.g., for the three hits 1, 2 and 3, if (2,1) is a pair, (1,2) is not another pair, and all pairs would be (2,1), (3,1) and (3,2). From this follows that from the number of hits in a time window nh, the number of pairs np for that same time window would be

np = nh(nh − 1)/2 .

For a given np, solving the above expression for nh yields the positive solution

nh = 1/2 + √(1/4 + 2·np) .


4.4.3 Highest achievable number of hits per time window

We will estimate the number of pairs for each time window. With 1/Tw being the number of time windows per second, we expect

np,tw = fp/(1/Tw) = fp·Tw = 150 ,

np,tw being the number of pairs per time window. This is true for the expected average pair frequency, fp = 30 MHz. The highest achievable fp in the implementation is an fp close to the clock frequency fc, that is, handling one pair each clock cycle. If this is the case, we will get an np,tw close to fc/(1/Tw) = fc·Tw = 900 pairs per time window.

How many hits per time window does this correspond to? This is dependent on the size of TWbuffer, np,twmax. Assume TWbuffer is full and will remain full for a whole set of incoming hits for a time window. Call the number of incoming hits for that time window nh∗, introducing starred variables for the highest achievable number of hits or pairs for a time window. All those hits will be paired with all hits in TWbuffer and this should give 900 pairs:

900 = np,twmax·nh∗ ⇔ nh∗ = 30

for np,twmax = 30, which is one suggested value for that parameter.

All this would mean that the TE is capable of processing np,tw ≈ 900 pairs or nh∗ = 30 hits each time window for Tw = 5 µs.

Having a higher rate of incoming hits than that would mean that the buffers grow. It should be remembered that when exceeding this limit, more pairs are generated, the number of pairs growing with the square of the number of hits, potentially giving the TE a load of pairs so large that it won't be able to catch up again. There should be some mechanism making sure that the TE never receives more than about 30 hits per time window.

4.5 Placing pairs in angle bins

As has been mentioned before, a bin is a combination of one interval for θ and one interval for ϕ. The pairs are placed in bins in order to determine if some direction is more common than others.

4.5.1 Simple binning

It would be straightforward to just divide the θ interval ([0, π]) and the ϕ interval ([0, 2π]) into some number of equally-sized intervals. A binning according to this “simple” method is shown in Figure 4.7 by plotting the bin limits over the unit sphere. The detector can be thought of as being

[Figure 4.7 plot: bins on the unit sphere, 262 bins in total; axes x, y, z.]

Figure 4.7: Simple limits. The top and bottom bins have been given special treatment.

in the middle of the sphere, with points on the sphere representing directions of incoming particles.

It can be clearly seen that the bins are not of the same area. Equal areas are desirable, since the number of pairs in a bin decides whether it is considered a real track or not.

4.5.2 The binning problem

Instead, we should look for a binning giving bins of similar sizes as they appear on the unit sphere. This problem is similar to that of sewing a soccer ball: create the surface of the unit sphere by using equally-sized surface segments. Joining pentagons and hexagons is an established means of achieving this.

However, we must keep the implementation in mind. The properties of a pair used to determine its bin are its θ and ϕ angles. With a pentagon/hexagon binning, either rather complex calculations or a look-up table must be used. There is room for neither in an FPGA implementation8.

We need a binning with roughly consistent bin areas and a simple way of determining bin from the θ and ϕ properties.

8 Such a LUT would be too big to fit in the BlockRAM of the FPGA. With 5300 DOMs there are 5300²/2 DOM pairs (without regard to order), and with 9 bits to identify the bin, 5300²/2 · 9 bits ≈ 16 Mbyte. Placing the LUT in external RAM probably wouldn't be fast enough.


Figure 4.8: Suggested and used binning, here with 328 bins.

4.5.3 Suggested binning

The binning method I suggest and use is roughly as follows:

1. Slice the sphere a number of times along the XY-plane. θ angles define where the slices should be made. Choose the discrete θ angles uniformly over [0, π].

2. Keep the top and bottom slices as they are.9

3. For the other slices, cut them into a number of segments, those segments becoming bins. ϕ angles define where the cuts should be made. Choose discrete ϕ angles uniformly over [0, 2π].

Note that the ϕ angles defining the cuts are different for each slice, since the number of ϕ cuts depends on the latitude of the slice. This binning produces bins on the unit sphere as shown in Figure 4.8.

The sizes of the bins are visualized in Figure 4.9. It can be seen that the biggest area differences are around 15%, and that the bins on all slices except the top- and bottom slices and their neighbours (only about 20 out of over 300) have area differences within 4%. With “simple binning”, the greatest bin areas are several times as large as the smallest.

9 Dividing also the top and bottom slices into segments at different ϕ angles is possible,

[Figure 4.9 plot: bin size, as a percentage of the smallest bin (about 100–114%), vs. bin number.]

Figure 4.9: Bin sizes with suggested binning.

4.5.4 Analytical expressions for the suggested binning

Here follows an analytical description of the bin limits in terms of θ and ϕ according to the suggested and used binning method (shown in Figure 4.8).

Input parameter:

nb number of bins to aim for

Output variables:

nθ number of θ slices

nϕ,kθ number of ϕ cuts for θ slice with index kθ

θkθ the θ slice angles

ϕkθ,kϕ the ϕ cut angles for θ slice with index kθ

with kθ= 0, 1, . . . , nθ−1, kϕ= 0, 1, . . . , nϕ,kθ −1

The expressions for the nθ and nϕ,kθ variables are:

nθ = √(nb·π)/2 − 1

nϕ,kθ = (ckθ/2)·√(nb/π) − 1


Figure 4.10: Variables in the binning expressions

θ0 = 2/√nb   (top slice)

θnθ−1 = π − 2/√nb   (bottom slice)

θkθ = θ0 + kθ·∆θ,   kθ = 1, 2, . . . , nθ−2   (middle slices)

The expressions for the ϕ cut angles are10:

ϕkθ,kϕ = 2π·(kϕ + 1)/(nϕ,kθ + 1)

The expressions for the helper variables ckθ and ∆θ used above are:

ckθ = 2π·sin( π(kθ + 3/2)/(nθ + 1) )

∆θ = (π − 2θ0)/(nθ − 1) = (π − 4/√nb)/(nθ − 1)

The helper variables have geometrical meanings. ckθ is the circumference at the θ angle right between θkθ and θkθ+1. ∆θ is the difference (or distance along θ on the unit sphere) between two adjacent θ slice angles. Please refer to Figure 4.10.
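For reference, the construction can be written out as a short sketch (my own Python rendering of the expressions above, not the Matlab or VHDL code of the project; plain rounding is used where the thesis rounds nθ to the nearest even integer, see footnote 11, so exact bin counts may differ slightly).

import math

def make_binning(n_b):
    """theta slice limits and, per stripe, the phi cut angles, for roughly n_b bins."""
    n_theta = round(math.sqrt(n_b * math.pi) / 2 - 1)        # number of theta limits
    theta0 = 2 / math.sqrt(n_b)                              # top cap half-angle
    d_theta = (math.pi - 2 * theta0) / (n_theta - 1)         # stripe width
    thetas = [theta0 + k * d_theta for k in range(n_theta)]  # last limit = pi - theta0
    phi_cuts = []
    for k in range(n_theta - 1):                             # one stripe between adjacent limits
        c_k = 2 * math.pi * math.sin(math.pi * (k + 1.5) / (n_theta + 1))
        n_phi = max(1, round(c_k / 2 * math.sqrt(n_b / math.pi) - 1))
        phi_cuts.append([2 * math.pi * (i + 1) / (n_phi + 1) for i in range(n_phi)])
    return thetas, phi_cuts

def bin_index(theta, phi, thetas, phi_cuts):
    """0 = top cap, then the stripe bins counted from phi = 0, last index = bottom cap."""
    if theta < thetas[0]:
        return 0
    if theta >= thetas[-1]:
        return 1 + sum(len(c) + 1 for c in phi_cuts)
    b = 1
    for k in range(len(thetas) - 1):
        if theta < thetas[k + 1]:
            return b + sum(1 for c in phi_cuts[k] if phi >= c)
        b += len(phi_cuts[k]) + 1
    return b

thetas, cuts = make_binning(328)
print(2 + sum(len(c) + 1 for c in cuts))   # total number of bins (roughly n_b)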

Motivation

The unit sphere has an area of 4π, from which follows that we will aim at keeping each bin area as close as possible to


Ab = 4π/nb .

The top and bottom slices are not divided with ϕ cuts. Call the top/bottom bin area At, seek an expression and solve for θ0:

At ≈ π·θ0² = Ab  ⟺  θ0 = 2/√nb   ( ⇒ θnθ−1 = π − 2/√nb )

We need to find the number of θ cuts desired. Since the bins will have square-like shapes, we have a desired bin side length of √Ab. We hence have

∆θ = √Ab ,   (4.4)

and, using a simplified expression for ∆θ,

∆θ = π/(nθ + 1) ,

we solve (4.4) for nθ and round nθ to the nearest integer11.

To find the middle θ slice angles, the rest of the θ interval should be split up uniformly over [θ0, θnθ−1]. Doing this gives the “stripe width”, the helper variable ∆θ, and the expression for θkθ for the middle slices follows.

We now seek the nϕ,kθ expression. Take a slice, cut at ϕ = 0 and consider the resulting stripe. It should be cut such a number of times that the length of the pieces is as close to √Ab as possible. We seek the expression for the length of the stripe, which is the same as ckθ. The circumference at a given θ is

cθ = 2π sin θ .

However, for a given kθ we want the circumference right between θkθ and θkθ+1, that is, the circumference at

θ = π(kθ + 3/2)/(nθ + 1) ,

which gives the expression for ckθ as given above.

As soon as we know the number of ϕ cuts for a stripe, we need only distribute them uniformly over [0, 2π], which gives the expression for ϕkθ,kϕ as given above.

11 In the algorithm actually used, the nearest even integer has been chosen in order not to get a θ limit at θ = π/2, since this is a singularity in the quantity used by the


4.6 Time sorting

As stated earlier, current triggers only handle LC (local-coincidence) hits. The LC hits are time-sorted in the string hubs which means that the triggers receive a time-sorted stream of hits.12

If, however, all hits, LC and non-LC, are to be sent on from the string hubs, they will be unsorted. The TE algorithm must have a stream of sorted hits as its input. Hence, a time-sorting is needed, capable of sorting all hits. With 2.6 MHz of hits, dispersed in time for possibly ten minutes or more, it can be understood that such a task may not be simple. Being able to time-sort the hits is a major issue with the TE.

There exists a software sorting implementation with sufficient performance developed at Stockholm University by Clyde Robson. It is able to sort all the hits with the algorithm running on an ordinary PC.13

The algorithm has been tested, the sending and receiving of the hits from other network clients included. Performance was up to 20 Mhits/s14 (receiving, sorting and sending). The performance needed for the TE at average load is, as stated earlier, around 2.6 MHz.

4.7 Implementation options

It has been discussed whether the TE can be implemented in none, one or both of software and hardware. The implementation must be able to execute the TE algorithm in real-time for the expected hit-rates. The options are discussed below.

4.7.1 Software (PCs)

A single PC

It is generally hard to estimate the performance of software not yet written. However, some simple argumentation might give some clues.

For the pair rate fp we have fp ≈ 30 MHz. The clock rate for a typical PC processor is 3 GHz. This gives 100 clock pulses for each pair. A software implementation of the TE must:

• Synthesize the pairs

12 The hits would typically not be sorted in time if no action to achieve that were taken. Buffers in the DOMs might hold the hits several minutes or more before passing them on to the surface.

13 The algorithm is briefly described in “A low energy muon trigger”, C. Bohm, D. Nygren, C. Robson, C. Wernhoff and G. Wikström, posted to the IEEE/NSS 2008 conference in Dresden.

14 “Track Engine status” (presentation slides), C. Bohm, D. Nygren, C. Robson, C. Wernhoff and G. Wikström, IceCube DeepCore workshop in Utrecht, 2008.


• Check the speed criteria for each pair:

– Determine the positions of the two DOMs

– Calculate the distance

– Calculate the implied velocity from the distance and the time stamp difference

– Compare the implied velocity to given limits and reject or keep the pair

• Calculate θ and ϕ for each pair

• Build the histograms, search each finished histogram for peaks and report the peaks

• Handle the input (hits) and output (reported tracks) interfaces of the TE

It seems that managing this with the given pair rate will be hard.

Note: It could be argued that all the “calculate”s above could be replaced by “look-up”s. This might possibly give increased performance. It should however be kept in mind that those kinds of memory operations typically result in cache misses, and hence we are looking up in the external memory, with its limited memory bus clock and big delay compared to the processor.

Multiple PCs

There is a way of dividing the load of the algorithm between several PCs. Hence, in principle, any performance can be achieved by just using more PCs. A structure of several PCs, of which one is called the arbiter and the others the clients, is suggested. The arbiter receives the hits through e.g. an ethernet interface card. It is also equipped with one ethernet card per client.

The arbiter's task is to distribute the hits to the clients. It will direct the hit stream to one client at a time, changing which client is to receive the stream according to some condition, for example a time condition, jumping to the next client, say, every second.

Each client works in two phases, the receive phase and the compute phase. During the receive phase, the computer stores the incoming hit stream into memory. During the compute phase, it reads the stored hits from the memory, synthesizes the pairs and executes the algorithm. It then reports the detected tracks in some way.

If the arbiter simply redirects the stream of hits to the next client, the last time window at the old client and the first time window of the new client will have incomplete hit pairs. Either, this could simply be accepted,


Figure 4.11: Using multiple PCs

since the number of lost tracks will be very small. Otherwise, the arbiter could overlap the hit streams by the time window size. In that case, the computer collecting the results from the clients would need to check for the duplicated tracks and remove one of them.

4.7.2 Logic (FPGA)

The TE can be implemented using logic. Performance is enough, with margin, for executing the algorithm in real-time. The larger part of the Master's Thesis project has been about achieving such an implementation and the result was successful. The rest of this report provides details on the implementation.

4.7.3 Comments

Considering the special place the TE is intended to be deployed at, i.e. the South Pole, the FPGA option has some clear benefits when it comes to physical space and power consumption.

4.8 Borderline events

A drawback of the TE algorithm is its inability to handle borderline events, i.e., events scattered around a bin limit. Figure 4.12 visualizes the problem. For the likelihood of triggering on an event like in b to be as high as that of triggering on an event like in a, twice as many pairs are needed. For events like in c, four times as many pairs are needed.


Figure 4.12: Figure visualizing the borderline events problem by illustrating three ways that pairs from a track can be scattered relative to the bin limits on the unit sphere.


Chapter 5

The implementation

This chapter not only intends to present and give an overview of the FPGA implementation of the TE, but is also meant to be a reference and documentation of the design. The chapter is therefore rather detailed on some points.

5.1 Full-TE system overview

The “full-TE” consists of all new hardware units and cabling needed to run the TE with the existing systems within the IceCube DAQ. The unit executing the algorithm, the logic within the FPGA, is called the “TE core”. Often, the TE core is referred to simply as “the TE”.

The full-TE (see overview in Figure 5.1) gets its input from the stringHubs, feeding the full-TE with hits. The stringHubs have an unused ethernet port, and the stringHub software will be modified so that all hits will be sent on through that port.

There are 86 stringHubs, the same as the number of strings. Using a hierarchy of switches, all hits are presented to a unit called the Control Server through one single ethernet cable.

The Control Server has three main tasks:

• time-sort the stream of incoming hits
• buffer the hits

• interface to the FPGA processor (see below) and send the time-sorted stream of hits on to it

In the Virtex-5 FPGA chip there is also a PowerPC processor which is used, called the FPGA processor. The FPGA processor has the following tasks:

• interface to the Control Server and receive the hits


Figure 5.1: Overview of the so-called “full-TE”

• buffer the hits using the external RAM memory of the Xilinx ML-507 evaluation board

• interface to the TE core (the logic in the FPGA) in order to:
  – send the hits on to the TE core

– receive detected tracks messages from the TE core

• interface to the IceCube DAQ in order to send on the detected tracks messages to it

There are various ways to communicate between the FPGA processor and the FPGA logic. Probably, one of the buses provided by the PowerPC processor, such as the PLB (Processor Local Bus), will be used. Another option is to use the APU (Auxiliary Processor Unit) of the processor.

5.2 Introduction to FPGAs

An FPGA, Field Programmable Gate Array, is a programmable silicon device. In it, logic is realized as specified by the user. The user describes what logic should be implemented, earlier by electronic logic circuit diagrams but nowadays usually by using a hardware description language (HDL). VHDL, VHSIC1 Hardware Description Language, is one such language and the one used for this project. An FPGA can be reprogrammed with new designs described by new or revised HDL code a very large number of times.

In the Track Engine project, there was a given algorithm that was to be implemented, the main options being software or an FPGA.


The main advantage of the FPGA in this case is the much higher performance that can be achieved. In a CPU, only one binary computation (such as e.g. addition or comparison) can be performed in each clock cycle, since there is only one ALU2, the unit performing such operations, in the CPU.

In an FPGA, on the other hand, we could have literally hundreds of thousands of adders, if we wanted to, working in parallel or serially. Several such components in series make up a pipeline, similar to the pipeline in a RISC processor. The TE FPGA design is in essence a long pipeline with hundreds of stages.

FPGA manufacturers usually offer evaluation boards, where the FPGA is pre-mounted on a circuit board with various components such as connectors, interfaces, buttons and LEDs. Such an evaluation board has been used and will also be used for the real implementation of the TE.

5.3 Hardware used

5.3.1 The FPGA

The TE core is intended for a Xilinx Virtex-5 FPGA. The exact FPGA model number is XC5VFX70T in package FFG1136 with speed grade -1, since this is the FPGA chip on the ML-507 board which is intended to be used.

The XC5VFX70T is intended for embedded systems and hence has a PowerPC processor embedded within the chip, which is a key feature for the TE since the design relies on a processor available within the FPGA chip. Some other features of the FPGA of interest for the TE are:

• 11,200 slices and 44,800 flip-flops

• Up to 550 MHz (BlockRAM and DSP48 slices)
• 5,328 Kbit = 670 Kbyte BlockRAM, dual-port
• 128 DSP48 slices (can be used as multipliers)
• 640 available (single-ended) user I/O pins
• PCIe endpoint blocks

5.3.2 The embedded processor

Some details on the PowerPC block of interest for the TE are:

• Model: IBM PowerPC 440

• Type: 32-bit RISC


• 550 MHz clock frequency

• Separate memory bus and Processor Local Bus (PLB)
• Access to PCIe blocks

5.3.3 The evaluation board

The TE is intended to be implemented in an FPGA on the Xilinx ML-507 evaluation board (Figure 5.2). Some features of this board of interest for the TE are:

• External DDR2 RAM (256 MB)
• Ethernet port

• SystemACE CompactFlash socket

Figure 5.2: The ML-507 evaluation board. Photo from xilinx.com.

5.4 Communication within TE, data rates

5.4.1 Currently used encoding of hits in IceCube

The current encoding of hits in IceCube is according to Listing 5.1.

With 38 bytes/hit, it is clear that the density of information of interest for the TE is low.


BYTES  TYPE  WHAT
4      INT   Record length in bytes, self-inclusive
4      INT   Payload ID
8      INT   Timestamp
4      INT   Trigger Type (-1 usually)
4      INT   Trigger Config ID (-1 usually)
4      INT   Source ID (12000 + string #)
8      INT   DomID
2      INT   Trigger Mode (bit mask of trigger bits)
38 (total)

Listing 5.1: Excerpt from a specification of the hit encoding currently used in IceCube. Of interest to us is Timestamp, SourceID (string number) and DomID (DOM number).

The hits according to the specification appear in the stringHubs. As described, the full-TE collects all hits from all stringHubs into the Control Server, merging the data streams from the different stringHubs through switches.

With this encoding, the data rate for the Control server to handle would be

DR = 2.6 Mhits/s · 38 bytes/hit ≈ 100 Mbyte/s ,

that being the expected average data rate, with peaks potentially significantly higher. Such a data rate is too high to be handled conveniently.

5.4.2 Suggested encodings, resulting data rates

“TE protocol”

A protocol called the TE protocol shall be used for (refer again to Figure 5.1 on page 31)

• sending hits from the stringHubs to the Control Server, and for
• sending hits from the Control Server to the FPGA processor.

The protocol won’t be fully specified here, but it will represent each hit by3:

3 “DomID”, whenever used in this document, refers to a value uniquely identifying any of the 5300 DOMs in the detector and should not be confused with the DOM number, identifying a DOM on a given string.


BYTES   WHAT
2       DomID and LC tag
2       Timestamp offset

(The encodings used for DOM IDs with LC-tag and time stamps are specified in Section 5.6.)

A global time stamp is sent more sparsely (with a negligible additional communication load), and the time stamp offset marks the time to be added to the most recently received absolute time stamp in order to get the absolute time stamp of the current hit.

With 4 bytes/hit according to the above and a hit rate of 2.6 MHz, the data rate would be 6 Mbyte/s.

Communication between the FPGA processor and the logic

The FPGA processor will calculate the absolute time stamp for each hit, and the full absolute time stamp will be included with each hit sent from the FPGA chip processor to the logic. The full absolute time stamp has been chosen to have 5 bytes. The hits fed to the TE logic will hence be described as:

BYTES   WHAT
2       DomID and LC tag
5       Absolute time stamp

This totals to 7 bytes/hit, making up a data rate of 10.5 Mbyte/s. Describing a hit in 32-bit words (4 bytes), 2 words are needed for each hit. With 2.6 MHz of hits there are 5.2 MHz of 32-bit words that the processor needs to send to the TE logic. A processor working at 550 MHz (such as the PowerPC in the Virtex-5 FPGAs) should be able to handle that easily. The TE logic will report detected tracks back to the processor. This load should be negligible compared to that of the incoming hits.
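As an illustration of the 7-byte hit record (my own sketch; the exact bit layout of the DomID/LC word and the byte order are assumptions—Section 5.6 specifies the real encodings):

import struct

def pack_hit(dom_id, lc, timestamp):
    """2 bytes DomID + LC tag, 5 bytes absolute time stamp (assumed layout:
    LC flag in the top bit, 14-bit DOM ID in the low bits, big-endian)."""
    word = (int(lc) << 15) | (dom_id & 0x3FFF)
    return struct.pack(">H", word) + timestamp.to_bytes(5, "big")

def unpack_hit(data):
    (word,) = struct.unpack(">H", data[:2])
    return word & 0x3FFF, bool(word >> 15), int.from_bytes(data[2:7], "big")

hit = pack_hit(1234, True, 0x123456789A)
assert len(hit) == 7 and unpack_hit(hit) == (1234, True, 0x123456789A)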

5.5 The VHDL code

The VHDL code for the implementation is placed in files named after the top-level units. There are two additional files:

miscPkg, a VHDL package with types, constants and functions for all signals and ports that are not specific to a single unit


miscComps, a .vhd file with entities and their architectures for general units that are used more than once (flip-flops of various kinds, registers, counters, tri-state buffers, FIFO buffers, RAM etc.)

5.5.1 Code amount

The total VHDL code amount of the project is a bit over 8000 lines. Figure 5.3 gives an idea of the code amount for the different units. The code amount is more or less proportional to the complexity of the units.

Figure 5.3: VHDL code amount for the different units. The code for the angles unit includes 480 lines of auto-generated code.

Furthermore, the project includes a bit over 1500 lines of Matlab code, used for various tasks such as

• writing input data for the DOM coordinate LUT,

• calculating bin limits and generating the VHDL code for the comparators, and

• doing some simple behavioral simulations giving performance measures.

In addition to that, some 3000 lines of Matlab code are included in the TEtest Matlab scripts, used for generating input data for, and analyzing output data from, the units making up the TE. TEtest will be further explained in Section 5.11.


fifo_inst : entity work.genFIFO
  generic map(
    size              => 11+27+15,  -- clock cycles for speedCrit + angles + margin
    width             => TanglesTimeStUnsigned'length,
    almostEmptyOffset => open,      -- not used
    almostFullOffset  => open,      -- not used
    initFile          => ""         -- empty for no init file
  )
  port map(
    wrEN        => fifoWrEN,
    rdAck       => weAnglesOut,
    DI          => anglesInUnsigned,
    almostFull  => open,
    almostEmpty => open,
    empty       => empty,
    full        => full,
    DO          => anglesOutUnsigned,
    clk         => clk,
    rst         => rst
  );

Listing 5.2: Example of entity instantiation, here from the instantiation of the FIFO buffer in the preHistBuffer unit.

5.5.2 VHDL code conventions used

Some comments on the VHDL code conventions used will be made.

Fully parameterized design No design parameters or data type parameters are hard-coded within sub-units of the design; instead, all such parameters are defined in a top-level VHDL package (miscPkg). Any references to such values are also given as actual references and not as numerical values. This minimizes consistency problems within the code (see the declaration of TdetTrackPkg in Listing 5.5 for an example; the actual type declaration depends on other types and constants in several steps).

Instantiation coding style VHDL supports component, configuration and entity instantiation. In the TE design, entity instantiation has been chosen since it considerably reduces the amount of code and simplifies changing the design. An example of entity instantiation can be found in Listing 5.2.

Processes Clocked sequential statements can be described by a process block with either a sensitivity list and “if rst=‘1’ then [...] elsif rising_edge(clk) [...]”-like code, or by no sensitivity list and a “wait until rising_edge(clk)” statement. The first of the two has been used in the TE.
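As an illustration, the following minimal sketch shows the two styles side by side for a simple register; the entity and signal names are generic and not taken from the TE code.

library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch only: the two clocked-process styles mentioned above.
entity regStyles is
  port(
    d1, d2 : in  std_logic;
    q1, q2 : out std_logic;
    clk    : in  std_logic;
    rst    : in  std_logic
  );
end regStyles;

architecture rtl of regStyles is
begin

  -- Style used in the TE: sensitivity list and asynchronous reset.
  styleA : process(clk, rst)
  begin
    if rst = '1' then
      q1 <= '0';
    elsif rising_edge(clk) then
      q1 <= d1;
    end if;
  end process;

  -- Alternative style: no sensitivity list, implicit wait on the clock edge.
  styleB : process
  begin
    wait until rising_edge(clk);
    q2 <= d2;
  end process;

end rtl;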

5.6 Internal representations

5.6.1 DOM IDs

In the TE, DOM IDs are represented by 14-bit words. A DOM ID carries three pieces of information:

• LC (yes/no)
• String number

• DOM number

This information is contained in the DOM ID word according to:

BITS: MSB [ 1    2-8          9-14    ] LSB
WHAT:     [ LC   String No.   DOM No. ]

String number and DOM number are 0-based. For an LC-hit, the LC bit is ‘1’, otherwise ‘0’.
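To make the packing concrete, the sketch below assembles such a 14-bit word using the stringNoRange and domNoRange subtypes from Listing 5.4. The package and function names are illustrative only and are not part of the TE code, where the corresponding declarations live in miscPkg.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative sketch only: packing the 14-bit DOM ID word.
package domIdSketch is
  subtype stringNoRange is natural range 12 downto 6;                  -- 7 bits
  subtype domNoRange    is natural range 5 downto 0;                   -- 6 bits
  subtype TdomID        is unsigned(1 + stringNoRange'high downto 0);  -- 14 bits

  function packDomID(lc       : std_logic;
                     stringNo : unsigned(6 downto 0);
                     domNo    : unsigned(5 downto 0)) return TdomID;
end package domIdSketch;

package body domIdSketch is
  function packDomID(lc       : std_logic;
                     stringNo : unsigned(6 downto 0);
                     domNo    : unsigned(5 downto 0)) return TdomID is
    variable id : TdomID;
  begin
    id(id'high)       := lc;        -- MSB: LC tag ('1' for an LC hit)
    id(stringNoRange) := stringNo;  -- bits 12..6: 0-based string number
    id(domNoRange)    := domNo;     -- bits 5..0:  0-based DOM number
    return id;
  end function;
end package body domIdSketch;

Decoding works the other way around: id(stringNoRange) yields the string number, id(domNoRange) the DOM number, and the MSB the LC tag.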

Due to the way pairs are produced in pairProd, they will be sorted in time with regard to hit1.timeSt.

5.6.2 Time stamps

Each hit is tagged with a time stamp. The time stamp is important when calculating the implied speed, used for cutting away noise hits as described earlier. Also, when the TE reports detected tracks on to the DAQ, the start and end times of the time window in which they occurred are reported as well; these times also originate from the time stamps.

The time stamps as reported by the DOMs have a resolution of about 1 ns, and are reported with an LSB significance of 0.1 ns. By stripping the three least significant bits, we get a new LSB significance of 0.8 ns, which seems reasonable.
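As a minimal sketch (the entity name and the 43-bit input width are assumptions made for this example, not taken from the TE code), the conversion is simply a matter of dropping the three least significant bits:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative sketch only: raw 0.1 ns time stamp -> internal 0.8 ns time stamp.
entity timeStStrip is
  port(
    rawTimeSt : in  unsigned(42 downto 0);    -- 0.1 ns per LSB (width assumed)
    teTimeSt  : out unsigned(8*5-1 downto 0)  -- 0.8 ns per LSB, 5 bytes
  );
end timeStStrip;

architecture rtl of timeStStrip is
begin
  -- Dropping the three LSBs is an integer division by 8: 0.1 ns * 8 = 0.8 ns.
  teTimeSt <= rawTimeSt(rawTimeSt'high downto 3);
end rtl;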

For the time stamps, 5 bytes are used. The largest time value that can be described is hence

2^40 · 0.8 ns ≈ 880 s ≈ 15 minutes.

Figure 5.4: The interfaces of system (TE core).

When reporting time windows with detected tracks on to the DAQ, the standardized way of describing times in the IceCube DAQ must be used.

The Control Server converts the time to that format. This should not impose any problems since the delay in the TE core is much less than 15 minutes.

5.6.3 Position coordinates, lengths

Position coordinates are represented by words 13 bits wide, with the LSB having a significance of 0.3 m.

The largest representable coordinate value is hence

0.3 m · 2^13 ≈ 2500 m ,

which is within our needs, since the origin is placed in the middle of the detector and the detector is roughly 1 km × 1 km × 1 km.

5.7 TE core and its units

5.7.1 Input and output ports, other signal types

The top-level unit of the TE core is called system. Its entity declaration can be found in Listing 5.3.

The fundamental input to the TE core is hits, and the output is information on detected tracks. These signals, along with ready and write-enable signals, constitute the interface of the TE core (Figure 5.4).

readyOut is high when system is ready to receive another hit, which is written to the unit by laying it out on the hitIn port and pulling weIn high for one clock pulse.

To report a time window with detected track(s), a TdetTrackPkg structure is laid out on the detTrackPkg output port and detTrackPkgWe is pulled high for one clock pulse.
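To make the handshake concrete, the testbench-style procedure below writes a single hit to system. It is a sketch only: the procedure name is not part of the TE or TEtest code, it assumes that Thit is visible (from miscPkg), and it would be declared, for example, in the declarative part of a testbench architecture and called from a stimulus process.

-- Illustrative sketch only: wait for readyOut, then present one hit on hitIn
-- with weIn pulled high for a single clock cycle.
procedure writeOneHit(signal   clk      : in  std_logic;
                      signal   readyOut : in  std_logic;
                      signal   hitIn    : out Thit;
                      signal   weIn     : out std_logic;
                      constant h        : in  Thit) is
begin
  loop                          -- wait for an edge where system is ready
    wait until rising_edge(clk);
    exit when readyOut = '1';
  end loop;
  hitIn <= h;                   -- lay out the hit on the hitIn port
  weIn  <= '1';                 -- write enable high ...
  wait until rising_edge(clk);
  weIn  <= '0';                 -- ... for exactly one clock cycle
end procedure writeOneHit;

The cycle-level timing relative to readyOut is simplified here; the sketch only illustrates the order of events described above.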


entity system is
  port(
    hitIn         : in  Thit;
    readyOut      : out std_logic;
    weIn          : in  std_logic;
    detTrackPkg   : out TdetTrackPkg;
    detTrackPkgWe : out std_logic;
    clk           : in  std_logic;
    rstSig        : in  std_logic
  );
end system;

Listing 5.3: Entity declaration of system, the top-level unit of TE core.

-- ranges within TdomID vector:
subtype stringNoRange is natural range 12 downto 6;
subtype domNoRange    is natural range 5 downto 0;

subtype TdomID  is unsigned(1 + stringNoRange'high downto 0);
subtype TtimeSt is unsigned(8*5-1 downto 0);  -- 5 bytes, LSB = 0.8 ns

type Thit is record
  domID  : TdomID;
  timeSt : TtimeSt;
end record;

Listing 5.4: Declaration of the Thit structure

The Thit structure declaration, along with types and constants needed to declare it, is found in Listing 5.4.

The TdetTrackPkg structure declaration, along with types and constants needed to declare it, is found in Listing 5.5.

In the TE, the first thing that happens to the hits is that they are combined into pairs. Pairs are thereafter the data type that is pushed through the whole TE. The declaration of the Tpair data type is found in Listing 5.6.

In a pair, hit2 is always the first (oldest) hit of the two. Its time stamp will hence always be less than (or possibly equal to) the time stamp of hit1.

5.7.2 System overview

The TE core design can be thought of as a long pipeline. The object flowing through the pipe is a pair, consisting of two hits.


constant anglesAddrBits      : integer := 9;
constant totPairsCounterBits : integer := 14;
constant lcCounterBits       : integer := 3;  -- 2^3-1=7

subtype Tuns1       is unsigned(0 downto 0);
subtype TanglesAddr is unsigned(anglesAddrBits-1 downto 0);
subtype TtotPairs   is unsigned(totPairsCounterBits-1 downto 0);
subtype ThistCount  is unsigned(3 downto 0);
subtype TlcCounter  is unsigned(lcCounterBits-1 downto 0);

type TdetectedTrackData is record
  we    : Tuns1;
  addr  : TanglesAddr;
  count : ThistCount;
end record;

type TdetTrackVec is array (0 to 9) of TdetectedTrackData;

type TdetTrackPkg is record
  timeStFrom   : TtimeSt;
  timeStTo     : TtimeSt;
  totNoOfPairs : TtotPairs;
  lcCounter    : TlcCounter;
  detTrack     : TdetTrackVec;
end record;

Listing 5.5: Declaration of the TdetTrackPkg structure

type Tpair is record
  hit1      : Thit;
  hit2      : Thit;
  validPair : std_logic;
end record;

Listing 5.6: Declaration of the Tpair structure. hit1.timeSt ≥ hit2.timeSt will always be the case.



Figure 5.5: Overview of the sub-units of the TE. The preBuffer unit is not shown. Only the principal information flow is shown through the arrows; in the design there are further connections between the sub-units.

Typically, at every positive clock edge, each pair is sent one step onwards. This long pipeline can also be thought of as a large shift register, with each cell in the shift register containing one pair. This likening of the system to a pipeline makes sense up to, but not including, the so-called histStage unit.

There might not always be a pair to send on. Simply filling the data fields of such an empty stage with zeros could not easily be distinguished from a real pair; therefore there is one extra bit at each stage, called the validPair bit, signaling whether the registers at that stage actually contain information on a pair or whether the stage is empty.
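As a minimal sketch of such a stage (the entity name and ports are illustrative and do not correspond to an actual TE unit; Tpair is assumed to be provided by miscPkg), only the validPair bit needs a defined reset value:

library ieee;
use ieee.std_logic_1164.all;
use work.miscPkg.all;  -- assumed to provide Tpair

-- Illustrative sketch only: one pipeline stage holding a Tpair.
entity pairStage is
  port(
    pairIn  : in  Tpair;
    pairOut : out Tpair;
    clk     : in  std_logic;
    rst     : in  std_logic
  );
end pairStage;

architecture rtl of pairStage is
begin
  process(clk, rst)
  begin
    if rst = '1' then
      pairOut.validPair <= '0';   -- mark the stage as empty
    elsif rising_edge(clk) then
      pairOut <= pairIn;          -- pass the pair one step onwards
    end if;
  end process;
end rtl;

The real stages carry additional fields (for example the bin number after the angles unit), but the validPair handling illustrated here follows the description above.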

The units of the top-level design, from beginning to end, are (see also Figure 5.5):

• the preBuffer unit, buffering the hits from the FPGA processor and sending them on to the pairProd unit,

• the pairProd unit, which takes hits and combines them into all possible pairs under the condition that the time stamps of the two hits may differ by at most Ttw,

• the speedCrit unit, deleting (setting validPair bit to 0) all pairs not conforming to the speed criteria,

• the angles unit, determining the θ and ϕ angle properties of the pair, using these quantities to find the bin of the pair, and from then on sending the bin number along with the pair,

• the preHistBuffer unit, a buffer with a special functionality necessary before the histStage, further explained later, and

• the histStage unit, building histograms in the FPGA's RAM resources for all the (overlapping) time windows, finding time windows with histogram cells where the number of pairs is above the threshold level, and reporting those time windows and their track candidates back to the FPGA processor.
