Reconstruction of PINGU data with a Differential Evolution Minimizer
Christoph Raab
January 9, 2014
Department of Physics and Astronomy
Supervisor: Carlos Pérez de los Heros; Reviewer: David Boersma
The Precision IceCube Next Generation Upgrade (PINGU) is intended to have an energy
threshold of ≲ 10 GeV in order to resolve the neutrino mass hierarchy. To
reconstruct the energy and direction of neutrinos interacting in this array, producing
both a hadronic cascade and a muon track, advanced reconstruction methods need to
be employed. One class of these seeks to maximize a complicated likelihood function
within an 8-dimensional parameter space describing the event, and requires sophisticated
minimizers to achieve the necessary resolution in a reasonable time. In this report, a pre-existing
but hitherto unused minimizer based on the Differential Evolution Monte Carlo algorithm,
which samples that parameter space with several Markov chains at once,
is developed further, and its behaviour and performance are tested on simulated data of the
IceCube/PINGU array. The tests compare various configurations of the minimizer
with each other and with Markov Chain Monte Carlo, a similar previous approach.
Contents

1. Introduction
   1.1. Neutrino Telescopes
   1.2. Track reconstruction
2. Simulation, Processing and Reconstruction Chain
   2.1. Simulation
   2.2. Processing
   2.3. Likelihood Function
3. Differential Evolution Monte Carlo
   3.1. Differential Evolution Algorithm
   3.2. Implementation
        3.2.1. Example Script
        3.2.2. Tray Segment
        3.2.3. Module
   3.3. Problems and challenges
        3.3.1. Stuck chains
        3.3.2. Cascade Energy
   3.4. Performance
        3.4.1. Comparison to Markov Chain Monte Carlo
        3.4.2. Resolution for SPE4 Seed
        3.4.3. Resolution for Mixed Monte Carlo/SPE4 Seed
4. Summary, Conclusions and Outlook
   4.1. Summary
   4.2. Conclusions
   4.3. Outlook
5. Acknowledgements
A. Software options and parameters
   A.1. Example script
   A.2. Tray script
   A.3. Module
B. Supplementary figures
Bibliography
1. Introduction

1.1. Neutrino Telescopes
High-energy cosmic neutrinos impinge upon the Earth at a high flux from the entire sky
and can be produced in a variety of sources. Large-scale instruments that are used to
observe these sources by measuring the direction of the neutrinos are called neutrino
telescopes. One such neutrino telescope is installed at the South Pole and uses the
antarctic ice as a target for neutrinos to interact in. In the case of the weak charged-current
interaction of a muon-flavour neutrino, this creates a muon alongside a hadronic cascade
from the interacting nucleus:

ν̄_µ + N → µ⁺ + N*   (1.1)
ν_µ + N → µ⁻ + N*   (1.2)
The muon and nuclear fragments deposit their energy into secondary particles, which
in turn induce Cherenkov light in the ice, whose absorption length typically lies between 90 m
and 270 m, depending strongly on the ice layer. The muon does so along a line, while
the cascade creates a mostly point-like and nearly isotropic light source (see fig. 1.1).
If this light is captured by an array of sensors, it is possible to infer the direction of
the incoming neutrino with a certain precision. This opens up the possibility of neutrino
astronomy, where neutrino sources from the entire sky can be observed. An irreducible
background to cosmic neutrinos are neutrinos created by the interactions of cosmic rays
in the atmosphere.
IceCube is an array of 86 strings of 60 digital optical modules (DOMs) each, buried
between depths of 1500 m and 2500 m. The modules consist of large photomultipliers,
which can resolve the shape of pulses due to single Cherenkov photons, and readout
electronics, which send data to the surface via a common cable, where the detector is
triggered according to coincidence conditions. The strings are spread out over a square
kilometer of surface and instrument a cubic kilometer of ice in total. The resulting
string spacing of 125 m leads to an energy threshold of 50 - 100 GeV, corresponding to
the shortest muon track that can be resolved, making IceCube mostly useful for observing
high-energy neutrinos. These can e.g. give information about the origin of high-energy
cosmic rays. However, neutrino telescopes can also be used to detect signatures of dark
matter particles gravitationally accumulated in certain locations such as the Sun or the
Galactic center, or to measure the oscillation of neutrinos through the Earth. For these
purposes, a lower energy threshold is required, and IceCube contains the DeepCore
inset array of more closely spaced strings carrying, in turn, 50 more closely spaced DOMs of 35% higher
quantum efficiency. This lowers the energy threshold of the combined detector to 10
GeV by gaining light yield for low-energy events, and improves the resolution of short
tracks. DeepCore's position at the bottom center of the array, from a depth of 2100 m
to 2450 m, allows the rest of IceCube and a layer of 10 further DOMs per string above
2000 m to act as a veto for the DeepCore sub-array. [4]

Fig. 1.1.: Diagram of the light sources from a ν_µ charged-current interaction inside the IceCube array, indicating the vertical DOM spacings (17 m for IceCube, 5 m/3 m for PINGU) and horizontal string spacings (125 m for IceCube, 14 - 26 m for PINGU), the cascade and the Cherenkov cone.
PINGU is a planned extension of the IceCube/DeepCore array with an inset of even
higher-density strings. Neutrinos oscillate between their flavours due to having three
different mass eigenstates, m_1...3. In vacuum, these oscillations are parametrized only by
the absolute values |∆m²_ij| = |m²_i − m²_j|. In matter, however, the MSW effect is sensitive
to the signs as well, and by measuring the oscillations of atmospheric neutrinos through
the Earth, PINGU will be employed to resolve the neutrino mass hierarchy (NMH). In
the oscillation, the two "free" parameters are the energy of the neutrino and the oscillation
length; the latter corresponds to the zenith angle of the neutrino arriving in the detector.
Hence, the resolution of both of these will need to be improved, especially for E < 10 GeV,
where the effect is expected to be most strongly pronounced [5].

The geometry used in this report is called "V6" (fig. 1.2). It consists of 20 strings within
the central eight that make up DeepCore, with a horizontal spacing of 26 m. Each string
has 60 DOMs with a vertical spacing of 5 m.
1.2. Track reconstruction
Fig. 1.2.: Left: A top view of the IceCube/DeepCore array with PINGU in the V6 geometry. Right: A close view of V6, an inset of 20 strings (26 m spacing) with 60 high-quantum-efficiency DOMs each (5 m vertical spacing).

The energy of muons which may leave the array (≳ 100 GeV) is estimated from the energy
loss rate along the track, which is approximately linear, ∝ E, at ≳ 500 GeV [3]. For the
low energies of concern here, the track is most likely contained in the detector, and its energy
loss has a constant rate of ≈ 0.222 GeV/m, so that the muon energy can be estimated from the
total length of the track. However, the energy lost through the hadronic cascade is
comparable to the energy lost in the track (if any), so that E_ν + M_nucleus = E_cascade + E_track.
This means that for PINGU, the event hypothesis is a combination of the light yield from
charged particles originating at one point and along a line segment. Such a hypothesis
is described by 8 parameters in total: the interaction time, the vertex position, the zenith
and azimuth angles, E_cascade and E_track. The last step from the actual event to these
parameters is the reconstruction of the signal from all the individual DOMs. Models
describing the light production in the ice, its propagation and the detector response
result in a likelihood function L, which is the probability of the measurement given a
certain hypothesis. A likelihood reconstruction seeks to find the parameter vector ~x
that maximizes L(event|~x), or L(~x) for short. Especially for a low number of detected
photons or hit DOMs (channels), this function has several local maxima. This
multimodality is a challenge that all minimizing algorithms face. In addition, a standard
reconstruction method is needed already before the construction of PINGU in order
to evaluate its sensitivity. Several candidates are being studied, like SANTA/monopod,
which was already used for IceCube/DeepCore, and Multinest, which is currently being
developed. This report considers a new minimizer that explores the 8-dimensional
parameter space at several points simultaneously, based on an algorithm called Differential
Evolution Monte Carlo (DE). This algorithm is tested in various configurations on
simulated data of the PINGU/IceCube array, and compared in performance to Markov
Chain Monte Carlo (MC), a similar previous approach.
2. Simulation, Processing and Reconstruction Chain

2.1. Simulation
Reconstruction studies for PINGU necessarily involve simulated data. For
the events studied in this report, ν_µ and ν̄_µ with a flat spectrum between 1 GeV and 80
GeV were thrown into the V6 geometry of IceCube/PINGU (fig. 1.2). Their interaction
was simulated with the GENIE software, and the resulting light was propagated assuming the
SPICE-Mie ice model, i.e. a combination of absorption and scattering lengths depending
on the depth in the ice. This model describes the optical properties of the ice at the South
Pole, and was developed based on dedicated calibration runs. When a photon impinges
upon a DOM, the detector response is simulated accordingly, creating a waveform. This
includes for instance afterpulses, which can follow the main photon-induced pulse like
an echo with a ∼ 6 µs delay due to the ionization of residual gases inside the PMT [12].
The readout is triggered according to the waveform. On top of the hits resulting from
this simulation, noise hits are added.
The events were also selected according to the simulated location of the vertex,
imposing R < 50 m and −400 m < Z < −200 m, with R the distance from the central string
(#36) and Z the depth in IceCube coordinates [9]. This containment criterion could
also be implemented in a less ideal form from simulated/real data, and ensures that the
reconstruction studies are in particular relevant to PINGU. Furthermore, neutral-current
reactions, which would not contain a muon, were removed, since a) they do not match
the expected event signature, and, more importantly, b) the outgoing neutrino would
carry away an unknown amount of energy and make studying the energy resolution of the
reconstruction pointless.
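For illustration, a minimal sketch of this containment cut; the coordinates of string 36 are inputs here, to be taken from the detector geometry, not values quoted in this report:

```python
import math

def is_contained(x, y, z, x36, y36):
    """Containment cut of sec. 2.1: R < 50 m from string 36, -400 m < Z < -200 m.
    (x36, y36) are the horizontal coordinates of string 36 (from the geometry)."""
    r = math.hypot(x - x36, y - y36)
    return r < 50.0 and -400.0 < z < -200.0
```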
2.2. Processing
These simulation files were then processed as experimental data would be. For each hit
DOM, the waveform output by the digitizers (fig. 2.1) undergoes a feature extraction
that linearly decomposes it into a series of pulses, each with a certain time, width and charge.
These pulses then undergo a cleaning filter that seeks to remove noise. In our case,
static time window cleaning was used, which simply selects hits within a constant
time window relative to each event's trigger time. A method that was previously used is
the sliding time window, which seeks the time window of a pre-defined length that
contains the greatest number of pulses, and disregards those outside (sketched at the end of this section).
Using this type of cleaning could, however, lead to cleaning away hits from low-energy events if there was a
coincident event of higher energy, even when the dimmer one triggered the detector. This
behaviour is naturally undesirable for a low-energy analysis. After that, only events
with more than 10 hits were allowed to pass. On the hits within the PINGU subarray,
an SPE4 reconstruction was performed as a preliminary guess for the following ones.
This reconstruction is based on the SPE1st reconstruction, which minimizes a likelihood
computed using the first hit from each DOM and the Pandel likelihood function,
an analytical approximation to the probability of a DOM being hit, given a
track hypothesis. This is plugged into the I3IterativeFitter module from the gulliver
suite. This module varies the result, minimizes again, and after four iterations returns
the fit result with the highest logL out of the five as the SPE4 fit result.
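The sliding-time-window cleaning mentioned above reduces to a simple two-pointer scan over the sorted pulse times; a minimal sketch:

```python
def sliding_time_window(pulse_times, window_length):
    """Return the pulses inside the fixed-length window that contains the most
    pulses; pulses outside this window are disregarded."""
    times = sorted(pulse_times)
    best_lo, best_count = 0, 0
    lo = 0
    for hi, t in enumerate(times):
        # advance the left edge until the window ending at t has the right length
        while t - times[lo] > window_length:
            lo += 1
        if hi - lo + 1 > best_count:
            best_count, best_lo = hi - lo + 1, lo
    return times[best_lo:best_lo + best_count]
```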
Fig. 2.1.: The output of the four different amplifiers present on a DOM. The FADC chip digitizes the PMT output signal with a long readout window (6400 ns) after passing a shaping amplifier, and the ATWD chip captures shorter time windows (422 ns) at a higher resolution while making use of three gain levels. Figure from [13].
2.3. Likelihood Function
The likelihood function is taken from Millipede, a reconstruction software developed
within the IceCube collaboration (project millipede). The model at the basis of the
Millipede likelihood function is one of Poissonian statistics (see ch. 2ff of [3]). For
each DOM and each time bin within the duration for which it recorded a waveform, signified
by the index i, the measured quantity k_i is the deposited charge in units of photoelectrons
(PE), one PE corresponding to a single photon. It is expected to follow a Poissonian distribution
L_i = (λ_i^k_i / k_i!) e^(−λ_i), where the average λ_i is the expected number of photons. On top of that, a
noise level ρ_i is expected, so that λ_i → λ_i + ρ_i. The complete likelihood is then
the product of the L_i, or

logL = Σ_i logL_i = Σ_i [ k_i ln(λ_i + ρ_i) − (λ_i + ρ_i) − ln k_i! ]   (2.1)

For DOMs that were not hit within the readout time window, the logL sum over time
bins simplifies to P_nohit, the probability that not a single photon reached this particular
DOM during the entire readout time.
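Eq. (2.1) transcribes directly into code; a sketch, assuming the binned charges k_i and expectations λ_i, ρ_i are given per DOM/time bin:

```python
import math

def log_likelihood(k, lam, rho):
    """Poissonian logL of eq. (2.1), summed over all DOM/time bins i."""
    total = 0.0
    for k_i, lam_i, rho_i in zip(k, lam, rho):
        mu = lam_i + rho_i  # expected signal plus noise PE in this bin
        total += k_i * math.log(mu) - mu - math.lgamma(k_i + 1)  # ln k! = lgamma(k+1)
    return total
```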
Millipede was configured to use time bins of variable width that contain at most
1 PE and are at most 200 ns long. The λ_i are retrieved from so-called spline tables,
which are based on Monte Carlo simulation of the detector response for idealized forms
of sources. These are Cherenkov-emitting track segments of 15 m length, corresponding
to minimally ionizing muons, and isotropically emitting point sources for hadronic
cascades. The resulting expectation values λ_i were fitted with spline functions [10] over
the parameter space, and the fit results stored in lookup tables, from which the λ_i
for any given configuration of sources, i.e. event hypothesis, can be computed
more accurately than from analytical models [3]. Even so, these lookups make up the largest
portion of a likelihood function call, which in itself dominates the processing time of
likelihood reconstruction methods.

Since the photospline tables are only computed for integer numbers of track segments,
likelihood values for track lengths that lie between two multiples of 15 m are
approximated by a linear interpolation. This is implemented in the LikelihoodWrapper
module, which was written by M. Dunkman as part of the hybrid-reco project.
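The interpolation amounts to evaluating the table at the two neighbouring integer segment counts and weighting linearly; a sketch, where llh_for_segments is a hypothetical stand-in for the spline-table evaluation:

```python
def interpolated_llh(track_length, llh_for_segments, segment_length=15.0):
    """Linearly interpolate the likelihood between two multiples of 15 m."""
    n_lo = int(track_length // segment_length)
    frac = (track_length - n_lo * segment_length) / segment_length
    return (1.0 - frac) * llh_for_segments(n_lo) + frac * llh_for_segments(n_lo + 1)
```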
3. Differential Evolution Monte Carlo

3.1. Differential Evolution Algorithm
One way to find the maximum of L(~x) is to simply sample points ~x out of the
8-dimensional parameter space (interaction time, vertex position, zenith and
azimuth angles, E_cascade and E_track), compute L(~x) for each and take the best one. An
attempt to make this sampling more efficient than taking an 8-dimensional grid is called
Markov Chain Monte Carlo. It proceeds as follows:

1. Choose a seed as the starting vector ~x, for instance a first guess of the parameters.

2. Sample a new point ~y from a Gaussian distribution around ~x.

3. Replace ~x = ~y with a probability of min(1, L(~y)/L(~x)).
   (This is called a Metropolis-Hastings test.)

4. Reiterate from the current point; this progression is called a chain.

After a number of burn-in steps, the distribution of points in the history of the chain
will be a sampling of the input distribution L, so the points are denser where L is higher,
i.e. in the interesting region(s) of parameter space. The parameter vector with the
highest L is then chosen as the reconstructed maximum.
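Steps 1-4 translate into a few lines of Python; a minimal sketch, where log_l is the log-likelihood function and step_sizes are the per-parameter Gaussian widths:

```python
import math
import random

def mcmc(seed, log_l, step_sizes, n_steps):
    """Metropolis-Hastings chain; returns the best (logL, parameter vector) found."""
    x, llh_x = list(seed), log_l(seed)
    best_llh, best_x = llh_x, list(x)
    for _ in range(n_steps):
        y = [random.gauss(xi, s) for xi, s in zip(x, step_sizes)]  # step 2
        llh_y = log_l(y)
        # step 3: accept with probability min(1, L(y)/L(x)) = exp(min(0, logL_y - logL_x))
        if random.random() < math.exp(min(0.0, llh_y - llh_x)):
            x, llh_x = y, llh_y
            if llh_x > best_llh:
                best_llh, best_x = llh_x, list(x)
    return best_llh, best_x
```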
This method is still rather slow in approximating the true maximum. It also carries
with it the possibility of the chain converging to a non-global maximum, depending on
the chosen step size. Differential Evolution Monte Carlo (DE) is an attempt to
accelerate the convergence and make it more resistant against multimodality.

DE acts upon several chains in parallel; at each step, they are collectively called a
generation. In summary:

1. Smear the seed into the initial generation (taking N samples out of a Gaussian).

2. For the i-th vector ~x_i, calculate an updated version ~x_p:

   a) Randomly choose two other, non-identical vectors ~x_r1 and ~x_r2 out of the
      generation.

   b) Sample a vector ~e from a distribution that is small compared to the spread
      within the generation.

   c) Compute

      ~x_p = ~x_i + γ(~x_r2 − ~x_r1 + ~e)   (3.1)

   (The factor γ is 0.2 by default; the ~e widths are the steps in A.3.)

3. Replace ~x_i = ~x_p according to a Metropolis-Hastings test as in MC.

4. Reiterate.
The unique stationary distribution of the sampled parameter vectors is L, as in MC
(proof in [1]). In this method, when the spread of the generation goes down, so does the
variation from step to step, which can accelerate the convergence to the stationary
distribution. Taking the "best" vector out of several chains also means that it is enough
for part of them to converge to the region of the true maximum. On the other hand,
the rest of the generation will still use up processing time for its updates.
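A sketch of one DE generation update, following steps 2-4 above; this illustrates the algorithm, it is not the DarwinizedChainer code itself. Calling it with gamma=1.0 yields the jump step of eq. (3.4) discussed in sec. 3.3.1:

```python
import math
import random

def demc_step(generation, llhs, log_l, e_widths, gamma=0.2):
    """Update every chain of the generation in place; llhs holds current logL values."""
    n = len(generation)
    for i in range(n):
        r1, r2 = random.sample([j for j in range(n) if j != i], 2)  # step 2a
        e = [random.gauss(0.0, w) for w in e_widths]                # step 2b
        # step 2c, eq. (3.1): x_p = x_i + gamma * (x_r2 - x_r1 + e)
        x_p = [xi + gamma * (b - a + ei)
               for xi, a, b, ei in zip(generation[i], generation[r1],
                                       generation[r2], e)]
        llh_p = log_l(x_p)
        if random.random() < math.exp(min(0.0, llh_p - llhs[i])):   # step 3
            generation[i], llhs[i] = x_p, llh_p
```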
3.2. Implementation
A basic version of a DE-based likelihood reconstruction was already implemented by
Ken Clark and Matt Dunkman (PSU) as a module within the likelihood-scanner project
(sandbox/mdunkman/likelihood-scanner/trunk at rev 105435). I updated the order
of parameters to the one currently used by HybridReco (sandbox/reagan/hybrid-reco/trunk
at rev 100077); added timing functionality, more options and output keys, including the
possibility to write a log of the entire evolution to a separate file; made it possible to pass
bounds and step sizes to the method from within a script via the HybridReco parameter
service; synchronized these values to the defaults of the MC method; fixed miscellaneous
bugs; implemented new kinds of parameter-vector updates; and packaged it in
scripts which can be used within the IceCube software framework IceTray, as described
below.
3.2.1. Example Script
There is an example script in likelihood-scanner/resources/examples/darwin_chain.py.
This tray script

• reads in an input file,

• skips neutral-current events,

• computes the hadronic scaling factor F. This factor corrects for the fact
  that a hadronic cascade has a smaller light yield (or visible energy) than an
  electromagnetic cascade of the same energy, due to neutral hadrons (like neutrons).
  The photospline tables are calculated for electromagnetic cascades, so the actual
  cascade energy relates to the reconstructed visible energy by

  E_actual = E_visible / F(E_actual)   (3.2)

  The factor can be parametrized as [8]

  F(E) = 1 − 0.690 · (max(2.7 GeV, E) / 0.188 GeV)^(−0.162)   (3.3)

  Since the current implementation requires Monte Carlo truth input, this factor is
  not used for the evaluation of this method's resolution. (A sketch of this
  parametrization follows after this list.)

• includes the tray segment likelihood_scanner.darwin_chain described below,

• and finally writes output in the form of .i3 files and, via tableio, in ROOT or HDF
  format.

Its command-line options are listed in tab. A.1.
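A sketch of the scaling factor as reconstructed in eq. (3.3) above (the grouping of the constants is my reading of the parametrization in [8]), together with a hypothetical fixed-point inversion of eq. (3.2):

```python
def hadronic_scaling(e_actual):
    """F(E) of eq. (3.3); energies in GeV."""
    return 1.0 - 0.690 * (max(2.7, e_actual) / 0.188) ** -0.162

def visible_to_actual(e_visible, tol=1e-6):
    """Solve E_actual = E_visible / F(E_actual), eq. (3.2), by fixed-point iteration.
    (Illustrative helper; the script itself uses Monte Carlo truth instead.)"""
    e = e_visible
    for _ in range(100):
        e_next = e_visible / hadronic_scaling(e)
        if abs(e_next - e) < tol:
            break
        e = e_next
    return e_next
```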
3.2.2. Tray Segment
The tray segment is included in the above example tray script. It

• sets the log-level for the involved modules,

• adds the photospline services (I3PhotoSplineServiceFactory) for both tracks
  and cascades,

• adds the time window in which pulses could have been recorded by the simulated
  read-out, for use by the likelihood, via I3WaveformTimeRangeCalculator.
  Simulation files which contained both a noise-free pulse series and one with added noise
  used to have a bug in the trigger simulation which allowed noise pulses to prompt
  the error "Millipede: Pulse time before readout window start".

• adds the Millipede likelihood service. This service is in turn configured to use
  the aforementioned muon and cascade photospline services and the time window, and to
  count 1 pulse per variable-width time bin.

• adds a seed particle to the frame. In the case of a Monte Carlo seed, this particle
  takes position, time and energy from the cascade, and length and direction from the
  muon track. In the case of the infinite-track SPE4_PINGU seed, the latter is appended
  with a default length of 50 m and energy of 25 GeV. If chosen, the position and time are
  replaced by the MC values.

• stores the seed in the HybridReco seed service (HybridRecoSeedServiceFactory),

• adds the HybridReco parameter service (HybridRecoParametrizationServiceFactory)
  and stores step sizes and bounds for all parameters therein (see tab. A.3), which
  are later used by the minimizer. A step size of 0 leaves the parameter fixed.
  These services were taken from the project hybrid-reco (sandbox/mdunkman/likelihood-scanner/trunk
  at rev 105435).

• Finally, the DarwinizedChainer module is added and provided with the likelihood,
  parameter and seed services.

• The tray segment returns a list of output keys, which can e.g. be passed to a table writer.

Options passed to the tray segment are described in tab. A.2. (A usage sketch follows below.)
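A sketch of how the segment might be used from a script; the reader/writer calls are standard IceTray, but the import path of the segment and any options passed to it are illustrative assumptions (see tab. A.2 for the actual parameters):

```python
from I3Tray import I3Tray
from icecube import dataio  # registers the I3Reader/I3Writer modules
from icecube import likelihood_scanner  # assumed import path of the project

tray = I3Tray()
tray.AddModule("I3Reader", "reader", Filename="input.i3")
# AddSegment returns whatever the segment function returns, here the output keys
output_keys = tray.AddSegment(likelihood_scanner.darwin_chain, "darwin_chain")
tray.AddModule("I3Writer", "writer", Filename="output.i3")
tray.Execute()
```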
3.2.3. Module
The module implements the actual minimizer as a C++ class which inherits from the
I3Module class. On inclusion into a tray script, it receives a set of parameters which are
handled (once) in the method DarwinizedChainer::Configure (see A.3). These are
detailed in tab. A.3. Among these parameters are several external pieces of software:

• a seed service, from which the seed particle is retrieved,

• a parameter service, which supplies the bounds and step sizes used during the
  evolution update (see A.3),

• a likelihood service, containing the likelihood function to be maximized.

The general form of the iteration is described in A.3. Upon completion, the best
parameter-and-logL vector and its history, along with information about the used CPU
time, are written into the output file. It is possible to write an additional file containing
the complete evolution of all individual chains. For more details, see A.3.
3.3. Problems and challenges

3.3.1. Stuck chains
While most chains gradually improve their likelihood over the course of the evolution,
some chains stay stuck at a lower likelihood which is clearly separate from the rest of
the generation. In fig. 3.1, this is shown once for 80 chains on one specific event,
and once for two particular chains representing each case. Examining individual chains
(see fig. 3.1) shows that the stuck chains are updated with constant frequency, but seem
to oscillate around a broad, local maximum. In comparison, the higher-likelihood chains
slowly converge to a maximum, as the parameter space they can move into shrinks.

Separating these two subsets of chains with a logL cut and plotting the X-Y coordinates
of the vertex in fig. 3.2 shows that the stuck chains progress from the initial (seed)
region to the edges of the detector volume. Plotting the duration of a likelihood-function
call versus the number of track segments for these two sets in fig. 3.2, it is apparent that
the stuck chains extend their tracks farther from the edge of the detector. These track
segments lie outside the array, so there are no DOMs nearby that would penalize them
with Millipede's P(no hit) (see 2.3). The energy of the cascade meanwhile increases to
match the expected photoelectrons from an interaction at the edge of the array to the hits
inside the array, while the interaction time has to decrease to accommodate the increased
propagation time.

Fig. 3.1.: The evolution of logL with the steps for one event. (a) All chains; the stuck chains are below −880. (b) A normal chain. (c) A stuck chain. Note the relative scale of their variations.
The hits inside the array will only contribute a flat, noise-like logL from each track
segment that is too far removed compared to the scale of the absorption length. Hence,
in fig. 3.2 the linear fits are

time for normal chains [s] = 0.18 + 0.09 × N_segments
time for stuck chains [s]  = 0.22 + 0.04 × N_segments

The out-of-reach tracks take half the time per track segment. One track segment
corresponds to one lookup of the photospline table (see 2.3) for each hit, so the higher number
of track segments still increases the average time (2.2 s → 7.9 s).
Comparing the logL call time for two representative chains in fig. 3.3 shows that
the track in the "stuck" chain continues to grow almost steadily. For some events, the
proportion of stuck chains can be so high that their diverging track length is clearly
reflected in an overall diverging computation time per step (fig. 3.3). The non-divergent
events, on the other hand, contain a smaller fraction of stuck chains.

This behaviour is unwelcome not only due to the increase in computation cost, but
also because evolution steps that involve taking a difference between one conservative,
high-logL and one extremely distant, low-logL region are unlikely to improve the logL
of chains in the high-logL region, and therefore the current best likelihood. Several
measures were attempted to control it.
Parameter Boundaries
Since the divergence in time is due to a divergence in track length and is accompanied
by variations in cascade position, time, and energy, a straightforward solution to this
problem is to implement boundaries on all of these parameters. The track length
especially had hitherto no upper boundary, and the vertex position/time were bounded far
too loosely compared to what could be expected of the data.

Fig. 3.2.: Left: The vertex for normal chains (gradient colours) and stuck chains (red) in one event, with the bounds indicated. Right: Time of a logL function call for one event vs. number of track segments, separated into normal chains (green) and stuck chains (red), showing means with RMS error bars per bin, linear fits and the upper boundary.
Jump Steps
In a jump step, the γ parameter in eq. 3.1 is set to 1, so the update becomes

~y = ~x + (~x_r1 − ~x_r2) + ~e   (3.4)

The idea is that stuck chains could be removed from their local maximum by a step
with a randomly chosen difference vector between the stuck sub-population and the
converging sub-population. The random choice would then result in an exponential decay
of the stuck population size. However, the stuck chains cover a broad region in parameter
space, which increases the time constant of this decay. Also, the other chains are affected
by this jump step, too. The time spectrum (fig. 3.4) did not qualitatively change
for runs seeded with Monte Carlo truth, while the divergent chains were cut off for the
SPE4 seed; this however remains to be separated from the effect of the track-length
boundary.
The effects of both jump steps and parameter boundaries are examined in fig. 3.5,
where the time per step as well as the best logL are shown vs. step number for the
same event processed with and without these measures, once with an SPE seed
and once with a Monte Carlo truth seed. Of the four curves, the only one diverging
in time is for SPE without jump steps and with unbound parameters. The times of the SPE
runs are comparable before the divergence, and the MC run's time is clearly higher with
the counter-divergence measures before it becomes compatible after ∼ 250 steps. Note
that this excludes the actual jump steps, which have significantly higher times. For both
seeds, the logL is enhanced by bound parameters, with a Monte Carlo seed always
exceeding an SPE seed.
Hard Time Limit
The divergence of processing time for a whole event can be mitigated by simply
interrupting the evolution if the total time exceeds an estimated limit:

Σ t ≤ 2 × N_steps × ⟨t⟩_(first 20 steps)   (3.5)

Once all the above measures are implemented, the fraction of events cancelled by
this condition is on average 0.1% for both 40 chains and 80 chains, one event being
cancelled in each run with an SPE seed. The fractions were 1% (40) and 0.4% (80) for
a Monte Carlo seed. The reason for this was not investigated. The time per step is
then well bounded to ≤ 165 s/step and does not diverge, as seen in fig. 3.4, which contrasts
with the previous upper limit of 634 s/step for a diverging event.
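A sketch of this limit, estimating ⟨t⟩ from the first 20 steps as in eq. (3.5); step_fn stands in for one evolution step:

```python
import time

def run_with_time_limit(step_fn, n_steps, calibration_steps=20):
    """Run the evolution, cancelling once the total time exceeds the eq. (3.5) limit."""
    start = time.time()
    limit = float("inf")
    for step in range(1, n_steps + 1):
        step_fn()
        elapsed = time.time() - start
        if step == calibration_steps:
            limit = 2.0 * n_steps * (elapsed / calibration_steps)
        if elapsed > limit:
            return False  # event cancelled by the hard time limit
    return True
```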
Fig. 3.3.: Time per step for a stuck chain (red) vs. step number (no jump steps). (a) Compared to number of track segments (gradient colours). (b) Compared to a normal chain (blue).
Fig. 3.4.: Distribution of time per step, before (red) and after (blue) all parameter bounds, jump steps, and time limits were implemented. (a) Blue: from a large sample of events; red: from a smaller sample. (b) SPE4 seed only. (c) Monte Carlo seed only.
3.3.2. Cascade Energy
The photospline tables (sec. 2.3) are actually computed for electromagnetic cascades. The
likelihood reconstruction is hence, ideally, expected to give the energy of an
electromagnetic cascade that has the same light yield as the hadronic cascade in the data. Since in
a hadronic cascade more energy is deposited stochastically into neutral particles such
as neutrons, which do not result in Cherenkov light, this leads on average to an
underestimation of the energy of the hadronic cascade. This underestimation is reflected
in a bias of −9.7 GeV (averaged over the used data sample) when comparing the
reconstructed cascade energy to the cascade energy taken from the Monte Carlo
truth information. The effect gets more pronounced towards low energies, as shown in
fig. 3.6. If one applies the analytical approximation of the factor, as computed from
the true cascade energy, as an unphysical correction factor to the reconstructed cascade
energy, this bias reduces to −4.8 GeV, which is smaller than the RMS of 9.9 GeV. Any
way to apply this correction using purely reconstructed data was thought to involve a
separate study to arrive at a different correction function, and hence no such attempts
were made.
3.4. Performance
Fig. 3.5.: Time per step (a) and best logL (b) vs. step number, for one event. Green: SPE seed. Blue: Monte Carlo seed. Dashed: unbound track length, no jump steps. Solid: bound track length, 1/20 jump steps. The separate points are for the jump steps.

Several quantities were computed to describe the deviation between reconstruction and
Monte Carlo truth. Here, the form ∆X means X_reco − X_true.

The first two are the most interesting for physical analyses:

• Relative energy error ∆E/E_true, with the total reconstructed and true neutrino energy
  respectively.

• Zenith error |∆θ|.

Two more are energy-related, also with physical relevance:

• Energy bias ∆E, the signed difference of the total reconstructed and true neutrino
  energy. This is expected to have a bias proportional to the true energy, see sec. 3.3.2.

• Fraction error ∆(E_µ/E_ν), which evaluates how well the actual hybrid hypothesis is
  matched.

Finally, two vertex-related quantities, which affect the reconstruction quality of others (see sec.
3.4.3):

• Vertex position error |∆~R|, the distance between the true and reconstructed vertex.

• Interaction time error |∆T|.

The performance of the reconstruction method is then evaluated by e.g. the median
of these errors across the event sample, called the resolution.
Plots are shown here for the first group, and in appendix B for the rest.
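Computing the resolution is then a matter of taking medians over the event sample; a sketch, with hypothetical field names for the per-event reconstructed and true values:

```python
import math
import statistics

def resolutions(events):
    """Median error quantities over the sample; e_reco, e_true, zen_reco and
    zen_true are assumed attribute names, not taken from the actual code."""
    return {
        "relative energy": statistics.median(
            abs(ev.e_reco - ev.e_true) / ev.e_true for ev in events),
        "zenith [deg]": statistics.median(
            abs(ev.zen_reco - ev.zen_true) * 180.0 / math.pi for ev in events),
    }
```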
3.4.1. Comparison to Markov Chain Monte Carlo
First, a comparison to the original MC has to be made. Both methods were run on
the same events, with the same SPE4 seeds, and their best vector recorded at regular
intervals of steps, along with the time elapsed since the start of the iteration (see A.3).
For DE, the effective number of steps is given by the number of generations times the
size of the generation. The chosen generation size was 40, since previous tests showed
promising results with this number, which is on the scale of the recommended 10 × d (see
3.1). A population of 10 × d = 80 was also tried, but limits on computing time interrupted
the reconstruction on the majority of the events.

Fig. 3.6.: Left: Average visible over actual energy for hadronic cascades, parametrized according to L. Rädel [8]. Right: Reconstructed over true cascade energy vs. true energy (overall mean 0.57), with mean (red) and mean ± rms (blue). Note the different ranges and the large width of the distribution.
The median resolutions for both methods can then be plotted over the step number
at which they were achieved, i.e. the resolution if the iterations were interrupted at this
step. These plots are shown in the left column of fig. 3.7. The difference between two
medians does not contain all information about the relative merits, so a second type of
quantity is also computed: the portion of events where a certain resolution improved
when moving from MC to DE, called the separation. These make up the right column
of fig. 3.7.
Fig. 3.7.: Median resolutions (left) and separations, i.e. portions of events where DE improves on MC (right), vs. step. (a) Zenith resolution and separation. (b) Relative energy resolution and separation.
The zenith resolution (fig. 3.7, top) of DE declines from a larger starting value of 11.8°,
which has to be due to the initial smearing (A.3). Meanwhile, the MC zenith resolution
loses its rate of decline more quickly, leading to a close approach under 5.8° (DE) and 5.5°
(MC), with a separation between 0.48 and 0.5, after 25k steps. The relative energy
resolution (fig. 3.7, bottom) begins at a slightly higher level, but here the separation
crosses 0.5 already at 2000 steps and ends up above 0.62 after 25k steps, corresponding
to resolutions of 30% (DE) and 38% (MC).

An interesting point to evaluate is also the processing time per step for each method.
This quantity depends on the computing resources available, in this case the NPX4
cluster at the University of Wisconsin in Madison. However, it is apparent in fig. 3.8
that DE differs clearly from MC in that regard, with almost twice the processing time
per step.
Fig. 3.8.: (a) Portion of events where t_DE < t_MC, vs. step. (b) t_DE vs. t_MC.