B-tagging - b-jet identiﬁcation and searches for supersymmetry, dark matter and Higgs boson pai

jets.

The ATLAS experiment uses non-compensating calorimeters, which means that the response to hadrons is lower than the response to particles interacting electromagnetically; this is corrected through the so-called jet energy scale calibration. Furthermore, energy from the underlying event and from additional proton-proton collisions in the same bunch crossing (pile-up) can sometimes be included in the jet reconstruction. To account for these and other effects, different calibration procedures need to be followed to correct the reconstructed jets.

4.4 B-tagging

Top quarks and Higgs bosons, as well as many BSM particles, often decay to b-quarks, which in turn hadronise to form b-hadrons. A jet containing a b-hadron is referred to as a b-jet, a jet containing a c-hadron but no b-hadron is referred to as a c-jet, while a jet containing neither a b-hadron nor a c-hadron is referred to as a light jet.
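The labelling rule above can be written as a short priority check. The following is a minimal sketch, assuming the number of b- and c-hadrons matched to each jet is already known; the function name `label_jet` and its arguments are hypothetical, and the hadron-to-jet matching itself is not specified in this section.

```python
def label_jet(n_b_hadrons: int, n_c_hadrons: int) -> str:
    """Return the flavour label of a jet from its matched hadron counts.

    Hypothetical helper: the matching of hadrons to jets is assumed
    to have been done beforehand.
    """
    if n_b_hadrons > 0:
        # Any b-hadron makes the jet a b-jet, even if c-hadrons are present.
        return "b"
    if n_c_hadrons > 0:
        # A c-hadron but no b-hadron: c-jet.
        return "c"
    # Neither a b-hadron nor a c-hadron: light jet.
    return "light"


if __name__ == "__main__":
    for counts in [(1, 1), (0, 2), (0, 0)]:
        print(counts, "->", label_jet(*counts))
```

The ordering of the checks matters: a jet with both a b-hadron and a c-hadron (as is typical for a b-hadron decay chain) is labelled as a b-jet.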

The identification of b-jets, referred to as b-tagging, is an important tool used in the ATLAS physics program. Algorithms are developed exploiting the characteristic features of b-hadrons: long lifetime, high decay multiplicity and high mass. Due to the long lifetime of b-hadrons, reconstructed jets containing these particles are characterised by being spatially matched to a secondary vertex (SV) which is displaced from the collision point or primary vertex (PV). The SV is the point where the b-hadron decays, and the impact parameter (IP) is the distance of closest approach of a track trajectory to the PV. The presence of the SV results in a high IP of the tracks in the jet, as illustrated in Figure 4.3.

Different low level taggers have been developed to focus on specific flavour-tagging sensitive features [30]. These low level taggers are later used as input to more complex high level taggers, which result in significantly improved b-jet identification performance.

4.4.1 Low level taggers

The low level taggers of interest for this thesis are based on the following information:

Figure 4.3: Illustration of important observables used to identify jets containing b-hadrons (b-jets) while rejecting jets not containing any b- or c-hadrons (light jets) - Nazar Bartosik, Wikipedia.

• Impact parameter (IP) of tracks in jets: The decay point of the b-hadron must lie along its flight path, hence the transverse component of the impact parameter is signed to discriminate between tracks produced in a b-hadron decay and those originating from the PV. If the track intersects the jet axis in front of (behind) the PV, the IP is defined as positive (negative). Tracks from decays of long-lived particles have large positive impact parameters, making the signed IP a suitable variable for b-tagging. ATLAS has developed the following IP-based algorithms: IP2D and IP3D [31], which are based on a log-likelihood discriminant, and RNNIP [32], which is a recurrent neural network. The algorithms do not rely on SV finding, and can therefore be combined with secondary vertex-based taggers for improved performance.

• Detection and properties of a secondary vertex: The SV algorithm [33] starts by forming two-track vertices from all tracks in a jet. It then removes two-track vertices which are likely to originate from decays of long-lived particles other than b-hadrons (e.g. K_S⁰ or Λ⁰), photon conversions or hadronic interactions with the detector material. The algorithm then tries to fit a single secondary vertex from all remaining two-track vertices.

• Reconstruction of the b- and c-hadron decay chain topology: The JetFitter algorithm [34] includes vertex reconstruction and IP information, but focuses on the topological decay structure of weak interaction decays of b-hadrons. It assumes that the primary vertex, the b-hadron and the c-hadron flight paths are all aligned. Therefore the algorithm only searches for subsequent decay vertices and tracks with large IP along the flight path of the reconstructed b-hadrons.
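The sign convention for the transverse impact parameter described in the first item of the list can be illustrated with a small geometric sketch. It is not the ATLAS implementation: it assumes a straight-line track in the transverse plane (real tracks are helices in the magnetic field), and the function name `signed_transverse_ip` and its arguments are hypothetical.

```python
import math


def signed_transverse_ip(point, direction, jet_axis, pv=(0.0, 0.0)):
    """Signed transverse IP of a straight-line track (illustrative only).

    point     -- any (x, y) point on the track
    direction -- (x, y) direction of the track, any non-zero length
    jet_axis  -- (x, y) direction of the jet axis, starting at the PV
    pv        -- (x, y) position of the primary vertex
    """
    # Work in coordinates relative to the primary vertex.
    px, py = point[0] - pv[0], point[1] - pv[1]
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm
    jx, jy = jet_axis

    # 2D cross products: |p x u| is the distance of closest approach
    # of the track line to the PV when u is a unit vector.
    cross_pu = px * uy - py * ux
    cross_ju = jx * uy - jy * ux
    d0 = abs(cross_pu)

    if cross_ju == 0.0:
        # Track parallel to the jet axis: no crossing point exists.
        # Returning the unsigned value here is an arbitrary choice.
        return d0

    # The track line crosses the jet axis at s * jet_axis.
    # s > 0: crossing in front of the PV, s < 0: behind it.
    s = cross_pu / cross_ju
    return d0 if s >= 0.0 else -d0


if __name__ == "__main__":
    jet = (1.0, 0.0)
    # Track emerging from a displaced decay along the jet direction.
    print(signed_transverse_ip((2.0, 0.0), (1.0, 1.0), jet))
    # Mirror-image track crossing the jet axis behind the PV.
    print(signed_transverse_ip((-2.0, 0.0), (1.0, 1.0), jet))
```

In this toy setup a track coming from a decay point displaced along the jet direction gets a positive value, while a track that crosses the jet axis behind the PV gets a negative value of the same magnitude.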

4.4.2 High level taggers

Two different types of high level taggers are described in this thesis: MV2 and DL1 [31]. Both algorithms use as input the information from the low level algorithms IP2D, IP3D, SV1 and JetFitter, together with the jet variables p_T and η.

MV2 is a boosted decision tree algorithm which is trained using the ROOT Toolkit for Multivariate Data Analysis (TMVA) [35] on a hybrid sample where b-jets are considered signal and c- and light jets are considered background. Specifically, the widely used MV2c10 algorithm contains 10% of c-jets in the background sample.
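As an illustration of what a fixed c-jet fraction in the background means, the sketch below assembles a background sample with a chosen fraction of c-jets by random sampling. This is an assumption about how such a mixture could be built, not a description of the ATLAS training setup (which may, for example, use per-jet weights instead); the function name `build_background` and its parameters are hypothetical.

```python
import random


def build_background(c_jets, light_jets, n_total, c_fraction=0.10, seed=0):
    """Draw a background sample with a fixed fraction of c-jets.

    c_jets, light_jets -- sequences of candidate background jets
    n_total            -- size of the background sample to build
    c_fraction         -- requested fraction of c-jets (10% as for MV2c10)
    """
    rng = random.Random(seed)
    n_c = round(c_fraction * n_total)
    n_light = n_total - n_c
    if n_c > len(c_jets) or n_light > len(light_jets):
        raise ValueError("not enough jets to build the requested sample")
    # Sample without replacement from each flavour, then shuffle.
    sample = rng.sample(list(c_jets), n_c) + rng.sample(list(light_jets), n_light)
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    background = build_background(["c"] * 50, ["light"] * 500, n_total=200)
    print(background.count("c") / len(background))
```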

DL1 is an artificial deep neural network (NN) trained using Keras with the Theano [36] backend. Its topology consists of a mixture of fully-connected hidden layers and maxout layers. In addition to the inputs described above, DL1 uses a set of variables designed to discriminate against c-jets. Two variations of the DL1 algorithm are presented in this thesis: the baseline DL1, and a variation called DL1r which additionally uses the RNNIP algorithm as input to the NN. The DL1 algorithms have a multidimensional output corresponding to the probabilities for a jet to contain a b-hadron, a c-hadron or neither of those.
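Since the DL1 output consists of three probabilities, a single score is needed before a jet can be compared to a threshold. This section does not state how the combination is done; the sketch below uses the log-likelihood-ratio form commonly quoted in ATLAS flavour-tagging publications, D = ln(p_b / (f_c p_c + (1 - f_c) p_light)), and should be read as an assumption. The function name and the default value of `f_c` are placeholders chosen for illustration.

```python
import math


def dl1_discriminant(p_b, p_c, p_light, f_c=0.08):
    """Combine the three DL1 output probabilities into one b-tagging score.

    f_c is the effective c-jet fraction in the background hypothesis.
    The default is only a placeholder; the value used in a given
    analysis must be taken from the corresponding tagger documentation.
    """
    if not 0.0 <= f_c <= 1.0:
        raise ValueError("f_c must lie in [0, 1]")
    background = f_c * p_c + (1.0 - f_c) * p_light
    return math.log(p_b / background)


if __name__ == "__main__":
    # A b-like jet gets a larger score than a light-like jet.
    print(dl1_discriminant(0.90, 0.07, 0.03))
    print(dl1_discriminant(0.05, 0.15, 0.80))
```

A larger `f_c` penalises c-jet-like outputs more strongly, which trades light-jet rejection for c-jet rejection.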

4.4.3 Performance of the tagging algorithms

The performance of the tagging algorithms is estimated by the b-tagging efficiency (ε_b), the c-tagging efficiency (ε_c) and the light-jet tagging efficiency or mistag rate (ε_light). The different tagging efficiencies are described by equation 4.1, where i = b, c, light.

$$\varepsilon_i = \frac{\#\ \text{tagged } i\text{-jets}}{\#\ i\text{-jets}} \qquad (4.1)$$

A high ε_b is essential to correctly identify b-jets, but low ε_c and ε_light are required to avoid high c- and light-jet contamination.
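Equation 4.1 amounts to counting, per true flavour, the fraction of jets that pass the tagging requirement. A minimal sketch, assuming each jet comes with a truth label ("b", "c" or "light") and a boolean tag decision; the function name `tagging_efficiencies` is hypothetical.

```python
def tagging_efficiencies(true_flavours, is_tagged):
    """Return the tagging efficiency per flavour, following equation 4.1.

    true_flavours -- iterable of "b", "c" or "light" labels, one per jet
    is_tagged     -- iterable of booleans, True if the jet is b-tagged
    """
    n_jets = {"b": 0, "c": 0, "light": 0}
    n_tagged = {"b": 0, "c": 0, "light": 0}
    for flavour, tagged in zip(true_flavours, is_tagged):
        n_jets[flavour] += 1
        if tagged:
            n_tagged[flavour] += 1
    # Efficiency is undefined (NaN) for a flavour with no jets.
    return {
        flavour: n_tagged[flavour] / n_jets[flavour] if n_jets[flavour] else float("nan")
        for flavour in n_jets
    }


if __name__ == "__main__":
    flavours = ["b"] * 4 + ["c"] * 2 + ["light"] * 4
    tags = [True, True, True, False, True, False, False, False, False, True]
    print(tagging_efficiencies(flavours, tags))
```

In this toy sample three of four b-jets are tagged, while one of two c-jets and one of four light jets are mistagged.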

Jets are considered b-tagged if the output score of the high level algorithm is above a certain threshold. This threshold is referred to as a working point (WP). The WPs are chosen such that each of them results in a desired b-tagging efficiency. The tightest is the 60% WP, which tags only 60% of the b-jets but in turn has a very low mistag rate. The loosest WP corresponds to a b-tagging efficiency of 85%, and intermediate WPs of 70% and 77% b-tagging efficiency are also defined.
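A working point can be derived from a sample of true b-jets by picking the score threshold that keeps the desired fraction of them. The sketch below assumes a plain list of tagger scores and a simple order-statistic definition (jets with a score at or above the threshold count as tagged); the function name `working_point_threshold` is hypothetical and the actual ATLAS derivation may differ, for example by using event weights.

```python
def working_point_threshold(b_jet_scores, target_efficiency):
    """Return the score cut that tags the target fraction of b-jets.

    b_jet_scores      -- tagger scores of true b-jets
    target_efficiency -- desired b-tagging efficiency, e.g. 0.60 or 0.77
    """
    if not b_jet_scores:
        raise ValueError("need at least one b-jet score")
    if not 0.0 < target_efficiency <= 1.0:
        raise ValueError("target_efficiency must lie in (0, 1]")
    # Sort from most to least b-like and keep the top fraction.
    ordered = sorted(b_jet_scores, reverse=True)
    n_keep = max(1, round(target_efficiency * len(ordered)))
    # The lowest score among the kept jets is the threshold.
    return ordered[n_keep - 1]


if __name__ == "__main__":
    scores = [i / 10 for i in range(1, 11)]  # toy scores 0.1 ... 1.0
    for wp in (0.60, 0.70, 0.85):
        cut = working_point_threshold(scores, wp)
        achieved = sum(s >= cut for s in scores) / len(scores)
        print(f"{wp:.0%} WP: cut at {cut}, achieved efficiency {achieved:.2f}")
```

With a finite sample the achieved efficiency can only match the target up to the granularity of the sample, as the 85% case in the toy example shows. A tighter WP corresponds to a higher threshold and therefore to lower ε_c and ε_light.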

Different working points and b-tagging algorithms are used in different physics analyses. It is important to decide the jet type, b-tagging algorithm and WP that best suit each analysis depending on e.g. the p_T of the particles produced. Analyses looking for low-p_T b-jets often benefit from using track jets, whereas analyses looking for high-p_T jets often benefit from using PFlow or topo jets, which also exploit calorimeter information. Furthermore, a tight WP of 60% can be optimal if a very pure sample of b-jets is needed. However, if statistics are limited and not much contamination from c- or light jets is expected, one can benefit from a looser WP of e.g. 77%.

In document b-jet identiﬁcation and searches for supersymmetry, dark matter and Higgs boson pair production with the ATLAS experiment (Page 47-50)