Study of charged Higgs bosons using multi-jet final states with the ATLAS experiment at the LHC


Jesper Hallberg Uppsala University

Department of High Energy Physics Department of Physics and Astronomy

Uppsala University Sweden June 27, 2014

Bachelor Degree Project, 15 hp


Abstract

Eng. The ATLAS experiment at the LHC is searching for evidence of charged Higgs bosons, so far without finding any. This report investigates the possibility of reconstructing charged Higgs bosons using multi-jet final states (pp → [b]tH⁺ → [b]bbbqqlν), with focus on fully hadronic H⁺ → hW decays, using a simulation of 500k events with m_H+ = 250 GeV. After optimizing the TMVA (Toolkit for Multivariate Analysis) input variables for BDT (Boosted Decision Trees), the mass of H⁺ was reconstructed with an efficiency of up to 26.7% (correct combinations).

Sv. The ATLAS experiment at the LHC is searching for evidence of the charged Higgs boson; so far none has been found. This report investigates the possibility of reconstructing charged Higgs bosons through multi-jet final states (pp → [b]tH⁺ → [b]bbbqqlν), with focus on fully hadronic H⁺ → hW decays, using 500k simulated events with m_H+ = 250 GeV. TMVA (Toolkit for Multivariate Analysis) is used, and variables were optimized for BDT (Boosted Decision Trees). The H⁺ mass was reconstructed with an efficiency of up to 26.7% (correct combinations).


1 Introduction

1.1 Particle Physics and the Standard Model

Particle physics is a reductionist approach to studying the universe. It is the study of the fundamental constituents of matter and the forces through which they interact. What we take to be fundamental particles, however, changes over time. Some 100 years ago the atom was thought to be the smallest fundamental particle. New discoveries revealed that the atom can be split into electrons and a nucleus, and that even the nucleus is ultimately made up of quarks. Modern particle physics has formulated the Standard Model (SM), containing quarks, leptons and bosons. All members of the SM family are treated as elementary, i.e. point particles without internal structure. [1]

Leptons and quarks are spin-1/2 fermions and make up matter. They are divided into three generations, each containing one lepton doublet and one quark doublet. Each fermion has an anti-fermion counterpart, identical to the fermion in every way except that it has the opposite charge. For the uncharged neutrino, however, the difference lies in whether its momentum is aligned with its spin or not. Anti-particles are denoted with a bar.

The three charged leptons, e⁻, µ⁻, τ⁻ (electron, muon, tauon), form doublets with three neutral leptons, νe, νµ, ντ, called neutrinos (electron neutrino, muon neutrino, tauon neutrino). Table 1.1.1 shows the general structure of leptons in the Standard Model, as well as their masses. The neutrino was assumed to be massless in the Standard Model until the phenomenon of neutrino oscillation was established experimentally [2]. While no exact mass has yet been found for the neutrino, the Planck Collaboration [3] has set the most recent upper limit of 0.23 eV on the combined mass of the three different neutrinos.

Generation   Lepton   Charge   Mass [MeV]
1            νe        0       < 2.2 × 10⁻⁶
1            e⁻       -1       0.511
2            νµ        0       < 170 × 10⁻³
2            µ⁻       -1       105.7
3            ντ        0       < 15.5
3            τ⁻       -1       1776.8

Table 1.1.1: Lepton generations and masses. [4]

Unlike leptons, quarks (Table 1.1.2) do not have integer charges. However, the ”quark molecules” (hadrons) which they form always do. Hadrons are divided into mesons (quark-antiquark pairs) and baryons (three quarks). Examples are the three pions, π⁻ (dū), π⁰ (a superposition of uū and dd̄), π⁺ (ud̄), or


on the existence of so-called exotic particles, containing four quarks, from the LHCb experiment at CERN [5]. For hadrons to be formed without violating the Pauli principle, quarks must also carry color charge. There are three color charges, red (r), green (g) and blue (b), as well as the corresponding anti-charges (r̄, ḡ, b̄). The color charge of a quark is not fixed: quarks interact through color-charged gluons, which can change their color without altering the other properties of the quark.

The theory of color charge and interaction is called quantum chromodynamics (QCD).

Generation   Quark   Charge   Mass [MeV]
1            u       +2/3     2.3
1            d       -1/3     4.8
2            c       +2/3     1.28 × 10³
2            s       -1/3     95
3            t       +2/3     173 × 10³
3            b       -1/3     4.18 × 10³

Table 1.1.2: Quark generations and masses. [4]

The interactions between these fundamental matter constituents are governed by the spin-1 gauge bosons. These bosons are listed in Table 1.1.3 with their masses. The photon (γ) mediates the electromagnetic force between charged particles. Its properties are well described by the theory of quantum electrodynamics (QED). W^± and Z⁰ bosons mediate weak interactions between both quarks and leptons. These three bosons, W^±, Z⁰ and γ, are usually grouped together in the electroweak theory. The gluon (g) is massless like the photon, but couples to color charge rather than electric charge. Gluons themselves carry color, which allows gluon-gluon interactions (QED and QCD will not be discussed further in this report).

Gauge boson      Force             Electric charge   Mass [GeV]
Photon (γ)       Electromagnetic    0                0
W boson (W^±)    Weak              ±1                80.4
Z boson (Z⁰)     Weak               0                91.2
Gluon (g)        Strong             0                0

Table 1.1.3: Gauge bosons and their masses. [4]

The last component of the Standard Model is the Higgs boson (H), a scalar boson with spin 0. The discovery of this boson confirmed the existence of the Higgs field, which gives mass to elementary particles. The Higgs field has four degrees of freedom; two of these are absorbed by the W^± bosons and the third by the


2 Background

2.1 Beyond the Standard Model

Dave Barney and Steven Goldfarb of the CMS collaboration said in a TED-Ed talk: ”We want to see if the particle we have found is the one predicted by the Standard Model or if it fits into other possible theoretical models. And if it fits a different model? That would be even more exciting.” [7]

Beyond the Standard Model physics (BSM physics) seeks to answer the questions that the SM cannot answer in its current form. One BSM theory is Grand Unification Theory, which tries to unite the strong and electroweak interactions. Another is Supersymmetry (SUSY), which solves the fine-tuning problem in higher-order Higgs mass calculations [1]. A simple extension of the SM to accommodate SUSY is the Minimal Supersymmetric Standard Model (MSSM), which requires two complex doublets of scalar Higgs fields to generate mass for all quarks and charged leptons. This gives eight components of the Higgs field. Three of them are absorbed to give mass to the W^± and Z⁰ bosons, leaving five physical states, two of which are charged Higgs bosons (H^±). [8] A discovery of a charged Higgs boson would be a strong indication for SUSY. If charged Higgs bosons are not found, SUSY is not ruled out, but it has to be more complex.

2.2 LHC and the ATLAS experiment

The LHC (Large Hadron Collider), built by CERN, is the most powerful particle accelerator in the world and hosts four main experiments. Two of them, CMS and ATLAS, are multi-purpose experiments which investigate many different aspects of particle physics. Their most notable scientific achievement is the discovery of the Higgs boson in 2012 [9]. The other two are more specialized: ALICE (heavy-ion collisions) and LHCb (CP violation in b-hadrons).

A major task for the ATLAS and CMS experiments is to measure the properties of the SM-like Higgs boson and explore the Higgs sector even further, not just within the SM but beyond it, as discussed in section 2.1.

2.3 Status of Charged Higgs Boson Searches

Data from the ATLAS experiment is analysed to find possible signatures of H^± production and decays. Searches for charged Higgs bosons with a mass below the top-quark mass and decaying through H^± → τ ν using the data collected at center-of-mass energy of 7 TeV, have been conducted, but no evidence for charged Higgs bosons was found [10]. More recently, the search was repeated


reported in [11].

The decay chains which this report will focus on are H⁺ → tb and H⁺ → hW⁺, both resulting in the same final state [b]bbbqqlν as seen in Figure 2.3.1.

For further reference, note the difference in semi-leptonic and fully hadronic H⁺ decays as shown in Figure 2.3.2.

Figure 2.3.1: Feynman diagrams for H⁺ → tb and H⁺ → hW decays. Each W-boson may have a leptonic or hadronic decay.

Figure 2.3.2: Feynman diagrams for H⁺ → hW decays. The left diagram shows a leptonic decay of H⁺, the right diagram a hadronic one. Note that all the above tH⁺ events have the same final state.

Heavy H⁺ are produced through e.g. gluon-gluon interaction as illustrated in Figure 2.3.3.

Figure 2.3.3: An example of heavy H⁺ production through gluon-gluon interaction.


2.4 Purpose of this Report

A full investigation of the H⁺ → τν and H⁺ → tb decay channels has already been started, and in some cases completed. However, the H⁺ → tb analysis is not yet public. Due to time priorities, a full investigation of H⁺ → hW will not be done in Run 1 (the first run of the LHC). This report seeks to answer the following two questions, in order to pave the way for future H⁺ → hW searches:

• Can one find a combination of variables with a (combinatorial) background rejection strong enough to reconstruct H⁺?

• If yes, is the reconstruction efficiency high enough to justify a full investigation for Run 2 at 13-14 TeV?

3 Methods Used

Below are the major tools and methods used to analyse simulations and data events in high-energy physics, which are used in this report.

3.1 ROOT

ROOT is a framework which provides the functionality needed to handle large event samples (from real data or, as in this case, simulation), with C++ as its programming language. Management of such ”Big Data” has since become an integral part of industry and society. ROOT stores data in objects called trees, a class designed to reduce disk space and increase access speed. A tree consists of branches, which in our case are the variables of interest, filled event by event as leaves. In our case, for example:

• Tree - simulation output;

• Branch - ”mass of b-quark”;

• Leaf - measured mass for each event in simulation.

Some functions included in ROOT are e.g. histogramming methods and curve fitting.

The ROOT project started in the context of another experiment conducted at CERN. The experiment generated a huge amount of data, about 10 TB per run. This amount of data was to be expected from the not-yet constructed LHC experiments, so it was an ideal base to create the next generation of data analysis tools. [12]


3.2 TMVA

TMVA (Toolkit for MultiVariate Analysis) is a toolkit for ROOT with a large variety of multivariate classification algorithms. Multivariate means that it is designed to handle several variables simultaneously. Its main use is to discriminate signal from background in a given data set. In this report, ”signal” will be correct combinations of variables, while ”background” will be incorrect combinations (discussed later in the report). It is also used to see how different variables correlate with one another, so that one can reduce the set of variables to the meaningful ones. It basically works as a complex box into which one inserts some variables that describe signal and/or background events, and from which one extracts information on how significant those variables are for the discrimination. [13]

For some examples of TMVA output, see Figures 3.2.1 and 3.2.2 below.

Figure 3.2.1: Background vs. Signal. The plot shows output distributions for a signal and a background sample for a given classifier (in this case BDT, i.e. Boosted Decision Trees). The goal is to find a combination of variables for which these areas are as widely separated as possible, in order to make a good cut on the BDT output, which would return a high signal vs. background separation. More on this in Sec. 4.2.


Figure 3.2.2: The y-axis shows the background rejection, i.e. how much background is lost after a given cut on the BDT output, while the x-axis shows how much signal is kept. This gives an idea on how discriminative a set of chosen variables is. The further apart the areas in Figure 3.2.1 are, the closer this curve would be to the top-right corner.
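The construction of such a curve can be illustrated with a minimal sketch. The scores below are invented toy numbers and `roc_points` is a hypothetical helper, not part of TMVA; the only assumption is that events with a classifier output above the cut are kept.

```python
def roc_points(signal_scores, background_scores, cuts):
    """For each cut, return (signal efficiency, background rejection)."""
    points = []
    for cut in cuts:
        # Signal efficiency: fraction of signal with output above the cut.
        sig_eff = sum(s > cut for s in signal_scores) / len(signal_scores)
        # Background rejection: fraction of background at or below the cut.
        bkg_rej = sum(b <= cut for b in background_scores) / len(background_scores)
        points.append((sig_eff, bkg_rej))
    return points


# Toy classifier outputs (invented, not taken from the analysis).
signal = [0.30, 0.15, 0.10, 0.05, -0.10]
background = [-0.30, -0.20, -0.15, 0.00, 0.12]

curve = roc_points(signal, background, cuts=[-0.25, 0.0, 0.2])
for sig_eff, bkg_rej in curve:
    print(f"signal efficiency {sig_eff:.1f}, background rejection {bkg_rej:.1f}")
```

A loose cut keeps all signal but rejects little background, while a tight cut does the opposite; scanning the cut traces out the curve.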

3.3 Boosted Decision Trees

A decision tree classifies an event by asking it a series of questions. All events are asked the same initial question and, depending on the answer, are sent to a different follow-up question. This repeats until enough questions have been answered to classify the event. The terminology is similar to the one used for ROOT. Every question is called a ’node’, the first question the ’level 0 node’, the next the ’level 1 node’, etc. ’Branches’ link these nodes, e.g. the level 0 node can have three branches connected to it, each leading to a different level 1 node. The branch chosen depends on how the event answers the question given in the level 0 node. Finally, the event ends up in a ’leaf’, which is a node without further branches. [14]

Boosted Decision Trees (BDTs) are built from decision trees in which each node has only two (or zero) branches. Instead of relying on a single large tree, many different trees are created, where each new tree is specialized on the events that the previous tree classified poorly; this is called boosting. Combining many weak trees gives a good result.
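The idea of boosting can be sketched with a toy example. The code below assumes an AdaBoost-style reweighting and uses one-cut trees (”stumps”) on a single variable; it is an illustration under these assumptions, not the TMVA implementation, and all names and data are hypothetical.

```python
import math


def train_stump(xs, ys, weights):
    """Find the single cut (threshold, sign) with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (+1, -1):
            preds = [sign if x > threshold else -sign for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best


def adaboost(xs, ys, n_trees):
    """Train n_trees stumps; misclassified events gain weight after each one."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_trees):
        err, threshold, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this tree
        ensemble.append((alpha, threshold, sign))
        # Boost: raise the weight of misclassified events, lower the rest.
        for i in range(n):
            pred = sign if xs[i] > threshold else -sign
            weights[i] *= math.exp(-alpha * ys[i] * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def classify(ensemble, x):
    """Weighted vote of all stumps: +1 (signal) or -1 (background)."""
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score > 0 else -1


# Toy data: signal (+1) at both ends, background (-1) in the middle,
# so no single cut can separate the two classes.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

one_tree = [classify(adaboost(xs, ys, 1), x) for x in xs]
three_trees = [classify(adaboost(xs, ys, 3), x) for x in xs]
print(one_tree)     # a single stump misclassifies two events
print(three_trees)  # the weighted vote of three stumps classifies all six
```

Each stump alone is a weak classifier, but because the later stumps concentrate on the events the earlier ones got wrong, their weighted vote separates the toy sample completely.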


4 H⁺ → tb TMVA Analysis

In this and the next sections, the analysis is performed on signal ntuples only, i.e. there is no background from SM processes. Wrongly assigned combinations of objects are referred to as the background. The signal corresponds to the correctly assigned objects, i.e. this TMVA analysis discriminates correctly assigned combinations from wrongly assigned ones.

4.1 Preparation

This analysis is performed primarily to learn the tools and methods described in Section 3, which can later be applied to the H⁺ → hW analysis. A detailed investigation of the H⁺ → tb channel has already been performed by the ATLAS group in Uppsala.

A program for discriminating signal from background was written by Alexander Madsen, similar to the program that will later be built for the H⁺ → hW analysis. A variety of multivariate analyses are available (in this case limited to BDT, described in Section 3.3). The program loads the ntuple, and one has to manually add each variable to be used in the analysis. The H⁺ → tb ntuple contains 70 variables. As TMVA evaluates the variables not only individually against background but also against each other (by investigating how much they correlate with one another, see Sec. 4.2 for details), the computation time increases exponentially when variables are added. For efficiency reasons, six variables were chosen for each run, training 850 decision trees (used for this entire section), which resulted in a computation time of 10 minutes.

With only six variables per run, the user has to manually open the program and change the input variables by hand every 10 minutes, while using more variables per run increases the computation time exponentially. To circumvent this, a small loop was added, which repeats the procedure six times with six different variables each time and saves six unique output data files. This enables the user to test more variables with ”one click”, i.e. 6x6 variables with a runtime of only one hour.

A script was created to move the output data files to a different location and plot the ”Signal Efficiency vs. Background Rejection” curves (described in Fig. 3.2.2) for each 6-variable run. Each output file is plotted with a corresponding color (see Fig. 4.1.1) so that the most discriminant group of variables can be picked for further analysis.
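The batching of variables into six runs of six can be sketched as follows. This is an illustration only: the variable names are placeholders, `batches` is a hypothetical helper, and the actual TMVA call is left out.

```python
def batches(variables, size):
    """Split a list of variable names into consecutive groups of `size`."""
    return [variables[i:i + size] for i in range(0, len(variables), size)]


# Placeholder names; the real ntuple variables are not reproduced here.
variables = [f"var{i:02d}" for i in range(36)]
groups = batches(variables, 6)

minutes_per_run = 10  # computation time per 6-variable run quoted in the text
for run, group in enumerate(groups):
    # In the real program, a TMVA run with this group would start here
    # and write its own output file.
    print(f"run {run}: {', '.join(group)}")

total_minutes = len(groups) * minutes_per_run
print(f"total runtime: {total_minutes} minutes")
```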


Figure 4.1.1: Background Rejection vs. Signal Efficiency plots for two runs with 6x6 variables each.


4.2 Analysis

In Fig. 4.1.1, the red and yellow curves have a significantly higher signal efficiency than the other curves. TMVA ranks variables in order of significance to the overall efficiency of the method used (BDT). From each of the two runs with the highest signal efficiency, the four highest ranked variables were chosen for further analysis, i.e. another TMVA run based on eight variables. These were, in order of ranking (highest to lowest):

• M_th (mass of t-quark on hadronic side);

• M_tl (mass of t-quark on leptonic side);

• dR_Wh_bh (distance between W-boson and b-quark on hadronic side);

• Ang_l_v (angle between lepton and neutrino);

• dPt_Wh_bh (difference in transverse momentum between W-boson and b-quark on hadronic side);

• dEta_Wh_bh (difference in η between W-boson and b-quark on hadronic side, where η, called pseudorapidity, is a function of the polar angle);

• Ang_Wh_bh (angle between W-boson and b-quark on hadronic side);

• dPhi_Wh_bh (difference in azimuthal angle between W-boson and b-quark on hadronic side).

In addition to ranking variables, TMVA also produces a correlation matrix (Fig. 4.2.1), showing how strongly the variables correlate with each other. If two variables show a 100% correlation, they contain the same information. If the variables used have little or no correlation with each other, each of them contributes independent information, and the signal efficiency is likely to be higher.
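As an illustration of what such a matrix contains, the sketch below computes linear (Pearson) correlation coefficients for three toy variables. The helper names and the numbers are assumptions made for the example; TMVA computes its matrix internally.

```python
import math


def pearson(xs, ys):
    """Linear correlation coefficient between two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict m[a][b] with the correlation of every pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy variables: "b" is exactly twice "a" (100% correlated),
# "c" is constructed to be uncorrelated with "a".
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, -1.0, -1.0, 1.0],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

In this toy case ”b” adds no information beyond ”a”, so one of the two could be dropped without loss, whereas ”c” is independent of ”a”.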

After the first run with eight variables, a 7-variable run was performed without dPhi_Wh_bh, i.e. the least discriminative variable. In this run, Ang_l_v had the lowest rank and was therefore removed. This was to be expected since dPhi, Ang and dR have high correlations, as they are all measures of the distance between objects. The final six variables were M_th, M_tl, dR_Wh_bh, dPt_Wh_bh, dEta_Wh_bh and Ang_Wh_bh.


Figure 4.2.1: Example of a correlation matrix for the variables above.

4.3 Results

The variables used in the final run and their ranking are as follows:

--- Rank : Variable : Variable Importance ---
1 : M_th : 2.951e-01
2 : M_tl : 1.736e-01
3 : Ang_Wh_bh : 1.469e-01
4 : dR_Wh_bh : 1.391e-01
5 : dPt_Wh_bh : 1.343e-01
6 : dEta_Wh_bh : 1.111e-01
---

The exact value of ”Variable Importance” is not important (it is a weighted measure of how often a variable is used to ”split” decision tree nodes [13]); what matters is the relative values. M_th has by far the highest importance in this set of variables. Figure 4.3.1 shows the signal plotted against


more blue area (signal) outside the red area (background) means the variable is more discriminative against background. Figure 4.3.2 shows the total signal separation from background for the chosen method, BDT. Preferably the peaks would have a wider separation, and the lump in the background area around -0.2 should be worked around, but for the intent of this analysis, this is sufficient.

Note that it is impossible to know which combination of variables is most discriminative against background without testing every combination. E.g. in this case, the 7th ranked variable in the 8-variable run was not ranked 7th in the 7-variable run performed afterwards. However, testing every combination of the 70 variables would take too long (choosing six variables out of 70 alone gives more than 130 million combinations).
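The size of this number can be checked with a short calculation. The sketch assumes that the figure refers to sets of six variables chosen out of 70, and reuses the 10 minutes per run quoted in Sec. 4.1 as a rough cost per combination; both are assumptions made for illustration.

```python
import math

n_variables = 70
set_size = 6

# Number of distinct 6-variable sets that can be drawn from 70 variables.
n_sets = math.comb(n_variables, set_size)
print(n_sets)  # 131115985

# Rough cost if every set needed one 10-minute TMVA run.
minutes_per_run = 10
years = n_sets * minutes_per_run / (60 * 24 * 365)
print(f"about {years:.0f} years of computation")
```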

Figure 4.3.1: Variables used for the TMVA analysis of H⁺→ tb.

The Signal Efficiency vs. Background Rejection plot (Fig. 4.3.3) is clearly better than the example plot shown in Fig. 3.2.2 or any of the curves in Fig. 4.1.1.


Figure 4.3.2: Normalized signal and background plots, showing their separation.

Figure 4.3.3: Final Signal Efficiency vs. Background Rejection plot for H⁺→ tb

5 H⁺ → hW Analysis

The report will now focus on the other [b]tH⁺→ [b]bbbqqlν channel, H⁺→ hW . The decay chain is similar to that of H⁺→ tb (see Fig. 2.3.1), and particle/jet labelling is used as in Fig. 5.0.1. Before the TMVA analysis is repeated for this channel, a smaller set of simulated events is truth-matched (Sec. 5.1) and the


reconstruct the mass of H⁺ (Sec. 5.5). Note that the H⁺ → hW analysis is limited to hadronic decays of H⁺.

Figure 5.0.1: Diagram of the decay analysed in this section, together with the naming of particles used in the analysis.

The simulated data uses m_h= 125 GeV, m_H+ = 250 GeV, m_W = 80.4 GeV and the following event selection:

• Total jet-multiplicity ≥ 5;

• Number of b-jets ≥ 3 (b-jets are discussed in Sec. 5.2);

• Missing E_T ≥ 30 GeV (missing E_T and lepton/neutrino p_T are discussed in Sec. 5.2);

• Lepton p_T ≥ 25 GeV;

• Lepton |η| ≤ 2.47.

Any event not fulfilling all of the above requirements is rejected.
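The selection above can be written as a single predicate. The sketch below uses a hypothetical `Event` record with invented field names; the thresholds are the ones listed in the text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Minimal event record; the field names are illustrative assumptions."""
    n_jets: int        # total jet multiplicity
    n_bjets: int       # number of b-tagged jets
    missing_et: float  # missing transverse energy in GeV
    lepton_pt: float   # lepton transverse momentum in GeV
    lepton_eta: float  # lepton pseudorapidity


def passes_selection(ev):
    """Return True only if the event fulfils every requirement."""
    return (
        ev.n_jets >= 5
        and ev.n_bjets >= 3
        and ev.missing_et >= 30.0
        and ev.lepton_pt >= 25.0
        and abs(ev.lepton_eta) <= 2.47
    )


good = Event(n_jets=6, n_bjets=3, missing_et=45.0, lepton_pt=30.0, lepton_eta=-1.2)
too_few_bjets = Event(n_jets=6, n_bjets=2, missing_et=45.0, lepton_pt=30.0, lepton_eta=-1.2)
forward_lepton = Event(n_jets=5, n_bjets=4, missing_et=31.0, lepton_pt=26.0, lepton_eta=2.6)

print(passes_selection(good), passes_selection(too_few_bjets), passes_selection(forward_lepton))
```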

5.1 Truth Matching

Truth matching means that a reconstructed object from the detector is matched with a true particle from simulation. Note that neither of these can be ’real’ data; truth matching can only be done with simulated events. The ROOT class TLorentzVector was used for 4-vector computations. All jets are stored in a tree in the ntuple called ’jet’, and all true particles in ’mc’. A loop calculates the distance dR between each combination of jet and true particle, and the combination with the smallest dR is tagged as true. If no combination with sufficiently small dR is found, the event is marked as having no true combination. Particles and jets are loaded as 4-vectors and added up to compute mass and momentum


In Fig. 5.1.1, the reconstructed Higgs mass after truth matching is shown. The ”0-events” are events in which no Higgs candidate could be reconstructed from final-state bb pairs, for example when the Higgs decays to something other than bb (only about 60% of decays are to bb). The peak is at around 125 GeV as expected, and the tail can be explained by the area selected for the jet. If the area is too big, more background hadrons may be recorded as part of the jet, and if it is too small, some parts of the jet might fall outside the area, so that energy is ”missed”.

Figure 5.1.1: Mass for reconstructed Higgs matched to true Higgs. Around 3000 events out of 5000 had no true combination. The reconstructed Higgs is built from the jets being closest to the true particle, while the true Higgs is the actual Higgs particle.
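The dR matching used in this section can be sketched as follows. The sketch assumes the usual definition dR = sqrt(dEta² + dPhi²), represents objects as (η, φ) pairs instead of TLorentzVector objects, and uses a maximum distance of 0.4 as a placeholder for ”sufficiently small”; the actual threshold is not stated in the text.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the (eta, phi) plane, with dPhi wrapped into [-pi, pi)."""
    d_eta = eta1 - eta2
    d_phi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)


def truth_match(jets, particle, max_dr=0.4):
    """Return the index of the jet closest to the true particle,
    or None if no jet lies within max_dr (an assumed threshold)."""
    best_index, best_dr = None, max_dr
    for i, (eta, phi) in enumerate(jets):
        dr = delta_r(eta, phi, particle[0], particle[1])
        if dr < best_dr:
            best_index, best_dr = i, dr
    return best_index


# Toy jets and true particles as (eta, phi); the numbers are invented.
jets = [(0.0, 3.1), (1.0, 0.5), (-2.0, 1.0)]

print(truth_match(jets, (0.05, -3.1)))  # close to jet 0 across the phi boundary
print(truth_match(jets, (1.1, 0.4)))    # close to jet 1
print(truth_match(jets, (3.0, 0.0)))    # no jet nearby
```

The wrap-around of dPhi matters because φ = 3.1 and φ = -3.1 are neighbouring directions, not opposite ones.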

5.2 TMVA Preparation

As mentioned in Sec. 4, TMVA is used to discriminate correctly assigned combinations of variables against wrongly assigned ones. Thus ntuples have to be created, containing both correct and incorrect combinations. The code used in Sec. 5.1 is a good start for a program creating these ntuples, as it is already designed to read simulated ntuples and do combinations.

As in Sec. 5.1, all q- and b-jets are stored in a tree ’jet’. However, for the TMVA ntuple preparation, incorrect combinations of q- and b-jets are also needed. One still does not want to use a q as a b or vice versa, so no jet index may be used for both q and b. This is solved by b-tagging (the details of which will not be discussed in this report), a tool that assigns each jet a score on a scale from 0 to 1 (not a percentage). If the score is above a chosen threshold, the jet is marked as a b-jet and its index is saved to a bJets vector. If below, it is not a b-jet and its index is saved to a qJets vector.

In the truth matching, only the hadronic side was investigated. As the entire decay is now going to be analysed with TMVA, the leptonic side (t- decay) is to be added. This requires some additional work as the neutrino can not be detected. The transverse momentum (pT) of the neutrino is stored in the


of the collision axis (z-axis) cannot be measured, as we do not know the p_z of the colliding partons. Using the missing p_T, the measured properties of the lepton and setting the mass of the W-boson to 80.4 GeV, the neutrino momentum along the z-axis can be calculated. This calculation leads to a quadratic equation, i.e. it gives two solutions. If the solutions are real, the solution closest to the lepton is chosen. If the solutions are complex, the collinear approximation is applied, i.e. pz_ν = pz_l. The result is used to build a 4-vector for the neutrino.
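A minimal sketch of this calculation is given below. It assumes a massless lepton and neutrino, so that the W mass constraint becomes m_W² = 2(E_l E_ν - p_l·p_ν), and it interprets ”closest to the lepton” as the solution whose pz is closest to the lepton pz. The function names are hypothetical and the numbers are invented.

```python
import math

M_W = 80.4  # GeV, W mass used in the constraint


def neutrino_pz(lep_px, lep_py, lep_pz, met_px, met_py, m_w=M_W):
    """Solve the W mass constraint for the neutrino pz.

    With mu = m_W^2 / 2 + pT_l . pT_nu the constraint gives the quadratic
    pT_l^2 pz^2 - 2 mu pz_l pz + (E_l^2 pT_nu^2 - mu^2) = 0.
    """
    lep_pt2 = lep_px ** 2 + lep_py ** 2
    lep_e = math.sqrt(lep_pt2 + lep_pz ** 2)  # massless lepton
    met_pt2 = met_px ** 2 + met_py ** 2
    mu = 0.5 * m_w ** 2 + lep_px * met_px + lep_py * met_py
    disc = mu ** 2 - lep_pt2 * met_pt2
    if disc < 0:
        # Complex solutions: collinear approximation, pz_nu = pz_l.
        return lep_pz
    root = lep_e * math.sqrt(disc)
    solutions = [(mu * lep_pz + root) / lep_pt2, (mu * lep_pz - root) / lep_pt2]
    # Real solutions: keep the one closest to the lepton pz (assumed meaning).
    return min(solutions, key=lambda pz: abs(pz - lep_pz))


def invariant_mass(p1, p2):
    """Invariant mass of two massless particles given as (px, py, pz)."""
    e1 = math.sqrt(sum(c * c for c in p1))
    e2 = math.sqrt(sum(c * c for c in p2))
    dot = sum(a * b for a, b in zip(p1, p2))
    return math.sqrt(2.0 * (e1 * e2 - dot))


# Invented lepton momentum and missing pT, in GeV.
lepton = (40.0, 0.0, 10.0)
met = (-30.0, 20.0)

pz_nu = neutrino_pz(*lepton, *met)
print(invariant_mass(lepton, (met[0], met[1], pz_nu)))  # reproduces m_W

# Large missing pT back to back with the lepton: no real solution.
pz_collinear = neutrino_pz(40.0, 0.0, 10.0, -100.0, 0.0)
```

The first case has two real solutions and the lepton-neutrino mass comes out at the input W mass; the second case falls back to the collinear approximation.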

Like jets, leptons (electrons or muons) are truth matched to the corresponding true particle through minimizing dR. Every event is considered only having one lepton, which means the lepton, neutrino and Wl offer no discrimination against combinatorial background.

Two ntuples are created, one with truth-matched events and one with incorrect events (seen as ’background’ in TMVA). Incorrect combinations are created by randomly choosing b1, b2, b3, q1 and q2 (see Fig. 5.0.1) and building 4-vectors for H⁺, W, h and t from these. Values are extracted from the 4-vectors for each variable chosen for the TMVA analysis and saved into the ntuple. A total of 58 variables were chosen, and NTrees = 850 was used.

5.3 TMVA Analysis

As the number of simulated events for H⁺ → hW is smaller than for H⁺ → tb, a larger number of variables can be tested at once while keeping the computation time reasonable. However, since 58 variables are still too many, a loop as described in Sec. 4.1 is used to run TMVA with variables in groups of 10. In each run, the lowest ranked variable is discarded. This is repeated until 20 variables remain, few enough to run them all at once while keeping the computation time within 15 minutes. In every subsequent run, the three lowest ranked variables are noted, and if a variable is in the bottom three in two consecutive runs, it is discarded.

The final ten variables are as follows (in order of rank, for labelling see Fig 5.0.1):

• M_Hp (mass of H⁺);

• b2_pt (transverse momentum of b2);

• b1_pt (transverse momentum of b1);

• q2_pt (transverse momentum of q2);


• M_Wh (mass of Wh);

• dEta_q1_q2 (difference in η between q1 and q2);

• dR_h_Wh (distance between h and Wh);

• dR_b1_b2 (distance between b1 and b2).

Mass variables are usually discriminative, as was also seen in Sec. 4.3. It is therefore interesting to note that M_h, i.e. the mass of the Higgs boson candidate, has been discarded in this TMVA analysis. This can be explained by the fact that b1 and b2 already have a strong presence (both through dR and pt variables), and h correlates strongly with b1 and b2. From here, variables were tested individually. It is known that dR and dEta correlate (as dR is a function of dEta and dPhi), so removing either dR_q1_q2 or dEta_q1_q2 is the natural next step. Even though dR_q1_q2 ranked higher in this run, removing it gave a smaller decrease in the overall signal efficiency than removing dEta_q1_q2. Removing dR_q1_q2 left dR_h_Wh ranked lowest, and removing it had no significant effect on the overall signal efficiency.

5.4 TMVA Results

Two different runs in TMVA were performed, one with M_Hp and one without. The background rejection is better when including M_Hp, and the reconstruction should be better as well, but in practice the mass of H⁺ is unknown, so a method not using this mass parameter would be more realistic.

Figure 5.4.1: Normalized signal and background plots, showing their separation. Left is including M_Hp, right is without M_Hp. Clearly, including M_Hp gives a better separation.


Figure 5.4.2: Variables used for the TMVA analysis of H⁺→ hW .


Fig. 5.4.1 shows how the inclusion of M_Hp gives a better separation. With M_Hp, the combinatorial background peak around -0.2 on the x-axis is larger, whereas excluding M_Hp shifts part of the combinatorial background peak to around 0.1 on the x-axis, where the correct combinations peak. In other words, the distributions of correct combinations and combinatorial background overlap more when M_Hp is excluded. Because of this, even though the variables are less correlated without M_Hp (compare Fig. 5.4.4 to Fig. 5.4.3), the total background rejection vs. signal efficiency is better when including M_Hp (Fig. 5.4.5). Ultimately, a lower reconstruction efficiency is obtained without M_Hp, as discussed in Sec. 5.5.

Figure 5.4.4: Linear correlation coefficients for the variables used, without M_Hp.

Figure 5.4.5: Signal efficiency vs. background rejection plots. Left is including M_Hp, right is without M_Hp. Clearly, including M_Hp gives a better background rejection; however, it is impossible to tell how much better before reconstructing H⁺.


5.5 H⁺ Reconstruction

To check how well TMVA really performs, its output data (i.e. the BDT response) is used to select the best object combination and reconstruct variables, such as the mass of H⁺. The reconstructed mass is then compared to the truth-matched mass. TMVA produces a ROOT macro which ranks a combination of variables from 0 to 1, depending on the chosen variables and the TMVA method.

The same macro used for truth matching and ntuple creation is used, as it already has the proper cuts, b-tagging, etc. implemented. A loop computes values for all final TMVA variables (q1_pt, q2_pt, M_Wh etc.) for every possible combination of b1, b2, q1 and q2. The highest ranked combination is kept and 4-vectors are built as described in Sec. 5.2, but instead of being saved to an ntuple, the values are plotted in histograms. The jet indices used for the reconstructed variables are saved for each event and compared to the indices of the true combination to obtain the efficiency of the reconstruction method.

The randomly generated combinations used to simulate background are also plotted, and their indices are saved, for reference (e.g. a reconstruction efficiency of 5% would not be interesting if the background efficiency is 4%).
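The selection of the best combination and the efficiency calculation can be sketched as follows. Here `score` is a hypothetical stand-in for the BDT response of one jet assignment, the jet indices are invented, and treating b1/b2 and q1/q2 as ordered is an assumption of this example.

```python
from itertools import permutations


def best_combination(b_jets, q_jets, score):
    """Try every assignment of jet indices to (b1, b2, q1, q2)
    and return the one with the highest score."""
    candidates = [
        (b1, b2, q1, q2)
        for b1, b2 in permutations(b_jets, 2)
        for q1, q2 in permutations(q_jets, 2)
    ]
    return max(candidates, key=score)


def reconstruction_efficiency(chosen, truth):
    """Fraction of events where the chosen indices equal the true ones."""
    matched = sum(1 for c, t in zip(chosen, truth) if c == t)
    return matched / len(truth)


# Toy event: jets 0, 2, 3 are b-tagged, jets 1 and 4 are not.
b_jets = [0, 2, 3]
q_jets = [1, 4]


def toy_score(combination):
    """Stand-in for the BDT response: peaks at the assignment (2, 0, 4, 1)."""
    target = (2, 0, 4, 1)
    return -sum(abs(c - t) for c, t in zip(combination, target))


best = best_combination(b_jets, q_jets, toy_score)
print(best)

# Efficiency over two toy events: one chosen correctly, one not.
chosen = [(2, 0, 4, 1), (0, 2, 1, 4)]
truth = [(2, 0, 4, 1), (0, 3, 1, 4)]
efficiency = reconstruction_efficiency(chosen, truth)
print(efficiency)
```

Running the same efficiency calculation on randomly chosen combinations gives the reference value against which the method is judged.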


6 Results

The reconstruction of the mass of H⁺ as described in Sec. 5.5 is shown in Fig. 6.0.1 for TMVA including M_Hp and in Fig. 6.0.2 for TMVA excluding M_Hp. The method using M_Hp as a variable has a reconstruction efficiency of 26.7% for picking the right object combination. The method not using M_Hp has an efficiency of 17.6%. This can be compared to the randomly selected combination, which has an efficiency of only 3.5% in both cases.

While the method using M Hp has a tighter peak around 250 GeV, note that even the wider peak from the other method is still at 250 GeV. This is important because even with a lower efficiency and a longer tail, the TMVA analysis manages to pin the H⁺ mass to 250 GeV.

Fig. 6.0.3 shows both methods plotted simultaneously for comparison.

Figure 6.0.1: Plots of the reconstructed H⁺ mass (green line), truth-matched H⁺ mass (blue line) and randomly combined H⁺ mass (red line) from TMVA including M_Hp. The x-axis shows the reconstructed H⁺ mass in GeV and the y-axis the number of events.


Figure 6.0.2: Plots of the reconstructed H⁺ mass (green line), truth-matched H⁺ mass (blue line) and randomly combined H⁺ mass (red line) from TMVA not including M_Hp. The x-axis shows the reconstructed H⁺ mass in GeV and the y-axis the number of events.

Figure 6.0.3: Comparison of the H⁺ mass reconstructions, including M_Hp (green line) or excluding M_Hp (yellow line).


7 Possible Improvements

With more time, one would run more combinations of variables in TMVA. A script could be created which loops the TMVA macro over every combination with a fixed number of variables (e.g. 6-8). The script would save the most discriminating set or keep track of the average ranks of variables (e.g. M_Wh ranks high in most combinations, dPhi_b1_b3 ranks low). This would not be too hard to implement, but it would take a long time to execute.

More methods besides BDT should also be tested, such as Fisher and likelihood. These methods were tested (see Fig. 7.0.1), but only for the final eight variables. These variables were chosen because they performed best for BDT; other sets of variables might perform better for Fisher and likelihood, possibly with an even better background rejection than BDT with the current variables.

Additional mass points should also be tested, such as 350 and 600 GeV, since m_H+ is not necessarily 250 GeV.

Figure 7.0.1: Signal efficiency vs. background rejection for multiple methods. The figure shows the TMVA run with the final eight variables, including M_Hp, for BDT as well as for Fisher and likelihood with the same variables.


8 Conclusions

Charged Higgs boson searches in multi-jet final states (tH⁺ → [b]bbbqqlν) were studied in this report, with focus on hadronic H⁺ → hW decays with an H⁺ mass of 250 GeV. Cuts were applied to the simulated ntuple (500k events) and two TMVA input files were created (one with truth-matched combinations and one with randomly selected combinations) with 58 variables. Optimizing the BDT response resulted in a set of 8 variables. The TMVA result was used to reconstruct the event with an efficiency of 26.7% when using the H⁺ mass as a TMVA input variable, and 17.6% when not using it. Both reconstructions show an H⁺ mass peak at around 250 GeV, as expected.

This means the answers to the questions stated in Sec. 2.4 are:

• Yes, in this case a method could be developed with an efficiency of 17-27%.

• For the mass point studied in this report, the efficiency is high enough to justify further investigations towards Run 2 of the LHC. In particular, more mass points should be studied and a full analysis to discriminate possible H⁺ signals from the Standard Model backgrounds should be pursued before drawing final conclusions.


9 References

[1] Martin BR, Shaw G. Particle Physics. 3rd ed. Manchester, John Wiley & Sons. 2008.

[2] Karagiorgi G, Aguilar-Arevalo A, Conrad J, et al. Leptonic CP violation studies at MiniBooNE in the 3 + 2 sterile neutrino oscillation hypothesis. Phys. Rev. D 75, 013011 (2007)

[3] The Planck Collaboration. Planck 2013 results. XVI. Cosmological Parameters. arXiv:1303.5076v3 (2013)

[4] Beringer J. et al. (Particle Data Group). 2013 Review of Particle Physics. Phys. Rev. D 86, 010001 (2012)

[5] The Belle Collaboration. Observation of a Resonancelike Structure in the π^±ψ′ Mass Distribution in Exclusive B → Kπ^±ψ′ Decays. Phys. Rev. Lett. 100, 142001

[6] Bernardi G, Carena M, Junk T. Higgs Bosons: Theory and Searches. Particle Data Group, Reviews. (2007)

[7] Barney D, Goldfarb S. The Basics of the Higgs Boson. Video: Ted-Ed/CERN. (2007)

[8] Frisch W. The Higgs Sector in the MSSM. HEPHY Institut für Hochenergiephysik. (2010)

[9] CERN Press Office. CERN Experiments Observe Particle Consistent With Long-sought Higgs Boson. Geneva. (2012)

[10] The ATLAS Collaboration. Search for charged Higgs bosons decaying via H^± → τν in tt events using pp collision data at √s = 7 TeV with the ATLAS detector. JHEP06 (2012) 039

[11] The ATLAS Collaboration. Search for charged Higgs bosons in the τ+jets final state with pp collision data recorded at √s = 8 TeV with the ATLAS experiment. ATL-CONF-2013-090

[12] ROOT, About (Available online: http://root.cern.ch/drupal/content/about)

[13] TMVA - Users Guide - CERN-OPEN-2007-007

[14] Evans H, Decision Trees - Indiana University


Contents

1 Introduction

1.1 Particle Physics and the Standard Model

2 Background

2.1 Beyond the Standard Model

2.2 LHC and the ATLAS experiment

2.3 Status of Charged Higgs Boson Searches

2.4 Purpose of this Report

3 Methods Used

3.1 ROOT

3.2 TMVA

3.3 Boosted Decision Trees

4 H⁺ → tb TMVA Analysis

4.1 Preparation

4.2 Analysis

4.3 Results

5 H⁺ → hW Analysis

5.1 Truth Matching

5.2 TMVA Preparation

5.3 TMVA Analysis

5.4 TMVA Results

5.5 H⁺ Reconstruction

6 Results

7 Possible Improvements

8 Conclusions

9 References