Uncertainty-aware Tracking of Single Bacteria over Image Sequences with Low Frame Rate

(1)

STOCKHOLM, SWEDEN 2015

Uncertainty-aware Tracking of Single

Bacteria over Image Sequences with

Low Frame Rate

AXEL THEORELL

(2)

(3)

Bacteria over Image Sequences with

Low Frame Rate

A X E L T H E O R E L L

Master’s Thesis in Optimization and Systems Theory (30 ECTS credits) Master Programme in Applied and Computational Mathematics

(120 credits) Royal Institute of Technology year 2015 Supervisor at Forschungszentrum Jülich GmbH (FZJ): Dr.Katharina Nöh Supervisor at KTH: Johan Karlsson Examiner: Johan Karlsson

TRITA-MAT-E 2015:66 ISRN-KTH/MAT/E--15/66--SE

Royal Institute of Technology SCI School of Engineering Sciences

KTH SCI

(4)

(5)

Abstract

Modeling and Simulation Group

Institute for Bio- and Geosciences 1: Biotechnology

Master of Science

Uncertainty-aware Tracking of Single Bacteria over Image Sequences with Low Frame Rate

by Axel Theorell, B.Sc.

In single-cell analysis, the physiologic states of individual cells are studied. In some studies, the subject of interest is the development over time of some cell characteristic. To obtain time-resolved single-cell data, one possibility is to conduct an experiment on a cell population and make a sequence of images of the population over the course of the experiment. If a mapping is at hand, which determines which cell it is that is the cause of each measured cell in the image sequence, time resolved single-cell data can be extracted. Such a mapping is called a lineage tree, and the process of creating it is called tracking.

One aim of this work is to develop a tracking algorithm that incorporates organism specific knowledge, such as average division time, in the tracking process. With respect to this aim, a Bayesian model that incorporates biological knowledge is derived, with which every hypothetical lineage tree can be assigned a probability. Additionally, two Monte Carlo algorithms are developed, that approximate the probability distribution of lineage trees given by the Bayesian model. When an approximate distribution is known, for example the most likely lineage tree can be extracted and used.

In many cases, the information provided to an automatic tracking algorithm is insuffi-cient for the algorithm to find the gold standard lineage tree. In these cases, a possibility is to construct the gold standard lineage tree by manual correction of the lineage tree that has been provided by the tracking algorithm.

(6)

(7)

Abstract

Modeling and Simulation Group

Institute for Bio- and Geosciences 1: Biotechnology

Master of Science

Os¨akerhetsmedveten Tracking av Enskilda Bakterier som Avbildats p˚a en

Bildserie med L˚ag Frekvens

by Axel Theorell, B.Sc.

I enskild-cell analys studeras de fysiologiska tillst˚andet hos enskilda celler. I vissa studier ¨

ar man intresserad av hur n˚agon cellegenskap utvecklas över tid. Ett sätt att generera tidsupplöst data p˚a enskild-cellniv˚a är att utföra ett experiment med en cellpopulation och avbilda den med mikroskop med jämna mellanrum. Med hjälp av en avbildning som beskriver vilken cell i experiment det är som ger upphov till vilken uppmätt cell i bild-sekvensen, kan sedan enskild-cell data tillg˚as. En s˚adan avbildning kallas ett stamträd (lineage tree), och processen att bestämma stamträdet kallas tracking.

En m˚alsättning med detta arbete är att utveckla en trackingalgoritm som använder organismspecifik kunskap, s˚asom organismens genomsnittliga delningstid, i trackingpro-cessen. Med denna m˚alsättning i hänseende härleds en bayesiansk modell med vilken varje stamträd kan tillskrivas en sannolikhet, och som kan ta hänsyn till biologisk fakta när detta sker. Därtill utvecklas tv˚a Monte Carlo algoritmer som approximerar san-nolikhetsfördelningen av stamträd som härrör ur den bayesianska modellen. När en uppskattad fördelning är känd kan t ex det mest sannolika stamträdet i fördelningen användas för enskild-cell analys.

I m˚anga fall är informationen som en automatisk trackingalgoritm har till hands inte tillräcklig för att algoritmen ska kunna producera gold standard stamträdet. I dessa fall kan det vara befogat att konstruera gold standard stamträdet genom att göra manuella korrektioner p˚a ett stamträd som tagits fram automatiskt med en algoritm.

(8)

(9)

My advisor and my supervisors have been a great help in completing this project. Im-portant to mention is also Samuel Leweke M.Sc, who has spent much time giving me useful advice.

(10)

(11)

1 Introduction 1

1.1 Introduction to single-cell analysis . . . 1

1.1.1 Image analysis pipeline . . . 3

1.1.2 Project objectives . . . 4

1.1.3 Structure of the thesis . . . 5

2 Tracking overview 6 2.1 Crocker and Grier . . . 6

2.1.1 Discussion on Crocker and Grier . . . 9

2.2 Jaqaman et al . . . 9

2.2.1 Discussion on Jaqaman et al . . . 10

2.3 Amat et al. . . 11

2.3.1 Discussion on Amat et al . . . 12

2.4 Bayesian inference tracking . . . 12

2.4.1 Discussion on Bayesian inference tracking . . . 17

3 A Bayesian Model that Incorporates Biological Knowledge in Cell Tracking 19 3.1 Necessary modifications of the reference model for the incorporation of biological knowledge . . . 19

3.2 Deriving the BMIBK . . . 21

3.3 How the BMIBK is used for tracking in this work . . . 25

3.4 Assignment confidence . . . 29

3.4.1 Approximate distributions and resolution . . . 30

3.4.2 Displaying assignments and assignment confidence . . . 31

4 A Markov Chain Monte Carlo Approach to Tracking 32 4.1 Markov Chains . . . 33

4.2 Monte Carlo Methods . . . 34

4.2.1 Metropolis-Hastings . . . 36

4.3 Application to tracking. . . 38

4.3.1 The proposal density. . . 38

4.3.2 Selection Rules Without Root Change . . . 39

4.3.3 Selection Rules With Root Change . . . 46

4.3.4 Discussion on probability distributions for the sampling processes. 49 4.3.5 The probability distributions for the sampling processes used in practice . . . 51

(12)

5 A Particle Filter Approach to Tracking 57

5.1 Introduction to Sequential Monte Carlo Methods . . . 58

5.2 Applying the PF for tracking with the BMIBK . . . 62

5.2.1 Correcting for the sampling order. . . 63

5.2.2 Correcting for the splits . . . 64

6 Evaluation of the BMIBK, the MCMC and the PF 65 6.1 Implementation of the MCMC and the PF. . . 65

6.2 Convergence test . . . 67

6.3 Performance evaluation on a realistic example . . . 69

6.3.1 Performance indicators. . . 71

6.4 Results and discussion . . . 72

6.4.1 Discussion on the main results . . . 73

6.4.2 Discussion on the convergence of the MCMC . . . 74

6.4.3 Discussion on the convergence of the PF . . . 77

7 Summary and Outlook 79 7.1 Summary . . . 79

7.2 Continuing this work . . . 80

7.2.1 Systematic assessment of parameters . . . 81

7.2.2 Improved sampling schemes . . . 81

(13)

AR Acceptance Ratio

BMIBK Bayesian Model that Incorporates Biological Knowledge

DTMC Discrete Time Markov Chain

MCMC Markov Chain Monte Carlo

MH Metropolis-Hastings

PF Particle Filter

(14)

(15)

Introduction

1.1 Introduction to single-cell analysis

Traditionally the investigations of biotechnological production processes have been con-ducted on the cell population average. With this approach, the roles of the individuals cells in the process remain unknown. If the cells in the population are in different physiologic states, the population is called heterogeneous. Since heterogeneity can have impact on the properties of the production process, such as product yield per time, in-sights about heterogeneity might hold keys for obtaining more efficient production. In single cell analysis, heterogeneity is investigated by studying the individual cells. One way to get insights at single cell level is to make a sequence of images of the pop-ulation as it develops, see Figure 1.1. Information about the physiological state of the cells in the images can be acquired with the help of biosensors. A biosensor translates the occurrence of some molecule within the cell to an optical readout. Biosensors can for example be encoded into the genome of the cells [N. Mustafi,2014].

Figure 1.1: Corynebacterium glutamicum cultivated with Glutanate -MOPS growing on a microfluidic chip. The green and red dotted cells in the three last frames originates

from the green and red dotted cells in the first frame.

(16)

Since the images are in a time sequence, they are indexed according to their order in this sequence. Conventionally these indexes are called frames. The cell properties studied by single cell analysis can be divided into two categories: temporal and non-temporal properties. For a non-temporal property, every frame can be analyzed independently. An example could be the inter-cell variance of the readouts of some biosensor. A tem-poral cell property develops over time. To extract information about a temtem-poral cell property, it is necessary to know the inter frame correspondences between the cells. In other words, given a certain cell at a certain frame, which cell(s) does this cell corre-spond to in the proceeding frame. A map over the inter frame correcorre-spondences is called a lineage tree. The process of creating a lineage tree is called tracking.

Tracking multiple objects over a sequence of images automatically is a well investigated problem with many algorithms at hand, for example [Amat et al.,2014,Schiegg et al.,

2014,Jaqaman et al.,2008,Coraluppi and Carthel, 2011]. A short survey of some ex-isting tracking techniques will be given in Chapter2.

When choosing a tracking algorithm, the concept of frame rate is of importance. Frame rate refers to how often the images in the sequence are sampled. The implications of the frame rate are illustrated in Figure1.2. From the figure it is clear that lower frame rate (lower temporal resolution) implies less similarity between consecutive frames. That lower frame rate implies less similarity is also clear from the fact that the cells have more time to move between every image.

Figure 1.2: Illustration of frame rate: The first row displays a sequence of nine images at a constant frame rate 4 frames per hour. The second and third rows display the same experiment as the first row, but with frame rate 2 and 1 frames per hour respectively.

(17)

consecutive images. An example of this increased difficulty can be seen in Chapter

6, where a standard tracking algorithm [Jaqaman et al.,2008] yields poor results when tracking the bacteria c. glutamicum, cultivated with gluconate -MOP in a sequence with 4 frames per hour.

In the preceding sentence, it is mentioned that a tracker yields poor results. How poor the results of a tracker is or the accuracy of the tracker, is defined as the distance between the lineage tree produced by the tracker and a gold standard lineage tree. One definition of distance between two lineage trees called edit distance is introduced in Section6.3.1.

In single cell analysis, cells are always the subject of tracking. That it is cells and not any objects that are being tracked can be used to provide extra information for the tracking. If we for example know the average division time or the expected change in morphology between consecutive frames for the organism that we are tracking, this knowledge can be used to increase the accuracy of the tracking. Increasing the accuracy of a tracker is of special interest for cases when the tracker is known to perform poorly, as in cases with low frame rate.

1.1.1 Image analysis pipeline

With an image analysis pipeline, it is meant a number of steps that lead you from a sequence of images to a lineage tree for the cells. A list of steps follows:

• automatic segmentation

• manual correction of segmentation (optional) • automatic tracking

• manual correction of tracking (optional)

Optional means that the step is applied only under some circumstances, for example un-der circumstances where it is known that the automatic tracker generates lineage trees with poor quality.

(18)

should distinguish the cells in the images.

In the tracking step, the objects are linked between the frames. In the case of single cell tracking, the cells (the objects) divide and die. Therefore one cell can be linked to zero, one or two cells in the proceeding frame. Over the course of the text, the links are often referred to as assignments.

If very high accuracy is required, the segmentation and tracking can be complemented with manual correction steps. However, making manual corrections is usually time con-suming, so small improvements in the automatic steps generate big time gains when taking the whole pipeline into account. If the tracking could provide a confidence level to each assignment, the person doing the manual corrections could focus on the assign-ments with low confidence and thus save time. Again to the present knowledge of the author, no present tracking algorithm provides such a confidence.

An image analysis pipeline does not have to follow the above steps. A good example is [Schiegg et al., 2014], where several alternative segmentations are proposed, and the final segmentation is determined by optimizing over segmentation and tracking simulta-neously.

1.1.2 Project objectives

The project has two objectives:

• Developing a tracking algorithm that incorporates organism specific knowledge to track bacteria over an images sequence. The tracking will be done under the assumption that a correct segmentation is at hand.

• Defining and implementing an assignment confidence measure that optimizes the manual correction of the tracking by providing information about whether the assignment is likely to be correct or not.

The organism specific knowledge considered in this work is limited to two examples of organism specific knowledge. Those are:

(19)

• Knowledge about from what random distribution the size changes between two frames for the organism comes.

The intention is to develop a tracking model that takes the two objectives into account. For this model, a tracking algorithm is to be developed. The algorithm is to be imple-mented as a plugin to the Java image analysis software TrackMate.

1.1.3 Structure of the thesis

(20)

Tracking overview

In this chapter, an overview of tracking techniques relevant for this work is given. The first technique presented is a greedy technique. Greedy implies that the algorithm seeks an optimal solution from frame to frame according to some optimization criterion. The other techniques are non greedy, which implies that they optimize according to some criterion that takes all frames into account simultaneously. In Figure2.1 we see that a non greedy algorithm will solve one larger optimization problem and a greedy algorithm will solve many smaller optimization problems.

2.1 Crocker and Grier

An early attempt on a robust tracking algorithm was presented by Crocker and Grier in 1996 [Crocker and Grier, 1996]. The paper is not necessarily about tracking cells, but rather any indistinguishable objects, here named particles. Because of the indistin-guishability, it is assumed that only proximity can be used for determining the particle correspondence between the frames. The tracked particles are supposed to move accord-ing to some probability distribution P (δ|t), where δ is the movement length over the time t. The algorithm works sequentially from frame to frame and maximizes P ({δi}|t),

where the notation {δi} refers to the joint probability for all particles. Naively solving

the frame to frame optimization for N particles requires O(N !) computations, which can only be solved for small N . Therefore, a maximal movement length L is introduced. If it

(21)

F

rame

1

2

3

4 Greedy

non

greedy

Figure 2.1: The above figure shows a greedy tracking procedure. The best tracking (optimal according to some criterion) has been found between frame 1 and 2. The algorithm continues to find the best tracking between frame 2 and 3. The below figure shows a non greedy tracking procedure. The tracker seeks the optimal solution for all

frames simultaneously.

is assumed that particles move maximally the length L between consecutive frames, par-ticle clusters that are further then a distance L apart in the two frames can not exchange any particles. Thereby, the N particles can be divided into m non-interacting clusters with sizes Mi, i ∈ [1, m]. The dividing into clusters is illustrated in Figure2.2. As the

clusters are defined, they stretch over two consecutive frames. Solving the maximum likelihood for the clusters then only implies O(Mi!) computations for each cluster, which

saves effort, given that the largest Mi is relatively small. If a cluster has more particles

(22)

L

Case I

Case II

Figure 2.2: In case I, the particles are grouped in two non interacting clusters. The filled discs are particles at a frame k − 1 and the dotted discs are particles at frame k. Each particle has a ring with radius L around it, indicating the maximal movement length. Since the rings of radius L have no overlap between the two cluster in case I, the clusters are categorized as independent of one another for the frame transition k − 1 to k. In case II, all particles are in one cluster. To find the optimal tracking between k − 1 and k with Crocker and Grier, all particles have to be considered simultaneously.

moved the distance L and are out of the cluster. If a cluster has more particles in the later frame than in the earlier, it is assumed that a particle has moved the distance L into the cluster. This case is illustrated in Figure2.3.

L

Figure 2.3: The filled discs are particles at frame k − 1 and the dotted and striped discs are particles at frame k. Each particle is surrounded with a ring of radius L. Filled particles are assigned to the dotted particles. The striped particle is assumed to

(23)

2.1.1 Discussion on Crocker and Grier

Crocker and Grier is included in this survey in order to give the reader an example of a tracking algorithm that is easy to understand and to introduce some concepts that are important for this work. One important concept introduced is that the tracking be-tween two consecutive frames can assigned a probability P ({δi}|t), and that the optimal

tracking is the most likely one.

The algorithm proposed by Crocker and Grier is in its original formulation not suited for tracking cells, since it has no way of dealing with cells splitting. Instead of proposing split assignments, Crocker and Grier will propose that new particles move from a dis-tance L into the clusters. Another problem is the splitting into clusters. After a split, the two resulting cells are likely to be close to each other. This implies that a population of cells that originate from one cell, will all be located in one cluster, with small inter cell distances. When choosing an L corresponding to some assumed maximal movement distance for the cells, this will in most cases be bigger than the inter cell distance in the cluster. Therefore no subclusters will emerge. This is illustrated as case II in Figure2.2. Without subdividing, the complete enumeration approach by Crocker and Grier would often be computationally infeasible. The concept of a maximum movement length L is still useful for simplification in more advanced techniques, but it can not be expected to cut the cells into more than one non-interacting cluster. The problem of making assignments between the cells in two consecutive frames will hereafter be called the linking problem1. Solving some kind of linking problem is a crucial part of all tracking techniques. In Crocker and Grier the linking problem is divided into many small linking problems, and each of these linking problems are solved with complete enumeration. As mentioned, this works if individual linking problems are small enough. An alternative to complete enumeration is to use mathematical optimization techniques to solve the linking problem. More about this in the proceeding sections.

2.2 Jaqaman et al

The algorithm proposed by Jaqaman et al [Jaqaman et al.,2008] is of great importance for this work, since the tracking algorithms developed in later chapters of this work are

1

(24)

compared to the algorithm of Jaqaman et al inchapter 6. To understand the algorithm, the concept of a track has to be introduced. A lineage tree (or tracking hypothesis) can be seen as a mathematical graph with no cycles. The lineage tree is composed of one or more connected components. A track is a connected component within the lineage tree. The first step of the algorithm proposed by Jaqaman et al is to make a segmentation that locates all cell centers. Subsequently, tracks are formed using a frame to frame approach, where cells are allowed to link to at most one cell in the proceeding frame. Thus, only incomplete tracks without splits are created at this stage. In the frame to frame tracking, the links are assigned costs, and the optimal tracking is the one that minimizes the total cost between the two frames. The link costs are functions of the distance between the cells and the intensities of the cells in the images. After the frame to frame tracking, the incomplete tracks formed in the frame to frame tracking are linked together by solving another optimization problem that considers all frames at the same time. In this optimization problem, links are allowed to connect: One cell with one cell in the proceeding frame (continue), one cell with two cells in the proceeding frame (split) and two cells with one cell in the proceeding frame (merge). Also for this optimization problem the link costs are functions of inter cell distances and cell intensities in the images.

The two different optimization problems are posed as linear assignment problems in bipartite graphs and can therefore be solved efficiently (O(n3) complexity)[Munkres,

1957].

2.2.1 Discussion on Jaqaman et al

(25)

With this modification, the resulting assignment problems will no longer be linear, since changing one assignment will affect the costs of other assignments. This implies that the efficient solution algorithms no longer are applicable, and other, more computationally demanding algorithms would have to be used to solve the resulting non linear linking problems.

2.3 Amat et al

An advanced technique on cell tracking was proposed by Amat et al [Amat et al.,2014]. For the terminology it should be noticed that since the technique is applied to 3D im-ages, it handles voxels rather than pixels. The image analysis pipeline used by Amat et al is slightly extended, compared to the pipeline presented in chapter 1. A simplified version of the algorithm can be explained in 3 steps:

In the first step, visually connected voxels are grouped into supervoxels. The purpose of this is to create a new representation of the images, that reduces the amount of data without loosing information useful for the tracking.

In the second step, the supervoxels are clustered using Gaussian Mixture Models (GMMs) at frame k = 0. For an introduction to mixture models, see [Picard, 2007]. It is as-sumed that ”location, shape and intensity of nuclei cannot change abruptly between time points” [Amat et al.,2014]. The clusters represent tracked cells.

In the third step, the clustering from frame k is propagated to frame k + 1. It is as-sumed that ”location, shape and intensity of nuclei can not change abruptly between time points. Using this assumption, the clustering at frame k + 1 is done with the clus-tering at frame k as a Bayesian prior [Bernardo and Smith,2009]. This propagation is then done repeatedly until the last frame.

In this simplified version of the algorithm, no splits are considered and the approach is greedy. In order to integrate splits in the tracking an addition is needed to the simplified version of the algorithm.

(26)

that are more computationally demanding.

2.3.1 Discussion on Amat et al

The algorithm by Amat et al [Amat et al.,2014] is included in the survey for complete-ness, because it is at the front line of cell tracking. This said, the algorithm can not be expected to yield good results at low frame rate. The tracking is based on the assump-tion that locaassump-tion, shape and intensity of nuclei remains similar between consecutive frames. This assumption becomes less valid the more the frame rate decreases. The use of spatiotemporal windows is also unfit for the case of low frame rate. A typical scenario considered by this work is average division times around one split every three frames. This would turn the whole computation into overlapping spatiotemporal windows, and the technique would be computationally too demanding.

2.4 Bayesian inference tracking

Bayesian statistics [Bernardo and Smith, 2009] can be used in order to assign proba-bilities to lineage trees of cells in an image sequence. This is of central importance for this work, since the tracking algorithms developed in this work are based on a Bayesian tracking model developed inchapter 3. In this section, the model used by Coraluppi et al in an article from 2011 [Coraluppi and Carthel,2011] will be examined. It should be remarked that the model presented in the article [Coraluppi and Carthel,2011] is of a general character and is present in other works, for example the book chapter ”Issues in the design of practical multitarget tracking algorithms” [Kurien,1990].

Some notation for the Bayesian model is now introduced: The set Zk_{= (Z}

1, ..., Zk) is the set of all measured objects, where Zk = (z0(k), ..., z (k) rk ) is

the set of measured objects at frame k. The element z_j(k) holds the measured position and size etc of the jth measured object at frame k. For brevity, the frame indices (k) and k are suppressed in z₁(k) and rk respectively, in cases where these indices are clear from

(27)

contains the position and shape etc of each cell at discrete times (t1, ..., tk). Part of

esti-mating Xkis determining which cell it is, if any, that is the cause of each measurement. This information is contained in the auxiliary discrete state history qk= (q1, ..., qk). An

element of the auxiliary state qk can either be:

• A continue assignment from a measured object in Z_k−1 to a measured object in

Zk. This means that the two measured objects correspond to the same cell.

• A death assignment on a measured object in Zk−1. This means that the

mea-sured object corresponds to a cell, but that meamea-sured objects after frame k − 1 corresponds to the same cell.

• A birth assignment on a measured object in Zk. This means that the measured

ob-ject corresponds to a cell, but that no measured obob-jects before frame k corresponds to the same cell.

• A false alarm assignment on a measured object in Zk. This means that the

mea-sured object has no corresponding cell.

An assignment can be seen as an ordered sets of integers, where the first integer de-termines whether it is a continue, birth, false alarm or death assignment. In case of a continue assignment the second and third integers are the indices of the corresponding measured objects in Zk−1 and Zk. In case of the non continue assignments, the second

integer is the index of the corresponding measured object. Since qk determines the lin-eage of the measured objects, it is referred to as a linlin-eage tree or tracking hypothesis. In Bayesian inference, tracking becomes find the state Xk∗ that fulfills:

Xk∗= arg max

Xk p(X

k_|Zk_). _(2.1)

But since p(Xk|Zk_{) is unknown without an interpretation of the measurements Z}k_,

Equation (2.1) is marginalized with respect to qk:

Xk∗ = arg max

Xk

X

qk_∈Qk

p(Xk|qk, Zk)p(qk|Zk). (2.2)

The set Qk, is the set of all possible lineage trees qk.

(28)

where the state Xk is estimated as the state with the minimum mean square error (MMSE), given Zk and the most likely auxiliary state history qk∗ :

Xk= X_{M M SE}k (Zk, qk∗) = E[Xk|qk∗_{, Z}k_], _(2.3)

qk∗= arg max

qk_∈Qk(p(q

k_|Zk_)). _(2.4)

When Equation (2.4) has been solved, the most likely lineage tree for the measurements is known as qk∗. If the measurements Zk are expected to have some error and the cells have some expected behavior, for example with respect to their trajectories and their shape development etc, Equation (2.3) is used to calculate the true state Xk, where for example the true positions and shapes can be different than measured in Zk. Normally when tracking for single-cell analysis, it can be assumed that position, size etc is correctly measured in Zk (it is not assumed that every measurement corresponds to a real cell). If this is assumed, it is not necessary to solve Equation (2.3), since the state history Xk

obtained by combining Zk _{and q}k∗ _{has a MMSE of zero by assumption.}

To solve the optimization problem in Equation (2.4), a model for p(qk|Zk_{) is required.}

In the article [Coraluppi and Carthel,2011], a model for p(qk_|Zk_{) is developed in several}

steps. Firstly p(qk|Zk_{) is rewritten in a recursive form using Bayes theorem [}_Bernardo

and Smith,2009]. p(qk|Zk) = p(q k_{, Z}k₎ p(Zk₎ = p(Zk|qk, Zk−1) · p(qk|qk−1, Zk−1) · p(qk−1|Zk−1)p(Zk−1) p(Zk₎ = p(Zk|q k_{, Z}k−1_{) · p(q} k|qk−1, Zk−1) · p(qk−1|Zk−1) ck (2.5)

In finding the most likely hypothesis, only the relative likelihoods for different hypotheses

are needed. The constant ck = p(Z

k₎

p(Zk−1₎ is independent of the tracking hypothesis, and

can therefore be ignored. Therefore, Equation (2.5) is rewritten with a proportionality:

p(qk|Zk) ∝ p(Zk|qk, Zk−1) · p(qk|qk−1, Zk−1) · p(qk−1|Zk−1). (2.6)

(29)

three factors. The last factor is the probability for the tracking up to frame k − 1. If the two other factors and a starting distribution is known, the right hand side of Equation (2.6) can be calculated recursively for any k. In the following paragraphs, Equation (2.6) is developed for the frame k. Since all the calculations regard the frame k, the index k is suppressed for many of the introduced quantities.

The factor in the middle of the right hand side of Equation (2.6) is the so called prior. The prior provides prior belief about the probability of a lineage tree qk, before the

measurements Zk are known. To develop the prior, some new concepts and quantities

have to be introduced, together with some assumptions:

The number of measured objects at frame k is denoted r. It should be stressed that the true r is unknown when developing the prior, since Zk is not given. In the prior

p(qk|qk−1, Zk−1), the lineage tree qk−1 is given, which implies that it is known whether

the jth measured object zj ∈ Zk−1 is a false alarm or not. The measured objects at

frame k − 1, not declared as false alarms, are referred to as track ends. The number of track ends at frame k − 1 is denoted τ . The number of dead track ends at k − 1 is denoted χ and it is assumed that the probability that a track end dies is pχ. Every track

(defined as insection 2.2) that has a non dead track end in frame k − 1 will continue into frame k. The probability that the track will be detected in frame k by the segmentation

is assumed to be pd. The number of detected tracks at frame k is denoted d. The

number of track births at frame k is denoted b, and it is assumed that b comes from a Poisson distribution with intensity λb. Every measured object at frame k must be either

a false alarm, a detected track or a track birth. Therefore, the number of false alarms is r − d − b. It is assumed that the number of false alarms at frame k is Poisson distributed with intensity λf a.

Using the notation and assumptions from the preceding paragraph, the prior can be written: p(qk|Zk−1, qk−1) = exp(−λb− λf a)λrf a r! p χ χ((1 − pχ)(1 − pd))τ −χ−d (1 − p_χ)p_d λf a dp_dλ_b λf a b . (2.7) For the calculations leading to Equation (2.7), it is referred to the article [Coraluppi and Carthel,2011].

The first term at the right hand side of Equation (2.6), p(Zk|qk, Zk−1), is the

proba-bility of Zk given the lineage tree qk and the measurements Zk−1. Since qk is given in

(30)

assignment aj ∈ qk. The assignment aj belongs to one of the three sets Jd, Jb and Jf a,

depending on whether it is a continue, birth or false alarm assignment respectively. The notation

Y

j∈Jd

g(zj)

means the product of some function g(zj) for each zj associated with a aj ∈ Jd. The

assumption that the probability of each measurement zj given its assignment in qk and

Zk _{is independent of each other leads to:}

p(Zk|qk, Zk−1) = Y j∈Jd fd(zj|qk, Zk−1) Y j∈Jb fb(zj|qk, Zk−1) Y j∈Jf a ff a(zj|qk, Zk−1). (2.8)

The probability densities fd, fb and ff a in Equation (2.8) should be chosen by suitable

assumptions. In the reviewed article [Coraluppi and Carthel, 2011], position measure-ments is the only type of measuremeasure-ments taken into account. If the cells are assumed to move according to Brownian dynamics, fd(zj|qk, Zk−1) would be a multidimensional

Gaussian distribution centered in the track end leading to zj. The probabilities of births

and false alarms can be assumed equal everywhere, which would render fb(zj|qk, Zk−1)

and ff a(zj|qk, Zk−1) uniform distributions over the whole area where objects can be

measured.

Now, the Equations (2.7) and (2.8) are put back into the right hand side of Equation (2.6): p(qk|Zk_{) ∝ p}χ χ((1 − pχ)(1 − pd))τ −χ−d Q zj∈Jd (1−pχ)pdfd(zj|Zk−1,qk) λf aff a(zj|Zk−1,qk) Q zj∈Jb pdλbfb(zj|Zk−1,qk) λf aff a(zj|Zk−1,qk)· p(q k−1_|Zk−1₎_. (2.9) For the calculations leading to Equation (2.9), it is again referred to the article [Coraluppi and Carthel, 2011]. Since Equation (2.9) is a proportionality, some factors from the Equations (2.7) and (2.8) have been omitted in Equation (2.9). The omitted factors are independent of qk.

Now that p(qk|Zk_{) is known up to a normalizing constant, attention is turned to solving}

(31)

equivalent to solving the logarithm of the optimization problem:

qk∗= arg max

qk_∈Qk(log(p(q

k_|Zk_))). _(2.10)

If Equation (2.10) is solved, Equation (2.9) turns into a sum of values, each connected to a particular assignment in the lineage tree qk. If every value for every assignment in every possible lineage tree is calculated, the optimization problem in Equation (2.10) can be solved as a constrained linear integer programming problem [Papadimitriou and Steiglitz,1998], where the constraint is qk∈ Qk_.

Solving Equation (2.10) is generally computationally too demanding without heuristics. Coraluppi et al [Coraluppi and Carthel,2011] suggests a heuristic called n-scan pruning to simplify the problem. This heuristic will now be explained shortly:

The central assumption of n-scan pruning is that the tracking up to frame k has no influence on the tracking at frame k + n, for some fixed n. The n-scan pruning scheme follows:

• Find the optimal tracking between frame 1 and n (using some optimization tech-nique).

• Keep the tracking between frame 1 and 2. (It is assumed to be optimal).

• Find the optimal tracking between frame 2 and n + 1, starting from the known tracking between frame 1 and 2.

• Continue in the same manner until the last frame.

2.4.1 Discussion on Bayesian inference tracking

(32)

by Coraluppi et al in some aspects. One aspect is that the modified model has explicit split assignments, which is not present in the model used by Coraluppi et al. The reason

why a modified Bayesian model is developed in Chapter 3, is that it allows for the

(33)

A Bayesian Model that

Incorporates Biological

Knowledge in Cell Tracking

In this chapter, A Bayesian model that incorporates biological knowledge in tracking will be developed. For brevity, it will be referred to as the BMIBK. BMIBK is a modification of the model used by Coraluppi et al [Coraluppi and Carthel,2011] that is presented in Section2.4, which will be referred to as the reference model.

First it will be discussed why modifications are necessary in order to incorporate bio-logical knowledge. After that the BMIBKwill be derived. The rest of the chapter is about how the confidence of an assignment in a lineage tree can be measured using the BMIBK.

3.1 Necessary modifications of the reference model for the

incorporation of biological knowledge

The biological knowledge considered in this work is listed in the project objectives in Section1.1.2. Here this list is repeated:

• Knowledge about from what random distribution the division times for the organ-ism comes.

(34)

• Knowledge about from what random distribution the size changes between two frames for the organism comes.

Incorporating point one in the above list into the tracking implies that the BMIBK should make lineage trees more or less likely depending on how the splits in the lineage tree are distributed. To make this possible, a model with split assignments that assign one measured object in one frame to two measured objects in the proceeding frame are necessary. Such split assignments are not part of the reference model. Therefore, split assignments are introduced in the BMIBK. Formally this implies qk will include a new

assignment type, which is a list of 4 integers. The first integer states that it is a split assignment, the second is the index of a measured object in Zk−1 and the third and the

fourth are two different indexes of measured objects in Zk.

Incorporating point two in the above list into the tracking also requires explicit split assignments, since the distribution of size changes often are different for the cases split and continue. In most cases, a cell increases its size between consecutive frames if it does not split and decreases its size if it splits.

Another necessary modification of the reference model is making the probabilities as-sociated with a certain assignment dependent on the previous tracking. In Section 2.4

the probability for detecting a track continue pdis introduced. The parameter pd is the

same for all track ends, but when taking the first point in the above list into account, the track ends should have individual probabilities for detected continues depending on when they split last time. In BMIBK pd is replaced by the two functions pc(qk−1) and

ps(qk−1), representing the probability that a particular track continues or splits

respec-tively. It is assumed that all cells are detected in the segmentation. This assumption is justified in the case when the segmentation have been manually corrected. If it can not be assumed that all cells are detected, the following scenarios about what can happen for a track between two frames have to be considered:

• The track dies

• The track continues and is detected • The track continues and is not detected

(35)

• The track splits, and none of the corresponding cells are detected

In most scenarios, the detection probabilities are different for the listed cases and can not be represented with a constant detection probability.

To keep BMIBK relatively simple, it is derived under the assumption that all cells are detected.

Another assumption made in order to keep BMIBK relatively simple, is that a cell splits at most once between two consecutive frames. This assumption is reasonable as long as the frame rate is higher than the average cell division rate.

In the BMIBK, the errors of the measurements are assumed to be negligible, so the estimated true state Xk, containing the position and size etc of every cell at every frame, is the combination of the estimated lineage tree qk and the measurements Zk. In going from a detection probability pdthat is the same for all track ends to the tracking

dependent functions pc(qk−1) and ps(qk−1), finding the most likely lineage tree can no

longer be posed as a linear optimization problem as it is done in Section2.4, Equation (2.10).

3.2 Deriving the BMIBK

The derivation of the BMIBK is similar to the derivation of the reference model. As a starting point we use Equation (2.6), which is the same for the BMIBK and the reference model:

p(qk|Zk) ∝ p(Zk|qk, Zk−1) · p(qk|qk−1, Zk−1) · p(qk−1|Zk−1).

As in the derivation in Section 2.4, p(qk|qk−1, Zk−1) is developed next. In order to

develop p(qk|qk−1, Zk−1), qk is partitioned into subsets:

• Jr, a set of the measured objects at k, |Jr| = r.

• Jc, a set of continue assignments from track ends in qk−1 to measured objects in

Jr, |Jc| = c.

• Js, a set of split assignments from track ends in qk−1 to measured objects in Jr,

(36)

• Jb, a set of birth assignments to elements in Jr, |Jb| = b.

• Jf a, a set of false alarm assignment to elements in Jr, |Jf a| = f a.

• Jχ, a set of death assignments to track ends in qk−1, |Jχ| = χ.

With this notation introduced, qk= Jr∪Jc∪Js∪Jb∪Jf a∪Jχ. Introducing the convention

that a false alarm assignment is represented by no assignment, qk= Jr∪ Jc∪ Js∪ Jb∪ Jχ

without loss of information.

It is worth noticing that both Jr and Zk denote sets of measured objects at frame k.

The set Zk is not given in the expression p(qk|qk−1, Zk−1), so p(qk|qk−1, Zk−1) is the

probability of the assignments qk given any hypothetical set of measured objects Jr.

A hypothetical set Jr does not provide measurement information like location and

size, etc of a measured object. When developing the prior p(qk|qk−1, Zk−1), measurement

information is given up to frame k − 1 by Zk−1. It is assumed that this information has no impact on the probability p(qk|qk−1, Zk−1), because there is no measurement

information at frame k. If for example a track continues from frame k − 1 to k, the location at k − 1 is irrelevant for the probability, since the location at k is unknown. Assuming that the measurement information has no impact on p(qk|qk−1, Zk−1) means

that p(qk|qk−1, Zk−1) = p(qk|qk−1).

The prior p(qk|qk−1, Zk−1) is now developed by repeated use of Bayes theorem [Bernardo

and Smith,2009]:

p(qk|qk−1, Zk−1) = p(qk|qk−1) = p(Jχ, Jc, Js, Jb, Jr|qk−1) =

p(Jχ, Jc, Js|qk−1)p(Jb|Jχ, Jc, Js, qk−1)p(Jr|Jχ, Jc, Js, Jb, qk−1).

(3.1)

The three factors in Equation (3.1) can now be handled separately. The first factor is p(Jχ, Jc, Js|qk−1). Every assignment in the sets Jχ, Jc and Js have one corresponding

track end. It is assumed that the probability that a track end at frame k should have the assignment aj is dependent on the tracking before frame k. Depending on whether

aj is a death, continue or split assignment, the probability that a certain track should

be assigned aj is denoted pjχ(qk−1), pcj(qk−1) and pjs(qk−1) respectively. The notation

Y

j∈Jχ

(37)

means: the product of the probabilities of all death assignments in the set Jχ.

Every assignment in the set jchas one corresponding measured object and every

assign-ment in the set Js has two corresponding measured objects. With r measured objects,

there are _(r−c)!r! _{(r−c−2s)!2}(r−c)! s different combinations of sets Jc and Js that are assigning to

the track ends in the same way. Assuming that all of these combinations are equally likely: p(Jχ, Jc, Js|qk−1) = Y j∈Jχ pj_χ(qk−1) Y j∈Jc pj_c(qk−1) Y j∈Js pj_s(qk−1) 1 r! (r−c)! (r−c)! (r−c−2s)!2s . (3.2)

The second factor in equation (3.1) is p(Jb|Jχ, Jc, Js, qk−1). Like in the reference model,

it is assumed that the number of births b is Poisson distributed with the intensity λb.

With this assumption we can write that

p(b|qk−1) = exp(−λb)λ

b b

b!

However, the b births can be chosen from the r − c − 2s not yet assigned measured objects in r−c−2s_b different ways. Assuming that each way is equally likely:

p(Jb|Jχ, Jc, Js, qk−1) = exp(−λb)λb_b b! 1 r−c−2s b . (3.3)

The third factor in Equation (3.1) is p(Jr|Jχ, Jc, Js, Jb, qk−1). The r measured objects

in Jr must either be births, continues, splits or false alarms. The number of false alarms

f a can thereby be written f a = r − b − c − 2s. The factor 2 stems from the fact that every split assigns 2 measured objects. As in the reference model, the number of false alarms is assumed to come from a Poisson distribution with intensity λf a. With this

assumption we can write

p(Jr|Jχ, Jc, Js, Jb, qk−1) =

exp(−λf a)λr−b−c−2sf a

r − b − c − 2s! . (3.4)

(38)

Combining the Equations (3.1), (3.2), (3.3) and (3.4) and crossing out ratios: p(qk|qk−1, Zk−1) = Y j∈Jχ pi_χ(qk−1) Y j∈Jc pj_c(qk−1) Y j∈Js pj_s(qk−1) · exp(−λ_b)λb_bexp(−λf a)λr−b−c−2s_{f a} 2s r!. (3.5)

As the first factor in Equation (2.6), p(qk|qk−1, Zk−1) now has been developed, focus is

moved towards the second factor, p(Zk|qk, Zk−1).

The factor p(Zk|qk, Zk−1) has the same meaning as in the reference model. Using the

same notation and assumptions as for developing Equation (3.6), p(Zk|qk, Zk−1) yields:

p(Zk|qk, Zk−1) = Y j∈Jc fc(zj|qk, Zk−1) Y j∈Js fs(zj1, zj2|q k_{, Z}k−1₎ Y j∈Jb fb(zj|qk, Zk−1) Y j∈Jf a ff a(zj|qk, Zk−1). (3.6)

The difference between Equation (2.8) and (3.6) is that the sets Jcand Js have replaced

the set Jd. fs(zj1, zj2|q

k_{, Z}k−1_{) has two arguments z}

j1 and zj2, since the split assignment

aj assigns to two measured objects. Suitable assumptions should be made for the f s in

BMIBK just as in the reference model.

The factor exp(−λb) exp(−λf a)

r! is the same for all hypotheses q

k_{, and has been left out in the}

(39)

can be substituted recursively, using Equation (3.7). If this is done for all frames k, the resulting equation will state that p(qk|Zk_{) is proportional to the product of the}

probabilities of all assignments in qk.

3.3 How the BMIBK is used for tracking in this work

In this work, the BMIBK will be used for tracking splitting bacteria over sequences of 2 dimensional images sequences with frame rate so low, that a standard tracker [Jaqaman et al.,2008] will yield inaccurate results. The tracking is done under the assumption that a correct segmentation is at hand. This section is about how parameters and distribution in BMIBK are specified. A complete list of parameters to specify for the tracking will be given at this end of this section (Table 3.1). The treatment of some of the already introduced parameters are listed below:

• pjχ(qk−1) = pχ is constant (same for all track ends and frames)

• λb and λf a are constant (same for all frames) and small since no births should

occur and false detections should be removed in the segmentation correction step • No undetected tracks are assumed. This is justified by the segmentation correction step. This implies that if the assignment aj and ai are a continue and a split

assignment respectively for the same track end, pχ+ pjc(qk−1) + pis(qk−1) = 1.

• f_b and ff a are both assumed to be uniform, since no information about these

distributions is at hand.

pjs(qk−1) = pjs(nsplit) is an arbitrary function, where nsplit is the number of frames

between the frame when the track with the assignment aj split for the last time and

frame k − 1. With arbitrary it means that the function can be assigned a value between 0 and 1 − pχ for each value of nsplit. This values correspond to observed probabilities

for the organism.

(40)

The continuous distribution fcis decomposed into two factors fc= fpos,c· fgrowth,c. The

probability distribution fpos,c is the distribution from which the random movements of

a cell is drawn when not splitting. It is assumed that the cell moves according to an uncorrelated two dimensional normal distribution with variance σpos = σpos,x = σpos,y

and expectation value µpos. µpos is assumed to be the present position plus the average

distance moved (in 2 dimensions) averaged over the past nmove frames. Because of L,

fpos,c is actually truncated, see Figure 3.1.

L Expected position at frame k

position at frame k − 1 fpos,c

Figure 3.1: The normal distribution fpos,cis centered in the expected position of the

cell at frame k. Only the part of fpos,c that is within the circle with radius L yields

feasible assignments, since L is the longest allowed moving distance.

If fpos,c should remain a probability distribution after the introduction of L, fpos,c has

to be renormalized to fulfill the probability distribution property R fpos,c = 1 when

in-tegrating over all space. In practice, calculating this normalization implies increasing the likelihood density within L gradually the further away the expected position µpos is

from the present position. To calculate these normalization, fpos,c should be integrated

over the disk with radius L shown in Figure 3.1. This has to be done for every track end, frame and hypothesis. Calculating the normalizations of fpos,c implies extra

com-putational work, so L is chosen to be so big that it can be assumed that most of the mass of fpos,c lies within the disk with radius L and that the effects of the truncation is

negligible.

(41)

choice between normalization and no normalization will have consequences for whether pjc(qk−1) is correctly chosen.

The probability density fgrowth,c takes into account that the area covered by the cell

should increase between consecutive frames. It is assumed that this growth is linearly proportional [Campos et al.,2014] to the present area and that it comes from a normal distribution. This distribution has expectation value

µgrowth,c= area · Mgrowth,c

and variance

σgrowth,c = area · Σgrowth,c,

where Mgrowth,c is the expected ratio between the cell areas in two consecutive frames

and Σgrowth,c is the variance of the same ratio.

Attention is now turned towards split assignments and the fs(zi1, zi2|q

k_{, Z}k−1_{). If a}

track end at frame k − 1 has N measured objects at frame k within a distance L,

N

2 = N (N −1)/2 different split assignments are possible, which means that the number

of assignment probability values that have to be calculated increase quadratically with average number of cells within a disc of radius L. To avoid this quadratic increase in computational complexity, it is assumed that a split assignment can be divided into two assignments, one to zj1 and one to zj2

1_{, and that the probabilities of the assignment}

to zj1 can be calculated independently of zj2 and vice versa

2_{. With the assumption}

that a split can be divided into two assignments, only the probabilities of N instead of N (N − 1)/2 assignments have to be calculated and

fs(zi1, zi2|q k_{, Z}k−1_{) = f} s(zi1|q k_{, Z}k−1_)f s(zi2|q k_{, Z}k−1₎

for some function density fs(zi|qk, Zk−1).

Similarly to fc, fs (fs denotes fs(zi|qk, Zk−1)) is decomposed into fs= fpos,s· fgrowth,s.

The dynamics of a split are quite different to those of a continue, since the cells are

1

In Section3.1it was stated that a split assignment will be represented as a list of 4 integers. With the modification that a split can be separated into two assignments, a split will be represented as two lists of 3 integers each.

(42)

”pushed aside” in the splitting moment [Letek et al.,2008]. It is therefore not satisfactory to use a normal distribution for fpos,s, instead a linearly decreasing probability function

with three parameters is introduced. The density fpos,scenters around µpos (same as for

fpos,c). At a distance l from µpos the distribution is cut off. The slope is determined by

cratio which defines the ratio between the distribution values at the cut of l and at the

center. A cross section of fpos,s is displayed in Figure 3.2.

The density fgrowth,s is defined like fgrowth,c with parameters Mgrowth,s and Σgrowth,s.

µpos

l h1

h2

Figure 3.2: Cross section of the density fpos,s. The parameter µposdetermines where

fpos,s is centered, and l determines how far from the center the distribution takes non

zero values. The ratio between the heights h2 and h1are determined by the parameter

cratio=h_h2₁

All needed model parameters that should be estimated from data are given in Table3.1.

Parameter Explanation

pchi Probability of track death

λb Intensity of track births

λf a Intensity of false alarms

L Maximal movement length

l Cut off length from expected split center

nstep Maximal number of steps for assessing expected position

cratio Ratio between linear distribution height at boundary and center

Mgrowth,c Expected proportional cell area increase at continue

Mgrowth,s Expected proportional cell area increase at split

Σgrowth,c Expected proportional cell area increase variance at continue

Σgrowth,s Expected proportional cell area increase variance at split

ps(nsplit) Arbitrary distribution governing ps and pc

(43)

3.4 Assignment confidence

The second project objective reads: ”Defining and implementing an assignment confi-dence measure that optimizes the manual correction of the tracking by providing infor-mation about whether the assignment is likely to be correct or not.”.

To make the confidence measure useful for the person doing the manual correction, it should answer the question: ”Are there many alternatives to this assignment, given a model and a complete set of parameters?”

Here a definition of a confidence measure is given in regard to a Bayesian inference model. The definition reads:

Definition 3.1. If Zk and Qk is the set of measured objects and tracking hypotheses qk for an images sequence with k frames respectively and p(qk|Zk_{) the probability of q}k

given Zk, then the confidence ca of the assignment a is given by the formula:

ca=

X

qk_∈Qk

p(qk|Zk_{)I(a ∈ q}k),

where I(a ∈ qk) is the indicator function that takes the value 1 if a is an assignment in qk and 0 otherwise.

ca=

X

qk_∈Qk

p(qk|Zk_{)I(a ∈ q}k),

where I(a ∈ qk) is the indicator function that takes the value 1 if a is an assignment in qk and 0 otherwise.

N

According to this definition, ca is equal to the relative frequency of a, weighted

according to the hypotheses probabilities. Intuitively, with this definition, ca answers

the question whether there are alternatives to a or not.

To calculate ca for an assignment a with Definition 3.1, knowledge of every tracking

hypothesis qk is required, which is problematic since a complete enumeration of

hypotheses is generally not possible. When it is not possible to make a complete

enumeration of hypotheses, cacan be approximated using an approximate distributions

ˆ

p(qk|Zk_{). Two methods for approximating p(q}k_|Zk_{) are developed in Chapter} ₄ _and

(44)

If an assignment a is at frame κ and p(qk|Zk_{) is known for several different k larger}

than κ, ca can be calculated using any of the p(qk|Zk). For the case when a complete

enumeration of hypotheses is possible and no approximations are necessary, the ca

resulting from the highest k should be used, since every extra set of measurements Zk, k > κ incorporated in p(qk|Zk) improves the tracking also at frame κ (non-greedy

approach, Chapter 2). In a realistic scenario when an approximation ˆp(qk|Zk_{) is used,}

the choice of the largest k might no longer be the best. This is discussed in the next subsection.

3.4.1 Approximate distributions and resolution

Let p(qk|Zk_{) be a probability distribution of all possible hypotheses q}k _{∈ Q}k_. _Let

ˆ

p(qk|Zk_{) defined on ˆ}_Qk _{⊆ Q}k _{be an approximation of p(q}k_|Zk_{) in the sense that}

ˆ

p(qk|Zk_{) ≈ p(q}k_|Zk_{) ∀ q}k _{∈ ˆ}_Qk _and

X

qk_∈Qk_{/ ˆ}_Qk

p(qk|Zk_{) 1.}

Definition 3.2. Let Qk be the set of all lineage trees for an image sequence. The

number of hypotheses in ˆQk _{⊆ Q}k _{that have different assignments at frame κ ≤ k is}

called the resolution at κ.

N

The term resolution is chosen, since it represents how ”fine grained” an approximation ˆ

p(qk|Zk_{) is at frame k. When calculating approximate confidences of assignments at}

a frame k using an approximate distribution ˆp(qk|Zk_{), a low resolution will make the}

approximations of the confidences crude, since just a small subset of all possible hy-potheses at that frame is taken into account. In the preceding section it was stated that when a sequence of distributions p(qk|Zk_{) for each k in an image sequence is at}

hand, the distribution p(qk|Zk_{) with the highest k should be used for the calculation}

of the assignment confidences in every frame. When instead a sequence of approximate distributions ˆp(qk|Zk_{) for each k in an image sequence is at hand, it is no longer clear}

(45)

every frame, since ˆp(qk|Zk_{) might have low resolution at some frames, resulting in crude}

approximations. The conclusion of this section is that when an approximate sequence of distributions ˆp(qk|Zk_{) for each k in an image sequence is at hand, the optimal choice}

of which distribution ˆp(qκ|Zκ_{) to use for the calculation of the assignment confidences}

should take resolution into account and might vary from case to case.

3.4.2 Displaying assignments and assignment confidence

The purpose of the assignment confidences is to make the manual correction step in the image analysis pipeline (subsection 1.1.1) more efficient. A confidence feature for assignments has been introduced, but how this should be used has not yet been discussed. One way to display a lineage tree is to superimpose it as links on top of the image sequence (Figure 3.3). In a manual correction step, links can be added or removed from the images.

Figure 3.3: C. glutamicum growing on a microfluidic chip. The assignments are indicated as colored bars. A continuum between red and blue indicate the confidence

in the assignments, where red is high confidence and blue is low confidence.

If a distribution of lineage trees is at hand, different approaches are possible for the visu-alization. One approach is to display the most likely tree together with the confidences. The confidences can, for example, be visualized with the color of the assignment link (Figure3.3). This is the visualization approach taken in this work.

(46)

A Markov Chain Monte Carlo

Approach to Tracking

In Chapter 3, a probability distribution for tracking hypotheses was derived. One way to produce an estimation of this probability distribution is by using a method called Markov Chain Monte Carlo (MCMC). A Markov Chain is a stochastic chain of states, where, being at a state xn, the probability that the state xm will be the next state in

the chain, only depends on the present state xn and not on the states preceding xn in

the chain. Monte Carlo methods is a class of methods for estimating the expectation Ef[g(x)] of a function g(x) of a random variable X, coming from some known distribution

f (x). MCMC uses a Markov Chain to generate samples from some distribution f (x)

and estimate Ef[g(x)]. A strength with MCMC is that it can produce samples from a

distribution f (x) even if only a function z(x) ∝ f (x) is known.

In the context of tracking, a Markov Chain of lineage trees can be generated and sampled from, in order to estimate the distribution p(qk|Zk_{). This chapter provides a short}

introduction to the theory of MCMC, and after that, it is explained how MCMC can be used for estimation of the probability distribution p(qk|Zk_{) in the BMIBK in Chapter}

3.

(47)

4.1 Markov Chains

To understand the concept of Markov Chain Monte Carlo, we first introduce the con-cept of Markov Chains. In the following paragraphs, many Markov Chain concon-cepts are introduced. These properties can be seen as prerequisites for the concept of a stationary distribution, which is fundamental for the proceeding introduction to MCMC. The in-troduction is restricted to the case of discrete time Markov Chains (DTMC), since only discrete Markov Chains are relevant for the application to tracking.

A DTMC is a sequence of random variables {Xn}n∈N fulfilling the Markov property

[Taha,1995]:

P (Xn= x|X1 = x1, X2= x2, ..., Xn−1= xn−1) = P (Xn= x|Xn−1= xn−1). (4.1)

This property is often interpreted as ”memorylessness” of a Markov Chain. At stage n, the outcome of Xn = xn is interpreted as the present state. When conditioned on the

state history x1, ..., xn, only the present state xnimpacts the probability of the outcome

Xn+1= xn+1.

If the random variables {Xn}n∈Ntake values in some finite state space S, the DTMC is

called a finite DTMC. If the probability of transition between two arbitrary states xi

and xj is independent of time, the DTMC is called time-homogeneous. For the case of

a finite, time-homogeneous DTMC, the transition probabilities can be summarized in a probability matrix P where an element pi,jdescribes the transition probability from state

xi to xj. In this work, we are only concerned with finite, time-homogeneous DTMCs

and therefore, the matrix notation pi,j will be used to indicate transition probability

between states xi and xj.

A Markov Chain is called irreducible, if any state can be reached within a finite number of transitions from any other state:

∀ xi, xj ∃k ∈ N : pki,j > 0.

The notation pk_i,j refers to the probability of going from state xi to state xj in k

transi-tions, which is equivalent to saying that pk_i,j is the (i, j) element in the matrix Pk. The state xi is called a periodic state with period k > 1, if it can only be reached

(48)

k = 1, it is called an aperiodic state. As soon as an irreducible Markov Chain has one aperiodic state, all states are aperiodic. This is easily realized through contradiction. A Markov Chain with no periodic states is called an aperiodic Markov Chain.

A stationary distribution π of a finite DTMC is a row vector with values in R+,

where the sum of the values in π is equal to 1, that fulfills the property:

πP = π.

P is a reducible matrix if its components can be divided into more than one non-intersecting index sets I1, I2, ...In where pi,j = 0 if i and j are in different index sets. P

is an irreducible matrix if its not reducible. It can be shown that an irreducible finite DTMC has an irreducible transition matrix P . The proof is technical.

If P is the transition matrix of an irreducible, aperiodic, finite DTMC, it can be shown that a unique stationary distribution π for the Markov Chain corresponding to p exists and

lim

k→∞P

k_{= 1π,} _(4.2)

where 1 is a column vector of ones with the same length as π. For a proof, it is referred to literature ([Norris,1998], section 1.7 and 1.8). Any vector v withP v = 1 fulfills:

v1π = π. (4.3)

From the Equations (4.2) and (4.3) we draw the conclusion that regardless of starting state, the probability for finding the Markov Chain in a certain state converges to the probability given by the stationary distribution, when the time variable k goes to infinity.

4.2 Monte Carlo Methods

(49)

is to estimate the expected value

Ef =

Z

g(x)f (x)dx (4.4)

of some function g(x), where f (x) is the probability density of some d dimensional

random variable X. If f (x) is known, Ef can be estimated with numerical methods.

When solving the integral with some numerical integration method, based on uniform discretization of the integration space, with order of accuracy l, the error σ of the estimation follows the law:

σ ∝ hl,

where h is the step length of the discretization.

For d dimensions and step length h, the number of grid points N to evaluate will follow:

N ∝1

h d

.

From this we can draw the conclusion that the number of calculations increases expo-nentially with the number of dimensions, if we want to keep the error constant. The exponential increase in computational complexity, makes integration strategies based on discretization unpractical, when the integration space is high dimensional.

A different approach to estimating (4.4) is provided by the Law of Large Numbers (LLN). The (LLN) is a theorem in probability theory stating:

Theorem 4.1 ([Durrett,2010], Theorem 2.4.1). Let X1, X2, ... be pairwise independent

identically distributed random variables with expectation E[|Xi|] < ∞. Let E[Xi] = µ

and Sn= X1+ ... + Xn.

Then Sn

n → µ a.s as n → ∞.

N

Monte Carlo methods are methods for estimating Ef, defined as in (4.4), by generating

random samples of the random variable X, distributed according to f (x), and taking the average of the values of g(x) for all generated samples. This average is then guaranteed to converge to Ef, as the number of samples goes to infinity, by Theorem4.1.

(50)

of Ef. If µn = _n1 n

P

i=1

g(xi) is the Monte Carlo estimator of (4.4) and µ the true value

of Ef, then the variance of µn V(µn) = V(_n1 n

P

i=1

g(xi)) = 1_nV(g(xi)), which implies that

the standard deviation D(µn) = √1_nD(g(xi)), which, in turn, implies that the standard

deviation of µn is proportional to √1_n. Notice that this estimator is independent of

d, the dimensionality of X. The argument in this section does not provide any direct comparison between the errors of the estimates from Monte Carlo methods and numeric quadrature, it points out how the computational complexity for the two methods evolve when the dimension d increases. A direct comparison is out of the scope of this work.

4.2.1 Metropolis-Hastings

Sampling from probability distributions is necessary for making Monte Carlo estimates. For some distributions, the process of sampling can in itself pose a non-trivial problem. One common problem when sampling, is that the probability distribution is known only up to a normalizing constant (for example the constant ck in Chapter 3). MCMC is a

class of algorithms for sampling from an arbitrary distribution z(x) only known up to a normalizing constant. The key idea behind Markov Chain Monte Carlo is to generate a Markov Chain with stationary distribution π ∝ z(x) and use the outcomes of the Markov Chain as random samples from z(x).

One important method for generating a Markov Chain with stationary distribution π(x) ∝ z(x), for some arbitrary distribution z(x), was presented by Metropolis et al [Metropolis et al.,1953] and generalized by Hastings [Hastings,1970], thus this method is known as Metropolis-Hastings (MH).

Central for the MH algorithm is the use of a transition density g(xi|xj), where g(xi|xj)

denotes the probability of going from the state xj to the state xi. The only formal

con-straint on g(xi|xj) is that the Markov Chain generated by g(xi|xj) has to be irreducible.

A state transition from a state xj in MH is a two step process. Firstly, a transition

to a state xi is proposed randomly using the transition density g(xi|xj). Secondly, a

stochastic rejection criterion will either accept or reject the transition to xi. If accepted

xi becomes the new state, if rejected the old state xj will become the new state. The

(51)

π(x) ∝ z(x). The MH scheme for generating N samples is as follows: Data: A starting state x0, N and a transition density g(xi|xj)

Result: N samples from z(x)

1 for i=1:N do

2 Draw xi0 ∼ g(x_i|x_i−1)

3 Draw u ∼ U (0, 1)

// U (0, 1) refers to the uniform distribution between 0 and 1

4 if u < max 1, z(xi0)g(xi−1|xi0) z(xi−1)g(xi0|xi−1) then 5 x_i← x_i0 6 else 7 x_i← x_i−1 8 end 9 end

Algorithm 1: Metropolis-Hastings algorithm

A proof of that the Markov Chain generated by Algorithm 1 has z(x) as its stationary distribution is presented by Hastings [Hastings,1970] for the case of finite state spaces. In this work, only Markov Chains on finite state spaces are considered, so the proof by Hastings [Hastings,1970] is sufficient.

At the beginning of Section 4.2.1it is claimed that only z(x) ∝ f (x) has to be known in order to generate samples from f (x). That this claim is valid can be seen directly in Algorithm 1, since z(k) only is present in Algorithm 1at line 4 in the ratio

z(xi0)g(x_i−1|x_i0)

z(xi−1)g(xi0|x_i−1),

and any normalization applied to z(x) will cancel out in the ratio.

As mentioned in Algorithm 1, a starting state x0 is necessary. The Markov Chain is

guaranteed to converge to π for any starting state. If x0is drawn from some distribution

ˆ