Decoding Neural Signals Associated to Cytokine Activity

GABRIEL ANDERSSON

KTH SKOLAN FÖR TEKNIKVETENSKAP


Master Thesis in Applied Mathematics and Data Science
Date: March 5, 2021

Supervisors: Henrik Hult, Fredrik Viklund
Examiner: Fredrik Viklund

Department of Mathematics, KTH

Swedish title: Identifiering av Nervsignaler Associerade Till Cytokin Aktivitet

Decoding Neural Signals Associated to Cytokine Activity

Abstract

The Vagus nerve has been shown to play an important role in inflammatory diseases, regulating the production of proteins that mediate inflammation. Two important such proteins are the pro-inflammatory cytokines TNF and IL-1β. This thesis makes use of Vagus nerve recordings from mice that are injected with TNF and IL-1β during the recording, with the aim to see whether cytokine-specific information can be extracted. To this end, a type of semi-supervised learning approach is applied, where the observed waveform data are modeled using a conditional probability distribution. The conditioning is based on an estimate of how often each observed waveform occurs, and local maxima of the conditional distribution are interpreted as candidate waveforms for encoding cytokine information.

The methodology yields varying, but promising, results. The occurrence of several candidate waveforms is found to increase substantially after exposure to cytokine. Difficulties in obtaining coherent results are discussed, as well as different approaches for future work.

Keywords

Cytokines, Neural Signals, Vagus Nerve, Variational Inference, Variational Autoencoder

Identifiering av Nervsignaler Associerade Till Cytokin Aktivitet

Sammanfattning

The Vagus nerve has been shown to play an important role in inflammatory diseases. This nerve regulates the production of inflammatory proteins, such as the pro-inflammatory cytokines TNF and IL-1β.

This work makes use of electrical recordings of the Vagus nerve in mice that are injected with the two cytokines TNF and IL-1β during the recording. The aim of the work is to investigate whether it is possible to extract information about the specific cytokines from the Vagus nerve recordings. To achieve this, we design a semi-supervised learning method that models the observed waveforms with a conditional probability distribution. The conditioning is based on an estimate of how often each individual waveform occurs, and local maxima of the conditional probability distribution are interpreted as possible candidate waveforms for containing cytokine information.

The methodology yields varying, but promising, results. The occurrence of several candidate waveforms shows a clear increase after the time of cytokine injection. Furthermore, difficulties in achieving consistent results for all recordings are discussed, as well as different possibilities for future work in the area.

Nyckelord


Acknowledgments

The author would like to thank his supervisors, Henrik Hult and Fredrik Viklund, for their help and guidance throughout the project.

Stockholm, March 2021
Gabriel Andersson


Contents

1 Introduction
 1.1 Bioelectronic Medicine
 1.2 Previous Studies
 1.3 Aim and Contribution of Thesis

2 Background
 2.1 Biology of Vagus Nerve
 2.2 Mathematical Model of CAPs
 2.3 Variational Inference
 2.4 Gradient Ascent of Modeled Probability Distribution
 2.5 Similarity Measure and Labeling of CAPs

3 Methodology and Implementation
 3.1 Experimental Setup Collecting the Raw Data
 3.2 Signal Conditioning and Decomposition
 3.3 Labeling CAP-Waveforms
 3.4 Conditional Variational Autoencoder (CVAE)
 3.5 Gradient Ascent of Conditional pdf
 3.6 Cluster High-Probability Data Points
 3.7 Evaluation of CAP-Candidates

4 Results
 4.1 Model Assessment
 4.2 Candidate CAP-Waveforms
 4.3 Responders
 4.4 Fixed Candidate-CAP Shape Over Multiple Recordings

5 Discussion
 5.1 Results
 5.2 Modeling of CAPs
 5.3 Future Work

6 Conclusion

A Some Derivations and Definitions
 A.1 The Kullback-Leibler Divergence, D_KL
 A.2 Derivation of ELBO
 A.3 Gaussian Annulus Theorem


Chapter 1

Introduction

1.1 Bioelectronic Medicine

Bioelectronic medicine is an emerging discipline that combines molecular medicine, neuroscience, engineering and computing to develop devices to diagnose and treat disease. The discipline relies on standard methods of molecular medicine, including identifying disease with a protein-specific mechanism in the cell. However, in contrast to traditional medicine, where molecules are designed to interact with proteins, bioelectronic medicine makes use of the nervous system to achieve a desired effect. By relating disease to specific nerve interactions, treatment could be achieved by building devices that modulate these nerves in a patient-specific manner. Traditional pharmaceutical drugs usually lack anatomical and cellular specificity and are furthermore not designed to adapt to individual treatment. Off-target effects, as well as under- and overdosing, are thus common with molecular drugs. Advances in bioelectronic medicine hold promise to address some of these challenges and provide personalized treatment of disease. An overview of bioelectronic medicine is given in [10].

Neural reflexes are central for maintaining physiological homeostasis, keeping the different conditions in the body within their narrow range of optimal function to maintain life. Changes in the cellular environment activate sensory neurons, which transmit signals in the form of action potentials to the central nervous system. This is the first step of a reflex circuit that controls homeostasis. In the context of inflammation, this type of circuit is referred to as the inflammatory reflex [1,16]. For instance, the production of cytokines, small proteins that mediate inflammation, is known to activate sensory neurons, resulting in action potentials through the Vagus nerve [12,13], see Figure 2.1. The Vagus nerve runs from the brain through the face and chest down to the abdomen. These signals are then followed by a chain reaction of neural response signals and molecular-ligand secretion that eventually regulates cytokine production. The inflammatory reflex is of great importance: the World Health Organization has ranked chronic inflammatory diseases as one of the greatest threats to human health [11].

1.2 Previous Studies

Studies have shown promise in reducing pro-inflammatory cytokine levels and improving the condition of people suffering from inflammatory diseases such as rheumatoid arthritis and Crohn's disease, using bioelectronic devices to stimulate the Vagus nerve [3,9]. Although promising, these trials are experimental in nature and a better understanding is needed. One of the many challenges in need of a solution is learning how to extract information about the inflammatory status of cells from neural recordings. Earlier work has shown that production of the pro-inflammatory cytokines IL-1β and TNF is inhibited by action potentials in the Vagus nerve [13]. However, it remains unclear whether it is possible to extract information specific to each of the cytokines from Vagus nerve recordings.

Recent research in bioelectronic medicine shows intriguing results in developing methods to discriminate between the two cytokines [18]. In this study, waveform data was collected from an experimental setup where an electrode is placed on the Vagus nerve in anesthetized mice. The electronic activity was recorded for a period of 90 minutes, during which the mice were injected in sequence with the cytokines TNF and IL-1β. Waveforms were extracted from the Vagus nerve recordings and sorted into different groups based on their shapes and amplitudes. The methodology in [18] included reducing the dimensionality of the waveforms using t-SNE and then performing the unsupervised clustering method DBSCAN. The firing rates were then computed for the different clusters, and the results in [18] revealed sensory neural groups responding specifically to TNF and IL-1β in a dose-dependent manner. These results provide a proof of concept of the possibility to extract cytokine-specific information from the Vagus nerve, although many questions are left to be answered. The novelty of their work raises the important issue of reproducibility, and the question of how the information is encoded remains.

1.3 Aim and Contribution of Thesis

This work makes use of the recorded data from [18], with the aim to reproduce their results using a new methodology and, furthermore, to investigate how the information is encoded. A type of semi-supervised learning approach is designed by artificially labeling each observed waveform into groups representing an estimated change in occurrence after the time of exposure to the different cytokines. By then constructing a probabilistic model of the labeled data, using a conditional variational autoencoder, a variant of gradient descent is performed to find the peaks of the conditional probability function. The found peaks are interpreted as candidate waveforms that may encode cytokine-specific information and are used for further evaluation.

The obtained results show neural waveforms that respond specifically to TNF in a manner similar to that obtained by [18]. Regarding how the information is encoded, waveform shapes that potentially encode TNF information are evaluated over multiple mice recordings. This indicates that TNF information could be encoded similarly in different mice, but the evidence remains speculative.


Chapter 2

Background

This chapter provides some information about the biology of the Vagus nerve and an approach to model neural signals mathematically. The necessary definitions and derivations to enable the suggested methodology are provided.

2.1 Biology of Vagus Nerve

Sensory neurons are responsible for providing the central nervous system (CNS) with information about changes in the body's internal and external environment. These changes range from mechanical to chemical and activate sensory neurons, causing action potentials to propagate through axons to the CNS, see Figure 2.1. An action potential refers to a rapid change of the electric potential over the nerve-cell membrane, which quickly propagates through the axon in a specific direction. Motor neurons are the counterpart of sensory neurons, responsible for information propagating from the CNS to achieve a desired action in the cellular environment. A nerve refers to an enclosed bundle of many axons, which can be both sensory and motor in nature.

A fundamental assumption in this work is the existence of cytokine-specific information, measurable at the surface of the Vagus nerve. This claim is not at all obvious considering the complexity of the axon composition. A schematic cross section of the Vagus nerve is given in Figure 2.2. It is estimated that the nerve contains between 80,000 and 100,000 individual axons in a human, about 80 percent of which are sensory [2,6]. The signals measured at the surface are thus a compound of the ∼10^5 individual action potentials, referred to as compound action potentials (CAPs). Furthermore, the Vagus nerve acts as an information pathway for many vital organs, and the majority of the axons therefore encode information unrelated to cytokines. This gives reason to believe that the surface-recorded CAPs will exhibit a very low signal-to-noise ratio, due to the structure of the nerve itself. It is therefore most likely that single axon activation lies well below the noise level. However, a simulation study [8] has shown that differences in the spatial location and size of activated axon groups (see Figure 2.2) induce surface CAPs that differ enough in shape and amplitude to discriminate between them. One can from these results argue that the surface CAPs mirror the activation of different axon groups, which hopefully includes groups that are activated due to increased cytokine levels in the cellular environment. This supports the assumption that there exists cytokine-specific information in the surface-recorded CAPs.

Figure 2.1: Schematic figure of information propagating to the CNS.

Figure 2.2: Schematic cross section of Vagus nerve.

The difficulties addressed in this section motivate the proposed methodology of labeling the waveforms to narrow down the search for cytokine-encoding CAPs, in contrast to a straightforward clustering approach.


2.2 Mathematical Model of CAPs

After preprocessing, the data used for the analysis are given in the form X = {x_i}_{i=1}^N, with x_i ∈ R^d. This corresponds to N observed CAP-waveforms x_i, where each CAP is represented by a vector in R^d (d = 141). We assume that these observations follow some unknown distribution p_X(x) and will furthermore label the data and consider a conditional distribution p_{X|L}(x|ℓ), such that

$$p_X(x) = \sum_{\ell} p_{X|L}(x \mid \ell)\, P_L(\ell).$$

Here ℓ ranges over a finite set that will be defined in a later section. The proposed method of finding the maxima of a probability distribution relies on the ability to sample from the unknown distribution p_{X|L}(x|ℓ) (see Section 2.4). The following sections will aim to create a mathematical model of the CAP-waveforms and to find an approximate distribution of the observed data. To ease notation, the main derivations will be performed to model the unconditional distribution p_X(x), but extend naturally to the conditional case that will actually be used to find local maxima. The conditional distribution is introduced in Section 2.5.

The form of the theoretical distribution p_X(x) is not known. We will therefore define a model distribution p_θ(x), which ideally has enough freedom, in terms of some parameters θ, to be similar to the theoretical distribution p_X(x) given some optimal choice of parameters θ*.

As mentioned in the previous section, the shape and amplitude of the observed CAPs are assumed to encode information about the activation of different axon groups. Assume that a single activated axon group corresponds to a unique surface CAP, denoted y and referred to as a basis-CAP. Since we do not possess any direct control over the activation of the different axon groups in the nerve, these basis-CAPs are unobservable by themselves. Let us furthermore assume the existence of K such basis-CAPs in total, {y_1, y_2, ..., y_K}. We consider all observations x_i to be some combination of the basis-CAPs together with independent noise, and choose to model each observed CAP as an observation from a multivariate normal distribution,

$$x_i \sim \mathcal{N}(\mu_{x_i}, \sigma^2 I),$$

where I refers to the identity matrix and µ_{x_i} = f_i(y_1, y_2, ..., y_K) for some function f_i.

To model the activation of the different basis-CAPs y_j, an unobserved (latent) random variable Z ∈ R^k is introduced with some distribution p_Z(z). The notion of the basis-CAPs will hereafter be replaced by this latent variable. We let the mean functions f_i belong to a deterministic family of functions f(z; θ), parameterised by θ ∈ Θ for some parameter space Θ, such that f : R^k × Θ → R^d. Given a good choice of the function f(z; θ) and the parameters θ, σ², we could then sample from something similar to p_X(x) by:

1. Sample z ∼ p_Z(z),
2. Evaluate µ = f(z; θ),
3. Sample x ∼ N(µ, σ²I).

A key point here is that a random variable from some simple distribution p_Z(z), which is easy to sample from, can be transformed to follow an arbitrarily complex distribution of interest p_X(x) by mapping it through a sufficiently complicated function f(z; θ) [4].
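The three-step sampling procedure above is easy to express in code. Below is a minimal sketch in Python, where the decoder `f` is a hypothetical affine stand-in for f(z; θ) (the thesis uses a neural network), and the dimensions match the CAP data (d = 141).

```python
import numpy as np

rng = np.random.default_rng(0)
k, d, sigma2 = 2, 141, 0.5           # latent dim, data dim, noise variance

# Hypothetical stand-in for f(z; theta): an affine map R^k -> R^d.
W = rng.normal(size=(k, d))
b = rng.normal(size=d)
def f(z):
    return z @ W + b

z = rng.standard_normal(k)           # 1. sample z ~ p_Z(z) = N(0, I)
mu = f(z)                            # 2. evaluate mu = f(z; theta)
x = rng.normal(mu, np.sqrt(sigma2))  # 3. sample x ~ N(mu, sigma^2 I)
```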

Using a Bayesian framework, we can view the model above as sampling the latent variable z from a chosen prior p_Z(z), which allows us to generate x from some conditional distribution p_{X|Z}(x|z). From above, we have the following parameterised model of the likelihood,

$$p_\theta(x|z) = \mathcal{N}(f(z; \theta), \sigma^2 I). \qquad (2.1)$$

A good model should give rise to large probabilities for waveforms similar to our observations {x_i}_{i=1}^N, and low probabilities otherwise. Sampling z directly from a standard prior distribution p_Z(z) might come with large variance in the likelihood p_θ(x|z), causing samples to deviate from the observations x_i. It could therefore improve the model if an x-dependence is introduced for the latent variable. To achieve this, we will make use of the posterior,

$$p_{Z|X}(z|x) = \frac{p_\theta(x|z)\, p_Z(z)}{p_\theta(x)}, \qquad (2.2)$$

when sampling z in Eq. (2.1). Using the introduced latent variable, the model marginal likelihood p_θ(x) can be expressed as,

$$p_\theta(x) = \int_{\mathbb{R}^k} p_\theta(x|z)\, p_Z(z)\, dz. \qquad (2.3)$$

This integral is in general intractable for a sufficiently complicated parameterised likelihood p_θ(x|z). Therefore, the posterior in Eq. (2.2) will also be intractable. To solve this problem, a parameterised approximation q_φ(z|x) will be constructed for the posterior p_{Z|X}(z|x).

The two parameterised distributions p_θ(x|z) and q_φ(z|x) should together model the observed data well, given a good choice of parameters θ and φ. To achieve a suitable choice of parameters, the theory of variational inference is applied.

2.3 Variational Inference

With variational inference, we approximate the true intractable distributions using optimisation as a tool to maximise the likelihood of the observed data. The model from the previous section fits naturally into this setting, using the results from [7].

Recall, we have assumed the parameterised likelihood to be a multivariate Gaussian,

$$p_\theta(x|z) = \mathcal{N}(f(z; \theta), \sigma^2 I).$$

We similarly let the latent posterior p_{Z|X}(z|x) be approximated as,

$$q_\phi(z|x) = \mathcal{N}(h(x; \phi), \tau^2(x; \phi)\, I),$$

where h(x; φ) and τ²(x; φ) are parameterised functions such that h, τ² : R^d × Φ → R^k, for some parameter space Φ. The parameters of the likelihood mean function f(z; θ) will be tuned jointly with the posterior functions h(x; φ) and τ²(x; φ), using an optimisation scheme to find a good solution in terms of maximum likelihood of the observed data.

Assuming the observations to be independent, the log-likelihood can be expressed as,

$$\log p_\theta(x_1, x_2, \ldots, x_N) = \sum_{i=1}^{N} \log p_\theta(x_i). \qquad (2.4)$$

This quantity is not straightforward to calculate since the distribution p_θ(x) is intractable. It is, however, possible to find a lower bound of the likelihood which is tractable and differentiable w.r.t. the parameters θ and φ. This allows us to use gradient descent as an optimisation tool, which can then be implemented in e.g. TensorFlow.


To find a lower bound of the log-likelihood, we start from the expression given in Eq. (2.4). Using the approximate posterior q_φ(z|x), it is possible (see Appendix A.2) to write each intractable log-likelihood term as,

$$\log p_\theta(x_i) = D_{KL}\big(q_\phi(z|x_i) \,\|\, p_{Z|X}(z|x_i)\big) + \mathcal{L}(\theta, \phi; x_i). \qquad (2.5)$$

The first term is the Kullback-Leibler (KL) divergence (see Appendix A.1) between the approximate and true posterior of the latent variable. This term cannot be evaluated since it contains the intractable posterior p_{Z|X}(z|x). It is, however, known that the KL-divergence is non-negative. The second term in (2.5) can be written,

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi}[\log p_\theta(x|z)] - D_{KL}\big(q_\phi(z|x) \,\|\, p_Z(z)\big). \qquad (2.6)$$

Since the first term in (2.5) is non-negative, L(θ, φ; x) serves as a lower bound for the log-likelihood, often called the Evidence Lower Bound (ELBO). We have, in other words, that log p_θ(x) ≥ L(θ, φ; x), and by maximising the ELBO we are effectively maximising the log-likelihood. Note that all the terms in (2.6) are now tractable for a given choice of f(z; θ), h(x; φ) and τ²(x; φ). Tractability here includes the possibility to estimate the expectations using a Monte Carlo method.

Maximising the ELBO in (2.6) also effectively minimises the KL-divergence between the approximate and true posterior. Figure 2.3 illustrates the idea of optimising a parameterised family of distributions to be close to the theoretical posterior, where closeness is defined by the KL-divergence.

Figure 2.3: Each point within the ellipse represents a distribution from the family q_φ(z|x). After an initial guess φ_init, the parameters are optimised to be close to the true posterior in the sense of KL-divergence.

The prior p_Z(z) is chosen to be standard normal, N(0, I). This allows for an analytic expression of the KL-divergence in Eq. (2.6),

$$-D_{KL}\big(\mathcal{N}(h, \tau^2 I) \,\|\, \mathcal{N}(0, I)\big) = \frac{1}{2} \sum_{j=1}^{J} \left(1 + \log \tau_j^2 - h_j^2 - \tau_j^2\right),$$

where the latent space is assumed to be J-dimensional. For a derivation of the analytic expression, see the Appendix in [7]. By estimating the remaining expectation in Eq. (2.6) using the naïve Monte Carlo estimator, we can write,

$$\mathcal{L}(\theta, \phi; x) \approx \frac{1}{2} \sum_{j=1}^{J} \left(1 + \log \tau_j^2 - h_j^2 - \tau_j^2\right) + \frac{1}{L} \sum_{l=1}^{L} \log p_\theta(x|z_l). \qquad (2.7)$$
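As a sanity check of the closed-form KL term, the sum above is straightforward to evaluate numerically; the values of h and tau2 below are illustrative posterior parameters, not values from the thesis.

```python
import numpy as np

def neg_kl(h, tau2):
    # -D_KL( N(h, diag(tau2)) || N(0, I) ), summed over the J latent dimensions
    return 0.5 * np.sum(1.0 + np.log(tau2) - h**2 - tau2)

print(neg_kl(np.zeros(2), np.ones(2)))                       # 0.0: posterior equals the prior
print(neg_kl(np.array([0.3, -0.1]), np.array([0.8, 1.2])))   # negative otherwise
```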

One issue remains to be solved before it is possible to optimise the parameters. We wish to differentiate the ELBO w.r.t. the parameters θ and φ, but the estimator using (2.7) exhibits very high variance [7]. This is the case since the Monte Carlo estimator samples from the distribution

$$q_\phi(z|x) = \mathcal{N}(h(x; \phi), \tau^2(x; \phi)\, I),$$

which depends on φ. The problem can be avoided using what [7] introduced as the reparameterisation trick. The key is to introduce a new auxiliary variable ε ∼ N(0, I). This allows us to write z as a deterministic function of φ, given ε,

$$z = h(x; \phi) + \epsilon \odot \tau(x; \phi). \qquad (2.8)$$

Here, ⊙ refers to an element-wise product.

If we now let the parameterised functions h(x; φ), τ²(x; φ) and f(z; θ) be modeled as neural networks, we have arrived at a variational autoencoder (VAE). It is then possible to perform the optimisation of the parameters in TensorFlow, using back-propagation with the loss specified as −L(θ, φ; x) from Eq. (2.7). The sampling of z is done using Eq. (2.8). This scheme is visualised in Figure 2.4, and after optimisation the "optimal" parameters will be denoted φ* and θ*.
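A minimal sketch of one such optimisation step is given below, for the unconditional VAE, assuming d = 141 and a two-dimensional latent space as in the thesis. The network sizes are illustrative, and the loss is −L(θ, φ; x) from Eq. (2.7) with L = 1, using the reparameterisation in Eq. (2.8).

```python
import tensorflow as tf

d, J, SIGMA2 = 141, 2, 0.5

encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2 * J),               # outputs [h, log tau^2]
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(d),                   # outputs f(z; theta)
])
optimizer = tf.keras.optimizers.Adam(1e-3)

def train_step(x):
    """One back-propagation step on a batch x of shape (batch, d)."""
    with tf.GradientTape() as tape:
        h, log_tau2 = tf.split(encoder(x), 2, axis=-1)
        eps = tf.random.normal(tf.shape(h))
        z = h + eps * tf.exp(0.5 * log_tau2)                             # Eq. (2.8)
        recon = -tf.reduce_sum((x - decoder(z))**2, -1) / (2 * SIGMA2)   # log p(x|z) up to a constant
        kl = -0.5 * tf.reduce_sum(1 + log_tau2 - h**2 - tf.exp(log_tau2), -1)
        loss = tf.reduce_mean(kl - recon)                                # -ELBO, Eq. (2.7) with L = 1
    variables = encoder.trainable_variables + decoder.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```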

Figure 2.4: Schematic figure of the variational autoencoder. The network outputs represent the distributions N(h(x; φ), τ²(x; φ)I) and N(f(z; θ), σ²I) for the encoder and decoder, respectively. Both of these distributions contribute to the loss function used for the optimisation via back-propagation.

2.4 Gradient Ascent of Modeled Probability Distribution

Finding the maxima of a distribution p_X(x) is equivalent to minimising the Shannon information I(x) = −log p_X(x). As with any continuous function, basic gradient descent is performed iteratively by taking small steps in the opposite direction of the gradient, x_{n+1} = x_n − γ∇_x I(x), for some step size γ. We start by evaluating the gradient,

$$\nabla_x I(x) = -\frac{1}{p_X(x)} \nabla_x p_X(x) = -\frac{1}{p_X(x)} \int_{\mathbb{R}^k} \nabla_x p_{X|Z}(x|z)\, p_Z(z)\, dz,$$

where we assume p_X(x) to be sufficiently smooth to change the order of integration and differentiation. The integral in the last expression can be interpreted as an expectation w.r.t. p_Z(z) and approximated using the naïve Monte Carlo estimator,

$$\mathbb{E}_{p_Z(z)}\big[\nabla_x p_{X|Z}(x|z)\big] \simeq \frac{1}{J} \sum_{j=1}^{J} \nabla_x p_{X|Z}(x|z_j). \qquad (2.9)$$

In Eq. (2.9), z_j is sampled from the prior p_Z(z). This may result in high variance of the MC estimate, since the observations x_i might be unlikely given the prior. To reduce the variance, importance sampling is applied using the posterior p_{Z|X}(z|x),

$$\int_{\mathbb{R}^k} \nabla_x p_{X|Z}(x|z)\, p_Z(z)\, dz = \int_{\mathbb{R}^k} \nabla_x \{p_{X|Z}(x|z)\}\, \frac{p_Z(z)}{p_{Z|X}(z|x)}\, p_{Z|X}(z|x)\, dz.$$

The full gradient can now be estimated as,

$$\nabla_x I(x) \simeq -\frac{1}{p_X(x)} \frac{1}{J} \sum_{j=1}^{J} \nabla_x \{p_{X|Z}(x|z_j)\}\, \frac{p_Z(z_j)}{p_{Z|X}(z_j|x)}, \qquad (2.10)$$

where z_j is sampled from the posterior p_{Z|X}(z|x). For an arbitrary term in (2.10) we have, up to a constant, using Bayes' rule,

$$-\frac{p_Z(z_j)}{p_X(x)\, p_{Z|X}(z_j|x)} \nabla_x p_{X|Z}(x|z_j) = -\frac{p_Z(z_j)}{p_Z(z_j)\, p_{X|Z}(x|z_j)} \nabla_x p_{X|Z}(x|z_j) = -\frac{\nabla_x p_{X|Z}(x|z_j)}{p_{X|Z}(x|z_j)}. \qquad (2.11)$$

Now, using the model p_θ(x|z) = N(f(z; θ*), σ²I) of the likelihood p_{X|Z}(x|z), we can evaluate the last expression in (2.11) using the Gaussian pdf,

$$-\frac{\nabla_x p_{X|Z}(x|z_j)}{p_{X|Z}(x|z_j)} = -\frac{\nabla_x \exp\{-\frac{1}{2\sigma^2}(x - f(z_j; \theta^*))^2\}}{\exp\{-\frac{1}{2\sigma^2}(x - f(z_j; \theta^*))^2\}} = \frac{x - f(z_j; \theta^*)}{\sigma^2},$$

where the normalising constants of the Gaussian pdf cancel. Note that this abuses notation slightly, since x and f are multidimensional: the squares in the exponentials should be interpreted as inner products, and we have furthermore used that the variance is the same for all dimensions. Putting the results together, we have found that a step towards a local minimum of I(x) (and thereby a maximum of p_X(x)) is estimated by,

$$x_{n+1} = x_n - \gamma\, \frac{1}{J} \sum_{j=1}^{J} \frac{x_n - f(z_j; \theta^*)}{\sigma^2}, \qquad (2.12)$$

where we can use the model posterior q_φ(z|x) to sample z_j. The standard form of gradient descent in (2.12) is, however, known to perform badly for complicated, high-dimensional functions and is prone to get stuck in poor local minima. There exists a variety of alternatives to increase performance, using adaptive step sizes, momentum, etc. For the purpose of this work, where we do not aim to find the global minimum but rather a handful of "good" local minima, a smoothing parameter will be used with a fixed step size γ. The full gradient descent takes the form,

$$\tilde{x}_n = x_n + \xi, \quad \xi \sim \mathcal{N}(0, \eta^2 I),$$
$$x_{n+1} = \tilde{x}_n - \gamma\, \frac{1}{J} \sum_{j=1}^{J} \frac{x_n - f(z_j; \theta^*)}{\sigma^2}, \qquad (2.13)$$

where the latent variables {z_j}_{j=1}^J are sampled from the approximate posterior,

$$q_\phi(z|x_n) = \mathcal{N}(h(x_n; \phi^*), \tau^2(x_n; \phi^*)\, I).$$

The smoothing parameter acts as a control over how sensitive the gradient descent is to local minima. A large value of the variance η² will result in a smaller number of local minima and vice versa.
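A direct numpy sketch of the iteration in Eq. (2.13) is given below. Here `encode` and `decode` are hypothetical stand-ins for the trained posterior functions h, τ² and the decoder f(·; θ*), and J = 1 as later used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

def smoothed_step(x, encode, decode, gamma=0.02, eta2=0.005, sigma2=0.5, J=1):
    # x_tilde_n = x_n + xi,  xi ~ N(0, eta^2 I)
    x_tilde = x + rng.normal(0.0, np.sqrt(eta2), size=x.shape)
    h, tau2 = encode(x)                       # q_phi(z | x_n)
    grad = np.zeros_like(x)
    for _ in range(J):
        z = rng.normal(h, np.sqrt(tau2))      # z_j ~ q_phi(z | x_n)
        grad += (x - decode(z)) / sigma2      # one term of the MC gradient estimate
    return x_tilde - gamma * grad / J         # x_{n+1}
```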

We could from here implement a VAE to model the distribution p_X(x) and find the corresponding high-probability regions. Considering the maxima of p_X(x) may, however, not be informative enough to find the CAPs encoding cytokine activity, given the complexity described in Section 2.1. It may be that the cytokine-encoding CAPs are unlikely observations under p_X(x), and that it is necessary to limit the considered probability space to a subspace where we believe them to be more likely. We will therefore aim to construct a conditional probability distribution p_{X|L}(x|ℓ), ℓ ∈ L, by labelling the data based on an estimate of their change in occurrence at the time of cytokine exposure. The labels are obtained by defining a waveform similarity measure (see Section 2.5). It is then possible to only consider the conditional probabilities

P(x | "Increased occurrence after cytokine exposure.").

2.5 Similarity Measure and Labeling of CAPs

Recall the important assumption that similar CAPs, in terms of shape and amplitude, encode similar biological information. Furthermore, the idea behind the experimental setup is that the occurrence of CAPs encoding cytokine-specific information should increase significantly after the time of exposure to the respective cytokine. This motivates the method of restricting the "x-space" R^d to a region where there hopefully is a higher density of cytokine-encoding CAPs under p_X(x).

The idea is to treat each observed CAP as a candidate waveform, and then find all "similar" CAPs during the time of recording. By looking at the times of occurrence of the "similar" CAPs, we can estimate how many times per second this waveform occurs, referred to as the event-rate of the candidate-CAP. By subsequently looking at how the event-rate changes after the time of exposure, we can infer whether it is likely or not to encode cytokine information, and label it accordingly. The constrained probability space to consider in further analysis is then the conditional

P(x | "Increased event-rate after cytokine exposure.").

This requires us to specify a quantitative definition of what is meant by one CAP being "similar" to another. To define this, we make use of our model,

$$x \sim \mathcal{N}(\mu_x, \sigma^2 I),$$

of the observed CAP-waveforms. Let us consider an observation x_j to be a candidate for the mean vector µ_x. It is then possible to look at how probable each of the other observations is under this assumption, i.e. to evaluate the probabilities p_ij = N(x_i; µ_x = x_j, σ²I) for i = 1, 2, ..., N. Observations that are likely enough, given some threshold p_ij > p_thres, could then be considered to encode the same information and defined as "similar".

One problem with this definition is the high dimensionality of the normal distribution. We are working with observations in R^141, which comes with the inherent feature of having almost zero probability mass around the mean µ_x, resulting in extremely small evaluated probabilities for all observations. To avoid this, we can make use of the Gaussian Annulus Theorem, given in Appendix A.3. It states that nearly all of the probability mass of a spherical zero-mean, d-dimensional Gaussian with unit variance lies within a thin annulus at radius √d. The main result gives us that the transformed data, x̃_i = (x_i − x_j)/σ, should fulfil

$$P\left(\sqrt{d} - \beta \le \|\tilde{x}_i\| \le \sqrt{d} + \beta\right) \longrightarrow 1,$$

as d → ∞ for any β > 0, under the hypothesis that x_i ∼ N(µ_x = x_j, σ²I). All in all, for a candidate x_j, we define a threshold β > 0 and consider all observations x_i fulfilling

$$\left\| \frac{x_i - x_j}{\sigma} \right\| \le \sqrt{d} + \beta,$$

to encode similar information. The lower bound is removed, since it would not be reasonable to remove an observation x_i for being too similar to x_j.
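The resulting similarity test reduces to a norm threshold and can be sketched as follows, with X an (N, d) array of CAP-waveforms and the parameter values chosen later in Section 3.3.

```python
import numpy as np

def similar_mask(X, x_cand, sigma2=0.7, beta=0.1):
    """Boolean mask of observations 'similar' to the candidate x_cand."""
    d = X.shape[1]
    radii = np.linalg.norm((X - x_cand) / np.sqrt(sigma2), axis=1)
    return radii <= np.sqrt(d) + beta   # only the upper bound is kept
```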

After labeling the data, we have obtained a conditional model of both the posterior and the likelihood, which take the forms,

$$q_\phi(z|x, \ell) = \mathcal{N}(h(x; \phi, \ell), \tau^2(x; \phi, \ell)\, I), \qquad (2.14)$$
$$p_\theta(x|z, \ell) = \mathcal{N}(f(z; \theta, \ell), \sigma^2 I). \qquad (2.15)$$

The labels ℓ ∈ L can take three values, corresponding to,

L = { Increased event-rate after exposure to first cytokine,
      Increased event-rate after exposure to second cytokine,
      No increase in event-rate after exposure to cytokines }.

Chapter 3

Methodology and Implementation

This chapter provides a more hands-on description of how each step in the methodology is performed, with the aim to make the work reproducible. An overview of the methodology is given in Fig. 3.1.

Figure 3.1: Overview of the different steps in the methodology.

3.1 Experimental Setup Collecting the Raw Data

The raw data was collected by placing a small electrode on the surface of the Vagus nerve in anaesthetized mice, see Figure 3.2a. The timeline for the experimental setup is given in Figure 3.2b. Isoflurane is given initially as an anaesthetic, and the two cytokines are injected 30 and 60 minutes after the recording is started. This setup is repeated for 44 different mice, where the order of the two cytokines TNF and IL-1β varies. Another 7 mice are used as controls, where saline (a salt-water solution) is injected at both 30 and 60 minutes. All recordings are available online.∗

Figure 3.2: Experimental setup. (a) Electrode placement for Vagus nerve recording. (b) Timeline of events during recording, [18].

A visual inspection of the different raw recordings reveals a very large variation in the overall characteristics of the data, both regarding the amplitudes of the measured voltage and the apparent noise level. It is furthermore clear that some of the files are incomplete, not lasting the predetermined period of 90 minutes. This caused many of the recordings to yield a small number of CAP-waveforms after preprocessing. About half of the recordings are therefore disregarded, resulting in 20 cytokine-injection recordings used in the analysis. All 7 saline-injection recordings are used as controls. Two examples of the raw recordings considered in the analysis are plotted in Figure 3.3.

Figure 3.3: Two examples of the raw recordings considered in the analysis.

3.2 Signal Conditioning and Decomposition

To the author's knowledge, [18] is the only work that has presented successful results in decoding afferent nerve signals (action potentials propagating towards the CNS) related to inflammation. The methods developed by [18] in MATLAB to preprocess the raw Vagus nerve recordings are therefore used in this work as well. The two main parts of this preprocessing are reducing the noise level and decomposing the continuous signal into individual, neurally-evoked events.

A high-pass filter is applied initially to remove frequencies that are unrelated to neural events. This is followed by a down-sampling step to reduce the computational time of the whole analysis, effectively reducing the signal sample frequency.

The measured signal is an aggregation of many different sources, in addition to the compound action potentials of interest. This includes cardiac and respiratory events, as well as other artifacts such as variations of the nerve-electrode contact. It is therefore important to discriminate between neurally evoked events and events caused by other sources. A wavelet transformation is applied to isolate cardiac events and emphasise CAPs, using prior knowledge about the assumed duration of cardiac and neural events. An adaptive threshold is then applied to isolate individual CAP instances. A baseline noise level is estimated using a local-in-time sliding window, and any deviation larger than three times the local noise level is assumed to be a detected CAP. This is furthermore done for the cardiac-enhanced signal, and all CAPs found to temporally coincide with cardiac events are disregarded. This type of adaptive threshold is known as "smallest-of constant false-alarm rate" (SO-CFAR).
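A heavily simplified sketch of such a sliding-window detector is given below; it is not the SO-CFAR implementation of [18], and the window length is an illustrative value rather than one used in the thesis.

```python
import numpy as np

def detect_events(signal, window=2000, factor=3.0):
    """Flag samples deviating more than `factor` times the local noise level."""
    pad = window // 2
    flags = np.zeros(len(signal), dtype=bool)
    for i in range(pad, len(signal) - pad):
        noise = np.median(np.abs(signal[i - pad:i + pad]))  # robust local noise estimate
        flags[i] = np.abs(signal[i]) > factor * noise
    return flags
```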

The resulting data takes the form of an array, where each element is an identified (wavelet-convolved) CAP-waveform, see Figure 3.4. For a more thorough description of the steps in the initial signal processing, we refer to [18].


Figure 3.4: Examples of identified CAP-waveforms, using an adaptive threshold of the raw signal as done by [18]. The right figure shows a zoomed-in version of an observed CAP instance to indicate the corresponding time scale.

The final preprocessing is done in Python, where an amplitude threshold is applied to disregard CAPs with amplitudes below or above set thresholds. The max-amplitude threshold is necessary to keep the different gradient descents from diverging due to noisy samples. The min-amplitude threshold is intended to remove signals too weak to lie above the noise level. Furthermore, an event-rate threshold is applied using the defined similarity measure: CAPs found to have a mean event-rate below a set threshold value are considered noise and disregarded. Lastly, the CAPs are standardised by subtracting the mean and dividing by the variance. Although this might remove some information encoded in the amplitudes, it is necessary for the gradient descent methods to converge. For further discussion on this topic, see Section 5.2.

3.3 Labeling CAP-Waveforms

To use the Gaussian Annulus Theorem as a similarity measure, two parameters need to be specified: the model variance σ² in p_θ(x|z, ℓ) and the threshold β. It is assumed that similar waveforms encode the same information. The parameters must therefore be chosen such that the CAP-waveforms within the threshold are similar enough for this assumption to be reliable. This has to be balanced against finding enough CAPs similar to the candidate to be able to extract useful information from the event-rates. The similarity was evaluated by visual inspection, and the final choice was the variance σ² = 0.7 and the threshold β = 0.1.


To actually label the CAPs, the event-rates are used to find CAPs with an increase in occurrence after injection. To quantify what is meant by "an increase", a baseline mean and variance are calculated for the period up until the respective injection times. If the mean event-rate has increased by more than a fraction of the baseline standard deviation, the CAP is labeled as "increase after injection". Each CAP is only assigned one label, so if an increase after both injections is found, the largest increase is considered.
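The labelling rule can be sketched as follows; `frac`, the required fraction of the baseline standard deviation, is a hypothetical placeholder since the thesis does not state its value, and the injection times follow Figure 3.2b.

```python
import numpy as np

def label_cap(t, rate, injections=(30.0, 60.0), frac=0.5):
    """t: time stamps (minutes); rate: event-rate series of one candidate CAP."""
    increases = []
    for t_inj in injections:
        base, post = rate[t < t_inj], rate[t >= t_inj]
        increases.append(post.mean() - base.mean() - frac * base.std())
    best = int(np.argmax(increases))            # only the largest increase counts
    if increases[best] <= 0:
        return "no increase"
    return f"increase after injection {best + 1}"
```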

3.4 Conditional Variational Autoencoder (CVAE)

A conditional variational autoencoder was implemented using TensorFlow. The posterior functions h(x; φ, ℓ) and τ²(x; φ, ℓ) were implemented as a single neural network with two different outputs, constituting the encoder of the CVAE. The likelihood mean function f(z; θ, ℓ) is, in the context of autoencoders, known as the decoder, see Figure 2.4. A symmetric architecture was used, with 3 dense hidden layers for both networks. This was the result of experimental runs starting small and increasing the number of nodes in the networks until a satisfying result was obtained. The CVAE assessment was based on the continuity of the latent space, together with the result of sampling from the modelled likelihood p_θ*(x|z, ℓ), where z is sampled from the posterior q_φ*(z|x, ℓ), and comparing the result to the original data point x, see Section 4.1. The latent space dimension was set to two, since the ability to plot the latent space allows for transparency of the model.
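The thesis does not spell out how the conditioning enters the networks; a common way to realise it, sketched here as an assumption, is to one-hot encode the label and concatenate it to the inputs of both encoder and decoder.

```python
import tensorflow as tf

NUM_LABELS = 3  # the three labels in the set L from Section 2.5

def encode(encoder, x, label):
    lbl = tf.one_hot(label, NUM_LABELS)
    return encoder(tf.concat([x, lbl], axis=-1))   # outputs [h, log tau^2]

def decode(decoder, z, label):
    lbl = tf.one_hot(label, NUM_LABELS)
    return decoder(tf.concat([z, lbl], axis=-1))   # outputs f(z; theta, l)
```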

3.5 Gradient Ascent of Conditional pdf

The implementation of the iterative algorithm given in Eq. (2.13) is rather straightforward using the TensorFlow CVAE model. Considering the different parameters, the number of samples used to estimate the expectations is set to J = 1, speeding up computations. The noisy realisation of the gradient using J = 1 has been shown to be sufficient for a similar implementation using MNIST in [5]. The step size γ was chosen as large as possible while still allowing the algorithm to converge. The noise variance η² was tuned such that the number of found minima typically was around 4 to 8. This resulted in γ = 0.02 and η² = 0.005. The number of gradient descent steps was chosen large enough for the data points to converge to separated clusters, verified by visually inspecting the latent space representation.

3.6 Cluster High-Probability Data Points

Although the gradient descent should have separated the observed CAPs into distinct high-probability regions, it is intrinsically not a clustering method. K-means is therefore applied as a separate method on the data points after the gradient descent iterations are performed. Given that the gradient descent has been successful, the clustering task should be easy enough for K-means to perform well. A drawback of K-means is the need to specify the number of clusters, which is difficult when the results vary greatly from one recording to another. Therefore, a generous number of clusters is chosen for all mice recordings, which sometimes results in several clusters encoding the same waveform. This is not an issue in this case, since all candidates will be evaluated further and any duplicates can simply be disregarded. This can be seen in Figure 4.7c,d, where the red and purple curves clearly correspond to the same waveform, but have been assigned to two different K-means clusters. The resulting event-rates are almost identical, and one of them can be disregarded.
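The clustering step itself is a one-liner with scikit-learn; `X_final` below denotes the data points after the gradient descent iterations, and the generous cluster count of 10 is an illustrative value.

```python
from sklearn.cluster import KMeans

def cluster_candidates(X_final, n_clusters=10, seed=0):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_final)
    # The cluster means serve as the candidate CAP-waveforms evaluated in Section 3.7.
    return km.cluster_centers_, km.labels_
```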

DBSCAN was considered as an alternative, since the number of clusters does not need to be specified. It was, however, difficult to find parameters that worked well for all recordings, and it was therefore disregarded as a clustering method.

3.7 Evaluation of CAP-Candidates

After the clustering, we have obtained, for each recording, K CAP-candidates for encoding cytokine information, these being the means of the resulting K-means clusters. The final inference is done by studying the event-rates of the candidate-CAPs, to see if there is a significant increase in event-rate after the time of injection. This approach is similar to the labeling of each waveform, but now uses a more restrictive parameter choice for the similarity measure, choosing σ² = 0.5, which is the actual modeled variance in the CVAE model.

To determine whether there is a significant increase in the event-rate of a CAP-candidate after the time of injection, we use a method similar to that in [18]. The standard deviation σ₀ was measured between 10 and 30 minutes after the start of the recording and used as a baseline. A baseline mean µ₀ was measured during the 4 minutes before the considered injection, and a threshold was set as

$$\mu_0 + k \cdot \min\{\sigma_0, \sigma_{min}\}, \qquad (3.1)$$

where σ_min is defined such that the result is less sensitive to insignificant changes in the event-rate. This threshold was then considered between 10 and 30 minutes after the injection time, and an event-rate increase is defined as significant if the event-rate was above the threshold (3.1) for at least one third of the post-injection time interval (7 min). A mouse for which a significant increase was found after at least one of the two injections was defined as a responder. The parameters k and σ_min were set by looking at the distribution of the mean and standard deviation of the event-rates using all observations. This led to the choice of σ_min = 0.3 and k = 2.5.
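Put together, the responder criterion can be sketched as below, using the parameter values stated above (k = 2.5, σ_min = 0.3) and time in minutes.

```python
import numpy as np

def is_responder(t, rate, t_inj, k=2.5, sigma_min=0.3):
    sigma0 = rate[(t >= 10) & (t <= 30)].std()          # baseline std, 10-30 min
    mu0 = rate[(t >= t_inj - 4) & (t < t_inj)].mean()   # baseline mean, 4 min pre-injection
    threshold = mu0 + k * min(sigma0, sigma_min)        # Eq. (3.1)
    post = rate[(t >= t_inj + 10) & (t <= t_inj + 30)]
    # significant if above threshold for at least one third of the interval (7 min)
    return np.mean(post > threshold) >= 1.0 / 3.0
```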


Chapter 4

Results

This chapter initially provides an assessment of the different methods in the proposed methodology; the main results of the thesis are thereafter provided. Due to the unstable sleep state of the mice during the first part of the recording, the first 10 minutes of the event-rates are disregarded in the analysis and therefore omitted from all figures.

4.1 Model Assessment

To evaluate the trained conditional variational autoencoder, we make draws from the model,

$$x_{\text{sim}} \sim \mathcal{N}(f(z; \theta^*, \ell),\, 0.5\, I),$$

where f(z; θ*, ℓ) is the decoder part of the CVAE and σ² = 0.5 the assumed model variance. We sample,

$$z \sim \mathcal{N}(h(x; \phi^*, \ell),\, \tau^2(x; \phi^*, \ell)\, I),$$

using the encoder, and compare the simulated x_sim with the observation x, see Figure 4.1.

Figure 4.1: Results of taking samples x_sim from N(f(z; θ*, ℓ), 0.5 I), where z is sampled from the posterior N(h(x; φ*, ℓ), τ²(x; φ*, ℓ) I), for a given observation x.

The modeled mean looks accurate, and the model variance is of the right order of magnitude. It is, however, visible that the independence assumption on the modeled covariance is not ideal, since there appears to be a positive autocorrelation in the noise of the observed CAPs. This is discussed more extensively in Section 5.2.

The CVAE can be evaluated further by looking at the latent space representation of our data for the different labels. Figure 4.2 shows the encoded latent mean µ_z = h(x; φ*, ℓ) for each data point x, corresponding to the two labels of interest, for one of the recordings.

Figure 4.2: Visualisation of the encoded latent space means for CAP-waveforms labeled as "increased event-rate after first injection" and "increased event-rate after second injection".


The latent space representation in Figure 4.2 shows a behaviour similar to the prior N(0, I), which is expected given the KL-divergence term in the loss (2.6). Note also that waveforms with different labels are encoded to the same latent distributions, as the encoded means overlap each other. The same encoded distribution in the latent space can thus represent different waveforms depending on the label, which is expected when working with a conditional VAE, i.e. information about overall differences between the CAPs of different labels can be encoded into the label itself.

To ensure that the latent space is continuous, we look at how the encoded latent mean µ_z affects the decoded model mean µ_x = f(z; θ*, ℓ). For this purpose, z is sampled deterministically from an evenly spread quadratic grid around 0 in the latent space, the region that, from Figure 4.2, is shown to encode information. For each sampled z, the mean f(z; θ*, ℓ) of the resulting likelihood is evaluated and visualised in Figure 4.3 for two different labels.


Figure 4.3: Visualisation of decoded latent space for two of the three different labels. (a) "Increase after first injection". (b) "No increase". It is clear that the decoded CAP-means in (a) are considerably more similar than in (b).

The latent space appears continuous: a small change in the sampled z results in a small change in the decoded CAP-mean µ_x in Figure 4.3. It is also visible that there is a considerable difference between the decoded CAP-means for the two labels. There is furthermore less variation in the CAP-means for different values of z in 4.3a compared to 4.3b. This suggests that an increased event-rate after the first injection comes with a relatively unique CAP-shape in this recording. All in all, the CVAE model appears to do a good job of modeling the data, without any larger remarks.


Figure 4.4: Example results using the similarity measure. Given a candidate-CAP, all similar waveforms during the recording are found and plotted in grey, (a,c). Looking at the time of occurrence, the corresponding event-rates are calculated and plotted in (b,d).

To evaluate the similarity measure, waveforms that are assumed to encode similar information as a candidate-CAP are plotted in Figure 4.4, together with their respective event-rates. The similarity measure appears to perform well, since the variation of the "similar" CAPs is close to the overall variance of the candidate-CAP. Furthermore, the choice of parameters seems to allow for enough similar CAPs to result in informative event-rates. These results vary for different CAP-candidates, as they most probably should, but we can see in Figure 4.4 that these CAP-shapes are observed a substantial number of times. Note the similarity between the cluster means in Figure 4.4 and the model means from the CVAE in Figure 4.1. This is promising, since the similarity measure assumes the CAP-candidate to be the mean of the modeled normal distribution. It would be ideal if the different means were identical, suggesting that the similarity measure incorporates the normal assumption of the data. The different means can be more easily compared by including them in the same figure, see Figure 4.5. The modeled mean, µ_VAE, follows the actual observation more closely than the similarity-measure cluster mean. The reason for this may be that the similarity-measure parameters allow for more variance than what is modeled in the CVAE; this was the choice for the labeling of the data as described in Section 3.3.

Figure 4.5: Example of how the modeled CVAE mean compares with the similarity-measure CAP-mean for a randomly chosen observation.

Finally, we look at the method of using gradient descent to find local maxima of the conditional distribution. To visualise the results, the latent means of the CAP-waveforms are plotted before and after the gradient descent for one of the recordings in Figure 4.6.

Figure 4.6: Visualisation of how the gradient descent of a conditional distribution results in clusters of encoded high-probability CAPs.


The result shows a handful of clusters for each label, as intended. Considering the second label, a few individual points still appear to be scattered. This indicates either small local maxima or an insufficient number of iterations in the gradient descent. It is also clear that the clusters around (−7, −2) are caused by some artifact in the gradient descent iterations, since no information was encoded in this region judging by the left part of Figure 4.6. These types of artifacts resulted in decoded CAPs with zero event-rate, see e.g. the yellow event-rate plotted in Figure 4.7c.

All in all, the probabilistic model of the CAPs performs well, exhibiting similar characteristics as the observed data. The same goes for the similarity measure of the CAPs and the gradient descent, where no real remarks were found in the final evaluation. The methods are assumed to model the data successfully and will further be used to find cytokine-encoding CAPs.

4.2 Candidate CAP-Waveforms

After labeling the data using the defined similarity measure and training the CVAE, we apply the gradient descent method (2.13) to find high-probability regions in the labeled data. Figure 4.7 shows two examples of the resulting candidate-CAPs after applying K-means, together with their corresponding event-rates. The results in Figure 4.7a,b use conditioning on the label "increase after second injection", and Figure 4.7c,d on "increase after first injection".

There are clearly some CAP-shapes associated with an increased event-rate after the injection time in both cases. The results vary between recordings, but some increase in the event-rates after injection is seen for the majority of the recordings, although the magnitude of this increase varies. To achieve more quantitative results, we apply the methods in Section 3.7 to look for responders.


Figure 4.7: Example results from K-means clustering of the labeled data after gradient descent. In (a,b), the distribution is conditioned on "increase after second injection"; in (c,d), on "increase after first injection".

4.3 Responders

Running the evaluation procedure described in Section 3.7 on the 20 recordings and 7 saline controls gave the results in Table 4.1. Considering the results in Figure 4.7, the evaluation procedure found one CAP-candidate in 4.7a,b with a significant event-rate increase, while no CAP fulfilled the criteria in Figure 4.7c,d with the chosen parameters. Figure 4.8 shows four examples of the found CAP-waveforms and corresponding event-rates for the responders to TNF.


Injection    Number of Responders
TNF          6
IL-1β        0
Saline       1

Table 4.1: The number of found responders for the different cytokines out of the 20 recordings (7 for saline), using the evaluation method from Section 3.7.

Figure 4.8: Four examples of the found responders to TNF. The found CAP-candidates are plotted together with their corresponding event-rates.


4.4 Fixed Candidate-CAP Shape Over Multiple Recordings

It is of interest to see whether the shape of a cytokine-encoding CAP transcends the different mice recordings. To see if there exists any shape consistency for a responder CAP-waveform over different recordings, the candidate-CAP in the left part of Figure 4.8a is chosen as a TNF-encoding CAP-candidate for the 20 different recordings. The similarity measure is applied to calculate the event-rate in the same way as before. This yields varying results, where 6 of the recordings show a visible increase in the event-rate after the TNF injection. Two examples of these results are plotted in Figure 4.9.

Figure 4.9: Using the same CAP-waveform as one of the responders to TNF showed interesting results for 6 of the 20 recordings, two of which are shown here.

There is, however, no clear consistency of the CAP-shape in general. This is not surprising considering the results in Figure 4.8, where the CAP-waveforms found to encode TNF information are different for the four recordings. The fact that the cytokine-encoding CAPs have different shapes for different mice is not necessarily unexpected. The shape of the recorded CAPs should mirror the spatial location of different axon groups. Therefore, subtle differences in the electrode placement for the different recordings could result in different shapes of recorded CAPs even if they actually encode similar information.


Chapter 5

Discussion

This chapter discusses what can be inferred from the obtained results, possible improvements of the methodology and future work.

5.1 Results

The proposed method yields many noteworthy results that potentially show cytokine-encoding CAP-waveforms. It is, however, difficult to draw strong conclusions without further discussion. Comparing the results in Figure 4.8a with 4.8b, we see a large difference in the scale of the event-rates. This raises the question of whether the increases of the event-rates in Figure 4.8a are large enough to be considered significant, increasing from zero to approximately one CAP/sec after TNF injection. To answer this, we consider the total number of CAPs/sec for the recording corresponding to the results in Figure 4.8a, given in Figure 5.1. The total event-rate is approximately 20-25 CAPs/sec after the injection, 30 minutes into the recording. The event-rates for the responder-CAPs in Figure 4.8a thus constitute about 4-5% of the total event-rate post injection.

To further put the scale of the event-rates in Figure 4.8a into context, the distributions of the mean event-rate and standard deviation in the considered recordings are plotted in Figure 5.2.

Figure 5.1: Total event-rates for the recordings in Figure 4.8a.

Figure 5.2: Distributions of the mean event-rate and standard deviation for the full time of recording, corresponding to Figure 4.8a.


Observe that the event-rate statistics in Figure 5.2 are obtained using the remaining CAPs after applying the amplitude thresholds, while the final evaluation considers all observed CAPs. The statistics in Figure 5.2 thus cover about two thirds of all observations. Therefore, it is possible that the means and standard deviations shown in Figure 5.2 should be increased by a factor of two to be fairly compared with the event-rates in Figure 4.8a. Regardless, this gives us the correct order of magnitude of the event-rates, which clearly do not exceed 1 CAP/sec. Although this does not say anything specifically about the found CAPs' relevance in terms of encoding cytokines, it gives reason to believe that an increase in event-rate of about 1 CAP/sec is large enough to be considered significant.

We now have reason to believe that the resulting responders show a significant increase in occurrence after TNF injection. The overall results are, however, inconsistent. About two thirds of the recordings did not show any CAPs with a significant increase of event-rate after cytokine exposure, see Table 4.1. Furthermore, no responders were found to encode IL-1β. This makes it hard to draw strong conclusions from the findings, although they indicate that it is possible to decode cytokine-specific information.

Note that the resulting number of responders is heavily dependent on the choice of evaluation parameters. One can, for instance, see an increase of the event-rates after the 30 min injection in Figure 4.7c, although this recording was not defined as a responder. If the parameters described in Section 3.7 were instead chosen as σ_min = 0.2 and k = 2, we end up with the result given in Table 5.1. The parameters could probably be tuned to obtain an even better looking result, but defining the evaluation parameters simply based on the obtained result would arguably be an ad-hoc approach.

Injection    Number of Responders
TNF          6
IL-1β        4
Saline       1

Table 5.1: The number of found responders using different parameters in the evaluation, see Section 3.7.

It should be made clear that it is unknown whether the cytokine-specific information lies above the noise level using the experimental setup of [18]. The difficulties in obtaining coherent results may therefore be caused by an inadequate experimental setup. Many of the raw recordings exhibit very spurious behaviour and furthermore differ by orders of magnitude in voltage amplitudes. It is, however, not possible at this point to determine whether the inconsistencies are due to the proposed method or the quality of the recordings. It would be interesting to apply the methodology to new, independent Vagus nerve recordings, to see whether more coherent results can be obtained.

5.2 Modeling of CAPs

The suggested mathematical model of the CAPs is shown to exhibit similar characteristics as the observations, see Section 4.1. There are, however, some ambiguities worth discussing. One of these regards the assumption of independent noise in the CVAE model and the similarity measure. The data under consideration are time series, which from inspection appear to exhibit some positive autocorrelation, i.e. positive correlation in the noise between consecutive samples in each CAP-waveform. Figure 5.3 shows the autocorrelation function (ACF) and partial autocorrelation function (PACF) for the modeled noise of a representative CAP-waveform. This suggests that there is room for improvement in the modeled noise, allowing for positive correlation between consecutive samples of the CAPs.

Figure 5.3: Representative example of the ACF and PACF for the noise of the modeled CAPs.
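Such diagnostics are easy to reproduce with statsmodels; the residual below is a synthetic AR(1) series standing in for the modeled noise x − f(z; θ*, ℓ), since the actual recordings are not included here.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
residual = np.zeros(141)                 # one CAP-waveform's worth of samples
for i in range(1, 141):                  # AR(1) noise with positive autocorrelation
    residual[i] = 0.5 * residual[i - 1] + rng.standard_normal()

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
plot_acf(residual, lags=20, ax=axes[0])
plot_pacf(residual, lags=20, ax=axes[1])
plt.show()
```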

Another concern with the methodology regards the standardisation of the CAP-waveforms. The reason for applying standardisation is mainly to stabilise the stochastic gradient descent during the training of the CVAE. Experimental runs without standardising the waveforms also showed bad results using the similarity measure and were prone to errors in all gradient descent methods. Although there might be information encoded in the amplitude of the CAP-waveforms, a standardisation is arguably something that a robust method should be able to accommodate. If the main information is obtained from the amplitude of the CAPs (as by appearance is the case in [18]), it is possible that the candidate CAPs only mirror the overall activation level in the Vagus nerve. If one instead is able to find a unique standardised CAP-shape that encodes cytokine information, it is arguably a more robust measure, since it would be less sensitive to the overall activation level in the nerve.

Finally, it is worth discussing the proposed evaluation procedure. The methodology of using the event-rate in both the analysis and the evaluation is somewhat ad-hoc. The complexity described in Section 2.1, regarding the potentially low signal-to-noise ratio, does however suggest that a more narrow search is necessary. The event-rate is the only measure connecting the observed CAPs with the encoding of cytokines. It should therefore be pointed out that a more independent evaluation is necessary before one can confidently conclude that the results show cytokine-encoding CAPs.

5.3 Future Work

A main part of the inference in the proposed methodology is done during the gradient descent of the probability distribution. This method is relatively new, developed in recent years by Henrik Hult and Adam Lindhe at KTH. Its use is therefore somewhat experimental, although it has been applied successfully to MNIST [5]. It could be relevant to use the methodology of this work in conjunction with another clustering method. An approach that was considered, and which would be interesting to see in future work, is to take inspiration from [17] and use a Gaussian Mixture Model (GMM) as prior for the latent variables, leading to the loss-term

D_{KL}\big(p_{\mathrm{gmm}}(z) \,\|\, q_{\mathrm{vae}}(z|x)\big).

This allows for a more naturally multi-modal latent space, which could possibly increase the performance of the gradient descent. Another approach with a similar idea is to use the variational mixture of posteriors prior (VampPrior) developed by [15]. This method has been shown to increase the ELBO on standard data sets and is therefore likely to improve the approximate distributions of the CAP-data sets as well.
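A hedged sketch of the GMM-prior idea is given below, assuming PyTorch; the parameter names (means, log_vars, log_weights) are illustrative and not taken from [17]. It evaluates log p_gmm(z) for a diagonal-covariance mixture, the quantity needed to estimate the divergence term by Monte Carlo.

```python
import math
import torch

def log_gmm_prior(z, means, log_vars, log_weights):
    # z: (B, d); means, log_vars: (K, d); log_weights: (K,), normalised.
    z = z.unsqueeze(1)                                   # (B, 1, d)
    comp = -0.5 * (math.log(2 * math.pi) + log_vars
                   + (z - means) ** 2 / log_vars.exp())  # (B, K, d)
    comp = comp.sum(dim=-1)                              # (B, K) per-component
    return torch.logsumexp(log_weights + comp, dim=-1)   # (B,) mixture log-density

# The divergence has no closed form for a mixture; either direction can be
# estimated by sampling from its first argument, e.g. with z drawn from
# the GMM: average log_gmm_prior(z, ...) - log q_vae(z|x) over samples.
```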

Another direction for future work, following the discussion in Section 5.2, is to include the ACF in the model, allowing for positive correlation in the noise between consecutive observations. A method which allows for prior information about the correlations using VAEs has been proposed by [14].
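As a minimal sketch of one way to allow such positive correlation, an AR(1) Gaussian noise model for the CAP residuals is shown below. This is an assumed simplification for illustration, not the correlated-VAE method of [14].

```python
import math
import torch

def ar1_gaussian_loglik(residuals, rho, sigma):
    # residuals: (B, T) modeled noise; eps_t = rho * eps_{t-1} + N(0, sigma^2).
    eps0 = residuals[:, 0]
    innov = residuals[:, 1:] - rho * residuals[:, :-1]   # innovations
    var0 = sigma ** 2 / (1.0 - rho ** 2)                 # stationary variance
    ll = -0.5 * (math.log(2 * math.pi * var0) + eps0 ** 2 / var0)
    ll = ll - 0.5 * (math.log(2 * math.pi * sigma ** 2)
                     + innov ** 2 / sigma ** 2).sum(dim=1)
    return ll  # (B,) exact log-likelihood under the AR(1) model
```

Setting rho = 0 recovers the independent-noise likelihood, so the current model is a special case of this parameterisation.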

Considering the similarity measure, a basic correlation measure was also considered in addition to the "Gaussian" similarity measure, and it showed similar performance. It was, however, disregarded since the Gaussian measure is more compatible with the probabilistic model. Trying further similarity measures would be interesting, to see whether the results can be improved and whether the methodology is robust to the choice of measure.
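For concreteness, the hedged sketch below contrasts a generic Gaussian-kernel similarity with a Pearson correlation between two standardised waveforms; the bandwidth sigma is a free parameter here, and the exact "Gaussian" measure defined in the thesis may differ in form.

```python
import numpy as np

def gaussian_similarity(x, y, sigma=1.0):
    # Gaussian-kernel similarity: close to 1 for near-identical shapes.
    return float(np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2)))

def correlation_similarity(x, y):
    # Pearson correlation, the alternative measure that was considered.
    return float(np.corrcoef(x, y)[0, 1])
```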

Finally, applying the proposed methodology to new recordings would be useful to determine whether the inconsistencies in the results of this work are caused by faulty recordings. The Center of Bioelectronic Medicine at Karolinska Institutet, with which this work was carried out in collaboration, is planning a new experimental setup during March 2021. These recordings will hopefully give an answer to this question.


Chapter 6

Conclusion

In this thesis, we construct a probabilistic model of the CAP-waveforms observed in Vagus nerve recordings from mice, with the aim of inferring which waveform-shapes encode information about specific cytokines. A similarity measure between CAPs is defined to estimate the event-rate of each CAP, which is further used to label the data according to a prior belief about how likely each CAP is to encode cytokine information. A conditional variational autoencoder is implemented to construct the probabilistic model and is further used to find maxima of the conditional probability densities,

P(x | "Increase in event-rate after injection.").

The found maxima are interpreted as possible candidates for encoding cytokine-specific information. These are further evaluated by looking for a significant increase in the corresponding event-rates after cytokine exposure.

The method yields interesting results. In multiple recordings, CAP-waveforms are found whose event-rate increases substantially at the time of exposure to cytokines. Although promising, the results are not consistent across all case and control recordings, and responders were furthermore only found for TNF, not IL-1β. The results show that the proposed method has potential for decoding sensory nerve signals, but it is not yet possible to conclude with confidence that the method succeeds in finding cytokine-encoding CAP-shapes; more work is needed.


Bibliography

[1] U. Andersson and K. J. Tracey, "Reflex principles of immunological homeostasis," Annual Review of Immunology, vol. 30, pp. 313–335, 2012.

[2] H.-R. Berthoud and W. L. Neuhuber, "Functional and chemical anatomy of the afferent vagal system," Autonomic Neuroscience, vol. 85, no. 1-3, pp. 1–17, 2000.

[3] B. Bonaz, V. Sinniger, D. Hoffmann, D. Clarencon, N. Mathieu, C. Dantzer, L. Vercueil, C. Picq, C. Trocmé, P. Faure et al., "Chronic vagus nerve stimulation in Crohn's disease: a 6-month follow-up pilot study," Neurogastroenterology & Motility, vol. 28, no. 6, pp. 948–953, 2016.

[4] L. Devroye, "Sample-based non-uniform random variate generation," in Proceedings of the 18th Conference on Winter Simulation, 1986, pp. 260–265.

[5] A. Eskandari, "VAE-clustering of neural signals and their association to cytokines," Master thesis, KTH, diva2:1432505, 2020.

[6] H. H. Hoffman and H. N. Schnitzlein, "The numbers of nerve fibers in the vagus nerve of man," The Anatomical Record, vol. 139, no. 3, pp. 429–435, 1961.

[7] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.

[8] R. G. Koh, A. I. Nachman, and J. Zariffa, "Use of spatiotemporal templates for pathway discrimination in peripheral nerve recordings: a simulation study," Journal of Neural Engineering, vol. 14, no. 1, p. 016013, 2016.

[9] F. A. Koopman, S. S. Chavan, S. Miljko, S. Grazio, S. Sokolovic, P. R. Schuurman, A. D. Mehta, Y. A. Levine, M. Faltys, R. Zitnik et al., "Vagus nerve stimulation inhibits cytokine production and attenuates disease severity in rheumatoid arthritis," Proceedings of the National Academy of Sciences, vol. 113, no. 29, pp. 8284–8289, 2016.

[10] P. S. Olofsson and K. Tracey, "Bioelectronic medicine: technology targeting molecular mechanisms for therapy," Journal of Internal Medicine, vol. 282, no. 1, p. 3, 2017.

[11] R. Pahwa, A. Goyal, P. Bansal, and I. Jialal, "Chronic inflammation," StatPearls [Internet], 2020.

[12] V. A. Pavlov and K. J. Tracey, "Neural regulation of immunity: molecular mechanisms and clinical translation," Nature Neuroscience, vol. 20, no. 2, p. 156, 2017.

[13] B. E. Steinberg, H. A. Silverman, S. Robbiati, M. K. Gunasekaran, T. Tsaava, E. Battinelli, A. Stiegler, C. E. Bouton, S. S. Chavan, K. J. Tracey et al., "Cytokine-specific neurograms in the sensory vagus nerve," Bioelectronic Medicine, vol. 3, no. 1, pp. 7–17, 2016.

[14] D. Tang, D. Liang, T. Jebara, and N. Ruozzi, "Correlated variational auto-encoders," in International Conference on Machine Learning. PMLR, 2019, pp. 6135–6144.

[15] J. Tomczak and M. Welling, "VAE with a VampPrior," in International Conference on Artificial Intelligence and Statistics. PMLR, 2018, pp. 1214–1223.

[16] K. J. Tracey, "The inflammatory reflex," Nature, vol. 420, no. 6917, pp. 853–859, 2002.

[17] L. Yang, N.-M. Cheung, J. Li, and J. Fang, "Deep clustering by Gaussian mixture variational autoencoders with graph embedding," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 6440–6449.

[18] T. P. Zanos, H. A. Silverman, T. Levy, T. Tsaava, E. Battinelli, P. W. Lorraine, J. M. Ashe, S. S. Chavan, K. J. Tracey, and C. E. Bouton, "Identification of cytokine-specific sensory neural signals by decoding murine vagus nerve activity," Proceedings of the National Academy of Sciences, 2018.

Appendix A

Some Derivations and Definitions

A.1 The Kullback-Leibler Divergence, D_KL

The Kullback-Leibler divergence is a similarity measure between two distributions, which for the continuous case is defined as,

D_{KL}(q(x) \,\|\, p(x)) = \int q(x) \log \frac{q(x)}{p(x)} \, dx.

It is a non-negative quantity which is zero if and only if p(x) = q(x) almost everywhere. There exist many possible interpretations of this quantity, but for the purpose of this work it is enough to regard it as a similarity measure. Note that it does not qualify as a metric, since it is not symmetric.
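As a concrete example relevant to VAEs, the divergence between two univariate Gaussians is available in closed form (a standard result, stated here for reference):

D_{KL}\big(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)\big) = \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}.

With μ2 = 0 and σ2 = 1, this is the per-dimension form of the term D_{KL}(q_\phi(z|x_i) \,\|\, p_Z(z)) in the ELBO for a diagonal Gaussian encoder and standard normal prior.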

A.2 Derivation of ELBO

To derive the expression of the marginal log-likelihood in terms of the evidence lower bound, we start by considering the KL-divergence between the approximate and true posterior of the latent variable. We have that

D_{KL}(q_\phi(z|x_i) \,\|\, p_{Z|X}(z|x_i))
  = E_{q_\phi}\big[\log q_\phi(z|x_i) - \log p_{Z|X}(z|x_i)\big]
  = E_{q_\phi}\left[\log q_\phi(z|x_i) - \log \frac{p_{X|Z}(x_i|z)\, p_Z(z)}{p_X(x_i)}\right]
  = E_{q_\phi}\big[\log q_\phi(z|x_i) + \log p_X(x_i) - \log p_{X|Z}(x_i|z) - \log p_Z(z)\big].

The marginal log-likelihood, log p_X(x_i), is independent of q_φ and can therefore be moved out of the expectation. By rearranging the terms and identifying part of the remaining expectation as a KL-divergence, we arrive at the desired expression for the log-likelihood.

\log p_X(x_i)
  = D_{KL}(q_\phi(z|x_i) \,\|\, p_{Z|X}(z|x_i)) - E_{q_\phi}\big[\log q_\phi(z|x_i) - \log p_{X|Z}(x_i|z) - \log p_Z(z)\big]
  = D_{KL}(q_\phi(z|x_i) \,\|\, p_{Z|X}(z|x_i)) + E_{q_\phi}\big[\log p_{X|Z}(x_i|z)\big] - D_{KL}(q_\phi(z|x_i) \,\|\, p_Z(z))
  = D_{KL}(q_\phi(z|x_i) \,\|\, p_{Z|X}(z|x_i)) + \mathcal{L}(\phi; x_i).
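As a minimal sketch of how L(φ; x_i) is evaluated in practice, the snippet below assumes a diagonal Gaussian encoder, a standard normal prior, and (for illustration) a unit-variance Gaussian decoder; additive constants are dropped.

```python
import torch

def elbo(x, x_recon, z_mu, z_logvar):
    # One-sample Monte Carlo estimate of E_q[log p_X|Z(x|z)], up to a constant.
    recon = -0.5 * ((x - x_recon) ** 2).sum(dim=1)
    # Closed-form D_KL(q_phi(z|x) || N(0, I)) for diagonal Gaussians.
    kl = -0.5 * (1 + z_logvar - z_mu ** 2 - z_logvar.exp()).sum(dim=1)
    return recon - kl  # per-example lower bound on log p_X(x)
```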

A.3 Gaussian Annulus Theorem

Consider the d-dimensional spherical Gaussian with mean zero and variance σ² = 1 in each dimension,

f(x) = \frac{1}{(2\pi)^{d/2}} \exp\left(-\frac{\|x\|^2}{2}\right).    (A.2)

For any β < \sqrt{d}, (A.2) has all but at most 3e^{-c\beta^2} of its probability mass in the annulus

\sqrt{d} - \beta \le \|x\| \le \sqrt{d} + \beta,

where c is a fixed constant.
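A quick numerical illustration of the theorem: in high dimension, the norm of a standard Gaussian vector concentrates tightly around √d.

```python
import numpy as np

d = 1000
rng = np.random.default_rng(0)
norms = np.linalg.norm(rng.normal(size=(10_000, d)), axis=1)
print(np.sqrt(d))                 # ~31.62
print(norms.mean(), norms.std())  # mean close to sqrt(d), std close to 1/sqrt(2)
```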
