The Fast Multipole Method for Ion Mobility Spectrometry: A Proof of Concept

(1)

The Fast Multipole Method for Ion Mobility Spectrometry: A Proof of Concept

Ky Gurasz September 28, 2017

Bachelor degree project 15 hp Supervisor: Erik Marklund

(2)

Abstract

In order to decipher the structure of proteins and other compounds, new methods are needed. Ion mobility spectrometry (IMS) has emerged as a tool that has potential to quickly and efficiently analyse structural motifs (Lanucara et al. 2014). IMS is a tool used to separate different ions based on their travel time through fixed distance, using a specific, inert carrier gas. The ion’s travel time is then influenced by size, shape and charge. Structural analysis uses this data to work out the theoretical shape of a molecule from other known data. How- ever, the theoretical model needed to turn this travel time into useful structural data gets steadily more computationally intensive as the number of atoms in the molecule increases. More than a day is needed for a medium sized structural model’s drift time to be analysed using todays most rigorous class of methods (MOBCAL). The aim of this project has been to overcome this need for long computing times using a method known as the fast multipole method (FMM), developed by Greengard and Rohklin for electrostatic interactions (Greengard &

Rohklin 1987). The key idea behind FMM is to take a group of atoms which are far away and count them as one single atom; since they are so far away, this leads to only a minor error. We now only need to do one calculation on that single atom, instead of doing one for each of the original ones. Implementing this method, with adaptations, we achieve a substantial increase in performance compared to direct calculations of each individual atom against all others for large datasets (above 10⁵ data points). FMM calculations have significant overhead and as such are much slower for small datasets, compared to direct calculations, since direct calculation time is directly proportional to the size of the dataset. Therefore using the FMM for IMS based calculations is very feasible for large datasets. Further work to be done includes, but is not limited to, updating the algorithm to calculate forces and in addition what the optimal minimum distance is for grouping points together.

(3)

1 Introduction

1.1 Background

1.1.1 Ion mobility spectrometry

Figure 1. Cutaway of a drift tube in an ion mobility spectrometer (Wiki- media Commons 2017).

Ion mobility spectrometry (IMS) is an analytical technique that has tradi- tionally been used for a wide range of practical applications, whether it be identifying known molecules at an airport, i.e. drug screening or identifica- tion of other illegal substances or coupled to other detection methods in the lab, such as gas chromatography (Lanucara et al. 2014). As structural biol- ogists are expanding their tool set for visualizing molecular structure, IMS is now starting to gain traction as a way to do this quickly and over a large data set, especially when linked to a mass spectrometer (MS). It has for example recently been used to study the different forms and strains of prion protein conformations (Van der Rest et al, 2017) and the role of lipids in Na+/H+ antiporters (Landreh et al, 2016). What makes MS linked IMS so useful in both these cases is the ability to rapidly distinguish different conformations of the same molecule under different conditions, with secondary molecules being able to interact freely with the target during analysis.

IMS works by pulling ions through a counter flow of buffer gas, by using an electric field (figure 1). The time taken in traveling a certain distance is then characteristic of that ion, based on its size, shape and charge. Able to detect multiple conformations through their drift time, which relates to size and charge, IMS depends on theoretical models of these drift times to identify species and structures. As such, the computational time needed in direct calculation, that is to say calculations for each atom and its relation to all the incoming gas particles, for all those gas particle positions, for these models can be short for small molecules, but can take over a day using MOBCAL, the standard implementation of the trajectory method for calculating drift times. A day is seen as a long time as it is the bottleneck when it comes to identifying molecules: IMS itself extremely fast, in the order seconds to minutes. This limits the use of MOBCAL calculations considerably for large

(5)

proteins or for large sets of structures, which is a hinderance that stands in the way of using IMS as a tool for systematic structural analysis of proteins and complexes.

1.1.2 Fast multipole method

First introduced by Greengard and Rohklin (Greengard & Rohklin 1987) in order to speed up computational electromagnetics, the fast multipole method (FMM) is an efficient way to solve problems involving many body interactions. The core ideas of the FMM are to group many bodies together in cells, calculate their multipole moment from the desired multipole expansion and then group the cells into a hierarchical tree. The tree may then quickly be traversed and the interactions between each cell with each other cell can then be calculated based on their moments. The concept of near and far cells plays an important role in distinguishing which cells in the tree to calculate interactions of (far) and which to break down into smaller cells (near). In Greengard & Rohklin (1987) spherical harmonics are used, though today the simpler and more efficient way of constructing a FMM is done using Cartesian coordinates (Visscher & Apalkov 2010).

1.2 Aim

The aim of this paper is to test if it is possible to overcome long computing times using the method known as the fast multipole method (FMM) for calculating the change in momentum of an incoming probe, representing the buffer gas particles. The probe’s change in momentum as it approaches the molecule would then directly correspond to structure and flight time. As an important first step in realizing this goal, we have implemented the FMM in order to calculate potential energy, both electrical and Van Der Waals.

This paper will demonstrate a proof-of-concept which showcases how well the FMM works for potential energy calculations, and by extension, the force evaluations needed for tracing the trajectories of the probes

(6)

2 Method

2.1 Algorithm

The fast multipole method is based on the idea that the interaction with multiple particles can be described as a collective interaction when the interaction distance is large compared to the distance between the particles.

Zhang & Haas (2009) describe a recursive version of the FMM that is easy to use and adapted for Cartesian coordinates, though where Zhang & Haas (2009) calculates the interaction of each cell with each other cell, we are only interested in the interaction between cell and probe. The multipole expansion is up to the third order, as for orders above this it is noted in Visscher

& Apalkov (2010) that for orders above this the error (erro estimate is ca.

10⁻⁶) is larger than any potential gain in accuracy (Meager1982). The basic algorithm is based on that in Zhang & Haas (2009) with modifications and is structured in this manner:

1. Construct a tree of cells containing data points, an octree in this case (figure 2), beginning from the largest cell then splitting that cell into eight smaller cells within the space occupied by the parent cell, then repeating recursively. For each cell, calculate the multipole moments.

Figure 2. 3D and 2D visualisations of an octree (Wikimedia Commons 2017).

2. Read the distance between the topmost cell and the probe (figure 3). If the distance is larger than an arbitrary minimum distance, then only use this cell’s multipole moment and calculate the potentials at the probe position.

This minimum distance is then the distance at which the probe will chunk cells together; a horizon where beyond there is low resolution. The minimum distance can be altered in order to find the best ratio between speed and

(7)

accuracy. If the cell is within the minimum distance, go down a level in the tree and repeat.

Figure 3. 2D visualisation of the multipole moment as one large particle being constructed from many small.

This allows us, with some overhead in the construction of the tree, to evaluate interactions in near cells excplicitly while quickly estimating the interactions of far cells. This leads to quick computation time of O(logN ).

In order to compare results with direct calculations to verify accuracy and speed, the program also prints a picture of the potential at all points around the molecule. This allows us to visualize any artifacts or discrepancies produced by the algorithm and also to compare the time it takes for our algorithm to calculate the potentials versus the time taken for direct calculation.

The size of the picture is set to 500x500 pixels, where each pixel is a separate potential calculation. For further documentation, please see the code description in the appendix, section 7.2.

(8)

2.2 Formalism

In order to calculate the multipole moments in the construction of the octree some formulas are here outlined. The following shorthand notation will be used, just as in Zhang & Haas (2009):

n ≡ (n_x, n_y, n_z) (1)

n ≡ nx+ ny+ nz (2)

rⁿ ≡ xⁿ^xyⁿ^yzⁿ^z (3)

n! ≡ nx!ny!nz! (4)

Where n is the order of each plane and n the overall order. r is the distance from the probe to the center of the multipole. To calculate the multipole moments of a charge distribution q_i we define, as in Zhang & Haas (2009):

Q = 1 n!

X

i

qirⁿ_i (5)

In order to calculate the potential from Qnit is shown in Zhang & Haas (2009) that:

V (r) =

−→X

n

Dn(r)Qn (6)

Where

D_n≡ δⁿ δ(−r)ⁿ( 1

p r p~

) (7)

Dnthen reflects the electrostatic potential’s distance relation, ¹_r. For Lennard- Jones potential, this is adapted to fit _r¹12 −_r¹₆.

For example, if we wanted to calculate the electric potential at the probe due to a cell with Q_(1,0,0)= m then we would use:

m D_1,0,0 = − mx

(x²+ y²+ z²)³²

(8) These formulas are also applicable to the Lennard-Jones potential, seeing as they are simply derived from multipole expansion. For further documentation please see section 7.1 in the appendix.

(9)

3 Results and Discussion

3.1 Accuracy

Images produced via both direct calculation and FMM for the potential are identical both when the signal is amplified 10x (figure 4) and 100x (figure 5) demonstrating that the potential from the FMM is extremely close to the directly calculated potential. This was tested using the online tool Resemble (https://huddle.github.io/Resemble.js/) without antialiasing. Worth noting is that each pixel is generated from a finetely sized fraction, but each pixel can only have intensity in integer format and so the value is truncated. This is overcome to some degree by signal amplification, as the potentials are low to begin with. Therefore however, though the pictures are identical, it does not mean that the values are, see section 3.4. The same results were generated when working with Lennard-Jones potentials (figure 6).

Figure 4.

Images generated by direct and FMM calculations, left and right respectively. Green indicates positive potential, red negative. Intensity is linear and proportional to ^Q_r. 10x amplification of signal.

(10)

Figure 5.

Same as Figure 4, with a different data set. The image is zoomed in and the intensity is amplified 100x.

Figure 6.

Lennard-Jones potential of a random dataset. Generated by direct calculation and FMM, left and right respectively. Red is negative potential, green is positive. Amplification times 10⁴. Note: Observe the visual artifacts generated when the potential is very high, close to the atom. First ’shimmering’

occurs, switching between colours green and red, and lastly pixels are simply black. The black areas are the actual atoms and are adjusted by a size variable.

(11)

3.2 Speed

Our primary aim was to increase the speed of calculations for large molecules (above 10⁵ data points). As we can see from Figure 7, the time it takes to calculate via FMM is greater than via direct calculation up to about 10^4.5 data points. After this, the computing time varies little for FMM but increases linearly for direct calculations. For example, a molecule with 10^5.5 atoms would take ten times longer for direct calculations while the FMM would calculate it in roughly the same time as it did one with 10^4.5 atoms.

This is of note since trajectories are usually calculated in amounts of 10⁴ or above, with each trajectory sometimes requiring over 1000 force evaluations.

Figure 7.

The time taken in 10⁻⁴s on the y axis versus the number of atoms, n, where the y axis represents the amount of time taken for the calculations, and the x axis shwoing us the amount of input data points. Both axes are logarith- mic, base 10 in order to better see the shape of both calculation methods and compare them. Each point represents the time taken to calculate the potential at 250 000 points around the data set (an array of 500x500).

3.3 Problems

As the project took form, two interesting problems arose. The first was, what is the optimal distance for a cell to be considered near? The algorithm now uses an arbitrary amount of 100 distance units, since this worked well

(12)

to check whether to calculate the cell or to split and not calculate, it seems intuitive to decrease the minimum distance, as most cells within the minimum distance will go to their maximum depth before being calculated if the minimum length is fixed. This would then create a gradient of larger cells further away, and smaller cells closer to the probe. Exploring this approach from the start, the implementation would shorten the minimum depth after each time it was called by making it slightly shorter than the distance between the probe and the cell center it investigated. However this lead to an interesting error where the source grouping would radically shift at certain points around the dataset. This is illustrated in figure 8, where there are clear lines of intersection between different viewpoints. Note that this is an error specific for this implementation and not in general. For this reason, this was not included in the final version.

Figure 8.

The shatter effect occurring when the minimum distance is decreased at each step. Note that the colours are not potentials, but rather the magnitude of forces from an attempted force implementation, and are not accurate.

3.4 Areas for further development

In order for this implementation to be of any potential use in IMS, a number of areas would need to be added or improved. These are adding the force calculations and optimizing the code, while also adding the ability to calculate the change in momentum for an incoming probe along its trajectory.

This feature could then be used to calculate drift time and investigate structural features. Adding the force calculations would be a matter of derivation of potential, as F = −∇V . Optimization by hand would improve calculation times, as the code is unoptimized at present. Adding trajectories with momentum change and linking it to drift time could be done by creating

(13)

a function that integrates the equations of motions for the probe particle.

This could then be used to calculate drift times.

In addition to these areas, error analysis would be an interesting area to investigate. Implementing this would require doing direct calculations and the FMM alongside each other in the same program, evaluating the potentials separately and storing their difference. Doing this would generate a map of when the error is greatest and would inform about the error and its relation to model parameters. These areas are however beyond the scope of this project.

4 Conclusion

Our implementation of the FMM adapted for calculation of potentials on a single probe point works as intended, shortening the time needed for computation considerable for large molecules. This indicates that IMS would have no real barrier in computing time for large n, as we saw that the increase in computing time was negligible after n= 10^4.5. Thus, it seems that implementing the FMM for use in IMS would open up its potential for use in analyzing large molecules or complexes. While this works as a proof-of- concept, demonstrating the strengths of FMM, there is still work that needs to be done in order to adapt it for use in IMS. The forces on the probe need to be, and can be, calculated in much the same way as the potentials are.

The ability to read protein database files (PDB) would be necessary, as the project simply used randomly generated molecules. For efficiency, it would be ideal to implement parallel processing threads. Some questions remain, such as the best minimum distance from probe to cell or what the deepest level in the octree should be, based on molecule size. Error analysis of the potentials and forces would be interesting to evaluate, especially considering the previous two questions. Another interesting observation made is that since the minimum distance is checked with each step of the tree calculation function, one would expect the minimum distance to decrease successively as we go deeper down into the tree. However, doing this leads to errors in choosing what cells to chunk together from different point of views. Thus this feature was scrapped, though it intuitively feels right.

(14)

5 Acknowledgments

I would like to thank Erik Marklund at the Uppsala University Department of Chemistry, for giving me the opportunity to do this project and for his patience and time. I would also like to thank Professor Helena Grennberg, also from the Department of Chemistry at Uppsala University, for all her helpful work and expertise.

6 References

Greengard L. Rohklin, V. 1987. A fast algorithm for particle simulations.

Journal of computational physics 73: 325-348.

Lanucara L., Holman S.W., Gray C.J. and Eyers C.E. 2014. The power of ion mobility-mass spectrometry for structural characterization and the study of conformational dynamics. Nature Chemistry 6: 281-294.

Meagher D., 1982. Geometric Modeling Using Octree Encoding. Computer Graphics and Image Processing 19: 129-147.

Visscher P.B., Apalkov D.M. 2010. Simple recursive implementation of fast multipole method. Journal of magnetism and magnetic materials 322: 275- 281.

Wikimedia Commons, accessed 6/5/2017. Figure 1:

https://upload.wikimedia.org/wikipedia/commons/thumb/6/66/

Ion mobility spectrometry diagram.svg/500px-

Ion mobility spectrometry diagram.svg.png and Figure 2:

https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/

Octree2.svg/1200px-Octree2.svg.png

Zhang W., Haas S. 2009. Adaptation and performance of the cartesian coordinates fast multipole method for nanomagnetic simulations. Journal of magnetism and magnetic materials 321: 3687-3692.

(15)

7 Appendix

7.1 Mathematics of the multipole expansion 7.1.1 General formulas for D and Q

Q_n= _n ¹

x!ny!nz!P qxⁿ^xyⁿ^yzⁿ^z D_n= _δ(−r)^δⁿn

√ 1

x²+y²+z²

7.1.2 Specific examples for different n n=000

Qn=P q D_n= √ ¹

x²+y²+z²

n=100 Q_n=P qx

Dn= ^x

(x²+y²+z²)³²

n=200

Qn= ¹₂P qx²

Dn= −(^x²^+y²^+z²)³²^−3x²√

x²+y²+z² (x²+y²+z²)³

n=110

Q_n= ¹₂P qxy Dn= ^3xy

(x²+y²+z²)⁵²

(16)

7.2 Code description

Code implemented in C++. Both the direct calculation algorithm and the FMM code described below are unoptimized.

7.2.1 Tree creation function

Takes a set of points with coordinates with charge and Lennard-Jones sigma value.

Increase depth value by one.

Sum all coordinates and get the mean value. Call this the center of the cell.

Calculate the cell moments.

Split cell into 8.

For each split cell that is not empty, call this function on it.

Else make its pointer NULL.

7.2.2 Interaction calculator function Takes the probe position and a cell pointer.

Set origo to be the probe position.

Calculate the distance from probe to cell.

If the distance is longer than the minimum range or if the cell has only NULL children, then calculate its potentials.

Else call this function on the cell’s children.

7.2.3 Main function

Takes parameters “p” for electric potential output and “lj” for Lennard- Jones potential.

Read data file into list.

Call tree creation on list.

Create a 500x500 array.

Call the interaction calculator at origo then multiply it by some amplification parameter, then store the value in the array, then increment one coordinate unit in one direction.

Repeat until each coordinate value has been incremented 500 times and the array is full.

Print array to screen.

Close.

The Fast Multipole Method for Ion Mobility Spectrometry: A Proof of Concept