
Real-time Head Motion Tracking for Brain Positron Emission Tomography using Microsoft Kinect V2


Stockholm, Sweden 2016

This thesis evaluates the latest version of the Microsoft Kinect sensor (Kinect v2) as an external tracking device for head motion during brain imaging with Positron Emission Tomography (PET). Head movements constitute a serious degradation factor in the acquired PET images. Although there are algorithms implementing motion correction using known motion data, the lack of effective and reliable motion tracking hardware has prevented their widespread adoption. The development of effective external tracking instrumentation is therefore a necessity. Kinect was tested both for the Siemens High-Resolution Research Tomograph (HRRT) and for the Siemens ECAT HR PET system. The face Application Programming Interface (API) 'HD Face', released by Microsoft in June 2015, was modified and used in the Matlab environment. Multiple experimental sessions examined the head tracking accuracy of kinect for both translational and rotational movements of the head. The results were analyzed statistically using one-sample t-tests with the significance level set to 5%. It was found that kinect v2 can track the head with a mean spatial accuracy of µ0 < 1 mm (SD = 0.8 mm) in the y-direction of the tomograph's camera, µ0 < 3 mm (SD = 1.5 mm) in the z-direction of the tomograph's camera, and µ0 < 1° (SD < 1°) for all the angles. However, further validation needs to take place. Modifications are needed in order for kinect to be used when acquiring PET data with the HRRT system: the small bore of the HRRT gantry (just over 30 cm in diameter) makes kinect's tracking unstable when the whole head is inside the gantry. On the other hand, kinect could be used to track the motion of the head inside the gantry of the HR system.

Kinect v2 was evaluated as an instrument for motion correction of image data from Positron Emission Tomography (PET). The patient's head movements account for a significant part of the image quality degradation in PET, and the problem grows as PET technology (resolution) and reconstruction algorithms improve. Various algorithms for motion correction, in which the patient's movements are known, exist and are in use at many sites today. The problem is to measure the patient's movements during an examination in a simple and reliable way. A general, external system for tracking patient motion during an examination is therefore of the highest importance. Kinect was tested both in the Siemens High Resolution Research Tomograph (HRRT), a brain camera, and in the Siemens ECAT Exact HR, a whole-body camera. Microsoft's application programming interface (API) 'HD Face' was released in June 2015 and has been built into a software package for Matlab. A number of experiments were carried out in which the function and accuracy of Microsoft Kinect were tested under known displacements of the head, both in translation and in rotation. The results were analyzed statistically using one-sample t-tests with a significance level of 5%. The results show that Kinect v2 can track head movements with a spatial accuracy of µ0 < 1 mm (SD = 0.8 mm) in the Y-direction of the PET camera and µ0 < 3 mm (SD = 1.5 mm) in the Z-direction, and µ0 < 1° (SD < 1°) for all angles. Further tests are needed, however, to confirm these values. Because of the narrow gantry of the HRRT camera (just over 30 cm in diameter), Kinect's ability to continuously track movements with the existing software is unreliable; the software needs to be modified so that the recognition function becomes more adaptive. For the HR camera, with its considerably larger gantry, Kinect can be used in its current version.


when dealing with technical pieces of work. I am deeply grateful to the different people who have supported me during the enriching experience of working on my master thesis. Firstly, I would like to thank my supervisors Andrea Varrone and Göran Rosenqvist for giving me the opportunity to work on this specific project. It was a very interesting and fruitful experience for me.

Special thanks to Göran for his help and support during the whole procedure and the time he spent with me. His practical help with the equipment was valuable.

I would also like to thank my reviewer Prof. Massimiliano Colarieti-Tosti for his help and his constructive and interesting feedback.

Many thanks to Urban Hansson for always volunteering to participate in the experiments. His positive attitude created a very enjoyable atmosphere and made the experiments actually fun. It was a pleasure staying in the same office room and having nice conversations.

I would also like to thank Kalin Stefanov for giving me the opportunity to borrow his kinect device. It was an important contribution which allowed me not to lose time and to complete my thesis work on time.

I am genuinely grateful to my classmate Awais Ashfaq for letting me know about the existence of the specific project, which gave me the opportunity to apply for it. In addition, working together in different courses has been a nice experience and I feel I learnt a lot through our discussions.

Many thanks to all the members of my supervision group, Awais, Zoreh, Masih, and our group supervisor Rodrigo Moreno. Their feedback was valuable and always to the point, and I took it into serious consideration throughout the whole process.

Thank you to Natalie. Sharing our experiences in Stockholm has been amazing and led to the creation of a real friendship.

A big thank you goes to my beloved friends Maria and Fay and my cousin Katerina. Thank you for being always there for me, supporting me and believing in me even in times when I didn’t. I am really grateful for their friendship and love.

Contents

1 Introduction
2 Materials
  2.1 Hardware
    2.1.1 Kinect
      2.1.1.1 Point Cloud Acquisition
  2.2 Software
    2.2.1 Software Development Kit (SDK) 2.0
      2.2.1.1 HD Face - High definition face tracking
    2.2.2 Matlab
3 Methods
  3.1 Kinect mount
  3.2 Spatial Calibration
    3.2.1 Mapping 3D coordinates from one coordinate system to the other
    3.2.2 Mapping angles from one coordinate system to the other
      3.2.2.1 Composing a 3 × 3 rotation matrix
      3.2.2.2 Decomposing a 3 × 3 rotation matrix
  3.3 Pre-heating time
  3.4 Temporal Alignment
  3.5 Experiment
    3.5.1 Translational movements
    3.5.2 Rotational movements
  3.6 Data Post-Processing
    3.6.1 Invalid frames elimination
    3.6.2 Results averaging
  3.7 Statistical processing of results
    3.7.1 Bland-Altman plots
    3.7.2 One-sided one-sample t-tests
4 Results
  4.1 Static measurements
  4.2 Head translations
  4.3 Head rotations
  4.4 Statistical results
5 Discussion
6 Conclusion
Appendix A Kinect v2 for Xbox One
  A.1 Specifications
  A.2 Principle of Function
Appendix B Matlab Codes

List of Figures

1  Point cloud of face, created by the HD Face API.
2  Head pose Euler angles in kinect's coordinate system.
3  Kinect mount in the PET room.
4  The origin (x = 0, y = 0, z = 0) is located at the center of the IR sensor on Kinect. X grows to the sensor's left, Y grows up (this direction is based on the sensor's tilt) and Z grows out in the direction the sensor is facing. The unit of kinect's measurements is 1 meter.
5  Coordinate system of the camera of HRRT.
6  Markers placed on HRRT's bed for spatial calibration.
7  Red point: reference point on HRRT's bed, with its coordinates expressed with reference to the coordinate system of HRRT in mm. Green points: calibration points.
8  Head tracking performed in the HRRT PET tomograph. Image frame as returned by kinect v2 using the FaceHD API.
9  Head tracking performed in the HR PET tomograph. Image frame as returned by kinect v2 using the FaceHD API.
10 Schematic representation of a) the setup for measuring pitch angles, b) the setup for measuring yaw angles, c) the setup for measuring roll angles.
11 Flowchart of obtaining raw motion data and post-processing them.
12 Absolute 3D position (x, y, z coordinates) of the head pivot in HRRT's coordinate system. The volunteer was asked to remain still until t = 15 s, when he was asked to move his jaw and eyes.
13 Absolute 3D position (x, y, z coordinates) of the head pivot in HRRT's coordinate system during a session with total duration of 368 s. The continuous blue line corresponds to the position as captured by kinect and then transformed to the coordinate system of the camera. The dashed red line corresponds to the actual position of the head pivot in HRRT's coordinate system.
14 Absolute 3D position (x, y, z coordinates) of the head pivot in HRRT's coordinate system during a session with total duration of 609 s. The continuous blue line corresponds to the position as captured by kinect and then transformed to the coordinate system of the camera. The dashed red line corresponds to the actual position of the head pivot in HRRT's coordinate system.
15 Pitch angle of the head during a session with total duration of 330 s. The continuous blue line corresponds to the angle as captured by kinect and then smoothed for each time interval of 2 s. The dashed red line corresponds to the angle as measured on the protractor.
16 Yaw angle of the head during a session with total duration of 184 s. The continuous blue line corresponds to the angle as captured by kinect and then smoothed for each time interval of 2 s. The dashed red line corresponds to the angle as measured on the protractor.
17 Roll angle of the head during a session with total duration of 343 s. The continuous blue line corresponds to the angle as captured by kinect and then smoothed for each time interval of 2 s. The dashed red line corresponds to the angle as measured on the protractor.
18 Example of a Bland-Altman plot of the translational measurements as obtained by kinect, compared to the bed-controlled measurements in one of the sessions. The bias line is placed at 3·10⁻⁴ m. The limits of agreement are placed at ±0.0028 m.
19 Example of a Bland-Altman plot of the rotational measurements as obtained by kinect, compared to the protractor measurements in one of the sessions.

List of Tables

1  Head Pose Angles
2  Results of left-sided one-sample t-tests
3  Kinect v2 specifications

Abbreviations

AIR   Automated Image Registration
API   Application Programming Interface
DOF   Degrees Of Freedom
FOV   Field of View
HD    High Definition
HR    High Resolution
HRRT  High Resolution Research Tomograph
IR    Infrared
MEX   Matlab Executable
NUI   Natural User Interface
PET   Positron Emission Tomography
RGB   Red Green Blue

1 Introduction

Positron Emission Tomography (PET) is a nuclear imaging modality which provides researchers and doctors with three-dimensional (3D) images of the distribution of a radio-labeled molecule in the body. In this way, brain PET enables the visualization of several neurotransmission systems in the living human being, allowing for research in clinical neuroscience and drug development.

Improved scanner technology and reconstruction algorithms have significantly improved the spatial resolution of PET [1],[2]. The Siemens High-Resolution Research Tomograph (HRRT) for brain scanning available at the Karolinska hospital makes use of Inki Hong's code for optimized reconstruction [3] and can theoretically reach a spatial resolution of 1.4 mm in the centre of the field of view (FOV) of the scanner [4]. The older Siemens ECAT HR whole-body PET scanner has a spatial resolution of 3.6 mm in the centre of the field of view.

As a result, small motions of the head can cause significant degradation in the quality of the acquired images, since such motions are above the spatial resolution of the PET brain scanners [5]. A typical PET brain imaging session can last over an hour, so it is impossible for the subject to remain motionless during this time. Thus, head motion is an important limiting factor that impacts costs and effective diagnosis and treatment.

There are mainly three kinds of head motion seen in PET: long drift motion reaching up to 20 mm (when the patients are relaxing, e.g. falling asleep), frequent short motions around a mean position, and occasional quick movements, e.g. because of coughing. The first kind is the most common [6].

In this context, motion correction arises as a necessity in PET brain tomography. In the "State of the Art" Appendix, the reader can find a comparative and evaluative presentation of the different motion correction trends that have been developed during the last twenty years and the corresponding instrumentation used. During the last 8 years, the focus has been on external tracking of the head and implementation of motion correction using the known motion data. Although there is much research going on at the software level, the lack of effective and reliable motion tracking hardware has prevented the widespread adoption of the developed algorithms. The Polaris system is one optical tracking system that has been used in most of the related research works [7],[8]. However, its main weakness is that it requires the use of attached markers. The relative movement between those markers and the skull is a serious cause of errors in the motion tracking process. As a result, physicists have expressed the need for a markerless motion tracking instrumentation.


2 Materials

2.1 Hardware

2.1.1 Kinect

Kinect for Xbox One (Kinect v2) was released by Microsoft in July 2014. It belongs to a generation of sensors known as range imaging cameras or RGB-D cameras, which became popular after the Kinect v1 sensor was released by Microsoft in November 2010. The innovative idea brought by this generation of sensors is that the combination of complementary visual and depth information opens up new opportunities for solving fundamental computer vision problems. The raw data returned by kinect are depth and colour images captured by its infrared (IR) and RGB cameras respectively. In Appendix A, the reader can find a presentation of kinect v2's specifications and principle of function.

2.1.1.1 Point Cloud Acquisition

Three-dimensional (3D) coordinates of the scene can be calculated from the depth information captured by kinect, leading to the creation of a point cloud of the scene.

Since the Kinect sensor uses a standard lens, the camera's intrinsic parameters - focal length f and principal point coordinates (cx, cy) - can be determined and assumed known. The Z-coordinate for each pixel is also known, since it corresponds to the value stored in the depth map. Using these parameters and the perspective projection relation, a 3D point P = [X, Y, Z]^T in the camera coordinate system can be determined from the homogeneous image coordinates of a pixel p = [u, v, 1]^T through relation (1) [9]:

\lambda p = K P = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}    (1)

where \lambda is a scale factor, and f_x = f / (pixel width) and f_y = f / (pixel height) are the focal length expressed in horizontal and vertical pixels respectively.
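For illustration, the back-projection implied by relation (1) can be written in a few lines of Matlab. The sketch below uses placeholder intrinsic values and a dummy depth map; it is not kinect's calibrated configuration.

% Back-project one depth pixel (u, v) to a 3D point in the camera frame,
% following relation (1): X = (u - cx)*Z/fx, Y = (v - cy)*Z/fy, Z = D(v, u).
% The intrinsic values below are placeholders for illustration only.
fx = 365.0; fy = 365.0;      % focal length in horizontal/vertical pixels
cx = 256.0; cy = 212.0;      % principal point (pixels)

D = 1.2*ones(424, 512);      % dummy depth map in metres (512 x 424 sensor)

u = 300; v = 200;            % pixel of interest (column, row)
Z = D(v, u);
X = (u - cx) * Z / fx;
Y = (v - cy) * Z / fy;
P = [X; Y; Z];               % 3D point in the camera coordinate system
disp(P)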

2.2 Software

2.2.1 Software Development Kit (SDK) 2.0

The body tracking pipeline of the SDK proceeds in two steps:

• Firstly, get the RGB and depth data.

• Secondly, infer body parts' positions using machine learning.

Kinect is a real-time body recognition and tracking system that locates body parts based on a local analysis of each pixel of a single image, using no temporal information. It uses an estimation algorithm based on a random decision forest, proposed by Shotton et al. [12]. Each pixel is characterized by a number of features (e.g. colour, distance from the Kinect). A type of classifier called a decision forest, a collection of decision trees [14], is trained based on these features; each tree is trained on a set of features of depth images that have been pre-labeled with the target face parts. The trained classifier assigns to each pixel a probability of belonging to each face part. Finally, the algorithm picks out the areas of maximum probability for each face part type. The complementary depth and RGB information provides more available features for each pixel.
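As a toy illustration of this idea only (not Kinect's actual implementation), a random decision forest can be trained on per-pixel features in Matlab with TreeBagger from the Statistics and Machine Learning Toolbox; the features and labels below are entirely synthetic.

% Toy per-pixel classifier: each row of X is a pixel described by a few
% synthetic features (e.g. depth and colour values), and Y holds the
% face-part label of that pixel. A bagged ensemble of decision trees
% (a random forest) is trained and then used to classify new pixels.
rng(1);
nPixels = 500;
X = [rand(nPixels,1)*4, rand(nPixels,3)*255];          % [depth(m), R, G, B]
Y = categorical(randi(3, nPixels, 1), 1:3, ...
                {'forehead','cheek','chin'});          % synthetic part labels

forest = TreeBagger(50, X, Y, 'Method', 'classification');

newPixels = [1.1 200 180 160; 2.3 90 60 50];           % two unseen pixels
[labels, scores] = predict(forest, newPixels);         % labels + class probabilities
disp(labels), disp(scores)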

2.2.1.1 HD Face - High definition face tracking

In June 2015, Microsoft released a new Application Programming Interface (API) called HD Face, which implements face tracking. This is the API used in this master thesis project. Since its release is very recent, there is no relevant literature available yet, and it has not been tested in other applications with publicly available results.

This API captures the face and creates a point cloud that represents it, as shown in Fig. 1. Thence, it can output the face orientation and the 3D coordinates in Kinect’s coordinate system of a head pivot point. The head pivot point is the computed center of the head, which the face may be rotated around. The orientation of the head consists of the three Euler angles - yaw, pitch, roll - as illustrated in Fig. 2 and explained in Table 1.
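As a minimal sketch, the head pivot and the Euler angles can be read through the Kin2 toolbox used in this work (see Appendix B); the snippet assumes a connected Kinect v2 and the compiled Kin2 mex files, and simply waits for the first frame in which a face is found.

% Minimal use of the Kin2 toolbox to read the HD Face outputs used in this
% work: the head pivot (3D point in Kinect's coordinate system) and the
% face rotation (pitch, yaw, roll in degrees).
addpath('Mex');
k2 = Kin2('color', 'HDface');

headPivot = [];  rotation = [];
while isempty(headPivot)
    if k2.updateData                             % a valid frame was acquired
        faces = k2.getHDFaces('WithVertices', 'true');
        if ~isempty(faces)
            headPivot = faces(1).HeadPivot;      % 1 x 3, metres, Kinect frame
            rotation  = faces(1).FaceRotation;   % 1 x 3: pitch, yaw, roll
        end
    end
end
k2.delete;
disp(headPivot), disp(rotation)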

Figure 2: Head pose Euler angles in kinect's coordinate system.

Table 1: Head Pose Angles

Angle   Value
Pitch   -90 = looking down towards the floor; 0 = neutral; +90 = looking up towards the ceiling.
        Face tracking tracks when the user's head pitch is less than 20 degrees, but works best when it is less than 10 degrees.
Roll    -90 = horizontal, parallel with the right shoulder of the subject; 0 = neutral; +90 = horizontal, parallel with the left shoulder of the subject.
        Face tracking tracks when the user's head roll is less than 90 degrees, but works best when it is less than 45 degrees.
Yaw     -90 = turned towards the right shoulder of the subject; 0 = neutral; +90 = turned towards the left shoulder of the subject.


The faceHD API will track only those frames for which the Natural User Interface (NUI) API has already identified the head and neck joints. The tracking quality increases if the face has been captured and the output of the capture is used to initialize the face tracking. This is why, in the beginning of each session, the first captures take place with the volunteer's head outside of the gantry, so that no part of their face is obscured. The volunteer is asked to move their head slowly so that a personal model of their head is created, using the precise geometry of their face instead of an average geometry. This enables the face tracker to achieve a more accurate characterization of the face motions.

2.2.2 Matlab


3 Methods

3.1 Kinect mount

In both cases (HRRT and HR rooms), kinect was mounted on the ceiling of the clinical room containing the PET tomograph. Firstly, kinect was mounted on a tripod, and then fabric straps were used to attach the tripod to metal tubes hanging from the ceiling, as can be seen in Fig. 3. As previously mentioned, the faceHD API will track only those frames for which the NUI API has already identified the head and neck joints. This is why kinect had to be placed so that the upper part of the subject's body is in its field of view.


3.2 Spatial Calibration

In the following, the procedure followed for spatial calibration in the case of the HRRT tomograph is described. The same procedure was followed in the case of the HR tomograph.

3.2.1 Mapping 3D coordinates from one coordinate system to the other

Kinect’s coordinate system is Cartesian right-handed and is defined as illustrated in Fig. 4

Figure 4: The origin (x = 0, y = 0, z = 0) is located at the center of the IR sensor on Kinect. X grows to the sensor’s left, Y grows up (this direction is based on the sensor’s tilt) and Z grows out in the direction the sensor is facing. The measurement unit of kinects measurements equals 1 meter.

The camera of the HRRT system has a left handed Cartesian coordinate system as shown in Fig. 5

Figure 5: Coordinate system of the camera of HRRT.

The 3D coordinates of the calibration markers with reference to Kinect's coordinate system were calculated using the intrinsic parameters of kinect, in the same way as described in equation (1).

Then, the 3D coordinates of the markers with reference to the reference point shown in Fig. 6 and Fig. 7 were measured. This point is placed at the center of the bed in the x-direction, and its 3D coordinates with reference to the HRRT camera's coordinate system are known for a specific placement of the bed, according to the tomograph's manual.


Figure 7: Red point: Reference point on HRRT’s bed, with its coordinates expressed with reference to the coordinate system of HRRT in mm. Green points: Calibration points.

The transformation between two Cartesian coordinate systems can be considered as the result of a rigid transformation. Thus, it can be decomposed into a rotation and a translation. This means that there are three Degrees Of Freedom (DOF) for the translation and three DOF for the rotation, leading to six unknowns in total. Three points known in both coordinate systems provide nine constraints (three coordinates each), which are enough to permit determination of the six unknowns; two points do not provide enough constraints. Because, in practice, the measurements are not exact, more than three points were used in order to achieve higher accuracy.

A rigid transformation matrix that maps 3D data from one coordinate system to the other was found using the Closed-Form Solution of Absolute Orientation using Orthonormal Matrices proposed by Horn [16]. Horn's method is a non-iterative method that searches for the solution which minimizes the squared residual error between the target points and the points calculated by applying the rigid transformation matrix to the source points, taking advantage of the algebraic properties of matrices.

Mapping the 3D points from one coordinate system to the other consists of simply multiplying the original points with the transformation matrix. So, for a given number of points N, the original points with coordinates (x_n, y_n, z_n) are transformed to the points with coordinates (x'_n, y'_n, z'_n) according to:

\begin{bmatrix} x'_n \\ y'_n \\ z'_n \\ 1 \end{bmatrix} =
\begin{bmatrix} R_{11} & R_{12} & R_{13} & T_x \\ R_{21} & R_{22} & R_{23} & T_y \\ R_{31} & R_{32} & R_{33} & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \cdot
\begin{bmatrix} x_n \\ y_n \\ z_n \\ 1 \end{bmatrix}, \qquad n = 1, \dots, N    (2)

where R_{11}, ..., R_{33} are the rotational components and T_x, T_y, T_z are the translational components of the transformation matrix. For more information concerning Euler angles and rigid transformations, the reader can refer to [17].

One issue to be solved was that the coordinate system of Kinect is right-handed whereas the coordinate system of HRRT is left-handed, so no rigid transformation exists between these two systems. In order to account for that, the y and z coordinates of the target points were switched and the rigid transformation matrix was found. Thence, this transformation matrix was used to map 3D points from kinect’s coordinate system to HRRT’s coordinate system. Finally, in the points expressed in HRRT’s coordinate system, y and z coordinates were switched back.
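A condensed Matlab sketch of this calibration step is given below. It assumes that srcPts and tgtPts are N x 3 matrices of corresponding calibration points measured in kinect's and HRRT's coordinate systems respectively, and it uses the absor function listed in Appendix B; the example kinect point is a made-up value.

% Map points from Kinect's right-handed frame to HRRT's left-handed frame.
% srcPts and tgtPts are N x 3 matrices of corresponding calibration points;
% absor (Appendix B) expects 3 x N inputs.
tgtSwapped = tgtPts(:, [1 3 2]);               % switch y and z of the target points

regParams = absor(srcPts', tgtSwapped');       % Horn's closed-form solution
T = [regParams.R, regParams.t; 0 0 0 1];       % 4 x 4 rigid transformation, eq. (2)

% Apply the transform to a new Kinect point (e.g. a tracked head pivot).
kinPts = [0.01 0.95 -0.12];                    % example 1 x 3 Kinect point (metres)
homog  = T * [kinPts'; ones(1, size(kinPts, 1))];
hrrtPts = homog(1:3, :)';
hrrtPts = hrrtPts(:, [1 3 2]);                 % switch y and z back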

3.2.2 Mapping angles from one coordinate system to the other

In order to process the results obtained from rotational movements of the head, composing and decomposing a 3 × 3 rotation matrix was needed. This is because the angles need to be in the form of a rotation matrix in order to be mapped from one coordinate system to the other. However, after they are mapped to the new coordinate system, they must be converted back into angles in order to be compared with the actual angles of movement.

Assuming that R_{head} is the rotation matrix composed using the pitch, yaw and roll angles (as described in subsection 3.2.2.1) and R is the rotation matrix describing the rotation from the initial to the final coordinate system, relation (3) gives the new rotation matrix R'_{head} describing the angles with reference to the new coordinate system:

R'_{head} = R^{-1} \cdot R_{head}    (3)

3.2.2.1 Composing a 3 × 3 rotation matrix

Given 3 Euler angles θx, θy, θz, the rotation matrix is calculated as follows, using the xyz convention (Tait-Bryan angles):
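Written out under the convention R = R_z(θ_z)·R_y(θ_y)·R_x(θ_x), which is the form consistent with the decomposition in equations (9)-(11), the individual rotations and their product are:

R_x(\theta_x) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_x & -\sin\theta_x \\ 0 & \sin\theta_x & \cos\theta_x \end{bmatrix}    (4)

R_y(\theta_y) = \begin{bmatrix} \cos\theta_y & 0 & \sin\theta_y \\ 0 & 1 & 0 \\ -\sin\theta_y & 0 & \cos\theta_y \end{bmatrix}    (5)

R_z(\theta_z) = \begin{bmatrix} \cos\theta_z & -\sin\theta_z & 0 \\ \sin\theta_z & \cos\theta_z & 0 \\ 0 & 0 & 1 \end{bmatrix}    (6)

R = R_z(\theta_z) \, R_y(\theta_y) \, R_x(\theta_x)    (7)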


3.2.2.2 Decomposing a 3 × 3 rotation matrix

Given a 3 × 3 rotation matrix

R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}    (8)

the 3 Euler angles are

\theta_x = \arctan\!\left(\frac{r_{32}}{r_{33}}\right)    (9)

\theta_y = \arctan\!\left(\frac{-r_{31}}{\sqrt{r_{32}^2 + r_{33}^2}}\right)    (10)

\theta_z = \arctan\!\left(\frac{r_{21}}{r_{11}}\right)    (11)

where \theta_x \in (-\pi, \pi), \theta_y \in (-\pi/2, \pi/2), \theta_z \in (-\pi, \pi).

3.3 Pre-heating time

Lachal et al. [18] conducted experiments showing that the depth data from kinect v2 drift during the first 40 min after the sensor is powered on. This is due to the kinect v2 fan turning on and altering the thermal properties of the sensor. Thus, kinect v2 requires a 'pre-heating' time in order to stabilize thermally and provide reliable data. This was confirmed by testing the sensor with a motionless volunteer and observing the drifting of the point cloud. For this reason, all the experiments in this work were performed with a kinect v2 that had been powered on for at least 60 min before data acquisition.

3.4 Temporal Alignment

In order to align the kinect measurements with the actual motion of the subject, two time stamps were used in the code, marking the beginning and the end of the kinect frame acquisition. The initial faceHD code was modified so that the acquisition and storing of the data start upon request of the end-user. At the same time, a timer was used and the time was recorded when the controlled movement of the bed started and ended. The bed was moved using the corresponding controller, and a constant velocity of movement was assumed. In addition, a constant kinect frame-rate was assumed.
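For illustration, with these two assumptions the alignment reduces to a simple interpolation; the timestamps and displacements in the sketch below are made-up values.

% Align Kinect frames with the bed motion under the two assumptions used
% here: constant frame rate and constant bed velocity. Times are in seconds.
nFrames   = 100;
tKinect   = linspace(0, 20, nFrames);     % frame times from the two code time stamps
tBedStart = 5;  tBedEnd = 12;             % timer readings for the bed movement
posStart  = 0;  posEnd  = 0.02;           % bed displacement in metres

% Piecewise-linear bed position at each Kinect frame time
bedPos = interp1([0 tBedStart tBedEnd 20], ...
                 [posStart posStart posEnd posEnd], tKinect);
plot(tKinect, bedPos), xlabel('time (s)'), ylabel('bed position (m)')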

3.5 Experiment

3.5.1 Translational movements

At the beginning of each session, the first captures took place with the volunteer's head outside of the gantry, so that no part of their face was obscured. The volunteer was asked to move their head slowly so that a personal model of their head was created by kinect. Then the bed was moved inside the gantry. The bed's controller was used in order to move the bed in the y-direction and in the z-direction. The controller keeps track of the movements of the bed, so it can be used to compare the results given by kinect with the actual movements. The volunteer was asked to remain as still as possible, so it was assumed that the movement of the volunteer's head coincided with the movement of the bed. The motion varied from regular step-wise motions of 1 mm - 5 mm to long drift motions of up to 20 mm. Eight sessions were carried out. Fig. 8 and Fig. 9 show example image frames captured by kinect with the volunteer's head captured and tracked in the HRRT and HR gantry respectively. The experiment was repeated with different volunteers, in order to increase the sample space and the external validity of the experiment.


Figure 9: Head tracking performed in the HR PET tomograph. Image frame as returned by kinect v2 using the FaceHD API.

3.5.2 Rotational movements


3.6 Data Post-Processing

3.6.1 Invalid frames elimination

Speckle noise is inherent in infrared data [9]. When the speckle noise is too high, kinect returns an invalid frame with zero values. Each such invalid frame was eliminated from the dataset and replaced, by interpolation, with the mean of its previous and next frame. It was preferred to replace it with a valid frame rather than simply discard it, in order not to disturb the temporal alignment of the data.
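A minimal sketch of this replacement, assuming that pivot is an N x 3 matrix of head-pivot positions in which invalid frames appear as rows of zeros:

% Replace each invalid (all-zero) frame with the mean of its previous and
% next frame so that the temporal alignment is preserved.
invalid = find(all(pivot == 0, 2));
for i = invalid'
    if i > 1 && i < size(pivot, 1)
        pivot(i, :) = (pivot(i-1, :) + pivot(i+1, :)) / 2;
    end
end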

3.6.2 Results averaging

With Kinect v2, it was possible to acquire and store frames using Matlab at a rate of 15 Hz (15 frames/s). However, after discussion with the responsible physicians and end-users of the tomographs, it was decided that a temporal resolution below 1 s is of little practical use, since the events captured in a time interval below 1 s are too few to influence the final result. In addition, most of the motion appears during the last scanning frames, which last up to 10 min, so a fraction of a second is very small compared to the whole time interval. Taking that into consideration, the results were averaged over each time interval of 1 s. In this way, it was possible to suppress drifting of the point cloud that did not correspond to actual head movement and was due to the speckle noise inherent in infrared data [9].
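A minimal sketch of this averaging, assuming a vector t of frame times in seconds and an N x 3 matrix pivot of positions:

% Average the tracked positions over 1 s windows. t is an N x 1 vector of
% frame times in seconds and pivot an N x 3 matrix of positions.
bin = floor(t - t(1)) + 1;                       % 1 s bin index of each frame
pivotPerSecond = [accumarray(bin, pivot(:,1), [], @mean), ...
                  accumarray(bin, pivot(:,2), [], @mean), ...
                  accumarray(bin, pivot(:,3), [], @mean)];
tPerSecond = accumarray(bin, t, [], @mean);      % representative time of each bin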

An overview of the procedure followed in order for the final results to be obtained can be seen in the flowchart of Fig. 11.


3.7 Statistical processing of results

3.7.1 Bland-Altman plots

In order to check the agreement between kinect's measurements and the 'ground truth' measurements, Bland-Altman plots of the measurements were created. Bland-Altman plots, proposed by Bland and Altman in 1983 [19], are extensively used to evaluate the agreement between two different instruments or two measurement techniques. Suppose that we want to evaluate the agreement between a random variable X1 and a second random variable X2 which measure the same quantity, and assume that we have n paired observations (X1k, X2k), k = 1, 2, ..., n. The Bland-Altman plot is formed by plotting the differences X1 - X2 on the vertical axis versus the averages (X1 + X2)/2 on the horizontal axis. A horizontal line representing the bias is drawn at the mean difference d = mean(X1 - X2). Additional horizontal lines, known as limits of agreement, are added to the plot at d - 1.96·SD and d + 1.96·SD. If the mean value of the difference differs significantly from 0, this indicates the presence of a fixed bias. The limits of agreement tell us how far apart measurements by the two methods are likely to be for most individuals.

3.7.2 One-sided one-sample t-tests

In order to evaluate the obtained results statistically, one-sided one-sample t-tests were used. A one-sample t-test can determine whether the mean of a population differs significantly from a specific hypothesized mean. Let the null hypothesis be that the sample comes from a population with mean µ0. The t-test calculates the test statistic value t according to

t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}    (12)

where \bar{x} is the mean of the sample, µ0 the hypothesized mean, s the standard deviation of the sample and n the sample size. The t-statistic follows a t-distribution with n - 1 degrees of freedom. Using the known distribution of the test statistic, the P-value is calculated. The significance level α is set to a specific threshold and the P-value is compared to α. If the P-value is greater than α, the null hypothesis is not rejected; otherwise it is rejected. A one-sided one-sample t-test tests the null hypothesis that the data come from a population with mean equal to µ0 against the alternative that the mean is lower (left-sided one-sample t-test) or greater (right-sided one-sample t-test) than µ0. In this work, left-sided one-sample t-tests were used.
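In Matlab, such a test is provided by ttest in the Statistics and Machine Learning Toolbox; the sketch below assumes a vector err of absolute tracking errors and tests it against a hypothesized accuracy level of 1 mm.

% Left-sided one-sample t-test: H0: mean(err) = mu0 against H1: mean(err) < mu0.
% err is a vector of absolute tracking errors in metres; mu0 is the
% hypothesized accuracy level (here 1 mm).
mu0 = 0.001;
[h, p, ci, stats] = ttest(err, mu0, 'Tail', 'left');
% h = 1 means H0 is rejected at the 5% significance level, i.e. the mean
% error is significantly below mu0; stats.tstat is the value of eq. (12).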


4 Results

4.1 Static measurements

Sessions were carried out in which the volunteer was asked to remain still, in order to observe possible drifting of the point cloud. In all these sessions, the drifting was < 1.5 mm and < 1°. In addition, sessions were carried out in which the volunteer was asked to remain still and, at a specific time point, to move his jaw and his eyes while trying to keep his head stable. The aim was to see how these motions influence the movement of the head pivot. One such session can be seen in Fig. 12, where the volunteer was asked to move his jaw and eyes at t = 15 s. The movement of the non-rigid parts of the face can lead to a movement of up to 2.5 mm of the head pivot. However, the volunteer's head was not rigidly fixed, so it is very possible that the subject actually moved his head when asked to move his jaw and eyes. As a result, it is not possible to draw reliable conclusions about the influence of the non-rigid parts of the face on the overall movement of the head, given the available instrumentation.

[Figure 12: Absolute 3D position (x, y, z coordinates, in m) of the head pivot versus time (s) over a 30 s session. The volunteer remained still until t = 15 s, when he was asked to move his jaw and eyes. Legend: kinect captured motion.]

4.2 Head translations

[Figure 13: Absolute 3D position (x, y, z coordinates, in m) of the head pivot versus time (s) during a session with total duration of 368 s. Blue: kinect captured motion; dashed red: bed controlled motion.]

[Figure 14: Absolute 3D position (x, y, z coordinates, in m) of the head pivot versus time (s) during a session with total duration of 609 s. Blue: kinect captured motion; dashed red: bed controlled motion.]

4.3 Head rotations

In the case of the rotational movements, further smoothing of the kinect data, in time intervals of 2 s, took place. It was difficult for the volunteer to move his head in a highly smooth and controlled way, so further smoothing was considered appropriate for comparing the kinect measurements with the respective measurements on the protractor.

[Figure 15: Pitch angle of the head (degrees) versus time (s) during a session with total duration of 330 s. Blue: kinect captured motion (smoothed over 2 s intervals); dashed red: protractor.]

[Figure 16: Yaw angle of the head (degrees) versus time (s) during a session with total duration of 184 s. Blue: kinect captured motion (smoothed over 2 s intervals); dashed red: protractor.]


Figure 17: Roll angle of the head during a session with total duration of 343 s. The continuous blue line corresponds to the angle as captured by kinect and then smoothed for each time interval of 2 s. The dashed red line corresponds to the angle as measured on the protractor.

4.4 Statistical results

Figure 18: Example of a Bland-Altman plot of the translational measurements as obtained by kinect, compared to the bed-controlled measurements in one of the sessions. The bias line is placed at 3·10⁻⁴ m. The limits of agreement are placed at ±0.0028 m.

Table 2: Results of left-sided one-sample t-tests

Motion                         Mean (µ0)    Standard deviation (SD)    P-value        H
Motionless session (transl.)   0.0015 m     0.0004 m                   5.2·10⁻¹⁰      1
Motionless session (rot.)      1.00°        0.40°                      4.32·10⁻¹⁰     1
y-translation                  0.0010 m     0.0008 m                   0.0145         1
z-translation                  0.0030 m     0.0015 m                   4.09·10⁻¹⁰     1
Pitch                          1.00°        0.90°                      2.32·10⁻³⁶     1
Yaw                            1.00°        0.90°                      3.64·10⁻³²     1
Roll                           1.00°        0.58°                      3.25·10⁻⁵⁴     1


5 Discussion

In the current master thesis work, the potential of kinect v2 to be used as an external tracking device for tracking the motion of the head during brain PET scanning was tested. The FaceHD API software released by Microsoft in June 2015 was used for this purpose. This is the first time that this specific API has been tested and the corresponding results presented. The results of the experimental sessions are promising and show that kinect v2 has the potential to be incorporated in the PET brain scanning process. However, further validation testing needs to take place, using more sophisticated and accurate instrumentation in order to establish a more reliable ground truth against which kinect's results can be compared. In addition, kinect needs to undergo some modifications in order to be applicable to all the tomographs. In the following, the weaknesses and the value of the current work are presented.

Using the setup described in the Methods section, kinect v2 was found to have a spatial accuracy of µ0 < 1 mm in the y-direction of the tomograph camera's coordinate system and µ0 < 3 mm in the z-direction of the tomograph camera's coordinate system. Movements in the x-direction were not tested, due to the unavailability of an appropriate way of producing controlled movement in that direction. However, it is assumed that the accuracy in that direction should not differ from the one in the z-direction. The better accuracy in the y-direction is explained by the fact that this is the direction in which kinect directly captures depth data. The data in the other directions are the result of processing the raw depth data, so minor errors accumulate, leading to worse spatial accuracy. In addition, kinect v2 was found to have an accuracy of µ0 < 1° for all the rotational movements (roll, pitch, yaw). In the Bland-Altman plots, the bias line is placed very close to zero both for the translational and the rotational movements, showing the agreement between the kinect measurements and the 'ground truth' measurements. However, the presence of a 'stripe-like' pattern may reveal an underlying error in the way the 'ground truth' measurements were taken.

It needs to be pointed out that the tools used to provide the measurements that played the role of the ground truth, against which the kinect measurements were compared, present some limitations. For testing the accuracy of translational movements, the controlled movement of the bed was used. However, the bed can only be moved with an accuracy of 1 mm, which directly limited the possibility of measuring accuracy beyond the 1 mm level. In addition, it was assumed that the volunteer's head did not move relative to the bed. This assumption is not flawless: the volunteer felt that, for some fractions of a second, he continued moving due to inertia after the movement of the bed had stopped. For testing the accuracy of the rotational movements, a protractor with an accuracy of 1° was used. This again limited the possibility of measuring accuracy beyond the 1° level. The angles on the protractor were noted as seen by an observer, which added a random error due to the imperfect method of observation.

Concerning the temporal alignment, more accurate methods have been developed and could be adopted, such as injecting trigger gates into the PET list-mode data (e.g. [20],[21]). The current temporal alignment involved the observer using a timer and noting times manually, which adds to the overall random error.

In order to account for the aforementioned random errors, multiple sessions were carried out and the results were analyzed statistically, taking into consideration the number of repetitions and the variation present in the measurements. According to the central limit theorem, random errors tend to acquire a normal distribution when the total random error is the sum of many independent random errors. As a result, by averaging multiple measurements and taking the mean, the random errors can be reduced, leading to more reliable results.

Another issue to be taken into consideration is the imperfect calibration, which is a source of systematic error in the measurements. Horn's Closed-Form Solution of Absolute Orientation using Orthonormal Matrices [16] was adopted for mapping the coordinates from kinect's coordinate system to the tomograph camera's coordinate system. However, this solution presents a minor error in its results. In addition, the determination of the 3D positions of the markers used in the calibration process was made using kinect, applying a function which made use of kinect's intrinsic parameters. This, again, is a source of minor errors. As a result, there is an accumulation of errors that influences the obtained results. In this context, finding alternative methods of calibration could be the subject of future research work. Olesen and her team developed such an alternative method in 2013 [22]. They suggested that the calibration between the tracking device and the PET coordinate system can take place by applying a registration algorithm that aligns the initial surface captured by the tracking device directly to the PET transmission data.

Another issue to consider is that, although an attempt was made to simulate a wide range of head movements that could take place during a PET scan, those movements were not realistic: they were constrained to single-dimension translations and in-plane rotations. Thus, an interesting future work could be to use different tracking devices that have been developed (e.g. Polaris, Tracoline) simultaneously and compare the obtained results while the volunteer performs free movements.

Away from the centre of the field of view, the spatial resolution deteriorates due to the detectors' geometry. In addition, calibration discrepancies, which arise as time passes after the calibration procedure, also lead to spatial resolution deterioration. As a result, even in the HRRT the spatial resolution can fall to 2 mm. Thus, kinect's accuracy is at an acceptable level. As a first step, kinect can replace the current data-driven motion correction method applied at the Karolinska hospital. Currently, motion correction at the Karolinska hospital is performed using Automated Image Registration (AIR) [23]. This aligns the frames of PET data to a reference frame, accounting for the inter-frame motion but not for the intra-frame motion. In addition, it is based exclusively on the emission data itself, which can be poor due to noise or a low count rate (see the 'State of the Art' Appendix). Kinect's data could replace the emission data, and kinect could be used for motion correction using an average of the externally tracked motion data for each frame (see the 'State of the Art' Appendix). In this case, kinect would still not account for the intra-frame movements. However, kinect has the advantage that it is an external, robust system with known accuracy that is not influenced by the noise in the data or by the radiotracer's distribution. In this way, it allows for more consistent motion correction. In the 'State of the Art' Appendix, a presentation of the alternative instrumentation developed and tested for motion correction in PET brain scanning during the last years can be found. The systems that are currently being investigated and present the most promising results are the Polaris system (Northern Digital Inc., Waterloo, Canada) and the Tracoline system developed by Olesen and her team in 2013. In the 'State of the Art' Appendix, the reader can find a presentation of their strengths and weaknesses. Both of these systems were found to have an accuracy of µ0 < 1 mm and µ0 < 1° under ideal conditions [24],[22]. However, limitations such as the motion and detachment of the markers used, in the case of the Polaris system, or the high sensitivity to occlusions and illumination variations, in the case of the Tracoline system, have prevented their wide clinical adoption. Kinect v2, with its insensitivity to illumination variations and its insensitivity, to a large degree, to occlusions, arises as a promising alternative. The Imanova imaging centre, in collaboration with Imperial College London, investigated in 2015 the potential of kinect v2 to be used in PET brain scanning [21]. They made use of a different software package, called Kinect Fusion, and managed to obtain results with 0.5 mm spatial accuracy. However, they used kinect only with the HR tomograph. They applied modifications to kinect's hardware, reducing in this way its minimum operating distance, which allowed them to place the sensor inside the gantry. This cannot be done in the HRRT tomograph, as its gantry is too small for the sensor to fit in.

6 Conclusion

A Kinect v2 for Xbox One

A.1 Specifications

Kinect v2 is composed of two cameras: an RGB and an infrared (IR) camera, allowing for the acquisition of color and depth images respectively. The RGB camera captures color information with a resolution of 1920 × 1080 pixels, whereas the IR camera is used for real-time acquisition of depth data with a resolution of 512 × 424 pixels. The whole acquisition can be carried out with a frame rate of up to 30 Hz. The field of view for depth sensing is 70 degrees horizontally and 60 degrees vertically. Kinect v2 can work in a range from 0.5 m to 4.5 m distance from the subject. Kinect for Xbox One can be connected through a Windows adapter to a USB 3.0 port, allowing applications to be developed on a computer. The aforementioned specifications can be seen in Table 3.

Table 3: Kinect v2 specifications

Feature                    Value
Color camera               1920 × 1080 pixels
Depth camera               512 × 424 pixels
Framerate                  30 frames/s
Max depth distance         ~4.5 m
Min depth distance         50 cm
Horizontal field of view   70 degrees
Vertical field of view     60 degrees
USB standard               3.0
Supported OS               Windows 8, 10

A.2 Principle of Function

The depth d is obtained from the phase shift ∆φ between the emitted and the received modulated IR signal:

d = \frac{\Delta\varphi}{4\pi f} \cdot c    (13)

where f is the modulation frequency of the signal and c the speed of light.

B Matlab Codes

function compile_cpp_files
% compile_cpp_files compiles the Kin2 toolbox.
% The C++ code is located in 6 files:
%   Kin2.h           : Kin2 class definition.
%   Kin2_base.cpp    : Kin2 class implementation of the base functionality,
%                      including body data.
%   Kin2_mapping.cpp : Kin2 class implementation of the mapping functionality.
%   Kin2_face.cpp    : Kin2 class implementation of the Face and HD face processing.
%   Kin2_fusion.cpp  : Kin2 class implementation of the 3D reconstruction.
%   Kin2_mex.cpp     : MexFunction implementation.
%
% Requirements:
%   - Kinect2 SDK: http://www.microsoft.com/en-us/download/details.aspx?id=44561
%   - Visual Studio 2012 or newer compiler
%   - Matlab 2013a or newer (in order to support Visual Studio 2012)
%
% Usage:
%   1) Set the compiler using mex -setup C++ (note it doesn't work with
%      compilers older than VS2012).
%   2) Set the IncludePath and LibPath variables in this file to the
%      correct locations.
%   3) Add to the Windows path the bin directory containing
%      Kinect20.Fusion.dll and Kinect20.Face.dll
%      (for example: C:\Program Files\Microsoft SDKs\Kinect\v2.0_1409\bin).
%   4) Close Matlab and open it again.
%   5) Run this function.
%
% Reference: J. R. Terven and D. M. Cordova, "A Kinect 2 toolbox for Matlab",
% https://github.com/jrterven/Kin2, 2016.

IncludePath = 'C:\Program Files\Microsoft SDKs\Kinect\v2.0_1409\inc';
LibPath = 'C:\Program Files\Microsoft SDKs\Kinect\v2.0_1409\Lib\x64';

cd Mex

mex('-compatibleArrayDims', '-v', 'Kin2_mex.cpp', 'Kin2_base.cpp', ...
    'Kin2_mapping.cpp', 'Kin2_face.cpp', 'Kin2_fusion.cpp', ['-L' LibPath], ...
    '-lKinect20', '-lKinect20.Fusion', '-lKinect20.Face', ['-I' IncludePath]);

% FaceHD_tracking implements face detection and tracking
% Reference: J. R. Terven, D. M. Cordova, "A Kinect 2 Toolbox for MATLAB",
% https://github.com/jrterven/Kin2, 2016.

addpath('Mex');
clear all
close all

% Create Kinect 2 object and initialize it
% Available sources: 'color', 'depth', 'infrared', 'body_index', 'body',
% 'face' and 'HDface'
k2 = Kin2('color', 'HDface');

% image sizes
c_width = 1920; c_height = 1080;

% Color image is too big, so it is scaled down
COL_SCALE = 1.0;

% Create matrices for the images
color = zeros(c_height*COL_SCALE, c_width*COL_SCALE, 3, 'uint8');

% color stream figure
c.h = figure;
c.ax = axes;
c.im = imshow(color, []);
title('Color Source (press q to exit)');
set(gcf, 'keypress', 'k=get(gcf,''currentchar'');');   % listen keypress

model = zeros(3, 1347);
figure, hmodel = plot3(model(1,:), model(2,:), model(3,:), '.');

% Storage for the captured data (face model, head pivot, rotation, time stamps)
k = [];
coord = [];
head_pivot = [];
rotation = [];
t1 = [];

while true
    % Get frames from Kinect and save them on underlying buffer
    validData = k2.updateData;

    % Before processing the data, we need to make sure that a valid
    % frame was acquired.
    if validData
        % Get color frame
        color = k2.getColor;

        % update color figure
        color = imresize(color, COL_SCALE);
        c.im = imshow(color, 'Parent', c.ax);

        % Get the HDfaces data.
        % The output faces is a structure array with at most 6 faces. Each
        % face has the following fields:
        %  - FaceBox: rectangle coordinates representing the face position
        %    in color space [left, top, right, bottom].
        %  - FaceRotation: 1 x 3 vector containing pitch, yaw, roll angles.
        %  - HeadPivot: 1 x 3 vector, computed center of the head, which the
        %    face may be rotated around. This point is defined in the Kinect
        %    body coordinate system.
        %  - AnimationUnits: 17 animation units (AUs). Most of the AUs are
        %    expressed as a numeric weight varying between 0 and 1. For details
        %    see https://msdn.microsoft.com/en-us/library/microsoft.kinect.face.faceshapeanimations.aspx
        %  - ShapeUnits: 94 shape units (SUs). Each SU is expressed as a
        %    numeric weight that typically varies between -2 and +2. For details
        %    see https://msdn.microsoft.com/en-us/library/microsoft.kinect.face.faceshapedeformations.aspx
        %  - FaceModel: 3 x 1347 points of a 3D face model computed by face capture.
        faces = k2.getHDFaces('WithVertices', 'true');

        if ~isempty(k)
            if strcmp(k, 's')   % if the user has pressed 's', data are stored
                coord = [coord; faces(1).FaceModel];
                head_pivot = [head_pivot; faces(1).HeadPivot];
                rotation = [rotation; faces(1).FaceRotation];
                t1 = [t1; clock];
            end
        end

        % Display the HD faces data.
        % Parameters:
        %  1) image axes
        %  2) faces structure obtained with getFaces
        %  3) display HD face model vertices (1347 points)?
        %  4) display text information (animation units)?
        %  5) text font size in pixels
        k2.drawHDFaces(c.ax, faces, true, true, 20);

        % Plot face model points
        if size(faces, 2) > 0
            % i = 1
            model = faces(1).FaceModel;

% Mapping2Kinect_Space maps points between depth and color images and from
% depth and color images to kinect space.
%
% Usage:
%  - Press 'd' to select a point on the depth image. The selected point
%    will be mapped from depth to camera and the resulting coordinates are
%    printed on the command window. Then the camera coordinates are mapped
%    back to depth space and printed to the command window.
%  - Press 'c' to select a point on the color image. The selected point
%    will be mapped from color to camera and the resulting coordinates are
%    printed on the command window. Then the camera coordinates are mapped
%    back to color space and printed to the command window.
%  - Press 'q' to exit.

addpath('Mex');
clear all
close all

% Create Kinect 2 object and initialize it
% Select sources as input parameters.
% Available sources: 'color', 'depth', 'infrared', 'body_index', 'body',
% 'face' and 'HDface'
k2 = Kin2('color', 'depth');

% image sizes
depth_width = 512; depth_height = 424; outOfRange = 4000;
color_width = 1920; color_height = 1080;

% Color image is too big, so it is scaled down
COL_SCALE = 0.5;

% Create matrices for the images
depth = zeros(depth_height, depth_width, 'uint16');
color = zeros(color_height*COL_SCALE, color_width*COL_SCALE, 3, 'uint8');
points_in_camera_coord = [];
points_in_camera_coord2 = [];

% Images used to draw the markers
depthAdditions = zeros(depth_height, depth_width, 3, 'uint8');
colorAdditions = zeros(color_height*COL_SCALE, color_width*COL_SCALE, 3, 'uint8');

% depth stream figure
h1 = figure;
hdepth = imshow(depth, [0 255]);
set(gcf, 'keypress', 'k=get(gcf,''currentchar'');');   % listen keypress

% color stream figure
h2 = figure;
hcolor = imshow(color, []);
title('Color Source (press q to exit)');
set(gcf, 'keypress', 'k=get(gcf,''currentchar'');');   % listen keypress

% Loop until pressing 'q' on any figure
k = [];
disp('Instructions:')
disp('Press d to select a point on the depth image')
disp('Press c to select a point on the color image')
disp('Press q on any figure to exit')

while true
    % Get frames from Kinect and save them on underlying buffer
    validData = k2.updateData;

    % Before processing the data, we need to make sure that a valid
    % frame was acquired.
    if validData
        % Copy data to Matlab matrices
        depth = k2.getDepth;
        color = k2.getColor;

        % update depth figure
        depth8u = uint8(depth*(255/outOfRange));
        depth8uc3 = repmat(depth8u, [1 1 3]);
        set(hdepth, 'CData', depth8uc3 + depthAdditions);

        % update color figure
        color = imresize(color, COL_SCALE);
        set(hcolor, 'CData', color + colorAdditions);
    end

    if ~isempty(k)
        if strcmp(k, 'd')
            figure(h1);

            % Grab 1 point
            [x, y] = ginput(1);
            disp('Input depth coordinates'); disp([x y])

            % Draw the selected point in the depth image
            depthAdditions = insertMarker(depthAdditions, [x y], 'Color', 'red');

            % Map the point from depth coordinates to camera coordinates
            % Input:  1 x 2 matrix (1 point, x, y)
            % Output: 1 x 3 matrix (1 point, x, y, z)
            camCoords = k2.mapDepthPoints2Camera([x y]);
            points_in_camera_coord = [points_in_camera_coord; camCoords];
            disp('Mapped camera coordinates'); disp(camCoords);

            % Map the resulting camera point back to depth space
            depthCoords = k2.mapCameraPoints2Depth(camCoords);
            disp('Mapped depth coordinates'); disp(depthCoords);
            k = [];
        elseif strcmp(k, 'c')
            figure(h2);
            title('Click the image to sample 5 points');

            % Grab 1 point
            [x, y] = ginput(1);
            disp('Input color coordinates');
            disp([x/COL_SCALE y/COL_SCALE]);

            % Draw the selected point in the color image
            colorAdditions = insertMarker(colorAdditions, [x y], 'Color', 'green', 'Size', 5);

            % Map the point from color coordinates to camera coordinates
            % Input:  1 x 2 matrix (1 point, x, y)
            % Output: 1 x 3 matrix (1 point, x, y, z)
            camCoords = k2.mapColorPoints2Camera([x/COL_SCALE y/COL_SCALE]);
            points_in_camera_coord2 = [camCoords; points_in_camera_coord2];
            disp('Mapped camera coordinates')
            disp(camCoords);

% This s c r i p t c a l c u l a t e s t h e t r a n s f o r m a t i o n matrix between k i n e c t ’ s c o o r d i n a t e system

%( s o u r c e p o i n t s ) and PET’ s c o o r d i n a t e system ( t a r g e t p o i n t s ) .

% A: a 2xN o r 3xN matrix whos columns a r e t h e c o o r d i n a t e s o f N s o u r c e p o i n t s .

% B : a 2xN o r 3xN matrix whos columns a r e t h e c o o r d i n a t e s o f N t a r g e t p o i n t s .

%mapping p o i n t s f o r d i f f e r e n t s e s s i o n s %520−282 case % B=[−0.297 ,−0 .422 ,−0 .170 , ; 0,−0 .181 ,−0 .170 ; 0 .371 ,−0 .589 ,−0 .170 ] ; % B=B ’ ; %400−300 case % B=[−0.297 , −0.542 , −0.152 ; 0,−0 .301 , −0.152 ; 0 .371 , −0.709 , −0.152 ] ; % B=B ’ ; %400−350 case %B=[−0.297 , −0.542 , −0.102 ; 0 , −0.301 , −0.102 ; 0 .371 , −0.709 , −0.102 ] ; %B=B ’ ; %2nd s e s s i o n %150−322 case %B=[−0.364 ,−1 .276 ,−0 .130 ;0 , −0 .651 ,−0 .130 ; 0 .373 ,−1 .235 ,−0 .130 ; 0 .374 ,−1 .399 ,−0 . 1 3 0 ;−0 .024 ,−1 .141 ,−0 .130 ] ; %B2=[−0.364 ,−1 .126 ,−0 .008 ;0 , −0 .501 ,−0 .008 ; 0 .373 ,−1 .085 ,−0 .008 ; 0 .374 ,−1 .249 ,−0 . 0 0 8 ;−0 .024 ,−1 .091 ,−0 .008 ] ; %B3=[−0.364 ,−1 .126 ,−0 .130 ;0 , −0 .501 ,−0 .130 ; 0 .373 ,−1 .085 ,−0 .130 ; 0 .374 ,−1 .249 ,−0 . 1 3 0 ;−0 .024 ,−1 .091 ,−0 .130 ] ; %B=[B1 ; B2 ; B3 ] ; %B=B ’ ; %s e s s i o n 26/4/2016 176−250 %B=[−0.364 ,−1 .25 ,−0 .202 ;0 , −0 .625 ,−0 .202 ; 0 .373 ,−1 .209 ,−0 .202 ; 0 .374 ,−1 .373 ,−0 . 2 0 2 ;−0 .024 ,−1 .115 ,−0 .202 ] ; %B=B ’ ; %s e s s i o n 12/5/2016 B=[−0.154 , 0 , 0 .084 ; 0 , 0 , 0 ; 0 .145 , 0 , 0 .135 ;0 , −0 .150 , 0 ] ; B=B ’ ; A=points_in_camera_coord2 ’ ; [ regParams , B f i t , E r r o r S t a t s ]= a b s o r (A, B) ; T r a n s f =[ regParams.R , r e g P a r a m s . t ] ;

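Once regParams has been estimated, a point measured in Kinect camera space is mapped into the PET (tomograph) coordinate system by applying the rigid transformation returned by absor. The short example below illustrates this; the numerical value of p_kinect is illustrative only.

% Apply the estimated registration to one Kinect camera-space point (metres).
p_kinect = [-0.12; -0.95; -0.20];                  % illustrative 3x1 point in Kinect coordinates
p_pet    = regParams.R*p_kinect + regParams.t;     % rigid mapping into PET coordinates

% Equivalently, with the 3x4 matrix Transf = [regParams.R, regParams.t]:
p_pet_h  = Transf*[p_kinect; 1];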

function [regParams, Bfit, ErrorStats] = absor(A, B, varargin)
%ABSOR is a tool for finding the rotation -- and optionally also the
%scaling and translation -- that best maps one collection of point
%coordinates to another in a least squares sense. It is based on Horn's
%quaternion-based method. The function works for both 2D and 3D
%coordinates, and also gives the option of weighting the coordinates
%non-uniformly. The code avoids for-loops so as to maximize speed.
%
%DESCRIPTION:
%
%As input data, one has
%
% A: a 2xN or 3xN matrix whose columns are the coordinates of N source points.
% B: a 2xN or 3xN matrix whose columns are the coordinates of N target points.
%
%The basic syntax
%
%   [regParams, Bfit, ErrorStats] = absor(A, B)
%
%solves the unweighted/unscaled registration problem
%
%   min. sum_i ||R*A(:,i) + t - B(:,i)||^2
%
%for unknown rotation matrix R and unknown translation vector t.
%
%ABSOR can also solve the more general problem
%
%   min. sum_i w(i)*||s*R*A(:,i) + t - B(:,i)||^2
%
%where s>=0 is an unknown global scale factor to be estimated along with
%R and t, and w is a user-supplied N-vector of weights. One can
%include/exclude any combination of s, w, and translation t in the
%problem formulation. Which parameters participate is controlled using
%the syntax,
%
%   [regParams, Bfit, ErrorStats] = absor(A, B, 'param1', value1, 'param2', value2, ...)
%
%with parameter/value pair options,
%
% 'doScale' - Boolean flag. If TRUE, the global scale factor, s, is included.

%
% 'doTrans' - Boolean flag. If TRUE, the translation, t, is included.
%             Otherwise, zero translation is assumed. Default=TRUE.
%
% 'weights' - The length N-vector of weights, w. Default, no weighting.
%
%
%OUTPUTS:
%
% regParams: structure output with estimated registration parameters,
%
%   regParams.R: The estimated rotation matrix, R
%   regParams.t: The estimated translation vector, t
%   regParams.s: The estimated scale factor.
%   regParams.M: Homogeneous coordinate transform matrix [s*R, t; [0 0 ... 1]].
%
%   For 3D problems, the structure includes
%
%   regParams.q: A unit quaternion [q0 qx qy qz] corresponding to R and
%                signed to satisfy max(q)=max(abs(q))>0
%
%   For 2D problems, it includes
%
%   regParams.theta: the counter-clockwise rotation angle about the 2D origin


% Copyright, Xoran Technologies, Inc.  http://www.xorantech.com

%% Input option processing and set up
options.doScale = 0;
options.doTrans = 1;
options.weights = [];

for ii = 1:2:length(varargin)
    param = varargin{ii};
    val   = varargin{ii+1};
    if strcmpi(param, 'doScale')
        options.doScale = val;
    elseif strcmpi(param, 'weights')
        options.weights = val;
    elseif strcmpi(param, 'doTrans')
        options.doTrans = val;
    else
        error(['Option ''' param ''' not recognized']);
    end
end

doScale = options.doScale;
doTrans = options.doTrans;
weights = options.weights;

if ~isempty(which('bsxfun'))
    matmvec = @(M, v) bsxfun(@minus, M, v);   %matrix-minus-vector
    mattvec = @(M, v) bsxfun(@times, M, v);   %matrix-times-vector
else
    matmvec = @matmvecHandle;
    mattvec = @mattvecHandle;
end

dimension = size(A, 1);
if dimension ~= size(B, 1)
    error 'The number of points to be registered must be the same'
end


%% Centering/weighting of input data
if doTrans
    if isempty(weights)
        sumwts = 1;
        lc = mean(A, 2);  rc = mean(B, 2);   %Centroids


        right = mattvec(B, sqrtwts);
    end
end

M = left*right.';

%% Compute rotation matrix
switch dimension
    case 2
        Nxx = M(1) + M(4);
        Nyx = M(3) - M(2);
        N = [Nxx  Nyx; ...
             Nyx -Nxx];
        [V, D] = eig(N);
        [trash, emax] = max(real(diag(D)));
        emax = emax(1);
        q = V(:, emax);                     %Gets eigenvector corresponding to maximum eigenvalue
        q = real(q);                        %Get rid of imaginary part caused by numerical error
        q = q*sign(q(2) + (q(2) >= 0));     %Sign ambiguity
        q = q./norm(q);
        R11 = q(1)^2 - q(2)^2;
        R21 = prod(q)*2;
        R = [R11 -R21; R21 R11];            %map to orthogonal matrix
    case 3
        [Sxx, Syx, Szx, Sxy, Syy, Szy, Sxz, Syz, Szz] = dealr(M(:));


        N = [(Sxx+Syy+Szz)  (Syz-Szy)      (Szx-Sxz)       (Sxy-Syx); ...
             (Syz-Szy)      (Sxx-Syy-Szz)  (Sxy+Syx)       (Szx+Sxz); ...
             (Szx-Sxz)      (Sxy+Syx)      (-Sxx+Syy-Szz)  (Syz+Szy); ...
             (Sxy-Syx)      (Szx+Sxz)      (Syz+Szy)       (-Sxx-Syy+Szz)];

        [V, D] = eig(N);
        [trash, emax] = max(real(diag(D)));
        emax = emax(1);

        q = V(:, emax);                     %Gets eigenvector corresponding to maximum eigenvalue
        q = real(q);                        %Get rid of imaginary part caused by numerical error
        [trash, ii] = max(abs(q));
        sgn = sign(q(ii(1)));
        q = q*sgn;                          %Sign ambiguity

        %map to orthogonal matrix
        quat = q(:);
        nrm = norm(quat);
        if ~nrm
            disp 'Quaternion distribution is 0'
        end


    end
end

if nargout > 2
    l2norm = @(M, dim) sqrt(sum(M.^2, dim));
    err = l2norm(Bfit - B, 1);
    if ~isempty(weights), err = err.*sqrtwts; end

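Since absor provides the core of the Kinect-to-PET calibration, a quick synthetic test is a convenient way to verify that the registration behaves as expected before it is applied to measured marker positions. The following self-contained check, with illustrative values, constructs a known rotation and translation, applies them to random source points and confirms that absor recovers them:

% Synthetic sanity check for absor (illustrative values only)
rng(0);
A = rand(3, 10);                                               % 10 random 3D source points
ang    = deg2rad(10);
R_true = [cos(ang) -sin(ang) 0; sin(ang) cos(ang) 0; 0 0 1];   % 10 degree rotation about z
t_true = [0.05; -0.02; 0.10];                                  % translation in metres
B = R_true*A + repmat(t_true, 1, size(A, 2));                  % corresponding target points

[regParams, Bfit, ErrorStats] = absor(A, B);
disp(max(abs(regParams.R(:) - R_true(:))));                    % should be close to zero
disp(max(abs(regParams.t    - t_true)));                       % should be close to zero
disp(max(abs(Bfit(:) - B(:))));                                % residual of the fitted points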

%This script maps the head pivot from Kinect's coordinate system to PET's
%coordinate system and plots the final x, y, z coordinates.

head_pivot = head_pivot';

%replace invalid frames with the mean of neighbouring frames
[i, j] = find(head_pivot == 0);
for k = 1:length(j)/3
    head_pivot(:, j(3*(k-1)+1)) = mean([head_pivot(:, j(3*(k-1)+1)-1), head_pivot(:, j(3*(k-1)+1)+1)], 2);
end


function [x, y, z] = decompose_rotation(R)
    x = atan2(R(3,2), R(3,3));
    y = atan2(-R(3,1), sqrt(R(3,2)*R(3,2) + R(3,3)*R(3,3)));
    z = atan2(R(2,1), R(1,1));
end

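The angle-mapping script below calls a helper function compose_rotation whose listing is not reproduced here. Under the convention implied by decompose_rotation above (R = Rz(z)*Ry(y)*Rx(x), with x, y, z the rotation angles about the respective axes), a matching implementation would look roughly as follows; this is a sketch of the assumed convention, not the original listing:

function R = compose_rotation(x, y, z)
% Build R = Rz(z)*Ry(y)*Rx(x), i.e. the convention inverted by decompose_rotation above.
Rx = [1 0 0; 0 cos(x) -sin(x); 0 sin(x) cos(x)];
Ry = [cos(y) 0 sin(y); 0 1 0; -sin(y) 0 cos(y)];
Rz = [cos(z) -sin(z) 0; sin(z) cos(z) 0; 0 0 1];
R  = Rz*Ry*Rx;
end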

%%This script maps the measured angles from Kinect's coordinate system to PET's
%%coordinate system and plots the final pitch, roll, yaw angles.

%replace invalid frames with the mean of neighbouring frames
[i, j] = find(rotation == 0);
for k = 1:length(i)
    rotation(i(k), :) = mean([rotation(i(k)-1, :); rotation(i(k)+1, :)], 1);
end
clear i j

pitch = rotation(:, 1);
yaw   = rotation(:, 2);
roll  = rotation(:, 3);

%from degrees to radians
x = deg2rad(pitch);
y = deg2rad(yaw);
z = deg2rad(roll);

%initialize matrices
R = [];
x_final = []; y_final = []; z_final = [];

%compose rotation matrix for every frame
for i = 1:size(x, 1)
    R_temp = compose_rotation(x(i), y(i), z(i));
    R = [R, R_temp];
end

%map from one coordinate system to the other
rotation_final = Transf^(-1)*R;

%decompose rotation matrix
N = size(rotation_final, 2);
for k = 1:(N/3)
    [x_final(k), y_final(k), z_final(k)] = decompose_rotation(rotation_final(:, (3*(k-1)+1):(3*k)));
end

%from radians to degrees
final = [rad2deg(x_final); rad2deg(y_final); rad2deg(z_final)];

%averaging over 5 frames
for j = 1:(size(final, 2)/3)

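The body of the final averaging loop is not reproduced in the listing above. A minimal sketch of a five-frame moving average over the angle traces, assuming final is a 3 x Nframes matrix as constructed above, could be written as:

% Five-frame moving average over the frame dimension (a sketch, not the original loop body)
final_avg = movmean(final, 5, 2);                  % MATLAB R2016a or later
% On older releases: final_avg = conv2(final, ones(1,5)/5, 'same');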
