DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2017
Online Predictions of Human Motion
ANDREAS EDVARDSSON
LUCAS GRÖNLUND
Abstract
Collaboration between humans and robots is becoming an increasingly common occurrence in both industry and homes, more so with every technological advance. This paper examines the possibility of predicting human hand movements on the fly, i.e. using only information available up to the moment at which the prediction is carried out.
Specifically, data will be collected using a Kinect (v.1).
The model used for the predictor developed is the Minimum Jerk model, which states that certain multi-joint reaching movements are planned so that the hand follows a straight path while maximizing smoothness.
The extent, direction and duration of the motion are the main quantities the predictor must determine, with a Kalman filter and curve fitting as the main constituents. Another assumption in this work is that a reliable start detector is available. An experiment in which five volunteers performed different reaching movements was conducted.
This study shows that the approach is feasible in some cases: usable predictions are acquired for long movements. For short movements, making no prediction at all was by all means the better alternative.
Referat
Collaboration between humans and robots is becoming increasingly common in both industry and households, more so with every further technological advance. This work examines the possibilities of performing predictions of hand movements online, i.e. taking into account only the information available up to the moment at which the prediction takes place. All data will be collected with a Kinect (v.1).

The assumption made is that a good start detector is available. The proposed and examined predictor is based on the minimum-jerk model, which holds that multi-joint reaching movements are performed in such a way that the hand follows a straight path with maximal smoothness. The extent, direction and duration of a movement constitute the main quantities that must be determined. An experiment in which volunteers performed different reaching movements was conducted.

It turns out that this predictor gives usable predictions for longer movements, while making no prediction at all is the better alternative for short movements. Finally, some suggestions for improving the predictor are presented.
Preface
We would like to thank everyone who participated in the experiment, helping us collect data, and Christian Smith for providing us with valuable information and support.
Contents

1 Introduction
  1.1 Previous work
  1.2 Goals
  1.3 Theory
    1.3.1 Minimum Jerk
    1.3.2 Kalman Filter
2 Method
  2.1 Problem
  2.2 Implementation
    2.2.1 Equipment
    2.2.2 Limitations
    2.2.3 Design considerations
    2.2.4 The Kalman filter
    2.2.5 Predictor
  2.3 Evaluation
    2.3.1 Experiment
  2.4 Analysis
    2.4.1 Good results
3 Results
4 Analysis
  4.1 The influence of previous experience
  4.2 Radial and angle errors
5 Discussion
  5.1 Future Work
  5.2 Conclusion
1 Introduction
In future generations of robots, close collaboration with humans will be a more common occurrence. When two humans interact, for example in a handover motion, the interaction has often begun long before the actual movement starts, through eye contact and body posture among other things.[9]
This all comes naturally to humans. If a robot is unable to mimic and interpret these signals, the interaction might feel unnatural and thus make humans less likely to proceed with the motion. For a robot to achieve more natural behavior in a handover motion, it must, among other things, be able to predict when a handover motion starts and where it will end.[11] This would enable the robot to start its motion and meet the human, making the interaction more human-like and possibly more efficient.
1.1 Previous work
The minimum-jerk model has been shown to describe human motion and can thus be used to predict future motions accurately.[16][13] Previous studies on human-robot interaction have shown that when a minimum-jerk motion is used as a basis for the robot, as opposed to a conventional trapezoidal one, reaction time is significantly shorter.[6] Another study suggests that in ordinary human-human interactions, about half of the giving motion has usually been carried out before the receiver reacts.[1]
Without knowing the end of a motion, fitting a minimum-jerk polynomial would involve two unknown parameters: the time and the position at which the movement ends. If the duration of a motion could instead be detected early, only one unknown parameter would remain to fit. Different ways of predicting the duration of a movement have been tried previously, such as using the acceleration or velocity profile of a minimum-jerk motion.[12][10]
1.2 Goals
This study focuses on predicting a reaching movement's final position early on using a relatively cheap sensor, the Kinect v.1, contrary to most previous studies, where more expensive motion capture equipment has been used. Since reaction time in human-human interactions seems to be up to about half the duration of the total motion, predictions up to the 50 % point of the completed movement are of interest, the goal being better performance than not using any prediction at all. The problem of finding the starting points of the motions will not be considered.
1.3 Theory
In order to make predictions, assumptions on how humans move have to be made. A widely used model is the minimum jerk model, which has been shown to capture certain properties of human reaching movements.[3] The Kalman filter[15] will also be central to the approach suggested in this study.
1.3.1 Minimum Jerk
One of the most prevalent mathematical models for predicting human movements is the minimum jerk model. For unconstrained point-to-point movements, the objective that human bodies seem to strive towards is to generate the smoothest motion that brings the hand from the initial position to the final position in a given time. One way to accomplish this is to minimize the mean-square of the jerk¹.[3][13]
J = \frac{1}{2} \int_{t_{\mathrm{start}}}^{t_{\mathrm{finish}}} \left( \frac{d^3 x}{dt^3} \right)^2 dt \qquad (1)
Under specific conditions the minimum jerk model can be simplified into a fifth degree polynomial, with the constraints being that the movement starts and ends with zero velocity and acceleration. Assuming this, the polynomial in equation (2) is acquired.
x(t) = x_0 + (x_0 - x_f)\left(15\tau^4 - 6\tau^5 - 10\tau^3\right) \qquad (2)

where x_0 is the initial position of the hand, x_f is the final position and \tau = t/t_f is the normalized time parameter. The position, velocity and acceleration profiles are shown in Figure (1).
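As an illustrative sketch (not code from the thesis), equation (2) and its time derivative can be evaluated numerically; the function names below are ours:

```python
import numpy as np

def min_jerk_position(t, x0, xf, tf):
    """Minimum-jerk position profile, eq. (2):
    x(t) = x0 + (x0 - xf)(15 tau^4 - 6 tau^5 - 10 tau^3)."""
    tau = t / tf
    return x0 + (x0 - xf) * (15 * tau**4 - 6 * tau**5 - 10 * tau**3)

def min_jerk_velocity(t, x0, xf, tf):
    """Time derivative of eq. (2); peak speed occurs at tau = 0.5."""
    tau = t / tf
    return (x0 - xf) / tf * (60 * tau**3 - 30 * tau**4 - 30 * tau**2)

# Profiles for a 1 m reach in 1 s, as in Figure (1).
t = np.linspace(0.0, 1.0, 101)
x = min_jerk_position(t, 0.0, 1.0, 1.0)
v = min_jerk_velocity(t, 0.0, 1.0, 1.0)
```

The boundary conditions can be checked directly: the trajectory starts at x_0, ends at x_f, and the velocity is zero at both ends, matching the constraints stated above.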
When the final position changes after a movement has begun, superposition of several minimum jerk motions has been suggested as a way to adapt the motion.[2] In this study, however, we only consider a single minimum jerk motion, not superpositions of several.
1.3.2 Kalman Filter
A Kalman filter is commonly used when the states of a linear model cannot accurately be calculated directly due to observation (measurement) noise.
Common applications include navigation, radar tracking and certain imple- mentations in econometrics. The classical Kalman filter, which is used in
¹ The time derivative of the acceleration.
Figure 1: Minimum jerk profiles. (a) Minimum jerk trajectory (displacement [m] vs. time [s]); (b) minimum jerk velocity profile [m/s]; (c) minimum jerk acceleration profile [m/s²].
this study, requires the model to be linear and the noise to be Gaussian in order to guarantee successful operation.[14]
To use a Kalman filter we need a state vector, a model and measurements.
The state vector contains all parameters needed to describe the movement and make a prediction, such as position, velocity and acceleration. The model describes how the system will behave as a function of the current state and the time step.
As mentioned before this is always a linear function of the states when using an ordinary Kalman filter.
\text{State vector:} \quad \mathbf{x}_t = \begin{bmatrix} x_{0,t} \\ x_{1,t} \\ \vdots \\ x_{n,t} \end{bmatrix}, \qquad \text{Model:} \quad \mathbf{x}_{t-1} \xrightarrow{\text{model}} \mathbf{x}_t \qquad (3)
Reality often differs quite a bit from the model, so an additional term is added to the state equation - the process noise v t , which is assumed to be Gaussian. This can all be expressed in vector form with F as the model matrix:
x_t = F x_{t-1} + v_{t-1} \qquad (4)
The measured data does not have to contain exactly the same parameters as the state vector. For example, predictions about a movement's velocity could be acquired using only position data. The extraction of relevant predictions from the state vector is done by multiplication with an observation model matrix H that maps from the state space to the observation space. Measurements are often accompanied by noise, which is assumed to be Gaussian, so a measurement noise term w_t is added to the equation:
z_t = H x_t + w_t \qquad (5)
The actual estimation is now done by taking a linear function of the predicted state and the difference between the actual measurement and the predicted measurement, weighted by the Kalman gain K:

x_{t,est} = x_t + K(z_{t,true} - z_t) \qquad (6)
A derivation of the Kalman gain will not be presented here, but the idea behind it is to look at estimations of the covariance matrices of the state and measurement predictions and use this to minimize the a posteriori error covariance. The estimated state covariance matrix P t tells us the variability of the state, and in the same way the estimated measurement covariance matrix S t tells us the variability of the measurements. Intuitively we want the Kalman gain to be small if the measurement noise is large, and the reverse if the process noise is large. The optimal Kalman gain[15] can be written as:
K = P_t H^T S_t^{-1} \qquad (7)
The idea behind a Kalman filter is to combine the predicted state and measurement with the actual sensor information, and thus get a better estimate. Only the previous estimated state and the current measurements are needed, making it ideal for use in a real-time scenario.[15, 5]
Each time new data is acquired, a cycle of two steps starts. First, during the prediction step, the predicted state and covariance are calculated. Then, during the update step, the Kalman gain is computed along with the estimates of the state and the covariance matrix.
Figure 2: Kalman filter cycle diagram.
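The two-step cycle above can be sketched as a single function. This is a generic textbook formulation of equations (4)-(7), and `kalman_step` is a hypothetical helper, not code from this study:

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of the classical Kalman filter."""
    # Prediction step: propagate state and covariance through the model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update step: innovation covariance S, gain K = P H^T S^-1 (eq. (7)),
    # then correct the prediction with the measurement residual.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_est = x_pred + K @ (z - H @ x_pred)
    P_est = (np.eye(len(x)) - K @ H) @ P_pred
    return x_est, P_est
```

Repeated calls on incoming measurements drive the estimate towards the true state; as noted above, a large measurement noise R yields a small gain, and a large process noise Q the reverse.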
It is obvious that the model plays a huge role, but the requirement of linearity might seem limiting in applications involving real physical phenomena, since those are more often than not only approximately linear in some interval. Moreover, even with the linearity requirement removed, it is reasonable to assume that the model will not exactly describe the process in question. This problem, which is one common cause of filter divergence, is mitigated by including a model noise term. If the model linearisation errors are big, one could instead use the Extended Kalman filter.[14]
2 Method
In this section the problem and the proposed predictor are defined, as well as the analysis to be carried out. In addition, the question of what constitutes a good result is discussed.
2.1 Problem
One wishes to construct a causal predictor P delivering final position predictions \hat{x}_f = P(y_t, y_{t-1}, \ldots, y_0) such that the error ε, defined in (8), is minimized early on, where x_f is the true final position and y_{t-n} = y_{t-n}(x, y, z) is the measured data from n time steps before. Moreover, it should be able to operate successfully with data of lower quality than that of traditional motion capture laboratories. Final values of quantities are denoted with subscript f, while predictions are marked with a hat; hence the prediction of the final position is written \hat{x}_f.
\varepsilon := |x_f - \hat{x}_f| \qquad (8)
2.2 Implementation
Here the implementation and methods of the proposed predictor are described in greater detail.
2.2.1 Equipment
A Microsoft Kinect [7] with a frame rate of 30 fps was used together with Kinect for Windows SDK 1.7 for skeleton tracking. The joint positions reported by the development tool kit were recorded in a Cartesian coordinate system defined by the Kinect. Only the right hand joint data was saved and used in this study.
2.2.2 Limitations
A simple start detection algorithm based on edge detection using exponential moving averages was developed. However, for this study it was assumed that start points could be found consistently. That is, starting points acquired using edge detection that were not consistent with those of other movements were tuned by hand.
2.2.3 Design considerations
Since the main focus of this study was good performance during roughly the first 50 % of the movement, little work was put into performance during the later parts of a movement.
2.2.4 The Kalman filter
Because of the somewhat noisy data acquired from the Kinect, a Kalman filter was used to track motions more reliably. The state vector was defined as
\mathbf{x}_t = \begin{bmatrix} x_t \\ \dot{x}_t \\ \ddot{x}_t \end{bmatrix} \qquad (9)
where x_t is the position, \dot{x}_t the velocity and \ddot{x}_t the acceleration. Using the Wiener-process acceleration model[8], where the acceleration is assumed to be nearly constant, the state equation is
\mathbf{x}_t = F \mathbf{x}_{t-1} + v_{t-1}, \qquad F = \begin{bmatrix} 1 & T & T^2/2 \\ 0 & 1 & T \\ 0 & 0 & 1 \end{bmatrix} \qquad (10)
where T is the time between the acquisition of two data points (in this case 33 ms because of the Kinect's 30 FPS) and v_{t-1} is the process noise. Since the Kinect only measures position data, the measurement equation is
z_t = H \mathbf{x}_t + w_t, \qquad H = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \qquad (11)
where w t is the measurement noise. As mentioned in [8], the process noise error covariance Q is set to be:
Q = \begin{bmatrix} T^5/20 & T^4/8 & T^3/6 \\ T^4/8 & T^3/3 & T^2/2 \\ T^3/6 & T^2/2 & T \end{bmatrix} \qquad (12)
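As a sketch, the matrices of equations (10)-(12) can be built directly for the Kinect's 30 FPS sampling; the variable names are ours, and Q is written here up to a scalar noise-intensity factor:

```python
import numpy as np

T = 1.0 / 30.0  # Kinect v1 frame time, ~33 ms

# State transition for the Wiener-process acceleration model, eq. (10).
F = np.array([[1.0, T,   T**2 / 2],
              [0.0, 1.0, T],
              [0.0, 0.0, 1.0]])

# Only position is measured, eq. (11).
H = np.array([[1.0, 0.0, 0.0]])

# Process noise covariance, eq. (12).
Q = np.array([[T**5 / 20, T**4 / 8, T**3 / 6],
              [T**4 / 8,  T**3 / 3, T**2 / 2],
              [T**3 / 6,  T**2 / 2, T]])
```

A quick sanity check: propagating the state (0, 0, 1) (unit acceleration, at rest) one step through F gives position T²/2 and velocity T, as constant-acceleration kinematics requires.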
2.2.5 Predictor
Simply performing a least squares fit of the minimum jerk polynomial (2) to the data in each of the x, y, z coordinates turns out to be problematic for a couple of reasons. Mainly, the number of data points is potentially very low because of the desire to make an early prediction, making the fit of a fifth degree polynomial with two unknowns very sensitive to measurement errors.
In order to combat this, it is suggested to instead look at the time derivative of this polynomial, (15), and determine one of the unknowns, t_f, first.[13]
While the raw Cartesian data from the Kinect is kept for direction predictions, the first step is a conversion to a spherical coordinate system with the origin set to the (Cartesian) start position of the movement.
\mathbf{x} = (x, y, z) \rightarrow \mathbf{r} = (r, \phi, \theta) \qquad (13)

Using the spherical representation, the radial component r is fed into the Kalman filter, from which the most likely velocity profile belonging to the data points is obtained.

r_t \xrightarrow{\text{Kalman}} v_t \qquad (14)
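A minimal sketch of the coordinate conversion in equation (13). The angle convention below (phi as azimuth, theta as polar angle) is our assumption, since the thesis does not spell it out:

```python
import numpy as np

def to_spherical(p, origin):
    """Convert a Cartesian point to (r, phi, theta), with the origin
    moved to the movement's start position as in eq. (13).
    Convention (an assumption): phi = azimuth, theta = polar angle."""
    d = np.asarray(p, dtype=float) - np.asarray(origin, dtype=float)
    r = np.linalg.norm(d)
    phi = np.arctan2(d[1], d[0])
    theta = np.arccos(d[2] / r) if r > 0 else 0.0
    return r, phi, theta
```

The radial component r returned here is what gets fed into the Kalman filter in equation (14).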
Finding the time it takes to perform the movement is now done using this velocity data, since the start of a movement is more discernible in velocity space. Fitting the minimum jerk time derivative, equation (15), using Matlab's least squares method gives a guess \hat{t}_f of the motion's duration.
\dot{x}(t) = \frac{x_0 - x_f}{t_f}\left(60\tau^3 - 30\tau^4 - 30\tau^2\right) \qquad (15)

With this duration guess, an ordinary minimum jerk polynomial, equation (2), with "known" time \hat{t}_f could be fitted to the unfiltered radial position data r_t. That is, the extent of the true movement r_f was predicted this way, giving \hat{r}_f. See Figure (3) for illustrative predictions of a movement.
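The duration fit can be sketched on synthetic data. The thesis uses Matlab's least squares on equation (15); the stand-in below exploits that the model is linear in the amplitude (x_0 - x_f), so for each candidate t_f the best amplitude has a closed form and a simple grid search over t_f finds the least-squares duration. All names and numbers here are ours:

```python
import numpy as np

def mj_velocity_shape(t, tf):
    """Shape of the minimum-jerk velocity profile, eq. (15),
    without the amplitude factor (x0 - xf)."""
    tau = t / tf
    return (60 * tau**3 - 30 * tau**4 - 30 * tau**2) / tf

def fit_duration(t, v, tf_grid):
    """Least-squares duration guess t_f. For each candidate t_f the
    optimal amplitude is solved in closed form; the candidate with the
    smallest residual wins."""
    best_tf, best_res = None, np.inf
    for tf in tf_grid:
        s = mj_velocity_shape(t, tf)
        amp = (s @ v) / (s @ s)          # closed-form linear LS amplitude
        res = np.sum((v - amp * s) ** 2)
        if res < best_res:
            best_tf, best_res = tf, res
    return best_tf

# Hypothetical early data: first 40 % of a 1.0 s minimum-jerk reach.
t_data = np.linspace(0.01, 0.4, 12)
v_data = -0.4 * mj_velocity_shape(t_data, 1.0)   # amp = x0 - xf = -0.4
tf_hat = fit_duration(t_data, v_data, np.arange(0.5, 2.01, 0.01))
```

On noiseless partial data the true duration is recovered exactly; with Kinect noise, the Kalman-filtered velocity profile takes the place of `v_data`.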
However, knowing \hat{r}_f and the origin is not enough to uniquely identify a point in a spherical coordinate system; the direction (\phi, \theta) is required as well. This was done by treating all the raw Cartesian data points belonging to a reaching movement as a point cloud, to which a best-fit line was computed using Matlab's least squares method. The direction of this line was taken as the direction guess \hat{d}_f. Note that this was calculated in Cartesian coordinates.
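The line fit can be sketched as follows. The thesis uses Matlab's least squares; the SVD-based fit below is an equivalent stand-in (the first principal direction of the centered point cloud is the orthogonal least-squares line direction), and the helper name is ours:

```python
import numpy as np

def direction_guess(points):
    """Fit a best line to a Cartesian point cloud and return its unit
    direction, oriented from the first point towards the last."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    # First right-singular vector = direction of greatest spread.
    _, _, vt = np.linalg.svd(centered)
    d = vt[0]
    if d @ (pts[-1] - pts[0]) < 0:   # orient along the travel direction
        d = -d
    return d / np.linalg.norm(d)
```

In the predictor, this direction combined with the extent guess \hat{r}_f and the start position pins down the predicted final point.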
Every time new data is acquired, these steps are carried out anew by the predictor P.
2.3 Evaluation
Here the experiment conducted is explained, followed by a short discussion of the analysis and of what constitutes good results.
Figure 3: Predictions. (a) Predictions on a movement (displacement [m] vs. time [s]); (b) predictions on a Kalman filtered velocity profile ([m/s] vs. time [s]).
2.3.1 Experiment
An experiment in which people were asked to perform a simple reaching movement to different points in space was conducted. The setup consisted of a table with eight small targets at two different heights (15 cm, 40 cm) in five directions with respect to the subject's reaching arm, and a marker indicating the start position (height 30 cm). Five persons plus the authors of this paper performed five motions in sequence towards each of the targets. A Kinect (v.1) placed approximately 2 meters away at an angle of about 45° was used to capture the data.
A target consisted of a stick with a paper pulp ball in one end and a block used for attachment to a table in the other.
Each subject was instructed to perform a natural reaching movement from the start marker to, or as close as possible to, the target marker.
2.4 Analysis
The analysis is mainly carried out as a comparison between the proposed predictor and the naive approach of taking the latest measured position as the final position guess: \hat{x}_{f,naive} = y_t. The naive error is defined as \varepsilon_{naive} = |x_f - \hat{x}_{f,naive}|, analogous to the prediction error ε.
A separation of the radial (extent) guess and the direction guess, for the long and short movements respectively, will be made in separate figures, in order to draw more precise conclusions on the predictor's performance. The radial error is defined in a similar way to the total error ε: simply subtract the extent (radial) guess \hat{r}_f from the true value r_f, leading to the radial error defined in equation (16).

Figure 4: Final target placement.

Figure 5: Experiment setup illustration.

Figure 6: Setup idea, approximates the final experiment.
\rho = |r_f - \hat{r}_f| \qquad (16)
The direction guess \hat{d}_f = \hat{d}_f(x, y, z) is compared to the true direction d_f = d_f(x, y, z) using the planar angular error α, which follows from geometrical properties, leading to the expression in equation (17).

\alpha = \arctan\left( \frac{|\hat{d}_f \times d_f|}{\hat{d}_f \cdot d_f} \right) \qquad (17)
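Equation (17) can be computed directly. The helper name below is ours, and `arctan2` stands in for the plain arctan so the angle stays well defined when the directions are perpendicular or more than 90° apart:

```python
import numpy as np

def angular_error(d_hat, d_true):
    """Planar angular error alpha of eq. (17):
    alpha = arctan(|d_hat x d_true| / (d_hat . d_true))."""
    d_hat = np.asarray(d_hat, dtype=float)
    d_true = np.asarray(d_true, dtype=float)
    cross = np.linalg.norm(np.cross(d_hat, d_true))
    dot = d_hat @ d_true
    # arctan2 keeps the angle in [0, pi] even when the dot product
    # is zero or negative.
    return np.arctan2(cross, dot)
```

Since both the cross-product magnitude and the dot product scale with the vector lengths, the result is independent of whether the direction guesses are normalized.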
2.4.1 Good results
In a human-robot interaction, as long as ε is relatively small the human can always adapt its motion to suit the robot better. In this study, ε is the prediction error early on during the movement, so what is considered "small" depends on the situation. During a normal handover motion, for example handing over a tool in an industrial setting, an error of about 15 cm should be acceptable.
3 Results
The most obviously interesting measure is how the error ε in equation (8) depends on the proportion of the movement captured. Figure (7) shows both this error and the naive error ε_naive for each of the movements. The values up to 50 % are the most relevant to this study because of the characteristics of human-human handover motions, as mentioned earlier.

A natural second question is how the different parts of the guess, mainly the extent \hat{r}_f and the direction (\hat{\phi}, \hat{\theta}), contribute to the error. Figures (8) and (9) break up the total error into these components for the long and short movements respectively.

When making a predictor, certain assumptions about the properties of a movement have to be made. Does knowledge of the underlying assumptions, or experience with the predictor, affect the performance? Figure (10) attempts to illustrate this by comparing the recruited volunteers in the experiment to the authors of this paper performing the same experiment.

Table (1) attempts to give a quick overview of the performance by indicating the fraction of predictions with an error ε less than a certain amount, as a function of the part of the movement recorded. That is, the percentage of movements with ε < r, where r is the radius of a sphere centred at the final position.
Table 1: Percentage of position predictions after a given time within a sphere of given radius centred around the goal.

                    % of movement done
radius [cm]      10      20      30      40      50
      5           0    22.4    27.3    26.8    27.8
     10        10.2    47.8    56.4    52.1    50.6
     15        33.9    61.2    76.4    71.8    69.6
     20        57.6    82.1    87.3    84.5    88.6
     25        74.6    89.6    92.7    93.0   100.0
Figure 7: The absolute error of predictions in 3D space [m] plotted against the percentage of a movement done at the time of the prediction (panels (a)-(h): movements 1-8). The naive prediction is no prediction at all, i.e. to suppose that the latest measured position is final. Movement n corresponds to a motion from the start position to position n according to Figure (5). The errors ε are defined as in equation (8).
Figure 8: Errors for different components of the predictions on long movements, plotted against the percentage of the movement done: (a) radial error [m], (b) angular error [degrees], (c) total error [m]. The radial error ρ is defined in eq. (16), the angular α in eq. (17) and the total in eq. (8).
[Figure: (a) radial error [m], (b) angular error [degrees] and (c) total error [m] plotted against the percentage of the movement done, for the short movements.]