
Feature Assessment for the Analysis of Latin Dance

Modeling Two-Person Dance with Machine Learning

Nicholas J Shindler

A thesis presented for the degree of Master of Computer Science

Örebro University, Sweden, 29.06.2020


Abstract

Investigating which aspects of the embedding space are relevant to couple dance is crucial for further analysis of two-person dances. Due in part to data complexity, previous models of two-person dance have been relatively simple. This project aims to determine a preferred method for encoding two-person dance skeleton data, in order to create more complex models for the classification of dance moves.

Contents

1 Introduction
  1.1 Motivation
    1.1.1 Extension of existing research
    1.1.2 Application beyond dance
    1.1.3 Building blocks of two-person dance modeling
  1.2 Problem
    1.2.1 Calculating Features
    1.2.2 Classifying Dance State
2 Background
  2.1 Related Work
    2.1.1 One-Person Dance
    2.1.2 Two-Person Dance
  2.2 Motion Capture
    2.2.1 Qualisys Motion Capture System
  2.3 Motion Tracking
    2.3.1 Kizomba Dance
    2.3.2 Qualisys Track Manager
  2.4 Dance Classification
    2.4.1 Laban Motion Analysis
    2.4.2 Non-verbal communication in dance
  2.5 Algorithms
    2.5.1 Pearson Correlation Coefficient
    2.5.2 Support Vector Machines (SVM)
    2.5.3 K-Nearest Neighbor
  2.6 Ethical Considerations
    2.6.1 Motion Tracking Limitations
    2.6.2 Interaction Tracking
3 Methods and Implementation
  3.1 Tools
  3.2 Experiment Method
    3.2.1 Data Processing
    3.2.2 Data Formatting
    3.2.3 Feature Calculation: Laban Motion
    3.2.4 Feature Calculation: Dancer Interaction
    3.2.5 Feature Assessment
    3.2.6 Classification
  3.3 Experiment Implementation
    3.3.1 Data Formatting
    3.3.2 Feature Calculation: Laban Motion
    3.3.3 Feature Calculation: Dancer Interaction
    3.3.4 Feature Assessment
    3.3.5 Classification
4 Results
  4.1 Motion Capture
    4.1.1 Dance Data Processing
  4.2 Dance Analysis
    4.2.1 Feature Assessment
    4.2.2 Classification
5 Discussion
  5.1 Data Collection
    5.1.1 Active Markers
    5.1.2 Configuration Testing
  5.2 Motion Capture
    5.2.1 Labeling
    5.2.2 Feature Calculation
    5.2.3 PCC
    5.2.4 Testing limitations
  5.3 Dance Classification
  5.4 Experimental Improvements
    5.4.1 Motion Tracking
    5.4.2 Features
    5.4.3 Classification
  5.5 Personal Reflection
    5.5.1 Dance
    5.5.2 Machine Learning
  5.6 Future Work
6 Conclusion

1 Introduction

Modeling couple dance is often considered an attractive setting for studying how to model interactions between humans and robots [1], as well as a necessary step in the characterization and analysis of human-human interaction in dance and beyond. The goal of this paper's work has been to create a method of classifying moves made in Latin dance.

In this project, different methods of calculating features will be tested for their ability to represent the state of a two-person dance and performance will be evaluated using machine learning classifiers and statistical evaluation.

1.1 Motivation

A key aspect of creating a computer system with the ability to analyze two-person dance is identifying the moves that a dancer is performing. Move identification is part of classifying the type of dance as a whole, creating robots that can dance with a human, and developing a system to evaluate the quality of a couple’s dancing.

1.1.1 Extension of existing research

While there is a substantial body of work studying single-person dance, very little work on two-person dance has been published, particularly for classification. The extant work in similar applications gives our research a starting point and quantifiable expectations for what is possible when modeling dance. However, it leaves a wide range of unanswered questions about how going from one-person to two-person dance will impact tracking and classification. It is expected that tracking becomes more difficult with a second body, and that the nature of the movements will change because dancers react to their partners.

1.1.2 Application beyond dance

Representing the motion of two people interacting to perform a shared task is a high dimensional and complex problem, but very important for a robot’s ability to interact with people, particularly in the service industry [2]. Two-person dance can be used as a structured environment to investigate these interactions. The first step in accurate analysis of human motion is to determine the relevant features in the data.

1.1.3 Building blocks of two-person dance modeling

These same complexities make understanding dance an intricate and exciting task. Creating a model for dance classification can bring a better understanding of how different styles of dance relate to each other. Furthermore, such a representation may allow evaluating dancers both in instructional environments [3] and with regard to overall performance and their ability to dance together as a couple.

1.2 Problem

This project is exploring different methods for characterizing the state of a two-person dance at a given time. Based on previous work in single-person dance, we will focus on calculating features based on skeleton data gained from a marker-based motion tracking system. This approach will give us detailed positional data of the two dancers. Using the positional data, we can calculate values using motion theories to derive meaningful numerical representations of qualitative descriptors. For this project, we will use two different motion theories: Laban Motion Analysis, based on work in single-person dance motion tracking [3]; and a basic two-person dance theory for representing aspects of non-verbal communication, used in research into human-robot dance [4][5]. We will use classification performance to evaluate the effectiveness of the features. Therefore a set of features that represents a dance well should result in high classification accuracy.

1.2.1 Calculating Features

A move in a dance is a loosely defined combination of body movements that two people make in space and in relation to each other, and that, when put together, make up a dance of a particular style. Laban Motion Analysis is a method of formalizing the movements that people make and has been used successfully to represent dance states for single-person dance. This project hopes to build on this method to create an analogous set of features to describe two people dancing.

Research into creating robotic dance partners or simulated dance partners that can react to humans, however, often takes a different approach: tracking how dancers communicate intention rather than the state of the dance. In this project, the goal is to extend the number of calculated features used to represent the dancers' communication, as all previous work in this area has used a minimal selection of features in its experiments.

1.2.2 Classifying Dance State

Assessing the data also requires testing the features, in this case for their ability to classify dance moves. Originally we intended to implement classification with an LSTM neural network because we are working with time-series data; however, due to high amounts of noise and uneven label distribution, we decided to use KNN and SVM machine learning classifiers. Additionally, in previous papers on single-person dance [3], a simpler approach using data correlation coefficients, calculated with the Pearson Correlation Coefficient, was used to gauge the similarity of two segments of dance data at a single time point. Therefore, for this project we look at both higher-level machine learning solutions, creating a trained model for classifying dance movements, and simple correlations with PCC. A trained model's advantages come at the cost of requiring a more extensive data set, and a complicated optimization of parameters and validation to record meaningful results.

2 Background

This paper builds on the work done in Folk Dance Evaluation Using Laban Movement Analysis [3], which uses Laban Motion Analysis to classify traditional, single-person, Mediterranean dances. This paper expands the application of Laban Motion Analysis to two-person Latin dance.

Couple dance is a much more complex relationship than single-person dance, and there are various aspects of increased complexity in modeling partner dance. Moving from single-person to two-person dance, the movements of a person become linked to their partner's actions. This link is generally a physical connection by the hands, but is not limited to hand-to-hand contact [4]. Couples in two-person dance are constantly reacting and adjusting to their partner's movements; therefore one person's movement at a given time will affect a move made by the other person at a future time point [6]. Couple dance is structured around a basic step, with additional, generally predefined movements that can be executed around the framework of this step pattern, as well as combined. Such movements may therefore not be performed in the same way or with a repeated pattern, including variations in synchronization based on dancer skill [7]. Thus, the way the feature space is calculated is particularly essential, as it is necessary to capture the critical aspects of a dance move without getting lost in the noise.

While not part of this project, data collection is a large part of the work needed to classify movement and is a particularly complicated task for two-person dance. Motion tracking involves the detection of markers by high-speed cameras. Markers placed on the body ideally need to be visible to at least two cameras at all times. Due to the nature of two-person dance, particularly Latin dance, partners are very close to each other. This proximity leads to problems where a partner's body can often block markers from the cameras' view. Additionally, due to the nature of the movement, there can be issues with markers being knocked off a dancer. While the data used in this paper was pre-collected, these issues still heavily impacted the work done in subsequent steps.

2.1 Related Work

2.1.1 One-Person Dance

The principal work in single-person dance has been done by Dr. Andreas Aristidou at the University of Cyprus. The first paper of interest in this context is Folk Dance Evaluation Using Laban Movement Analysis, as it discusses a method of calculating features in single-person dance and comparing different dancers based on the calculated features [3]. In this paper, a set of features based on Laban Motion is presented. The features are used for calculating the similarity between a student emulating a teacher and the teacher performing a dance, allowing the evaluation of dancer performance based on the calculated correlation between the student and the teacher over a set of features. This approach was also used for the analysis of the emotional state of a dancer, with application toward motion synthesis [8]. For analysis of the emotional state, dancers were instructed to perform while expressing a particular emotion. In the context of LMA, these dance segments were analyzed to track how each emotional state changed the resulting features. This emotional analysis was used to provide generative motion constraints based on LMA, with the goal of increasing the believability of synthesized dance. While the evaluation appeared to be based principally on a subjective survey of human evaluators, the method notably increased the perceived cohesiveness of the dance. In a related study identifying emotion in theater, a neural network was trained to identify the same emotional states used in the previously mentioned paper [9].


This research displays effectiveness in using the LMA calculations to correctly classify an individual’s emotional states using a basic neural network.

2.1.2 Two-Person Dance

Most existing research into two-person dance focuses on its application to creating a virtual dance partner. A primary resource was research into creating a simulated partner using haptic feedback for assisting patients with Parkinson's disease with mobility [4]. In this paper, a subject moved in hand-to-hand contact with a robot. The robot was controlled through a combination of bio-mechanical calculations, namely center-of-mass differences and estimates of interaction forces. There is additional research into haptic-only signaling for dance coordination, including [10], which showed that, using a predefined set of movements, a human can follow a robotic lead using haptic signaling from the robot, with music to align timing.

Further research in this area [11] extends the work in haptic signaling by implementing a two-user system in which the mechanical lead is controlled by force sensors on a human's arms, and the follow cannot see the lead. While both of these papers implement haptic signaling, they did provide music for the dancers as a supplemental reference. In further work on the relationship of music to dancer signaling and communication [12], there is an indication that while the music playing can dictate the appropriateness of a move, the follow will choose to follow the physical cues from the lead rather than the timing of the music. In testing with partners listening to different music during their dance, there was a marked disruption if the music was not synchronized in timing and bar structure; even with synchronization, however, the music was shown to influence the lead's selection of dance moves.

In research into creating a robot capable of performing a waltz with a human partner [13], accurately modeling the human proved a significant challenge, while being necessary for calculating how the robot should move. In this paper, an attempt was made to control the robot through a virtual force controller, which had poor performance due to high noise in the center-of-mass predictions for the human partner. In a similar attempt to create a robotic dance partner, a dance between two human partners was recorded with motion tracking as well as a force/torque sensor between the partners' hands, and used to synthesize trajectories for a robot lead to dance with a human partner [14]. This method yielded good results in creating a robot that was able to lead a partner in a dance. However, they did not synthesize unique trajectories, nor model the human movement discretely; the only adaptations were adjustments of step size and force based on partner feedback. Another approach created a robot lead in a partner dance using center-of-mass manipulations and linear force to guide the human partner [15]. In this work, they confirmed a control method whereby, by raising a person's center of mass, they were brought to an unstable point and made more susceptible to velocity-imposing disturbances. This process was effectively used by a robot to guide a partner's linear motions.

Work has furthermore been done in classifying the ability of two partners to dance together in Salsa [7]. However, in this paper the researchers only looked at the movement of the participants' feet. The dance performance was evaluated based on how well the dancers aligned with the musical framework they were dancing to, specifically whether they moved on or behind the beat. Correctness was evaluated against the performance of expert dancers.

2.2 Motion Capture

While there are many ways of tracking human motion, this work has used pre-recorded data, recorded with the Qualisys Motion Capture System. Because of this, I am limiting the discussion to the specifics of the Qualisys system.

2.2.1 Qualisys Motion Capture System

This type of motion capture system uses an array of cameras in different positions to track visual markers placed at fixed points on a person's body. A marker's position in three-dimensional space is determined using an aggregate of multiple cameras' position calculations. The Qualisys Track Manager (QTM) software provides an infrastructure to label the marker positions. Labeling is the process of linking a marker's calculated position to its actual body location. The Qualisys system additionally supports active markers. Active markers are sequence coded, relying on light emission at an interval, so that they can be identified without a labeling stage. The Qualisys system supports using active and passive markers at the same time, so not all body markers need to be active markers, simplifying configurations. Qualisys recommends that active markers be used when detection is difficult, or when markers are moving independently or irregularly.


2.3 Motion Tracking

2.3.1 Kizomba Dance

The Kizomba dance originated in Angola and is danced in many Portuguese-speaking countries, including Portugal. Traditionally, Kizomba music is sung in Portuguese. The dance derives from Semba fused with influences from the French Caribbean and is generally a two-person dance with a lead and a follow partner [16]. The lead is the dominant partner, generally dictating the moves that a couple will make and performing leadership functions, while the follow facilitates the dance moves by reading and reacting to the lead. Traditionally the lead is male and the follow is female; however, this is not a strict definition.

The Kizomba dance is done in 4/4 time and can include a wide variety of dance steps. However, the majority of the Kizomba dance stems from four basic steps and two further moves. The four basic steps are generally referred to as Basic One, Basic Two, Basic Three, and Basic Four. The two further moves are generally called Woman Saida/Out and Man Saida/Out, and there are many variations.

The literature I have seen does not seem to have a clear consensus on the naming order of the basic steps, particularly as names for dance moves can be in English, Portuguese, or a mixture of both languages.

In this paper, the dance moves are as follows. Basic One is a side-to-side step done in place. Basic Two is a forward and backward step (similar to the basic Salsa step). Basic Three is a combination of the Basic One and Basic Two steps such that the partners move in a square pattern. Basic Four is a circular movement, performing the Basic One step with added rotation. The Saida step involves a partner moving out to the side of their opposite. In Woman Saida, the woman steps out to the side, and in Man Saida the man steps out to the side. In both cases, the pair proceeds in the direction the partner that stepped out is facing. These moves carry many variations and combinations, but this project worked primarily with simple movements and only a few variations.

2.3.2 Qualisys Track Manager

QTM can be used to automatically or manually label data depending on data quality. To automatically label the markers, a calibration track is used to create a model for each person. To calibrate a model, the person needs to perform movements to allow it to identify each joint and how it moves. For two-person dance, two models are created, one for each person. Applying the model to the track will allow the QTM software to label the markers automatically; however, manual error fixing is often still necessary. The amount of manual error-correcting depends on the quality of the data, principally how often a marker is obscured and unable to be tracked by the motion capture system. QTM will also handle gap filling where covered markers can have their locations calculated mathematically for the periods that they are not visible. Once the markers are labeled, QTM can be used to create a skeleton based on the marker data using QTM’s skeleton solver.

2.4 Dance Classification

2.4.1 Laban Motion Analysis

Laban Movement Analysis is a method of describing body movement, created for theater but now used in many areas, including dance. Laban movement is broken down into six categories: Body, Effort, Shape, Space, Phrasing, and Relationship [17], where body and space are considered kinematic, and effort and shape non-kinematic. Body describes the physical characteristics of the human body while moving: how the body parts move, how they relate to each other, and how body parts influence the movement of other body parts. Effort describes how movements are made: sudden, fast, smooth, slow. Shape refers to body shape and how that shape can change. Space is the connection of movement to its environment, or how a body uses its space. Phrasing is the way a person's movement is characterized. Relationship is how different objects relate to each other, such as two dancers, two body parts, or a body and an object.

Laban motion analysis (LMA) can be used for categorizing the motions made by a person when dancing. LMA is used as the underlying theory for classifying single person dance by Dr. Andreas Aristidou. Aristidou uses LMA theory to create a set of features obtained by performing calculations on skeleton data collected from a motion capture system.

By breaking down human motion into essential components using the principles of Laban Motion Analysis (LMA) [18], we can encode the movement in a form that is usable for comparison and analysis. The LMA-based feature encoding used here was developed principally through research at the University of Cyprus for single-person dance analysis [3]. We want to test whether it is possible to extend this encoding to two-person dance, treating dance as an arrangement of these movements.

Such features encode the state of a dancer at a given time, allowing the categorization and analysis of the dance type being performed. The paper we referenced for this experiment used LMA for classification and dance assessment, showing how well a student was emulating a teacher in dance.

2.4.2 Non-verbal communication in dance

In non-choreographed dance, even when there are prescribed motions, non-verbal communication between partners occurs [5]. This communication conveys information about the next move that the partnership will perform. This cue-and-acknowledgment information exchange is an integral part of dancing, particularly at high levels. Non-verbal communication in dance has been studied, including work showing notable improvements in robot-human dancing by manipulating the center of mass to indicate the direction of movement [15], as well as studies showing that, while music may indicate the appropriateness of a particular movement, the physical communication between dancers is robust enough to supersede the musical cues [12].

In dance, information is conveyed through the interpretation of changes in pressure, position, and weight [5]. These cues signal a change in the movement and direction of the dance, and can provide non-verbal communication between partners as well as an indication of how well a couple dances together and what dance moves are being performed.

Weight is the concept of how partners counterbalance each other’s body-weight and inertia while dancing; this is linked to fluidity in dance. Weight is generally a shared concept, and having proper management of weight can indicate that partners are dancing well together.

Position relates to multiple aspects: how a person physically positions themselves relative to their partner, as well as the position of a couple when dancing. For example, the Kizomba dance generally places partners much closer together than the Samba or Salsa [19]. Alternatively, a partner may move their hands to a specific position to indicate what move to make. Furthermore, various moves have set body positions: while partner dances generally have a basic step that other dance moves build from, this basic step can be done in several different ways depending on hand placement. Similarly, many dances have an approximation of walking side by side (promenade), and this move shares standard positional features across dance types, most notably that both partners walk in parallel in the same direction.

Pressure is a primary component of how dancers can communicate movement to their partner. Pressure is a manipulation of the space shared by two dancers. Moving into the shared space (towards your partner) can direct the partners’ motion. By decreasing pressure (moving away), a partner can indicate a move is ending and a transition to a new step.

Based on my research into dance theory, information is transmitted between partners using weight, pressure, and position [5]. By breaking these concepts down into calculated relationships in the dancer’s body movements, it becomes possible to measure how dance partners are sending information to each other, as different dance movements may correspond to different interactions.

This measurement method may be very susceptible to noise and variation, particularly between different couples performing the same dance and the same moves.

2.5 Algorithms

Features are calculated using the skeleton data from the motion capture system. This paper uses two different theories for deriving the features. Once calculated, features need to be assessed.

The features are characterized visually with t-distributed stochastic neighbor embedding (t-SNE). t-SNE creates a low-dimensional set of points that emulates the higher-dimensional real data, and the resulting plot is used to show clustering in the data. In general terms, t-SNE minimizes the Kullback-Leibler divergence between a similarity distribution over the input points (modeled with Gaussian kernels) and a similarity distribution over the generated low-dimensional points, by moving the generated points [20]. Features are demonstrated to be clearly defined representations of classes when the data points from each class are clustered in distinct regions. Time-wise data continuity is shown by the points forming continuous lines.
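As an illustration, such a projection could be produced with scikit-learn's t-SNE implementation. This is a minimal sketch, not the project's code (the thesis produced its t-SNE plots in Matlab), and the features and labels arrays are random placeholders.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(3000, 72))   # placeholder for calculated features
labels = rng.integers(0, 12, size=3000)  # placeholder per-frame move labels

# Project the high-dimensional feature vectors to 2-D; t-SNE minimizes the
# KL divergence between the input and embedded similarity distributions.
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=4, cmap="tab20")
plt.title("t-SNE of dance features, colored by move label")
plt.show()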

Feature set performance was initially assessed with the Pearson Correlation Coefficient calculation as this was the metric used in research in single-person dance.

In this project, we used two different types of classifiers. In both cases, models are created within supervised Machine Learning Frameworks. These models are trained on manually labeled data to identify unlabeled data. Processing features with a machine learning classifier is a way to assess the effectiveness of a set of features in representing a label in the data. In this way, we could see if there was improved performance in a given method of calculating features.


2.5.1 Pearson Correlation Coefficient

The Pearson Correlation Coefficient is a parametric measurement that shows if there is statistical evidence for a linear relationship between two populations [21].

\rho = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[ N \sum X^2 - \left( \sum X \right)^2 \right] \left[ N \sum Y^2 - \left( \sum Y \right)^2 \right]}} \qquad (1)

In this formula, X and Y represent two populations, and the resulting coefficient represents the linear correlation between the two populations. In general terms, PCC is the covariance of populations X and Y divided by the product of the standard deviation of X and the standard deviation of Y.

\rho = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \qquad (2)

Values of ρ are between -1 and 1, where -1 is a perfect negative linear relationship, 1 is a perfect positive linear relationship, and 0 is a total lack of linear relationship. The exact relationship between the degree of correlation and the coefficient's value may vary by application: an absolute value of 0.1 to 0.3 typically indicates a weak correlation, a value of 0.3 to 0.5 a medium correlation, and an absolute value of 0.5 or higher a strong correlation. Lastly, PCC measures linear relationships, so it does not capture nonlinear relationships in the data.
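Written out from Eq. (2), the coefficient is straightforward to compute. The sketch below uses population (biased) statistics, matching the definition above; it is illustrative, not the project's PCC script.

import numpy as np

def pearson(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = ((x - x.mean()) * (y - y.mean())).mean()  # cov(X, Y)
    return cov / (x.std() * y.std())                # divided by sigma_X * sigma_Y

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.0])
print(pearson(x, y))  # close to 1: a strong positive linear relationship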

2.5.2 Support Vector Machines (SVM)

An SVM is a classification model that works by finding the optimal decision boundary between classes. Hyperplanes dividing classes are derived such that they maximize the margin of the decision boundary and therefore minimize the generalization error. SVMs can handle nonlinearity by mapping data into higher dimensions to achieve linear separability [21].

For linear SVM models, the hyperplane can be represented by the equation w · x + b = 0. For an SVM classifier, we seek the greatest margin between two planes. To do this, we take the planes w · x + b = δ and w · x + b = −δ and, under the constraint that there must be no points between the two planes, calculate the optimal hyperplane. For optimization we maximize the margin between the planes, m = 2/||w||. To maximize m, we only need to minimize ||w|| in (w, b). This minimization is done using the Lagrangian formulation:

\min L_P = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{l} a_i y_i (x_i \cdot w + b) + \sum_{i=1}^{l} a_i, \quad \forall i \; a_i \ge 0 \qquad (3)

w = \sum_{i=1}^{l} a_i y_i x_i, \qquad \sum_{i=1}^{l} a_i y_i = 0 \qquad (4)

This is the Lagrangian dual problem, and by substitution we can maximize in terms of a instead of minimizing in terms of w and b:

\max L_D(a_i) = \sum_{i=1}^{l} a_i - \frac{1}{2} \sum_{i,j=1}^{l} a_i a_j y_i y_j (x_i \cdot x_j), \quad a_i \ge 0 \;\text{ and }\; \sum_{i=1}^{l} a_i y_i = 0 \qquad (5)

This dual formulation is what allows SVMs to handle non-linear decision boundaries. In that case the equation becomes:

L_D(a_i) = \sum_{i=1}^{l} a_i - \frac{1}{2} \sum_{i,j=1}^{l} a_i a_j y_i y_j K(x_i, x_j) \qquad (6)

where K is a kernel function that implicitly maps the data into a higher-dimensional feature space; many potential kernel functions can be used for SVMs. In the Matlab Classification Learner context, we primarily use a Gaussian kernel, though a cubic kernel is also considered.
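For reference, a rough scikit-learn equivalent of such a Gaussian-kernel SVM is sketched below. The thesis used the Matlab Classification Learner, so this is an illustrative stand-in, and X and y are random placeholder data.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 72))     # placeholder feature frames
y = rng.integers(0, 12, size=1000)  # placeholder move labels

# The RBF (Gaussian) kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
# plays the role of K in Eq. (6).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
clf.fit(X[:800], y[:800])
print("held-out accuracy:", clf.score(X[800:], y[800:]))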


2.5.3 K-Nearest Neighbor

The K-Nearest Neighbors (KNN) classifier uses the k nearest neighbors of a point x to compute a prediction y. There are multiple ways of calculating the distance to neighbors, as well as choices of whether distances should be weighted; the number of neighbors used can also vary [21].

In basic KNN, classification is determined by taking the k nearest neighbors, based on the list of training neighbors, and finding the majority class from the neighbors.

In the context of the Matlab implementation of the KNN classifier, we used a fixed set of properties. The distance metric we used is the Euclidean distance, with points stored in a k-dimensional tree. The Euclidean distance is calculated with this formula:

d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \qquad (7)

where p and q are points in n-dimensional Cartesian space, and the distance is the Cartesian straight-line distance. A k-dimensional tree is a binary search tree over k dimensions that stores the spatial relationships between the points. Such a tree has a space complexity of O(n), an average search/insert complexity of O(log n), and a worst case of O(n). Weights are assigned using the inverse squared method, weight = 1/distance².
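The configuration described above (ten neighbors, Euclidean distance in a k-d tree, inverse-squared distance weights) could look as follows in scikit-learn; this is a sketch standing in for the Matlab Classification Learner, with random placeholder data.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def inverse_squared(distances):
    # weight = 1 / distance^2; the epsilon is an added safeguard against
    # division by zero for exact duplicate points.
    return 1.0 / (distances ** 2 + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 72))     # placeholder feature frames
y = rng.integers(0, 12, size=1000)  # placeholder move labels

knn = KNeighborsClassifier(n_neighbors=10, metric="euclidean",
                           algorithm="kd_tree", weights=inverse_squared)
knn.fit(X[:800], y[:800])
print("held-out accuracy:", knn.score(X[800:], y[800:]))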

2.6 Ethical Considerations

When working in the sphere of motion tracking, there are several peripheral issues to keep in mind. While creating models for classifying dance movements may seem innocuous, for the same reason that studying two-person dance can give insights into tracking human motion and interactions in other areas of life, information learned in analyzing two-person dance may impact how machines can be used to track people. While this work is intended to be beneficial in the world of dance, it may be used outside it, and its impact may create ethical infringements.

2.6.1 Motion Tracking Limitations

For qualitative assessment, such as whether two people are dancing well together, it is imperative to remember that systems like those presented in this paper are limited in many ways. First, the selection of features is limited by the knowledge of the system's creator. There are complex relationships involved in motion, and how visual information is represented to measure those relationships has no global truth and is subject to human error and human limitation. Second, the model's quantitative results are just as subjective as a person's evaluation would be. The machine models discussed in this paper must learn, and the data they are trained with shapes their results. Therefore, such a model does not output a global evaluation of what is a good representation of dance A; it gives an assessment relative to the data it was trained with: that a dance resembles the data (dances) used to train the model.

2.6.2 Interaction Tracking

One of the areas of motion classification in dance is recognizing dancers' emotions based on how they are dancing [9]. By itself, this may not appear to have an extensive application beyond helping dancers with their performances and representing emotions. However, this principle could be used as a platform for tracking human emotions in other interactions, such as in the security industry. A qualitative emotional assessment could be very beneficial in some applications, but could just as readily be used to make substantial infringements on personal privacy, or to dictate behavior in individual interactions, such as requiring retail workers to maintain a positive emotional range when interacting with customers.

3 Methods and Implementation

3.1 Tools

The dance data was collected using the Qualisys motion capture system with ten cameras in the Robot Lab at Örebro University. This data was processed in the Qualisys Track Manager (QTM) to obtain skeleton data. The skeleton data was exported as a TSV file and converted to CSV format in LibreOffice Calc. The dance move labels were recorded into a LibreOffice Calc table from the video, using VLC media player.


• Qualisys Track Manager (QTM)
• LibreOffice Calc
• VLC media player

The skeleton data was converted into feature sets using Python scripts. Machine learning tests performed in Python used the Tensorflow Keras library. Additionally, Python scripts were used to calculate Pearson's Correlation Coefficient (PCC). t-SNE graphs were created in Matlab, as was some data formatting. The Matlab Classification Learner was used for creating classification models with the formatted data.

• Python
• Tensorflow/Keras
• MatLab
• MatLab Classification Learner

3.2 Experiment Method

3.2.1 Data Processing

Data was collected using a Qualisys motion capture system. The markers were then labeled in Qualisys Track Manager (QTM) and used to create skeleton data. QTM includes an auto labeling system for identifying markers using a model; this relies on having high-quality data with minimal gaps in the markers’ visibility. Due to a high gap rate, marker labeling was done using a combination of automated processes in QTM and manual labeling, as the model was not particularly useful. This skeleton data was exported as a TSV file and converted to CSV format. For the work done in this project, all data that was used comes from a single recording made of the Kizomba dance.

3.2.2 Data Formatting

Mean normalization was applied to the skeleton XYZ position measurements before either feature set was calculated. Normalization of the global position was not considered necessary, as the features all used relational calculations for values in the XY plane. The normalization was intended to remove variability in body shape and size between dancers; because these tests used a single couple, there was no body variation to normalize for, and the normalization was primarily good practice.

Dance move labeling was performed twice: once at a coarse one-second resolution, and a second time at a resolution of 1/30th of a second. Comparing the performance of these label sets should give a better idea of the impact of labeling quality on the tests.

For the initial labeling, the dance move labeling was done to the nearest second. This labeling was done by watching the video recording of the dance and documenting move start and end times in a table. In the data, labels were assigned an integer value between 0 and 12, with the 0 label used for movements that did not correspond to dance moves, such as improvisations. The label criteria were derived from named moves in Kizomba dance instructional content [19], and from actions that, while unnamed, repeated in multiple places in the dance.

The second labeling was done using the video broken down into stills at 30 frames per second and an updated list of dance movements, to identify all sections of the dance with unique labels. Additionally, this labeling was based on the much more detailed descriptions in an alternative Kizomba dance move instruction [16]. Otherwise, the labeling was performed in the same way.

3.2.3 Feature Calculation: Laban Motion

Based on the research done at the University of Cyprus creating an encoding method for single-person dance using Laban Movement Analysis [8], a two-person version was created that uses a selection of basic body movements to develop features. This method involved the calculation of 186 features, 93 for each person. The features required calculating relational values over a time window; as the single-person dance features' time window was 35 frames of 30 fps data, this project used 160 frames for the 147 fps data.
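To make the windowing concrete, the sketch below summarizes one raw per-frame quantity with max/min/mean/std statistics over a sliding 160-frame window (147 fps data); the names are illustrative, not the project's scripts.

import numpy as np

def window_stats(signal, window=160):
    # signal: (n_frames,) array of one raw quantity, e.g. hip distance.
    # Returns an (n_frames - window + 1, 4) array of max/min/mean/std per window.
    out = []
    for t in range(len(signal) - window + 1):
        w = signal[t:t + window]
        out.append([w.max(), w.min(), w.mean(), w.std()])
    return np.array(out)

hip_distance = np.abs(np.random.default_rng(0).normal(size=5000))
print(window_stats(hip_distance).shape)  # (4841, 4)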


Figure 1: LMA Features [3]

3.2.4 Feature Calculation: Dancer Interaction

For comparison, a secondary set of features was created, based on values calculated to represent the couple's pressure, weight, and position. This alternative feature set is significant for comparison, as these variables are linked to non-verbal communication in dance, meaning the interaction between dancers could be tracked to classify the performed movements. This method involved the calculation of 72 features, using the same time window as mentioned for the LMA features. These features used both skeletons together rather than calculating the features independently for each dancer and stacking them; however, the format and creation structure are the same. Research indicated that these metrics had been used successfully in previous work on human-robot interaction and dancing robots [?].

# | Feature | Description | Calculations
1 | hands velocity A | derivative of hands position, side A (person one left side, person two right side) | max/min/mean/std
2 | hands velocity B | derivative of hands position, side B (person one right side, person two left side) | max/min/mean/std
3 | hands force A | 3rd derivative of hands position, side A (person one left side, person two right side) | max/min/mean/std
4 | hands force B | 3rd derivative of hands position, side B (person one right side, person two left side) | max/min/mean/std
5 | hip distance | distance between hips | max/min/mean/std
6 | shoulder velocity difference | difference in shoulder velocities | mean/std
7 | shoulder distance | distance between partners' shoulders | max/min/mean/std

Figure 2: Pressure

The calculation of pressure focuses on hand and arm movements as well as body spacing. A force or velocity near zero indicates low pressure, whereas a high (negative or positive) force or velocity indicates higher pressure. Similarly, decreasing space between hips indicates positive pressure, such as when the lead communicates the direction the partners should travel; conversely, increasing space indicates negative pressure, such as when a lead is spinning their partner. The same follows for shoulder distances.


# | Feature | Description | Calculations
8 | shoulder hips distance A | xy distance between hips and shoulder midpoint, person A | mean/std
9 | shoulder hips distance B | xy distance between hips and shoulder midpoint, person B | mean/std
10 | feet hips distance A | xy-plane distance between hips and feet midpoint, person A | mean/std
11 | feet hips distance B | xy-plane distance between hips and feet midpoint, person B | mean/std
12 | hips velocity difference | difference in velocity of hips between dancers | mean/std
13 | hips acceleration difference | difference in acceleration of hips between dancers | mean/std
14 | Center of mass A | center of mass of person A | mean/std
15 | Center of mass B | center of mass of person B | mean/std
16 | center of mass combined | combined center of mass | mean/std

Figure 3: Weight

The comparison between the shoulder and hip positions is intended to calculate the offset between the upper body and the core; this should show whether a person is leaning backward. The more a person leans away from their partner, the more weight they are generally giving. The same is true for the feet-to-core comparison. The velocity difference calculations show whether the two bodies are moving away from or towards each other. When two bodies are not moving apart but have an unstable center of mass, there is an indication that the partners are balancing each other's weight, such as in a swing or turn. When two bodies move apart with a stable center of mass, there is an indication that the partners are moving independently, such as in the promenade.

# | Feature | Description | Calculations
17 | gait A | distance between feet, person A | max/min/mean/std
18 | gait B | distance between feet, person B | max/min/mean/std
19 | feet velocity | derivative of feet average position | mean/std
20 | hip orientation difference | difference in orientation of hips (yaw) | mean/std
21 | hand distance A | side A distance between partners' hands | max/min/mean/std
22 | hand distance B | side B distance between partners' hands | max/min/mean/std
23 | volume | combined volume of both dancers | max/min/mean/std
24 | feet volume | space all feet take up | max/min/mean/std

Figure 4: Position

The position calculations are similar to the computations for Laban Motion but focus on relational positioning rather than individual positioning. Comparable calculations (to the LMA features) were the gait and volume, though the volume was calculated for both dancers together as a single volume. The hip orientation shows whether partners are facing towards or away from each other: in dance moves such as the Basic One the partners face inwards, in the promenade the partners face the same direction, and in a spin the difference in orientation changes rapidly. Feet velocity and area give indications of how quickly the couple is moving; for example, in the Basic One step the couple is relatively stationary, but in the Basic Four the couple moves in a circle while doing the same basic step, giving a higher foot velocity but the same foot volume. Hand distance can indicate the positioning of the dancers: some dance moves are made with partners holding hands, while in others the lead places hands on the hips and the follow on the shoulders, which should be detected by this metric. The center of mass is calculated as an approximation, with every limb given equal weight.
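As a concrete illustration of the volume features (rows 23 and 24), the combined "volume" of a set of skeleton points can be taken as the volume of their axis-aligned bounding box; the function name here is hypothetical.

import numpy as np

def bounding_box_volume(points):
    # points: (n_points, 3) array of XYZ joint positions, e.g. all joints of
    # both dancers (feature 23) or just the four feet (feature 24).
    extent = points.max(axis=0) - points.min(axis=0)
    return float(np.prod(extent))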

The enhanced labeling is expected to create improved results for all tests.

3.2.5 Feature Assessment

Features are first assessed with Pearson's Correlation Coefficient (PCC), comparing the correlation values of features with the same label to feature populations with different labels. This calculation is a baseline assessment for performance comparisons with the feature calculations for single-person dance. It is performed as in the paper on dance evaluation with LMA [3], where it is used to compare a student's ability to emulate a teacher. In our tests, this is an initial performance comparison between the two sets of features.

This testing assesses the correlation level in the features for data with the same label (a strong correlation is better) and data with different labels (a weak correlation is better). These tests use the three labels with over 2,000 observations each, to have sufficient test data, and data was split at intervals rather than randomly to account for its temporal nature. The full sets of features were tested for each feature calculation method, taking averages over all the features. A second test was then performed, selecting a subset of features that had a high correlation internally and a low correlation externally, again taking an average. Unlike the paper on dance evaluation, we did not look at individual Laban categories or how the correlation changed over time, because the data was not supposed to be mirrored movements; we were only identifying movement relating to the same dance move.

For the second label set, a similar set of 3 labels with large frame counts was used to compare results. Otherwise, the procedure was the same.

For the second label set, the data was also evaluated in the same manner as in the previously mentioned papers evaluating single-person dance. In this case, the PCC was calculated between two timeframes of data, with the features as the populations. The resulting correlation was averaged over all features in the test set. Using this method, all 23 labels were tested internally and each against all other labels. From this, I expected to see how different dance moves related to each other and which moves were similar or different. For labels that are easy to differentiate, the cross-label correlation was expected to be significantly lower than the correlation between frames of the same label. This test is a helpful way to assess the quality of the labels.

3.2.6 Classification

Two classification methods were selected: K-Nearest Neighbor and Support Vector Machines. The KNN classifier was chosen as it is a simple classifier that performs well even when the data is poorly labeled, which I suspected was an issue in this case. The SVM classifier was chosen to provide a comparison to the KNN; it has a harder time handling labeling issues but had the potential to provide better results.

The features were tested with KNN and SVM classifiers, using the Matlab Classification Learner app. The classifiers were verified with each feature set independently, as well as with the two sets of features combined. This testing shows how well each set of features could classify the data in a simpler, non-temporal model. The KNN model was constructed using a weighted KNN with ten nearest neighbors, distances calculated using the Euclidean distance, and weights calculated with the squared inverse. All KNN tests had 25% of the data held out for validation in the first label set, and 20% held out for the second label set. The SVM model used the Gaussian SVM for all the tests, with the same data splits as the KNN. Both classifiers were trained and tested multiple times, holding out a different segment of the data on each trial, and the resulting validation accuracies were averaged over all trials.

For the first label set, there were 20000 data frames used for training and testing. For the second label set, there were 30000 due to better labeling; however, many label populations were considered too small, and only labels with more than 900 frames were used, leaving 25000 frames for testing and training. In both cases, 5000 frames of data were held out for validation.

Having two labeling methods showed how the classifiers' results changed with labeling improvements. In addition to testing the calculated features, the skeleton data was normalized and used with the classifiers to obtain a baseline performance value for the classification of the data without first calculating higher-order features.

These tests are expected to show how beneficial the features are at representing two-person dance. As the raw data is used as a baseline, features that perform well will have a better classification accuracy than the baseline, while features that perform worse than the baseline reduce the information being passed. By performing calculations to derive higher-level information, I expect to see an increase in the classification accuracy of all three sets of features over the raw data's performance.

3.3 Experiment Implementation

3.3.1 Data Formatting

CSV files were used to format the data, with each row containing all the information about that frame of data. I calculated a timestamp and label and appended them to the calculated features. The XYZ skeleton data was mean normalized before being processed, using the formula:

norm = \frac{X - \mu}{\sigma} \qquad (8)
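A minimal pandas version of this normalization is sketched below; the file name and the coordinate column naming convention are hypothetical.

import pandas as pd

frames = pd.read_csv("skeleton.csv")  # hypothetical exported skeleton CSV
coord_cols = [c for c in frames.columns if c.endswith(("_x", "_y", "_z"))]
# Eq. (8): subtract the column mean and divide by the column standard deviation.
frames[coord_cols] = (frames[coord_cols] - frames[coord_cols].mean()) / frames[coord_cols].std()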

3.3.2 Feature Calculation: Laban Motion

The methods for calculating the features based on Laban Motion can be found in Dr. Aristidou's paper on LMA-based feature calculation [3]; they were directly replicated in this experiment.


3.3.3 Feature Calculation: Dancer Interaction

In the following calculations, L references the lead dancer in the couple and F the follow dancer. Each position is represented by a 3×1 vector giving the x, y, and z positions. Unless a single point is specified, operations are done on the entire vector. For these calculations, all skeleton parts are given an equal unit weight to simplify the calculations. Unless explicitly stated otherwise, when two skeleton attributes are averaged the calculation is as follows:

AvgPoint = \left[ \frac{PointA_x + PointB_x}{2}, \; \frac{PointA_y + PointB_y}{2}, \; \frac{PointA_z + PointB_z}{2} \right] \qquad (9)

For all velocities, accelerations and forces, velocity is the derivative of position, acceleration is the derivative of velocity, and force is the derivative of acceleration. All derivatives are calculated as follows:

\Delta = \frac{PointA_t - PointA_{t-1}}{\Delta t} \qquad (10)

where PointA is a vector [x, y, z], and the difference between two skeleton points is calculated as:

Diff = \left[ (PointA_x - PointB_x), \; (PointA_y - PointB_y), \; (PointA_z - PointB_z) \right] \qquad (11)
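The three helper operations in Eqs. (9)-(11) reduce to a few lines of numpy; this is an illustrative sketch, where points are 3-vectors [x, y, z] sampled per frame (dt = 1/147 s for this recording).

import numpy as np

def avg_point(a, b):
    # Eq. (9): component-wise midpoint of two skeleton points.
    return (np.asarray(a, float) + np.asarray(b, float)) / 2.0

def derivative(point_t, point_prev, dt=1.0 / 147):
    # Eq. (10): finite-difference derivative between consecutive frames.
    return (np.asarray(point_t, float) - np.asarray(point_prev, float)) / dt

def diff(a, b):
    # Eq. (11): component-wise difference of two skeleton points.
    return np.asarray(a, float) - np.asarray(b, float)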

# | Feature | Calculation
1 | hands velocity A | [avg(F_LeftHandPos, L_RightHandPos)_t − avg(F_LeftHandPos, L_RightHandPos)_{t−1}] / Δt
2 | hands velocity B | [avg(F_RightHandPos, L_LeftHandPos)_t − avg(F_RightHandPos, L_LeftHandPos)_{t−1}] / Δt
3 | hands force A | [avg(F_LeftHandAcc, L_RightHandAcc)_t − avg(F_LeftHandAcc, L_RightHandAcc)_{t−1}] / Δt
4 | hands force B | [avg(F_RightHandAcc, L_LeftHandAcc)_t − avg(F_RightHandAcc, L_LeftHandAcc)_{t−1}] / Δt
5 | hip distance | sqrt(Σ_{i∈{x,y,z}} (L_HipPos(i) − F_HipPos(i))²)
6 | shoulder velocity difference | avg(L_LeftShoulderVel, L_RightShoulderVel) − avg(F_LeftShoulderVel, F_RightShoulderVel)
7 | shoulder distance | avg(F_LeftShoulderPos, F_RightShoulderPos) − avg(L_LeftShoulderPos, L_RightShoulderPos)
8 | shoulder hips distance A | sqrt(Σ_{i∈{x,y}} (avg(L_LeftShoulderPos(i), L_RightShoulderPos(i)) − L_HipPos(i))²)
9 | shoulder hips distance B | sqrt(Σ_{i∈{x,y}} (avg(F_LeftShoulderPos(i), F_RightShoulderPos(i)) − F_HipPos(i))²)
10 | feet hips distance A | sqrt(Σ_{i∈{x,y}} (avg(L_LeftFootPos(i), L_RightFootPos(i)) − L_HipPos(i))²)
11 | feet hips distance B | sqrt(Σ_{i∈{x,y}} (avg(F_LeftFootPos(i), F_RightFootPos(i)) − F_HipPos(i))²)
12 | hips velocity difference | (F_HipPos(t) − F_HipPos(t−1))/Δt − (L_HipPos(t) − L_HipPos(t−1))/Δt
13 | hips acceleration difference | (F_HipVel(t) − F_HipVel(t−1))/Δt − (L_HipVel(t) − L_HipVel(t−1))/Δt
14 | Center of mass A | Σ_{i over skeleton parts} L(i) · weight_L(i)
15 | Center of mass B | Σ_{i over skeleton parts} F(i) · weight_F(i)
16 | center of mass combined | Σ_{i over skeleton parts} L(i) · weight_L(i) + F(i) · weight_F(i)
17 | gait A | sqrt(Σ_{i∈{x,y,z}} (L_LeftFootPos(i) − L_RightFootPos(i))²)
18 | gait B | sqrt(Σ_{i∈{x,y,z}} (F_LeftFootPos(i) − F_RightFootPos(i))²)
19 | feet velocity | avg(F_LeftFootVel, F_RightFootVel, L_LeftFootVel, L_RightFootVel)
20 | hip orientation difference | quaternionToCartesian(L)[Θ] − quaternionToCartesian(F)[Θ]
21 | hand distance A | L_RightHandPos − F_LeftHandPos
22 | hand distance B | L_LeftHandPos − F_RightHandPos
23 | volume | bounding box of all points
24 | feet volume | bounding box of all feet points

Figure 5: Dancer Interaction Feature Calculations

3.3.4 Feature Assessment

The PCC values were calculated by taking 2000 data frames from 3 different label populations of sufficient size. These were labels 2, 4, and 5 in the first set of labels, and labels 2, 4, and 11 in the second set. To test for internal correlation, each 2000-frame population was split randomly into two equal populations of 1000 frames. The PCC value for each feature was calculated, and then the average over all features was taken. To compare different labels, 2000 frames were taken randomly from each label and the same process of calculating the PCC value was performed. These calculations were repeated a second time after filtering the data. Data filtering was performed by collecting all features that had a value greater than 0.5 on any test for internal correlation and all features that had a value greater than 0.5 on any test for external correlation, and keeping only the features that appeared in the first collection and did not appear in the second.
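The split-and-average procedure above could be sketched as follows, assuming frames is a (2000, n_features) numpy array of feature rows sharing one label; the names are illustrative stand-ins for the project's scripts.

import numpy as np
from scipy.stats import pearsonr

def internal_correlation(frames, rng):
    # Randomly split the label's frames into two 1000-frame halves,
    # compute the PCC per feature across the halves, and average.
    idx = rng.permutation(len(frames))
    a, b = frames[idx[:1000]], frames[idx[1000:2000]]
    per_feature = [pearsonr(a[:, j], b[:, j])[0] for j in range(frames.shape[1])]
    return float(np.mean(per_feature))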


Testing the PCC values with the features as the populations was performed by comparing each label to every other label and to itself. When comparing frames with the same label, the process was as follows: all data points of a given label were selected from the data set and split randomly into two equal groups; if there was an odd number, one data frame was discarded. The PCC value was calculated over all the features in each pair of frames, and the average over all the frames was recorded. When comparing frames from different labels, a random selection of frames was taken from each population, equal to the frame count of the label with the fewest frames. These populations were then compared in the same way as above, taking the PCC value for each pair of frames over all features and storing the average value over all frames.

3.3.5 Classification

Both the KNN and SVM classifiers were tested in the same way. For all trials, I rounded the length of the data set down to the nearest 1000, giving 33000 for the full data set. For each of the four trials that I ran, I set aside 5000 frames of data. I split the data by stepping through it in blocks of 1/1000th of the total length (33 for 33000 total frames) and setting aside 500 sequentially occurring data frames from each selected interval. For the full data set, this resulted in 5000 test frames and 28000 training frames. For the filtered data, this resulted in 5000 test frames and 15000 training frames. For the full data, test data was taken from intervals 1-5, 9-14, 18-23, and 27-32 of a total of 33 intervals, while for the filtered data it was taken from intervals 1-5, 6-10, 11-15, and 16-20 of a total of 20 intervals. The accuracy of each trial was calculated as the percentage of predicted labels that equaled the recorded label, and the final result was the average over all trials.
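One reading of this interval-based split is sketched below; the function and its parameters are illustrative names, and the exact bookkeeping in the original scripts may differ.

import numpy as np

def block_holdout(n_frames, test_intervals, n_intervals):
    # Divide the frames into n_intervals equal contiguous intervals and mark
    # the listed intervals as held-out test data, preserving temporal order.
    size = n_frames // n_intervals
    test = np.zeros(n_frames, dtype=bool)
    for b in test_intervals:           # e.g. intervals from ranges 1-5, 9-14, ...
        test[b * size:(b + 1) * size] = True
    return ~test, test                 # (train mask, test mask)

train_mask, test_mask = block_holdout(33000, list(range(0, 5)), 33)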

4 Results

4.1 Motion Capture

The result of data processing in QTM was a single track of the Kizomba dance; the markers in this recording were labeled by hand, and the track included 33236 frames of data at a sampling frequency of 147 frames per second. The skeleton data was of the same length and frequency.

4.1.1 Dance Data Processing

The data for the Kizomba dance was labeled using two different methods, as the method and knowledge for creating and applying labels improved. Both label sets are presented, as this gives a perspective on how labeling affects the data.

Label Name | Label ID | Frame Count
Not A Move | 0 | 10950
Basic 1 | 1 | 1950
Basic 2 | 2 | 3450
Basic 3 | 3 | 1800
saida men | 4 | 2700
saida women | 5 | 3150
walking | 6 | 450
virgula | 7 | 450
turn | 8 | 2400
– | 10 | 1500
– | 11 | 2774
– | 12 | 1500

Figure 6: Kizomba Labels 1

The labels with ID’s 10, 11, and 12 were not given names, they only recorded repeated actions in that dance, while the exact movement was not categorized. The ID of 9 was held out as a placeholder between the named and unnamed moves.


Label Name | Label ID | Frame Count
NA | 0 | 161
Transition | 1 | 1951
Basic One | 2 | 3565
Basic One Pause | 3 | 770
Woman Out | 4 | 3831
Woman Out Partial | 5 | 795
Woman Out Pause | 6 | 1879
Woman Out Variant | 7 | 495
Man Out | 8 | 975
Man Out Alt | 9 | 450
Man Out pause | 10 | 615
Promenade | 11 | 2909
Promenade Turn | 12 | 1101
Basic Tango | 13 | 3655
Basic Tango Variant | 14 | 1105
Tango Cross | 15 | 915
Facing Turn | 16 | 660
Turn Twist | 17 | 694
Turn woman | 18 | 639
Terrachinha | 19 | 930
Lean | 20 | 1451
Dip | 21 | 116
Spin | 22 | 1180
Basic Four | 23 | 2395

Figure 7: Kizomba Labels 2

This label set was used both in its entirety and with the low-count labels filtered out when used with the classifiers. The filtering was done by removing all data belonging to labels with fewer than 900 frames.

4.2 Dance Analysis

4.2.1 Feature Assessment

Testing with PCC on the first label set revealed moderate correlations between the labels and the features when examining features over time.

Labels | LMA average | LMA top 20 average | Interaction average | Interaction top 21 average
2 and 2 | 0.296 | 0.395 | 0.332 | 0.427
4 and 4 | 0.329 | 0.482 | 0.315 | 0.427
5 and 5 | 0.318 | 0.399 | 0.306 | 0.414
average | 0.314 | 0.425 | 0.318 | 0.423
2 and 4 | 0.272 | 0.179 | 0.214 | 0.190
2 and 5 | 0.376 | 0.199 | 0.313 | 0.267
4 and 5 | 0.250 | 0.159 | 0.209 | 0.190
average | 0.299 | 0.179 | 0.245 | 0.216

Figure 8: PCC feature analysis: label set 1


Labels      LMA average   LMA top 70 average   Interaction average   Interaction top 22 average
2 and 2     0.273         0.298                0.254                 0.330
4 and 4     0.287         0.302                0.252                 0.188
11 and 11   0.330         0.465                0.344                 0.255
average     0.297         0.355                0.283                 0.257
2 and 4     0.255         0.178                0.175                 0.185
2 and 11    0.248         0.203                0.217                 0.249
4 and 11    0.281         0.195                0.209                 0.252
average     0.258         0.261                0.200                 0.229

Figure 9: PCC feature analysis: label set 2

There appears to be very little improvement in correlation with improved labeling. Additionally, there is not a substantial difference in the average correlation across the features based on the labeling of the populations.


Labels                              1        2        3        4        5        6        7        8        9        10       11       12
Internal Label Mean                 0.7861   0.8205   0.9679   0.7682   0.8311   0.7199   0.9785   0.7174   0.7544   0.7799   0.6804   0.7403
Internal Label Standard Deviation   0.2664   0.0387   0.2545   0.1873   0.3254   0.0703   0.2989   0.2643   0.2358   0.3342   0.2795   0.2746

Labels                              13       14       15       16       17       18       19       20       21       22       23
Internal Label Mean                 0.7385   0.7559   0.8367   0.9107   0.7249   0.7398   0.9187   0.8407   0.8020   0.8380   0.7490
Internal Label Standard Deviation   0.2370   0.3076   0.1866   0.1182   0.2916   0.2654   0.0820   0.1708   0.2488   0.1894   0.2683

Figure 11: PCC values comparing interaction features from data frames with the same labels averaged over time


Figure 12: PCC plot comparing interaction features from data frames with the same label averaged over time

labels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
1  0.781 0.833 0.866 0.829 0.738 0.703 0.646 0.837 0.715 0.716 0.717 0.730 0.765 0.660 0.832 0.836 0.897 0.798 0.831 0.840 0.875 0.754 0.773
2  0.833 0.818 0.946 0.793 0.772 0.476 0.953 0.651 0.864 0.819 0.601 0.592 0.711 0.777 0.821 0.766 0.759 0.713 0.903 0.834 0.612 0.645 0.742
3  0.866 0.946 0.971 0.912 0.862 0.496 0.936 0.781 0.849 0.858 0.461 0.662 0.886 0.818 0.926 0.864 0.802 0.828 0.954 0.908 0.699 0.722 0.898
4  0.829 0.793 0.912 0.766 0.860 0.542 0.857 0.673 0.770 0.776 0.620 0.643 0.846 0.748 0.821 0.857 0.871 0.743 0.930 0.753 0.621 0.662 0.645
5  0.738 0.772 0.862 0.860 0.841 0.671 0.892 0.732 0.791 0.799 0.642 0.751 0.786 0.813 0.773 0.872 0.743 0.725 0.886 0.737 0.567 0.765 0.876
6  0.704 0.477 0.496 0.542 0.671 0.706 0.375 0.659 0.571 0.594 0.834 0.752 0.554 0.506 0.576 0.760 0.559 0.726 0.461 0.572 0.742 0.708 0.663
7  0.646 0.952 0.936 0.857 0.893 0.375 0.977 0.702 0.780 0.766 0.307 0.573 0.810 0.870 0.793 0.729 0.648 0.623 0.934 0.819 0.501 0.553 0.943
8  0.837 0.652 0.781 0.673 0.732 0.660 0.703 0.721 0.508 0.555 0.630 0.814 0.832 0.612 0.715 0.788 0.901 0.599 0.713 0.766 0.803 0.713 0.808
9  0.715 0.865 0.849 0.770 0.791 0.571 0.780 0.508 0.796 0.988 0.623 0.405 0.588 0.829 0.846 0.798 0.551 0.863 0.802 0.846 0.602 0.612 0.804
10 0.716 0.820 0.858 0.776 0.799 0.596 0.765 0.555 0.988 0.771 0.639 0.533 0.615 0.835 0.876 0.822 0.592 0.876 0.830 0.812 0.630 0.690 0.835
11 0.717 0.602 0.460 0.620 0.643 0.834 0.307 0.629 0.623 0.641 0.671 0.748 0.591 0.579 0.554 0.760 0.525 0.689 0.443 0.577 0.760 0.704 0.630
12 0.730 0.591 0.662 0.643 0.751 0.751 0.571 0.814 0.407 0.533 0.748 0.753 0.750 0.726 0.652 0.760 0.700 0.593 0.648 0.641 0.781 0.811 0.682
13 0.765 0.711 0.886 0.846 0.786 0.554 0.810 0.832 0.588 0.615 0.592 0.751 0.734 0.753 0.761 0.821 0.926 0.582 0.872 0.764 0.726 0.681 0.624
14 0.660 0.777 0.819 0.748 0.813 0.505 0.872 0.612 0.829 0.834 0.580 0.726 0.753 0.768 0.768 0.777 0.522 0.749 0.771 0.800 0.654 0.709 0.717
15 0.833 0.821 0.926 0.821 0.773 0.576 0.793 0.715 0.846 0.876 0.554 0.653 0.760 0.768 0.829 0.817 0.724 0.908 0.864 0.840 0.747 0.737 0.811
16 0.835 0.766 0.864 0.857 0.872 0.760 0.728 0.788 0.798 0.823 0.760 0.761 0.821 0.777 0.817 0.907 0.832 0.790 0.844 0.767 0.747 0.849 0.848
17 0.896 0.760 0.802 0.872 0.743 0.558 0.648 0.901 0.549 0.591 0.525 0.700 0.926 0.522 0.723 0.832 0.753 0.625 0.804 0.661 0.716 0.701 0.738
18 0.798 0.713 0.828 0.743 0.726 0.726 0.623 0.599 0.863 0.876 0.689 0.594 0.583 0.749 0.908 0.789 0.625 0.760 0.758 0.764 0.704 0.754 0.712
19 0.831 0.902 0.954 0.931 0.886 0.461 0.934 0.713 0.803 0.830 0.443 0.648 0.872 0.771 0.864 0.844 0.805 0.758 0.919 0.826 0.535 0.697 0.870
20 0.840 0.834 0.908 0.753 0.737 0.572 0.819 0.765 0.846 0.812 0.577 0.640 0.764 0.800 0.840 0.767 0.661 0.764 0.826 0.836 0.696 0.691 0.860
21 0.875 0.610 0.699 0.620 0.569 0.743 0.501 0.802 0.602 0.629 0.760 0.781 0.728 0.655 0.745 0.748 0.717 0.704 0.536 0.698 0.819 0.916 0.522
22 0.754 0.645 0.722 0.662 0.766 0.708 0.553 0.713 0.614 0.692 0.705 0.812 0.681 0.709 0.737 0.849 0.701 0.754 0.697 0.690 0.916 0.828 0.714
23 0.773 0.742 0.899 0.645 0.877 0.663 0.943 0.808 0.805 0.836 0.630 0.682 0.624 0.716 0.811 0.848 0.738 0.712 0.870 0.861 0.523 0.714 0.753

Figure 13: PCC values comparing interaction features from data frames with different labels averaged over time

Labels                              1        2        3        4        5        6        7        8        9        10       11       12
Internal Label Mean                 0.9585   0.9450   0.8778   0.9372   0.9901   0.9647   0.9390   0.9504   0.9817   0.9442   0.9281   0.9629
Internal Label Standard Deviation   0.0458   0.0827   0.1313   0.0749   0.0172   0.0429   0.0639   0.0584   0.0217   0.0728   0.0863   0.0478

Labels                              13       14       15       16       17       18       19       20       21       22       23
Internal Label Mean                 0.9507   0.9375   0.9560   0.9748   0.9769   0.9871   0.9724   0.9297   0.9241   0.9393   0.9540
Internal Label Standard Deviation   0.0574   0.0829   0.0460   0.0356   0.0349   0.0169   0.0289   0.0851   0.1119   0.0841   0.0615

Figure 14: PCC values comparing LMA features from data frames with the same label averaged over time


Figure 15: PCC plot comparing LMA features from data frames with the same label averaged over time

labels 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
1  0.957 0.950 0.959 0.939 0.955 0.972 0.959 0.945 0.963 0.961 0.937 0.947 0.909 0.920 0.933 0.957 0.948 0.937 0.954 0.934 0.852 0.948 0.948
2  0.950 0.945 0.863 0.932 0.939 0.955 0.972 0.945 0.996 0.931 0.919 0.933 0.937 0.932 0.934 0.954 0.985 0.956 0.948 0.935 0.934 0.908 0.948
3  0.959 0.863 0.875 0.971 0.891 0.936 0.861 0.864 0.888 0.950 0.875 0.881 0.857 0.864 0.862 0.920 0.908 0.850 0.934 0.870 0.745 0.907 0.904
4  0.939 0.932 0.971 0.938 0.879 0.941 0.850 0.898 0.865 0.948 0.918 0.889 0.911 0.898 0.874 0.918 0.937 0.833 0.952 0.901 0.749 0.901 0.948
5  0.955 0.939 0.891 0.879 0.990 0.973 0.961 0.949 0.982 0.921 0.987 0.983 0.960 0.950 0.969 0.962 0.947 0.993 0.927 0.968 0.935 0.988 0.957
6  0.972 0.955 0.936 0.941 0.973 0.966 0.955 0.970 0.980 0.946 0.937 0.961 0.928 0.948 0.967 0.973 0.975 0.959 0.972 0.921 0.904 0.959 0.948
7  0.959 0.972 0.861 0.850 0.961 0.955 0.946 0.931 0.976 0.914 0.948 0.954 0.917 0.895 0.933 0.939 0.956 0.960 0.901 0.963 0.918 0.976 0.928
8  0.945 0.945 0.864 0.898 0.949 0.970 0.931 0.950 0.932 0.913 0.969 0.924 0.914 0.965 0.968 0.961 0.947 0.963 0.942 0.950 0.945 0.950 0.953
9  0.963 0.996 0.888 0.865 0.982 0.980 0.976 0.931 0.985 0.936 0.958 0.978 0.944 0.922 0.949 0.965 0.979 0.972 0.944 0.979 0.931 0.992 0.958
10 0.961 0.931 0.950 0.948 0.921 0.946 0.914 0.913 0.936 0.939 0.911 0.911 0.885 0.928 0.891 0.961 0.941 0.909 0.947 0.926 0.851 0.944 0.970
11 0.937 0.919 0.875 0.917 0.987 0.937 0.948 0.969 0.958 0.911 0.929 0.962 0.906 0.958 0.973 0.960 0.951 0.987 0.941 0.942 0.946 0.946 0.939
12 0.947 0.933 0.881 0.889 0.983 0.961 0.954 0.924 0.978 0.911 0.962 0.963 0.938 0.921 0.956 0.945 0.945 0.966 0.926 0.958 0.916 0.971 0.941
13 0.909 0.937 0.857 0.911 0.960 0.928 0.917 0.914 0.944 0.885 0.906 0.938 0.952 0.931 0.953 0.957 0.900 0.977 0.872 0.914 0.961 0.918 0.920
14 0.920 0.932 0.864 0.897 0.950 0.948 0.895 0.965 0.922 0.928 0.958 0.921 0.931 0.939 0.960 0.973 0.951 0.950 0.931 0.949 0.956 0.915 0.977
15 0.933 0.934 0.862 0.874 0.969 0.967 0.933 0.968 0.949 0.891 0.973 0.956 0.953 0.960 0.958 0.964 0.949 0.972 0.926 0.966 0.964 0.952 0.944
16 0.957 0.954 0.920 0.918 0.962 0.973 0.939 0.961 0.965 0.961 0.960 0.945 0.957 0.973 0.964 0.976 0.962 0.960 0.937 0.957 0.947 0.969 0.973
17 0.948 0.985 0.908 0.937 0.947 0.975 0.956 0.947 0.979 0.941 0.951 0.945 0.900 0.951 0.949 0.962 0.975 0.931 0.972 0.957 0.924 0.962 0.969
18 0.937 0.956 0.850 0.833 0.993 0.959 0.960 0.962 0.972 0.909 0.987 0.966 0.977 0.950 0.972 0.960 0.931 0.987 0.898 0.966 0.956 0.984 0.951
19 0.954 0.948 0.934 0.952 0.927 0.972 0.901 0.942 0.944 0.947 0.941 0.926 0.872 0.931 0.926 0.937 0.972 0.898 0.974 0.924 0.834 0.938 0.960
20 0.934 0.935 0.870 0.901 0.968 0.921 0.963 0.950 0.979 0.926 0.941 0.958 0.914 0.949 0.966 0.957 0.957 0.966 0.924 0.926 0.969 0.933 0.944
21 0.852 0.934 0.746 0.749 0.934 0.904 0.918 0.945 0.931 0.851 0.946 0.916 0.961 0.956 0.964 0.946 0.924 0.956 0.834 0.969 0.926 0.931 0.918
22 0.948 0.908 0.907 0.901 0.988 0.959 0.976 0.950 0.992 0.944 0.946 0.971 0.918 0.915 0.952 0.969 0.962 0.984 0.938 0.933 0.931 0.937 0.916
23 0.948 0.948 0.904 0.948 0.957 0.948 0.928 0.953 0.958 0.970 0.939 0.941 0.920 0.977 0.944 0.973 0.969 0.951 0.960 0.944 0.918 0.916 0.951

Figure 16: PCC values comparing LMA features from data frames with different labels averaged over time

By comparing the correlation between frames of data, we can see how different dance moves relate to each other: a high PCC value indicates a strong similarity in feature values, and a lower value indicates a weaker similarity.

For the t-SNE plots, both the LMA features and the interaction features were calculated with the same settings: the Hamming distance for the distance calculation and the Barnes-Hut algorithm.
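
A comparable embedding can be sketched with scikit-learn. This is an illustrative assumption rather than the exact implementation used here, with features and labels being the same assumed arrays as in the earlier sketches:

    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    # Embed the per-frame feature vectors in two dimensions using the
    # Hamming distance and the Barnes-Hut approximation.
    embedded = TSNE(n_components=2, metric="hamming", method="barnes_hut",
                    random_state=0).fit_transform(features)

    # Color each point by its move label to inspect the spatial groupings.
    plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=2, cmap="tab20")
    plt.colorbar(label="label ID")
    plt.show()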

The plot of the LMA features reveals that the distinct spatial groupings of features do not represent the classification labels. However, the features change smoothly over time, as shown by the string-like arrangement of the points.


Figure 17: t-SNE plot for label set 1


Figure 19: t-SNE plot for combined data with label set 2

Figure 20: t-SNE plot for raw data with label set 2

4.2.2 Classification

Feature Type   Mean Accuracy SVM   Mean Accuracy KNN
LMA            13.24%              56.83%
Interaction    16.57%              50.18%

Figure 21: Classifier results: Label 1


Feature Type                 Accuracy SVM Gaussian   Accuracy KNN      Accuracy SVM Cubic
Raw Data                     20.50% ± 3.59           64.05% ± 7.73     58.53% ± 6.54
Raw Data Filtered            42.7% ± 39.12           79.38% ± 16.35    74.43% ± 20.74
LMA                          19.08% ± 4.97           66.06% ± 9.83     61.12% ± 9.26
LMA Filtered                 40.59% ± 39.59          79.83% ± 15.25    79.97% ± 14.05
Interaction                  41.25% ± 7.97           57.35% ± 10.8     57.35% ± 11.27
Interaction Filtered         63.79% ± 25.24          76.09% ± 18.47    79.39% ± 15.04
Combined Features            13.65% ± 4.74           67.21% ± 10.56    62.78% ± 10.83
Combined Features Filtered   37.18% ± 41.92          80.74% ± 14.41    81.48% ± 12.92

Figure 22: Classifier results: Label 2

5 Discussion

5.1 Data Collection

It is recognized that the type of motion found in two-person dance is challenging to track; for this reason, future work needs to put more effort into data collection. For Qualisys to identify and track markers, the markers need to be visible in nearly every frame. Therefore, either a custom algorithm needs to be developed to track and gap-fill the data, or the motion capture system needs to be tuned to record higher-quality results.

Due to the difficulty of obtaining high-quality motion capture data when the two subjects frequently obscure each other's markers, the collected data fell short of what was possible. Markers were often hidden, and the QTM auto-labeling of data points performed poorly, so the data was labeled by hand rather than through automated processes. The vast increase in time that hand labeling required meant less data to work with, and the existing data contained more noise than if the markers had been tracked automatically. Substantially more time had to be dedicated to creating the skeleton data, which produced only a single fully processed dance track; this meant there was only one dance style to work with instead of three, and only a sixth of the total motion capture data was available.

Increasing the amount of data will be essential for future work applying machine learning models, because there are a large number of unique dance moves and, therefore, a large number of classes that each need sufficient data for the model to learn to recognize them successfully.

5.1.1 Active Markers

The Qualisys Motion Tracking System includes active markers: markers that emit light and allow the system to assign a label to each marker automatically. Using these would significantly decrease the effort required to process the tracks and create skeleton data. This enhancement is explicitly recommended by Qualisys when working under conditions like those discussed in this paper.

5.1.2 Configuration Testing

Camera positioning, marker fixing, and the cameras' optical configuration all influence the quality of the recordings. While the data itself (in skeleton form) does not need to be in perfect condition, it is evident that the quality of the recording determines how long the data takes to process. Therefore, including a testing step to ensure that all practical configurations are tuned to produce the best results will pay for itself in time saved preparing the data.

5.2 Motion Capture

5.2.1 Labeling

There are two aspects to labeling correctness: accuracy and precision. Compared to labeling done at only one-second resolution, the higher-resolution labeling produced a notable accuracy increase of at least 5% on all tests, particularly for the SVM classifier, which is more sensitive to label quality.

5.2.2 Feature Calculation

This project's primary goal was to establish a baseline for feature performance, assessed by the ability to classify dance moves. The results from the two calculated feature sets were generally similar, both to each other and to the raw data.
