
4. STUDYING THE TRAFFIC PROCESS

4.2. Behaviour classification by pattern recognition techniques


In order to utilise the advantages of the detailed data contained in large samples of speed profiles, it is necessary to have a method that:

− differentiates between the behaviour types based on endogenous (derived from data) criteria in a way similar to a human observer;

− makes use of the systematic variation in the data that can be attributed to different types of behaviour (i.e. analyses shapes of the speed profiles);

− can handle large amounts of data as produced by the video analysis techniques.

In this section I describe how this methodological gap can be filled by using pattern recognition techniques. Pattern recognition is a topic in machine learning theory that aims at classifying data based on a priori knowledge or information extracted from the data itself. Three techniques – cluster analysis, supervised learning and dimension reduction – have been tested.

A more extensive description of these tests can be found in Paper III.

4.2.2. Pattern recognition at work

The dataset: left turning vehicles at a signalised intersection

The data for this example was collected at a signalised intersection in Lund (Figure 25). The vehicles making left turns from one of the entrances were detected and their speed profiles saved. The data was visually checked for consistency and manually corrected in cases of obvious errors in detection. The profiles were trimmed and adjusted so that each profile contained the same number of points, evenly distributed along the trajectory between the defined start and end lines.
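The resampling step described above can be sketched as follows. This is only an illustration, assuming each raw profile is a list of (distance, speed) pairs with strictly increasing distances and that linear interpolation between detections is acceptable. The function name `resample_profile`, the start and end positions, the number of points and the sample values are hypothetical and not taken from the study.

```python
def resample_profile(points, start, end, n):
    """Linearly interpolate a speed profile onto n evenly spaced distances.

    points: list of (distance_m, speed_ms) pairs with strictly increasing
            distances, covering at least the interval [start, end].
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    step = (end - start) / (n - 1)
    resampled = []
    j = 0  # index of the left end of the current raw segment
    for i in range(n):
        d = start + i * step
        # Advance to the raw segment that contains distance d.
        while j < len(points) - 2 and points[j + 1][0] < d:
            j += 1
        (d0, v0), (d1, v1) = points[j], points[j + 1]
        t = (d - d0) / (d1 - d0)
        resampled.append(v0 + t * (v1 - v0))
    return resampled


# Invented raw detections between a start line at -30 m and an end line at 20 m.
raw = [(-30.0, 10.0), (-20.0, 8.0), (0.0, 2.0), (20.0, 12.0)]
profile = resample_profile(raw, start=-30.0, end=20.0, n=6)
```

After this step every profile is a vector of the same length, so two profiles can be compared point by point.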

Figure 25. View of the observation site and conflict points for left turning vehicles.

According to the rules, a left-turning vehicle must yield to traffic coming from the opposite direction and to pedestrians that have green in the same phase. As a result, four situation types are possible. The first takes place when there are vehicles coming from the opposite direction and the driver has to yield by braking in the middle of the intersection (near the imaginary middle line – type a). If there is a pedestrian at the pedestrian crossing, the driver has to brake before the crossing (type b). If no conflicting traffic is present, the speed of a turning vehicle remains nearly constant or slightly increases (type c). Situations when a driver has to brake both near the middle line and near the pedestrian crossing are extremely rare, since the pedestrian flow is low and those who are present usually manage to complete their passage while the driver is waiting at the middle line.

Examination of the vehicle speed profiles in such situations reveals that they generally have quite typical shapes (Figure 26). However, not all profiles resemble the typical shapes; some fall somewhere in between two shapes, which makes it difficult to assign them to a certain type. A similar problem is experienced by an observer who classifies the situations by watching them on video, as they seem to fit the definition of more than one type (for example, a vehicle moves forward slowly to the middle line, thus avoiding abrupt braking, but is still affected by the oncoming traffic). This may be seen as natural variation in behaviour, which complicates classification regardless of what method is used.

[Plot: speed V (m/s, 0 to 15) against distance (m, -30 to 20), with the middle line and the pedestrian crossing marked and the curves labelled a, b and c.]

Figure 26. Three typical profile shapes: a – driver yields to the oncoming vehicles; b – driver yields to pedestrians at the pedestrian crossing; c – no on-coming traffic or pedestrians.

The techniques: cluster analysis, supervised learning and dimension reduction

To classify the speed profiles, I use three basic pattern analysis techniques, namely cluster analysis, supervised learning and dimension reduction (Ripley, 1996; Strang, 1986; Duda & Hart, 1973).

Cluster analysis is a general name for methods of dividing the data into several partitions (clusters) according to some properties considered common for the items within the cluster. Most often this property is proximity, i.e., the items in a cluster are closer to each other or to the cluster centre than to other items or other cluster centres (cluster centre in this case is also a profile with a certain shape considered “typical” by the algorithm). A clustering algorithm may force the data into a pre-defined number of clusters k (k-clustering) or find the optimal number of clusters based on the data.
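The proximity-based clustering described above can be sketched with a plain k-means loop (Lloyd's algorithm). This is a minimal illustration rather than the implementation used in the tests. The four-point toy profiles, the hand-picked initial centres and the function names are assumptions made for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two profiles of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(profiles, initial_centres, max_iter=100):
    """Alternate between assigning profiles and updating cluster centres."""
    centres = [list(c) for c in initial_centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centre.
        new_labels = [
            min(range(len(centres)), key=lambda k: squared_distance(p, centres[k]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres would not move
        labels = new_labels
        # Update step: each centre becomes the mean profile of its cluster.
        for k in range(len(centres)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Invented speed profiles (m/s), four points each.
profiles = [
    [10, 4, 1, 6], [9, 3, 2, 7],          # braking in the middle
    [10, 9, 8, 3], [11, 9, 7, 2],         # braking late
    [10, 10, 11, 11], [9, 10, 10, 12],    # no braking
]
labels, centres = k_means(profiles, [profiles[0], profiles[2], profiles[4]])
```

Each returned centre is itself a profile, which corresponds to the "typical" shape mentioned above. In practice the result depends on the initial centres, so several starts are usually compared.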

The main difference in supervised learning compared to clustering is that the classification function is learnt from a training dataset containing both the input objects and the desired outputs. The training dataset has to be produced manually beforehand. The decision is made based on the analysis of “similarity” of the classified items to each group in the training dataset.
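A nearest-neighbour rule is one simple way to make such a similarity-based decision. The sketch below assumes that similarity is measured by Euclidean distance between profiles of equal length. The manually labelled training profiles and the function names are invented for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two profiles of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def classify_nearest(profile, training_set):
    """Return the label of the training profile closest to `profile`.

    training_set: list of (profile, label) pairs labelled by hand.
    """
    _, nearest_label = min(
        training_set, key=lambda item: squared_distance(profile, item[0])
    )
    return nearest_label


# Hypothetical training data: one labelled example per behaviour type.
training_set = [
    ([10, 4, 1, 6], "a"),
    ([10, 9, 8, 3], "b"),
    ([10, 10, 11, 11], "c"),
]
predicted = classify_nearest([9, 5, 2, 7], training_set)
```

Unlike clustering, the labels here carry the observer's own definitions of the types, so the quality of the result depends directly on the manual labelling.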

Dimension reduction is a way to decrease the number of data points that describe each profile, but still preserve the most important information about them. This simplifies the later classification and allows visualisation of the data so that possible patterns can be seen (in this case the number of dimensions has to be reduced to less than three). In this test I use the singular value decomposition technique to find the most important features in the data and represent each profile by only two co-ordinates.
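The reduction to two co-ordinates can be sketched with NumPy's `numpy.linalg.svd`. The sketch assumes the profiles are stacked as rows of a matrix and projected onto the first two right singular vectors. Whether the data were mean-centred beforehand is not stated here, so no centring is applied, and the toy profiles are invented.

```python
import numpy as np


def project_to_2d(profiles):
    """Represent each profile by two co-ordinates obtained from the SVD."""
    X = np.asarray(profiles, dtype=float)  # shape: (n_profiles, n_points)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Rows of Vt are the dominant shape features of the dataset;
    # projecting onto the first two gives two numbers per profile.
    return X @ Vt[:2].T


# Invented speed profiles (m/s), four points each.
profiles = [
    [10, 4, 1, 6], [9, 3, 2, 7],
    [10, 9, 8, 3], [11, 9, 7, 2],
    [10, 10, 11, 11], [9, 10, 10, 12],
]
coords = project_to_2d(profiles)  # one (x, y) pair per profile
```

The resulting pairs can be drawn as a scatter plot, where profiles of similar shape are expected to lie close to each other.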

Figure 27 illustrates an example of speed profile classification using these techniques.

Figure 27. Classification of speed profiles by three pattern recognition techniques: a) cluster analysis (k-means); b) supervised learning (nearest neighbour); c) dimension reduction (singular value decomposition).

The general conclusion is that the pattern recognition techniques perform quite well in classifying the behaviour types, even though some variation in accuracy between the techniques can be found. The great advantage of these techniques is the automation of the classification process, which allows analysis of larger datasets.

Another aspect is the reduction of the subjective effects a specific observer might have on the results when doing the classification manually.

Finding the right technique for the data is often stated to be more of an art than a science, and parameters working well for one dataset may not work for another. The best strategy in this case is to have a toolbox of different techniques, from which the right one is found by trial.

Profiles with shapes that do not match any of the typical patterns are a problem that needs special investigation. All three techniques are quite insensitive to such outliers and simply force them into one of the typical groups. However, examination of the outliers might be important in case they represent some kind of breakdown in normal traffic that might have implications for safety or efficiency. Detailed examination of such situations might give an idea of how they may be eliminated. A possible solution is to compare individual profiles with the average profile and select significantly different ones.
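One way to make "significantly different" concrete is sketched below. It measures the root-mean-square deviation of each profile from the average profile and flags those whose deviation exceeds a multiple of the median deviation. This criterion, the factor of 3, the function names and the toy data are all assumptions made for illustration. No specific test is prescribed here.

```python
from statistics import median


def rms_deviation(profile, reference):
    """Root-mean-square difference between a profile and a reference."""
    squares = [(a - b) ** 2 for a, b in zip(profile, reference)]
    return (sum(squares) / len(squares)) ** 0.5


def find_atypical(profiles, factor=3.0):
    """Return indices of profiles that deviate strongly from the average."""
    mean_profile = [sum(col) / len(profiles) for col in zip(*profiles)]
    deviations = [rms_deviation(p, mean_profile) for p in profiles]
    # The median is used so that one extreme profile does not
    # inflate the threshold it is compared against.
    threshold = factor * median(deviations)
    return [i for i, d in enumerate(deviations) if d > threshold]


# Invented profiles (m/s): five similar ones and one irregular one.
profiles = [
    [10, 10, 11, 11], [9, 10, 10, 12], [10, 11, 11, 12],
    [11, 10, 11, 11], [10, 10, 10, 11], [10, 2, 9, 1],
]
atypical = find_atypical(profiles)
```

Applied within a single behaviour type rather than to the whole dataset, the same comparison would avoid flagging profiles merely because they belong to a less common type.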

In some cases a subjective component introduced by an observer when making classifications might be useful, especially if the differences in behaviour are difficult to express in objective terms. An observer might be able to classify quite complex traffic situations (for example, traffic conflicts) without being able to explicitly formulate the classification criteria. The pattern recognition techniques might help reveal the relations between these subjective judgements of human observers and the objective variables and contribute to a better standardisation of the conflict classification. This, however, requires a large set of traffic conflicts with detailed data on the road users’ movements.
