Developing crash models with supporting vector machine for urban transportation planning



Xiaoduan Sun, PhD, PE
Professor, University of Louisiana at Lafayette, Lafayette, Louisiana 70504, U.S.A.
xsun@louisiana.edu

Subasish Das, PhD
Associate Transportation Researcher, Texas A&M Transportation Institute, U.S.A.

Nicholas Broussard, MS, PE
Traffic Planner and Engineer, Neel-Schaffer, Inc., U.S.A.

ABSTRACT

Effectively incorporating roadway safety into transportation planning requires robust safety models that can quantitatively predict the safety performance of planned future roadway development options. Various safety models have been developed, including those introduced in the first edition of the Highway Safety Manual (HSM) by the American Association of State Highway and Transportation Officials (AASHTO). However, these models link roadway design features, such as lane width, shoulder width, horizontal curvature and vertical grade, with crash occurrence at a disaggregate level, and they require detailed input data and complex application procedures. Transportation planning, by contrast, deals mainly with the type and functionality of a roadway or roadway network. The HSM crash prediction models for urban and suburban roadways are particularly complex, involving several sub-models for different collision types, which makes them hard to use in transportation planning applications.

This paper introduces an innovative crash prediction model based on Support Vector Machines (SVM). A branch of machine learning, SVM focuses on recognizing patterns and regularities in data. The dramatic growth in practical applications of machine learning over the last ten years has been made possible by important developments in the underlying algorithms and techniques and by readily available open-source code. Motivated by the lack of suitable safety models for transportation planning, this study applied SVM to crash data from Louisiana urban roadways to develop safety models for urban 2-lane roadways, multi-lane roadways and freeways, with satisfactory results. Compared with parametric statistical regression models, the SVM models not only reach the same level of accuracy but are also straightforward to apply in urban transportation planning.


1. INTRODUCTION

Transportation planning is the act of evaluating the existing transportation system of an area, projecting its future growth, identifying its current and future deficiencies, and selecting transportation projects to address those deficiencies under budgetary constraints. The United States Transportation Bill, Moving Ahead for Progress in the 21st Century (MAP-21), provides funding for roadway projects across the country and basic guidance on what a Metropolitan Transportation Plan (MTP) should encompass. One important provision of the Bill is that transportation planning must increase the safety of the transportation system for all users, which means that any transportation improvement plan must make an effort to improve the safety of the area.

Because quantitative analysis methods were lacking, past transportation planning practice often determined the safety benefit of a future transportation network by engineering judgment or by the judgment of an experienced transportation planner. The first edition of the Highway Safety Manual (HSM), published by AASHTO in 2013 (1), does provide tools for safety analysis in quantitative terms.

However, several issues make HSM analysis inconvenient for, and largely incompatible with, transportation planning. For example, the HSM models require roadway data such as lane width, shoulder width and type, median width, side slopes, lighting, density of roadside fixed objects and driveways, and the presence of automated speed enforcement and on-street parking. At the planning level, the available roadway information is limited to roadway functionality, number of lanes, and daily traffic volume. It is difficult, if not infeasible, to obtain such detailed roadway data at the planning stage for safety analysis.

Furthermore, as shown in Figure 1, the HSM models for urban and suburban roadways are the most complex compared with the models for rural roadways. In addition to models for five roadway types and four intersection types, there are sub-models for five collision types on segments and four collision types at intersections, which makes the safety analysis very time-consuming. The data requirements and the complicated application process make the HSM methodology an inadequate approach for transportation planning, too broad to be applied properly at the planning level.

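Figure 1. HSM Models for Urban and Suburban Roadways (urban and suburban arterials: segment models for 2-lane, 3-lane, 4-lane undivided, 4-lane divided and 5-lane roadways, each by five collision types; intersection models for 3-leg and 4-leg sign-controlled and signalized intersections, each by four collision types)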

Thus, the objective of this research was to establish a crash prediction model for roadway planning purposes, using data readily available from the state Department of Transportation, the local government with jurisdiction over the transportation project(s), or another reliable source, in order to determine the safety impact of a transportation project on the roadway network.

2. LITERATURE REVIEW

Modeling roadway safety performance has been a popular research topic for the past two decades as practitioners look for tools that link roadway design features to crash occurrence. Analytical methods are the traditional modeling tools because they provide the end user with an equation to predict roadway safety performance for various purposes. However, analytical solutions to complex systems can be inadequate, because they reduce a complicated system that is hard to capture to a concise numerical expression. In highway safety, the most representative analytical models are those presented in the first edition of the Highway Safety Manual published by the American Association of State Highway and Transportation Officials (AASHTO), which covers crash prediction models for 3 types of roadways based on 50 years of research results from the United States and other countries.

Another modeling approach that has developed rapidly in real-world applications during the past 5 years is machine learning: algorithms that “learn” from the available data and determine how to perform the given task(s) by generalizing from examples and mimicking the expected results (2). Machine learning algorithm types include clustering, support vector machines (SVM), fuzzy algorithms, and kriging methods. The biggest advantage of these models is that they can recognize sophisticated patterns in data and adapt themselves to better predict complex relationships. The drawback of machine learning algorithms is that they do not provide an equation but act instead as a “black box”.

In recent years, more and more transportation research has used non-conventional machine learning methods because of their demonstrated modeling benefits. Support vector regression (SVR), the regression form of SVM, has been used successfully in Annual Average Daily Traffic (AADT) estimation (3, 4). Few studies, however, have applied SVR to safety modeling.

3. METHODOLOGY

Three data sets were first prepared for modeling: crash, roadway attribute, and AADT data. The three most recent years of Louisiana crash data (2011-2013) were used; these data also include AADT from the Louisiana Department of Transportation and Development (LADOTD). Using three years of data reflects recent traffic trends while avoiding regression-to-the-mean bias (a statistical effect that makes natural variation between samples look like real change). The Louisiana crash records contain the crash number, highway number, AADT, control section, functional classification, highway classification, logmiles, latitude and longitude, milepost, and more. Table 1 lists the initial data items considered in modeling.

TABLE 1. Data for Modeling

Data Name          Description
CONTROL_SECTION    LADOTD control section number
LOGMILE_FROM       Control section logmile starting point
LOGMILE_TO         Control section logmile ending point
LENGTH             Length of segment
AADT               Annual average daily traffic
FUNCTIONAL_CLASS   Highway functional classification
HIGHWAY_CLASS      Type of highway
MEDIAN_WIDTH       Width of the median, if any
NUM_LANES          Number of lanes in both directions
PAVEMENT_WIDTH     Total pavement width of the segment

To minimize the regression-to-the-mean effect, three years of crash data were used for each roadway type: urban 2-lane highways, multi-lane highways and freeways. The exploratory data analysis (EDA) yielded the basic results shown in Figures 2, 3 and 4.
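The preparation described above can be sketched as follows. This is a minimal illustration, not the authors' R workflow: the `SegmentYears` record, its field names, the roadway-type labels and the three sample segments are hypothetical, and a year with no recorded crashes is assumed to count as zero. Crashes from 2011-2013 are averaged into an average yearly crash frequency (the quantity the models predict), and segments are grouped by roadway type so that a separate model can be fitted to each group.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

STUDY_YEARS = (2011, 2012, 2013)  # the three crash years used in the paper


@dataclass
class SegmentYears:
    """Hypothetical per-segment record; the real data items are listed in Table 1."""
    control_section: str
    logmile_from: float
    logmile_to: float
    roadway_type: str  # hypothetical labels, e.g. "urban 2-lane", "urban freeway"
    aadt: float
    crashes_by_year: Dict[int, int] = field(default_factory=dict)


def average_yearly_crashes(seg: SegmentYears, years=STUDY_YEARS) -> float:
    """Average crash count over the study years; a missing year is assumed to mean zero crashes."""
    return sum(seg.crashes_by_year.get(y, 0) for y in years) / len(years)


def group_by_roadway_type(segments) -> Dict[str, List[Tuple[SegmentYears, float]]]:
    """Pair each segment with its average yearly crashes and group the pairs by roadway type."""
    groups: Dict[str, List[Tuple[SegmentYears, float]]] = defaultdict(list)
    for seg in segments:
        groups[seg.roadway_type].append((seg, average_yearly_crashes(seg)))
    return dict(groups)


# Hypothetical sample segments.
segments = [
    SegmentYears("001-01", 0.0, 0.5, "urban 2-lane", 8000, {2011: 3, 2012: 6, 2013: 0}),
    SegmentYears("002-01", 1.0, 1.8, "urban freeway", 60000, {2011: 12, 2012: 9, 2013: 15}),
    SegmentYears("003-02", 2.0, 2.4, "urban 2-lane", 5000, {2012: 3}),
]
groups = group_by_roadway_type(segments)
```

Here the two urban 2-lane segments average 3.0 and 1.0 crashes per year and the freeway segment 12.0; each group would then be modeled separately.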


Figure 2. Results of EDA for Urban 2-Lane

Figure 4. Results of EDA for Urban Freeway

Two variables, pavement width and number of lanes, were removed based on the initial EDA, because lane width is directly linked with pavement width and the analysis is already separated by roadway type (which indicates the number of lanes). In total, more than 450,000 crashes on 15,400 segments were used in the modeling, and 70% of the data were used as the trainset for model development. The SVR models used in this study depend on the kernel type, the penalty for excess deviation during training (C, or Cost), the radial kernel parameter (γ, Gamma), and the error-term value (ε, Epsilon) of the ε-insensitive loss function. The number of support vectors is not set in advance; it results from training and depends on C and ε. The models were run in the open-source programming language R to predict the average yearly crash frequency. The model parameters are as follows:

• SVM-Type: eps-regression
• SVM-Kernel: radial
• Cost: 100
• Gamma: 1
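To make these settings concrete, the sketch below shows the pieces they control in ε-support vector regression: the radial (Gaussian) kernel K(x, z) = exp(−γ‖x − z‖²) with γ = 1, the ε-insensitive loss, and the primal objective in which Cost (C = 100) weights the training deviations. It is an illustration under stated assumptions, not the authors' code: the paper does not report ε, so the 0.1 used here is an assumption, and the support vectors, dual coefficients and bias passed in are hypothetical placeholders that a real solver (for example, the `svm()` function of R's e1071 package) would produce. The solver itself is not implemented here.

```python
import math
from typing import Sequence

COST = 100.0    # "Cost, a value of 100" (penalty C)
GAMMA = 1.0     # "Gamma, a value of 1" (radial kernel parameter)
EPSILON = 0.1   # assumption: epsilon is not reported in the paper


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = GAMMA) -> float:
    """Radial (Gaussian) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def eps_insensitive_loss(y: float, f: float, eps: float = EPSILON) -> float:
    """Zero inside the eps-tube around the target, growing linearly outside it."""
    return max(0.0, abs(y - f) - eps)


def predict(x, support_vectors, dual_coefs, bias, gamma=GAMMA) -> float:
    """SVR prediction f(x) = sum_i beta_i * K(x_i, x) + b, where beta_i = alpha_i - alpha_i*."""
    return sum(beta * rbf_kernel(sv, x, gamma)
               for sv, beta in zip(support_vectors, dual_coefs)) + bias


def primal_objective(X, y, support_vectors, dual_coefs, bias,
                     cost=COST, eps=EPSILON, gamma=GAMMA) -> float:
    """0.5 * ||w||^2 + C * (sum of eps-insensitive training losses), with ||w||^2 via the kernel."""
    w_norm_sq = sum(bi * bj * rbf_kernel(si, sj, gamma)
                    for si, bi in zip(support_vectors, dual_coefs)
                    for sj, bj in zip(support_vectors, dual_coefs))
    total_loss = sum(
        eps_insensitive_loss(yi, predict(xi, support_vectors, dual_coefs, bias, gamma), eps)
        for xi, yi in zip(X, y))
    return 0.5 * w_norm_sq + cost * total_loss
```

A large C such as 100 makes deviations beyond ε expensive relative to the smoothness term, so the fitted function follows the training data closely. With γ = 1 the kernel value falls off quickly with distance, so inputs such as AADT and segment length would normally be rescaled first; e1071's `svm()` does this by default (`scale = TRUE`), but the paper does not say whether scaling was applied.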


The SVM results for the three roadway types are displayed as predicted vs. observed crashes in Figures 6, 7 and 8.

Figure 6. Results for Urban 2-Lane SVM Model (Trainset): predicted crashes vs. observed crashes

Figure 7. Results for Urban 4-Lane SVM Model (Trainset): predicted crashes vs. observed crashes

Figure 8. Results for Urban Freeway SVM Model (Trainset): predicted crashes vs. observed crashes

The plots indicate that the SVR models predict crash counts close to the observed values, with an acceptable amount of variation. In addition, the predictions are neither consistently higher nor consistently lower than the observed crashes.
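One simple way to check the absence of systematic bias numerically is to look at the mean signed error together with the shares of over- and under-predicted segments. The paper does not state which statistic it used, so the sketch below is an illustrative choice; the `observed` and `predicted` values in the example are hypothetical per-segment yearly crash counts.

```python
from statistics import mean


def bias_summary(observed, predicted):
    """Mean signed error (predicted - observed) and shares of over- and under-predicted segments."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    return {
        "mean_error": mean(errors),                     # close to 0 -> no systematic bias
        "share_over": sum(e > 0 for e in errors) / n,   # fraction of over-predicted segments
        "share_under": sum(e < 0 for e in errors) / n,  # fraction of under-predicted segments
    }


summary = bias_summary([10, 20, 30, 40], [12, 18, 33, 37])
```

Both parts are needed: a mean error near zero on its own is not enough, because large over- and under-predictions can cancel, whereas roughly balanced shares show that neither direction dominates.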

4. MODEL VALIDATION

Model validation is used to determine whether adjustments are necessary for a model to replicate the base conditions as closely as possible. This means bringing the predicted values as close to the observed values as possible, which can be done by changing the model parameters, by applying external correction factors, or by other means of calibration if necessary. The validation process is meant to show that the model performs within a range of values that simulates the observed values seen in everyday application.

Validation of the crash prediction models was done at the individual model level using the trainset data. To validate each model, the R² of the observed-vs-predicted plot, the average absolute difference between observed and predicted values, the percent deviation, the RMSE, and the Percent RMSE were analyzed.
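The first three statistics can be computed as in the sketch below. It assumes that the R² of the plot is the R² of a least-squares linear trendline through the observed-vs-predicted scatter (which equals the squared Pearson correlation) and that percent deviation compares total predicted with total observed crashes; the paper defines neither, so both are assumptions.

```python
from statistics import mean


def plot_r_squared(observed, predicted):
    """R^2 of a least-squares line through the (observed, predicted) scatter, i.e. squared Pearson r."""
    mo, mp = mean(observed), mean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy * sxy / (sxx * syy)


def average_absolute_difference(observed, predicted):
    """Mean of |predicted - observed| over all segments."""
    return mean(abs(p - o) for o, p in zip(observed, predicted))


def percent_deviation(observed, predicted):
    """(total predicted - total observed) as a percentage of total observed."""
    total_obs = sum(observed)
    return (sum(predicted) - total_obs) / total_obs * 100
```

Note that this R² ignores a constant scale bias: predictions that are exactly double the observed counts still give R² = 1, which is one reason the other statistics are reported alongside it.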



RMSE and Percent Root Mean Square Error (Percent RMSE) were chosen because a comparison of raw aggregate sums and percent deviation can be misleading: the total observed and predicted crashes can be very close while individual segments vary widely, so the model appears to perform well overall but performs weakly at the segment level. The RMSE represents the standard deviation of the differences between the observed field values and the predicted values within the model sample. However, the RMSE does not convey the magnitude of the error relative to the observed values, which is why the Percent RMSE is also computed. This measure expresses the RMSE as a percentage of the average count value. The Percent RMSE is defined as:

$$\%\,\mathrm{RMSE} = \frac{\sqrt{\sum_j \left(\mathrm{Model}_j - \mathrm{Count}_j\right)^2 / \mathrm{NumberOfCounts}}}{\sum_j \mathrm{Count}_j / \mathrm{NumberOfCounts}} \times 100$$
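The Percent RMSE definition can be checked with a small example, where Model_j is the predicted and Count_j the observed crash count on segment j. In the sketch below (the four segment counts are invented), the predicted and observed totals are identical, so the percent deviation is zero, yet every segment is off by 10 crashes: the RMSE is 10 and the Percent RMSE is 100%, which is exactly the segment-level weakness an aggregate comparison hides. Following the definition, the squared errors are divided by the number of counts rather than by the number of counts minus one.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, dividing by the number of counts."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def percent_rmse(observed, predicted):
    """RMSE as a percentage of the average observed count."""
    average_count = sum(observed) / len(observed)
    return rmse(observed, predicted) / average_count * 100


observed = [10, 0, 20, 10]    # invented segment counts
predicted = [0, 10, 10, 20]

print(sum(observed), sum(predicted))      # 40 40
print(rmse(observed, predicted))          # 10.0
print(percent_rmse(observed, predicted))  # 100.0
```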

Table 2 displays the model validation statistics for the Trainset and the Testset. The Testset results were obtained by applying the model developed with the Trainset.
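A reproducible random assignment of segments to the Trainset and Testset can be sketched as below. The paper does not describe how segments were assigned, so the simple random split and the seed value are assumptions for illustration; the `train_fraction` default of 0.75 follows the Trainset (75%) and Testset (25%) headings of Table 2 and can be changed.

```python
import random


def train_test_split(segments, train_fraction=0.75, seed=2013):
    """Shuffle segments reproducibly and cut them into a Trainset and a Testset."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)  # dedicated generator: same seed, same split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 2,927 = 2,195 + 732, the urban 2-lane segment counts in Table 2.
trainset, testset = train_test_split(range(2927))
```

With 2,927 segments the sketch yields 2,195 Trainset and 732 Testset segments, the urban 2-lane counts reported in Table 2. Using a dedicated `random.Random` instance keeps the split reproducible without touching the global random state.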

Table 2. Base Model Validation Statistics

Trainset (75%)
Urban Roadway Type   # of Segments   Total Crashes (Observed)   Total Crashes (Predicted)   RMSE    R²
Urban 2-Lane         2,195           13,225                     11,367                      9.45    0.67
Urban Multi-Lane     1,772           25,746                     24,446                      13.42   0.86
Urban Interstate     525             9,783                      8,894                       14.52   0.68

Testset (25%)
Urban Roadway Type   # of Segments   Total Crashes (Observed)   Total Crashes (Predicted)   RMSE    R²
Urban 2-Lane         732             13,225                     6,453                       14.52   0.45
Urban Multi-Lane     590             8,742                      2,123                       18.65   0.16
Urban Interstate     175             3,116                      574                         19.13   0.16

However, the Testset results, i.e., the comparison of observed crashes with predictions from the model developed on the Trainset, are much weaker (R² of 0.45, 0.16 and 0.16), a loss of out-of-sample performance that is common to both conventional statistical methods and the SVM method. The prediction of the SVM model (the Testset results) could be improved by eliminating outliers from the Trainset. Another way to improve the model is to determine the optimal loss function by simulation, which is a future direction of this research.

5. DISCUSSIONS

Urban transportation planning and metropolitan transportation plans are complex, regulated, and vitally important undertakings that affect the long-range growth and health of an area. As such, the Federal government ensures that, where possible, the planning processes are adequately defined and the desired outcomes are well known. Historically, the safety element scoring in these transportation planning documents has relied solely on the judgment and knowledge of experienced professionals. Performance-based measures require incorporating safety into transportation planning in quantitative terms. While HSM modeling is available for transportation planning, its application is cumbersome and time-consuming and requires detailed data that are hard to obtain at the planning stage. The models developed in this project aim to fulfill the safety analysis needs at the planning level using data that are readily available or easily obtainable at that level.

This research has shown that using local crash data and a limited number of variables can lead to better safety models for local transportation planning. These models can readily predict future crash frequencies on roadways based on their future conditions. Using the predicted crash frequencies, the relative change between two different improvement projects can easily be obtained for “what if” analysis. Transportation planners can then use the results to rank and score transportation projects on safety and other factors to satisfy the current Transportation Bill requirements.
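The “what if” comparison reduces to simple arithmetic on the model outputs. In the sketch below, the baseline and project values (predicted average yearly crashes for the affected segments) and the project names are hypothetical; in practice they would come from the SVM model for the relevant roadway type, run with each alternative's future AADT and roadway attributes. Treating the baseline as a no-build condition and ranking by percent change are assumptions, one possible scoring choice rather than one prescribed by the paper.

```python
def relative_change(baseline, alternative):
    """Percent change in predicted yearly crashes relative to the (assumed no-build) baseline."""
    return (alternative - baseline) / baseline * 100


def rank_projects(baseline, alternatives):
    """Return (project, percent change) pairs, largest predicted crash reduction first."""
    changes = {name: relative_change(baseline, crashes) for name, crashes in alternatives.items()}
    return sorted(changes.items(), key=lambda item: item[1])


# Hypothetical predicted average yearly crashes.
ranking = rank_projects(12.0, {"Project A": 10.5, "Project B": 9.0})
print(ranking)  # [('Project B', -25.0), ('Project A', -12.5)]
```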

REFERENCES

1. AASHTO. Highway Safety Manual, First Edition. American Association of State Highway and Transportation Officials, 2013.

2. Domingos, P. A Few Useful Things to Know about Machine Learning. University of Washington, Seattle, 2012.

3. Castro-Neto, M., Jeong, Y., Jeong, M. K., and Han, L. D. AADT Prediction Using Support Vector Regression with Data-Dependent Parameters. Research Report. University of Tennessee, Knoxville, 2008.

4. LeBouef and Sun. Estimating Annual Average Daily Traffic for Non-State Roads in Louisiana. University of Louisiana at Lafayette, Lafayette, LA, 2014.
