All required packages are listed in the requirements.txt file.
I came across this dataset on Kaggle, and it contains a lot of information. Many questions can be answered from it, so I wanted to see what I could discover. I also wanted to try geospatial analysis and geoplotting.
From a business perspective, it would be valuable to know where and when most accidents occur, and whether there is a pattern. Towing companies, for example, could respond to more accidents and reach them faster if they knew where most accidents occur in each state.
There is a lot to analyze in this dataset. Some questions to explore:
- Which states have the most accidents? This would help us decide how to distribute tow trucks.
- What time do most accidents occur? More workers should be available during those times.
- What affects the severity of an accident? We could reduce the number of severe accidents if we knew the factors contributing to them.
- Can we predict an accident? This could help tow truck operators anticipate an incoming job.
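As a rough illustration of how the first two questions could be approached, here is a minimal sketch using only the Python standard library. The field names `State` and `Start_Time`, the timestamp format, and the sample rows are assumptions made for the example, not values taken from the notebooks.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows. The field names "State" and "Start_Time" are
# assumptions about the dataset's columns, and the values are made up.
records = [
    {"State": "CA", "Start_Time": "2019-03-04 07:15:00"},
    {"State": "CA", "Start_Time": "2019-03-04 17:40:00"},
    {"State": "TX", "Start_Time": "2019-03-05 07:55:00"},
    {"State": "CA", "Start_Time": "2019-03-06 16:05:00"},
    {"State": "NY", "Start_Time": "2019-03-06 07:30:00"},
]


def count_by_state(rows):
    """Count how many accident records fall in each state."""
    return Counter(row["State"] for row in rows)


def count_by_hour(rows):
    """Count accident records by the hour of day at which they started."""
    return Counter(
        datetime.strptime(row["Start_Time"], "%Y-%m-%d %H:%M:%S").hour
        for row in rows
    )


state_counts = count_by_state(records)
hour_counts = count_by_hour(records)

# Show the state and the hour with the most records in the sample.
print(state_counts.most_common(1))
print(hour_counts.most_common(1))
```

On the full dataset the same counts would more likely be computed with a dataframe library, but the grouping logic is the same.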
There are three notebooks:
- US Accidents Analysis is a notebook for initial exploration and visualizations, as well as data preparation and feature engineering.
- BERT Classifier classifies the severity of an accident from the text descriptions given in the dataset. It uses a BERT deep learning model and achieved 91% accuracy.
- Classification Model is a notebook for classifying and predicting where an accident occurs. I did not have enough RAM to run and validate the model, but I was able to obtain the feature importances.
Some of the findings are described in this Medium post:
- Which state has the most accidents? California, which suggests that more tow trucks may be needed in that state.
- What time do most accidents occur? It differs by state, but accidents happen most often before and after work hours (6-8 am and 3-6 pm).
- What affects the severity of the accident? Duration and Weather Condition.
- Can we predict an accident? This is included in the Classification Model notebook
- Can we predict the severity of an accident? This is included in the BERT notebook
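To make the Duration finding concrete, the sketch below shows one way a duration feature could be derived and compared across severity levels. It assumes that duration means the time between hypothetical `Start_Time` and `End_Time` fields and that severity is stored as an integer in a `Severity` field; the rows are made up, and this is not necessarily how the notebooks compute it.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

# Hypothetical sample rows; field names and values are illustrative only.
records = [
    {"Severity": 2, "Start_Time": "2019-03-04 07:15:00", "End_Time": "2019-03-04 07:45:00"},
    {"Severity": 2, "Start_Time": "2019-03-04 17:40:00", "End_Time": "2019-03-04 18:30:00"},
    {"Severity": 4, "Start_Time": "2019-03-05 07:55:00", "End_Time": "2019-03-05 09:55:00"},
]


def duration_minutes(row):
    """Return the time between start and end of one record, in minutes."""
    start = datetime.strptime(row["Start_Time"], TIME_FORMAT)
    end = datetime.strptime(row["End_Time"], TIME_FORMAT)
    return (end - start).total_seconds() / 60


def mean_duration_by_severity(rows):
    """Group records by severity and average their durations."""
    durations = defaultdict(list)
    for row in rows:
        durations[row["Severity"]].append(duration_minutes(row))
    return {severity: mean(values) for severity, values in durations.items()}


result = mean_duration_by_severity(records)
print(result)
```

A comparison like this only shows an association between duration and severity in the sample; it does not by itself establish which one drives the other.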
The Dataset:
- Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. "A Countrywide Traffic Accident Dataset." 2019.
- Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. "Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights." In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2019.