Collects, cleans, and analyzes open civic address datasets for the Halifax region, then classifies each address as Residential or Non-residential using a classification model, with real-time web scraping to supplement the prediction.
Implemented a Random Forest classifier to label each address as Residential or Non-residential. Hyperparameter tuning was applied to improve the model's performance and accuracy.
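As a rough illustration of Random Forest classification with hyperparameter tuning, the sketch below runs a small grid search with scikit-learn. The synthetic data, the feature count, and the parameter grid are all assumptions made for the example; the project's actual features and tuning strategy are not described here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for address features (hypothetical; label 1 could mean
# Residential and 0 Non-residential).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Small illustrative grid (an assumption, not the project's actual grid).
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# 3-fold cross-validated grid search over the Random Forest hyperparameters.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best model
print(best_params, round(test_accuracy, 3))
```

A randomized search (`RandomizedSearchCV`) may be a better fit when the grid is large, since it samples a fixed number of combinations instead of trying them all.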
To refine the prediction for a specific address, a web scraper was added as an additional layer that scrapes the top 10 result pages from the web. The extracted text is then cleaned and preprocessed using NLP methods.
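The cleaning step might look something like the following minimal sketch, which uses only the Python standard library. The function name `clean_scraped_text` and the tiny stopword list are hypothetical; the project may well rely on an NLP library and a different set of steps.

```python
import html
import re
from typing import List

# Tiny illustrative stopword list (an assumption; a real pipeline would
# normally use a fuller list from an NLP library).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "at", "is", "for", "to", "on"}


def clean_scraped_text(raw_html: str) -> List[str]:
    """Turn raw scraped HTML into a list of lowercase word tokens."""
    # Remove script and style blocks, including their contents.
    text = re.sub(r"(?is)<(script|style)\b.*?>.*?</\1>", " ", raw_html)
    # Strip the remaining HTML tags.
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Decode entities such as &amp; into plain characters.
    text = html.unescape(text)
    # Lowercase and keep alphabetic tokens only (drops digits and punctuation).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stopwords.
    return [t for t in tokens if t not in STOPWORDS]


sample = "<html><script>var x=1;</script><p>Dental clinic &amp; pharmacy at 123 Main St</p></html>"
print(clean_scraped_text(sample))
# ['dental', 'clinic', 'pharmacy', 'main', 'st']
```

Tokens such as "clinic" or "pharmacy" are the kind of signal that could hint at a Non-residential address, which is why the cleaned text is useful as a supplement to the model's prediction.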
Google APIs are integrated into the code to provide a real-time map view and street images for each address.
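One way to obtain a map view and a street image for an address is to build request URLs for Google's Maps Static API and Street View Static API. The sketch below assumes those two APIs; the text does not say which Google APIs the project uses. The helper names, the default zoom and size, and the `YOUR_API_KEY` placeholder are hypothetical.

```python
from urllib.parse import urlencode

STATIC_MAP_URL = "https://maps.googleapis.com/maps/api/staticmap"
STREET_VIEW_URL = "https://maps.googleapis.com/maps/api/streetview"


def static_map_url(address: str, api_key: str, zoom: int = 16, size: str = "600x400") -> str:
    """Build a Maps Static API URL centered on the address, with a marker."""
    params = {
        "center": address,
        "zoom": zoom,
        "size": size,
        "markers": address,
        "key": api_key,
    }
    return f"{STATIC_MAP_URL}?{urlencode(params)}"


def street_view_url(address: str, api_key: str, size: str = "600x400") -> str:
    """Build a Street View Static API URL for the address."""
    params = {"location": address, "size": size, "key": api_key}
    return f"{STREET_VIEW_URL}?{urlencode(params)}"


# Placeholder address and key, for illustration only.
map_url = static_map_url("1 Main St, Halifax, NS", "YOUR_API_KEY")
street_url = street_view_url("1 Main St, Halifax, NS", "YOUR_API_KEY")
print(map_url)
print(street_url)
```

The resulting URLs can be fetched with any HTTP client or embedded directly in an image tag; a valid API key with the relevant APIs enabled is required for the requests to succeed.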