This project is a web scraper that extracts specific information from the Grab Food Delivery platform.
It scrapes restaurant lists, restaurant details, delivery fees, and estimated delivery times for selected locations. The scraper is written in Python with frameworks such as Selenium, follows object-oriented programming (OOP) principles, and uses multithreading to improve scalability and performance.
The tasks performed by the web scraper include:
- Extracting restaurant lists with details.
- Creating a unique restaurant list.
- Extracting average delivery fees and estimated delivery time for selected locations.
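
The unique-list and averaging tasks can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record keys (`restaurant_id`, `location`, `delivery_fee`, `eta_minutes`), the helper names, and the choice to deduplicate by Restaurant ID are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def unique_restaurants(records):
    """Keep the first record seen for each restaurant ID (key name is assumed)."""
    seen = {}
    for record in records:
        seen.setdefault(record["restaurant_id"], record)
    return list(seen.values())


def average_by_location(records):
    """Average delivery fee and ETA per location (key names are assumed)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["location"]].append(record)
    return {
        location: {
            "avg_delivery_fee": mean(r["delivery_fee"] for r in rows),
            "avg_eta_minutes": mean(r["eta_minutes"] for r in rows),
        }
        for location, rows in grouped.items()
    }


# Hypothetical sample records, not real scraped data.
sample = [
    {"restaurant_id": "A1", "location": "Loc-1", "delivery_fee": 2.0, "eta_minutes": 30},
    {"restaurant_id": "A1", "location": "Loc-2", "delivery_fee": 4.0, "eta_minutes": 50},
    {"restaurant_id": "B2", "location": "Loc-1", "delivery_fee": 3.0, "eta_minutes": 40},
]
unique = unique_restaurants(sample)
averages = average_by_location(sample)
```

Averages are computed on the full record set rather than the deduplicated one, since the same restaurant can have a different fee and delivery time for each location.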
The scraper extracts the following fields/column data visible on the Grab Food Delivery website:
- Restaurant Name
- Restaurant Cuisine
- Restaurant Rating
- Estimated Time of Delivery
- Restaurant Distance from Delivery Location
- Promotional Offers
- Restaurant Notice
- Image Link of the Restaurant
- Is Promo Available (True/False)
- Restaurant ID
- Restaurant Latitude and Longitude
- Estimated Delivery Fee
- Scraping Logic: The scraper navigates to the Grab Food Delivery website, selects the delivery location, and then follows the resulting API calls to fetch the restaurant data.
- OOP Implementation: The code follows object-oriented programming principles, ensuring modularity and maintainability.
- Optimization: Multithreading is employed to enhance performance and scalability, enabling efficient data extraction.
- Data Handling: Extracted data is saved as CSV and as gzip-compressed NDJSON for storage and analysis.
- Selenium Wire: The Selenium Wire package depends on Blinker, and the latest Blinker release is no longer supported, so Blinker has to be pinned explicitly to version 1.7.0.
- Blocking and Authentication: I used proxy/IP rotation so that no single IP address gets blocked.
- Error Handling: Implement more robust error handling mechanisms to handle edge cases gracefully.
- Proxy Rotation: Introduce proxy rotation in a more efficient way; right now the rotation happens only at the very first step.
- Multi-Processing: The current implementation can be improved considerably; given more time, I will optimize it further.
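
As a rough illustration of the multithreading and data-handling points in the list above, the sketch below runs one fetch per location in a thread pool and writes the combined rows to gzip-compressed NDJSON and to CSV. It is not the project's actual code: `fetch_location` is a hypothetical stub standing in for the real Selenium-based scrape, and the function names, record keys, and file paths are assumptions.

```python
import csv
import gzip
import json
from concurrent.futures import ThreadPoolExecutor


def fetch_location(location):
    """Hypothetical stub for the real per-location scrape."""
    return [{"restaurant_id": f"{location}-1", "name": "Example", "location": location}]


def scrape_all(locations, max_workers=4):
    """Run one fetch per location in a thread pool and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = pool.map(fetch_location, locations)  # results keep input order
        return [row for batch in batches for row in batch]


def save_ndjson_gz(rows, path):
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")


def save_csv(rows, path):
    """Write rows to CSV, using the first row's keys as the header."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `save_ndjson_gz(scrape_all(["loc-a", "loc-b"]), "restaurants.ndjson.gz")` would tie the two steps together. Threads suit this workload because the time is spent waiting on network and browser I/O rather than on CPU-bound work.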
# Clone this project
$ git clone https://github.com/{{YOUR_GITHUB_USERNAME}}/food-grab-web-scraping
# Access
$ cd food-grab-web-scraping
# Set up a virtual environment
$ python3 -m venv venv
# Activate it (Linux/macOS; on Windows run venv\Scripts\activate)
$ source venv/bin/activate
# Install dependencies
$ pip install -r requirements.txt
# Run the project
$ python3 XHR.py