This repository contains a Python implementation of the paper "Large-Scale Order Dispatch in On-Demand Ride-Hailing Platforms: A Learning and Planning Approach". It builds a synthetic environment that simulates the ridesharing marketplace described in Section 6.1 of the paper, then applies the paper's MDP-based order dispatch policy to that environment. See Demonstration.ipynb for the detailed implementation.
The algorithm consists of two steps:
- Policy Evaluation: Apply temporal difference learning to the historical data to learn the value function
- Order Dispatch: Implement the order dispatch policy by maximizing the value function
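As a rough illustration of the policy evaluation step, the sketch below runs tabular TD(0) over a list of historical transitions. It is not the code from Demonstration.ipynb: the function name `td_policy_evaluation`, the `(state, reward, next_state)` transition layout, and the default `gamma`, `alpha`, and `n_epochs` values are all assumptions made for this example.

```python
from collections import defaultdict


def td_policy_evaluation(transitions, gamma=0.9, alpha=0.1, n_epochs=50):
    """Tabular TD(0) over historical transitions (illustrative sketch).

    transitions: iterable of (state, reward, next_state) tuples. A state can
        be any hashable object, e.g. a hypothetical (time_index, grid_id)
        pair. next_state is None when the episode ends.
    Returns a dict-like mapping state -> estimated value.
    """
    V = defaultdict(float)  # unseen states start at value 0
    for _ in range(n_epochs):
        for state, reward, next_state in transitions:
            next_value = 0.0 if next_state is None else V[next_state]
            td_target = reward + gamma * next_value
            # Move the current estimate a step of size alpha toward the target.
            V[state] += alpha * (td_target - V[state])
    return V
```

A state here would typically combine a time slot and a location cell, so the learned `V` approximates the expected future income of a driver at that time and place.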
Illustration of the policy evaluation step:
Pseudocode:
The order dispatch step:
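One possible way to sketch this step in code is as a bipartite assignment between idle drivers and open orders, scored with the learned value function. The sketch below assumes NumPy and SciPy are available and uses `scipy.optimize.linear_sum_assignment` as the matching solver; the function name `dispatch`, the `(price, destination_state)` order layout, the `feasible` mask (for example, pairs within a pickup radius), and the scoring rule are assumptions for illustration and may differ from the notebook.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def dispatch(driver_states, orders, V, feasible, gamma=0.9):
    """Match drivers to orders by maximizing a value-based score (sketch).

    driver_states: list of hashable states, one per idle driver.
    orders: list of (price, destination_state) tuples.
    V: mapping state -> learned value; missing states count as 0.
    feasible: boolean matrix, feasible[i][j] is True if driver i may serve
        order j (hypothetical input, e.g. a pickup-radius check).
    Returns a list of (driver_index, order_index) pairs.
    """
    if not driver_states or not orders:
        return []
    scores = np.zeros((len(driver_states), len(orders)))
    for i, state in enumerate(driver_states):
        for j, (price, dest) in enumerate(orders):
            # Gain of serving order j relative to the driver's current value.
            scores[i, j] = price + gamma * V.get(dest, 0.0) - V.get(state, 0.0)
    # Infeasible pairs and pairs with no positive gain are scored 0,
    # which is treated as "leave unmatched" below.
    scores = np.where(np.asarray(feasible, dtype=bool), scores, 0.0)
    scores = np.clip(scores, 0.0, None)
    rows, cols = linear_sum_assignment(-scores)  # negate to maximize
    return [(int(i), int(j)) for i, j in zip(rows, cols) if scores[i, j] > 0]
```

The feasibility mask matters in this sketch: because each score is an order term minus a driver term, every complete matching would have the same total without it, so the constraints are what make the assignment informative.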
Simulation results and comparison against other baseline policies: