Two Phase $Q$-learning for Bidding-based Vehicle Sharing

Published 29 Sep 2015 in cs.AI and math.OC | (1509.08932v3)

Abstract: We consider one-way vehicle sharing systems where customers can rent a car at one station and drop it off at another. The problem we address is to optimize the distribution of cars and the quality of service by pricing rentals appropriately. We propose a bidding approach that is inspired by auctions and takes into account the significant uncertainty inherent in the problem data (e.g., pick-up and drop-off locations, time of requests, and duration of trips). Specifically, in contrast to current vehicle sharing systems, the operator does not set prices. Instead, customers submit bids and the operator decides whether or not to rent. The operator can even accept negative bids to motivate drivers to rebalance available cars to unpopular destinations within a city. We model the operator's sequential decision-making problem as a \emph{constrained Markov decision problem} (CMDP) and propose and rigorously analyze a novel two-phase $Q$-learning algorithm for its solution. Numerical experiments are presented and discussed.
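To make the accept-or-reject decision concrete, the following is a minimal sketch of a plain tabular $Q$-learning update applied to a single bid. It is not the two-phase algorithm proposed in the paper and it ignores the constraints of the CMDP. The state encoding, the action names, the reward, and the values of `alpha` and `gamma` are all assumptions made for illustration.

```python
from collections import defaultdict

ACTIONS = ("reject", "accept")  # hypothetical action set for the operator


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped target."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state):
    """Pick the action with the highest current Q-value (ties go to 'reject')."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = defaultdict(float)  # all Q-values start at zero

# Hypothetical state: (cars at origin, cars at destination, bid amount).
s, s_next = (3, 0, -2.0), (2, 1, 5.0)

# Accepting a negative bid costs 2.0 now but moves a car to an empty station.
q_update(q, s, "accept", reward=-2.0, next_state=s_next)
```

After this single update the estimate for accepting the negative bid is slightly negative, because the table has not yet seen any later revenue from the relocated car. Only repeated updates over many simulated requests could let the long-run benefit of rebalancing show up in the $Q$-values.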