Taxi Driving Fraud Detection Technology
✅ Paper Type: Free Essay | ✅ Subject: Information Technology |
✅ Wordcount: 1980 words | ✅ Published: 8th Feb 2020 |
ABSTRACT
Because many taxi cabs now carry an embedded Global Positioning System (GPS) device, we can collect massive numbers of taxi trajectories throughout urban environments. These GPS records give us an opportunity to uncover taxi driving fraud. In this paper we describe a method for detecting anomalous taxi trajectories. Taxi drivers sometimes deliberately take longer routes to a destination in an attempt to collect a higher fare. This harms passengers, who are forced to pay more, and the taxi company, which may lose customers to competitors if such fleecing is discovered. Detecting such anomalous trajectories is therefore important. The same technology could also be extended to detect anomalous traffic patterns in general, which could help identify unusual road conditions caused by accidents, weather, construction, or other events. We use machine learning to detect when a trajectory between an origin and a destination differs substantially from other trajectories during the same time frame. This allows us to rule out the outside events mentioned above before classifying the driver as malicious. We evaluated our method against real-world taxi trajectories from the Chinese city of Shenzhen.
1. INTRODUCTION
Taxi driving fraud is committed by taxi drivers who deliberately take unnecessary detours in order to overcharge passengers. Many taxi service complaints are directly related to taxi driving fraud []. It is therefore valuable for taxi companies to be able to detect such fraud so that customer satisfaction stays high. However, fraud detection is a challenging problem because experienced drivers often know the city better than their passengers.
Fortunately, the GPS devices installed in modern taxis allow us to examine traces throughout the city. These traces provide the information needed for large-scale fraud detection. This paper proposes the use of machine learning to find anomalous taxi trajectories. We focus our research on the areas where we believe taxi drivers are most likely to take advantage of customers. We hypothesize that tourists are the most likely victims of a taxi driver taking a sub-optimal route, because tourists are unfamiliar with the area and are therefore unlikely to notice an unnecessary detour. To find trajectories likely to involve a tourist, we created fixed rectangular areas around key tourist destinations in Shenzhen, such as the airport and the train station. We then focus on taxi trajectories that start and end at those key destinations. For each trajectory we calculate the total distance traveled and the total duration. Anomalies can then be detected by comparing the total distance and total time to historical trajectory data. Anomalous routes will have a larger total distance, a longer duration, or both. Even though all taxi drivers who commit fraud share the same motivation, to overcharge passengers, solving this problem is by no means trivial.
A few of the main challenges are:
- There can be multiple paths between a source and destination. It would be hard to classify a path as anomalous especially if two or more paths are similar in terms of distance and time. This makes it harder to detect routes that detour locally, but still fall within an overall acceptable distance measurement.
- Longer paths might take less time than the shortest route due to traffic conditions and therefore should not be counted as anomalies.
- Some routes might appear to be anomalies when they are in fact shortcuts.
- If a segment of a road is blocked off, a route that was previously classified as anomalous would now have to be classified as the proper route. Detection therefore cannot rely on historical data alone; it needs multiple taxi trajectories in the same time frame with the same source and destination in order to classify a route properly. However, such data is not always available.
- Some drivers may not be intentionally committing fraud; they could genuinely be unfamiliar with the local area. Additionally, some suspicious activity might be the result of changes in traffic conditions.
In this paper we develop a taxi fraud detection system with several components designed to overcome these challenges. First, we identify interesting sites in Shenzhen on which to focus our detection. These sites are locations that tourists frequently use as pick-up and drop-off points. We perform driving fraud detection between these locations. To support detection, we use two primary features of each trajectory: distance traveled and duration. After evaluating all trajectories between these interesting locations, we can identify the typical routes taken by taxis based on a probabilistic model. We also evaluate the distance and duration of routes using a probabilistic model. Finally, we compare a suspicious route to other routes taken between the same origin and destination at the same time in order to filter out anomalies caused by traffic conditions.
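The site-selection step can be illustrated with a minimal sketch. It assumes each interesting site is an axis-aligned rectangle in longitude and latitude, and that a trip is a time-ordered list of (lon, lat) points. The `Region` class, the helper names, and the coordinates are hypothetical placeholders for illustration, not surveyed boundaries or values taken from our dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in (longitude, latitude) degrees."""
    name: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


# Placeholder rectangles for illustration only (assumed, not measured).
REGIONS = [
    Region("airport", 113.78, 22.61, 113.84, 22.66),
    Region("train_station", 114.10, 22.52, 114.13, 22.54),
]


def locate(lon: float, lat: float, regions=REGIONS) -> Optional[str]:
    """Return the name of the first region containing the point, if any."""
    for region in regions:
        if region.contains(lon, lat):
            return region.name
    return None


def sd_pair(trip: List[Tuple[float, float]],
            regions=REGIONS) -> Optional[Tuple[str, str]]:
    """Return (source, destination) when the trip starts and ends in two
    different interesting regions; otherwise return None."""
    if not trip:
        return None
    src = locate(trip[0][0], trip[0][1], regions)
    dst = locate(trip[-1][0], trip[-1][1], regions)
    if src is None or dst is None or src == dst:
        return None
    return (src, dst)
```

Trips for which `sd_pair` returns None are simply left out of the analysis, which keeps the comparison restricted to routes with enough historical data.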
We use taxi trajectory data from the Chinese city of Shenzhen. Each taxi's GPS location is updated every few seconds; the dataset contains approximately 2,500 records per taxi for a total of 350 taxis. The dataset also contains the occupancy status of the taxi, which we use to determine whether the taxi is currently serving a customer. For our purposes we are only interested in anomalous routes taken while a passenger is in the taxi, although the taxi company might also be interested in anomalous routes when the taxi has no passenger. Additionally, the dataset contains the speed of the taxi at every GPS reading.
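As a rough sketch of how occupied trips and their two features could be extracted from such records, assume each record for a single taxi is a tuple (timestamp in seconds, longitude, latitude, occupied flag), sorted by time. This record layout and the function names are assumptions made for illustration; the actual field order in the dataset may differ. Distance is approximated by summing great-circle (haversine) distances between consecutive readings.

```python
import math
from itertools import groupby

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def occupied_trips(records):
    """Yield each maximal run of consecutive occupied readings.

    records: iterable of (timestamp_s, lon, lat, occupied) for one taxi,
    sorted by timestamp (assumed layout).
    """
    for occupied, run in groupby(records, key=lambda r: bool(r[3])):
        if occupied:
            yield list(run)


def trip_features(trip):
    """Return (total distance in km, total duration in seconds) of a trip."""
    distance = sum(
        haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(trip, trip[1:])
    )
    duration = trip[-1][0] - trip[0][0]
    return distance, duration
```

Because the distance is summed over sampled points, it slightly underestimates the true road distance between readings; with updates every few seconds this error should be small.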
2. RELATED WORK
Isolation-Based Anomalous Trajectory (iBAT) [] was proposed as a solution for detecting anomalous trajectories and detecting changes in the traffic network. This method also relies on GPS trajectory data. An improvement on it, iBOAT [] (Isolation-Based Online Anomalous Trajectory), can not only detect anomalous trajectories but also identify, in real time, which portion of a trajectory is anomalous.
Others [2] have done work related to outlier detection. They used similar methods to evaluate the distance traveled by a taxi during a route from GPS data. Additionally, this research provides guidelines for dealing with some potential excuses for fraud, such as being unfamiliar with the area and traffic conditions. However, their research did not make use of the duration of the trajectory. A passenger might prefer a physically longer route if it is more time efficient.
3. MOTIVATION
A study by the National Bureau of Economic Research found that taxis take unnecessarily long detours on about seven percent of routes that originate from an airport [3]. Detection of anomalous taxi patterns can have a variety of applications:
- Detection and prevention of taxi drivers intentionally taking customers on longer routes.
- Detection of blocked roads and unusual traffic patterns or conditions.
- Increased customer satisfaction and driver accountability.
- This method could also be applied in the inverse direction. That is, we could identify the routes between interesting locations that have the shortest distance and duration, which would allow taxi companies to optimize their routes.
4. MAIN DESIGN
To detect anomalous trajectories, we must first fix a source and a destination. We calculate a range of GPS points corresponding to each. To ensure that we have plenty of data for each source-destination (S-D) pair, we only look at major hubs such as train stations and airports. These locations are also hotspots for tourists, who are the most likely victims of taxi fraud. We then analyze all trajectories with this source and destination to discover the routes they took; by examining the intermittent GPS readings, we are able to determine the route taken. By analyzing data over multiple taxis, and multiple trips by the same taxi, for an S-D pair, we should be able to determine which routes are not fraudulent for that pair. A valid route is one that is taken by numerous taxis traveling between the same S-D pair. We then calculate the distance and time distributions for these typical non-fraudulent routes.
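One simple way to summarize the distance and time distributions for an S-D pair is a mean and standard deviation per feature, with new trips scored by how many standard deviations they lie above the mean. This z-score approach is a stand-in chosen for illustration and is not necessarily the probabilistic model used in our system; the function names and the threshold of 3.0 are likewise assumptions.

```python
from statistics import mean, stdev


def fit_baseline(trips):
    """Summarize historical trips for one S-D pair.

    trips: list of (distance_km, duration_s); at least two trips are needed
    so that the sample standard deviation is defined.
    """
    distances = [d for d, _ in trips]
    durations = [t for _, t in trips]
    return {
        "distance": (mean(distances), stdev(distances)),
        "duration": (mean(durations), stdev(durations)),
    }


def z_score(value, mu, sigma):
    """Standard deviations above the mean; 0.0 if there is no spread."""
    return 0.0 if sigma == 0 else (value - mu) / sigma


def is_candidate(trip, baseline, threshold=3.0):
    """Flag a trip whose distance or duration is far above the baseline.

    Only the upper tail is flagged, since unusually short or fast trips
    do not overcharge the passenger.
    """
    distance, duration = trip
    zd = z_score(distance, *baseline["distance"])
    zt = z_score(duration, *baseline["duration"])
    return zd > threshold or zt > threshold
```

A trip flagged here is only a candidate for fraud; the cause of the deviation still has to be determined.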
Trajectories that deviate substantially from these routes are potential candidates for fraud. However, we still need to determine the cause of these anomalies. They could be caused by: 1) a taxi driver trying to cheat a customer, 2) a taxi driver who is new to the area and does not know the optimal route, or 3) a road segment being blocked during that time due to traffic, accidents, or other reasons.
To handle case (2), we can check the previous trajectories of the same taxi. If this driver has operated in this area previously and did not take the same route, we can conclude that the driver is not simply unfamiliar with the area. For case (3), we can check the trajectories of other taxis for the same S-D pair during the same time period. If other drivers are also taking the same route, we conclude that some traffic condition must be causing the change in route. However, this still is not enough information to classify the route as fraud; we also have to look at the overall time taken for the route. Some suspicious trajectories detected by distance alone may just be shortcuts rather than fraud, such as the cyan route in Figure 1.
The distance covered by such a shortcut might be larger and the path might be different, but if the time taken for the route is within our margin of error, it is not fraud. If the time taken is outside our margin, then we can conclude that the driver is off route and is in fact acting with malicious intent.
Additionally, a trajectory could have a normal distance but take much longer than other routes. Since taxi fare is based on a combination of time and distance, this is another potential avenue for fraud. However, since we reconstruct routes from their intermittent GPS readings, such a trajectory would already be flagged as an anomaly without our having to look at its time. Thus, our method covers both potential sources of fraud.
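The checks described in this section can be combined into a single decision rule. The sketch below is one possible arrangement under simplifying assumptions: the inputs are precomputed flags and counts, driver familiarity is approximated by the number of prior trips on the same S-D pair, and the labels and thresholds (`peer_fraction`, `min_prior_trips`) are hypothetical values that would need tuning on real data.

```python
def classify_trip(distance_anomalous, duration_anomalous,
                  peers_on_same_route, peers_total,
                  driver_prior_trips,
                  peer_fraction=0.5, min_prior_trips=5):
    """Assign a label to one trip for a given S-D pair.

    distance_anomalous / duration_anomalous: whether each feature falls
        outside the margin derived from historical trips.
    peers_on_same_route / peers_total: other taxis on the same S-D pair
        in the same time period, and how many took the same route.
    driver_prior_trips: earlier trips by this driver on this S-D pair,
        used here as a rough proxy for familiarity with the area.
    """
    if not distance_anomalous and not duration_anomalous:
        return "normal"
    # Longer path but acceptable time: treat as a shortcut, not fraud.
    if distance_anomalous and not duration_anomalous:
        return "shortcut"
    # Many concurrent taxis on the same route suggests a traffic condition.
    if peers_total > 0 and peers_on_same_route / peers_total >= peer_fraction:
        return "traffic"
    # Too little history to rule out an unfamiliar driver.
    if driver_prior_trips < min_prior_trips:
        return "unfamiliar_driver"
    return "suspected_fraud"
```

For example, `classify_trip(True, True, 6, 10, 50)` is labeled as traffic because most concurrent taxis took the same route, whereas the same trip with no peers on that route and an experienced driver is labeled as suspected fraud.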
5. REFERENCES
[1] Space
[2] Ge, Y., Xiong, H., Liu, C., & Zhou, Z. (2011). A Taxi Driving Fraud Detection System. 2011 IEEE 11th International Conference on Data Mining. Retrieved October 27, 2018.
[3] Liu, M., Brynjolfsson, E., & Dowlatabadi, J. (2018). Do Digital Platforms Reduce Moral Hazard? The Case of Uber and Taxis. Retrieved October 27, 2018.