Global Positioning System (GPS) data has been a valuable source of information in transportation, urban planning, and logistics. In the Philippines, several transport companies and organizations have used GPS to optimize their operational policies and improve revenue and resource use. In government, GPS has been pivotal in improving key services, particularly public transportation.
The data was collected by the LTFRB through its mobile big data partners in the telco sector. It covers March 2018 and 2019 over the full 24 hours of each day. The objective of this project is to understand and analyze the behavior of commuters who use taxis as a mode of transportation, and to recommend policies that will help improve the commuter experience.
A. Contents
- Temporal Coverage
  - Using Python, I analyze the coverage of the data, such as logs per taxi, and the corresponding trends, such as daily average ridership.
- Stay Point Identification
  - Using Python, I group GPS points spatially. For a GPS point to belong to a group, the taxi must stay for a significant amount of time without moving beyond a specified distance.
- Spatial Clustering
  - After identifying the stay points, I cluster them spatially using DBSCAN and hierarchical clustering. These two machine learning algorithms give two clustering results that inform the recommendations.
- Recommendation
  - In this part, I recommend practical mobility management strategies for shifting commuters from taxis to public transportation.
B. Data Structure
- The daily GPS logs in this data are recorded at an average interval of 2 minutes, so a single log does not translate to an individual ride. The user logs show that the number of records differs per user because the logging interval differs per user.
- Using spatial filtering, I restricted the coverage to Metro Manila, with allowances for its adjacent provinces: Bulacan in the north, Rizal in the east, and Cavite in the south. Logs that reach up to Clark or down to Laguna are clipped.
- To make the data interpretable, I used its date features and grouped the logs by day and hour. This compensates for the inaccuracy of the 2-minute interval of the GPS logs.
- Daily Average Rides:
  - The daily average rides follow a trend with 3 on-peaks (9 AM, 1 PM, 7 PM) and 3 off-peaks (11 AM, 5 PM, 11 PM).
  - At 9 AM, Monday recorded the highest ridership, with an average of 701 rides.
  - At 1 PM, Tuesday is the highest, with an average of 618 rides.
  - At 7 PM, Wednesday recorded an average of 662 rides.
  - On a monthly basis, we divide this ridership by 4 days:
    - 175 cars for every Monday morning
    - 155 cars for every Tuesday afternoon
    - 166 cars for every Wednesday evening
  - Assuming every ride carries 1 passenger, it takes 175 cars to transport the 175 Monday-morning passengers from origin to destination.
  - If ride sharing is implemented at 4 passengers per car, this drops to approximately 44 cars on Monday morning (175 / 4 = 43.75, rounded up).
  - If a bus or shuttle service carries 60 passengers per vehicle, it takes 3 buses to transport the 175 passengers.
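The grouping by day and hour and the fleet arithmetic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `average_rides_by_day_hour` and `vehicles_needed` are hypothetical, each timestamp is treated as one ride (a simplification, since a log is not a ride), and vehicle counts are rounded up so that every passenger has a seat.

```python
import math
from collections import defaultdict
from datetime import datetime


def average_rides_by_day_hour(timestamps):
    """Average count per (weekday, hour) slot, averaged over calendar dates.

    `timestamps` is assumed to be an iterable of datetime objects, one per
    record that counts as a ride. Weekday 0 is Monday.
    """
    per_date = defaultdict(int)  # (date, weekday, hour) -> count on that date
    for ts in timestamps:
        per_date[(ts.date(), ts.weekday(), ts.hour)] += 1

    per_slot = defaultdict(list)  # (weekday, hour) -> one count per date
    for (_, weekday, hour), count in per_date.items():
        per_slot[(weekday, hour)].append(count)
    return {slot: sum(c) / len(c) for slot, c in per_slot.items()}


def vehicles_needed(passengers, capacity):
    """Smallest number of vehicles that seats every passenger."""
    return math.ceil(passengers / capacity)


# Monday 9 AM peak from the text: 701 average rides divided by 4 days
monday_cars = round(701 / 4)             # 175
print(vehicles_needed(monday_cars, 1))   # one passenger per taxi
print(vehicles_needed(monday_cars, 4))   # ride sharing
print(vehicles_needed(monday_cars, 60))  # bus or shuttle
```

Rounding up gives 44 cars for the ride-sharing case (175 / 4 = 43.75) and 3 buses for the shuttle case (175 / 60 is about 2.9).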
A stay point is a location identified from multiple GPS logs based on specific criteria. The GPS logs within this location are averaged to determine its latitude and longitude.
A. Criteria
- Define a radius for the basis of the stay point.
- Feed the GPS data into a loop, and test each point against the following conditions:
  - time_stayed ≥ minimum_time_to_stay
  - distance_changed ≤ threshold_distance
- Starting from the initial GPS point, measure the distance to subsequent GPS logs. If the distance exceeds the threshold_distance, exit the loop.
- Compute the stay point’s latitude and longitude by averaging the GPS logs that meet the criteria.
- Calculate the cumulative time and cumulative distance for the stay point.
- Proceed to the next point to feed into the loop.
B. Pseudo-code
import math

import pandas as pd


class StayPointIdentification:
    def __init__(self, data, cutoff_distance, minimum_time):
        # data: assumed to be a time-sorted list of (timestamp_seconds, latitude,
        # longitude) tuples for one taxi; the real input format is not specified
        self.gps_data = data
        self.cutoff_distance = cutoff_distance  # km
        self.minimum_time = minimum_time  # seconds
        self.staypoints = self.identify_staypoints()
        self.staypoints_df = self.to_dataframe()

    def centroid(self, latitude, longitude):
        # Mean of the given latitude and longitude lists
        return sum(latitude) / len(latitude), sum(longitude) / len(longitude)

    def radian(self, point):
        # Convert degrees to radians
        return float(point) * math.pi / 180.0

    def haversine_distance(self, lat1, lon1, lat2, lon2):
        # Haversine (great-circle) distance between two GPS points, in km
        radius = 6371  # Earth radius in km
        phi1, phi2 = self.radian(lat1), self.radian(lat2)
        delta_phi = phi2 - phi1
        delta_lambda = self.radian(lon2) - self.radian(lon1)
        a = (math.sin(delta_phi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
        c = 2 * math.asin(math.sqrt(a))
        return radius * c

    def identify_staypoints(self):
        # Group consecutive logs that stay within cutoff_distance of an anchor
        # log for at least minimum_time, and record each group's centroid
        staypoints = []
        i, n = 0, len(self.gps_data)
        while i < n:
            t0, lat0, lon0 = self.gps_data[i]
            j = i + 1
            while j < n:
                _, lat, lon = self.gps_data[j]
                if self.haversine_distance(lat0, lon0, lat, lon) > self.cutoff_distance:
                    break  # the taxi left the radius; close the group
                j += 1
            group = self.gps_data[i:j]
            time_stayed = group[-1][0] - t0
            if len(group) > 1 and time_stayed >= self.minimum_time:
                lat, lon = self.centroid([p[1] for p in group], [p[2] for p in group])
                staypoints.append((lat, lon, time_stayed))
                i = j  # continue after the group
            else:
                i += 1  # no stay point here; try the next log
        return staypoints

    def to_dataframe(self):
        # Convert staypoints to a dataframe, filtering by minimum time
        columns = ["latitude", "longitude", "time_stayed"]
        dataframe = pd.DataFrame(self.staypoints, columns=columns)
        return dataframe[dataframe["time_stayed"] >= self.minimum_time]
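For reference, the distance computed by `haversine_distance` can be written out as follows. Here φ is latitude and λ is longitude, both in radians, and R is the Earth radius used in the code (6371 km); the symbols mirror the variable names `phi1`, `phi2`, `delta_phi`, and `delta_lambda`.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\Delta\varphi}{2}\right)
     + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\Delta\lambda}{2}\right) \\
c &= 2 \arcsin\!\left(\sqrt{a}\right) \\
d &= R\,c, \qquad R = 6371\ \text{km}
\end{aligned}
```

As a quick check, two points on the same meridian one degree of latitude apart have Δλ = 0, so a = sin²(Δφ/2), c = Δφ = π/180, and d = 6371 × π/180 ≈ 111.19 km.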
a. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
Fig. 6 DBSCAN Cumulative time vs Cumulative Count
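To show how DBSCAN groups stay points, here is a dependency-free sketch. It assumes stay points are (latitude, longitude) pairs in degrees and that distances are measured with the Haversine formula. The function names, the `eps_km` and `min_samples` parameters, and the sample coordinates are illustrative assumptions; the project itself may have relied on a library implementation such as scikit-learn's `DBSCAN`.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))


def dbscan(points, eps_km, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps_km of point i (including i itself)
        return [j for j, q in enumerate(points)
                if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point; keep expanding
    return labels


# Hypothetical stay points: two tight groups and one isolated point
sample = [
    (14.6000, 121.0000), (14.6005, 121.0005), (14.6010, 121.0010),
    (14.7000, 121.1000), (14.7005, 121.1005),
    (15.5000, 120.0000),
]
print(dbscan(sample, eps_km=0.2, min_samples=2))  # [0, 0, 0, 1, 1, -1]
```

The neighbor search here is quadratic in the number of points, which is fine for a small illustration but would need a spatial index for a full month of stay points.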
b. Hierarchical Clustering
c. Insights
- High Cumulative Count, High Cumulative Time: This combination may indicate areas of high traffic density and prolonged dwell time, where there is a lot of activity happening. These areas may be urban centers, shopping districts, or entertainment venues.
- Low Cumulative Count, Low Cumulative Time: This combination may indicate areas of low traffic activity, where there is little movement or activity happening. These areas may be remote or less populated regions.
- Low Cumulative Count, Varying Cumulative Time: This combination suggests that the area is not heavily trafficked but that some events or activities draw people to it for varying amounts of time.
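One way to make these combinations concrete is to split clusters by thresholds on cumulative count and cumulative time. The sketch below is only illustrative: the function name `label_cluster`, the threshold parameters, and the sample numbers are assumptions, not values taken from the analysis.

```python
def label_cluster(count, time, count_cut, time_cut):
    """Place one cluster in a count/time quadrant.

    count_cut and time_cut are hypothetical thresholds (for example, the
    medians over all clusters); the analysis does not state which were used.
    """
    high_count = count >= count_cut
    high_time = time >= time_cut
    if high_count and high_time:
        return "high count, high time"
    if not high_count and not high_time:
        return "low count, low time"
    if not high_count:
        return "low count, high time"
    return "high count, low time"


# Hypothetical clusters: (cumulative count, cumulative time in minutes)
for count, time in [(120, 900), (8, 40), (10, 600)]:
    print(count, time, label_cluster(count, time, count_cut=50, time_cut=300))
```

Under this reading, low-count clusters that land in both the low-time and high-time quadrants correspond to the "varying time" case described above.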
- High-Demand Areas: Areas with high demand for taxis can be targeted for investment in public transit, such as a point-to-point bus system. The stay points within Antipolo, Taguig, and QC can be redesigned to host this bus system and connect to the main transport network, such as the MRT and the EDSA Busway. In this way, commuters will be encouraged to use public transport that minimizes transportation cost and the time spent changing modes.
- Low-Demand Areas: Areas with low demand for taxis are accompanied by short travel times. This indicates that the origin-to-destination distance is short and can be covered using other modes of transport, such as cycling or walking. These areas can be targeted for green infrastructure, such as exclusive pedestrian and bicycle lanes. This encourages commuters to shift to active transport instead of taxis.
- Varying-Demand Areas: These areas have varying demand and travel time, and the stay points that fall here are present in both residential and business areas. This might indicate that demand depends on the commuter's specific situation. For low travel times, the approach for low-demand areas can be adapted. For high travel times, carpooling or ride-sharing can be implemented within the area. This reduces the road space used and saves costs for users.
- Average-Demand Areas: The stay points that fall in this category are present in business areas, malls, schools, and local communities. Efforts in this category should focus on green space infrastructure, such as inclusive waiting areas for children, senior citizens, and PWDs.