Releases: JimHaughwout/GADM_DBSCAN
Sample Results documentation
Add sample results documentation to illustrate distance formula differences
Initial Release
Fun with DBSCAN algorithm and GADM-geocoded points of interest.
This reads in a data set of coordinates (latitude and longitude) along with
geocoded Global Administrative Area (GADM) features for them, then performs
unsupervised learning to cluster the points into zones of interest based on
geographic features, using the Density-Based Spatial Clustering of
Applications with Noise (DBSCAN) algorithm with a customized distance function.
Custom Distance Metric Modes
You can use one of three modes to calculate the distance between points for
DBSCAN clustering:
vicenty-basic Mode
Custom distance metric using Vincenty's formula.
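As an illustration of what such a metric might look like, here is a minimal pure-Python sketch of Vincenty's inverse formula on the WGS-84 ellipsoid. The function name `vincenty_km`, the iteration limit, and the tolerance are assumptions made for this example and are not taken from the repository.

```python
import math

# WGS-84 ellipsoid parameters
A = 6378137.0            # semi-major axis, metres
F = 1 / 298.257223563    # flattening
B = (1 - F) * A          # semi-minor axis, metres


def vincenty_km(p1, p2, max_iter=200, tol=1e-12):
    """Ellipsoidal distance in km between two (lat, lng) points in degrees.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    lat1, lng1 = p1
    lat2, lng2 = p2
    if lat1 == lat2 and lng1 == lng2:
        return 0.0

    # Reduced latitudes and difference in longitude
    u1 = math.atan((1 - F) * math.tan(math.radians(lat1)))
    u2 = math.atan((1 - F) * math.tan(math.radians(lat2)))
    big_l = math.radians(lng2 - lng1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)

    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos_sq_alpha = 1 - sin_alpha ** 2
        # cos_sq_alpha is zero only for lines along the equator
        cos_2sigma_m = (cos_sigma - 2 * sin_u1 * sin_u2 / cos_sq_alpha
                        if cos_sq_alpha else 0.0)
        c = F / 16 * cos_sq_alpha * (4 + F * (4 - 3 * cos_sq_alpha))
        lam_prev = lam
        lam = big_l + (1 - c) * F * sin_alpha * (
            sigma + c * sin_sigma * (
                cos_2sigma_m + c * cos_sigma * (-1 + 2 * cos_2sigma_m ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        # Nearly antipodal points can fail to converge
        raise ValueError("Vincenty's formula failed to converge")

    u_sq = cos_sq_alpha * (A ** 2 - B ** 2) / B ** 2
    big_a = 1 + u_sq / 16384 * (4096 + u_sq * (-768 + u_sq * (320 - 175 * u_sq)))
    big_b = u_sq / 1024 * (256 + u_sq * (-128 + u_sq * (74 - 47 * u_sq)))
    delta_sigma = big_b * sin_sigma * (
        cos_2sigma_m + big_b / 4 * (
            cos_sigma * (-1 + 2 * cos_2sigma_m ** 2)
            - big_b / 6 * cos_2sigma_m * (-3 + 4 * sin_sigma ** 2)
            * (-3 + 4 * cos_2sigma_m ** 2)))
    return B * big_a * (sigma - delta_sigma) / 1000.0
```

Because sklearn's DBSCAN accepts a callable for its `metric` parameter, a function of this shape could be passed as `metric=vincenty_km` with `eps` expressed in km, at the cost of slower pairwise evaluation than the built-in metrics.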
vicenty-gadm Mode
Custom distance metric that combines Vincenty's formula with GADM features
to calculate a scored distance (in km). The metric starts with a base
Vincenty's formula distance calculation, then modifies this based on
whether the two points are in the same city and/or city neighborhood.
This is just one (illustrative) method of using GADM features to modify
distance. It is "magic numbery" for simplicity. In real life, one would
derive values for GADM feature weights -- or use the full proxy method.
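The scoring idea can be sketched as follows. This is only an illustration: the function name, the dictionary keys `city` and `neighborhood`, and the two weight values are hypothetical "magic numbers" chosen for the example, not the values used in the repository.

```python
def gadm_scored_distance(base_km, gadm_a, gadm_b,
                         same_city_factor=0.75,
                         same_neighborhood_factor=0.5):
    """Scale a base distance (km) according to shared GADM features.

    base_km is assumed to be a precomputed Vincenty distance.
    gadm_a and gadm_b are dicts with hypothetical keys 'city' and
    'neighborhood'; the factors are illustrative, not derived weights.
    """
    score = base_km
    city_a = gadm_a.get("city")
    if city_a is not None and city_a == gadm_b.get("city"):
        # Points in the same city are treated as closer together
        score *= same_city_factor
        hood_a = gadm_a.get("neighborhood")
        if hood_a is not None and hood_a == gadm_b.get("neighborhood"):
            # Sharing a neighborhood shrinks the scored distance further
            score *= same_neighborhood_factor
    return score
```

With these example weights, two points 10 km apart score 10 km if they are in different cities, 7.5 km if they share a city, and 3.75 km if they also share a neighborhood.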
proxy Mode
Custom distance metric that uses a simple proxy ID to fetch attributes
from an external data set (for illustrative simplicity in this case,
the passed POI dataset).
While this proxy approach replicates the same distance formula as
Vincenty-plus-GADM, it could be modified to support ANY distance formula.
For example, rather than using GADM features, one could instead extract
a key or GUID used to look up a whole array of features used for a custom
distance calculation (or even make a REST call to a route planning system
to get true driving times between each X and Y).
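A minimal sketch of the proxy pattern, assuming each point handed to the clusterer is a one-element row holding only a numeric ID. The names `make_proxy_metric`, `poi_by_id`, and `distance_fn` are hypothetical and introduced only for this example.

```python
def make_proxy_metric(poi_by_id, distance_fn):
    """Build a metric that resolves proxy IDs to full records.

    poi_by_id: dict mapping an integer ID to a record (any structure).
    distance_fn: callable taking two records and returning a distance.
    Both are placeholders; any lookup or distance logic could be used.
    """
    def metric(x, y):
        # Rows arrive as numeric arrays, so the ID is cast back to int
        record_a = poi_by_id[int(x[0])]
        record_b = poi_by_id[int(y[0])]
        return distance_fn(record_a, record_b)
    return metric
```

The clusterer would then be given an (n, 1) array of IDs rather than coordinates, and `distance_fn` is free to use whatever attributes the records carry. Swapping in a different formula means changing only `distance_fn`.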
Vincenty MVP
What's New
This version uses a custom distance metric function that employs true
ellipsoid distance calculations (using Vincenty's formula).
What's Next
The plan is to modify the metric calculation to employ GADM features to
change the distance calculation (i.e., leverage urban vs. rural features).
Known Issues
Because (Lat, Lng) is actually (Y, X), matplotlib
plots these with a 90-degree rotation.
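One possible workaround, sketched here under the assumption that points are stored as (lat, lng) pairs, is to swap the columns before plotting so longitude lands on the horizontal axis. The helper name `to_plot_axes` is hypothetical.

```python
def to_plot_axes(points):
    """Split (lat, lng) pairs into x (longitude) and y (latitude) lists."""
    lats, lngs = zip(*points)
    return list(lngs), list(lats)
```

The two returned lists can then be passed to a scatter plot as the x and y values, in that order.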
Basic release
This release uses a conformal mapping approach to map ellipsoid
(latitude, longitude) coordinates to a flat Cartesian (X, Y) plane.
This allows use of sklearn's out-of-the-box distance calculation functions.
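The release notes do not say which conformal mapping is used. As one illustration, the spherical Mercator projection is a well-known conformal map; the sketch below assumes it, along with a mean Earth radius of 6371.0088 km. The function name `mercator_xy` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption for this sketch


def mercator_xy(lat, lng, radius_km=EARTH_RADIUS_KM):
    """Project (lat, lng) in degrees to planar (x, y) in km.

    Uses the spherical Mercator projection, which preserves angles but
    stretches distances increasingly with latitude.
    """
    x = radius_km * math.radians(lng)
    y = radius_km * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

Once points are projected this way, a plain Euclidean metric can be applied to the (x, y) pairs, with the caveat that Mercator distances are inflated by roughly 1/cos(latitude), so cluster radii are only comparable within a narrow latitude band.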