- Understanding Geohashing: A Comprehensive Guide
Geohashing is a method for encoding geographic coordinates (latitude and longitude) into a short string of letters and digits. This encoded string, called a geohash, offers a compact, human-readable way to represent geographic locations. Developed by Gustavo Niemeyer in 2008, geohashing is widely used in geographic information systems (GIS), spatial databases, and location-based services.
-
Compact Representation
:- Geohashes condense geographic coordinates into a short string
- This makes them easier to store and communicate
-
Spatial Locality
:- Geohashes maintain spatial locality, meaning that points close to each other geographically will have similar geohashes
- This is crucial for efficient spatial querying and indexing
-
Hierarchical Structure
:- Geohashes provide a hierarchical structure where each additional character increases the precision of the location
- This allows for variable levels of granularity in spatial data representation
-
Efficient Querying
:- Geohashes enable efficient spatial queries, such as finding all points within a certain area or bounding box
- This is especially useful for applications like geocoding, mapping, and location-based services
Precision in geohashing refers to the accuracy of the geographic location represented by the geohash.
The length of the geohash string determines its precision: longer geohashes correspond to smaller areas
, thus representing locations more accurately.
Here is a table showing the precision of geohashes based on their length:
Geohash Length | Latitude Bits | Longitude Bits | Cell Width | Cell Height |
---|---|---|---|---|
1 | 2 | 3 | ≤ 5,000 km | ≤ 5,000 km |
2 | 5 | 5 | 1,250 km | 625 km |
3 | 7 | 8 | 156 km | 156 km |
4 | 10 | 10 | 39.1 km | 19.5 km |
5 | 12 | 13 | 4.89 km | 4.89 km |
6 | 15 | 15 | 1.22 km | 0.61 km |
7 | 17 | 18 | 153 m | 153 m |
8 | 20 | 20 | 38.2 m | 19.1 m |
9 | 22 | 23 | 4.77 m | 4.77 m |
10 | 25 | 25 | 1.19 m | 0.596 m |
11 | 27 | 28 | 149 mm | 149 mm |
12 | 30 | 30 | 37.2 mm | 18.6 mm |
Let's derive a 4-character (precision) geohash for Google's headquarters located at approximately latitude 37.422
and longitude -122.084
We'll break down the process into detailed steps.
Latitude ranges from -90 to +90.
Step | Range | Midpoint | 37.422 Comparison | Binary | New Range |
---|---|---|---|---|---|
1 | [-90, 90] | 0 | 37.422 > 0 | 1 | [0, 90] |
2 | [0, 90] | 45 | 37.422 < 45 | 10 | [0, 45] |
3 | [0, 45] | 22.5 | 37.422 > 22.5 | 101 | [22.5, 45] |
4 | [22.5, 45] | 33.75 | 37.422 > 33.75 | 1011 | [33.75, 45] |
5 | [33.75, 45] | 39.375 | 37.422 < 39.375 | 10110 | [33.75, 39.375] |
6 | [33.75, 39.375] | 36.5625 | 37.422 > 36.5625 | 101101 | [36.5625, 39.375] |
7 | [36.5625, 39.375] | 37.96875 | 37.422 < 37.96875 | 1011010 | [36.5625, 37.96875] |
8 | [36.5625, 37.96875] | 37.265625 | 37.422 > 37.265625 | 10110101 | [37.265625, 37.96875] |
9 | [37.265625, 37.96875] | 37.6171875 | 37.422 < 37.6171875 | 101101010 | [37.265625, 37.6171875] |
10 | [37.265625, 37.6171875] | 37.44140625 | 37.422 < 37.44140625 | 1011010100 | [37.265625, 37.44140625] |
Latitude binary: 1011010100
Longitude ranges from -180 to +180.
Step | Range | Midpoint | -122.084 Comparison | Binary | New Range |
---|---|---|---|---|---|
1 | [-180, 180] | 0 | -122.084 < 0 | 0 | [-180, 0] |
2 | [-180, 0] | -90 | -122.084 < -90 | 00 | [-180, -90] |
3 | [-180, -90] | -135 | -122.084 > -135 | 001 | [-135, -90] |
4 | [-135, -90] | -112.5 | -122.084 < -112.5 | 0010 | [-135, -112.5] |
5 | [-135, -112.5] | -123.75 | -122.084 > -123.75 | 00101 | [-123.75, -112.5] |
6 | [-123.75, -112.5] | -118.125 | -122.084 < -118.125 | 001010 | [-123.75, -118.125] |
7 | [-123.75, -118.125] | -120.9375 | -122.084 < -120.9375 | 0010100 | [-123.75, -120.9375] |
8 | [-123.75, -120.9375] | -122.34375 | -122.084 > -122.34375 | 00101001 | [-122.34375, -120.9375] |
9 | [-122.34375, -120.9375] | -121.640625 | -122.084 < -121.640625 | 001010010 | [-122.34375, -121.640625] |
10 | [-122.34375, -121.640625] | -121.9921875 | -122.084 < -121.9921875 | 0010100100 | [-122.34375, -121.9921875] |
Longitude binary: 0010100100
Interleave the bits of longitude and latitude to create a single binary string.
(Longitude, Latitude) -> (01)
Step | Latitude | Longitude | Interleaved |
---|---|---|---|
1 | 1 | 0 | 01 |
2 | 0 | 0 | 0100 |
3 | 1 | 1 | 010011 |
4 | 1 | 0 | 01001101 |
5 | 0 | 1 | 0100110110 |
6 | 1 | 0 | 010011011001 |
7 | 0 | 0 | 01001101100100 |
8 | 1 | 1 | 0100110110010011 |
9 | 0 | 0 | 010011011001001100 |
10 | 0 | 0 | 01001101100100110000 |
Interleaved binary: 01001101100100110000
The binary string 01001101100100110000
needs to be converted to base32.
-
Pad the binary string to make its length a multiple of 5: The length of
01000110100100110000
is 20, which is already a multiple of 5, so no padding is needed. -
Divide the binary string into 5-bit groups:
01001 10110 01001 10000
-
Convert each 5-bit group to its decimal equivalent:
01001
= 910110
= 2801001
= 910000
= 16
-
Map each decimal value to the corresponding base32 character: Base32 alphabet:
0123456789bcdefghjkmnpqrstuvwxyz
- 9 -> 9
- 28 -> u
- 9 -> 9
- 16 -> h
Therefore, the binary string 01000110100100110000
converts to the base32 string 9u9h
.
The 4-character geohash for Google's headquarters (latitude 37.422, longitude -122.084) is 9u9h
.
-
Spatial Locality
:- Interleaving ensures that geohashes for nearby locations are similar, preserving spatial locality. This is critical for spatial queries and indexing.
-
Efficient Range Queries
:- Interleaved geohashes allow for efficient bounding box queries, which are common in geographic searches. This makes it easier to query all points within a certain area.
-
Balanced Precision
:- By interleaving, both latitude and longitude contribute equally to the precision of the geohash. This avoids skewed precision that could occur if one coordinate is given more bits than the other.
-
Hierarchical Subdivision
:- Interleaving provides a hierarchical structure where each additional character refines the location, allowing for varying levels of granularity.