Data partitioning #226

PetroTruemetrics · 2024-09-12T16:15:05Z

I have the following query in AWS Athena. It takes about 12-13 seconds to run and scans over 20 GB of data, which is too slow for my use case. I would like to partition by a division, for example by country, but some rows, in particular at the following location, seem to have division-related data missing entirely.

Is there any other way to make the query run faster?

SELECT *, ST_GeomFromBinary(geometry) AS geometry
FROM v2024_07_22_0
WHERE (theme = 'buildings' AND type = 'building')
AND bbox.xmin > 25.260103773772702 
AND bbox.xmax < 25.264066227154125
AND bbox.ymin > 54.66989833391441 
AND bbox.ymax < 54.6724108411568
AND ST_Intersects(
   ST_GeomFromBinary(geometry), 
   ST_GeometryFromText('POLYGON ((25.2616152492726 54.67081885611437, 25.262554637777246 54.67081885611437, 25.262554637777246 54.67149033634276, 25.2616152492726 54.67149033634276, 25.2616152492726 54.67081885611437))')
);
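
For reference, the bbox prefilter in a query like the one above can be derived from the polygon itself rather than typed by hand. This is only a minimal sketch in Python: the helper names `polygon_bounds` and `bbox_overlap_predicate` are hypothetical, the WKT parsing assumes a simple polygon with plain decimal coordinates (no exponent notation), and it assumes the table exposes `bbox.xmin`, `bbox.xmax`, `bbox.ymin`, `bbox.ymax` as in the query. Note that the query above keeps only rows whose bbox lies entirely inside the window, while this sketch builds an overlap test, which is an assumption about the intent.

```python
import re


def polygon_bounds(wkt):
    """Return (xmin, ymin, xmax, ymax) for a simple WKT POLYGON string."""
    # Pull every "x y" coordinate pair out of the WKT text.
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    if not pairs:
        raise ValueError("no coordinates found in WKT")
    xs = [float(x) for x, _ in pairs]
    ys = [float(y) for _, y in pairs]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_overlap_predicate(wkt):
    """Build a SQL predicate matching rows whose bbox overlaps the polygon's bounds."""
    xmin, ymin, xmax, ymax = polygon_bounds(wkt)
    # Two boxes overlap when each one starts before the other one ends, on both axes.
    return (
        f"bbox.xmin < {xmax!r} AND bbox.xmax > {xmin!r} "
        f"AND bbox.ymin < {ymax!r} AND bbox.ymax > {ymin!r}"
    )
```

The resulting string would replace the four hand-written `bbox` conditions; the exact `ST_Intersects` check still has to stay in the query, since the bbox test only narrows the candidates.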

jwass · 2024-09-12T17:27:47Z

I think one reason this is taking longer than expected is the (still outstanding) Athena bug where summary statistics on a nested float column (our bbox column) return incorrect results. More here: #1 (reply in thread). Because of that, the table currently has the use of statistics disabled, which causes longer run times and more data scanned. @mojodna

We should consider just switching the bbox column back to doubles.

JBisc · 2024-09-12T18:33:31Z

@jwass is there a special reason that there is no S2 or H3 partitioning?

JBisc · 2024-09-15T14:41:44Z

Is there another way to get around that problem? I currently see no way to use any geospatial indexing with AWS Athena, which makes Overture useless in scenarios where you only want to read a small portion of the data.

For example, I have already spent hundreds of dollars on Athena costs just to load a couple of hundred building polygons from Overture.
