Data partitioning #226

Open
PetroTruemetrics opened this issue Sep 12, 2024 · 3 comments

Comments

@PetroTruemetrics

I run the following kind of query in AWS Athena. It takes about 12-13 seconds and scans over 20 GB of data, which is too slow for my use case. I would like to partition by a division, for example by country, but some rows, in particular those in the location queried below, have no division-related data at all.

Is there any other way to make the query run faster?

SELECT *, ST_GeomFromBinary(geometry) AS geometry
FROM v2024_07_22_0
WHERE (theme = 'buildings' AND type = 'building')
AND bbox.xmin > 25.260103773772702 
AND bbox.xmax < 25.264066227154125
AND bbox.ymin > 54.66989833391441 
AND bbox.ymax < 54.6724108411568
AND ST_Intersects(
   ST_GeomFromBinary(geometry), 
   ST_GeometryFromText('POLYGON ((25.2616152492726 54.67081885611437, 25.262554637777246 54.67081885611437, 25.262554637777246 54.67149033634276, 25.2616152492726 54.67149033634276, 25.2616152492726 54.67081885611437))')
);
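
The bbox prefilter in the query above keeps only features whose bounding box lies entirely inside a hand-picked window. Another common form is an overlap test against the polygon's own envelope. The following is a minimal Python sketch of deriving such a predicate from the WKT string. It assumes the `bbox` struct fields (`xmin`, `xmax`, `ymin`, `ymax`) hold each feature's extent, as the query suggests, and the helper names `polygon_bbox` and `bbox_overlap_predicate` are hypothetical. It changes which rows pass the prefilter and is not expected to reduce the data scanned by itself.

```python
import re


def polygon_bbox(wkt):
    """Return (xmin, ymin, xmax, ymax) for the coordinates in a WKT POLYGON."""
    # Pull every "x y" coordinate pair out of the WKT text.
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    if not pairs:
        raise ValueError("no coordinates found in WKT")
    xs = [float(x) for x, _ in pairs]
    ys = [float(y) for _, y in pairs]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_overlap_predicate(wkt):
    """Build a SQL fragment that keeps rows whose bbox overlaps the polygon's envelope."""
    xmin, ymin, xmax, ymax = polygon_bbox(wkt)
    # Two boxes overlap when each one starts before the other one ends, on both axes.
    return (
        f"bbox.xmin <= {xmax!r} AND bbox.xmax >= {xmin!r} "
        f"AND bbox.ymin <= {ymax!r} AND bbox.ymax >= {ymin!r}"
    )


if __name__ == "__main__":
    wkt = (
        "POLYGON ((25.2616152492726 54.67081885611437, "
        "25.262554637777246 54.67081885611437, "
        "25.262554637777246 54.67149033634276, "
        "25.2616152492726 54.67149033634276, "
        "25.2616152492726 54.67081885611437))"
    )
    print(bbox_overlap_predicate(wkt))
```

The printed fragment can replace the four `bbox.*` conditions in the `WHERE` clause, with the exact `ST_Intersects` check kept as the final filter.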
@jwass
Contributor

jwass commented Sep 12, 2024

I think one reason this takes longer than expected is the (still outstanding) Athena bug in which summary statistics on a nested float column (our bbox column) return incorrect results. More here: #1 (reply in thread). Because of that, the table currently has the use of statistics disabled, which causes longer run times and more data scanned. @mojodna

We should consider just switching the bbox column back to doubles.

@JBisc

JBisc commented Sep 12, 2024

@jwass is there a special reason that there is no S2 or H3 partitioning?
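
For context on what a cell-based partition key looks like: each feature gets a coarse cell ID computed from a coordinate, and a table partitioned on that ID lets a query read only the cells it needs. The sketch below uses geohash as a dependency-free stand-in; it is not S2 or H3, and the function name and precision are illustrative. Nothing in this thread indicates that the Overture tables expose such a column.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=5):
    """Encode a WGS84 point as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


if __name__ == "__main__":
    # A point inside the query polygon from the top of the thread.
    print(geohash(54.6711, 25.2621, precision=5))
```

A shorter geohash is always a prefix of a longer one for the same point, so the precision controls how coarse the partitions are.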

@JBisc

JBisc commented Sep 15, 2024

Is there another way to get around that problem? I currently see no way to use any geospatial indexing with AWS Athena, which makes Overture useless in scenarios where you only want to read a small portion of the data.

For example, I have already spent hundreds of dollars on Athena costs just to load a couple of hundred building polygons via Overture.
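
As a rough illustration of how scan volume turns into cost, here is a back-of-the-envelope Python sketch. It assumes on-demand pricing of 5 USD per TB scanned (an assumption; check current Athena pricing for your region), decimal units, and the roughly 20 GB per query reported at the top of this thread. The function name is hypothetical.

```python
def athena_scan_cost_usd(gb_scanned, usd_per_tb=5.0):
    """Estimate the cost of one query from the data it scans.

    usd_per_tb is an assumed price, not an authoritative figure.
    """
    return gb_scanned / 1000.0 * usd_per_tb


if __name__ == "__main__":
    per_query = athena_scan_cost_usd(20)  # ~20 GB scanned per query
    print(f"per query: {per_query:.2f} USD")
    print(f"queries per 100 USD: {100 / per_query:.0f}")
```

Under these assumptions each query costs about 0.10 USD, so a bill in the hundreds of dollars corresponds to a few thousand such queries.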
