When a query with a LIMIT runs against a Hive partitioned table stored on S3, Trino loads all of the queried partitions and only then applies the LIMIT.
On huge partitioned tables this makes even the following simple query impossible to run:
select * from hive.schema.tablename limit 10;
Environment for reproduction (although it shouldn't matter):
Trino version - 436
Catalog - hive
Storage - s3 compatible ceph
Objects format - parquet
Would it be possible for the coordinator to check, every x seconds, how many rows each task has retrieved, and then decide whether to abort the remaining tasks and return the combined results?
For queries with both filtering and a LIMIT, the same approach might work if it is applied only to the last query stage (where the LIMIT is evaluated).
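To make the proposal concrete, here is a minimal, simulated sketch of the idea in Python. It is not Trino code and does not reflect how Trino's coordinator actually schedules or cancels tasks. `Task`, `make_task`, and `run_with_early_limit` are hypothetical names, and the "every x seconds" check is modeled as discrete poll rounds instead of wall-clock time. On each round, every running task produces one batch; the coordinator counts only the rows that pass the filter (what the final LIMIT stage would see) and cancels every unfinished task once the limit is reached. Without a filter, that count equals the rows retrieved by the scan tasks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, int]


@dataclass
class Task:
    """Hypothetical stand-in for a worker task scanning one partition."""
    name: str
    batches: Iterator[List[Row]]
    buffered: List[Row] = field(default_factory=list)
    finished: bool = False
    cancelled: bool = False

    def advance(self) -> None:
        # Simulates the work a task does between two coordinator polls.
        if self.finished or self.cancelled:
            return
        batch = next(self.batches, None)
        if batch is None:
            self.finished = True
        else:
            self.buffered.extend(batch)


def make_task(name: str, partition: int, total_rows: int, batch_size: int) -> Task:
    """Build a simulated task that yields `total_rows` rows in fixed-size batches."""
    def gen() -> Iterator[List[Row]]:
        for start in range(0, total_rows, batch_size):
            stop = min(start + batch_size, total_rows)
            yield [{"partition": partition, "id": i} for i in range(start, stop)]
    return Task(name, gen())


def run_with_early_limit(
    tasks: List[Task],
    limit: int,
    predicate: Optional[Callable[[Row], bool]] = None,
) -> Tuple[List[Row], int]:
    """Poll tasks in rounds; cancel them once `limit` rows survive the filter."""
    keep = predicate or (lambda row: True)
    results: List[Row] = []
    polls = 0
    while True:
        active = [t for t in tasks if not (t.finished or t.cancelled)]
        for task in active:
            task.advance()
        polls += 1
        # Count only rows that pass the filter, i.e. what the final (LIMIT) stage sees.
        for task in tasks:
            results.extend(row for row in task.buffered if keep(row))
            task.buffered.clear()
        if len(results) >= limit:
            # Enough rows: abort every task that has not finished on its own.
            for task in tasks:
                if not task.finished:
                    task.cancelled = True
            break
        if not active:
            break  # every task finished before the limit was reached
    return results[:limit], polls


# Usage: 4 simulated partitions of 1000 rows each, scanned in batches of 100.
tasks = [make_task(f"task-{p}", p, total_rows=1000, batch_size=100) for p in range(4)]
rows, polls = run_with_early_limit(tasks, limit=10)
print(len(rows), polls)  # 10 1

# With a selective filter, more rounds are needed before 10 rows survive.
tasks = [make_task(f"task-{p}", p, total_rows=1000, batch_size=100) for p in range(4)]
rows, polls = run_with_early_limit(tasks, limit=10, predicate=lambda r: r["id"] % 250 == 0)
print(len(rows), polls)  # 10 6
```

In the unfiltered case the sketch stops after scanning 400 of the 4000 simulated rows; in the filtered case it keeps polling until enough rows have passed the predicate, which is why the check belongs at the stage where the LIMIT is applied rather than at the scan.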