Support CSV Limit Pushdown to Object Storage #2930

Closed
sitano opened this issue Jul 16, 2022 · 1 comment · Fixed by #2936
Labels
enhancement New feature or request

sitano commented Jul 16, 2022

Describe the bug

Given a 10 GB CSV file in remote S3 storage, the following query:

SELECT * FROM test LIMIT 1;

tries to read the WHOLE file (10 GB) instead of just the first row (chunk).

To Reproduce
Steps to reproduce the behavior:

  1. Put a 10 GB CSV file on S3
  2. Register the S3 contrib object store (the store itself works correctly):
//  let mut ctx: Context = Context::new_local(&session_config);
    let mut ctx = {
        let runtime = RuntimeEnv::new(RuntimeConfig::default()).unwrap();
        runtime.register_object_store("s3", Arc::new(S3FileSystem::default().await));
        Context::Local(SessionContext::with_config_rt(
            session_config.clone(),
            Arc::new(runtime.clone()),
        ))
    };
  3. CREATE EXTERNAL TABLE test (...) STORED AS CSV WITH HEADER ROW LOCATION 's3://blah/blah.csv';
  4. SELECT * FROM test LIMIT 1;
list file from: s3://blah/blah.csv
sync_chunk_reader: 0-10428263736
sending get object request blah/blah.csv
ArrowError(ExternalError(Custom { kind: TimedOut, error: AWS("Timeout") }))

Expected behavior

It should read only a small chunk, just enough to execute the LIMIT 1 query.
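
To illustrate the expected behavior, here is a minimal sketch of a reader that fetches fixed-size byte ranges and stops as soon as enough complete rows have arrived. It is not DataFusion's API and not necessarily how the fix was implemented: `FakeObject`, `get_range`, `read_rows_with_limit` and the 64-byte chunk size are all hypothetical names and values, and the in-memory buffer stands in for ranged GET requests against S3. It also assumes rows never contain quoted newlines.

```rust
/// Hypothetical stand-in for a remote object that supports ranged reads
/// (for S3 this would be a GET request with a `Range` header).
struct FakeObject {
    data: Vec<u8>,
    /// Total number of bytes "transferred" so far.
    bytes_fetched: usize,
}

impl FakeObject {
    /// Return at most `len` bytes starting at `start` and record the transfer size.
    fn get_range(&mut self, start: usize, len: usize) -> &[u8] {
        let start = start.min(self.data.len());
        let end = start.saturating_add(len).min(self.data.len());
        self.bytes_fetched += end - start;
        &self.data[start..end]
    }
}

/// Fetch chunks until the header plus `limit` complete rows are buffered,
/// or the object is exhausted. Assumes no quoted newlines inside fields.
fn read_rows_with_limit(obj: &mut FakeObject, limit: usize, chunk_size: usize) -> Vec<String> {
    let mut buf: Vec<u8> = Vec::new();
    let mut offset = 0;
    loop {
        let chunk = obj.get_range(offset, chunk_size).to_vec();
        let eof = chunk.is_empty();
        offset += chunk.len();
        buf.extend_from_slice(&chunk);

        // Each newline terminates one complete line (header included).
        let complete_lines = buf.iter().filter(|&&b| b == b'\n').count();
        if complete_lines >= limit + 1 || eof {
            break;
        }
    }
    String::from_utf8_lossy(&buf)
        .lines()
        .skip(1) // drop the header row
        .take(limit) // ignore anything past the limit, including a partial trailing line
        .map(str::to_owned)
        .collect()
}

fn main() {
    // Build a CSV with a header and many rows to play the role of the large file.
    let mut data = b"id,name\n".to_vec();
    for i in 0..100_000 {
        data.extend_from_slice(format!("{},row{}\n", i, i).as_bytes());
    }
    let mut obj = FakeObject { data, bytes_fetched: 0 };

    let rows = read_rows_with_limit(&mut obj, 1, 64);
    println!(
        "rows = {:?}, fetched {} of {} bytes",
        rows,
        obj.bytes_fetched,
        obj.data.len()
    );
}
```

With a limit of 1, the loop stops after the first chunk that contains the header and one full row, so only a small prefix of the object is transferred rather than the full 0-10428263736 range seen in the log above.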

Additional context

The contrib module is fine. It is the engine that requests this enormous range length.

@sitano sitano added the bug Something isn't working label Jul 16, 2022
@tustvold tustvold added enhancement New feature or request and removed bug Something isn't working labels Jul 17, 2022
@tustvold tustvold changed the title from "bug: executor does not understand the remote storage cost and reads the whole file instead of a chunk" to "Support CSV Limit Pushdown to Object Storage" Jul 17, 2022
@tustvold
Contributor

I've reworded this into a feature request and filed #2935, which documents how it could be implemented.
