Streaming CSV/JSON Object Store Read #2935

Closed
tustvold opened this issue Jul 17, 2022 · 0 comments · Fixed by #2936
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@tustvold
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, CsvOpener and JsonOpener call GetResult::bytes, which downloads the entire file before feeding it to the appropriate arrow reader.

This is not ideal:

Following on from #2677, we now support streaming responses from object storage.

Describe the solution you'd like

The underlying challenge is to take an arbitrary Stream<Bytes> and convert it into a Stream<Bytes> where each stream element contains only complete rows, as delimited by a newline character. Once we have this DelimitedStream, it is trivial to feed each of these byte chunks individually into the corresponding decoder.
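As a rough illustration of the re-chunking idea, here is a minimal sketch using only the standard library. It is not the DataFusion implementation: a synchronous `Iterator<Item = Vec<u8>>` stands in for `Stream<Bytes>`, the `NewlineDelimited` name is hypothetical, and it assumes every `\n` ends a row (so it does not handle newlines inside quoted CSV fields).

```rust
/// Hypothetical adapter (name invented for this sketch): re-chunks arbitrary
/// byte chunks so that every emitted chunk ends on a newline boundary.
/// Only the final chunk may lack a trailing newline.
/// Assumption: every b'\n' terminates a row; quoted newlines are not handled.
struct NewlineDelimited<I> {
    inner: I,
    /// Bytes seen after the last newline, i.e. a row that is not yet complete.
    remainder: Vec<u8>,
    done: bool,
}

impl<I: Iterator<Item = Vec<u8>>> NewlineDelimited<I> {
    fn new(inner: I) -> Self {
        Self { inner, remainder: Vec::new(), done: false }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for NewlineDelimited<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        if self.done {
            return None;
        }
        loop {
            match self.inner.next() {
                Some(chunk) => {
                    // Append the new bytes to whatever partial row is buffered.
                    self.remainder.extend_from_slice(&chunk);
                    // Split at the last newline: everything up to and including
                    // it is complete rows, the tail is carried forward.
                    if let Some(pos) = self.remainder.iter().rposition(|&b| b == b'\n') {
                        let tail = self.remainder.split_off(pos + 1);
                        let complete = std::mem::replace(&mut self.remainder, tail);
                        return Some(complete);
                    }
                    // No newline yet: keep buffering and pull the next chunk.
                }
                None => {
                    // Input exhausted: flush any final row without a newline.
                    self.done = true;
                    if self.remainder.is_empty() {
                        return None;
                    }
                    return Some(std::mem::take(&mut self.remainder));
                }
            }
        }
    }
}
```

Each item yielded by the adapter can then be handed to a decoder on its own, since no row straddles two items. Memory use is bounded by the input chunk size plus the longest single row, rather than by the size of the whole file.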

Describe alternatives you've considered

We could not do this

@tustvold tustvold added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels Jul 17, 2022
@tustvold tustvold changed the title Streaming CSV/JSON Read Streaming CSV/JSON Object Store Read Jul 17, 2022
@tustvold tustvold self-assigned this Jul 17, 2022
tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Jul 17, 2022
tustvold added a commit that referenced this issue Jul 18, 2022
…2936)

* Add streaming JSON and CSV (#2935)

* Add license header

* Review feedback

* Add license header

* Review feedback