Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-s3: get object with context #344

Merged
merged 1 commit into from
Sep 19, 2022

Conversation

williamhbaker
Copy link
Member

@williamhbaker williamhbaker commented Sep 19, 2022

Description:

Uses the passed-in context for source-s3 (*s3Store).Read. This way the retrieving of objects can respect context timeouts.

Most notably, there is a timeout specified for schema discovery, and without this timeout, it is possible that a single extremely large file will take an excessive amount of time to process.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Along these same lines, the List method doesn't use the provided context. Updating List to use a context would be a bit more complicated. I created a separate issue for that work, which isn't causing any immediate problems but might be good to take care of at some point: #345

Also, while investigating discovery from relatively large files, I found our parsing & possibly inference processes surprisingly slow. If the parsing and inference were much faster, this particular context deadline would be less important, although still useful to have. I have some details to still work out for this, and have created a placeholder for an issue related to this potential optimization: estuary/flow#681


This change is Reviewable

Unverified

This user has not yet uploaded their public signing key.
@williamhbaker williamhbaker marked this pull request as ready for review September 19, 2022 18:04
Copy link
Member

@willdonnelly willdonnelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@williamhbaker williamhbaker merged commit 0a20957 into main Sep 19, 2022
@williamhbaker williamhbaker deleted the wb/source-s3-read-context-timeout branch September 19, 2022 18:08
@oliviamiannone oliviamiannone added the docs complete / NA No (more) doc work related to this PR label Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs complete / NA No (more) doc work related to this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants