Reduce FileSystem calls in LinePageSourceFactory #18959
We might need a similar change for the other formats
1a7ef1f to 23206e3
/test-with-secrets sha=cd6fc9fe6f4669af151c5fe0a88f2ac8480b392d
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/6174060858
@electrum - one more commit was added to fix the related |
Avoids calling exists() and length() on TrinoInputFile inside LinePageSourceFactory as part of LinePageSource creation. Both calls map to API metadata lookups such as S3 HeadObject, which are unnecessary because:
1. GetObject will throw FileNotFoundException anyway if the file does not exist, and issuing a HeadObject eagerly doesn't guarantee that the object will still exist for a subsequent GetObject call.
2. Adjusting the split length based on the exact file size doesn't prevent data corruption. It is only meant to handle the scenario where encrypted objects may include the last block's padding bytes in the file size. That case can be detected by reading until an EOF is encountered, and checking the length eagerly would not prevent it either, since the size could still change between the two calls.
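For illustration only, here is a minimal sketch of the idea in plain Java, using the local filesystem as a stand-in for object storage. It is not the code from this PR: the class LazySplitRead and the method readSplit are hypothetical names, and java.nio.file.Files is used in place of TrinoInputFile. The open call doubles as the existence check, and the split end is treated as an upper bound that EOF may cut short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: local files stand in for TrinoInputFile / S3 objects.
class LazySplitRead {
    /**
     * Reads up to {@code length} bytes starting at {@code start} without
     * calling Files.exists() or Files.size() beforehand.
     */
    static byte[] readSplit(Path path, long start, long length) {
        // Opening the stream is the only existence check: a missing file
        // fails here with NoSuchFileException (an IOException).
        try (InputStream in = Files.newInputStream(path)) {
            // Seek to the split start; stop early if EOF comes first.
            long toSkip = start;
            while (toSkip > 0) {
                long skipped = in.skip(toSkip);
                if (skipped <= 0) {
                    // skip() made no progress: probe one byte to detect EOF.
                    if (in.read() < 0) {
                        return new byte[0];
                    }
                    skipped = 1;
                }
                toSkip -= skipped;
            }

            // Read until the split end or EOF, whichever comes first.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (n < 0) {
                    break; // EOF before the nominal split end
                }
                out.write(buffer, 0, n);
                remaining -= n;
            }
            return out.toByteArray();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a split whose nominal length overshoots the real file size simply yields fewer bytes, and a file deleted between planning and reading surfaces as an exception from the open call rather than from a separate, racy pre-check.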
cd6fc9f to b9fdcd9
Description
Avoids calling exists() and length() on TrinoInputFile inside of LinePageSourceFactory, both of which map to API metadata lookups like S3's HeadObject. These lookups are unnecessary since GetObject will throw FileNotFoundException anyway if the file does not exist, and issuing a HeadObject eagerly doesn't guarantee the object will still exist for a subsequent GetObject call.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: