-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-30225][core] Correct read() behavior past EOF in NioBufferedFileInputStream #27084
Conversation
…leInputStream. This bug manifested itself when another stream would potentially make a call to NioBufferedFileInputStream.read() after it had reached EOF in the wrapped stream. In that case, the refill() code would clear the output buffer the first time EOF was found, leaving it in a readable state for subsequent read() calls. If any of those calls were made, bad data would be returned. By flipping the buffer before returning, even in the EOF case, you get the correct behavior in subsequent calls. I picked that approach to avoid keeping more state in this class, although it means calling the underlying stream even after EOF (which is fine, but perhaps a little more expensive). This showed up (at least) when using encryption, because the commons-crypto StreamInput class does not track EOF internally, leaving it for the wrapped stream to behave correctly. Tested with added unit test + slightly modified test case attached to SPARK-18105.
Test build #116056 has finished for PR 27084 at commit
|
retest this please |
Test build #116061 has finished for PR 27084 at commit
|
@@ -59,10 +59,10 @@ private boolean refill() throws IOException { | |||
while (nRead == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the comment says it checks the weather :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merged to master and branch-2.4. |
…leInputStream This bug manifested itself when another stream would potentially make a call to NioBufferedFileInputStream.read() after it had reached EOF in the wrapped stream. In that case, the refill() code would clear the output buffer the first time EOF was found, leaving it in a readable state for subsequent read() calls. If any of those calls were made, bad data would be returned. By flipping the buffer before returning, even in the EOF case, you get the correct behavior in subsequent calls. I picked that approach to avoid keeping more state in this class, although it means calling the underlying stream even after EOF (which is fine, but perhaps a little more expensive). This showed up (at least) when using encryption, because the commons-crypto StreamInput class does not track EOF internally, leaving it for the wrapped stream to behave correctly. Tested with added unit test + slightly modified test case attached to SPARK-18105. Closes #27084 from vanzin/SPARK-30225. Authored-by: Marcelo Vanzin <vanzin@cloudera.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
This bug manifested itself when another stream would potentially make a call
to NioBufferedFileInputStream.read() after it had reached EOF in the wrapped
stream. In that case, the refill() code would clear the output buffer the
first time EOF was found, leaving it in a readable state for subsequent
read() calls. If any of those calls were made, bad data would be returned.
By flipping the buffer before returning, even in the EOF case, you get the
correct behavior in subsequent calls. I picked that approach to avoid keeping
more state in this class, although it means calling the underlying stream
even after EOF (which is fine, but perhaps a little more expensive).
This showed up (at least) when using encryption, because the commons-crypto
StreamInput class does not track EOF internally, leaving it for the wrapped
stream to behave correctly.
Tested with added unit test + slightly modified test case attached to SPARK-18105.