GFile S3 reader may corrupt files containing multibyte characters or CR/LF #2839

davidsoergel · 2019-10-25T16:34:36Z

In our stub GFile implementation, the S3 reader obtains chunks of n bytes and then decodes them as strings. This risks corrupting data or at least throwing decoding errors, because the byte chunk may end in the middle of a multibyte character (and correspondingly, the next chunk would begin in the middle of that character).
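To make the failure mode concrete, here is a minimal sketch that does not use the GFile code itself. It splits the two-byte UTF-8 encoding of "é" at a hypothetical chunk boundary and decodes each half independently, the way a naive chunked reader would. The variable names are illustrative only.

```python
# "é" is two bytes in UTF-8: 0xC3 0xA9.
data = "é".encode("utf-8")

# Pretend the chunk boundary falls between the two bytes.
first, second = data[:1], data[1:]

# Strict decoding of the first chunk fails outright.
try:
    first.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient decoding avoids the exception but silently corrupts the text:
# each orphaned byte becomes U+FFFD (the replacement character).
lossy = first.decode("utf-8", errors="replace") + second.decode(
    "utf-8", errors="replace"
)
```

So depending on the error handler, the reader either raises or returns text that no longer matches the file.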

Thus the chunks cannot be decoded independently. Any partial character must be detected and removed from the end of each chunk, and prepended to the subsequent chunk before decoding.
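One possible way to carry partial characters across chunk boundaries is Python's standard incremental decoder, which buffers an incomplete trailing sequence and prepends it to the next input. The sketch below is not the fix that landed in TensorBoard; `decode_chunks` and the 3-byte chunk size are hypothetical names and values chosen for illustration, and UTF-8 is assumed as the encoding.

```python
import codecs


def decode_chunks(byte_chunks, encoding="utf-8"):
    """Yield decoded text per chunk, holding back any partial character."""
    # The incremental decoder keeps undecoded trailing bytes internally
    # and joins them with the start of the next chunk.
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Usage: 3-byte chunks split the "ï" in "naïve" across a boundary.
data = "naïve café ✓".encode("utf-8")
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]
result = "".join(decode_chunks(chunks))
```

Here the first chunk is `b"na\xc3"`, which ends in the middle of "ï", yet the joined result still matches the original string.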

CRLF (\r\n) is effectively a multibyte character in this context, because Python always "decodes" it to just \n.
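The same boundary problem applies when `\r` is the last byte of one chunk and `\n` is the first byte of the next. As a sketch, assuming the reader wants the universal-newlines behavior of Python text mode (where a lone `\r` is also translated to `\n`), the standard `io.IncrementalNewlineDecoder` can wrap an incremental byte decoder and hold back a trailing `\r` until the next chunk arrives. `decode_text_chunks` is a hypothetical helper, not part of the GFile stub.

```python
import codecs
import io


def decode_text_chunks(byte_chunks, encoding="utf-8"):
    """Decode chunks to text, translating newlines across chunk boundaries."""
    byte_decoder = codecs.getincrementaldecoder(encoding)()
    # translate=True converts "\r\n" and lone "\r" to "\n"; a trailing "\r"
    # is withheld until the decoder sees what follows it.
    decoder = io.IncrementalNewlineDecoder(byte_decoder, translate=True)
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Usage: the first CRLF is split across two chunks.
chunks = [b"a\r", b"\nb\r\n", b"c"]
translated = "".join(decode_text_chunks(chunks))

# For comparison, translating each chunk independently misses the split pair.
naive = "".join(c.decode("utf-8").replace("\r\n", "\n") for c in chunks)
```

With these inputs, `translated` contains only `\n` line endings, while `naive` still contains a stray `\r` from the split pair.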

See tensorboard/tensorboard/compat/tensorflow_stub/io/gfile.py, line 268 in 2b3479b:

# TODO(orionr): This endpoint risks splitting a multi-byte

(Followup to #2791 and #2777)

davidsoergel · 2019-10-25T16:37:31Z

FYI @orionr, @sanekmelnikov, @natalialunova, @lanpa. I can't actually assign you here but please feel free to send a fix if you like. Thanks!

gowthamkpr self-assigned this Oct 28, 2019

gowthamkpr added core:backend type:bug labels Oct 28, 2019

gowthamkpr assigned bileschi and unassigned gowthamkpr Oct 28, 2019

gowthamkpr added the stat:awaiting tensorflower label Oct 28, 2019

bileschi removed their assignment Jan 2, 2020
