fread error reading Latin-1 file containing NUL byte <0x00> #2435
Comments
I am also affected by this. For large files, replacing
Would that be a possible workaround for |
@adamaltmejd and @djbirke would you mind trying your files again please using 1.12.3 from GitHub: embedded NUL should be fixed now.
@mattdowle That's great news, thanks a lot! We decided to bite the bullet and manually replace any |
Thanks for fixing this! In my case the file is on a secure server where I cannot update data.table myself, but I'm pretty sure it would work if it does in other test cases :). Thanks!
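For anyone stuck on an older data.table version, the manual replacement mentioned in the comments above can be sketched in base R. This is only a rough illustration, not something posted in the thread: `strip_nul` is a hypothetical helper name, the chunk size is an arbitrary choice, and it assumes that simply dropping the NUL bytes is acceptable for your data.

```r
# Hypothetical helper (not part of data.table): copy a file while
# dropping every NUL byte (0x00). Reads in chunks so that large files
# do not have to fit in memory at once.
strip_nul <- function(src, dest, chunk_size = 64 * 1024^2) {
  con_in <- file(src, open = "rb")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, open = "wb")
  on.exit(close(con_out), add = TRUE)

  removed <- 0
  repeat {
    bytes <- readBin(con_in, what = "raw", n = chunk_size)
    if (length(bytes) == 0L) break        # end of file reached
    is_nul <- bytes == as.raw(0)          # TRUE where the byte is NUL
    removed <- removed + sum(is_nul)
    writeBin(bytes[!is_nul], con_out)     # write everything except NULs
  }
  removed                                 # number of NUL bytes dropped
}

# Example usage (file names are placeholders):
# cleaned <- tempfile(fileext = ".tsv")
# strip_nul("data.tsv", cleaned)
# DT <- data.table::fread(cleaned, encoding = "Latin-1")
```

The return value is the count of removed bytes, which gives a quick check that the file really did contain embedded NULs before re-running fread() on the cleaned copy.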
Copying from my question on SO:
Having trouble creating a reproducible example and can't share the data, but I think I stumbled upon a bug in fread(). Trying to read my 1.658GB tsv file encoded in Latin-1 produces the following error:
The problematic line is line no. 11129896, where there is a NUL mark written out as <0x00> in Sublime Text and ^@ in Vi (can't copy it). If I set skip = 11129895, fread throws the same error but now on "jump 0"; if I set skip = 11129896 it works, but nrows=11129895 still throws the same error. Having removed the character, the file reads as it should. Maybe fread() is not supposed to support reading files with these encoding issues, but at least it would be great if the error was more informative. Took me quite a while to understand what was going on and to find the correct line.
The verbose output of fread() is:
And sessionInfo():