Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't skip over multiple spaces when '' (blank) is an NA string #6815

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dvg-p4
Copy link
Contributor

@dvg-p4 dvg-p4 commented Feb 12, 2025

Closes #3658

TODO:

  • Fix the failing existing tests (not sure if the tests are expecting the wrong behavior or if I introduced a bug)
  • Write a test for the intended behavior
  • Explain the existing space-condensing behavior and the new exception to it in fread's documentation

@dvg-p4
Copy link
Contributor Author

dvg-p4 commented Feb 12, 2025

Hmm, so it looks like (part) of the issue is that strip.white = TRUE (by default) for that test; and the other part of the issue is that the whitespace-stripping code isn't accounting for this change I made to not-skip-spaces. I don't think it's super clear what should be done with fread(sep = ' ', na.strings = '', strip.white = TRUE); it's a bit contradictory.

@dvg-p4
Copy link
Contributor Author

dvg-p4 commented Feb 12, 2025

I think given that strip.white = TRUE is the default, the correct behavior is:

If sep is specified or detected to be ' ' [space], na.strings includes '' [blank], and strip.white is TRUE; warn the user, set stripWhite to false, and continue.

@dvg-p4
Copy link
Contributor Author

dvg-p4 commented Feb 12, 2025

Ugh, this is even more of a mess than I realized. In some places, we're defining whitespace as ' ', in others as whiteChar [with 0 meaning "both ' ' and '\t'"], in others by isspace()...

and deciding whether to strip it sometimes by whether stripWhite is true, and other times by whether sep is ' '...

@dvg-p4 dvg-p4 changed the title Don't skip over multiple spaces when '' is an NA string Don't skip over multiple spaces when "" (blank) is an NA string Feb 12, 2025
@dvg-p4 dvg-p4 changed the title Don't skip over multiple spaces when "" (blank) is an NA string Don't skip over multiple spaces when "" (blank) is an NA string Feb 12, 2025
@dvg-p4 dvg-p4 changed the title Don't skip over multiple spaces when "" (blank) is an NA string Don't skip over multiple spaces when '' (blank) is an NA string Feb 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read space delimited file where missing values are blank "".
1 participant