Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix split when a nbsp character is present #1073

Merged
merged 3 commits into from
Oct 15, 2024

Conversation

FredTreg
Copy link
Contributor

@FredTreg FredTreg commented Oct 14, 2024

When computing the minReadableWidth of a cell, the code does not account for non-breaking space characters (also known as nbsp or \u00A0).

nbsp characters should be treated as non-space characters when calculating the string length, as this is the intended function of such characters.

Failing to do so results in suboptimal output, particularly for languages like French, where punctuation marks such as colons (:) are always preceded by a space and should remain on the same line as the preceding word.

This PR fixes that issue.

@umaganesan
Copy link

umaganesan commented Oct 14, 2024 via email

@simonbengtsson
Copy link
Owner

Interesting! Based on my very limited understanding shouldn't a cell with the content d'où viens-tu ? be considered two words then? Ie wouldn't it be better to calculate the "longestWordWidth" based on the entire viens-tu ? instead of viens-tu and "?" separately?

@FredTreg
Copy link
Contributor Author

FredTreg commented Oct 14, 2024

This is exactly the goal of this fix, the French would be pre-processed as viens-tu\u00A0?, so with my fix it would be one word only.
Another example is with currencies (still in French), where we would write 12 € preprocessed as 12\u00A0€. Without the fix it would be appear as 2 lines, with the fix it stays on one line

@simonbengtsson
Copy link
Owner

Got it! I thought since \s does not match non breaking spaces as far as I understand it would work with this. But I'll try it tomorrow considering you have experienced issues with it.

@FredTreg
Copy link
Contributor Author

since \s does not match non breaking spaces

I do not know about that, I did the following on the Chrome console:

> const words = 'two\u00A0words'
  words.split(/\s+/)
  > (2) ['two', 'words']

So it does split on nbsp and the character is indeed listed as whitespace on MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Character_classes

@simonbengtsson
Copy link
Owner

simonbengtsson commented Oct 15, 2024

You are right. I tried with a regex tester yesterday, but apparently incorrectly. Can you add a comment or a named variable for the regex? Then I'll merge promptly.

@FredTreg
Copy link
Contributor Author

FredTreg commented Oct 15, 2024

Done!

@simonbengtsson simonbengtsson merged commit 70b66de into simonbengtsson:master Oct 15, 2024
@simonbengtsson
Copy link
Owner

Thanks! Merged and released in v3.8.4

@FredTreg
Copy link
Contributor Author

Thank you, and thanks for making it easy to contribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants