Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-18236: Adjust str.isspace to use Unicode's White_Space property. #16254

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gnprice
Copy link
Contributor

@gnprice gnprice commented Sep 18, 2019

When Unicode support was first added to Python, there was no Unicode
property identifying whitespace, so we approximated it by putting
together a couple of other properties.

Now there is a White_Space property, so let's use it.

Happily, the difference from our original approximation is only in the
four rare control characters 001C..001F.

As a bonus, isspace now joins all similar methods in giving exactly
matching results for ASCII characters represented as str or as
bytes. Add a test for that nice property.

https://bugs.python.org/issue18236

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.
When Unicode support was first added to Python, there was no Unicode
property identifying whitespace, so we approximated it by putting
together a couple of other properties.

Now there is a White_Space property, so let's use it.

Happily, the difference from our original approximation is only in the
four rare control characters 001C..001F.

As a bonus, `isspace` now joins all similar methods in giving exactly
matching results for ASCII characters represented as `str` or as
`bytes`.  Add a test for that nice property.
Copy link
Contributor

@Numerlor Numerlor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran into an issue with this when I noticed int's stripping is different to str's, so it's nice to see that there's already a PR for a fix.

The introduced changes seem good to my eye that's not entirely familiar with the codebase, apart from the outdated version notices.

I've also noticed that the added docstrings to test_unicode are using single quote marks while others are using double quotes so changing that to be consistent would be nice

Comment on lines +640 to +642
((bidirectional in ('WS', 'B', 'S')
or category == 'Zs')
and codepoint not in range(0x1c, 0x20)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this guaranteed to hold in future unicode versions? i.e. should it be tested on the string method if it's only testing the unicode database parsing and generation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants