bpo-18236: Adjust str.isspace to use Unicode's White_Space property. #16254

gnprice · 2019-09-18T06:02:15Z

When Unicode support was first added to Python, there was no Unicode
property identifying whitespace, so we approximated it by putting
together a couple of other properties.

Now there is a White_Space property, so let's use it.

Happily, the difference from our original approximation is only in the
four rare control characters 001C..001F.

As a bonus, isspace now joins all similar methods in giving exactly
matching results for ASCII characters represented as str or as
bytes. Add a test for that nice property.
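
To make the ASCII claim concrete, here is a minimal sketch that rebuilds the old approximation from `unicodedata` and compares it with `bytes.isspace` over the ASCII range. The helper name `approx_isspace` is hypothetical and is not part of this PR. The sketch deliberately avoids calling `str.isspace`, whose result for 001C..001F depends on whether this change is applied.

```python
import unicodedata

def approx_isspace(ch):
    """Hypothetical helper: the pre-White_Space approximation
    (bidirectional type WS, B, or S, or general category Zs)."""
    return (unicodedata.bidirectional(ch) in ('WS', 'B', 'S')
            or unicodedata.category(ch) == 'Zs')

# Code points below 128 that each definition treats as whitespace.
ascii_approx = {cp for cp in range(128) if approx_isspace(chr(cp))}
ascii_bytes = {cp for cp in range(128) if bytes([cp]).isspace()}

# Characters the approximation accepts but bytes.isspace rejects.
only_in_approx = sorted(ascii_approx - ascii_bytes)
print([hex(cp) for cp in only_in_approx])  # ['0x1c', '0x1d', '0x1e', '0x1f']
```

The only disagreement in ASCII is the four control characters named above, which is why dropping them brings the `str` and `bytes` results into line.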

https://bugs.python.org/issue18236

Numerlor

I ran into this when I noticed that `int`'s whitespace stripping differs from `str`'s, so it's nice to see there's already a PR with a fix.

I'm not entirely familiar with the codebase, but the changes look good to me, apart from the outdated version notices.

I've also noticed that the docstrings added to test_unicode use single quotes while the others use double quotes, so changing them for consistency would be nice.

Numerlor · 2022-05-03T18:14:46Z

Lib/test/test_unicode.py

+                             ((bidirectional in ('WS', 'B', 'S')
+                               or category == 'Zs')
+                              and codepoint not in range(0x1c, 0x20)))


Is this guaranteed to hold in future Unicode versions? I.e., should it be tested through the string method if it's only testing the Unicode database parsing and generation?
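
One way to probe this question, as a rough sketch rather than anything from the PR: evaluate the expression from the diff over every code point and compare it with a White_Space list. The `WHITE_SPACE` set below is an assumption, transcribed by hand from Unicode's `PropList.txt`, and would need updating if a future Unicode version changed the property. The function name `approximation` is likewise hypothetical.

```python
import sys
import unicodedata

# Assumption: White_Space code points, hand-transcribed from PropList.txt.
WHITE_SPACE = (set(range(0x09, 0x0E)) | {0x20, 0x85, 0xA0, 0x1680}
               | set(range(0x2000, 0x200B))
               | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000})

def approximation(cp):
    """The condition quoted in the diff, applied to one code point."""
    ch = chr(cp)
    return ((unicodedata.bidirectional(ch) in ('WS', 'B', 'S')
             or unicodedata.category(ch) == 'Zs')
            and cp not in range(0x1c, 0x20))

approx = {cp for cp in range(sys.maxunicode + 1) if approximation(cp)}

# Any entries here would show where the two definitions drift apart
# for the Unicode version bundled with the running interpreter.
missing = WHITE_SPACE - approx   # White_Space, but not matched by the expression
extra = approx - WHITE_SPACE     # matched by the expression, but not White_Space
print(unicodedata.unidata_version, sorted(missing), sorted(extra))
```

This only reports on the bundled Unicode version; it does not show that the equivalence is guaranteed for later ones.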

the-knights-who-say-ni added the CLA signed label Sep 18, 2019

bedevere-bot added the awaiting review label Sep 18, 2019

csabella requested review from abalkin and benjaminp February 6, 2020 01:45

Numerlor reviewed May 3, 2022

ezio-melotti removed the CLA signed label Jul 13, 2022

abalkin mentioned this pull request Sep 18, 2023

str.isspace should use the Unicode White_Space property #62436

Open
