Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Punctuation characters set #16708

Closed
wants to merge 4 commits into from

Conversation

artemklevtsov
Copy link
Contributor

Sometimes is useful to check or strip punctuation.

Sometimes is useful to check or strip punctuation.
lib/pure/strutils.nim Outdated Show resolved Hide resolved
@artemklevtsov
Copy link
Contributor Author

I also add the isPunctAscii function and some tests for it.

@@ -89,6 +89,9 @@ const
## All the characters that count as whitespace (space, tab, vertical tab,
## carriage return, new line, form feed).

Punctuation* = {'!', '"', '#', '$', '%', '&', '\'', '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~'}
## All the characters that count as punctuation (comma, colon, dash etc).

Copy link
Member

@timotheecour timotheecour Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: how about also adding Printable?
refs https://docs.python.org/3/library/string.html?highlight=string#string.printable

could also be another PR

doAssert isPunctAscii(':')
doAssert isPunctAscii(';')
doAssert(not isPunctAscii('A'))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and add from std/setutils import toSet on top

Suggested change
# check this matches https://en.cppreference.com/w/c/string/byte/ispunct
doAssert "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~".toSet == Punctuation

@@ -158,6 +161,14 @@ func isSpaceAscii*(c: char): bool {.rtl, extern: "nsuIsSpaceAsciiChar".} =
doAssert isSpaceAscii('\t') == true
return c in Whitespace

func isPunctAscii*(c: char): bool {.rtl, extern: "nsuIsPunctAsciiChar".} =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think we need this proc, c in Punctuation seems good enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll remove it.

@Araq
Copy link
Member

Araq commented Jan 14, 2021

Seems more dangerous than useful, sorry. What does count as "punctuation" depends too much on the concrete use case. If you follow some spec, better write the set down yourself rather than relying on strutils's definition of it. And if you don't have a spec and use strutils.Punctuation, either we can never change its definition ("improve it because then it's consistent with Python...") or we break your code when we do.

@Araq Araq closed this Jan 14, 2021
@Clyybber Clyybber mentioned this pull request Jun 6, 2021
1 task
@timotheecour
Copy link
Member

=> followup #18193 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants