Add punctuation more commonly used in Asian languages (ellipsis) to sentence parsing #1044

Merged
merged 1 commit into jrnl-org:develop from patch-1
Sep 11, 2020

Conversation

felixonmars
Contributor

@felixonmars felixonmars commented Sep 8, 2020

An ellipsis is used to terminate sentences in Chinese and appears often at the end of titles.

Checklist

  • I have read the contributing doc.
  • I have included a link to the relevant issue number.
  • I have tested this code locally.
  • I have checked to ensure there aren't other open pull requests
    for the same issue.
  • I have written new tests for these changes, as needed.
  • All tests pass.

Ellipsis is used to terminate sentences in Chinese and used a lot at end of title.
@wren
Member

wren commented Sep 9, 2020

The characters added are U+2026 (…) and U+22EF (⋯).

U+2026 is in the "General Punctuation" block, but U+22EF is in "Mathematical Operators." Does it make sense for U+22EF to be part of this if it's meant for math rather than punctuation?
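
The distinction raised here can be checked with Python's standard `unicodedata` module. This is a minimal sketch: the module reports general categories rather than block names, and the `describe` helper is a hypothetical name used only for illustration.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


# The two code points discussed in this pull request.
for ch in ("\u2026", "\u22ef"):
    print(*describe(ch))
```

Category `Po` is "Punctuation, other" and `Sm` is "Symbol, math", which mirrors the block difference noted above.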

@felixonmars
Contributor Author

It's at least what macOS's default Chinese IME produces, and perhaps some others do too. So I think it makes sense.
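
A rough sketch of what treating both characters as sentence terminators could look like. It assumes the splitter's job is to cut an entry's first sentence off as its title. The `TERMINATOR` pattern and `split_title` function are hypothetical names for illustration, not jrnl's actual `SENTENCE_SPLITTER`.

```python
import re

# Hypothetical terminator set: ASCII sentence enders plus the two ellipsis
# code points from this PR. "+" keeps doubled ellipses ("……") together.
TERMINATOR = re.compile(r"[.!?\u2026\u22ef]+")


def split_title(text):
    """Split text into (title, body) at the first run of terminators."""
    match = TERMINATOR.search(text)
    if match is None:
        # No terminator found: the whole text is the title.
        return text, ""
    return text[: match.end()], text[match.end():].lstrip()


print(split_title("今天下雨了……我没出门"))
print(split_title("Later⋯⋯ more text"))
```

Because the pattern matches a run of terminators, input from an IME that emits U+22EF is handled the same way as input that uses U+2026.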

@wren wren added the enhancement (New feature or request) label Sep 11, 2020
@wren wren changed the title from "Add Ellipsis to SENTENCE_SPLITTER" to "Add punctuation more commonly used in Asian languages (ellipsis) to sentence parsing" Sep 11, 2020
Member

@wren wren left a comment

Thank you!

@wren wren merged commit 9ee4c21 into jrnl-org:develop Sep 11, 2020
@felixonmars felixonmars deleted the patch-1 branch September 11, 2020 23:56
wren pushed a commit that referenced this pull request Oct 17, 2020