
chore(nlp): Romanian tokenizer, stemmer and stopwords added #1109

Merged: 3 commits, Nov 19, 2020


elozano98
Contributor

Description

Adds a Romanian tokenizer, stemmer, and stopword list to contentful nlp.

Context

Adding them will make it possible to process Romanian text.

Approach taken / Explain the design

The tokenizer and stemmer come from the nlpjs library, while the stopwords were taken from the Python NLTK library.
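The description names the sources of the three pieces but not how they fit together. Below is a minimal TypeScript sketch of a typical preprocessing pipeline (tokenize, drop stopwords, stem), assuming a plain function interface. `tokenize`, `preprocess`, `normalizeRomanian`, `Stemmer` and `ROMANIAN_STOPWORDS_SAMPLE` are hypothetical names, not the contentful nlp or nlpjs API. The stopword set is a small illustrative subset rather than the NLTK list, and the stemmer is passed in as a parameter instead of reproducing the nlpjs one.

```typescript
// Hypothetical sketch: none of these names come from contentful nlp or nlpjs.

type Stemmer = (word: string) => string;

// Illustrative subset of Romanian function words; NOT the full NLTK list.
const ROMANIAN_STOPWORDS_SAMPLE: ReadonlySet<string> = new Set([
  "de", "la", "cu", "pe", "un", "o",
  "\u00EEn", // în
  "\u0219i", // și
]);

// Romanian text mixes the correct comma-below letters (ș, ț) with legacy
// cedilla forms (ş, ţ); map the cedilla forms to comma-below, then lowercase.
function normalizeRomanian(text: string): string {
  return text
    .normalize("NFC")
    .replace(/\u015F/g, "\u0219") // ş -> ș
    .replace(/\u015E/g, "\u0218") // Ş -> Ș
    .replace(/\u0163/g, "\u021B") // ţ -> ț
    .replace(/\u0162/g, "\u021A") // Ţ -> Ț
    .toLowerCase();
}

// Naive tokenizer: split on anything outside a-z, digits and the Romanian
// letters ă, â, î, ș, ț. Other accented letters (e.g. é) act as separators,
// which is acceptable only for a sketch.
function tokenize(text: string): string[] {
  return normalizeRomanian(text)
    .split(/[^a-z0-9\u0103\u00E2\u00EE\u0219\u021B]+/)
    .filter((token) => token.length > 0);
}

// Tokenize, drop stopwords, then stem. The stemmer is injected so a real
// Romanian stemmer can be plugged in; the default leaves words unchanged.
function preprocess(
  text: string,
  stopwords: ReadonlySet<string> = ROMANIAN_STOPWORDS_SAMPLE,
  stem: Stemmer = (word) => word,
): string[] {
  return tokenize(text)
    .filter((token) => !stopwords.has(token))
    .map(stem);
}
```

For example, `preprocess("Hotelurile sunt în Cluj", ROMANIAN_STOPWORDS_SAMPLE, (w) => w.replace(/urile$/, ""))` yields `["hotel", "sunt", "cluj"]`. The suffix-stripping lambda is a toy stand-in used only to show where a real Romanian stemmer would plug in.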

Testing

The pull request...

  • ✔️ has unit tests

Contributor

vanbasten17 left a comment


LGTM 👍

@elozano98 elozano98 changed the title chore(nlp): romanian tokenizer, stemmer and stopwords added chore(nlp): Romanian tokenizer, stemmer and stopwords added Nov 19, 2020
@elozano98 elozano98 requested a review from dpinol November 19, 2020 11:01
@elozano98 elozano98 merged commit 745f5a8 into master Nov 19, 2020
@elozano98 elozano98 deleted the contentful/ro branch November 19, 2020 14:35
3 participants