Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[pt] Enable multi-token spell-checking (#10052)
* [pt] Adapt resources to multiwords, dictionary fixes
* Add smart titlecase method to StringTools
* [pt] Fix multiword prepositions
* [pt] Update PT tokeniser
  - improve handling of percent signs (was: [50%OFF], will be: [50%, OFF]);
  - add some tests for the latest dictionary version.
* [pt] Add speller tests for the latest dictionary
* Add titlecasing step to MultiWordChunker class
  - multitoken suggestions were failing because we only checked whether they were present in the dictionary after upcasing their first letter;
  - this failed to account for titlecasing (either naive or a little smarter), which is relatively frequent;
  - e.g. "The Lord of the Rings".
* [pt] Bump up dict to v0.12
* Improve titlecase logic in MultiWordChunker
* Add titlecasing option to multi-word chunker
  - only Portuguese has it *on*; all other locales have it set to false;
  - add a simple StringTools method (with tests) to check whether all words in a multi-token string are lowercase.

---------

Co-authored-by: p-goulart <pedro.goulart@languagetool.org>
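The titlecasing fallback described in the MultiWordChunker items, together with the percent-sign split from the tokeniser item, can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration: the class `MultiWordTitlecaseSketch`, its method names, the minor-word list, and the order in which casing variants are tried are all assumptions, not LanguageTool's actual `StringTools` or `MultiWordChunker` code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: names, word list, and candidate order are illustrative assumptions,
// not LanguageTool's actual StringTools / MultiWordChunker implementation.
final class MultiWordTitlecaseSketch {

  // Assumed function words that the "smart" titlecase keeps lowercase (except in first position).
  // Mixes English and Portuguese examples; a real list would be per-locale.
  private static final Set<String> MINOR_WORDS =
      Set.of("a", "an", "the", "of", "and", "or", "in", "on", "de", "da", "do", "e");

  // True if every whitespace-separated word is already lowercase.
  static boolean isAllLowercase(String phrase) {
    for (String word : phrase.trim().split("\\s+")) {
      if (!word.equals(word.toLowerCase(Locale.ROOT))) {
        return false;
      }
    }
    return true;
  }

  // Uppercases only the first code point of the string.
  static String capitalize(String s) {
    if (s.isEmpty()) {
      return s;
    }
    int first = s.codePointAt(0);
    return new StringBuilder()
        .appendCodePoint(Character.toUpperCase(first))
        .append(s.substring(Character.charCount(first)))
        .toString();
  }

  // Naive mode capitalizes every word; smart mode leaves minor words lowercase after the first word.
  static String titlecase(String phrase, boolean smart) {
    String[] words = phrase.trim().split("\\s+");
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < words.length; i++) {
      if (i > 0) {
        sb.append(' ');
      }
      boolean keepLower = smart && i > 0 && MINOR_WORDS.contains(words[i]);
      sb.append(keepLower ? words[i] : capitalize(words[i]));
    }
    return sb.toString();
  }

  // Casing variants to look up, in order. Titlecased forms are added only when the
  // option is on and the input is entirely lowercase.
  static List<String> candidateForms(String phrase, boolean titlecasingEnabled) {
    Set<String> forms = new LinkedHashSet<>();
    forms.add(phrase);
    forms.add(capitalize(phrase));
    if (titlecasingEnabled && isAllLowercase(phrase)) {
      forms.add(titlecase(phrase, true));
      forms.add(titlecase(phrase, false));
    }
    return new ArrayList<>(forms);
  }

  // Returns the first casing variant present in the dictionary, or null if none is.
  static String findInDictionary(String phrase, Set<String> dictionary, boolean titlecasingEnabled) {
    for (String form : candidateForms(phrase, titlecasingEnabled)) {
      if (dictionary.contains(form)) {
        return form;
      }
    }
    return null;
  }

  // Splits between a percent sign and a following letter: "50%OFF" -> ["50%", "OFF"].
  static String[] splitAfterPercent(String token) {
    return token.split("(?<=%)(?=\\p{L})");
  }
}
```

With a dictionary containing only "The Lord of the Rings", `findInDictionary("the lord of the rings", dictionary, true)` returns that entry. Passing `false` (the setting the commit describes for every locale except Portuguese) returns `null`, because only the as-is and first-letter-capitalized forms are tried.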
Showing 20 changed files with 479 additions and 628 deletions.