The llt-tokenizer (https://github.com/perseids-project/llt-tokenizer) used the Prometheus Latin stems database (https://github.com/perseids-project/llt-db_handler) to apply morphological rules that determine when -ne and -ve represent enclitics.
If we want to handle this in the spaCy tokenizer, the right approach would probably be to add training data for a Latin model and then retokenize, fixing the instances that need it based on the morph data.
For now, I've just implemented the simpler regex-based handling from llt-tokenizer for the -que enclitics.
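A minimal sketch of what regex-based -que handling could look like. The exception set and the function name `split_que` are illustrative assumptions, not the actual llt-tokenizer implementation or its full exception list; the idea is simply to strip a trailing "que" unless the word is a known non-enclitic form.

```python
import re

# Hypothetical exception set: words ending in "que" that are NOT
# host word + enclitic (illustrative subset, not the llt-tokenizer list).
QUE_EXCEPTIONS = {"atque", "neque", "itaque", "quoque", "usque", "namque"}

def split_que(token: str) -> list[str]:
    """Split a trailing -que enclitic off a Latin token, unless the
    whole word is a known non-enclitic exception."""
    lower = token.lower()
    # re.search is overkill for a fixed suffix, but mirrors a regex-based
    # tokenizer pass; require some stem to remain (len > 3).
    if re.search(r"que$", lower) and lower not in QUE_EXCEPTIONS and len(lower) > 3:
        return [token[:-3], "que"]
    return [token]

print(split_que("populusque"))  # -> ['populus', 'que']
print(split_que("atque"))       # -> ['atque']
```

In spaCy this splitting would typically be wired in through a custom tokenizer or a retokenizer step, but the core suffix logic stays the same.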