Handle Latin -ne and -ve enclitics #7

balmas · 2020-09-18T15:18:48Z

The llt-tokenizer (https://github.com/perseids-project/llt-tokenizer) used the Prometheus Latin stems database (https://github.com/perseids-project/llt-db_handler) together with morphological rules to determine when -ne and -ve represented enclitics.
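To illustrate why a lexicon is needed here, below is a rough sketch of a lookup-based check. It is not llt-tokenizer's actual algorithm: `KNOWN_FORMS` is a hypothetical in-memory stand-in for a real stems database, `split_ne_ve` is an illustrative name, and the `-ne`/`-ve` output format is an assumption.

```python
from typing import List, Set

# Hypothetical mini-lexicon standing in for a real stems/forms database.
# A real implementation would query morphological data instead.
KNOWN_FORMS: Set[str] = {"bene", "sine", "sive", "neve", "vides", "quid"}


def split_ne_ve(token: str, known_forms: Set[str] = KNOWN_FORMS) -> List[str]:
    """Split a trailing -ne or -ve only when the lexicon supports it."""
    lower = token.lower()
    # A form the lexicon knows as a whole word is never split (e.g. "bene").
    if lower in known_forms:
        return [token]
    for enclitic in ("ne", "ve"):
        stem = token[:-2]
        # Split only if what remains is itself a known form.
        if lower.endswith(enclitic) and stem.lower() in known_forms:
            return [stem, "-" + enclitic]
    # Unknown remainder: leave the token untouched rather than guess.
    return [token]
```

With this toy lexicon, `split_ne_ve("videsne")` gives `["vides", "-ne"]`, while `split_ne_ve("bene")` and `split_ne_ve("homine")` are returned unsplit. A plain suffix regex cannot make that distinction, which is why -ne and -ve need morphological information.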

If we want to handle this in the spaCy tokenizer, the right approach would probably be to add training data for a Latin model and then retokenize, using the morphological data to fix the instances that need fixing.

For now, I've just implemented the simpler regex-based handling from llt-tokenizer for the -que enclitic.
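As a minimal sketch of what regex-based -que handling can look like (this is not the code from llt-tokenizer or from this repository): the exception list is illustrative and incomplete, and the names `QUE_EXCEPTIONS`, `split_que`, and `tokenize`, as well as the `-que` output format, are assumptions.

```python
import re
from typing import List

# Illustrative, incomplete list of words that end in "que" but should not be split.
QUE_EXCEPTIONS = {"atque", "neque", "quoque", "usque", "denique", "undique"}

# At least one word character followed by a final "que".
QUE_PATTERN = re.compile(r"^(\w+)que$", re.IGNORECASE)


def split_que(token: str) -> List[str]:
    """Split a trailing -que off a token unless it is a listed exception."""
    if token.lower() in QUE_EXCEPTIONS:
        return [token]
    match = QUE_PATTERN.match(token)
    if match is None:
        return [token]
    return [match.group(1), "-que"]


def tokenize(text: str) -> List[str]:
    """Whitespace-tokenize and apply the -que split to each word."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(split_que(word))
    return tokens
```

For example, `tokenize("senatus populusque Romanus")` gives `["senatus", "populus", "-que", "Romanus"]`, while `atque` is left intact. This works tolerably for -que because the exceptions form a fairly small closed list; the same trick does not carry over to -ne and -ve.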

balmas mentioned this issue Sep 21, 2020

urns and language models #9

Merged

balmas self-assigned this Sep 22, 2020
