How can I use the fine-tune code for Vietnamese? #221
#57 (comment) You'll need to edit vocab.txt to include the missing Vietnamese characters, mainly accented vowels in both uppercase and lowercase (ắ, ấ, ồ, Ố...), by overwriting unused entries (such as Chinese or Korean characters) with them. Then edit convert_char_to_pinyin in /model/utils.py into something that only converts each string to a char array, like:
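One possible shape for such a replacement, as a minimal sketch rather than the project's actual code: it assumes the function receives a list of strings and returns one list of characters per string. The name `convert_text_to_chars` is hypothetical, and the NFC normalization step is an added assumption so that an accented vowel typed as base letter plus combining marks becomes the single character that would be listed in vocab.txt.

```python
import unicodedata


def convert_text_to_chars(text_list):
    """Split each string into a list of single characters (hypothetical helper).

    Assumes the caller passes a list of strings and expects a list of
    token lists back, one per input string.
    """
    char_lists = []
    for text in text_list:
        # Compose base letter + combining marks into one code point,
        # so "e" + circumflex + acute becomes the single character "ế".
        text = unicodedata.normalize("NFC", text)
        char_lists.append(list(text))
    return char_lists
```

With this in place, every character that can appear in the training text has to exist as an entry in vocab.txt, which is why the vocabulary edit comes first.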
|
Could you provide me with your Vietnamese vocab.txt? I haven't set it up yet, and I really need it for testing.
|
I don't have access to my training PC right now. Basically, you can make a list of all uppercase and lowercase Vietnamese vowels and their accented combinations ['a', 'á', 'à', 'ã', 'ạ', 'ả', 'A', 'Á', 'À', 'Ã', 'Ạ', 'Ả',...], check whether each one already exists in vocab.txt, and then replace some of the unused characters in vocab.txt with the missing Vietnamese ones. It should be easy to write a script to do this. |
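The script described above could look roughly like the following sketch. Several things here are assumptions, not details from the thread: that vocab.txt holds one token per line, that single Chinese or Hangul characters are the "unused" entries that may be overwritten, and that đ/Đ should be added alongside the vowels. All function names are hypothetical.

```python
import unicodedata

BASE_VOWELS = "aăâeêioôơuưy"
# Combining marks: none, acute, grave, tilde, dot below, hook above.
TONE_MARKS = ["", "\u0301", "\u0300", "\u0303", "\u0323", "\u0309"]


def vietnamese_chars():
    """Build every lowercase/uppercase vowel with every tone, plus đ and Đ."""
    chars = []
    for base in BASE_VOWELS:
        for form in (base, base.upper()):
            for mark in TONE_MARKS:
                # NFC composes the letter and the mark into one code point.
                chars.append(unicodedata.normalize("NFC", form + mark))
    chars.extend(["đ", "Đ"])  # assumption: the consonant đ is needed too
    return chars


def is_cjk_or_hangul(token):
    """Assumed rule for 'unused' entries: one CJK ideograph or Hangul syllable."""
    if len(token) != 1:
        return False
    code = ord(token)
    return 0x4E00 <= code <= 0x9FFF or 0xAC00 <= code <= 0xD7A3


def patch_vocab(vocab, replaceable=is_cjk_or_hangul):
    """Overwrite replaceable tokens with missing Vietnamese characters.

    Returns (patched_vocab, missing_chars); the vocabulary length is unchanged.
    """
    present = set(vocab)
    missing = [c for c in vietnamese_chars() if c not in present]
    slots = [i for i, tok in enumerate(vocab) if replaceable(tok)]
    if len(slots) < len(missing):
        raise ValueError("not enough replaceable entries in the vocabulary")
    patched = list(vocab)
    for index, char in zip(slots, missing):
        patched[index] = char
    return patched, missing


def patch_vocab_file(path_in, path_out):
    """Read a one-token-per-line vocab file, patch it, and write the result."""
    with open(path_in, encoding="utf-8") as f:
        # Strip only the newline so a token that is a single space survives.
        vocab = [line.rstrip("\n") for line in f]
    patched, missing = patch_vocab(vocab)
    with open(path_out, encoding="utf-8", newline="\n") as f_out:
        f_out.write("\n".join(patched) + "\n")
    return missing
```

Calling `patch_vocab_file("vocab.txt", "vocab_vi.txt")` (both paths are placeholders) would write the patched copy and return the list of characters that had to be added. Overwriting entries in place instead of appending keeps every other token at its original line position.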
Thank you for your help. This is very useful to me. |
Where can I find the Vietnamese version? |
I want to use the code to train a model for Vietnamese. I am using phonemes for the vocab. How can I get it?