Issue with parsing Vietnamese names #17

Closed
yellow1912 opened this issue Mar 28, 2019 · 6 comments
@yellow1912

Hello there,

I have a name like this:

Nguyễn Quốc Thái

After parsing, the full name I get back is "NguyễN QuốC TháI". Notice how the letter case is messed up.

One more thing: Vietnamese names are written in the order Lastname MiddleName Firstname. How should this be handled?

@wyrfel self-assigned this May 7, 2019
@wyrfel
Contributor

wyrfel commented May 7, 2019

Hey @yellow1912. I'm only seeing this now. Thanks for raising the issue.
The capitalisation issue is definitely a code bug that shouldn't happen; it probably has to do with the Unicode characters. Does this happen only when re-rendering the full name, or also when pulling parts individually?
The order issue is a bit of a conceptual challenge, as so far we have stayed away from customising the name part scheme by language. We'd have to assess how much complication it would cause to make this possible.
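
For illustration only, here is a minimal Python sketch (not the parser's own code) of one way this symptom can arise. It assumes, without confirmation from this thread, that word boundaries are being computed on raw UTF-8 bytes, so that every multibyte character looks like a word break. The function names are hypothetical.

```python
import re
import unicodedata


def capitalize_bytewise(name: str) -> str:
    """Uppercase the first ASCII letter after every *byte-level* word boundary.

    In a bytes pattern, \\b and \\w only know ASCII, so the bytes of a
    character such as 'ễ' count as non-word bytes and the letter that
    follows them is treated as the start of a new word.
    """
    raw = unicodedata.normalize("NFC", name).encode("utf-8")
    fixed = re.sub(rb"\b[a-z]", lambda m: m.group(0).upper(), raw)
    return fixed.decode("utf-8")


def capitalize_unicode_aware(name: str) -> str:
    """Uppercase the first letter of each word using Unicode-aware matching."""
    text = unicodedata.normalize("NFC", name)
    return re.sub(r"\b\w", lambda m: m.group(0).upper(), text)


print(capitalize_bytewise("Nguyễn Quốc Thái"))       # NguyễN QuốC TháI
print(capitalize_unicode_aware("nguyễn quốc thái"))  # Nguyễn Quốc Thái
```

The byte-level variant reproduces the reported output exactly, which makes it a plausible suspect, but it is only a hypothesis about where to look.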

@yellow1912
Author

Hi @wyrfel

Let me check the UTF-8 issue. Regarding the ordering, perhaps we could have a setting to reverse the order? If not, I will just go ahead and do the if/else outside, no problem at all.

@wyrfel
Contributor

wyrfel commented Jun 3, 2019

Hi @yellow1912,

I have tried to reproduce the normalisation (capitalisation) issue but couldn't. I have added cases for it to the unit test suite in PR #20.

As for the reversed parsing order... I'm afraid there is no way this name parser can detect whether a name is Vietnamese, and hence it can't reverse the order on its own. I could add a setting for this, but I think that may as well be done outside of the parser.

@wyrfel
Contributor

wyrfel commented Nov 5, 2019

@yellow1912 It bugs me that this issue is still open. 😄 I've been thinking about this more. It would be possible to include automatic reversal of name parts, or maybe even a sort of name part templating, via the language files. But that means we would have to introduce a form of language detection in those files.
Can you think of any reliable way to detect Vietnamese names, or at least to detect when a (Vietnamese) name needs to be treated in reverse order?
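
One rough heuristic, sketched here in Python as an assumption rather than a reliable detector, is to check whether the first word is a common Vietnamese family name. The name list is illustrative and deliberately incomplete, and `looks_family_first` is a hypothetical helper, not part of the package.

```python
import unicodedata

# Illustrative, incomplete sample of common Vietnamese family names.
COMMON_VIETNAMESE_FAMILY_NAMES = (
    "Nguyễn", "Trần", "Lê", "Phạm", "Hoàng", "Huỳnh",
    "Phan", "Vũ", "Võ", "Đặng", "Bùi", "Đỗ",
)


def _fold(text: str) -> str:
    """Normalise to NFC and case-fold so comparisons ignore case and encoding form."""
    return unicodedata.normalize("NFC", text).casefold()


_FOLDED_FAMILY_NAMES = {_fold(n) for n in COMMON_VIETNAMESE_FAMILY_NAMES}


def looks_family_first(full_name: str) -> bool:
    """Guess whether a name is written family name first.

    Returns True only when the name has at least two words and the first
    word is in the sample list above.
    """
    tokens = full_name.split()
    return len(tokens) >= 2 and _fold(tokens[0]) in _FOLDED_FAMILY_NAMES


print(looks_family_first("Nguyễn Quốc Thái"))  # True
print(looks_family_first("John Smith"))        # False
```

This misses names typed without diacritics (for example "Nguyen") and any family name not in the list, so it is best treated as a hint that the caller can override.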

@wyrfel
Contributor

wyrfel commented Nov 8, 2019

@yellow1912 I have added a section to the README about possible ways to detect the name's language outside of the name parser. At this stage I would like to treat this as an edge case and as something that can be handled outside the core name parser (by reversing the word order in the string or by overloading name parser components). Should you come up with an implementation, I'd be very interested to hear about it and possibly integrate it, or parts of it, into this package.
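
Reversing the word order before parsing could look roughly like the Python sketch below. The helper names and the `family_first` flag are hypothetical; the caller is assumed to decide whether a given name needs reversing.

```python
def reverse_name_order(full_name: str) -> str:
    """Reverse the whitespace-separated words of a name."""
    return " ".join(reversed(full_name.split()))


def prepare_for_parser(full_name: str, family_first: bool) -> str:
    """Return the string to hand to a parser that expects given name first.

    When family_first is False the name is only whitespace-normalised.
    """
    if family_first:
        return reverse_name_order(full_name)
    return " ".join(full_name.split())


# "Nguyễn Quốc Thái" (family, middle, given) becomes "Thái Quốc Nguyễn".
print(prepare_for_parser("Nguyễn Quốc Thái", family_first=True))
```

A full reversal also flips the order of multiple middle names, so a name with more than three words may need the middle part handled separately.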

@wyrfel closed this as completed Nov 8, 2019
@yellow1912
Author

Sorry for disappearing for a while. I don't think the parser should detect the language, but perhaps there could be an option to reverse things when you return? Right now I manually detect and reverse outside of the parser.
