Issue with parsing Vietnamese names #17

yellow1912 · 2019-03-28T10:38:47Z

Hello there,

I have a name like this:

Nguyễn Quốc Thái

After parsing, the full name I get back is "NguyễN QuốC TháI". Notice how the letter case is messed up.

One more thing: Vietnamese names are written as Lastname MiddleName Firstname. How should this be handled?

wyrfel · 2019-05-07T06:46:46Z

Hey @yellow1912. Only seeing this now. Thanks for raising the issue.
The capitalisation issue is definitely a code bug that shouldn't happen; it probably has to do with the Unicode characters. Does this happen only when re-rendering the full name, or also when pulling parts individually?
The order issue is a bit of a conceptual challenge, as so far we have stayed away from customising the name part scheme by language. We'd have to assess how much complication it causes to make this possible.
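
For illustration only: the thread never confirms the root cause, but the reported output is exactly what a capitalisation routine produces when its idea of a "word character" is limited to ASCII. The Python sketch below (not this package's code; `capitalise_words` is a hypothetical helper) reproduces the symptom with an ASCII-only word boundary and shows that a Unicode-aware boundary avoids it.

```python
import re
import unicodedata


def capitalise_words(name: str, unicode_aware: bool = True) -> str:
    """Upper-case the first letter of every word in `name`."""
    # NFC composes base letters and combining marks into single code points,
    # so accented letters count as word characters in Unicode mode.
    text = unicodedata.normalize("NFC", name).lower()
    # With re.ASCII, \w only matches [a-zA-Z0-9_]. An accented letter then
    # looks like a separator, and the letter after it is upper-cased too.
    flags = 0 if unicode_aware else re.ASCII
    return re.sub(r"\b\w", lambda m: m.group().upper(), text, flags=flags)


print(capitalise_words("Nguyễn Quốc Thái", unicode_aware=False))  # NguyễN QuốC TháI
print(capitalise_words("Nguyễn Quốc Thái"))  # Nguyễn Quốc Thái
```

If the parser's normalisation step behaves like the ASCII-only variant in some environments, that would explain both the report and why it may not reproduce everywhere.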

yellow1912 · 2019-05-07T06:55:31Z

Hi @wyrfel

Let me check the UTF-8 issue. Regarding the ordering, perhaps we could have a setting to reverse the order? If not, I will just go ahead and do the if/else outside, np at all.

wyrfel · 2019-06-03T13:35:59Z

Hi @yellow1912,

I have tried to reproduce the normalisation (capitalisation) issue but couldn't. I have added cases for it to the unit test suite in PR #20.

As for the reversed parsing order... I'm afraid there is no way this name parser can detect whether a name is Vietnamese, and hence it can't reverse the order on its own. I could add a setting for this, but I think that may as well be done outside of the parser.

wyrfel · 2019-11-05T11:38:49Z

@yellow1912 It bugs me that this issue is still open. 😄 I've been thinking about this more. There is a possibility to include automatic reversal of name parts, or maybe even a sort of name part templating, via the language files. But that means we would have to introduce a form of language detection in those files.
Can you think of any reliable way to detect Vietnamese names, or at least to detect when a (Vietnamese) name needs to be treated in reverse order?

wyrfel · 2019-11-08T23:57:32Z

@yellow1912 I have added a section in the README about possible ways to detect the name's language outside of the name parser. At this stage I would like to treat this as an edge case and as something that can be handled outside the core name parser (by reversing the word order in the string or by overloading name parser components). Should you come up with an implementation, I'd be very interested to hear about it and possibly integrate it, or parts of it, into this package.
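
As a rough sketch of the "reverse the word order in the string" approach, the Python below flips a name before it is handed to any parser. Everything here is an assumption made for illustration: the function names are hypothetical, the family-name list is deliberately incomplete, and the detection heuristic (first word is a common Vietnamese family name) is not taken from the README and will misfire on some names.

```python
import unicodedata

# Hypothetical and incomplete list of common Vietnamese family names,
# normalised to NFC so lookups do not depend on how the input was encoded.
COMMON_VIETNAMESE_FAMILY_NAMES = {
    unicodedata.normalize("NFC", n)
    for n in ("nguyễn", "trần", "lê", "phạm", "hoàng", "huỳnh", "phan", "vũ", "võ", "đặng")
}


def looks_family_name_first(name: str) -> bool:
    """Heuristic: the first word is a known Vietnamese family name."""
    words = unicodedata.normalize("NFC", name).split()
    return len(words) > 1 and words[0].lower() in COMMON_VIETNAMESE_FAMILY_NAMES


def to_given_name_first(name: str) -> str:
    """Reverse the word order when the name looks family-name-first."""
    words = unicodedata.normalize("NFC", name).split()
    if looks_family_name_first(name):
        words.reverse()
    return " ".join(words)


print(to_given_name_first("Nguyễn Quốc Thái"))  # Thái Quốc Nguyễn
print(to_given_name_first("John Smith"))  # John Smith
```

A plain reversal also reverses the order of multiple middle names, so names with more than three words may need a different rearrangement.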

yellow1912 · 2019-11-15T06:41:53Z

Sorry for disappearing for a while. I don't think the parser should detect the language, but perhaps there could be an option to reverse things when you return? Right now I manually detect and reverse outside of the parser.
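
To make the "option to reverse on return" idea concrete, here is a minimal Python sketch in which the caller, not the parser, declares the order. `ParsedName`, `split_name`, and the `family_name_first` flag are all hypothetical names invented for this example; they are not part of this package, and the splitting is a naive whitespace split rather than real name parsing.

```python
from dataclasses import dataclass


@dataclass
class ParsedName:
    first: str
    middle: str
    last: str


def split_name(full_name: str, family_name_first: bool = False) -> ParsedName:
    """Naively split a name; the caller states which order it is written in."""
    words = full_name.split()
    if not words:
        return ParsedName(first="", middle="", last="")
    if len(words) == 1:
        return ParsedName(first=words[0], middle="", last="")
    # Middle names keep their written order in both modes.
    middle = " ".join(words[1:-1])
    if family_name_first:
        return ParsedName(first=words[-1], middle=middle, last=words[0])
    return ParsedName(first=words[0], middle=middle, last=words[-1])


print(split_name("Nguyễn Quốc Thái", family_name_first=True))
```

Swapping only the outer parts keeps multiple middle names in their written order, and leaves detection entirely to the calling code.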

wyrfel self-assigned this May 7, 2019

wyrfel mentioned this issue Jun 3, 2019

Add unit tests to verify parsing and normalisation of vietnamese names #20

Merged

wyrfel closed this as completed Nov 8, 2019

