Foreign names analyzed as compound, should be flat #81
Comments
or we could introduce |
I'd prefer foreignness to be in the Foreign feat, since otherwise you only see this for multiword foreign expressions. For example, if someone says "¡Ole!" in an English corpus, I'd like it to be Foreign=Yes, but it wouldn't have a flat relation of any kind. So I think these should be flat and foreign, but one is a deprel and the other is a foreign language identification (ideally coupled with what language it is, which we now have in GUM as well).
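To make the retrieval point concrete, here is a minimal sketch in Python. The CoNLL-U fragment is invented for illustration (it is not taken from GUM or any released treebank), and the helper names `parse_conllu`, `foreign_by_feature`, and `foreign_by_deprel` are hypothetical. It assumes a one-word foreign interjection carrying only `Foreign=Yes`, and a two-word foreign expression carrying both `Foreign=Yes` and a `flat:foreign` relation.

```python
# Hypothetical CoNLL-U rows, invented for illustration only.
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = "\n".join([
    "1\tOle\tole\tINTJ\t_\tForeign=Yes\t3\tdiscourse\t_\t_",
    "2\the\the\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "3\tsaid\tsay\tVERB\t_\t_\t0\troot\t_\t_",
    "4\tcarpe\tcarpe\tX\t_\tForeign=Yes\t3\tobj\t_\t_",
    "5\tdiem\tdiem\tX\t_\tForeign=Yes\t4\tflat:foreign\t_\t_",
])


def parse_conllu(text):
    """Parse tab-separated CoNLL-U token lines into a list of dicts."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        feats = {}
        if cols[5] != "_":
            feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
        tokens.append({
            "id": cols[0],
            "form": cols[1],
            "feats": feats,
            "head": cols[6],
            "deprel": cols[7],
        })
    return tokens


def foreign_by_feature(tokens):
    """Forms of all tokens marked Foreign=Yes in the FEATS column."""
    return [t["form"] for t in tokens if t["feats"].get("Foreign") == "Yes"]


def foreign_by_deprel(tokens):
    """Forms of tokens that take part in a flat:foreign relation
    (the dependent and the head it attaches to)."""
    hit_ids = set()
    for t in tokens:
        if t["deprel"] == "flat:foreign":
            hit_ids.update((t["id"], t["head"]))
    return [t["form"] for t in tokens if t["id"] in hit_ids]


tokens = parse_conllu(SAMPLE)
print(foreign_by_feature(tokens))  # ['Ole', 'carpe', 'diem']
print(foreign_by_deprel(tokens))   # ['carpe', 'diem']
```

Under these assumptions, the deprel-based query never sees the one-word case, while the feature-based query finds both.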
Oops I spoke too soon. These are names, so should be
Both I see the subtypes as a way of explaining why a flat structure is needed. A |
Sure, nothing stops people from using subtypes. I'm just not tempted to add these to any corpus I maintain, and would probably steer new developers away from them if it were up to me: we already have PROPN for names and Foreign for foreign, so this just adds a layer where we could have conflicting analyses (and it misses one-word instances, as mentioned, so it's not really useful for retrieval). I think names are really an entity-level property, and foreignness is a text-span property, but I'm happy enough with PROPN and Foreign as practical operationalizations, especially given that most UD corpora don't have NER or codeswitching/Lang annotations. Documenting the reason why something is flat seems to me beyond the scope of what deprels should be responsible for. Stating that something is foreign or a name seems interesting, by contrast, but is additional information to the syntax tree itself.
Per the new foreign expressions policy I think we should just treat foreign names as flat and not specify a feature. |
etc.