UPOS "X" #440
This is definitely not right, because LS is also the tag for graphical bullets, which are in no way numbers. I'm also not sure that "A1.iii)" is a number; I'd say it's much more of an …
https://universaldependencies.org/u/pos/SYM.html says bullets are PUNCT. It seems to be distinguishing them from list item markers with a (quasi)numerical component (i.e., they reflect a position in a sequential ordering of some kind). I could also imagine thinking of lists as a type of coordination, and these as helping to mark how a list item relates to other items in the list, so CCONJ. But that may be unpopular. :)
I'm not so convinced. I think syntactically there is no difference between numerical, graphical, alphabetical and mixed list item markers. It's all the same kind of orthographic device, and I would like them to have the same analysis. I wouldn't feel too bad about PUNCT, but then we are not allowed to treat them as kinds of numbers morphologically, and in any case it would create an uncomfortable situation where punctuation becomes open-ended. Tagging them all as SYM, or even splitting them into SYM for non-numerical and NUM for numerical markers, would be OK for me too, but I think they should have the same deprel regardless of what kind of list item marker they are.
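The SYM/NUM split suggested in that last comment could be approximated with a very simple heuristic: treat a list item marker as numerical if it contains any letter or digit (so `1.`, `(b)` and `A1.iii)` all qualify, since each encodes a position in a sequence) and as SYM otherwise (graphical bullets such as `•` or `-`). Below is a minimal Python sketch of that idea. The function name is hypothetical, this is not an existing UD validation rule, and it assumes the token is already known to be a list item marker (ordinary words would all come out as NUM).

```python
def classify_list_marker(marker: str) -> str:
    """Hypothetical heuristic: return "NUM" for list item markers with a
    (quasi)numerical component (digits, letters, roman numerals) and "SYM"
    for purely graphical bullets. Assumes `marker` is already known to be
    a list item marker."""
    # Any letter or digit means the marker reflects a position in a sequence.
    if any(ch.isalnum() for ch in marker):
        return "NUM"
    return "SYM"


for m in ["1.", "(b)", "A1.iii)", "•", "-"]:
    print(m, classify_list_marker(m))
```

This only decides the UPOS; it leaves untouched the point that all markers should share one deprel regardless of type.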
The guidelines at https://universaldependencies.org/u/pos/X.html say that X should be used very restrictively.
Setting aside the usage with `goeswith` dependents, we have X tokens with these XPOS tags:

- `FW` or `LS`: https://universal.grew.fr/?custom=65184da9dbe01. There are also a few `FW` lexemes that are not X, mainly borrowed Latin abbreviations: https://universal.grew.fr/?custom=65184e2eded21
  - For `LS`, https://universaldependencies.org/u/dep/list.html says that list item numbers should be NUM.
- `ADD`: URLs and email addresses (would PROPN work for these?)
- `GW`: mainly space-separated parts of filenames
- `NN` and `NNP` within filenames
- `AFX`: affixes like "ex"

Some `GW` parts of filenames have substantive UPOS, as do some `FW` and `AFX` words: https://universal.grew.fr/?custom=651850b455da1

GUM XPOS doesn't use `ADD` or `AFX` (these are more recent additions to the PTB tagset). But I see internet addresses under PROPN in GUM, which makes sense linguistically.

I think steps here are:
- `LS` list markers …
- `ADD` to PROPN instead of X, and move guidelines examples from SYM (docs#973: "Internet addresses (URLs, emails), phone numbers: PROPN vs. SYM")
- … (`flat` or `goeswith`, and what to do about transparent syntax within parts of filenames) (docs#666: "Filenames and other computery entities")
- … `X` and there should be an `ExtPos`
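To re-check the XPOS inventory of X tokens and try out the `ADD` to PROPN step on a treebank file, something like the following minimal Python sketch over plain CoNLL-U text could be used. It is not the tooling behind the Grew queries above. The function names and the sample sentence are hypothetical; it assumes the standard 10-column CoNLL-U layout (XPOS in column 5, DEPREL in column 8) and reads "usage with `goeswith` dependents" as tokens whose DEPREL is `goeswith`. It deliberately leaves `LS` alone, since the thread has not settled whether list markers should be NUM, SYM or PUNCT.

```python
from collections import Counter

# 0-based CoNLL-U column indices (assumes the standard 10-column format)
ID, UPOS, XPOS, DEPREL = 0, 3, 4, 7


def word_rows(conllu_text):
    """Yield the 10 fields of each basic word line, skipping comments,
    blank lines, multiword-token ranges (e.g. 3-4) and empty nodes (e.g. 5.1)."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10 or "-" in cols[ID] or "." in cols[ID]:
            continue
        yield cols


def x_by_xpos(conllu_text):
    """Count the XPOS tags of UPOS=X tokens, leaving out goeswith dependents."""
    counts = Counter()
    for cols in word_rows(conllu_text):
        # Drop any relation subtype so that e.g. goeswith:xyz is also excluded.
        if cols[UPOS] == "X" and cols[DEPREL].split(":")[0] != "goeswith":
            counts[cols[XPOS]] += 1
    return counts


def retag_add_as_propn(conllu_text):
    """Change UPOS X to PROPN on tokens whose XPOS is ADD (URLs, emails);
    pass every other line through unchanged."""
    out = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if (not line.startswith("#") and len(cols) == 10
                and cols[UPOS] == "X" and cols[XPOS] == "ADD"):
            cols[UPOS] = "PROPN"
            line = "\t".join(cols)
        out.append(line)
    # Keep the final newline if the input had one.
    return "\n".join(out) + ("\n" if conllu_text.endswith("\n") else "")


# Hypothetical sample: a space-separated filename plus a URL.
sample = "\n".join([
    "# text = Open report final.doc at www.example.com",
    "1\tOpen\topen\tVERB\tVB\t_\t0\troot\t_\t_",
    "2\treport\treport\tNOUN\tNN\t_\t1\tobj\t_\t_",
    "3\tfinal.doc\tfinal.doc\tX\tGW\t_\t2\tgoeswith\t_\t_",
    "4\tat\tat\tADP\tIN\t_\t5\tcase\t_\t_",
    "5\twww.example.com\twww.example.com\tX\tADD\t_\t1\tobl\t_\t_",
]) + "\n"

print(x_by_xpos(sample))                      # Counter({'ADD': 1})
print(x_by_xpos(retag_add_as_propn(sample)))  # Counter()
```

The retagging function rewrites only the UPOS column and keeps the line structure intact, so its output can be diffed directly against the original file.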