UPOS "X" #440

Closed
3 of 5 tasks
nschneid opened this issue Sep 30, 2023 · 4 comments


@nschneid
Contributor

nschneid commented Sep 30, 2023

The guidelines at https://universaldependencies.org/u/pos/X.html say it should be used very restrictively.

Setting aside the usage with goeswith dependents, we have:

GUM XPOS doesn't use ADD or AFX (these are more recent additions to the PTB tagset). But I see internet addresses under PROPN in GUM, which makes sense linguistically.

I think the steps here are:

  1. Harmonize treatment of LS list markers
  2. Map EWT ADD to PROPN instead of X, and move guidelines examples from SYM (Internet addresses (URLs, emails), phone numbers: PROPN vs. SYM docs#973)
  3. Review separated affixes and assign a POS based on the kind of modification, typically ADJ or ADV (UPOS for affixes (AFX) #152)
  4. Come up with a coherent EWT policy for filenames (e.g. flat or goeswith, and what to do about transparent syntax within parts of filenames) (Filenames and other computery entities docs#666)
  5. Clarify UPOS policy for flat:foreign structures (maybe individual words should be X and there should be an ExtPos)
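
As an illustration of step 2 only, here is a minimal sketch of the remapping, assuming the treebank is stored as CoNLL-U text with the standard ten tab-separated columns. `remap_add_to_propn` is a hypothetical helper name, not part of any UD tooling, and the sample sentence is invented for the example rather than taken from EWT.

```python
def remap_add_to_propn(conllu_text: str) -> str:
    """Rewrite UPOS X -> PROPN on token lines whose XPOS is ADD.

    Hypothetical helper: assumes plain CoNLL-U input, leaves everything else untouched.
    """
    out = []
    for line in conllu_text.split("\n"):
        cols = line.split("\t")
        # Token lines have exactly 10 columns; comments and blank lines pass through.
        if len(cols) == 10 and not line.startswith("#"):
            # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
            if cols[4] == "ADD" and cols[3] == "X":
                cols[3] = "PROPN"
            line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)


# Invented sample, for illustration only.
sample = (
    "# text = Visit example.com\n"
    "1\tVisit\tvisit\tVERB\tVB\t_\t0\troot\t_\t_\n"
    "2\texample.com\texample.com\tX\tADD\t_\t1\tobj\t_\t_\n"
)
print(remap_add_to_propn(sample))
```

Tokens tagged X for other reasons (for example those with goeswith dependents) are not touched, since the rewrite is conditioned on XPOS being ADD.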
@amir-zeldes
Contributor

> list item numbers should be NUM

This is definitely not right, because LS is also the tag for graphical bullets, which are in no way numbers. I'm also not sure that "A1.iii)" is a number, I'd say it's much more of an X. I see some mention of using either PUNCT/punct or SYM/dep for these. In GUM xpos=LS is always attached as dep, and nummod is only used for counting things.

@nschneid
Contributor Author

> This is definitely not right, because LS is also the tag for graphical bullets, which are in no way numbers.

https://universaldependencies.org/u/pos/SYM.html says bullets are PUNCT. It seems to be distinguishing them from list item markers with a (quasi)numerical component (i.e., they reflect a position in a sequential ordering of some kind).

I could also imagine thinking of lists as a type of coordination, and these as helping to mark how a list item relates to other items in the list, so CCONJ. But that may be unpopular. :)

@amir-zeldes
Contributor

I'm not so convinced. I think syntactically there is no difference between numerical, graphical, alphabetical and mixed list item markers. It's all the same kind of orthographic device, and I would like them to have the same analysis. I wouldn't feel too bad about punct, but then we are not allowed to treat them as kinds of numbers morphologically, and in any case it would create an uncomfortable situation where punctuation becomes open-ended.

Tagging them all as SYM, or even splitting them into SYM for non-numerical and NUM for numerical would be OK for me too, but I think they should have the same deprel regardless of what kind of list item marker they are.
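
One way to see how far a treebank is from a uniform analysis is to tally the (UPOS, DEPREL) pairs that occur on tokens with XPOS LS. The following is a rough sketch under the assumption of standard ten-column CoNLL-U input; `ls_analyses` is a hypothetical helper name and the sample tokens are made up for illustration, not drawn from GUM or EWT.

```python
from collections import Counter


def ls_analyses(conllu_text: str) -> Counter:
    """Count (UPOS, DEPREL) pairs over tokens whose XPOS is LS (hypothetical helper)."""
    counts = Counter()
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        if len(cols) == 10 and not line.startswith("#") and cols[4] == "LS":
            counts[(cols[3], cols[7])] += 1
    return counts


# Invented sample with three differently analyzed list item markers.
sample = (
    "1\t1.\t1.\tNUM\tLS\t_\t2\tnummod\t_\t_\n"
    "2\tIntroduction\tintroduction\tNOUN\tNN\t_\t0\troot\t_\t_\n"
    "\n"
    "1\t•\t•\tPUNCT\tLS\t_\t2\tpunct\t_\t_\n"
    "2\tMethods\tmethod\tNOUN\tNNS\t_\t0\troot\t_\t_\n"
    "\n"
    "1\ta)\ta)\tX\tLS\t_\t2\tdep\t_\t_\n"
    "2\tResults\tresult\tNOUN\tNNS\t_\t0\troot\t_\t_\n"
)

counts = ls_analyses(sample)
deprels = {deprel for _, deprel in counts}
print(counts)
print("uniform deprel:", len(deprels) == 1)
```

A single deprel in the result would match the "same deprel regardless of marker kind" position, even if UPOS is split between SYM and NUM.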

@nschneid
Contributor Author

nschneid commented Jul 6, 2024

LS issue --> #465
AFX issue --> #152

So I think we're done here.
