Update all labels according to a standard convention (replaces #20) #227
Comments
@DanCarey404 Can you please add a summary of the agreed-upon standard so that it is documented here? Then can #20 be closed? |
Replaces #20 |
Per Rebecca's request, these are the labeling standards being implemented. Classes
Properties
|
Just to clarify: this was not my request: it was my proposal, which the group discussed and agreed on. :) |
This task needs an assignee. |
@sa-bpelakh Boris has a query for this. Can you pop it into this issue, for convenience? |
What I have are SHACL rules that validate that labels are conformant: https://github.com/semanticarts/platts-ontology/blob/develop/shapes/ontologyShapes.ttl. They enforce the policy described above, and even detect acronyms in all caps (minimum of 2 letters, I believe) and ignore their casing. The current version does not allow for numbers in class or property names, so if that's a requirement, we'll have to make some changes. |
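For readers who do not read SHACL, the checks described in the comment above can be approximated in a few lines of Python. This is only an illustrative sketch, not the contents of the linked shapes file: the function name `check_label`, the acronym pattern, and the sentence-case policy (initial capital for class labels, all lowercase for property labels) are assumptions made for the example.

```python
import re

# Assumption: an acronym is a word made of two or more capital letters.
ACRONYM = re.compile(r"^[A-Z]{2,}$")

def check_label(label: str, initial_upper: bool) -> list:
    """Return a list of problems found in a label (empty if none).

    initial_upper=True  -> first word capitalized (assumed class convention)
    initial_upper=False -> every word lowercase (assumed property convention)
    """
    problems = []
    for i, word in enumerate(label.split()):
        if any(ch.isdigit() for ch in word):
            # Mirrors the stated limitation: numbers are not allowed.
            problems.append(f"digit in {word!r}")
            continue
        if ACRONYM.match(word):
            # Acronyms are detected and exempt from the casing check.
            continue
        expected = word.capitalize() if (i == 0 and initial_upper) else word.lower()
        if word != expected:
            problems.append(f"{word!r} should be {expected!r}")
    return problems

print(check_label("has SSN", initial_upper=False))
print(check_label("Iso 639", initial_upper=True))
```

Allowing numbers would mean dropping the digit branch. Treating it as a warning instead of an error would be a matter of how the caller reports the returned problems.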
I think we should allow numbers; e.g., hypothetically we could define classes like Shimano105Components, Iso639 (subclass of Category), TourDeFrance2020Racers, CharactersIn1984, ... |
I can see that for a specific domain, but for the base gist? We can set up a fun game of regex golf for labels. |
Maybe it's less likely in an upper ontology, but why exclude it in principle? Re regex golf - that looks fun! Maybe at our next happy hour? |
We get to choose our own stylistic conventions, as does each client project. I don't think we want gist to have numbers in IRIs, and a rule for this would find a one that looks exactly like a lowercase el. So I vote to put it in our gist checks as a warning. |
If we disallow numbers in local names, and we happen to come across the need for one, we are then forced to spell the number out, which I think is worse. What's wrong with numbers in IRIs? A reminder that this issue is the implementation of a set of conventions that had already been decided on and documented in the gist style guide. The point is not to revisit the decisions here. Quoting from the style guide that we had agreed on:
This issue surfaced because I want to find a new assignee, since we agreed on the implementation back in April and have been postponing it since then. |
@sa-bpelakh Platts and gist should be able to have different naming conventions. Is it |
It's true that this issue should not get into what the style conventions are. That can be debated in a separate issue if anyone cares enough to raise it. |
@rjyounes Yes, the bundle file configures which shapes to apply. So we can configure whatever we consider appropriate for gist, and customers can, um, customize 😄 whichever way they want. |
The team disagreed on the naming convention at the 12/10/2020 issues meeting. @DanCarey404 will poll SA ontologists. While there IS consensus to follow the standard, there is not consensus ON the standard. (Detail: Some want Title Case for classes, not sentence case. Some want Title Case for all concepts. Rationale: all are concepts, and labeling for particular use cases (like sentence generation vs. column headings) won't always work.) PS @sa-bpelakh will modularize the SHACL checking to make it easy to apply different conventions based on where the standard lands. |
@marksem @DanCarey404 Can we please move discussion of this issue to a gist review meeting and record the notes here? Our goal is to be transparent, and decisions made by internal polling are not. In addition, there needs to be a rationale for reopening a decision that was made months ago. We cannot rethink every issue for those who did not attend the discussion. If someone who is unable to attend wants to provide input, that can be indicated here and we can accommodate them by scheduling a special meeting if needed. |
My input is based on earlier decisions now recorded in the gist style guide: Classes
Properties
Rationale
We adopt sentence over title case because the latter, while technically well-defined, has more complex rules and can introduce inconsistencies when implemented by different users.
Additional notes:
|
I find @rjyounes's arguments and rationale compelling. If anyone wants to use labels for column headers, then they can introduce a subproperty of altLabel called, say, titleCaseLabel. |
I didn't realize that one of the issues at stake in the renewed discussion was the use of labels as column headers. IMO that makes the case even stronger: it's hard to justify considering the preferred label as one designed for column headings or any other implementation-specific use. We have actually had this discussion during review of #20, where we reached the same conclusion as in @uscholdm's suggestion above, to define additional annotations for application-specific needs. In the case of column headers, they are (or could be) the same as the local names, so one could parse the IRIs to derive the local names for use as column headers and not maintain the values in an annotation. |
I suggest that all words in a label have a leading capital. One reason for this suggestion is that Notepad++ has a convert case option (Proper Case) which does that, as does MS Word (Capitalize Each Word). This removes ambiguity from the rule and ensures the consistency that some are looking for. |
@DanCarey404 Are you suggesting that even function words (prepositions, articles, etc.) would be capitalized? That's not a type of casing I've ever heard of, other than in the applications you mention. |
One reason for using initial lower for properties: we use labels that are tied to the local names, and should preferably be derivable from them by some simple rules, such as adding whitespace at word boundaries indicated by camel-casing. Since our properties have local names with initial lowercase, this suggests the labels should follow suit. |
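The "simple rules" mentioned in the comment above could look something like the following Python sketch. It is only an illustration under stated assumptions: the function name `local_name_to_label` and the sample local names are hypothetical, and the rule shown (split at camel-case boundaries, keep the first word's case, lowercase the rest) is one possible reading of the idea, not an agreed implementation.

```python
import re

# A word boundary is assumed wherever a capital letter follows
# a lowercase letter or a digit.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")

def local_name_to_label(local_name: str) -> str:
    """Derive a label from a camel-cased local name."""
    words = _BOUNDARY.sub(" ", local_name).split()
    if not words:
        return ""
    # The first word keeps the case of the local name, so an
    # initial-lowercase property name yields an initial-lowercase label.
    return " ".join([words[0]] + [w.lower() for w in words[1:]])

print(local_name_to_label("hasPart"))      # hypothetical property local name
print(local_name_to_label("SocialBeing"))  # hypothetical class local name
```

Because the first word is passed through unchanged, the label's initial case follows the local name, which is the point being made about properties.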
These are the logical options for classes and properties:
Note: 2-4 make exceptions for acronyms and terms that are generally capitalized: Social Security Number, has SSN, has Social Security Number.
I would reject 5 because a label is meant for humans and thus should be in natural language.
We haven't mentioned taxonomy terms. Logical options for taxonomy terms:
Review of conventions used by well-known ontologies:
SKOS: Concept Scheme, exact match (2)
Conclusion: There are no generally accepted conventions; we should choose whichever one we like best.
Note on title case:
I also propose that labels conform to natural language standards by the insertion of, for example, hyphens, even if our standards for local names do not include such characters. E.g., ISBN-10 for class Isbn10. |
Notes from 2021-01-14 triage meeting: Dave: When do we see labels?
Which would you rather see in these contexts?
Rebecca: we also see them in documentation (e.g., Widoco)
Peter: accuracy more important than typographic consistency
Will vote next meeting. |
Thank you @rjyounes for the comprehensive summary.
Exactly.
Most taxonomy terms are instances of |
These are the logical options for class and property labels:
Offline voting yields #2 as the winner. Rebecca will compile a short list of title case conventions for consideration at next meeting. The selected convention will be included in the gist style guide. |
I've sorted through a number of style guides from reputable sources (AP, APA, Chicago Manual of Style, MLA, NYT, Wikipedia). The details are included in the attached document as I think they will not be of general interest. I've come up with an amalgam of various conventions that is also computable (e.g., a rule to capitalize nouns, verbs, adjectives, adverbs, and pronouns, or to lowercase prepositions unless stressed, is not computable), as follows:
Attachment: Title Case Conventions.pdf |
Regarding automated conversion of local names to labels: there's an issue in the conversion of acronyms and hyphenated words. There are two possible local name conventions:
However, labels should include natural language formats: is CIA agent, not is Cia agent. The correct version cannot be algorithmically computed from either local name. The same may be true of hyphenated words, depending on the local name convention. ISBN-10 can be automatically computed from In fact, in general it is easier to derive the local name from the label than vice versa. If we want to stick to our proposed local name conventions, we will use the forms |
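The asymmetry described in the comment above can be illustrated with a short Python sketch. It assumes the local name convention in which acronyms are camel-cased like ordinary words (as in Isbn10 for ISBN-10, mentioned earlier in this thread). The function names `label_to_local_name` and `naive_label` are hypothetical.

```python
import re

def label_to_local_name(label: str, is_property: bool) -> str:
    """Derive a local name from a natural-language label."""
    # Split on whitespace and hyphens, then camel-case every word,
    # including acronyms ("CIA" -> "Cia").
    words = [w for w in re.split(r"[\s\-]+", label) if w]
    parts = [w[0].upper() + w[1:].lower() for w in words]
    if is_property and parts:
        parts[0] = parts[0].lower()  # properties start with a lowercase letter
    return "".join(parts)

def naive_label(local_name: str) -> str:
    """Naive reverse direction: insert a space at each camel-case boundary."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local_name)

print(label_to_local_name("is CIA agent", is_property=True))
print(label_to_local_name("ISBN-10", is_property=False))
print(naive_label("isCiaAgent"))
```

The forward direction is a pure function of the label. The reverse direction cannot know that "Cia" was an acronym or that "Isbn10" contained a hyphen, so the original label is not recoverable without extra information.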
Interesting observation, it usually goes the other way, but this sounds correct.
I think it is easier to see the boundary in the former: |
Yes, that's an error. I've fixed it above. |
Title case proposal above accepted for implementation. |
Boris will fix all labels, first by automation and then manual adjustment for exceptions. |
In writing the label validation script (see PR #428), Boris noted that proper nouns in labels must also retain capitalization. An emended version of the label conventions follows: Title Case Convention
Label Conventions The following exceptions apply to both class and property labels:
The exception for proper nouns makes the convention not fully automatable. The implementation of these conventions in current labels will be done by Boris using a script with manual corrections (for the non-automatable exceptions). To support label validation as part of bundling the ontology for release, we will add an additional ontology file with an annotation signaling to the validation script that the label is not subject to the validation rules. We propose Any objections to the annotation name should be voiced here. |
Add/replace rdfs:label values according to the agreed-to standard.