Update all labels according to a standard convention (replaces #20) #227

DanCarey404 · 2020-04-22T17:44:01Z

Add/replace rdfs:label values according to the agreed-to standard.

rjyounes · 2020-04-23T14:49:27Z

@DanCarey404 Can you please add a summary of the agreed-upon standard so that it is documented here? Then can #20 be closed?

rjyounes · 2020-04-23T14:49:37Z

Replaces #20

DanCarey404 · 2020-05-17T21:16:15Z

Per Rebecca's request, these are the labeling standards being implemented.

Classes

Sentence case
Normalized to natural language standards. E.g., hyphens inserted, acronyms in all caps, etc.
- Examples: AMA guideline, ISBN-10

Properties

Same as classes, but initial lowercase
Examples: has unit of measure, has SSN.
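
As a rough illustration of the convention above, here is a Python sketch of a sentence-case check. The function name `is_sentence_case` and the acronym pattern (two or more capital letters) are assumptions made for this example; they are not part of the agreed standard or of any existing gist tooling.

```python
import re

# Assumption: an acronym is a run of two or more capital letters.
ACRONYM = re.compile(r"[A-Z]{2,}")

def is_sentence_case(label: str, is_class: bool) -> bool:
    """Check sentence case for classes, initial lowercase for properties."""
    # Hyphenated words are checked part by part (e.g. "ISBN-10").
    words = re.split(r"[\s-]+", label.strip())
    for i, word in enumerate(words):
        if ACRONYM.fullmatch(word) or word.isdigit():
            continue  # acronyms and numbers are left as they are
        # Only the first word of a class label is capitalized.
        expected = word.capitalize() if (is_class and i == 0) else word.lower()
        if word != expected:
            return False
    return True

print(is_sentence_case("AMA guideline", is_class=True))         # True
print(is_sentence_case("has unit of measure", is_class=False))  # True
print(is_sentence_case("Has giver", is_class=False))            # False
```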

rjyounes · 2020-05-18T14:20:44Z

Just to clarify, this was not my request; it was my proposal, which the group discussed and agreed on. :)

rjyounes · 2020-12-04T19:54:42Z

This task needs an assignee.

uscholdm · 2020-12-04T20:42:46Z

@sa-bpelakh Boris has a query for this. Can you pop it into this issue, for convenience?

sa-bpelakh · 2020-12-06T12:48:48Z

What I have are SHACL rules that validate that labels are conformant: https://github.com/semanticarts/platts-ontology/blob/develop/shapes/ontologyShapes.ttl. They enforce the policy described above, and even detect acronyms in all caps (minimum of 2 letters, I believe) and ignore their casing. The current version does not allow for numbers in class or property names, so if that's a requirement, we'll have to make some changes.

rjyounes · 2020-12-06T19:39:38Z

I think we should allow numbers; e.g., hypothetically we could define classes like Shimano105Components, Iso639 (subclass of Category), TourDeFrance2020Racers, CharactersIn1984, ...

sa-bpelakh · 2020-12-06T20:14:16Z

I can see that for a specific domain, but for the base gist? We can set up a fun game of regex golf for labels.

rjyounes · 2020-12-09T15:38:37Z

Maybe it's less likely in an upper ontology, but why exclude it in principle?

Re regex golf - that looks fun! Maybe at our next happy hour?

uscholdm · 2020-12-09T18:01:58Z

> Maybe it's less likely in an upper ontology, but why exclude it in principle?

We get to choose our own stylistic conventions, as does each client project. I don't think we want gist to have numbers in IRIs, and a rule for this would find a one that looks exactly like a lowercase el. So I vote to put it in our gist checks as a warning.

rjyounes · 2020-12-09T18:23:03Z

If we disallow numbers in local names, and we happen to come across the need for one, we are then forced to spell the number out, which I think is worse. What's wrong with numbers in IRIs?

A reminder that this issue is the implementation of a set of conventions that had already been decided on and documented in the gist style guide. The point is not to revisit the decisions here. Quoting from the style guide that we had agreed on:

Alphanumeric characters only.
- Example: Isbn10, not Isbn-10 or ISBN-10.

This issue surfaced because I want to find a new assignee, since we agreed on the implementation back in April and have been postponing it since then.

rjyounes · 2020-12-09T18:25:13Z

@sa-bpelakh Platts and gist should be able to have different naming conventions. Is it onto_tool that applies the SHACL rules? If so, the SHACL shapes or files to invoke (or a folder containing them) should be configured in the YAML file, or stored in a particular directory, or something.

uscholdm · 2020-12-09T21:38:34Z

It's true that this issue should not get into what the style conventions are. That can be debated in a separate issue if anyone cares enough to raise it.

sa-bpelakh · 2020-12-09T21:54:31Z

@rjyounes Yes, the bundle file configures which shapes to apply. So we can configure whatever we consider appropriate for gist, and customers can, um, customize 😄 whichever way they want.

marksem · 2020-12-10T16:55:14Z

Team has disagreement on the naming convention as of 12/10/2020 issues meeting. @DanCarey404 will poll SA ontologists. While there IS consensus to follow the standard, there is not consensus ON the standard.

(Detail: Some want Title Case for classes, not sentence case. Some want Title Case for all concepts. Rationale: all are concepts, and labeling for particular use cases (like sentence generation vs. column headings) won't always work.)

PS @sa-bpelakh will modularize the SHACL checking to make it easy to apply different conventions, depending on where the standard lands.

rjyounes · 2020-12-14T14:28:01Z

@marksem @DanCarey404 Can we please move discussion of this issue to a gist review meeting and record the notes here? Our goal is to be transparent, and decisions made by internal polling are not. In addition, there needs to be a rationale for reopening a decision that was made months ago. We cannot rethink every issue for those who did not attend the discussion. If someone who is unable to attend wants to provide input, that can be indicated here and we can accommodate them by scheduling a special meeting if needed.

rjyounes · 2020-12-14T14:39:26Z

My input is based on earlier decisions now recorded in the gist style guide:

Classes

Sentence case
Normalized to natural language standards. E.g., hyphens inserted, acronyms in all caps, etc.
- Examples: AMA guideline, ISBN-10

Properties

Same as classes, but initial lowercase
Examples: has unit of measure, has SSN.

Rationale

We adopt sentence over title case because the latter, while technically well-defined, has more complex rules and can introduce inconsistencies when implemented by different users.

Additional notes:

- Sentence case vs title case: I hold by the decision made earlier: We adopt sentence over title case because the latter, while technically well-defined, has more complex rules and can introduce inconsistencies when implemented by different users.
- Lower case for all properties, object and datatype
- Acronyms in labels: since I believe that labels (as opposed to local names) should be in natural language form, acronyms should be spelled as they normally are. I will note that UoM is not an actual English-language acronym and therefore is not a good test case. We should also be careful about when an acronym is a prefLabel and when an altLabel: there are cases where the acronym is the most common term (e.g., "CIA", "FBI") and therefore it should be the prefLabel and the fully-spelled out version should be the altLabel, but there are also cases of the reverse (e.g., "Electronic Arts" not "EA").
- Labels are meant to be in natural language, not camelcase etc. Therefore, hyphens are appropriate where they are used in natural language (in this case English) but not otherwise.

uscholdm · 2020-12-14T21:02:02Z

I find @rjyounes's arguments and rationale compelling. If anyone wants to use labels for column headers, then they can introduce a subproperty of altLabel called, say, titleCaseLabel.

rjyounes · 2020-12-15T14:10:47Z

I didn't realize that one of the issues at stake in the renewed discussion was the use of labels as column headers. IMO that makes the case even stronger: it's hard to justify considering the preferred label as one designed for column headings or any other implementation-specific use. We have actually had this discussion during review of #20, where we reached the same conclusion as in @uscholdm's suggestion above, to define additional annotations for application-specific needs. In the case of column headers, they are (or could be) the same as the local names, so one could parse the IRIs to derive the local names for use as column headers and not maintain the values in an annotation.
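
A minimal sketch of deriving a local name from an IRI, as suggested above for column headers. The helper name `local_name` and the example.org IRIs are hypothetical; the sketch assumes the local name is whatever follows the last `#` or `/`.

```python
def local_name(iri: str) -> str:
    """Return the part of an IRI after the last '#' or '/'."""
    # Try the fragment separator first, then fall back to the last path segment.
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]

# Hypothetical IRIs, used only for illustration.
print(local_name("https://example.org/ontology/TemporalRelation"))  # TemporalRelation
print(local_name("http://example.org/onto#hasGiver"))               # hasGiver
```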

DanCarey404 · 2021-01-14T20:14:06Z

I suggest that all words in a label have a leading capital. One reason for this suggestion is that Notepad++ has a convert case option (Proper Case) which does that, as does MS Word (Capitalize Each Word). This removes ambiguity from the rule and ensures the consistency that some are looking for.

rjyounes · 2021-01-15T14:00:58Z

@DanCarey404 Are you suggesting that even function words (prepositions, articles, etc) would be capitalized? That's not a type of casing I've ever heard of, other than the applications you mention.

rjyounes · 2021-01-15T14:02:18Z

One reason for using initial lower for properties: we use labels that are tied to the local names, and should preferably be derivable from them by some simple rules, such as adding whitespace at word boundaries indicated by camel-casing. Since our properties have local names with initial lowercase, this suggests the labels should follow suit.
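
The "simple rules" mentioned above could look something like this Python sketch. The function name `local_name_to_label` is made up for this example, and the sketch assumes a word boundary is any lowercase letter or digit followed by an uppercase letter.

```python
import re

def local_name_to_label(local_name: str) -> str:
    """Insert a space at each camel-case boundary, then lowercase everything."""
    # Boundary: a lowercase letter or digit immediately followed by an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local_name)
    return spaced.lower()

print(local_name_to_label("hasUnitOfMeasure"))  # has unit of measure
print(local_name_to_label("hasGiver"))          # has giver
```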

rjyounes · 2021-01-15T14:09:56Z

These are the logical options for classes and properties:

1. Title case for all: Temporal Relation, Has Giver, Identified By (in title case, prepositions at the end of a phrase receive stress and are in upper case)
2. Title case for classes, lower case for properties: Temporal Relation, has giver
3. Sentence case for all: Temporal relation, Has giver
4. Sentence case for classes, lower case for properties: Temporal relation, has giver
5. Same as local name: TemporalRelation, hasGiver
6. Lower case for all: This has not been mentioned and I doubt if anyone wants it; we can probably rule it out.
7. Every word upper case: Has Unit Of Measure

Note: 2-4 make exceptions for acronyms and terms that are generally capitalized: Social Security Number, has SSN, has Social Security Number.

I would reject 5 because a label is meant for humans and thus should be in natural language.

We haven't mentioned taxonomy terms. Logical options for taxonomy terms:

Title case
Sentence case
Lower case

Review of conventions used by well-known ontologies:

SKOS: Concept Scheme, exact match (2)
PROV: SoftwareAgent, atLocation (5)
FOAF: Online Account, based near (2)
OAI-ORE: Aggregated Resource, Is Aggregated By (1)
OWL Time: Duration description, has beginning (4)
BIBFRAME (Library of Congress): Key title, Has event content (3)
dcterms: Method of Accrual, Date Modified (1)
Schema: Ignore Action, Accepted Offer (1)
Lingvo: Language resource, resource type (4)
Open Annotation: TextPositionSelector, hasBody (5)
Ordered List Ontology: Ordered List, has ordered list (2)

Conclusion: There are no generally accepted conventions; we should choose whichever one we like best.

Note on title case:
There is no one standard for title case: see https://en.wikipedia.org/wiki/Title_case. Chicago Manual of Style, Associated Press, etc. each define their own, though of course the broad convention is common to all. If we adopt title case, I propose that we choose one of these standard variants (or invent our own) and document it in the gist style guide as a reference for ontology developers and reviewers.

I also propose that labels conform to natural language standards by the insertion of, for example, hyphens, even if our standards for local names do not include such characters. E.g., ISBN-10 for class Isbn10.

rjyounes · 2021-01-15T15:07:01Z

Notes from 2021-01-14 triage meeting:

Dave: When do we see labels?

Graphics
Forms

Which would you rather see in these contexts?

Rebecca: we also see them in documentation (e.g., Widoco)

Peter: accuracy more important than typographic consistency

Will vote next meeting.

uscholdm · 2021-01-15T21:53:34Z

Thank you @rjyounes for the comprehensive summary.

> Conclusion: There are no generally accepted conventions; we should choose whichever one we like best.

Exactly.

> We haven't mentioned taxonomy terms.

Most taxonomy terms are instances of gist:Category, which is a lot like a class, semantically. The key technical difference is that we use gist:categorizedBy instead of rdf:type to indicate what kind of thing something is. So we may want to adopt the same convention for taxonomy terms as we do for classes.

rjyounes · 2021-01-28T16:23:12Z

These are the logical options for class and property labels:

1. Title case for all: Temporal Relation, Has Giver, Identified By (in title case, prepositions at the end of a phrase receive stress and are in upper case)
2. Title case for classes, lower case for properties: Temporal Relation, has giver
3. Sentence case for all: Temporal relation, Has giver
4. Sentence case for classes, lower case for properties: Temporal relation, has giver
5. ~~Same as local name: TemporalRelation, hasGiver~~
6. ~~Lower case for all: This has not been mentioned and I doubt if anyone wants it; we can probably rule it out.~~
7. ~~Every word upper case: Has Unit Of Measure~~

Offline voting yields #2 as the winner.

Rebecca will compile a short list of title case conventions for consideration at next meeting. The selected convention will be included in the gist style guide.

rjyounes · 2021-02-02T16:13:33Z

I've sorted through a number of style guides from reputable sources (AP, APA, Chicago Manual of Style, MLA, NYT, Wikipedia). The details are included in the attached document as I think they will not be of general interest. I've come up with an amalgam of various conventions that is also computable (e.g., a rule to capitalize nouns, verbs, adjectives, adverbs, and pronouns, or to lowercase prepositions unless stressed, is not computable), as follows:

1. Capitalize:
   a. First and last words
   b. Words of four or more letters
   c. Second part of hyphenated word (e.g., Data-Centric, not Data-centric)
2. Lowercase:
   a. Articles: a, an, the
   b. Conjunctions: and, but, if, for, or, nor, so, yet
   c. Prepositions: as, at, by, cum, ere, for, in, of, off, on, out, per, pre, pro, qua, re, sub, to, up, via
3. Capitalize everything else
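
To illustrate that the rules above are computable, here is a minimal Python sketch. The names `title_case` and `LOWERCASE_WORDS` are made up for this example. It assumes that every part of a hyphenated word is capitalized, and that existing capitals (e.g., acronyms) are left untouched.

```python
# Word lists copied from the proposal above.
ARTICLES = {"a", "an", "the"}
CONJUNCTIONS = {"and", "but", "if", "for", "or", "nor", "so", "yet"}
PREPOSITIONS = {
    "as", "at", "by", "cum", "ere", "for", "in", "of", "off", "on", "out",
    "per", "pre", "pro", "qua", "re", "sub", "to", "up", "via",
}
LOWERCASE_WORDS = ARTICLES | CONJUNCTIONS | PREPOSITIONS

def _capitalize(word: str) -> str:
    # Upper-case the first letter of each hyphen-separated part; leave the
    # remaining characters as they are so acronyms such as "ISBN" survive.
    return "-".join(part[:1].upper() + part[1:] for part in word.split("-"))

def title_case(label: str) -> str:
    words = label.split()
    last = len(words) - 1
    result = []
    for i, word in enumerate(words):
        # Capitalize first/last words, words of four or more letters,
        # and anything that is not on the lowercase list.
        if i in (0, last) or len(word) >= 4 or word.lower() not in LOWERCASE_WORDS:
            result.append(_capitalize(word))
        else:
            result.append(word.lower())
    return " ".join(result)

print(title_case("unit of measure"))            # Unit of Measure
print(title_case("identified by"))              # Identified By
print(title_case("data-centric architecture"))  # Data-Centric Architecture
```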

Attachment: Title Case Conventions.pdf

rjyounes · 2021-02-02T17:35:05Z

Regarding automated conversion of local names to labels: there's an issue in the conversion of acronyms and hyphenated words. There are two possible local name conventions:

1. Represent as in natural language - generally all uppercase - e.g., hasSSN
2. Represent in camel case - e.g., hasSsn. The argument is that word boundaries can be easily detected. isCiaAgent allows word boundary detection, while isCIAAgent does not. Even for human users, the word boundary is easier to see in the former.

However, labels should include natural language formats: is CIA agent, not is Cia agent. The correct version cannot be algorithmically computed from either local name.

The same may be true of hyphenated words, depending on the local name convention. ISBN-10 can be automatically computed from ISBN-10 but not from Isbn-10, ISBN10, or Isbn10.

In fact, in general it is easier to derive the local name from the label than vice versa.

If we want to stick to our proposed local name conventions, we will use the forms hasSsn, isCiaAgent, and Isbn10. These require human correction once the automated label generator has run. If the generator runs before every release, we would need human intervention each time. Another option: add a skos:editorialNote indicating to the generator that the label should not be touched.
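
A small Python sketch of the asymmetry described above. Both function names are hypothetical, and `naive_label` is only one possible reading of "automated label generator". Going from label to local name is deterministic, while the reverse loses acronym casing and hyphens.

```python
import re

def label_to_local_name(label: str, is_class: bool) -> str:
    """Build a camel-case local name containing alphanumeric characters only."""
    # Split on anything that is not a letter or digit (spaces, hyphens, ...).
    words = [w for w in re.split(r"[^A-Za-z0-9]+", label) if w]
    camel = "".join(w[:1].upper() + w[1:].lower() for w in words)
    # Classes keep the leading capital; properties start in lowercase.
    return camel if is_class else camel[:1].lower() + camel[1:]

def naive_label(local_name: str) -> str:
    """Split at camel-case boundaries and lowercase; acronyms are not restored."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local_name).lower()

print(label_to_local_name("is CIA agent", is_class=False))  # isCiaAgent
print(label_to_local_name("ISBN-10", is_class=True))        # Isbn10
print(naive_label("isCiaAgent"))                            # is cia agent
print(naive_label("Isbn10"))                                # isbn10
```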

uscholdm · 2021-02-02T20:02:42Z

> In fact, in general it is easier to derive the local name from the label than vice versa.

Interesting observation, it usually goes the other way, but this sounds correct.

> The argument is that word boundaries can be easily detected. isCiaAgent allows word boundary detection, while isCIAAgent does not. Even for human users, the word boundary is easier to see in the latter.

I think it is easier to see the boundary in the former: isCiaAgent . Was that a typo?

rjyounes · 2021-02-02T20:09:26Z

Yes, that's an error. I've fixed it above.

rjyounes · 2021-02-11T16:09:40Z

Title case proposal above accepted for implementation.

rjyounes · 2021-02-11T16:11:22Z

Boris will fix all labels, first by automation and then manual adjustment for exceptions.

rjyounes · 2021-02-12T15:31:47Z

In writing the label validation script (see PR #428), Boris noted that proper nouns in labels must also retain capitalization. An emended version of the label conventions follows:

Title Case Convention

1. Capitalize:
   a. First and last words
   b. Words of four or more letters
   c. Second part of hyphenated word (e.g., Data-Centric, not Data-centric)
2. Lowercase:
   a. Articles: a, an, the
   b. Conjunctions: and, but, if, for, or, nor, so, yet
   c. Prepositions: as, at, by, cum, ere, for, in, of, off, on, out, per, pre, pro, qua, re, sub, to, up, via
3. Capitalize everything else

Label Conventions
Classes: title case (as above)
Properties: all lowercase

The following exceptions apply to both class and property labels:

Acronyms and proper nouns are kept intact (e.g., has SSN, unit symbol Unicode, ISBN-10)
Numbers are allowed (e.g., ISBN-10)
Hyphens are allowed (e.g., ISBN-10)

The exception for proper nouns makes the convention not fully automatable.

The implementation of these conventions in current labels will be done by Boris using a script with manual corrections (for the non-automatable exceptions). To support label validation as part of bundling the ontology for release, we will add an additional ontology file with an annotation signaling to the validation script that the label is not subject to the validation rules. We propose gist:nonConformingLabel for the annotation. See additional notes in PR #428.

Any objections to the annotation name should be voiced here.
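
A minimal sketch of how a validator might honor such an exemption. This is not the script from PR #428. A plain dict stands in for the ontology, a set of IRIs stands in for terms carrying the proposed gist:nonConformingLabel annotation, and the `ex:` IRIs, function names, and acronym pattern are all assumptions made for illustration.

```python
import re

# Articles, conjunctions, and prepositions that stay lowercase in title case.
LOWERCASE_WORDS = {
    "a", "an", "the", "and", "but", "if", "for", "or", "nor", "so", "yet",
    "as", "at", "by", "cum", "ere", "in", "of", "off", "on", "out",
    "per", "pre", "pro", "qua", "re", "sub", "to", "up", "via",
}
# Assumption: an acronym is two or more capitals, optionally followed by digits.
ACRONYM = re.compile(r"[A-Z]{2,}[0-9]*")

def part_ok(part: str, capitalized: bool) -> bool:
    if ACRONYM.fullmatch(part) or part.isdigit():
        return True  # acronyms and numbers are kept intact
    return part == (part.capitalize() if capitalized else part.lower())

def label_conforms(label: str, is_class: bool) -> bool:
    words = label.split()
    for i, word in enumerate(words):
        # Class labels use title case; property labels are all lowercase.
        must_cap = is_class and (
            i in (0, len(words) - 1) or word.lower() not in LOWERCASE_WORDS
        )
        if not all(part_ok(p, must_cap) for p in word.split("-")):
            return False
    return True

def nonconforming(labels: dict, exempt: set) -> list:
    """Return IRIs whose labels break the rules and are not marked exempt."""
    return [iri for iri, (is_class, label) in labels.items()
            if iri not in exempt and not label_conforms(label, is_class)]

labels = {  # hypothetical terms: IRI -> (is_class, label)
    "ex:Isbn10": (True, "ISBN-10"),
    "ex:unitSymbolUnicode": (False, "unit symbol Unicode"),
    "ex:TemporalRelation": (True, "Temporal relation"),
}
exempt = {"ex:unitSymbolUnicode"}  # proper noun, so marked as non-conforming
print(nonconforming(labels, exempt))  # ['ex:TemporalRelation']
```

The proper noun "Unicode" cannot be recognized by rule, which is why the sketch needs the exemption set at all.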

DanCarey404 self-assigned this Apr 22, 2020

DanCarey404 added the labels "impact: minor" (new, backward-compatible functionality; does not change inferences; e.g., adding a term) and "priority: should have" (medium-priority feature or bug fix) Apr 22, 2020

This was referenced May 4, 2020

Update all labels according to a standard convention #20

Closed

Define a set of stylistic conventions for annotations other than labels. #253

Open

rjyounes added the labels "status: implementation specified" (implementation has been specified; a developer should be assigned) and "status: triaged" May 28, 2020

rjyounes removed the status: triaged label Oct 8, 2020

rjyounes unassigned DanCarey404 Dec 4, 2020

sa-bpelakh self-assigned this Dec 10, 2020

rjyounes mentioned this issue Dec 14, 2020

Update gist style guide based on any changes made to prefLabel conventions #421

Closed

rjyounes added the label "status: under review" (in triage) and removed the label "status: implementation specified" (implementation has been specified; a developer should be assigned) Dec 14, 2020

This was referenced Jan 25, 2021

Add gist:description #425

Closed

Document use of skos label properties vs names for real world instances #427

Open

rjyounes assigned rjyounes and unassigned sa-bpelakh Jan 28, 2021

semanticarts deleted a comment from uscholdm Jan 29, 2021

sa-bpelakh linked a pull request Feb 1, 2021 that will close this issue

Label validation #428

Merged

rjyounes assigned sa-bpelakh and unassigned rjyounes Feb 11, 2021

rjyounes added the label "status: implementation specified" (implementation has been specified; a developer should be assigned) and removed the label "status: under review" (in triage) Feb 11, 2021

rjyounes mentioned this issue Feb 15, 2021

Should altLabels follow conventions outlined for prefLabels? #432

Closed

sa-bpelakh closed this as completed in #428 Feb 16, 2021

pmcb55 mentioned this issue Oct 10, 2022

Term naming convention - should gist:_USDollar be gist:_UnitedStatesDollar...? #755

Closed
