Add ability to formally record enumeration source? #402

Closed
rowlesmr opened this issue May 24, 2023 · 7 comments · Fixed by #406

Comments

@rowlesmr
Collaborator

rowlesmr commented May 24, 2023

From #400 (comment), I don't think there is the ability to formally record the source of an enumeration value.

How about adding the following to ddl.dic?

  • _enumeration.source, for when there is a single source for an enumeration or there is only a single value
  • _enumeration_default.source, for when there are different sources for each value in an enumeration

These tags would go in, for example, templ_enum.cif to record whence the values came.

# not a loop dataitem
save_enumeration.source

    _definition.id                '_enumeration.source'
    _definition.update            2023-05-24
    _description.text
;
    Reference to source of value(s) used in this enumeration.
;
    _name.category_id             enumeration
    _name.object_id               source
    _type.purpose                 Describe
    _type.source                  Recorded
    _type.container               Single
    _type.contents                Text
    _description_example.case     'International Tables Vol. C Table 4.4.4.1'

save_
# loop dataitem
save_enumeration_default.source

    _definition.id                '_enumeration_default.source'
    _definition.update            2023-05-24
    _description.text
;
    Reference to source of value used in this enumeration for this key.
;
    _name.category_id             enumeration_default
    _name.object_id               source
    _type.purpose                 Describe
    _type.source                  Recorded
    _type.container               Single
    _type.contents                Text
    _description_example.case     'International Tables Vol. C Table 4.4.4.1'

save_
@vaitkus
Collaborator

vaitkus commented May 24, 2023

A similar problem was also outlined in comment #390 (comment). Thanks for filing a separate issue.

The proposal seems OK, but I wonder whether we should normalise it further by recording the references and the data sources in a separate loop and then using only the id of the data source in the ENUMERATION_DEFAULT loop. Something like:

loop_
_data_source.id
_data_source.reference
_data_source.description
1 "International Tables Vol. C Table 4.4.4.1" 'Lists parameter x for neutral atoms.'
2 "International Tables Vol. R Table 5" 'Lists parameter x for ions.'

loop_
_enumeration_default.index
_enumeration_default.value
_enumeration_default.data_source_id
H   1 1
H-  2 2
...

The DATA_SOURCE category could have all of the proper bibliographical fields (e.g. Title, DOI, page numbers, etc.). Listing all of the authors in a normalised way would require an additional category, though.

The aspect of having a looped and an unlooped item for the same purpose could also be preserved, but we would then have to clearly state which takes precedence, i.e. use the unlooped value unless a looped value is provided.
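
For illustration only, the lookup implied by the two loops above, together with the precedence rule, could be sketched in Python roughly as follows. The dictionaries are a hand-written stand-in for parsed loop contents, and `source_for` and `unlooped_source` are hypothetical names, not part of any existing CIF or DDLm tool.

```python
# Hypothetical in-memory view of the DATA_SOURCE loop, keyed on _data_source.id.
data_source = {
    "1": {
        "reference": "International Tables Vol. C Table 4.4.4.1",
        "description": "Lists parameter x for neutral atoms.",
    },
    "2": {
        "reference": "International Tables Vol. R Table 5",
        "description": "Lists parameter x for ions.",
    },
}

# Hypothetical in-memory view of the ENUMERATION_DEFAULT loop.
enumeration_default = [
    {"index": "H", "value": "1", "data_source_id": "1"},
    {"index": "H-", "value": "2", "data_source_id": "2"},
]


def source_for(row, sources, unlooped_source=None):
    """Return the reference text for one ENUMERATION_DEFAULT row.

    A looped data_source_id, when present, takes precedence; the
    unlooped value is used only when the row carries no id.
    """
    source_id = row.get("data_source_id")
    if source_id is not None:
        return sources[source_id]["reference"]
    return unlooped_source


for row in enumeration_default:
    print(row["index"], "<-", source_for(row, data_source))
```

A row without a `data_source_id` would fall back to whatever is passed as `unlooped_source`, which is the behaviour a definition would have to spell out if both forms were kept.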

@rowlesmr
Collaborator Author

A similar problem was also outlined in comment #390 (comment).

Yes! I knew I had seen it somewhere else!

A separate category also then allows other places to use them as needed. I'll start something as soon as my computer gets fixed...

The DATA_SOURCE category could have all of the proper bibliographical fields (e.g. Title, DOI, page numbers, etc.). Listing all of the authors in a normalised way would require an additional category, though.

This is the next question: How far down the road do we want to go with this? Just a text string, or a full database solution? I'm assuming that this lives in ddl.dic.

The aspect of having a looped and an unlooped item for the same purpose could also be preserved, but we would then have to clearly state which takes precedence, i.e. use the unlooped value unless a looped value is provided.

I think we need both, but, yes, the looped value takes precedence.

@jamesrhester
Contributor

I would prefer enumeration_source as the category name (instead of data_source) because only the enumeration attributes contain actual data. And we do need to go fully looped, because sometimes sources are mixed together: you can imagine somebody coming up with a new measurement of scattering length for a few atomic types that we might want to include.

If there is a single row in enumeration_source, then that can be taken as applying to all rows in enumeration_default, so there is no need for a separate data name.
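
As a rough sketch of that single-row rule (not an agreed algorithm), software reading a dictionary might resolve the source of each row like this. The dict layout, the function name `resolve_source_id` and the row key `source_id` are all placeholders chosen for the example.

```python
def resolve_source_id(row, enumeration_source):
    """Pick the enumeration_source id that applies to one enumeration_default row.

    `enumeration_source` maps an id to its reference text; `row` is a dict
    holding the items of one enumeration_default row (hypothetical layout).
    """
    explicit = row.get("source_id")
    if explicit is not None:
        # The row names its source directly.
        return explicit
    if len(enumeration_source) == 1:
        # Only one source is recorded, so it is taken to apply to every row.
        return next(iter(enumeration_source))
    # Several sources and no explicit link: the source is unknown.
    return None


single = {"1": "International Tables Vol. C Table 4.4.4.1"}
rows = [{"index": "H", "value": "1"}, {"index": "H-", "value": "2"}]
print([resolve_source_id(r, single) for r in rows])
```

With more than one row in enumeration_source, a row that lacks an explicit link resolves to `None` in this sketch rather than to a guess.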

I don't think we need to go "full database" on this one, as we are not intending for checking software to go off and interpret the reference in order to check the provided values. Firstly because that is a hard problem, and secondly because it is a very rare requirement that could be fulfilled by an ad-hoc check by the person updating the values that could be done faster than software could be written. So a simple human-understandable reference is sufficient, with perhaps a DOI to find an electronic version.

@vaitkus
Collaborator

vaitkus commented May 26, 2023

I would prefer enumeration_source as the category name (instead of data_source) because only the enumeration attributes contain actual data. And we do need to go fully looped, because sometimes sources are mixed together: you can imagine somebody coming up with a new measurement of scattering length for a few atomic types that we might want to include.

Okay. Any preferences for the new name of _enumeration_default.data_source_id, e.g. _enumeration_default.source_id or _enumeration_default.enumeration_source_id?

One thing that we need to clearly communicate in the definition is that the source actually refers to the default enumeration values and not to the regular enumeration values (which could be misconstrued from the name).

If there is a single row in enumeration_source, then that can be taken as applying to all rows in enumeration_default, so there is no need for a separate data name.

Great idea and it should work well with automatic key-value derivation.

I don't think we need to go "full database" on this one, as we are not intending for checking software to go off and interpret the reference in order to check the provided values. Firstly because that is a hard problem, and secondly because it is a very rare requirement that could be fulfilled by an ad-hoc check by the person updating the values that could be done faster than software could be written. So a simple human-understandable reference is sufficient, with perhaps a DOI to find an electronic version.

Ok. I guess even the _enumeration_source.description could be omitted for now and added later on if needed.

@jamesrhester
Contributor

Okay. Any preferences for the new name of _enumeration_default.data_source_id, e.g. _enumeration_default.source_id or _enumeration_default.enumeration_source_id?

I think _enumeration_default.source_id would work fine.

@rowlesmr
Collaborator Author

rowlesmr commented Jun 1, 2023

OK. To summarise:

We're saying no to just adding:

_enumeration.source
_enumeration_default.source

as suggested in the first post.

We're saying yes to

_enumeration_source.id #loop category
_enumeration_source.reference

_enumeration_default.source_id
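
Since _enumeration_default.source_id would point at _enumeration_source.id, one simple consistency check a tool could make is that every id used actually exists. The Python below is only a sketch under that assumption; the function name and the list-of-dicts layout are made up for the example, and nothing here inspects the reference text itself.

```python
def dangling_source_ids(enumeration_default, enumeration_source_ids):
    """Return, sorted, the source_id values that match no _enumeration_source.id."""
    known = set(enumeration_source_ids)
    return sorted(
        {
            row["source_id"]
            for row in enumeration_default
            if "source_id" in row and row["source_id"] not in known
        }
    )


# Hypothetical parsed contents of the two loops.
source_ids = ["1", "2"]
defaults = [
    {"index": "H", "value": "1", "source_id": "1"},
    {"index": "H-", "value": "2", "source_id": "3"},  # no such source
    {"index": "He", "value": "3"},  # no link recorded
]
print(dangling_source_ids(defaults, source_ids))
```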

@jamesrhester
Contributor

Yes, that is what I think would work.
