Add ability to formally record enumeration source? #402

rowlesmr · 2023-05-24T07:01:23Z

From #400 (comment), I don't think there is the ability to formally record the source of an enumeration value.

How about adding the following to ddl.dic?

_enumeration.source, for when there is a single source for an enumeration or there is only a single value.
_enumeration_default.source, for when there are different sources for each value in an enumeration.

These tags would go in, for example, templ_enum.cif to record whence the values came.

# not a loop dataitem
save_enumeration.source

    _definition.id                '_enumeration.source'
    _definition.update            2023-05-24
    _description.text
;
    Reference to source of value(s) used in this enumeration.
;
    _name.category_id             enumeration
    _name.object_id               source
    _type.purpose                 Describe
    _type.source                  Recorded
    _type.container               Single
    _type.contents                Text
    _description_example.case     'International Tables Vol. C Table 4.4.4.1'

save_

# loop dataitem
save_enumeration_default.source

    _definition.id                '_enumeration_default.source'
    _definition.update            2023-05-24
    _description.text
;
    Reference to source of value used in this enumeration for this key.
;
    _name.category_id             enumeration_default
    _name.object_id               source
    _type.purpose                 Describe
    _type.source                  Recorded
    _type.container               Single
    _type.contents                Text
    _description_example.case     'International Tables Vol. C Table 4.4.4.1'

save_

vaitkus · 2023-05-24T20:35:47Z

A similar problem was also outlined in comment #390 (comment). Thanks for filing a separate issue.

The proposal seems ok, but I am wondering if we shouldn't further normalise it by recording the references in a separate loop and the data sources in a separate loop and then only using the id of the references in the ENUMERATION_DEFAULT loop. Something like:

loop_
_data_source.id
_data_source.reference
_data_source.description
1 "International Tables Vol. C Table 4.4.4.1" 'Lists parameter x for neutral atoms.'
2 "International Tables Vol. R Table 5" 'Lists parameter x for ions.'

loop_
_enumeration_default.index
_enumeration_default.value
_enumeration_default.data_source_id
H   1  1
H-  2  2
...
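
To illustrate how software might consume the normalised layout above, here is a minimal Python sketch. It assumes the two loops have already been parsed into plain dictionaries and lists; the names `data_sources`, `enumeration_defaults` and `source_for` are hypothetical and do not belong to any existing CIF library.

```python
# Hypothetical in-memory form of the two loops above (not a real CIF API).
# Keys of data_sources are the _data_source.id values.
data_sources = {
    "1": {
        "reference": "International Tables Vol. C Table 4.4.4.1",
        "description": "Lists parameter x for neutral atoms.",
    },
    "2": {
        "reference": "International Tables Vol. R Table 5",
        "description": "Lists parameter x for ions.",
    },
}

# One dictionary per row of the ENUMERATION_DEFAULT loop.
enumeration_defaults = [
    {"index": "H", "value": "1", "data_source_id": "1"},
    {"index": "H-", "value": "2", "data_source_id": "2"},
]


def source_for(index):
    """Return the reference string recorded for one enumeration_default key."""
    for row in enumeration_defaults:
        if row["index"] == index:
            # Follow the id into the DATA_SOURCE loop.
            return data_sources[row["data_source_id"]]["reference"]
    raise KeyError(index)
```

Calling `source_for("H-")` follows the id stored in the row to the matching entry in the source loop, so each reference only has to be written once however many rows cite it.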

The DATA_SOURCE category could have all of the proper bibliographical fields (e.g. Title, DOI, page numbers, etc.). Listing all of the authors in a normalised way would require an additional category, though.

The aspect of having a looped and an unlooped item for the same purpose could also be preserved, but we would then have to clearly state which takes precedence, i.e. use the unlooped value unless a looped value is provided.

rowlesmr · 2023-05-25T01:41:05Z

> A similar problem was also outlined in comment #390 (comment).

Yes! I knew I had seen it somewhere else!

A separate category also then allows other places to use them as needed. I'll start something as soon as my computer gets fixed...

> The DATA_SOURCE category could have all of the proper bibliographical fields (e.g. Title, DOI, page numbers, etc.). Listing all of the authors in a normalised way would require an additional category, though.

This is the next question: How far down the road do we want to go with this? Just a text string, or a full database solution? I'm assuming that this lives in ddl.dic.

> The aspect of having a looped and an unlooped item for the same purpose could also be preserved, but we would then have to clearly state which takes precedence, i.e. use the unlooped value unless a looped value is provided.

I think we need both, but, yes, the looped value takes precedence.
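
A minimal Python sketch of that precedence rule, assuming both items existed and that a missing value is represented as `None`. The function name `resolve_source` and its parameters are hypothetical, chosen only for illustration.

```python
def resolve_source(row_source, category_source):
    """Pick the source for one enumeration_default row.

    row_source      -- the looped, per-row source, or None if not given
    category_source -- the unlooped, enumeration-wide source, or None
    """
    # The looped value wins whenever it is present; otherwise fall
    # back to the single unlooped value (which may itself be None).
    return row_source if row_source is not None else category_source
```

With this rule a dictionary author would only need to give a per-row source for the rows that differ from the enumeration-wide one.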

jamesrhester · 2023-05-26T05:54:11Z

I would prefer enumeration_source as the category name (instead of data_source) because only the enumeration attributes contain actual data. And we do need to go full looped because sometimes sources are mixed together, you can imagine somebody comes up with a new measurement of scattering length for a few atomic types that we might want to include.

If there is a single row in enumeration_source, then that can be taken as applying to all rows in enumeration_default, there is no need for a separate data name.
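
The single-row rule could be sketched in Python roughly as follows. This is an illustration only: the function name, the dictionary layout and the `source_id` key are assumptions, and letting an explicit id take priority over the implicit one is one possible reading rather than something stated here.

```python
def source_id_for(row, enumeration_source_rows):
    """Return the enumeration_source id that applies to one enumeration_default row.

    row                     -- dict for one enumeration_default row; may lack "source_id"
    enumeration_source_rows -- list of dicts, one per enumeration_source row
    """
    explicit = row.get("source_id")
    if explicit is not None:
        # An id given on the row itself is used as-is (assumed priority).
        return explicit
    if len(enumeration_source_rows) == 1:
        # A lone source row is taken as applying to every default row.
        return enumeration_source_rows[0]["id"]
    # Zero or several sources and no explicit id: nothing can be inferred.
    return None
```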

I don't think we need to go "full database" on this one, as we are not intending for checking software to go off and interpret the reference in order to check the provided values. Firstly because that is a hard problem, and secondly because it is a very rare requirement that could be fulfilled by an ad-hoc check by the person updating the values that could be done faster than software could be written. So a simple human-understandable reference is sufficient, with perhaps a DOI to find an electronic version.

vaitkus · 2023-05-26T13:57:20Z

> I would prefer enumeration_source as the category name (instead of data_source) because only the enumeration attributes contain actual data. And we do need to go full looped because sometimes sources are mixed together, you can imagine somebody comes up with a new measurement of scattering length for a few atomic types that we might want to include.

Okay. Any preferences for the new name of _enumeration_default.data_source_id, e.g. _enumeration_default.source_id or _enumeration_default.enumeration_source_id?

One thing that we need to clearly communicate in the definition is that the source actually refers to the default enumeration values and not to the regular enumeration values (which could be misconstrued from the name).

> If there is a single row in enumeration_source, then that can be taken as applying to all rows in enumeration_default, there is no need for a separate data name.

Great idea and it should work well with automatic key-value derivation.

> I don't think we need to go "full database" on this one, as we are not intending for checking software to go off and interpret the reference in order to check the provided values. Firstly because that is a hard problem, and secondly because it is a very rare requirement that could be fulfilled by an ad-hoc check by the person updating the values that could be done faster than software could be written. So a simple human-understandable reference is sufficient, with perhaps a DOI to find an electronic version.

Ok. I guess even the _enumeration_source.description could be omitted for now and added later on if needed.

jamesrhester · 2023-05-31T01:35:33Z

> Okay. Any preferences for the new name of _enumeration_default.data_source_id, e.g. _enumeration_default.source_id or _enumeration_default.enumeration_source_id?

I think _enumeration_default.source_id would work fine.

rowlesmr · 2023-06-01T05:04:21Z

OK. To summarise:

We're saying no to just adding:

_enumeration.source
_enumeration_default.source

as suggested in the first post.

We're saying yes to

_enumeration_source.id #loop category
_enumeration_source.reference

_enumeration_default.source_id

jamesrhester · 2023-06-05T05:10:26Z

Yes, that is what I think would work.

rowlesmr mentioned this issue Jun 2, 2023

Add ENUMERATION_SOURCE #406

Merged

jamesrhester closed this as completed in #406 Jun 7, 2023
