Language missing in the OAI-DDI #7388
Hi @bappun. I think there are multiple issues to tease out here. When you wrote that you "imported a DDI file where the French language is set at a file level also at a study level", could you write more about what you mean by "a file level"? Is that referring to metadata of data files within a dataset?
Hi all! @pdurbin We are currently in contact with Danny about how we can contribute to some issues (which are sometimes linked). We should have more information soon! :) @jggautier Sorry for the confusion about "file level"; I meant the root of the XML file (
Can't believe it's been 19 days! Sorry for this late reply.
Information loss
There are a few GitHub issues about this, e.g. one about indicating the language of the metadata record as a whole (#4632) and one about indicating the language of metadata entered in specific fields (#4633). These issues follow the convention that when Dataverse imports metadata files, like the DDI XML you imported, it tries to map information in those files to fields in its own metadata model (the fields in its metadatablocks) and ignores information it can't map. Once that information is mapped to the Dataverse installation's fields, that metadata is also editable within the Dataverse repository.
What hasn't been discussed in any open GitHub issue I could find is the possibility of Dataverse somehow retaining the information that it can't map to fields in its metadatablocks instead of ignoring it, including it in its index so that it's searchable, and adding it to the metadata exports. But if the metadata isn't mapped to Dataverse's fields, it won't show up in the UI and is effectively not editable by depositors/curators (at least not through the Dataverse UI).
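The import behavior described above (map what fits the metadatablocks, ignore the rest) can be illustrated with a minimal Python sketch. This is not Dataverse's actual DDI importer: the element paths in FIELD_MAP, the field names title and alternativeTitle, and the import_ddi function are all hypothetical, and the DDI namespace is omitted for brevity. Instead of discarding what it can't map, the sketch collects xml:lang attributes in a separate list, which is roughly what "retaining" unmapped information would mean.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from (namespace-free) DDI Codebook paths to field names.
# The real importer is far more complete; this only shows "map what you know".
FIELD_MAP = {
    "stdyDscr/citation/titlStmt/titl": "title",
    "stdyDscr/citation/titlStmt/parTitl": "alternativeTitle",
}

# ElementTree expands the predefined xml: prefix to this Clark-notation name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def import_ddi(xml_text):
    """Return (mapped_fields, unmapped_items) for a namespace-free DDI snippet."""
    root = ET.fromstring(xml_text)
    mapped = {}
    for path, field in FIELD_MAP.items():
        elem = root.find(path)  # path is relative to the codeBook root
        if elem is not None and elem.text:
            mapped[field] = elem.text.strip()
    # Information with no target field (here: xml:lang attributes) is kept
    # aside in document order instead of being silently dropped.
    unmapped = []
    for elem in root.iter():
        if XML_LANG in elem.attrib:
            unmapped.append((elem.tag, "xml:lang", elem.attrib[XML_LANG]))
    return mapped, unmapped
```

For a codeBook root with xml:lang="fr" and a parTitl with xml:lang="en", mapped holds the two titles while unmapped lists both language attributes, i.e. the kind of information this issue reports as lost.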
Language of data versus language of metadata
And like you already wrote, Dataverse isn't mapping what's chosen for the "Language" metadata field, like the "French" value chosen in https://doi.org/10.21410/7E4/00LYOG, to any elements or attributes in the DDI XML that it exports. I'm not sure that DDI Codebook has one element to describe the language of all of a dataset's data files. Maybe we could add a lang attribute to the metadata of each of a dataset's data files? For example:
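A minimal sketch of that idea in Python, assuming a single-language dataset: the export step copies the dataset's Language value onto every fileDscr element as an xml:lang attribute. The LANGUAGE_CODES lookup and the tag_data_files function are hypothetical, not part of Dataverse, and the DDI namespace is omitted. Note that xml:lang formally describes the language of an element's own content and attribute values, so using it to mean "the language of the data file" would be a local convention that harvesters may not interpret that way.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical lookup from "Language" field values to ISO 639-1 codes.
LANGUAGE_CODES = {"French": "fr", "English": "en"}


def tag_data_files(codebook, dataset_languages):
    """Set xml:lang on every fileDscr when the dataset has exactly one language."""
    if len(dataset_languages) != 1:
        return codebook  # several (or no) languages: too ambiguous, leave as-is
    code = LANGUAGE_CODES.get(dataset_languages[0])
    if code is None:
        return codebook  # unknown language value: leave the export unchanged
    for file_dscr in codebook.iter("fileDscr"):
        file_dscr.set(XML_LANG, code)
    return codebook
```

When serialized with ET.tostring, the attribute is written as xml:lang="fr", because ElementTree knows the reserved xml prefix.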
At Sciences Po we only have one language per dataset. So we are wondering if it would be possible to add the For the data files, I do not have more information, as we only have one language per DDI.
Thanks @bappun. In the Sciences Po repository, when a depositor chooses French in the Language field of the dataset, what is the depositor saying?:
I took a look at a few datasets in the Sciences Po repository but could not tell what the depositors intended, since "French" is chosen in the Language field and the metadata and the data files of the datasets both contain text in French.
Thanks @jggautier! (There can still be some inconsistencies, and no data file is present... at the moment.)
Thanks. That's very helpful, and apologies again for the delay. I hope this illustration more clearly communicates my understanding of the XML you pasted in your first comment and clarifies the different needs: I interpret that XML (in red) as saying that:
This XML does not tell me the language of the data files.
If Sciences Po needs to specify only (1) the language of the entire metadata document, (2) the dataset title in a second language, and (3) the language of the data files, here are some proposals I hope will help your repository reach a great solution:
Proposal 1: Metadata form changes
Metadata mapping changes
Proposal 2: Metadata form changes
Metadata mapping changes
Proposal 3: Consider the Sciences Po repository's Citation metadatablock as a fork of the Dataverse software's Citation metadatablock, so that Sciences Po can redefine (or broaden the definition of) the current Citation metadatablock "Language" field. In the Sciences Po repository, that "Language" field would specify the language of both the metadata and the data files. The repository would also need to fork the code it uses for importing and exporting DDI Codebook XML in ways that maintain the meaning of the metadata it imports from other repositories (if it does or plans to do that) and maintain the meaning of its metadata when it's imported by other repositories.
I'm only including proposal 3 to be divergent in my thinking and to encourage more divergent thinking about solutions to this issue. I hope Sciences Po doesn't need to fork its code. I think that other proposals could include addressing the more robust and granular needs described in #4633.
I also think that this issue is another example of how Dataverse software handling metadata more flexibly can benefit Dataverse repositories. Just as the customization of controlled vocabularies is being made more flexible (and the customization of metadata fields can be made more flexible), so can metadata mapping when importing and exporting metadata. But I'm assuming that making metadata mapping more flexible will take more time than you and your colleagues have for this task of reducing information loss when importing and exporting your dataset metadata.
I can include illustrations (like mockups) for the first two proposals if that would help make sure that we all understand them. Looking forward to hearing what you think!
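To make the three needs listed in this comment concrete, here is a hedged Python sketch that builds a stripped-down DDI Codebook tree: xml:lang on the codeBook root for (1) the metadata language, a parTitl carrying its own xml:lang for (2) the title in a second language, and, following the per-file idea discussed earlier in the thread, xml:lang on each fileDscr for (3) the data file language. The build_codebook function and its parameters are hypothetical, the DDI namespace and most required elements are omitted, and this is not the exact mapping of any of the proposals above.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_codebook(metadata_lang, title, parallel_title, parallel_lang,
                   data_lang, file_ids):
    """Build a minimal, namespace-free DDI-like tree covering needs (1)-(3)."""
    # (1) language of the whole metadata document, on the root element
    root = ET.Element("codeBook", {XML_LANG: metadata_lang})
    stdy = ET.SubElement(root, "stdyDscr")
    titl_stmt = ET.SubElement(ET.SubElement(stdy, "citation"), "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title
    # (2) the dataset title in a second language, tagged with its own language
    par = ET.SubElement(titl_stmt, "parTitl", {XML_LANG: parallel_lang})
    par.text = parallel_title
    # (3) hypothetical convention: one fileDscr per data file, tagged with
    # the language of that file's data
    for file_id in file_ids:
        ET.SubElement(root, "fileDscr", {"ID": file_id, XML_LANG: data_lang})
    return root
```

Calling build_codebook("fr", "Enquête", "Survey", "en", "fr", ["f1"]) and serializing the result with ET.tostring(root, encoding="unicode") shows all three pieces of language information in one document.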
Most of our datasets are described as "French" datasets in the metadata. For example: https://data.sciencespo.fr/dataset.xhtml?persistentId=doi:10.21410/7E4/00LYOG (detailed metadata are embedded in the dataset here).
We imported a DDI file where the French language is set both at the file level and at the study level:
But this information is lost at the study level, both when harvesting via OAI-DDI and when downloading the metadata as DDI, for example:
However, it is kept in oai-dc harvesting:
<dcterms:language>French</dcterms:language>
Maybe these are two different issues: one about the import, which does not save the xml:lang attribute, and another about the dataset language, which is not added to the OAI-DDI export.
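One way to tell these two issues apart is to compare where xml:lang appears in the DDI file that was imported and in the DDI that Dataverse exports for the same dataset (downloaded from the dataset page or harvested via OAI-DDI). The Python sketch below does that comparison; the function names are hypothetical, and it only looks at xml:lang attributes, not at whether the dataset's Language field value appears in the export.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(tag):
    # Strip a "{namespace}" prefix, e.g. "{ddi:codebook:2_5}codeBook" -> "codeBook"
    return tag.rsplit("}", 1)[-1]


def lang_attributes(xml_text):
    """Map each element's local name to the xml:lang values found on it."""
    found = {}
    for elem in ET.fromstring(xml_text).iter():
        if XML_LANG in elem.attrib:
            found.setdefault(local_name(elem.tag), []).append(elem.attrib[XML_LANG])
    return found


def lost_languages(imported_xml, exported_xml):
    """Return elements whose xml:lang values differ between import and export."""
    before = lang_attributes(imported_xml)
    after = lang_attributes(exported_xml)
    return {
        name: values
        for name, values in before.items()
        if after.get(name) != values
    }
```

If codeBook shows up in the result, its xml:lang was dropped somewhere between import and export; checking the exported XML for the "French" value separately would cover the second issue.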