Add metadata field for describing language of metadata record #4632
Comments
Thanks Amber! In our emails I was very focused on how to include this new language information in the metadata standards Dataverse uses now. A few other things to consider:
Maybe a "default language for input" variable could be added to the account information for the user. When specified in the account profile, this language could be pre-selected, when entering metadata within the Citation metadata block, from drop-down menus displayed alongside those target fields only (Title, Description, Subject, Keyword, Notes).
I second this request as it is now even more relevant than it used to be for a great number of DDI producers, namely: members of the Consortium of European Social Science Data Archives (CESSDA). The CESSDA Metadata Management (CMM) working group produced guidelines for harmonizing metadata produced by CESSDA members, the Core Metadata Model 20191115_Core_Metadata_Model_v1_0.pdf,
@amberleahey @mhvezina @BPeuch does this help? https://guides.dataverse.org/en/5.11.1/installation/config.html#allowing-the-language-used-for-dataset-metadata-to-be-specified It was added in 5.7 with this pull request:
@pdurbin Thanks for the ping. I'd say it's certainly a step in the right direction! What I perhaps failed to specify, for a case such as Belgium with its three official languages, is that our idea of a language feature would be more like a field at the outset of the metadata form with which the depositor/contributor can specify what language they are going to use for the metadata. (We assume they will stick to the same language, though that might be bold of us.) At any rate, any "lang" attribute is better than none, as for instance it is required for metadata records to be harvested by the bot of the CESSDA Data Catalogue 👍 🚪
Just to be clear - the metadataLanguage functionality that was added is as you describe - a way for a user to select the language they will enter metadata in. The list of choices is constrained by admins but users then select from that list. The choice is fixed at dataset creation and users are reminded about what language they picked when they go back to edit.
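The behaviour described above can be sketched as a small model. This is only an illustration under stated assumptions, not Dataverse code: `ALLOWED_LANGUAGES`, `Dataset`, `create_dataset`, and `edit_reminder` are hypothetical names, and the language list is made up.

```python
from dataclasses import dataclass

# Hypothetical admin-configured list of permitted metadata languages.
ALLOWED_LANGUAGES = {"en": "English", "fr": "French", "nl": "Dutch"}


@dataclass(frozen=True)
class Dataset:
    """Toy dataset record; frozen so the language cannot change after creation."""
    title: str
    metadata_language: str


def create_dataset(title: str, metadata_language: str) -> Dataset:
    """Create a dataset, rejecting languages outside the admin-defined list."""
    if metadata_language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported metadata language: {metadata_language!r}")
    return Dataset(title=title, metadata_language=metadata_language)


def edit_reminder(dataset: Dataset) -> str:
    """Message shown when the depositor reopens the metadata form."""
    name = ALLOWED_LANGUAGES[dataset.metadata_language]
    return f"Metadata for this dataset is entered in {name}."
```

In this sketch the depositor picks one language from the admin list when calling `create_dataset`, and any later attempt to reassign `metadata_language` raises an error because the dataclass is frozen.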
My bad, I thought Dataverse just added the language of the installation as the value of the 'lang' attribute for all records. We are certainly going to configure and test this in our own installation!
@BPeuch great, so how do you, @amberleahey @mhvezina @bappun @DS-INRA (and others), feel about this issue? Are we done? Is there more to do? Should we create smaller issues to adjust this or that? Can we close this one as done or at least mostly done? 😄 Issues are easier to work on when they are small and well defined. "Small chunks" as we like to say. 😄
Do I understand correctly that the mentioned development (#8588) allows specifying the language of all metadata in the dataset (and that it can be only one language)? It's interesting indeed, but I think we might need more granularity. For example, a depositor might want to use one main language for the majority of the metadata describing his/her dataset (let's say English), but will want to specify some metadata in several languages (say English, French and Spanish). I'm thinking of fields like title, description, and notes that could facilitate retrieval/discoverability. In this case, the depositor should be able to specify the language for those aforementioned metadata. For some other fields (e.g. keywords, subjects, affiliation), we will benefit more from referencing standardized vocabularies (multilingual or not), thus entering unique identifiers (say URLs, URIs) that are language independent. In short, the need to specify the language of the metadata is perhaps limited to a few fields at most. What does the Community think about this?
@mhvezina thanks for your comment and question. I'm also very interested in what the community thinks. The following issue is about having multiple languages for a dataset, as I believe you are describing: In that issue you'll see screenshots from an installation (running a fork) that shows a dataset in English and Chinese. To me, this issue (#4632) is about "let me say which (single) language this dataset is in". And I think, I hope, it's done already and we can close it. 😄 But who knows, maybe it's not done. We can spawn as many sub-issues as we need to. We just want to make sure they have a clear "definition of done" (if that makes sense). And "small chunks" are great if that's possible. 😄 Small issues. Small pull requests. Small changes to the code.
I think it is done too. We currently use French and English metadataLanguage values for our datasets and it works great for our needs. In some cases we need to have multiple metadata languages, so we add translations in the corresponding fields (French and English text in the same field). #4633 would be nice to have, since we would not need to put multiple languages in a field or create duplicate datasets (one for each language) as a workaround.
@pdurbin I'm afraid I don't have the time to test this new feature at the moment, but I hope we can do this at SODHA very soon! Thanks again for refreshing the issue :)
This would ideally support users who may use Dataverse in a different language, or who may enter metadata in a different language and would like that language to be tracked independently of the data or software.
From Julian: From what I can tell so far, the DataCite 3.1 schema lets you specify the language of Title, Subject and Description with the xml lang attribute (4.1 adds the xml lang attribute to Rights) - https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf. The schema says it accepts only IETF BCP 47 and ISO 639-1 language codes. But I don't think Dataverse knows the ISO language codes for the languages it displays in the Citation block (I vaguely remember a comment about this in a GitHub issue or maybe a Google Group post but can't find it). The Consorcio Madroño Dataverse does this with the DataCite metadata they publish for each dataset. Here's an example: https://edatos.consorciomadrono.es/api/datasets/export?exporter=oai_datacite&persistentId=doi%3A10.21950/O53TLR
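As a rough illustration of what a per-element language tag looks like, here is a minimal Python sketch that attaches `xml:lang` to two title elements. It is not how Dataverse or the Consorcio Madroño exporter builds its output; the helper `add_with_lang`, the sample titles, and the simplified, non-namespaced element layout are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The reserved XML namespace; ElementTree serializes it with the "xml" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def add_with_lang(parent, tag, text, lang):
    """Append a child element carrying an xml:lang attribute (hypothetical helper)."""
    child = ET.SubElement(parent, tag)
    child.set(f"{{{XML_NS}}}lang", lang)
    child.text = text
    return child


# Simplified fragment for illustration; a real DataCite record has more structure.
resource = ET.Element("resource")
titles = ET.SubElement(resource, "titles")
add_with_lang(titles, "title", "Enquête sur la mobilité", "fr")
add_with_lang(titles, "title", "Mobility survey", "en")

print(ET.tostring(resource, encoding="unicode"))
```

The language codes used here ("fr", "en") are ISO 639-1 two-letter codes, which is the kind of value the schema is said to accept.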
And most or all of the DDI elements that Dataverse uses can include a lang attribute (http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/field_level_documentation_files/schemas/xml_xsd/attributes/lang.html). Looks like it accepts any value for now.
See related ticket #4633 about adding additional languages/translations for the title, subject, and abstract fields in the Citation block.