Add metadata field for describing language of metadata record #4632

Closed
amberleahey opened this issue Apr 30, 2018 · 14 comments
Labels: Feature: Internationalization, Feature: Metadata, UX & UI: Design, Vote to Close: pdurbin

Comments

@amberleahey

amberleahey commented Apr 30, 2018

This would ideally support users who use Dataverse in a different language, or who enter metadata in a different language and would like that language to be tracked independently of the language of the data or software.

From Julian: From what I can tell so far, the DataCite 3.1 schema lets you specify the language of Title, Subject and Description with the xml:lang attribute (4.1 adds the xml:lang attribute to Rights): https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf. The schema says it accepts only IETF BCP 47 and ISO 639-1 language codes. But I don't think Dataverse knows the ISO language codes for the languages it displays in the Citation block (I vaguely remember a comment about this in a GitHub issue or maybe a Google Group post but can't find it). The Consorcio Madroño Dataverse already does this in the DataCite metadata they publish for each dataset. Here's an example: https://edatos.consorciomadrono.es/api/datasets/export?exporter=oai_datacite&persistentId=doi%3A10.21950/O53TLR

And most or all of the DDI elements that Dataverse uses can include a lang attribute (http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/field_level_documentation_files/schemas/xml_xsd/attributes/lang.html). Looks like it accepts any value for now.
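To make the xml:lang idea concrete, here is a minimal sketch in Python that writes one title element per language. The element names mirror DataCite's titles/title, but the fragment is simplified (no DataCite namespace, no other required properties), and the function name and sample titles are made up for illustration. It is not how Dataverse builds its exports.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is bound to this namespace by the XML specification,
# so ElementTree serializes the attribute below as xml:lang.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_titles(titles):
    """Build a <titles> element from a {language_code: text} mapping."""
    root = ET.Element("titles")
    for lang, text in titles.items():
        title = ET.SubElement(root, "title")
        title.set(f"{{{XML_NS}}}lang", lang)
        title.text = text
    return root


# Illustrative sample data, not taken from a real dataset.
titles_xml = ET.tostring(
    build_titles({"en": "Example survey data", "fr": "Données d'enquête exemple"}),
    encoding="unicode",
)
print(titles_xml)
```

The same attribute could in principle be attached to subject and description elements in the same way.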

See related ticket #4633 about adding additional languages/translations for the title, subject, and abstract fields in the citation block.

@jggautier
Contributor

Thanks Amber! In our emails I was very focused on how to include this new language information in the metadata standards Dataverse uses now. A few other things to consider:

  • How best to add the field to the create and edit dataset form, so depositors can say "The metadata I'm entering is in this language"
  • Whether to add the specified languages to the HTML of each webpage where dataset metadata is displayed, like the search results page and the dataset page (W3C's guide on the language attribute)
  • How to make Dataverse know what the ISO language codes are for the languages that depositors choose (if a depositor chooses French, Dataverse would add the value "fr" to the xml:lang attribute)
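As a rough illustration of the last two points, the sketch below maps a chosen language name to a two-letter code and uses it for an HTML lang attribute. The lookup table, function names and the h1 wrapper are all assumptions made for this example; a real installation would need a complete, maintained code list.

```python
from html import escape

# Hypothetical lookup table; a real list would be derived from ISO 639-1.
LANGUAGE_CODES = {
    "English": "en",
    "French": "fr",
    "Spanish": "es",
    "Dutch": "nl",
    "German": "de",
}


def language_code(name):
    """Return the two-letter code for a language name, or None if unknown."""
    return LANGUAGE_CODES.get(name)


def title_html(text, language_name):
    """Render a title element, adding a lang attribute when the code is known."""
    code = language_code(language_name)
    lang_attr = f' lang="{code}"' if code else ""
    return f"<h1{lang_attr}>{escape(text)}</h1>"


print(title_html("Données d'enquête", "French"))
```

Omitting the attribute for unknown names, rather than guessing, keeps the markup from asserting a language it cannot back up.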

@mhvezina
Contributor

mhvezina commented Jun 5, 2018

Maybe a "default language for input" variable could be added to the account information for the user.

When specified in the account profile, this language could be pre-selected when entering metadata within the Citation metadata block, in drop-down menus displayed alongside the target fields only (Title, Description, Subject, Keyword, Notes).
The user could either change the value, if necessary, or add additional metadata in other languages.
No value would be pre-selected (default choice for the drop-down) if a default language for input wasn't specified in the user profile.
(I guess this means a value_lang column should be added to the datasetfieldvalue table with an ISO 639-1/2/3(?) value)
(example from DSpace interface: https://jira.duraspace.org/secure/attachment/17500/language-tag.png )
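The proposal above could be sketched roughly as follows, using an in-memory SQLite database. The table here is a heavily simplified stand-in, not the actual Dataverse schema; only the datasetfieldvalue and value_lang names come from the comment, and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; the real datasetfieldvalue table differs.
conn.execute(
    """
    CREATE TABLE datasetfieldvalue (
        id INTEGER PRIMARY KEY,
        field_name TEXT NOT NULL,
        value TEXT NOT NULL,
        value_lang TEXT  -- language code such as 'en'; NULL when unspecified
    )
    """
)


def preselected_language(profile_default):
    """Language to pre-select in the drop-down: the profile default, else none."""
    return profile_default or None


def save_value(field_name, value, chosen_lang):
    """Store one field value together with the language chosen for it."""
    conn.execute(
        "INSERT INTO datasetfieldvalue (field_name, value, value_lang) "
        "VALUES (?, ?, ?)",
        (field_name, value, chosen_lang),
    )


# A user whose profile default is French enters a title, then adds an English one.
save_value("title", "Données d'enquête", preselected_language("fr"))
save_value("title", "Survey data", "en")
# A user without a profile default leaves the drop-down empty.
save_value("notes", "Untranslated note", preselected_language(None))

rows = conn.execute(
    "SELECT field_name, value_lang FROM datasetfieldvalue ORDER BY id"
).fetchall()
print(rows)
```

Storing the language per value, rather than per dataset, is what would let a user add the same field in several languages.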

@pdurbin added the "UX & UI: Design" label Jun 5, 2018
@BPeuch
Contributor

BPeuch commented Jan 14, 2020

I second this request, as it is now even more relevant than it used to be for a great number of DDI producers, namely members of the Consortium of European Social Science Data Archives (CESSDA). The CESSDA Metadata Management (CMM) working group produced guidelines for harmonizing metadata produced by CESSDA members, the Core Metadata Model (20191115_Core_Metadata_Model_v1_0.pdf), and specifying the language of the content of various metadata fields is mandatory in this DTD.

@BPeuch
Contributor

BPeuch commented Oct 3, 2022

@pdurbin Thanks for the ping. I'd say it's certainly a step in the right direction!

What I perhaps failed to specify, for a case such as Belgium with its three official languages, is that our idea of a language feature would be more like a field at the outset of the metadata form with which the depositor/contributor can specify what language they are going to use for the metadata. (We assume they will stick to the same language, though that might be bold of us.)

At any rate, any "lang" attribute is better than none, as for instance it is required for metadata records to be harvested by the bot of the CESSDA Data Catalogue 👍 🚪

@qqmyers
Member

qqmyers commented Oct 3, 2022

Just to be clear - the metadataLanguage functionality that was added is as you describe - a way for a user to select the language they will enter metadata in. The list of choices is constrained by admins but users then select from that list. The choice is fixed at dataset creation and users are reminded about what language they picked when they go back to edit.
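The behavior described here (admins constrain the list, the user picks once at creation, and the choice is shown again on edit) can be modeled with a toy class. This is only an illustration of the rules as stated in this comment. The class, its methods and the allowed-language tuple are hypothetical and do not reflect Dataverse's actual implementation or configuration.

```python
class Dataset:
    """Toy dataset whose metadata language is chosen once, at creation."""

    def __init__(self, title, metadata_language, allowed_languages):
        # Users may only pick from the list that admins have configured.
        if metadata_language not in allowed_languages:
            raise ValueError(
                f"{metadata_language!r} is not offered by this installation"
            )
        self.title = title
        self._metadata_language = metadata_language

    @property
    def metadata_language(self):
        # Read-only: no setter is defined, so the choice is fixed after creation.
        return self._metadata_language

    def edit_reminder(self):
        """Message shown when the user returns to edit the metadata."""
        return f"Metadata for this dataset is entered in: {self.metadata_language}"


ALLOWED = ("en", "fr")  # hypothetical list configured by an admin
ds = Dataset("Survey data", "fr", ALLOWED)
print(ds.edit_reminder())
```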

@pdurbin
Member

pdurbin commented Oct 3, 2022

What @qqmyers said.

@BPeuch if it helps, here's a screenshot @tjouneau added to this related issue:

It looks like they allow datasets to be created in five languages:

[screenshot not preserved]

@BPeuch
Contributor

BPeuch commented Oct 4, 2022

My bad, I thought Dataverse just added the language of the installation as value of the 'lang' attribute for all records. We are certainly going to configure and test this in our own installation!

@pdurbin
Member

pdurbin commented Oct 4, 2022

@BPeuch great, so how do you, @amberleahey, @mhvezina, @bappun, @DS-INRA (and others) feel about this issue?

Are we done? Is there more to do? Should we create smaller issues to adjust this or that? Can we close this one as done or at least mostly done? 😄

Issues are easier to work on when they are small and well defined. "Small chunks" as we like to say. 😄

@tjouneau

tjouneau commented Oct 4, 2022

Hi
@qqmyers @BPeuch TBH we only retained two languages, English and French.
(In reply to the message above)
Best
Thomas

@mhvezina
Contributor

mhvezina commented Oct 4, 2022

Do I understand correctly that the mentioned development (#8588) allows specifying the language of all metadata in the dataset (and that it can be only one language)? It's interesting indeed, but I think we might need more granularity. For example, a depositor might want to use one main language for the majority of the metadata describing his/her dataset (let's say English), but also provide some metadata in several languages (say English, French and Spanish). I'm thinking of fields like title, description and notes, where translations could facilitate retrieval/discoverability. In this case, the depositor should be able to specify the language for those fields. For some other fields (e.g. keywords, subjects, affiliation), we would benefit more from referencing standardized vocabularies (multilingual or not), thus entering unique identifiers (say URLs, URIs) that are language independent. In short, the need to specify the language of the metadata is perhaps limited to a few fields at most. What does the Community think about this?
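One way to picture this finer granularity is a record with a main language, a few free-text fields that carry their own language tag, and keywords stored as language-independent identifiers. The sketch below is a hypothetical data model for discussion only. The class names, field names and the example.org URI are invented and are not part of Dataverse.

```python
from dataclasses import dataclass, field


@dataclass
class LocalizedText:
    lang: str  # language code, e.g. "en"
    text: str


@dataclass
class DatasetMetadata:
    main_language: str
    titles: list = field(default_factory=list)        # LocalizedText items
    keyword_uris: list = field(default_factory=list)  # vocabulary identifiers

    def title_for(self, lang):
        """Return the title in `lang`, falling back to the main language."""
        by_lang = {t.lang: t.text for t in self.titles}
        return by_lang.get(lang, by_lang.get(self.main_language))


record = DatasetMetadata(
    main_language="en",
    titles=[
        LocalizedText("en", "Survey data"),
        LocalizedText("fr", "Données d'enquête"),
        LocalizedText("es", "Datos de encuesta"),
    ],
    # Placeholder URI; a real record would point at a published vocabulary term.
    keyword_uris=["https://example.org/vocab/surveys"],
)
print(record.title_for("fr"))
```

Only the translatable fields need a language tag in this model, which matches the point that the need is probably limited to a few fields.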

@pdurbin
Member

pdurbin commented Oct 4, 2022

@mhvezina thanks for your comment and question. I'm also very interested in what the community thinks.

The following issue is about having multiple languages for a dataset, as I believe you are describing:

In that issue you'll see screenshots from an installation (running a fork) that shows a dataset in English and Chinese.

To me, this issue (#4632) is about "let me say which (single) language this dataset is in". And I think, I hope, it's done already and we can close it. 😄

But who knows, maybe it's not done. We can spawn as many sub-issues as we need to. We just want to make sure they have a clear "definition of done" (if that makes sense). And "small chunks" are great if that's possible. 😄 Small issues. Small pull requests. Small changes to the code.

@bappun

bappun commented Oct 19, 2022

I think it is done too. We currently use French and English metadataLanguage values for our datasets and it works great for our needs.

In some cases we need multiple metadata languages, so we add translations in the corresponding fields (French and English text in the same field). #4633 would be nice to have, since we would not need to put multiple languages in a field or create duplicate datasets (one for each language) as a workaround.

@BPeuch
Contributor

BPeuch commented Oct 26, 2022

@pdurbin I'm afraid I don't have the time to test this new feature at the moment, but I hope we can do this at SODHA very soon! Thanks again for refreshing the issue :)

@pdurbin added this to the 5.7 milestone Oct 7, 2023
@pdurbin closed this as completed Oct 7, 2023
8 participants