Review 12 unpublished datasets with unreserved DOIs, check for duplicates, contact depositors #203
Comments
Thanks for capturing this, Julian! Let me know if we need further
discussion.
On Fri, Dec 2, 2022 at 12:07 PM Julian Gautier wrote:
After the recent DataCite outage, I used an API endpoint
<https://guides.dataverse.org/en/5.12/api/native-api.html?highlight=reserve#list-unreserved-pids>
to see if other datasets in the Harvard repo are unpublished with
unreserved DOIs. There were 17, including 5 datasets created on the day of
the DataCite outage. I used another endpoint
<https://guides.dataverse.org/en/5.12/api/native-api.html?highlight=reserve#reserve-a-pid>
to reserve the DOIs of those 5, published the datasets, and followed up with
the depositors who had emailed the support address to let them know their
datasets were published.
The other 12 unpublished datasets whose DOIs are unreserved were created
between 2019 and 2021. Information about them is in Google Sheets
<https://docs.google.com/spreadsheets/d/10hWVBb-9GiyBrdx4yZ9RvuuN2S9cXuKa9IuE-M1VJ5o>.
Since these datasets have been unpublished for a year or longer, we should:
- Search the repository to check that the depositors haven't published
  their data in another dataset. Depositors do this sometimes when datasets
  are locked for a long time (IQSS/dataverse-HDV-Curation#402
  <https://github.com/IQSS/dataverse-HDV-Curation/issues/402>),
  so I think they might've done that here, and we want to avoid having two
  datasets published with the same data.
- Email the depositors:
  - If we find datasets with the same data, email the depositors to
    confirm the duplicate datasets and let them know we'll be deleting the
    duplicates.
  - For datasets where we couldn't find duplicates, reserve the
    datasets' DOIs and let the depositors know that their datasets are
    still unpublished and that they should be able to publish them when
    they're ready.
If the depositors don't reply, these unpublished datasets will eventually
be included in the Harvard repo curation team's "production cleanup," where
the team will try to contact depositors of datasets that have been
unpublished for a certain length of time to encourage the depositors to
publish, and the team will remove the datasets if we can't get in touch
with the depositors.
--
Sonia Barbosa
Manager of Data Curation, The Harvard Dataverse Repository
Manager of the Murray Research Archive <http://Murray.harvard.edu>, IQSS
The Dataverse Project <http://dataverse.org>
Data Science
Harvard University
|
Thanks. I was able to reserve PIDs for 10 of the 12 datasets after making sure the data hadn't already been published in other datasets. The spreadsheet includes the URLs of the two datasets whose PIDs I haven't reserved.
|
The depositor of one of the two remaining datasets replied over the winter break, and I was able to remove that unpublished dataset. Just one dataset to go. I just sent a follow-up email (https://help.hmdc.harvard.edu/Ticket/Display.html?id=293853).
|
I haven't heard back from the depositor of the last dataset whose DOI was unreserved. Because it's an unpublished dataset, I removed what was typed in the Producer Name field, re-saved the dataset, and used the API endpoint to reserve the DOI. The curation team will probably remove this unpublished dataset eventually since it's pretty old. I also found another dataset whose DOI was unreserved and was able to use the API endpoint to reserve it. It looks like these unreserved DOI errors don't happen as often as datasets being locked for a long time (https://github.com/IQSS/dataverse-HDV-Curation/issues/345), but I'll check every so often for datasets whose DOIs aren't reserved and reserve them.
|
After the recent DataCite outage, I used an API endpoint to see if other datasets in the Harvard repo are unpublished with unreserved DOIs. There were 17, including 5 datasets created on the day of the DataCite outage. I used another endpoint to reserve the DOIs of those 5, published the datasets, and followed up with the depositors who had emailed the support address to let them know their datasets were published.
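For reference, the two calls described above might look roughly like this in Python. This is a minimal sketch, assuming the endpoint paths given in the Dataverse 5.12 native API guide ("List Unreserved PIDs" and "Reserve a PID") and a superuser API token. The function names, server URL, and token are placeholders for illustration, not anything taken from this issue.

```python
import urllib.parse
import urllib.request


def list_unreserved_request(server_url: str, api_token: str) -> urllib.request.Request:
    """Build the GET request that asks for PIDs not yet reserved with the provider.

    Assumption: the path /api/pids/unreserved, per the 5.12 native API guide.
    """
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/api/pids/unreserved",
        headers={"X-Dataverse-key": api_token},
        method="GET",
    )


def reserve_pid_request(server_url: str, api_token: str, pid: str) -> urllib.request.Request:
    """Build the POST request that reserves one dataset PID.

    Assumption: the path /api/pids/:persistentId/reserve with the PID passed
    in the persistentId query parameter, per the 5.12 native API guide.
    """
    query = urllib.parse.urlencode({"persistentId": pid})
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/api/pids/:persistentId/reserve?{query}",
        headers={"X-Dataverse-key": api_token},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` would send it. The sketch only builds the requests, so it can be checked without touching a live installation.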
The other 12 unpublished datasets whose DOIs are unreserved were created between 2019 and 2021. Information about them is in Google Sheets.
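Since these datasets have been unpublished for a year or longer, we should:
- Search the repository to check that the depositors haven't published their data in another dataset. Depositors do this sometimes when datasets are locked for a long time (IQSS/dataverse-HDV-Curation#402), so I think they might've done that here, and we want to avoid having two datasets published with the same data.
- Email the depositors:
  - If we find datasets with the same data, email the depositors to confirm the duplicate datasets and let them know we'll be deleting the duplicates.
  - For datasets where we couldn't find duplicates, reserve the datasets' DOIs and let the depositors know that their datasets are still unpublished and that they should be able to publish them when they're ready.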
If the depositors don't reply, these unpublished datasets will eventually be included in the Harvard repo curation team's "production cleanup." In that process, the team tries to contact depositors of datasets that have been unpublished for a certain length of time to encourage them to publish, and removes the datasets if we can't get in touch with the depositors.
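The triage proposed in this issue (check for a published duplicate, then either ask about deleting the unpublished copy or reserve its DOI) can be sketched as a small decision function. This is only an illustration of the reasoning. The `UnreservedDataset` type, its fields, and the example PIDs are hypothetical, and finding duplicates is still a manual search of the repository.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    CONFIRM_DUPLICATE_THEN_DELETE = "email depositor to confirm duplicate, then delete"
    RESERVE_DOI_THEN_NOTIFY = "reserve DOI, then email depositor"


@dataclass
class UnreservedDataset:
    """Hypothetical record for one unpublished dataset with an unreserved DOI."""
    pid: str
    # PID of a published dataset holding the same data, if the manual search found one.
    duplicate_pid: Optional[str] = None


def plan_action(dataset: UnreservedDataset) -> Action:
    """Pick the follow-up for one dataset, mirroring the two cases in the issue."""
    if dataset.duplicate_pid is not None:
        return Action.CONFIRM_DUPLICATE_THEN_DELETE
    return Action.RESERVE_DOI_THEN_NOTIFY


# Example with made-up PIDs:
datasets = [
    UnreservedDataset("doi:10.5072/FK2/AAAAAA", duplicate_pid="doi:10.5072/FK2/BBBBBB"),
    UnreservedDataset("doi:10.5072/FK2/CCCCCC"),
]
plan = {d.pid: plan_action(d) for d in datasets}
```

Keeping the decision separate from the API calls and emails means the plan can be reviewed in the spreadsheet before anything is reserved or deleted.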