Metadata doesn't save #1925
@AMNHcjohnson Thank you for contacting us. |
Hi Mikhail,
It's Integrated Publishing Toolkit (IPT) Version 2.5.5-ra872e56
Probably out of date - it's also not showing the managed resource file I had uploaded. I didn't see any exception or anything that would indicate the information wouldn't save.
Thanks.
Chris
|
I don't remember anything like that. Could you send me your IPT logs please? Or provide me with administrator rights for your IPT? |
Hi,
I will have to ask our IT department to upgrade to the new version. I am not sure where to go to get the logs, but as you can see below, the AMNH-Crustacea resource is not showing up in my managed resources, yet I cannot "create a new resource" with the same name because it says it already exists.
Where would I find the logs for this?
Thanks.
Chris
|
I'm sorry, I don't quite understand: you can't see the resource now? Have you deleted it, or did it just disappear? |
Correct I don’t see it. I never deleted it.
Christine Johnson, Curatorial Associate
American Museum of Natural History
|
Hi Mikhail,
This morning when I logged in, I could see my Crustacea resource.
I went ahead and published it (or set it from private to public). How long does it take to determine whether everything is correct? In the log there are a few question marks – does that mean these are incorrect? Or is it still determining whether all is true?
In addition, I am a bit confused as to why I have datasets visible on GBIF, but when I search under our institution code, the records don’t appear, although the institution code field is populated in these files.
Here is the log:
Archive generation started for version # 1.0
Start writing data file for Darwin Core Occurrence
No lines were skipped due to errors for mapping Darwin Core Occurrence in source amnhcrustaceacollection202317b
No lines were skipped due to errors for mapping Darwin Core Occurrence in source amnhcrustaceacollection202317b
No lines with fewer columns than mapped for mapping Darwin Core Occurrence in source amnhcrustaceacollection202317b
All lines match the filter criteria for mapping Darwin Core Occurrence in source amnhcrustaceacollection202317b
Data file written for Darwin Core Occurrence with 15285 records and 53 columns
All data files completed
EML file added
meta.xml archive descriptor written
Validating the core file: occurrence.txt. Depending on the number of records, this can take a while.
? Validating the core basisOfRecord is always present and its value matches the Darwin Core Type Vocabulary.
? Validating the core ID field occurrenceID is always present and unique.
No lines are missing occurrenceID
No lines have duplicate occurrenceID
✓ Validated each line has a occurrenceID, and each occurrenceID is unique
No lines are missing a basisOfRecord
All lines have basisOfRecord that matches the Darwin Core Type Vocabulary
No lines have ambiguous basisOfRecord 'occurrence'.
✓ Validated each line has a basisOfRecord, and each basisOfRecord matches the Darwin Core Type Vocabulary
Archive validated
Archive has been compressed
Archive version # 1.0 generated successfully!
|
I'm glad to hear you managed to publish your resource. A question mark in the publication log simply indicates that the validation process was started; as you can see further down the log, the IPT reported that everything went successfully. Which dataset is it, please? After publishing in the IPT, it might take some time for the dataset to be indexed by GBIF. |
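If it helps to sanity-check a file before publishing, here is a minimal sketch of the same two core checks the log reports (unique occurrenceID, non-empty basisOfRecord), run locally against the exported occurrence.txt. It assumes a tab-delimited file with a header row containing occurrenceID and basisOfRecord columns, which may not match every archive.

```python
import csv
from collections import Counter

# Rough local equivalent of the IPT's core validation: every line has a unique
# occurrenceID and a non-empty basisOfRecord. Column names in the header row
# are assumed to match the Darwin Core terms exactly.
def check_core_file(path="occurrence.txt"):
    ids = Counter()
    missing_id = missing_basis = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            occ_id = (row.get("occurrenceID") or "").strip()
            basis = (row.get("basisOfRecord") or "").strip()
            if occ_id:
                ids[occ_id] += 1
            else:
                missing_id += 1
            if not basis:
                missing_basis += 1
    duplicates = sum(1 for n in ids.values() if n > 1)
    print(f"missing occurrenceID: {missing_id}")
    print(f"duplicate occurrenceID values: {duplicates}")
    print(f"missing basisOfRecord: {missing_basis}")

if __name__ == "__main__":
    check_core_file()
```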
Hi Mikhail,
Sorry for the bother again, but something still isn't working correctly. Although it shows there is a dataset, the dataset search comes up with 0 occurrences. Can someone please help me determine why this is so?
Chris
|
Hi Again,
In my search for my Crustacea records, I came across this "finding" - something seems very off here: Crustace in the GBIF backbone is a genus under the bee family Apidae.
It looks like the Benthic Baseline Biodiversity Survey has the wrong taxon string.
https://www.gbif.org/dataset/36449c1f-679d-4235-b34e-1c275ebcd968
Chris
|
@AMNHcjohnson I'm glad to help, but I don't know what dataset we're talking about. Could you send me the link please? |
@ManonGros Could you assist with this please? |
Thanks Mikhail, @ManonGros has access - I need to request IT to add you as well. I already asked them to install the updated release but that hasn't happened yet.
What email should I give our IT for you?
I realize there is something wrong with my file - when I tried to keep the dates from turning into text in Excel, I think something else went awry.
I'm always so close but can never get over this hurdle, and I have 750K records I would like to share.
Thanks.
Chris
|
Hi again, here is the link. I've asked IT to create an admin account for you.
https://ipt.amnh.org/manage/resource.do?r=amnh-crustacea
The resource is amnh-crustacea
Chris
|
Hi Mikhail,
Okay - our IT department has updated the IPT version and added you as a managed user (you should have received an email from them). I detected some errors in my upload file, which I fixed, and published a new dataset. It looks like everything is fine; however, I still see 0 occurrences for this dataset.
https://www.gbif.org/dataset/a8035a1d-e674-4d2a-bb59-b476af6a3d6d
Any help you can provide to identify the misstep would be appreciated (so I can go forward with our remaining datasets).
Best,
chris
|
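For what it's worth, the number of records GBIF has actually indexed for a dataset can be checked with the public occurrence search API; here is a minimal sketch using the dataset key from the link above (the requests library is assumed to be installed):

```python
import requests

# Ask the GBIF occurrence API how many records are indexed for this dataset.
# A limit of 0 returns only the total count, no records.
DATASET_KEY = "a8035a1d-e674-4d2a-bb59-b476af6a3d6d"

resp = requests.get(
    "https://api.gbif.org/v1/occurrence/search",
    params={"datasetKey": DATASET_KEY, "limit": 0},
    timeout=30,
)
resp.raise_for_status()
print("indexed occurrences:", resp.json()["count"])
```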
Hi again Mikhail,
I really need help – I have tried to publish this dataset many, many times – the log says successful, but the ingestion history says finishReason: ABORT.
I don’t understand what is wrong with the file that it can’t be published.
The dataset is American Museum of Natural History (AMNH) Crustacea Collection.
Here are some links:
https://www.gbif.org/dataset/a8035a1d-e674-4d2a-bb59-b476af6a3d6d
https://registry.gbif.org/dataset/a8035a1d-e674-4d2a-bb59-b476af6a3d6d/ingestion-history
Thanks.
Chris
|
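For reference, the ingestion history shown at registry.gbif.org should also be available from the registry API; here is a rough sketch. The /process path and the finishReason field name are taken from memory, so treat them as assumptions and verify against the API documentation.

```python
import requests

# List recent crawl/ingestion attempts for the dataset and print when each
# finished and why (e.g. NORMAL vs ABORT). Endpoint and field names assumed.
DATASET_KEY = "a8035a1d-e674-4d2a-bb59-b476af6a3d6d"

resp = requests.get(
    f"https://api.gbif.org/v1/dataset/{DATASET_KEY}/process",
    params={"limit": 10},
    timeout=30,
)
resp.raise_for_status()
for attempt in resp.json().get("results", []):
    print(attempt.get("finishedCrawling"), attempt.get("finishReason"))
```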
Hi @AMNHcjohnson I will take a look today |
@AMNHcjohnson it looks like we are unable to access the archives from your IPT. I will close this issue as I don't think this is a problem with the IPT software. Please follow up with us at helpdesk@gbif.org, thanks! |
@AMNHcjohnson One of my colleagues noticed that your IPT is behind Cloudflare, which is blocking machine access from our servers. You will need to configure Cloudflare to permit access to at least GBIF's servers, 130.225.43.0/25. |
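In case it helps your IT team, an allow rule for that range can be created through Cloudflare's IP Access Rules API as well as in the dashboard; below is a minimal sketch. The zone ID and API token are placeholders, and Cloudflare may only accept certain prefix lengths (e.g. /24) for IP ranges, in which case the range would need widening or a WAF custom rule matching ip.src instead.

```python
import requests

# Sketch: create a Cloudflare IP Access Rule allowing GBIF's server range.
# ZONE_ID and API_TOKEN are placeholders; the CIDR comes from the comment above.
ZONE_ID = "your-zone-id"
API_TOKEN = "your-api-token"

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/firewall/access_rules/rules",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "mode": "whitelist",
        "configuration": {"target": "ip_range", "value": "130.225.43.0/25"},
        "notes": "Allow GBIF servers to crawl the IPT",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```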
Thanks Marie!!!
I will forward this to our IT department. What a relief.
Chris
|
Hey All! AMNH IT here :) I'll dig into the logs on our side of things, but my guess is that we're blocking it because it is automated/bot traffic. While we most certainly can add the range to our allow list, it isn't the preferred solution as it does negate some security controls. We heavily leverage Cloudflare's Bot Management solution to help mitigate aggressive crawlers and data scrapers; unfortunately, some legitimate solutions do run afoul of this. Coincidentally, July 2021 is when we enabled this service within Cloudflare, so that adds up nicely.
Do the GBIF servers make requests that include a specific user agent (e.g. GBIF Metadata Bot v1.0) instead of a generic one (e.g. Curl, Python Requests, etc.)? If not, that'd be the first step. From there you can request that Cloudflare marks the bot as verified. We're happy to leverage our account and support team at Cloudflare to help with this if necessary.
https://developers.cloudflare.com/bots/reference/verified-bots-policy/
You can submit the bot verification on their Google Form: Source documentation for the Google Form link (because why is Cloudflare using Google Forms for this? I'm not entirely sure...)
-Ben |
Hi Ben,
The IPT tool provides a managed data repository; the purpose is to allow programmatic access to the published data, with GBIF as the primary user.
I have completed the form, though I doubt we meet the scale Cloudflare requires. I think there are only 4 IPT installations behind Cloudflare, and yours is the only one with these tightened security settings. For https://ipt.amnh.org/ we would normally make 8 HTTP requests per week. Our user agents include
Other biodiversity systems or researchers also access IPTs using various tools or scripting languages. In the last week, I can see two researchers/groups have used Python and RStudio to query IPTs at https://cloud.gbif.org/. Blocking Python, Curl etc. will block these users.
Matt |
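To illustrate the kind of automated request being discussed, here is a sketch of fetching the published Darwin Core Archive from the IPT with a descriptive User-Agent, which is what lets bot-management rules identify the client. The archive.do URL pattern is the IPT's usual download endpoint for the resource shortname mentioned earlier, and the User-Agent string is a made-up example, not GBIF's actual one.

```python
import requests

# Download the published Darwin Core Archive from the IPT, identifying the
# client with a descriptive (hypothetical) User-Agent rather than a generic one.
ARCHIVE_URL = "https://ipt.amnh.org/archive.do?r=amnh-crustacea"

resp = requests.get(
    ARCHIVE_URL,
    headers={"User-Agent": "ExampleBiodiversityHarvester/1.0 (helpdesk@example.org)"},
    timeout=60,
)
resp.raise_for_status()
with open("amnh-crustacea-dwca.zip", "wb") as f:
    f.write(resp.content)
```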
Ahh ok understood. Thanks for submitting it anyways, I'll pass this up to our account rep at Cloudflare just to let them know. Un/fortunately the way the bot management works is essentially based on "machine learning" (of course taken with a grain of salt 😄) and is built off the reputation of known user agents, we're not explicitly allowing/denying them. We're just given the ability to say block automated traffic, allow "good bots", captcha "likely automated" and ultimately try to balance accessibility with excessive scraping (and other more security related issues) across all of our sites. We'll review implementing IP level controls to address this on our end. Thanks Matt! |
Hi, I am trying to upload a new dataset to GBIF. However, whenever I fill out the metadata section and save it, when I go back to the resource page, nothing is saved and I have to fill the info out all over again. I've done it about 5 times now... what am I doing wrong? Chris