Metadata doesn't save #1925

AMNHcjohnson · 2023-01-07T17:08:28Z

Hi, I am trying to upload a new dataset to GBIF. However, whenever I fill out the metadata section and save it, nothing has been saved when I go back to the resource page, and I have to fill the info out all over again. I've done it about 5 times now... what am I doing wrong? Chris

mike-podolskiy90 · 2023-01-07T17:44:43Z

@AMNHcjohnson Thank you for contacting us.
I need some more information please - what IPT version do you use? Do you have any exceptions displayed?

AMNHcjohnson · 2023-01-07T17:48:29Z

Hi Mikhail, It's Integrated Publishing Toolkit (IPT) Version 2.5.5-ra872e56. Probably out of date. It's also not showing the managed resource file I had uploaded. I didn't see any exception or anything else that would indicate the information wouldn't save. Thanks. Chris

mike-podolskiy90 · 2023-01-07T18:10:02Z

I don't remember anything like that. Could you send me your IPT logs please? Or give me administrator rights for your IPT?
And, if possible, I would recommend updating your IPT to the most recent version (currently 2.6.3).

AMNHcjohnson · 2023-01-07T18:30:42Z

Hi, I will have to ask our IT department to upgrade to the new version. I am not sure where to go to get the log, but as you may be able to tell below, the AMNH-Crustacea resource is not showing up in my managed resources, yet I cannot "create a new resource" with the same name because it says it already exists. Where would I find the logs for this? Thanks. Chris

mike-podolskiy90 · 2023-01-07T18:53:44Z

I'm sorry, I don't quite understand: you can't see the resource now? Did you delete it, or did it just disappear?
The log file is available to admin users under Administration -> Logs, or you can download it directly from the server: IPT data dir -> logs

AMNHcjohnson · 2023-01-07T18:54:59Z

Correct, I don't see it. I never deleted it. Christine Johnson, Curatorial Associate, American Museum of Natural History

AMNHcjohnson · 2023-01-09T19:11:04Z

Hi Mikhail,

This morning when I logged in, I could see my Crustacea resource. I went ahead and published it (or rather, set it from private to public). How long does it take to determine whether everything is correct? In the log there are a few question marks. Does that mean these checks failed, or is the IPT still determining whether everything is valid?

In addition, I am a bit confused as to why I have datasets visible on GBIF, but when I search under our institution code, the records don't appear, although the institution code field is populated in these files.

Here is the log:

Archive generation started for version # 1.0
Start writing data file for Darwin Core Occurrence
No lines were skipped due to errors for mapping Darwin Core Occurrence in source amnhcrustaceacollection202317b
No lines were skipped due to errors for mapping Darwin Core Occurrence in source amnhcrustaceacollection202317b
No lines with fewer columns than mapped for mapping Darwin Core Occurrence in source amnhcrustaceacollection202317b
All lines match the filter criteria for mapping Darwin Core Occurrence in source amnhcrustaceacollection202317b
Data file written for Darwin Core Occurrence with 15285 records and 53 columns
All data files completed
EML file added
meta.xml archive descriptor written
Validating the core file: occurrence.txt. Depending on the number of records, this can take a while.
? Validating the core basisOfRecord is always present and its value matches the Darwin Core Type Vocabulary.
? Validating the core ID field occurrenceID is always present and unique.
No lines are missing occurrenceID
No lines have duplicate occurrenceID
✓ Validated each line has a occurrenceID, and each occurrenceID is unique
No lines are missing a basisOfRecord
All lines have basisOfRecord that matches the Darwin Core Type Vocabulary
No lines have ambiguous basisOfRecord 'occurrence'.
✓ Validated each line has a basisOfRecord, and each basisOfRecord matches the Darwin Core Type Vocabulary
Archive validated
Archive has been compressed
Archive version # 1.0 generated successfully!

mike-podolskiy90 · 2023-01-09T19:29:55Z

I'm glad to hear you managed to publish your resource. A question mark in the publication log simply indicates that a validation step has started. As you can see further down in the log, the IPT reported that everything completed successfully.
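
For anyone who wants to run similar checks on a file before uploading it, here is a minimal sketch of the two validations named in the log (occurrenceID present and unique, basisOfRecord present and in the vocabulary). It is not the IPT's actual implementation. The function name, the report keys, the sample data and the short list of accepted basisOfRecord values are all assumptions for illustration.

```python
import csv
import io

# ASSUMPTION: an illustrative subset of accepted basisOfRecord values,
# compared case-insensitively. This is not the IPT's own list.
ALLOWED_BASIS = {
    "preservedspecimen",
    "fossilspecimen",
    "livingspecimen",
    "materialsample",
    "humanobservation",
    "machineobservation",
}


def check_core(rows):
    """Count problems in an iterable of dict rows (hypothetical helper)."""
    seen_ids = set()
    report = {
        "missing_occurrenceID": 0,
        "duplicate_occurrenceID": 0,
        "missing_basisOfRecord": 0,
        "unmatched_basisOfRecord": 0,
    }
    for row in rows:
        occ_id = (row.get("occurrenceID") or "").strip()
        if not occ_id:
            report["missing_occurrenceID"] += 1
        elif occ_id in seen_ids:
            report["duplicate_occurrenceID"] += 1
        else:
            seen_ids.add(occ_id)

        basis = (row.get("basisOfRecord") or "").strip()
        if not basis:
            report["missing_basisOfRecord"] += 1
        elif basis.lower() not in ALLOWED_BASIS:
            report["unmatched_basisOfRecord"] += 1
    return report


# Made-up tab-delimited sample with one problem of each kind.
SAMPLE = (
    "occurrenceID\tbasisOfRecord\n"
    "A1\tPreservedSpecimen\n"
    "A1\tPreservedSpecimen\n"
    "\tSpecimen\n"
    "A2\t\n"
)

report = check_core(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
print(report)
```

To check a real export, the same function could be given a `csv.DictReader` opened on the tab-delimited source file instead of the in-memory sample.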

Which dataset is it, please? After publishing in the IPT, it might take some time for the dataset to be indexed by GBIF.

AMNHcjohnson · 2023-01-12T17:14:42Z

Hi Mikhail, Sorry to bother you again, but something still isn't working correctly. Although it shows there is a dataset, the dataset search comes up with 0 occurrences. Can someone please help me determine why this is so? Chris

AMNHcjohnson · 2023-01-12T17:30:33Z

Hi again, In my search for my Crustacea records, I came across this "finding", and something seems very off here. Crustace in the GBIF backbone is a genus under the bee family Apidae. It looks like the Benthic Baseline Biodiversity Survey has the wrong taxon string: https://www.gbif.org/dataset/36449c1f-679d-4235-b34e-1c275ebcd968 Chris

mike-podolskiy90 · 2023-01-12T18:04:59Z

@AMNHcjohnson I'm glad to help, but I don't know which dataset we're talking about. Could you send me the link please?
And, if possible, please create an admin account for me in your IPT. That would help to diagnose what's going on.

mike-podolskiy90 · 2023-01-12T18:11:34Z

@ManonGros Could you assist with this please?

AMNHcjohnson · 2023-01-12T18:16:07Z

Thanks Mikhail, @ManonGros has access. I need to ask IT to add you as well. I already asked them to install the updated release, but that hasn't happened yet. What email should I give our IT for you? I realize there is something wrong with my file: when I tried to keep the dates from turning into text in Excel, I think something else went awry. I'm always so close but can never get over this hurdle, and I have 750K records I would like to share. Thanks. Chris

mike-podolskiy90 · 2023-01-12T18:18:16Z

mpodolskiy@gbif.org

AMNHcjohnson · 2023-01-12T18:32:48Z

Hi again, here is the link: https://ipt.amnh.org/manage/resource.do?r=amnh-crustacea
The resource is amnh-crustacea. I've asked IT to create an admin account for you. Chris

AMNHcjohnson · 2023-01-13T16:31:41Z

Hi Mikhail, Okay, our IT department has updated the IPT version and added you as a managed user (you should have received an email from them). I detected some errors in my upload file, which I fixed, and published a new dataset. It looks like everything is fine; however, I still see 0 occurrences for this dataset: https://www.gbif.org/dataset/a8035a1d-e674-4d2a-bb59-b476af6a3d6d Any help you can provide to identify the misstep would be appreciated, so I can go forward with our remaining datasets. Best, Chris

AMNHcjohnson · 2023-01-17T19:47:13Z

Hi again Mikhail, I really need help. I have tried to publish this dataset many, many times. The log says successful, but the ingestion history says finishReason: ABORT. I don't understand what is wrong with the file that prevents it from being published. The dataset is American Museum of Natural History (AMNH) Crustacea Collection. Here are some links: https://www.gbif.org/dataset/a8035a1d-e674-4d2a-bb59-b476af6a3d6d https://registry.gbif.org/dataset/a8035a1d-e674-4d2a-bb59-b476af6a3d6d/ingestion-history Thanks. Chris

ManonGros · 2023-01-18T07:34:22Z

Hi @AMNHcjohnson I will take a look today

ManonGros · 2023-01-18T13:21:53Z

@AMNHcjohnson it looks like we are unable to access the archives from your IPT.
This could be due to some firewall settings. It looks like it isn't just this dataset, for example the last time we were able to access the archive from this dataset (https://www.gbif.org/dataset/a8035a1d-e674-4d2a-bb59-b476af6a3d6d) was in July 2021.
You can find more information in our IPT manual here: https://ipt.gbif.org/manual/en/ipt/latest/installation#opening-the-ipt-to-the-internet

I will close this issue as I don't think this is a problem with the IPT software. Please follow up with us at helpdesk@gbif.org, thanks!

ManonGros · 2023-01-18T13:33:26Z

@AMNHcjohnson One of my colleagues noticed that your IPT is behind Cloudflare, which is blocking machine access from our servers. You will need to configure Cloudflare to permit access to at least GBIF's servers, 130.225.43.0/25.
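
As a rough illustration of what an IP allow-list rule for that range amounts to, here is a minimal sketch using Python's standard `ipaddress` module. It is not Cloudflare configuration; the function name and the test addresses are made up for the example. The default range is the one quoted in this comment. A later comment in this thread mentions 130.225.43.0/24, so the network is a parameter and should be set to whatever GBIF confirms.

```python
import ipaddress

# Range as quoted in the comment above. ASSUMPTION: confirm the exact
# prefix length with GBIF before relying on it.
GBIF_RANGE = ipaddress.ip_network("130.225.43.0/25")


def is_allowed(client_ip: str, network=GBIF_RANGE) -> bool:
    """Return True if client_ip falls inside the allow-listed network."""
    return ipaddress.ip_address(client_ip) in network


# Hypothetical client addresses, chosen only to show both outcomes.
for ip in ("130.225.43.10", "130.225.43.200", "203.0.113.7"):
    print(ip, is_allowed(ip))
```

With the /25 prefix, only addresses from 130.225.43.0 to 130.225.43.127 match. With a /24, the whole of 130.225.43.0 to 130.225.43.255 would.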

AMNHcjohnson · 2023-01-18T13:34:34Z

Thanks Marie!!! I will forward this to our IT department. What a relief. Chris

bvirgilioamnh · 2023-01-18T18:28:16Z

Hey All! AMNH IT Here :)

I'll dig into the logs on our side of things, but my guess is that we're blocking it because it is automated/bot traffic. While we most certainly can add the range to our allow list, it isn't the preferred solution, as it does negate some security controls. We heavily leverage Cloudflare's Bot Management solution to help mitigate aggressive crawlers and data scrapers. Unfortunately, some legitimate solutions do run afoul of this. Coincidentally, July 2021 is when we enabled this service within Cloudflare, so that adds up nicely.

Do the GBIF servers make requests to servers that include a specific user agent (e.g. GBIF Metadata Bot v1.0) instead of a generic user agent (e.g. Curl, Python Requests, etc)? If not, that'd be the first step. And then from there you can request that Cloudflare marks the bot as verified. We're happy to leverage our account and support team at Cloudflare to help assist with this if necessary.

https://developers.cloudflare.com/bots/reference/verified-bots-policy/

You can submit the bot verification on their Google Form:
https://forms.gle/pWVxfCj6cQgWGxDp9

Source documentation for the Google Form link (because why is Cloudflare using Google Forms for this? I'm not entirely sure...)
https://blog.cloudflare.com/friendly-bots/

-Ben

MattBlissett · 2023-01-19T10:45:35Z

Hi Ben,

The IPT tool provides a managed data repository; the purpose is to allow programmatic access to the published data, with GBIF as the primary user.

I have completed the form, though I doubt we meet the scale Cloudflare requires. I think there are only 4 IPT installations behind Cloudflare, and yours is the only one with these tightened security settings. For https://ipt.amnh.org/ we would normally make 8 HTTP requests per week.

Our user agents include COLServer (COLServer/24a3ae9 2022-12-20), org.GBIF.utils/1.16 (Java/11.0.17; M-1800000-25-2; +https://www.gbif.org/), GBIF-Url-Validator and Thumbor/6.7.0. As far as I know, no-one is currently using user agents to allow/block access to an IPT, so we have not made any particular effort to align or maintain these. A few publishers do limit access to 130.225.43.0/24.

Other biodiversity systems or researchers also access IPTs using various tools or scripting languages. In the last week, I can see two researchers/groups have used Python and RStudio to query IPTs at https://cloud.gbif.org/. Blocking Python, Curl etc will block these users.
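
For script authors reading this thread, here is a minimal sketch of how a Python script could send a descriptive user agent instead of the library default. The URL, tool name and contact address are placeholders, and the helper name is made up. Nothing in this thread establishes that Cloudflare's bot management would treat such a request differently, so take it only as an illustration of identifying a client. The block builds the request object and does not send it.

```python
import urllib.request


def build_request(url: str, tool: str, contact: str) -> urllib.request.Request:
    """Build a request with a descriptive User-Agent (hypothetical helper)."""
    user_agent = f"{tool} (+{contact})"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder values: replace with a real IPT URL and your own details.
req = build_request(
    "https://ipt.example.org/",
    "my-ipt-harvest-script/0.1",
    "mailto:researcher@example.org",
)

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then send the request with that header.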

Matt

bvirgilioamnh · 2023-01-19T19:51:54Z

Ahh ok understood. Thanks for submitting it anyways, I'll pass this up to our account rep at Cloudflare just to let them know. Un/fortunately the way the bot management works is essentially based on "machine learning" (of course taken with a grain of salt 😄) and is built off the reputation of known user agents, we're not explicitly allowing/denying them. We're just given the ability to say block automated traffic, allow "good bots", captcha "likely automated" and ultimately try to balance accessibility with excessive scraping (and other more security related issues) across all of our sites.

We'll review implementing IP level controls to address this on our end.

Thanks Matt!

mike-podolskiy90 self-assigned this Jan 7, 2023

ManonGros closed this as completed Jan 18, 2023
