Validate date strings when transforming MITAardvark records #124

Merged
merged 5 commits into from
Feb 14, 2024
2 changes: 1 addition & 1 deletion transmogrifier/config.py
@@ -112,7 +112,7 @@
"gisogm": {
"name": "OpenGeoMetadata GIS Resources",
"base-url": "https://search.libraries.mit.edu/record/",
- "transform-class": "transmogrifier.sources.json.aardvark.OGMAardvark",
+ "transform-class": "transmogrifier.sources.json.aardvark.MITAardvark",
Contributor

So, OGMAardvark was referenced here before being defined, which we expected to need because the get_source_link method includes gismit. Are we thinking differently about that approach now?

Contributor Author

Really glad you raised this, for a couple of reasons.

First, looking more closely at get_source_link, which I had not thought to check, I believe we can update it for MITAardvark transformations to use data from dct_references_s. That also helps explain why MIT and OGM records can share the same transformer.

Every record that comes out of GeoHarvester is an "MIT" Aardvark record in the sense that, regardless of origin institution or metadata format, we have crafted the Aardvark file in a way that meets our TIMDEX needs. During that work in GeoHarvester, quite a bit of care is taken to craft the dct_references_s field, which contains URLs.

The value for dct_references_s['http://schema.org/url'] is what the "source link" for the record should be. For MIT records this will be https://geodata.libraries.mit.edu/record/<IDENTIFIER>, and for OGM records it will be an external URL that we extracted from the source metadata. That URL is guaranteed to be present; otherwise the record is not included in the harvester output.

Taking all this together, I will work on another commit that:

  1. updates get_source_link to actually read data from the record
  2. removes base_url from the gismit and gisogm configurations, as it is not needed
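
For illustration only, the first step above might look roughly like the sketch below. This is not the code from the follow-up commit: the standalone function signature, the `SOURCE_LINK_KEY` constant, and the error handling are assumptions, as is the idea that `dct_references_s` may arrive either as a JSON-encoded string or as an already-parsed dict whose `http://schema.org/url` value is a plain string.

```python
import json
from typing import Any

# Key named in the comment above; the constant name itself is hypothetical.
SOURCE_LINK_KEY = "http://schema.org/url"


def get_source_link(source_record: dict[str, Any]) -> str:
    """Return the source link stored in the record's dct_references_s field.

    Hypothetical sketch: the real method lives on the transformer class and
    may take different arguments.
    """
    references = source_record.get("dct_references_s")
    if references is None:
        raise ValueError("record has no dct_references_s field")

    # Assumption: the field may be a JSON-encoded string or a parsed dict.
    if isinstance(references, str):
        references = json.loads(references)

    url = references.get(SOURCE_LINK_KEY)
    if not url:
        raise ValueError(f"dct_references_s has no '{SOURCE_LINK_KEY}' entry")
    return url


if __name__ == "__main__":
    record = {
        "dct_references_s": json.dumps(
            {SOURCE_LINK_KEY: "https://geodata.libraries.mit.edu/record/abc123"}
        )
    }
    print(get_source_link(record))
```

Because the link is read from the record itself, a function like this needs no per-source base URL, which is consistent with the second step of dropping that setting from the gismit and gisogm configurations.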

Contributor Author

@ehanson8 - just pushed this commit and have re-requested review.

},
"researchdatabases": {
"name": "Research Databases",