Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update MITAardvark 'rights' field method to support access filter #114

Merged

Conversation

jonavellecuerdo
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo commented Jan 24, 2024

Purpose and background context

The planned UI mockup for the updated TIMDEX UI includes a feature to filter records
based on "Access to files". This requires a static, aggregatable string value to be present
in a TIMDEX record that can be used by the TIMDEX API to retrieve relevant records
from OpenSearch.

How can a reviewer manually see the effects of these changes?

Review updated test_aardvark_get_rights_success: tests/sources/json/test_aardvark.py.

Run make test and confirm all tests are passing.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

YES

What are the relevant tickets?

https://mitlibraries.atlassian.net/browse/GDT-138

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Just left a comment about the logic (which looks good) and it's relationship to required fields, and gismit vs gisogm records. But as stated in the comment, likely out of scope for this PR.

transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach, just some questions and suggestions about testing and fixtures

tests/fixtures/aardvark/aardvark_record_all_fields.jsonl Outdated Show resolved Hide resolved
tests/sources/json/test_aardvark.py Outdated Show resolved Hide resolved
Comment on lines 357 to 358
kind_access_rights = "Access"
kind_access_to_files = "Access to files"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get "Access to files" but I'm unclear about the difference here, what is "Access" referring to? And it looks like kind_access_rights is only used once so does it need to be a variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure of the intention of kind = "Access", but all it seems to do is retrieve the value from the Aardvark field dct_accessRights_s. For the MITAardvark transformer, the get_rights() field method essentially pulls from fields under the Aardvark Rights category.

@ehanson8 and @ghukill - What do you think of updating kind="Access" -> kind="Access rights"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a better way of stating my question is: is there actually a difference between Access and Access to files? There may not be but if we have 2 different strings, the user may think there's a difference

Copy link
Contributor

@ghukill ghukill Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ehanson8 about "Access" vs "Access to files" being very similar and potentially confusing. But the language for the values, and the language for that kind both come directly from the UI mockup:

7328473a-bcf5-4d91-9fbb-6458922eef72

While maybe we could make the kind something like "Download Access", we'd need to confirm with EngX that the actual "kind" string could differ from the text on the UI. I imagine it could, but unsure.

OR, we could just continue with "Access to files" (which does kind of suggest downloading), and then iterate later if determined confusing.

As far as the string values go (e.g. "Not owned by MIT"), I'd imagine those are good as-is because they will render directly on the page via the aggregation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep it as-is for now since this is a draft and update as needed later! Whatever we decide, I think it could (maybe) be a good thing if the names align in transmogrifier and the timdex/UI...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with that as is given the draft status and I get it now seeing the UI, but to me Access to files seems like the only type of access users would care about. But this is better sorted out in inter-team discussions later!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some context (this may no longer be relevant, I haven't checked to see what you all decided on in the code).

In general, field values can be manipulated between OpenSearch and TIMDEX API (or TIMDEX API and TIMDEX UI) when we have a good reason to do so, but when we find ourselves wanting to do that we should probably pause and ask why. If UX is suggesting a label in the UI is the best way to describe something from a user perspective, is there a reason an API user would not find the same language helpful? Or is there a reason we want to introduce potential future confusion between what is in OpenSearch and TIMDEX API?

So before forking our language across tiers we should reach out to UX and explain the situation and ask if they would consider language we feel is more specific or accurate. If there is a good reason to make a distinction between how we refer to data in TIMDEX Spec and TIMDEX API/TIMDEX UI, we can definitely support it... we just need to be sure that's what we want :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I better see your point @ehanson8, that conceptually what's even the difference... And, agreed about inter-team discussing, which this slack thread might tease out.

AND, one of the open questions for GDT-138 is whether this will even be a Rights entry, or maybe a dedicated field, so might all be moot if it's a new field.

transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
if source == "gisogm":
rights.append(
timdex.Rights(
description="Not owned by MIT", kind=kind_access_to_files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious to get GIS staff feedback on phrasing, I don't know what is expected here (or if it only matters as a facet)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehanson8 Thank you for the really helpful suggestions! Can you clarify what you mean by this comment? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is likely that we may have MITAardvark and OGMAardvark classes since we hardcoded a gismit string into MITAardvark.get_source_link() but we may find a better solution that avoids that

@jonavellecuerdo jonavellecuerdo force-pushed the GDT-138-update-timdex-record-to-enable-access-filter branch 2 times, most recently from 8ec640f to ad4303a Compare January 25, 2024 14:50
@jonavellecuerdo
Copy link
Contributor Author

@ehanson8 @ghukill Let me know what you think of the latest changes!

  1. The UI mockup indicates "MIT Authentication", but I propose "Required MIT authentication" so that all string values will be adjectives. 🤓
  2. Acknowledging that the if-else conditions in MITAardvark.get_rights() field method may not be needed once we create an OGMAardvark transformer.
  3. We will circle back on Rights.kind values and will check in with EngX (@JPrevost) on whether or not the values in the TIMDEX record can (or should) differ from the TIMDEX (Geoweb) UI.

@JPrevost
Copy link
Member

@jonavellecuerdo I haven't read through the full discussion here, but is there a transform in dev1 S3 that has been created with this branch so I can pull in the data to better understand the ramifications?

@jonavellecuerdo
Copy link
Contributor Author

@JPrevost Thank you for asking! Here is an example of a transform in dev1 S3 with the proposed changes: JSON file. Also, I'll be sending over some notes from the spike work investigating nested, filtered aggregations in OpenSearch (heads up -- yes we can 😄 ).

@jonavellecuerdo jonavellecuerdo force-pushed the GDT-138-update-timdex-record-to-enable-access-filter branch from ad4303a to 0ec0931 Compare February 20, 2024 17:19
Why these changes are being introduced:
* Enable filtering TIMDEX records based on "Access to files" by
providing a static, aggregatable string value indicating the category
for a given record.

How this addresses that need:
* Update MITAardvark.get_rights field method
   * Accept MITAardvark.source as an argument to determine ownership
   * Use "dct_accessRights_s" to determine access category

Side effects of this change:
* Need to determine whether OpenSearch can limit aggregation to a field that is
filtered by a subfield (i.e., Rights.kind) and make necessary updates to timdex-ui.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-138
@jonavellecuerdo jonavellecuerdo force-pushed the GDT-138-update-timdex-record-to-enable-access-filter branch from bd820db to 8c54755 Compare February 20, 2024 17:39
@jonavellecuerdo
Copy link
Contributor Author

@ehanson8 @ghukill

Please see the latest commit. Here are the main takeaways:

  1. Ultimately, we confirmed that the proposed solution of introducing a new instance of "Rights" to TimdexRecord.rights field to support "Access to file" filters is (a) technically achievable with openSearch and (b) it is reasonable to use this field in this manner, as exemplified by at least one other transformer.

  2. During one of our sync-up meetings, @ghukill and I thought that a solution to address the comment on "Access" vs. "Access to files" could be to update the code such that:

timdex.Rights(description=<geoharvester.harvester.records.MITAardvark.dct_accessRights_s>, kind="Access rights")
  1. I decided to update "Requires MIT authentication" -> "MIT authentication" to align the value with the proposed UI's display.

Let me know what you think!

@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review February 20, 2024 18:00
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, approved.

I imagine the TIMDEX data model questions will continue (and currently are for other fields), but this code looks good given the decision to store this information in TIMDEX.rights.

Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggested change

transmogrifier/sources/json/aardvark.py Show resolved Hide resolved
@jonavellecuerdo jonavellecuerdo merged commit c75fd04 into main Feb 21, 2024
5 checks passed
@jonavellecuerdo jonavellecuerdo deleted the GDT-138-update-timdex-record-to-enable-access-filter branch February 21, 2024 16:35
jonavellecuerdo added a commit to MITLibraries/timdex-index-manager that referenced this pull request Feb 21, 2024
…ords

Why these changes are being introduced:
* This change is required in order to perform aggregations on the 'description'
property of the 'rights' field. More specifically, this work is to enable
filtering based on "Access to files" and support the updates to the TIMDEX
data model in the PR: MITLibraries/transmogrifier#114

How this addresses that need:
* Update config to convert 'rights.description' to a multifield that indexes
the value as a "keyword" in a similarly named subfield (i.e., 'description.keyword')

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-138
jonavellecuerdo added a commit that referenced this pull request Feb 26, 2024
* Update MITAardvark 'rights' field method to support access filter

Why these changes are being introduced:
* Enable filtering TIMDEX records based on "Access to files" by
providing a static, aggregatable string value indicating the category
for a given record.

How this addresses that need:
* Update MITAardvark.get_rights field method
   * Accept MITAardvark.source as an argument to determine ownership
   * Use "dct_accessRights_s" to determine access category
* Update Rights.kind for MITAardvark.dct_accessRights_s field to 'Access rights'

Side effects of this change:
* Need to determine whether OpenSearch can limit aggregation to a field that is
filtered by a subfield (i.e., Rights.kind) and make necessary updates to timdex-ui.
* timdex-index-manager (TIM) needs to update `rights.description` as a multifield with a keyword value to enable aggregation.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-138
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants