Maintenance updates #207

Merged: 1 commit, Oct 30, 2024
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,5 +1,5 @@
default_language_version:
-  python: python3.11 # set for project python version
+  python: python3.12 # set for project python version
repos:
- repo: local
hooks:
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
-3.11.2
+3.12
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
-FROM python:3.11-slim as build
+FROM python:3.12-slim as build
WORKDIR /app
COPY . .

2 changes: 1 addition & 1 deletion Pipfile
@@ -26,7 +26,7 @@ ruff = "*"
safety = "*"

[requires]
-python_version = "3.11"
+python_version = "3.12"

[scripts]
transform = "python -c \"from transmogrifier.cli import main; main()\""
1,666 changes: 870 additions & 796 deletions Pipfile.lock

Large diffs are not rendered by default.

60 changes: 31 additions & 29 deletions README.md
@@ -2,8 +2,6 @@

An application to transform source records to the TIMDEX data model to facilitate ingest into an OpenSearch index.

-## Description
-
TIMDEX ingests records from various sources with different metadata formats, necessitating an application to transform those source records to a common metadata format, the TIMDEX data model in this case. This application processes both XML and JSON source records and outputs a JSON file of records formatted according to the TIMDEX data model.

```mermaid
@@ -18,10 +16,10 @@ flowchart TD
transmogrifier((transmogrifier))
JSON
timdex-index-manager
-ArchivesSpace[("ArchivesSpace\n(EAD XML)")] --> transmogrifier
-DSpace[("DSpace\n(METS XML)")] --> transmogrifier
-GeoData[("GeoData\n(Aardvark JSON)")] --> transmogrifier
-MARC[("Alma\n(MARCXML)")] --> transmogrifier
+ArchivesSpace[("ArchivesSpace<br>(EAD XML)")] --> transmogrifier
+DSpace[("DSpace<br>(METS XML)")] --> transmogrifier
+GeoData[("GeoData<br>(Aardvark JSON)")] --> transmogrifier
+MARC[("Alma<br>(MARCXML)")] --> transmogrifier
transmogrifier --> JSON["TIMDEX JSON"]
JSON[TIMDEX JSON file] --> timdex-index-manager((timdex-index-manager))
```
@@ -34,34 +32,38 @@ After the JSON file of transformed records is produced, it is processed by `timd

## Development

-To install with dev dependencies:
-
-```
-make install
-```
-
-To run unit tests:
-
-```
-make test
-```
-
-To lint the repo:
+- To preview a list of available Makefile commands: `make help`
+- To install with dev dependencies: `make install`
+- To update dependencies: `make update`
+- To run unit tests: `make test`
+- To lint the repo: `make lint`
+- To run the app: `pipenv run transform <command>`

-```
-make lint
-```
+## Environment Variables

-To run the app:
+### Required

-```
-pipenv run transform <command>
+```shell
+SENTRY_DSN=### If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
+STATUS_UPDATE_INTERVAL=### The transform process logs the # of records transformed every nth record (1000 by default). Set this env variable to any integer to change the frequency of logging status updates. Can be useful for development/debugging.
+WORKSPACE=### Set to `dev` for local development, this will be set to `stage` and `prod` in those environments by Terraform.
```

-## Required ENV
+## CLI commands

-`SENTRY_DSN` = If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
+### `transform`

-`STATUS_UPDATE_INTERVAL` = The transform process logs the # of records transformed every nth record (1000 by default). Set this env variable to any integer to change the frequency of logging status updates. Can be useful for development/debugging.
+```text
+Usage: -c [OPTIONS]

-`WORKSPACE` = Set to `dev` for local development, this will be set to `stage` and `prod` in those environments by Terraform.
+Options:
+  -i, --input-file TEXT           Filepath for harvested input records to
+                                  transform [required]
+  -o, --output-file TEXT          Filepath to write output TIMDEX JSON records
+                                  to [required]
+  -s, --source [alma|aspace|dspace|jpal|libguides|gismit|gisogm|researchdatabases|whoas|zenodo]
+                                  Source records were harvested from, must
+                                  choose from list of options [required]
+  -v, --verbose                   Pass to log at debug level instead of info
+  --help                          Show this message and exit.
+```
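The README text in this diff describes `STATUS_UPDATE_INTERVAL` as controlling how often the transform process logs a record count (every nth record, 1000 by default). That behavior could be sketched roughly as below. This is an illustration only, not the code in `transmogrifier`: the names `get_status_update_interval` and `with_status_updates` are hypothetical, and treating an unset or empty variable as "use the default" is an assumption.

```python
import logging
import os
from collections.abc import Iterable, Iterator

logger = logging.getLogger(__name__)

# Default taken from the README text ("1000 by default").
DEFAULT_STATUS_UPDATE_INTERVAL = 1000


def get_status_update_interval() -> int:
    """Read STATUS_UPDATE_INTERVAL, falling back to the default when unset or empty.

    Hypothetical helper: the real application may read the variable differently.
    """
    raw = os.getenv("STATUS_UPDATE_INTERVAL")
    return int(raw) if raw else DEFAULT_STATUS_UPDATE_INTERVAL


def with_status_updates(records: Iterable[dict], interval: int) -> Iterator[dict]:
    """Yield records unchanged, logging a status line at every `interval`-th record."""
    for count, record in enumerate(records, start=1):
        if count % interval == 0:
            logger.info("Status update: %d records processed so far", count)
        yield record
```

With `STATUS_UPDATE_INTERVAL=2` and five input records, this sketch would emit two status lines (at records 2 and 4), which is the kind of frequent feedback the README suggests for development and debugging.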
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -10,7 +10,7 @@ exclude = ["tests/"]
log_level = "INFO"

[tool.ruff]
-target-version = "py311"
+target-version = "py312"

# set max line length
line-length = 90
4 changes: 2 additions & 2 deletions transmogrifier/sources/transformer.py
@@ -7,7 +7,7 @@
import os
from abc import ABC, abstractmethod
from importlib import import_module
-from typing import TYPE_CHECKING, TypeAlias, final
+from typing import TYPE_CHECKING, final

import smart_open # type: ignore[import-untyped]
from attrs import asdict
@@ -24,7 +24,7 @@

logger = logging.getLogger(__name__)

-JSON: TypeAlias = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None
+type JSON = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None
Contributor review comment:
Interesting change, thanks for highlighting the ruff rule!



class Transformer(ABC):