
Feat/al2 updates #99

Merged (11 commits) on Dec 6, 2024
16 changes: 11 additions & 5 deletions .secrets.baseline
@@ -3,7 +3,7 @@
     "files": null,
     "lines": null
   },
-  "generated_at": "2021-07-16T13:28:46Z",
+  "generated_at": "2024-11-19T22:00:45Z",
   "plugins_used": [
     {
       "name": "AWSKeyDetector"
@@ -68,15 +68,21 @@
   ],
   "poetry.lock": [
     {
-      "hashed_secret": "04bc1e75a811cb16d5b276b9f02d083c6c62d936",
+      "hashed_secret": "3fe4cfda6b6d913d6ba809208c22b9cc682de7c6",
       "is_verified": false,
-      "line_number": 147,
+      "line_number": 191,
       "type": "Hex High Entropy String"
     },
     {
-      "hashed_secret": "683bf5e8b3f78752cab86cc2524104180d43d7d7",
+      "hashed_secret": "02e1ed68101272a29b85bb1623070c60e6a017cb",
       "is_verified": false,
-      "line_number": 603,
+      "line_number": 224,
       "type": "Hex High Entropy String"
-    }
+    },
+    {
+      "hashed_secret": "5bca853943adcb8e1f858addeb33fcde3ecea5a5",
+      "is_verified": false,
+      "line_number": 773,
+      "type": "Hex High Entropy String"
+    }
   ]
55 changes: 38 additions & 17 deletions Dockerfile
@@ -1,22 +1,43 @@
-FROM quay.io/cdis/python-nginx:pybase3-1.5.0
+ARG AZLINUX_BASE_VERSION=master
 
-RUN pip install --upgrade pip
-RUN apk add --update \
-    postgresql-libs postgresql-dev libffi-dev libressl-dev \
-    linux-headers musl-dev gcc g++ \
-    curl bash git vim logrotate
-RUN apk --no-cache add --update \
-    aspell aspell-en ca-certificates \
-    && mkdir -p /usr/share/dict/ \
-    && aspell -d en dump master > /usr/share/dict/words
+# Base stage with python-build-base
+FROM quay.io/cdis/python-nginx-al:${AZLINUX_BASE_VERSION} AS base
 
-COPY . /src/
-WORKDIR /src
+ENV appname=dictionaryutils
 
-RUN pip install poetry \
-    && poetry config virtualenvs.create false \
-    && poetry install -vv --no-interaction
+COPY --chown=gen3:gen3 . /${appname}
 
-COPY . /dictionaryutils
+WORKDIR /${appname}
 
-CMD cd /dictionary; rm -rf build dictionaryutils dist gdcdictionary.egg-info; python setup.py install --force && cp -r /dictionaryutils . && cd /dictionary/dictionaryutils; pip uninstall -y gen3dictionary; pytest tests -s -v; export SUCCESS=$?; cd ..; rm -rf build dictionaryutils dist gdcdictionary.egg-info; exit $SUCCESS
+# Builder stage
+FROM base AS builder
+
+RUN dnf install -y python3-devel postgresql-devel gcc
+
+USER gen3
+
+COPY poetry.lock pyproject.toml /${appname}/
+
+RUN poetry install -vv --no-interaction --without dev
+
+COPY --chown=gen3:gen3 . /${appname}
+
+# Run poetry again so this app itself gets installed too.
+# Include dev because we need data-simulator to run the unit tests.
+RUN poetry install -vv --no-interaction
+
+ENV PATH="$(poetry env info --path)/bin:$PATH"
+
+# Final stage
+FROM base
+
+COPY --from=builder /${appname} /${appname}
+
+# Switch to non-root user 'gen3' for the serving process
+USER gen3
+
+WORKDIR /${appname}
+
+RUN chmod +x "/${appname}/dockerrun.bash"
+
+CMD ["/bin/bash", "-c", "/${appname}/dockerrun.bash"]
24 changes: 20 additions & 4 deletions README.md
@@ -21,8 +21,7 @@ Then from the directory containing the `gdcdictionary` directory run `testdict`.
 If you wish to generate fake simulated data you can also do that with dictionaryutils and the data-simulator.
 
 ```
-simdata() { docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/sh -c "cd /dictionary && python setup.py install --force; python /src/datasimulator/bin/data-simulator simulate --path /simdata/ $*; export SUCCESS=$?; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit $SUCCESS"; }
-simdataurl() { docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/sh -c "python /src/datasimulator/bin/data-simulator simulate --path /simdata/ $*; chmod -R a+rwX /simdata"; }
+simdata() { docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/bash -c "cd /dictionaryutils; bash dockerrun.bash; cd /dictionary/dictionaryutils; poetry run python bin/simulate_data.py --path /dictionary/simdata $*; export SUCCESS=$?; cd /dictionary; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit $SUCCESS"; }
 
 ```
 
@@ -39,11 +38,28 @@ The `--max_samples` argument will define a default number of nodes to simulate,
 ```
 Then run the following:
 ```
-docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/sh -c "cd /dictionary && python setup.py install --force; python /src/datasimulator/bin/data-simulator simulate --path /simdata/ --program workshop --project project1 --max_samples 10 --node_num_instances_file instances.json; export SUCCESS=$?; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit $SUCCESS";
+docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/bash -c "cd /dictionaryutils; bash dockerrun.bash; cd /dictionary/dictionaryutils; poetry run python bin/simulate_data.py --path /simdata/ --program workshop --project project1 --max_samples 10 --node_num_instances_file /dictionary/instances.json; export SUCCESS=$?; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit $SUCCESS";
 ```
 Then you'll get 100 each of `case` and `demographic` nodes and 10 each of everything else. Note that the above example also defines `program` and `project` names.
 
-You can also run the simulator for an arbitrary json url by using `simdataurl --url https://datacommons.example.com/schema.json`.
+You can also run the simulator for an arbitrary JSON URL with the `--url` parameter. The alias can be simplified to skip the setup of the parent directory's virtual env (i.e., skip `dockerrun.bash`):
+```
+simdataurl() { docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/bash -c "python /dictionaryutils/bin/simulate_data.py --path /simdata/ $*; chmod -R a+rwX /simdata"; }
+```
+
+Then run `simdataurl --url https://datacommons.example.com/schema.json`.
+
+## Using a local build of the Docker image
+
+It is possible to use a local build of the `dictionaryutils` Docker image instead of the master branch stored in `quay`.
+
+From a local copy of the `dictionaryutils` repo, build and tag a Docker image, for example:
+```
+docker build -t dictionaryutils-mytag .
+```
+Then use this image in any of the aliases and commands mentioned above by replacing `quay.io/cdis/dictionaryutils:master` with `dictionaryutils-mytag`.
 
 ## Use dictionaryutils to load a dictionary
67 changes: 67 additions & 0 deletions bin/simulate_data.py
@@ -0,0 +1,67 @@
import argparse

from datasimulator import main as simulator


def parse_arguments():
parser = argparse.ArgumentParser()

parser.add_argument(
"--path", required=True, help="path to save files to", nargs="?"
)
parser.add_argument(
"--program", required=False, help="program to generate data", nargs="?"
)
parser.add_argument(
"--project", required=False, help="project to generate data", nargs="?"
)
parser.add_argument(
"--max_samples",
required=False,
help="max number of samples for each node",
default=1,
nargs="?",
)
parser.add_argument(
"--node_num_instances_file",
required=False,
help="max number of samples for each node stored in a file",
nargs="?",
)
parser.add_argument(
"--random", help="randomly generate data numbers for nodes", action="store_true"
)
parser.add_argument(
"--required_only", help="generate only required fields", action="store_true"
)
parser.add_argument(
"--skip", help="skip raising an exception if gets an error", action="store_true"
)
parser.add_argument("--url", required=False, help="s3 dictionary link.", nargs="?")

return parser.parse_args()


def main():

args = parse_arguments()

graph = simulator.initialize_graph(
dictionary_url=args.url if hasattr(args, "url") else None,
program=args.program if hasattr(args, "program") else None,
project=args.project if hasattr(args, "project") else None,
consent_codes=args.consent_codes if hasattr(args, "consent_codes") else None,
)

simulator.run_simulation(
graph,
args.path,
args.max_samples,
args.node_num_instances_file,
args.random,
args.required_only,
args.skip,
)


main()
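A note on the `hasattr` guards in `main()` above: `argparse` always sets every declared option on its namespace (using `None` or the declared default when the flag is not passed), so those guards are no-ops for the flags `parse_arguments` defines; only the `consent_codes` lookup, which has no corresponding `add_argument`, actually falls back to `None`. A minimal sketch of that behavior, using a hypothetical reduced parser mirroring two of the flags above:

```python
import argparse

# Hypothetical reduced parser mirroring two of the flags defined above.
parser = argparse.ArgumentParser()
parser.add_argument("--url", required=False, nargs="?")
parser.add_argument("--max_samples", default=1, nargs="?")

args = parser.parse_args([])  # simulate a run with no flags passed

# Declared options always exist on the namespace, even when not passed:
print(hasattr(args, "url"))            # True (value is None)
print(hasattr(args, "max_samples"))    # True (value is 1, the default)
# Only undeclared names are missing, like the consent_codes lookup above:
print(hasattr(args, "consent_codes"))  # False
```

This is why passing `args.consent_codes if hasattr(args, "consent_codes") else None` works even though the script never defines a `--consent_codes` flag.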
49 changes: 49 additions & 0 deletions dockerrun.bash
@@ -0,0 +1,49 @@
#!/bin/bash

# This script sets up the poetry environment for running tests
# against a local build of a dictionary repo.
# This will remove the default gdcdictionary (eg, version 2.0.0)
# and install the local dictionary (eg, version 0.0.0).
#
# Similar to
# https://github.com/uc-cdis/.github/blob/master/.github/workflows/dictionary_push.yaml

cd /dictionary
if [ -f pyproject.toml ]; then
export USE_POETRY=1
else
export USE_POETRY=0
fi

echo "Removing old dictionaryutils"
rm -rf build dictionaryutils dist gdcdictionary.egg-info

echo "Installing dictionary"
if [ $USE_POETRY -eq 1 ]; then
echo "Via poetry"
poetry install -v --all-extras --no-interaction || true
fi

cp -r /dictionaryutils .
cd /dictionary/dictionaryutils

echo "Removing old gdcdictionary"
if [ $USE_POETRY -eq 1 ]; then
poetry run pip uninstall -y gen3dictionary
poetry run pip uninstall -y gdcdictionary
else
poetry remove gdcdictionary
poetry run pip uninstall -y gen3dictionary
fi

echo "Reinstall dictionary"
poetry run pip install ..

echo "The following schemas from dictionary will be tested:"
ls `poetry run python -c "from gdcdictionary import SCHEMA_DIR; print(SCHEMA_DIR)"`

echo "Ready to run tests"
poetry run pytest -v tests
export SUCCESS=$?

exit $SUCCESS
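The tail of the script captures pytest's exit status in `SUCCESS` and re-raises it as the script's own exit code, so `docker run` reflects test failure. A minimal sketch of that pattern, with a hypothetical `run_tests` standing in for `poetry run pytest -v tests`:

```shell
#!/bin/bash

run_tests() {
  return 3   # hypothetical stand-in: pretend the test run failed with status 3
}

run_tests
SUCCESS=$?   # capture $? immediately, before any other command overwrites it

echo "tests exited with status $SUCCESS"
# the real script then ends with: exit $SUCCESS
```

Capturing `$?` right after the command matters: the intervening `echo` (or any other command) would otherwise replace it with its own exit status.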