Configure datagov-harvesting-logic as a cloud.gov python application #4617
Labels
H2.0/Harvest-Runner
Harvest Source Processing for Harvesting 2.0
Comments
Does it make sense to just create a utility function in the harvesting logic to fetch a harvest source config from the DB?
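A minimal sketch of what such a utility could look like, using `sqlite3` from the standard library purely as a stand-in for the real harvest database. The function name `get_harvest_source_config`, the `harvest_source` table, and its columns are all hypothetical and would need to match the actual schema.

```python
import sqlite3
from typing import Any, Dict, Optional

# Hypothetical column set; replace with the real harvest source schema.
HARVEST_SOURCE_COLUMNS = ("id", "name", "url", "source_type")


def get_harvest_source_config(
    conn: sqlite3.Connection, source_id: str
) -> Optional[Dict[str, Any]]:
    """Return one harvest source's config as a dict, or None if no row matches."""
    row = conn.execute(
        "SELECT id, name, url, source_type FROM harvest_source WHERE id = ?",
        (source_id,),  # parameterized query, so the ID is never spliced into SQL
    ).fetchone()
    if row is None:
        return None
    return dict(zip(HARVEST_SOURCE_COLUMNS, row))
```

Keeping the lookup in one function means both a CLI task and any API endpoint can share it instead of each querying the DB directly.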
User Story
In order to allow datagov-harvesting-logic (DHL) to operate as a process independent of other parts of the Harvest 2.0 system, datagovteam wants to create a new cloud.gov Python application.
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
GIVEN I have created a manifest file in the DHL repo
AND I have configured it to deploy the repo as a standalone application with no public routes
WHEN I run `cf apps` in the appropriate space
THEN I expect to see datagov-harvesting-logic as its own app, with zero instances running.
GIVEN I have created a Python script similar to `loadtest.py` in the DHL repo
AND that script is tied to a command that can be invoked from the CLI and accepts a harvest source ID as an argument
WHEN I run `cf run-task` with the appropriate `-c` flag
THEN I expect to see the app launch a task and begin harvesting that source
GIVEN I have run a harvest job in the DHL repo
AND that job has completed
WHEN I run `cf tasks datagov-harvesting-logic`
THEN I expect to see that the latest task is marked as `SUCCEEDED`
AND WHEN I run `cf app datagov-harvesting-logic`
THEN I expect to see that the app has cycled back to being idle
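The second and third scenarios hinge on two things: a CLI entry point that takes a harvest source ID, and an exit code that tells Cloud Foundry whether the task succeeded (a zero exit status is reported as `SUCCEEDED`, a non-zero one as `FAILED`). A minimal sketch of such an entry point follows; the `harvest` program name and the `run_harvest` function are hypothetical placeholders for the real job.

```python
import argparse
import logging
from typing import Optional, Sequence

logger = logging.getLogger("harvest")


def run_harvest(harvest_source_id: str) -> None:
    """Hypothetical placeholder for the real harvest job."""
    if not harvest_source_id.strip():
        raise ValueError("harvest source ID must not be empty")
    logger.info("Harvesting source %s", harvest_source_id)


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse the source ID, run the job, and return a process exit code.

    0 lets Cloud Foundry mark the task SUCCEEDED; any non-zero code marks it FAILED.
    """
    parser = argparse.ArgumentParser(prog="harvest", description="Run one harvest job.")
    parser.add_argument("harvest_source_id", help="ID of the harvest source to harvest")
    args = parser.parse_args(argv)
    try:
        run_harvest(args.harvest_source_id)
    except Exception:
        # The traceback goes to stderr, which should surface in `cf logs` for the app.
        logger.exception("Harvest failed for source %s", args.harvest_source_id)
        return 1
    return 0
```

Wired up as a script's entry point with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard, and assuming a hypothetical file name `harvest.py`, it could be launched with `cf run-task datagov-harvesting-logic -c "python harvest.py abc-123" --name example-org-harvest` and checked afterwards with `cf tasks datagov-harvesting-logic`.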
Background
[Any helpful contextual notes or links to artifacts/evidence, if needed]
As we learn more about Airflow, we are realizing that bundling DHL as a PyPI module into an Airflow instance is not worthwhile. Instead, we want to deploy DHL as its own application that can be invoked by running a `cf run-task` command with a harvest source object as a payload. This means we will no longer publish DHL as a PyPI module. For an idea of how the manifest should look, you can reference the catalog-gather app:
https://github.com/GSA/catalog.data.gov/blob/main/manifest.yml#L84-L109
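If the task receives a whole harvest source object rather than just an ID, one possible approach is to JSON-encode the object and shell-quote it into the task command, then decode it inside the app. This is a sketch under assumptions: the `--source` option and the script name are hypothetical, and it assumes the task command is interpreted by a shell inside the container.

```python
import json
import shlex
from typing import Any, Dict


def build_task_command(script: str, source: Dict[str, Any]) -> str:
    """Build a command string that carries a harvest source as a JSON payload.

    shlex.quote keeps spaces and quotes in the JSON intact as a single argument.
    """
    payload = json.dumps(source, separators=(",", ":"))
    return f"{script} --source {shlex.quote(payload)}"


def parse_payload(raw: str) -> Dict[str, Any]:
    """Decode the payload on the app side (e.g. the value passed to --source)."""
    return json.loads(raw)
```

Passing only an ID and looking up the config from the DB inside the task keeps the command short and avoids quoting issues entirely; the payload approach trades that for not needing DB access to start the job.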
For reference on CF tasks: https://docs.cloudfoundry.org/devguide/using-tasks.html
For reference on invoking a task with arguments here are a few verified patterns:
```shell
cf run-task airflow-test-webserver -c "airflow users create \
  --username admin \
  --firstname admin \
  --lastname admin \
  --password admin \
  --role Admin \
  --email email@email.com"
cf run-task catalog-web -c "ckan report generate organization=None metrics-dashboard" --name=reports-list
```
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
DHL should only be able to be invoked from within the cloud.gov space, so there is no threat of its being compromised by external actors.
Sketch
`cf run-task example-app -c "{harvest-cli-command} id={id}" --name {org_name}-harvest`
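If an orchestrator (for example, an Airflow task) launches harvests programmatically, the sketched command can be assembled as an argument list and passed to `subprocess.run`. This is a minimal sketch: `harvest_command` stands in for the `{harvest-cli-command}` placeholder, the function names are hypothetical, and it uses the `-c` form from the verified examples.

```python
import subprocess
from typing import List


def build_run_task_args(
    app: str, harvest_command: str, source_id: str, org_name: str
) -> List[str]:
    """Assemble argv for `cf run-task APP -c "CMD id=ID" --name ORG-harvest`."""
    return [
        "cf", "run-task", app,
        "-c", f"{harvest_command} id={source_id}",
        "--name", f"{org_name}-harvest",
    ]


def launch_harvest_task(app: str, harvest_command: str, source_id: str, org_name: str) -> None:
    """Invoke the cf CLI; assumes `cf` is installed and targeted at the right space."""
    # A list argv bypasses the local shell, so the -c value reaches cf as one argument.
    subprocess.run(build_run_task_args(app, harvest_command, source_id, org_name), check=True)
```

Naming each task after the organization makes it easier to find a specific harvest in `cf tasks` output.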
Create a script that takes the ID, extracts the harvest source config from the DB, and executes the harvest. This can be part of a Flask API endpoint/action.
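To let the same logic back both the CLI task and a Flask endpoint/action, the core step could take the config lookup and the harvest execution as injected callables. A minimal sketch under that assumption; `parse_id_arg`, `harvest`, and both hooks are hypothetical names, and the `id=<value>` argument format follows the sketched command.

```python
from typing import Any, Callable, Dict, Optional

Config = Dict[str, Any]


def parse_id_arg(arg: str) -> str:
    """Extract the source ID from an `id=<value>` argument."""
    key, sep, value = arg.partition("=")
    if key != "id" or not sep or not value:
        raise ValueError(f"expected id=<harvest source id>, got {arg!r}")
    return value


def harvest(
    source_id: str,
    fetch_config: Callable[[str], Optional[Config]],
    execute: Callable[[Config], None],
) -> None:
    """Look up a harvest source by ID and run the harvest for it.

    fetch_config reads the config (e.g. from the DB); execute runs the job.
    """
    config = fetch_config(source_id)
    if config is None:
        raise LookupError(f"no harvest source with id {source_id!r}")
    execute(config)
```

A CLI wrapper would call `harvest(parse_id_arg(sys.argv[1]), ...)`, while a Flask view could call `harvest` with the same hooks and return an HTTP response instead of an exit code.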