Allow configuration to support data staging #8
Comments
I can start working on this by the end of the week, but the more work-intensive part will be setting up the parallel pipeline physically (a parallel triple store, pdb, owlery etc). Maybe the first step would be to actually mirror the existing pipeline physically on Jenkins (
I will set up a dual pipeline. Is the whole thing needed, or simply the dump stage and beyond?
Everything except for KB is needed, unfortunately, because the embargoing happens pre-triplestore.
This pull implements the blocking and staging logic, which are totally independent.
- Blocking is implemented through Cypher queries; see process.sh lines 56-57.
- Staging is the rest.

We only care here about the embargo logic of staging. If we embargo, then what gets embargoed depends on whether we are in the prod or dev stage mode (see Dockerfile). The logic corresponds to what was discussed [here](VirtualFlyBrain/neo4j2owl#52). The implementation is realised through two different sets of SPARQL queries (one for prod, one for dev), which apply embargo rules of differing rigour. The prod queries are unchanged, and the embargo rules in dev are narrower in scope (i.e. less stuff gets embargoed).

See #8 and VirtualFlyBrain/neo4j2owl#52.
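As a rough illustration of the mode switch described above, here is a minimal Python sketch of picking one of two query sets from the stage mode. The environment variable name `STAGING`, the `sparql/prod` and `sparql/dev` directory layout, and the function name are all hypothetical; the actual pull does this in process.sh and the Dockerfile, and may be organised differently.

```python
import os
from pathlib import Path

# Hypothetical layout: one directory of embargo queries per stage mode.
QUERY_DIRS = {
    "prod": Path("sparql/prod"),
    "dev": Path("sparql/dev"),
}


def embargo_query_dir(env=None):
    """Return the directory of embargo queries for the current stage mode.

    The mode is read from a hypothetical STAGING variable and defaults to
    "prod", so an unconfigured pipeline applies the stricter rules.
    """
    env = os.environ if env is None else env
    mode = env.get("STAGING", "prod")
    if mode not in QUERY_DIRS:
        raise ValueError(f"unknown stage mode: {mode!r}")
    return QUERY_DIRS[mode]
```

Defaulting to prod is a design choice in this sketch only: a missing setting then errs on the side of embargoing more rather than less.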
Fixed in #9
We originally discussed supporting a parallel staging pipeline:
VirtualFlyBrain/neo4j2owl#18 (comment)
This doesn't seem to have happened, but it looks like it would be relatively straightforward to lightly edit the SPARQL used for filtering out embargoed data: https://github.com/VirtualFlyBrain/vfb-pipeline-collectdata/tree/master/sparql
We would need a config for the parallel staging pipeline that would allow through DataSets where `production: False`, `staging: True`.
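To make the proposed rule concrete, here is a minimal Python sketch of the filter as a predicate over the two flags. The `DataSet` class, the mode names, and the function name are illustrative only. It also assumes that the staging pipeline consults only the `staging` flag, which the spec above does not settle for DataSets with `production: True`, `staging: False`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """Illustrative stand-in for a DataSet with its two release flags."""
    name: str
    production: bool
    staging: bool


def is_released(ds, mode):
    """Return True if the DataSet passes the embargo filter in this mode.

    Assumption: each pipeline looks only at its own flag.
    """
    if mode == "prod":
        return ds.production
    if mode == "staging":
        return ds.staging
    raise ValueError(f"unknown pipeline mode: {mode!r}")


# The case from the spec: embargoed in production, visible on staging.
preview = DataSet("preview", production=False, staging=True)
```

With this rule, `is_released(preview, "staging")` is true while `is_released(preview, "prod")` is false, which is the behaviour the staging config is meant to allow.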
CC @Robbie1977 - please check my spec here.