Allow configuration to support data staging #8

Closed
dosumis opened this issue Oct 6, 2020 · 4 comments

@dosumis (Member) commented Oct 6, 2020

We originally discussed supporting a parallel staging pipeline:

VirtualFlyBrain/neo4j2owl#18 (comment)

This doesn't seem to have happened, but it looks like it would be relatively straightforward to lightly edit the SPARQL queries used to filter our embargoed data: https://github.com/VirtualFlyBrain/vfb-pipeline-collectdata/tree/master/sparql

We would need a config for the parallel staging pipeline that lets through DataSets where production: False, staging: True.

CC @Robbie1977 - please check my spec here.
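The spec above boils down to a simple gating rule per DataSet. The following is a minimal Python sketch of that rule, not the pipeline's actual implementation (which filters in SPARQL). It assumes each DataSet carries boolean `production` and `staging` flags. It also assumes the staging pipeline should let through everything production does, plus staging-only DataSets; the thread does not state this explicitly. The `DataSet` class and `is_released` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    # Hypothetical stand-in for a DataSet record and its release flags.
    name: str
    production: bool
    staging: bool


def is_released(ds: DataSet, mode: str) -> bool:
    """Return True if the DataSet should pass the embargo filter in this mode.

    Assumption: "prod" releases only production DataSets, while "staging"
    also releases DataSets flagged staging: True (e.g. production: False,
    staging: True).
    """
    if mode == "prod":
        return ds.production
    if mode == "staging":
        return ds.production or ds.staging
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `[ds for ds in datasets if is_released(ds, "staging")]` would keep the DataSets visible on the staging pipeline. Raising on an unknown mode avoids silently releasing embargoed data because of a typo in the config.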

@matentzn (Contributor) commented Oct 6, 2020

I can start working on this by the end of the week, but the more work-intensive part will be physically setting up the parallel pipeline (a parallel triple store, pdb, owlery, etc.). Maybe the first step should be to mirror the existing pipeline on Jenkins (vfb-pipeline2-devstage), and then start working on the config that allows unpublished data to seep through?

@Robbie1977 (Contributor)

I will set up a dual pipeline. Is the whole thing needed, or just the dump stage and beyond?

@matentzn (Contributor) commented Oct 6, 2020

Unfortunately, everything except the KB is needed, because the embargoing happens before data reaches the triplestore.

matentzn added a commit that referenced this issue Oct 15, 2020
This pull request implements the blocking and staging logic, which are completely independent of each other.
- Blocking is implemented through Cypher queries; see process.sh lines 56-57.
- Staging is everything else. Here we only care about the embargo part of staging: when embargoing, the pipeline embargoes different things depending on whether it runs in prod or dev stage mode (see Dockerfile). The logic corresponds to what was discussed [here](VirtualFlyBrain/neo4j2owl#52). It is implemented through two separate sets of SPARQL queries (one for prod, one for dev) that apply embargo rules of different strictness. The prod queries are unchanged, and the embargo rules in dev are more permissive (i.e. less data gets embargoed).

see #8
see VirtualFlyBrain/neo4j2owl#52
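The mode switch described in the commit can be pictured as choosing one of two query sets at run time. The following is a rough Python sketch only. The directory layout (`sparql/prod`, `sparql/dev`), the `PIPELINE_MODE` environment variable and the `embargo_queries` function are all hypothetical. The real pipeline sets its mode in the Dockerfile, and its file layout may differ.

```python
import os
from pathlib import Path
from typing import List, Optional

# Hypothetical mapping from pipeline mode to the subdirectory that holds
# that mode's embargo queries; the real repository layout may differ.
QUERY_SUBDIRS = {"prod": "prod", "dev": "dev"}


def embargo_queries(mode: Optional[str] = None, base: Path = Path("sparql")) -> List[Path]:
    """Return the SPARQL embargo query files to run for the given mode.

    If no mode is passed, read the hypothetical PIPELINE_MODE environment
    variable, defaulting to "prod" so an unset variable applies the
    stricter production embargo rules.
    """
    if mode is None:
        mode = os.environ.get("PIPELINE_MODE", "prod")
    if mode not in QUERY_SUBDIRS:
        raise ValueError(f"unknown pipeline mode: {mode!r}")
    # Only *.sparql files count as queries; sort them for a stable run order.
    return sorted((base / QUERY_SUBDIRS[mode]).glob("*.sparql"))
```

Defaulting to the prod query set means a misconfigured staging deployment fails safe: it embargoes too much rather than leaking unpublished data.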
@matentzn (Contributor)

Fixed in #9
