dvc repro --dry: should fail if any stage has to run #9861
@Otterpatsch I opened a separate issue for this one to keep track. @iterative/dvc I think once we hit a stage that needs to be run, we need to stop execution for downstream stages, or at least raise a non-zero exit code.
@Otterpatsch do you hit this in CI or only when testing locally? I would expect you to hit this when you have already pulled the data, but not in CI, since the data won't be pulled there and downstream stages will likely fail to find the necessary dependencies.
I hit this in CI. I assume that, because of --allow-missing, the command does fail on the dependencies that are not present (partially: .dvc-tracked dependencies are fine). I ran the following commands on the machine where I ran dvc push (and did the repro). Some more details:
dvc push was also done.
Basically, instead of adding all the full paths to the dependencies (which would be ~200-300 lines), we decided to just add the parent directory. This parent folder is not DVC-tracked, but all of its subdirectories are. Could this cause the issue? I assume it could have something to do with dependencies that are not fully DVC-tracked.
For stage
and referenced as
Have you committed and pushed all changes to git as well? I'm a bit confused about whether you are trying to simulate a clean state, where no pipeline stages should run, or a messy state, where they should run and fail.
Everything is pushed to git (git-wise I'm in a clean state). I'm trying to get a clean DVC state. I think the state itself is also clean.
In that case, it may not be about
Edit: you may need to run
Just to clarify the stage
Yes, if I run dvc pull, everything is fine. That's expected, but the idea is that I shouldn't have to dvc pull to verify the pipeline status, right? So I guess I have to combine the commands? Maybe dvc data status would be enough? Or is there another command I'm missing to achieve this? On the CI machine everything is deleted/cleared afterwards (at least that's the intended behavior).
Is this true even on the CI machine? Does
What do you mean that the subdirs were not filled by
Disclaimer: if I sound confused or mix things up, it's because I am.
The dvc pull fails to pull some data. This should be (and is) fixed on the other branches, so I will investigate the error I'm getting:
A dvc pull was not done (in CI). I assumed that the check for a dependency like datasets/training-sets relies on some kind of check, but since this dependency is neither a git-tracked file nor a .dvc file, I assume the check differs from the one used for those files? Pipelines that have only git- or DVC-tracked dependencies are fine (they are shown as "not changed, skipping"); the problem is basically only with dependencies that are paths to directories which are either DVC outs or contain .dvc files/DVC-tracked files in some subdirectories. So here is what our CI pipeline does, in case that helps explain what I'm trying to achieve, since I suspect I'm doing something wrong:
on the machine where the repro was run
on CI machine
If I run dvc pull afterwards and then do a dvc status
After a lot of thinking about what might cause this issue, it seems that dvc repro --dry --allow-missing checks the "local" state but not the remote, which is what I intended to check in a CI pipeline (correct me if I'm wrong).
Closing as stale, but feel free to reopen if you are still facing issues with this.
So I fixed the issue on our side (I think). I basically ran
dvc repro --allow-missing --dry
a couple of times, each time finding one of the datasets that were still tracked with DVC 2. Then I re-added those, and it no longer crashes. But now the pipeline succeeds even though I get the following lines in the command output, which makes sense because I changed a lot of .dvc files that are also in that path.
How can I fix this? It seems I'm not using the correct command for my pipeline. The command succeeds, but in a pipeline sense it should fail, because stages would be reproduced if I just used
dvc repro
I believe I'm missing something similar to the dvc data status check:
dvc data status --not-in-remote --json | grep -v not_in_remote
which uses grep, but I'm not sure how to do the same for dvc repro --allow-missing --dry so that it fails for all kinds of dependencies.
So I tried:
dvc repro --dry --allow-missing | grep -v "Running stage "
But it still succeeds, even though I get some output if I just use grep "Running stage ".
Originally posted by @Otterpatsch in #9818 (comment)
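A likely explanation for the last result: grep -v exits with status 0 whenever at least one input line does not match the pattern, so any other line in the dry-run output (for example a skipped-stage message) is enough for the step to succeed. Inverting the check, so that the step fails when grep does find a match, should behave as intended. The sketch below assumes, as observed above, that each stage that would run is reported on stdout with a line containing "Running stage "; the helper name fail_if_stage_would_run is hypothetical and not part of DVC.

```shell
# Hypothetical CI helper (not part of DVC). Reads `dvc repro --dry` output on
# stdin and returns 1 if any line reports a stage that would run.
# Assumption: such stages are announced with a line containing "Running stage ".
fail_if_stage_would_run() {
  # grep -q prints nothing and exits 0 as soon as a line matches
  if grep -q "Running stage "; then
    echo "error: at least one stage would be reproduced" >&2
    return 1
  fi
  return 0
}

# Example usage in a bash CI step. pipefail ensures that a failure of
# `dvc repro` itself also fails the step instead of being masked by the helper.
#   set -o pipefail
#   dvc repro --dry --allow-missing | fail_if_stage_would_run
```

A shorter form, `! dvc repro --dry --allow-missing | grep -q "Running stage "`, also fails when a stage would run, but the leading `!` inverts the status of the whole pipeline, so a crash of dvc repro that prints no "Running stage " line would be reported as success even with pipefail. Wrapping the check in a function avoids that.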