dvc repro --dry --allow-missing: fails on missing data #9818
Comments
@Otterpatsch I see you provided some verbose output in |
I don't hit any "error", just the notification, due to --dry, that stages would run, plus a further notification that some (DVC-tracked) files are missing. Just to clarify: if I run dvc pull and then dvc status, everything is reported as fine.
|
It seems like this happens when there is a dependency on data that was tracked via
git clone https://github.com/iterative/example-get-started-experiments.git
cd example-get-started-experiments
dvc repro --allow-missing --dry
Verbose output:
|
Looks like it is failing in my example because Lines 315 to 321 in 04e891c
The hashes are the same, but debugging shows that the different hash names make it fail:
@Otterpatsch Does |
@iterative/dvc Thoughts on how we should treat this? Is it modified or not? |
IMO, it was an oversight for this scenario |
@daavoo What does that mean? Do you think we should only compare the hash value and not all hash info? |
I mean that we should not consider it modified in the example-get-started-experiments scenario.
Can't say off the top of my head. I would need to take a closer look to see what makes sense |
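To make the question above concrete, here is a minimal sketch of the two comparison strategies under discussion. The `HashInfo` class below is a hypothetical stand-in written for this example, not DVC's own class, and the hash names `md5-dos2unix` (older lock file) and `md5` (newer) are assumptions used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashInfo:
    """Hypothetical stand-in for a (hash name, hash value) pair."""
    name: str
    value: str


def is_modified_strict(locked: HashInfo, current: HashInfo) -> bool:
    # Any difference counts as a change, including a different hash name.
    return locked != current


def is_modified_value_only(locked: HashInfo, current: HashInfo) -> bool:
    # Only the digest is compared; the hash name is ignored.
    return locked.value != current.value


# Assumed names: an entry from an older lock file vs. a newer one,
# both carrying the same digest.
old = HashInfo("md5-dos2unix", "d41d8cd98f00b204e9800998ecf8427e")
new = HashInfo("md5", "d41d8cd98f00b204e9800998ecf8427e")

print(is_modified_strict(old, new))      # True
print(is_modified_value_only(old, new))  # False
```

The value-only check avoids the false "modified" report, at the cost of treating two digests produced by different hashing schemes as equal whenever their values happen to match.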
seems it does
After deleting /var/tmp/dvc (which did exist), the error persists |
So, to give context, the problem appears if there is a That is referenced in a As soon as the contents associated with the
Strictly speaking, I guess there could be a collision where we would be misidentifying 2 different things as being the same 🤷 |
@Otterpatsch Is it possible for you to just force-commit to upgrade those hashes? We can't really compare those without computing both, which is undesirable. Just upgrading the old lock file seems like an easy long-term fix. |
How do I upgrade the hashes? |
@Otterpatsch You can do |
@daavoo Are you planning a PR to fix the @Otterpatsch Are you still working through this problem? It turns out that |
yes |
Alright, we will test that. But for now we have rolled back to just using dvc pull and dvc status (close to an hour). |
Once |
So I fixed the issue (I think) on our side. I basically ran
But now the pipeline succeeds even though I get the following lines in the command output, which makes sense because I changed a lot of .dvc files that are also in that path.
How can I fix this? It seems that I am not using the correct command for my pipeline. The command succeeds, but it should fail in a pipeline sense, because a repro would be run if I just used
I believe I'm missing something similar to the grep on the dvc data status command, but I'm not sure how to do that for dvc repro --allow-missing --dry so that it fails for all kinds of dependencies. So I tried:
|
|
dvc cache migrate reports that no file changed in the cache. Even though the
I also ran dvc cache migrate on the CI machine. It doesn't apply any changes. Also, everything is cleared after each run (but I wanted to check anyway). Sadly, the pipeline still fails on the
So the following output confused me a lot, as there are .dvc files with no md5sum, even though I ran the commands you mentioned. So how do I get rid of the DVC 2 .dvc files and replace them with their DVC 3 counterparts?
|
Yes, sorry for the confusion @Otterpatsch. I initially thought |
So I just tried to do that (with version 3.22.0).
I expected that if I now run dvc repro --dry --allow-missing, the output would no longer contain md5: 'None for that one specific file.
|
That debug calls from here: Lines 466 to 473 in b856081
It will only show a non-empty md5 for an actual stage, not a Is |
Closing since I haven't heard back but feel free to reopen if you still have issues @Otterpatsch |
I tried to update our DVC CI pipeline.
Currently we have the following commands (among others):
dvc pull
to check if everything is pushed, and
dvc status
to check if the DVC status is clean. In other words, no repro would be run if one ran dvc repro.
But pulling takes a long time, and with the new --allow-missing feature I thought I could skip that with
dvc data status --not-in-remote --json | grep -v not_in_remote
dvc repro --allow-missing --dry
The first works as expected: it fails if data was not pushed and succeeds if it was.
But the latter just fails on missing data.
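The first check relies on grep -v exiting non-zero when the single line of JSON output contains not_in_remote. A minimal Python sketch of the same gate is shown below. It assumes, as the grep suggests, that the JSON is an object in which a `not_in_remote` key lists the unpushed paths and is absent or empty when everything is pushed; the function names and the sample path are hypothetical.

```python
import json


def unpushed_paths(status_json: str) -> list[str]:
    """Return the paths listed under the assumed "not_in_remote" key."""
    status = json.loads(status_json)
    return list(status.get("not_in_remote", []))


def check_pushed(status_json: str) -> int:
    """Return a CI-style exit code: 0 if nothing is missing, 1 otherwise."""
    missing = unpushed_paths(status_json)
    for path in missing:
        print(f"not in remote: {path}")
    return 1 if missing else 0


# Hypothetical sample outputs, for illustration only.
clean = "{}"
dirty = '{"not_in_remote": ["data/raw.csv"]}'

print(check_pushed(clean))  # 0
print(check_pushed(dirty))  # prints the missing path, then 1
```

In a CI job, the string passed to `check_pushed` would be the captured standard output of dvc data status --not-in-remote --json, and the return value would be used as the job's exit code.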
Reproduce
Example: Failure/Success on Machine Two and Three should be synced
Machine One:
--> doesn't fail, nothing changed (as expected)
Machine Two:
4. dvc data status --not-in-remote --json | grep -v not_in_remote
--> does not fail, everything is pushed and would be pulled
5. dvc repro --allow-missing --dry
--> fails on missing data (unexpected)
Machine Three:
4. dvc pull
5. dvc status
--> succeeds
Expected
On a machine where I did not run
dvc pull
I would expect, given a clean git state and a clean
dvc data status --not-in-remote --json | grep -v not_in_remote
state, that
dvc repro --allow-missing --dry
would succeed and show me that no stage had to run.
Environment information
Linux
Output of dvc doctor: