save import hash info to state #10531
3b07501 to 64e6c82
I made some minor adjustments to the PR (removed [...]). I also ran the dvc-bench benchmark for [...].
This is now back to the 2.58.2 level.
Great changes, thanks! I think if we had a benchmark using large files, you would see an improvement over 2.58.2.
This saves the hash info from imports to the output state db so that we don't rehash imports. Without this PR, imported files were always hashed when the output was saved. Existing imported files were even hashed again during operations like `dvc import -f`.

This should improve the speed of all imports, and it also reverts the changes in #10388 that caused slowdowns in some scenarios. That previous PR wrongly assumed the slowdown occurred during download, when it actually occurred during `output.save()`. That PR happened to work in some scenarios because the workaround updated the state db.
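To illustrate the idea only, here is a minimal sketch of a hash cache keyed on file metadata. It is not DVC's actual state db or API: `HashState`, `save`, `get`, `hash_file`, and `demo` are hypothetical names, and SHA-256 is used purely for the example. The point is that once a hash already known from the import is recorded, a later save step can look it up instead of reading the file again.

```python
import hashlib
import os
import tempfile


class HashState:
    """Toy stand-in for a state db (hypothetical, not DVC's implementation).

    Maps path -> ((mtime_ns, size), hash) so an unchanged file is not rehashed.
    """

    def __init__(self):
        self._entries = {}
        self.hash_calls = 0  # how many times file contents were actually read

    def _signature(self, path):
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    def save(self, path, hash_value):
        # Record a hash that is already known (e.g. from the import source).
        self._entries[path] = (self._signature(path), hash_value)

    def get(self, path):
        entry = self._entries.get(path)
        if entry is None:
            return None
        signature, hash_value = entry
        # A changed mtime or size invalidates the cached hash.
        if signature != self._signature(path):
            return None
        return hash_value

    def hash_file(self, path):
        cached = self.get(path)
        if cached is not None:
            return cached
        self.hash_calls += 1
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        self.save(path, digest)
        return digest


def demo(save_import_hash):
    """Return (hash_matches, hash_calls) for one simulated import + save."""
    state = HashState()
    data = b"imported data"
    known_hash = hashlib.sha256(data).hexdigest()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "imported.bin")
        with open(path, "wb") as f:
            f.write(data)
        if save_import_hash:
            state.save(path, known_hash)  # the step this PR describes adding
        result = state.hash_file(path)    # stands in for the later save step
        return result == known_hash, state.hash_calls
```

In this sketch, `demo(True)` reads no file contents because the hash is found in the cache, while `demo(False)` has to hash the file once. The saving grows with file size, which fits the comment above about large-file benchmarks.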