cache migrate: add --dvc-files flag to migrate .dvc and dvc.lock files #10007

Merged
merged 1 commit into from
Oct 12, 2023

pmrowla
Contributor

@pmrowla pmrowla commented Oct 10, 2023

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Closes #9938
Related (workaround for): #9818

Docs PR iterative/dvc.org#4910

Migration workflow now looks like:

$ dvc cache migrate --dvc-files --dry
3 files will be re-hashed and migrated to the DVC 3.0 cache location.
Entries in following DVC files will be migrated to the 3.0 format:
        dir/bar.dvc
        dir/dvc.yaml (dir/dvc.lock)
        foo.dvc

$ dvc cache migrate --dvc-files
Migrated 3 files to DVC 3.0 cache location.
Updating DVC file 'foo.dvc'
Modifying stage 'baz' in 'dir/dvc.yaml'
Updating lock file 'dir/dvc.lock'
Updating DVC file 'dir/bar.dvc'

To track the changes with git, run:

        git add dir/dvc.yaml foo.dvc dir/dvc.lock dir/bar.dvc

To enable auto staging, run:

        dvc config core.autostage true

$ dvc cache migrate --dvc-files
Migrated 0 files to DVC 3.0 cache location.
No DVC files in the repo to migrate to the 3.0 format.
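
For anyone scripting this workflow, the following is a minimal sketch of pulling the affected DVC file paths out of the `--dry` output shown above, for example to review them before running the real migration. It assumes the output layout matches the transcript in this PR description (a header line followed by indented paths, with the lock file in parentheses after a `dvc.yaml` entry) and that paths contain no spaces. The function name `parse_dry_run` is hypothetical and not part of DVC, and the human-readable output is not a stable interface.

```python
import re


def parse_dry_run(output: str) -> list[str]:
    """Collect the paths listed after the 'Entries in following DVC files' header.

    Hypothetical helper: relies on the output layout shown in this PR,
    which is an assumption and may differ between DVC versions.
    """
    paths: list[str] = []
    in_list = False
    for line in output.splitlines():
        if line.startswith("Entries in following DVC files"):
            in_list = True
            continue
        if not in_list:
            continue
        if not line.strip():
            break  # a blank line ends the list
        # "dir/dvc.yaml (dir/dvc.lock)" names a pipeline file and its lock file
        match = re.fullmatch(r"(\S+)(?: \((\S+)\))?", line.strip())
        if match:
            paths.extend(p for p in match.groups() if p)
    return paths


# Sample text copied from the dry-run transcript above
sample = """\
3 files will be re-hashed and migrated to the DVC 3.0 cache location.
Entries in following DVC files will be migrated to the 3.0 format:
        dir/bar.dvc
        dir/dvc.yaml (dir/dvc.lock)
        foo.dvc
"""

print(parse_dry_run(sample))
```

The resulting list covers the same files that the non-dry run later suggests passing to `git add`.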

@pmrowla pmrowla self-assigned this Oct 10, 2023
@pmrowla pmrowla added the A: data-management Related to dvc add/checkout/commit/move/remove label Oct 10, 2023
@pmrowla pmrowla requested a review from dberenbaum October 10, 2023 07:32
@codecov

codecov bot commented Oct 10, 2023

Codecov Report

Attention: 31 lines in your changes are missing coverage. Please review.

Files Coverage Δ
dvc/output.py 84.51% <100.00%> (ø)
dvc/commands/cache.py 83.63% <25.00%> (-4.60%) ⬇️
dvc/repo/commit.py 47.69% <15.15%> (-33.56%) ⬇️

@dberenbaum
Collaborator

Thanks @pmrowla! I tried it on https://github.com/iterative/demo-bank-customer-churn and got an error:

$ dvc cache migrate --dvc-files -v
2023-10-10 12:20:00,017 DEBUG: v3.25.1.dev2+gef7ec9f9c, CPython 3.11.5 on macOS-14.0-arm64-arm-64bit
2023-10-10 12:20:00,017 DEBUG: command: /Users/dave/micromamba/envs/dvc/bin/dvc cache migrate --dvc-files -v
2023-10-10 12:20:00,132 DEBUG: '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/fc/d1b7e28f320159a0eed658ee56ba97' doesn't look like a cache file, skipping
2023-10-10 12:20:00,132 DEBUG: '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/c8/a06d11eee4053037d215f4fe6b7ae5' doesn't look like a cache file, skipping
2023-10-10 12:20:00,132 DEBUG: '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/7c/aa1ff6b9fd082639f697e702dacb15' doesn't look like a cache file, skipping
2023-10-10 12:20:00,132 DEBUG: '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/6c/6d54acd5bbdd2df1c280353e048a81' doesn't look like a cache file, skipping
2023-10-10 12:20:00,132 DEBUG: '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/64/ec49282ee1bea38d577c92914b03be' doesn't look like a cache file, skipping
2023-10-10 12:20:00,132 DEBUG: '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/c5/e58ac55c3f8077ffac66b0e9b3536a' doesn't look like a cache file, skipping
2023-10-10 12:20:00,133 DEBUG: '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/77/43d704b8b97cc303ceef12b9be6832' doesn't look like a cache file, skipping
2023-10-10 12:20:00,133 DEBUG: '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/runs/8a/8a452bca2ba538e7f61d05005ec4dc767d7816cac5d6aadb210d08d48199feb3/adabb599095d094710498f18ba3e0926c9524edfe4c04cc627510b7888a26005' doesn't look like a cache file, skipping
Migrated 0 files to DVC 3.0 cache location.
2023-10-10 12:20:00,181 DEBUG: Computed stage: 'data_split' md5: '3dac47b5a1676654cccaa1f5e6c91b31'
2023-10-10 12:20:00,186 DEBUG: Preparing to transfer data from 'memory://dvc-staging-md5/4bbbf317c3ac8cf5983aeba433a41dd1f4163e7a0d16a4bc334eb4b434dddd39' to '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,186 DEBUG: Preparing to collect status from '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,186 DEBUG: Collecting status from '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,187 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/.ZLPRM7GziAWMJceMmm3U3n.tmp'
2023-10-10 12:20:00,187 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/.ZLPRM7GziAWMJceMmm3U3n.tmp'
2023-10-10 12:20:00,187 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/.EheJZDbFqNXwPmU2UDR55c.tmp'
2023-10-10 12:20:00,188 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/X_train.pkl'
2023-10-10 12:20:00,190 DEBUG: Preparing to transfer data from 'memory://dvc-staging-md5/4bbbf317c3ac8cf5983aeba433a41dd1f4163e7a0d16a4bc334eb4b434dddd39' to '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,190 DEBUG: Preparing to collect status from '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,190 DEBUG: Collecting status from '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,190 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/.AewxZ7DKsQ7LcSjZgfMVAL.tmp'
2023-10-10 12:20:00,191 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/.AewxZ7DKsQ7LcSjZgfMVAL.tmp'
2023-10-10 12:20:00,191 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/.hTSLQL6p32jW4KYWvyHYyM.tmp'
2023-10-10 12:20:00,191 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/X_test.pkl'
2023-10-10 12:20:00,193 DEBUG: Preparing to transfer data from 'memory://dvc-staging-md5/4bbbf317c3ac8cf5983aeba433a41dd1f4163e7a0d16a4bc334eb4b434dddd39' to '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,193 DEBUG: Preparing to collect status from '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,193 DEBUG: Collecting status from '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,193 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/.gLvSaATJ8ytt542gChAfy4.tmp'
2023-10-10 12:20:00,194 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/.gLvSaATJ8ytt542gChAfy4.tmp'
2023-10-10 12:20:00,194 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/.Ag4g6fexMAJ4JgYJHFLZZ7.tmp'
2023-10-10 12:20:00,194 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/y_train.pkl'
2023-10-10 12:20:00,196 DEBUG: Preparing to transfer data from 'memory://dvc-staging-md5/4bbbf317c3ac8cf5983aeba433a41dd1f4163e7a0d16a4bc334eb4b434dddd39' to '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,196 DEBUG: Preparing to collect status from '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,196 DEBUG: Collecting status from '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5'
2023-10-10 12:20:00,196 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/.UfKsCU2eSBpYUvEgDFBqpJ.tmp'
2023-10-10 12:20:00,197 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/.UfKsCU2eSBpYUvEgDFBqpJ.tmp'
2023-10-10 12:20:00,197 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/.dvc/cache/files/md5/.VU5RWTbyfNVAi8R5tM6pCf.tmp'
2023-10-10 12:20:00,197 DEBUG: Removing '/Users/dave/repo/demo-bank-customer-churn/data/y_test.pkl'
2023-10-10 12:20:00,199 ERROR: cannot dump a parametrized stage: 'data_split'
Traceback (most recent call last):
  File "/Users/dave/Code/dvc/dvc/cli/__init__.py", line 209, in main
    ret = cmd.do_run()
          ^^^^^^^^^^^^
  File "/Users/dave/Code/dvc/dvc/cli/command.py", line 26, in do_run
    return self.run()
           ^^^^^^^^^^
  File "/Users/dave/Code/dvc/dvc/commands/cache.py", line 47, in run
    commit_2_to_3(self.repo, dry=self.args.dry)
  File "/Users/dave/Code/dvc/dvc/repo/__init__.py", line 61, in wrapper
    return f(repo, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Code/dvc/dvc/repo/scm_context.py", line 151, in run
    return method(repo, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Code/dvc/dvc/repo/commit.py", line 121, in commit_2_to_3
    stage.dump()
  File "/Users/dave/Code/dvc/dvc/stage/__init__.py", line 783, in dump
    self.dvcfile.dump(self, **kwargs)
  File "/Users/dave/Code/dvc/dvc/dvcfile.py", line 251, in dump
    self._dump_pipeline_file(stage)
  File "/Users/dave/Code/dvc/dvc/dvcfile.py", line 265, in _dump_pipeline_file
    self._check_if_parametrized(stage)
  File "/Users/dave/Code/dvc/dvc/dvcfile.py", line 262, in _check_if_parametrized
    raise ParametrizedDumpError(f"cannot {action} a parametrized {stage}")
dvc.dvcfile.ParametrizedDumpError: cannot dump a parametrized stage: 'data_split'

2023-10-10 12:20:00,202 DEBUG: Analytics is enabled.
2023-10-10 12:20:00,219 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/24/99_tf1xj3vx8k1k_jkdmnhq00000gn/T/tmpgjjn73fk']'
2023-10-10 12:20:00,220 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/24/99_tf1xj3vx8k1k_jkdmnhq00000gn/T/tmpgjjn73fk']'
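
The traceback above ends in `ParametrizedDumpError`, raised when the migration tries to write the `data_split` stage back to `dvc.yaml`. As a rough pre-flight check, the sketch below flags stages in an already-parsed `dvc.yaml` mapping that look parametrized. The heuristic (a `foreach` or `matrix` key, or any `${...}` interpolation in the definition) is an assumption for illustration and is not DVC's own check. The helper names and the example stage definitions are hypothetical, since the demo repo's `dvc.yaml` is not shown in this thread.

```python
from typing import Any


def _has_interpolation(value: Any) -> bool:
    """Return True if any string in the nested structure contains '${'."""
    if isinstance(value, str):
        return "${" in value
    if isinstance(value, dict):
        return any(
            _has_interpolation(k) or _has_interpolation(v)
            for k, v in value.items()
        )
    if isinstance(value, (list, tuple)):
        return any(_has_interpolation(v) for v in value)
    return False


def likely_parametrized_stages(dvc_yaml: dict) -> list[str]:
    """List stage names that appear to be templated (heuristic, an assumption)."""
    names = []
    for name, definition in dvc_yaml.get("stages", {}).items():
        if not isinstance(definition, dict):
            continue
        if (
            "foreach" in definition
            or "matrix" in definition
            or _has_interpolation(definition)
        ):
            names.append(name)
    return names


# Hypothetical stage definitions; only the stage names come from this PR thread.
example = {
    "stages": {
        "data_split": {
            "cmd": "python split.py --seed ${split.seed}",
            "outs": ["data/X_train.pkl", "data/X_test.pkl"],
        },
        "baz": {"cmd": "python baz.py", "outs": ["baz.out"]},
    }
}

print(likely_parametrized_stages(example))
```

The check does not account for escaped interpolation (`\${`), so it may over-report.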

Collaborator

@dberenbaum dberenbaum left a comment

LGTM!

@pmrowla pmrowla merged commit e5dd2bb into iterative:main Oct 12, 2023
@pmrowla pmrowla deleted the migrate-commit branch October 12, 2023 00:47
Development

Successfully merging this pull request may close these issues.

cache migrate: add an option to update dvc files
3 participants