push: fails with an HTTP external URL tracked by a .dvc file #10065

Closed
mdekstrand opened this issue Nov 4, 2023 · 1 comment · Fixed by #10093
Labels: A: data-sync (Related to dvc get/fetch/import/pull/push); bug (Did we break something?); p0-critical (Critical issue. Needs to be fixed ASAP.)


@mdekstrand

Bug Report

Description

I have a project with external data files added via import-url. With the latest DVC (3.28), dvc push fails with a NotImplementedError raised from the HTTP filesystem's walk method:

Backtrace
2023-11-04 15:21:53,740 ERROR: unexpected error
Traceback (most recent call last):
  File "/home/mde48/miniforge3/envs/lkpm/lib/python3.10/site-packages/dvc/cli/__init__.py", line 209, in main
    ret = cmd.do_run()
  File "/home/mde48/miniforge3/envs/lkpm/lib/python3.10/site-packages/dvc/cli/command.py", line 26, in do_run
    return self.run()
  File "/home/mde48/miniforge3/envs/lkpm/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 64, in run
    processed_files_count = self.repo.push(
  File "/home/mde48/miniforge3/envs/lkpm/lib/python3.10/site-packages/dvc/repo/__init__.py", line 61, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/mde48/miniforge3/envs/lkpm/lib/python3.10/site-packages/dvc/repo/push.py", line 115, in push
    push_transferred, push_failed = ipush(
  File "/home/mde48/miniforge3/envs/lkpm/lib/python3.10/site-packages/dvc_data/index/push.py", line 59, in push
    old = build(data.path, data.fs)
  File "/home/mde48/miniforge3/envs/lkpm/lib/python3.10/site-packages/dvc_data/index/build.py", line 80, in build
    for entry in build_entries(path, fs, ignore=ignore):
  File "/home/mde48/miniforge3/envs/lkpm/lib/python3.10/site-packages/dvc_data/index/build.py", line 52, in build_entries
    walk_iter = fs.walk(path)
  File "/home/mde48/miniforge3/envs/lkpm/lib/python3.10/site-packages/dvc_http/__init__.py", line 153, in walk
    raise NotImplementedError
NotImplementedError
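
To make the failure mode in the backtrace easier to see, here is a minimal sketch of the pattern it suggests: an index builder calls walk on whatever filesystem it is handed, and a filesystem that cannot list directories raises NotImplementedError. The class names HTTPLikeFileSystem and LocalLikeFileSystem and the helper try_build are hypothetical illustrations, not DVC's actual code, and catching the error this way is not necessarily how #10093 fixed the bug.

```python
import os


class HTTPLikeFileSystem:
    """Hypothetical stand-in for a filesystem that cannot list directories."""

    def walk(self, path):
        # HTTP has no standard directory-listing operation, so walking is unsupported.
        raise NotImplementedError


class LocalLikeFileSystem:
    """Hypothetical stand-in for a filesystem that supports walking."""

    def walk(self, path):
        return os.walk(path)


def build_entries(path, fs):
    """Yield file paths under `path`, using a walk-based scan like the one in the traceback."""
    for root, _dirs, files in fs.walk(path):
        for name in files:
            yield os.path.join(root, name)


def try_build(path, fs):
    """Return the list of entries, or None when the filesystem cannot be walked."""
    try:
        return list(build_entries(path, fs))
    except NotImplementedError:
        return None
```

Calling try_build with the HTTP-like filesystem returns None instead of propagating the error, which is one possible way a caller could skip sources it cannot enumerate.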

Reproduce

  1. dvc init
  2. dvc import-url https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
  3. dvc stage add -n extract -d ml-latest-small.zip -o ml-latest-small unzip ml-latest-small.zip
  4. dvc repro
  5. dvc remote add test ../test-remote
  6. dvc push -r test
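
The steps above can be scripted so the regression is easy to re-check across DVC versions. This is a rough sketch, assuming dvc and unzip are on PATH and that the working directory is an empty, already initialized Git repository; the names REPRO_STEPS and run_steps are made up for illustration.

```python
import shlex
import subprocess

# The six commands from the report, in order.
REPRO_STEPS = [
    "dvc init",
    "dvc import-url https://files.grouplens.org/datasets/movielens/ml-latest-small.zip",
    "dvc stage add -n extract -d ml-latest-small.zip -o ml-latest-small unzip ml-latest-small.zip",
    "dvc repro",
    "dvc remote add test ../test-remote",
    "dvc push -r test",
]


def run_steps(steps, cwd):
    """Run each step in `cwd`; return the first command that fails, or None if all succeed."""
    for step in steps:
        result = subprocess.run(shlex.split(step), cwd=cwd)
        if result.returncode != 0:
            return step
    return None
```

On an affected version, run_steps(REPRO_STEPS, cwd) should return the final push command, going by the report.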

Expected

I expected the cached outputs (and maybe a copy of the input, though I think that's a separate issue) to be pushed to the remote.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.28.0 (conda)
---------------------------
Platform: Python 3.10.13 on Linux-5.15.0-88-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 2.20.0
        dvc_objects = 1.1.0
        dvc_render = 0.6.0
        dvc_task = 0.3.0
        scmrepo = 1.4.1
Supports:
        http (aiohttp = 3.8.6, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.6, aiohttp-retry = 2.8.3)
Config:
        Global: /home/mde48/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Caches: local
Remotes: local
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/45dbcc8b97aba74d36934a2f402ce97a
@dberenbaum
Collaborator

I can confirm that I'm able to reproduce this. It works fine in 3.27.0, and the failure seems to have been introduced in 3.27.1. I'm guessing it's related to #9807 (cc @efiop).
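
For anyone triaging their own install, the version boundary in the comment above can be expressed as a small check. This is only a sketch: the helper names are hypothetical, and because the release containing the fix (#10093) is not stated in this thread, the function reports only whether a version is at or after the regression point, not whether it is still broken.

```python
# First release reported as failing in the comment above.
FIRST_AFFECTED = (3, 27, 1)


def parse_version(text):
    """Turn a dotted version such as '3.28.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def at_or_after_regression(version_text):
    """True if the version is 3.27.1 or newer; later releases may already include the fix."""
    return parse_version(version_text) >= FIRST_AFFECTED
```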

@dberenbaum dberenbaum added bug Did we break something? p0-critical Critical issue. Needs to be fixed ASAP. A: data-sync Related to dvc get/fetch/import/pull/push labels Nov 6, 2023
@dberenbaum dberenbaum added this to DVC Nov 6, 2023
@github-project-automation github-project-automation bot moved this to Backlog in DVC Nov 6, 2023
@dberenbaum dberenbaum moved this from Backlog to Todo in DVC Nov 6, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in DVC Nov 15, 2023