
SSH: ValueError: Can't create any SFTP connections! #16

Closed
gcoter opened this issue Oct 24, 2021 · 56 comments · Fixed by iterative/dvc#9311
gcoter commented Oct 24, 2021

Bug Report

I open this issue as a follow up to iterative/dvc#6138

Description

dvc push raises an error when trying to push to an SFTP remote. It used to work with older versions. The SFTP remote I use is a personal Raspberry Pi server. I did not change anything on the server.

Reproduce

Unfortunately, since I use a private server, I don't know whether it would be easy to reproduce.

After updating dvc, I tried to run dvc push and I got these logs:

$ dvc push -v -a
2021-10-03 11:59:51,299 DEBUG: Preparing to transfer data from '.dvc/cache' to 'ssh://<REMOTE URL>/gcoter/music-generation-v2.dvc'
2021-10-03 11:59:51,300 DEBUG: Preparing to collect status from 'ssh://<REMOTE URL>/gcoter/music-generation-v2.dvc'
2021-10-03 11:59:51,318 DEBUG: Collecting status from 'ssh://<REMOTE URL>/gcoter/music-generation-v2.dvc'
2021-10-03 11:59:51,918 DEBUG: Querying 38 hashes via object_exists
2021-10-03 12:00:30,881 ERROR: unexpected error - Can't create any SFTP connections!                                                                                         
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/command/data_sync.py", line 57, in run
    processed_files_count = self.repo.push(
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/repo/push.py", line 48, in push
    pushed += self.cloud.push(obj_ids, jobs, remote=remote, odb=odb)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/data_cloud.py", line 85, in push
    return transfer(
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/transfer.py", line 221, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 160, in compare_status
    dest_exists, dest_missing = status(
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 122, in status
    exists = hashes.intersection(
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 36, in _indexed_dir_hashes
    indexed_dir_exists.update(odb.list_hashes_exists(indexed_dirs))
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 420, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 411, in exists_with_progress
    ret = self.fs.exists(path_info)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 96, in exists
    return self.fs.exists(self._with_bucket(path_info))
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 480, in _exists
    await self._info(path)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/sshfs/spec.py", line 135, in _info
    async with self._pool.get() as channel:
  File "/usr/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/home/gcoter/.cache/pypoetry/virtualenvs/music-generation-v2-7Nqe6-3y-py3.8/lib/python3.8/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!
------------------------------------------------------------
2021-10-03 12:00:31,971 DEBUG: Version info for developers:
DVC version: 2.7.2 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.4.0-88-generic-x86_64-with-glibc2.29
Supports:
	http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
	https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
	ssh (sshfs = 2021.9.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda5
Caches: local
Remotes: ssh
Workspace directory: ext4 on /dev/sda5
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-10-03 12:00:31,973 DEBUG: Analytics is enabled.
2021-10-03 12:00:32,040 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpkmytybwl']'
2021-10-03 12:00:32,043 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpkmytybwl']'

Expected

Since I did not change the configuration of my server and it used to work, I would expect dvc push to work.

Environment information

$ dvc doctor
DVC version: 2.7.2 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Supports:
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.5),
        ssh (sshfs = 2021.9.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda5
Caches: local
Remotes: ssh
Workspace directory: ext4 on /dev/sda5
Repo: dvc, git

efiop commented Oct 24, 2021

@gcoter Maybe you could try to reproduce it with a Docker image using the same config options? So far we have not been able to reproduce it ourselves.


gcoter commented Oct 24, 2021

@efiop Yes, that is a good idea. Actually, I deployed the SFTP server as a Docker container on my Raspberry Pi; however, the image I used was built for ARM. I will try to reproduce the error locally on my computer with the original Docker image.


sjawhar commented Nov 6, 2021

I created a quick SSH server in a Docker container, mounted my DVC cache into it, and set it as the remote for my local project. Running dvc pull didn't result in an error, but it did open over 1000 connections (according to netstat -tn). Is that expected?


sjawhar commented Nov 6, 2021

Also potentially relevant: when I get this error, it is always during the step of querying the remote cache. If I retry enough times and get past that step, then the actual up/downloading always succeeds.
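Since later attempts reliably get past the status-query step, one crude interim workaround is to retry the whole command. A minimal retry helper sketch (the `dvc pull` invocation in the comment is illustrative, not part of DVC's API):

```python
import time


def retry(fn, attempts=5, delay=1.0):
    """Call fn() until it returns, retrying on any exception up to `attempts` times."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)


# Hypothetical usage, wrapping the whole pull in retries:
# retry(lambda: subprocess.run(["dvc", "pull"], check=True))
```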

@karajan1001

I created a quick SSH server in a Docker container, mounted my DVC cache into it, and set it as the remote for my local project. Running dvc pull didn't result in an error, but it did open over 1000 connections (according to netstat -tn). Is that expected?

Did you set any --jobs related config, and how many cores does your computer have?


sjawhar commented Nov 7, 2021

Did you set any --jobs related config, and how many cores does your computer have?

No, I didn't use the --jobs flag. 8 cores, 16 threads.

@karajan1001

Looks like we need to add some limits on the status query.
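A sketch of what such a limit could look like on the client side, assuming the exists-checks are issued as coroutines (names here are illustrative, not DVC's actual API): a semaphore caps how many checks run concurrently, so the SFTP channel pool is never asked for hundreds of channels at once.

```python
import asyncio


async def exists_all(paths, check, limit=10):
    """Run check(path) for every path, but never more than `limit` at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(path):
        async with sem:  # blocks when `limit` checks are already in flight
            return await check(path)

    return await asyncio.gather(*(guarded(p) for p in paths))
```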


gcoter commented Dec 19, 2021

Hi all, sorry for my late response! As I was about to try to reproduce this issue locally (as proposed by @efiop), I upgraded dvc to the latest version (2.9.2) and now it works 🙂

@sjawhar Maybe you can try it and confirm whether it solves the issue for you as well?


sjawhar commented Dec 29, 2021

Unfortunately still an issue on 2.9.3

$ dvc pull --verbose --recursive pipelines/finger_tapping/
2021-12-29 21:15:54,657 DEBUG: Adding '/home/user/app/.dvc/config.local' to gitignore file.
2021-12-29 21:15:54,679 DEBUG: Adding '/home/user/app/.dvc/tmp' to gitignore file.
2021-12-29 21:15:54,679 DEBUG: Adding '/home/user/app/.dvc/cache' to gitignore file.
2021-12-29 21:15:54,687 DEBUG: Checking if stage 'pipelines/finger_tapping/' is in 'dvc.yaml'
2021-12-29 21:15:55,608 DEBUG: Preparing to transfer data from '/usr/data/project/dvc' to '/home/user/app/.dvc/cache'
2021-12-29 21:15:55,608 DEBUG: Preparing to collect status from '/home/user/app/.dvc/cache'
2021-12-29 21:15:55,619 DEBUG: Collecting status from '/home/user/app/.dvc/cache'
2021-12-29 21:15:55,735 DEBUG: Preparing to collect status from '/usr/data/project/dvc'                                                                                                                                                                                                                                                                                                                                   
2021-12-29 21:15:55,740 DEBUG: Collecting status from '/usr/data/project/dvc'
2021-12-29 21:15:55,856 DEBUG: Querying 126 hashes via object_exists
2021-12-29 21:16:14,510 ERROR: unexpected error - Can't create any SFTP connections!                                                                                                                                                                                                                                                                                                                                                              
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/data_sync.py", line 30, in run
    stats = self.repo.pull(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/pull.py", line 29, in pull
    processed_files_count = self.fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 67, in fetch
    d, f = _fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 87, in _fetch
    downloaded += repo.cloud.pull(obj_ids, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/data_cloud.py", line 114, in pull
    return transfer(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 164, in compare_status
    src_exists, src_missing = status(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 122, in status
    exists = hashes.intersection(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 36, in _indexed_dir_hashes
    indexed_dir_exists.update(odb.list_hashes_exists(indexed_dirs))
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 421, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 412, in exists_with_progress
    ret = self.fs.exists(fs_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 91, in exists
    return self.fs.exists(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 549, in _exists
    await self._info(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/spec.py", line 135, in _info
    async with self._pool.get() as channel:
  File "/usr/local/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!
------------------------------------------------------------
2021-12-29 21:16:14,984 DEBUG: Adding '/home/user/app/.dvc/config.local' to gitignore file.
2021-12-29 21:16:14,990 DEBUG: Adding '/home/user/app/.dvc/tmp' to gitignore file.
2021-12-29 21:16:14,990 DEBUG: Adding '/home/user/app/.dvc/cache' to gitignore file.
2021-12-29 21:16:14,991 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 18] Invalid cross-device link
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/data_sync.py", line 30, in run
    stats = self.repo.pull(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/pull.py", line 29, in pull
    processed_files_count = self.fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 67, in fetch
    d, f = _fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 87, in _fetch
    downloaded += repo.cloud.pull(obj_ids, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/data_cloud.py", line 114, in pull
    return transfer(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 164, in compare_status
    src_exists, src_missing = status(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 122, in status
    exists = hashes.intersection(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 36, in _indexed_dir_hashes
    indexed_dir_exists.update(odb.list_hashes_exists(indexed_dirs))
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 421, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 412, in exists_with_progress
    ret = self.fs.exists(fs_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 91, in exists
    return self.fs.exists(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 549, in _exists
    await self._info(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/spec.py", line 135, in _info
    async with self._pool.get() as channel:
  File "/usr/local/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
    func(from_path, to_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/local.py", line 148, in reflink
    System.reflink(from_info, to_info)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/system.py", line 112, in reflink
    System._reflink_linux(source, link_name)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/system.py", line 96, in _reflink_linux
    fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 18] Invalid cross-device link

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
    return _link(link, from_fs, from_path, to_fs, to_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
    raise OSError(
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
    _try_links([link], from_fs, from_file, to_fs, to_file)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
    raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2021-12-29 21:16:14,992 DEBUG: Removing '/home/user/.VxePWiE728u2v5gnSQT3vY.tmp'
2021-12-29 21:16:14,992 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'hardlink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 18] Invalid cross-device link: '/home/user/app/.dvc/cache/.RhJXijpmS46m4MKMHUFkXk.tmp' -> '/home/user/.VxePWiE728u2v5gnSQT3vY.tmp'
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/command/data_sync.py", line 30, in run
    stats = self.repo.pull(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/pull.py", line 29, in pull
    processed_files_count = self.fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 67, in fetch
    d, f = _fetch(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/repo/fetch.py", line 87, in _fetch
    downloaded += repo.cloud.pull(obj_ids, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/data_cloud.py", line 114, in pull
    return transfer(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 164, in compare_status
    src_exists, src_missing = status(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 122, in status
    exists = hashes.intersection(
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/status.py", line 36, in _indexed_dir_hashes
    indexed_dir_exists.update(odb.list_hashes_exists(indexed_dirs))
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 421, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/objects/db/base.py", line 412, in exists_with_progress
    ret = self.fs.exists(fs_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 91, in exists
    return self.fs.exists(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/fsspec/asyn.py", line 549, in _exists
    await self._info(path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/spec.py", line 135, in _info
    async with self._pool.get() as channel:
  File "/usr/local/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
    func(from_path, to_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/local.py", line 141, in hardlink
    System.hardlink(from_info, to_info)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/system.py", line 39, in hardlink
    os.link(src, link_name)
OSError: [Errno 18] Invalid cross-device link: '/home/user/app/.dvc/cache/.RhJXijpmS46m4MKMHUFkXk.tmp' -> '/home/user/.VxePWiE728u2v5gnSQT3vY.tmp'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
    return _link(link, from_fs, from_path, to_fs, to_path)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
    raise OSError(
OSError: [Errno 95] 'hardlink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
    _try_links([link], from_fs, from_file, to_fs, to_file)
  File "/home/user/.cache/pypoetry/virtualenvs/app-gcEbEm5J-py3.8/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
    raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2021-12-29 21:16:14,993 DEBUG: Removing '/home/user/.VxePWiE728u2v5gnSQT3vY.tmp'
2021-12-29 21:16:14,993 DEBUG: Removing '/home/user/.VxePWiE728u2v5gnSQT3vY.tmp'
2021-12-29 21:16:14,993 DEBUG: Removing '/home/user/app/.dvc/cache/.RhJXijpmS46m4MKMHUFkXk.tmp'
2021-12-29 21:16:15,000 DEBUG: Version info for developers:
DVC version: 2.9.3 (pip)
---------------------------------
Platform: Python 3.8.8 on Linux-5.15.8-76051508-generic-x86_64-with-glibc2.2.5
Supports:
        hdfs (fsspec = 2021.11.1, pyarrow = 4.0.1),
        webhdfs (fsspec = 2021.11.1),
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.4.6),
        ssh (sshfs = 2021.11.2)
Cache types: symlink
Cache directory: ext4 on /dev/mapper/data-root
Caches: local
Remotes: ssh, ssh
Workspace directory: ext4 on /dev/mapper/data-root
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-12-29 21:16:15,002 DEBUG: Analytics is enabled.
2021-12-29 21:16:15,085 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpiyk0nprz']'
2021-12-29 21:16:15,088 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpiyk0nprz']'


efiop commented Dec 29, 2021

@sjawhar Could you try --jobs 1?

Also, how many dvc files do you have in pipelines/finger_tapping/? Could you run

find pipelines/finger_tapping -type f -name '*.dvc' | wc -l

@mistermult
Copy link

mistermult commented Jan 26, 2022

I had the same error. Unfortunately, it is hard to replicate:

  • host1, with no (or almost no) files to push: dvc push in repository root - no error
  • host2: dvc push in repository root - error
  • host2: ran dvc push --recursive . in each subdirectory (500 files/500 GB total; 30 files/100 GB total; 1-5 files/50 GB). Files had sizes up to 90 GB - no error
  • host2: dvc push in repository root after doing it in each subdirectory (see above) - no error anymore

I also have directories with tens of thousands of files in the repository, but they were pushed in the past without any error.

@gcoter @sjawhar Maybe you can try it with a much older version, e.g. 1.2.x. In the past I did exactly the same steps with 1.2.x that now produce this error with the newest version, and it worked.

@gcoter
Copy link
Author

gcoter commented Feb 4, 2022

Hi @mistermult, thanks for your feedback 🙂 Indeed, an older version worked for me as well, and I only encountered this issue with a more recent version.

But since I have upgraded DVC (#16), I don't have this issue anymore. Which version of DVC are you using? In my case, I think the issue disappeared after version 2.9.2.

@gcoter
Copy link
Author

gcoter commented Feb 4, 2022

But it is weird because upgrading did not work for @sjawhar 🙁

@sjawhar
Copy link

sjawhar commented Feb 4, 2022

@sjawhar Could you try --jobs 1?

Also, how many dvc files do you have in pipelines/finger_tapping/? Could you run

find pipelines/finger_tapping -type f -name '*.dvc' | wc -l

I've tried with --jobs 1; it still fails intermittently. Upgrading to 2.8.x helped a bit in that it fails less often, but still intermittently. I can't yet upgrade to 2.9.x because that completely breaks on our infrastructure and I haven't had time to figure out why.

There are quite a few outputs in this repo, which might be why I have the issue. There are 18 or so .dvc files, which each track a directory that contains several files. Then each of those 18 directories gets processed through 10 or so stages (foreach), each of which also outputs a directory. So, lots of files, lots of directories.

@sjawhar
Copy link

sjawhar commented Mar 9, 2022

I'm getting this error still with 2.9.5

@ilankor
Copy link

ilankor commented Mar 10, 2022

Hi,
I am having the same issue with "unexpected error - Can't create any SFTP connections!" when running dvc push/pull.

Would appreciate any help!

@pared
Copy link
Contributor

pared commented Mar 10, 2022

@ilankor
reached out to us on discord:

stack trace:

2022-03-07 16:39:45,515 DEBUG: Preparing to transfer data from 'ssh://server/' to '.dvc/cache'
2022-03-07 16:39:45,516 DEBUG: Preparing to collect status from '.dvc/cache'
2022-03-07 16:39:46,512 DEBUG: Collecting status from '.dvc/cache'
2022-03-07 16:40:01,213 DEBUG: Preparing to collect status from 'ssh://server/'
2022-03-07 16:40:02,125 DEBUG: Collecting status from 'ssh://server'
2022-03-07 16:40:02,126 DEBUG: Querying 128 hashes via object_exists
2022-03-07 16:40:05,930 ERROR: unexpected error - Can't create any SFTP connections!
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/command/data_sync.py", line 41, in run
    glob=self.args.glob,
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/repo/pull.py", line 38, in pull
    run_cache=run_cache,
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/repo/fetch.py", line 72, in fetch
    odb=odb,
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/repo/fetch.py", line 87, in _fetch
    downloaded += repo.cloud.pull(obj_ids, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/data_cloud.py", line 121, in pull
    verify=odb.verify,
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/status.py", line 167, in compare_status
    src, obj_ids, index=src_index, **kwargs
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/status.py", line 123, in status
    _indexed_dir_hashes(odb, index, dir_objs, name, cache_odb)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/status.py", line 48, in _indexed_dir_hashes
    dir_exists.update(odb.list_hashes_exists(dir_hashes - dir_exists))
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/db/base.py", line 415, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/objects/db/base.py", line 406, in exists_with_progress
    ret = self.fs.exists(path_info)
  File "/home/eden/.local/lib/python3.6/site-packages/dvc/fs/fsspec_wrapper.py", line 136, in exists
    return self.fs.exists(self._with_bucket(path_info))
  File "/home/eden/.local/lib/python3.6/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/home/eden/.local/lib/python3.6/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/eden/.local/lib/python3.6/site-packages/fsspec/asyn.py", line 555, in _exists
    await self._info(path)
  File "/home/eden/.local/lib/python3.6/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/eden/.local/lib/python3.6/site-packages/sshfs/spec.py", line 135, in _info
    async with self._pool.get() as channel:
  File "/home/eden/.local/lib/python3.6/site-packages/sshfs/compat.py", line 23, in __aenter__
    return await self.gen.__anext__()
  File "/home/eden/.local/lib/python3.6/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!
------------------------------------------------------------
2022-03-07 16:40:06,057 DEBUG: Version info for developers:
DVC version: 2.8.1 (pip)
---------------------------------
Platform: Python 3.6.9 on Linux-4.15.0-169-generic-x86_64-with-Ubuntu-18.04-bionic
Supports:
        webhdfs (fsspec = 2022.1.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        ssh (sshfs = 2021.11.2)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda1
Caches: local
Remotes: ssh
Workspace directory: ext4 on /dev/sda1
Repo: dvc, git
 
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-03-07 16:40:06,058 DEBUG: Analytics is enabled.
2022-03-07 16:40:06,083 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpjgmju351']'
2022-03-07 16:40:06,094 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpjgmju351']'

discussion:
https://discord.com/channels/485586884165107732/485596304961962003/950780018999562250

@dberenbaum
Copy link

@dtrifiro You might want to keep an eye on this one.

@ilankor
Copy link

ilankor commented Mar 10, 2022

Thanks! The strange thing is that other users can run dvc push/pull from their own profiles on the same server.

@efiop efiop transferred this issue from iterative/dvc Jan 1, 2023
@ulie50
Copy link

ulie50 commented Jan 11, 2023

I am having the same issue with dvc 2.41.1 (dvc-ssh 2.20.0) when I try to push two CSV files (each about 50 MB) to the SSH server. The server runs OpenSSH_7.9p1, and I can transfer files with scp to the folder I specified in dvc remote add ('/opt/textminer/text_miner_dvc' in my case). Here is the stack trace:

2023-01-11 10:33:38,297 DEBUG: Preparing to transfer data from '/home/usr/git/textminer_api/.dvc/cache' to '/opt/textminer/text_miner_dvc'
2023-01-11 10:33:38,298 DEBUG: Preparing to collect status from '/opt/textminer/text_miner_dvc'
2023-01-11 10:33:38,298 DEBUG: Collecting status from '/opt/textminer/text_miner_dvc'
2023-01-11 10:33:38,794 ERROR: unexpected error - Can't create any SFTP connections!                                                                                                                   
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/commands/data_sync.py", line 59, in run
    processed_files_count = self.repo.push(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/repo/push.py", line 92, in push
    result = self.cloud.push(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/data_cloud.py", line 143, in push
    return self.transfer(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc/data_cloud.py", line 124, in transfer
    return transfer(src_odb, dest_odb, objs, **kwargs)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_data/hashfile/transfer.py", line 190, in transfer
    status = compare_status(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 179, in compare_status
    dest_exists, dest_missing = status(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 151, in status
    odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 367, in oids_exist
    remote_size, remote_oids = self._estimate_remote_size(
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 244, in _estimate_remote_size
    remote_oids = set(iter_with_pbar(oids))
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 234, in iter_with_pbar
    for oid in oids:
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 200, in _oids_with_limit
    for oid in self._list_oids(prefix):
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 190, in _list_oids
    for path in self._list_paths(prefix):
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/db.py", line 174, in _list_paths
    yield from self.fs.find(self.fs.path.join(*parts), prefix=bool(prefix))
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/dvc_objects/fs/base.py", line 366, in find
    yield from self.fs.find(path)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 113, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 98, in sync
    raise return_result
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 735, in _find
    async for _, dirs, files in self._walk(path, maxdepth, detail=True, **kwargs):
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/fsspec/asyn.py", line 607, in _walk
    listing = await self._ls(path, detail=True, **kwargs)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/sshfs/spec.py", line 197, in _ls
    async with self._pool.get() as channel:
  File "/usr/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "/home/usr/git/textminer_api/analysis_of_existing_model/.venv/lib/python3.8/site-packages/sshfs/pools/soft.py", line 38, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!
------------------------------------------------------------
2023-01-11 10:33:38,985 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-01-11 10:33:38,985 DEBUG: Removing '/home/usr/git/.EXtcfySXBt3evPxN663izM.tmp'
2023-01-11 10:33:38,985 DEBUG: Removing '/home/usr/git/.EXtcfySXBt3evPxN663izM.tmp'
2023-01-11 10:33:38,985 DEBUG: Removing '/home/usr/git/.EXtcfySXBt3evPxN663izM.tmp'
2023-01-11 10:33:38,986 DEBUG: Removing '/home/usr/git/textminer_api/.dvc/cache/.GLXQSdg6vWyNowTkYfQpk2.tmp'
2023-01-11 10:33:38,989 DEBUG: Version info for developers:
DVC version: 2.41.1 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.15.0-56-generic-x86_64-with-glibc2.29
Subprojects:
        dvc_data = 0.29.0
        dvc_objects = 0.14.1
        dvc_render = 0.0.17
        dvc_task = 0.1.9
        dvclive = 1.3.2
        scmrepo = 0.1.5
Supports:
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        ssh (sshfs = 2022.6.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: ssh
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git

@Cnly
Copy link

Cnly commented Feb 5, 2023

Same here on dvc 2.43.1. Setting the log level to DEBUG2 for the ssh server gives lines like these:

debug1: channel 8: new [server-session]
debug2: session_new: allocate (allocated 8 max 10)
debug1: session_new: session 8
debug1: session_open: channel 8
debug1: session_open: session 8: link with channel 8
debug1: server_input_channel_open: confirm session
debug1: server_input_channel_open: ctype session rchan 9 win 2097152 max 32768
debug1: input_session_request
debug1: channel 9: new [server-session]
debug2: session_new: allocate (allocated 9 max 10)
debug1: session_new: session 9
debug1: session_open: channel 9
debug1: session_open: session 9: link with channel 9
debug1: server_input_channel_open: confirm session
debug1: server_input_channel_open: ctype session rchan 10 win 2097152 max 32768
debug1: input_session_request
debug2: channel: expanding 20
debug1: channel 10: new [server-session]
debug1: session_open: channel 10
error: no more sessions
debug1: session open failed, free channel 10
debug1: channel 10: free: server-session, nchannels 11
debug1: server_input_channel_open: failure session

A workaround is to increase the value of MaxSessions in sshd_config (default is 10).
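For reference, that server-side limit lives in sshd_config; a sketch of the change (the value 32 is only an example, pick one that fits your workload):

```
# /etc/ssh/sshd_config on the remote server
# MaxSessions limits concurrent sessions (shell, login, sftp) per
# network connection; the OpenSSH default is 10.
MaxSessions 32
```

After editing, reload the SSH daemon (e.g. sudo systemctl reload sshd on systemd hosts) so the new limit takes effect.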

@efiop
Copy link
Contributor

efiop commented Feb 5, 2023

@Cnly Thanks for the research and detailed report! Looks like we might need to tweak the max_sessions option for sshfs, or maybe we should lower the default (currently 10) in https://github.com/fsspec/sshfs/blob/b912e88d4a81d15cc660f3cb2f3a52480306d277/sshfs/spec.py#L28, or we might want to switch to SFTPHardChannelPool by default. The latter seems to be the best option. Maybe you could try adjusting it (you just need to pass pool_type=SFTPHardChannelPool where the filesystem is constructed:

return _SSHFileSystem(**self.fs_args)

) and contribute a patch if it works for you?

@Cnly
Copy link

Cnly commented Feb 6, 2023

@efiop Thanks for the quick response! Unfortunately pool_type=SFTPHardChannelPool doesn't seem to fix the issue.

...
  File "xxx/venv/lib/python3.8/site-packages/dvc_objects/executors.py", line 134, in batch_coros
    result = fut.result()
  File "xxx/venv/lib/python3.8/site-packages/fsspec/asyn.py", line 568, in _exists
    await self._info(path)
  File "xxx/venv/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "xxx/venv/lib/python3.8/site-packages/sshfs/spec.py", line 125, in _info
    async with self._pool.get() as channel:
  File "xxx/.pyenv/versions/3.8.12/lib/python3.8/contextlib.py", line 171, in __aenter__
    return await self.gen.__anext__()
  File "xxx/venv/lib/python3.8/site-packages/sshfs/pools/hard.py", line 28, in get
    raise ValueError("Can't create any SFTP connections!")
ValueError: Can't create any SFTP connections!

I also tried modifying _DEFAULT_MAX_SESSIONS in sshfs/spec.py, but it seems it still tried to create 10 sessions according to the ssh server logs.

@efiop
Copy link
Contributor

efiop commented Feb 6, 2023

Thanks, @Cnly ! 🙏 Looks like we'll need a bit more research here.

@pmrowla
Copy link
Contributor

pmrowla commented Apr 1, 2023

I believe the number of sessions is set by the MaxSessions parameter in /etc/ssh/sshd_config on the server, and I believe the default is 10.

This is correct, and we do have a client-level maximum-sessions value, which is set to a limit of 10 regardless of your --jobs setting.

Setting the --jobs parameter to 4x the number of cores is grossly over the limit of 10 for all but one- or two-core CPUs. For example, my CPU has 24 cores and 32 threads, which would mean 96 or 128 simultaneous connections.

A safe default for --jobs would seem to be 8. This would allow for two other ssh/sftp connections from other applications. Those running high-capacity servers could increase --jobs as they see fit, and as their MaxSessions allows.

@pmrowla did we get many complaints about performance when it was capped at 4?

These questions are related, and no, it was not changed due to performance complaints. The issue is that the way --jobs setting in DVC works has changed vs the old behavior now that we use fsspec under the hood, and it seemed unnecessary to maintain separate defaults per DVC remote type.

Previously, --jobs was a hard limit for the number of parallel threads (with a single SFTP session per thread) used for network transfers.

The way it works now is that --jobs is a limit for the number of asyncio coroutines fsspec will allow to be active at a time. This batch of active coroutines is then also throttled by the underlying filesystem, which should be using a connection pool of whatever size the filesystem implementation decides. In sshfs, the way this is supposed to work is that we divide up any network requests between sessions in our pool, which is currently SFTPSoftChannelPool(max_sessions=10).

So in theory, it should still be safe to use a relatively high number of jobs. --jobs=128 doesn't mean that we try to open 128 simultaneous SFTP sessions; it means that we keep 128 requests queued at a time, which are actually handled only up to 10 at a time (via our session pool).

In practice, it may be that there is a problem with session pool implementations in sshfs where it doesn't properly handle cases where the # of active coroutines is larger than the pool size, which is why I initially suggested that we just use fs.config.max_sessions as the allowed maximum for --jobs.
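The queueing-vs-sessions distinction above can be sketched with a plain asyncio semaphore standing in for the channel pool. This is illustrative only: the names, numbers, and the semaphore are stand-ins, not the actual sshfs/fsspec implementation.

```python
import asyncio

MAX_SESSIONS = 10  # analogous to the sshfs pool size / sshd's MaxSessions
JOBS = 128         # analogous to DVC's --jobs: coroutines queued at once

async def main():
    pool = asyncio.Semaphore(MAX_SESSIONS)
    active = 0  # requests currently "holding a channel"
    peak = 0    # highest concurrency observed

    async def request(_):
        nonlocal active, peak
        async with pool:            # wait here until a channel is free
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0)  # stand-in for an SFTP round-trip
            active -= 1

    # All 128 coroutines are queued, but at most 10 hold a channel at once.
    await asyncio.gather(*(request(i) for i in range(JOBS)))
    return peak

peak = asyncio.run(main())
print("peak concurrent sessions:", peak)  # bounded by MAX_SESSIONS
```

If the pool worked this way in all cases, a high --jobs value would never exceed the server limit; the intermittent failures suggest the real pool did not always enforce the bound.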

@daavoo
Copy link
Contributor

daavoo commented Apr 1, 2023

The way it works now is that --jobs is a limit for the number of asyncio coroutines fsspec will allow to be active at a time. This batch of active coroutines is then also throttled by the underlying filesystem, which should be using a connection pool of whatever size the filesystem implementation decides. In sshfs, the way this is supposed to work is that we divide up any network requests between sessions in our pool, which is currently SFTPSoftChannelPool(max_sessions=10).

So in theory, it should still be safe to have a relatively high number of jobs. --jobs=128 doesn't mean that we try to open 128 simultaneous SFTP sessions, it means that we keep 128 queued requests at a time, that are actually only handled up to 10 at a time (via our session pool).

Thanks for the explanation 🙏

In practice, it may be that there is a problem with session pool implementations in sshfs where it doesn't properly handle cases where the # of active coroutines is larger than the pool size, which is why I initially suggested that we just use fs.config.max_sessions as the allowed maximum for --jobs.

The code appears to have been untouched for 2 years, and we have heavily changed its usage upstream; it may be worth dedicating some time to reviewing the implementation on top of that change.

@pmrowla
Copy link
Contributor

pmrowla commented Apr 1, 2023

I think we should also consider exposing max_sessions as an SSH remote config option (assuming that we look into fixing the pool behavior), given that there is now a distinct difference between --jobs and the session count, and that the server-side limit is based on the session count.

@pmrowla pmrowla moved this from Todo to Review In Progress in DVC Apr 4, 2023
@pmrowla pmrowla moved this from Review In Progress to Todo in DVC Apr 4, 2023
@pmrowla pmrowla moved this from Todo to Backlog in DVC Apr 4, 2023
@pmrowla pmrowla removed their assignment Apr 4, 2023
@drozzy
Copy link

drozzy commented Apr 4, 2023

As per my earlier comment, the only semi-viable solution is to use the old version, dvc 2.41.1. This keeps the git hooks installed by DVC, which invoke dvc push automatically, working.

However, the old version of dvc 2.41.1 now breaks the VSCode dvc plugin.

@dberenbaum
Copy link

@drozzy Does the suggestion above to use dvc remote modify [--local] my-ssh-remote jobs 4 solve your issue? This should essentially match the behavior in 2.41.1.

@drozzy
Copy link

drozzy commented Apr 4, 2023

@dberenbaum Yes, dvc remote modify my-ssh-remote jobs 4 fixed the issue.
Thank you.

@pmrowla
Copy link
Contributor

pmrowla commented Apr 6, 2023

There was a bug in the sshfs soft channel pool handling that caused this issue in cases where --jobs exceeded the server's MaxSessions count. This will be fixed in the next sshfs/dvc-ssh release.

After the fix, it should no longer be necessary for most users to set --jobs for SSH remotes (and it will be safe to use the default number of jobs even with a high CPU core count). The soft channel pool will open as many channels as allowed by the server (up to the sshfs default of 10) and then divide up to --jobs # of coroutines between available pool channels as expected.

I think it is still worth exposing max_sessions to control the pool behavior. In some situations users may want to explicitly set this to a value lower than the server's MaxSessions in order to ensure that some number of SSH sessions are not used by DVC (i.e. to leave some dedicated number of sessions available for user ssh shell connections)

@pmrowla pmrowla moved this from Backlog to Review In Progress in DVC Apr 6, 2023
@efiop
Copy link
Contributor

efiop commented Apr 6, 2023

@pmrowla Thank you for looking into it! 🔥

@pmrowla
Copy link
Contributor

pmrowla commented Apr 6, 2023

This fix will be available in the next DVC release; in the meantime, users with pip installations can also get the fix with

$ pip install dvc-ssh==2.22.1

@Cnly
Copy link

Cnly commented Apr 6, 2023

Great work, thank you! Can confirm this fixes my case.

@johnyaku
Copy link

Thanks again for the considered response.

I can verify that DVC 2.53 is working fine in our case, even with MaxSessions at the default of 10 from a machine with 64 cores.

@Cnly
Copy link

Cnly commented Apr 13, 2023

Just a side note for pip users: you also need to update dvc[ssh] if updating dvc alone doesn't fix this: pip install -U "dvc[ssh]"

@johnyaku
Copy link

Actually, I may have spoken too soon.

My environment includes both dvc 2.53 and dvc-ssh 2.22.1.

Sometimes I can simply dvc push to an SSH host. Other times I have to specify --jobs in order to get a connection.

Some of this erratic behaviour might be due to the peculiarities of our network setup, but it is also possible that this issue is still not quite fixed. I'll collect a verbose log next time it happens, but let me know if there is any other information that might help troubleshoot.

@johnyaku
Copy link

Didn't have to wait long. Here is the (slightly truncated) log:

2023-04-13 14:26:12,443 DEBUG: v2.53.0 (conda), CPython 3.11.0 on Linux-3.10.0-1160.42.2.el7.x86_64-x86_64-with-glibc2.17
2023-04-13 14:26:12,444 DEBUG: command: /directflow/TumourProgressionGroupTemp/projects/dvc/envs/dvc_latest/bin/dvc push -r gadi-chromium-raw R_200608_ALESWA_INT_10X_NEOMET -vv
2023-04-13 14:26:12,444 TRACE: Namespace(cprofile=False, yappi=False, yappi_separate_threads=False, viztracer=False, viztracer_depth=None, viztracer_async=False, cprofile_dump=None, pdb=False, instrument=False, instrument_open=False, show_stack=False, quiet=0, verbose=2, cd='.', cmd='push', jobs=None, targets=['R_200608_ALESWA_INT_10X_NEOMET'], remote='gadi-chromium-raw', all_branches=False, all_tags=False, all_commits=False, with_deps=False, recursive=False, run_cache=False, glob=False, func=<class 'dvc.commands.data_sync.CmdDataPush'>, parser=DvcParser(prog='dvc', usage=None, description='Data Version Control', formatter_class=<class 'argparse.RawTextHelpFormatter'>, conflict_handler='error', add_help=False))
...
collecting stages
...
2023-04-13 14:26:13,232 DEBUG: Checking if stage 'R_200608_ALESWA_INT_10X_NEOMET' is in 'dvc.yaml'
2023-04-13 14:26:13,345 DEBUG: Preparing to transfer data from '/directflow/TumourProgressionGroupTemp/projects/dvc/cache' to '/g/data/a56/dvc/remotes/chromium-raw'
2023-04-13 14:26:13,345 DEBUG: Preparing to collect status from '/g/data/a56/dvc/remotes/chromium-raw'
2023-04-13 14:26:13,346 DEBUG: Collecting status from '/g/data/a56/dvc/remotes/chromium-raw'
2023-04-13 14:26:13,616 DEBUG: Querying 17 oids via object_exists
2023-04-13 14:27:59,957 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2023-04-13 14:27:59,957 DEBUG: Removing '/share/ScratchGeneral/johree/data/registries/chromium/.K4tcpXxSeNJswTpZ34eQHK.tmp'
2023-04-13 14:27:59,958 DEBUG: link type hardlink is not available ([Errno 95] no more link types left to try out)
2023-04-13 14:27:59,958 DEBUG: Removing '/share/ScratchGeneral/johree/data/registries/chromium/.K4tcpXxSeNJswTpZ34eQHK.tmp'
2023-04-13 14:27:59,961 DEBUG: Removing '/share/ScratchGeneral/johree/data/registries/chromium/.K4tcpXxSeNJswTpZ34eQHK.tmp'
2023-04-13 14:27:59,963 DEBUG: Removing '/directflow/TumourProgressionGroupTemp/projects/dvc/cache/.TKBQ6DhhuVDr92phikC6rn.tmp'
2023-04-13 14:27:59,982 DEBUG: Version info for developers:
DVC version: 2.53.0 (conda)
---------------------------
Platform: Python 3.11.0 on Linux-3.10.0-1160.42.2.el7.x86_64-x86_64-with-glibc2.17
Subprojects:
        dvc_data = 0.47.1
        dvc_objects = 0.21.1
        dvc_render = 0.3.1
        dvc_task = 0.2.0
        scmrepo = 0.2.1
Supports:
        gs (gcsfs = 2023.3.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.3.0, boto3 = 1.24.59),
        ssh (sshfs = 0.0.0)
Cache types: symlink
Cache directory: panfs on panfs://10.0.2.1/TumourProgressionGroupTemp
Caches: local
Remotes: ssh, gs, ssh, ssh, ssh, gs, local
Workspace directory: panfs on panfs://10.0.2.1/ScratchGeneral
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/a2be81959d7a3a40132a2834fa949fcd
2023-04-13 14:27:59,985 DEBUG: Analytics is enabled.
2023-04-13 14:28:00,152 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpbu336hf4']'
2023-04-13 14:28:00,156 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpbu336hf4']'

See also ...

conda list | grep dvc
# packages in environment at /directflow/TumourProgressionGroupTemp/projects/dvc/envs/dvc_latest:
dvc                       2.53.0             pyhd8ed1ab_1    conda-forge
dvc-data                  0.47.1             pyhd8ed1ab_0    conda-forge
dvc-gs                    2.22.0             pyhd8ed1ab_0    conda-forge
dvc-http                  2.30.2             pyhd8ed1ab_2    conda-forge
dvc-objects               0.21.1             pyhd8ed1ab_0    conda-forge
dvc-render                0.3.1              pyhd8ed1ab_0    conda-forge
dvc-s3                    2.21.0             pyhd8ed1ab_0    conda-forge
dvc-ssh                   2.22.1             pyhd8ed1ab_0    conda-forge
dvc-studio-client         0.6.1              pyhd8ed1ab_0    conda-forge
dvc-task                  0.2.0              pyhd8ed1ab_0    conda-forge

conda list | grep ssh
asyncssh                  2.13.1             pyhd8ed1ab_0    conda-forge
dvc-ssh                   2.22.1             pyhd8ed1ab_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
sshfs                     2023.1.0           pyhd8ed1ab_0    conda-forge

@pmrowla
Copy link
Contributor

pmrowla commented Apr 13, 2023

@johnyaku, are there other users accessing the SSH server at the same time as you? If someone else is eating up the available server-side session count, that would account for why it appears to only work intermittently for you.
(and note that this includes regular ssh shell sessions, DVC push/pull operations, or any other SFTP client connection)

You can also now use dvc remote modify <remotename> max_sessions <value> to cap the number of sessions DVC will try to use (it defaults to 10).

If you have a scenario where multiple users are regularly accessing your server (either via ssh shells or using DVC) you should consider increasing the server-side sshd MaxSessions count and/or capping max_sessions in DVC
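For example (the remote name here is a placeholder; both options are the ones mentioned above):

```
# Cap DVC at 8 of the server's sessions, leaving a couple of the
# default 10 free for interactive ssh logins:
dvc remote modify my-ssh-remote max_sessions 8

# Optionally, also lower the number of queued transfer jobs:
dvc remote modify my-ssh-remote jobs 4
```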

@johnyaku
Copy link

Thanks @pmrowla. That is indeed quite possible. We have five "data mover" nodes with fast connections to the remote, so I think most of the time I'll be the only person pushing from a particular node, but there might sometimes be two or more of us working at the same time, depending on who gets allocated where. I'll double-check this next time.
