CachingFileSystem / SFTP read error #748

Closed
t-arsicaud-catie opened this issue Sep 10, 2021 · 2 comments

Comments

@t-arsicaud-catie

Hi,

I'm trying to use the caching file system together with the SFTP file system, using the following code:

import os

import fsspec.implementations.cached

SFTP_HOST="localhost"
SFTP_PORT=2022
SFTP_USER = "user"
SFTP_PASSWORD = "passwd"

CACHE_DIR = "./cache/"

if not os.path.isdir(CACHE_DIR):
    os.mkdir(CACHE_DIR)

fs = fsspec.implementations.cached.CachingFileSystem(
    target_protocol="sftp",
    cache_storage=CACHE_DIR,
    target_options = {
        "host": SFTP_HOST,
        "username": SFTP_USER,
        "password": SFTP_PASSWORD,
        "port": SFTP_PORT,
        "look_for_keys": False,
        "allow_agent": False
        }
    )

test_file = "/test.txt"

with fs.open(test_file, "wb") as f:
    f.write(b"Hello !")

assert fs.exists(test_file)

with fs.open(test_file, "rb") as f:
    print(f.read())

The "test.txt" file is created on the server as expected, but I get an error when I try to read it back:

Traceback (most recent call last):
  File "debug-sftp.py", line 38, in <module>
    with fs.open(test_file_path_name, "rb") as f:
  File "/home/thierry/.virtualenvs/fsspec/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 393, in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
  File "/home/thierry/.virtualenvs/fsspec/lib/python3.8/site-packages/fsspec/spec.py", line 978, in open
    f = self._open(
  File "/home/thierry/.virtualenvs/fsspec/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 393, in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
  File "/home/thierry/.virtualenvs/fsspec/lib/python3.8/site-packages/fsspec/implementations/cached.py", line 328, in _open
    detail["blocksize"] = f.blocksize
AttributeError: 'SFTPFile' object has no attribute 'blocksize'

Are there special options I should use to avoid this error, or am I missing something?

Similar code using CachingFileSystem with S3 works: the test file is created and read back without error.

Tested with fsspec==2021.8.1.

@martindurant
Member

Unfortunately, CachingFileSystem, which stores only the specific blocks you have accessed, works only with files that subclass AbstractBufferedFile: so yes for s3, no for sftp. You can still use WholeFileCacheFileSystem or SimpleCacheFileSystem, however.
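
As a rough illustration of that suggestion, here is a minimal sketch that reuses the options from the original report but swaps in WholeFileCacheFileSystem. The helper names `sftp_cache_kwargs` and `make_whole_file_cache` are hypothetical, introduced only for this example. The sketch assumes the whole-file cache accepts the same `target_protocol`, `cache_storage` and `target_options` arguments as CachingFileSystem. It has not been run against the reporter's server.

```python
def sftp_cache_kwargs(host, username, password, port=22, cache_dir="./cache/"):
    """Build constructor arguments for a whole-file cache layered over SFTP.

    Hypothetical helper: it only assembles the same options that the
    original report passed to CachingFileSystem.
    """
    return {
        "target_protocol": "sftp",
        "cache_storage": cache_dir,
        "target_options": {
            "host": host,
            "username": username,
            "password": password,
            "port": port,
            "look_for_keys": False,
            "allow_agent": False,
        },
    }


def make_whole_file_cache(**kwargs):
    """Create a WholeFileCacheFileSystem from the arguments above."""
    # Imported lazily so that building the options does not require fsspec.
    from fsspec.implementations.cached import WholeFileCacheFileSystem

    return WholeFileCacheFileSystem(**kwargs)


# Example use (needs fsspec, paramiko and a reachable SFTP server):
# fs = make_whole_file_cache(
#     **sftp_cache_kwargs("localhost", "user", "passwd", port=2022)
# )
# with fs.open("/test.txt", "rb") as f:
#     print(f.read())
```

Because this class copies the entire remote file into the cache directory on first read, it should not depend on the remote file object exposing `blocksize`, which is where the traceback above fails.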

@t-arsicaud-catie
Author

OK, thank you very much for your quick answer and these useful explanations!
