Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http get() call creates malformed dest path #1505

Open
cgrass opened this issue Jan 16, 2024 · 1 comment
Open

http get() call creates malformed dest path #1505

cgrass opened this issue Jan 16, 2024 · 1 comment

Comments

@cgrass
Copy link

cgrass commented Jan 16, 2024

See original discussion here and a linked bug here

Issue

Creating an http file system and using an rpath with query params results in a malformed destination path.

Steps to Reproduce

Using the test below, update lpath to a path you can write to.

def test_http_output():
    kwargs = {}
    fs = fsspec.implementations.http.HTTPFileSystem(fsspec.filesystem("https", **kwargs))
    expected_output_path = "/var/myloc/outputfile.txt"
    rpath = "https://httpbin.org/gzip?test=value"
    lpath = expected_output_path
   fs.get(rpath, lpath)

This behavior only happens (in my testing) when the rpath uses query params. That triggers exists to be set to True because has_magic returns True.

It also requires lpath and rpath to be strings rather than lists.

Expected Results

A file is written to /var/myloc/outputfile.txt

Actual Results

A file is written to /var/myloc/outputfile.txt/gzip\?test=value

Notes

It's also possible to test the underlying path construction using testutils.py test_other_paths and adding the following parameterized test: (["/path1"], "/path2/path3.txt", True, ["/path2/path3.txt"])

@cgrass
Copy link
Author

cgrass commented Jan 16, 2024

I don't really understand the use of has_magic when determining the exists flag. it seems to pick up on the query param segment and change the behavior of the downstream (other_paths) call. without ?test=value the output is /var/myloc/outputfile.txt, but if the query is included the result is /var/myloc/outputfile.txt/gzip\?test=value; is that intended?

breaking apart the other_paths method into use-case specific implementations might allow the interface to be cleaned up (no boolean flags) and the code to be more specific. for instance, exists doesn't make sense when called from the get() flow, as the source should always exist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant