Fix missing folders http glob #1516
Looks like the python-3.10 and python-3.11 test failures are unrelated. The failing tests are for the smb filesystem.
A long-standing flakiness I have yet to track down.
fsspec/tests/conftest.py
@pytest.fixture(scope="session")
def stdlib_simple_http_server(tmp_path_factory):
Why do we need a different server implementation?
I initially found the issue in tests with the stdlib http.server module, and needed a nested folder to reproduce. I'll check again if I can use/extend the existing test server.
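For readers who want to reproduce the setup described here, the following is a minimal sketch of serving a temporary directory with a nested folder through the stdlib http.server module. It is not the fixture from this PR. The helper names (`make_tree`, `serve_directory`, `fetch`) and the nested file name `file3` are assumptions made for illustration only.

```python
import functools
import http.server
import pathlib
import threading
import urllib.request
from contextlib import contextmanager


def make_tree(root):
    """Create two files and one nested folder (file3 is a hypothetical name)."""
    root = pathlib.Path(root)
    (root / "file1").write_text("a")
    (root / "file2").write_text("b")
    (root / "folder1").mkdir()
    (root / "folder1" / "file3").write_text("c")
    return root


@contextmanager
def serve_directory(root):
    """Serve `root` on a free localhost port and yield the base URL."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(root)
    )
    # Port 0 asks the OS for any free port.
    httpd = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{httpd.server_address[1]}"
    finally:
        httpd.shutdown()
        httpd.server_close()
        thread.join()


def fetch(url):
    """Return the body of `url` as text, bypassing any configured proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        return response.read().decode()
```

The directory listing returned for the root contains a link to `folder1/` with a trailing slash, which is how the stdlib server marks directories in its generated HTML.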
assert out == [
    stdlib_simple_http_server + "/file1",
    stdlib_simple_http_server + "/file2",
    stdlib_simple_http_server + "/folder1/",
Is this a "folder" because it ends in "/" or because it contains further things? I think this is the crux of the original issue: in most filesystems, a path cannot end in "/" (the folder's actual path is without the slash).
Your comment made me realize that I was not entirely aware of what is and is not a directory for http filesystems.
It's a folder because it contains further things. Or if I understand correctly: It's a folder because for a request to http://server/somepath/folder1/ the server returns html that contains links other than "." and ".."
> It's a folder because for a request to http://server/somepath/folder1/ the server returns html that contains links other than "." and ".."
Yes, or more precisely links with the same_prefix, like
- http://server/folder is a folder if it contains at least one link to
As you say, we exclude relative links "." and ".." but also links to completely different paths.
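The rule discussed above can be sketched roughly as follows. This is an illustration of the idea only, not fsspec's actual implementation: the names `child_links` and `looks_like_folder` are hypothetical, and the sketch assumes the listing is plain HTML with `<a href=...>` links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def child_links(url, html):
    """Return absolute links in `html` that live underneath `url`.

    Relative links "." and ".." are skipped, and so are links that
    point to a completely different path or host.
    """
    base = url.rstrip("/") + "/"
    parser = _LinkCollector()
    parser.feed(html)
    children = []
    for href in parser.hrefs:
        if href.rstrip("/") in (".", ".."):
            continue
        absolute = urljoin(base, href)
        # Keep only links that share the folder's own prefix.
        if absolute.startswith(base) and absolute != base:
            children.append(absolute)
    return children


def looks_like_folder(url, html):
    """Treat `url` as a folder if its page links to at least one child."""
    return bool(child_links(url, html))
```

Under this sketch the trailing slash plays no role in the decision; only the links found in the response body do.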
Given this discussion, do you still wish to continue with this PR?
Sorry, I accidentally marked this as completed.
I'll make the changes and will update this PR when I find some free time.
5fda691 to 1a7df23
I made the changes to use the existing The first commit adds the failing test, and the second commit restores the glob behavior.
Closes #1515
This PR fixes a change in behaviour of HTTPFileSystem.glob which was introduced in 2023.12.0
TODO: