Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing folders http glob #1516

Merged
merged 2 commits into from
Apr 10, 2024

Conversation

ap--
Copy link
Contributor

@ap-- ap-- commented Jan 26, 2024

Closes #1515

This PR fixes a change in behaviour of HTTPFileSystem.glob which was introduced in 2023.12.0

TODO:

  • don't require a new http test server fixture

@ap--
Copy link
Contributor Author

ap-- commented Jan 26, 2024

Looks like the python-3.10 and python-3.11 test failures are unrelated. The failing tests are for the smb filesystem.

@martindurant
Copy link
Member

Looks like the python-3.10 and python-3.11 test failures are unrelated. The failing tests are for the smb filesystem.

A long standing flakiness I have yet to track down.



@pytest.fixture(scope="session")
def stdlib_simple_http_server(tmp_path_factory):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a different server implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially found the issue in tests with the stdlib http.server module, and needed a nested folder to reproduce. I'll check again if I can use/extend the existing test server.

assert out == [
stdlib_simple_http_server + "/file1",
stdlib_simple_http_server + "/file2",
stdlib_simple_http_server + "/folder1/",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a "folder" because it ends in "/" or because it contains further things? I think this is the crux of the original issue - in most filesystems, a path cannot end in "/" (the folder's actual path is without the slash).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment made me realize that I was not entirely aware of what is and is not a directory for http filesystems.

It's a folder because it contains further things. Or if I understand correctly: It's a folder because for a request to http://server/somepath/folder1/ the server returns html that contains links other than "." and ".."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a folder because for a request to http://server/somepath/folder1/ the server returns html that contains links other than "." and ".."

Yes, or more precisely links with the same_prefix , like

As you say, we exclude relative links "." and ".." but also links to completely different paths.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this discussion, do you still wish to continue with this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I accidentally marked this as completed.

I'll make the changes and will update this PR when I find some free time.

@ap-- ap-- force-pushed the fix-missing-folders-http-glob branch from 5fda691 to 1a7df23 Compare April 7, 2024 17:35
@ap--
Copy link
Contributor Author

ap-- commented Apr 7, 2024

I made the changes to use the existing HTTPTestHandler in test/conftest.py and cleaned them up and force pushed to this PR.

The first commit adds the failing test, and the second commit restores the glob behavior.

@martindurant martindurant merged commit 2dd9355 into fsspec:master Apr 10, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTPFileSystem.glob misses folders on fsspec>2023.10.0
2 participants