[C++] Reject trailing slashes on file path #20316

Closed
asfimport opened this issue Jul 11, 2022 · 1 comment

asfimport commented Jul 11, 2022

We had several different behaviors when passing in file paths with trailing slashes: LocalFileSystem would return IOError, S3 would trim off the trailing slash, and GCS would keep the trailing slash as part of the file name (later creating confusion as the file would be labelled a "directory" in list calls). This PR moves them all to the behavior of LocalFileSystem: return IOError.
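As a rough illustration of the unified behavior, the check amounts to something like the sketch below. The helper name `reject_trailing_slash` and the error message are hypothetical, not the actual C++ implementation; the only thing taken from this issue is that a file path ending in a slash should produce an IOError (surfaced in Python as OSError).

```python
def reject_trailing_slash(path: str) -> str:
    """Hypothetical helper: return the path unchanged, or raise if it ends in '/'.

    This only mirrors the behavior described in the issue (file paths with a
    trailing slash are rejected with an IOError); the name and message are
    assumptions made for illustration.
    """
    if path.endswith("/"):
        # IOError is an alias of OSError in Python 3.
        raise OSError(f"Expected a file path but got a trailing slash: {path!r}")
    return path
```

With this check in front of every "open file" call, "py_test/test.txt" passes through and "py_test/test.txt/" fails early, regardless of which filesystem backend is used.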

The R filesystem bindings relied on the behavior provided by S3, so they are now modified to trim the trailing slash before passing down to C++.
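The trimming done on the bindings side can be pictured roughly as follows. This is a Python sketch of the idea only: the real change is in the R bindings, and the function name `trim_trailing_slashes` and its handling of a bare "/" are assumptions.

```python
def trim_trailing_slashes(path: str) -> str:
    """Hypothetical helper: drop trailing '/' characters before calling into C++.

    A path made up only of slashes (for example "/") is returned unchanged so
    that it is not turned into an empty string; that edge-case handling is an
    assumption, not something stated in the issue.
    """
    trimmed = path.rstrip("/")
    return trimmed if trimmed else path
```

Callers that previously relied on S3 silently dropping the slash would then pass `trim_trailing_slashes(path)` down instead of `path`.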

Here is an example of the differences in behavior between S3 and GCS:

import pyarrow.fs
from pyarrow.fs import FileSelector
from datetime import timedelta

gcs = pyarrow.fs.GcsFileSystem(
    endpoint_override="localhost:9001",
    scheme="http",
    anonymous=True,
    retry_time_limit=timedelta(seconds=1),
)

gcs.create_dir("py_test")

# Writing to test.txt with and without slash produces a file and a directory!?
with gcs.open_output_stream("py_test/test.txt") as out_stream:
    out_stream.write(b"Hello world!")
with gcs.open_output_stream("py_test/test.txt/") as out_stream:
    out_stream.write(b"Hello world!")
gcs.get_file_info(FileSelector("py_test"))
# [<FileInfo for 'py_test/test.txt': type=FileType.File, size=12>, <FileInfo for 'py_test/test.txt': type=FileType.Directory>]

s3 = pyarrow.fs.S3FileSystem(
    access_key="minioadmin",
    secret_key="minioadmin",
    scheme="http",
    endpoint_override="localhost:9000",
    allow_bucket_creation=True,
    allow_bucket_deletion=True,
)

s3.create_dir("py-test")

# Writing to test.txt with and without slash writes to same file
with s3.open_output_stream("py-test/test.txt") as out_stream:
    out_stream.write(b"Hello world!")
with s3.open_output_stream("py-test/test.txt/") as out_stream:
    out_stream.write(b"Hello world!")
s3.get_file_info(FileSelector("py-test"))
# [<FileInfo for 'py-test/test.txt': type=FileType.File, size=12>]

Reporter: Will Jones / @wjones127
Assignee: Will Jones / @wjones127


Note: This issue was originally created as ARROW-17045. Please see the migration documentation for further details.

Antoine Pitrou / @pitrou:
Issue resolved by pull request 13577
#13577
