We had several different behaviors when passing in file paths with trailing slashes: LocalFileSystem would return IOError, S3 would trim off the trailing slash, and GCS would keep the trailing slash as part of the file name (later creating confusion as the file would be labelled a "directory" in list calls). This PR moves them all to the behavior of LocalFileSystem: return IOError.
The R filesystem bindings relied on the behavior provided by S3, so they are now modified to trim the trailing slash before passing down to C++.
Here is an example of the differences in behavior between S3 and GCS:
import pyarrow.fs
from pyarrow.fs import FileSelector
from datetime import timedelta

gcs = pyarrow.fs.GcsFileSystem(
    endpoint_override="localhost:9001",
    scheme="http",
    anonymous=True,
    retry_time_limit=timedelta(seconds=1),
)
gcs.create_dir("py_test")

# Writing to test.txt with and without slash produces a file and a directory!?
with gcs.open_output_stream("py_test/test.txt") as out_stream:
    out_stream.write(b"Hello world!")
with gcs.open_output_stream("py_test/test.txt/") as out_stream:
    out_stream.write(b"Hello world!")
gcs.get_file_info(FileSelector("py_test"))
# [<FileInfo for 'py_test/test.txt': type=FileType.File, size=12>, <FileInfo for 'py_test/test.txt': type=FileType.Directory>]

s3 = pyarrow.fs.S3FileSystem(
    access_key="minioadmin",
    secret_key="minioadmin",
    scheme="http",
    endpoint_override="localhost:9000",
    allow_bucket_creation=True,
    allow_bucket_deletion=True,
)
s3.create_dir("py-test")

# Writing to test.txt with and without slash writes to same file
with s3.open_output_stream("py-test/test.txt") as out_stream:
    out_stream.write(b"Hello world!")
with s3.open_output_stream("py-test/test.txt/") as out_stream:
    out_stream.write(b"Hello world!")
s3.get_file_info(FileSelector("py-test"))
# [<FileInfo for 'py-test/test.txt': type=FileType.File, size=12>]
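For callers that relied on the old S3 behavior, the trimming that the R bindings now do before calling into C++ could be mirrored on the Python side. The following is only a minimal sketch: `trim_trailing_slash` is a hypothetical helper, not part of `pyarrow`, and keeping a bare root path unchanged is an assumption made for illustration.

```python
def trim_trailing_slash(path: str) -> str:
    """Hypothetical helper: drop trailing slashes from a file path.

    Intended to run before a path is handed to a filesystem call that
    rejects file paths ending in "/".
    """
    trimmed = path.rstrip("/")
    # Assumption: a path made only of slashes (e.g. the root "/") is
    # returned unchanged rather than collapsed to an empty string.
    return trimmed if trimmed else path


# Both spellings from the example above normalize to the same file path.
normalized = trim_trailing_slash("py-test/test.txt/")
unchanged = trim_trailing_slash("py-test/test.txt")
```

A caller would then pass the normalized path to a call such as `open_output_stream`, so the path never reaches the filesystem with a trailing slash.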
Reporter: Will Jones / @wjones127
Assignee: Will Jones / @wjones127
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-17045. Please see the migration documentation for further details.