-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-38319: Fix shutil._fastcopy_sendfile(): set sendfile() max block size #16491
Conversation
Lib/shutil.py
Outdated
@@ -136,6 +136,7 @@ def _fastcopy_sendfile(fsrc, fdst): | |||
# changes while being copied. | |||
try: | |||
blocksize = max(os.fstat(infd).st_size, 2 ** 23) # min 8MB | |||
blocksize = min(blocksize, 2 ** 31 - 2) # max 2GB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not 2 ** 31 - 1
?
Would not be better to use a power of two, 2**30
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, used 2**30
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds inefficient on 64-bit system: files larger than 2 GB will need multiple sendfile() syscalls, rather currently a single one is enough, no? Maybe use sys.maxsize instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my 64-bit Ubuntu 18.04 box sendfile() always copies up to 2GB per call, never more than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well ok, but the kernel can be updated to support larger copies. Why should Python put an arbitrary limit on a syscall? Well, you're the expert, it's up to you. It was just a comment ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think it makes sense. I added:
+ if sys.maxsize < 2 ** 32: # 32-bit architecture
+ blocksize = min(blocksize, 2 ** 30) # max 1GiB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...I did that for shutil only. In socket.py
I decided to always limit to max 1GiB: since we're dealing with a socket I want to exercise the loop more often (and hit the selector(), which implements the timeout logic).
Lib/shutil.py
Outdated
@@ -136,6 +136,7 @@ def _fastcopy_sendfile(fsrc, fdst): | |||
# changes while being copied. | |||
try: | |||
blocksize = max(os.fstat(infd).st_size, 2 ** 23) # min 8MB | |||
blocksize = min(blocksize, 2 ** 31 - 2) # max 2GB | |||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exceptions besides OSError
can be raised? I do not feel comfortable with silencing a wide range of exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
Lib/socket.py
Outdated
@@ -357,6 +357,7 @@ def _sendfile_use_sendfile(self, file, offset=0, count=None): | |||
if not fsize: | |||
return 0 # empty file | |||
blocksize = fsize if not count else count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blocksize = fsize if not count else count | |
blocksize = count or fsize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I would use binary prefixes for exact powers of two (MiB, GiB).
Misc/NEWS.d/next/Library/2019-09-30-22-06-33.bpo-38319.5QjiDa.rst
Outdated
Show resolved
Hide resolved
Lib/shutil.py
Outdated
except Exception: | ||
blocksize = max(os.fstat(infd).st_size, 2 ** 23) # min 8MiB | ||
if sys.maxsize < 2 ** 32: # 32-bit architecture | ||
blocksize = min(blocksize, 2 ** 30) # max 1GiB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a comment explaining this 1 GiB limit?
Lib/socket.py
Outdated
@@ -356,7 +356,7 @@ def _sendfile_use_sendfile(self, file, offset=0, count=None): | |||
raise _GiveupOnSendfile(err) # not a regular file | |||
if not fsize: | |||
return 0 # empty file | |||
blocksize = fsize if not count else count | |||
blocksize = min(count or fsize, 2 ** 30) # max 1GiB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to truncate the size on 64-bit system? Please add a comment. I suggest to write this instruction in two limits, to make the purpose of each line more explicit. Lilke:
blocksize = fsize if not count else count
# bpo-38319: truncate to 1 GiB to prevent OverflowError
blocksize = min(blocksize, 2 ** 30)
1 GiB sounds an arbitrary limit. If the developer wants a fine control on the timing, they can pass a smaller "count", no? The current implementation of _sendfile_use_sendfile() doesn't seem to care about timeout. Can't it be implemented using a non-blocking socket and poll until the socket is ready to send? It should prevent to block in sendfile(), no? asyncio uses sendfile() on non-blocking socket, so it should be possible to do something similar in socket to respect the timeout :-) To fix https://bugs.python.org/issue38319 and only fix this issue: |
I think it is better to use a block size which is a power of two. It guaranties that it is divisible by the page size and by the logical or physical disk block size. 1 GiB is the largest power of two which not larger than I think it is worth to convert a count to |
Oh, I see. Sorry, I was lost in all these comments on the PR. That's why I asked for a quick comment added directly in the code to explain that. |
Thanks @giampaolo for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8. |
I'm having trouble backporting to |
Thanks @giampaolo for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8. |
GH-16506 is a backport of this pull request to the 3.8 branch. |
Thanks for the code review. |
Thanks @giampaolo for the fix, but also for your overall work: it's really cool that Python can use transparently sendfile() internally! I like this syscall() :-) |
An attempt to fix:
https://bugs.python.org/issue38319
https://bugs.python.org/issue38319