Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-38319: Fix shutil._fastcopy_sendfile(): set sendfile() max block size #16491

Merged
merged 6 commits into from
Oct 1, 2019
Merged

bpo-38319: Fix shutil._fastcopy_sendfile(): set sendfile() max block size #16491

merged 6 commits into from
Oct 1, 2019

Conversation

giampaolo
Copy link
Contributor

@giampaolo giampaolo commented Sep 30, 2019

Lib/shutil.py Outdated
@@ -136,6 +136,7 @@ def _fastcopy_sendfile(fsrc, fdst):
# changes while being copied.
try:
blocksize = max(os.fstat(infd).st_size, 2 ** 23) # min 8MB
blocksize = min(blocksize, 2 ** 31 - 2) # max 2GB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not 2 ** 31 - 1?

Would not be better to use a power of two, 2**30?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, used 2**30.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds inefficient on 64-bit system: files larger than 2 GB will need multiple sendfile() syscalls, rather currently a single one is enough, no? Maybe use sys.maxsize instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my 64-bit Ubuntu 18.04 box sendfile() always copies up to 2GB per call, never more than that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well ok, but the kernel can be updated to support larger copies. Why should Python put an arbitrary limit on a syscall? Well, you're the expert, it's up to you. It was just a comment ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think it makes sense. I added:

+        if sys.maxsize < 2 ** 32:  # 32-bit architecture
+            blocksize = min(blocksize, 2 ** 30)  # max 1GiB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...I did that for shutil only. In socket.py I decided to always limit to max 1GiB: since we're dealing with a socket I want to exercise the loop more often (and hit the selector(), which implements the timeout logic).

Lib/shutil.py Outdated
@@ -136,6 +136,7 @@ def _fastcopy_sendfile(fsrc, fdst):
# changes while being copied.
try:
blocksize = max(os.fstat(infd).st_size, 2 ** 23) # min 8MB
blocksize = min(blocksize, 2 ** 31 - 2) # max 2GB
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exceptions besides OSError can be raised? I do not feel comfortable with silencing a wide range of exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Lib/socket.py Outdated
@@ -357,6 +357,7 @@ def _sendfile_use_sendfile(self, file, offset=0, count=None):
if not fsize:
return 0 # empty file
blocksize = fsize if not count else count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
blocksize = fsize if not count else count
blocksize = count or fsize

@vstinner vstinner changed the title BPO-38319 set sendfile() max block size bpo-38319: Fix shutil._fastcopy_sendfile(): set sendfile() max block size Sep 30, 2019
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I would use binary prefixes for exact powers of two (MiB, GiB).

Lib/shutil.py Outdated
except Exception:
blocksize = max(os.fstat(infd).st_size, 2 ** 23) # min 8MiB
if sys.maxsize < 2 ** 32: # 32-bit architecture
blocksize = min(blocksize, 2 ** 30) # max 1GiB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment explaining this 1 GiB limit?

Lib/socket.py Outdated
@@ -356,7 +356,7 @@ def _sendfile_use_sendfile(self, file, offset=0, count=None):
raise _GiveupOnSendfile(err) # not a regular file
if not fsize:
return 0 # empty file
blocksize = fsize if not count else count
blocksize = min(count or fsize, 2 ** 30) # max 1GiB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to truncate the size on 64-bit system? Please add a comment. I suggest to write this instruction in two limits, to make the purpose of each line more explicit. Lilke:

blocksize = fsize if not count else count
# bpo-38319: truncate to 1 GiB to prevent OverflowError
blocksize = min(blocksize, 2 ** 30)

@vstinner
Copy link
Member

...I did that for shutil only. In socket.py I decided to always limit to max 1GiB: since we're dealing with a socket I want to exercise the loop more often (and hit the selector(), which implements the timeout logic).

1 GiB sounds an arbitrary limit. If the developer wants a fine control on the timing, they can pass a smaller "count", no?

The current implementation of _sendfile_use_sendfile() doesn't seem to care about timeout. Can't it be implemented using a non-blocking socket and poll until the socket is ready to send? It should prevent to block in sendfile(), no? asyncio uses sendfile() on non-blocking socket, so it should be possible to do something similar in socket to respect the timeout :-)

To fix https://bugs.python.org/issue38319 and only fix this issue: size = min(size, sys.maxsize) should be enough and work on any platform, no?

@serhiy-storchaka
Copy link
Member

I think it is better to use a block size which is a power of two. It guaranties that it is divisible by the page size and by the logical or physical disk block size. 1 GiB is the largest power of two which not larger than sys.maxsize on 32-bit platform.

I think it is worth to convert a count to size_t instead of Py_ssize_t in future. This will allow to raise the limit to 2 GiB. But this is a separate issue.

@vstinner
Copy link
Member

I think it is better to use a block size which is a power of two. It guaranties that it is divisible by the page size and by the logical or physical disk block size. 1 GiB is the largest power of two which not larger than sys.maxsize on 32-bit platform.

Oh, I see. Sorry, I was lost in all these comments on the PR. That's why I asked for a quick comment added directly in the code to explain that.

@giampaolo giampaolo merged commit 94e1650 into python:master Oct 1, 2019
@giampaolo giampaolo deleted the sendfile-32bit-2 branch October 1, 2019 03:40
@miss-islington
Copy link
Contributor

Thanks @giampaolo for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

I'm having trouble backporting to 3.8. Reason: 'Error 110 while writing to socket. Connection timed out.'. Please retry by removing and re-adding the needs backport to 3.8 label.

@miss-islington
Copy link
Contributor

Thanks @giampaolo for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-16506 is a backport of this pull request to the 3.8 branch.

@giampaolo
Copy link
Contributor Author

Thanks for the code review.

ambv pushed a commit that referenced this pull request Oct 1, 2019
…size (GH-16491) (#16506)

(cherry picked from commit 94e1650)

Co-authored-by: Giampaolo Rodola <g.rodola@gmail.com>
@vstinner
Copy link
Member

vstinner commented Oct 1, 2019

Thanks @giampaolo for the fix, but also for your overall work: it's really cool that Python can use transparently sendfile() internally! I like this syscall() :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants