Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-112341: Specify the exact file size if offset is non zero in socket.sendfile #112342

Closed
wants to merge 4 commits into from

Conversation

aisk
Copy link
Contributor

@aisk aisk commented Nov 23, 2023

I think this is a minor fix, so the news entry is not needed.

@erlend-aasland erlend-aasland changed the title gh-112341: specify the exact filesize if offset is non zero in socket.sendfile gh-112341: Specify the exact file size if offset is non zero in socket.sendfile Jan 6, 2024
@erlend-aasland
Copy link
Contributor

Is there an easy way to add tests for this?

@aisk
Copy link
Contributor Author

aisk commented Jan 11, 2024

Thanks for the review, test case added.

@erlend-aasland
Copy link
Contributor

How does this relate to os.sendfile? Will os.sendfile and socket.sendfile have divergent behaviour? OTOH, how is their behaviour now?

@aisk
Copy link
Contributor Author

aisk commented Jan 12, 2024

os.sendfile is just a wrapper function which almost directly calls the syscall function sendfile. The sendfile is just like the write, it may not send all the data you give to it.

socket.sendfile is build upon os.sendfile, which will ensure all the data will be written. To get this, it may invoke os.sendfile multiple times.

The problem this PR want to resolve is that, if we have a file with size 42, and want to send it to a socket from the start 2, we will invoke it like s.sendfile(f, offset=2, count=None).

Because we do not specify the count parameter, socket.sendfile will assume that we want send the whole file from the offset. So the count should be fsize - offset, which is 42 - 2 = 40. And the count is << 1G (max block size), so we will use count as block size.

The current codes in main branch missed the - offset, so it will use 42 as blocksize.

Then socket.sendfile will invoke os.sendfile(in_fd, out_fd, offset=2, count=42). So we give the syscall sendfile a valid range, which it's end is out of the file.

Modern systems which CPython supports with sendfile feature will accept it, and only returns the actually sent bytes (is 40 in this case). So nothing goes wrong here.

But I've tried to implement the os.sendfile via Windows's TransmitFile, and found that TransmitFile don't accept the out of range count argument. (Meanwhile, it accepts the offset is out of range to the file).

So I think this is a bug and we should fix it.

But, the is not the end of this story. I found that the TransmitFile have much more different behaviors to the Unix's sendfile, the main block is that TransmitFile does not respect the SO_SNDTIMEO option to make a timeout. Then I give up it, and just try to implement the high level socket.sendfile via TransmitFile. This PR is opened as #112337.

And then the change in this PR is not a must to have. But I still feels like it's a bug, so we should fix it 😂.

@erlend-aasland
Copy link
Contributor

Thanks, @aisk, makes sense to me! cc. @serhiy-storchaka if he want's to take a look. I'm aiming for landing this in a couple of days.

@erlend-aasland erlend-aasland self-assigned this Jan 14, 2024
@erlend-aasland
Copy link
Contributor

Could you try to compose a short NEWS entry for this? I'm unsure if we really need one; this feels like an implementation detail.

@@ -360,7 +360,7 @@ def _sendfile_use_sendfile(self, file, offset=0, count=None):
if not fsize:
return 0 # empty file
# Truncate to 1GiB to avoid OverflowError, see bpo-38319.
blocksize = min(count or fsize, 2 ** 30)
blocksize = min(count or fsize - offset, 2 ** 30)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if offset >= fsize?

Please add tests for offset == fsize and offset > fsize.

@erlend-aasland
Copy link
Contributor

See also #114077 and #114079.

@aisk
Copy link
Contributor Author

aisk commented Jan 25, 2024

There are some discussions on the original issue #112341. I think this should not be treated as a bug, so I'm closing this. If anyone wants to discuss it further, we can reopen it. Still appreciate the review!

@aisk aisk closed this Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants