AsyncSocket is failing to send data under backpressure (Mac,Linux) #15003
Comments
Originally reported on https://forum.nim-lang.org/t/6548
Two forum users have now reproduced the same bug on Linux (see the thread above).
I have the same issue as reported above, but if you change the code like so:
It seems to work as expected when compiled with --gc:boehm or --gc:MarkandSweep. Compiled with --gc:arc it ran OK, but memory usage climbed towards 100%. The main change is to avoid != and to increment with inc rather than + 1, as well as […]. All this on Debian-based Parrot OS.
@qqtop for async you need orc, not arc :)
@Yardanico obviously you are perfectly right and my test right now
Well yes, orc (assuming you're on devel) still kinda leaks memory with async
That's just syntactic; it doesn't affect the meaning of the code
But you also changed the await in If I take your modified test and simply change
I changed the code based on previous advice when I ran into exactly the same issue.
Just tested it on Windows: doesn't reproduce. Reproduces for me on WSL though. Edit: Fun bug. Thank you for the repro. Still looking into the cause. Edit2: Fixed #15012
Fixes #15003. This is a serious bug that occurs when data cannot be read or sent immediately and a number of other read/write events are pending. The new events are dropped, which in the case of the reported bug resulted in some data not being sent (!).
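To make the failure mode above concrete, here is a toy model in Python (not Nim, and not the actual asyncdispatch code). It only assumes what the description states: writes are queued while the socket is not writable, and some of the queued work gets forgotten. The names `dispatch`, `run`, `keep_unfinished` and `capacity_per_wakeup` are all hypothetical, chosen for illustration.

```python
from collections import deque

def dispatch(pending, try_write, keep_unfinished):
    """Run queued writes in order; stop at the first one that would block.

    pending: deque of payloads waiting for the socket to become writable.
    try_write: returns True if the payload was written, False on EAGAIN.
    keep_unfinished: if False, mimics a dispatcher that forgets the writes
    it did not get to (a hypothetical failure mode, for illustration only).
    """
    current = list(pending)
    pending.clear()
    for i, payload in enumerate(current):
        if not try_write(payload):
            if keep_unfinished:
                pending.extend(current[i:])  # retry all of these on the next wakeup
            else:
                pending.append(payload)      # only the blocked write survives
            break

def run(keep_unfinished, messages=10, capacity_per_wakeup=3):
    """Deliver `messages` payloads through a socket that accepts only
    `capacity_per_wakeup` writes each time it becomes writable."""
    pending = deque(range(messages))
    delivered = []
    while pending:
        budget = capacity_per_wakeup

        def try_write(payload):
            nonlocal budget
            if budget == 0:
                return False  # stands in for EAGAIN
            budget -= 1
            delivered.append(payload)
            return True

        dispatch(pending, try_write, keep_unfinished)
    return delivered
```

With `keep_unfinished=True` every message is eventually delivered in order; with `keep_unfinished=False` the loop terminates quietly after a handful of messages, which is the same shape as the symptom in the report: a run of sends that are neither written nor completed.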
Ah, EAGAIN strikes again... >_<
✔️ Verified the fix on macOS 10.15.
Fixes #15003. This is a serious bug that occurs when data cannot be read or sent immediately and a number of other read/write events are pending. The new events are dropped, which in the case of the reported bug resulted in some data not being sent (!). (cherry picked from commit 1e3a0ef)
AsyncSocket.send loses data in the presence of backpressure, i.e. when the receiver reads data more slowly than the sender sends it. At some point soon after the OS's send buffer fills up, some number of pending send calls are ignored: their data is never written to the actual socket, and their Futures are not completed.
This is quite reproducible on macOS 10.14 and 10.15; I don't have convenient access to other platforms to test on.
Occurs in Nim 1.2.2 as well as latest devel.
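For readers less familiar with the OS side of this, here is a minimal sketch in Python (not Nim, and not the original repro) of what backpressure looks like on a non-blocking socket. While nobody reads, the kernel buffer fills up and further sends fail with EAGAIN/EWOULDBLOCK, so an async layer has to hold on to the data and retry once the socket is writable again. The function name `fill_until_blocked` and the chunk size are made up for illustration.

```python
import socket

def fill_until_blocked(chunk_size=4096):
    """Write to a non-blocking socket until the kernel refuses more data.

    Returns (bytes_accepted, bytes_received_after_draining).
    """
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    reader.setblocking(False)
    chunk = b"x" * chunk_size
    try:
        accepted = 0
        # Nobody is reading yet, so the send buffer eventually fills up.
        while True:
            try:
                accepted += writer.send(chunk)
            except BlockingIOError:  # EAGAIN / EWOULDBLOCK
                break

        # Now act as the (slow) receiver and drain everything that was accepted.
        received = 0
        while True:
            try:
                data = reader.recv(65536)
            except BlockingIOError:  # nothing left to read
                break
            received += len(data)
        return accepted, received
    finally:
        writer.close()
        reader.close()
```

Every byte the kernel accepted comes out the other end, so at this level nothing is lost. Anything that was refused with EAGAIN is the caller's responsibility to keep and resend, and that is the step that goes wrong in this report.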
Example
Current Output
Note that messages 112…2073 were dropped on the floor: neither completed nor sent over the socket.
Expected Output
Should run forever without any assertion failures and without writing any "ERROR" lines.
Additional Information
This completely breaks the WebSocket-based protocol implementation I'm working on, in which one side typically sends large numbers of small messages. The bug did not show up as long as I was running my tests on one machine (over the loopback interface), because the receiver process was able to keep up with the sender (about 15 megabytes/sec); but it showed up very quickly as soon as I ran any tests on two machines, where WiFi bandwidth+latency was a bottleneck.
(Was pulled and built this morning.)