[Bug]: Disconnect happening randomly and exiting application, WebSocket in aborted state #402
Comments
What Docker base image are you running the bot in? |
cc @lhjt |
What are your process configurations for starting FFmpeg? In my experience with ffmpeg I get that broken pipe error, but my bot doesn't close. |
This is quite a peculiar problem, but based on this information it doesn't seem like it would be related to your deployment environment. Do you happen to have any other action logs produced by your bot just before the disconnect and program exit happens? If not, would you be able to change the logging level to debug and continue running your workload until you run into this issue again, and then attach the logs before the exception happens? This may help us in producing an example where this issue can be reliably reproduced. |
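Not from the thread, just a minimal sketch of what switching the library to debug-level logging can look like, assuming a DiscordSocketClient and a plain console sink for the log output:

using System;
using System.Threading.Tasks;
using Discord;
using Discord.WebSocket;

var config = new DiscordSocketConfig
{
    // Emit the most verbose gateway messages so the moments just before a disconnect are captured.
    LogLevel = LogSeverity.Debug
};

var client = new DiscordSocketClient(config);

// Forward library log messages to the application's logger (console here for brevity).
client.Log += msg =>
{
    Console.WriteLine(msg.ToString());
    return Task.CompletedTask;
};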
I suppose the broken pipe could just be an after-effect of the socket being aborted.
using var ffmpeg = Process.Start(new ProcessStartInfo
{
FileName = "ffmpeg",
Arguments = $"-hide_banner -i {inputFilePath} -ac 2 -f s16le -ar 48000 -",
UseShellExecute = false,
RedirectStandardInput = false,
RedirectStandardOutput = true
});
await using var output = ffmpeg.StandardOutput.BaseStream;
try
{
await output.CopyToAsync(audioStream, 3840, cancellationToken);
}
finally
{
await audioStream.FlushAsync(cancellationToken);
}
If this was related, I would expect this problem to happen much more often.
Indeed, I'm really at a loss as to what it can be.
At the moment sadly I do not. I already added the first chance exception logging for more information, but it did not help much. I will see if I can add even more logging and enable the debug logs and I will reply here again once I get some more data. I found this other similar issue: #203 |
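For readers following along, a rough sketch of what first chance exception logging can look like; the console sink is an assumption for illustration, while the AppDomain.FirstChanceException hook itself is standard .NET:

using System;
using System.Runtime.ExceptionServices;

AppDomain.CurrentDomain.FirstChanceException += (sender, e) =>
{
    // Fires for every exception at the moment it is thrown, even if it is caught later,
    // which helps surface exceptions that are swallowed before they reach application code.
    Console.WriteLine($"First chance exception: {e.Exception.GetType().Name}: {e.Exception.Message}");
};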
Can you test if dragging your bot to a different vc crashes the pod? |
Yeah, I tried that, but it does not make it crash. I figured maybe it could be something similar, but perhaps an unhandled exception at the Discord.NET end? |
@lhjt I was able to get some debug logs already. First pod crash log
Second pod crash log
I did remove some unrelated logs, otherwise it would be too massive. Hope this helps. Thanks so much! |
The only Discord.Net trace I see here is the one in those logs. Would you be able to directly add the project source to your bot? This should give you a full stack trace including the line numbers for dnet, which will help me figure out the root cause. Just for reference, here is the StartAsync function |
Sure, here you go: Log with discord-net 3.1.0 source added
|
Does this error occur running the same application outside of docker? Are you able to run one process with 8 shards outside of docker for a day and see if you get the same logs? I've heard of people having issues with k8s/docker and websockets, so it might be something there. Looking at the one stack trace we get from the logs, that exception is handled properly. One piece of evidence to back this up is the continuation of code execution shown by the other logs below it.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in /src/discord-net/Discord.Net.WebSocket/ConnectionManager.cs:line 79
[08:23:05 INF] Disconnecting // here
[08:23:05 DBG] Disconnecting ApiClient // here
Maybe it has to do with the |
I'm afraid it's not possible for me to run it elsewhere, since there is shared infrastructure within the cluster (such as RabbitMQ and Redis) and nothing is reachable from the outside. I have been checking the Discord.NET code a bit and trying to trace what is happening... So it seems it is logging Disconnecting, but something goes wrong before it reaches Disconnected:
private async Task DisconnectAsync(Exception ex, bool isReconnecting)
{
if (State == ConnectionState.Disconnected) return;
State = ConnectionState.Disconnecting;
await _logger.InfoAsync("Disconnecting").ConfigureAwait(false);
await _onDisconnecting(ex).ConfigureAwait(false);
await _disconnectedEvent.InvokeAsync(ex, isReconnecting).ConfigureAwait(false); // it never gets here ?
State = ConnectionState.Disconnected;
await _logger.InfoAsync("Disconnected").ConfigureAwait(false);
}
So I am assuming an exception is thrown within one of those awaited calls. I will throw some try blocks in the Discord.NET code that I added to my project to see if I get some more output. |
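For illustration, a sketch of the kind of temporary diagnostic patch being described, applied to the locally copied DisconnectAsync (this is debugging scaffolding, not the library's actual code):

try
{
    await _onDisconnecting(ex).ConfigureAwait(false);
    await _disconnectedEvent.InvokeAsync(ex, isReconnecting).ConfigureAwait(false);
}
catch (Exception inner)
{
    // Temporary diagnostic output while debugging; rethrow so the original behaviour is unchanged.
    Console.WriteLine($"DisconnectAsync threw: {inner}");
    throw;
}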
Feel free to experiment but I think it might be related to this pr. I'll make a change to how it works and get you to try it out. |
I think I'm already seeing some results. Caught this exception, which was otherwise uncaught:
Does this give you any ideas? |
Does catching that exception stop the pod from restarting? |
Well, I have only had it running for 37m so far, but no restarts yet at all so it seems like it. Edit: I was just a minute too soon with this message, as one restarted just now with the same issue as before. Log
I see that |
Does using .NET 5 change anything? |
That's not something I can downgrade to quickly and easily, so it will be a last resort test I can do. However, if you're talking mainly about the ...
For now I will continue to find places with unhandled exceptions in discord-net code and add try-catches and logging where I can. The code is rather hard to follow, so I hope I can keep my sanity. |
So far I have traced it to a Segmentation fault (exit code 139, i.e. 128 + signal 11, SIGSEGV). I am assuming it is caused by either ffmpeg, libopus, or maybe libsodium? Things I have done that did not help:
FFmpeg version seems recent enough too:
I don't know anything about how this all works in discord-net, |
I have chosen to circumvent the issue by using Lavalink instead. It might be good to warn people that the Discord.NET audio client can kill your app with segmentation faults. |
Check The Docs
Verify Issue Source
Check your intents
Description
My application seems to exit completely following a NULL logged as a warning and a disconnection from Discord.NET (see the stacktrace). I'm not sure how it happens, since it happens very randomly (2 to 10 times per 24 hours with my production workload).
This makes me believe it has something to do with Discord and the users doing stuff with my bot.
I am running a sharded Discord client with 8 shards per process, currently 6 processes for a total of 48 shards.
However, I had the same thing happening with 1 process per shard using the regular Discord client.
They are running on a k8s cluster with .NET 6.
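For context, a hypothetical sketch of such a topology (8 shards per process out of 48 total); the PROCESS_INDEX environment variable and the shard assignment scheme are assumptions, not details from the actual deployment:

using System;
using System.Linq;
using Discord.WebSocket;

// Hypothetical: each pod learns its index (0..5) from the deployment, e.g. a StatefulSet ordinal.
int processIndex = int.Parse(Environment.GetEnvironmentVariable("PROCESS_INDEX") ?? "0");
const int shardsPerProcess = 8;
const int totalShards = 48;

// This process owns a contiguous block of 8 shard IDs out of the 48 total.
int[] shardIds = Enumerable.Range(processIndex * shardsPerProcess, shardsPerProcess).ToArray();

var client = new DiscordShardedClient(shardIds, new DiscordSocketConfig
{
    TotalShards = totalShards
});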
Some things I have done to try and solve this with no success:
Task.Run(...) so they do not block the gateway (see the sketch below).
I hope someone is able to help me trace the problem and work around it.
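A minimal sketch of that offloading pattern; the HandleMessageAsync handler is hypothetical:

using System.Threading.Tasks;
using Discord.WebSocket;

void WireUp(DiscordSocketClient client)
{
    client.MessageReceived += message =>
    {
        // Kick the real work onto the thread pool so the gateway task returns immediately.
        _ = Task.Run(() => HandleMessageAsync(message));
        return Task.CompletedTask;
    };
}

// Hypothetical handler doing the actual work.
Task HandleMessageAsync(SocketMessage message) => Task.CompletedTask;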
Version
Currently 3.0.0 and also had this problem in 3.5.0-labs
Working Version
Not exactly sure when it started
Logs
I have been logging first chance exceptions as well to hopefully find the issue, although these might be unrelated.
Perhaps "The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived'" is where things go wrong, but I know too little of Discord.NET and the API to say anything about this.
Log Excerpt
Sample
I have not been able to reproduce the issue.