-
-
Notifications
You must be signed in to change notification settings - Fork 582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Too long restart on rpi2, 3 #653
Comments
shairport-sync -V |
I'm not sure how to show systemd logs
I reconfigure it right now:
|
Thans for the post, but it's rather hard to work out what the issue is. What is BTW, you need to be privileged to read the journal, and you don't need to alter its setting. Try, e.g:
|
@a-x- : |
I also noticed this on RaspberryPi, I'm on latest commit:
Also, when it works, works great, but lot of times, streaming from iPhone produces no sound and also forked-daapd cannot stream (checking the speaker immediately unchecks it). The only solution is to restart the whole device since
|
Thanks for the report. Would you be kind enough to set the [Update] Also, the output of |
I set log level to 2 on one of them, since then I could restart shairport-sync (!!)... so no report from this one, it just happened on another, couldn't restart again, I then set verbosity to 2 and rebooted the machine (it also took some time)... I think shairport gets stuck probably somehow because I noticed the issue again by trying to set the output of forked-daapd to this instance and it immediately turned off.. Something is not the best with shairport besides this restarting issue, this is probably connected... So, again, I noticed: cannot send output from forked-daapd, cannot play from iPhone (seems to be playing and the song progresses, but nothing plays). Report after reboot:
|
Thanks for this. All that seems fine. I have it running on a 3B here on Stretch Lite 4.14.41-v7+ #1113 without problems, so I wonder, what other applications do you have running on the Pi? Something that might be worth trying is to delay automatic startup for a few seconds -- see here in the TROUBLESHOTING.md page. Is it possible to get the output of |
You are right, I forgot to turn off wifi power management, after 1 day of testing with power management turned off, for now no issues with I have an update for your document (I can make a pull request if you want), so this:
was never working for me, tried again, power management=on after reboot again (so it didn't work)... What works:
Type:
|
UPDATE: The problem still present but much less frequently... I managed to confuse shairport by trying to rewing a track streaming from iPhone Music app directly to shairport instance.. it became unresponsive, I sshed into pi, tried to stop the service, it hung... then I rebooted (it also took a minute or two). Power management is off, unfortunately logging was disabled on this instance.. I enabled it, waiting for this to happen again, then I'll send the logs |
UPDATE: Problem is present with forked-daapd source as well (so not just streaming from iPhone) I managed to capture the logs:
I don't think "Already playing." part is true... because nothing was playing for a long time. Maybe what is different with my setup is that I have three different forked-daapd libraries as source and I play from different ones at different times... of course I have to stop the play from some forked-daapd source before starting from another. After service stop issued:
So as you can see - 2 min delay before anything happens... Hope something can be seen from this? Tried attaching gdb but I already issued stop command and I didn't manage to find anything in 2 minutes because I don't really know how to use gdb correctly in this case (should I try with something else when it happens again?). |
Many thanks for your great work and for your patience. It seems that, for some reason, Shairport Sync is hanging up in certain circumstances -- not correctly terminating a session. As you'll have seen from recent comments in the development branch, some bugs have recently been fixed. The version you are using incorporates these fixes, so it would be very interesting for me to see if we could catch Shairport Sync [still] locking up. Of course, it may be unrelated. So, if you could update to the latest development version, please, and set the log verbosity to 2, it would be a big help. The amount of output on log level 2 has been modified so that it will print RTSP messages coming from the sources (all except for one very common one), so we should be able to see the requests for a play session and for the termination of a session coming in, and get an idea what's happening. It shouldn't be too chatty. The idea is that if it happens again, you would have a history in the log of all the calls to set up and terminate the now-hung-up session and of the session that was refused because the hung-up session was already playing. Using |
As a matter of interest, what output device are you using please? |
Hi, Raspberry Pi B and B+ via mostly USB soundcard (hw1,0) and one RPi uses HifiBerry Digi+. I have new report with development version:
So on one of the devices I cannot play again, It's loading the connection to speaker in mobile app (Remote), then fails... Again Already playing. ... |
Thanks. Do you have the full "history" where thread 6 was initiated and where an attempt was make to tear it down please? |
Fantastic, thanks. Let me have a look. This is great. |
So, it looks like a TEARDOWN request for thread 6 was received by not fully acted upon. Let me think a bit further about this. This is really good stuff. |
Nice, I'm happy the issue is found... today I'll dedicate some time for studying the source as well... I should probably get back to reading more C code. Then I'll try to understand how the fix is done. |
Thanks. I'm almost certain that the problem is one of overlapping concurrent execution of critical sections of threads. It seems that forked-daapd can cause this (not it's fault), so I will play around with it can see if can induce it. I may push out an update that drops down to a greater level of detail for a TEARDOWN... |
Hello again. I've just pushed an update which should give some more debug information during a TEARDOWN, so it might offer a little more precision on where the problem is arising. If you would update to it, I'd be most grateful. Then if the same scenario as you looked above happened, we could see how far the TEARDOWN got before it stalled... Meantime, I'd installing forked-daapd... |
Great, I updated on all 5 players and will send a log when the problem happens again in a day or two...... thank you |
New log: https://gist.github.com/davidhq/1aecb28d409d2814bd862768fcb941a4 I noticed one of the players with the weakest wifi has the most issues... and the one on ethernet cable never had this problem occurring. Another one with a bit problematic wifi (when it works, it works, but sometimes the signal doesn't come through) also showed this problem. Also you mentioned that forked-daapd can cause this, but as I mentioned also streaming from latest iOS as well... since I figured out this is not isolated to streaming from iOS, I'm only testing as I actually use it most of the time - via forked-daapd. |
That's really good, thanks. |
OK, here's from another instance where it just happened: journalctl: https://gist.github.com/davidhq/58d70fe3e5861bc292977de52da8ff0c /var/log/syslog: https://gist.github.com/davidhq/7cb29f906ad6ce73a7dc3ed668833b37 |
It's the same hang-up in all these cases. I might be coming back with an update to get a bit more forensic at that point. |
Hello again. I am afraid I'm having difficulty figuring out what's happening (still), so I have pushed yet another update with more debug messages in the hope that I can zero-in on where the problem is. It's a "strong signal", since it seems to be rather readily reproducible, but I can't think what it is. Perhaps, whenever you get the opportunity, you would [yet again] update and see if you can capture another incident. BTW, if you could also turn on the "delta" timing, it might turn out to be useful -- do this by setting the configuration file's Here is what you'd get if a TEARDOWN went fully to plan:
My slight guess is that it won't get past |
Ok, no problem. I'm in the mood for helping with this, so I can do as many updates as needed.. and as you say, the problem happens reliably each day so there is steady progress. |
|
Hi David. Your logs do indicate that there are problems remaining with terminating a play session reliably under some circumstances. Sadly, I can't reproduce the conditions that cause them. So, two things: one, I think enough progress has been made to move the fixes into a Release Candidate and, two, it seems to me that we should be moving to a slightly different system for terminating a player thread. It will take some time -- time I won't have for another couple of weeks. So, if you could show us some output with |
Hello ! Here logs with stats: Kitchen: https://gist.github.com/davidhq/6eb2a516ae70929eb797921a178eca72 music stuttered about 1 minute before I saved the log.. not sure if Lab log: For lab I now do suspect some strange intermittent issues... Kitchen it should not happen... Regarding stuttering in general, I also tried setting Please let me know what is visible in the logs and thank you |
Thanks for these. I don't have time to study them in detail yet -- it'll have to be mid-week next week. However, the Kitchen log is showing statistics that show that the sender's clock is way off -- look at all the adjustment of around -800 ppm. This is way outside what would be normal now. I've seen Macs needing adjustment of 130 ppm, but I've only ever heard of this before with some old Dell laptops. The other thing is that somehow lots and lots of UDP packets are being lost -- it seems that almost 10% of UDP packets are being lost. For example, at the end of the log, it seems that 57,008 resend requests were made for a total number of packets of 674,016. That is extremely high. To eliminate the possibility that it's a faulty source, can you try another, e.g. an iPhone or an iTunes machine? |
Hmm interesting, however all clocks are in sync via ntp, I also checked now... I use two instances of forked-daapd: on a beefy server on LAN and another on one of the Raspberries. I also tried streaming from a Mac:
There is a lot of resend requests in all other cases (streaming from both instances of forked-daapd, also both are on a wire). I checked for UDP packet loss with iperf3, most of the times there was 0% loss or 1-2% sometimes. hmm indeed :S |
I think I have something: I installed v3.1.1 (44fbe8b) on 3 of the devices:
There is no more packet loss (resends). Latest development version still has lots of resends:
Hopefully I didn't overlook something? |
Thanks for the very interesting update. NTP only ensures the time-of-day is correct. If the system clock is running slow or fast, then the NTP protocol will make appropriate corrections to the time, but it doesn't actually speed up or slow down the clock itself, so I'm afraid that's not really relevant. Given that the |
Our posts crossed. The If would be interesting indeed to put the latest version on the It's hard to believe that you could have two Raspberry Pis with clocks that are so far out of whack. So, do you have something else running on them that might be loading up the USB or Etherenet subsystems? (It used to be that Pis would drop packets and other undesirable things if their USB ports were heavily loaded -- it seems the USB ports and the Ethernet port, and maybe WiFi, share the same output bus.) |
What source were you using in this case? |
Actually midroom is on low values only when playing to default jack... I didn't take that into consideration before, so:
when
and
when
What really is different on 3.1.1 is no resends... I couldn't read your two posts in detail right now (in hurry), I just managed to test what I'm reporting here... does it change anything? |
I always use
|
USB soundcard in all cases is:
(I broken-open one of them), probably not relevant, but still |
Thanks for all this. It is possible to turn off resend requests in the development version. There is a |
Right, so it looks like the USB DACs might be the cause of the very large level of net correction. That's a (possible) discovery. If you could show some stats of a system that has been generating lots of resend requests but now with them turned off, I wonder how what fraction of packets are actually missed in the end... If it turns out to be small, then I could lower the action of the resend requester... |
ok, later today |
https://gist.github.com/davidhq/fadbadd64b93dad31b3c8d820a4d95c4 Two sessions, first |
Thanks David -- I can only see one session, the one with disable set to "yes"... |
Sorry: https://gist.github.com/davidhq/18dd87d006bcec62f5eb6567e12f53ad Here are previous couple.... |
Except that I think other than this 12 min session this device was testing 3.1.1 :) so I think there is one useful session from this only.. do you need more? I will set it to 'no' now and play on it... |
Yes please — it Is fascinating. It does look like the resend requesting is too aggressive, but it would be great to get a picture of it with requesting back on, if it’s not too inconvenient. |
Yes, here is another 28min session in addition to 12min above: https://gist.github.com/davidhq/69ec73f7a1d319223bcb66ad89d6c91d |
Just let me know if I should do the same for other devices except 'lab' and for how long |
Thanks again. I have two hard days work to get through before I can get back to this, so perhaps we can pause until then. I’ll be thinking about the very interesting evidence in the meantime. |
SURE! Enjoy the next days, until soon... I will now disable resend requests on all devices and let you know if there were any problems on the outside with this on the latest development version. |
Hello Mike, reporting my findings in last week... Not sure if it's good news or not so much, but version 3.1.1 works almost without issues. There is no "stuttering", so far there is no thread stop / spawn issues (I think it only happened once but I'm not sure anymore, so if it happens it's very rare, not like 3.2* where it was regular occurrence). There was one other problem only twice, seems not related to the two issues above and forked-daapd could be the source of it. What happened was songs were rewinding, restarting, skipping in a strange way - I thought my girlfriend was switching songs, and she thought I was :) So either ghosts, russian hackers, or some other third issue which for now doesn't matter but I will investigate later. |
Thanks for the update, David. I have just pushed a |
Hello Mike, Regarding the other problem with dropping sound: also not a single occurrence since last report 6 days ago either. I have a question: resending requests should be useful to maintain a perfect reproduction when packets are lost? And the right amount is determined by trial and error? I personally would rather not have this since 3.1.1 without resending works great. Could be that this feature is overdoing it a little? I'm willing to test as I said but now at leat according to my current understanding this is not something I'd use later when I'm not testing... it adds complexity and potential problems with no obvious benefit. I'm sure I don't understand it correctly but still, speaking from practical experience. Until soon, thank you |
This issue has been inactive for 60 days so will be closed 7 days from now. To prevent this, please remove the "stale" label or post a comment. |
Apprx. 1 munite restart via systemctl
Happened between 1.1.1-23-gd65b8e8 (i'm not sure) and 3.1.7 versions
The text was updated successfully, but these errors were encountered: