-
-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][io.upnp] Sends periodic M-Search for devices #4527
base: 4.3.x
Are you sure you want to change the base?
Conversation
Signed-off-by: Dan Cunningham <dan@digitaldan.com>
That is also my interpretation. The underlying device discovery and availability is managed in the jupnp library, and to my understanding, relies on NOTIFY alive messages from the device. For some reason, your devices stop sending these alive messages. An M-SEARCH unicast request forces the device to send the info again, and it may be enough to get it to send alive messages again for a while (I don’t know). Note the jupnp library would send an M-SEARCH multicast when starting to get a response from all devices in the network. And a device added to the network should always send a NOTIFY alive to be picked up by a control point. See https://openconnectivity.org/upnp-specs/UPnP-arch-DeviceArchitecture-v2.0-20200417.pdf and https://upnp.org/resources/documents/UPnP_UDA_tutorial_July2014.pdf. If this code solves the issue, this is a good solution for me. The one question I have left is if the M-SEARCH being sent is a unicast, not a multicast. The uPnP standard only allows multicast M-SEARCH from control points at startup, only unicast afterwards. I don’t know the methods called well enough to judge. |
Two things..
|
@andrewfg You have more experience with this than I have. My statement comes from the specification document linked above, bottom of page 19, where it explicitly makes the statements I made above. Every device should listen to search messages on its own port 1900 an directly directed to it using unicast. Sending repeated multicast searches are explicitly disallowed. |
@mherwege your document relates to UPnP v2.0 which (different than v1.x) does indeed allow for M-SEARCH unicasts. So I stand corrected. However it is perhaps a slightly moot point since (analogous to the degree of take up of ipV6 vs ipV4) there are are many manufacturers who stayed with UPnP v1.x rather than migrating to 2.0. So in reality the CP (OH) probably has to do it both ways (at least until it has determined whether a specific device is on 1.x or on 2.0.. |
@andrewfg Version 1.1 also talks about unicast (https://openconnectivity.org/upnp-specs/UPnP-arch-DeviceArchitecture-v1.1.pdf), but anyway, if this solves the issue, I am fine with it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Would it make sense to let JUPnP do this, i.e. move the fix upstream? |
@jlaur I seem to recall that I already did something similar .. perhaps in the upnp addon suggestion finder. I will have a look.. |
It may be related to #3933 (replicated in #3918) .. but I need to check deeper. EDIT: yeah it is a similar issue; standard M-SEARCH uses a search target of ssdp:all (which is the specification) .. but in #3933 / #3918 it was necessary to extend that to a search target of root devices in general (since some devices are not specification compliant); and here it seems that @digitaldan needs to extend it to a search target of specific UDN (for other non compliant devices). => It might make sense to harmonise the approach between these three PRs. But I don't have a lot of time over new year as we have guests staying with us. EDIT2: .. so right now I am trying to figure out what is the relationship (if any) between UPnpIOServiceImpl and JUpnp .. and if there is already too much duplication between the two in general?? EDIT3: @digitaldan I do wonder if you need to M-SEARCH for each specific UDN? If there are many then there may be many such searches. Perhaps my solution to search for root devices in general may be sufficient? Out of interest did you test that? It might reduce the traffic a bit.. |
If you mean |
And also -- before merging this one -- we definitely need to do regression testing to ensure that this PR does not break the upnp addon suggestion finder service and/or the upnp thing discovery service. |
I did not test that, but i seem to remember someone saying when they scanned for new UPNP devices, it triggered the WiiMs to go online, so its possible that would work well. I'll test that today if i can to verify. If that does work, should we have the but, I guess this would be duplicating Lines 149 to 156 in 1a91ef2
Not sure if we would want to coordinate these two, or keep them separate? EDIT: it looks like UpnpAddonFinder.java also duplicates this. |
So i have been running a root search every min for the last hour and my WiiM is still online and registered (and i can see it responding). So i think this solution works. I'm kinda partial to just creating its own job in UPnpIOServiceImpl that runs this every minute and not try and create a dependency between the other two classes that also search? I let this run overnight to make sure there are no issues ( or leaks). protected void sendSearchRequest() {
upnpService.getControlPoint().search();
upnpService.getControlPoint().search(new RootDeviceHeader());
} An alternative is to have all core code reference |
I am tending to go for an aligned approach between all applications. But I need to get my head around the class architecture. I will get back to you. I'm thinking that we should not be putting hidden implicit functionality in the "Impl" implementation class; but rather expose it as explicit functionality in the interface class that it implements. So I am wondering why your binding code is calling the Impl class directly.. |
I am with you on this and I fully agree. It may make it harder to provide an easy fix on the 4.3 code though. While working on this more structural fix, should we make a temporary fix in the upnpcontrol binding as originally proposed anyway? This can then be removed after a 5.0 release.
I don’t think it does, but it is the only implementation, part of the OSGI bundle. You get the implementation with the interface in the bundle. |
I am working on this.. I will keep you posted. |
@digitaldan can you please help me to understand exactly what problem we are trying to fix here? I understand that your issue concerns devices that "disappear" from the OH upnp device registry. But there are several ways that such disappearance could occur as follows.
It seems that the current proposed 'solution' (aka work around) is to periodically "prod the device with a stick" by means of a periodic M-SEARCH (either for root devices in general, or for a specific UDN in particular). However this is at best a kludge, and there may be better solutions depending on what is the cause 1. thru 7. above. For example if it is issue 4. we could apply a (configurable) leniency period. Or if 5. we should patch Jupnp. Or if 3. then apply a one-time "prod" plus leniency period when the device expires. Plus many more possible solution ideas. So to help us move forward, can you please do a (e.g. Wireshark) trace of the Upnp UDP traffic for the device which currently fails. Please do the trace with the release versions of the OH binding and OH core i.e. without any "prod" traffic. EDIT: and in the meantime, I would suggest that you (temporarily) move on with the fix that you had originally planned in the binding without dependency on any eventual fix in OH core. (If we do eventually add a fix at OH core level, we can eventually revert the binding fix in OH 5.0) |
This is the issue. If i revert all changes and go back to the stock 4.3.x code, run wireshark and filter on the device's mac address, the device is quiet and does not send any messages when idle. With the proposed changes only then does it send messages (as replies to search requests), which i am also observing in wireshark.
Sure, sound good. |
In your original issue someone posted the following log. This means the device had been in the registry, and was removed. So there must have been some upnp traffic between OH and the device in order for it to have got in the registry. I would like to see that traffic.
PS it is worth being aware that not receiving a NOTIFY ALIVE is not synonymous with one not having been sent. So I do wonder if perhaps the notifications may be being lost due to a routing issue on your LAN. It would help if we are sure that Wireshark is connected to the exact same physical LAN segment that the device is physically connected to. Some routers and Wifi APs do sometimes do silly things with UDP multicasts (and less silly things with UDP unicasts, which might explain why the unicast response to M-SEARCH is always received; or something like that). PPS does the device show up in the DeviceSpy application? |
I assume this is b/c openHAB was restarted (and it sends something when it first comes online) , or the user did a scan for new UPNP devices, and this device answered. I can capture one or both of those events if you wish? |
Ideally both please. :) |
Maybe? but i don't think so, others have this same issue with this device, so i think its specific to this manufacturer implementation of UPNP. I have run a Unifi based network for over a decade , and not had any issues. |
See what you find with this tool.. |
I don't have a windows machine, do you have a mac or linux build of that ? |
^ |
@andrewfg Thank you for this very good analysis. Could IPv4 and IPv6 play a role in this? I assume the device could be discovered on both. But does it also send the ALIVE over both interfaces? And does uPnP listen for ALIVE messages from a specific source address? This may not have anything to do with this, but I got triggered by the discussion in this issue: jupnp/jupnp#73 |
fwiw, the WiiM does have a link local ipv6 address (fe80:) and i see mdns traffic on that as well as its iPV4 address. |
@mherwege I will look into the ipv4/6 and network interface issue; if there are 2 interfaces both with ipv4 and 6 then there could indeed be 4 potential input channels for search responses. I am not sure how jupnp resp. OH handles this. FWIW a control point multicast M-SEARCH elicits NOTIFY unicast responses from the device to the control point; and a device on its own sends (slightly different format) multicast NOTIFY messages every 2/3 (approx) of its declared max-age lifetime. |
@digitaldan if your Wireshark captures both ipv6 and ipv4 traffic, we might see if there is anything odd going on. |
I have not run all the tests, but this was one was the easiest to capture so i'll post what i have, when i scan for new UPNP devices (from the Main UI), the WiiM responds with 2 replies from the IPV4 and IPV6 addresses.
I ran this a couple times, each time is the same, responds with both ipv4 and ipv6. It also looks like there are devices on my network who are constantly (at least every minute) sending discovery requests for airplay and spotify services, which this device also responds to, but so far i have not seen the WiiM send anything but responses to queries. Next i'll try rebooting it and see if it announces something. Also this was running TCPDump directly on my openHAB host. EDIT: also this does trigger the WIiM to go ONLINE for some amount of time (maybe 30 mins?) |
Here are packets when the device reboots, and the device does come ONLINE in openHAB. I'm realize i know very little about the inner workings MDNS and UPNP. I assumed when wireshark says "standard query response" it indicates its responding to a request, but i get a feeling that a unsolicited response message is used as a notification mechanism ? So really its not always a response to something?
and
|
@digitaldan many thanks for running the trace. However unfortunately I don’t see any UPNP traffic in it. (So there is no evidence why the device came online). I would like to see the (apparently one single) udp NOTIFY packet from the device that made it originally come on line. There must have been one such packet to have caused the device to be added to the registry. |
@digitaldan rather than using Wireshark, it may be easier to turn on M-SEARCH ssdp:all and typical response
M-SEARCH upnp:rootdevice and typical response
Spontaneous NOTIFY ssdp:alive message
|
Thanks @andrewfg , for some reason i can't capture UDP 1900 with TCPDump, not sure if jupnp is binding to the interface in a way that prevents it from seeing packets. So i rebooted the device a few times, and jupnp does not see any messages at all from the device. The device remains OFFLINE in openHAB. If i then perform a scan in the Main UI, the device does respond, and comes back online
|
@digitaldan many thanks for the logs. The device seems quite normal in response to M-SEARCH responses.
Yeah. So my suspicion therefore is that the device is sending its notifications (on 1900) but that neither TCPDump nor OH are able to receive them. I honestly can't imagine a device claiming to be Upnp and not sending notifications, since that is so fundamental. (It is even a Linux based core, so should be quite professional). There are two things that could cause notifications to get lost -- namely..
I would love to get this device on a plain bare router and see what it is doing.. |
@digitaldan BTW I had a look over the Wiim forums to see if this issue had been reported there. And AFAICT it seems not. And it seems that other third party Upnp control points e.g. BubbleUpnp can find and operate these devices. That would indicate that your issue may indeed be related to the LAN stuff mentioned above. Or alternatively there is an issue in Jupnp. |
@digitaldan / @mherwege you may want to take a look at #4534 -- although I am not entirely sure if we should really add it to the OC Core or not. ?? |
See openhab/openhab-addons#17976
@mherwege This works nicely for my WiiM, and i think my Sonos connect devices also register now more quickly (maybe, maybe not, will test some more, they seem particularly temperamental these days).
This code is called when a participant is first registered, which makes sense, but then is polled when
addStatusListener
is called. That seemed a little strange to me, but i struggled a bit on where to put this logic short of creating another runnable.I'm guessing that there are 2 levels of keep alive going here? The existing one which tests if a device is responding to UPNP requests, and then this new code which keeps the underlying org.jupnp registry up to date, which the existing keep alive does not seem to affect ?
Any thoughts on this approach or an alternative way to periodically send?