Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Upnp Device Finder Service #4534

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

andrewfg
Copy link
Contributor

@andrewfg andrewfg commented Jan 2, 2025

Relates to #4527
Relates to openhab/openhab-addons#17976

This PR adds a new component service that can be used to keep mis-behaving UPnP devices 'alive' in the UPnP registry by sending a targeted M-SEARCH message for their UDN before their maxAge expires.

I am not sure if we should add the to OH Core or if we should actually solve this issue in the respective OH addon for example so I am posting this code (work-in-progress) for review and comments.

Signed-off-by: Andrew Fiddian-Green software@whitebear.ch

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@digitaldan
Copy link
Contributor

digitaldan commented Jan 2, 2025

Wow, i had almost the exact same code, i mean really identical ( right down to using Math.max on the expiration calc ) . That was weird to read.

Couple things i observed running basically the same thing over night.

While you can get the devices getMaxAgeSeconds you don't really know when the device was registered with JUpNp. You can try to approximate this by listening to events, but in my case i was doing this in the addon, so if the device registered before the addon started, the time is already off. And if you restarted the binding, you would never know when the device actually registered.

Putting this in the finder core addon probably mitigates this a bit more, since it should startup along with the jupnp service, but who knows? I also noticed that when sending a targeted search, the device did not always respond, i'm guessing this is some type of backoff logic by the device to identical search requests to avoid flooding the network ? I would assume that after 60 or 120 seconds that would clear. But it seemed kinda inconsistant.

One thing i also did was when the thing was added, if the device was not in the jupnp registry (device is null), i would send a targeted search request to it and try and get it registered.

I wish Jupnp included the registration time of the device as well as it advertised max age.

I can give this a try today if you like and see how it works?

@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 2, 2025

if the device registered before the addon started, the time is already off. And if you restarted the binding, you would never know when the device actually registered.

In my code when the addon would start (e.g. calling subscribe) that sends an immediate search which effectively resets and synchronizes the counting of both the registry and the finder service.

@digitaldan
Copy link
Contributor

that sends an immediate search which effectively resets and synchronizes the counting of both the registry and the finder service.

I did the same, but did not always get a subscription callback for the device, which was what i was trying to debug today.

@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 2, 2025

would send a targeted search request to it and try and get it registered.

In my case when the component is activated it does two general searches for sddp:all and ssdp:rootdevice to make sure the registry is populated.

wish Jupnp included the registration time of the device as well as it advertised max age

Yeah. I think it might be accrssible via java reflection but probably a pita to implement.

@digitaldan
Copy link
Contributor

In my case when the component is activated it does two general searches for sddp:all and ssdp:rootdevice to make sure the registry is populated.

I also did something similar in the modified function of the HandlerFactory service

       upnpService.getControlPoint().search();
       upnpService.getControlPoint().search(new RootDeviceHeader());

@digitaldan
Copy link
Contributor

I'll try running this right now and revert the control binding back and see how it goes.

@digitaldan
Copy link
Contributor

digitaldan commented Jan 2, 2025

So my device still went offline after exactly 30 mins. Couple of observations.

  1. After installing these changes and reverting back the upnpcontrol binding i rebooted my openHAB to make sure we had a clean start.
  2. When openHAB came back online my WiiM was still offline, and there were no logs about it at the upnp transport layer.
  3. I Modified the upnpcontrol binding to send a search packet to a device if its not in the registry when adding a Render Thing and deployed that. See the following for the changes (pretty small)

https://github.com/digitaldan/openhab-addons/blob/08ee504f9ded2e0de10939910ec013e3093b287a/bundles/org.openhab.binding.upnpcontrol/src/main/java/org/openhab/binding/upnpcontrol/internal/UpnpControlHandlerFactory.java#L197-L201

  1. After the updated control binding was installed, my device came back online, but there were no logs messages from the new Finder Class here which i would at least expect it to log Added subscription for FF98F09C-C0D7-589B-C27C-CE6FFF98F09C, but it does not.
  2. After exactly 30 mins my device whent offline with these relevant logs:
11:32:27.258 [TRACE] [sport.upnp.internal.UpnpIOServiceImpl] - Removing an UPNP service subscription 'AVTransport' for particpant 'FF98F09C-C0D7-589B-C27C-CE6FFF98F09C'
...
11:32:27.691 [DEBUG] [overy.UpnpControlDiscoveryParticipant] - Device type MediaRenderer, manufacturer Linkplay Technology Inc., model WiiM Pro Receiver, SN# 00001, UDN FF98F09C-C0D7-589B-C27C-CE6FFF98F09C
11:32:27.691 [DEBUG] [sport.upnp.internal.UpnpIOServiceImpl] - Device 'FF98F09C-C0D7-589B-C27C-CE6FFF98F09C' reachability status changed to 'false'
11:32:27.691 [DEBUG] [overy.UpnpControlDiscoveryParticipant] - Media renderer found: Linkplay Technology Inc., WiiM Pro Receiver
11:32:27.693 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'WiiM_Pro1_Media_Control' changed from UNDEF to PAUSE
11:32:27.694 [DEBUG] [pcontrol.internal.handler.UpnpHandler] - UPnP device WiiM Pro-1 received status update false
11:32:27.696 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'upnpcontrol:upnprenderer:FF98F09C-C0D7-589B-C27C-CE6FFF98F09C' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Communication lost with WiiM Pro-1

Whats odd is that i don't see any logging from the Finder service, like the listeners are not firing. Also if you look above, the UpnpIOServiceImpl starts removing subscriptions, but then we see UpnpControlDiscoveryParticipant detect the device. I'm a little confused, but will look into it a little later today.

@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 3, 2025

I will probably need to add some retry logic for the case the search fails.

@digitaldan
Copy link
Contributor

I will probably need to add some retry logic for the case the search fails.

So the good news is my device is still online, so thats great. There were 2 instances over night where the scheduled search happened, but the device did not respond and a minute later we see

Device uuid:FF98F09C-C0D7-589B-C27C-CE6FFF98F09C removed unexpectedly from registry; Executing search

But when the code tries searching right after this log statement, the device does respond and the thing comes back online, so thats good. Checking the upnp log, we indeed never received a response to the first search request performed a minute earlier.

One thing i'm a bit confused about, but is not really a problem, is this:

23:59:16.643 [WARN ] [upnpcontrol.internal.UpnpDeviceFinder] - Device uuid:FF98F09C-C0D7-589B-C27C-CE6FFF98F09C removed unexpectedly from registry; Executing search
23:59:16.643 [DEBUG] [sport.upnp.internal.UpnpIOServiceImpl] - Device 'FF98F09C-C0D7-589B-C27C-CE6FFF98F09C' reachability status changed to 'false'
23:59:16.644 [DEBUG] [overy.UpnpControlDiscoveryParticipant] - Media renderer found: Linkplay Technology Inc., WiiM Pro Receiver
23:59:16.646 [DEBUG] [pcontrol.internal.handler.UpnpHandler] - UPnP device WiiM Pro-1 received status update false
23:59:16.646 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'WiiM_Pro1_Media_Control' changed from UNDEF to PAUSE
23:59:16.647 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'upnpcontrol:upnprenderer:FF98F09C-C0D7-589B-C27C-CE6FFF98F09C' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Communication lost with WiiM Pro-1
23:59:19.201 [TRACE] [pcontrol.internal.handler.UpnpHandler] - UPnP device WiiM Pro-1, channel Singlevolume already exists
23:59:19.201 [DEBUG] [upnpcontrol.internal.UpnpDeviceFinder] - Scheduling search for uuid:FF98F09C-C0D7-589B-C27C-CE6FFF98F09C in 1740 seconds
23:59:19.201 [DEBUG] [overy.UpnpControlDiscoveryParticipant] - Device type MediaRenderer, manufacturer Linkplay Technology Inc., model WiiM Pro Receiver, SN# 00001, UDN FF98F09C-C0D7-589B-C27C-CE6FFF98F09C
23:59:19.201 [TRACE] [pcontrol.internal.handler.UpnpHandler] - UPnP device WiiM Pro-1, channel Singlemute already exists
23:59:19.203 [DEBUG] [overy.UpnpControlDiscoveryParticipant] - Media renderer found: Linkplay Technology Inc., WiiM Pro Receiver
23:59:19.749 [TRACE] [pcontrol.internal.handler.UpnpHandler] - UPnP device WiiM Pro-1, channel Singlevolume already exists
23:59:19.749 [DEBUG] [upnpcontrol.internal.UpnpDeviceFinder] - Scheduling search for uuid:FF98F09C-C0D7-589B-C27C-CE6FFF98F09C in 1740 seconds

The first 3 lines are within a single millisecond of each other and seem to show the device unregistering, but then the discovery participant is notified at the same time of the device. The device does not respond to the search request in the upnp logs until 23:59:19.128 (3 seconds later) which aligns with the logs above when the device is registered again. So i'm confused why the discovery participant was notified at all before that of the device? Again not a problem per se , just a curiosity.

@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 3, 2025

why the discovery participant was notified at all before that of the device?
Again not a problem per se , just a curiosity.

As you say. Curious.

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 3, 2025

@digitaldan I just pushed a commit that retries the searches at 5 second intervals for a maximum of 5 retries, or until the device is discovered (in fewer tries).

@digitaldan
Copy link
Contributor

Thats great, i'll run that now for a while and let you know.

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 4, 2025

@mherwege I am interested to hear your thoughts on the following topics:

  1. Do you think this should be included at all in OH Core or not? (It is after all, a bolt on fix/kludge for non compliant devices, rather than normal functionality for well behaved ones). But if yes..

  2. The class is currently written as a fully separate OSGi component service that the consuming add-on must specifically instantiate via a @reference in an @activate method. But I am wondering if it should rather be implemented as a functional extension of one of the already existing UPnP classes (probably the Registry, or perhaps ControlPoint classes), so the new functionality becomes available to all existing OH consumers of those class components without the need to instantiate another component. However since those original class components have been externally declared in JUpnP I would struggle to know how to do this. => Any thoughts?

Signed-off-by: Andrew Fiddian-Green <software@whitebear.ch>
@digitaldan
Copy link
Contributor

So i ran this for about 24 hours. The WiiM stayed active all the way until about 2 hours ago. Over this time it did deregister 5 times, but in four of those cases it responded to the final search request sent after deregistering and stayed online.

Then, for some reason it would not respond to the scheduled search requests or the the requests sent when a device is deregistered (so 5 search requests before it timed out, and 5 search requests after it deregistered).

So this definitely helped, but it would be nice when a device deregisters to continue to try to discover it every minute or so (it could be rebooting, or there might be a network issue, etc..)

An yes, i think this device is not behaving nicely. I might see if i can report this on the forums, i think the devs there actually monitor those.

@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 4, 2025

@digitaldan I am reluctant to go much further down this road because we really still have no clue about the real problems we are trying to work around. So before doing much more I want to run some proper UPnP diagnostic tools on this device to get a full picture of what it does/not do. I have plenty of test tools but no device to test. You have the device but apparently no tools. So I did some googling about possible Upnp tools from Apple (below) and so I wonder if you can kindly give them a try?

@digitaldan
Copy link
Contributor

So I did some googling about possible Upnp tools from Apple (below) and so I wonder if you can kindly give them a try?

Before i pick a tool, what type of information are you looking for? I guess my first choice for general troubleshooting would be Wireshark/tcpdump with a capture filter to specifically target the packets we are after, but again i guess i'm not sure what info you are looking for? I also have jUpNP logging at TRACE level to its own file appender, which seems like it does a pretty good job as a general sniffer?

@digitaldan
Copy link
Contributor

digitaldan commented Jan 4, 2025

Let me ask a slightly off-topic question, but still related. @mherwege this is probably aimed more to you.

For this device specifically, our UPnP control binding handles 80% of the functionality, but is missing some key vendor specific features, namely:

  1. Device playlist support via a custom UPnP/SOAP namespace
  2. Multi Room support via a local HTTP API (as well as a bunch more features)

I have struggled figuring out the right place to put these into a binding, some thoughts:

  1. Add to the existing UPnPControl binding using a "Vendor" abstraction for adding vendor specific features to the existing Renderer Thing.
  2. Add to the existing UPnPControl binding as its own "LinkPlay Thing" and extend the current Render handler with specific vendor logic
  3. Refactor the UPnPControl binding so other bindings can inherit from it (like mqtt bindings) and create a new LinkPlay binding (this sounds very hard).
  4. Create a new binding, ala Sonos, which largely duplicates much of what the UPnPControl binding does (do not like duplicating code).
  5. Create a new binding that just supports the vendor logic, so there would be 2 things for this device (UPnPControl and LinkPlayControl, this seems less then ideal)

I'm not saying i like all these options, its just what i have been thinking about.

The reason i ask is that if we specifically support this device in a binding, we could isolate workaround for it there and not in the core.

@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 4, 2025

not sure what info you are looking for?

Initially I want to see raw data on every datagram that the devices sends, on every port, on every protocol, on every ip version, both autonomously, and as responses to queries from other apps.

@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 4, 2025

Apropos playlists and multi room support: I have also done a lot of work on this. Pure UPNP / DLNA does neither of these very well.

I see that your Wiim device has an own namespace service for playlists to presumably get around UPNP / DLNA weaknesses. And Sonos and Linn (to name a few) also do similar things.

In pure UPNP/DLNA the playlist is not managed by the renderer but rather by the control point. The renderer supports a playlist of only two tracks (setAvTransportURI and setNextAvTransportURI) which allows renderers to do gapless track transitions. So rather than going down the custom namespace path for different manufacturers renderers I would suggest adding playlist support to OH whereby it pushes tracks to the renderer via setAvTransportURI plus setNextAvTransportURI if the renderer supports it.

Apropos multi room support, the original UPNP/DLNA standard only allows synchronising the track start (you can play around with custom delays in sending set.. resp. setNext.. commands in order to compensate for different renderers track start latency). However if the various renderers play out the tracks at different frames per second, then track synch can drift apart (particularly on longer tracks). Using renderers of the same model ameliorates but does not fully solve this.

There is a v2 of the renderer UPNP specification that implements clock synch methods across renderers plus a playSynched method. However different manufacturers have tended to spurn following the playSynched specification in favour of using their own custom namespace api. Probably through laziness and a wish to impose their own proprietary ecosystem.

Nevertheless there is some potential for the OH playlist service (TM! new!) to implement a common playlist towards multiple renderers from different manufacturers with track start synching via thing config params, and play rate synching via (say) a kind of proxy server to split a common stream from the media server into multiple rate transposed streams to the individual renderers.

This would be a big project. But I have the core concepts in my head. We could do this in parallel with your suggestion to go down the manufacturer proprietary route. But it would be well to have a clear architecture defined in advance. With (say) common interfaces defined by OH and implemented specifically for different manufacturers custom ecosystems.

@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 4, 2025

PS one reason why manufacturers implement custom playlist management in the renderer is because if the playlist is managed by the control point then that must remain online to dish out the track play commands. This is not possible if the cp is a mobile device. But since OH remains on 7/24 it could easily dish out track play commands to all attached renderers continuously.

@mherwege
Copy link
Contributor

mherwege commented Jan 4, 2025

The renderer supports a playlist of only two tracks (setAvTransportURI and setNextAvTransportURI) which allows renderers to do gapless track transitions. So rather than going down the custom namespace path for different manufacturers renderers I would suggest adding playlist support to OH whereby it pushes tracks to the renderer via setAvTransportURI plus setNextAvTransportURI if the renderer supports it.

And this is exactly what I tried implementing in the upnpcontrol binding. It may not be perfect, but there is playlist management in that binding.

@mherwege
Copy link
Contributor

mherwege commented Jan 4, 2025

  • Do you think this should be included at all in OH Core or not? (It is after all, a bolt on fix/kludge for non compliant devices, rather than normal functionality for well behaved ones). But if yes..

It it solves the issue for a good number of devices, why not. But I am still struggling to understand if there is no issue with jupnp or the specific's user network setup in the first place. If this is a common problem, it should actually be resolved in the jupnp library, but that starts being outside of the scope of pure OH.

2. The class is currently written as a fully separate OSGi component service that the consuming add-on must specifically instantiate via a @reference in an @activate method. But I am wondering if it should rather be implemented as a functional extension of one of the already existing UPnP classes (probably the Registry, or perhaps ControlPoint classes), so the new functionality becomes available to all existing OH consumers of those class components without the need to instantiate another component. However since those original class components have been externally declared in JUpnP I would struggle to know how to do this. => Any thoughts?

The classes you refer to put it in the scope of jupnp, not OH. The upnp transport is there to try to isolate that. You may not need an OSGI service though, as all upnp users in OH also use the transport, so they could just call an interface to register for updates.

@mherwege
Copy link
Contributor

mherwege commented Jan 4, 2025

  • Add to the existing UPnPControl binding using a "Vendor" abstraction for adding vendor specific features to the existing Renderer Thing.
  • Add to the existing UPnPControl binding as its own "LinkPlay Thing" and extend the current Render handler with specific vendor logic
  • Refactor the UPnPControl binding so other bindings can inherit from it (like mqtt bindings) and create a new LinkPlay binding (this sounds very hard).
  • Create a new binding, ala Sonos, which largely duplicates much of what the UPnPControl binding does (do not like duplicating code).
  • Create a new binding that just supports the vendor logic, so there would be 2 things for this device (UPnPControl and LinkPlayControl, this seems less then ideal)

This is a hard call. I would probably go for 2 short term, but look into 3 is the longer term option.

@digitaldan
Copy link
Contributor

digitaldan commented Jan 4, 2025

I think at this point i would opt for # 2 as well. This handler could then send more frequent search requests just for this device, support the additional mutli-room and playlist support, and we will not need to touch the core at this point.

I dropped a message in the WiiM forums , its actually quite active with more then a few devs there, so maybe some progress will be made in an update.

@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 5, 2025

Umm. Could it simply be that JUpnp does not support ipV6?

image

@digitaldan you say your device has both ipV4 and ipV6 connections. So perhaps it sends its NOTIFY announcements on the Upnp ipV6 multicast address/port and not on the Upnp ipv4 ones? In that case OH will not (yet) see those notifications.

Whereas by contrast the M-SEARCH hack proposed in this PR does work because OH sends the search on the Upnp ipV4 multicast address/port, and the device correctly responds using ipV4.

@digitaldan could you therefore please try disabling ipV6 for this device?

@kaikreuzer / @lolodomo just copying you on the post in case you have any thoughts on this..

@andrewfg
Copy link
Contributor Author

andrewfg commented Jan 5, 2025

Pinging @wborn too..

@digitaldan
Copy link
Contributor

Umm. Could it simply be that JUpnp does not support ipV6?

So i just ran tcpdump for all udp packets on port 1900, and i don't see anything from the WiiM on IPV6.

Also the WiiM support team got back to me and asked me to open a ticket up. Another user (likely a developer) on the forums confirmed with tcpdump he was also not seeing any notify packets from the device over a 90 min period, but his Sonos device were frequently broadcasting.

They have a frequent release cadence for firmware updates, so i'm optimistic this could get fixed sooner then later 🤞

@kaikreuzer
Copy link
Member

Could it simply be that JUpnp does not support ipV6?

Yes, that's correct, but I would think that this should be no problem, because IPv4 is still the default for UPnP. I don't know if the specs say anything about how a device is supposed to react if both IP versions are available, but my guess is that it definitely has to announce and respond on IPv4 in such a case.

But I am still struggling to understand if there is no issue with jupnp or the specific's user network setup in the first place. If this is a common problem, it should actually be resolved in the jupnp library

I agree here, but then again, jUPnP tries to be a spec-compliant implementation and workarounds are not really nice. Having said this, there are already a couple of special handlings for non-compliant devices in the code as it turns out that there are far too many non-compliant devices out there in the wild.

but that starts being outside of the scope of pure OH.

You are all invited to create PRs against jUPnP as well - luckily @wborn and myself are maintainers there. 😄

@digitaldan
Copy link
Contributor

Just a quick update, WiiM/Linkplay has been very responsive. They confirmed the issue and the development team is going to update their code to send NOTIFY packets. They are also planning on pushing an early build to my devices so i can test it out. Hopefully this resolves at least this device's issues sometime soon.

@andrewfg
Copy link
Contributor Author

the development team is going to update their code to send NOTIFY packets

Excellent news! Let us wait to confirm that they fixed it. But after that I think we can close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants