[Feature] Matter Casting support for "Audio Player Architecture" with a new "Casting Audio Player device type" and "Audio Player endpoint" ("Casting Audio Player" and adding "Basic Audio Player") + look into adding multi-room music streaming? #31389
Comments
@decenzo please take a look. |
Thanks for the suggestion. Please join Matter so you can volunteer to lead this effort! |
Sorry, I do not have the capacity for that myself. Would think that Amazon and/or Google might be best suited to look into this? Again, both Amazon and Google have competing smart speakers with their own technology already implementing these features. Apple also has the technology + use case with AirPlay and their HomePod series, but not as sure they would lead such a project(?). Is there someone from Amazon who leads Matter's matching "Video Player Architecture" + "Casting Video Player" development? @pgregorr-amazon or @sharadb-amazon could you maybe refer Amazon leads to look into and consider this feature request? Could this audio-only casting track perhaps be tackled there as an extension and continued part of that video casting project? Referencing to Amazon driving "Matter Casting" for their video playback devices and them also having many smart speakers too: https://www.aboutamazon.com/news/devices/amazon-ces-2024-announcements https://www.theverge.com/2024/1/9/24030324/amazon-matter-casting-echo-show-fire-tv-prime-video https://9to5google.com/2024/01/09/amazon-fire-tv-matter-casting/ https://www.aftvnews.com/new-matter-casting-video-player-for-fire-tv-gets-certified/ |
@marcelveldt as "the Matter guy at Nabu Casa" (and Home Assistant), perhaps this is something that you and the ESPHome developers at Nabu Casa would be interested in helping to architect and develop for the Matter project? I would think this functionality might be very relevant to your roadmap given Home Assistant's recent announcements about both your "Music Assistant" and the Open Home Foundation, plus Home Assistant's voice assistant work, which are separate things that I believe all align in spirit with this concept at a high level?
PS: I also read that Nabu Casa is developing its own ESPHome-based smart-speaker (and/or smart-display) hardware. For such devices to also work as music streaming and audio player (A/V-receiver) endpoints without your "Music Assistant" integration acting as middleware, I am guessing you are eventually going to want to add cross-ecosystem support for some kind of standardized audio streaming protocol, like Matter Casting with audio player endpoint features? |
Any feedback or input on these feature request ideas about Matter Casting support for audio-only streaming and multi-room support for synchronized music playback in multiple rooms? |
There is an effort within Matter right now to define use cases for audio players / smart speakers. Please join us! |
Regarding audio inputs, there is a need for something much simpler: the possibility to play local audio sources through the smart device. You don't want separate speakers for your TV and various players, do you? Television sets' built-in speakers generally suck; you'd love to hear TV sound on your new speakers, but they lack a TosLink / ARC input or some RCA line-ins. https://forum.raspiaudio.com/t/suggestion-for-espmuse-multiple-analog-inputs/401 Product designers have to start thinking in hardware too, not take a software-only approach. Sonos soundbars have this; they can learn the Volume Up/Down IR commands of the TV remote (already feasible with ESPHome); nowadays TVs can be set to only output audio through TosLink / HDMI ARC, and the volume can be adjusted from the TV's remote. The selling point here would be to have this speaker/preamp/player dongle integrated with the HA system and have e.g. announcements mute/dim the TV sound and restore it afterwards. It should also support multiple sources, like acting as a USB sound card, Aux line-ins, and stream sources, and handle them all the same way. When you turn on the TV, it should change to the TV sound source automatically. Check the links above with POCs and use cases explained. |
@chrisdecenzo There are groups of developers (myself included) who work on Matter projects but can't participate in CHIP because our employers won't join the CSA for various reasons. CHIP would benefit from having an 'Invited Expert' membership (like the W3C has) allowing individuals caught in this situation to participate. 'Invited Expert' would allow access to Slack and the draft spec with no voting ability. |
Don't waste your time with the proprietary chips, we wasted a year on them. Instead use a generic ARM CPU running Linux. Amazon has an Alexa qualified hot word processing library for ARM if you can convince them to give it to you. Another alternative is to use TensorFlow and an ARM CPU with an NPU. Google around github and you'll find example code. Allwinner and Rockchip make low cost CPUs specifically designed for this use case. |
Off-topic but FYI, even if XMOS is proprietary hardware their chips are very popular and have open-source compatible libraries. As far as I can tell the complete source code for XMOS's xcode-voice firmware is available on Github under More information about that in their user-guide for their XK-VOICE-L71:
By the way, I stumbled on this new "voice-kit" GitHub repository where ESPHome firmware developers are developing new and improved components for I2S audio, including XMOS output and input support, as well as their own ESP32-native media player with support for FLAC, WAV/PCM, MP3, etc. for the upcoming Home Assistant voice-kit hardware platform from Nabu Casa (which, as mentioned, will be based on a combination of an XMOS xCORE chip and an ESP32-S3):
Cutting-edge however they so far already added features/functions or improvements/enhancements to ESPHome, such as:
They also have many inline TODO comments in the code there if anyone is interested in helping them: https://github.com/search?q=repo%3Aesphome%2Fvoice-kit%20todo&type=code Note! Be aware that many comments there say that most of the new stuff is not yet stable. |
@chrisdecenzo any updates on Matter Casting use cases to add music playback via dedicated audio players and smart speakers? I read that Matter 1.4 seems to be adding messaging for Matter Casting, but again it only focuses on video and smart displays (there is no mention of smart speakers or other audio-only devices). I am bringing this up because it at least looks like Hi-Fi streamers / music streamers (network audio players), as well as other types of smart speakers and multi-room audio solutions, are trending and becoming more popular as products. Again, there are no open standards/protocols for multi-room audio (distributed audio) systems made for the home. |
For reference: The Verge has published a new article about Matter Casting with an update from Chris LaPré, CTO of the Connectivity Standards Alliance (CSA). Most notably, it says that “a new streaming speaker device type and related controls” is under development, and that this new Matter-compatible smart home standard for smart speakers is being led by a former Sonos executive looking to disrupt the smart speaker market. (The article even mentions Home Assistant, though exactly how it fits into this context is unclear at this time.) Quoting the article:
According to Fiede Schillmoeller, CEO of Legato, the Dutch company is leading the group developing this new Matter smart speaker specification. Legato is working with CSA member companies that are speaker manufacturers, software makers, ecosystems, and content providers, said Schillmoeller. The group is “fully focused on making audio and speakers work through Matter,” he said. While he declined to name names, he said, “It includes everyone you would want in that group.” (It’s worth noting that both Sonos and Bose are members of the CSA.)
“We want smart homes to have this ambient information stream [and] speakers are a big enabler of that”
The specifics of how speakers will work are still under development, so Schillmoeller couldn’t share exact details, but he confirmed that the team is building something that will handle “all the important stuff.” This should include choosing your audio source, controlling volume and full transport controls, “play, pause, skip ... all these things that help you control the speaker from any device you like,” he said. If implemented, this would mean you could control any Matter speaker from any Matter-supported ecosystem app, such as Apple Home, Alexa, Google Home, and Home Assistant.
Schillmoeller also hopes the team can integrate speakers with your smart home through scenes. “You should be able to use your smart home to drive speaker behavior,” he said. “Set what type of music plays in the morning when you wake up, what happens to your speaker when you leave the house, what happens to the audio when someone rings your doorbell.” Functions like these are available in smart home ecosystems today but are limited to proprietary software and hardware.
What Matter probably won’t do is enable multiroom music, a feature likely to remain at the ecosystem level. Additionally, don’t hold out hope that Apple and Google, and possibly Amazon, will enable their speakers as Matter device types. While it’s technically possible with Matter, controlling your Apple HomePod from your Google Nest Hub just doesn’t feel likely. I’d love to be proven wrong, though. |
Feature description
I hope it is OK to submit this large feature request here as I do not have the skills or resources to implement this myself. As such, this is just a feature suggestion meant as an open letter to Matter members for discussion, and not a feature proposal from me.
To summarize: this is a feature request where I ask you and others to consider having a separate Matter architecture for an "Audio Player Architecture" and adding a new "Casting Audio Player device type" and "Audio Player endpoint" ("Casting Audio Player" and "Basic Audio Player"), including speaker setup, multi-room groups, and advanced control from Matter Casting manufacturers’ apps.
That is, Matter needs support for universal audio-only casting as a standard for music services and other streaming audio sources, cast to speaker-only devices (i.e. devices, such as smart speakers, that are not designed to handle video playback at all).
Also need an example similar to tv-casting-app but instead for audio-only casting of music, so perhaps speaker-casting-app? Perhaps also an example audio player app similar to tv-app but for music playback (preferably with multi-room synchronization)?
By the way, off-topic but FYI, there is currently an Audio Output Cluster but no Audio Input Cluster in the chip data-model, though there is a generic Media Input Cluster. Also, there is only a generic Media Playback Cluster but no Audio Playback Cluster for audio-only devices that could have an optimized pipeline for high-quality music playback.
Anyway, I think that there is a need for a proper "Audio Player Architecture", and I see no reason that it could not initially be based on the existing "Video Player Architecture" for Matter. It would also need a new "Casting Audio Player device type", which could be based on the existing "Casting Video Player device type". However, I think some requirements have to differ between a video player and an audio player meant for Hi-Fi music playback. As such, I think you should aim to design an architecture primarily for music playback that works for a combination of "smart speaker", "home audio", and "high fidelity", which I understand may have different, but at least fairly similar, use cases when comparing voice control with music playback.
While some video-specific features could be removed if basing it on the existing "Video Player Architecture", I think it would be preferable to also extend a dedicated "Audio Player Architecture" with some audio-specific features to optimize for home audio setups with Hi-Fi quality amplifiers and speakers designed for music playback, and not solely for embedded smart speakers.
An alternative could be to redesign and rename the "Video Player Architecture" into a more generic "Media Player Architecture"?
An example feature is real-time audio synchronization between different smart speakers to allow for synchronized multi-room audio playback of music on several Matter Audio Casting enabled speakers installed in the same home, (also known as distributed audio system). This needs support for "Audio Group", usually named "Speaker Group" and perhaps also "Audio Zone" ("Speaker Zone" or area). Preferably also need to have separate volume controls for each speaker and/or zone to compensate for differences in apparent volume due to room size and shape as well as speaker products used in different rooms.
https://en.wikipedia.org/wiki/Multi-room_audio
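To make the "Speaker Group" idea with separate per-speaker volume a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical and only for illustration: the `Speaker` and `SpeakerGroup` names, the `trim_db` field, and the choice to treat the volume percentage as a linear amplitude are assumptions, not anything defined by Matter or by an existing product.

```python
from dataclasses import dataclass, field


@dataclass
class Speaker:
    """One audio player endpoint (hypothetical model, not a Matter type)."""
    name: str
    trim_db: float = 0.0  # per-speaker offset compensating for room and hardware


@dataclass
class SpeakerGroup:
    """A named set of speakers that share one master volume."""
    name: str
    master_percent: int = 50
    speakers: list[Speaker] = field(default_factory=list)

    def effective_volume(self, speaker: Speaker) -> int:
        """Master volume scaled by the speaker's trim, clamped to 0..100."""
        # Assumption: the percentage behaves like a linear amplitude,
        # so a trim in dB becomes the ratio 10^(dB/20).
        gain = 10 ** (speaker.trim_db / 20)
        return max(0, min(100, round(self.master_percent * gain)))

    def volumes(self) -> dict[str, int]:
        """Effective volume for every speaker in the group."""
        return {s.name: self.effective_volume(s) for s in self.speakers}


group = SpeakerGroup(
    name="downstairs",
    master_percent=50,
    speakers=[Speaker("kitchen"), Speaker("living room", trim_db=-6.0)],
)
print(group.volumes())
```

The point of the sketch is only that a group-level volume and a per-speaker correction can be kept as separate values, so changing the group volume does not lose the per-room compensation.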
Multi-room audio:
Another argument for a separate audio-only architecture, with an audio player specialized for just music playback, is that it could run with even fewer resources on constrained devices.
Background: The existing Matter specification does feature a "Video Player Architecture" with a "Casting Video Player device type" and "Video Player endpoint" ("Casting Video Player" and "Basic Video Player"). What currently looks to be missing, but is directly related, is a "pure" audio architecture with a Matter Casting Audio Player device type, and maybe an Audio Input cluster as well.
Product use case: A client/server design that works well for music/audio apps and smart speaker products, + products with audio line-in (i.e. devices designed for only pure audio output and/or input that are normally used just for music playback, including multi-room sound systems. Probably sometimes, but not always, including microphone input for a voice assistant. The point is that it means products that lack any kind of video output, such as televisions and/or smart control displays/screens).
The main problem to solve: There are many different audio streaming protocols in commercial use, from basic to audiophile-class audio quality, and there are even more music streaming services around today that do not support all of those. Having plenty of proprietary and closed-source solutions from different commercial vendors means fragmentation: audio players/receivers and music services that do not communicate with one another, and no way for users to control all of their music from a single interface or stream the audio to different ecosystems at the same time.
I think that, other than the obvious smart speakers with voice assistants, another real-world market and use case for pure audio players is high-quality speakers and Hi-Fi grade sound systems for music playback, whether used as a single-point or a multi-room sound system. Their primary audience would probably be users streaming music from different commercial music service apps like Amazon Music, Spotify, SiriusXM, Pandora, Tidal, Qobuz, Deezer, YouTube Music, and Apple Music, as well as additional audio-streaming services for other types of content (like Amazon Audible for audiobooks), if and when they add support for a “Matter Casting” streaming protocol for audio to their apps.
If implemented, please be sure to include support for the concept of so-called "speakerless devices", meaning audio-output dongles (with TOSLINK and Phono AUX-out or line-out ports for external speakers and sound systems from third parties), such as Google's original "Chromecast Audio" product which enables adding Google Cast audio player capability to any third-party speaker / sound system, as well as "Amazon Echo Input", "Amazon Echo Link", "Echo Link Amp" which similarly also adds AUX output to third-party speaker / sound system (but Amazon Echo products also have embedded voice assistant via built-in microphones).
https://en.wikipedia.org/wiki/Amazon_Echo#Speakerless_devices
https://en.wikipedia.org/wiki/Chromecast#Chromecast_Audio
A popular example of Hi-Fi audio streamer products without a built-in voice assistant is the WiiM series from Linkplay Technology:
A new product idea to consider accommodating in a new audio-only architecture is the concept of audio-input dongles. That is, audio-streaming server dongles with "line-in" and/or "microphone" input ports that basically work as stand-alone sound cards on the network, acting as embedded audio digitizer appliances and as an audio-only "Matter Casting Client" whose stream can be sent to any chosen "Audio Player endpoint", which can be either a single endpoint or a set of grouped endpoints (an audio group) for multi-room music playback. This would allow a user to connect any legacy audio source, like an LP record player (phonograph turntable), cassette deck, or CD player (for Audio CDs), to such an audio-input dongle and stream that audio to any “Matter Casting” enabled audio player.
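As a rough illustration of "either a single endpoint or grouped endpoints", the small Python sketch below resolves a cast target into the list of endpoints a line-in stream would be sent to. The type names (`EndpointTarget`, `GroupTarget`) and the `resolve` helper are made up for this example and do not correspond to any existing Matter API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EndpointTarget:
    """A single audio player endpoint (hypothetical)."""
    endpoint_id: int


@dataclass(frozen=True)
class GroupTarget:
    """A named audio group made of several endpoints (hypothetical)."""
    name: str
    endpoint_ids: tuple[int, ...]


CastTarget = Union[EndpointTarget, GroupTarget]


def resolve(target: CastTarget) -> list[int]:
    """Return the endpoint ids that a line-in stream should be sent to."""
    if isinstance(target, EndpointTarget):
        return [target.endpoint_id]
    # Keep the group's order while dropping accidental duplicates.
    return list(dict.fromkeys(target.endpoint_ids))


print(resolve(EndpointTarget(3)))
print(resolve(GroupTarget("downstairs", (1, 2, 1))))
```

With a shape like this, the dongle itself would not need to care whether the user picked one speaker or a whole group.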
As far as I know there are no commercial products on the market, but check out this "Vinyl Cast" app as a proof-of-concept:
Platform
all
Platform Version(s)
No response
Anything else?
Perhaps an existing Matter group member would be willing to contribute their existing technology solutions as a base for audio grouping and synchronized multi-room audio support? If not the whole thing, then perhaps parts of the specifications or patents on software for related technologies.
Amazon Alexa features multi-room music support:
Google has "Google Cast" (Chromecast Audio) which supports multi-room audio with grouping of speakers and multiroom synchronized playback so maybe they could be convinced to contribute components?
Apple features multiroom support for AirPlay 2 audio streaming:
DTS Play-Fi is a premium wireless audio ecosystem for whole-home music and TV audio, supporting low-latency and high-resolution 24-bit/192kHz lossless streaming, and sub-millisecond playback synchronization technology
Espressif ESP-ADF (Espressif Audio Development Framework) does support ESP Multi-Room Music, but not synchronized on its own?
Sonos is perhaps the largest vendor on the market for multi-room audio speaker systems, and is now at least a member of the CSA:
IKEA of Sweden AB currently has a partnership with Sonos to make Wi-Fi speakers with multi-room audio support:
Yamaha MusicCast (Yamaha is not yet a member of the CSA); however, Yamaha MusicCast proves the need for high-fidelity quality:
Roon Ready (Roon’s RAAT streaming technology by RoonLabs), not a CSA member, but it again proves that interoperability is needed:
BluOS is a wireless hi-res multi-room platform that lets you manage all your music and stream it to any BluOS Enabled player using a phone, tablet, or computer. BluOS is an operating system that manages and controls all your music. They were the 2023 "Mark of Excellence" winner of Consumer Technology Association Smart Home Division.
HEOS (HEOS® Built-in) from Denon is multi-room speaker technology built-in to newer audio equipment from Denon:
There are also other open-source and closed-source solutions for multi-room audio synchronisation. Examples:
Snapcast
SlimProto & SliMP3 protocols for Logitech Squeezebox players (for Logitech Media Server, a.k.a. LMS/SlimServer, SqueezeCenter)
Strobe audio
Music Player Daemon (MPD)
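For readers wondering what "synchronized" means in practice, here is a minimal Python sketch of one generic way to line up playback: each speaker estimates how far its clock is from the controller's clock using one request/response exchange, then converts a shared start time into its own local time. This is a sketch under stated assumptions and not how Snapcast, Sonos, or any other listed solution is known to work; the function names are hypothetical, and the offset estimate assumes the network delay is the same in both directions.

```python
def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Estimate (controller clock - speaker clock) from one round trip.

    t0: speaker sends the request      (speaker clock)
    t1: controller receives the request (controller clock)
    t2: controller sends the reply      (controller clock)
    t3: speaker receives the reply      (speaker clock)
    Assumes the network delay is the same in both directions.
    """
    return ((t1 - t0) + (t2 - t3)) / 2


def local_start_time(shared_start: float, offset: float) -> float:
    """Convert a start time on the controller clock to the speaker's clock."""
    return shared_start - offset


# Example: the speaker clock runs 2.0 s behind the controller,
# and each network hop takes 0.01 s.
offset = clock_offset(t0=100.00, t1=102.01, t2=102.02, t3=100.03)
start = local_start_time(shared_start=110.0, offset=offset)
print(round(offset, 3), round(start, 3))
```

Every speaker that receives the same shared start time and applies its own offset would begin playback at (approximately) the same real moment; real systems additionally have to deal with clock drift, buffering, and jitter, which this sketch ignores.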
PS: FYI, maybe relevant is that last year Google won against Sonos in a patent infringement lawsuit about multi-room audio groups:
https://www.engadget.com/google-brings-back-smart-speaker-grouping-after-sonos-lawsuit-victory-081200931.html