github-actions bot changed the title from "[REQUEST] Multi-room audio streaming server and HiFi audio player as a fully featured multiroom audio solution" to "[REQUEST] Multi-room audio streaming server and HiFi audio player as a fully featured multiroom audio solution (AUD-5840)" on Nov 7, 2024
Is your feature request related to a problem? Please describe.
I hope it is OK to submit this large feature request here as I do not have the skills or resources to implement this as a community project myself. As such, this is just a feature suggestion meant as an open letter to Espressif and community members for discussion, and not a feature proposal from me.
Please expand the ESP Multi-Room Music example from the Espressif ESP-ADF (Espressif Audio Development Framework) into a ratified open standard and turnkey solution for network music streamers / network audio players, and add matching reference "music streamer" (audio input/output) board hardware for both the server and client sides as ready-made development kits (with audio-input and audio-output jacks for HiFi music).
The point is that any company should be able to use it as a turnkey design for making "ESP Multi-Room Music" compatible audio streamers and audio players (music receivers). Products that follow your openly published specification should be compatible with each other out of the box. That specification would need to be refined and well documented, with feature versioning for backward and forward compatibility.
That is, a design constructed so that it can be easily manufactured and sold to any buyer as a completed product.
The backstory is that HiFi streamers / music streamers (network audio players) that enable multi-room audio (also known as "whole-house audio") are becoming more popular as commercial products, but it is a wild west: there are no open standards, so every manufacturer uses its own closed-source and/or licensed proprietary technology for streaming and multi-room audio synchronization. This feature request therefore asks you and others to consider creating a dedicated "Audio Player Architecture" with a client-server model, plus a framework and libraries that enable audio casting (i.e. a "Casting Audio Player device type") and audio receiving (an "Audio Player endpoint" with "Casting Audio Player" and "Basic Audio Player"). It should provide accurate low-latency multi-room audio synchronization using pipelines optimized for high-quality music playback, including speaker setup, multi-room groups, and advanced control via an open API.
Reference:
I would therefore like to suggest that Espressif consider developing the ESP Multi-Room Music example and player pipelines into an official, complete, open-source multi-room audio streaming (distributed audio) solution that can be used as an open standard for ESP32-based audio players (e.g. network audio players / HiFi music streamers) and for music playback on smart speakers. The use case is to simplify the creation of standardized, cross-manufacturer multi-room audio (distributed audio) systems that are compatible with each other because they all use the same open standard.
The key feature is accurate audio synchronization between different smart speakers, allowing synchronized multi-room playback of music on several audio receivers / smart speakers installed in the same home (also known as a distributed audio system). This needs support for an "Audio Group", usually named a "Speaker Group", and perhaps also an "Audio Zone" ("Speaker Zone" or area). Preferably there should also be separate volume controls for each speaker and/or zone, to compensate for differences in apparent volume due to room size and shape as well as the speaker products used in different rooms.
Another key concept is so-called "speakerless devices", meaning audio-output dongles (with TOSLINK and Phono AUX-out or line-out ports for external third-party speakers and sound systems). Examples include Google's original "Chromecast Audio" product, which adds Google Cast audio player capability to any third-party speaker / sound system, as well as "Amazon Echo Input", "Amazon Echo Link", and "Echo Link Amp", which similarly add AUX output to a third-party speaker / sound system (though Amazon Echo products also have an embedded voice assistant via built-in microphones).
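To make the grouping and per-speaker volume idea more concrete, here is a minimal Python sketch of how a speaker group could be modelled. Everything in it is an assumption for illustration: the `Speaker` and `AudioGroup` classes, the `trim_db` and `output_latency_ms` fields, and the 200 ms lead time are hypothetical and are not part of ESP-ADF or any existing specification. It also assumes that all devices already share a synchronized clock, which is the hard part a real standard would have to define.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Speaker:
    name: str
    trim_db: float = 0.0         # hypothetical per-speaker volume offset
    output_latency_ms: int = 0   # hypothetical DAC/amplifier delay of this device


@dataclass
class AudioGroup:
    name: str
    volume_db: float = -20.0     # group master volume
    speakers: list[Speaker] = field(default_factory=list)

    def effective_volume_db(self, speaker: Speaker) -> float:
        # Master volume plus the per-speaker trim, capped at 0 dB (full scale).
        return min(0.0, self.volume_db + speaker.trim_db)

    def start_times_ms(self, now_ms: int, lead_ms: int = 200) -> dict[str, int]:
        # Every speaker should be audible at now + lead on the shared clock,
        # so each one starts its output earlier by its own output latency.
        target = now_ms + lead_ms
        return {s.name: target - s.output_latency_ms for s in self.speakers}


group = AudioGroup(
    "Downstairs",
    volume_db=-20.0,
    speakers=[
        Speaker("Kitchen", trim_db=3.0, output_latency_ms=15),
        Speaker("Living room", trim_db=-2.0, output_latency_ms=40),
    ],
)
for s in group.speakers:
    print(s.name, group.effective_volume_db(s), group.start_times_ms(now_ms=1000)[s.name])
```

The design choice here is that the group owns the master volume and the playback deadline, while each speaker only contributes a local correction (trim and latency), which keeps a room's compensation independent of which group it currently belongs to.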
https://en.wikipedia.org/wiki/Amazon_Echo#Speakerless_devices
https://en.wikipedia.org/wiki/Chromecast#Chromecast_Audio
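Multi-room audio:
Single source, single zone = Scenario: One music app streaming to a single audio player/speaker.
Single source, multiple zones = Scenario: One music app streaming to an audio group containing multiple audio players/speakers.
Multiple sources, multiple zones = Scenario: Several music apps streaming to separate audio players/speakers in the same home.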
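The three multi-room routing scenarios (one source to one player, one source to a group of players, and several sources to separate players) can be expressed as a simple mapping from each source to the players it streams to. The Python sketch below is only an illustration under that assumption: the `classify` function and the `routes` structure are hypothetical names, not an existing API.

```python
from __future__ import annotations


def classify(routes: dict[str, list[str]]) -> str:
    """Name the scenario for a mapping of source -> target players (hypothetical model)."""
    claimed: dict[str, str] = {}
    for source, players in routes.items():
        for player in players:
            # A player can only render one stream at a time.
            if player in claimed and claimed[player] != source:
                raise ValueError(f"{player} is claimed by {claimed[player]} and {source}")
            claimed[player] = source

    active = [s for s, players in routes.items() if players]
    if not active:
        return "idle"
    if len(active) > 1:
        return "multiple sources, multiple zones"
    if len(claimed) == 1:
        return "single source, single zone"
    return "single source, multiple zones"


print(classify({"music-app": ["kitchen", "living-room"]}))
```

Rejecting a player that is claimed by two sources is an assumption made here to keep the model simple; a real system might instead let the newest source take over the player.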
Product use case: A client/server design that works well for music/audio apps and smart speaker products, plus products with audio line-in, i.e. devices designed purely for audio output and/or input that are normally used just for music playback, including multi-room sound systems. These will sometimes, but not always, include microphone input for a voice assistant. The point is that this means products that lack any kind of video output, unlike televisions and smart control displays/screens.
The main problem to solve: There are many different audio streaming protocols in commercial use, from basic to audiophile-class audio quality, and there are even more music streaming services, which do not support all of those protocols. Having so many proprietary, closed-source solutions from different vendors means fragmentation: audio players/receivers and music services that do not communicate with one another, and no way for users to control all of their music from a single interface or stream audio to different ecosystems at the same time.
I think that, other than the obvious smart speakers with voice assistants, another real-world market and use case for pure audio players is high-quality speakers and Hi-Fi grade sound systems for music playback, whether used as single-point or multi-room sound systems. Their primary audience would probably be users streaming music from commercial music service apps such as Amazon Music, Spotify, SiriusXM, Pandora, Tidal, Qobuz, Deezer, YouTube Music, and Apple Music, as well as audio-streaming services for other types of content (for example Amazon Audible for audiobooks), if and when those services add support for a "Matter Casting" streaming protocol for audio to their apps.
A popular example of Hi-Fi audio streamer products without a built-in voice assistant is the WiiM series from Linkplay Technology:
A new product idea to accommodate in a new audio-only architecture is the audio-input dongle. That is, an audio-streaming server dongle with "line-in" and/or "microphone" input ports that basically works as a stand-alone sound card on the network: an embedded audio digitizer appliance acting as an audio-only "Matter Casting Client" whose stream can be sent to any "Audio Player endpoint", which can be either a single endpoint or a group of endpoints (audio group) for multi-room music playback. This would allow a user to connect any legacy audio source, such as an LP record player (phonograph turntable), cassette deck, or CD player (for audio CDs), to such an audio-input dongle and stream that audio to any "Matter Casting" enabled audio player.
As far as I know there are no commercial products on the market, but check out this "Vinyl Cast" app as a proof-of-concept:
Describe the solution you'd like
Please build out the ESP Multi-Room Music example from the Espressif ESP-ADF (Espressif Audio Development Framework) into a ratified open standard and turnkey solution for network music streamers / network audio players, adding matching reference "music streamer" (audio input/output) board hardware for both the server and client sides as ready-made development kits (with audio-input and audio-output jacks for HiFi music).
A tip: the existing Matter specification features a "Video Player Architecture" with a "Casting Video Player device type" and "Video Player endpoint" ("Casting Video Player" and "Basic Video Player"). What appears to be missing is a "pure" audio architecture with a Matter Casting Audio Player device type, and maybe an Audio Input cluster as well. Perhaps it could be based on ideas from the existing "Video Player Architecture" in the Matter Casting specification (Matter standard). That architecture features most of what is needed for video but seems to lack audio-specific features for music streaming and multi-room synchronization, so I guess a new "Casting Audio Player device type" would be needed as well, which could be based on the existing "Casting Video Player device type"?
Perhaps also add an example audio player app, similar to tv-app but for music playback, that includes multi-room synchronization?
Anyway, I believe it would be a good idea to at least check out "Matter Casting" and TV-casting-app as a concept:
While some video-specific features could be removed if basing it on the existing "Video Player Architecture", I think it would be preferable to also extend a dedicated "Audio Player Architecture" with some audio-specific features to optimize for home audio setups with Hi-Fi quality amplifiers and speakers designed for music playback, and not solely for embedded smart speakers.
An alternative could be to repurpose it and rename the "Video Player Architecture" into a more generic "Media Player Architecture"?
It is also important that both the audio streaming and the low-latency multi-room audio synchronization technology are made an open standard under an open license, or are at least royalty-free even for commercial implementations, so that anyone can use them in any multi-room audio system. What is needed is an audio-only casting standard that can be used by various streaming audio sources to cast to speaker-only devices (i.e. devices such as smart speakers, dedicated music players, and audio receivers).
I think there is a need to design a proper "Audio Player Architecture" with the understanding that the use cases of a video player differ from those of an audio player meant for HiFi music playback, especially for multi-room synchronization. As such, I think you should aim to design an architecture primarily for music playback that works for a combination of "smart speaker", "home audio", and "high fidelity", which I understand may have different, but at least similar, use cases when it comes to voice control versus music playback.
Describe alternatives you've considered
There are several open-source and many more closed-source alternatives, though I am not aware of any open-source solution that provides audio grouping and multi-room synchronization using a non-proprietary implementation.
Snapcast
SlimProto & SliMP3 protocols for Logitech Squeezebox players (for Logitech Media Server, a.k.a. LMS/SlimServer, SqueezeCenter)
Strobe audio
Music Player Daemon (MPD)
Amazon Alexa features multi-room music support:
Google has "Google Cast" (Chromecast Audio), which supports multi-room audio with speaker grouping and synchronized multi-room playback, so maybe they could be convinced to contribute components?
Apple features multiroom support for AirPlay 2 audio streaming:
DTS Play-Fi is a premium wireless audio ecosystem for whole-home music and TV audio, supporting low-latency, high-resolution 24-bit/192kHz lossless streaming and synchronization technology with sub-millisecond playback accuracy.
Sonos, perhaps the largest vendor on the market for multi-room audio speaker systems, is now at least a member of the CSA:
IKEA of Sweden AB currently has a partnership with Sonos to make Wi-Fi speakers with multi-room audio support:
Yamaha MusicCast (Yamaha is not yet a member of the CSA); however, Yamaha MusicCast proves the need for high-fidelity quality:
Roon Ready (Roon's RAAT streaming technology by RoonLabs), not a CSA member, but again proves that interoperability is needed:
BluOS is a wireless hi-res multi-room platform that lets you manage all your music and stream it to any BluOS Enabled player using a phone, tablet, or computer. BluOS is an operating system that manages and controls all your music. They were the 2023 "Mark of Excellence" winner of Consumer Technology Association Smart Home Division.
HEOS (HEOS® Built-in) from Denon is multi-room speaker technology built-in to newer audio equipment from Denon:
Additional context
FYI, possibly relevant: last year Google prevailed over Sonos in a patent infringement lawsuit about multi-room audio groups: