RFD - PCM audio support #1085

Closed
devsaurus opened this issue Feb 25, 2016 · 34 comments

@devsaurus
Member

The recent additions of sigma-delta modulation with #1000 and a precise µs timer in #1057 open up the potential for audio support. Apart from a Lua interface, just some glue code is required to combine both into a simple mono audio back-end.

What I envision is support for playing wav-like files over any of the GPIOs:

  • 1 k to 16 k sample rate
  • Raw, unsigned 8 bit audio files, stored in SPIFFS
  • Conversion from .wav format with OS tools like Audacity or SoX
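To make the target format concrete, here is a minimal sketch of what "raw, unsigned 8 bit" means relative to the signed 16-bit samples found in many .wav files. It assumes mono, little-endian input with the header already stripped; the function name `s16le_to_u8` is hypothetical, and in practice the tools named above do this job.

```lua
-- Convert signed 16-bit little-endian PCM to raw unsigned 8-bit samples.
-- Keeps only the high byte of each sample and shifts the signed range
-- [-128, 127] to the unsigned range [0, 255].
-- Assumption: mono data, header already removed, even number of bytes.
local function s16le_to_u8(pcm16)
  local out = {}
  for i = 2, #pcm16, 2 do
    local hi = pcm16:byte(i)                       -- high byte of the sample
    out[#out + 1] = string.char((hi + 128) % 256)  -- signed -> unsigned offset
  end
  return table.concat(out)
end

-- Silence (0), most negative (-32768) and most positive (32767) sample.
local raw = s16le_to_u8(string.char(0, 0, 0, 128, 255, 127))
print(raw:byte(1, -1))  -- 128  0  255
```

Dropping the low byte is plain truncation without dithering, which is good enough to show the mapping but not a recommendation for best audio quality.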

A few external filtering components convert the signal from the digital to the analog domain and connect to either a headphone jack or an active amplifier driving standard 4-8 Ω speakers. A quick 'n' dirty feasibility study is available in my pcm branch, complemented by some notes describing the external analog filter & amplifier and sample Lua code.
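As a rough illustration of why a simple analog filter is enough, the following sketch models a first-order sigma-delta modulator in plain Lua. It is a software model for intuition only, not the ESP8266 hardware implementation; the function names and the 256-step window are assumptions.

```lua
-- First-order sigma-delta model: returns a list of 0/1 output bits for one
-- 8-bit sample (0..255) held constant over `steps` modulator cycles.
local function sigma_delta_bits(sample, steps)
  local acc, bits = 0, {}
  for i = 1, steps do
    acc = acc + sample
    if acc >= 256 then      -- accumulator overflow produces a '1' pulse
      acc = acc - 256
      bits[i] = 1
    else
      bits[i] = 0
    end
  end
  return bits
end

-- Counts the '1' pulses in a bit list.
local function count_ones(bits)
  local n = 0
  for _, b in ipairs(bits) do n = n + b end
  return n
end

print(count_ones(sigma_delta_bits(64, 256)))   -- 64
print(count_ones(sigma_delta_bits(128, 256)))  -- 128
```

The density of '1' pulses tracks the sample value, so averaging the pin with a low-pass filter recovers a voltage proportional to the sample.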

But before developing this into a PR, I'd like to check with the community whether such a module has a use case and is still within this project's scope. I'll follow up with API concept and architectural details once there's a strong indication that this functionality is considered to be useful.

@devyte

devyte commented Feb 25, 2016

If it were possible to add audio sampling (i.e.: microphone) in addition to this, then the ESP could be used for simple 2-way audio, which would be beyond awesome.
If it's not possible, then this could still be useful. I have a friend who asked me if the ESP could play sounds, he wants to use it for fishing of all things. I'm pretty sure there could be other uses.

@mikewen

mikewen commented Feb 26, 2016

+1, I remember that the ESP8266 supports an I2S interface, but I could not find any code example.

@jmattsson
Member

What @devyte said!

I think it would be a challenge to make it run smoothly given the non-preemptive nature of the SDK (and shortage of RAM), but if it can be done it'd get a 👍 from me.

@marcelstoer
Member

Sounds exciting! How about a WeMos D1 mini with a stackable SD card shield that plays music autonomously when connected to speakers 😄

@devsaurus
Member Author

Thanks for your inputs, guys.

Audio sampling
Amazing idea - I didn't think of it yet. Although analog-to-digital conversion with sigma-delta isn't as straightforward as generating analog audio, I'll look into it. Maybe the ADC would be useful here, let's see.

I2S
The ESP is said to have good hardware support for I2S, and there's the Mp3_Decode project by Espressif which can serve as a coding example. I haven't considered I2S so far for several reasons:

On the pro side this solution offers best audio quality and hardware streaming support.

Other audio solutions
Slightly out of scope, but there are nice mp3 players on the market, either standalone or controllable via UART. Pros: better audio quality and kind of cheap. Cons: low level of integration and still more expensive than some Rs & Cs plus an audio amplifier.

Real-time characteristics
Definitely the main challenge IMO - feeding file data in real-time from SPIFFS to the audio back-end. I'm not yet sure if this can be done at a 16 k sample rate, and I hope that the new tasking interface from #1061 can fill the gap. This was the major driver for opening the discussion before spending the effort to dig into it.

@devsaurus
Member Author

Had a deeper look into sourcing pcm data from files. This worked very well using the new task interface. The current implementation uses double buffering with two buffers of 1024 bytes each. Their size can probably be reduced further since the margins are quite big, as seen in the plots below.

Yellow channel traces audio signal.
Green channel shows handshake timing between ISR and reader task.

  • falling edge: ISR requests to re-fill buffer and fires reader task. Repeating every ~63 ms.
  • rising edge: reader completed fetching 1024 bytes from file. Takes between 200 µs and 500 µs.

[Oscilloscope plot: overview]
[Oscilloscope plot: flash_reload]
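The margins quoted above can be cross-checked with a little arithmetic. This is a sketch assuming one byte per sample and the 16 k rate discussed earlier in the thread; the helper names are hypothetical.

```lua
-- Time one buffer lasts before the ISR needs the next one, in milliseconds.
-- Assumption: one byte per sample (unsigned 8-bit mono).
local function refill_period_ms(buffer_bytes, sample_rate_hz)
  return buffer_bytes * 1000 / sample_rate_hz
end

-- Fraction of each refill period that the reader task spends fetching data.
local function reader_load(fetch_us, period_ms)
  return fetch_us / (period_ms * 1000)
end

local period = refill_period_ms(1024, 16000)
print(string.format("refill every %.0f ms", period))
print(string.format("worst-case reader load %.2f %%", reader_load(500, period) * 100))
```

At a nominal 16000 samples per second this gives 64 ms per buffer, roughly in line with the measured ~63 ms, and the reader is busy for less than 1 % of each period even at 500 µs per fetch.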

@TerryE
Collaborator

TerryE commented Mar 6, 2016

I am also interested in this sort of approach for other bit-banging drivers. Just an aside comment. But this is looking good 😄

@devsaurus
Member Author

Yes, the concept of double buffering and filling them from a file reader task is quite generic. All specific logic is part of the ISR - feeding a DAC, or pushing patterns through GPIOs.

I don't yet have a final view on my implementation of the buffering stuff here, things are still moving. But the ingredients are clear: a data-producing task, a data-consuming ISR, and in between a FIFO / double-buffering scheme. Having a shared, generic solution for the latter should be feasible. But that's brainfood for a different issue.
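The producer/consumer split described here can be sketched in plain Lua. This is a simulation for illustration only, not the module's C implementation: `next_sample` stands in for the ISR, `service` for the reader task, and all names are hypothetical.

```lua
-- Producer: returns successive chunks of `data`, then nil when exhausted.
local function make_chunk_reader(data, chunk_size)
  local offset = 1
  return function()
    if offset > #data then return nil end
    local chunk = data:sub(offset, offset + chunk_size - 1)
    offset = offset + chunk_size
    return chunk
  end
end

-- Double buffer: the consumer drains the active buffer while the producer
-- refills the other one.
local function new_double_buffer(fetch)
  local db = { bufs = {}, active = 1, pos = 1 }
  db.bufs[1] = fetch()
  db.bufs[2] = fetch()

  -- Consumer side (the ISR in the real design): returns the next sample,
  -- or nil when both buffers are empty (underrun / end of data).
  function db:next_sample()
    local buf = self.bufs[self.active]
    if buf == nil then return nil end
    local sample = buf:byte(self.pos)
    self.pos = self.pos + 1
    if self.pos > #buf then
      self.bufs[self.active] = nil      -- mark as consumed, needs refill
      self.active = 3 - self.active     -- swap 1 <-> 2
      self.pos = 1
    end
    return sample
  end

  -- Producer side (the reader task): refills any consumed buffer,
  -- starting with the one that will be played next.
  function db:service()
    for _, i in ipairs({ self.active, 3 - self.active }) do
      if self.bufs[i] == nil then self.bufs[i] = fetch() end
    end
  end

  return db
end

-- Plays `data` through the double buffer and returns what was consumed.
local function play_all(data, chunk_size)
  local db = new_double_buffer(make_chunk_reader(data, chunk_size))
  local played = {}
  while true do
    local sample = db:next_sample()
    if sample == nil then break end
    played[#played + 1] = string.char(sample)
    db:service()
  end
  return table.concat(played)
end

print(play_all("abcdefgh", 3))  -- abcdefgh
```

If `service` is never called, the consumer runs dry after the two initial buffers, which corresponds to the underrun case the real driver has to report.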

@devsaurus
Member Author

Up to now I worked just on demonstrators to investigate certain feasibility aspects. The Lua API itself still needs to be settled. Your feedback was very helpful to rethink the overall structure.

pcm module

Play sounds through various back-ends. Supported hardware is sigma-delta (, I2S, XYZ).

pcm.new()

Initializes the audio driver.

Syntax

pcm.new(pcm.SD, pin)
pcm.new(pcm.I2S, arg1, arg2, ...)
pcm.new(pcm.XYZ, arg1, ...)

Parameters

pcm.SD: use sigma-delta hardware

  • pin: 1~10, IO index

pcm.I2S: use I2S hardware

  • arg1: ...
  • arg2: ...

Returns

Audio driver object.

Audio driver

Each audio driver provides the same control functions for playing sounds.

pcm.drv:close()

Stops playback and releases the audio hardware.

pcm.drv:on()

Register callback functions for events.

Syntax

pcm.drv:on(event[, cb_fn])

Parameters

  • event: identifier, one of
    • data: callback function is supposed to return a string containing the next chunk of data.
    • drained: playback was stopped due to lack of data. The last 2 invocations of the data callback didn't provide new chunks in time (intentionally or unintentionally) and the internal buffers were fully consumed.
    • paused: playback was paused by pcm.drv:pause().
    • stopped: playback was stopped by pcm.drv:stop().
  • cb_fn: callback function for the specified event. Unregisters previous function if omitted.

Returns

nil

pcm.drv:play()

Starts playback.

Syntax

pcm.drv:play(rate)

Parameters

rate: sample rate. Supported are pcm.RATE_1K, pcm.RATE_2K, pcm.RATE_4K, pcm.RATE_5K, pcm.RATE_8K, pcm.RATE_10K, pcm.RATE_12K, pcm.RATE_16K.

Returns

nil

pcm.drv:pause()

Pauses playback. A call to pcm.drv:play() will resume from the last position.

pcm.drv:stop()

Stops playback and releases buffered chunks.

@dvv
Contributor

dvv commented Mar 6, 2016

I would vote pro:

  • introduce pcm.drv:play(chunk_as_string, rate, callback_fn(pcm.drv, event)) instead of file/network/whatever flavors.
  • cosmetic: use driver type as second parameter to pcm.new(pin, typ)
  • cosmetic: pcm.drv:close() for pcm.drv_close()

@devsaurus
Member Author

Interesting input, thanks Vladimir!

introduce pcm.drv:play(chunk_as_string, rate, callback_fn(pcm.drv, event))

Will consider this for sure as it removes a lot of specific handling from the module. Adding a Lua call layer might slow down things, will check the timing impact later.
I can think of the following events:

  • data: callback shall deliver further data
  • stopped: by pcm.drv:stop()
  • paused: by pcm.drv:pause()
  • drained: stopped due to buffer underrun

Why do you propose chunk_as_string? If the callback function returns the data as a string on the Lua stack then there'd be no need for chunk_as_string. Or do I miss a use case where this parameter is definitely required?

use driver type as second parameter to pcm.new(pin, typ)

My first sketch considered dedicated new() functions because each (future) driver might require different parameter sets. The sigma-delta needs to know the pin while I2S has a fixed pinning. Don't know which other info would be required to configure the I2S hardware.

pcm.drv:close() for pcm.drv_close()

Yes, that was a typo.

@dvv
Contributor

dvv commented Mar 7, 2016

chunk_as_string meant we feed :play() with a string (not a table, e.g.), as it corresponds one-to-one to the unsigned byte stream accepted by the chosen format ("Raw, 8 bit unsigned format").

:new(pin, typ[[, specific], arguments]) would imho be consistent, with a dummy pin for I2S.

Callbacks: I would just report pause, stop and drain events leaving it to user to act on them. In drain one might want to feed more data, in pause accumulate/buffer/flush input, in stop stop feeding the player and mark things for exit.

@devsaurus
Member Author

My current model requires the callback to feed data in time, before the internal buffers are drained. This is why I plan to distinguish between a data event and a drained event. The former is the request which has to be served as quickly as possible, while the latter is the indication that continuous streaming ceased due to a lack of data.

A simple example:

function pcm_cb(d, event)
    if event == "data" then
        return file.read()
    elseif event == "drained" then 
        print("file done")
        file.close()
    end
end

file.open("output_16k.u8", "r")

drv = pcm.new_sigmadelta(1)
drv:play(pcm.RATE_16K, pcm_cb)

@devyte

devyte commented Mar 11, 2016

@devsaurus I'm curious about your callback model, it's different than the other ones I've come across so far. E.g.: connections:

srv = net.createServer(blah)
srv:on("disconnection", onDisconnection)
srv:on("sent", onSent)

Applying that model to your interface, it would look like this:

function onData(d)
  return file.read()
end
function onDrained(d)
  print("file done")
  file.close()
end
drv = pcm.new_sigmadelta(1)
drv:on("data", onData)
drv:on("drained", onDrained)
drv:play(pcm.RATE_16K)

Notice that the if-else logic for the event type in your callback is eliminated.

@devyte

devyte commented Mar 11, 2016

I just read somewhere that the onboard ADC could maybe do a 2.5 kHz sample rate. That would give a theoretical mic bandwidth of 1250 Hz, which I think is too narrow for a mic. I guess sampling with the onboard ADC could still be attempted to check whether that's true, but most likely an external ADC over I2C or something would be needed to make it viable.
Still, an implementation could be pursued similar to the pcm proposal above: a frontend with different possible backend ADCs.
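The bandwidth figure follows from the Nyquist limit (usable bandwidth is at most half the sample rate). A minimal sketch of that arithmetic, with hypothetical helper names:

```lua
-- Theoretical maximum signal bandwidth for a given sample rate (Nyquist).
local function nyquist_bandwidth_hz(sample_rate_hz)
  return sample_rate_hz / 2
end

-- Minimum sample rate needed to capture a given bandwidth.
local function min_sample_rate_hz(bandwidth_hz)
  return bandwidth_hz * 2
end

print(string.format("%.0f Hz", nyquist_bandwidth_hz(2500)))  -- 1250 Hz
print(string.format("%.0f Hz", nyquist_bandwidth_hz(8000)))  -- 4000 Hz
```

For comparison, an 8 kHz sample rate yields a 4 kHz bandwidth, which is the range classic telephone-quality speech fits into.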

@devsaurus
Member Author

@devyte Right, the example you gave for net is also used in mqtt and uart. A similar approach is found in wifi.sta.eventMonReg(), while other modules do callback registration with a single function like enduser_setup.start() and sntp.sync().

It appears that the on("event_name", cb_fn) pattern is the most common one. For sure this allows for a clearer separation between event handlers and eliminates the condition evaluation tree. Are there other pros? What would be the cons?

@devsaurus
Member Author

Regarding ADC I did a quick assessment of the obvious options in the meantime.

  • Internal ADC: Probably too slow as you already concluded. Furthermore, system_adc_read() seems to be blocking until the next result is available (didn't check though) which would be a no-go for continuous sampling.
  • External ADC attached via I2C: Chips with good conversion quality and sampling rate are available, but ESP's low-level I2C routines keep the CPU busy with bit-banging. This would also block other tasks like wifi and break the system stability.

Up to now I don't see any promising approaches. My conclusion would change once there's an external solution which can be attached via an interrupt-driven or DMA-like interface.

@devyte

devyte commented Mar 12, 2016

@devsaurus
Cons:

  • The dev has to enter a string, and both the id string and the lua strings, although small, require several bytes which just sit in mem and are used only to identify the type of event.
  • The string is something that the dev has to remember, and if not, look up (i.e.: I've lost track of the number of times I've wondered: was it :on("disconnect") or :on("disconnection")? dammit... open browser, readthedocs.org/... ). It is also a bit error prone, although if error handling in C-world is correct, a bad string should trigger a runtime error. Then again, how many bugs have the devs here come across related to error handling in C-world? :p

Pros:

  • Using the single :on(id, cb) pattern would make the interface consistent with the other cases. I had to think about this myself recently on lua side, and decided on the :on() in spite of the cons.

On the other hand, a different approach could be used with one function per callback, i.e.:

drv:onDrained(onDrainedCallback)
drv:onData(onDataCallback)

or:

drv.onSent = onSentCallback
drv.onData = onDataCallback

This requires no string, but it does require one function per callback. I'm not sure what that means for lua under the hood, though, maybe strings are used to identify/lookup the function?

The first of the above is safer from a coding PoV, because the error checking is implicit and smaller than in your case.

The second is easier from an implementation PoV, because it doesn't require functions to be implemented, i.e.: the callbacks are just table entries. However, it's slightly more error prone (typos and such), with pretty much no diagnostics to detect them.
Both are inconsistent with the rest of the callback setups elsewhere.
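To illustrate the trade-off, here is a pure-Lua stand-in for the :on(id, cb) pattern that rejects unknown event names at registration time. It is a sketch only, not the C module; `new_driver`, `emit` and the event table are hypothetical, with the event names taken from the API sketch earlier in this thread.

```lua
-- Event names from the API sketch in this thread.
local VALID_EVENTS = { data = true, drained = true, paused = true, stopped = true }

-- Creates a mock driver object with an :on() registry.
local function new_driver()
  local drv = { callbacks = {} }

  -- Registers `cb` for `event`; passing nil unregisters the previous one.
  -- A misspelled event name raises an error instead of being silently kept.
  function drv:on(event, cb)
    if not VALID_EVENTS[event] then
      error("unknown event: " .. tostring(event), 2)
    end
    self.callbacks[event] = cb
  end

  -- Invokes the callback registered for `event`, if any.
  function drv:emit(event, ...)
    local cb = self.callbacks[event]
    if cb then return cb(self, ...) end
  end

  return drv
end

local drv = new_driver()
drv:on("data", function() return "next chunk" end)
print(drv:emit("data"))                    -- next chunk
print(pcall(drv.on, drv, "drain", print))  -- false, followed by the error message
```

With plain table entries (the second alternative above), the same typo would simply create an unused field, which is the missing diagnostic mentioned in the comparison.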

@devyte

devyte commented Mar 12, 2016

@devsaurus about a device on I2C, I've seen ESP projects with external I2C ADCs doing sampling rates of 40KHz. Didn't look at the details tho...to be honest, 20KHz would be pretty for a mic, and I think we could probably get away with as low as 8KHz.

@devyte

devyte commented Mar 12, 2016

Also, how about an ADC with SPI interface? They seem to be cheaper vs. I2C, I see a 4-channel one at USD$2.2 with sample rate of up to 200KSps, which is kind of overkill, of course. The I2C ones seem to go 6-12 bucks a pop.
Needs more pins, of course...

@devsaurus
Member Author

Thanks for the detailed feedback on callbacks!
I'm tending to switch to the :on("event") despite its potential cons. In the end it's in line with most of the modules dealing with callbacks. API sketch is updated accordingly.

Regarding the ADCs - do you have any links for future reference? I don't intend to rule out recording, but would leave this topic to a second iteration once audio generation is settled.

@devyte

devyte commented Mar 12, 2016

I found some cheaper I2C ones...

NCD9830 I2C, 8bit x 8ch, 2.5-70KSps @ just over 3 bucks
ADC101C021 I2C, 10bit x 1ch, 189KSps, $2.52
MCP3004 SPI, 10bit x 4ch, 200KSps, $2.20

@devyte

devyte commented Mar 13, 2016

@devsaurus does this make any sense to you?

@devyte

devyte commented Mar 13, 2016

List of links to discussions of the ADC that I've come across (for future ref)
Reliable audio adc timer-ing (same as link above)
ADC??
ADC is slow
ADC sample rate
Build in SAR ADC PHY_ADC_READ_FAST()

It seems some people claim it's possible to do fast sampling with the internal adc, it's just that the wifi and task priorities make it unreliable. I also suspect that they're not using an efficient interrupt scheme.
I was thinking along the lines of:
  • Low level fast ISR servicing the ADC. All it does is read the sample and stuff it in a buffer. Testing this by itself could provide a hint of what sampling rate could be accomplished, and what the impact would be for wifi in AP, STA, STATIONAP or NULL modes.
  • High level callback: once a buffer is full, a higher level callback is called to take the buffer and propagate it upwards to lua. This is not called directly, of course, but via a scheduled task or something.
  • Double (triple?) buffers: once the ISR fills a buffer, it gets swapped with the other one (next one?) which is standing by empty, and the higher level task for calling the callback gets created. That keeps the ISR lean and fast.
  • What frequency would make sense for the higher level callback? The lower the freq, the bigger the buffs => heap

Does that make any sense, or am I writing nonsense?
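The callback-frequency versus heap question can be put into numbers. A minimal sketch, assuming fixed-size buffers that each hold the samples captured between two callbacks; the function names and example figures are hypothetical.

```lua
-- Bytes needed for one buffer that collects samples between two callbacks.
local function buffer_bytes(sample_rate_hz, callback_hz, bytes_per_sample)
  return math.ceil(sample_rate_hz / callback_hz) * bytes_per_sample
end

-- Total heap for a multi-buffer scheme (2 = double, 3 = triple buffering).
local function heap_bytes(sample_rate_hz, callback_hz, bytes_per_sample, num_buffers)
  return buffer_bytes(sample_rate_hz, callback_hz, bytes_per_sample) * num_buffers
end

-- 8 kHz, 8-bit samples: compare a 10 Hz and a 50 Hz callback rate.
print(heap_bytes(8000, 10, 1, 2))  -- 1600
print(heap_bytes(8000, 50, 1, 2))  -- 320
```

Halving the callback rate doubles the heap held by the buffers, so the choice is a trade between Lua-side call overhead and RAM.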

@devsaurus
Member Author

Time for a sign of life - most of the API sketch is implemented now in devsaurus/pcm. I'll continue more checks and clean-ups as time permits.

@Phando

Phando commented Apr 1, 2016

I have only ever used the online tool to make custom firmware. Is there a way to build devsaurus/pcm to include https support? I am very excited to play with some raw audio and the nodemcu.

@TerryE
Collaborator

TerryE commented Apr 1, 2016

@Phando Joe, this isn't the right place to ask this sort of Q. Our support page gives you links which provide forums for this type of Q.

@Phando

Phando commented Apr 1, 2016

Thanks and sorry. Keep it up, the build service is great


@guillermo22

guillermo22 commented May 3, 2016

Hi everybody! I am working with an SPI ADC (MCP3201) to process audio signals (50 kHz). This works well when I send the data over WiFi via UDP using a time.alarm().
The problem is that this timer has a lower limit of about 1 ms per send, and as we know, audio signals need a much shorter interval.
So I tried an infinite while loop, but then the sending does not work. I saw #367, but I don't understand very well how the buffer size can be changed, I mean made bigger, so that I can finally send the audio data over WiFi.

@nickandrew
Contributor

@guillermo22, I think you should ask your question on the forum.

@devsaurus
Member Author

I'm closing this since the related PR is well into the review loop.

@navin-bhaskar

Hi,

I know this is a closed issue but recently I was trying out the "play_file.lua" example code. But when I run the code, I get the following error message:

PANIC: unprotected error in call to Lua API (bad argument #1 to '?' (file.obj expected, got userdata))

I changed the line drv:on("data", file.read) to drv:on("data", file) after that, I do not see any error but nothing happens on the pin and also drained cb gets called real quick. Any reason as to why this might happen?

@devsaurus
Member Author

@navin-bhaskar see #1712 for the fix of this error.
The example was updated on dev branch at https://github.com/nodemcu/nodemcu-firmware/blob/dev/lua_examples/pcm/play_file.lua.

@navin-bhaskar

Thanks! that worked.
