I2S audio and DMA: it’s a mysterious world. What are these parameters, dma_buf_count and dma_buf_len?
What are they for? And what values should they be set to?
You can watch a detailed video explanation here on YouTube and I’ve summarised it below.
I’ve got a quick shout out to PCBWay for sponsoring the video. PCBWay offer PCB Production, CNC and 3D Printing, PCB Assembly and much much more. You can find their details here
DMA allows peripherals to directly access the system memory without involving the CPU.
When using DMA, the CPU initiates a transfer between memory and the peripheral. The transfer is all taken care of by the DMA controller and the CPU is free to go off and do other work.
When the DMA transfer is completed the CPU receives an interrupt and can then process the data that has been received or set up more data to be transmitted.
This gives us our first data point on how to choose a size for our DMA buffer. Small DMA buffers mean the CPU has to do more work as it will be interrupted more often. Large buffers mean the CPU has to do less work as it will receive fewer interrupts.
Taking audio as our example: suppose we are sampling in stereo at 44.1kHz with 16 bits per sample. This gives a data transfer rate of around 176KBytes per second (44,100 samples × 2 channels × 2 bytes).
If we had a DMA buffer size of 8 samples, we’d be interrupting the CPU every 181 microseconds.
If we had a buffer size of 1024 samples, we’d be interrupting the CPU every 23 milliseconds.
This is a very big difference.
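The arithmetic above can be reproduced with a few lines of Python. This is a back-of-envelope calculator rather than anything from ESP-IDF. It assumes that dma_buf_len counts sample frames (one left plus one right sample in stereo), which is how the 181 microsecond and 23 millisecond figures work out. The function names are our own.

```python
def data_rate_bytes_per_sec(sample_rate, bits_per_sample, channels):
    """Bytes per second produced by an I2S stream."""
    return sample_rate * (bits_per_sample // 8) * channels


def interrupt_interval_sec(dma_buf_len, sample_rate):
    """Time to fill one DMA buffer of dma_buf_len sample frames,
    i.e. roughly how often the CPU gets interrupted."""
    return dma_buf_len / sample_rate


if __name__ == "__main__":
    print(data_rate_bytes_per_sec(44100, 16, 2))  # 176400
    for buf_len in (8, 1024):
        us = interrupt_interval_sec(buf_len, 44100) * 1e6
        print(f"dma_buf_len={buf_len}: interrupt every {us:.0f} us")
```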
The naive conclusion from this is that we should make our DMA buffers as large as possible.
But there is a tradeoff here - and that’s latency. We need to wait for the DMA transfer to complete before we can start reading from the buffer.
Generally, with audio, we don’t have very hard real-time constraints. However, you can easily imagine scenarios where a delay of 23 milliseconds could impact your application.
What are the actual limits on the values for dma_buf_len?
It’s easy enough to test - we get a helpful message when we try to use a very large value:
(i2s_driver_install):I2S buffer length at most 1024 and more than 8
Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x400d6701 PS : 0x00060830 A0 : 0x800d14f8 A1 : 0x3ffb1ec0
A2 : 0x00000000 A3 : 0x3ffb850c A4 : 0xffffffff A5 : 0x00000000
A6 : 0x3f404434 A7 : 0x3f4012d8 A8 : 0x80105038 A9 : 0x3ffb1e70
A10 : 0x40143360 A11 : 0x3ffc456c A12 : 0x3f401344 A13 : 0x00000010
A14 : 0x00000003 A15 : 0x00000001 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x0000002c LBEG : 0x4008bdad LEND : 0x4008bdbd LCOUNT : 0xfffffff4
So dma_buf_len can be at most 1024 and must be more than 8.
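You can check these limits yourself before calling i2s_driver_install, so that a bad value fails with a readable message rather than a crash. The helper below is a hypothetical one of our own and is not part of ESP-IDF. It encodes the limits exactly as the log message words them. The lower bound is read literally as "more than 8". The exact boundary may differ between ESP-IDF versions, so check the driver source if it matters to you.

```python
# Limits taken from the driver's error message:
# "I2S buffer length at most 1024 and more than 8".
# The lower bound is read literally (exclusive); it may differ by IDF version.
DMA_BUF_LEN_MIN_EXCLUSIVE = 8
DMA_BUF_LEN_MAX_INCLUSIVE = 1024


def is_valid_dma_buf_len(dma_buf_len):
    """True if dma_buf_len is within the range the driver message describes."""
    return DMA_BUF_LEN_MIN_EXCLUSIVE < dma_buf_len <= DMA_BUF_LEN_MAX_INCLUSIVE


for candidate in (4, 256, 1024, 2048):
    print(candidate, "ok" if is_valid_dma_buf_len(candidate) else "rejected")
```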
One interesting thing to note is that this value is in samples, so to calculate the number of bytes that are actually being used we need to use this formula:
The number of bytes per sample multiplied by the number of channels, the buffer length and the buffer count:
total bytes = bytes per sample × channels × dma_buf_len × dma_buf_count
So in a concrete example, with 16 bits (2 bytes) per sample, stereo left and right channels, dma_buf_len set to 1024 and dma_buf_count set to 2, we have a total of 2 × 2 × 1024 × 2 = 8192 bytes (8K) allocated.
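Here is the same formula as a small Python function, in case you want to try out different combinations. It assumes sample widths that are a whole number of bytes (for example 16 or 32 bits). Other widths may be padded in memory, so treat the result as an estimate for those.

```python
def dma_bytes(bits_per_sample, channels, dma_buf_len, dma_buf_count):
    """Total bytes allocated for I2S DMA buffers.

    dma_buf_len is in samples (per channel), so it is multiplied by the
    bytes per sample and the channel count to get bytes per buffer.
    """
    bytes_per_sample = bits_per_sample // 8
    return bytes_per_sample * channels * dma_buf_len * dma_buf_count


print(dma_bytes(16, 2, 1024, 2))  # 8192 bytes, i.e. 8K
```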
There’s an important tradeoff here: DMA buffers are allocated from memory, and any space we use for DMA buffers is taken away from the space available to the rest of your code.
There is also a limitation that DMA buffers cannot be allocated in PSRAM; we are limited to the chip’s internal SRAM.
This limits us to a maximum of 328K. In testing with my Walkie-Talkie application, I managed to allocate a total of around 100K before the application started to crash.
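You can also turn this around and estimate how many buffers fit into a given memory budget. The 100K budget used below is simply the rough figure from the Walkie-Talkie test, chosen as an example. Your real headroom depends on what else your application allocates from internal SRAM. The function is a hypothetical helper.

```python
def max_dma_buf_count(budget_bytes, bits_per_sample, channels, dma_buf_len):
    """How many DMA buffers of dma_buf_len samples fit in budget_bytes.

    budget_bytes is an assumption you supply; it is not queried from the chip.
    """
    bytes_per_buffer = (bits_per_sample // 8) * channels * dma_buf_len
    return budget_bytes // bytes_per_buffer


# Example: a ~100K budget, stereo 16-bit, 1024-sample buffers (4096 bytes each).
print(max_dma_buf_count(100 * 1024, 16, 2, 1024))  # 25
```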
This leads us nicely on to discussing what to set dma_buf_count to.
We can break this question into two parts:
- The first part is “Why do I need more than one DMA buffer?”
- The second part is “How much total space do I need to allocate to DMA buffers?”
Let’s answer the first question - why do I need more than one DMA buffer?
The issue with having only one buffer is that it does not give us any time to process the data. Without some serious hacking around, the DMA buffer can only be used by either the CPU or the DMA controller. They cannot access the buffer at the same time.
This means that we can only start processing the buffer once the DMA controller has finished transferring data.
If we only have one buffer, we need to complete our processing and give the buffer back to the DMA controller before any new data needs to be transferred. With a sample rate of 44.1kHz, we only have about 22.7 microseconds (1/44,100 of a second) before the next sample comes in from the device.
This is not very long - it’s unlikely we could do anything meaningful in this time. Our processing task may not even be scheduled quickly enough to catch the data.
This means you typically want to set dma_buf_count to at least 2. With two buffers, the CPU can process one buffer while the DMA controller fills the other.
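One way to see why the second buffer matters is to estimate how long the CPU has to hand a filled buffer back before the DMA controller needs it again. The model below is our own simplification for the receive case and does not describe the ESP-IDF driver internals. With one buffer, the next sample arrives one sample period later. With N buffers, the DMA controller fills the other N - 1 buffers before it wraps around to the one you are working on.

```python
def processing_deadline_sec(dma_buf_count, dma_buf_len, sample_rate):
    """Rough time the CPU has to process a filled buffer and give it back
    before the DMA controller needs to write into it again (simplified RX model)."""
    if dma_buf_count < 1:
        raise ValueError("need at least one DMA buffer")
    if dma_buf_count == 1:
        # Single buffer: the very next sample already needs somewhere to go.
        return 1 / sample_rate
    # The DMA controller fills the other (count - 1) buffers before wrapping around.
    return (dma_buf_count - 1) * dma_buf_len / sample_rate


for count in (1, 2, 4):
    ms = processing_deadline_sec(count, 1024, 44100) * 1000
    print(f"dma_buf_count={count}: about {ms:.3f} ms to process each buffer")
```

Going from one buffer to two changes the deadline from a single sample period to a whole buffer's worth of time. That jump is why 2 is the practical minimum.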
So we need more than one buffer. Let’s move on to the second question: how much total space do I need to allocate to my DMA buffers?
To understand this we need to think about the components of our system.
We have the I2S peripheral that is generating samples from our audio source. This will be generating data at a fixed rate that is set by the sample rate.
We then have something that is consuming and processing this data. To understand what we need to set the total buffer size to we need to understand how much time this processing will take.
Suppose we are sending data to a server somewhere and the server takes on average 100 milliseconds to process a request. Sometimes it is faster, and sometimes it is slower.
100ms of audio at a 44.1kHz sampling rate works out at 4410 samples, or around 17KBytes for stereo 16-bit samples (4410 × 4 bytes = 17,640 bytes). This sets a lower limit on our total DMA buffer size: we need space to store 4410 samples while we are busy sending the data to our server.
We also need to allow for the fact that sometimes our server takes longer to respond. What if it sometimes takes 150ms to respond? We need to allow a larger buffer size to take this into account. 150ms is 6615 samples. For a safety margin, we might bump this up to 10000 samples. So we might set our dma_buf_len to 1000 samples and our dma_buf_count to 10.
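The sizing steps above can be written as a small calculator. The helpers are hypothetical, and they only give the minimum. The extra safety margin (rounding 6615 samples up to 10000 in the text) is still a judgment call you make on top. The calculator uses integer arithmetic so that rounding up is exact.

```python
def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def samples_for_delay(delay_ms, sample_rate):
    """Sample frames that arrive while we are busy for delay_ms."""
    return ceil_div(delay_ms * sample_rate, 1000)


def dma_buf_count_for(total_samples, dma_buf_len):
    """Number of DMA buffers of dma_buf_len samples needed to hold total_samples."""
    return ceil_div(total_samples, dma_buf_len)


typical = samples_for_delay(100, 44100)  # average server response
worst = samples_for_delay(150, 44100)    # slow server response
print(typical, "samples,", typical * 4, "bytes (stereo, 16-bit)")
print(worst, "samples ->", dma_buf_count_for(worst, 1000), "buffers of 1000 samples")
print("with margin (10000 samples) ->", dma_buf_count_for(10000, 1000), "buffers")
```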
There’s a nice little visualisation of this in the video linked above.
This covers the case of processing audio data coming into the system. What about pushing samples out? How should we think about this?
We now have something that wants to be fed samples at a constant rate, and a source of samples that may not be able to push samples into the buffer consistently.
To think about this, we need to consider how quickly we need to generate samples, and what the worst-case delay in generating those samples could be.
How quickly we need to generate samples is set by our sampling rate. If we cannot generate samples fast enough to meet our sample rate our system is not going to work. No amount of buffering will help us - unless we can generate all of the audio data upfront in one big buffer.
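A simple model shows why buffering cannot rescue a producer that is too slow. If samples are produced at a lower rate than they are consumed, the buffer drains at the difference between the two rates. A bigger buffer therefore only postpones the underrun. The rates below are made-up example numbers, and the function is a hypothetical helper.

```python
def seconds_until_underrun(buffered_samples, produce_rate, consume_rate):
    """How long a full buffer lasts when the producer is slower than the
    consumer. Returns None if the producer keeps up (no underrun)."""
    if produce_rate >= consume_rate:
        return None
    # The buffer level falls by (consume_rate - produce_rate) samples per second.
    return buffered_samples / (consume_rate - produce_rate)


# Hypothetical: the producer manages 40,000 samples/s but the output needs 44,100.
for buffered in (10_000, 100_000):
    t = seconds_until_underrun(buffered, 40_000, 44_100)
    print(f"{buffered} buffered samples -> underrun after {t:.1f} s")
```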
An example of this as a solution would be to load an entire audio file into memory from slow storage and play it directly from RAM.
Assuming we can generate data fast enough for our sample rate, we need to think about the length of random delays that may mean we can’t deliver some samples exactly when the output requires them.
A good example of this is from our walkie-talkie project. We are playing samples from a stream of UDP packets. Depending on network conditions, packets will be delayed by some variable time.
We need to have sufficient headroom in our buffer that our sample sink does not run out of data when there are delays in delivery.
Taking our UDP example: each packet carries 1436 bytes of audio once the header has been removed, which equates to 718 samples, so we would need to queue up at least this many samples in the output buffer. To allow for packet delays, we might want a buffer of twice this size so we can queue up two packets before playing. Once again, some safe values for this might be 2-3 DMA buffers of 1024 samples each.
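Here is a toy simulation of queuing up packets before playing. It makes several assumptions that are not stated in the text. It treats the packets as 16-bit mono, which is what makes 1436 bytes come out at 718 samples. It uses a hypothetical 16 kHz sample rate; substitute your own project's rate. It also keeps the playback schedule fixed after it starts and does not pause to re-buffer after an underrun. The function only counts packets that turn up after the output needs them.

```python
def late_packets(arrival_ms, packet_samples, sample_rate, prebuffer_packets):
    """Count packets that arrive after playback needs them.

    Playback starts once the first `prebuffer_packets` packets have arrived,
    then consumes samples at a constant rate (fixed schedule, no re-buffering).
    """
    packet_ms = packet_samples * 1000 / sample_rate
    start_ms = max(arrival_ms[:prebuffer_packets])
    late = 0
    for i, arrived in enumerate(arrival_ms):
        needed_at = start_ms + i * packet_ms
        if arrived > needed_at:
            late += 1
    return late


SAMPLE_RATE = 16_000   # hypothetical rate, not taken from the project
PACKET_SAMPLES = 718   # 1436 bytes of 16-bit mono audio
packet_ms = PACKET_SAMPLES * 1000 / SAMPLE_RATE

# Packets are sent back to back; packet 2 is held up by 30 ms on the network.
arrivals = [i * packet_ms + (30 if i == 2 else 0) for i in range(6)]
for prebuffer in (1, 2):
    print(f"prebuffer {prebuffer} packet(s): "
          f"{late_packets(arrivals, PACKET_SAMPLES, SAMPLE_RATE, prebuffer)} late")
```

With one packet queued, the delayed packet misses its slot. With two queued, the extra packet of headroom absorbs the 30 ms delay, at the cost of one packet's worth of added latency.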
One last point to think about is that we can also have buffers in our application code. We don’t need to rely solely on DMA buffers to solve our problems. You may choose to have relatively small DMA buffers and use your own buffers to handle things.
There may be good reasons for taking this path. You may have multiple processing steps, some that require very low latency and some that have variable time delays. The low-latency requirement forces you to use small DMA buffers, while the variable delays force you to have your own, quite large, memory buffers.
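Such an application-side buffer is often a ring buffer. It sits between the small DMA buffers and a consumer with unpredictable timing. Here is a minimal Python sketch of one. On the device you would write this in C or use an existing FreeRTOS primitive instead. The class is hypothetical and exists only to show the mechanics. It handles a full buffer by dropping the newest samples and counting them, which is only one of several reasonable policies.

```python
class RingBuffer:
    """Fixed-size FIFO of samples between small DMA buffers and a slow consumer.
    Writes that don't fit are dropped and counted in `dropped`."""

    def __init__(self, capacity):
        self._data = [0] * capacity
        self._capacity = capacity
        self._read = 0    # index of the oldest sample
        self._count = 0   # number of samples currently stored
        self.dropped = 0

    def write(self, samples):
        """Append samples; returns how many were actually stored."""
        written = 0
        for s in samples:
            if self._count == self._capacity:
                self.dropped += 1
                continue
            self._data[(self._read + self._count) % self._capacity] = s
            self._count += 1
            written += 1
        return written

    def read(self, n):
        """Remove and return up to n of the oldest samples."""
        n = min(n, self._count)
        out = [self._data[(self._read + i) % self._capacity] for i in range(n)]
        self._read = (self._read + n) % self._capacity
        self._count -= n
        return out

    def __len__(self):
        return self._count


rb = RingBuffer(capacity=8)
rb.write([1, 2, 3, 4, 5, 6])           # e.g. the contents of one small DMA buffer
first = rb.read(4)                      # slow consumer takes a few samples
stored = rb.write([7, 8, 9, 10, 11, 12, 13])
rest = rb.read(8)
print(first, stored, rb.dropped, rest)
```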
Hopefully, this has given you some insight into how to choose values for these two parameters - as always - the real answer is “It depends…”
But there is some guidance:
Use dma_buf_len to trade off latency against CPU load. A small value improves latency but comes at the cost of more CPU load.
Use dma_buf_count (together with dma_buf_len) to trade off memory usage against processing headroom. More total buffer space gives you more time to process data, but comes at the cost of using more memory.
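Both rules can be combined into a rough starting point. The helper below is hypothetical. It picks dma_buf_len as the largest buffer whose fill time fits within a latency budget, clamped to the driver's range (the "more than 8" bound read literally, as before). It then picks dma_buf_count so that the buffers cover a worst-case processing delay, and never goes below 2. Any safety margin should be added to worst_delay_ms before calling it.

```python
DMA_BUF_LEN_RANGE = (9, 1024)  # "more than 8", "at most 1024", read literally


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def suggest_dma_config(sample_rate, max_latency_ms, worst_delay_ms):
    """Rough (dma_buf_len, dma_buf_count) from a latency budget and a
    worst-case processing delay. Returns a starting point, not a guarantee."""
    lo, hi = DMA_BUF_LEN_RANGE
    # dma_buf_len: largest buffer whose fill time stays within the latency budget.
    buf_len = max(lo, min(hi, max_latency_ms * sample_rate // 1000))
    # dma_buf_count: enough buffers to cover the worst-case delay, and at least 2.
    worst_samples = ceil_div(worst_delay_ms * sample_rate, 1000)
    buf_count = max(2, ceil_div(worst_samples, buf_len))
    return buf_len, buf_count


buf_len, buf_count = suggest_dma_config(44100, max_latency_ms=10, worst_delay_ms=150)
print(buf_len, buf_count, "->", buf_len * buf_count * 4, "bytes (stereo, 16-bit)")
```

A tight latency budget gives short buffers, which means many of them are needed to cover the same delay. The total memory is set by the worst-case delay, not by the latency target.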
Now I need to go back and check all the values I’ve used in my code.
There are some good details on the ESP32 and DMA in the DMA section of the ESP32 Technical Reference Manual: https://www.espressif.com/sites/default/files/documentation/esp32_technical_reference_manual_en.pdf