blackfin audio FIFO #69

Open
catfact opened this issue Oct 27, 2013 · 17 comments
Comments

@catfact
Collaborator

catfact commented Oct 27, 2013

should implement optional audio buffering.

@ghost ghost assigned catfact Oct 27, 2013
@catfact catfact self-assigned this Nov 9, 2015
@catfact
Collaborator Author

catfact commented Nov 9, 2015

working on a branch for block processing using pingpong DMA descriptor lists, something like in this example:
https://ez.analog.com/thread/71499

@catfact
Collaborator Author

catfact commented Dec 5, 2015

turns out that my optimism of some weeks back was unfounded. but now a simple test case is working on the block-process-test branch (e28e3c4).

there is a module in the top-level directory called dsp-block-test that doesn't reference the bfin_lib code, and does very little (writes a sawtooth at a fixed frequency and doesn't respond to SPI.) but the setup seems ok!

what ended up working (so far) is something like this:

  • as before, DMA1 is assigned to sport0 RX, DMA2 to sport0 TX.
  • after messing with 2D and descriptor-based pingpong DMA, found that simply using 1D and autobuffering with a long buffer works just as well.
  • the trick is that sport0 RX and TX now each raise an interrupt, and they use separate processing buffers. each ISR sets a volatile flag and copies the corresponding I/O buffer to the corresponding processing buffer.
  • the main loop checks the flags and performs the block-process function once the last TX and RX have both completed. (i think they complete at nearly the same time, but there was potential for race conditions in other setups i tried.)
  • TX and RX buffers are located in separate data banks (supposedly this speeds up the DMA)
  • sport0 TX has 'late frame sync' enabled - not totally sure this is necessary.
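a rough host-side sketch of the scheme in the list above, with the ISRs written as plain functions so the flag handshake can be exercised without hardware. all names here (`sport0_rx_isr`, `sport0_tx_isr`, `main_loop_poll`, `process_block`) are hypothetical and not taken from the branch, and the passthrough stands in for whatever the module actually does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHANNELS  4
#define BLOCKSIZE 16
#define BUF_LEN   (CHANNELS * BLOCKSIZE)

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

/* I/O buffers: on hardware these would be the autobuffered DMA buffers */
static fract32 rx_io[BUF_LEN];
static fract32 tx_io[BUF_LEN];

/* separate processing buffers, touched only by the block-process function */
static fract32 proc_in[BUF_LEN];
static fract32 proc_out[BUF_LEN];

static volatile int rx_done = 0;
static volatile int tx_done = 0;

/* hypothetical RX ISR body: copy the freshly received block, raise the flag */
void sport0_rx_isr(void) {
    memcpy(proc_in, rx_io, sizeof(proc_in));
    rx_done = 1;
}

/* hypothetical TX ISR body: hand the last processed block to the DMA buffer */
void sport0_tx_isr(void) {
    memcpy(tx_io, proc_out, sizeof(tx_io));
    tx_done = 1;
}

/* placeholder block process: plain passthrough */
static void process_block(const fract32 *in, fract32 *out) {
    assert(in != NULL && out != NULL);
    for (int i = 0; i < BUF_LEN; i++) out[i] = in[i];
}

/* one pass of the main loop; returns 1 if a block was processed */
int main_loop_poll(void) {
    if (!(rx_done && tx_done)) return 0;  /* wait until both have completed */
    rx_done = 0;
    tx_done = 0;
    process_block(proc_in, proc_out);
    return 1;
}
```

processing only runs once both flags are set, which is the part that guards against RX and TX completing in either order.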

this scheme is a little memory-hungry; each frame added to the blocksize adds 64 bytes. and i'm sure it could be mildly optimized in other ways.
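the 64-byte figure can be checked with a small calculation, assuming 4 channels of 32-bit samples and four buffers in total (RX I/O, RX process, TX I/O, TX process). `block_buffer_bytes` is a made-up helper for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNELS    4   /* audio channels per frame */
#define NUM_BUFFERS 4   /* assumed: RX I/O, RX process, TX I/O, TX process */

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

/* bytes of buffer memory needed for a given block size (in frames) */
size_t block_buffer_bytes(size_t blocksize) {
    assert(blocksize > 0);
    return blocksize * CHANNELS * sizeof(fract32) * NUM_BUFFERS;
}
```

so a 16-frame block would need 1024 bytes of buffer space under these assumptions.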

will keep cleaning up and making this more useful.

@ngwese
Member

ngwese commented Dec 9, 2015

(first a major caveat of this being entirely new territory for me, feel free to ignore)

if i'm reading the code correctly, the audioProcessIn and audioProcessOut buffers have the samples (for the 4 channels) interleaved. looking at the bf533 reference manual, it seemed possible to have the dma transfer de-interleave the samples - is there any value in trying to get that to work?

i was also wondering if it made sense to pass pointers to the audioProcessIn and audioProcessOut buffers (and possibly block size and frame size) to module_process_block() in order to allow the block size and/or DMA strategy to change without causing breakage at the module level. the reason i started to think along those lines is that the 1D setup + isr which copies data from the i/o buffers to process buffers seems nearly identical to the double buffer (2D?) where DMA is filling/draining one pair of buffers while the process function operates on the other pair (then they switch avoiding the data movement being done in the isr). The tx/rx done signaling from the isr could also be realized by setting (global) pointers to the buffer(s) to process which main could test for != NULL.
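a minimal sketch of what that interface might look like, runnable on a host. the `module_process_block()` signature shown is only a guess at the proposal, and `rx_block_done`, `tx_block_done`, `poll_and_process` and the buffer names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

#define CHANNELS  4
#define BLOCKSIZE 16
#define BUF_LEN   (CHANNELS * BLOCKSIZE)

/* two buffer pairs: DMA owns one pair while the module processes the other */
static fract32 in_buf[2][BUF_LEN];
static fract32 out_buf[2][BUF_LEN];

/* set by the ISRs, cleared by main; NULL means "nothing to process yet" */
static fract32 *volatile ready_in = NULL;
static fract32 *volatile ready_out = NULL;

/* hypothetical module entry point: sizes are passed in rather than assumed */
void module_process_block(const fract32 *in, fract32 *out,
                          size_t frames, size_t channels) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < frames * channels; i++) out[i] = in[i];
}

/* hypothetical ISR bodies: publish the pair the DMA has just finished with */
void rx_block_done(int which) { ready_in = in_buf[which]; }
void tx_block_done(int which) { ready_out = out_buf[which]; }

/* one main-loop pass; returns 1 if a block was processed */
int poll_and_process(void) {
    fract32 *in = ready_in;
    fract32 *out = ready_out;
    if (in == NULL || out == NULL) return 0;
    ready_in = NULL;
    ready_out = NULL;
    module_process_block(in, out, BLOCKSIZE, CHANNELS);
    return 1;
}
```

with this shape the module never sees which DMA strategy filled the buffers, so the copy step could be dropped later without touching module code.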

it seems like you've potentially already explored some of this kind of thing and moved on. if there is anything in particular, i'm happy to try some more experimentation. if not, i'm tempted to look into how the CV outputs and SPI bus servicing might factor into block test.

@catfact
Collaborator Author

catfact commented Dec 9, 2015

good points!

i did try a couple of other things. in the end the 1D structure was used because it turned out that my main problem wasn't the DMA structure at all, but the RX/TX timing, and this needed to be isolated. but yeah, it's not ideal.

i agree that it would be better to:

    1. skip copy step in ISR
    2. deinterleave so the modules don't have to

for 1), the answer is (as you say) to set up 2 buffers and use 2 DMA descriptors to set them up in pingpong mode.

for 2), i think this could indeed be done with the 2D DMA features. it's a little tricky. the way they implement interleaved streams is to allow the Y offset to be negative.

so, i think the 2D setup would be something very approximately like this:


#define CHANNELS 4
#define BLOCKSIZE 16
#define SAMPLESIZE 4 // (sizeof(fract32))


fract32 inputChannels[CHANNELS][BLOCKSIZE];


*pDMA1_START_ADDR = (void*)(inputChannels);

*pDMA1_X_COUNT = CHANNELS;
// byte-address increment for inner loop
// want each successive transfer to point at the next channel array
*pDMA1_X_MODIFY = (SAMPLESIZE * BLOCKSIZE);

*pDMA1_Y_COUNT = BLOCKSIZE;
// each outer loop, want to jump back to element N+1 of the first channel
*pDMA1_Y_MODIFY = ((1 - CHANNELS) * BLOCKSIZE + 1) * SAMPLESIZE;

i think by default, when interrupt is enabled it is triggered at the end of the outer loop.
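the modify values above can be sanity-checked on a host with a small model of the address generation. this assumes one reading of the 2D behaviour: X_MODIFY is added after each transfer within a row, and Y_MODIFY is added in its place after the last transfer of a row. `sim_dma_2d_write` and `deinterleave_block` are hypothetical helpers, not part of any Blackfin API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHANNELS   4
#define BLOCKSIZE  16
#define SAMPLESIZE 4   /* sizeof(fract32) */

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

/*
 * host-side model of a 2D DMA write (assumed behaviour, see lead-in).
 * the address left over after the very last transfer is never used here.
 */
void sim_dma_2d_write(const fract32 *src, void *dst_start,
                      int x_count, int x_modify,
                      int y_count, int y_modify) {
    assert(x_count > 0 && y_count > 0);
    char *addr = (char *)dst_start;
    for (int y = 0; y < y_count; y++) {
        for (int x = 0; x < x_count; x++) {
            memcpy(addr, src++, SAMPLESIZE);
            addr += (x == x_count - 1) ? y_modify : x_modify;
        }
    }
}

/* feed an interleaved (frame-major) stream, get channel-major arrays back */
void deinterleave_block(const fract32 *interleaved,
                        fract32 out[CHANNELS][BLOCKSIZE]) {
    sim_dma_2d_write(interleaved, out,
                     CHANNELS, SAMPLESIZE * BLOCKSIZE,
                     BLOCKSIZE, ((1 - CHANNELS) * BLOCKSIZE + 1) * SAMPLESIZE);
}
```

after channel 3 of frame n is written, the negative Y_MODIFY steps back three channel arrays and forward one sample, landing on channel 0 of frame n+1.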

in the pingpong mode, all of this stuff would go in each DMA descriptor, something like this:


typedef struct {
   void *pNext;
   void *pStart;
   short dConfig;
   short dXCount;
   short dXModify;
   short dYCount;
   short dYModify;
} dma_desc_t;

#define CHANNELS 4
#define BLOCKSIZE 16
#define SAMPLESIZE 4 // sizeof(fract32)

#define X_COUNT CHANNELS
#define X_MODIFY (SAMPLESIZE * BLOCKSIZE)
#define Y_COUNT BLOCKSIZE
#define Y_MODIFY (((1 - CHANNELS) * BLOCKSIZE + 1) * SAMPLESIZE)

// FLOW = 7 (large descriptor list), NDSIZE = 9: the struct above spans nine 16-bit words
#define DMA_FLOW_DESC 0x7900
#define DMA_CONFIG (WDSIZE_32 | DMA_FLOW_DESC | DMAEN | DI_EN | DMA2D )

fract32 inputChannels0[CHANNELS][BLOCKSIZE];
fract32 inputChannels1[CHANNELS][BLOCKSIZE];

dma_desc_t descRx1 = { NULL, inputChannels1, DMA_CONFIG, X_COUNT, X_MODIFY, Y_COUNT, Y_MODIFY };
dma_desc_t descRx0 = { &descRx1, inputChannels0, DMA_CONFIG, X_COUNT, X_MODIFY, Y_COUNT, Y_MODIFY };

void init_dma(void) {

     // ping-pong
     descRx1.pNext = &descRx0;

     *pDMA1_NEXT_DESC_PTR = &descRx0;
    //... etc   
}

and of course an equivalent setup for the TX DMA.

as far as priority:

  • setting up the de-interleaving is important, because it most directly affects the module interface and i'd like to get moving on developing block-process modules.
  • equally important is to get the SPI handling sorted. i was thinking of moving back to a very simple FIFO to which the SPI ISR pushes control changes, and processing them in the main loop (since handling control changes can be potentially expensive.)
  • it would be nice to optimize out the ISR copy step, but the module doesn't need to know about that, so maybe it is lower priority.
  • it would also be nice to improve the SPORT1 setup, which drives the CV DACs. i spent a long time on this, but couldn't get it working properly in a daisy-chain configuration (where each channel TX would be triggered by completion of the previous.) hence the stupid system of transmitting one channel per module processing callback - this is sort of acceptable when the callback is per-frame, but will not be so when the callback is per-block.
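for the SPI point in the list above, a very simple single-producer / single-consumer FIFO could look roughly like this. it's a sketch assuming one writer (the SPI ISR) and one reader (the main loop); `param_change_t`, `param_fifo_push` and `param_fifo_pop` are hypothetical names, and the size is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define PARAM_FIFO_SIZE 64u  /* must be a power of two */

typedef struct {
    uint32_t idx;   /* parameter index */
    int32_t  val;   /* new raw value */
} param_change_t;

static param_change_t fifo[PARAM_FIFO_SIZE];
static volatile uint32_t fifo_wr = 0;  /* written only by the SPI ISR */
static volatile uint32_t fifo_rd = 0;  /* written only by the main loop */

/* called from the (hypothetical) SPI RX ISR; returns 0 if the FIFO is full */
int param_fifo_push(uint32_t idx, int32_t val) {
    uint32_t wr = fifo_wr;
    if (wr - fifo_rd == PARAM_FIFO_SIZE) return 0;
    fifo[wr & (PARAM_FIFO_SIZE - 1u)].idx = idx;
    fifo[wr & (PARAM_FIFO_SIZE - 1u)].val = val;
    fifo_wr = wr + 1u;  /* publish only after the slot is filled */
    return 1;
}

/* called from the main loop; returns 0 if there is nothing to process */
int param_fifo_pop(param_change_t *out) {
    assert(out != 0);
    uint32_t rd = fifo_rd;
    if (rd == fifo_wr) return 0;
    *out = fifo[rd & (PARAM_FIFO_SIZE - 1u)];
    fifo_rd = rd + 1u;
    return 1;
}
```

the ISR stays cheap (two stores and an index bump), and any expensive handling of a control change happens when the main loop drains the queue.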

i've got some more changes to the branch - just restructuring so the block processing is in a bfin_lib_block/ and the test module code is separated. haven't pushed this because i haven't actually tested it on hardware. but i'll just go ahead and do that.

@catfact
Collaborator Author

catfact commented Dec 9, 2015

ok yeah, the restructured lib/module doesn't work yet. i pushed it to the branch anyway. dsp-block-test has a couple changes too; slightly cleaned up, and now plays a wavetable osc on top of a passthrough.

@ngwese
Member

ngwese commented Dec 9, 2015

would you like me to explore getting the 2D stuff working?

i was also starting to wonder if it made sense to have some overall concept of a duty cycle with servicing SPI and/or CV outs such that each could be clocked down / computed at some integer multiple of the audio processing duty cycle. if modules expressed their desired control rate it could allow one to free up cycles when high frequency control is not needed....
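one way the integer-multiple idea could be sketched: a per-task divider that is ticked once per audio block and fires every N blocks. `rate_div_t` and its functions are hypothetical, and how a module would declare its desired rate is left open.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t period;   /* run the task every `period` audio blocks (>= 1) */
    uint32_t count;    /* blocks elapsed since the task last ran */
} rate_div_t;

void rate_div_init(rate_div_t *d, uint32_t period) {
    assert(d != 0 && period >= 1);
    d->period = period;
    d->count = 0;
}

/* call once per audio block; returns 1 when the slower task is due */
int rate_div_tick(rate_div_t *d) {
    if (++d->count < d->period) return 0;
    d->count = 0;
    return 1;
}
```

the block callback would then tick one divider for SPI servicing and another for CV output, and only do that work when the corresponding tick returns 1.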

@catfact
Collaborator Author

catfact commented Dec 9, 2015

if you are interested in working out 2D DMA, and you have time right now, certainly go for it! i'm happy to try as well, but time is pretty crushed for the next few days.

also good ideas about specifying different control rates. i recently opened a separate issue about SPI servicing (#239), commenting there

@catfact
Collaborator Author

catfact commented Jan 31, 2016

@ngwese
i tried to implement deinterleaving and pingpong in the DMA descriptor. but i'm doing something wrong and can't figure it out.

latest commit (3b4da21) has a flag in audio.h to toggle this implementation.

with or without the flag, the module API now takes two deinterleaved 2d buffers of [channels][frames] as arguments for input/output. so it's pretty much transparent whether there is a copy step in the ISR or not.

the only wrinkle is that right now, the copy step also shifts the input/output frame data between the codec's native 24 bits, and the full 32 bits that is generally better for processing. maybe we should let the module do this instead so that the ISR wouldn't have to do much of anything if the pingpong DMA worked.
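if the shift moved into the module, it might look something like the sketch below. this assumes the codec's 24-bit sample sits in the low 24 bits of each 32-bit word, which may not match the actual SPORT setup; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

/* 24-bit codec word (assumed to sit in the low 24 bits) -> full-scale 32-bit */
fract32 codec_to_fract32(uint32_t word24) {
    /* shift as unsigned so the sign bit lands in bit 31 without overflow */
    return (fract32)(word24 << 8);
}

/* full-scale 32-bit sample -> 24-bit codec word, dropping the low 8 bits */
uint32_t fract32_to_codec(fract32 s) {
    return ((uint32_t)s >> 8) & 0x00ffffffu;
}

/* what a module might do at the top of its block process, per channel */
void shift_block_in(const uint32_t *codec, fract32 *out, size_t frames) {
    assert(codec != NULL && out != NULL);
    for (size_t i = 0; i < frames; i++) out[i] = codec_to_fract32(codec[i]);
}
```

a pure passthrough could skip both conversions entirely, which is one argument for leaving the choice to the module.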

but i think i will go ahead and start working on some block-processing modules. i want a bigger bank of simple oscillators and a proper varispeed delay...

@ngwese
Member

ngwese commented Feb 1, 2016

this isn't fully working yet but a promising step in the right direction (ngwese@7cd798a)
i get recognizable but distorted audio from ch 1,2 in to out

after lots of reading, re-reading, and staring at the DMA operation flow diagram things started to click. the above changes:

  • keeps the "large" descriptor setup so that the next and start pointers can be set directly
  • pulls all descriptor fields except the two registers which are changing after each transaction (next and start). this saves 20 shorts worth of memory. the NDSIZE values range from 1 to 9 based on the number of registers to load from the descriptor, which was confusing because the NEXT_DESC_PTR and START_ADDR values each map to 2 registers
  • the DMAx_CONFIG need not change between transactions since we're just going around in circles without signaling a stop.
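the word counts in the list above can be written out as a small sketch. it is only a reading of the description, not the code from the linked commit: the reduced struct, `link_pingpong` and `words_saved` are hypothetical, and the count of 4 descriptors assumes a ping and a pong for each of RX and TX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* descriptor field sizes in 16-bit words, which is what NDSIZE counts */
enum {
    WORDS_NEXT_PTR   = 2,  /* 32-bit pointer = two 16-bit registers */
    WORDS_START_ADDR = 2,
    WORDS_CONFIG     = 1,
    WORDS_X_COUNT    = 1,
    WORDS_X_MODIFY   = 1,
    WORDS_Y_COUNT    = 1,
    WORDS_Y_MODIFY   = 1
};

#define NDSIZE_FULL    (WORDS_NEXT_PTR + WORDS_START_ADDR + WORDS_CONFIG + \
                        WORDS_X_COUNT + WORDS_X_MODIFY + \
                        WORDS_Y_COUNT + WORDS_Y_MODIFY)
#define NDSIZE_REDUCED (WORDS_NEXT_PTR + WORDS_START_ADDR)

#define NUM_DESCRIPTORS 4  /* assumed: ping + pong for each of RX and TX */

/* reduced descriptor: only the fields that change between transfers */
typedef struct dma_desc_small {
    struct dma_desc_small *next;
    void *start;
} dma_desc_small_t;

/* link two descriptors into a ring so the DMA alternates between buffers */
void link_pingpong(dma_desc_small_t *a, void *buf_a,
                   dma_desc_small_t *b, void *buf_b) {
    assert(a != NULL && b != NULL);
    a->start = buf_a; a->next = b;
    b->start = buf_b; b->next = a;
}

/* 16-bit words saved across all descriptors by dropping the static fields */
int words_saved(void) {
    return (NDSIZE_FULL - NDSIZE_REDUCED) * NUM_DESCRIPTORS;
}
```

under these assumptions the full descriptor is 9 words, the reduced one is 4, and dropping 5 words from each of 4 descriptors gives the 20 shorts mentioned.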

it seems like there were two things in particular which were problematic in the initial attempt:

  • in init_dma() the pDMAx_NEXT_DESC_PTR values were getting set but the initial FLOW value wasn't so when dma was eventually enabled it didn't know it was supposed to load the first descriptor.
  • the other was that the config value in each of the descriptors lacked DMAEN. when enable_dma_sport0() was called the dma channels were enabled, causing the descriptor to get loaded, but it seemed to just hang because the channel was immediately disabled again (best i can tell)

i'll keep at it.

@catfact
Collaborator Author

catfact commented Feb 1, 2016

(sorry, posted from wrong profile just now)

ach, thanks, not setting the initial flow mode/config was a pretty dumb one... maybe there is something wrong too with the way i set up the MODIFY/COUNTs causing your distortion?

in the meantime i went ahead with building out a test module (bank of sinewaves to start with) and making some modification to how param changes are handled - SPI rx ISR adds them to a FIFO queue and updates the raw param states, and the block ISR processes the queue and does whatever module-specific stuff with those values.

this almost works, but there is some screwy timing problem (i guess) when loading a scene from bees.

and, still need to add the sport1 stuff for CV output... a good opportunity to do it more correctly.

@ngwese
Member

ngwese commented Feb 1, 2016

the MODIFY/COUNTs seemed fine when i checked - i walked the code line by line trying to wrap my head around it and verify things against the bfin reference manual. i can double check.

my next step was going to be to sanity check the sample handling (24 vs 32 and the serial setups) as well as wrap my head around what is going on timing wise between the dma transactions, isr, and process code.

@catfact
Collaborator Author

catfact commented Feb 1, 2016

well there is definitely the 32b/24b problem. but shouldn't matter if you're just copying in to out...

@ngwese
Member

ngwese commented Feb 2, 2016

opened a PR (#248) with a functional setup. the final missing piece turned out to be trivial, minor typo in the tx isr.

@ngwese
Member

ngwese commented Feb 2, 2016

btw. tried to compile the rawsc module and it is missing its linker script.

@catfact
Collaborator Author

catfact commented Feb 2, 2016

wow, excellent, thanks! merged PR. also added missing .lds (it's just the same default), moved sample shift to module, moved block size definition to module, couple other tweaks.

somewhere along the way this scene-recall init problem went away...? maybe it was user error somehow.

@electropourlaction

I'm wondering if the dma controller (as set up above) is mem-to-mem between L1 and SDRAM? It doesn't look that way at first glance. I'm thinking about using one stream as a playhead and the other as a record head between these two memory spaces?!

@catfact
Collaborator Author

catfact commented Apr 1, 2016

the audio I/O buffers aren't in SDRAM, no. they are in L1; read in one bank, write in the other.

SDRAM access is about 8x slower than L1, so in general we would not want to put the I/O buffers there.

i would just transfer stuff in and out of SDRAM in the block process routine of your module.

but if you want to, you could use a different DMA setup pointed at SDRAM addresses. from the bf533 manual (page 9-51) it looks like using DMA to access external memory is perfectly OK.
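for the simpler route (moving data in and out of SDRAM inside the block process routine), a record head / play head pair over one large buffer might be sketched like this. `delay_buf` is an ordinary array here; on hardware it would have to be placed in SDRAM by the linker script or a section attribute, which is not shown. `delay_process_block` and the buffer length are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t fract32;  /* stand-in for the Blackfin fract32 type */

#define DELAY_LEN 4096u   /* frames; stands in for a much larger SDRAM buffer */

static fract32 delay_buf[DELAY_LEN];
static size_t rec_head = 0;

/* write one block at the record head, read one block `delay` frames behind */
void delay_process_block(const fract32 *in, fract32 *out,
                         size_t frames, size_t delay) {
    assert(in != NULL && out != NULL);
    assert(delay >= 1 && delay < DELAY_LEN);
    for (size_t i = 0; i < frames; i++) {
        size_t play_head = (rec_head + DELAY_LEN - delay) % DELAY_LEN;
        delay_buf[rec_head] = in[i];
        out[i] = delay_buf[play_head];
        rec_head = (rec_head + 1u) % DELAY_LEN;
    }
}
```

the in/out blocks stay in L1 and only the long buffer lives in slow memory, so the slow accesses are limited to one write and one read per frame.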
