Issues with the Benchmark #68
Comments
|
I am guessing, like you @bperez77, I just have the wrong settings in my IP core.
@Westwood68, could you let us know what settings you used to get the VDMA working correctly? Thanks! |
Yeah this one was tricky, I still haven't figured out the proper settings on the IP core to get it to work. It may also be an issue inside the driver with how multiple VDMA buffers are set up, but I would still expect it to work for a single buffer case. IIRC from that original thread, he didn't remember what the original settings were. |
Hi @bperez77, the program does indeed crash when the single test transfer is performed (single_transfer_test). Since making some changes to the IP core, I am now getting a null pointer exception, so I guess it might actually not be related to the driver at all. It might also be that there is some significance to the fact that I am using a 64-bit platform (ZynqMP) instead of the 7-series Zynq. |
I guess this is the same problem as described in issue #52. |
I have added some printk statements to the module to see exactly where in the DMA transfer things go sideways. See below, where I have also noted the lines of code where the print statements occurred:
[ 124.105520] Running axidma_rw_transfer.464, axidma_chrdev.c
[ 124.121121] Internal error: Oops: 86000006 [#1] SMP
[ 124.246275] Process axidma_benchmar (pid: 3707, stack limit = 0xffffffc877f3c020)
I sort of expected to find something like a failure in the "copy to/from user" functions or similar, but this is not the case. In fact, it seems like the kernel actually completes the full DMA transfer function before getting the null pointer exception at the line where we return 0. Any ideas where I could keep looking for the problem? I have tried a lot of combinations of settings for the VDMA engine, and nothing seems to be helping much, aside from the fact that I now get a null pointer exception instead of a timeout condition... Thanks! |
I have been rigorously pursuing the solution to this issue in parallel and have come to the same conclusion. The DMA transfer function is completing correctly, and the null pointer exception is occurring at the very point that axidma_rw_transfer(dev, &inout_trans) returns in the ioctl case statement for AXIDMA_DMA_READWRITE in axidma_chrdev.c. I speculate that a conflict is occurring with the asynchronous signal being issued for transmit DMA completion. Because the axidma-benchmark application is blocking on the receive completion anyway, this signal is not needed for correct operation. I did a quick test today on the Zedboard by commenting out the "send_sig_info( ... )" line in the axi_dma_callback function and confirmed that the axidma-benchmark for VDMA still worked fine. I will have access to the ZynqMP ZCU102 board again beginning on 9/24 to test this out. If you can see whether this makes a difference before then, please advise. If this turns out to be the problem, the fix should be implemented elsewhere in the driver so as not to impact cases where the signal is necessary. |
Looking deeper into the axidma_callback( ... ) signal handler for the VDMA Tx transaction done signal, there is an assert( ... ) function being called. An assertion is not considered async-signal-safe. This could be the root of the problem, but we won't know for sure until testing on the ZynqMP. |
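To illustrate the async-signal-safety concern raised above, here is a minimal userspace sketch of a handler that only sets a flag and leaves any checking to the main thread. It is not the library's actual handler: the names `on_tx_done`, `install_tx_handler`, and `tx_is_done` are hypothetical, and the signal number is whatever the caller chooses.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Flag written by the handler and polled by the main thread. */
static volatile sig_atomic_t tx_done = 0;

/* Hypothetical handler: does only async-signal-safe work.
 * No assert(), printf(), or malloc() calls in here. */
static void on_tx_done(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)info;
    (void)ctx;
    tx_done = 1;
}

/* Install the handler for the given signal; returns 0 on success. */
int install_tx_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_tx_done;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Checked from normal (non-signal) context, where assert() is fine. */
int tx_is_done(void)
{
    return tx_done != 0;
}
```

Any sanity checks that would otherwise live in the handler can then run in normal context after `tx_is_done()` reports completion.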
Hi @EKjeldsen, I am glad I am not the only one having this problem. I tried your suggestion of commenting out the send_sig_info function call in the dma callback. I added some printk statements to follow the timing. Interestingly, it does indeed seem to be crashing during the second time we enter the callback (presumably for the RX thread), since I see the entrance printk, and not the exit printk. Below are my added printk statements from a run of the transfer test so you can observe the sequencing:
I will let you know if I discover anything else today while I am working on it. |
Follow up: I was running very small transfers to test the engine. Out of curiosity I ran some normal, frame sized transfers using the following command:
Behavior here was quite a bit different. The program seems to freeze after a different kernel error, before the DMA transfers are even set up...
|
I made some changes to my hardware design (address all 36 bits from dma engine) and now the built in VDMA test passes. Now "only" the original issue with the benchmark remains... |
@bperez77 : I have noticed in the character device driver, you have case statements for dma read, vdma read, dma write, vdma write, but then only a dma read and write (bidirectional). Are the additional frame housekeeping tasks performed in the respective vdma functions not necessary for the bidirectional case? |
@brianvg - thank you for updating your findings. We will be looking further into this today and will post updates as well. |
Found something interesting late today. To rule out multiple buffer issues, I verified operation using a single frame buffer. On the Zedboard, Brandon's axidma_benchmark and Xilinx's vdmatest ran fine. On the ZCU102, vdmatest ran fine but axidma_benchmark encountered the NULL pointer exception. However, I noticed that the value of "xlnx,num-fstores" for the VDMA IP core in the pl.dtsi file generated for the ZCU102 was double what it should be. In my original 3 frame buffer case, "xlnx,num-fstores" equals 0x6 vs. 0x3. In the 1 frame buffer case, "xlnx,num-fstores" equals 0x2 vs. 0x1. In the case of the Zedboard, there was no discrepancy between the setting in the VDMA IP and the Petalinux generated value for "xlnx,num-fstores". Tomorrow we will test a version on the ZCU102 that corrects "xlnx,num-fstores" to equal 0x1. |
Hi @EKjeldsen, I just ran a test with the corrected entry for the frame buffer. I do not observe any changes in behavior, still seeing the segmentation fault. Nice catch though! Not sure how Petalinux is getting that wrong... |
Hi @EKjeldsen ! Any new progress on the issue? I was exploring the possibility of leveraging V4L2, which is what many ZynqMP users end up doing to get video streams from user space working, but unfortunately, the framework does not seem to allow sending video into the PL for any use other than sending it to a display port/hdmi interface. This seems to rule out the very plausible use case of using the framework for a video coprocessor (sending the data down to the PL, and then retrieving it after processing). @bperez77's work really does seem to be the only show in town for creating a data path for this sort of a use case. It is typical of Xilinx's startling lack of imagination that they would not prioritize this as a typical use case and offer some measure of support. (please excuse the rant...) |
We are still pursuing this effort, but haven’t posted anything lately because of dead ends. Our use case is the opposite direction, PL -> PS, for video source processing by the PS, which would also seem to be a typical application. This is a very perplexing problem! One architectural difference with respect to the Zynq-7000 Zedboard is the presence of an SMMU. I think I’ve finally convinced myself that the SMMU can be bypassed, as this driver provides the necessary address translation. But I would appreciate your take on that. Yesterday I discovered that a switch to set the size of dma_addr_t to 64 bits was not being configured in Petalinux. I added a user_xx.cfg file containing the following entry: CONFIG_ARCH_DMA_ADDR_T_64BIT=y. I then verified that sizeof(dma_addr_t) was 8 bytes. Unfortunately the NULL pointer exception remained.
Today I’m going to look further into the possibility of a pointer arithmetic error. Another point of divergence to consider is that this driver uses an interleaved DMA call for VDMA vs. a scatter-gather call for regular DMA. Maybe something is being set incorrectly in that call when 64-bit addressing is used?
|
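As a rough illustration of why the width of dma_addr_t matters on a platform with 36-bit addressing, the sketch below shows what happens when a bus address above 4 GiB is stored in a 32-bit type. This is plain userspace C, not driver code: `dma_addr32_t` and `fits_in_32bit` are hypothetical stand-ins, and the example addresses are only assumptions about where high memory might sit.

```c
#include <stdint.h>

/* Hypothetical stand-in for dma_addr_t when it is only 32 bits wide. */
typedef uint32_t dma_addr32_t;

/* Returns 1 if the bus address survives storage in a 32-bit type,
 * 0 if the upper bits would be silently dropped. */
int fits_in_32bit(uint64_t bus_addr)
{
    dma_addr32_t narrow = (dma_addr32_t)bus_addr;

    return (uint64_t)narrow == bus_addr;
}
```

For example, `fits_in_32bit(0x3FFFFFFFULL)` returns 1, while `fits_in_32bit(0x800000000ULL)` (bit 35 set) returns 0 because the narrowed value becomes 0.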
Hi @EKjeldsen I may be mistaken, but my read of the SMMU is that it is disabled by default and even when enabled only has an effect when the relevant PL masters are set up correctly to use it. I have tried enabling it in the device tree, but I don't see any effect. Enabling the SMMU would in theory allow us to use virtual addresses for mapping memory locations from user space into the DMA engine. I guess one would need to map the VDMA to a VIO device type. It all looks very cool, and would save the memory copy that is otherwise necessary. See here and here . Also, I guess you need to add the smmu use statements to the device tree, after you figure out the stream ID of your AXI DMA interfaces. It is all explained on the wiki, but it sounds a bit iffy to be honest. It is not clear to me how you can guarantee contiguous memory address allocation for large buffers from user space. |
Sorry @brianvg @EKjeldsen, I've been pretty busy lately and haven't been active. So from the above thread it seems like there's two separate issues:
I don't think that the SMMU should have any influence on the correct operation of the driver, since the driver is able to perform all the translations to physical addresses required by the IP. I'll have some time to work on this this weekend. Can one of you guys send me the configuration for your VDMA IP (a screenshot should be sufficient)? I still haven't gotten to the phase where I get a successful VDMA transfer. For both the VDMA test driver and my benchmark program, the transfer times out. I'm also working off a Zybo board, so I won't be able to directly replicate what you guys have on the ZynqMP board. P.S. Yeah @brianvg I understand your frustration. It's three years on from when I created this driver and Xilinx still doesn't have a driver that's usable directly from userspace programs. I imagine Xilinx has done it this way because they want to pigeonhole everyone into their SDSoC framework. |
@bperez77 I agree that it is all about moving people to SDSoC. Very annoying because they should not care what path people use to develop. Xilinx is ostensibly a hardware company. There are so many ways to do design input! Not everyone wants to pretend C++ is a good method to design programmable logic. I am not at all certain that my configurations are correct, but I have attached screenshots. Best Regards, Brian |
BTW, this is just my latest configuration. I have played around a lot over time... |
Here are our current VDMA setup screenshots. Thank you for looking at this further, Brandon. Since you don't have any ZynqMP platform available, you could make suggestions for @brianvg and us to try on our hardware to further isolate the problem. I have noticed that you declare a static pointer to the character device - "axidma_dev" to store this pointer on the stack. Is it possible that it is getting corrupted? The other critical argument in the ioctl is the "axidma_inout_transaction" structure, which is also stored on the stack. Could this somehow be going out of scope? |
To come back to some of your guys' original posts. When the two way/read write transfer occurs, the callback will be invoked twice. For the TX/write transfer, the callback is invoked with
@EKjeldsen those are valid points about both of those structures. For the loopback test, even though those values are stored on the stack, they are guaranteed to remain in scope because the main kernel thread that invokes the DMA transfer function will wait for the callback function to indicate that it can continue. So, the values will remain valid because the thread will be put to sleep while waiting. It's not clear from the code, but this is what should be happening. If I had to guess, it seems that the macro I'm investigating more now. |
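The point about stack lifetime can be sketched with a userspace analogue using POSIX threads: the caller keeps its state on the stack, hands a pointer to the "callback", and sleeps until the callback signals completion, so the state cannot go out of scope in the meantime. This is not the driver's code; `transfer_state`, `completion_callback`, `run_blocking_transfer`, and the result value 42 are all hypothetical, and the kernel would use its own waiting primitives rather than pthreads.

```c
#include <pthread.h>

/* Hypothetical per-transfer state that lives on the caller's stack. */
struct transfer_state {
    pthread_mutex_t lock;
    pthread_cond_t  done_cv;
    int             done;
    int             result;
};

/* Plays the role of the completion callback: writes through the
 * pointer it was given, then wakes the sleeping caller. */
static void *completion_callback(void *arg)
{
    struct transfer_state *st = arg;

    pthread_mutex_lock(&st->lock);
    st->result = 42;            /* placeholder "transfer result" */
    st->done = 1;
    pthread_cond_signal(&st->done_cv);
    pthread_mutex_unlock(&st->lock);
    return NULL;
}

/* Starts the "transfer" and sleeps until the callback reports completion.
 * The stack variable st stays valid because this function does not
 * return before the callback has finished with it. */
int run_blocking_transfer(void)
{
    struct transfer_state st;
    pthread_t worker;
    int result;

    pthread_mutex_init(&st.lock, NULL);
    pthread_cond_init(&st.done_cv, NULL);
    st.done = 0;
    st.result = -1;

    if (pthread_create(&worker, NULL, completion_callback, &st) != 0) {
        pthread_cond_destroy(&st.done_cv);
        pthread_mutex_destroy(&st.lock);
        return -1;
    }

    pthread_mutex_lock(&st.lock);
    while (!st.done)
        pthread_cond_wait(&st.done_cv, &st.lock);
    pthread_mutex_unlock(&st.lock);

    pthread_join(worker, NULL);
    result = st.result;
    pthread_cond_destroy(&st.done_cv);
    pthread_mutex_destroy(&st.lock);
    return result;
}
```

If the caller returned before the callback ran (for example after a timeout), the pointer handed to the callback would dangle, which is the failure mode the question above is probing for.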
I'm still not able to get the VDMA IP working correctly with a loopback test. I tried both of your configurations against Xilinx's VDMA test driver, but both still time out for me. Since I can't make more progress on this route, can you guys take your existing stack backtraces and determine the source line numbers as described in this StackOverflow post? |
when I run my benchmark on PetaLinux , I get 👍 prob is more a petalinux issue. |
@maikonadams how to add the driver's files to PetaLinux? I copy the drivers files |
Hi!
As I understand the benchmark example program should also work for a VDMA engine looped back, correct? I have a design which contains only a VDMA engine with TX and RX looped back. The module loads ok, and both channels seem to be seen:
[ 7.697740] axidma: axidma_dma.c: axidma_dma_init: 718: DMA: Found 0 transmit channels and 0 receive channels.
[ 7.707679] axidma: axidma_dma.c: axidma_dma_init: 720: VDMA: Found 1 transmit channels and 1 receive channels.
When I run the benchmark code as below:
sudo ./axidma_benchmark -v -t 0 -r 1 -f 10x10x3 -g 10x10x3 -n 3
I get a segmentation fault.
In dmesg I see:
[ 170.988718] axidma: axidma_dma.c: axidma_start_transfer: 301: VDMA receive transaction timed out.
Any suggestions as to where I can look for the problem?
Thanks!!