XRT won't configure PL on *_base platforms? #2747

Closed
SchuellerSe opened this issue Jan 31, 2020 · 11 comments

Comments

@SchuellerSe

I'm trying to run the hello world example from the Vitis_Accel_Examples on a custom platform based on the zcu104_base platform from the official embedded platforms.

Running ./init.sh on the board results in the following output:

root@MarsXu3:/mnt/hello_world# ./init.sh
.
Found Platform
Platform Name: Xilinx
INFO: Reading ./build_dir.hw.mars_xu3_base/vadd.xclbin
Loading: './build_dir.hw.mars_xu3_base/vadd.xclbin'
Trying to program device[0]: mars_xu3_base
Device[0]: program successful!

The host process never terminates and has to be killed manually.
Here is the dmesg output snippet that is produced when ./init.sh is executed (printk loglevel is set to debug):

[   88.765634] [drm] Pid 2325 opened device
[   88.769587] [drm] Pid 2325 closed device
[   88.785358] [drm] Pid 2325 opened device
[   89.097666] [drm] Finding IP_LAYOUT section header
[   89.097677] [drm] Section IP_LAYOUT details:
[   89.102488] [drm]   offset = 0x54fcf8
[   89.106754] [drm]   size = 0x58
[   89.110416] [drm] Finding DEBUG_IP_LAYOUT section header
[   89.113550] [drm] AXLF section DEBUG_IP_LAYOUT header not found
[   89.118855] [drm] Finding CONNECTIVITY section header
[   89.124764] [drm] Section CONNECTIVITY details:
[   89.129808] [drm]   offset = 0x54fd50
[   89.134329] [drm]   size = 0x28
[   89.137988] [drm] Finding MEM_TOPOLOGY section header
[   89.141126] [drm] Section MEM_TOPOLOGY details:
[   89.146169] [drm]   offset = 0x54fc00
[   89.150702] [drm]   size = 0xf8
[   89.155892] [drm] No ERT scheduler on MPSoC, using KDS
[   89.164640] [drm] scheduler config ert(0)
[   89.164642] [drm]   cus(1)
[   89.168647] [drm]   slots(16)
[   89.171344] [drm]   num_cu_masks(1)
[   89.174303] [drm]   cu_shift(16)
[   89.177782] [drm]   cu_base(0x80000000)
[   89.181003] [drm]   polling(0)
[   89.195344] [drm] User buffer is not physical contiguous
[   89.203715] [drm] zocl_free_userptr_bo: obj 0x00000000a468fc29
[  116.507682] [drm] zocl_free_userptr_bo: obj 0x00000000f79c66a6
[  116.513529] [drm] zocl_free_userptr_bo: obj 0x0000000069f734fe
[  116.519370] [drm] Pid 2325 closed device

The process seems to be stuck waiting for an interrupt that never arrives:

root@MarsXu3:/mnt/hello_world# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  4:     101272      39932      39313      35885     GICv2  30 Level     arch_timer
          … omitted for legibility …
 42:          0          0          0          0  axi-interrupt-ctrl   0 Level   -level     zocl
 74:          0          0          0          0     GICv2  97 Level     xhci-hcd:usb1
IPI0:      2008       1732       3298       2892       Rescheduling interrupts
IPI1:      3406       3590        421       3543       Function call interrupts
IPI2:         0          0          0          0       CPU stop interrupts
IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0       Timer broadcast interrupts
IPI5:         0          0          0          0       IRQ work interrupts
IPI6:         0          0          0          0       CPU wake-up interrupts
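The check above (all-zero counts on the zocl line) can be automated. The following is a minimal sketch, not part of XRT: `irq_counts` is a hypothetical helper name, and the sample text is trimmed from the output above.

```python
def irq_counts(interrupts_text, name):
    """Return the per-CPU counts of the first row whose last column is `name`."""
    lines = interrupts_text.strip("\n").splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[-1] == name:
            # fields[0] is the IRQ label ("42:"), then one count per CPU
            return [int(x) for x in fields[1:1 + n_cpus]]
    return None  # no row with that name


SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  4:     101272      39932      39313      35885     GICv2  30 Level     arch_timer
 42:          0          0          0          0  axi-interrupt-ctrl   0 Level   -level     zocl
"""

counts = irq_counts(SAMPLE, "zocl")
if counts is not None and sum(counts) == 0:
    print("zocl interrupt never fired:", counts)
```

On the board, the same function could be fed the live file with `irq_counts(open("/proc/interrupts").read(), "zocl")`, before and after starting the host application.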

I think the PL never gets configured at all. Looking into zocl, the only code that handles FPGA programming is in zocl_fpga_mgr_load in zocl_ioctl.c here: https://github.com/Xilinx/XRT/blob/2019.2_RC1/src/runtime_src/core/edge/drm/zocl/zocl_ioctl.c#L193
According to that procedure, there should be some kernel logging happening.

If I understand the code correctly, XRT calls the DRM_ZOCL_READ_AXLF ioctl when creating an OpenCL Program. The corresponding kernel procedure zocl_read_axlf_ioctl then checks whether the platform supports PR by checking whether zdev->pr_isolation_addr is set (for reference: https://github.com/Xilinx/XRT/blob/2019.2_RC1/src/runtime_src/core/edge/drm/zocl/zocl_ioctl.c#L577 ).
According to the comment directly above the check, pr_isolation_addr should be configured in the device tree. I can't find it in any *.dts(i) files of the *_base platforms. It's only configured for the zcu102_base_dfx platform.

Given that the _base platforms aren't built for PR, is there a way to use them with XRT?

@alexanpatr

I tried to run examples from the Vitis_Accel_Examples on a platform based on the official embedded platforms, unsuccessfully.

The issue appears to be similar to the one that @FenrisTC mentions.

I would also like to mention that similar issues came up with the pre-built platform (zcu102_base). I was unable to run the examples.

Furthermore, I used Vivado to manually create the hardware components for a custom platform, based on UG1416, and the results were the same. The PL seems to be unresponsive, and the kernel stalls.

Using:
Vitis 2019.2
PetaLinux 2019.2
XRT 2.3.0
ZCU102 UltraScale+ Eval Board (Production)

@SchuellerSe
Author

I've found a reasonably ugly workaround that gives me a running PL design.

First, build a new XSA by following the steps in section IX of UG1393 (here). Don't add the AXI Interrupt Controller: it has to be initialized after each configuration, which the workaround won't do.
Also remove the AXI Interrupt Controller and all interrupt sections from system-user.dtsi in the PetaLinux project.

Then get the fpgautil source code (linked here) and build it for your target platform.

Extract the pure bitstream from your xclbin files with
xclbinutil --dump-section BITSTREAM:RAW:vadd.bit -i vadd.xclbin

Boot your fresh platform and copy the fpgautil binary, the bitstream, your host-executable and the xclbin onto your board.

Before running your host code, manually configure the FPGA with fpgautil -b ./vadd.bit.
Then the example should run.
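The extract-then-program sequence can be wrapped in a small script. This is only a sketch: the function names are hypothetical, the two tool invocations are copied from the steps above, and it assumes that xclbinutil, fpgautil and the host executable are all available on the machine where it runs (otherwise run the extraction on the build host and the rest on the board). With the default `dry_run=True` it only returns the commands without executing anything.

```python
import subprocess


def extract_cmd(xclbin, bit):
    # Same invocation as in the steps above: dump the raw BITSTREAM section.
    return ["xclbinutil", "--dump-section", f"BITSTREAM:RAW:{bit}", "-i", xclbin]


def program_cmd(bit):
    # Same invocation as in the steps above: configure the FPGA with the bitstream.
    return ["fpgautil", "-b", bit]


def run_workaround(xclbin, bit, host_cmd, dry_run=True):
    """Extract the bitstream, program the PL, then start the host application."""
    steps = [extract_cmd(xclbin, bit), program_cmd(bit), list(host_cmd)]
    if not dry_run:
        for argv in steps:
            subprocess.run(argv, check=True)  # stop at the first failing step
    return steps


if __name__ == "__main__":
    for argv in run_workaround("vadd.xclbin", "./vadd.bit", ["./init.sh"]):
        print(" ".join(argv))
```

Passing `dry_run=False` executes the three steps in order and raises `subprocess.CalledProcessError` as soon as one of them fails, so the host code is never started against an unprogrammed PL.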

I'm fairly certain that the zocl driver in its released and current form never loads a full bitstream, at least not on a platform that has no support for PR.

@alexanpatr

I confirm that this workaround seems to work. I tried it for 3 examples from Vitis_Accel_Examples and for a custom kernel of my own. All run properly with correct results.

Looking at kernel messages we can see:

[drm] Fail to install CU 0 interrupt handler: -22. Fall back to polling mode.

I do not know what other side effects might exist.
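For anyone checking their own logs for this fallback: the -22 in that message is a negative errno value in the usual kernel convention, and 22 is EINVAL on Linux. A minimal sketch that scans dmesg text for the message and decodes the code (the regular expression and the helper name are assumptions based only on the single line quoted above):

```python
import errno
import re

# Matches the message quoted above; other zocl versions may word it differently.
PATTERN = re.compile(r"Fail to install CU (\d+) interrupt handler: (-?\d+)")


def polling_fallbacks(dmesg_text):
    """Return (cu_index, errno_name) for each CU that fell back to polling."""
    found = []
    for m in PATTERN.finditer(dmesg_text):
        code = abs(int(m.group(2)))
        found.append((int(m.group(1)), errno.errorcode.get(code, str(code))))
    return found


LINE = "[drm] Fail to install CU 0 interrupt handler: -22. Fall back to polling mode."
print(polling_fallbacks(LINE))
```

With the workaround above this fallback is expected, since the interrupt controller was removed from the design and the device tree.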

@FenrisTC Thanks for your help.

@zohourih
Contributor

zohourih commented Feb 21, 2020

Maybe not very helpful in solving your problem, but I have run both the Vitis Hello World example and a kernel of my own using the Vitis 2019.2 flow and the official zcu104_base platform out of the box. The only possible difference in my flow that I can think of is that I used xrt_201920.2.3.1301_7.4.1708-xrt.rpm from here on CentOS 7 to compile the kernels. Then I just copied the contents of the output sd_card folder onto an SD card, and the kernel ran fine after booting the board.

@zohourih
Contributor

As an update to my previous post, I think I am now encountering the same problem as everyone else in this thread. Essentially, with the existing base platform for ZCU104, code compiled with Vitis executes correctly only if the BOOT.BIN used for booting the board contains the exact same FPGA bitstream as the one the host code is supposed to use. If the BOOT.BIN contains a different bitstream, the OpenCL run-time will claim the FPGA has been reconfigured successfully, but the host code then either hangs or produces incorrect output.

@zohourih
Contributor

zohourih commented Oct 9, 2020

Based on the information I have gathered so far, standard base platforms for MPSoC boards do not support run-time reconfiguration; only "dfx" platforms such as the one here do. Hence, the issue we are all facing with the base platform for ZCU104 is not actually an issue with XRT or the base platform, and things are in fact working as they should. Of course, it would be nice if XRT reported a failure when asked to reprogram the FPGA at run-time on non-dfx MPSoC platforms, so that users do not end up running in circles trying to figure out why run-time reconfiguration doesn't work even though XRT claims it was successful.

@alexanpatr

alexanpatr commented Oct 19, 2020

@zohourih I confirm that. Only DFX platforms can be reconfigured at runtime, and it was not a problem with XRT.

I think the issue can be closed.

@SchuellerSe
Author

Is there some mention of this in any official documentation? I couldn't find any mention that only DFX platforms are supported for acceleration on embedded platforms, and there were incomplete code paths in the zocl driver for 2019.2 that handled full bitstream programming (up to the point I mentioned in the issue).

And from the looks of it, the current master zocl driver seems to handle full bitstreams as well, but I haven't tried 2020.1 yet.
But if there is some documentation I couldn't find that says that run-time configuration is only supported on DFX platforms, or if it's resolved in 2020.1, I'll gladly close the issue.

@alexanpatr

alexanpatr commented Nov 2, 2020

@FenrisTC I apologize for the delay.

We can see this in Vitis 2020.1 documentation section:

The Xilinx Dynamic Function eXchange (DFX) feature can change some blocks of PL function while keeping other areas of PL working, allowing you to configure PL kernels on the fly. To use the DFX feature, when the xclbin file is generated, configure it with your host application. The new kernels in the xclbin take effect immediately without requiring a reboot.

For platforms without DFX features, PL kernel must be packed into boot.bin. Copy it to the FAT32 partition on your SD card and reboot the system. Then, configure the xclbin file with your host application.

The xclbin file contains both bit files for PL kernel and metadata to describe these kernel features and connections. Programming the xclbin file on DFX platforms loads the bit file and metadata; programming on non-DFX platforms only loads the metadata.

The situation that we observed is exactly what is mentioned in this documentation section. The PL was never programmed and only the metadata was loaded.
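To make the consequence concrete, here is a toy model of the rule quoted above. It is not zocl code; the function names and the string labels standing in for bitstreams are made up for illustration. On a DFX platform, programming the xclbin replaces the PL contents; on a non-DFX platform the PL keeps whatever BOOT.BIN loaded, so the kernel is only usable when both bitstreams are the same.

```python
def pl_after_programming(platform_is_dfx, boot_bitstream, xclbin_bitstream):
    """Bitstream present in the PL after the host application programs the xclbin."""
    # DFX: bit file and metadata are loaded. Non-DFX: only metadata is loaded,
    # so the PL still holds the bitstream packed into BOOT.BIN.
    return xclbin_bitstream if platform_is_dfx else boot_bitstream


def kernel_usable(platform_is_dfx, boot_bitstream, xclbin_bitstream):
    """The host code can only drive the kernel that is actually in the PL."""
    return pl_after_programming(platform_is_dfx, boot_bitstream, xclbin_bitstream) == xclbin_bitstream


# Non-DFX base platform booted with a different design than the xclbin describes:
print(kernel_usable(False, "base_design", "vadd"))
# Same platform after repacking BOOT.BIN with the vadd bitstream:
print(kernel_usable(False, "vadd", "vadd"))
```

The first case corresponds to the hang reported in this thread: programming is reported as successful because the metadata loads, but the PL never changes.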

@SchuellerSe
Author

Oh ok, I completely missed that, thanks for the Info!

@kapoor7997

kapoor7997 commented Feb 28, 2022

Hi SchuellerSe,

> I've found a reasonably ugly workaround that gives me a running PL design.
>
> First, build a new XSA by following the steps in section IX of UG1393 (here). Don't add the AXI Interrupt Controller, it has to be initialized after each configuration which the workaround won't do. Also remove the AXI Interrupt Controller and all interrupt sections from system-user.dtsi in the petalinux project.
>
> Then get the fpgautil source code (linked here) and build it for your target platform.
>
> Extract the pure bitstream from your xclbin files with xclbinutil --dump-section BITSTREAM:RAW:vadd.bit -i vadd.xclbin
>
> Boot your fresh platform and copy the fpgautil binary, the bitstream, your host-executable and the xclbin onto your board.
>
> Before running your host code, manually configure the FPGA with fpgautil -b ./vadd.bit. Then the example should run.
>
> I'm fairly certain, that the zocl driver in it's released and current form never loads a full bitstream; at least not on a platform that has no support for PR.

Hi, I am trying to get this working. I want to build the hardware, but I don't see anything about hardware design in section IX of UG1393. The link you shared doesn't work, so I downloaded UG1393 from here: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2020_2/ug1393-vitis-application-acceleration.pdf

Please guide.
I just want to get the Vitis Vision library working, and I am having an issue where the example from the Vitis Vision library (Vitis_Libraries/vision/L2/examples/colorcorrectionmatrix) hangs. I think it's zocl interrupt related.

2020.2 run:
Xilinx/Vitis_Libraries#80

2021.2 run:
Xilinx/Vitis_Libraries#113

Thanks,
Kapoor7997
