XVDPU-TRD C64B5 timeout on VCK190-ES #670

Closed
ksstms opened this issue Feb 8, 2022 · 4 comments


ksstms commented Feb 8, 2022

I've modified the XVDPU-TRD to target the VCK190-ES board and removed the HDMI and MIPI logic from the design.
Everything works fine except for the C64B5 DPU configuration.

I get setup violations inside the DPU. Unfortunately, I can't provide more detail because the path endpoints are reported as <hidden>. In the schematic I can see that most of the failing paths are around URAM instances, and some of them originate from a NoC AXI port. The clock is clkout1_primitive_1, which is the 150 MHz clock connected to s_axi_aclk of the DPU.

Every other configuration meets timing with the original 150 MHz clock. PG389 page 30 recommends using 100 MHz for s_axi_aclk, so I changed that in the Makefile:

@@ -107,7 +107,7 @@ $(BUILD_DIR)/binary_container_1.xclbin: $(BINARY_CONTAINER_1_OBJS)
        @$(VXX) $(VXXFLAGS) -l --config scripts/system.cfg --config scripts/xvdpu_aie_noc.cfg \
          --clock.freqHz $(PL_FREQ):DPUCVDX8G_1.m_axi_aclk \
          --clock.freqHz $(PL_FREQ):ai_engine_0.aclk0 \
-         --clock.freqHz 150000000:DPUCVDX8G_1.s_axi_aclk \
+         --clock.freqHz 100000000:DPUCVDX8G_1.s_axi_aclk \
          -o "$@" $(BINARY_CONTAINER_1_OBJS) 
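
As a rough illustration of why the lower frequency relaxes the setup paths, the sketch below converts both s_axi_aclk frequencies to a clock period. `period_ps` is a hypothetical helper written for this note, not part of the TRD, and it uses integer division, so 150 MHz is truncated to 6666 ps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clock period in picoseconds for a frequency in Hz.
# Integer division truncates (150 MHz -> 6666 ps rather than 6666.67 ps).
period_ps() {
    local freq_hz=$1
    echo $(( 1000000000000 / freq_hz ))
}

# The two s_axi_aclk frequencies from the Makefile diff above.
for f in 150000000 100000000; do
    echo "$f Hz -> $(period_ps "$f") ps"
done
```

Going from 150 MHz to 100 MHz gives each path roughly 3.3 ns of extra time per cycle.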

Timing now closes, but when I run the benchmark I get the following error:

root@xilinx-vck190-2021_2:~# xdputil benchmark resnet50-64-5.xmodel -i -1 2
XAIEFAL: INFO: Resource group Avail is created.
XAIEFAL: INFO: Resource group Static is created.
XAIEFAL: INFO: Resource group Generic is created.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0208 04:14:22.811815  1070 test_dpu_runner_mt.cpp:475] shuffle results for batch...
I0208 04:14:22.814556  1070 performance_test.hpp:73] 0% ...
I0208 04:14:28.814703  1070 performance_test.hpp:76] 10% ...
W0208 04:14:33.576429  1078 xrt_cu.cpp:188] cu timeout! device_core_idx 0  handle=0xffffa0010a10 ENV_PARAM(XLNX_DPU_TIMEOUT) 10000 state 1 ERT_CMD_STATE_COMPLETED 4 ms 10010  bo=1 is_done 0 
I0208 04:14:33.576498  1078 xrt_cu.cpp:99] Total: 10010162us	ToDriver: 18446744073705911us	ToCU: 1254us	Complete: 1808us	Done: 10010739us
F0208 04:14:33.576524  1078 dpu_control_xrt_xv_dpu.cpp:193] dpu timeout! core_idx = 0
 AP 1  LSTART 309  LEND 309  CSTART 217  CEND 215  SSTART 0  SEND 0  MSTART 214  MEND 214  CYCLE_L 1000905996  CYCLE_H 0 
*** Check failure stack trace: ***
/usr/bin/xdputil: line 20:  1068 Aborted                 /usr/bin/python3 -m xdputil $*
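
One tentative reading of the register dump: if CYCLE_L counts cycles of the 100 MHz s_axi_aclk (an assumption, since the log does not say which clock drives the counter), the value converts to roughly 10 s. That is close to the 10010 ms reported in the `cu timeout` line, which would be consistent with the DPU staying busy for the whole timeout window rather than failing early. `cycles_to_ms` is a hypothetical helper used only for this arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a cycle count to milliseconds at a given clock.
# Assumption: CYCLE_L is counted on the 100 MHz s_axi_aclk; the log does not confirm this.
cycles_to_ms() {
    local cycles=$1 freq_hz=$2
    echo $(( cycles * 1000 / freq_hz ))
}

# CYCLE_L value taken from the log above.
echo "$(cycles_to_ms 1000905996 100000000) ms"
```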

This is the same error that I got in issue #576. Is the ES AIE workaround script wrong for this configuration?

@ksstms ksstms changed the title XVDPU-TRD B64C5 timeout on VCK190-ES XVDPU-TRD C64B5 timeout on VCK190-ES Feb 8, 2022

ksstms commented Feb 8, 2022

Since this configuration uses more AIE cores than any other, I got suspicious of the ES AIE workaround script.
I checked it and found that it only sets registers in cores from 0_0 to 39_8, so the cores in columns 40 to 49 (4x_x) are never touched.

I changed the outer for loop of the script below (shown here in its original form) to run to 49 instead of 39, and now the benchmark runs fine.

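a=0x20000000000   # base address used by the workaround
b=0x31000         # register offset within each core
# Original column range; the fix changes this to {0..49}.
for i in {0..39}
do
    for j in {1..8}
    do
        # Write a 32-bit 0 to the register of core i_j.
        devmem $((a + b + (i << 23) + (j << 18))) 32 0
    done
done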
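
To make the address arithmetic in the script above easier to check, here is a dry-run sketch that prints addresses instead of calling devmem. The constants and shifts are copied from the script; `aie_core_addr` is a hypothetical helper name, and nothing here writes to hardware.

```shell
#!/usr/bin/env bash
# Dry run only: prints the addresses the workaround would write to.
AIE_BASE=0x20000000000   # value of "a" in the script above
REG_OFFSET=0x31000       # value of "b" in the script above

# Hypothetical helper: address for the core in column $1, row $2.
aie_core_addr() {
    local col=$1 row=$2
    printf '0x%X\n' $(( AIE_BASE + REG_OFFSET + (col << 23) + (row << 18) ))
}

echo "first core written (0_1):             $(aie_core_addr 0 1)"
echo "last core with the original loop:     $(aie_core_addr 39 8)"
echo "first core missed by the original:    $(aie_core_addr 40 1)"
echo "last core with the loop run to 49:    $(aie_core_addr 49 8)"
```

With the outer loop ending at 39 the script issues 40 x 8 = 320 writes; extending it to 49 issues 50 x 8 = 400, covering the ten extra columns.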

Please let me know if this is actually a good fix.

@qianglin-xlnx
Contributor

Hi @ksstms
Thank you for your feedback. It's a good fix. We will update the script in README.md.


ksstms commented Feb 9, 2022

Great! Thanks!


ksstms commented Feb 17, 2022

dsa/XVDPU-TRD/README.md fixed by 6b96cc3

@ksstms ksstms closed this as completed Feb 17, 2022