Vitis Custom Embedded Platform Creation Example on ZCU104 DPU Test 3: Run a Vitis-AI Demo not working #122

Closed
Ali-Flt opened this issue Nov 2, 2021 · 9 comments

Ali-Flt commented Nov 2, 2021

Hi, I've gone through this tutorial with Vitis 2020.2 and Vitis AI v1.3: https://github.com/Xilinx/Vitis-Tutorials/tree/2020.2/Vitis_Platform_Creation/Introduction/02-Edge-AI-ZCU104

With some slight differences:

  1. My Vitis platform has a MIG IP core for interfacing with the PL-DDR4 SODIMM of the ZCU104 board.
  2. I added some extra packages (such as OpenCV) to my rootfs in PetaLinux.
  3. Instead of adding the Vitis AI library using the method explained in Test 3, I cloned the repo with the commands below:
    git clone https://github.com/Xilinx/Vitis-AI.git
    cd Vitis-AI
    git checkout v1.3

And added the repo to Vitis like this:
image

Every other step and instruction was followed without error.

But when I run the Vitis-AI demo on the bell pepper image, I get this for the first run:

root@zcu104_custom_plnx:~# env LD_LIBRARY_PATH=samples/lib XLNX_VART_FIRMWARE=/mnt/sd-mmcblk0p1/dpu.xclbin ./dpu_trd bellpeppe-994958.JPEG                                                                                                    

[  226.509859] [drm] Pid 1254 opened device                                                                                                                                                                                                   
[  226.513817] [drm] Pid 1254 closed device                                                                                                                                                                                                   
[  226.517845] [drm] Pid 1254 opened device                                                                                                                                                                                                   
[  226.521786] [drm] Pid 1254 closed device                                                                                                                                                                                                   
[  226.595670] [drm] Pid 1254 opened device                                                                                                                                                                                                   
[  226.599622] [drm] Pid 1254 closed device                                                                                                                                                                                                   
[  226.603564] [drm] Pid 1254 opened device                                                                                                                                                                                                   
[  226.607501] [drm] Pid 1254 closed device                                                                                                                                                                                                   
[  226.620442] [drm] Pid 1254 opened device                                        
[  226.624524] [drm] Pid 1254 closed device                                        
[  226.732663] [drm] Pid 1254 opened device                                                                                                                     
[  226.736601] [drm] Pid 1254 closed device                                              
[  226.740541] [drm] Pid 1254 opened device                                           
[  226.757298] [drm] get section DEBUG_IP_LAYOUT err: -22                                                         
[  226.757306] [drm] get section AIE_METADATA err: -22                             
[  226.762559] [drm] zocl_xclbin_read_axlf 1a5daa76-f818-40bb-af8a-c0bee51ee03b ret: 0            
[  226.771847] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=1
[  226.779523] [drm] No ERT scheduler on MPSoC, using KDS
[  226.791945] [drm] scheduler config ert(0)
[  226.791947] [drm]   cus(2)
[  226.795949] [drm]   slots(16)
[  226.798645] [drm]   num_cu_masks(1)
[  226.801612] [drm]   cu_shift(16)
[  226.805096] [drm]   cu_base(0xa0000000)
[  226.808309] [drm]   polling(0)
[  226.812174] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[  226.815431] [drm] Pid 1254 opened device
[  226.826763] [drm] Pid 1254 closed device
[  226.830790] [drm] Pid 1254 opened device
[  226.834971] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=1
[  226.834992] [drm] Pid 1254 opened device
[  226.846316] [drm] Pid 1254 closed device
[  226.850262] [drm] Pid 1254 opened device
[  226.854305] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=2
[  226.854342] [drm] Pid 1254 opened device
[  226.865492] [drm] Pid 1254 closed device
[  226.869431] [drm] Pid 1254 opened device
[  226.873463] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=3
[  226.873968] [drm] Pid 1254 opened device
[  226.885109] [drm] Pid 1254 closed device
[  226.889045] [drm] Pid 1254 opened device
score[5]    =  0.00532136   text: electric ray, crampfish, numbfish, torpedo,
score[18]   =  0.00532136   text: magpie,
score[21[  226.893021] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=4
]   =  0.00532136   text: kite,
score[27]   =  0.00532136   tex[  228.466819] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=3
t: eft,
score[2]    =  0.00532136   text: great white shark, wh[  228.479460] [drm] Pid 1254 closed device
[  228.502080] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=2

[  228.502091] [drm] Pid 1254 closed device
[  228.705448] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[  228.705454] [drm] Pid 1254 closed device
[  228.716795] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[  228.716797] [drm] Pid 1254 closed device
[  228.728206] [drm] Pid 1254 closed device

Note that some of these messages look like errors, and I don't know what causes them:

[  226.757298] [drm] get section DEBUG_IP_LAYOUT err: -22                                                         
[  226.757306] [drm] get section AIE_METADATA err: -22                             
[  226.762559] [drm] zocl_xclbin_read_axlf 1a5daa76-f818-40bb-af8a-c0bee51ee03b ret: 0            
[  226.771847] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=1
[  226.779523] [drm] No ERT scheduler on MPSoC, using KDS
[  226.791945] [drm] scheduler config ert(0)
[  226.791947] [drm]   cus(2)
[  226.795949] [drm]   slots(16)
[  226.798645] [drm]   num_cu_masks(1)
[  226.801612] [drm]   cu_shift(16)
[  226.805096] [drm]   cu_base(0xa0000000)
[  226.808309] [drm]   polling(0)

And I get these results on every run after the first one:

root@zcu104_custom_plnx:~# env LD_LIBRARY_PATH=samples/lib XLNX_VART_FIRMWARE=/mnt/sd-mmcblk0p1/dpu.xclbin ./dpu_trd bellpeppe-994958.JPEG
[ 1748.991559] [drm] Pid 1611 opened device
[ 1748.995524] [drm] Pid 1611 closed device
[ 1748.999547] [drm] Pid 1611 opened device
[ 1749.003469] [drm] Pid 1611 closed device
[ 1749.014716] [drm] Pid 1611 opened device
[ 1749.018667] [drm] Pid 1611 closed device
[ 1749.022609] [drm] Pid 1611 opened device
[ 1749.026530] [drm] Pid 1611 closed device
[ 1749.030646] [drm] Pid 1611 opened device
[ 1749.034660] [drm] Pid 1611 closed device
[ 1749.038987] [drm] Pid 1611 opened device
[ 1749.042912] [drm] Pid 1611 closed device
[ 1749.046862] [drm] Pid 1611 opened device
[ 1749.053860] [drm] zocl_xclbin_read_axlf The XCLBIN already loaded
[ 1749.053870] [drm] zocl_xclbin_read_axlf 1a5daa76-f818-40bb-af8a-c0bee51ee03b ret: 0
[ 1749.064277] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=1
[ 1749.071954] [drm] Reconfiguration not supported
[ 1749.083714] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[ 1749.083851] [drm] Pid 1611 opened device
[ 1749.095173] [drm] Pid 1611 closed device
[ 1749.099195] [drm] Pid 1611 opened device
[ 1749.103379] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=1
[ 1749.103400] [drm] Pid 1611 opened device
[ 1749.114536] [drm] Pid 1611 closed device
[ 1749.118473] [drm] Pid 1611 opened device
[ 1749.122508] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=2
[ 1749.122541] [drm] Pid 1611 opened device
[ 1749.133675] [drm] Pid 1611 closed device
[ 1749.137612] [drm] Pid 1611 opened device
[ 1749.141596] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=3
[ 1749.142090] [drm] Pid 1611 opened device
[ 1749.153243] [drm] Pid 1611 closed device
[ 1749.157181] [drm] Pid 1611 opened device
score[5]    =  0.00522272   text: electric ray, crampfish, numbfish, torpedo,
score[4]    =  0.00522272   text: hammerhead, hamme[ 1749.161152] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=4
rhead shark,
score[18]   =  0.00522272   text: magpie,
score[2[ 1750.660487] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=3
0]   =  0.00522272   text: water ouzel, dipper,
score[2]    =  [ 1750.673183] [drm] Pid 1611 closed device
0.00522272   text: great white shark, white shark, man-eater, ma[ 1750.695665] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=2
n-eating shark, Carcharodon carcharias,
[ 1750.695675] [drm] Pid 1611 closed device
[ 1750.877305] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[ 1750.877311] [drm] Pid 1611 closed device
[ 1750.888650] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b unlocked, ref=0
[ 1750.888653] [drm] Pid 1611 closed device
[ 1750.900065] [drm] Pid 1611 closed device

As you can see, the app runs without errors, but the predictions are not correct at all.
Any ideas why this could happen, or how I can debug it?
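
One thing that makes the logs above hard to read is that the kernel's [drm] messages land in the middle of the lines the application prints. As a first debugging aid, here is a minimal Python sketch that strips those messages from a captured console log so the split score lines rejoin. The regular expression is an assumption based only on the message format visible in this thread, and application text that the console dropped entirely cannot be recovered this way.

```python
import re

# Kernel messages in this thread look like "[  226.893021] [drm] ...".
# On a serial console they can be printed in the middle of an application line.
KERNEL_DRM = re.compile(r"\[\s*\d+\.\d+\] \[drm\][^\n]*\n?")


def strip_kernel_messages(console_text):
    """Remove [drm] kernel messages so interrupted application lines rejoin."""
    return KERNEL_DRM.sub("", console_text)


# Excerpt taken from the first run above.
sample = (
    "[  226.889045] [drm] Pid 1254 opened device\n"
    "score[18]   =  0.00532136   text: magpie,\n"
    "score[21[  226.893021] [drm] bitstream 1a5daa76-f818-40bb-af8a-c0bee51ee03b locked, ref=4\n"
    "]   =  0.00532136   text: kite,\n"
)

cleaned = strip_kernel_messages(sample)
print(cleaned)
```

Running the demo over SSH instead of the serial console should avoid the interleaving in the first place, since kernel messages are not normally echoed into an SSH session.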

Thanks

@imrickysu
Contributor

The AIE_METADATA err: -22 message can be safely ignored. It's an XRT code issue.
Wrong results can have many causes. I'm checking with the Vitis-AI experts.

@Ali-Flt
Author

Ali-Flt commented Nov 3, 2021

I went through all 4 steps of the tutorial again without changing anything. The only change was that I used Vitis AI branch 1.3.2 instead of v1.3, and the results are still incorrect.

This is getting really frustrating because I'm doing exactly what the tutorial tells me to do. I don't even get any errors, but the model is not giving proper outputs.

@Ali-Flt
Author

Ali-Flt commented Nov 3, 2021

I also installed Vitis 2021.2 and tried the whole flow with Vitis AI 1.4 (master), but the results were the same.

I forgot to mention that instead of writing the SD card from the sd_card.img file, I extracted zcu104_custom_plnx/images/linux/rootfs.tar.gz into the second partition and copied all the files in the dpu_trd_system/Hardware/package/sd_card/ folder into the first SD card partition (boot partition). I assume this shouldn't change the application result, right?

I suspect that because I downloaded the Vitis AI repository separately and added its path to Vitis, instead of letting Vitis download it itself, the DPU-TRD application template files are not loaded properly, since I've done everything else exactly as in the tutorial. Could this be the cause of the issue?
Here is the list of warnings I get after the Vitis application project has been built successfully:
image

Also, here is the error I get when I try to download the Vitis AI library in the Vitis IDE:
image

I've downloaded other Git repositories as libraries in the Vitis IDE without errors, but there seems to be a problem with the Vitis AI repository.
Please tell me how to fix this error so that I can check whether the issue is caused by the manual Git download.

OS : Ubuntu 18.04 LTS
Vitis 2021.2
Vitis AI 1.4 (master)

@Ali-Flt
Author

Ali-Flt commented Nov 4, 2021

I also found another problem. When I create the DPU Kernel application project using the official Xilinx Vitis platform for ZCU104, the system builds without errors and the resnet model works too.
There are some differences in the application project depending on whether I create it on top of the platform provided by Xilinx or on top of my own custom platform. For example, in the section below, the hw_link configurations are loaded automatically in the first case but not in the second.
image

Why are such configurations not loaded in the application project on my custom platform? Can you tell me where the script that generates the application project files is located, and what could cause some files not to be loaded?

@imrickysu
Contributor

The v++ configuration settings are set by https://github.com/Xilinx/Vitis-AI/blob/v1.3/dsa/DPU-TRD/prj/Vitis/config_file/prj_config_gui and this file is associated with the application project by https://github.com/Xilinx/Vitis-AI/blob/v1.3/dsa/DPU-TRD/description.json

In description.json, "ldclflags" : "--config PROJECT/src/prj/Vitis/config_file/prj_config_gui" is set under platform_properties -> zcu104_base, so it is only picked up when the platform name matches.
To work around this issue, you can do any of the following:

  • Add the v++ configuration manually.
  • Change your platform name to zcu104_base.
  • Update description.json locally.

If you update description.json, it can look something like this:

"containers": [
    {
        "accelerators": [
            {
                "kernel_type": "user",
                "name": "DPUCZDX8G",
                "num_compute_units": "2",
                "build_command": "$(VIVADO) -mode batch -source PROJECT/src/prj/Vitis/scripts_gui/gen_dpu_xo.tcl -tclargs $(PROJECT) $@ $(KERNEL_NAME) $(TARGET) $(DEVICE) $(XSA)",
                "clean_command": "rm -rf *.log *.jou *.xo packaged_* tmp_kernel_*",
                "dependencies": [
                    "src/prj/Vitis/kernel_xml/dpu/kernel.xml",
                    "src/prj/Vitis/scripts_gui/package_dpu_kernel.tcl",
                    "src/prj/Vitis/scripts_gui/gen_dpu_xo.tcl",
                    "src/prj/Vitis/dpu_conf.vh",
                    "src/dpu_ip/Vitis/dpu/hdl/DPUCZDX8G.v",
                    "src/dpu_ip/Vitis/dpu/inc/arch_def.vh",
                    "src/dpu_ip/Vitis/dpu/xdc/timing_clocks.xdc",
                    "src/dpu_ip/DPUCZDX8G_v3_3_0/ttcl/fingerprint_json.ttcl",
                    "src/dpu_ip/DPUCZDX8G_v3_3_0/hdl/DPUCZDX8G_v3_3_0_vl_dpu.sv",
                    "src/dpu_ip/DPUCZDX8G_v3_3_0/inc/function.vh",
                    "src/dpu_ip/DPUCZDX8G_v3_3_0/inc/arch_para.vh"
                ]
            },
            {
                "kernel_type": "user",
                "name": "sfm_xrt_top",
                "build_command": "$(VIVADO) -mode batch -source PROJECT/src/prj/Vitis/scripts_gui/gen_sfm_xo.tcl -tclargs $(PROJECT) $@ $(KERNEL_NAME) $(TARGET) $(DEVICE) $(XSA)",
                "dependencies": [
                    "src/prj/Vitis/kernel_xml/sfm/kernel.xml",
                    "src/prj/Vitis/scripts_gui/package_sfm_kernel.tcl",
                    "src/prj/Vitis/scripts_gui/gen_sfm_xo.tcl",
                    "src/dpu_ip/Vitis/sfm/hdl/sfm_xrt_top.v",
                    "src/dpu_ip/DPUCZDX8G_v3_3_0/hdl/DPUCZDX8G_v3_3_0_vl_sfm.sv",
                    "src/dpu_ip/DPUCZDX8G_v3_3_0/xci/sfm/fp_acc/fp_acc.xci",
                    "src/dpu_ip/DPUCZDX8G_v3_3_0/xci/sfm/fp_add/fp_add.xci",
                    "src/dpu_ip/DPUCZDX8G_v3_3_0/xci/sfm/fp_convert/fp_convert.xci",
                    "src/dpu_ip/DPUCZDX8G_v3_3_0/xci/sfm/fp_div/fp_div.xci",
                    "src/dpu_ip/DPUCZDX8G_v3_3_0/xci/sfm/fp_exp/fp_exp.xci"
                ]
            }
        ],
        "name": "dpu",
        "ldclflags": "--config PROJECT/src/prj/Vitis/config_file/prj_config_gui"
    }
],
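
To see where a local copy of description.json actually sets ldclflags, and therefore which platform names or containers would pick up the v++ config, a small script along these lines may help. It is only a sketch: the nested layout in the example is hypothetical and stands in for the real file, whose full structure is not shown in this thread, and find_key is an illustrative helper rather than part of any Xilinx tool.

```python
import json


def find_key(node, key, path=()):
    """Yield (path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,), value
            yield from find_key(value, key, path + (name,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key(item, key, path + (str(index),))


# Hypothetical layout for illustration only; the real description.json may differ.
example = json.loads("""
{
  "platform_properties": {
    "zcu104_base": {
      "ldclflags": "--config PROJECT/src/prj/Vitis/config_file/prj_config_gui"
    }
  },
  "containers": [{"name": "dpu"}]
}
""")

for path, value in find_key(example, "ldclflags"):
    print("/".join(path), "->", value)
```

For a real check, replace the example with the result of json.load() on the description.json in your local Vitis-AI checkout. If the only hit sits under a platform-specific entry, a platform with a different name would not get the setting.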

I have reported this issue before but the fix hasn't been applied yet. Sorry for this gap in the tutorial.

@Ali-Flt
Author

Ali-Flt commented Nov 4, 2021

Hi @imrickysu,

Thanks for the answer. I really appreciate your quick replies to my comments.
Yes, today after I posted the last comment, I searched the files and found the setting you mentioned in the .json file. I was really shocked that the project's behavior depends on the platform's name. Please at least mention this in the tutorial.

But even with zcu104_base in the platform's name, the resnet application built on my custom platform is not working.

root@zcu104_custom_plnx:~# env LD_LIBRARY_PATH=samples/lib XLNX_VART_FIRMWARE=/media/sd-mmcblk0p1/dpu.xclbin ./dpu bellpeppe-994958.JPEG
score[37]   =  0.0396331    text: box turtle, box tortoise,
score[117]  =  0.0396331    text: chambered nautilus, pearly nautilus, nautilus,
score[121]  =  0.0396331    text: king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica,
score[149]  =  0.0396331    text: dugong, Dugong dugon,
score[85]   =  0.0396331    text: quail,
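
A detail worth noting in the output above is that all five reported scores are identical. A correct classification normally shows distinct scores, so a tie like this is at least consistent with the DPU returning a flat or constant output rather than a wrong but meaningful one. That interpretation is an assumption, not a diagnosis. The sketch below parses the demo's score lines and flags this pattern so it can be checked automatically across runs. The line format is inferred only from the output in this thread.

```python
import re

# Format inferred from the demo output in this thread:
#   score[37]   =  0.0396331    text: box turtle, box tortoise,
SCORE_LINE = re.compile(r"score\[(\d+)\]\s*=\s*([0-9.eE+-]+)\s+text:\s*(.*)")


def parse_scores(output):
    """Return (class_index, score, label) tuples found in the demo's stdout."""
    results = []
    for line in output.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            label = match.group(3).rstrip(", ")
            results.append((int(match.group(1)), float(match.group(2)), label))
    return results


def looks_degenerate(results):
    """True when at least two scores were parsed and all of them are identical."""
    return len(results) > 1 and len({score for _, score, _ in results}) == 1


sample = """\
score[37]   =  0.0396331    text: box turtle, box tortoise,
score[117]  =  0.0396331    text: chambered nautilus, pearly nautilus, nautilus,
score[121]  =  0.0396331    text: king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica,
score[149]  =  0.0396331    text: dugong, Dugong dugon,
score[85]   =  0.0396331    text: quail,
"""

results = parse_scores(sample)
print(len(results), "scores parsed; degenerate:", looks_degenerate(results))
```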

I'm really frustrated at this point, after running this tutorial under many different conditions several times, so please tell me if you have any idea why the model may not work, or a way to debug the DPU cores' behavior.
Should I change anything in the PetaLinux project or in Vitis?
The worst part is that the Vitis project builds without any errors, giving no clue about where the issue may be.

Obviously this tutorial has not been tested on the current version of Vitis and Vitis AI, so please test it, find the issues and update the tutorial.

@imrickysu imrickysu self-assigned this Nov 6, 2021
@imrickysu
Contributor

imrickysu commented Nov 7, 2021

Hi @Ali-Flt, I reran the VAI test for 2020.2. It worked well on my side.

Could you try to create the Vitis-AI application with the platform generated by the Makefile? You can run make all to generate the platform.

For the configuration setting issue we discussed above, tutorial Step 5 (Update system_hw_link for proper kernel instantiation) already accounts for it and provides a way to work around the description.json setting that is specific to the platform name.

image

image

@Ali-Flt
Author

Ali-Flt commented Nov 7, 2021

Hi @imrickysu ,
Thanks for going through the tutorial for verification.
I used the Makefile as you explained on Vitis 2021.2 to run all steps and the resnet app is working successfully now:
image

So after that I went to the make scripts in each step and looked for differences with the tutorial. Please read the differences I found and update the tutorial, because one of them is probably the cause of the platform not working.
Step 1:

  • The clock IDs were not changed to start from 0 as mentioned in the tutorial:
    image

  • When exporting the XSA, the platform name was in uppercase:
    image

  • The locked signal of the clocking wizard was not connected to the processing system reset IP cores as mentioned in the tutorial.
    image

I couldn't find any other differences but I may have missed something. I also didn't check the PS's configurations for any mismatch.

Step 2:

  • In the Step 2 Makefile there are some configs that are not described in the tutorial:
line 42: echo 'CONFIG_YOCTO_MACHINE_NAME="zcu104-zynqmp"' >> $(PETALINUX_CONFIG)
line 44: echo "CONFIG_YOCTO_BUILDTOOLS_EXTENDED=y" >> $(PETALINUX_CONFIG)
line 76: cd $(PETALINUX_DIR) && petalinux-package --boot --u-boot
line 80: cd $(PETALINUX_DIR) && petalinux-package --sysroot
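
To compare a GUI-built PetaLinux project against what the Makefile appends, one option is to check the project's config file for those option lines directly. The following is a minimal sketch: the two option strings are copied from the Makefile lines quoted above, while the sample config text is hypothetical, and the location of the real file (the Makefile's $(PETALINUX_CONFIG), which I assume points at project-spec/configs/config) should be confirmed in your own project.

```python
def missing_options(config_text, required):
    """Return the required option lines that do not appear in config_text."""
    present = {line.strip() for line in config_text.splitlines()}
    return [option for option in required if option not in present]


# Option lines appended by the Step 2 Makefile (lines 42 and 44 quoted above).
REQUIRED = [
    'CONFIG_YOCTO_MACHINE_NAME="zcu104-zynqmp"',
    "CONFIG_YOCTO_BUILDTOOLS_EXTENDED=y",
]

# Hypothetical config contents, for illustration only.
sample_config = (
    'CONFIG_YOCTO_MACHINE_NAME="zcu104-zynqmp"\n'
    "# CONFIG_YOCTO_BUILDTOOLS_EXTENDED is not set\n"
)

print(missing_options(sample_config, REQUIRED))
```

With a real project, read the config file into a string and pass it in place of sample_config. An empty list means both Makefile-added options are already present.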

(Note that the rootfs configs were different too but I believe the problem is not hidden in the rootfs because I ran the test with my own generated rootfs without issues so I didn't mention the rootfs differences.)

Step 3:
The platform is generated with this script, so I don't know exactly how it differs from the GUI flow, but I think the main difference is that the domain name in the script is set to "xrt" while in the GUI flow it is "linux on psu_cortexa53".

I did the last step (running the Vitis AI demo) exactly like before in the GUI, so either the error lies in the things I mentioned above, or the Vivado/Vitis GUI flow does not behave as expected and is not equivalent to the Vivado/xsct script flow.

Thanks again for solving the issue for me by your suggestion and I hope this info helps in finding and fixing the issue in the tutorial.

imrickysu pushed a commit that referenced this issue Dec 3, 2021
* updated version to 2021.2
@AnusheelXilinx
Collaborator

Closing the thread as there are no open concerns.

Thanks
Anusheel
