Error when running on Vitis Getting Started tutorial: "Vitis Flow 101 – Part 4 : Build and Run the Example" on Alveo U250 #106

Closed
danna2019 opened this issue Sep 10, 2021 · 13 comments
@danna2019

danna2019 commented Sep 10, 2021

The following errors occurred when running ./app.exe on my Alveo U250 host. The same vector add sample code ran successfully under sw_emu and hw_emu with 'TEST PASSED'.

starra@alveo:~/Vitis-Tutorials/Getting_Started/Vitis/example/u250/hw$ ./app.exe 
INFO: Found Xilinx Platform
INFO: Loading 'vadd.xclbin'
XRT build version: 2.11.634
Build hash: 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
Build date: 2021-06-08 22:08:45
Git branch: 2021.1
PID: 5819
UID: 1000
[Fri Sep 10 06:43:07 2021 GMT]
HOST: alveo016
EXE: /home/starra/Vitis-Tutorials/Getting_Started/Vitis/example/u250/hw/app.exe
[XRT] ERROR: See dmesg log for details. err=-2
[XRT] ERROR: failed to load xclbin: Invalid argument
[XRT] ERROR: Unexpected error memidx == -1
[XRT] ERROR: Unexpected error memidx == -1
[XRT] ERROR: Unexpected error memidx == -1
Segmentation fault (core dumped)

dmesg shows:

...
[  257.381157] xocl 0000:d8:00.1:  ffff8b7eb6e860b0 xocl_read_axlf_helper: check interface uuid
[  257.381166] xocl 0000:d8:00.1:  ffff8b7eb6e860b0 xocl_fdt_check_uuids: Can not find uuid f2f6c5e1273e78948f2c4806221462f2
[  257.381203] xocl 0000:d8:00.1:  ffff8b7eb6e860b0 xocl_read_axlf_helper: interface uuids do not match
[  257.381224] xocl 0000:d8:00.1:  ffff8b7eb6e860b0 xocl_read_axlf_helper: Failed to download xclbin, err: -22
[  257.390111] show_signal_msg: 18 callbacks suppressed
[  257.390113] app.exe[2599]: segfault at 0 ip 0000561184c13e6d sp 00007ffd3656c810 error 6 in app.exe[561184c13000+c000]
[  257.390119] Code: ff ff 8b 95 bc fe ff ff 48 63 d2 48 8d 0c 95 00 00 00 00 48 8b 95 08 ff ff ff 48 01 d1 99 c1 ea 14 01 d0 25 ff 0f 00 00 29 d0 <89> 01 e8 cc f6 ff ff 8b 95 bc fe ff ff 48 63 d2 48 8d 0c 95 00 00
...

How can I fix it?

@danna2019 changed the title from "Error running" to "Error when running on Vitis Getting Started tutorial: "Vitis Flow 101 – Part 4 : Build and Run the Example" on Alveo U250" on Sep 10, 2021
@randyh62
Contributor

It seems like there might be a version mismatch for XRT between the development environment and the deployment environment.

@danna2019
Author

xbmgmt shows:

$ xbmgmt examine
System Configuration
  OS Name              : Linux
  Release              : 5.4.0-84-generic
  Version              : #94-Ubuntu SMP Thu Aug 26 20:27:37 UTC 2021
  Machine              : x86_64
  CPU Cores            : 72
  Memory               : 386615 MB
  Distribution         : Ubuntu 20.04.2 LTS
  GLIBC                : 2.31
  Model                : ProLiant DL380 Gen10 

XRT
  Version              : 2.11.634
  Branch               : 2021.1
  Hash                 : 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
  Hash Date            : 2021-06-08 22:08:45
  XOCL                 : 2.11.634, 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
  XCLMGMT              : 2.11.634, 5ad5998d67080f00bca5bf15b3838cf35e0a7b26

Devices present
  [0000:d8:00.0] : xilinx_u250_gen3x16_base_3 
  [0000:86:00.0] : xilinx_u250_gen3x16_base_3 
  [0000:37:00.0] : xilinx_u250_gen3x16_base_3

The XRT version.json contains:

$ cat /opt/xilinx/xrt/version.json 
{
  "BUILD_VERSION" : "2.11.634",
  "BUILD_VERSION_DATE" : "Tue, 08 Jun 2021 22:08:45 -0700",
  "BUILD_BRANCH" : "2021.1",
  "VERSION_HASH" : "5ad5998d67080f00bca5bf15b3838cf35e0a7b26",
  "VERSION_HASH_DATE" : "Tue, 8 Jun 2021 20:06:48 -0700"
}

xclbinutil shows:

$ xclbinutil --info --input Vitis-Tutorials/Getting_Started/Vitis/example/u250/hw/vadd.xclbin
XRT Build Version: 2.11.634 (2021.1)
       Build Date: 2021-06-08 22:08:45
          Hash ID: 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
------------------------------------------------------------------------------
Warning: The option '--output' has not been specified. All operations will    
         be done in memory with the exception of the '--dump-section' command.
------------------------------------------------------------------------------
Reading xclbin file into memory.  File: Vitis-Tutorials/Getting_Started/Vitis/example/u250/hw/vadd.xclbin

==============================================================================
XRT Build Version: 2.11.634 (2021.1)
       Build Date: 2021-06-08 22:08:45
          Hash ID: 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
==============================================================================
xclbin Information
------------------
   Generated by:           v++ (2021.1) on 2021-06-09-14:19:56
   Version:                2.11.634
   Kernels:                vadd
   Signature:              
   Content:                Bitstream
   UUID (xclbin):          0ef1b67e-3fd9-a428-91af-087ba0aee0f5
   UUID (IINTF):           f2f6c5e1273e78948f2c4806221462f2
   Sections:               DEBUG_IP_LAYOUT, BITSTREAM, MEM_TOPOLOGY, IP_LAYOUT, 
                           CONNECTIVITY, CLOCK_FREQ_TOPOLOGY, BUILD_METADATA, 
                           EMBEDDED_METADATA, SYSTEM_METADATA, 
                           PARTITION_METADATA, GROUP_CONNECTIVITY, GROUP_TOPOLOGY

These all seem to report the same XRT version, 2.11.634.

If you need more information from me, please tell me how to collect it.
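
A side note on the outputs above: the XRT versions agree, but the UUID that xocl reports as missing in dmesg is the same string as the "UUID (IINTF)" field printed by xclbinutil. The following is only a rough sketch of how that cross-check could be scripted. The function names are hypothetical, the regular expressions assume the exact log wording quoted in this thread, and the sample strings are copied from the output above.

```python
import re

def missing_uuids(dmesg_text):
    """Return the UUIDs that xocl_fdt_check_uuids reported it could not find."""
    pattern = r"xocl_fdt_check_uuids: Can not find uuid ([0-9a-f]{32})"
    return re.findall(pattern, dmesg_text)

def xclbin_interface_uuid(info_text):
    """Return the 'UUID (IINTF)' value from xclbinutil --info text, or None."""
    match = re.search(r"UUID \(IINTF\):\s*([0-9a-fA-F-]+)", info_text)
    if match is None:
        return None
    # Normalize so dashed and undashed spellings compare equal.
    return match.group(1).replace("-", "").lower()

# Sample lines copied from the logs quoted in this thread.
dmesg_sample = (
    "[  257.381166] xocl 0000:d8:00.1:  ffff8b7eb6e860b0 "
    "xocl_fdt_check_uuids: Can not find uuid f2f6c5e1273e78948f2c4806221462f2"
)
info_sample = "   UUID (IINTF):           f2f6c5e1273e78948f2c4806221462f2"

wanted = xclbin_interface_uuid(info_sample)
if wanted in missing_uuids(dmesg_sample):
    print("The card does not expose the interface UUID this xclbin expects:", wanted)
```

If the check fires, the mismatch is between the xclbin and whatever is currently loaded on the card, not between XRT versions.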

@randyh62
Contributor

That looks pretty consistent. Are you able to run any other examples on your U250?

@danna2019
Author

No, this is my first run.
That is why I started with the first "Getting Started" example.
I believe the "Vector Add" example is the simplest example of using Vitis with the Alveo U250.

If there are other simple Alveo examples that I should run before Vector Add, please let me know.

@randyh62
Contributor

randyh62 commented Sep 11, 2021

There are a number of different examples in the Vitis_Accel_Examples repository. You can build some of these pretty quickly, and run them on your hardware.
I would start with something like Hello World which also happens to use the vadd kernel. These examples have Makefiles to let you build them quickly, and can target a number of different platforms. The examples are also useful for exploring additional features of the system.

@danna2019
Author

With the Hello World example, 'make all' completed successfully, but 'make run' failed with the error below:

./hello_world ./build_dir.hw.xilinx_u250_gen3x16_xdma_3_1_202020_1/vadd.xclbin
Found Platform
Platform Name: Xilinx
INFO: Reading ./build_dir.hw.xilinx_u250_gen3x16_xdma_3_1_202020_1/vadd.xclbin
Loading: './build_dir.hw.xilinx_u250_gen3x16_xdma_3_1_202020_1/vadd.xclbin'
Trying to program device[0]: xilinx_u250_gen3x16_base_3
XRT build version: 2.11.634
Build hash: 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
Build date: 2021-06-08 22:08:45
Git branch: 2021.1
PID: 473665
UID: 1000
[Sat Sep 11 16:36:40 2021 GMT]
HOST: alveo016
EXE: /home/starra/Vitis_Accel_Examples/hello_world/hello_world
[XRT] ERROR: See dmesg log for details. err=-2
[XRT] ERROR: failed to load xclbin: Invalid argument
Failed to program device[0] with xclbin file!
Trying to program device[1]: xilinx_u250_gen3x16_base_3
[XRT] ERROR: See dmesg log for details. err=-2
[XRT] ERROR: failed to load xclbin: Invalid argument
Failed to program device[1] with xclbin file!
Trying to program device[2]: xilinx_u250_gen3x16_base_3
[XRT] ERROR: See dmesg log for details. err=-2
[XRT] ERROR: failed to load xclbin: Invalid argument
Failed to program device[2] with xclbin file!
Failed to program any device found, exit!
make: *** [Makefile:169: run] Error 1

dmesg shows the same messages as in the previous vector add run:

[154128.592603] xocl 0000:37:00.1:  ffff8b4eb6ab10b0 xocl_read_axlf_helper: check interface uuid
[154128.592611] xocl 0000:37:00.1:  ffff8b4eb6ab10b0 xocl_fdt_check_uuids: Can not find uuid f2f6c5e1273e78948f2c4806221462f2
[154128.593307] xocl 0000:37:00.1:  ffff8b4eb6ab10b0 xocl_read_axlf_helper: interface uuids do not match
[154128.593965] xocl 0000:37:00.1:  ffff8b4eb6ab10b0 xocl_read_axlf_helper: Failed to download xclbin, err: -22
[154128.656988] xocl 0000:37:00.1:  ffff8b4eb6ab10b0 xocl_destroy_client: client exits pid(473665)

It seems xocl_fdt_check_uuids failed in the same way as in the previous vector add run, and that caused the host application error.
How can I fix it?

I suspect the trouble may be that my Alveo host has 3 Alveo U250 cards installed.
If 3 cards need some special configuration or command options, please let me know.

@randyh62
Contributor

It seems like there is a configuration issue, though I am not sure where the problem lies. You can debug the Alveo installation using the following guide: https://xilinx.github.io/Alveo-Cards/master/debugging/docs/card-validation.html
Run some of the validation tests to see if your cards are working as expected.

@danna2019
Author

I ran some of the validation tests from the guide, and all of them completed successfully.
What should I try next?

@randyh62
Contributor

If you have three cards, you can try to direct the xclbin to a different card in the system. This would be done in the host application. Currently it defaults to choosing the first card. You could change that to specify a card or hard code it to card 2 or 3?

    std::vector<cl::Device> devices = get_xilinx_devices();
    devices.resize(1);
    cl::Device device = devices[0];

@uday610

uday610 commented Sep 11, 2021

Hello @danna2019 ,

It looks like the shell is not loaded on the U250 platform, assuming you are using the latest U250, which is a 2RP platform.

Please run this command first:

 sudo xbmgmt partition --scan 

This would show you the name of the shell partition installed in the system. Something like this:

Partitions installed in system:
    xilinx_u250_gen3x16_xdma_shell_3_1
        logic-uuid:
        bd5fb8abab266c3265918257b5048e88
        interface-uuid:
        f2f6c5e1273e78948f2c4806221462f2

For example, in my sample output above you can see a shell partition (the name contains the word "shell") that is installed in the system. Now copy that name and use it to program the shell partition. For example:

 sudo xbmgmt partition --program --name xilinx_u250_gen3x16_xdma_shell_3_1

After doing this, run xbutil list again and you will see the shell partition listed.
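
As a rough illustration of how the scan output relates to the xclbin, the sketch below parses text in the layout of the sample output above and looks up which installed partition carries a given interface UUID. The function names are hypothetical, and the parser assumes exactly the layout shown in this comment (a partition name starting with "xilinx_", then an "interface-uuid:" label followed by the value on the next line); real output may differ.

```python
def installed_interface_uuids(scan_text):
    """Map partition name -> interface UUID from partition scan style text."""
    result = {}
    name = None
    lines = [line.strip() for line in scan_text.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("xilinx_"):
            name = line  # remember the partition the following fields belong to
        elif line == "interface-uuid:" and name and i + 1 < len(lines):
            result[name] = lines[i + 1]
    return result

def partitions_for(scan_text, wanted_uuid):
    """Return names of installed partitions whose interface UUID matches."""
    uuids = installed_interface_uuids(scan_text)
    return [name for name, uuid in uuids.items() if uuid == wanted_uuid]

# Sample text copied from the output quoted in this comment.
scan_sample = """Partitions installed in system:
    xilinx_u250_gen3x16_xdma_shell_3_1
        logic-uuid:
        bd5fb8abab266c3265918257b5048e88
        interface-uuid:
        f2f6c5e1273e78948f2c4806221462f2
"""

# Interface UUID from the "UUID (IINTF)" field reported earlier in the thread.
print(partitions_for(scan_sample, "f2f6c5e1273e78948f2c4806221462f2"))
```

A non-empty result suggests the partition is installed on the host and only needs to be programmed onto the card.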

@uday610

uday610 commented Sep 11, 2021

I forgot to mention the --card option in my previous reply. As you have 3 cards, you need to specify the BDF to choose the card, so run this command once per card, one after another, to load the shell on each.

 sudo xbmgmt partition --program --name xilinx_u250_gen3x16_xdma_shell_3_1 --card <bdf>
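
A minimal sketch of the "one command per card" step, assuming the three BDFs are the ones listed under "Devices present" earlier in the thread and that the partition name is the one from the sample scan output (substitute whatever the scan reports on your system). The helper name is hypothetical, and the script only prints the commands rather than running them.

```python
def program_commands(partition, bdfs):
    """Build one 'xbmgmt partition --program' command line per card BDF."""
    return [
        ["sudo", "xbmgmt", "partition", "--program",
         "--name", partition, "--card", bdf]
        for bdf in bdfs
    ]

# BDFs as reported by 'xbmgmt examine' in this thread.
bdfs = ["0000:d8:00.0", "0000:86:00.0", "0000:37:00.0"]

# Partition name taken from the sample scan output; yours may differ.
commands = program_commands("xilinx_u250_gen3x16_xdma_shell_3_1", bdfs)

for command in commands:
    print(" ".join(command))
```

Printing first makes it easy to review the three command lines before running them by hand.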

@danna2019
Author

Thank you @uday610.
After running 'xbmgmt partition', I could successfully run the 'Hello World' example, as shown below:

$ make run DEVICE=/opt/xilinx/platforms/xilinx_u250_gen3x16_xdma_3_1_202020_1/xilinx_u250_gen3x16_xdma_3_1_202020_1.xpfm 
./hello_world ./build_dir.hw.xilinx_u250_gen3x16_xdma_3_1_202020_1/vadd.xclbin
Found Platform
Platform Name: Xilinx
INFO: Reading ./build_dir.hw.xilinx_u250_gen3x16_xdma_3_1_202020_1/vadd.xclbin
Loading: './build_dir.hw.xilinx_u250_gen3x16_xdma_3_1_202020_1/vadd.xclbin'
Trying to program device[0]: xilinx_u250_gen3x16_xdma_shell_3_1
Device[0]: program successful!
TEST PASSED

Unfortunately, I got another error when running the Vector Add example in 'Getting_Started/Vitis/example' (the example this issue was originally about, but this is a different error):

$ ./app.exe
INFO: Found Xilinx Platform
INFO: Loading 'vadd.xclbin'
terminate called after throwing an instance of '__gnu_cxx::recursive_init_error'
  what():  std::exception
Aborted (core dumped)

How can I fix it?

(I'm a bit curious: I had already run 'xbmgmt partition' according to the Alveo getting started instructions before opening this issue, so why and when were the partitions cleared...)

@danna2019
Author

Oh, I solved this error by running
xbutil reset -d 0
before executing ./app.exe.

Thank you for your advice, @randyh62 and @uday610.
