Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with depth in Hardware kernel code of Vitis-Tutorials/Hardware_Acceleration/Design_Tutorials/01-convolution-tutorial #119

Closed
Ali-Flt opened this issue Oct 15, 2021 · 5 comments
Assignees

Comments

@Ali-Flt
Copy link

Ali-Flt commented Oct 15, 2021

Hi,
I've read the whole tutorial and understood all of it. But an issue I have is that the depth of output_stream in the top-level function of the filter2d_hw.cpp file is set to 64.
image

Why is this? Is it 64 so that the M_AXI interface can perform data width auto-widening to 512 (64 x 8) bits? If so then why is the input stream (pixel_stream) depth not set to 64 too and only the stream for output has 64 depth?

What I assumed was that even if the depth is 2, the M_AXI interface will be able to understand that the data is flowing in the kernel without stall, therefore achieving 512 bit data width for burst transfers.

I also synthesized the code for both 2 and 64 as depth value for output_stream and both systems pass the C/RTL Co-simulation without problem. The only difference in the synthesis reports I saw was that the FIFOs used 51 more LUTs which is because of the output_stream FIFO depth of 64.

P.S.: Although the FIFO depth is 32 times more (2 x 32 = 64), the number of additional LUTs used is less than two times more (51). So I'm assuming HLS has understood that it doesn't need all of the 64 FIFOs. But even the slight addition of LUTs is strange to me because it should not need more than depth=2.

Also there is a hint comment saying you can set the stream depth to 0 for minimum resources, how can the FIFO be created with a depth of 0? should the depth not be at least 1? When I set depth to 0 suddenly there are no BRAMs used for the window_stream but for Depth of 3, 100 BRAMs are used. Why?

Thanks,
Ali

@allyzhou
Copy link
Collaborator

We're looking at this issue and will get back to you soon. Thanks.

@akhilayyagari
Copy link

Hi @Ali-Flt - I have extracted the independent questions and answered them independently

Why is it necessary to have a FIFO depth of "64" instead of "2"

The Dataflow throughput is dynamic in nature and it is dependent on the FIFO full and empty conditions. When the user selects the FIFO of depth "2" they would see a small stall percentage because of FIFO getting full ( can be seen in the dataflow profiling information). To remove this stall, the author has increased it to"64", but the depth can go as low as "16" without any stall percentage.

How the MAXI interface can infer a 512-bit data width for burst transfers

the inferences of the 512-bit width is not dependent on the FIFO depth, it is dependent on the memory access pattern of the "dst" variable which is sequential. This infers the 512-bit width, there can be cases where the memory access pattern is random which fails the inferences

Resource utilization of the FIFO for a depth of "64" and "2"

The resource utilization of HLS is only an estimate, run the vivado to get more accurate results

FIFO of the depth of "0"

This comment will be removed, you cannot set a FIFO of depth "0", the default would be "2"

@Ali-Flt
Copy link
Author

Ali-Flt commented Oct 20, 2021

Hi @akhilayyagari ,
Thank you very much for the detailed answer.
If I've understood correct about the FIFO depth, the right flow of configuring the depths will be to start from a high number (maximum number of data that will be pushed into each FIFO) and run C/RTL co-simulation with buffer profiling and lower the depth until stall percentage goes higher than 0.

About the setting FIFO depth to 0 comment, I was assuming that maybe by setting the depth to 0, the depth will be calculated by DATAFLOW and therefore the dataflow fifo depth will override the user input (0). I ran the synthesis with depth of 0 and the depth of the FIFO was actually set to 2, using no BRAMs.

But about you saying you can't set the FIFO depth to "0", I actually changed the config_dataflow setting "-fifo_depth" to 0 and I observed that it synthesized the design without any errors. And the fifo depth in the synthesis resource profiling report was set to 0 too.

How is this possible? Shouldn't the fifo depth be at least 1?
There could be two scenarios:

  1. The system is actually synthesized with FIFO depth = 0 somehow and dataflow handled data transfer between registers without the need of buffers/fifos
  2. The system is synthesized with a depth higher than 0 and the "-fifo_depth 0" setting was ignored and the synthesis report is wrong.

Which one is the correct scenario?

Thanks again,
Ali

@akhilayyagari
Copy link

hi @Ali-Flt ,

There are two ways you can optimize the FIFO depth,

  • Manual Way: The designer would need to size the FIFO starting from maximum to an optimal size.
  • Automatic Way: The co-simulation provides a feature that can automatically find the optimal FIFO depth by running multiple simulations, which may take a long time for big designs

The second issue you have reported may be a bug, it should not take the "0" which does not make sense. We will fix this in the future release. The minimum depth should be "1", the tool default is "2". if the depth of the FIFO is less than 1024, then they use registers unless explicitly specified using a pragma.

vmayoral pushed a commit to vmayoral/Vitis-Tutorials that referenced this issue Jan 20, 2022
CRTejaswi pushed a commit to CRTejaswi/amd-vitis that referenced this issue Oct 3, 2023
9ba8ba0 Cleanup 1028 (Xilinx#119)
715ca77 update doc (Xilinx#118)
bd2a926 Issue#111, solve timing issue, apply jacobian, reduce latency. (Xilinx#114)

Co-authored-by: sdausr <[email protected]>
CRTejaswi pushed a commit to CRTejaswi/amd-vitis that referenced this issue Oct 3, 2023
cf4065d Merge pull request Xilinx#123 from RepoOps/update_readme_5
4890779 update README
fa29498 update README
61c2cb5 Merge pull request Xilinx#119 from RepoOps/update_doc_url_3
b871455 fix url
2f7fb05 Merge pull request Xilinx#122 from tuol/cr_1142093_2
59cf572 fix input of cscmv
de579fa Merge pull request Xilinx#121 from tuol/cr_1140416
c00a509 update makefile and description.json for L2_tests_fp64_spmv
0a0771e update url and branch in doc
a69541e Merge pull request Xilinx#118 from tuol/fix_version
dfc5cb7 update version to 2022.2

Co-authored-by: sdausr <[email protected]>
@AnusheelXilinx
Copy link
Collaborator

Closing the thread as there are no open concerns.

Thanks
Anusheel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants