
Pointwise Conv1D with code generation for "Latency" strategy (update of #811) #881

Conversation

@jmduarte (Member) commented Oct 8, 2023

Description

Update of #811 with code generation. This PR adds an explicit pointwise Conv1D implementation in which the reuse factor (RF) is used to split the layer execution and reuse the existing module RF times.

Original pointwise Conv1D:

  • (in_width, n_chan) -> (in_width, n_filt)

This PR splits it into RF calls of

  • (in_width/RF, n_chan) -> (in_width/RF, n_filt)
  • (in_width/RF, n_chan) -> (in_width/RF, n_filt)
  • (in_width/RF, n_chan) -> (in_width/RF, n_filt)
  • ...

The II scales roughly with RF (II ~ RF). The optimization is on by default, but you should still be able to use the standard conv1d implementation by skipping the optimizer.
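
For illustration, a minimal sketch of what the split looks like in HLS. Hedged: the wrapper name pointwise_conv_1d_split and the slicing arithmetic are illustrative, not the exact generated code; it assumes in_width == out_width (as for a stride-1 pointwise layer) and that in_width is divisible by RF:

template <class data_T, class res_T, typename CONFIG_T>
void pointwise_conv_1d_split(data_T data[CONFIG_T::in_width * CONFIG_T::n_chan],
                             res_T res[CONFIG_T::out_width * CONFIG_T::n_filt],
                             typename CONFIG_T::weight_t weights[CONFIG_T::n_chan * CONFIG_T::n_filt],
                             typename CONFIG_T::bias_t biases[CONFIG_T::n_filt]) {
    // Each call processes an in_width/RF slice with the same module; the calls
    // execute back-to-back, which is why the II scales roughly with RF.
    for (unsigned r = 0; r < CONFIG_T::reuse_factor; r++) {
        pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(
            &data[r * (CONFIG_T::in_width / CONFIG_T::reuse_factor) * CONFIG_T::n_chan],
            &res[r * (CONFIG_T::out_width / CONFIG_T::reuse_factor) * CONFIG_T::n_filt],
            weights, biases);
    }
}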

Limitations:

  • Assumes in_width is divisible by RF

Type of change

  • New feature (non-breaking change which adds functionality)
  • A new research paper code implementation

Tests

See test/pytest/test_pointwiseconv.py

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

Review threads (all resolved) were opened on:

  • hls4ml/backends/fpga/fpga_backend.py
  • hls4ml/backends/fpga/passes/codegen.py
  • hls4ml/backends/vivado/vivado_backend.py
  • hls4ml/templates/vivado/build_prj.tcl
  • hls4ml/templates/vivado/nnet_utils/nnet_code_gen.h
  • hls4ml/templates/vivado/nnet_utils/nnet_common.h
  • hls4ml/templates/vivado/nnet_utils/nnet_conv1d.h
  • test/pytest/test_pointwiseconv.py

@jmduarte (Member, Author) commented:

pre-commit.ci autofix

@vloncar (Contributor) left a comment:

Looks very good; there are some minor tweaks needed to integrate it more cleanly. It works very well, which is more important :-)

template<unsigned K, unsigned S, unsigned W>
using scale_index = nnet::{scale_index_type}<K, S, W>;
template<class data_T, class res_T, class CONFIG_T>
using pointwise_conv = nnet::{pointwise_fn}<data_T, res_T, CONFIG_T>;

@vloncar (Contributor) commented:

In Dense layers we moved to using a function pointer like this, which the main dense() function calls, eliminating the need for checks in HLS. Here it would likewise simplify the call hierarchy (no special handling of pointwise) and remove the need for this template.
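
For context, a hedged sketch of the dispatch-through-config pattern being described (the member name dense_fn is hypothetical, not the actual hls4ml API):

template <class data_T, class res_T, typename CONFIG_T>
void dense(data_T data[CONFIG_T::n_in], res_T res[CONFIG_T::n_out],
           typename CONFIG_T::weight_t weights[CONFIG_T::n_in * CONFIG_T::n_out],
           typename CONFIG_T::bias_t biases[CONFIG_T::n_out]) {
    #pragma HLS inline
    // The code generator selects the implementation once, so the HLS code has
    // no branching between latency/resource/pointwise variants.
    CONFIG_T::template dense_fn<data_T, res_T, CONFIG_T>(data, res, weights, biases);
}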

@jmduarte (Member, Author) replied:

Attempted in jmduarte@d56dc73.

In hls4ml/backends/vivado/passes/pointwise_codegen.py:
'''Generates code for pointwise 1D convolution'''

def match(self, node):
    return isinstance(node, Conv1D) and node.model.config.get_config_value('IOType') == 'io_parallel'

@vloncar (Contributor) commented:

Why is there no check for filt_width == 1 here? Otherwise we generate functions we don't use, and incorrect ones at that.

@jmduarte (Member, Author) replied on Nov 22, 2024:

Added in:

def match(self, node):
    return isinstance(node, Conv1D) and node.model.config.get_config_value('IOType') == 'io_parallel' and node.get_attr('filt_width') == 1

    return isinstance(node, Conv1D) and node.model.config.get_config_value('IOType') == 'io_parallel'

def transform(self, model, node):
    node_class = node.__class__.__name__

@vloncar (Contributor) commented:

Minor point: we have node.class_name for this purpose, though it wouldn't make a big difference in the check here.

The bigger question is why have the check at all? In what cases can it fail?

@jmduarte (Member, Author) replied on Nov 22, 2024:

Removed the check in:

def transform(self, model, node):
    self._generate_pointwise_conv1d(node)

#pragma HLS ARRAY_PARTITION variable=biases complete dim=0

// Limit multipliers to control parallelization
constexpr unsigned multiplier_limit = DIV_ROUNDUP(

@vloncar (Contributor) commented:

This was problematic before, and we moved to setting multiplier_limit in Python and using it here. For consistency, we should not re-introduce old approaches.

@jmduarte (Member, Author) replied on Nov 22, 2024:

Updated to use the multiplier_limit from the config here:

#pragma HLS ALLOCATION operation instances=mul limit=CONFIG_T::mult_config::multiplier_limit

but I need to check that the value is the same.

@jmduarte (Member, Author) added:

OK, fixed the multiplier limit now... but it would be good to warn the user somehow if they use a reuse_factor that doesn't divide in_width.

Related: it would probably be beneficial to factorize this into a reuse_factor (which limits the multipliers) and a parallelization_factor (which controls how many times the conv is split into separate calls).
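
A hedged sketch of how the divisibility requirement could be surfaced at compile time (illustrative only; not part of this PR), placed inside the generated pointwise_conv_1d_latency_cl body:

// Illustrative sketch: fail C simulation/synthesis early instead of silently
// mis-sizing the per-call slices when reuse_factor does not divide in_width.
static_assert(CONFIG_T::in_width % CONFIG_T::reuse_factor == 0,
              "reuse_factor must evenly divide in_width");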

for (int ii = 0; ii < CONFIG_T::out_width / CONFIG_T::reuse_factor; ii++) {
    for (int ff = 0; ff < CONFIG_T::n_filt; ff++) {
        #pragma HLS UNROLL
        res[ii * CONFIG_T::n_filt + ff] = (res_T)(acc[ii][ff]);

@vloncar (Contributor) commented:

Do you need to call cast() here to ensure compatibility with all variants of product?

@jmduarte (Member, Author) replied:

Added here:

res[ii * CONFIG_T::n_filt + ff] = cast<data_T, res_T, typename CONFIG_T::mult_config>(acc[ii][ff]);

@@ -2,6 +2,7 @@
 #define NNET_COMMON_H_

 #include "ap_fixed.h"
+#include "nnet_helpers.h"

@vloncar (Contributor) commented:

nnet_helpers.h (mostly) doesn't contain synthesizable code, and it is not intended to be included by files other than the testbench. Also, I don't see anything from it being used in the code introduced here.

@jmduarte (Member, Author) replied on Nov 22, 2024:

If I remove that, I get this error on my EL 8 machine with g++ 8.5.0:

In file included from firmware/nnet_utils/nnet_conv1d_latency.h:4,
                 from firmware/nnet_utils/nnet_code_gen.h:4,
                 from firmware/parameters.h:7,
                 from firmware/myproject.cpp:4:
firmware/nnet_utils/nnet_common.h: In function ‘T nnet::reduce(const T*, Op)’:
firmware/nnet_utils/nnet_common.h:37:39: error: there are no arguments to ‘floorlog2’ that depend on a template parameter, so a declaration of ‘floorlog2’ must be available [-fpermissive]
     static constexpr int leftN = pow2(floorlog2(N - 1)) > 0 ? pow2(floorlog2(N - 1)) : 0;
                                       ^~~~~~~~~
firmware/nnet_utils/nnet_common.h:37:39: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
firmware/nnet_utils/nnet_common.h:37:68: error: there are no arguments to ‘floorlog2’ that depend on a template parameter, so a declaration of ‘floorlog2’ must be available [-fpermissive]
     static constexpr int leftN = pow2(floorlog2(N - 1)) > 0 ? pow2(floorlog2(N - 1)) : 0;
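
A hedged note on the likely cause: under g++'s two-phase name lookup, floorlog2 must be declared before its non-dependent use inside reduce(), and the nnet_helpers.h include evidently supplies that declaration. An alternative sketch (illustrative only) would be to declare constexpr helpers directly in nnet_common.h, ahead of reduce():

// Illustrative sketch: declaring the helpers before reduce() lets the names
// resolve without -fpermissive and without including nnet_helpers.h.
constexpr int floorlog2(int x) { return (x < 2) ? 0 : 1 + floorlog2(x / 2); }
constexpr int pow2(int x) { return (x == 0) ? 1 : 2 * pow2(x - 1); }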

@@ -84,5 +84,85 @@ void conv_1d_latency_cl(data_T data[CONFIG_T::in_width * CONFIG_T::n_chan],
}
}

template <class data_T, class res_T, typename CONFIG_T>
void pointwise_conv_1d_latency_cl(data_T data[CONFIG_T::in_width * CONFIG_T::n_chan / CONFIG_T::reuse_factor],

@vloncar (Contributor) commented:

The Vitis and Vivado files are mostly identical (apart from the very recent change in the final loop); perhaps we could remove the Vitis one (after we first follow up with the changes).

@jmduarte (Member, Author) replied:

Do you mean both nnet_conv1d.h and nnet_conv1d_latency.h?

There are a few differences, like the use of an inline region, etc. Do you mean to test whether just using the Vivado versions works in Vitis now?

#pragma HLS ARRAY_PARTITION variable=biases complete dim=0

// Limit multipliers to control parallelization
constexpr unsigned multiplier_limit = DIV_ROUNDUP(

@vloncar (Contributor) commented:

Same comment as in the other file

for (int ii = 0; ii < CONFIG_T::out_width / CONFIG_T::reuse_factor; ii++) {
    for (int ff = 0; ff < CONFIG_T::n_filt; ff++) {
        #pragma HLS UNROLL
        res[ii * CONFIG_T::n_filt + ff] = (res_T)(acc[ii][ff]);

@vloncar (Contributor) commented:

Same comment as in the other file

@JanFSchulte (Contributor) commented:

pre-commit.ci autofix

@JanFSchulte merged commit 2fc8941 into fastmachinelearning:main on Dec 4, 2024 (5 of 9 checks passed).