Pointwise Conv1D with code generation for "Latency" strategy (update of #811) #881
Conversation
pre-commit.ci autofix
Looks very good; there are some minor tweaks needed to integrate it more cleanly. It works very well, which is more important :-)
```cpp
template<unsigned K, unsigned S, unsigned W>
using scale_index = nnet::{scale_index_type}<K, S, W>;
template<class data_T, class res_T, class CONFIG_T>
using pointwise_conv = nnet::{pointwise_fn}<data_T, res_T, CONFIG_T>;
```
In Dense layers we moved to using a function pointer like this, which the main `dense()` function calls, eliminating the need for checks in HLS. Here it would also simplify the call hierarchy (no special handling of pointwise needed) and remove the need for this template.
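
For reference, a minimal sketch of the dispatch-through-config pattern being suggested; the names (`kernel_fn`, `conv_config_example`) are hypothetical, and hls4ml's actual mechanism may differ in detail (e.g. a static method on a templated struct rather than a raw function pointer):

```cpp
#include <iostream>

// Stand-in for a specialized body, e.g. a pointwise latency kernel.
void pointwise_kernel(const float *data, float *res) {
    for (int i = 0; i < 4; i++)
        res[i] = 2.0f * data[i]; // placeholder arithmetic
}

// Hypothetical config: the Python code generator points kernel_fn at the
// specialized implementation, so the generic entry point needs no checks.
struct conv_config_example {
    static const unsigned in_width = 4;
    static void (*const kernel_fn)(const float *, float *);
};
void (*const conv_config_example::kernel_fn)(const float *, float *) = pointwise_kernel;

// Generic entry point: always dispatches through the config, with no
// special-casing of pointwise layers in the HLS code itself.
template <class CONFIG_T>
void conv_1d_cl(const float *data, float *res) {
    CONFIG_T::kernel_fn(data, res);
}

int main() {
    float in[4] = {1, 2, 3, 4}, out[4];
    conv_1d_cl<conv_config_example>(in, out);
    std::cout << out[3] << std::endl; // prints 8
}
```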
attempted in jmduarte@d56dc73
```python
'''Generates code for pointwise 1D convolution'''

def match(self, node):
    return isinstance(node, Conv1D) and node.model.config.get_config_value('IOType') == 'io_parallel'
```
Why is there no check for `filt_width == 1` here? Otherwise we generate functions we don't use, and incorrect ones at that.
added in hls4ml/hls4ml/backends/vivado/passes/pointwise_codegen.py (lines 68 to 69 in d56dc73):

```python
def match(self, node):
    return isinstance(node, Conv1D) and node.model.config.get_config_value('IOType') == 'io_parallel' and node.get_attr('filt_width') == 1
```
```python
    return isinstance(node, Conv1D) and node.model.config.get_config_value('IOType') == 'io_parallel'

def transform(self, model, node):
    node_class = node.__class__.__name__
```
Minor point, but we have `node.class_name` for this purpose, though it wouldn't make a big difference in the check here. The bigger question is why have the check at all? In what cases can it fail?
removed the check in hls4ml/hls4ml/backends/vivado/passes/pointwise_codegen.py (lines 71 to 72 in d56dc73):

```python
def transform(self, model, node):
    self._generate_pointwise_conv1d(node)
```
```cpp
#pragma HLS ARRAY_PARTITION variable=biases complete dim=0

// Limit multipliers to control parallelization
constexpr unsigned multiplier_limit = DIV_ROUNDUP(
```
This was problematic before, and we moved to setting `multiplier_limit` in Python and using it here. For consistency, we should not re-introduce old approaches.
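
For context, a small worked example of the ceiling division involved (the `div_roundup` helper below mirrors the usual definition of the `DIV_ROUNDUP` macro; the layer sizes are made up):

```cpp
// Mirrors DIV_ROUNDUP(n, d) = (n + d - 1) / d.
constexpr unsigned div_roundup(unsigned n, unsigned d) { return (n + d - 1) / d; }

// Example: 4 channels x 8 filters = 32 multiplications per output position;
// with reuse_factor = 3 the limit becomes ceil(32 / 3) = 11 multipliers.
static_assert(div_roundup(4 * 8, 3) == 11, "worked multiplier_limit example");
```

Precomputing this once in Python and emitting it as a config constant keeps the HLS pragmas and the Python-side resource bookkeeping consistent.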
updated to use the `multiplier_limit` from the config here:

```cpp
#pragma HLS ALLOCATION operation instances=mul limit=CONFIG_T::mult_config::multiplier_limit
```

but we still need to check that the value is the same
OK, fixed the multiplier limit now... but it would be good to warn the user somehow if they use a `reuse_factor` that doesn't divide `in_width`.

Related: it would probably be beneficial to factorize this into a `reuse_factor` (which limits the multipliers) and a `parallelization_factor` (which controls how many times the conv is split into separate calls).
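
One lightweight way to surface that warning, sketched below as a compile-time guard in the generated HLS (a Python-side warning at code-generation time would work just as well; `check_pointwise_split` and `demo_config` are hypothetical names):

```cpp
// Hypothetical guard: reject configs where reuse_factor does not evenly
// divide in_width, instead of silently mis-sizing the in_width/RF slices.
template <class CONFIG_T>
constexpr bool check_pointwise_split() {
    static_assert(CONFIG_T::in_width % CONFIG_T::reuse_factor == 0,
                  "reuse_factor must divide in_width for the split pointwise conv");
    return true;
}

struct demo_config { // illustrative values only
    static const unsigned in_width = 100;
    static const unsigned reuse_factor = 4; // 100 % 4 == 0, so this passes
};
static_assert(check_pointwise_split<demo_config>(), "config check");
```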
```cpp
for (int ii = 0; ii < CONFIG_T::out_width / CONFIG_T::reuse_factor; ii++) {
    for (int ff = 0; ff < CONFIG_T::n_filt; ff++) {
        #pragma HLS UNROLL
        res[ii * CONFIG_T::n_filt + ff] = (res_T)(acc[ii][ff]);
```
Do you need to call `cast()` here to ensure compatibility with all variants of `product`?
added here:

```cpp
res[ii * CONFIG_T::n_filt + ff] = cast<data_T, res_T, typename CONFIG_T::mult_config>(acc[ii][ff]);
```
```diff
@@ -2,6 +2,7 @@
 #define NNET_COMMON_H_
 
 #include "ap_fixed.h"
+#include "nnet_helpers.h"
```
`nnet_helpers.h` (mostly) doesn't contain synthesizable code, and it is not intended to be included by files other than the testbench. And I don't see you using anything from it in the code this PR introduces.
If I remove that, I get this error on my EL 8 machine with g++ 8.5.0:

```
In file included from firmware/nnet_utils/nnet_conv1d_latency.h:4,
                 from firmware/nnet_utils/nnet_code_gen.h:4,
                 from firmware/parameters.h:7,
                 from firmware/myproject.cpp:4:
firmware/nnet_utils/nnet_common.h: In function ‘T nnet::reduce(const T*, Op)’:
firmware/nnet_utils/nnet_common.h:37:39: error: there are no arguments to ‘floorlog2’ that depend on a template parameter, so a declaration of ‘floorlog2’ must be available [-fpermissive]
     static constexpr int leftN = pow2(floorlog2(N - 1)) > 0 ? pow2(floorlog2(N - 1)) : 0;
                                       ^~~~~~~~~
firmware/nnet_utils/nnet_common.h:37:39: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
firmware/nnet_utils/nnet_common.h:37:68: error: there are no arguments to ‘floorlog2’ that depend on a template parameter, so a declaration of ‘floorlog2’ must be available [-fpermissive]
     static constexpr int leftN = pow2(floorlog2(N - 1)) > 0 ? pow2(floorlog2(N - 1)) : 0;
```
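
For what it's worth, a minimal standalone reproduction of the underlying issue (my own sketch, not code from the PR): in a template, an unqualified call whose arguments all have non-dependent types must resolve when the template is defined, so `floorlog2` and `pow2` need declarations visible up front, which the `nnet_helpers.h` include evidently supplies here.

```cpp
// Declarations the template below needs at definition time; in nnet_common.h
// these presumably arrive via the nnet_helpers.h include chain.
constexpr int floorlog2(int x) { return (x < 2) ? 0 : 1 + floorlog2(x / 2); }
constexpr int pow2(int x) { return x == 0 ? 1 : 2 * pow2(x - 1); }

// Removing the two functions above reproduces the same g++ diagnostic:
// the argument N - 1 has the plain type int, so the call is non-dependent
// and must be resolved when the template is parsed, not at instantiation.
template <class T, int N>
T reduce_demo(const T *x) {
    static constexpr int leftN = pow2(floorlog2(N - 1)) > 0 ? pow2(floorlog2(N - 1)) : 0;
    (void)leftN; // reduction body elided for brevity
    return x[0];
}
```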
```diff
@@ -84,5 +84,85 @@ void conv_1d_latency_cl(data_T data[CONFIG_T::in_width * CONFIG_T::n_chan],
     }
 }
 
+template <class data_T, class res_T, typename CONFIG_T>
+void pointwise_conv_1d_latency_cl(data_T data[CONFIG_T::in_width * CONFIG_T::n_chan / CONFIG_T::reuse_factor],
```
The files in Vitis/Vivado are mostly identical (apart from the very recent change in the final loop), so perhaps we could remove the Vitis one (after first following up with the changes).
Do you mean both `nnet_conv1d.h` and `nnet_conv1d_latency.h`? There are a few differences, like the use of the inline region, etc. Do you mean to test whether just using the Vivado versions works in Vitis now?
```cpp
#pragma HLS ARRAY_PARTITION variable=biases complete dim=0

// Limit multipliers to control parallelization
constexpr unsigned multiplier_limit = DIV_ROUNDUP(
```
Same comment as in the other file
```cpp
for (int ii = 0; ii < CONFIG_T::out_width / CONFIG_T::reuse_factor; ii++) {
    for (int ff = 0; ff < CONFIG_T::n_filt; ff++) {
        #pragma HLS UNROLL
        res[ii * CONFIG_T::n_filt + ff] = (res_T)(acc[ii][ff]);
```
Same comment as in the other file
pre-commit.ci autofix
Description
Update of #811 with code generation. This PR adds an explicit pointwise Conv1D implementation, where the reuse factor (`RF`) is used to split the layer execution and reuse the existing module `RF` times.

Original pointwise Conv1D: `(in_width, n_chan) -> (in_width, n_filt)`

This PR splits it into `RF` calls of:

`(in_width/RF, n_chan) -> (in_width/RF, n_filt)`
`(in_width/RF, n_chan) -> (in_width/RF, n_filt)`
`(in_width/RF, n_chan) -> (in_width/RF, n_filt)`

The II ~ RF. It is on by default, but I think you should be able to use the standard conv1d implementation by skipping the optimizer.
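
To illustrate the scheme (my sketch with hypothetical names and a naive weight layout, not the PR's actual generated code), the splitting amounts to something like this, where `pointwise_slice` stands in for the generated `pointwise_conv_1d_latency_cl` body:

```cpp
// Naive per-slice pointwise (1x1) conv: dot product over channels.
template <unsigned W, unsigned C, unsigned F>
void pointwise_slice(const float *data, const float *weights,
                     const float *biases, float *res) {
    for (unsigned i = 0; i < W; i++)         // output positions in this slice
        for (unsigned f = 0; f < F; f++) {   // filters
            float acc = biases[f];
            for (unsigned c = 0; c < C; c++) // 1x1 kernel: reduce over channels
                acc += data[i * C + c] * weights[f * C + c];
            res[i * F + f] = acc;
        }
}

// The PR's splitting idea: issue RF calls, each on an in_width/RF slice,
// so the same module is reused RF times and the II grows roughly like RF.
template <unsigned IN_W, unsigned C, unsigned F, unsigned RF>
void pointwise_conv_split(const float *data, const float *weights,
                          const float *biases, float *res) {
    static_assert(IN_W % RF == 0, "in_width must be divisible by RF");
    constexpr unsigned slice = IN_W / RF;
    for (unsigned r = 0; r < RF; r++)
        pointwise_slice<slice, C, F>(&data[r * slice * C], weights, biases,
                                     &res[r * slice * F]);
}
```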
Limitations:
- `in_width` must be divisible by `RF`
Type of change
Tests
See `test/pytest/test_pointwiseconv.py`.
Checklist
- I have run `pre-commit` on the files I edited or added.