
Automatic precision inference #855

Merged: 16 commits merged into fastmachinelearning:main on Apr 19, 2024

Conversation

@vloncar (Contributor) commented Aug 20, 2023

Description

This introduces the ability to specify auto as a precision string, which indicates that hls4ml should infer the precision itself. This is not exposed by default via the config_from... functions for now. The goal is to have the framework for inferring types within hls4ml (e.g., in the QONNX parser) before fully exposing it to users. An initial precision inference has been added via the infer_precision_types optimizer, based on previous attempts by various people; it is not advanced in any way. During testing I encountered some issues with the SeparableConv1D templates, which I fixed.
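As a usage illustration only (not part of this PR's diff; the model, the layer name 'fc1', and the exact keys present under 'Precision' are assumptions), requesting inferred types in a name-granularity config could look roughly like this:

    import hls4ml
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential([Dense(16, activation='relu', input_shape=(8,), name='fc1')])

    config = hls4ml.utils.config_from_keras_model(model, granularity='name')
    # Ask hls4ml to infer these types instead of using the default precision;
    # which keys appear under 'Precision' depends on the layer and hls4ml version.
    config['LayerName']['fc1']['Precision']['accum'] = 'auto'
    config['LayerName']['fc1']['Precision']['result'] = 'auto'

    hls_model = hls4ml.converters.convert_from_keras_model(model, hls_config=config)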

Type of change

  • Bug fix (non-breaking change that fixes an issue) - Only related to the SeparableConv1D issue
  • New feature (non-breaking change which adds functionality)

Tests

There are new tests in test_auto_precision.py that cover the few supported use cases.

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@vloncar vloncar requested a review from jmitrevs August 20, 2023 21:01
@vloncar vloncar added the please test Trigger testing by creating local PR branch label Aug 20, 2023
@jmitrevs jmitrevs added this to the v0.8.0 milestone Sep 8, 2023
@jmitrevs jmitrevs added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Oct 6, 2023
@jmitrevs (Contributor):

Clang doesn't seem to like this

(fastml39) Jovans-Mac:hls4mlprj_auto_conv2d_Quartus_io_stream jmitrevs$ bash build_lib.sh 
In file included from firmware/myproject.cpp:2:
In file included from firmware/parameters.h:11:
firmware/nnet_utils/nnet_conv2d_stream.h:135:5: error: no matching function for call to 'shift_line_buffer_2d'
    nnet::shift_line_buffer_2d<data_T, CONFIG_T>(in_elem, line_buffer, shift_buffer);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
firmware/nnet_utils/nnet_conv2d_stream.h:199:13: note: in instantiation of function template specialization 'nnet::compute_output_buffer_2d<nnet::array<ac_fixed<16, 6>, 4>, nnet::array<ac_fixed<35, 15>, 4>, config8>' requested here
            compute_output_buffer_2d<data_T, res_T, CONFIG_T>(padds, res, line_buffer, kernel_window, weights, biases);
            ^
firmware/myproject.cpp:93:11: note: in instantiation of function template specialization 'nnet::conv_2d_cl<nnet::array<ac_fixed<16, 6>, 4>, nnet::array<ac_fixed<35, 15>, 4>, config8>' requested here
    nnet::conv_2d_cl<layer7_t, last_layer_result_t, config8>(layer7_out, layer8_out, w8, b8);
          ^
firmware/nnet_utils/nnet_conv2d_stream.h:69:6: note: candidate template ignored: substitution failure [with data_T = nnet::array<ac_fixed<16, 6>, 4>, CONFIG_T = config8]: zero-length arrays are not permitted in C++
void shift_line_buffer_2d(
     ^
1 error generated.
clang: error: no such file or directory: 'myproject.o'

while g++ compiles it just fine.

@jmitrevs (Contributor):

The issue seems to be that this:

    model.add(Conv2D(4, kernel_size=(1, 1), activation='relu', name='last_layer'))  # Will become PointwiseConv2D

doesn't actually become PointwiseConv2D for Quartus. The bug it uncovered seems tangential to this PR; nevertheless, we need to fix it, either as part of this PR or separately.

@vloncar (PR author) commented Oct 11, 2023

Is this the same issue that was observed in #878?

@jmitrevs (Contributor):

> Is this the same issue that was observed in #878?

I believe so. If you have a filter of size 1, then things like line_buffer[CONFIG_T::filt_height - 1][CONFIG_T::n_chan] wind up with zero-size arrays.

@jmitrevs jmitrevs added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Nov 17, 2023

    def _infer_precision(self, node, types_to_infer):
        node_class = node.class_name
        if node_class in ['Dense']:
@jmitrevs (Contributor) commented Nov 20, 2023

I wonder if it's better to use something like isinstance(node, Dense) instead of matching to a class name. Matching to a class name doesn't deal with inheritance. For example, I can see the BatchNormalization matching failing for ApplyAlpha.
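To make the distinction concrete, a minimal sketch (the import path and the assumption that ApplyAlpha derives from BatchNormalization in hls4ml.model.layers are taken from this discussion, not verified here):

    from hls4ml.model.layers import BatchNormalization

    def matches_by_name(node):
        # String matching: an ApplyAlpha node reports class_name == 'ApplyAlpha',
        # so it is NOT handled even if it derives from BatchNormalization.
        return node.class_name in ['BatchNormalization']

    def matches_by_isinstance(node):
        # isinstance matching: any subclass (e.g. ApplyAlpha) is silently included.
        return isinstance(node, BatchNormalization)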

@vloncar (PR author) replied:

LOL, I did it specifically to avoid this since it is usually not what we want. The ApplyAlpha is an example of this.

@jmitrevs (Contributor) replied Nov 20, 2023

With QONNX ApplyAlpha does need the precision propagated. It worries me if derived classes by default have different behavior. That violates the "is a" principle. I think if you want different behavior you explicitly should code it. What happens in ApplyAlpha if you don't forbid it? Should QONNX not use ApplyAlpha in this case?

@vloncar (PR author) replied:

You derive a class to have different behavior, not the same one. Conceptually, ApplyAlpha "is NOT" a BatchNormalization; it just happens to share the implementation (and honestly it shouldn't: both should have the same parent class, for example ScaleShift, and then your logic would be sound). Another example of this is DepthwiseConv2D vs Conv2D. It's true they are convolutions, but they have different behavior. I think it would be better for new layers that inherit from other layers (instead of the base Layer class) to be unsupported by this optimizer rather than silently supported in a wrong way. I'm not gonna die on this hill though; if you feel strongly about this we could revisit, but I'd like stronger arguments 😄.

For ApplyAlpha: you can use it in QONNX, and I can add it so it has the same behavior as BN if that is what is needed.

        return inferred_types

    def _infer_dense_precision(self, node, types_to_infer):
        n_ops = node.get_attr('n_in') * node.get_attr('n_out')
Review comment (Contributor):

I don't think the total n_ops is the important value for the accumulator precision. Ignoring bias for now, each output value is a certain number of multiplies, resulting in the input_width + weight_width part of the equation, plus the accumulation, math.ceil(np.log2(num_acc)). But num_acc != num_ops. In particular, it's node.get_attr('n_in') (-1?), at least for the standard 1D Dense layer. Bias modifies things a bit, but the general trends are the same. This is the result we had from the CMS hackathon with Sioni: https://github.com/jmitrevs/hls4ml/blob/bit-correct/hls4ml/model/optimizer/passes/propagate_dense_precision.py
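A rough sketch of the sizing being described, assuming ap_fixed/ac_fixed-style (width, integer) precisions; this follows the idea in the linked branch but is not that code, and whether the bias adds one more term to the accumulation count is an assumption here:

    import math

    def dense_accum_precision(in_width, in_int, w_width, w_int, n_in, has_bias=True):
        # Each product needs in_width + w_width bits; summing n_acc terms grows
        # the integer part by ceil(log2(n_acc)), where n_acc ~ n_in (not n_in * n_out).
        n_acc = n_in + (1 if has_bias else 0)
        growth = int(math.ceil(math.log2(n_acc))) if n_acc > 1 else 0
        frac = (in_width - in_int) + (w_width - w_int)
        integer = in_int + w_int + growth
        return integer + frac, integer  # (total width, integer bits)

    # e.g. ap_fixed<16,6> inputs, ap_fixed<8,3> weights, 64 inputs per output:
    print(dense_accum_precision(16, 6, 8, 3, 64))  # -> (31, 16)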

@vloncar (PR author) replied:

Ah, true, I'll fix this.


        return ['result_t']

    def _infer_common_precision(self, node, types_to_infer, n_ops):
Review comment (Contributor):

I think this assumes integer or fixed precision types. Do we need to handle, e.g., xnor precision types?

@vloncar (PR author) replied:

I should test this PR with the binary model that has Xnor types. I wouldn't expect that you need to infer anything in that case, but perhaps it breaks the optimizer.

@jmitrevs (Contributor):

I opened vloncar#53 against your branch with changes for dense and standard convolution. Let me know what you think. If you like the way I did this, I can also add signed/unsigned support to the other precision propagations (like merge, bn, sepconv), either in that PR or a different one.

    def _infer_bn_precision(self, node, types_to_infer):
        inferred_types = []

        if 'scale_t' in types_to_infer:
Review comment (Contributor):

Wouldn't the input quantization in this case be the input mean and variance (+ other things), not the scale and bias?


(actually I have to see how this is handled by the qkeras parser)
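For context, a minimal sketch of the usual batch-norm folding that parsers perform before precision inference runs, which is why only the folded scale and bias (scale_t, bias_t) remain to be quantized; the function and variable names here are illustrative, not hls4ml's parser code:

    import numpy as np

    def fold_batchnorm(gamma, beta, mean, var, eps=1e-3):
        # y = gamma * (x - mean) / sqrt(var + eps) + beta  is rewritten as
        # y = scale * x + bias, so mean/variance never appear as separate weights.
        scale = gamma / np.sqrt(var + eps)  # quantized as scale_t
        bias = beta - mean * scale          # quantized as bias_t
        return scale, bias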

@jmitrevs jmitrevs mentioned this pull request Feb 21, 2024
@@ -33,6 +33,7 @@
register_flow(
    'convert',
    [
        'infer_precision_types',
Review comment (Contributor):

I would prefer having this towards the end of convert, or just leaving the call out of convert entirely, since we call it in the optimize flow anyway. What is the purpose of doing it here? Placed here, it interferes with some of the conversion steps I have for QONNX: those steps only run if a type is not already set (they don't want to override set types), and running the inference this early sets the types.

@vloncar (PR author) replied:

"optimize" is sort-of optional, it belongs to "convert" because after that stage the other optimizers don't expect "auto" to exist. If onnx has its own optimizers that run, why not group them into a flow and run before convert? the idea is that after "convert" it shouldn't matter where the model came from

Review comment (Contributor):

But we can move it later within convert, right?

@vloncar (PR author) replied:

If you have ONNX-specific optimizers that must run first, place them before this one.
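A minimal sketch of the suggested ordering, assuming the register_flow helper from hls4ml.model.flow (the 'parse_qonnx' flow name and its pass name are hypothetical, used only to show frontend-specific passes running before convert):

    from hls4ml.model.flow import register_flow

    # Hypothetical frontend-specific passes grouped into their own flow
    qonnx_flow = register_flow('parse_qonnx', ['qonnx_set_unset_types'])

    # The convert flow runs afterwards, with precision inference inside it
    convert_flow = register_flow(
        'convert',
        ['infer_precision_types'],  # plus the other convert-time optimizers
        requires=[qonnx_flow],
    )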

@jmitrevs jmitrevs mentioned this pull request Mar 12, 2024
calad0i added a commit to calad0i/hls4ml that referenced this pull request Apr 18, 2024
@jmitrevs jmitrevs added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Apr 18, 2024
@jmitrevs jmitrevs added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Apr 18, 2024
@jmitrevs jmitrevs merged commit 1616caf into fastmachinelearning:main Apr 19, 2024
9 checks passed
calad0i added a commit to calad0i/hls4ml that referenced this pull request Apr 26, 2024
latency pooling overhaul

vivado latency pooling overhaul

vitis latency pooling overhaul, fix comment

fix boundry cond

fix syn issues

latency pooling overhaul

Fix pooling accum_t autoset & avoid global override

[pre-commit.ci] auto fixes from pre-commit hooks

better way to get inp layer name

fix for vitis / input_t fetch

torch padding fix

avoid name dup in torch api test

rm pooling precision override in favor of fastmachinelearning#855
steltze pushed a commit to steltze/hls4ml that referenced this pull request Jul 11, 2024