When Onnx Matmul inputs have different dimension #133

Closed
doo5643 opened this issue Jan 18, 2023 · 6 comments
Labels
OP:ConvTranspose, OP:MatMul, Parameter replacement


doo5643 commented Jan 18, 2023

Issue Type

Others

onnx2tf version number

1.5.18

onnx version number

1.13.0

tensorflow version number

2.10.0

Download URL for ONNX

test_model.zip

Parameter Replacement JSON

Not used

Description

  1. Purpose: We are currently using this library for a research project on anomaly detection. Thank you for building it.
  2. What: The two inputs A and B of an ONNX MatMul op do not have the same number of dimensions. The error log is attached: error_log.txt
  3. How: I tried to fix it using param_replacement.json, but for MatMul ops only axis transposition is supported, so I cannot add a dimension to an input.
INFO:  input_name.1: input.35 shape: [1, 256, 66] dtype: float32
INFO:  input_name.2: onnx::MatMul_73 shape: [66, 2] dtype: <class 'numpy.float32'>

We think that if we could add one more dimension to input_name.2 (onnx::MatMul_73), giving it the shape [1, 66, 2], the issue would be resolved.

  4. Why: This ONNX model was converted from a PyTorch module and uses an autoencoder structure. It is important for us to keep the current network structure, so we hope to be able to fix this dimension mismatch.
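ONNX MatMul follows numpy.matmul semantics, where a 2-D operand is broadcast across the leading batch dimension of a 3-D operand. The numpy sketch below (random data, shapes copied from the log above) suggests that adding the proposed leading dimension to B would not change the result, so the rank difference by itself is legal. The mismatch is more likely caused by how the converter lays out A, which is what the transposition fix later in this thread addresses.

```python
import numpy as np

# Shapes taken from the MatMul_8 log: A is [1, 256, 66], B is [66, 2].
a = np.random.rand(1, 256, 66).astype(np.float32)
b = np.random.rand(66, 2).astype(np.float32)

# numpy.matmul (like ONNX MatMul) broadcasts the 2-D B over A's batch axis.
out_2d = np.matmul(a, b)

# Explicitly adding the proposed leading dimension gives B the shape [1, 66, 2].
b_3d = b[np.newaxis, ...]
out_3d = np.matmul(a, b_3d)

print(out_2d.shape)                 # (1, 256, 2)
print(out_3d.shape)                 # (1, 256, 2)
print(np.allclose(out_2d, out_3d))  # True
```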
PINTO0309 added the Parameter replacement label Jan 18, 2023
PINTO0309 (Owner)

It can be avoided by a simple transposition of dimensions.

{
  "format_version": 1,
  "operations": [
    {
      "op_name": "MatMul_8",
      "param_target": "inputs",
      "param_name": "input.35",
      "pre_process_transpose_perm": [0,2,1]
    }
  ]
}
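The fix above amounts to a plain swap of the last two axes. The numpy sketch below assumes that "pre_process_transpose_perm" transposes the named input with the given permutation before the op runs, the same way np.transpose does. The [1, 66, 256] starting shape is the one reported in the error message later in this thread.

```python
import numpy as np

# Per the error message, the converted graph fed A into the matmul as
# [1, 66, 256], while B stayed [66, 2].
a_as_converted = np.zeros((1, 66, 256), dtype=np.float32)
b = np.zeros((66, 2), dtype=np.float32)

# "pre_process_transpose_perm": [0,2,1] swaps the last two axes of the input;
# np.transpose with the same permutation reproduces that here.
a_fixed = np.transpose(a_as_converted, (0, 2, 1))
print(a_fixed.shape)  # (1, 256, 66)

# The contracted dimensions now agree (66 == 66), so the product is defined.
result = np.matmul(a_fixed, b)
print(result.shape)  # (1, 256, 2)
```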

PINTO0309 added the OP:MatMul label Jan 18, 2023

doo5643 commented Jan 18, 2023

@PINTO0309 san, thank you very much for your help. Another error occurred in ConvTranspose_53, but it was resolved with a permute (transpose).
With the JSON below, the whole conversion now works well.

{
  "format_version": 1,
  "operations": [
    {
      "op_name": "MatMul_8",
      "param_target": "inputs",
      "param_name": "input.35",
      "pre_process_transpose_perm": [0,2,1]
    },
    {
      "op_name": "Add_11",
      "param_target": "outputs",
      "param_name": "onnx::ConvTranspose_53",
      "post_process_transpose_perm": [0,2,1]
    }
  ]
}
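When several entries are needed, the same file can be generated from Python. This is a minimal sketch using only the standard json module. `transpose_entry` is a hypothetical helper, not part of onnx2tf, and the keys it emits are copied from the JSON above. The printed text is what would be saved as the parameter replacement file.

```python
import json

def transpose_entry(op_name, param_target, param_name, perm, stage):
    """Hypothetical helper: build one transpose entry like those above.

    stage is "pre" (transpose an input before the op) or "post"
    (transpose an output after the op).
    """
    return {
        "op_name": op_name,
        "param_target": param_target,
        "param_name": param_name,
        f"{stage}_process_transpose_perm": perm,
    }

replacement = {
    "format_version": 1,
    "operations": [
        transpose_entry("MatMul_8", "inputs", "input.35", [0, 2, 1], "pre"),
        transpose_entry("Add_11", "outputs", "onnx::ConvTranspose_53", [0, 2, 1], "post"),
    ],
}

text = json.dumps(replacement, indent=2)
print(text)
```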

The log of the successful run is attached: success_log.txt

I see one more MatMul op whose inputs have shapes [1, 256, 2] and [2, 66], and I wonder why this layer does not throw an error like the MatMul_8 layer does. Could you comment on this?

<MatMul_10, which does not throw an error>

INFO: onnx_op_type: MatMul onnx_op_name: MatMul_10
INFO:  input_name.1: onnx::MatMul_50 shape: [1, 256, 2] dtype: float32
INFO:  input_name.2: onnx::MatMul_74 shape: [2, 66] dtype: <class 'numpy.float32'>
INFO:  output_name.1: onnx::Add_52 shape: [1, 256, 66] dtype: float32
INFO: tf_op_type: matmul
INFO:  input.1.a: name: tf.math.add_3/Add:0 shape: (1, 256, 2) dtype: <dtype: 'float32'>
INFO:  input.2.b: shape: (2, 66) dtype: float32
INFO:  input.3.output_type: name: float32 shape: ()
INFO:  output.1.output: name: tf.linalg.matmul_1/MatMul:0 shape: (1, 256, 66) dtype: <dtype: 'float32'>
<MatMul_8, which throws an error before the parameter replacement JSON is applied>

INFO: onnx_op_type: MatMul onnx_op_name: MatMul_8
INFO:  input_name.1: input.35 shape: [1, 256, 66] dtype: float32
INFO:  input_name.2: onnx::MatMul_73 shape: [66, 2] dtype: <class 'numpy.float32'>
INFO:  output_name.1: onnx::Add_49 shape: [1, 256, 2] dtype: float32
ERROR: The trace log is below.
Traceback (most recent call last):
... 
Dimensions must be equal, but are 256 and 66 for '{{node tf.linalg.matmul/MatMul}} = BatchMatMulV2[T=DT_FLOAT, adj_x=false, adj_y=false](Placeholder, tf.linalg.matmul/MatMul/b)' with input shapes: [1,66,256], [66,2].
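The two logs differ in the shapes TensorFlow actually receives. Matmul requires the last axis of A to equal the second-to-last axis of B. In MatMul_10, A reaches TensorFlow unchanged as (1, 256, 2), so 2 meets 2. In MatMul_8, the error message shows A arriving as [1, 66, 256], so 256 meets 66. The sketch below checks exactly that condition; `matmul_inner_dims_match` is a hypothetical helper written for illustration.

```python
def matmul_inner_dims_match(a_shape, b_shape):
    """Return True if the contracted dimensions of a @ b agree.

    The last axis of A must equal the second-to-last axis of B
    (for a 2-D B that is simply its first axis).
    """
    return a_shape[-1] == b_shape[-2]

# MatMul_10: shapes as seen by TensorFlow in the log.
print(matmul_inner_dims_match((1, 256, 2), (2, 66)))   # True  (2 == 2)

# MatMul_8 before the JSON fix: A had been transposed to [1, 66, 256].
print(matmul_inner_dims_match((1, 66, 256), (66, 2)))  # False (256 != 66)

# MatMul_8 with A in its original ONNX shape.
print(matmul_inner_dims_match((1, 256, 66), (66, 2)))  # True  (66 == 66)
```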


PINTO0309 commented Jan 18, 2023

To properly understand what is going on here, you need to understand the inner workings of the tool. They are not easy to explain, because they involve many fairly complex operations.

I will list only the main points below.

  • ONNX and other ML model formats do not store any channel information in the model file.
  • Therefore, it is impossible to tell from the shape alone whether a tensor of shape [1,256,5] is [N,H,C], [H,W,C], or [N,W,C].
  • For example, in a multiplication such as [55,55] * [55,55], the meaning of each dimension cannot be determined. The same is true for a human reading the model.
  • The tool tries to determine automatically where each dimension should be transposed to by examining, as far as possible, the relationship between the dimensions of the tensors before and after each operation. For the reasons above, however, unusual tensors such as 3-dimensional or 6-dimensional ones may be misjudged. (This is described in the README, so please read it.)
  • Because ONNX expects an NCHW tensor layout and TensorFlow expects NHWC, it is very difficult to determine where the channel dimension lies in 3-dimensional tensors, or in tensors of 6 or more dimensions, in the intermediate layers of the model.
  • This difficulty does not occur in a design that always transposes each operation to NCHW, as onnx-tensorflow does.
  • This means that for tensors with a rank other than 4 or 5, the tool may choose the wrong transposition.
  • Models with an attention mechanism, such as Transformers, contain many 3-dimensional tensors and MatMul ops on them, so there is a very high probability that the tool will choose the wrong transposition target.
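The ambiguity described in the bullets can be seen directly in numpy. This sketch only illustrates the [55,55] and [1,256,5] examples; it does not reproduce the tool's own heuristics. A square matrix and its transpose share a shape, and a 3-D tensor admits two shape-valid readings that lead to different channels-last tensors.

```python
import numpy as np

# A square matrix and its transpose have identical shapes, so the shape
# alone cannot tell a converter which axis means what.
x = np.arange(55 * 55, dtype=np.float32).reshape(55, 55)
print(x.shape == x.T.shape)    # True
print(np.array_equal(x, x.T))  # False: same shape, different layout

# A 3-D tensor of shape [1, 256, 5] is equally ambiguous.
t = np.zeros((1, 256, 5), dtype=np.float32)

# Reading 1: axis 2 (size 5) is already the channel axis, so leave it as-is.
kept_as_is = t

# Reading 2: axis 1 (size 256) is the channel axis (channels-first),
# so moving it last for a channels-last layout needs perm (0, 2, 1).
moved_to_channels_last = np.transpose(t, (0, 2, 1))

# Both results are valid; nothing in the tensor says which reading is right.
print(kept_as_is.shape, moved_to_channels_last.shape)  # (1, 256, 5) (1, 5, 256)
```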

The tool is designed on the assumption that users accept it may lose track of channel positions when unusual transpositions are present. The JSON file is therefore the mechanism by which the person who designed the model tells the tool how the channels are laid out in the intermediate layers of the model.


doo5643 commented Jan 18, 2023

Thank you for your response. Understood.


PINTO0309 commented Feb 4, 2023

This is only a temporary fix, but your model now converts without the JSON file, and the accuracy errors no longer occur.
https://github.com/PINTO0309/onnx2tf/releases/tag/1.5.43

onnx2tf -i ts_ad_model.onnx -cotof -cotoa 1e-4
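As I understand onnx2tf's options, -cotof compares the ONNX and TensorFlow outputs element by element, and -cotoa 1e-4 sets the absolute tolerance for that comparison. Below is a minimal sketch of an absolute-tolerance check of this kind. `outputs_close` is a hypothetical helper, not onnx2tf code, and onnx2tf's own check may also apply a relative tolerance.

```python
import numpy as np

def outputs_close(onnx_out: np.ndarray, tf_out: np.ndarray, atol: float = 1e-4) -> bool:
    """Hypothetical elementwise check in the spirit of `-cotoa 1e-4`.

    Only an absolute tolerance is applied here (no relative term).
    """
    if onnx_out.shape != tf_out.shape:
        return False
    return bool(np.all(np.abs(onnx_out - tf_out) <= atol))

ref = np.array([0.5, 1.0, 2.0], dtype=np.float32)
print(outputs_close(ref, ref + 5e-5))  # True: every difference is about 5e-5
print(outputs_close(ref, ref + 5e-3))  # False: differences of about 5e-3
```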

