Reshape of input array when the shape is available only at runtime is not possible #10789
Proposed Solution: A potential solution is to modify the existing reshape operator. Caffe2 does something similar in its resizeLike operator.
Can we make this a feature request and add the label? Thanks!

What did you use to draw the nice graphs?

Can you show a small example where a reshape like that is absolutely required and reshape_like can't work?
@zheng-da The example is in the issue description. If we want the reshape to happen based on a shape generated by the Shape operator at runtime, reshape_like cannot express it. Also, since Caffe2 and other frameworks support reshape with the shape available only at runtime, users who built and trained their models in Caffe2 or PyTorch and want to move to MXNet for inference would be blocked. In fact, the OCR model mentioned in the issue description was built with PyTorch and exported to ONNX, and the customer wanted to import this model into MXNet, which we currently do not support. Here is the ONNX definition of Reshape, and this is the reshape operator in Caffe2.
My understanding is that you want to do something like this:

My question is why something like the code below is insufficient:
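For context, reshape_like takes its target shape from the *static* shape of a second array, which is known at graph-construction time. A minimal numpy sketch of that semantics (the `reshape_like` function here is a stand-in for MXNet's operator, not its actual API):

```python
import numpy as np

# Sketch of reshape_like semantics: the target shape is read from the
# static shape of a second array, not from runtime-computed data.
def reshape_like(a, b):
    return a.reshape(b.shape)

a = np.arange(6.0)      # shape (6,)
b = np.zeros((2, 3))    # shape (2, 3)
out = reshape_like(a, b)
print(out.shape)        # (2, 3)
```

This works whenever some existing array already has the desired shape; the issue is about target shapes that exist only as data computed at runtime.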
My main concern is how the first code example would work in the backward pass after you add the shape operator. I agree we should provide the shape for symbols, but adding this feature is very complex. It requires a fundamental change to MXNet's backward pass: we would need to add a special shape symbol so that we can still do shape inference and train a model.
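One observation on the backward-pass concern: the gradient of reshape is itself just a reshape of the incoming gradient back to the input's shape, which becomes known once the forward pass has run. A minimal numpy sketch of that idea (function names and the `ctx` record are hypothetical, not MXNet internals):

```python
import numpy as np

# Hypothetical forward/backward pair for a runtime-shaped reshape.
def reshape_forward(x, target_shape):
    ctx = {"orig_shape": x.shape}   # record input shape for backward
    return x.reshape(target_shape), ctx

def reshape_backward(grad_out, ctx):
    # Gradient of reshape: reshape the output gradient back.
    return grad_out.reshape(ctx["orig_shape"])

x = np.arange(12.0).reshape(3, 4)
y, ctx = reshape_forward(x, (2, 6))
gx = reshape_backward(np.ones_like(y), ctx)
print(gx.shape)  # (3, 4)
```

The hard part raised in the comment is not this local gradient rule but making static shape inference and graph passes coexist with shapes that are only known at runtime.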
This is a mock scenario where we would need the shape operator:

```python
var1 = mx.sym.var('var1')
var2 = mx.sym.var('var2')
var3 = mx.sym.var('var3')              # another input symbol
s = mx.sym.shape(var1)                 # hypothetical shape operator
unsq = mx.sym.unsqueeze(s)
conc = mx.sym.concat(unsq, var3)
out = mx.sym.reshape(var2, conc)       # reshape from a runtime shape
constFill = mx.sym.ConstantFill(conc)  # we currently do not have a ConstantFill operator
```

The example you provided more or less has the shape information upfront. But what if the shape is something that is computed and available only at runtime? Yes, I understand that modifying the reshape operator will require fundamental changes. In fact I was discussing exactly this with @haojin2 yesterday: how would we do shape inference in the reshape operator? Do you have any suggestions?
Adding this feature requires at least creating a new shape symbol, changing nnvm to handle shape symbols differently from normal symbols, and rewriting the shape-inference pass to propagate shape information. We need a good design for this that is compatible with the current shape-inference scheme (I don't have a workable design). Even with a design, we would need someone familiar with all of these components to spend at least a month (likely much more) implementing it and making it work correctly.
The shape operator itself doesn't have to be special. What needs special handling is using symbols as arguments that are normally kept as node attributes. We will also need the zeroth-order tensor support that @tqchen has been requesting, to avoid a mess.
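For readers unfamiliar with the term: a zeroth-order tensor is a scalar carried as a 0-dimensional array, which is what individual shape values would be if they flowed through the graph as data. A numpy illustration:

```python
import numpy as np

# A zeroth-order (0-d) tensor: a scalar held in array form.
s = np.array(5)
print(s.ndim)   # 0
print(s.shape)  # ()
print(int(s))   # 5
```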
Getting the shape of a symbol is very important in some situations. However, it is difficult to get the shape of a tensor, or to define the dimensions of weights according to the shape of a tensor, inside a HybridBlock in Gluon.
Currently the reshape operator in MXNet needs the shape attribute beforehand, which is used as a parameter of the operator.
The reshape_like operator takes two inputs and reshapes the first input based on the inferred shape of the second input.
But if I have two inputs where the second input is a shape tuple, and my input needs to be reshaped based on that shape tuple, that is not currently supported.
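The missing case is the ONNX-style Reshape, where the target shape is itself a 1-D tensor input that may be computed at runtime (with -1 meaning "infer this dimension"). A hedged numpy sketch of those semantics (the function name is hypothetical):

```python
import numpy as np

# ONNX-style Reshape: the shape is a 1-D tensor input, not a
# compile-time attribute; -1 means "infer this dimension".
def reshape_with_shape_input(data, shape_tensor):
    return data.reshape([int(s) for s in shape_tensor])

data = np.arange(24.0)
shape_tensor = np.array([2, -1, 4])   # could be produced by upstream ops
out = reshape_with_shape_input(data, shape_tensor)
print(out.shape)  # (2, 3, 4)
```

This is trivial imperatively, but a symbolic reshape operator that only accepts a compile-time `shape` attribute cannot express it.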
It would be good if MXNet supported this operation, because if the `shape` attribute of the `reshape` operator is generated at runtime, we will need such an operator. Here is a mock instance of what we would need, where even after we implement the Shape and ConstantFill operators, it would not be possible to build this graph.
Here is a more real-world example: a model that performs OCR, where the Shape operator is used to generate the shape of an array at runtime. If the output of this Shape operator were fed into a reshape operator, it would not be possible.