Add LSTM to the operator list #321
Conversation
@wchao1115, thanks for your great work. LSTM is an important recurrent network; WebNN should support it.
Sorry for the late reply, I was out sick for a few days.
index.bs
Outdated
- *weight*: an {{MLOperand}}. The 2-D input weight tensor of shape [3 * hidden_size, input_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *layout* argument.
- *recurrentWeight*: an {{MLOperand}}. The 2-D recurrent weight tensor of shape [3 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *layout* argument.
- *weight*: an {{MLOperand}}. The 2-D input weight tensor of shape [3 * hidden_size, input_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *options.layout* argument.
- *recurrentWeight*: an {{MLOperand}}. The 2-D recurrent weight tensor of shape [3 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *options.layout* argument.
- *hiddenState*: an {{MLOperand}}. The 2-D input hidden state tensor of shape [batch_size, hidden_size].
- *hiddenSize*: a {{long}} scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state.
Should `hiddenSize` be an `unsigned long` scalar?
You may want to change `hiddenSize` of `gruCell` to `unsigned long` as well.
New commit should address all the feedback here. Please take a look.
Thanks for the update. You may have missed changing two `hiddenSize` instances to an `unsigned long` scalar. The others look good to me.
index.bs
Outdated
- *recurrentWeight*: an {{MLOperand}}. The 2-D recurrent weight tensor of shape [4 * hidden_size, hidden_size]. The ordering of the weight vectors in the first dimension of the tensor shape is specified according to the *options.layout* argument.
- *hiddenState*: an {{MLOperand}}. The 2-D input hidden state tensor of shape [batch_size, hidden_size].
- *cellState*: an {{MLOperand}}. The 2-D input cell state tensor of shape [batch_size, hidden_size].
- *hiddenSize*: a {{long}} scalar. The value of the second dimension of the output tensor shape. It indicates the number of features in the hidden state.
ditto.
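For comparison, a hedged sketch of the corresponding `lstmCell` call, reusing the builder and operands from the `gruCell` sketch above. The argument order and the pair of returned states are assumptions inferred from the quoted argument descriptions:

```js
// lstmCell packs four gates (hence 4 * hiddenSize, vs. 3 * hiddenSize for
// gruCell) and takes an extra cellState input. Assumed, not normative.
const cellState = builder.constant(
    {type: 'float32', dimensions: [batchSize, hiddenSize]},
    new Float32Array(batchSize * hiddenSize));
const lstmWeight = builder.constant(
    {type: 'float32', dimensions: [4 * hiddenSize, inputSize]},
    new Float32Array(4 * hiddenSize * inputSize));
const lstmRecurrentWeight = builder.constant(
    {type: 'float32', dimensions: [4 * hiddenSize, hiddenSize]},
    new Float32Array(4 * hiddenSize * hiddenSize));

// Assumed to return the updated hidden state and cell state for one time step.
const [newHidden, newCell] = builder.lstmCell(
    input, lstmWeight, lstmRecurrentWeight, hiddenState, cellState,
    hiddenSize, {layout: 'iofg'});
```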
Looks good with a few nits that I missed in the last review. Sorry and please take a look.
index.bs
Outdated
@@ -1391,9 +1391,9 @@ partial interface MLGraphBuilder {
</div>

### The gru() method ### {#api-mlgraphbuilder-gru}
Gated Recurrent Unit [[GRU]] recurrent network using an update gate and a reset gate to compute the hidden state that rolls into the output across the temporal sequence of the Network
Gated Recurrent Unit [[GRU]] recurrent network uses an update, reset, and new gate to compute the output state that rolls into the output across the temporal sequence of the network
Period at end.
@@ -1411,33 +1411,34 @@ dictionary MLGruOptions {
boolean resetAfter = true;
boolean returnSequence = false;
MLRecurrentNetworkDirection direction = "forward";
Since `MLRecurrentNetworkDirection` is also used by `MLLstmOptions`, should this shared definition be outside the gru-specific section?
We don't dedicate a section to an enum. Since `gru` comes first, I just parked this enum there.
index.bs
Outdated
**Returns:** an {{MLOperand}}. The 2-D tensor of shape [batch_size, hidden_size], the cell output hidden state of a single time step of the recurrent network.

<div class="note">
The behavior of this operation when the activations of the update/reset gate and new gate are of the operator types *sigmoid* and *tanh* respectively can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
The behavior of this operation when the weight layout is the default *"zrn"* layout, and the activation functions of the update/reset gate and new gate are of the operator types *sigmoid* and *tanh* respectively can be generically emulated from the usage of other operations as follow. However, user agents typically have a more efficient implementation for it, therefore its usage is encouraged from the performance standpoint.
"as follow" -> "as follows"
Wording is awkward because it piles on many intermediate phrases before the final verb clause. How about:
"The behavior of this operation can be generically emulated via other operations as shown below, when the weight layout is the default "zrn" layout, and the activation functions of the update/reset gate and new gate are of the operator types sigmoid and tanh respectively."
<div algorithm=lstm>
**Arguments:**
- *input*: an {{MLOperand}}. The input 3-D tensor of shape [steps, batch_size, input_size].
- *weight*: an {{MLOperand}}. The 3-D input weight tensor of shape [num_directions, 4 * hidden_size, input_size]. The ordering of the weight vectors in the second dimension of the tensor shape is specified according to the *options.layout* argument.
Naming inconsistencies:
- `hidden_size` (line 1873) vs `hiddenSize` (line 1866)
- `num_directions` vs `numDirections` (line 1893)
Thanks for pointing this out. But I'm not going to address this issue in this PR. It probably needs a round of cleanup.
LGTM, thanks for the great work!
@wchao1115 thanks for adding these important ops. I'm clearing my review with only a few grammar nits, since this PR has already been reviewed quite extensively.
I'll put this PR on our next call agenda and expect we can land this soon for timely inclusion into the CR scope. We explicitly mention LSTM alongside RNN in our current charter, so good to get this in from that point of view too.
index.bs
Outdated
- *direction*: a {{MLRecurrentNetworkDirection}}. The processing direction of the input sequence. When set to *"both"*, the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.
- *layout*: a {{MLRecurrentNetworkWeightLayout}}. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the *update (z)*, *reset (r)*, and *new (n)* gate, as indicated in the second dimension of the weight and bias tensor shape. When not specified, the default layout is *"zrn"*.
- *activations*: a sequence of {{MLOperator}}. A pair of activation functions with the first function used for the update and reset gate, and the second used for the new gate. When not specified, it's assumed to be the sigmoid (*"sigmoid"*) and the hyperbolic tangent (*"tanh"*) function respectively.
- *layout*: a {{MLGruWeightLayout}}. The ordering of the weight and bias vectors for the internal gates of GRU, specifically the *update (z)*, *reset (r)*, and *new (n)* gate, as indicated in the second dimension of the weight and bias tensor shape. When not specified, the default layout is *"zrn"*.
s/a/an
index.bs
Outdated
- *initialHiddenState*: an {{MLOperand}}. The 3-D initial hidden state tensor of shape [num_directions, batch_size, hidden_size]. When not specified, it's assumed to be a tensor filled with zero.
- *initialCellState*: an {{MLOperand}}. The 3-D initial hidden state tensor of shape [num_directions, batch_size, hidden_size]. When not specified, it's assumed to be a tensor filled with zero.
- *returnSequence*: a {{boolean}} indicating whether to also return the entire sequence with every output from each time step in it in addition to the output of the last time step. Default to false.
- *direction*: a {{MLRecurrentNetworkDirection}}. The processing direction of the input sequence. When set to *"both"*, the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.
s/a/an
index.bs
Outdated
- *initialCellState*: an {{MLOperand}}. The 3-D initial hidden state tensor of shape [num_directions, batch_size, hidden_size]. When not specified, it's assumed to be a tensor filled with zero.
- *returnSequence*: a {{boolean}} indicating whether to also return the entire sequence with every output from each time step in it in addition to the output of the last time step. Default to false.
- *direction*: a {{MLRecurrentNetworkDirection}}. The processing direction of the input sequence. When set to *"both"*, the size of the first dimension of the weight and the bias tensor shapes must be 2, and the input is processed in both directions.
- *layout*: a {{MLLstmWeightLayout}}. The ordering of the weight and bias vectors for the internal gates of LSTM, specifically the *input (i)*, *output (o)*, *forget (f)*, and *cell (g)* gate, as indicated in the second dimension of the weight and bias tensor shapes. When not specified, the default layout is *"iofg"*.
s/a/an
index.bs
Outdated
- *bias*: an {{MLOperand}}. The 1-D input bias tensor of shape [4 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the *options.layout* argument.
- *recurrentBias*: an {{MLOperand}}. The 1-D recurrent bias tensor of shape [4 * hidden_size]. The ordering of the bias vectors in the first dimension of the tensor shape is specified according to the *options.layout* argument.
- *peepholeWeight*: an {{MLOperand}}. The 1-D weight tensor for peepholes of shape [3 * hidden_size]. The pack ordering of the weight vectors is for the *input (i)*, *output (o)*, and *forget (f)* gate respectively.
- *layout*: a {{MLLstmWeightLayout}}. The ordering of the weight and bias vectors for the internal gates of LSTM, specifically the *input (i)*, *output (o)*, *forget (f)*, and *cell (g)* gate, as indicated in the first dimension of the weight and bias tensor shapes. When not specified, the default layout is *"iofg"*.
s/a/an
SHA: d87f5d2 Reason: push, by wchao1115 Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This CL implements a WebNN spec change [1] that, instead of exposing the generic MLOperator interface, exposes the MLActivation interface as a more specific definition of an activation function. Because MLOperator is not exposed to JavaScript anymore, instead of inheriting from ScriptWrappable, it inherits from GarbageCollected. MLOperator is still used internally by MLGraph to represent the graph node that connects with MLOperands. The standalone MLOperator is used by MLActivation to represent the activation function type and keep the additional attributes. The unit tests and WPT baseline are updated according to this interface change.

[1]: webmachinelearning/webnn#321

Bug: 1273291
Change-Id: Ia5084a69b6756e54d1c5c05140df4f7844c88bd2
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4298931
Reviewed-by: Jiewei Qian <[email protected]>
Commit-Queue: ningxin hu <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1114320}
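To illustrate the distinction this CL (and the rename in this PR) is drawing, here is a hedged sketch. The zero-argument activation-builder overloads and the option names are assumptions based on the description above, and `builder`, `someInput`, and the recurrent operands are placeholders:

```js
// Calling an activation method with an operand builds a graph node (MLOperand);
// calling it without arguments is assumed to create a standalone MLActivation
// that only names the activation function.
const applied = builder.tanh(someInput);   // MLOperand: tanh applied in the graph
const deferred = builder.tanh();           // MLActivation: a reusable activation function

// An MLActivation is then passed where an op accepts an activation option,
// e.g. the activations sequence of the recurrent ops (assumed option name).
const output = builder.gruCell(
    input, weight, recurrentWeight, hiddenState, hiddenSize,
    {activations: [builder.sigmoid(), builder.tanh()]});
```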
Finally had a chance to do this. Adding `lstm` and `lstmCell` as additional operations in the spec. Since the addition of `gru` and `gruCell` there have been numerous questions about LSTM. As a widely popular recurrent network, supporting LSTM along with GRU is generally seen as important for API completeness.

This change also makes some editorial cleanups of the existing wording for GRU, and proposes renaming the existing `MLOperator` to `MLActivation` to reflect its true purpose in the spec as a generic definition of an activation function (i.e. not every operation can be an activation function).

Preview | Diff
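For readers trying out the new ops, here is a hedged end-to-end sketch of `lstm` over a short sequence. The method signature, descriptor fields, and returned values are assumptions pieced together from the argument descriptions quoted in this review:

```js
// Assumed signature: lstm(input, weight, recurrentWeight, steps, hiddenSize, options).
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

const steps = 4, batchSize = 2, inputSize = 8, hiddenSize = 16, numDirections = 1;

const input = builder.input('x',
    {type: 'float32', dimensions: [steps, batchSize, inputSize]});
const weight = builder.constant(
    {type: 'float32', dimensions: [numDirections, 4 * hiddenSize, inputSize]},
    new Float32Array(numDirections * 4 * hiddenSize * inputSize));
const recurrentWeight = builder.constant(
    {type: 'float32', dimensions: [numDirections, 4 * hiddenSize, hiddenSize]},
    new Float32Array(numDirections * 4 * hiddenSize * hiddenSize));

// Assumed to return the final hidden state, the final cell state, and (with
// returnSequence set) the per-step output sequence.
const [hidden, cell, sequence] = builder.lstm(
    input, weight, recurrentWeight, steps, hiddenSize,
    {returnSequence: true, direction: 'forward', layout: 'iofg'});
```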