RNNprofiler: fix gates size retrieval logic in _rnn_flops #3921

Merged on Jul 24, 2023 (6 commits)
6 changes: 3 additions & 3 deletions deepspeed/profiling/flops_profiler/profiler.py
@@ -980,11 +980,11 @@ def _reload_tensor_methods():


 def _rnn_flops(flops, rnn_module, w_ih, w_hh, input_size):
-    input_size, hidden_size = w_ih.shape
+    gates_size = w_ih.shape[0]
     # matrix matrix mult ih state and internal state
-    flops += 2 * input_size * hidden_size - hidden_size
+    flops += 2 * w_ih.shape[0] * w_ih.shape[1] - gates_size
     # matrix matrix mult hh state and internal state
-    flops += 2 * hidden_size * hidden_size - hidden_size
+    flops += 2 * w_hh.shape[0] * w_hh.shape[1] - gates_size
     if isinstance(rnn_module, (nn.RNN, nn.RNNCell)):
         # add both operations
         flops += rnn_module.hidden_size
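For reviewers, a small sketch of how the two counting schemes in the hunk above differ on a concrete shape. It assumes PyTorch's documented weight layout, where `weight_ih` has shape `(gates_size, input_size)` and `weight_hh` has shape `(gates_size, hidden_size)`, with `gates_size = 4 * hidden_size` for an LSTM. The helper names `matmul_flops_old` and `matmul_flops_new` are hypothetical, take plain shape tuples instead of tensors, and cover only the two matrix-multiply terms, not the per-module additions that follow in `_rnn_flops`.

```python
def matmul_flops_old(w_ih_shape, w_hh_shape):
    """Pre-fix counting: unpacks w_ih.shape as (input_size, hidden_size)."""
    # Under the assumed layout this tuple is really (gates_size, input_size),
    # so both names below end up bound to the wrong dimension.
    input_size, hidden_size = w_ih_shape
    flops = 2 * input_size * hidden_size - hidden_size
    flops += 2 * hidden_size * hidden_size - hidden_size
    return flops


def matmul_flops_new(w_ih_shape, w_hh_shape):
    """Post-fix counting: reads each weight's own dimensions."""
    gates_size = w_ih_shape[0]
    flops = 2 * w_ih_shape[0] * w_ih_shape[1] - gates_size
    flops += 2 * w_hh_shape[0] * w_hh_shape[1] - gates_size
    return flops


# Illustrative LSTM sizes (assumed for this example only).
input_size, hidden_size, num_gates = 10, 20, 4
w_ih_shape = (num_gates * hidden_size, input_size)   # (80, 10)
w_hh_shape = (num_gates * hidden_size, hidden_size)  # (80, 20)

old = matmul_flops_old(w_ih_shape, w_hh_shape)
new = matmul_flops_new(w_ih_shape, w_hh_shape)
print(f"old count: {old}, new count: {new}")
```

With these sizes the old expression never looks at `w_hh` at all, so the hidden-to-hidden term is computed from the input dimension; the new expression uses both weight shapes directly.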