
[BUG] InferenceEngine stores mixed time units in _model_times, depending on flow #3513

Closed
HolyFalafel opened this issue May 11, 2023 · 1 comment

@HolyFalafel
Contributor

Bug description
In the InferenceEngine class (deepspeed/inference/engine.py):
When use_cuda_events is enabled, the measured model time is stored in milliseconds.
Code:

    def _post_forward_hook(self, module, input, output):
        if self.use_cuda_events:
            self.timers(INFERENCE_MODEL_TIMER).stop()
            elapsed_time = self.timers(INFERENCE_MODEL_TIMER).elapsed(reset=True)

When use_cuda_events is disabled, the measured model time is stored in seconds.
Code:

        else:
            get_accelerator().synchronize()
            self._end = time.time()
            elapsed_time = self._end - self._start

Both values, despite their different units, are appended to the same list:

        self._model_times.append(elapsed_time)
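
One way to resolve this is to normalize both branches to a single unit before appending. A minimal sketch of the hook with both paths in milliseconds (the unit choice here is arbitrary, and this is not necessarily what the linked PR does):

    def _post_forward_hook(self, module, input, output):
        if self.use_cuda_events:
            self.timers(INFERENCE_MODEL_TIMER).stop()
            # elapsed() already reports milliseconds on this path
            elapsed_time = self.timers(INFERENCE_MODEL_TIMER).elapsed(reset=True)
        else:
            get_accelerator().synchronize()
            self._end = time.time()
            # time.time() measures seconds; scale to milliseconds so both
            # branches append the same unit to _model_times
            elapsed_time = (self._end - self._start) * 1000
        self._model_times.append(elapsed_time)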

Reproduction
Change to the DeepSpeed tests directory:

cd tests

Run the following unit test:

python -m pytest unit/inference/test_model_profiling.py::TestModelProfiling::test[False-True-roberta-base-fill-mask]  -m inference

Expected behavior
The test passes, but printing out the collected timings exposes the unit mismatch, for example:

count=0 e2e_t=895.174312 model_t=0.8529715538024902
count=1 e2e_t=7.500252 model_t=0.0041310787200927734
count=2 e2e_t=3.887346 model_t=0.0018568038940429688
count=3 e2e_t=3.577845 model_t=0.0016334056854248047
count=4 e2e_t=3.43976 model_t=0.0016703605651855469
count=5 e2e_t=3.310903 model_t=0.0016107559204101562
count=6 e2e_t=3.299556 model_t=0.001603841781616211
count=7 e2e_t=3.605722 model_t=0.0015969276428222656
count=8 e2e_t=3.273741 model_t=0.0015516281127929688
count=9 e2e_t=3.46306 model_t=0.0016617774963378906

The unit difference is visible here: model_t is on the order of 1e-3 of e2e_t for the same forward pass, consistent with model_t having been recorded in seconds while e2e_t is reported in milliseconds.
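
A check along these lines would flag the problem, since a stable ~1000x ratio between the two timings of the same forward pass points at a unit mismatch rather than real overhead (e2e_times and model_times are hypothetical names for the two collected lists, not part of the actual test):

    # Hypothetical sanity check: with consistent units, the end-to-end time
    # and the model time for one forward pass should be within a couple of
    # orders of magnitude of each other.
    for i, (e2e_t, model_t) in enumerate(zip(e2e_times, model_times)):
        ratio = e2e_t / model_t
        assert ratio < 100, (
            f"count={i}: suspicious e2e/model ratio {ratio:.0f}, "
            "likely a seconds-vs-milliseconds mismatch"
        )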

ds_report output

$ ds_report 
/home/dsemiat/anaconda3/envs/py_venv_3.8_deepspeed4/lib/python3.8/site-packages/pandas/core/computation/expressions.py:20: UserWarning: Pandas requires version '2.7.3' or newer of 'numexpr' (version '2.7.2' currently installed).
  from pandas.core.computation.check import NUMEXPR_INSTALLED
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn cuda is not available from torch
 [WARNING]  sparse_attn requires a torch version >= 1.5 but detected 2.0
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
spatial_inference ...... [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/dsemiat/anaconda3/envs/py_venv_3.8_deepspeed4/lib/python3.8/site-packages/torch']
torch version .................... 2.0.0a0+gitcb066cd
torch cuda version ............... None
torch hip version ................ None
nvcc version .....................  [FAIL] cannot find CUDA_HOME via torch.utils.cpp_extension.CUDA_HOME=None 
deepspeed install path ........... ['/home/dsemiat/qnpu/deepspeed4/src/deepspeed-fork/deepspeed']
deepspeed info ................... 0.7.7+37b837fa, 37b837fa, HEAD
deepspeed wheel compiled w. ...... torch 1.13, cuda 0.0


System info:

  • OS: Ubuntu 20.04
  • GPU count and types: one machine with HPU Gaudi2
  • Python version: 3.8.15

Additional context
Opened a PR: #3501

@HolyFalafel HolyFalafel added bug Something isn't working inference labels May 11, 2023
@mrwyattii
Contributor

Thank you @HolyFalafel - I've reviewed your PR and have it set to merge!

@mrwyattii mrwyattii self-assigned this May 26, 2023