
Skip metric computation if output is None #1065

Open
vfdev-5 opened this issue May 24, 2020 · 5 comments

@vfdev-5
Collaborator

vfdev-5 commented May 24, 2020

🚀 Feature

Considering the context of #1024, and another point about calling tensor.item() on each iteration with TPU, which slows down the computation, we need a way to skip using state.output when it is set to None.

In training on TPU

```python
def training_step(e, b):
    # ....

    # instead of returning the loss value on every iteration:
    # return loss.item()
    # we would like to return it only every 20 iterations:
    return loss.item() if (e.state.iteration - 1) % 20 == 0 else None
```

However, if we have a running metric attached to the output, it will fail when it encounters None.

So, the idea is to check engine.state.output here and return early before doing any metric update:

```python
def iteration_completed(self, engine: Engine) -> None:
    output = self._output_transform(engine.state.output)
```
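A minimal, self-contained sketch of that guard (hypothetical stand-ins for ignite's `Engine` and `Metric`, not the library's actual implementation; `num_updates` is only there to make the effect observable):

```python
# Sketch: iteration_completed() returns early when the engine produced no
# output, so the metric's update() is never fed None.

class _State:
    def __init__(self, output):
        self.output = output

class Engine:
    """Stand-in for ignite's Engine; only state.output is used here."""
    def __init__(self, output):
        self.state = _State(output)

class Metric:
    def __init__(self, output_transform=lambda x: x):
        self._output_transform = output_transform
        self.num_updates = 0

    def update(self, output):
        self.num_updates += 1

    def iteration_completed(self, engine):
        # Proposed change: skip the whole update when no output was produced.
        if engine.state.output is None:
            return
        output = self._output_transform(engine.state.output)
        self.update(output)

metric = Metric()
metric.iteration_completed(Engine(output=None))  # skipped, no update
metric.iteration_completed(Engine(output=0.5))   # counted
```

All other handlers attached to the iteration would still run; only the metric's own update is skipped.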

@sdesrozis @erip any thoughts ?

This issue is related to #996, but the difference is that we still want to execute all other handlers on the iteration even if the output is None.

@sdesrozis
Contributor

Why not return the tensor loss in training_step and do the fix internally? That way, if it is resolved one day, we will not have to change user code.

@vfdev-5
Collaborator Author

vfdev-5 commented May 24, 2020

@sdesrozis for XLA (see the link above):

> Calling item() forces extra graph executions, which does indeed slow down performance.

This is why we suggest (unless one does not care about execution time) doing that only once every ~20 steps:
https://github.com/pytorch/xla/blob/3b4b5790ba9e9141f19b0e33b3a7a7cd1bad8ead/test/test_train_mp_mnist.py#L129

```python
if step % FLAGS.log_steps == 0:
```

> Why not return the tensor loss in training_step and do the fix internally? That way, if it is resolved one day, we will not have to change user code.

IMO, such internal behaviour would be opaque to a user who does not intend to skip the loss value by returning a tensor. Another point: which rate should we use internally?

@sdesrozis
Contributor

OK, I agree with what you said. But I'm really not a fan; I hope it will be fixed by XLA (if possible...)

@vfdev-5
Collaborator Author

vfdev-5 commented Jun 6, 2020

An update on this. I tried to implement the skipping, and there can be a potential roadblock with MetricUsage. For example, a RunningAverage metric set up to average the output coming from engine.state.output needs to skip both its update() AND compute().
While skipping update() is easy inside the Metric class's iteration_completed method by checking whether engine.state.output is None, there is no simple way to skip compute() if update() was skipped. We can also imagine this situation arising with other types of metric usages...

@sdesrozis
Contributor

compute() should be skipped if no update() has been triggered since the last reset(), right?
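One way to express that rule as a sketch (a hypothetical metric, not ignite's RunningAverage): count update() calls since the last reset(), so compute() can tell that every iteration in the window was skipped.

```python
# Sketch: compute() is skipped (signalled here by returning None) when
# nothing was accumulated since the last reset().

class RunningValue:
    def __init__(self):
        self.reset()

    def reset(self):
        self._sum = 0.0
        self._num_updates = 0

    def update(self, value):
        self._sum += value
        self._num_updates += 1

    def compute(self):
        # No update() since the last reset() -> skip the computation.
        if self._num_updates == 0:
            return None
        return self._sum / self._num_updates

m = RunningValue()
assert m.compute() is None  # every iteration was skipped -> no compute
m.update(2.0)
m.update(4.0)
assert m.compute() == 3.0
```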
