Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callback PR Rev 3 #615

Merged
merged 45 commits into from
May 28, 2020
Merged

Callback PR Rev 3 #615

merged 45 commits into from
May 28, 2020

Conversation

blisc
Copy link
Collaborator

@blisc blisc commented May 6, 2020

Rebased on top of Master. Replaces #597

Changelog:
Major:

  • Reworked the callbacks systems to be more user-friendly
  • Added new callbacks:
    • SimpleLogger and TensorboardLogger which replace SimpleLossLoggerCallback
    • WandBLogger replaces WandbCallback
    • CheckpointCallback has been updated to the new callback system but usage and functionality remains the same as before.
    • Old callbacks are still available at nemo.core.callbacks but they have been moved to a new deprecated_callbacks.py file
  • Remove the logic of __get_top_sorted_modules_and_dataloader and split it off into a new function topological_sort_from_leaves which now lives in core/neural_factory.py
    • It could be moved back into backends/pytorch/actions.py, might be a better idea
  • Removed PtActions.modules in favour of using AppState().modules
  • Renamed PtActions.epoch_num to PtActions.epoch
  • Removed some leftover code with respect to TrainableNeuralModuleWrapper
  • Created TrainingState which replaces the registered_modules inside of train()
  • Moved callback functions from Action functions to inside of PtActions.train()
  • Deleted __get_pytorch_module from Actions.
  • Split Actions and TrainingState from neural_factory.py to actions.py

Minor:

  • Added NeuralGraphs to Jasper_an4 example
  • Added a NmTensorNameRegistry and the ability for users to name tensors.

Tasks:

  • Create a NmTensor Registry
  • Add tensor.register inside of Neural Module calls NmTensor.init()
  • Allow users to rename their tensors
  • Creating state during steps
    • Resetting state after each step
    • Pass state to callbacks
    • Allow users to access tensors from state
  • Map state["loss"] to final_loss
  • Allow users to compute value of uncomputed tensors - there is a somewhat hacky working version atm
  • Clean up the code used to compute values of uncomputed tensors
  • Refactor SimpleLossLogger
  • Create function decorators
  • Add new TensorboardLogger, WandBLogger, and CheckpointCallback callbacks
  • Bug fixes
  • Deprecation warnings
  • Changelog
  • add tests
  • Add documentation

@lgtm-com
Copy link

lgtm-com bot commented May 6, 2020

This pull request introduces 8 alerts when merging b16d356 into d219483 - view on LGTM.com

new alerts:

  • 5 for Unused local variable
  • 3 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented May 7, 2020

This pull request introduces 10 alerts when merging 879fcfc into d219483 - view on LGTM.com

new alerts:

  • 6 for Unused import
  • 4 for Unused local variable

def action(self, action_obj):
self._action = action_obj

def on_action_start(self, state):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you proposed to have this set of "events"
on_train_start
on_epoch_start
on_optimizer_step_start
on_batch_start
on_batch_end
on_optimizer_step_stop
on_epoch_end
on_train_end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am against on_train_, on_optimizer_ - there are specific for training action, and we already got two other types of major actions aside of training...

I would still suggest to use on_iteration_* instead of on_step_*, but I can live with it - as long we all agree on that name ;)

nemo/core/callbacks.py Outdated Show resolved Hide resolved
@lgtm-com
Copy link

lgtm-com bot commented May 12, 2020

This pull request introduces 10 alerts when merging 35d6b7d into deef552 - view on LGTM.com

new alerts:

  • 6 for Unused import
  • 4 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented May 16, 2020

This pull request introduces 15 alerts when merging 3c7b89e into 06b04ca - view on LGTM.com

new alerts:

  • 8 for Unused import
  • 7 for Unused local variable

@blisc blisc marked this pull request as ready for review May 18, 2020 21:40
blisc added 2 commits May 18, 2020 16:05
Signed-off-by: Jason <[email protected]>
Signed-off-by: Jason <[email protected]>
@lgtm-com
Copy link

lgtm-com bot commented May 18, 2020

This pull request introduces 15 alerts when merging fa6553f into 9d90a95 - view on LGTM.com

new alerts:

  • 8 for Unused import
  • 7 for Unused local variable

blisc added 3 commits May 20, 2020 13:43
Signed-off-by: Jason <[email protected]>
Signed-off-by: Jason <[email protected]>
# =============================================================================


class NmTensorNameRegistry:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So any particular reason why those shouldn't be weak references?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you store names only - not important anymore

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass

# Finally, add object to the set.
self._nmtensor_uniname_dict[tensor.unique_name] = tensor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a "strong reference" to tensor object...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this reference and switching _nmtensor_uniname_dict to be a set()

self._nmtensor_name_registry = nemo.core.neural_types.NmTensorNameRegistry()

@property
def tensor_names(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... And in here you suggest that this is only mapping from one name to the other.

So what is really TensorRegistry storing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep NmTensorRegistry is just a mapping of user's naming of nmtensors to their unique_names. It also keeps track of all unique_names so we can initialize the TrainingState object.

self.nf.reset_trainer()

@pytest.fixture()
def create_tensorboard_file(self):
Copy link
Contributor

@tkornuta-nvidia tkornuta-nvidia May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the tmpdir fixture from pytest

example:

tmp_file_name = str(tmpdir.mkdir("export").join("nested_list_export.yml"))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

loss_tensor = loss(predictions=y_pred, target=y)

# Mock up both std and stderr streams.
with logging.patch_stdout_handler(StringIO()) as std_out:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, now I finally understand why you added those methods to logging... but should they really be part of logging, not testing?

epoch_step_counter = [0]
epoch_batch_counter = [0]

@on_step_end
Copy link
Contributor

@tkornuta-nvidia tkornuta-nvidia May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow! Those decorators are nice! 👍

I haven't got that from the NeMoCallbacks API... what those are good for?

Copy link
Contributor

@tkornuta-nvidia tkornuta-nvidia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major:

  1. Remove all "dead" (commented) code.
  2. Move the graph parsing out of NeuralModuleFactory.py

Minor:

  1. Use save/load from nemo.backends instead of torch
  2. Use tmpdir in tests

tensor_value = self.tensor_dict[unique_name]
return tensor_value


class Actions(ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @blisc : move Actions + TrainingState + graph traversing-related code to a separate file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved Actions, TrainingState and topological_sort_from_leaves to a new file called nemo/core/actions.py

blisc added 2 commits May 27, 2020 15:55
Signed-off-by: Jason <[email protected]>
Signed-off-by: Jason <[email protected]>
@lgtm-com
Copy link

lgtm-com bot commented May 27, 2020

This pull request introduces 1 alert and fixes 3 when merging 9f4566b into 5d1527a - view on LGTM.com

new alerts:

  • 1 for Mismatch between signature and use of an overridden method

fixed alerts:

  • 2 for Wrong name for an argument in a class instantiation
  • 1 for Mismatch between signature and use of an overridden method

blisc added 2 commits May 27, 2020 17:14
Signed-off-by: Jason <[email protected]>
@lgtm-com
Copy link

lgtm-com bot commented May 28, 2020

This pull request introduces 1 alert and fixes 3 when merging 31fc556 into 5d1527a - view on LGTM.com

new alerts:

  • 1 for Mismatch between signature and use of an overridden method

fixed alerts:

  • 2 for Wrong name for an argument in a class instantiation
  • 1 for Mismatch between signature and use of an overridden method

@lgtm-com
Copy link

lgtm-com bot commented May 28, 2020

This pull request introduces 1 alert and fixes 3 when merging b9e4441 into 5d1527a - view on LGTM.com

new alerts:

  • 1 for Mismatch between signature and use of an overridden method

fixed alerts:

  • 2 for Wrong name for an argument in a class instantiation
  • 1 for Mismatch between signature and use of an overridden method

@lgtm-com
Copy link

lgtm-com bot commented May 28, 2020

This pull request introduces 2 alerts and fixes 3 when merging 1e429af into 5d1527a - view on LGTM.com

new alerts:

  • 1 for Unused import
  • 1 for Mismatch between signature and use of an overridden method

fixed alerts:

  • 2 for Wrong name for an argument in a class instantiation
  • 1 for Mismatch between signature and use of an overridden method

@lgtm-com
Copy link

lgtm-com bot commented May 28, 2020

This pull request introduces 2 alerts and fixes 3 when merging fdae1f3 into 5d1527a - view on LGTM.com

new alerts:

  • 1 for Unused import
  • 1 for Mismatch between signature and use of an overridden method

fixed alerts:

  • 2 for Wrong name for an argument in a class instantiation
  • 1 for Mismatch between signature and use of an overridden method

@@ -0,0 +1,298 @@
# ! /usr/bin/python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@okuchaiev okuchaiev merged commit 52449a4 into NVIDIA:master May 28, 2020
@blisc blisc deleted the U_callbacks_4 branch May 28, 2020 22:05
dcurran90 pushed a commit to dcurran90/NeMo that referenced this pull request Oct 15, 2024
This commit introduces the following chagnes:
1. Makes sure not to clone taxonomy in non-interactive mode when
   it already exists
2. Adds a message when git clone failed informing user to manually clone
   the repo
3. Adds multiple tests for both interactive and non-interactive lab init

Signed-off-by: Maciej Szulik <[email protected]>
Signed-off-by: Martin Hickey <[email protected]>
Co-authored-by: Martin Hickey <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants