Callback PR Rev 3 #615

blisc · 2020-05-06T20:55:10Z

Rebased on top of master. Replaces #597.

Changelog:
Major:

Reworked the callback system to be more user-friendly
Added new callbacks:
- SimpleLogger and TensorboardLogger which replace SimpleLossLoggerCallback
- WandBLogger replaces WandbCallback
- CheckpointCallback has been updated to the new callback system, but its usage and functionality remain the same as before.
- Old callbacks are still available at nemo.core.callbacks, but they have been moved to a new deprecated_callbacks.py file
Removed the logic of __get_top_sorted_modules_and_dataloader and split it off into a new function, topological_sort_from_leaves, which now lives in core/neural_factory.py
- It could be moved back into backends/pytorch/actions.py, which might be a better idea
Removed PtActions.modules in favour of using AppState().modules
Renamed PtActions.epoch_num to PtActions.epoch
Removed some leftover code with respect to TrainableNeuralModuleWrapper
Created TrainingState which replaces the registered_modules inside of train()
Moved the callback functions from the Action functions into PtActions.train()
Deleted __get_pytorch_module from Actions.
Split Actions and TrainingState from neural_factory.py to actions.py
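The changelog names topological_sort_from_leaves without showing it. As a rough illustration only, a leaf-first topological sort can be sketched as a depth-first walk from the requested output tensors back to the modules that produced them, emitting each module after the producers of its inputs. The Module/Tensor stand-ins, their `inputs`/`producer` fields, and the function body are assumptions, not the PR's code; the sketch also assumes the graph is acyclic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Module:
    """Hypothetical stand-in for a neural module: it consumes input tensors."""
    name: str
    inputs: List["Tensor"] = field(default_factory=list)


@dataclass(eq=False)
class Tensor:
    """Hypothetical stand-in for an NmTensor: it remembers its producing module."""
    name: str
    producer: Optional[Module] = None


def topological_sort_from_leaves(leaves: List[Tensor]) -> List[Module]:
    """Return every module needed to compute `leaves`, producers before consumers."""
    order: List[Module] = []
    visited = set()

    def visit(module: Module) -> None:
        if id(module) in visited:
            return
        visited.add(id(module))
        # Post-order DFS: visit the producers of this module's inputs first.
        for tensor in module.inputs:
            if tensor.producer is not None:
                visit(tensor.producer)
        order.append(module)

    for leaf in leaves:
        if leaf.producer is not None:
            visit(leaf.producer)
    return order
```

For a data layer feeding an encoder whose output, together with the raw audio, feeds a loss module, sorting from the loss tensor yields data layer, encoder, loss. The recursive walk is the simplest form; a very deep graph would call for an explicit stack instead.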

Minor:

Added NeuralGraphs to Jasper_an4 example
Added an NmTensorNameRegistry and the ability for users to name tensors.

Tasks:

Signed-off-by: Jason <[email protected]>

lgtm-com · 2020-05-06T21:12:13Z

This pull request introduces 8 alerts when merging b16d356 into d219483 - view on LGTM.com

new alerts:

5 for Unused local variable
3 for Unused import

lgtm-com · 2020-05-07T00:31:17Z

This pull request introduces 10 alerts when merging 879fcfc into d219483 - view on LGTM.com

new alerts:

6 for Unused import
4 for Unused local variable

okuchaiev · 2020-05-07T18:03:53Z

nemo/core/callbacks.py

+    def action(self, action_obj):
+        self._action = action_obj
+
+    def on_action_start(self, state):


I thought you proposed having this set of "events":
on_train_start
on_epoch_start
on_optimizer_step_start
on_batch_start
on_batch_end
on_optimizer_step_stop
on_epoch_end
on_train_end

I am against on_train_* and on_optimizer_*: they are specific to the training action, and we already have two other major types of actions aside from training...

I would still suggest using on_iteration_* instead of on_step_*, but I can live with it, as long as we all agree on that name ;)
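To make the naming discussion concrete, here is a minimal sketch of a callback base class whose hooks are action-agnostic, in the spirit of the on_action_start method in the diff above. The hook names (on_action_*, on_epoch_*, on_step_*), the dict-based state, and the run_action driver are assumptions for illustration, not the final NeMo API; run_action merely stands in for a loop such as PtActions.train().

```python
from typing import Any, Dict, List

State = Dict[str, Any]


class Callback:
    """Hypothetical base class: every hook is a no-op that subclasses may override."""

    def on_action_start(self, state: State) -> None:
        pass

    def on_epoch_start(self, state: State) -> None:
        pass

    def on_step_start(self, state: State) -> None:
        pass

    def on_step_end(self, state: State) -> None:
        pass

    def on_epoch_end(self, state: State) -> None:
        pass

    def on_action_end(self, state: State) -> None:
        pass


def run_action(callbacks: List[Callback], num_epochs: int, steps_per_epoch: int) -> State:
    """Toy driver that fires the hooks in order (stands in for a real action loop)."""
    state: State = {"epoch": 0, "step": 0}

    def fire(hook: str) -> None:
        for cb in callbacks:
            getattr(cb, hook)(state)

    fire("on_action_start")
    for epoch in range(num_epochs):
        state["epoch"] = epoch
        fire("on_epoch_start")
        for _ in range(steps_per_epoch):
            fire("on_step_start")
            # Forward/backward/optimizer work would happen here.
            state["step"] += 1
            fire("on_step_end")
        fire("on_epoch_end")
    fire("on_action_end")
    return state


class EventRecorder(Callback):
    """Example subclass that records when selected hooks fire."""

    def __init__(self) -> None:
        self.events: List[tuple] = []

    def on_action_start(self, state: State) -> None:
        self.events.append(("action_start", state["step"]))

    def on_step_end(self, state: State) -> None:
        self.events.append(("step_end", state["step"]))

    def on_epoch_end(self, state: State) -> None:
        self.events.append(("epoch_end", state["epoch"]))

    def on_action_end(self, state: State) -> None:
        self.events.append(("action_end", state["step"]))
```

Because none of the hooks mention training or optimizers, the same base class could serve evaluation or inference actions, which is the concern raised above.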

…tion Signed-off-by: Jason <[email protected]>

lgtm-com · 2020-05-12T21:24:38Z

This pull request introduces 10 alerts when merging 35d6b7d into deef552 - view on LGTM.com

new alerts:

6 for Unused import
4 for Unused local variable

lgtm-com · 2020-05-16T00:27:25Z

This pull request introduces 15 alerts when merging 3c7b89e into 06b04ca - view on LGTM.com

new alerts:

8 for Unused import
7 for Unused local variable

…rdLoggerCallback Signed-off-by: Jason <[email protected]>

lgtm-com · 2020-05-18T23:23:20Z

This pull request introduces 15 alerts when merging fa6553f into 9d90a95 - view on LGTM.com

new alerts:

8 for Unused import
7 for Unused local variable

tkornuta-nvidia · 2020-05-27T19:55:55Z

nemo/core/neural_types/nmtensor_registry.py

+# =============================================================================
+
+
+class NmTensorNameRegistry:


Is there any particular reason why those shouldn't be weak references?

If you store names only, it is not important anymore.

See #615 (comment)
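For reference, the weak-reference alternative raised here could look like the following sketch, which uses the standard library's weakref.WeakValueDictionary so that a registered tensor disappears from the registry once nothing else holds it. The Tensor stand-in and the WeakTensorRegistry name are hypothetical; this is not the code in nmtensor_registry.py.

```python
import weakref


class Tensor:
    """Hypothetical stand-in for an NmTensor with a unique name."""

    def __init__(self, unique_name: str) -> None:
        self.unique_name = unique_name


class WeakTensorRegistry:
    """Maps unique names to tensors without keeping the tensors alive."""

    def __init__(self) -> None:
        # Values are held weakly: entries vanish when the tensor is garbage-collected.
        self._tensors = weakref.WeakValueDictionary()

    def add(self, tensor: Tensor) -> None:
        self._tensors[tensor.unique_name] = tensor

    def __contains__(self, unique_name: str) -> bool:
        return unique_name in self._tensors

    def __len__(self) -> int:
        return len(self._tensors)
```

Storing only names (as done later in this thread) sidesteps the question entirely; weak values are only worth it if the registry must hand back the tensor objects themselves.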

tkornuta-nvidia · 2020-05-27T19:56:50Z

nemo/core/neural_types/nmtensor_registry.py

+            pass
+
+        # Finally, add object to the set.
+        self._nmtensor_uniname_dict[tensor.unique_name] = tensor


This looks like a "strong reference" to the tensor object...

Removing this reference and switching _nmtensor_uniname_dict to a set()

tkornuta-nvidia · 2020-05-27T19:57:27Z

nemo/utils/app_state.py

+        self._nmtensor_name_registry = nemo.core.neural_types.NmTensorNameRegistry()
+
+    @property
+    def tensor_names(self):


...And here you suggest that this is only a mapping from one name to the other.

So what is TensorRegistry really storing?

Yep, NmTensorRegistry is just a mapping from users' names for nmtensors to their unique_names. It also keeps track of all unique_names so we can initialize the TrainingState object.
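A minimal sketch of a registry matching that description, assuming it keeps a set of unique names (no tensor references, per the switch to a set() above) plus a dict from user-chosen names to unique names. The class name NameRegistry, its methods, and the duplicate-name check are assumptions, not necessarily what NmTensorNameRegistry does.

```python
from typing import Dict, Set


class NameRegistry:
    """Hypothetical sketch: tracks unique tensor names and user-facing aliases."""

    def __init__(self) -> None:
        self._unique_names: Set[str] = set()
        self._user_to_unique: Dict[str, str] = {}

    def register(self, unique_name: str) -> None:
        # Only the name is stored, so the registry never keeps a tensor alive.
        self._unique_names.add(unique_name)

    def rename(self, unique_name: str, user_name: str) -> None:
        if unique_name not in self._unique_names:
            raise KeyError(f"unknown tensor {unique_name!r}")
        existing = self._user_to_unique.get(user_name)
        if existing is not None and existing != unique_name:
            raise ValueError(f"name {user_name!r} already refers to {existing!r}")
        self._user_to_unique[user_name] = unique_name

    def resolve(self, name: str) -> str:
        """Accept either a user name or a unique name; return the unique name."""
        if name in self._user_to_unique:
            return self._user_to_unique[name]
        if name in self._unique_names:
            return name
        raise KeyError(name)

    @property
    def unique_names(self) -> Set[str]:
        # Return a copy so callers cannot mutate the registry's internal set.
        return set(self._unique_names)
```

Under this sketch, a TrainingState-like object could be seeded with one empty slot per known tensor via `dict.fromkeys(registry.unique_names)`, and lookups by user name would go through `resolve()` first.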

tkornuta-nvidia · 2020-05-27T19:59:19Z

tests/unit/core/test_nemo_callbacks.py

+        self.nf.reset_trainer()
+
+    @pytest.fixture()
+    def create_tensorboard_file(self):


Please use the tmpdir fixture from pytest.

Example:

NeMo/tests/unit/core/neural_module/test_module_configuration_export.py

Line 101 in d91f349

tmp_file_name = str(tmpdir.mkdir("export").join("nested_list_export.yml"))
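Following that suggestion, a hand-rolled fixture such as create_tensorboard_file could lean on pytest's built-in tmpdir fixture, so pytest creates and cleans up a per-test directory. This is a sketch only: write_fake_event_file, tensorboard_dir, and the test name are hypothetical, and in the real test the TensorboardLogger callback would be the thing writing files.

```python
import os

import pytest


def write_fake_event_file(log_dir: str) -> str:
    """Hypothetical stand-in for whatever the logger under test writes."""
    path = os.path.join(log_dir, "events.out.tfevents.test")
    with open(path, "w") as f:
        f.write("dummy")
    return path


@pytest.fixture()
def tensorboard_dir(tmpdir):
    # tmpdir is a py.path.local directory unique to this test invocation.
    return str(tmpdir.mkdir("tensorboard"))


def test_logger_writes_event_file(tensorboard_dir):
    path = write_fake_event_file(tensorboard_dir)
    assert os.path.isfile(path)
    assert os.listdir(tensorboard_dir) == [os.path.basename(path)]
```

Because the directory is fresh for every test, there is no need for manual cleanup or for guarding against files left over from a previous run.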

tkornuta-nvidia · 2020-05-27T20:07:01Z

tests/unit/core/test_nemo_callbacks.py

+        loss_tensor = loss(predictions=y_pred, target=y)
+
+        # Mock up both std and stderr streams.
+        with logging.patch_stdout_handler(StringIO()) as std_out:


OK, now I finally understand why you added those methods to logging... but should they really be part of logging rather than testing?
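One way to address this, sketched under assumptions: keep the stream swapping in a test-only helper built on the standard library's logging.StreamHandler.setStream (Python 3.7+), instead of adding it to the logging module itself. The helper name capture_handler_stream is hypothetical, and NeMo's patch_stdout_handler may behave differently.

```python
import contextlib
import io
import logging
from typing import Iterator


@contextlib.contextmanager
def capture_handler_stream(handler: logging.StreamHandler) -> Iterator[io.StringIO]:
    """Temporarily redirect `handler` to an in-memory buffer (test-only helper)."""
    buffer = io.StringIO()
    original = handler.stream
    handler.setStream(buffer)
    try:
        yield buffer
    finally:
        # Restore the original stream even if the test body raises.
        handler.setStream(original)
```

In a test, wrap the training call in `with capture_handler_stream(handler) as out:` and assert on `out.getvalue()` afterwards; the handler is restored when the block exits.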

tkornuta-nvidia · 2020-05-27T20:09:17Z

tests/unit/core/test_nemo_callbacks.py

+        epoch_step_counter = [0]
+        epoch_batch_counter = [0]
+
+        @on_step_end


Wow! Those decorators are nice! 👍

I didn't get that from the NeMoCallbacks API... what are those good for?
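The test above uses @on_step_end to turn a plain function into a callback. The PR's implementation is not shown in this thread; the following is a minimal sketch of one way such decorators could work, with hypothetical names (FunctionCallback, on_step_end, on_epoch_end) and a hypothetical dict-based state. It also shows why the test keeps its counters in one-element lists.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Handler = Callable[[State], None]


class FunctionCallback:
    """Hypothetical wrapper that runs a plain function on one named event."""

    def __init__(self, event: str, func: Handler) -> None:
        self.event = event
        self.func = func

    def __call__(self, event: str, state: State) -> None:
        # Only react to the event this callback was registered for.
        if event == self.event:
            self.func(state)


def _make_event_decorator(event: str) -> Callable[[Handler], FunctionCallback]:
    def decorator(func: Handler) -> FunctionCallback:
        return FunctionCallback(event, func)
    return decorator


on_step_end = _make_event_decorator("on_step_end")
on_epoch_end = _make_event_decorator("on_epoch_end")

# Usage: a one-element list can be mutated from inside the decorated function
# without rebinding the name, so no `nonlocal`/`global` declaration is needed.
step_counter = [0]


@on_step_end
def count_steps(state: State) -> None:
    step_counter[0] += 1


for _epoch in range(2):
    for _step in range(3):
        count_steps("on_step_end", {})
    count_steps("on_epoch_end", {})  # ignored: this callback listens to on_step_end only
```

The appeal is that a quick metric or counter needs no subclass: the decorator supplies the event binding and the function body supplies the behavior.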

tkornuta-nvidia

Major:

Remove all "dead" (commented) code.
Move the graph parsing out of NeuralModuleFactory.py

Minor:

Use save/load from nemo.backends instead of torch
Use tmpdir in tests

tkornuta-nvidia · 2020-05-27T21:13:47Z

nemo/core/neural_factory.py

+            tensor_value = self.tensor_dict[unique_name]
+        return tensor_value
+
+
 class Actions(ABC):


Discussed with @blisc: move Actions + TrainingState + graph-traversal-related code to a separate file

Moved Actions, TrainingState and topological_sort_from_leaves to a new file called nemo/core/actions.py

lgtm-com · 2020-05-27T23:45:57Z

This pull request introduces 1 alert and fixes 3 when merging 9f4566b into 5d1527a - view on LGTM.com

new alerts:

1 for Mismatch between signature and use of an overridden method

fixed alerts:

2 for Wrong name for an argument in a class instantiation
1 for Mismatch between signature and use of an overridden method

lgtm-com · 2020-05-28T00:27:19Z

This pull request introduces 1 alert and fixes 3 when merging 31fc556 into 5d1527a - view on LGTM.com

new alerts:

1 for Mismatch between signature and use of an overridden method

fixed alerts:

2 for Wrong name for an argument in a class instantiation
1 for Mismatch between signature and use of an overridden method

lgtm-com · 2020-05-28T00:37:58Z

This pull request introduces 1 alert and fixes 3 when merging b9e4441 into 5d1527a - view on LGTM.com

new alerts:

1 for Mismatch between signature and use of an overridden method

fixed alerts:

2 for Wrong name for an argument in a class instantiation
1 for Mismatch between signature and use of an overridden method

lgtm-com · 2020-05-28T19:00:43Z

This pull request introduces 2 alerts and fixes 3 when merging 1e429af into 5d1527a - view on LGTM.com

new alerts:

1 for Unused import
1 for Mismatch between signature and use of an overridden method

fixed alerts:

2 for Wrong name for an argument in a class instantiation
1 for Mismatch between signature and use of an overridden method

lgtm-com · 2020-05-28T19:12:18Z

This pull request introduces 2 alerts and fixes 3 when merging fdae1f3 into 5d1527a - view on LGTM.com

new alerts:

1 for Unused import
1 for Mismatch between signature and use of an overridden method

fixed alerts:

2 for Wrong name for an argument in a class instantiation
1 for Mismatch between signature and use of an overridden method

okuchaiev · 2020-05-28T20:58:49Z

nemo/core/actions.py

@@ -0,0 +1,298 @@
+# ! /usr/bin/python


Rebase off of master; add new working prototype of loss callback

b16d356

blisc added 2 commits May 6, 2020 17:22

first working hack of computing uncomputed tensors

8024454

style

879fcfc
