Graph freezing and unfreezing #614

Merged
tkornuta-nvidia merged 106 commits into master from feat-graph-freezing
May 7, 2020


tkornuta-nvidia
Contributor

This PR introduces two graph "actions":

  • A method to freeze the weights of (all/selected) trainable modules in a graph
  • A method to unfreeze the weights of (all/selected) trainable modules in a graph

To make this work in an elegant way, the PR introduces a back-end independent module type (the ModuleType enum).
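The two actions and the role of a back-end independent module type can be illustrated with a minimal, self-contained sketch. Everything below is hypothetical: the `Module` and `Graph` classes, the enum members, the `frozen` flag, and the choice to silently skip non-trainable modules are assumptions made for illustration and are not the API added by this PR.

```python
from enum import Enum
from typing import Iterable, List, Optional


class ModuleType(Enum):
    """Hypothetical back-end independent module categories."""
    trainable = 1
    nontrainable = 2
    datalayer = 3
    loss = 4


class Module:
    """Stand-in for a neural module; `frozen` models the state of its weights."""

    def __init__(self, name: str, module_type: ModuleType) -> None:
        self.name = name
        self.type = module_type
        self.frozen = False

    def freeze(self) -> None:
        self.frozen = True

    def unfreeze(self) -> None:
        self.frozen = False


class Graph:
    """Stand-in for a graph that only knows its modules by name."""

    def __init__(self, modules: Iterable[Module]) -> None:
        self._modules = {m.name: m for m in modules}

    def _trainable(self, names: Optional[List[str]]) -> List[Module]:
        # None means "all modules"; unknown names are reported as errors.
        if names is None:
            names = list(self._modules)
        unknown = [n for n in names if n not in self._modules]
        if unknown:
            raise KeyError(f"Unknown modules: {unknown}")
        # Assumption: modules that are not trainable are skipped silently.
        return [
            self._modules[n]
            for n in names
            if self._modules[n].type is ModuleType.trainable
        ]

    def freeze(self, module_names: Optional[List[str]] = None) -> None:
        for module in self._trainable(module_names):
            module.freeze()

    def unfreeze(self, module_names: Optional[List[str]] = None) -> None:
        for module in self._trainable(module_names):
            module.unfreeze()

    def frozen_modules(self) -> List[str]:
        return [name for name, m in self._modules.items() if m.frozen]


graph = Graph([
    Module("data", ModuleType.datalayer),
    Module("encoder", ModuleType.trainable),
    Module("decoder", ModuleType.trainable),
])
graph.freeze(["encoder"])   # freeze a selected module
graph.freeze()              # freeze all trainable modules
graph.unfreeze(["decoder"])  # unfreeze a selected module
```

Because the graph checks `ModuleType` rather than a back-end class, the same selection logic would apply regardless of which framework implements the modules.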

tkornuta-nvidia and others added 30 commits March 24, 2020 13:39
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Jason <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
…stry, starting to work on graph unit tests

Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
…ionality with unit/integration tests

Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
…o step_number:module_name:port_name (or in case of tensors: step_number:port_name). In short: enabled neural graphs to handle loops. Polished the code, unit tests and examples working. Added neural tensor type export along each connection between modules

Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
…d output ports, Polishes, cleanups, unit tests working

Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
… tests that check that, minor cleanups

Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
…ith ObjectRegistry) from __init__ files

Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Signed-off-by: Tomasz Kornuta <[email protected]>
Resolved review threads: nemo/core/neural_modules.py, nemo/backends/pytorch/nm.py
Signed-off-by: Tomasz Kornuta <[email protected]>
@tkornuta-nvidia tkornuta-nvidia merged commit 33828a9 into master May 7, 2020
@tkornuta-nvidia tkornuta-nvidia deleted the feat-graph-freezing branch May 7, 2020 22:19
dcurran90 pushed a commit to dcurran90/NeMo that referenced this pull request Oct 15, 2024
…IA#614)

The chat was abruptly failing with the following stack trace if the chat
is answering and the server crashes during that time:

```
lab chat

╭──────────────────────────────────────────────────────────────────── system ────────────────────────────────────────────────────────────────────╮
│ Welcome to Chat CLI w/ MERLINITE-7B-Q4_K_M (type /h for help)                                                                                  │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>>                                                                                                                                   [S][default]
>>> hello                                                                                                                             [S][default]
╭───────────────────────────────────────────────────────────── merlinite-7b-Q4_K_M ──────────────────────────────────────────────────────────────╮
│ Hello! I'm here to                                                                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── elapsed 1.031 seconds ─╯
Traceback (most recent call last):
  File "/home/leseb/cli/test_run_2Fv9yrYE/venv/lib64/python3.10/site-packages/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
    yield
  File "/home/leseb/cli/test_run_2Fv9yrYE/venv/lib64/python3.10/site-packages/httpx/_transports/default.py", line 113, in __iter__
    for part in self._httpcore_stream:
  File "/home/leseb/cli/test_run_2Fv9yrYE/venv/lib64/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 367, in __iter__
    raise exc from None
  File "/home/leseb/cli/test_run_2Fv9yrYE/venv/lib64/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 363, in __iter__
    for part in self._stream:
  File "/home/leseb/cli/test_run_2Fv9yrYE/venv/lib64/python3.10/site-packages/httpcore/_sync/http11.py", line 349, in __iter__
    raise exc
  File "/home/leseb/cli/test_run_2Fv9yrYE/venv/lib64/python3.10/site-packages/httpcore/_sync/http11.py", line 341, in __iter__
    for chunk in self._connection._receive_response_body(**kwargs):
  File "/home/leseb/cli/test_run_2Fv9yrYE/venv/lib64/python3.10/site-packages/httpcore/_sync/http11.py", line 210, in _receive_response_body
    event = self._receive_event(timeout=timeout)
  File "/home/leseb/cli/test_run_2Fv9yrYE/venv/lib64/python3.10/site-packages/httpcore/_sync/http11.py", line 220, in _receive_event
    with map_exceptions({h11.RemoteProtocolError: RemoteProtocolError}):
  File "/usr/lib64/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/leseb/cli/test_run_2Fv9yrYE/venv/lib64/python3.10/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
```

Now we handle this more elegantly and print a message before exiting:

```
lab chat
╭──────────────────────────────────────────────────────────────────── system ────────────────────────────────────────────────────────────────────╮
│ Welcome to Chat CLI w/ MERLINITE-7B-Q4_K_M (type /h for help)                                                                                  │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> hello                                                                                                                             [S][default]
Connection to the server was closed
Executing chat failed with: API issue found while executing chat: Connection to the server was closed
```
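The handling described in this commit message can be sketched roughly as follows. This is not the code from the referenced commit: `RemoteProtocolError` here is a local stand-in for the transport error shown in the trace, and `ChatException`, `read_reply`, `run_chat`, and `flaky_stream` are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator


class RemoteProtocolError(Exception):
    """Local stand-in for the transport error seen in the stack trace."""


class ChatException(Exception):
    """Hypothetical application-level error raised by the chat loop."""


def flaky_stream() -> Iterator[str]:
    """Simulates a server that dies while the reply is still streaming."""
    yield "Hello! "
    yield "I'm here to"
    raise RemoteProtocolError(
        "peer closed connection without sending complete message body"
    )


def read_reply(stream: Iterable[str]) -> str:
    """Collects streamed chunks, translating a dropped connection."""
    parts = []
    try:
        for chunk in stream:
            parts.append(chunk)
    except RemoteProtocolError as exc:
        # Report the problem in plain words instead of leaking the traceback.
        print("Connection to the server was closed")
        raise ChatException("Connection to the server was closed") from exc
    return "".join(parts)


def run_chat(stream: Iterable[str]) -> int:
    """Returns a process-style exit code: 0 on success, 1 on failure."""
    try:
        print(read_reply(stream))
    except ChatException as exc:
        print(
            "Executing chat failed with: "
            f"API issue found while executing chat: {exc}"
        )
        return 1
    return 0


exit_code = run_chat(flaky_stream())
```

Catching the low-level error at the point where the stream is consumed keeps the transport details in one place, while the outer function decides how to report the failure and which exit code to return.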

Signed-off-by: Sébastien Han <[email protected]>