Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove close flush for mosaicml logger #3446

Merged

Conversation

mvpatel2000
Copy link
Contributor

What does this PR do?

Remove close flush for mosaicml logger. All data is force flushed on fit/predict/eval_end. Attempting to log on close may result in an error as it tries to schedule a future while the python interpreter is shutting down. Instead, we should only wait on existing futures already scheduled.

While we attempt to not schedule futures and do a blocking op currently, under the hood mcli still uses asyncio which uses futures.

@mvpatel2000 mvpatel2000 requested a review from a team as a code owner July 1, 2024 15:27
Copy link
Contributor

@XiaohanZhangCMU XiaohanZhangCMU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an example run name to the PR description which failed because the original flush?

@mvpatel2000
Copy link
Contributor Author

Can you add an example run name to the PR description which failed because the original flush?

I only was able to reliably reproduce on interactive runs :(. I have an example trace:

Traceback (most recent call last):
  File "/mnt/workdisk/mihirp/composer/composer/core/engine.py", line 553, in _close
    callback.close(state, logger)
  File "/mnt/workdisk/mihirp/composer/composer/loggers/mosaicml_logger.py", line 149, in close
    self._flush_metadata(force_flush=True, future=False)
  File "/mnt/workdisk/mihirp/composer/composer/loggers/mosaicml_logger.py", line 173, in _flush_metadata
    mcli.update_run_metadata(self.run_name, self.buffered_metadata, future=False, protect=True)
  File "/usr/lib/python3/dist-packages/mcli/api/runs/api_update_run_metadata.py", line 139, in update_run_metadata
    response = run_singular_mapi_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/mcli/api/engine/engine.py", line 688, in run_singular_mapi_request
    future = connection.pool.submit(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown

@mvpatel2000 mvpatel2000 merged commit 8fd9c11 into mosaicml:dev Jul 1, 2024
14 checks passed
@mvpatel2000 mvpatel2000 deleted the mvpatel2000/fix-mosaicml-logger branch July 1, 2024 17:08
mvpatel2000 added a commit to mvpatel2000/composer that referenced this pull request Jul 21, 2024
mvpatel2000 added a commit that referenced this pull request Jul 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants