Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add error handling to Cryomech PTC drivers and agent acq process #767

Merged
merged 4 commits into from
Oct 7, 2024

Conversation

BrianJKoopman
Copy link
Member

Description

This PR implements error handling for the Cryomech PTC class and most of the agent, with the main goal of making the acq() process robust to network interruptions.

In doing this I moved the "drivers" out to their own module. That's perhaps a bit noisy overall in this PR, but the interesting diffs are separated to:

  • Driver error handling implementation: 41f5f8f
  • Agent error handling in acq process: 1470e0a

Motivation and Context

This implementation is the one outlined in "Reconnecting in the Drivers" in #538, heavily inspired by #325.

Resolves #766.

How Has This Been Tested?

Tested by running the agent locally with an actual Cryomech compressor. Tested the agent remains online while disconnecting the network cable from the compressor and re-establishes connection when the cable is plugged back in. Example logs:

2024-10-02T15:18:47-0400 start called for acq
2024-10-02T15:18:47-0400 acq:1 Status is now "starting".
2024-10-02T15:18:47-0400 Response from acq.start(): Started process "acq".
2024-10-02T15:18:47-0400 init:0 PTC agent initialized
2024-10-02T15:18:47-0400 init:0 Status is now "done".
2024-10-02T15:18:47-0400 acq:1 Status is now "running".
2024-10-02T15:19:12-0400 Failed to get data from compressor. Check network connection.
2024-10-02T15:19:23-0400 Failed to get data from compressor. Check network connection.
2024-10-02T15:19:32-0400 Connection re-established.

Also tested shutting down the compressor and bringing it back up, which causes the reset() to trigger and produces a slightly different set of errors. It still reconnects once the compressor is back online. Example logs:

2024-10-02T15:19:51-0400 Failed to get data from compressor. Check network connection.
2024-10-02T15:20:02-0400 Failed to get data from compressor. Check network connection.
2024-10-02T15:20:12-0400 Failed to get data from compressor. Check network connection.
2024-10-02T15:20:13-0400 Connection error: [Errno 32] Broken pipe
2024-10-02T15:20:13-0400 Resetting the connection to the compressor.
2024-10-02T15:20:14-0400 Unable to connect. [Errno 111] Connection refused
2024-10-02T15:20:14-0400 Caught unexpected AttributeError during write:
2024-10-02T15:20:14-0400   'NoneType' object has no attribute 'sendall'
2024-10-02T15:20:14-0400 Failed to get data from compressor. Check network connection.
2024-10-02T15:20:15-0400 Caught unexpected AttributeError during write:
2024-10-02T15:20:15-0400   'NoneType' object has no attribute 'sendall'
2024-10-02T15:20:15-0400 Resetting the connection to the compressor.
2024-10-02T15:20:15-0400 Unable to connect. [Errno 111] Connection refused
2024-10-02T15:20:15-0400 Caught unexpected AttributeError during write:
2024-10-02T15:20:15-0400   'NoneType' object has no attribute 'sendall
2024-10-02T15:20:47-0400 Resetting the connection to the compressor.
2024-10-02T15:20:47-0400 Connection re-established.

(There will be a lot of the AttributeError logs, as the acq process loops.)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@BrianJKoopman
Copy link
Member Author

Added error handling for when the socket comes back as None to avoid those AttributeError exceptions. Retested, and I'm fairly happy with the performance and logging at the moment. Will likely merge when checks are done.

@BrianJKoopman
Copy link
Member Author

Messed up splitting out #771, had to push a rebase...will merge when checks complete.

@BrianJKoopman BrianJKoopman merged commit cc1efa3 into main Oct 7, 2024
5 checks passed
@BrianJKoopman BrianJKoopman deleted the koopman/cryomech-robustness branch October 7, 2024 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CryomechCPAAgent Robustness
1 participant