Harden survey script to various failures #4407

Merged: 1 commit into stellar:master from bboston7:survey-script-hardening on Aug 1, 2024


bboston7 (Contributor)

Closes #4404

This change makes a few improvements to the survey script so that it better handles errors encountered during execution. Specifically, it:

  • Adds a periodic ping to the `/info` endpoint to keep alive any SSH tunnels the script may be running through.
  • Adds a `--startPhase` flag that attaches the script to an already running survey, so that if the script crashes it can be restarted and pick up where it left off.
    • Additionally, when resuming the collection of survey results, the script now clears stellar-core's survey results cache.
  • Catches and logs errors in `write_graph_stats` rather than crashing.
  • Writes out the GraphML file as soon as the graph is complete, so that an unhandled error later in the run does not discard the graph.
  • Splits survey requests into smaller batches.
  • Renames `--collect-duration` to `--collectDuration` for consistency with the other long option names.
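The keepalive item above can be sketched as a background thread that issues a request on a fixed interval. This is a minimal illustration, assuming the script is written in Python; `KeepAlive`, `ping_info`, and the interval are hypothetical names and values, not code taken from this pull request.

```python
import logging
import threading
import urllib.request
from typing import Callable


class KeepAlive:
    """Run `ping` every `interval` seconds on a background thread.

    Hypothetical helper for illustration; not the survey script's own code.
    Use it as a context manager so the thread stops when the work finishes.
    """

    def __init__(self, ping: Callable[[], None], interval: float) -> None:
        self._ping = ping
        self._interval = interval
        self._stopped = threading.Event()
        # Daemon thread: it never keeps the interpreter alive on its own.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True once stop is requested,
        # so this loop pings once per interval until __exit__ sets the event.
        while not self._stopped.wait(self._interval):
            try:
                self._ping()
            except Exception as exc:
                # A failed ping is logged, not fatal; the next one may succeed.
                logging.warning("keepalive ping failed: %s", exc)

    def __enter__(self) -> "KeepAlive":
        self._thread.start()
        return self

    def __exit__(self, *exc_info: object) -> None:
        # Wake the thread immediately instead of waiting out the interval.
        self._stopped.set()
        self._thread.join()


def ping_info(base_url: str) -> None:
    """GET <base_url>/info and discard the body; the traffic keeps a tunnel active."""
    with urllib.request.urlopen(f"{base_url}/info", timeout=10) as resp:
        resp.read()
```

For example, wrapping the long-running survey loop in `with KeepAlive(lambda: ping_info("http://localhost:11626"), 60.0):` would send one `/info` request per minute; the URL (assumed to be stellar-core's default HTTP port) and the 60-second interval are assumptions. Using an `Event` rather than `time.sleep` lets shutdown happen as soon as the survey work finishes.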

I tested these changes using the script's simulate mode.
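The resume, batching, and write-ordering items in the list above can be illustrated with a few small helpers. This is a sketch under stated assumptions: the phase names, `phases_to_run`, `batched`, and `finish_graph` are hypothetical, chosen only to show starting from a later phase, splitting requests into fixed-size batches, and saving the graph before any step that might fail.

```python
import argparse
import logging
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical phase names for illustration; the values actually accepted by
# --startPhase in the survey script may differ.
PHASES = ["startSurvey", "stopSurvey", "collectResults"]


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="overlay survey (sketch)")
    parser.add_argument(
        "--startPhase",
        choices=PHASES,
        default=PHASES[0],
        help="phase to start in; a later phase reattaches to a running survey",
    )
    return parser.parse_args(argv)


def phases_to_run(start: str) -> List[str]:
    """Return the start phase and every phase after it, in order."""
    return PHASES[PHASES.index(start):]


def batched(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("batch size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def finish_graph(write_graphml: Callable[[], None],
                 write_stats: Callable[[], None]) -> None:
    """Persist the graph first, then log (rather than raise) any stats error."""
    write_graphml()  # save the graph before anything that might fail
    try:
        write_stats()
    except Exception:
        logging.exception("write_graph_stats failed; graph file was already written")
```

Because the graph is written before the stats step runs, an exception there (which `finish_graph` logs instead of re-raising) can no longer cost the collected graph. On Python 3.12 and later, `itertools.batched` offers similar chunking, yielding tuples instead of lists.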

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

marta-lokhova force-pushed the survey-script-hardening branch from f9327c4 to b4b1e03 on August 1, 2024 16:11
marta-lokhova enabled auto-merge on August 1, 2024 16:11
marta-lokhova added this pull request to the merge queue on Aug 1, 2024
Merged via the queue into stellar:master with commit 7e984c1 on Aug 1, 2024
14 checks passed
bboston7 deleted the survey-script-hardening branch on August 1, 2024 17:45
bboston7 added a commit to bboston7/stellar-core that referenced this pull request Dec 20, 2024
This change updates the survey script documentation to reflect changes
from stellar#4407. It also fixes a link to the admin guide.
bboston7 added a commit to bboston7/stellar-core that referenced this pull request Dec 21, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jan 4, 2025
This change updates the survey script documentation to reflect changes
from #4407. It also fixes a link to the admin guide.
Linked issue: survey: improve survey script in case of failures (#4404)