Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added test timeout #211

Merged
merged 6 commits into from
Dec 11, 2023
Merged

Added test timeout #211

merged 6 commits into from
Dec 11, 2023

Conversation

MeirShpilraien
Copy link
Collaborator

@MeirShpilraien MeirShpilraien commented Dec 10, 2023

The PR adds a new option --test-timeout that allows set a test timeout (in seconds) after which the test will be considered as failed. The timeout works as follow:

  1. Open a watcher thread that sleep for the given timeout
  2. Once wake up, check if the test that is been watched finished, if not:
    • Send a SIGUSR1 to the main processes thread that will cause it to print the current trace and enter a deep sleep.
    • Shutdown the environment using SIGSEGV to make sure we will get all the needed information from the shards
    • Print verbose information if needed (in case --verbose-information-on-failure was used).
    • Exit the processes using os._exit(1). Notice that it is important to exit using os._exit(1), if we exit in any other way, python might wait for active connections or thread to be close. We are killing the processes in the middle of its execution and we have no idea in which state it hang, so we prefer to wait for nothing.

For backward compatibility, the default timeout is 0 which implies no timeout.

Notice, when choosing the best way to trigger a timeout, 2 approaches was tested:

  1. Using a background watcher thread.
  2. Using signal.setitimer.

Eventually, the first approach was chosen, mainly because if we use signal.setitimer, the test itself might also use it and override our timer and callback. I believe the thread approach is safer and more reliable.

Extra additions/fixes:

  1. Progress bar indicating how many tests finished and how much is still left to run. Progress bar can be turned off using --no-progress. Progress bar will automatically turned off if --no-output-catch was used or if the stdout in not a terminal (output was redirected).
  2. Fix issue where on some cases the same test would have appear multiple time in the summary report.

Technical Low Level Details on Progressbar

Till today, when running with parallelism on more than one. Each processes reported its own progress. The PR changes this approach in way that only the main processes reports the progress and each sub-processes reports to the main processes. This gives us 2 main adventages:

  1. Avoid print collisions between processes
  2. Ability to report progress

To achieve this, each sub-processes introspects its own stdout and send the tests output to the main processes on a new channel called results. When the main processes gets a message on the results channel, it prints its connect to the stdout and increase the progress bar.

When running without parallelism, the output is printed to the stdout right away.

To avoid code duplication with the parallel and the none parallel flow, we extracted the code that runs a single tests to its own function, run_single_test, and we call it from the 2 different flows: run_jobs_main_thread, run_jobs.

@codecov-commenter
Copy link

codecov-commenter commented Dec 10, 2023

Codecov Report

Attention: 211 lines in your changes are missing coverage. Please review.

Comparison is base (409e17c) 34.16% compared to head (2eb4f1b) 32.28%.

Files Patch % Lines
RLTest/__main__.py 0.00% 183 Missing ⚠️
RLTest/redis_std.py 10.71% 25 Missing ⚠️
RLTest/redis_cluster.py 33.33% 2 Missing ⚠️
RLTest/env.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #211      +/-   ##
==========================================
- Coverage   34.16%   32.28%   -1.89%     
==========================================
  Files          17       17              
  Lines        2409     2565     +156     
==========================================
+ Hits          823      828       +5     
- Misses       1586     1737     +151     
Flag Coverage Δ
unittests 32.28% <2.31%> (-1.89%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 846 to 851
for i in range(num_elements):
if bar:
bar.update(i)
yield i
if bar:
bar.update(num_elements)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be indented? and maybe give up the if bar: then if ProgressBar cannot return None

@@ -558,15 +627,12 @@ def addFailure(self, name, failures=None):
failures = [failures]
if not failures:
failures = []
self.testsFailed.append([name, failures])
self.testsFailed.setdefault(name, []).extend(failures)

def getTotalFailureCount(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename functions

currPort += 30 # safe distance for cluster and replicas
processes.append(p)
p.start()
for _ in self.progressbar(n_jobs):
# for _ in range(n_jobs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete

except Exception as e:
if not has_live_processor:
raise Exception('Failed to get job result and no more processors is alive')
_ = res['test_name']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete

Comment on lines 982 to 983
for test_name, failures in res['failures'].items():
self.testsFailed[test_name] = failures
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for test_name, failures in res['failures'].items():
self.testsFailed[test_name] = failures
self.testsFailed.update(res['failures'])

@MeirShpilraien MeirShpilraien merged commit b5295ee into master Dec 11, 2023
23 checks passed
@MeirShpilraien MeirShpilraien deleted the test_timeout branch December 11, 2023 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants