Fix test_logging_to_driver and test_not_logging_to_driver #5462

Merged
merged 2 commits into from
Aug 17, 2019

raulchen
Contributor

@raulchen raulchen commented Aug 16, 2019

Why are these changes needed?

These two tests are flaky on CI because earlier autoscaler tests sometimes leave background threads running, and those threads print the following error to stderr:

Traceback (most recent call last):
  File "/home/travis/miniconda/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/travis/build/ray-project/ray/python/ray/autoscaler/updater.py", line 151, in run
    raise e
AssertionError: Unable to SSH to node

These tests are meant to verify that the logs are redirected (or not redirected) to the driver's stdout, so there is no need to check stderr. The autoscaler background threads that are never stopped should still be fixed, but in a different PR.

What do these changes do?

Related issue number

Linter

  • I've run scripts/format.sh to lint the changes in this PR.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16326/
Test FAILed.

@robertnishihara
Collaborator

@raulchen @ericl how hard would it be to make sure the autoscaler test shuts down properly? That seems like the right fix.
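
One way a test could make sure a background thread shuts down is sketched below. This is only an illustration: `StoppableWorker` and `run_test_with_cleanup` are hypothetical names, not part of Ray's autoscaler, and the real updater thread may need a different stop mechanism.

```python
import threading


class StoppableWorker(threading.Thread):
    """Hypothetical stand-in for a background updater thread (not Ray code)."""

    def __init__(self):
        super().__init__(daemon=True)
        self._stop_event = threading.Event()
        self.iterations = 0

    def run(self):
        # wait() returns True once the event is set, so it doubles as an
        # interruptible sleep between iterations.
        while not self._stop_event.wait(timeout=0.01):
            self.iterations += 1

    def stop(self, timeout=1.0):
        # Ask the loop to exit, then block until the thread has finished.
        self._stop_event.set()
        self.join(timeout)


def run_test_with_cleanup(test_body):
    """Run test_body(worker) and always stop the worker afterwards."""
    worker = StoppableWorker()
    worker.start()
    try:
        test_body(worker)
    finally:
        # Runs even if the test body raises, so no thread leaks into later tests.
        worker.stop()
    return worker
```

With cleanup in a `finally` block (or a fixture teardown), a failing test still stops its thread, so later tests do not see stray output on stderr.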

@@ -2649,8 +2649,6 @@ def f():
    output_lines = captured["out"]
    for i in range(200):
        assert str(i) in output_lines
    error_lines = captured["err"]
    assert len(error_lines) == 0
Collaborator

This makes me realize that this line should have been

     assert len(error_lines) == 0, error_lines

so that we can see what the stderr was in the case of an error.

@@ -2649,8 +2649,6 @@ def f():
    output_lines = captured["out"]
    for i in range(200):
        assert str(i) in output_lines
    error_lines = captured["err"]
Collaborator

If we remove this check, then we should include a comment that explains what goes wrong if we do check it, since people (myself included) will be very tempted to bring back this check.

@@ -2649,8 +2649,6 @@ def f():
    output_lines = captured["out"]
    for i in range(200):
        assert str(i) in output_lines
Collaborator

Unrelated to the test failure, but in this test we should really be checking that we don't get any unintended log messages (or duplicates). Especially since @stephanie-wang saw some duplicates recently.
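
A stricter check along these lines might look like the sketch below. It assumes the captured output has already been split into individual lines and that any per-line prefix added by the driver has been stripped; `find_log_problems` is a hypothetical helper, not something in the Ray test suite.

```python
from collections import Counter


def find_log_problems(output_lines, expected):
    """Return (missing, duplicated, unexpected) for captured log lines.

    output_lines: iterable of captured lines, already split and stripped.
    expected: iterable of lines the task printed, each expected exactly once.
    """
    counts = Counter(output_lines)
    expected = set(expected)
    # Expected lines that never showed up.
    missing = sorted(expected - set(counts))
    # Expected lines that showed up more than once.
    duplicated = sorted(line for line in expected if counts[line] > 1)
    # Lines that showed up but were never printed by the task.
    unexpected = sorted(set(counts) - expected)
    return missing, duplicated, unexpected
```

A test could then assert that all three lists are empty and include them in the assertion message, so a failure shows exactly which lines were wrong.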

@raulchen
Contributor Author

@raulchen @ericl how hard would it be to make sure the autoscaler test shuts down properly? That seems like the right fix.

I agree that the autoscaler issue should be fixed, but I'm not familiar with the autoscaler and don't know how to fix it.
Looking at the test_logging_to_driver test, I think its purpose is to verify that the worker logs are sent to the driver's stdout, so the test doesn't need to care about stderr. A more accurate way to test this would be to mock the print_logs function instead of just checking the stdout output.
For now, though, I think this PR is enough to unblock the CI.
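
The mocking idea could be sketched roughly as follows. `FakeDriverLogger` and its methods are hypothetical stand-ins written for this example; they do not reflect the signature or behavior of Ray's actual print_logs function.

```python
from unittest import mock


class FakeDriverLogger:
    """Hypothetical stand-in for the driver-side log forwarding (not Ray code)."""

    def print_logs(self, line):
        print(line)

    def handle_worker_output(self, lines):
        # Forward every worker line to print_logs.
        for line in lines:
            self.print_logs(line)


def check_logs_forwarded(lines):
    logger = FakeDriverLogger()
    # Replace print_logs with a mock so the test inspects calls, not stdout.
    with mock.patch.object(logger, "print_logs") as fake_print:
        logger.handle_worker_output(lines)
    # Exactly one call per line, in order, with no extras or duplicates.
    assert fake_print.call_args_list == [mock.call(line) for line in lines]
    return fake_print.call_count
```

Because the assertion is on the recorded calls, output written to stdout or stderr by unrelated threads cannot affect the result.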

Collaborator

@robertnishihara robertnishihara left a comment

@raulchen this looks good to me. I pushed an additional comment. Does that look good to you?

@raulchen
Contributor Author

@robertnishihara thanks. looks good

@raulchen raulchen merged commit 03d05c8 into ray-project:master Aug 17, 2019
@raulchen raulchen deleted the fix_logging_tests branch August 17, 2019 10:06
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16363/
Test PASSed.
