-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix O(n^2) behavior in the log_monitor #5569
Conversation
Test FAILed. |
Test PASSed. |
Test FAILed. |
python/ray/tests/test_basic.py
Outdated
del a | ||
|
||
while True: | ||
log_file_paths = glob.glob("{}/worker*.out".format(logs_dir)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be checking logs/old
instead of logs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we should also check it before the del a
call to make sure it is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right, the test was wrong. I updated it and it is now passing the right test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see the other part of my comment, which was
And we should also check it before the del a call to make sure it is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe even check specifically that function f finished
appears in the log file? Up to you.
python/ray/tests/test_basic.py
Outdated
with open(log_file_path, "r") as f: | ||
if "function f finished\n" in f.readlines(): | ||
done = True | ||
time.sleep(0.2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.2
is awfully long, if you're going to sleep, then how about 0.01
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is now removed.
Test PASSed. |
Test PASSed. |
Test FAILed. |
Test FAILed. |
Jenkins retest this please |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
# The process is not alive any more, so move the log file | ||
# out of the log directory so glob.glob will not be slowed | ||
# by it. | ||
target = os.path.join(self.logs_dir, "old", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of moving it how about we add it to a blacklist of files to ignore? I don't think the glob is the issue just the opening the files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running the glob on ~200.000 logfiles takes about 15s, so I assumed it was a problem but not super sure. I hate moving the logfiles too.
Why are these changes needed?
Currently, the log monitor can get choked if there are too many logfiles around. If the log directory is too large, the
glob.glob
call inray/python/ray/log_monitor.py
Line 94 in ddfabab
glob.glob
keeps being fast.Related issue number
Fixes #5320
Linter
scripts/format.sh
to lint the changes in this PR.