-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding hardware usage and software packages tracker #2195
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting up this PR! Left some comments.
One general question I have is what the overhead of using Tracker
is. If it isn't too expensive or slow to run Tracker
, then it may make sense to use it by default for any Ludwig training process. I can see it being quite useful if we have a json with all the benchmarking stats generated in the model artifacts folder anytime we do training/evaluation.
f910b7b
to
068f710
Compare
for more information, see https://pre-commit.ci
requirements_tracker.txt
Outdated
@@ -0,0 +1,3 @@ | |||
experiment_impact_tracker | |||
gpustat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a separate requirements_tracker.txt
file or do I need to add it to the main requirements.txt
file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be ok with adding this to the main requirements.txt file, especially if hardware resource usage tracking adds marginal overhead.
Curious about other people's opinions on this: @dantreiman @w4nderlust @tgaddair
508b98b
to
53871f9
Compare
86b4d75
to
97b8ef3
Compare
for more information, see https://pre-commit.ci
…per to spawn the `Tracker` monitor process
@ShreyaR I ran CPU seems to be more or less unaffected, but there's some RAM overhead. In my opinion, it's worth adding an optional |
for more information, see https://pre-commit.ci
requirements_tracker.txt
Outdated
@@ -0,0 +1,3 @@ | |||
experiment_impact_tracker | |||
gpustat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be ok with adding this to the main requirements.txt file, especially if hardware resource usage tracking adds marginal overhead.
Curious about other people's opinions on this: @dantreiman @w4nderlust @tgaddair
ludwig/benchmarking/tracker.py
Outdated
time.sleep(logging_interval) | ||
|
||
|
||
class ResourceUsageTracker: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline: Add a basic unit test that shows how this class can/should be used.
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
The pre-commit.ci check will not pass because of the following block
I'm temporarily redirecting stdout because the import statement is verbose. Made a PR in the original repo: Breakend/experiment-impact-tracker#74 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM, looks like there's a few last errors to resolve:
From pre-commit:
ludwig/benchmarking/resource_usage_tracker.py:24: [E402] module level import not at top of file
ludwig/benchmarking/resource_usage_tracker.py:25: [E402] module level import not at top of file
ludwig/benchmarking/resource_usage_tracker.py:26: [E402] module level import not at top of file
Finally, could you check the unit test you added? It looks like it's failing on one of the builds.
This one is due to my previous comment here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after the pre-commit error is fixed, and the tests are all green.
for more information, see https://pre-commit.ci
This provides a new
Tracker
class to track hardware usage and software packages while a block of code is being executed.Usage:
Will save hardware and software usage metrics under
f"{output_dir}/{tag}_metrics.json"