adding hardware usage and software packages tracker #2195

abidwael · 2022-06-27T07:18:52Z

This provides a new Tracker class to track hardware usage and software packages while a block of code is being executed.

Usage:

with Tracker(
    tag="train",
    output_dir=model.config["backend"]["cache_dir"],
    num_batches=model.config[TRAINER]["batch_size"],
    num_examples=len(training_set),
) as tracker:
    # code block to track
    ...

On exit, the hardware and software usage metrics are saved to f"{output_dir}/{tag}_metrics.json".
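To illustrate the context-manager pattern described above, here is a minimal sketch. `SketchTracker`, its metric names, and `run_example` are hypothetical stand-ins invented for this example; this is not the `Tracker` class from this PR, and it records only wall-clock time and a few environment facts using the standard library.

```python
import json
import os
import platform
import tempfile
import time


class SketchTracker:
    """Hypothetical stand-in for the PR's Tracker; not the actual implementation."""

    def __init__(self, tag, output_dir, num_examples):
        self.tag = tag
        self.output_dir = output_dir
        self.num_examples = num_examples
        self.metrics = {}

    def __enter__(self):
        # Record static environment info up front, then start the clock.
        self.metrics["python_version"] = platform.python_version()
        self.metrics["cpu_count"] = os.cpu_count()
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Stop the clock and write everything to <output_dir>/<tag>_metrics.json.
        self.metrics["total_execution_time_s"] = time.perf_counter() - self._start
        self.metrics["num_examples"] = self.num_examples
        os.makedirs(self.output_dir, exist_ok=True)
        path = os.path.join(self.output_dir, f"{self.tag}_metrics.json")
        with open(path, "w") as f:
            json.dump(self.metrics, f, indent=2)
        return False  # never swallow exceptions raised by the tracked block


def run_example():
    """Track a small workload and return the metrics read back from disk."""
    with tempfile.TemporaryDirectory() as output_dir:
        with SketchTracker(tag="train", output_dir=output_dir, num_examples=100):
            sum(i * i for i in range(10_000))  # stand-in for the tracked code block
        with open(os.path.join(output_dir, "train_metrics.json")) as f:
            return json.load(f)
```

Writing the file in `__exit__` means the metrics are saved even if the tracked block raises, which seems useful for benchmarking runs that fail partway through.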

github-actions · 2022-06-27T08:00:21Z

Unit Test Results

6 files +1    6 suites +1    2h 37m 9s ⏱️ +36m 43s
2 913 tests -16    2 868 ✔️ -15    45 💤 -1    0 ❌ ±0
8 739 runs +123    8 600 ✔️ +103    139 💤 +20    0 ❌ ±0

Results for commit c18b6bd. ± Comparison against base commit 9c58d5e.

♻️ This comment has been updated with latest results.

ShreyaR

Thanks for putting up this PR! Left some comments.

One general question I have is what the overhead of using Tracker is. If it isn't too expensive or slow to run Tracker, then it may make sense to use it by default for any Ludwig training process. I can see it being quite useful if we have a json with all the benchmarking stats generated in the model artifacts folder anytime we do training/evaluation.

ludwig/utils/tracker.py

ludwig/utils/misc_utils.py

for more information, see https://pre-commit.ci

abidwael · 2022-06-30T21:07:38Z

requirements_tracker.txt

@@ -0,0 +1,3 @@
+experiment_impact_tracker
+gpustat


Should this be a separate requirements_tracker.txt file or do I need to add it to the main requirements.txt file?

abidwael · 2022-07-12T10:34:58Z

> Thanks for putting up this PR! Left some comments.
>
> One general question I have is what the overhead of using Tracker is. If it isn't too expensive or slow to run Tracker, then it may make sense to use it by default for any Ludwig training process. I can see it being quite useful if we have a json with all the benchmarking stats generated in the model artifacts folder anytime we do training/evaluation.

@ShreyaR I ran model.experiment with and without Tracker on ames_housing and mercedes_benz_greener and collected the total CPU and RAM usage of all processes running on the machine during execution. Here are the results:

CPU usage seems to be more or less unaffected, but there is some RAM overhead. In my opinion, it's worth adding an optional Tracker context manager, per @justinxzhao's suggestion.

justinxzhao · 2022-07-12T16:40:12Z

requirements_tracker.txt

@@ -0,0 +1,3 @@
+experiment_impact_tracker
+gpustat


I'd be ok with adding this to the main requirements.txt file, especially if hardware resource usage tracking adds marginal overhead.

Curious about other people's opinions on this: @dantreiman @w4nderlust @tgaddair

ludwig/benchmarking/tracker.py

justinxzhao · 2022-07-13T23:17:08Z

ludwig/benchmarking/tracker.py

+        time.sleep(logging_interval)
+
+
+class ResourceUsageTracker:


Discussed offline: Add a basic unit test that shows how this class can/should be used.

abidwael · 2022-07-14T23:41:59Z

The pre-commit.ci check will not pass because of the following block:

# disabling print because the following imports are verbose
f = open(os.devnull, "w")
sys.stdout = f
from experiment_impact_tracker.cpu.common import get_my_cpu_info
from experiment_impact_tracker.gpu.nvidia import get_gpu_info
from experiment_impact_tracker.py_environment.common import get_python_packages_and_versions

f.close()
sys.stdout = sys.__stdout__

I'm temporarily redirecting stdout because the import statement is verbose.

Made a PR in the original repo: Breakend/experiment-impact-tracker#74
Will follow up with the maintainer.
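For comparison, one possible alternative to swapping `sys.stdout` by hand is sketched below. It is not what this PR does (the PR later adds `# noqa E402`). `import_quietly` is a hypothetical helper name; it uses the standard library's `contextlib.redirect_stdout` and `importlib.import_module`, and the stdlib module `this` (which prints on first import) stands in for the verbose third-party imports.

```python
import contextlib
import importlib
import io


def import_quietly(module_name):
    """Import a module while discarding anything it prints to sys.stdout.

    Hypothetical helper for illustration. It only silences Python-level
    writes to sys.stdout; output written directly to the underlying file
    descriptor (for example by C extensions) is not captured.
    """
    with contextlib.redirect_stdout(io.StringIO()):
        return importlib.import_module(module_name)


# `this` prints the Zen of Python the first time it is imported,
# so it serves as a convenient noisy module for the demonstration.
zen_module = import_quietly("this")
```

Because the import happens inside a function call rather than an `import` statement placed after other code, flake8's E402 check would likely not apply to it, and stdout is restored automatically even if the import raises.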

justinxzhao

Changes LGTM, looks like there's a few last errors to resolve:

From pre-commit:

ludwig/benchmarking/resource_usage_tracker.py:24: [E402] module level import not at top of file
ludwig/benchmarking/resource_usage_tracker.py:25: [E402] module level import not at top of file
ludwig/benchmarking/resource_usage_tracker.py:26: [E402] module level import not at top of file

Finally, could you check the unit test you added? It looks like it's failing on one of the builds.

abidwael · 2022-07-15T04:57:51Z

> Changes LGTM, looks like there's a few last errors to resolve:
>
> From pre-commit:
>
> ludwig/benchmarking/resource_usage_tracker.py:24: [E402] module level import not at top of file
> ludwig/benchmarking/resource_usage_tracker.py:25: [E402] module level import not at top of file
> ludwig/benchmarking/resource_usage_tracker.py:26: [E402] module level import not at top of file
>
> Finally, could you check the unit test you added? It looks like it's failing on one of the builds.

This one is due to my previous comment here.

justinxzhao

LGTM after the pre-commit error is fixed, and the tests are all green.

abidwael requested a review from ShreyaR June 27, 2022 07:19

ShreyaR reviewed Jun 28, 2022

adding hardware usage and software packages tracker

068f710

abidwael force-pushed the monitoring-utils branch from f910b7b to 068f710 Compare June 30, 2022 20:58

[pre-commit.ci] auto fixes from pre-commit.com hooks

c3bcd5b

for more information, see https://pre-commit.ci

abidwael commented Jun 30, 2022

Wael Abid added 2 commits June 30, 2022 14:12

removed stdout redirection to null during import

dbf75d2

remove sys.stdout

53871f9

abidwael force-pushed the monitoring-utils branch from 508b98b to 53871f9 Compare June 30, 2022 21:30

Wael Abid added 2 commits July 6, 2022 13:40

reverting

44c574c

updated tracker.py

97b8ef3

abidwael force-pushed the monitoring-utils branch from 86b4d75 to 97b8ef3 Compare July 12, 2022 07:56

pre-commit-ci bot and others added 4 commits July 12, 2022 07:56

[pre-commit.ci] auto fixes from pre-commit.com hooks

f2b859a

improved docstring style

ccbffce

removing unnecessary torch.cuda.synchronize() call

e78a582

using the multiprocessing library instead of the @processify wrapper to spawn the `Tracker` monitor process

d3f4fe7

[pre-commit.ci] auto fixes from pre-commit.com hooks

53ae6ad

abidwael requested a review from justinxzhao July 12, 2022 10:38

justinxzhao reviewed Jul 12, 2022

style changes

27133d8

abidwael marked this pull request as ready for review July 12, 2022 18:23

abidwael requested a review from ShreyaR July 12, 2022 18:25

adding s3fs to requirements.txt

4dc3347

justinxzhao reviewed Jul 13, 2022

Wael Abid and others added 4 commits July 14, 2022 13:00

name change to resource_usage_tracker.py

4bc606f

added test

f270ba4

[pre-commit.ci] auto fixes from pre-commit.com hooks

cf4c93c

tag name validation

e285481

pre-commit-ci bot and others added 4 commits July 14, 2022 21:28

[pre-commit.ci] auto fixes from pre-commit.com hooks

8807e05

flake8 updates

3bad0e3

fixed test file

a02e0bd

[pre-commit.ci] auto fixes from pre-commit.com hooks

4b7f7dc

update test file

2f09fd4

justinxzhao reviewed Jul 15, 2022

fixing empty utilization (due to very short experiment)

fe622d0

justinxzhao approved these changes Jul 15, 2022

Wael Abid and others added 2 commits July 14, 2022 23:02

added # noqa E402

a1fd13d

[pre-commit.ci] auto fixes from pre-commit.com hooks

c18b6bd

abidwael merged commit ae8de10 into master Jul 15, 2022

abidwael deleted the monitoring-utils branch July 15, 2022 14:55
