[tune] Allow relative local_dir at tune.run() #4734

pengzhenghao · 2019-05-02T08:56:19Z

What do these changes do?

Allow a relative local_dir to be passed to tune.run() or Experiment.

Add unit tests for both relative and absolute local_dir values.

@hartikainen You said that using the variables' names helps with debugging. However, I think the restore log messages are more helpful with human-readable names, for users who need to check whether they are restoring the correct checkpoint.
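As a rough illustration of what accepting a relative local_dir involves, the sketch below normalizes a user-supplied path before it is used. The helper name `resolve_local_dir` is hypothetical and is not taken from this PR's diff; it only shows the standard-library calls that turn `~/...` and relative paths into absolute ones.

```python
import os


def resolve_local_dir(local_dir):
    """Hypothetical helper: expand "~" and make the path absolute.

    A relative path is resolved against the current working directory.
    """
    return os.path.abspath(os.path.expanduser(local_dir))


if __name__ == "__main__":
    # A relative path, a tilde path, and an already absolute path.
    for candidate in ("test_relative_local_dir", "~/test_absolute_local_dir", "/tmp/x"):
        print(candidate, "->", resolve_local_dir(candidate))
```

With a helper like this, the relative and absolute cases can share one code path, which is roughly what the two unit tests are meant to exercise.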

Related issue number

The same as #4725

Closes #4724

Linter

I've run scripts/format.sh to lint the changes in this PR.

AmplabJenkins · 2019-05-02T08:57:01Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/660/
Test PASSed.

AmplabJenkins · 2019-05-02T10:59:35Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14062/
Test FAILed.

AmplabJenkins · 2019-05-02T17:46:10Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/666/
Test PASSed.

AmplabJenkins · 2019-05-02T18:00:58Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/667/
Test PASSed.

AmplabJenkins · 2019-05-02T18:08:27Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/668/
Test PASSed.

AmplabJenkins · 2019-05-02T19:43:05Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14070/
Test FAILed.

AmplabJenkins · 2019-05-02T19:50:07Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14069/
Test FAILed.

AmplabJenkins · 2019-05-02T19:52:44Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14071/
Test FAILed.

AmplabJenkins · 2019-05-03T07:36:38Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/682/
Test PASSed.

AmplabJenkins · 2019-05-03T09:39:16Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14086/
Test FAILed.

pengzhenghao · 2019-05-04T03:27:54Z

@richardliaw Remember PR #4733, where I formatted the files using scripts/format.sh but got different results from yours?

In this PR, Travis failed due to the same issue:

The command "./ci/travis/format.sh --all" exited with 1.

I checked the Travis log, and it turns out to format the code the same way as yours, that is:

        tune.run(
            "PG",
            name="TildeAbsolutePath",
            stop={"training_iteration": 1},
            checkpoint_freq=1,
            local_dir="~/test_tilde_absolute_local_dir",
            config={
                "env": "CartPole-v0",
            })

However, on my macOS Mojave machine with Python 3.6.0 and the latest Ray environment, I still get the following formatting after running the format script. Is that because of a mistake I made when setting up Ray?

        tune.run("PG",
                 name="TildeAbsolutePath",
                 stop={"training_iteration": 1},
                 checkpoint_freq=1,
                 local_dir="~/test_tilde_absolute_local_dir",
                 config={
                     "env": "CartPole-v0",
                 })

AmplabJenkins · 2019-05-04T04:26:56Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/692/
Test PASSed.

AmplabJenkins · 2019-05-04T06:31:41Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14096/
Test FAILed.

hartikainen · 2019-05-04T15:30:49Z

@pengzhenghao one possible cause of the linting problem is a wrong yapf version. What's the output of your pip freeze | grep yapf?

pengzhenghao · 2019-05-04T15:55:12Z

yapf==0.23.0 is the result. I think the linting problem is solved.
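The same check can be scripted so that a formatter mismatch is caught before pushing. This is only a sketch: the pinned value `0.23.0` is an assumption based on the "formatted, using yapf==0.23" commit in this PR, and both function names are hypothetical.

```python
from importlib import metadata  # available in Python 3.8+

# Assumption: the version CI formats with; the thread only mentions "yapf==0.23".
PINNED_YAPF = "0.23.0"


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned=PINNED_YAPF):
    """True only when a version is installed and equals the pin exactly."""
    return installed is not None and installed == pinned


if __name__ == "__main__":
    found = installed_version("yapf")
    print("yapf installed:", found, "| matches pin:", matches_pin(found))
```

An exact string comparison is used on purpose, since different yapf releases can produce different layouts for the same code.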

python/ray/tune/trainable.py

AmplabJenkins · 2019-05-06T12:56:07Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/713/
Test PASSed.

AmplabJenkins · 2019-05-06T14:33:11Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14122/
Test FAILed.

hartikainen · 2019-05-06T17:02:29Z

python/ray/tune/tests/test_tune_save_restore.py

+        shutil.rmtree(absolute_local_dir, ignore_errors=True)
+
+    def testAbsolutePath(self):
+        local_dir = "~/test_absolute_local_dir"


Suggested change

-        local_dir = "~/test_absolute_local_dir"
+        local_dir = "~/test_absolute_local_dir"
+        self.assertFalse(os.path.exists(local_dir))

pengzhenghao · 2019-05-06

A failed run of test_tune_save_restore.py may leave the directory ~/test_absolute_local_dir behind. Therefore I remove any existing directory before training; otherwise you have to delete it by hand.

    def _train(self, exp_name, local_dir, absolute_local_dir):
        shutil.rmtree(absolute_local_dir, ignore_errors=True)

hartikainen · 2019-05-06

I think it would be better to handle the failure cleanly by removing the directory, and then check that it does not exist. This is just because I think deleting directories from users' home directories, even with hard-coded names like this, is kind of precarious.

hartikainen · 2019-05-07

Maybe you can add self.assertFalse(os.path.exists("~/test_absolute_local_dir")) in setUp and shutil.rmtree("~/test_absolute_local_dir", ignore_errors=True) in tearDown or something similar?
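A minimal sketch of that setUp/tearDown arrangement follows. It is not the code that was merged; the class and test names are made up for illustration. One detail worth noting: os.path.exists and shutil.rmtree do not expand `~` themselves, so the sketch resolves the path with os.path.expanduser first.

```python
import os
import shutil
import unittest

# Directory name taken from the PR's test; "~" must be expanded before use.
LOCAL_DIR = "~/test_absolute_local_dir"


class AbsolutePathTest(unittest.TestCase):  # hypothetical test class
    def setUp(self):
        self.absolute_local_dir = os.path.expanduser(LOCAL_DIR)
        # Refuse to run if the directory already exists, instead of
        # deleting something this test did not create.
        self.assertFalse(os.path.exists(self.absolute_local_dir))

    def tearDown(self):
        # Runs after the test body even when it fails, provided setUp succeeded.
        shutil.rmtree(self.absolute_local_dir, ignore_errors=True)

    def testDirectoryIsCreatedAndCleaned(self):
        # Stand-in for the real training run, which would create this directory.
        os.makedirs(self.absolute_local_dir)
        self.assertTrue(os.path.isdir(self.absolute_local_dir))


if __name__ == "__main__":
    unittest.main()
```

Because cleanup lives in tearDown, a failing test body no longer leaves the directory behind, and nothing in the home directory is deleted before the test has checked that the path was unused.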

AmplabJenkins · 2019-05-06T18:00:50Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/715/
Test PASSed.

AmplabJenkins · 2019-05-06T18:33:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14124/
Test FAILed.

AmplabJenkins · 2019-05-07T04:17:58Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/724/
Test PASSed.

pengzhenghao · 2019-05-07T04:19:38Z

I still don't know why the builds using Python 3.5 always get stuck. What makes them so special?

AmplabJenkins · 2019-05-07T06:25:33Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14133/
Test FAILed.

AmplabJenkins · 2019-05-11T18:21:24Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/764/
Test PASSed.

AmplabJenkins · 2019-05-11T20:57:00Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14187/
Test FAILed.

AmplabJenkins · 2019-07-11T21:57:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1628/
Test FAILed.

AmplabJenkins · 2019-07-12T00:03:37Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15319/
Test PASSed.

AmplabJenkins · 2019-08-08T21:35:58Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16146/
Test PASSed.

AmplabJenkins · 2019-08-09T00:58:55Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16153/
Test PASSed.

AmplabJenkins · 2019-08-09T22:36:18Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16192/
Test PASSed.

AmplabJenkins · 2019-08-11T00:31:24Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16206/
Test PASSed.

pengzh added 2 commits May 2, 2019 16:30

(1) fix restore error at tune.run() (2) add test script for restoring.

9d8c74f

(1)allow relative local_dir (2)add test script (3)add log when restore

9efab32

richardliaw added 2 commits May 2, 2019 12:26

Update test_tune_restore.py

7a9c33c

py2 support

ffdf75a

richardliaw assigned hartikainen May 2, 2019

replace single quotes with double quotes.

d1793cb

formatted

b2315a9

formatted

24fa3fe

richardliaw and others added 2 commits May 2, 2019 16:14

fixtests

56ba547

format

570540e

formatted

ab3f5a4

formatting

f70f410

formatted, using yapf==0.23

7e807e6

Merge branch 'pengzh_fix_tune_restore' into pengzh_relative_dir

2d79776

hartikainen reviewed May 4, 2019

Merge branch 'master' into pengzh_relative_dir

ef6093c

hartikainen reviewed May 6, 2019

hartikainen reviewed May 6, 2019

hartikainen reviewed May 6, 2019

put rmtree into tearDown to make sure dirs are deleted

026c7de

Merge branch 'master' into pengzh_relative_dir

f60955e

Merge branch 'master' into pengzh_relative_dir

694f76b

richardliaw added 3 commits August 8, 2019 11:22

Merge branch 'master' into pengzh_relative_dir

c0b556a

fixpu

4f410a0

ok

700d1a5

richardliaw added 2 commits August 9, 2019 12:29

Merge branch 'master' into pengzh_relative_dir

bb83337

trials

eee50d5

richardliaw added 2 commits August 10, 2019 14:00

lint

f3e39d7

Merge branch 'master' into pengzh_relative_dir

c3854c2

richardliaw merged commit 983f3c8 into ray-project:master Aug 10, 2019
