[tune] Allow relative local_dir at tune.run() #4734

Merged
merged 38 commits into from
Aug 10, 2019

pengzhenghao
Contributor

What do these changes do?

Allow a relative local_dir for tune.run() or Experiment.

Add unit tests for both relative and absolute local_dir.
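
One way such path handling can work is sketched below. This is a minimal illustration, not the code from this PR: resolve_local_dir is a hypothetical helper name, and the assumption is that a relative or "~"-prefixed local_dir should be turned into an absolute path before anything is written to it.

```python
import os


def resolve_local_dir(local_dir):
    """Hypothetical helper: turn a user-supplied local_dir into an absolute path."""
    # expanduser() handles "~/..."; abspath() resolves a relative path against
    # the current working directory and only normalizes an absolute one.
    return os.path.abspath(os.path.expanduser(local_dir))


# A relative path is resolved against the current working directory.
print(resolve_local_dir("test_relative_local_dir"))
# A "~" path is resolved against the user's home directory.
print(resolve_local_dir("~/test_tilde_absolute_local_dir"))
```

With a helper like this, the relative, tilde, and absolute cases can all be compared against the same absolute path in a test.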

@hartikainen You said that using the variables' names helps with debugging. However, I think the restore log messages are more helpful with human-readable names, for users who need to know whether they are restoring from the correct checkpoint.

Related issue number

The same as #4725

Closes #4724

Linter

  • I've run scripts/format.sh to lint the changes in this PR.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/660/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14062/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/666/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/667/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/668/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14070/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14069/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14071/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/682/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14086/

@pengzhenghao
Contributor Author

@richardliaw Remember PR #4733, where I formatted the files with scripts/format.sh but got different results from yours?

In this PR, Travis failed because of the same issue:

The command "./ci/travis/format.sh --all" exited with 1.

I checked the Travis log, and it turns out that it formats the code the same way yours does, that is:

        tune.run(
            "PG",
            name="TildeAbsolutePath",
            stop={"training_iteration": 1},
            checkpoint_freq=1,
            local_dir="~/test_tilde_absolute_local_dir",
            config={
                "env": "CartPole-v0",
            })

However, on macOS Mojave with Python 3.6.0 and the latest Ray, I still get the following formatting after running the format script. Is that because of a mistake I made when setting up Ray?

        tune.run("PG",
                 name="TildeAbsolutePath",
                 stop={"training_iteration": 1},
                 checkpoint_freq=1,
                 local_dir="~/test_tilde_absolute_local_dir",
                 config={
                     "env": "CartPole-v0",
                 })

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/692/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14096/

@hartikainen
Contributor

@pengzhenghao One possible cause of the linting problem is a wrong yapf version. What's the output of your pip freeze | grep yapf?
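
The same check can also be done from Python. The following is a minimal sketch, assuming Python 3.8 or newer (importlib.metadata is not available in older standard libraries); installed_version is a hypothetical helper name, not part of Ray or yapf.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed yapf version, or None if yapf is not installed.
print("yapf:", installed_version("yapf"))
```

Comparing this value with the version the CI environment installs shows whether a local formatting difference could come from a version mismatch.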

@pengzhenghao
Contributor Author

yapf==0.23.0 is the result. I think the linting problem is solved.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/713/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14122/

        shutil.rmtree(absolute_local_dir, ignore_errors=True)

    def testAbsolutePath(self):
        local_dir = "~/test_absolute_local_dir"
Contributor

Suggested change
-        local_dir = "~/test_absolute_local_dir"
+        local_dir = "~/test_absolute_local_dir"
+        self.assertFalse(os.path.exists(local_dir))

Contributor Author

A failed run of test_tune_save_restore.py may leave the directory ~/test_absolute_local_dir behind. Therefore I remove any existing directory before training; otherwise you have to delete it by hand.

    def _train(self, exp_name, local_dir, absolute_local_dir):
        shutil.rmtree(absolute_local_dir, ignore_errors=True)

Contributor

I think it would be better to handle the failure cleanly by removing the directory afterwards, and then to check beforehand that it does not exist. That's just because I think deleting directories from a user's home directory, even with hard-coded names like this, is kind of precarious.

Contributor

Maybe you can add self.assertFalse(os.path.exists("~/test_absolute_local_dir")) in setUp and shutil.rmtree("~/test_absolute_local_dir", ignore_errors=True) in tearDown or something similar?
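
The setUp/tearDown idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the test that was merged: AbsoluteLocalDirTest and testDirectoryIsCreated are hypothetical names, and os.makedirs stands in for the real tune.run(...) call. Note that os.path.exists() and shutil.rmtree() do not expand "~" themselves, so the sketch expands the path explicitly first.

```python
import os
import shutil
import unittest


class AbsoluteLocalDirTest(unittest.TestCase):
    # Hypothetical test class; the directory name is the one from this thread.
    local_dir = "~/test_absolute_local_dir"

    def setUp(self):
        # "~" is not expanded by os.path.exists(), so expand it explicitly.
        self.absolute_local_dir = os.path.abspath(
            os.path.expanduser(self.local_dir))
        # Fail early if the directory already exists, instead of silently
        # deleting something from the user's home directory.
        self.assertFalse(os.path.exists(self.absolute_local_dir))

    def tearDown(self):
        # Runs after each test method, whether it passed or failed.
        shutil.rmtree(self.absolute_local_dir, ignore_errors=True)

    def testDirectoryIsCreated(self):
        # Stand-in for the real training call, which is omitted here.
        os.makedirs(self.absolute_local_dir)
        self.assertTrue(os.path.isdir(self.absolute_local_dir))
```

If the assertion in setUp fails, unittest does not call tearDown, so a directory that existed before the test is left untouched.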

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/715/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14124/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/724/

@pengzhenghao
Contributor Author

I still don't know why the builds using Python 3.5 always get stuck. What makes them so special?

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14133/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/764/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14187/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1628/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15319/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16146/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16153/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16192/

@richardliaw richardliaw merged commit 983f3c8 into ray-project:master Aug 10, 2019
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/16206/

Successfully merging this pull request may close these issues.

[tune] Relative local_dir is not supported.
6 participants