Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiment scripts #447

Merged
merged 6 commits into from
Oct 30, 2017
Merged

Conversation

Yancey1989
Copy link
Collaborator

Fixed #446

@@ -0,0 +1,163 @@
from PIL import Image
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not copy this file, just reuse the file under demo

Copy link
Collaborator

@helinwang helinwang Oct 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very much agree that it's good to avoid duplicate code. However the demo/train.py will perhaps change for different reason and at different time comparing to mnist/train.py, maybe they are not truly the duplicate code? (even they currently have exact same code, but in the future they will diverge)

For example, mnist/train.py probably will never change after the experiment, but the demo/train.py maybe will keep evolving (and breaking the experiment code if we only have one train.py).

I think we need to avoid only the truly duplicate code, otherwise if two irrelevant components share the same code, it adds the unnecessary coupling of the two components.

Just my thoughts, open for discussion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mabye we can change the folder name for a general name, such as demo/job/..., or someone will feel uncertain about the different this experiemnt/mnist and demo/mnist.

@@ -0,0 +1,169 @@
from PIL import Image
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not copy this file, just reuse the file under demo.



# NOTE: must change this to your own username on paddlecloud.
USERNAME = "[email protected]"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put mnist dataset to the public folder so we don't need this hardcoded username anymore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# NOTE: must change this to your own username on paddlecloud.
USERNAME = "[email protected]"
DC = os.getenv("PADDLE_CLOUD_CURRENT_DATACENTER")
#common.DATA_HOME = "/pfs/%s/home/%s" % (DC, USERNAME)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this unintentionally commented out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, delete the unnecessary commented code.

#!/bin/bash
DEFAULT_JOBNAME_PREFIX="mnist"

function submit_general_job() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep submit_general_job, but do we need to use it in the experiment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the TestCase1 of the design, we will compare the gernatl jobs and fault-tolerant jobs, so maybe we need this function?

Copy link
Collaborator

@helinwang helinwang Oct 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I meant currently we are using tolerant job to indicate scaling the job or not. We can also do the experiment by just not scaling the fault tolerant job (but always start a tolerant job), so we do not need the general job. But that's fine. This is just a minor difference.



DC = os.getenv("PADDLE_CLOUD_CURRENT_DATACENTER")
common.DATA_HOME = "/pfs/%s/public/idl/users/dl/paddlecloud/public/dataset" % DC
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The especial path is only used for the internal CPU cluster, I will changed this one to a general path before merged.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

helinwang
helinwang previously approved these changes Oct 29, 2017
Copy link
Collaborator

@helinwang helinwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM++!

typhoonzero
typhoonzero previously approved these changes Oct 30, 2017
Copy link
Collaborator

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Collaborator

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Yancey1989 Yancey1989 merged commit 77a76f2 into PaddlePaddle:develop Oct 30, 2017
@Yancey1989 Yancey1989 deleted the experiment_scripts branch October 30, 2017 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants