Experiment scripts #447

Yancey1989 · 2017-10-27T11:01:28Z

Fixed #446

typhoonzero · 2017-10-27T11:08:37Z

doc/autoscale/experiment/mnist/train.py

@@ -0,0 +1,163 @@
+from PIL import Image


Do not copy this file; just reuse the file under demo.

Very much agree that it's good to avoid duplicate code. However, demo/train.py will probably change for different reasons and at different times than mnist/train.py, so maybe they are not truly duplicate code? (Even though they currently contain exactly the same code, they will diverge in the future.)

For example, mnist/train.py will probably never change after the experiment, but demo/train.py may keep evolving (and would break the experiment code if we had only one train.py).

I think we only need to avoid truly duplicate code; otherwise, if two unrelated components share the same code, it adds unnecessary coupling between them.

Just my thoughts, open for discussion.

Maybe we can change the folder name to a more general name, such as demo/job/..., or someone will be unsure about the difference between this experiment/mnist and demo/mnist.

typhoonzero · 2017-10-27T11:08:50Z

doc/autoscale/experiment/mnist/train_ft.py

@@ -0,0 +1,169 @@
+from PIL import Image


Do not copy this file; just reuse the file under demo.

helinwang · 2017-10-27T21:44:01Z

doc/autoscale/experiment/mnist/train_ft.py

+
+
+# NOTE: must change this to your own username on paddlecloud.
+USERNAME = "[email protected]"


Can we put the mnist dataset in the public folder so we don't need this hardcoded username anymore?

helinwang · 2017-10-27T22:00:27Z

doc/autoscale/experiment/mnist/train.py

+# NOTE: must change this to your own username on paddlecloud.
+USERNAME = "[email protected]"
+DC = os.getenv("PADDLE_CLOUD_CURRENT_DATACENTER")
+#common.DATA_HOME = "/pfs/%s/home/%s" % (DC, USERNAME)


Is this unintentionally commented out?

Done, deleted the unnecessary commented-out code.

helinwang · 2017-10-28T00:33:12Z

doc/autoscale/experiment/control_case1.sh

+#!/bin/bash
+DEFAULT_JOBNAME_PREFIX="mnist"
+
+function submit_general_job() {


We can keep submit_general_job, but do we need to use it in the experiment?

Following TestCase1 of the design, we will compare general jobs and fault-tolerant jobs, so maybe we need this function?

I see. I meant that currently we use whether a job is fault-tolerant to indicate whether it gets scaled. We could also run the experiment by always starting a fault-tolerant job and simply not scaling it, so we would not need the general job. But that's fine. This is just a minor difference.

Yancey1989 · 2017-10-28T19:18:15Z

doc/autoscale/experiment/mnist/train.py

+
+
+DC = os.getenv("PADDLE_CLOUD_CURRENT_DATACENTER")
+common.DATA_HOME = "/pfs/%s/public/idl/users/dl/paddlecloud/public/dataset" % DC


This special path is only used for the internal CPU cluster; I will change it to a general path before merging.
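One possible shape for such a general path, as a rough sketch only and not what was merged: resolve the dataset directory from the environment so neither a username nor the internal path has to be edited by hand. The `EXPERIMENT_DATA_HOME` variable, the `resolve_data_home` helper, and the local fallback directory are hypothetical names introduced for illustration; only `PADDLE_CLOUD_CURRENT_DATACENTER` and the path template come from the diff above.

```python
import os

# Path template copied from the diff above; it is only valid on the internal cluster.
PUBLIC_TEMPLATE = "/pfs/%s/public/idl/users/dl/paddlecloud/public/dataset"


def resolve_data_home(env=None):
    """Pick a dataset directory without hardcoding a username."""
    env = os.environ if env is None else env

    # Hypothetical override variable (not part of this PR): lets a user
    # point the experiment at any directory.
    override = env.get("EXPERIMENT_DATA_HOME")
    if override:
        return override

    # On the cluster, build the public dataset path from the datacenter name.
    dc = env.get("PADDLE_CLOUD_CURRENT_DATACENTER")
    if dc:
        return PUBLIC_TEMPLATE % dc

    # Hypothetical local fallback for runs outside the cluster.
    return os.path.join(os.path.expanduser("~"), "dataset")
```

In train.py this would replace the hardcoded assignment with something like `common.DATA_HOME = resolve_data_home()`, assuming `common` is the dataset module already imported there.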

helinwang

LGTM++!

typhoonzero

LGTM!

typhoonzero

LGTM

Yancey1989 added 2 commits October 27, 2017 16:15

add some scripts to submit experiment jobs

f26a8d1

submit ft jobs

add7e13

Yancey1989 requested review from helinwang and typhoonzero October 27, 2017 11:01

update

a345063

typhoonzero reviewed Oct 27, 2017

Yancey1989 mentioned this pull request Oct 27, 2017

Autoscaling Experiment. #399

Closed

helinwang reviewed Oct 27, 2017

helinwang reviewed Oct 28, 2017

support multiple passes

2b50765

Yancey1989 commented Oct 28, 2017

helinwang previously approved these changes Oct 29, 2017

delete trainer id in train_ft.py and job docker image

ea08eca

Yancey1989 dismissed helinwang’s stale review via ea08eca October 29, 2017 15:44

typhoonzero previously approved these changes Oct 30, 2017

update

05e2e64

Yancey1989 dismissed typhoonzero’s stale review via 05e2e64 October 30, 2017 02:13

typhoonzero approved these changes Oct 30, 2017

Yancey1989 merged commit 77a76f2 into PaddlePaddle:develop Oct 30, 2017

Yancey1989 deleted the experiment_scripts branch October 30, 2017 02:19
