-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experiment scripts #447
Experiment scripts #447
Conversation
@@ -0,0 +1,163 @@ | |||
from PIL import Image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not copy this file, just reuse the file under demo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very much agree that it's good to avoid duplicate code. However the demo/train.py
will perhaps change for different reason and at different time comparing to mnist/train.py
, maybe they are not truly the duplicate code? (even they currently have exact same code, but in the future they will diverge)
For example, mnist/train.py
probably will never change after the experiment, but the demo/train.py
maybe will keep evolving (and breaking the experiment code if we only have one train.py
).
I think we need to avoid only the truly duplicate code, otherwise if two irrelevant components share the same code, it adds the unnecessary coupling of the two components.
Just my thoughts, open for discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mabye we can change the folder name for a general name, such as demo/job/...
, or someone will feel uncertain about the different this experiemnt/mnist
and demo/mnist
.
@@ -0,0 +1,169 @@ | |||
from PIL import Image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not copy this file, just reuse the file under demo
.
|
||
|
||
# NOTE: must change this to your own username on paddlecloud. | ||
USERNAME = "[email protected]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put mnist
dataset to the public folder so we don't need this hardcoded username anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
# NOTE: must change this to your own username on paddlecloud. | ||
USERNAME = "[email protected]" | ||
DC = os.getenv("PADDLE_CLOUD_CURRENT_DATACENTER") | ||
#common.DATA_HOME = "/pfs/%s/home/%s" % (DC, USERNAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this unintentionally commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, delete the unnecessary commented code.
#!/bin/bash | ||
DEFAULT_JOBNAME_PREFIX="mnist" | ||
|
||
function submit_general_job() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep submit_general_job
, but do we need to use it in the experiment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the TestCase1 of the design, we will compare the gernatl jobs and fault-tolerant jobs, so maybe we need this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I meant currently we are using tolerant job to indicate scaling the job or not. We can also do the experiment by just not scaling the fault tolerant job (but always start a tolerant job), so we do not need the general job. But that's fine. This is just a minor difference.
|
||
|
||
DC = os.getenv("PADDLE_CLOUD_CURRENT_DATACENTER") | ||
common.DATA_HOME = "/pfs/%s/public/idl/users/dl/paddlecloud/public/dataset" % DC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The especial path is only used for the internal CPU cluster, I will changed this one to a general path before merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM++!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fixed #446