
Executor interface design and implementation #4537

Merged: 64 commits into PaddlePaddle:develop, Oct 11, 2017

Conversation

@QiJune (Member) commented on Sep 30, 2017:

Fixes #4523 and #4557

Timeline (@tonyyang-svail):

Oct 3rd, Tuesday

  • Implement vector<Tensor> Executor::Run() (see the sketch after this list). It
    • Takes a ProgramDesc and a Scope
    • Creates a local scope
    • Runs the whole graph
    • Fetches the desired values
      • Possible test: init, add, then fetch
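A toy sketch of that Run flow. The types below are simplified stand-ins for the framework's Tensor, Scope, and operator classes, and the structure is illustrative only, not the final API:

    #include <memory>
    #include <vector>

    struct Tensor {};  // stand-in for framework::Tensor

    struct OperatorBase {
      virtual ~OperatorBase() = default;
      virtual void Run() = 0;  // the real op would take a Scope and a DeviceContext
    };

    struct Scope {  // stand-in: owns child scopes for local variables
      Scope* NewScope() {
        children.emplace_back(new Scope);
        return children.back().get();
      }
      std::vector<std::unique_ptr<Scope>> children;
    };

    // Takes a program (here just an ordered op list) and a scope, creates a
    // local scope, runs the whole graph, then returns the fetched values.
    std::vector<Tensor> Run(const std::vector<std::unique_ptr<OperatorBase>>& ops,
                            Scope* scope) {
      Scope* local = scope->NewScope();
      (void)local;                           // ops would read/write variables here
      for (const auto& op : ops) op->Run();  // sequential execution
      return {};                             // fetch ops would fill this vector
    }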

Oct 4th, Wednesday

  • Write the FeedOp and FetchOp design doc and get general feedback on it.

Oct 5th, Thursday

  • Reach a basic conclusion on FeedOp and FetchOp.
  • Implement FeedOp (by @QiJune)
    • Note: prepend FeedOp to the ProgramDesc, so that it is run first.
  • Implement FetchOp (by @QiJune)
  • Implement the first version of Prune (see the sketch after this list). It
    • Takes a const ProgramDesc& as input (uses FeedOp to find feeds and FetchOp to find targets)
    • Returns a vector<bool> indicating whether each op should be run
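A self-contained sketch of the pruning idea: walk the op list backwards, keep every fetch op, and keep any op whose output is needed by an op already kept. OpInfo here is a simplified stand-in for the real op descriptor:

    #include <set>
    #include <string>
    #include <vector>

    struct OpInfo {                       // stand-in for an OpDesc entry
      std::string type;                   // e.g. "feed", "fetch", "mul", "add"
      std::vector<std::string> inputs;    // input variable names
      std::vector<std::string> outputs;   // output variable names
    };

    std::vector<bool> Prune(const std::vector<OpInfo>& ops) {
      std::vector<bool> should_run(ops.size(), false);
      std::set<std::string> needed;       // variables required by kept ops
      for (int i = static_cast<int>(ops.size()) - 1; i >= 0; --i) {
        bool keep = (ops[i].type == "fetch");      // fetch ops define the targets
        for (const auto& out : ops[i].outputs) {
          keep = keep || (needed.count(out) > 0);  // some kept op needs this output
        }
        if (keep) {
          should_run[i] = true;
          needed.insert(ops[i].inputs.begin(), ops[i].inputs.end());
        }
      }
      return should_run;
    }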

Oct 6th, Friday

  • Make the GPU unit test work (by @QiJune)
  • Test Prune on a simple graph

Oct 7th, Saturday

  • Refactor executor_test.cc
    • FeedOp and FetchOp
      • Random number generator
      • Independent of the Graph
  • Draft executor design doc, which contains
    • Run()
    • Preprocessing()
  • Simplify the C++ OpDesc creation procedure

Oct 8th, Sunday

Oct 9th, Monday

  • Merge the FeedOp and FetchOp design doc (#4599)
  • Integrate InitOp into ProgramDesc (more discussion needed on where to put the init op)
  • Milestone: pass a simple test running forward and backward multiple times.

return new ExecutorImpl(GetScope(), GetDevice(place), &pdesc, is_linear);
}

void ExecutorImpl::Run() {
Contributor commented:

I think one of the most important things for the executor is that Run should be thread-safe (e.g., it should be OK to do concurrent Runs). This is a must for inference.

Member Author (@QiJune) replied:

That's a good question. We must allow concurrent Runs in inference with only one copy of the parameters in memory. I am also thinking about who should do parameter loading/saving, and where. Our topology can be serialized to a ProgramDesc, but how will the parameters be serialized?
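One common way to get there (an assumption on my part, not a decision made in this PR) is to keep the single copy of parameters in a shared, read-only parent scope and give each concurrent Run its own child scope for intermediates. A toy sketch with stand-in types and made-up variable names:

    #include <map>
    #include <memory>
    #include <string>

    struct Variable {};  // stand-in for a parameter or intermediate tensor

    struct Scope {
      explicit Scope(const Scope* parent = nullptr) : parent_(parent) {}
      Variable* FindVar(const std::string& name) const {
        auto it = vars_.find(name);
        if (it != vars_.end()) return it->second.get();
        return parent_ ? parent_->FindVar(name) : nullptr;  // fall back to the parent
      }
      Variable* NewVar(const std::string& name) {
        vars_[name] = std::make_unique<Variable>();
        return vars_[name].get();
      }
     private:
      const Scope* parent_;
      std::map<std::string, std::unique_ptr<Variable>> vars_;
    };

    // Each concurrent Run writes only to its own child scope; the parent scope
    // holding the parameters is only read, so the hot path needs no locking.
    void RunOnce(const Scope& param_scope) {
      Scope local(&param_scope);
      local.NewVar("hidden_out");         // per-run intermediate variable
      param_scope.FindVar("fc_weight");   // shared, read-only parameter lookup
    }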


ProgramDescView* ProgramDescView::Create(bool is_linear) {
if (is_linear) {
return new LinearListView();
Contributor commented:

Why do we need LinearListView since we already have GraphView?

Member Author (@QiJune) replied:

  • LinearListView organizes the topology as a linear list, so operators are executed sequentially.
  • GraphView organizes the topology as a graph, so further optimization can be applied based on the graph structure.
    I think we can have LinearListView for now; GraphView can be implemented later (a rough sketch of the two views follows).
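A rough sketch of how the two views could differ structurally. OpDesc here is only a placeholder; the real classes would wrap the framework's op descriptions:

    #include <vector>

    struct OpDesc {};  // placeholder for the framework's op description

    // Linear list: ops are kept (and later executed) strictly in program order.
    struct LinearListView {
      std::vector<const OpDesc*> ops;
    };

    // Graph: edges record producer/consumer relations between ops, which makes
    // reordering, pruning, or parallel scheduling possible before execution.
    struct GraphView {
      struct Node {
        const OpDesc* op;
        std::vector<Node*> consumers;  // ops that consume this op's outputs
      };
      std::vector<Node> nodes;
    };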

@tonyyang-svail mentioned this pull request on Oct 2, 2017.
class LinearListView;
class GraphView;

// Immutable view of a ProgramDesc organized for efficient execution.
Collaborator commented:

This has already been implemented in framework/op_desc.h.

virtual void Run() = 0;
};

Executor* NewLocalExecutor(const platform::Place&, const ProgramDesc&, bool);
Collaborator commented:

Rename and redesign NewLocalExecutor into

NewExecutor(const std::vector<Place>& places);
  • No need for the bool optimize parameter.
  • ProgramDesc is a parameter to Executor::Run.


class Executor {
public:
virtual ~Executor() {}
Collaborator commented:

@helinwang has a suggestion: given that the construction of an executor could be expensive (including the creation of thread pools), it would be reasonable to reuse one executor to run multiple ProgramDescs.

Therefore we need the constructor:

Executor(const std::vector<Place>& places);

and the Run method:

virtual void Run(const ProgramDesc& program, Scope* scope);
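Putting the two pieces together, the interface would look roughly like this, assuming the framework's Place, DeviceContext, ProgramDesc, and Scope types; the private member is an assumption for illustration only:

    class Executor {
     public:
      // Construction may be expensive (device contexts, thread pools, ...),
      // so one executor instance is reused across many programs.
      explicit Executor(const std::vector<platform::Place>& places);
      virtual ~Executor() {}

      // The program to execute is a parameter of Run, not of the constructor.
      virtual void Run(const ProgramDesc& program, Scope* scope);

     private:
      std::vector<platform::DeviceContext*> device_contexts_;  // assumed member
    };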

public:
FeedOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddAttr<int>("data_type", "output data type")
Contributor commented:

Please make sure that all Input, Output and Attribute names adhere to the naming convention. https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/name_convention.md

Member Author (@QiJune) replied:

Done


PADDLE_ENFORCE_GT(tensors.size(), static_cast<size_t>(col));
auto in_dim = tensors[col].dims();
ctx->SetOutputDim("Out", in_dim);

Reviewer commented:

@QiJune Should we use the dims attribute to infer the output shape?

Member Author (@QiJune) replied on Oct 10, 2017:

Yes, I fixed it.

Reviewer replied:

Could you also add an enforce that (*tensors)[col].numel() == product(dims)? Chances are a user specifies the wrong col. (A sketch of such a check follows.)
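For instance, something along these lines (a sketch only; the exact enforce macro and helper spellings may differ from what the framework provides):

    // dims: the attribute value; tensors: the feed inputs indexed by col.
    PADDLE_ENFORCE_EQ((*tensors)[col].numel(), framework::product(dims),
                      "Feed tensor at column %d does not match the dims attribute.",
                      col);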


auto input_dim = ctx->GetInputDim("Input");
PADDLE_ENFORCE_GT(tensors->size(), col);
(*tensors)[col].Resize(input_dim);

Reviewer commented:

@QiJune The same applies to the fetch op: InferShape according to the dims attribute.


Then enforce (*tensors)[col].numel() == product(dims).

Member Author (@QiJune) replied:

@tonyyang-svail
I had a discussion with @jacquesqiao: InferShape will be done at compile time first, so the InferShape of FeedOp and FetchOp cannot use run-time concepts such as GlobalScope.
FeedOp has a dims attribute to set its output tensor dims, and FetchOp does not need a dims attribute; the dims of the Tensors in fetch_result can be set from FetchOp's input tensor. (A sketch of the FeedOp case follows.)

* @return
* vector<bool> Same size as ops. Indicates whether an op should be run.
*/
std::vector<bool> Prune(const ProgramDesc& pdesc, int block_id);
Contributor commented:

Prune is a high-level optimization step, which should be done before the executor runs. The executor should take a program with no redundant op groups.

std::vector<bool> should_run = Prune(pdesc, block_id);
PADDLE_ENFORCE_EQ(should_run.size(), block.ops_size());
for (size_t i = 0; i < should_run.size(); ++i) {
// if (should_run[i]) {
Contributor commented:

No commented-out code, please.

PADDLE_ENFORCE_EQ(should_run.size(), block.ops_size());
for (size_t i = 0; i < should_run.size(); ++i) {
// if (should_run[i]) {
if (true) {
Contributor commented:

It's better to add a named constant assigned to true; otherwise, this is a magic value.

USE_OP(fill_constant);
USE_OP(sgd);

using std::string;
Contributor commented:

This seems unused?

};

/* @Brief
* Pruning the graph
Collaborator commented:

What is the purpose of Prune? I had thought it would need a target parameter, which could be either a variable or an operator, and that it would return a new ProgramDesc that includes only the dependent operators. But why doesn't the following code take a target parameter?

explicit Executor(const std::vector<platform::Place>& places);
~Executor();

/* @Brief
Collaborator commented:

I think the C++ code is the document, and we don't really need to use Doxygen. Therefore, we can write much shorter comments. For this specific case, Executor::Run, I don't think it even needs a comment.

@wangkuiyi (Collaborator) left a review:

I am approving this PR so that it doesn't drag on for too long, but please consider my comments in #4537 (comment).

In my mind,

  1. ProgramDesc shouldn't carry targets, because a program includes all the instructions that are supposed to be executed.

  2. The Prune function's signature should be (a usage sketch follows the signature):

    int/bool Prune(
        const ProgramDesc* input, 
        const std::vector<std::string>& targets, 
        ProgramDesc* output);
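For illustration, calling Prune with that signature would look roughly like this; the variable names and fetch targets are made up:

    ProgramDesc pruned;
    // Keep only the ops that the listed fetch targets depend on.
    bool ok = Prune(&whole_program, {"cost", "accuracy"}, &pruned);
    PADDLE_ENFORCE(ok, "Prune failed");
    executor.Run(pruned, &scope);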

@tonyyang-svail changed the title from "(WIP)Executor interface design and implementation" to "Executor interface design and implementation" on Oct 11, 2017.
@tonyyang-svail merged commit c3bf332 into PaddlePaddle:develop on Oct 11, 2017.