Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow generating execution graphs from MPI runs #154

Merged
merged 6 commits into from
Oct 15, 2021
Merged

Conversation

csegarragonz
Copy link
Collaborator

@csegarragonz csegarragonz commented Oct 13, 2021

In this PR I add support to gneerate execution graphs from MPI runs.

Before only chained calls would log subsequent invocations (i.e. calls to sch.callFunctions) to redis.

Logging all invocations to callFunctions would prevent having more fixes like this, but I am not sure if we want to do it or not. Plus, we'd have to pass the current message id to the scheduler, but that's a minor issue I guess.

I also add a test where we check that the MPI world creation generates the expected execution graph (i.e. root process faning-out to size - 1 processes). Given that we don't have control over the messages that MPI uses to generate that graph, we are just able to imitate it in the expected graph for the test. In particular some fields like msg.id() will differ from the actual graph and the expected one. Thus we only compare the fields that MpiWorld::create() sets.

@csegarragonz csegarragonz self-assigned this Oct 13, 2021
Copy link
Collaborator

@Shillaker Shillaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test to check that the MPI calls record what we expect them to?

@csegarragonz csegarragonz force-pushed the exec-graph branch 4 times, most recently from c631709 to 2257f29 Compare October 15, 2021 08:53
@csegarragonz csegarragonz force-pushed the exec-graph branch 2 times, most recently from fb8c201 to ba74dcf Compare October 15, 2021 09:17
// Update the result for the master message
sch.setFunctionResult(msg);
// Wait for the scheduler to set the result on the MPI non-master messages
SLEEP_MS(500);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than sleeping here, can we instead call getFunctionResult for each message in turn? Although some sleeps still exist in the tests, we should try to avoid wherever possible.

REQUIRE(msgA.mpiworldid() == msgB.mpiworldid());
REQUIRE(msgA.mpirank() == msgB.mpirank());
REQUIRE(msgA.mpiworldsize() == msgB.mpiworldsize());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the need for this method, but is it really just doing the same thing as checkMessageEquality, just not checking the ID? Could we instead add a bool flag to checkMessageEquality, which is something like checkIDs which is true by default, then pass false when we need to do this sort of check?

Then we could change the parameter on checkExecGraphEquality to also be checkIDs with true default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we check only these fields is that these are the ones MPI changes, and thus we manually set them in the test.

Any other field we also wanted to check, we'd have to, afaict, manually add ourselves to each message.

Copy link
Collaborator

@Shillaker Shillaker Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we don't touch any of the others, is there any harm in checking them? They will all be uninitialised and therefore pass the checks won't they? We'll just be comparing lots of blank strings/ zeros etc.

I.e. can we avoid having to add a new comparison function to check every permutation of edits we might make to a message?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, for instance the executedHost or finishtimestamp fields are set by the scheduler when the chained functions finish (i.e. the world is destroyed).

Copy link
Collaborator

@Shillaker Shillaker Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the executed host not be the current host therefore we can check that? We deal with the finish timestamp issue in another tests here by just overriding it: https://github.com/faasm/faabric/blob/master/tests/test/scheduler/test_scheduler.cpp#L663

The reason I'm against having more than one message comparison function is that it's easy to forget to add fields to them, and if we just have one, then it's easier to track.

It's a sort of blacklisting/ whitelisting problem, where rather than saying "we want to check that this subset of fields is the same", we're saying "we only expect these fields to have changed, and all others to be the same". The latter is more strict (and more vebose), but catches problems more frequently as a result.

@csegarragonz csegarragonz merged commit ad8fcd3 into master Oct 15, 2021
@csegarragonz csegarragonz deleted the exec-graph branch October 15, 2021 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants