MPI Standalone Implementation - First Bits #46

Merged 27 commits into faasm:master on Jan 21, 2021

Conversation

@csegarragonz (Collaborator) commented Dec 15, 2020

This PR is the first step towards running standalone MPI applications (#42). It includes a simple executor implementation for MPI methods, consisting of a single execution thread in a singleton thread pool. I also add support for running both the thread pool and the endpoint either in the background or in the foreground (depending on where we want our program to hang).

As of now, the way to port an MPI application is to replace the int main line with a custom function (faabric::executor::mpiFunc) and prepend a different main entrypoint (this could be wrapped in a macro), as sketched below.
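
For illustration, here is a minimal sketch of what a ported hello world could look like. The mpiFunc name is from this PR; the header path, the function signature, and the SingletonPool bootstrap in main are assumptions based on the discussion below, not a confirmed interface.

// Sketch only: the header path, signatures, and the SingletonPool API
// are assumptions based on the discussion in this PR
#include <faabric/executor/FaabricPool.h>

#include <cstdio>
#include <mpi.h>

// The application's original main() body moves into the custom
// entrypoint, declared in the faabric headers
int faabric::executor::mpiFunc()
{
    MPI_Init(nullptr, nullptr);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello from rank %i\n", rank);

    MPI_Finalize();
    return 0;
}

// A new main() entrypoint is prepended to bootstrap execution (this is
// the part that could be wrapped in a macro)
int main()
{
    // Hypothetical bootstrap: start the singleton thread pool, which
    // runs mpiFunc on its single executor thread
    faabric::executor::SingletonPool pool;
    pool.startPool();

    return 0;
}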

To test the hello world program, from a fresh instance:

inv container.build-mpi-native

# outside the container
# modify the values in ./docker/mpi_native.env
./bin/run_mpi_native.sh

To-Do:

  • Compilation of mpi_helloworld
  • Execution of mpi_helloworld
  • Scalability test of mpi_helloworld
  • Regression test

After this gets merged in:

To be discussed:

  • Running invoke from outside the container is a bit dodgy; migrate to a standalone script?
  • Wrap all code in a macro?

@csegarragonz added the enhancement label Dec 15, 2020
@csegarragonz self-assigned this Dec 15, 2020
@csegarragonz marked this pull request as draft December 15, 2020 13:08
@csegarragonz force-pushed the standalone-mpi branch 2 times, most recently from fa20d52 to 8b6251a on January 15, 2021 08:16
@csegarragonz marked this pull request as ready for review January 15, 2021 14:23
@@ -30,6 +30,7 @@ set(PROTOC_EXE /usr/local/bin/protoc)
set(GRPC_PLUGIN /usr/local/bin/grpc_cpp_plugin)

# Pistache
link_directories(${CMAKE_INSTALL_PREFIX}/lib)
Collaborator commented:

Modifying the global CMake context like this should be avoided where possible, as it's changing it for all targets, and can lead to hard-to-diagnose build issues if this produces a conflict (e.g. if someone has installed pistache natively and there are two versions available). The CMake docs advise against using link_directories unless absolutely necessary (https://cmake.org/cmake/help/latest/command/link_directories.html).

As it is, this is not technically related to pistache either, so if it were to stay in, it should live in the top-level CMake file.

Can you look at the grey box in the docs page and try the alternatives they list there?

Review threads on docker-compose.yml and docs/native_mpi.md (outdated, resolved)
int port = faabric::util::getSystemConfig().endpointPort;
int threadCount = faabric::util::getSystemConfig().endpointNumThreads;
std::thread servingThread;
std::unique_ptr<Pistache::Http::Endpoint> httpEndpoint;
Collaborator commented:

Why do we need the modifications to the endpoint here? I would have thought we'd run the endpoint in the foreground or not run it at all. Is it needed for this native MPI stuff?

@csegarragonz (Author) commented Jan 18, 2021:

The endpoint is currently used to bootstrap execution. That is, before instantiating the SingletonPool object that starts the thread pool, state server, and HTTP endpoint, we make a call to the scheduler in examples/mpi_helloworld.cpp:35; this in turn loads the function defined in faabric::executor::mpiFunc, and at that point we don't need the endpoint anymore.

If running the HTTP endpoint in the background is something we definitely won't need down the road, then these changes may add complexity for a single call, but as previously mentioned they are needed for initialisation.

Alternatively, we could try and come up with a different bootstrapping scheme offline (that does not rely on the handling of http requests).
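
For concreteness, a hedged sketch of that scheduler call. The helper names (messageFactory, callFunction) follow faabric conventions of the time but are assumptions here, not verified against this PR.

// Assumed faabric API: the bootstrap call made in
// examples/mpi_helloworld.cpp before the SingletonPool starts
#include <faabric/scheduler/Scheduler.h>
#include <faabric/util/func.h>

void bootstrapMpi(int worldSize)
{
    faabric::Message msg = faabric::util::messageFactory("mpi", "hello");
    msg.set_mpiworldsize(worldSize);

    // The scheduler call alone triggers execution of mpiFunc on the
    // thread pool; no HTTP request handling is involved
    faabric::scheduler::getScheduler().callFunction(msg);
}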

@Shillaker (Collaborator) commented Jan 18, 2021:

I can't see where the call to the endpoint is actually made. I can see that the scheduler is called, but I don't think the endpoint needs to be running for the scheduler to work. IIRC the endpoint is basically just an HTTP wrapper around the scheduler that doesn't do much else. What does the endpoint add in this case and where in mpi_helloworld.cpp is the endpoint actually used?

@csegarragonz (Author) replied:

You are completely right about this. I misunderstood some of the endpoint functionality; I really don't need it at all. I will revert to the implementation in master.

Resolved (outdated) review threads: CMakeLists.txt, tasks/__init__.py, and four on tasks/mpi.py
@csegarragonz force-pushed the standalone-mpi branch 2 times, most recently from 741b446 to 883c62f on January 18, 2021 15:25
@csegarragonz linked an issue on Jan 19, 2021 that may be closed by this pull request
@csegarragonz force-pushed the standalone-mpi branch 6 times, most recently from a4a26fc to 4a4904f on January 20, 2021 06:46
@Shillaker (Collaborator) left a comment:

Couple of minor requests

add_dependencies(pistache pistache_ext)
set_target_properties(pistache
    PROPERTIES IMPORTED_LOCATION ${CMAKE_INSTALL_PREFIX}/lib/libpistache.so
)
Collaborator commented:

Nice, this looks relatively clean. Can you move faasm and faasm-toolchain to this version of Faabric to check they both still work after (or before) merging this?

@csegarragonz (Author) replied:

Here is faasm's and here the toolchain's. Should we merge those PRs?

Note that the toolchain one would not pass due to some mismatch in the submodule build (which I am looking into), but afaik everything else worked fine as far as this PR is concerned.

Review thread on docs/native_mpi.md (resolved)
@Shillaker (Collaborator) left a comment:

Tiny typo, then good to go

@task
def push_mpi_native(ctx):
    """
    Push current version of gRPC container
Collaborator commented:

Tiny typo

@csegarragonz merged commit 44e913c into faasm:master Jan 21, 2021
@csegarragonz deleted the standalone-mpi branch January 21, 2021 16:39
Labels: enhancement
Projects: none yet
Development: successfully merging this pull request may close [MPI] Centralize Logging
Participants: 2