Change default MPI_BASE_PORT and turn into env. variable #150
All tests running on Knative with exactly 10 ranks would fail, claiming that port `9090` was already in use. Sure enough, ports `9090` and `9091` are used by `knative-serving` on the `faasm-worker` service.

The fact that this only happens with 10 ranks (i.e. `MPI_WORLD_SIZE = 10`) is because the ports we use are a combination of the outbound-inbound ranks and the world size.

To solve this problem, I change the default MPI base port to a port range that is apparently less cluttered:
`10800`. However, for us to actually benefit from this patch, we will now have to follow up in `faabric` and `faasm`, including the `faasm` version, and re-build containers.

To prevent this from occurring in the future, I also make
`MPI_BASE_PORT` an environment variable in the `faabric` config.

One may argue that the overarching problem is that, in a multi-tenant environment with big MPI jobs, we end up using too many ports, and errors like this will keep appearing. I could agree with that. Given that we will now start working on multi-tenant MPI jobs, it's about time to think about this in detail. I suggest we move this discussion offline.
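
To illustrate the collision described above, here is a minimal Python sketch of how a port derived from the rank pair and the world size can land on `9090` or `9091`. The mapping `base_port + send_rank * world_size + recv_rank` is an assumption made for illustration, not necessarily the formula `faabric` uses, and the base port `9000` in the usage note is a made-up value, not the previous default. The helper names `mpi_port` and `find_collisions` are hypothetical.

```python
from itertools import product

# Ports reported in this PR as taken by knative-serving on the faasm-worker service.
RESERVED_PORTS = frozenset({9090, 9091})


def mpi_port(base_port, world_size, send_rank, recv_rank):
    """Hypothetical rank-pair-to-port mapping (an assumption, not faabric's code)."""
    return base_port + send_rank * world_size + recv_rank


def find_collisions(base_port, world_size, reserved=RESERVED_PORTS):
    """Return sorted (port, send_rank, recv_rank) tuples that hit a reserved port."""
    hits = []
    for send_rank, recv_rank in product(range(world_size), repeat=2):
        port = mpi_port(base_port, world_size, send_rank, recv_rank)
        if port in reserved:
            hits.append((port, send_rank, recv_rank))
    return sorted(hits)
```

Under these assumptions, `find_collisions(9000, 10)` reports hits on both reserved ports, `find_collisions(9000, 9)` reports none, and `find_collisions(10800, 10)` reports none either. This is only meant to show why a particular world size can trigger the error while smaller ones do not.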
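
The environment-variable override mentioned above could look roughly like the following Python sketch. Only the variable name `MPI_BASE_PORT` and the default `10800` come from this PR; the function name `get_mpi_base_port`, the optional `env` argument, and the range check are assumptions added for illustration, and the actual `faabric` config is not written in Python.

```python
import os

# New default introduced by this PR.
DEFAULT_MPI_BASE_PORT = 10800


def get_mpi_base_port(env=None):
    """Read MPI_BASE_PORT from the environment, falling back to the default.

    `env` is a hypothetical hook for passing a plain dict instead of os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get("MPI_BASE_PORT")
    if raw is None or raw == "":
        return DEFAULT_MPI_BASE_PORT

    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"MPI_BASE_PORT out of range: {port}")
    return port
```

With a setup like this, a deployment that hits another port clash can set `MPI_BASE_PORT` to a different value without a code change.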