
Runtime failures on larger process counts #66

Open
krzikalla opened this issue Jul 29, 2021 · 2 comments

Comments

@krzikalla

I run into all kinds of trouble when I try running GPI-2 1.5 with somewhat larger process counts. Up to 128 processes everything is fine, but starting with 256 processes the program stops with all kinds of unreproducible errors. I have tried this on two InfiniBand clusters.

Has something changed from 1.3 to 1.5 so that a gaspi_proc_init with GASPI_TOPOLOGY_STATIC is no longer advisable for those process counts? And if I use GASPI_TOPOLOGY_NONE and connect only the neighbors by hand, can I still use the GASPI collectives?
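For reference, a minimal sketch of the GASPI_TOPOLOGY_NONE variant in question, assuming GPI-2's gaspi_config_t / build_infrastructure interface; the left/right neighbor ranks are placeholders and error checking is omitted:

#include "GASPI.h"

// Sketch only: disable the static connection build at init time and connect
// selected neighbors by hand. Whether collectives over GASPI_GROUP_ALL still
// work in this mode is exactly the open question above.
void InitWithoutStaticTopology(gaspi_rank_t left, gaspi_rank_t right)
{
  gaspi_config_t config;
  gaspi_config_get(&config);
  config.build_infrastructure = GASPI_TOPOLOGY_NONE;  // no connections at init
  gaspi_config_set(config);                           // must precede gaspi_proc_init

  gaspi_proc_init(GASPI_BLOCK);

  // Connect only the ranks this process actually communicates with.
  gaspi_connect(left, GASPI_BLOCK);
  gaspi_connect(right, GASPI_BLOCK);
}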

@krzikalla
Author

The following program fails reliably with 1024 processes on our cluster. Could someone please look at it? Apparently something in the collectives is broken (GPI-2 v1.5.0), or else it is my AllGatherValueImpl function.

//  mpicxx gaspi_segment.cpp -pthread -I$GPI2_HOME/include -L$GPI2_HOME/lib64 -lGPI2

#include <algorithm>   // std::fill_n, std::min
#include <iostream>
#include <stdexcept>   // std::runtime_error
#include <vector>
#include <mpi.h>
#include "GASPI.h"


// Throw on any GASPI error; GASPI_TIMEOUT is passed through as a non-fatal result.
inline gaspi_return_t CheckGaspiResult(gaspi_return_t result, const char* what)
{
  if (result != GASPI_SUCCESS && result != GASPI_TIMEOUT)
  {
    throw std::runtime_error(what);
  }
  return result;
}

#define GASPI_CHECK( X ) CheckGaspiResult((X), #X)


using RankIndexT = unsigned int;

struct GASPICommunicator
{
  gaspi_rank_t numProcs_;
  gaspi_rank_t ownRank_;
  gaspi_number_t maxReduceElems_;

  GASPICommunicator()
  {
    GASPI_CHECK(gaspi_allreduce_elem_max(&maxReduceElems_));
    GASPI_CHECK(gaspi_proc_rank(&ownRank_));
    GASPI_CHECK(gaspi_proc_num(&numProcs_));
  }

  // Emulates an allgather of one unsigned int per rank: every rank writes its
  // value into its own slot of a zero-initialized buffer and the buffers are
  // summed with gaspi_allreduce. Since the buffer holds numProcs_ elements,
  // the reduction is split into chunks of at most maxReduceElems_ elements.
  void AllGatherValueImpl(const unsigned int* values, unsigned int* data)
  {
    std::fill_n(data, numProcs_, 0u);
    data[ownRank_] = *values;
    gaspi_number_t remainingElems = gaspi_number_t(numProcs_);
    while (remainingElems > 0)
    {
      auto reduceElems = std::min(maxReduceElems_, remainingElems);
      GASPI_CHECK(gaspi_allreduce(data, data, reduceElems, GASPI_OP_SUM, GASPI_TYPE_UINT,
                                  GASPI_GROUP_ALL, GASPI_BLOCK));
      remainingElems -= reduceElems;
      data += reduceElems;
    }
  }
};

void CheckAllreduce()
{
  GASPICommunicator communicator;
  unsigned int value = communicator.ownRank_;
  // Initialize with an impossible rank value so missing entries are visible.
  std::vector<unsigned int> allData(communicator.numProcs_, ~0u);
  communicator.AllGatherValueImpl(&value, allData.data());
  // After the gather, slot i must contain rank i on every process.
  for (gaspi_rank_t i = 0; i < communicator.numProcs_; ++i)
  {
    if (allData[i] != i)
    {
      std::cout << "At rank " << value << " first fail at " << i << ", content is " << allData[i] << std::endl;
      return;
    }
  }
  std::cout << "At rank " << value << " all OK." << std::endl;
}


int main(int argc, char** argv)
{
  // MPI is initialized first so that mpirun can be used as the launcher.
  int provided_thread_level;
  if (MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided_thread_level) != MPI_SUCCESS)
  {
    return 1;
  }

  GASPI_CHECK(gaspi_proc_init(GASPI_BLOCK));

  CheckAllreduce();

  GASPI_CHECK(gaspi_proc_term(GASPI_BLOCK));
  MPI_Finalize();
  return 0;
}

@krzikalla
Author

Update on this issue: the cause seems to be the PCI_WR_ORDERING firmware setting. If it is set to per_mkey(0), everything is fine. If it is set to force_relax(1), races occur.
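For reference, a sketch of how this setting can be inspected and reverted with Mellanox's mlxconfig tool; the device path below is a placeholder, and the change typically only takes effect after a firmware reset or reboot:

# Placeholder device path; list devices with "mst status" first.
mlxconfig -d /dev/mst/mt4123_pciconf0 query PCI_WR_ORDERING
mlxconfig -d /dev/mst/mt4123_pciconf0 set PCI_WR_ORDERING=0   # per_mkey(0)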
