Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix several bugs in the team_split_2D operation #920

Merged
24 changes: 16 additions & 8 deletions src/shmem_team.c
Original file line number Diff line number Diff line change
Expand Up @@ -371,33 +371,41 @@ int shmem_internal_team_split_2d(shmem_internal_team_t *parent_team, int xrange,
const int num_xteams = ceil( parent_size / (float)xrange );
const int num_yteams = xrange;

int start = parent_start;
int start = 0;
int ret = 0;

shmem_internal_team_t *my_xteam, *my_yteam;

for (int i = 0; i < num_xteams; i++) {
int xsize = (i == num_xteams - 1 && parent_size % xrange) ? parent_size % xrange : xrange;

ret = shmem_internal_team_split_strided(parent_team, start, parent_stride,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the parent team's stride the correct thing to use here? I suspect we need some more unit tests to cover the various cases in here, for example where the parent team stride is more than 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I think it's correct. I believe the "3D" split test (spec-examples/shmem_team_split_2D.c) covers this case during the 2nd split: In the 1st split, the parent stride (team_world) is 1, but in the 2nd split, the parent team has a stride of 3 in the x-splits, and a stride of 12 in the y-splits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the stride argument supposed to be relative to the PE numbers of the parent team? And isn't the parent team's stride relative to the global PE number space?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any unit tests (in the branch of this PR, ignoring the awful spec test) that call shmem_team_split_2d?

Copy link
Member Author

@davidozog davidozog Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Yes, and... Yes?

This stride is relative to the parent team, which is relative to the global PE space... maybe we should discuss in person, since I don't see an alternative stride that would make sense here.

Right, there are no unit tests with split_2d after moving the spec example, but I'm working on a 2D heat program, which might actually be a better fit in apps? Unfortunately, I don't believe a 2D heat example would exercise a non-unit stride in the parent, IIUC, but I believe the 3D split does.

xsize, xaxis_config, xaxis_mask, xaxis_team);
xsize, xaxis_config, xaxis_mask, &my_xteam);
if (ret) {
RAISE_ERROR_MSG("Creation of x-axis team %d of %d failed\n", i, num_xteams);
RAISE_ERROR_MSG("Creation of x-axis team %d of %d failed\n", i+1, num_xteams);
}
start += xrange * parent_stride;
start += xrange;

if (my_xteam != SHMEMX_TEAM_INVALID)
*xaxis_team = my_xteam;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we want to add an assert here to ensure that xaxis_team is not assigned to more than once? Also, you can move the declaration of my_xteam into the loop body above and my_yteam into the loop body below. It looks like neither should be referenced outside of the loop body.

}

start = parent_start;
start = 0;

for (int i = 0; i < num_yteams; i++) {
int remainder = parent_size % xrange;
int yrange = parent_size / xrange;
int ysize = (remainder && i < remainder) ? yrange + 1 : yrange;

ret = shmem_internal_team_split_strided(parent_team, start, xrange*parent_stride,
ysize, yaxis_config, yaxis_mask, yaxis_team);
ysize, yaxis_config, yaxis_mask, &my_yteam);
if (ret) {
RAISE_ERROR_MSG("Creation of y-axis team %d of %d failed\n", i, num_yteams);
RAISE_ERROR_MSG("Creation of y-axis team %d of %d failed\n", i+1, num_yteams);
}
start += parent_stride;
start += 1;

if (my_yteam != SHMEMX_TEAM_INVALID)
*yaxis_team = my_yteam;
}

long *psync = shmem_internal_team_choose_psync(parent_team, SYNC);
Expand Down