
Defect: remote indirect addressing does not work #427

Closed
sfilippone opened this issue Aug 8, 2017 · 24 comments · Fixed by #528
@sfilippone

Is remote indirect addressing supposed to work as of the current development status?
The key statement is

    xv(loc_idx(1:nhl,ip)) = xv(rmt_idx(1:nhl,ip))[xchg(ip)]

which is a (very) simplified version of the so-called halo exchange procedure.
Without remote indirect addressing I am forced to use a two-sided approach, with packing/transfer/unpacking.
On a related note, how is the implementation going to handle this? Is there any hope that it will be efficient? E.g., are the indices going to travel across the network? Is this going to be translated into multiple accesses? That would kill performance, preventing the use of one-sided communication in a very important scenario: PDEs on unstructured meshes.
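For reference, the two-sided pack/transfer/unpack fallback mentioned above can be sketched as follows. This is an illustrative Python model, not code from the issue: plain lists stand in for per-image coarray buffers, the direct list copies stand in for the MPI send/recv step, and `pack`/`unpack` are hypothetical helper names.

```python
# Sketch (assumption, not from the issue): the two-sided halo exchange,
# with Python lists standing in for per-image buffers.

def pack(src, rmt_idx):
    # Gather the requested values into a contiguous send buffer.
    return [src[i] for i in rmt_idx]

def unpack(dst, loc_idx, buf):
    # Scatter the received buffer into the halo region.
    for i, v in zip(loc_idx, buf):
        dst[i] = v

# Two "images", 8 local entries each, halo width 2 (0-based indices here).
xv1 = list(range(1, 9)) + [0, 0]
xv2 = list(range(9, 17)) + [0, 0]

# Image 1 reads the first two entries of image 2, and vice versa;
# the assignments below stand in for the MPI transfer step.
buf_to_1 = pack(xv2, [0, 1])
buf_to_2 = pack(xv1, [6, 7])
unpack(xv1, [8, 9], buf_to_1)
unpack(xv2, [8, 9], buf_to_2)

print(xv1)
print(xv2)
```

The one-sided indexed get in the statement above would let the runtime perform the gather on the remote side, with no explicit buffers in user code; the question raised here is whether that gather is done in one operation or degenerates into one access per index.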

Defect/Bug Report

  • OpenCoarrays Version: 1.9.0
  • Fortran Compiler: GNU 6.3.0, 7.1.0 and 8.0.0
  • C compiler used for building lib: GNU
  • Installation method: cmake
  • Output of uname -a: Linux sow768056c-li.soe.cranfield.ac.uk 2.6.32-696.6.3.el6.x86_64 #1 SMP
  • MPI library being used: MPICH 3.2.0
  • Machine architecture and number of physical cores: Intel 8 cores
  • Version of CMake:

Observed Behavior

e802756@sow768056c-li [103] 12:04 PM [testIndirect] caf --version

OpenCoarrays Coarray Fortran Compiler Wrapper (caf version 1.9.0)
Copyright (C) 2015-2016 Sourcery, Inc.

OpenCoarrays comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of OpenCoarrays under the terms of the
BSD 3-Clause License.  For more information about these matters, see
the file named LICENSE.

e802756@sow768056c-li [104] 12:22 PM [testIndirect] gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/home/opt/gnu/8.0.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/8.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran,lto --no-create --no-recursion : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran,lto --no-create --no-recursion : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran,lto --no-create --no-recursion : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran,lto --no-create --no-recursion : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran,lto --no-create --no-recursion : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran,lto --no-create --no-recursion : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran,lto --no-create --no-recursion : (reconfigured) ../gcc/configure --prefix=/opt/gnu/8.0.0 --enable-languages=c,c++,fortran,lto --no-create --no-recursion
Thread model: posix
gcc version 8.0.0 20170801 (experimental) (GCC)
e802756@sow768056c-li [102] 12:04 PM [testIndirect] cafrun -np 4 ./ind_test_gnu80 
1: From halo exchange :   1.    2.    3.    4.    5.    6.    7.    8. :   0.    0.
2: From halo exchange :   9.   10.   11.   12.   13.   14.   15.   16. :   0.    0.    0.    0.
3: From halo exchange :  17.   18.   19.   20.   21.   22.   23.   24. :   0.    0.    0.    0.
STOP 
STOP 
STOP 
4: From halo exchange :  25.   26.   27.   28.   29.   30.   31.   32. :   0.    0.
STOP 

Expected Behavior

With Intel I get:

[e802756@delta-login-1 testIndirect]$ ifort --version
ifort (IFORT) 17.0.3 20170404
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.
[e802756@delta-login-1 testIndirect]$ ./ind_test_intel 
1: From halo exchange :   1.    2.    3.    4.    5.    6.    7.    8. :   9.   10.
2: From halo exchange :   9.   10.   11.   12.   13.   14.   15.   16. :   7.    8.   17.   18.
3: From halo exchange :  17.   18.   19.   20.   21.   22.   23.   24. :  15.   16.   25.   26.
4: From halo exchange :  25.   26.   27.   28.   29.   30.   31.   32. :  23.   24.

Steps to Reproduce

Compile the attached file and run it.
Is this an OC issue, a GFORTRAN issue, or both?

program tst_ind_1
  use iso_fortran_env
  implicit none
  integer, parameter :: nloc=8, nhl=2, ivsize=nloc+2*nhl
  real    :: xv(ivsize)[*]
  integer :: rmt_idx(2,2), loc_idx(2,2)
  integer, allocatable :: xchg(:)
  integer :: nrcv, me, np, nxch, i, ip, iv
  character(len=120) :: fmt

  me = this_image()
  np = num_images()
  
  if (np==1) then
    !    allocate(xchg(0))
    xchg = [ integer :: ]
    
  else if (me == 1) then
    xchg = [me+1]
  else if (me == np) then
    xchg = [me-1]
  else
    xchg = [me-1, me+1]
  end if
  nxch = size(xchg)
  nrcv = nxch * nhl 

  
  xv(1:nloc) = [(i,i=(me-1)*nloc+1,me*nloc)]
  iv = nloc + 1
  do ip=1, nxch
    loc_idx(1:nhl,ip) = [ (i,i=iv,iv+nhl-1) ]
    if (xchg(ip) == me-1) then
      rmt_idx(1:nhl,ip) = [ (i,i=nloc-nhl+1,nloc) ]
    else
      rmt_idx(1:nhl,ip) = [ (i,i=1,nhl) ]
    end if
    iv = iv + nhl
  end do

  sync images(xchg)
  iv = nloc + 1
  do ip=1, nxch
    xv(loc_idx(1:nhl,ip)) = xv(rmt_idx(1:nhl,ip))[xchg(ip)]
  end do
  
  do ip=1, np
    sync all
    if (ip == me) then
      write(fmt,*) '( i0,a,',nloc,'(f5.0,1x),a,',nrcv,'(f5.0,1x) )'
      write(*,fmt) me,': From halo exchange :',xv(1:nloc),':',xv(nloc+1:nloc+nrcv)
    end if
  end do
  stop

end program tst_ind_1
@vehre
Collaborator

vehre commented Aug 8, 2017

Hi Salvatore,

gfortran implements this; OC only partially. For gcc 7+ this kind of addressing is implemented for get operations (i.e., coarray index on the right-hand side of the assignment) when the lhs is allocatable or the right-hand side involves allocatable components in derived-type coarrays. Unfortunately, support for allocatable components is currently broken in OC. I have a fix, but it's not perfect yet.

Because we do not handle this yet, the first approach will be element by element, i.e., not using MPI datatypes or the like. We strive to get all communication working before optimizing it. At the moment, send() and sendget() for allocatable components are still missing completely.

Sorry to have no better answer for you.

Regards,
Andre

@sfilippone
Author

Hi Andre,
The funny thing here is that I was trying to write a simplified version, but the ultimate intended usage would be to have allocatable coarray components on the RHS... so I have simplified a bit too much!

Anyway, getting this to work, and to work efficiently, is going to be extremely important for applications.
In the not-so-distant future I have a new grad student starting; I shall probably divert some of his time to helping with these matters.
Cheers
Salvatore

@sfilippone
Author

Speaking of MPI datatypes, there is one thing that MPI would allow: from the application structure, I know that (some of) the sets of indices are going to be reused over and over again. In MPI this would obviously be handled by saving the datatype in an application-level cache. With coarrays, I would not have control over such a thing, so it would depend on how expensive it is to define an MPI datatype every time it is used.
I think I need to make a lot more detailed measurements to explore all possibilities...
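A minimal sketch of the application-level caching idea, keyed on the index pattern itself. This is an assumption for illustration, not OC code: `DatatypeCache` and its members are hypothetical, and the dictionary entry merely stands in for a committed MPI datatype (e.g., one built with MPI_Type_create_indexed_block).

```python
# Hypothetical application-level cache: a datatype built from an index
# pattern is reused whenever the same pattern recurs.
class DatatypeCache:
    def __init__(self):
        self._cache = {}
        self.builds = 0  # counts how often a "datatype" had to be built

    def get(self, idx):
        key = tuple(idx)            # the index pattern is the cache key
        if key not in self._cache:
            self.builds += 1        # stands in for type creation + commit
            self._cache[key] = key  # placeholder for the committed datatype
        return self._cache[key]

cache = DatatypeCache()
for _ in range(1000):               # a reused halo pattern is built only once
    cache.get([6, 7])
print(cache.builds)
```

The cost model question in the comment above is exactly this: with a cache, the type-creation cost is paid once per pattern; without one, it is paid on every exchange.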

@vehre
Collaborator

vehre commented Aug 8, 2017

Hi Salvatore,

have a look at branch issue-399 and make sure to use gcc 7+ for compiling. There, support is available when you revert your simplification. I am curious how that implementation performs, if it runs at all. So perhaps only when your grad student has time for experimenting.

As a side note, this is no easy task. I have a Ph.D. in CS and it took me several months to understand all of the intricacies of gfortran, OpenCoarrays, and the MPI interplay. So don't get mad at him if he makes only a little progress.

MPI datatype caching is an interesting approach. The downside is that OC cannot easily identify identically structured calls, i.e., calls from the same call site will be treated as totally different ones. There is currently no approach for solving this (although it could be quite interesting from the optimization point of view).

I hope to have a bit more time left for OC in the future (already regretting having said this, because Damian is likely to jump on it).

Andre

@sfilippone
Author

This is what I get (with GNU 8)

e802756@sow768056c-li [158] 03:53 PM [testIndirect] cafrun -np 4 ./ind_test_gnu80
*** caf_mpi-lib runtime message on image 1:
*** The allocatable components feature 'caf_sendget_by_ref()' of Fortran 2008 standard
*** is not yet supported by OpenCoarrays.
*** caf_mpi-lib runtime message on image 2:
*** The allocatable components feature 'caf_sendget_by_ref()' of Fortran 2008 standard
*** is not yet supported by OpenCoarrays.
*** caf_mpi-lib runtime message on image 3:
*** The allocatable components feature 'caf_sendget_by_ref()' of Fortran 2008 standard
*** is not yet supported by OpenCoarrays.
*** caf_mpi-lib runtime message on image 4:
*** The allocatable components feature 'caf_sendget_by_ref()' of Fortran 2008 standard
*** is not yet supported by OpenCoarrays.

@vehre
Collaborator

vehre commented Aug 8, 2017

Uh, that was a sendget()? I don't see one in the example above. So to make this clear: only get() is implemented there in OC yet. No send() or sendget().

@sfilippone
Author

This is the source code. I do not see any sendget... I was extremely surprised at that message!

program tst_ind_1
  use iso_fortran_env
  implicit none
  integer, parameter :: nloc=8, nhl=2, ivsize=nloc+2*nhl
  type tempv
    real, allocatable :: v(:)[:]
  end type tempv
  ! real    :: xv(ivsize)[*]
  type(tempv) :: xv
  integer :: rmt_idx(2,2), loc_idx(2,2)
  integer, allocatable :: xchg(:)
  integer :: nrcv, me, np, nxch, i, ip, iv
  character(len=120) :: fmt

  me = this_image()
  np = num_images()
  
  if (np==1) then
    !    allocate(xchg(0))
    xchg = [ integer :: ]
    
  else if (me == 1) then
    xchg = [me+1]
  else if (me == np) then
    xchg = [me-1]
  else
    xchg = [me-1, me+1]
  end if
  nxch = size(xchg)
  nrcv = nxch * nhl 
  allocate(xv%v(ivsize)[*])
  
  xv%v(1:nloc) = [(i,i=(me-1)*nloc+1,me*nloc)]
  iv = nloc + 1
  do ip=1, nxch
    loc_idx(1:nhl,ip) = [ (i,i=iv,iv+nhl-1) ]
    if (xchg(ip) == me-1) then
      rmt_idx(1:nhl,ip) = [ (i,i=nloc-nhl+1,nloc) ]
    else
      rmt_idx(1:nhl,ip) = [ (i,i=1,nhl) ]
    end if
    iv = iv + nhl
  end do

  sync images(xchg)
  iv = nloc + 1
  do ip=1, nxch
    xv%v(loc_idx(1:nhl,ip)) = xv%v(rmt_idx(1:nhl,ip))[xchg(ip)]
  end do
  
  do ip=1, np
    sync all
    if (ip == me) then
      write(fmt,*) '( i0,a,',nloc,'(f5.0,1x),a,',nrcv,'(f5.0,1x) )'
      write(*,fmt) me,': From halo exchange :',xv%v(1:nloc),':',xv%v(nloc+1:nloc+nrcv)
    end if
  end do
  stop

end program tst_ind_1

@vehre
Collaborator

vehre commented Aug 8, 2017

This line:
xv%v(loc_idx(1:nhl,ip)) = xv%v(rmt_idx(1:nhl,ip))[xchg(ip)]
will presumably be converted to a sendget, to allow better optimisation. The issue here is that when xv%v on the lhs is reallocated (not in this case, of course), OC needs to know about, or rather carry out, the realloc. Therefore all ops where the target is a reallocatable coarray are converted to a sendget(). It may help to use a temporary for the target here, just to learn whether it works.
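The suggested workaround can be modeled in a tiny sketch (an illustration under the explanation above, not OC code): copying through a non-coarray temporary keeps the remote operation a pure get, so the runtime never has to reason about reallocating the assignment target.

```python
# Illustrative only: 'remote' stands in for xv%v(rmt_idx(...))[xchg(ip)].
remote = [9.0, 10.0, 11.0]

# Step 1: pure "get" into a plain local temporary (no realloc semantics).
tmp = remote[:2]

# Step 2: purely local assignment into the halo region of the coarray.
halo = [0.0, 0.0]
halo[:] = tmp

print(halo)
```

Because the remote side only ever sees the get in step 1, the runtime path that converts reallocatable-target assignments into sendget() is never taken.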

@sfilippone
Author

Still does not work:

e802756@sow768056c-li [161] 04:32 PM [testIndirect] caf -o ind_test_gnu80 ind_test_ab.f90
e802756@sow768056c-li [162] 04:32 PM [testIndirect] cafrun -np 4 ./ind_test_gnu80
1: From halo exchange :   1.    2.    3.    4.    5.    6.    7.    8. :*****    0.
2: From halo exchange :   9.   10.   11.   12.   13.   14.   15.   16. :*****    0.    0.    0.
3: From halo exchange :  17.   18.   19.   20.   21.   22.   23.   24. :*****    0.    0.    0.
STOP 
STOP 
STOP 
4: From halo exchange :  25.   26.   27.   28.   29.   30.   31.   32. :*****    0.

Code:

program tst_ind_1
  use iso_fortran_env
  implicit none
  integer, parameter :: nloc=8, nhl=2, ivsize=nloc+2*nhl
  type tempv
    real, allocatable :: v(:)[:]
  end type tempv
  ! real    :: xv(ivsize)[*]
  type(tempv) :: xv
  real, allocatable :: tv(:)
  integer :: rmt_idx(2,2), loc_idx(2,2)
  integer, allocatable :: xchg(:)
  integer :: nrcv, me, np, nxch, i, ip, iv
  character(len=120) :: fmt

  me = this_image()
  np = num_images()
  
  if (np==1) then
    !    allocate(xchg(0))
    xchg = [ integer :: ]
    
  else if (me == 1) then
    xchg = [me+1]
  else if (me == np) then
    xchg = [me-1]
  else
    xchg = [me-1, me+1]
  end if
  nxch = size(xchg)
  nrcv = nxch * nhl 
  allocate(xv%v(ivsize)[*])
  allocate(tv(ivsize))

  
  tv(1:nloc) = [(i,i=(me-1)*nloc+1,me*nloc)]
  xv%v(1:nloc) = tv(1:nloc)
  iv = nloc + 1
  do ip=1, nxch
    loc_idx(1:nhl,ip) = [ (i,i=iv,iv+nhl-1) ]
    if (xchg(ip) == me-1) then
      rmt_idx(1:nhl,ip) = [ (i,i=nloc-nhl+1,nloc) ]
    else
      rmt_idx(1:nhl,ip) = [ (i,i=1,nhl) ]
    end if
    iv = iv + nhl
  end do

  sync images(xchg)
  iv = nloc + 1
  do ip=1, nxch
    !xv%v(loc_idx(1:nhl,ip)) = xv%v(rmt_idx(1:nhl,ip))[xchg(ip)]
    tv(loc_idx(1:nhl,ip)) = xv%v(rmt_idx(1:nhl,ip))[xchg(ip)]
  end do
  
  do ip=1, np
    sync all
    if (ip == me) then
      write(fmt,*) '( i0,a,',nloc,'(f5.0,1x),a,',nrcv,'(f5.0,1x) )'
      ! write(*,fmt) me,': From halo exchange :',xv%v(1:nloc),':',xv%v(nloc+1:nloc+nrcv)
      write(*,fmt) me,': From halo exchange :',tv(1:nloc),':',tv(nloc+1:nloc+nrcv)
    end if
  end do
  stop

end program tst_ind_1

@vehre
Collaborator

vehre commented Aug 8, 2017

Unfortunately, I have to report that you found a compiler bug. Gfortran is not copying the data back from the internal temporary it generates for tv, but rather treats the internal temporary as the argument to get().

This bug is now tracked in the Fortran bugtracker as 81773.

@sfilippone
Author

I have tried to run the original example with Intel. While it works as shown above, once I ramp up the array sizes it either (a) takes forever or (b) segfaults.

@sfilippone
Author

Perhaps even more interesting, if I have contiguous addressing on the LHS I get wrong results:

program tst_ind_1
  use iso_fortran_env
  implicit none
  integer, parameter :: nloc=8, nhl=2, ivsize=nloc+2*nhl
  real    :: xv(ivsize)[*]
  integer :: rmt_idx(2,2), loc_idx(2,2)
  integer, allocatable :: xchg(:)
  integer :: nrcv, me, np, nxch, i, ip, iv
  character(len=120) :: fmt

  me = this_image()
  np = num_images()
  
  if (np==1) then
    !    allocate(xchg(0))
    xchg = [ integer :: ]
    
  else if (me == 1) then
    xchg = [me+1]
  else if (me == np) then
    xchg = [me-1]
  else
    xchg = [me-1, me+1]
  end if
  nxch = size(xchg)
  nrcv = nxch * nhl 

  
  xv(1:nloc) = [(i,i=(me-1)*nloc+1,me*nloc)]
  iv = nloc + 1
  do ip=1, nxch
    loc_idx(1:nhl,ip) = [ (i,i=iv,iv+nhl-1) ]
    if (xchg(ip) == me-1) then
      rmt_idx(1:nhl,ip) = [ (i,i=nloc-nhl+1,nloc) ]
    else
      rmt_idx(1:nhl,ip) = [ (i,i=1,nhl) ]
    end if
    iv = iv + nhl
  end do

  sync images(xchg)
  iv = nloc + 1
  do ip=1, nxch
    !    xv(loc_idx(1:nhl,ip)) = xv(rmt_idx(1:nhl,ip))[xchg(ip)]
    ! write(*,*) me,': Reading from :',xchg(ip),' :',rmt_idx(1:nhl,ip),' :',loc_idx(1:nhl,ip)
          
    xv(iv:iv+nhl-1) = xv(rmt_idx(1:nhl,ip))[xchg(ip)]
    iv = iv + nhl
  end do
  
  do ip=1, np
    sync all
    if (ip == me) then
      write(fmt,*) '( i0,a,',nloc,'(f5.0,1x),a,',nrcv,'(f5.0,1x) )'
      write(*,fmt) me,': From halo exchange :',xv(1:nloc),':',xv(nloc+1:nloc+nrcv)
    end if
  end do
  stop

end program tst_ind_1

With Intel I get the expected output:

[e802756@delta-login-1 testIndirect]$ ifort  -coarray -oind_test_intel ind_test.F90 -DIS_INTEL
[e802756@delta-login-1 testIndirect]$ ./ind_test_intel 
1: From halo exchange :   1.    2.    3.    4.    5.    6.    7.    8. :   9.   10.
2: From halo exchange :   9.   10.   11.   12.   13.   14.   15.   16. :   7.    8.   17.   18.
3: From halo exchange :  17.   18.   19.   20.   21.   22.   23.   24. :  15.   16.   25.   26.
4: From halo exchange :  25.   26.   27.   28.   29.   30.   31.   32. :  23.   24.

With OC 1.9.0 / GNU 8.0.0 I get something a bit strange:

e802756@sow768056c-li [194] 04:09 PM [testIndirect] caf -o ind_test_gnu80 ind_test.F90
e802756@sow768056c-li [195] 04:10 PM [testIndirect] cafrun -np 4 ./ind_test_gnu80
1: From halo exchange :   1.    2.    3.    4.    5.    6.    7.    8. :   9.   10.
2: From halo exchange :   9.   10.   11.   12.   13.   14.   15.   16. :   1.    2.   17.   18.
3: From halo exchange :  17.   18.   19.   20.   21.   22.   23.   24. :   9.   10.   25.   26.
STOP 
STOP 
STOP 
4: From halo exchange :  25.   26.   27.   28.   29.   30.   31.   32. :  17.   18.
STOP 

@vehre
Collaborator

vehre commented Aug 12, 2017

Hi Salvatore,

for the last example, I just pushed a solution in the branch issue-427. That branch experimentally fixes the get-to-linear-array issue as given in the last example. Only(!) this is fixed; send and sendget are not yet, i.e., to use it you have to provide a temporary on the left-hand side. Please test the solution. If it satisfies your needs, then I will try to get something similar up for send() and sendget().

@ALL: Btw, I also made the get() routine more flexible: when the efficient implementation for get() is used, i.e., when STRIDED is set, copy-to-self is now also supported for get (testcase sameloc / get_self; we should definitely name the testcase executables the same as their source files). The test nevertheless fails later because it also tests send and sendget, which IMHO should be split into separate tests.

Regards,
Andre

@sfilippone
Author

Hi Andre,
The fix works for me, with the stated limitations.
As for whether it suits my needs, well, it certainly is an improvement, and I can now do some more "interesting" tests.
This is far from the end of the story; unfortunately there is a language-standard limitation that is really getting in my way.
Anyway, thanks for your help
Salvatore

@sfilippone
Author

Hello,
I tried to run one further test on the computing system here at Cranfield. The environment is:

[e802756@delta-login-2 CafTest_Indirect]$ caf --version

OpenCoarrays Coarray Fortran Compiler Wrapper (caf version 1.9.1-1-g9dffbca)
Copyright (C) 2015-2016 Sourcery, Inc.

OpenCoarrays comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of OpenCoarrays under the terms of the
BSD 3-Clause License.  For more information about these matters, see
the file named LICENSE.

[e802756@delta-login-2 CafTest_Indirect]$ module li 

Currently Loaded Modules:
  1) GCCcore/7.1.0                 3) GCC/7.1.0-2.28                5) OpenMPI/2.1.0-GCC-7.1.0-2.28
  2) binutils/2.28-GCCcore-7.1.0   4) hwloc/1.11.6-GCC-7.1.0-2.28

When I try the simple test program

program tst_ind_ab
  use iso_fortran_env
  implicit none
  integer, parameter :: nloc=8, nhl=2, ivsize=nloc+2*nhl
  type tempv
    real, allocatable :: v(:)[:]
  end type tempv
  ! real    :: xv(ivsize)[*]
  type(tempv) :: xv
  real, allocatable :: tv(:)
  integer :: rmt_idx(2,2), loc_idx(2,2)
  integer, allocatable :: xchg(:)
  integer :: nrcv, me, np, nxch, i, ip, iv
  character(len=120) :: fmt

  me = this_image()
  np = num_images()
  
  if (np==1) then
    !    allocate(xchg(0))
    xchg = [ integer :: ]
    
  else if (me == 1) then
    xchg = [me+1]
  else if (me == np) then
    xchg = [me-1]
  else
    xchg = [me-1, me+1]
  end if
  nxch = size(xchg)
  nrcv = nxch * nhl 
  allocate(xv%v(ivsize)[*])
  allocate(tv(ivsize))

  
  tv(1:nloc) = [(i,i=(me-1)*nloc+1,me*nloc)]
  xv%v(1:nloc) = tv(1:nloc)
  iv = nloc + 1
  do ip=1, nxch
    loc_idx(1:nhl,ip) = [ (i,i=iv,iv+nhl-1) ]
    if (xchg(ip) == me-1) then
      rmt_idx(1:nhl,ip) = [ (i,i=nloc-nhl+1,nloc) ]
    else
      rmt_idx(1:nhl,ip) = [ (i,i=1,nhl) ]
    end if
    iv = iv + nhl
  end do

  sync images(xchg)
  iv = nloc + 1
  do ip=1, nxch
    !xv%v(loc_idx(1:nhl,ip)) = xv%v(rmt_idx(1:nhl,ip))[xchg(ip)]
    !tv(loc_idx(1:nhl,ip)) = xv%v(rmt_idx(1:nhl,ip))[xchg(ip)]
    tv(iv:iv+nhl-1) = xv%v(rmt_idx(1:nhl,ip))[xchg(ip)]
    iv = iv + nhl
  end do
  
  do ip=1, np
    sync all
    if (ip == me) then
      write(fmt,*) '( i0,a,',nloc,'(f5.0,1x),a,',nrcv,'(f5.0,1x) )'
      ! write(*,fmt) me,': From halo exchange :',xv%v(1:nloc),':',xv%v(nloc+1:nloc+nrcv)
      write(*,fmt) me,': From halo exchange :',tv(1:nloc),':',tv(nloc+1:nloc+nrcv)
    end if
  end do
  stop

end program tst_ind_ab

I get the expected output:

[e802756@delta-login-2 CafTest_Indirect]$ caf -o ind_test_ab ind_test_ab.f90 
[e802756@delta-login-2 CafTest_Indirect]$ cafrun -np 4 ./ind_test_ab
1: From halo exchange :   1.    2.    3.    4.    5.    6.    7.    8. :   9.   10.
2: From halo exchange :   9.   10.   11.   12.   13.   14.   15.   16. :   7.    8.   17.   18.
3: From halo exchange :  17.   18.   19.   20.   21.   22.   23.   24. :  15.   16.   25.   26.
4: From halo exchange :  25.   26.   27.   28.   29.   30.   31.   32. :  23.   24.
STOP 
STOP 
STOP 
STOP 
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[delta-login-2:16781] 3 more processes have sent help message help-mpi-api.txt / mpi-abort
[delta-login-2:16781] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

but I am confused by the MPI_ABORT message.

Moreover, if I try to run a slightly more complex program,

program tst_ind_ab
  use iso_fortran_env
  implicit none
  integer, parameter :: nloc=128*128*128, nhl=16*128
  type tempv
    real, allocatable :: v(:)[:]
  end type tempv
  ! real    :: xv(ivsize)[*]
  type(tempv) :: xv
  real, allocatable :: tv(:)
  integer, allocatable :: rmt_idx(:,:), loc_idx(:,:)
  integer, allocatable :: xchg(:)
  integer :: nrcv, ivsize, me, np, nxch, i, ip, iv
  integer :: icnt1, icnt2, icr
  real :: t1, t2
  character(len=120) :: fmt

  me = this_image()
  np = num_images()

  call system_clock(count_rate=icr)
  
  if (np==1) then
    !    allocate(xchg(0))
    xchg = [ integer :: ]
    
  else if (me == 1) then
    xchg = [me+1]
  else if (me == np) then
    xchg = [me-1]
  else
    if (mod(me,2) == 0) then 
      xchg = [me-1, me+1]
    else
      xchg = [me+1, me-1]
    end if
  end if
  nxch = size(xchg)
  nrcv = nxch * nhl 
  ivsize = nloc + nrcv
  call co_max(ivsize)

  allocate(xv%v(ivsize)[*])
  allocate(tv(ivsize))
  allocate(rmt_idx(nhl,nxch),loc_idx(nhl,nxch))
  write(*,*) me,' My size :',nxch,nrcv, ivsize

  
  tv(1:nloc) = [(i,i=(me-1)*nloc+1,me*nloc)]
  xv%v(1:nloc) = tv(1:nloc)
  iv = nloc + 1
  do ip=1, nxch
    loc_idx(1:nhl,ip) = [ (i,i=iv,iv+nhl-1) ]
    if (xchg(ip) == me-1) then
      rmt_idx(1:nhl,ip) = [ (i,i=nloc-nhl+1,nloc) ]
    else
      rmt_idx(1:nhl,ip) = [ (i,i=1,nhl) ]
    end if
    iv = iv + nhl
  end do
  write(*,*) me,' Syncng with :',xchg, icr
  sync all 

  call system_clock(count=icnt1)
  sync images(xchg)
  iv = nloc + 1
  do ip=1, nxch
    !xv%v(loc_idx(1:nhl,ip)) = xv%v(rmt_idx(1:nhl,ip))[xchg(ip)]
    !tv(loc_idx(1:nhl,ip)) = xv%v(rmt_idx(1:nhl,ip))[xchg(ip)]
    tv(iv:iv+nhl-1)   = xv%v(rmt_idx(1:nhl,ip))[xchg(ip)]
    xv%v(iv:iv+nhl-1) = tv(iv:iv+nhl-1) 
    iv = iv + nhl
  end do
  call system_clock(count=icnt2)
  t1 = real(icnt2-icnt1)/real(icr)

  write(*,*) me,'Completed exchange', t1,icr
  deallocate(xv%v)
  
end program tst_ind_ab

I get the following:

e802756@delta-login-2 CafTest_Indirect]$ cafrun -np 2 ./ind_tab_perf
           1  My size :           1        2048     2099200
           1  Syncng with :           2        1000
           2  My size :           1        2048     2099200
           2  Syncng with :           1        1000
           1 Completed exchange   0.00000000            1000
           2 Completed exchange   0.00000000            1000
*** Error in `./ind_tab_perf': free(): invalid pointer: 0x00002b0dd9892018 ***
======= Backtrace: =========
[delta-login-2:17589] *** An error occurred in MPI_Win_detach
[delta-login-2:17589] *** reported by process [3157000193,1]
[delta-login-2:17589] *** on win rdma window 5
[delta-login-2:17589] *** MPI_ERR_OTHER: known error not in list
[delta-login-2:17589] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[delta-login-2:17589] ***    and potentially your MPI job)
/lib64/libc.so.6(+0x7d053)[0x2b0dd9df0053]
/apps/software/OpenMPI/2.1.0-GCC-7.1.0-2.28/lib/libmpi.so.20(+0x5c52e)[0x2b0dd96d052e]
/apps/software/OpenMPI/2.1.0-GCC-7.1.0-2.28/lib/libmpi.so.20(ompi_mpi_errors_are_fatal_win_handler+0xed)[0x2b0dd96d0f2d]
/apps/software/OpenMPI/2.1.0-GCC-7.1.0-2.28/lib/libmpi.so.20(ompi_errhandler_invoke+0x155)[0x2b0dd96d01b5]
/apps/software/OpenMPI/2.1.0-GCC-7.1.0-2.28/lib/libmpi.so.20(MPI_Win_detach+0x118)[0x2b0dd9719df8]
./ind_tab_perf[0x403c66]
./ind_tab_perf[0x4036b9]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2b0dd9d94b15]
./ind_tab_perf[0x4026f9]
======= Memory map: ========
00400000-00414000 r-xp 00000000 00:29 4451611                            /mnt/gpfs0/home/e802756/NUMERICAL/IntelCAF/CafTest_Indirect/ind_tab_perf
00414000-00415000 r--p 00013000 00:29 4451611                            /mnt/gpfs0/home/e802756/NUMERICAL/IntelCAF/CafTest_Indirect/ind_tab_perf
00415000-00416000 rw-p 00014000 00:29 4451611                            /mnt/gpfs0/home/e802756/NUMERICAL/IntelCAF/CafTest_Indirect/ind_tab_perf
00416000-0041c000 rw-p 00000000 00:00 0 
0203e000-0211d000 rw-p 00000000 00:00 0                                  [heap]
0211d000-0211e000 rw-p 00000000 00:00 0                                  [heap]
0211e000-0214f000 rw-p 00000000 00:00 0                                  [heap]
0214f000-02154000 rw-p 00000000 00:00 0                                  [heap]
02154000-0221b000 rw-p 00000000 00:00 0                                  [heap]
0221b000-0221c000 rw-p 00000000 00:00 0                                  [heap]
0221c000-0221d000 rw-p 00000000 00:00 0                                  [heap]
0221d000-0222e000 rw-p 00000000 00:00 0                                  [heap]
0222e000-0228b000 rw-p 00000000 00:00 0                                  [heap]
0228b000-0228c000 rw-p 00000000 00:00 0                                  [heap]
0228c000-02291000 rw-p 00000000 00:00 0                                  [heap]
02291000-02292000 rw-p 00000000 00:00 0                                  [heap]
02292000-02294000 rw-p 00000000 00:00 0                                  [heap]
02294000-02295000 rw-p 00000000 00:00 0                                  [heap]
02295000-02297000 rw-p 00000000 00:00 0                                  [heap]
02297000-0229c000 rw-p 00000000 00:00 0                                  [heap]
0229c000-0229d000 rw-p 00000000 00:00 0                                  [heap]
0229d000-022a2000 rw-p 00000000 00:00 0                                  [heap]
022a2000-022a4000 rw-p 00000000 00:00 0                                  [heap]
022a4000-022bd000 rw-p 00000000 00:00 0                                  [heap]
022bd000-022be000 rw-p 00000000 00:00 0                                  [heap]
022be000-022d7000 rw-p 00000000 00:00 0                                  [heap]
022d7000-022da000 rw-p 00000000 00:00 0                                  [heap]
022da000-022de000 rw-p 00000000 00:00 0                                  [heap]
022de000-022eb000 rw-p 00000000 00:00 0                                  [heap]
022eb000-022fb000 rw-p 00000000 00:00 0                                  [heap]
022fb000-02300000 rw-p 00000000 00:00 0                                  [heap]
02300000-02310000 rw-p 00000000 00:00 0                                  [heap]
02310000-02315000 rw-p 00000000 00:00 0                                  [heap]
02315000-02325000 rw-p 00000000 00:00 0                                  [heap]
02325000-02330000 rw-p 00000000 00:00 0                                  [heap]
02330000-02332000 rw-p 00000000 00:00 0                                  [heap]
02332000-02335000 rw-p 00000000 00:00 0                                  [heap]
02335000-02337000 rw-p 00000000 00:00 0                                  [heap]
02337000-0233a000 rw-p 00000000 00:00 0                                  [heap]
0233a000-0234b000 rw-p 00000000 00:00 0                                  [heap]
0234b000-0234e000 rw-p 00000000 00:00 0                                  [heap]
0234e000-0235f000 rw-p 00000000 00:00 0                                  [heap]
0235f000-02362000 rw-p 00000000 00:00 0                                  [heap]
02362000-02373000 rw-p 00000000 00:00 0                                  [heap]
02373000-02375000 rw-p 00000000 00:00 0                                  [heap]
02375000-02387000 rw-p 00000000 00:00 0                                  [heap]
02387000-0238a000 rw-p 00000000 00:00 0                                  [heap]
0238a000-0239b000 rw-p 00000000 00:00 0                                  [heap]
0239b000-0239e000 rw-p 00000000 00:00 0                                  [heap]
0239e000-023af000 rw-p 00000000 00:00 0                                  [heap]
023af000-023b2000 rw-p 00000000 00:00 0                                  [heap]
023b2000-023c3000 rw-p 00000000 00:00 0                                  [heap]
023c3000-023c6000 rw-p 00000000 00:00 0                                  [heap]
023c6000-023d7000 rw-p 00000000 00:00 0                                  [heap]
023d7000-02441000 rw-p 00000000 00:00 0                                  [heap]
2b0dd9450000-2b0dd9471000 r-xp 00000000 fd:00 787918                     /usr/lib64/ld-2.17.so
2b0dd9471000-2b0dd9472000 rw-p 00000000 00:00 0 
2b0dd9472000-2b0dd949c000 r-xp 00000000 00:2a 7066570935574136645        /apps/software/OpenMPI/2.1.0-GCC-7.1.0-2.28/lib/libmpi_usempif08.so.20.10.0
2b0dd949c000-2b0dd949d000 r--p 00029000 00:2a 7066570935574136645        /apps/software/OpenMPI/2.1.0-GCC-7.1.0-2.28/lib/libmpi_usempif08.so.20.10.0
2b0dd949d000-2b0dd949e000 rw-p 0002a000 00:2a 7066570935574136645        /apps/software/OpenMPI/2.1.0-GCC-7.1.0-2.28/lib/libmpi_usempif08.so.20.10.0
2b0dd949e000-2b0dd949f000 rw-p 00000000 00:00 0 
2b0dd949f000-2b0dd94a4000 r-xp 00000000 00:2a 12801602832016719735       /apps/software/OpenMPI/2.1.0-GCC-7.1.0-2.28/lib/libmpi_usempi_ignore_tkr.so.20.10.0
2b0dd94a4000-2b0dd94a5000 r--p 00004000 00:2a 12801602832016719735       /apps/software/OpenMPI/2.1.0-GCC-7.1.0-2.28/lib/libmpi_usempi_ignore_tkr.so.20.10.0
Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x2b0dd9da866f in ???
#1  0x2b0dd9da85f7 in ???
#2  0x2b0dd9da9ce7 in ???
#3  0x2b0dd9de8326 in ???
#4  0x2b0dd9df0052 in ???
#5  0x2b0dd96d052d in ???
#6  0x2b0dd96d0f2c in ???
#7  0x2b0dd96d01b4 in ???
#8  0x2b0dd9719df7 in ???
#9  0x403c65 in ???
#10  0x4036b8 in ???
#11  0x2b0dd9d94b14 in ???
#12  0x4026f8 in ???
#13  0xffffffffffffffff in ???
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node delta-login-2 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
[e802756@delta-login-2 CafTest_Indirect]$ 

It seems clear that OC and OpenMPI are not cooperating.
A vanilla installation of OC/OpenMPI 2.1.0 on my desktop works fine.

e802756@sow768056c-li [138] 01:26 PM [CafTest_Indirect] cafrun -np 2 ./ind_tab_perf
           1  My size :           1        2048     2099200
           2  My size :           1        2048     2099200
           2  Syncng with :           1        1000
           1  Syncng with :           2        1000
           1 Completed exchange   0.00000000            1000
           2 Completed exchange   0.00000000            1000
e802756@sow768056c-li [139] 02:07 PM [CafTest_Indirect] 

What should I try next to figure out where the problem is?
Salvatore

@sfilippone

If I take out the derived type with allocatable components, the "invalid pointer" error goes away, so it's definitely in the deallocation of allocatable components.
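The failing pattern can be reduced to a coarray of a derived type with an allocatable component. A minimal hedged sketch along those lines (type and variable names are illustrative, not taken from the actual test code):

```fortran
program alloc_comp_repro
  implicit none
  type :: vec_t
     real, allocatable :: v(:)
  end type vec_t
  type(vec_t) :: x[*]   ! coarray of derived type with allocatable component

  allocate(x%v(2048))
  x%v = real(this_image())
  sync all
  ! read an allocatable component of a coindexed object
  if (this_image() == 1 .and. num_images() > 1) x%v(1) = x[2]%v(1)
  sync all
  deallocate(x%v)       ! the reported "invalid pointer" abort pointed here
end program alloc_comp_repro
```

Compile with `caf` and run under `cafrun -np 2`; the abort described above appeared when the allocatable component was deallocated.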

@vehre

vehre commented Nov 26, 2017

This should at least partially be fixed by pull request #468.

zbeekman added a commit that referenced this issue Dec 26, 2017
…onversion-during-communication

Fix type conversion during communication for regular get, send, sendget.

Fixes #292 and possibly #427
@zbeekman

This may be related to #322

@zbeekman

zbeekman commented Dec 26, 2017

The first iteration of Salvatore's test is still broken with GCC 7.2 as of 0c2dce7 (12/26/2017), after #468 was merged; others may still be broken too. I have yet to test with GFortran 8.

@rouson

rouson commented Dec 28, 2017

This appears to be a duplicate of issue #322. @vehre do you agree? If so, let's close this one because the code example in #322 is much shorter.

@vehre

vehre commented Dec 28, 2017

Yes, it is a duplicate, but each issue shines a different light on the problem and provides valuable context. So if you close this one, add a clear note to #322 that more information is available here.

@zbeekman

zbeekman commented Jan 2, 2018

Let's keep this open until the problem is resolved.

vehre added a commit that referenced this issue May 1, 2018
This patch fixes the OpenCoarrays sendget_by_ref() operator by using the
types of src and dst during execution. These types are only handed to
the function by GCC >= 8.

Fixes #427.
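To make the mechanism concrete: the statements affected are communications in which the source and destination types differ, so the runtime has to convert the payload rather than copy raw bytes. A hedged sketch of the sendget case (names are illustrative, not from the actual test suite):

```fortran
program sendget_conv_sketch
  implicit none
  integer :: src[*]
  real    :: dst[*]

  src = 10 * this_image()
  sync all
  ! both sides coindexed and the types differ: this lowers to
  ! sendget_by_ref(), and with gcc >= 8 the runtime receives both
  ! types, so it can convert the integer payload to real in transit
  if (this_image() == 1 .and. num_images() > 1) dst[1] = src[2]
  sync all
end program sendget_conv_sketch
```

Before the fix, a byte-wise copy along this path could reinterpret the integer bits as a real instead of converting the value.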
@vehre vehre mentioned this issue May 1, 2018
@rouson

rouson commented May 30, 2018

@sfilippone Could you test the current master branch of OpenCoarrays and give us an update on the status of this issue? Also, it would be great if you could submit a pull request with one more more tests to expose any remaining issues. If you add the test(s) somewhere in [src/tests/unit], I can set it up to run. Possibly these would just be the tests included higher up in this thread. And be sure to mention which version(s) of GCC were tested and whether the environment variable OPENCOARRAYS_DEVELOPER is set to ON or OFF before testing. As of today, OpenCoarrays supports GCC 6.4.0, 7.3.0, and 8.1.0, but GCC 8.1.0 requires a patch to build OpenCoarrays. That patch gets installed automatically if you build gfortran using the OpenCoarrays installer, e.g., via ./install.sh -p gcc -z -j 4 -y -i <desired-install-prefix> -I 8.1.0.

zbeekman added a commit that referenced this issue May 30, 2018
@sfilippone

@rouson Test codes work now with OpenCoarrays 2.1.0 and patched GCC 8.1.0.
S.
