Defect: remote indirect addressing does not work #427
Comments
Hi Salvatore, gfortran implements this; OC only partially. For gcc 7+, this kind of addressing is implemented for get operations (i.e., coarray index on the right-hand side of the assignment) when the left-hand side is allocatable or the right-hand side involves allocatable components in derived-type coarrays. Unfortunately, support for allocatable components is currently broken in OC. I have a fix, but it's not perfect yet. Because we do not handle this yet, the first approach will be element by element, i.e., not using MPI datatypes or the like. We strive to get all communications working before optimizing them. At the moment, send() and getsend() for allocatable components are still missing completely. Sorry to have no better answer for you. Regards, |
Hi Andre, Anyway, getting this to work, and to work efficiently, is going to be extremely important for applications. |
Speaking of MPI datatypes, there is one thing that MPI would allow: from the application structure, I know that (some of) the sets of indices are going to be reused over and over again. In MPI this would obviously be handled by saving the datatype in an application-level cache. With coarrays, I would not have control over such a thing, so it would depend on how expensive it is to define an MPI datatype every time this is used. |
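To make the caching idea above concrete, here is a minimal sketch in plain Python rather than Fortran or MPI. Everything in it is an assumption for illustration: `build_blocks` and `DatatypeCache` are hypothetical names, and the list of (displacement, length) blocks only stands in for what an indexed MPI datatype would describe. It is not how OpenCoarrays or any application actually does this.

```python
from typing import Dict, List, Tuple

# (displacement, block length): a stand-in for the description an
# indexed MPI datatype would carry. Purely illustrative.
Block = Tuple[int, int]


def build_blocks(indices: List[int]) -> List[Block]:
    """Coalesce an index list into runs of consecutive indices."""
    blocks: List[Block] = []
    for i in indices:
        if blocks and blocks[-1][0] + blocks[-1][1] == i:
            start, length = blocks[-1]
            blocks[-1] = (start, length + 1)  # extend the current run
        else:
            blocks.append((i, 1))  # start a new run
    return blocks


class DatatypeCache:
    """Hypothetical application-level cache keyed by the index set."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[int, ...], List[Block]] = {}
        self.builds = 0  # how many times a description was actually built

    def get(self, indices: List[int]) -> List[Block]:
        key = tuple(indices)
        if key not in self._cache:
            self._cache[key] = build_blocks(indices)
            self.builds += 1
        return self._cache[key]
```

With such a cache, a halo index set that is reused on every iteration pays the construction cost once. Hashing the full index tuple on each lookup is itself linear in the number of indices, so a real application would more likely key the cache on a handle it already owns.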
Hi Salvatore, have a look at branch issue-399 and make sure to use gcc 7+ for compiling. Support is available there once you revert your simplification. I am curious how that implementation performs, if it runs at all. So only if, maybe, your grad student has time for experimenting. As a side note, this is no easy task. I have a Ph.D. in CS and it took me several months to understand all of the intricacies of gfortran, OpenCoarrays and the MPI interplay. So don't get mad at him when he makes only little progress. MPI datatype caching is an interesting approach. The downside is that OC cannot easily identify identically structured calls, i.e., calls from the same call site will be treated as totally different ones. There is currently no approach to solving this (although it could be quite interesting from an optimization point of view). I hope to have a bit more time left for OC in the future (already regretting having said this, because Damian is likely to jump on it). Andre |
This is what I get (with GNU 8)
|
Uh, that was a sendget()? I don't see one in the example above. So to make this clear: so far only get() is implemented in OC. No send(get). |
This is the source code. I do not see any sendget .... I was extremely surprised at that message!
|
This line: |
Still does not work
Code:
|
Unfortunately, I have to report that you found a compiler bug. gfortran is not copying the data back from the internal temporary it generates for tv, but rather treats the internal temporary as the argument to get(). This bug is now known as: Fortran bugtracker 81773 |
I have tried to run the original example with Intel. While it works as shown above, once I ramp up the array sizes either a) it takes forever or b) emits a segfault. |
Perhaps even more interesting, if I have contiguous addressing on the LHS I get wrong results:
With Intel I get the expected output:
With OC 1.9.0/GNU 8.0.0 I get something a bit strange:
|
Hi Salvatore, for the last example, I just pushed a solution to the branch issue-427. That branch experimentally fixes the get-to-linear-array issue shown in the last example. Only(!) this is fixed. Send and getsend are not yet fixed, i.e., to use it you have to provide a temporary on the left-hand side. Please test the solution. If it satisfies your needs, then I will try to get something similar up for send() and getsend(). @ALL: Btw, I also made the get() routine more flexible: when the efficient implementation of get() is used, i.e., when STRIDED is set, copy to self is now also supported for get (testcase sameloc/ get_self; we definitely should name test executables the same as their source files). The test nevertheless fails later because it also tests send and getsend, which IMHO should be split into separate tests. Regards, |
Hi Andre, |
Hello,
When I try the simple test program
I get the expected output:
but I am confused by the mpi-abort message. Moreover, if I try to run a slightly more complex program,
I get the following:
It seems clear that OC and OpenMPI are not cooperating.
What should I try next to figure out where the problem is? |
If I take out the derived type with allocatable components, the "invalid pointer" error goes away, so it's definitely in the deallocation of allocatable components |
This should at least partially be fixed by pull request #468. |
This may be related to #322 |
Yes, it is a duplicate, but both shine a different light on the issue and give valuable context information. So if you close this issue, add to #322 clearly that more information is available here. |
Let's keep this open until the problem is resolved. |
This patch fixes OpenCoarrays sendget_by_ref() operator by using the types of src and dst during execution. These types are only handed to the function by a gcc >= 8. Fixes #427.
@sfilippone Could you test the current master branch of OpenCoarrays and give us an update on the status of this issue? Also, it would be great if you could submit a pull request with one or more tests to expose any remaining issues. If you add the test(s) somewhere in [src/tests/unit], I can set them up to run. Possibly these would just be the tests included higher up in this thread. And be sure to mention which version(s) of GCC were tested and whether the environment variable OPENCOARRAYS_DEVELOPER was set to ON or OFF before testing. As of today, OpenCoarrays supports GCC 6.4.0, 7.3.0, and 8.1.0, but GCC 8.1.0 requires a patch to build OpenCoarrays. That patch gets installed automatically if you build gfortran using the OpenCoarrays installer, e.g., via |
@rouson Test codes work now with OC210 and patched GCC810. |
Is remote indirect addressing supposed to work as of the current development status?
The key statement is
which is a (very) simplified version of the so-called halo exchange procedure.
Without remote indirect addressing I am forced to use a two-sided approach, with packing/transfer/unpacking.
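For readers unfamiliar with the two-sided pattern mentioned above, the following is a rough sketch of packing, transfer and unpacking. It is written in plain Python rather than Fortran or MPI, and every name in it (`pack`, `unpack`, `halo_exchange`, the index lists) is hypothetical. The "transfer" is a plain list copy standing in for a matched send/receive pair. It is not the code from this report.

```python
from typing import List, Sequence


def pack(owner_data: Sequence[float], send_idx: Sequence[int]) -> List[float]:
    """On the owning process: gather the requested entries into a contiguous buffer."""
    return [owner_data[i] for i in send_idx]


def unpack(local: List[float], recv_idx: Sequence[int], buffer: Sequence[float]) -> None:
    """On the receiving process: scatter the buffer into the halo positions."""
    for pos, value in zip(recv_idx, buffer):
        local[pos] = value


def halo_exchange(owner_data: Sequence[float], send_idx: Sequence[int],
                  local: List[float], recv_idx: Sequence[int]) -> None:
    buf = pack(owner_data, send_idx)  # step 1: pack on the owner
    wire = list(buf)                  # step 2: stand-in for send/recv
    unpack(local, recv_idx, wire)     # step 3: unpack on the receiver
```

With remote indirect addressing, the three steps would collapse into a single assignment on the receiving side, with both index lists appearing as vector subscripts.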
On a related note, how is the implementation going to handle this? Is there any hope that this will be efficient? E.g., are the indices going to travel across the network? Is this going to be translated into multiple accesses? That would kill performance, hence preventing the use of 1-sided in a very important scenario, PDEs on unstructured meshes.
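The performance concern can be illustrated with a toy cost model that simply counts remote requests. This is a sketch in plain Python under stated assumptions: `RemoteImage`, `get_one` and `get_many` are hypothetical names, the "remote" side is an in-process object, and neither strategy is claimed to be what OpenCoarrays or gfortran actually implements.

```python
from typing import Iterable, List, Sequence


class RemoteImage:
    """Toy stand-in for another image's memory; counts incoming requests."""

    def __init__(self, data: Iterable[float]) -> None:
        self.data = list(data)
        self.requests = 0

    def get_one(self, i: int) -> float:
        # One remote access per element.
        self.requests += 1
        return self.data[i]

    def get_many(self, indices: Sequence[int]) -> List[float]:
        # The index list travels with a single request; the remote side
        # gathers the values and returns them in one reply.
        self.requests += 1
        return [self.data[i] for i in indices]


def gather_elementwise(remote: RemoteImage, indices: Sequence[int]) -> List[float]:
    return [remote.get_one(i) for i in indices]


def gather_with_index_list(remote: RemoteImage, indices: Sequence[int]) -> List[float]:
    return remote.get_many(indices)
```

Both functions return the same values, but the element-wise variant issues one request per index, while the second issues a single request whose payload grows with the index list. Which of the two dominates in practice depends on network latency versus message size.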
Defect/Bug Report
uname -a: Linux sow768056c-li.soe.cranfield.ac.uk 2.6.32-696.6.3.el6.x86_64 #1 SMP
Observed Behavior
Expected Behavior
With Intel I get:
Steps to Reproduce
Compile the attached file & run
Is this an OC issue, a GFORTRAN issue, or both?