Test NOX_Tpetra_1DFEM_MPI_4 random failure showing "Concurrent modification of host and device views in DualView" starting 2020-02-09 #6790
Comments
@ndellingwood just fixed the one Tpetra issue. Usually this kind of error comes from code that uses Tpetra and assumes it can get views without respecting the sync / modify interface. The only thing that changed is that Kokkos now checks DualView flags by default in a debug build. It did not do that before.
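To illustrate the sync / modify contract mentioned above, here is a minimal, self-contained sketch. `ToyDualView` is a hypothetical stand-in, not Kokkos or Tpetra code: it keeps two `std::vector` buffers as the "host" and "device" copies, tracks one dirty flag per side, and throws where a debug-build check would report a concurrent modification. The method names mirror those of `Kokkos::DualView` (`modify_host`, `sync_host`, `need_sync_host`, and so on), but the bodies are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical toy model of a dual view. NOT the Kokkos implementation.
class ToyDualView {
public:
  explicit ToyDualView(std::size_t n) : host_(n, 0.0), device_(n, 0.0) {}

  // A side needs a sync when the *other* side holds newer data.
  bool need_sync_host() const { return device_dirty_; }
  bool need_sync_device() const { return host_dirty_; }

  // Marking one side modified while the other is still dirty is the
  // situation a debug-build flag check would reject.
  void modify_host() {
    if (device_dirty_)
      throw std::runtime_error("concurrent modification of host and device");
    host_dirty_ = true;
  }
  void modify_device() {
    if (host_dirty_)
      throw std::runtime_error("concurrent modification of host and device");
    device_dirty_ = true;
  }

  // Syncing copies the newer data over and clears the dirty flag.
  void sync_host() {
    if (device_dirty_) { host_ = device_; device_dirty_ = false; }
  }
  void sync_device() {
    if (host_dirty_) { device_ = host_; host_dirty_ = false; }
  }

  // Raw access performs no checks, so callers must follow the protocol.
  std::vector<double>& host_view() { return host_; }
  std::vector<double>& device_view() { return device_; }

private:
  std::vector<double> host_, device_;
  bool host_dirty_ = false;
  bool device_dirty_ = false;
};

// Buggy pattern: write on the device, then modify the host without syncing.
bool buggy_update_throws() {
  ToyDualView v(4);
  v.modify_device();
  v.device_view()[0] = 1.0;
  try {
    v.modify_host();  // device changes were never synced to the host
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}

// Correct pattern: sync to a side before modifying it.
double correct_update() {
  ToyDualView v(4);
  v.modify_device();
  v.device_view()[0] = 1.0;
  v.sync_host();    // bring device changes to the host first
  v.modify_host();
  v.host_view()[0] += 2.0;
  v.sync_device();  // push host changes back to the device
  return v.device_view()[0];
}
```

In this toy model, `buggy_update_throws()` reports the error because the host side is marked modified while the device side is still dirty, whereas `correct_update()` syncs before each modification and so both copies end up consistent.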
Looks like an error in |
Address issue #6790 Changes to be committed: modified: test/tpetra/ME_Tpetra_1DFEM_def.hpp
PR #6793 issued. @mhoemmen, can you clarify whether an initialized Tpetra MultiVector returns views with clean sync states?
@ndellingwood wrote:
Not necessarily. MultiVector reserves the right to do the initialization wherever it likes. Even uninitialized MultiVector objects may be marked modified on either side in debug mode (since debug mode prefills with |
@ndellingwood Just to clarify:
Tpetra::MultiVector<> X(map, numVecs);
// X may either be fully sync'd, or may be modified on either side.
assert((X.need_sync_host() && ! X.need_sync_device()) ||
       (! X.need_sync_host() && X.need_sync_device()) ||
       (! X.need_sync_host() && ! X.need_sync_device()));
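The three allowed cases in the assertion above amount to saying that the two flags are never both set at once. A minimal sketch of that equivalence follows, with plain booleans standing in for the results of `need_sync_host()` and `need_sync_device()`. The helper names `valid_sync_state`, `three_case_form`, and `check_equivalence` are hypothetical and are not part of Tpetra or Kokkos.

```cpp
#include <cassert>

// Hypothetical helper: a state is valid unless BOTH sides need a sync.
constexpr bool valid_sync_state(bool need_sync_host, bool need_sync_device) {
  return !(need_sync_host && need_sync_device);
}

// The three-way form, written out as in the assertion above.
constexpr bool three_case_form(bool h, bool d) {
  return (h && !d) || (!h && d) || (!h && !d);
}

// Exhaustively check that the two forms agree on all four flag combinations.
inline void check_equivalence() {
  const bool values[] = {false, true};
  for (bool h : values)
    for (bool d : values)
      assert(valid_sync_state(h, d) == three_case_form(h, d));
}
```

Code that receives a freshly constructed MultiVector can therefore rely only on this weaker invariant, not on both flags being clear.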
Thanks for the clarification, @mhoemmen!
PR #6793 merged.
Test results for issue #6790 as of 2020-08-16
Tests with issue trackers Passed: twip=1
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
Test results for issue #6790 as of 2020-08-23
Tests with issue trackers Passed: twip=1
CC: @trilinos/kokkos, @trilinos/tpetra, @trilinos/nox, @kddevin (Data Services Product Lead), @rppawlo (Nonlinear Solvers Product Lead)
As shown here, the test:
in the build:
failed with the error:
This is the first such failure I can find since the Kokkos 2.99 promotion on 2020-02-02. However, given that this appears to be a random failure, and given the large cost of the last major random error found in Trilinos, I thought it would be good to raise this issue early in case it turns into something big. (But this might just be a defect in this unit test and not in production code, or a hardware fluke that we never see again. Who knows.)