-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tpetra: offsetView of offsetView buggy #46
Comments
If you create a simple Tpetra vector (with LocallyReplicated map in the attached example) and first create an offsetView V1, and then an offsetView V2 of V1, the offsets should be summed, but this doesn't seem the case since I updated from 11.12.1 to 12.2.1. |
sorry for the messy bug report, I'm new to github... |
Hi @jthies ! I'm not getting e-mail notifications for Tpetra for some reason -- I have to figure out what's going on -- but thanks for reporting! |
Mark, Jonathan |
could someone fix this minor issue? It's preventing me from porting Software to the latest Trilinos Versions (or has it been fixed already? I didn't try it anymore) |
@jthies Hi Jonas -- I'll work on adding your test to the repo. I've been traveling a lot and it's been hard for me to keep track of all the things to do. Thanks for explaining the urgency for us. |
I'm calling this a "task," but please note that it does relate to known DualView issues, like #364. |
@jthies Hi Jonas! You might be pleased, annoyed, or both to note that I've finally gotten around to fixing this. Thanks for catching this and for providing a test! I modified it slightly so that it works as an automated test. I'm running the check-in tests now and am about to push. |
@trilinos/tpetra This commit adds a test for #46 (offset view of offset view of a Tpetra::MultiVector), and fixes #46. The fix includes a less-than-satisfactory hack (see FIXME in code comments) so that TpetraCore's CrsMatrix Gauss-Seidel test also passes. (That test relies on domain Map views of column Map MultiVectors, in particular the ability to get the original column Map MultiVector back out of the domain Map view.) to domain Map views of column Map MultiVectors. The FIXME comment above the hack explains how one could fix it in a more permanent way. Build/Test Cases Summary Enabled Packages: TeuchosNumerics, TpetraCore Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid,ROL,Panzer Enabled all Forward Packages 0) MPI_DEBUG => passed: passed=940,notpassed=0 (10.47 min) 1) SERIAL_RELEASE => passed: passed=881,notpassed=0 (15.07 min) Other local commits for this build/test group: dc26ca9, acf27dd
@jthies I pushed a fix for this yesterday to the develop branch of Trilinos. Sorry it took us so long! |
Hi Mark, thanks, I’ll try it out soon and let you know if all our tests are running through! Jonas |
Thanks for still talking to me after it took so long for us to fix this silly thing! I'm not even terribly pleased with the fix, but as long as you don't exercise the weird use case of "recover the original column Map view of a domain Map view of a column Map (Multi)Vector, also with a nonzero row offset," your code should work fine. |
@trilinos/tpetra Fix #680. This issue refers to a build error in the Issue #46 test, that only shows up with GCC 4.7.2. GCC 4.7.2 appears to be broken, in the sense that it does not let me say things like X.template sync<Kokkos::HostSpace> (); even when X's type does not depend on template parameters. Other compilers require "template" in code like the above example, so I can't just remove "template". This commit works around the issue by making the test's implementation templated, but calling just one instantiation of that implementation. Thanks to Brent for reporting this! Build/Test Cases Summary Enabled Packages: TpetraCore Disabled Packages: FEI,PyTrilinos,Moertel,STK,SEACAS,ThreadPool,OptiPack,Rythmos,Intrepid,ROL,Panzer 0) MPI_DEBUG => passed: passed=103,notpassed=0 (4.79 min) 1) SERIAL_RELEASE => passed: passed=77,notpassed=0 (5.25 min)
Hi Mark, sure, thank you for still listening to These nasty technical issues :) The Situation I reported here seems to be fixed, but we still have some failing tests concerning multiple views. It might be that I didn't adapt everywhere to the changed "view semantics" though. If you are interested in running our tests, you could check out https://bitbucket.org/essex/phist.git (branch tpetra_update). Use cmake -DPHIST_KERNEL_LIB=tpetra, make and run the Driver phist-1.1.0-kernels-test- Or you could add similar tests to your Framework, basically what we do is also run all our tests once with standard objects and once with funny views. |
@jthies wrote:
This is super important, and I appreciate your patience :-)
Could you explain what you mean by "standard objects" vs. "funny views"? Are "funny views" the result of calling offsetView* on a Tpetra::MultiVector, with a nonzero offset? |
The operations I consider are e.g. C=V^TW, W=V*C with C a "small dense" and V,W "tall skinny" matrices. In Tpetra both are represented as MultiVectors with a local or distributed map, respectively. For the V/W Kind, I want to allow views with a column offset only (subView with a Range1D arg). For the C-type, I also want to use views with a row and a column offset, so an "offsetView" of a "subview", again for contiguous rows and columns. I certainly don't care what the map of such an object Looks like, I just want to use it for local stuff anyway. As to why we test this stuff with "funny views" (meaning e.g. V has a col offset, C a row and a col offset, and then compute C=V^TV): because the viewing leads to unaligned memory accesses and may break the kernel if it uses e.g. AVX, a mistake that we encounter quite often. The Errors I see with the "develop" branch of Tpetra seem to occur mostly when putting view objects into tests that create more views of them and use thm to perform some operations, so the nesting of subViews and offsetViews, and mostly when "C-type" matrices are involved (offsetViews, that is). This all works in Trilinos 11.12.1, so it broke somewhere during the recent refectorings. |
@jthies wrote:
I think you would be pleased with Kokkos::View. It's entirely local (no MPI), uses first-touch allocation by default for host memory, and can use NVIDIA GPU memory as well. View works almost like Matlab arrays; the only function you need to remember is Kokkos::subview. You can also choose the layout (e.g., column or row major; default depends on where you allocate). For example:
We're working on BLAS wrappers too, but it wouldn't be hard for you to write them, since the only function you need really is _GEMM. Our wrappers will take Views with any stride or layout, with special cases that use optimized BLAS when available. |
@jthies wrote:
See e.g., tpetra/ReleaseNotes.txt for 11.14. Could I trouble you perhaps to extract a minimal Tpetra use case that didn't break before, but breaks now? Thanks :-) |
# This is the 1st commit message: This PR extends previous work in "Structured Integration", a term that generalizes sum factorization algorithms for finite element assembly. The main thing that was missing for full support of the exact sequence was pullbacks in FunctionSpaceTools for spaces other than H(grad). This PR adds those, as well as implementations of some sample formulations -- corresponding to the natural norms in each function space -- in both unit tests and performance tests. There are some changes to the new data structures as well as to the new integration kernels that were required for this full support. StructuredIntegrationTests has gotten an upgrade; for the assembly comparison tests (of which there are now five distinct formulations assembled: Poisson plus the natural norms for each space), we now use test templates. This simplifies declarations, makes it easier to see what cases are covered, and eliminates a fair amount of code redundancy. Modified handling of tensor bases, especially with regard to CellTopology. Initial (draft) implementation of arbitrary-dimensional H^1, L^2 bases on the hypercube. Note: the basis equivalence test added here fails at the moment; an exception is thrown in TensorBasis::getValues() (the five-argument version that subclasses are nominally supposed to implement). Made TensorBasis default implementation support OPERATOR_VALUE and OPERATOR_GRAD. (TODO: delete implementations in subclasses that handle these for the five-argument getValues().) Revised quadrature in BasisEquivalenceTests to deal correctly with extrusions. Right now, the new basis equivalence test fails with an exception in getCubatureImpl(), where there's an assertion that there are either two or three direct cubatures. (And the copy logic only handles those two cases.) So we also need (probably) to revise this to support arbitrary dimensions. Working to rewrite BasisEquivalenceTests using BasisValues instead of DynRankView; right now, things don't compile… Dropping implementation of getValues() in tensor-product DerivedBasis_HGRAD and _HVOL, since now TensorBasis handles these. But right now we fail during TensorBasis::allocateBasisValues() for OPERATOR_Dn. Wrote a TODO there regarding what needs to change to fix this. Implemented a default getSimpleOperatorDecomposition() within TensorBasis. Extended getDkCardinality() computation within Basis to support spaceDim up to 7. Revisions, including a TODO discussing one current test failure. (There are almost certainly more.) Commented out some debugging output. Switching from fixed-length Kokkos::Array to dynamic Kokkos::vector for vector components, to support OP_Dn use cases. Implemented TensorBasis::getOperatorDecomposition() built-in support for OP_D1 through OP_D10. More work on TensorBasis; now all MonolithicExecutable tests pass. Some other test failures: 150 - Intrepid2_unit-test_Projection_Serial_Test_InterpolationProjection_HEX_MPI_1 (Failed) 158 - Intrepid2_unit-test_Projection_Serial_Test_Convergence_HEX_MPI_1 (Failed) (This is locally on my machine, a Serial Release build. Have not tried OpenMP or CUDA.) Fixed the logic for setting shards topology and tags for H(curl) and H(div) bases on the hexahedron. Locally (in serial, CPU), all tests now pass. Added Intrepid2::CellTopology, with support for arbitrarily many tensorial extrusions. Revised doxygen to reflect the fact that tags are supported by raw TensorBasis now. Added support for tags to raw TensorBasis, in terms of component tags. Getting BasisEquivalenceTests into better shape for testing hypercube bases, including tests up to spaceDim = 7, and fixed an issue in which the cellTopo used for checking tags was incorrect (used baseTopo instead of the tensorial extrusion). Draft implementation of SerendipityBasis. *Not* even a complete draft: refers to BasisValues::setOrdinalFilter(), which is not yet declared, much less implemented. But because most methods here aren't referenced by code in a .cpp file, the changes do not interfere with a successful build. Added BasisValues::setOrdinalFilter() method, and support within operator() and extent_int() to honor this. For Serendipity, eliminated the pointType argument, which is ignored for all of our hierarchical basis families anyway. Fixed definition of HostBasis type. Added a first test against Serendipity bases: a check that the actual cardinality matches the formula for the cardinality. This test passes. Fixed a missing return in allocateBasisValues(). Added accessor for ordinal map. Fixed a couple issues that affect BasisValues when you have a non-trivial ordinal filter. Fixed a test failure that resulted from comparing on two different cell topologies as though they were the same. Added tests against Serendipity basis. These pass. Generalized getDkEnumeration() and getDkEnumerationInverse() to arbitrary dimensions. Made getTensorDkEnumeration() support up to total dimension of 7. Added support for POINTTYPE_DEFAULT for nodal H(div) Hex and Quad bases. Made triangle H(grad) basis fail cleanly when the order exceeds MaxOrder; was a buffer overrun. Made various bases fail cleanly when the order exceeds MaxOrder; in most cases, this could cause a buffer overrun. Added a check that points container has the right number of tensor components. Fixed a couple places where we incorrectly determined the space dim for a basis because we neglected the number of tensorial extrusions, as returned by getNumTensorialExtrusions(). Fixed an error in which a Kokkos::Array was potentially used uninitialized. Added getDomainDimension() method. Made TensorBasis::allocateBasisValues() a bit more flexible with regard to inputPoints argument (allows more tensorial components in points than there are basis components, so long as the basis lines up; this case can arise when you have a hypercube topology for one of the componentsand a non-tensor basis defined on it; e.g. HGRAD_HEX_Cn_FEM, extruded some number of times). Added tests against getDkEnumeration and getDkEnumerationInverse, and fixed the issues in getDkEnumerationInverse that these revealed. Fixed an error in the treatment of tensor points for the first component basis of a TensorBasis (as when a 3-component point input is provided to HGRAD_QUAD x HGRAD_LINE). Added a TensorPoints constructor that copies selected tensor components from another TensorPoints container. Tweaked doxygen for the new TensorPoints constructor. Removed extraneous template parameter from the subset constuctor for TensorPoints. Fixed an error in the TensorData-combining constructor. Added some sanity checks on TensorPoints components. Continued refactoring StructuredIntegrationPerformance in preparation for H(div) and H(curl) test cases, as well as adding a choice of basis family. Also, some ExecutionSpace --> DeviceType replacements. TransformedVectorData class --> TransformedBasisValues (still need to rename the files). Added preliminary support for getHGRADtransformVALUE(). Added preliminary support for H^1 -- that is, (grad,grad) + (value,value) -- assembly in StructuredIntegrationPerformance. Right now, this fails with an exception during the structured assembly of (value,value), due to the transforms being the identity and therefore unset, but integrate() assumes that they are valid Data objects. # This is the commit message trilinos#2: Tweaked comment to cover the scalar basis value case. # This is the commit message trilinos#3: Made composedTransform construction cognizant of the invalid-transform-means-identity convention. # This is the commit message trilinos#4: Fixed H1StandardAssembly to make it tolerant of the case when a workset size does not evenly divide the number of cells. # This is the commit message trilinos#5: Removed console output indicating that H1StructuredAssembly was not yet fully implemented -- it is now, though we haven't verified the values at all. # This is the commit message trilinos#6: Added storage of the assembled matrix results, such that they can be compared across algorithms to verify that the various algorithms agree. (Have not yet implemented that comparison, but have added a TODO where it should go.) # This is the commit message trilinos#7: Added comparisons of assembled stiffness matrices to verify correctness. For Poisson, these pass. For the new H^1 formulation, they do not pass yet. Fixed domainExtents when generating CellGeometry (was 0, leading to NaNs in Jacobians). # This is the commit message trilinos#8: Some fixes for integrate() to support (value, value) integral (for H^1 values, at least). I've run a couple H^1 test cases here, and these pass. # This is the commit message trilinos#9: Commented out my debugging test cases in favor of the standard (CI) test cases. # This is the commit message trilinos#10: Initial effort at H(div) performance tests. At present, fail with an exception during the structured integration. TODO in place for a fix. # This is the commit message trilinos#11: A bunch of fixes toward getting the H(div) formulation working. From the test I'm running right now (single-cell), it looks like everything is working except the Uniform case, and in that case, we have an incorrect jacobianDividedByJacobianDet (the second and third diagonal entries are missing). The likely culprit is the in-place product computation; maybe it doesn't handle the block-plus diagonal well? This includes some console output that shows the wrong values for jacobianDividedByJacobianDet. # This is the commit message trilinos#12: Added getWritableEntryWithPassThroughOption() and getEntryWithPassThroughOption(), allowing in-place combinations involving block-diagonal matrices. Fixed a bug in getDataExtent() in which the for loop iterated over the whole activeDims_ container, rather than the first numActiveDims_ entries. If the data there happened to match the dimension d in the argument, this could have led to incorrect behavior. Fixed a bug in Data(std::vector<DimensionInfo> dimInfoVector) constructor, in which blockPlusDiagonal dimensions would not be processed correctly. # This is the commit message trilinos#13: Added unit test InPlaceProduct_Matrix, which motivated the changes in the previous commit. # This is the commit message trilinos#14: Got H(div) structured assembly working. # This is the commit message trilinos#15: Got rid of the "hypercube" modifier in the various assembly routine names: so far, we only test hypercubes, but the assembly is agnostic to that. The fact that it's a hypercube is in the geometry object which is passed in. # This is the commit message trilinos#16: Added definitions, implementations of getHCURLtransformVALUE and getHCURLtransformCURL. # This is the commit message trilinos#17: Fixed file creation dates listed in headers (not exact; just don't want to claim that these were there last summer)… # This is the commit message trilinos#18: Added H(curl) formulations, and filled out support for this in StructuredIntegrationPerformance. These already pass! The H(curl) transformations look a lot like the H(div) ones... # This is the commit message trilinos#19: Made getDkEnumeration(), getDkTensorIndex() KOKKOS_INLINE_FUNCTIONs (had been merely inline). # This is the commit message trilinos#20: Eliminating use of Kokkos::vector to store elements in VectorData's vectorComponents_ member; evidently, Kokkos::vector's operator[] requires UVM to execute on device under CUDA; it becomes merely an inline function when KOKKOS_ENABLE_CUDA_UVM is unset. # This is the commit message trilinos#21: Removing now-incorrect/unnecessary construction of VectorArray objects (now that VectorArray is a fixed-length Kokkos::Array). # This is the commit message trilinos#22: Eliminated a couple of unused variables. # This is the commit message trilinos#23: Changed dimToComponent_, dimToComponentDim_, numDimsForComponent_ from Kokkos::vector to Kokkos::Array. # This is the commit message trilinos#24: Eliminated a few more incorrect/unnecessary VectorArray() constructor calls. # This is the commit message trilinos#25: Dropped reference (&) from a few Data object definitions; looks like these are causing problems with lambda capture on CUDA. # This is the commit message trilinos#26: Dropped some const qualifiers, as well as references, for the affine path in IntegrationTools. # This is the commit message trilinos#27: Added support for outputting timings data to file. # This is the commit message trilinos#28: StructuredIntegrationPerformance: Reworking the way that meshes get sized, so that the stiffness matrices fit within a 2 GB budget. # This is the commit message trilinos#29: Reworking calibration to use the "core integration" throughput as the thing to maximize. Also increased the max workset size to match the new max cell count (32768). # This is the commit message trilinos#30: Fixed a bug in which the various structured assembly methods would undercount flop estimates (we were clobbering one integration's flops with another's). # This is the commit message trilinos#31: Trying something with Calibration mode, in which we skip over workset sizes that were bigger than the optimal choice for prior polyOrders. # This is the commit message trilinos#32: Implemented L^2 formulation, including getHVOLtransformVALUE() implementation in FunctionSpaceTools. Loosened tolerances. Added some calibration results (these are not yet complete). # This is the commit message trilinos#33: Added missing fences, which meant that "Standard" timers were undercounting (dramatically for H(vol)). # This is the commit message trilinos#34: Added calibrations for all formulations/platforms except Serial. Running Serial now... # This is the commit message trilinos#35: Finished calibrations (BestSerial, BestOpenMP_16, BestCuda). # This is the commit message trilinos#36: Moved assembly headers to a common location; started refactoring StructuredIntegrationTests to use this. (So far, it only uses Poisson formulation in testing.) # This is the commit message trilinos#37: An attempt to fix a failure in HexahedronHierarchicalDGVersusHierarchicalCG_HGRAD -- by expanding the max compenents in VectorData. At the moment, this is leading to segfaults without much clarity in lldb as to what's causing it. Pushing so that I can build on other machines with other debuggers. # This is the commit message trilinos#38: Attempting to resolve some "missing return statement" warnings from nvcc. # This is the commit message trilinos#39: Fixing unused variable warning. # This is the commit message trilinos#40: Turning off tests for OPERATOR_D3 and above for derived basis classes. # This is the commit message trilinos#41: Set MaxVectorComponents = 7 = MaxTensorComponents. There's a weird issue exhibited by an Apple clang (debug) build of MonolithicExecutable: a seg fault during HexahedronHierarchicalDGVersusHierarchicalCG_HGRAD, with little or no information from the debugger on what went wrong, which evidently happens whenever MaxVectorComponents != MaxTensorComponents. # This is the commit message trilinos#42: Eliminating certain OPERATOR_Dn tests for compatibility with VectorData limitations. # This is the commit message trilinos#43: Fixed a couple issues with CUDA build. # This is the commit message trilinos#44: Got rid of some extraneous "using" lines in Standard Assembly headers. # This is the commit message trilinos#45: Tweaked doxygen. # This is the commit message trilinos#46: Reverting some changes that are specific to the Serendipity branch -- want to exclude those from this branch, and do a separate PR for Serendipity. # This is the commit message trilinos#47: Deleted Serendipity basis. # This is the commit message trilinos#48: More serendipity reversion. # This is the commit message trilinos#49: Some tweaks to exception tests, code formatting. # This is the commit message trilinos#50: More Serendipity reversion (possibly done, but I haven't tried building yet!).
@trilinos/tpetra
The text was updated successfully, but these errors were encountered: