Performance of looping over Tpetra CrsMatrix rows #118
Hi Andrey! Thanks for doing the comparison! btw is the matrix fill complete at this point? (It must be locally indexed, since you're getting local indices.)

My main concern is that Tpetra is nearly 3x slower than Epetra for this very simple operation. My guess is that Tpetra spends its extra time in Teuchos::ArrayView construction and manipulation. Both Epetra and Tpetra do error checking by throwing exceptions, and both have a similar division of labor between the graph and the matrix (e.g., the matrix gets the row's indices by calling a method on its graph). Besides the use of ArrayView, the main difference is that Epetra tends to branch on the matrix's current state ("is storage optimized?"), while Tpetra abstracts that away behind CrsGraph::getRowInfo (returns a struct with the row's current number of entries, amount of space, and offset if applicable). It could be that filling the struct wastes time vs. just branching, but I'm sure the ArrayView manipulation plays a big part. It's possible to strip all that away and just return to extracting and passing around raw pointers (except for the underlying Kokkos Views that actually hold the data).

Epetra could be optimized too. For example, 4-argument Epetra_CrsMatrix::ExtractMyRowView asks for the number of entries in the row twice (once in Epetra_Graph::ExtractMyRowView, and once in 3-argument Epetra_CrsMatrix::ExtractMyRowView). On the other hand, the Epetra and Tpetra interfaces do error checking and look up things for you, and you'll always have to pay something for that.

A while back, @nmhamster and I debated a bit about this. He argued that the main interface we give to users should always perform reasonably well. Otherwise, we shouldn't give it to users. Do we want to spend more effort making Tpetra's interface perform well, or do we just want to skip it?
What I mean specifically is whether we expect users to access the matrix's entries through Tpetra methods, or whether they should always use the Kokkos interface. What I worry about with the latter is that users will start asking for features that will make the latter more like the former, which will reduce performance, leading again to requests for a stripped-down interface. I think what @nmhamster was trying to say is that if we give users a higher-level interface, we should commit to making it reasonably fast.
I would suggest running these tests through VTune to see where the differences are. At the conclusion of the milestone we had the performance between Epetra and Tpetra very similar for this operation. But that was pre-Kokkos and pre-KokkosRefactor. -Eric
Yes, it is fill complete.
At this point I think it would be a misguided effort to work on the older Tpetra interface. As Kokkos is our future, I think Trilinos would be in good shape if we just start skipping directly to the Kokkos objects. We just need to let people know that this could greatly benefit them for certain computational workflows.
I see the point. I think there is some uncertainty to it, as Tpetra's API supports both MPI and local operations. I think the local API that can be done in terms of Kokkos should be done by it, and the API that requires accessing the communicator should be done by Tpetra. But I don't have as good a picture of Tpetra as you do, so take it with a grain of salt. As for MueLu, my plan is to work on moving to the Kokkos Refactor branch asap, as even with a single thread we get benefits. The reason I even started thinking about the issue is that I tried to compare the original and Kokkos Refactor variants of MueLu, and noticed the difference in a factory that just detects Dirichlet rows in a matrix.
If I find time, I'll do the VTune run. You are right about the milestone, as I believe performance of, say, Gauss-Seidel was the same for Epetra/Tpetra.
If "we" means Trilinos developers, I think that's acceptable. The main issue is that the features many users want in the Tpetra interface are exactly the features that are missing or weakly implemented in the Kokkos interface. I'm thinking particularly of insert* (doesn't exist in Kokkos) and sumInto*. We can think of Tpetra's current interface as a vehicle for understanding how to improve fill performance for any interface. I'm not committed to it, other than for backwards compatibility. But it exists, and we can study it and learn how to improve it. Furthermore, enriching the Kokkos interface will help improve Tpetra fill performance, by stripping away complexity and letting Kokkos take charge of local data structures. (Tpetra::CrsMatrix semantics are a mess. I don't think we have ever untangled all the possible states.)
Gauss-Seidel is a different case; it does not use the Tpetra interface for accessing matrix entries, but talks directly to the low-level data structures. I was never happy about putting a public Gauss-Seidel method in Tpetra, but it worked.
I think it is a good separation of concerns to let Kokkos do what it does well. It also reduces direct user interaction with Tpetra and reduces direct Tpetra-Kokkos interactions.
I agree. It will take me 2-3 months of completely uninterrupted time to unravel Tpetra's semantics so I don't break backwards compatibility while pushing functionality into Kokkos, but I still agree.
Could you comment further on these points?
If Kokkos is used directly, then Tpetra is no longer in the middle between the user and Kokkos, so users make fewer Tpetra calls and Tpetra makes fewer Kokkos calls. Mike
@maherou, Mike, for many years we (SNL) have advocated strongly for abstraction in the software stack to maintain flexibility. I think this has been to our advantage. If we directly blend Kokkos and Tpetra, are we losing that ability into the future? In our discussions with users during the codesign sessions, most times this has come up we have seen users express a preference for Tpetra interfaces over raw data structures. One argument has been that not everyone buys into Kokkos, especially outside of Sandia, for a number of reasons; by having clear interfaces that are abstracted out, they can continue to use their alternatives inside their applications. There may be some relatively small performance overheads (small as long as we get most of the inlining correct) to this approach, but longer term the abstraction may provide greater flexibility.
Just a note of caution: not a lot of Tpetra functionality needs to be pushed to the Kokkos level. Fixing this performance regression could be an immediate need. In the long term, I would argue the Petra abstractions are too monolithic for the regime of massive node-level parallelism (e.g., FillComplete(), all-or-nothing atomics, poor support for asynchronous operations, tasking, etc.). This needs not a refactor but a complete rethinking of the design.
One comment here about expected slowdowns of Tpetra vs. direct usage of Kokkos. Generally the Tpetra interface is intended to handle a distributed-memory object. I believe that means it will always have a much higher overhead, since in general it requires more error checks and other safeguards (like a richer state set) than a purely local object. Those things cost. The question in my mind is whether Tpetra objects should have "call through" functions to the underlying local Kokkos objects which omit some of the safeguards, or whether it's better to say: if you are doing a purely local operation, get the local objects to perform them.
It seems a good API approach to design for calls through into local objects without requiring explicit access to the Kokkos objects. Good abstraction has in many ways given us the ability to really use products like Kokkos. Bypassing it to avoid performance slowdowns seems to me the wrong long-term path without more analysis of why the code is slower, and of how we can minimize the overhead or redesign the API to maintain the abstraction. This may not be achievable in the end, but I would think the aggressive use of inlining we see in many compilers should mean we get down to raw calls quite quickly where the API design allows for that (we certainly see this in many Kokkos compiles already).
Tpetra and Xpetra allow access to the underlying local Kokkos objects (e.g., the Kokkos::CrsMatrix through the getLocalMatrix routine in Tpetra/Xpetra). Andrey is using them for his experiments. In my opinion, from a user's point of view, accessing the matrix/vector data through the local Kokkos objects is not very complicated. It's different if we're talking about matrix construction: one can locally fill the Kokkos::CrsMatrix objects (using parallel_for, etc.) and then provide them in the constructor for generating the corresponding Tpetra/Xpetra::CrsMatrix object. Doing this right is extremely complicated, as you need expert knowledge about how Tpetra organizes the global and local indices. For example, the user has to take care of which entries have to be communicated through MPI and put them in the right order after the local matrix entries. This is the exact opposite of what the user wants to do (especially when used to insertLocal and sumInto-like routines!). Long story short: I think one has to distinguish between the use cases.
btw I really appreciate that we're having a good discussion about this. Sometimes I feel like we don't have enough of these sorts of discussions. Also, there's nothing personal about this for me. When I first started working on Tpetra, I prioritized correctness and backwards compatibility, taking lessons from Mike's TUG and other talks on the subject. This meant retaining some perhaps unfortunate design decisions, like forcing Tpetra's users to deal with the Teuchos Memory Management classes when filling a matrix. Epetra hides MPI for good reasons; Tpetra should have done the same with the Teuchos stuff, where possible.
We don't actually have to resolve this now. If we enrich the Kokkos interface with exactly what it needs for local operations, then it will be easy for Tpetra to invoke that interface. Stripping unnecessary complexity out of Tpetra's innards will make Tpetra's interface faster too. It will also help us figure out whether Tpetra has necessary complexity that will always make filling into Tpetra a lot slower than filling into Kokkos.

Remember that some of Tpetra's internal complexity comes from the Petra Object Model. For example, the POM requires that multiple matrices be able to share the same graph. Both Epetra and Tpetra reflect this in their implementations.

Christian and I spent a lot of effort preserving semantics when we did the Kokkos refactor of Tpetra. Think about how much trouble it was for apps and other Trilinos packages when we changed MultiVector from copy to view semantics to make it act more like Kokkos. Now imagine we change a bunch of other things too. (Remember the idiomatic way to test whether an Epetra matrix is empty on a process -- ask if it is neither locally nor globally indexed? This is inextricably tied to lazy allocation, which we won't be able to do if we want thread-parallel fill. Yet, I'm almost sure somebody will complain if we break this invariant in Tpetra. I broke a lot of tests when I tried getting rid of lazy allocation; see Issue #119.) I don't mean to say that we should keep Tpetra's interface as it is. What I mean to do is advocate caution.
@trilinos/tpetra Start addressing Issue #118 by removing the extra row Map lookup from CrsGraph's get{Global,Local}Row{Copy,View} methods. getRowInfo(FromGlobalIndex) already checks the row Map; there's no need to check it again.
@trilinos/tpetra Continue addressing Issue #118 by removing the extra row Map lookup from CrsMatrix's get{Global,Local}Row{Copy,View} methods, using the assumption explained in my previous commit message.

Build/Test Cases Summary
Enabled Packages: TpetraCore, Ifpack2, Amesos2, Zoltan2, MueLu, Stokhos
Disabled Packages: FEI, STK, PyTrilinos, NOX, Teko, Piro
0) MPI_DEBUG => Test case MPI_DEBUG was not run! => Does not affect push readiness! (-1.00 min)
1) SERIAL_RELEASE => Test case SERIAL_RELEASE was not run! => Does not affect push readiness! (-1.00 min)
2) MPI_DEBUG_COMPLEX => passed: passed=274,notpassed=0 (64.07 min)
3) SERIAL_RELEASE => passed: passed=240,notpassed=0 (128.90 min)
Other local commits for this build/test group: 7c24dde, 626bbdc
@trilinos/tpetra I was working on Issue #118 and Issue #41, and made some changes to Tpetra::CrsMatrix's getGlobalRowCopy and getLocalRowCopy methods. Tpetra's and Ifpack2's tests passed fine, but other downstream packages' tests did not. It looked like the tests were failing for a CrsMatrix with StaticProfile, never yet made fill complete, whose rows have extra space. This commit adds a unit test which exercises this case.
@trilinos/tpetra To reduce overhead of get*Row*, I added private methods to CrsGraph that get row views as raw pointers. The idea is to avoid the overhead of constructing a Teuchos::ArrayView, as well as its thread safety issues in a debug build (Teuchos_ENABLE_DEBUG=ON makes Teuchos::ArrayView not thread safe, because it retains a reference to the ArrayRCP from whence it came).
@trilinos/tpetra In the implementation of CrsMatrix::get{Global,Local}RowCopy, do not get row views of the current row's indices or values using Teuchos::ArrayView. Instead, use the new nonpublic raw-pointer row view methods that CrsGraph and CrsMatrix now provide. This has the following benefits:
1. Thread safety: Teuchos::ArrayView is not thread safe in a debug build (when Teuchos_ENABLE_DEBUG=ON)
2. Performance (Issue #118): Creating and passing around Teuchos::ArrayView incurs overhead
In a debug build, the new nonpublic raw-pointer row view methods do extra error checking. This should get back some of the bounds-checking features of using Kokkos::View or Teuchos::ArrayView in a debug build.
@trilinos/tpetra This commit fixes Issue #132. @aprokop wrote a benchmark in MueLu that compares performance of getting local row views with Xpetra::CrsMatrix, Epetra_CrsMatrix, Tpetra::CrsMatrix, and KokkosSparse::CrsMatrix. See Issue #118 for discussion. The problem is that the benchmark lives in MueLu, and requires enabling experimental build options which are off by default (see Issue #128). This motivated me to port the benchmark to Tpetra. This commit adds the ported benchmark to Tpetra as an example. The main point of the benchmark is to compare Tpetra vs. Epetra vs. "raw" Kokkos performance, so I left Xpetra out.

Build/Test Cases Summary
Enabled Packages: TpetraCore
Disabled Packages: FEI, STK, PyTrilinos, NOX, Teko, Piro
0) MPI_DEBUG => Test case MPI_DEBUG was not run! => Does not affect push readiness! (-1.00 min)
1) SERIAL_RELEASE => Test case SERIAL_RELEASE was not run! => Does not affect push readiness! (-1.00 min)
2) MPI_DEBUG_COMPLEX => passed: passed=86,notpassed=0 (3.94 min)
3) SERIAL_RELEASE => passed: passed=66,notpassed=0 (3.61 min)
I think the main issue is virtual method call overhead.

Epetra ExtractMyRowView (virtual): 0.3318 (1)

When I replace the implementation of
@trilinos/tpetra See #118 for discussion. Build/Test Cases Summary Enabled Packages: TpetraCore Disabled Packages: FEI,STK,PyTrilinos,NOX,Teko,Piro 0) MPI_DEBUG => Test case MPI_DEBUG was not run! => Does not affect push readiness! (-1.00 min) 1) SERIAL_RELEASE => Test case SERIAL_RELEASE was not run! => Does not affect push readiness! (-1.00 min) 2) MPI_DEBUG_COMPLEX => passed: passed=86,notpassed=0 (5.43 min) 3) SERIAL_RELEASE => passed: passed=66,notpassed=0 (3.02 min) Other local commits for this build/test group: 194974a
I'm moving this to the backlog. We've noted some possible improvements in Tpetra, but please note that a big part of the performance issue for methods that don't do much work is virtual method call overhead.
Tpetra::Details::findRelOffset is a new function (a free function, not a method) that finds the relative offset of an index in a row of a sparse graph or matrix, given an array of column indices in that row. The array may be either a rank-1 Kokkos::View or a raw 1-D array. The function lives in tpetra/core/src/Tpetra_Util.hpp. This fixes #205. I mean for the new function to replace or implement Tpetra::CrsGraph::find{Local,Global}Index. The function optimizes for the case where the input array to search is sorted, by using binary search in that case. This should make its performance comparable to that of Tpetra::CrsGraph::findLocalIndex or Epetra_CrsGraph::FindMyIndexLoc. I also added a test for this function. The test builds and passes. Note that it carefully exercises both the sorted and unsorted cases. The next step, not yet implemented, is to optimize for the case of short rows, by using linear search in that case. This should actually improve on both Epetra and Tpetra, and possibly address #118.
I'm closing this issue, because we need more specific use cases to drive it. KokkosSparse::CrsMatrix already has a low-overhead row access interface.
@trilinos/tpetra @jhux2 @mhoemmen @crtrott
Let me first admit that I am very likely doing something wrong.
I wrote a simple driver (located at muelu/test/perf_test_kokkos) which essentially finds the number of nonzeros in a CrsMatrix by looping through the rows and adding up the row lengths. It considers three scenarios:
The results were somewhat unexpected for me. I was running with a single MPI rank with OpenMP OMP_NUM_THREADS=1, and disabled HWLOC (so that Kokkos respects this). Here are some results:
For Tpetra
For Epetra
So it seems to me that using the local Kokkos matrix is absolutely the way to go, as it is ~30 times faster than going through Tpetra, and ~6 times faster than going through Epetra.
I would like to know if anybody has done any performance studies like this, or what could be the reason. If I am doing something completely wrong, I would also like to know that.