Make PETSc solvers available #2645
PETSc seems like a great package. When I looked at #942, I was surprised that sparse matrix multiplication was implemented in pure Julia with a simple loop. High-performance sparse numerical computation is known to be very difficult to optimize correctly, and typically takes hundreds of thousands of lines of C/assembly code (someone working at MathWorks told me this). This is something that should be delegated to highly optimized C libraries (e.g. Sparse BLAS).
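For context, a pure-Julia CSC mat-vec of the kind being discussed really is only a few lines. This is an illustrative sketch of such a "simple loop" kernel, not the actual Base implementation:

```julia
using SparseArrays

# Naive CSC matrix-vector product, written as a plain double loop.
function csc_matvec(A::SparseMatrixCSC, x::AbstractVector)
    y = zeros(eltype(A), size(A, 1))
    for j in 1:size(A, 2)                       # loop over columns
        xj = x[j]
        for k in A.colptr[j]:(A.colptr[j+1]-1)  # nonzeros stored in column j
            y[A.rowval[k]] += A.nzval[k] * xj
        end
    end
    return y
end

A = sprand(100, 100, 0.05); x = rand(100)
@assert csc_matvec(A, x) ≈ A * x
```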
In terms of backends, it seems that Sparse BLAS + ScaLAPACK is also widely used.
ScaLAPACK is for parallel dense solvers, which is a totally different problem. (And not as useful: the N^3 scaling means that even if you have 1000x more processors and get linear speedup, you can only do a 10x bigger problem, since 1000^(1/3) = 10.) Sparse BLAS is very low-level, and my understanding is that PETSc can utilize optimized sparse BLAS routines in its backend.
Yes, you are right that ScaLAPACK is not for this purpose. For Sparse BLAS, I have been using the version provided by MKL, which seems fine for me. However, the interface looks somewhat different from the one in Netlib.
Wasn't this discussed at some point earlier? PETSc is great, but it's a huge and complex dependency (how do you deal with optional solver interfaces?). It should definitely live in a package for some time before we consider making it part of Base.
I agree that PETSc is a huge dependency (with an insane build system to boot). Distributing it with Julia is probably not in the cards. But I would rather see work on a solid PETSc package than a lot of effort re-inventing the wheel in Julia.
While PETSc has all the various sparse solvers and such, which we should certainly leverage, I did not know that it had a drop-in sparse matrix library for things such as matrix indexing, concatenation, and the various other matrix operations.
You can't really take advantage of PETSc without using its storage format (which is based on CSR, by the way) and its primitives. Other missing operations can be built on top of this.
We can certainly convert from the existing data structure to CSR, and most solvers also allow for transposed solves. Sequential sparse libraries typically use CSC storage, and a lot of sparse code ported from Matlab will end up having poor performance if we use CSR.
Even if we use CSR, I'm not sure we can use PETSc without making a copy; some rooting around in the guts of the library may be required. And even then, that would only cover the serial case.

CSR and CSC are just transposes of one another, and given a factorization of the transpose of a matrix it is trivial to infer a factorization of the matrix. Because of this, PETSc is able to use sparse-direct libraries that use either CSR or CSC, and it is able to solve problems with either the matrix or its transpose using the same factorization. Nor is this like the situation with dense matrices, where all your loops are in the wrong order if you have a row-major array and were expecting column-major, because most code does not traverse sparse matrices with explicit loops in that way. Can you give an example of a typical Matlab code using sparse matrices that would have problems?
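A quick pure-Julia check of the transpose relationship described above (nothing PETSc-specific; the field names are those of Julia's `SparseMatrixCSC`): the CSC arrays of `A` can be read, unchanged, as the CSR arrays of `A'`, so handing the data to a CSR-based library amounts to asking it to work with the transpose, with no copying or reordering.

```julia
using SparseArrays, LinearAlgebra

A = sprand(5, 5, 0.4) + 5.0I

rowptr = A.colptr   # CSR row pointers of A'
colind = A.rowval   # CSR column indices of A'
vals   = A.nzval    # CSR values of A'

# Rebuild A' row by row from the "CSR" arrays and check that it matches.
At = zeros(5, 5)
for i in 1:5, k in rowptr[i]:rowptr[i+1]-1
    At[i, colind[k]] = vals[k]
end
@assert At ≈ Matrix(A')
```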
I think the right thing here is to have a PETSc package with a new sparse-matrix type. The current SparseCSC stuff could be kept around for people who don't have PETSc and who aren't too picky about performance, but optimization of that code could be placed on a back burner.
Agreed, that is the general direction I was thinking of. The performance for CSC is comparable to Matlab in most cases. With the PETSc sparse backend, would PETSc own the sparse matrix, or would Julia? One option would be to have a barebones Julia implementation as well.

SciPy has taken this path, with 3 implementations (CSC, CSR, and COO), and I always found myself getting annoyed. This was almost 2 years ago, and perhaps things have significantly improved now.
Yes, things in Python change a lot. SciPy has now implemented 7 different kinds of sparse matrices (http://docs.scipy.org/doc/scipy/reference/sparse.html).
Actually, not only SciPy: a lot of other important libraries support multiple sparse formats, notably Intel MKL and cuSPARSE. From a user's perspective, I don't think I would have any issue if Julia provided different kinds of sparse matrices as options, as long as they have consistent interfaces; I would simply choose the one that fits my problem naturally. I understand that it would involve more work on the Julia developers' side, but for many of the functions defined on sparse matrices it would just be a wrapper around some C function of the underlying library of choice, so it should not be a huge amount of work if we do not choose to implement everything in Julia. There is a Sparse BLAS interface standard (http://www.netlib.org/blas/blast-forum/chapter3.pdf), and there is a complete Fortran 95 implementation of the entire standard (http://www.netlib.org/toms/818) that supports 18 different formats (including CSC, CSR, COO, etc.). I believe it is in the public domain, so you can just grab and use it.

I am not just being picky about performance. In applications that use sparse matrices, sparse matrix multiplication and sparse solves are precisely the performance bottleneck. It is really important to get these fast.
I've personally used several different sparse formats. It's really unavoidable to end up dealing with them, because different formats are more efficient in some applications than others.
PETSc uses CSR, of course, because it's more convenient for mat-vec.
Sorry for coming late to this party. I agree with @mlubin that keeping a general implementation around makes sense.

One thing I have tried to do in the umfpack.jl and cholmod.jl code is to integrate the sparse matrix solvers into the factorization framework so that a factorization can be reused. For sparse matrices it is also important to be able to separate the symbolic and numeric steps in a factorization.

I haven't looked in depth at PETSc, but I do know that it is huge, and I suspect that there must be some overhead in providing a unified interface to many different solvers. (By the way, has anyone looked at the rudimentary Julia interface in the PETSc sources?) It also seems that the purpose of PETSc is solving partial differential equations. I appreciate that it will provide an interface to sparse solvers, but pulling in the whole thing to get one part is perhaps overkill.

It is also important to keep in mind that it is easier to write Julia interfaces to C/Fortran libraries than it is to write interfaces from Matlab/Octave or R. I am the co-author of the Matrix package for R, and it contains a huge amount of glue code written in C. The Julia interface for UMFPACK/CHOLMOD now requires only two very brief C functions, and those are for the purposes of obtaining the size of the cholmod_common struct and offsets into the struct. There is always a balance in a new language between rewriting code and interfacing to code. Before going the PETSc route exclusively, I would suggest trying to write Julia interfaces to other sparse solvers to see how far that approach can be carried. The PaStiX home page lists several other solvers in addition to PaStiX, among them MUMPS (http://mumps.enseeiht.fr/). Some of these may not be in contention: TAUCS, WSMP and PSPASES have not been updated in over 10 years, and PARDISO, HSL and BCSLib have licenses that are unsuitable even for creating a Julia package. The CHOLMOD home page links to a 2005 paper, http://doi.acm.org/10.1145/1236463.1236465, that provides some comparisons on speed and reliability.
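As a sketch of the factorization-reuse idea with the CHOLMOD wrappers, in present-day Julia syntax (the function names below are from current SparseArrays/LinearAlgebra, not from the code as it existed at the time of this thread): the symbolic analysis performed by the first `cholesky` call can be reused when only the numerical values of the matrix change.

```julia
using LinearAlgebra, SparseArrays

n = 100
# Exactly symmetric, positive-definite tridiagonal test matrix.
A1 = spdiagm(-1 => fill(-1.0, n-1), 0 => fill(4.0, n), 1 => fill(-1.0, n-1))
A2 = copy(A1); A2.nzval .*= 2.0      # same sparsity pattern, different values

b = rand(n)
F = cholesky(A1)       # symbolic analysis + numeric factorization (CHOLMOD)
x1 = F \ b
cholesky!(F, A2)       # numeric refactorization only; symbolic step is reused
x2 = F \ b
@assert A2 * x2 ≈ b
```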
@lindahua, the sparse BLAS is not in the same position as the dense BLAS. If you are doing dense linear algebra, you are strongly motivated to use BLAS: there are multiple competing free fast implementations, and the most widely used linear-algebra library (LAPACK) is built on top of it. The sparse BLAS does not have so many implementations (the reference BLAS is of course rather simplistic), nor does it dominate the solver libraries (and it is irrelevant to sparse-direct solvers). The dominant solution is PETSc (with Trilinos being a runner-up).

@dmbates, while PETSc is motivated in part by sparse PDE problems, the PETSc library does not solve PDEs. Rather, most popular PDE solvers (libMesh, FENICS, deal.II) are built on top of PETSc. The PETSc library itself only provides sparse linear algebra (and closely related solvers like parallel Newton and sparse integrators). Moreover, there is a lot more out there than "just" (!!!) rolling our own interfaces to half a dozen sparse-direct solvers; there is the whole world of iterative solvers and approximate preconditioners as well. By interfacing one library, Julia gets access to all of these. It is such a huge waste of effort to roll our own interfaces and implementations of all these solvers.
How about we just keep the sparse module as it is in Base temporarily, and try experimenting with ideas in packages? For example, we could create a PETSc.jl package to experiment with @stevengj's idea.
@lindahua, we should definitely keep Base.SparseCSC as-is for now, and have a serious PETSc.jl package to experiment with. My main point is that using existing libraries (of which PETSc is currently the dominant choice) should be the focus of future development on Julia's sparse-matrix support, not re-inventing the wheel in Julia.
It may also be worth checking in with the guy who did that PETSc.jl interface to see if he's interested in collaborating further, and perhaps even advertising the effort on the PETSc mailing list. We may be able to get collaborators that way for PETSc.jl.
I found this PETSc package from @qsnake. There seems to be more general interest in having Julia+PETSc.
That package doesn't actually define a …
The Trilinos project from Sandia National Laboratories (http://trilinos.sandia.gov/packages/) is mentioned above and should be considered seriously; it is more modular than the monolithic PETSc.
Trilinos is C++, which makes it much harder to interface to Julia.
Hi, PETSc developer here. We are interested in helping to make PETSc available to Julia users. A couple of comments:

The PETSc.jl linked above is in our PETSc repository (https://github.com/petsc/petsc/blob/master/bin/julia/PETSc.jl). It surprises me that two unmaintained third-party mirrors show up in a Google search for "petsc.jl" before ours (Bitbucket is our primary, but github.com/petsc/petsc is our mirror, and it is updated regularly). @BarrySmith can comment further on those bindings.

PETSc can operate on arbitrary user-provided data structures, but to do that, you would have to provide the matrix kernels. Some applications do this to access Krylov methods, eigensolvers (via SLEPc), and nonlinear solvers. It makes a lot of sense for regularized inverse problems and some integral equations, where the matrix is dense but can be applied in O(n) and has spectral decay inherited from the "compact + identity" structure of the underlying continuous problem. However, for sparse PDE problems, it is generally worth using our matrix implementations for preconditioning (crucial for iterative methods) and direct solvers. We provide a common interface to many third-party sparse direct solvers and preconditioners, including ML (from Trilinos), Hypre, MUMPS, UMFPACK, CHOLMOD, SuperLU, and PaStiX. We also support Elemental, a modern dense linear algebra library that is roundly better than ScaLAPACK.

Our recommendation for assembly is to let PETSc hold the data structure and use our API to set values. Internally, we can use any of several matrix formats based purely on run-time options, and additional formats can be supported via plugins. It would still be useful to have a "conversion" function, but the best performance and lowest memory use come from assembling into the final data structure. (For standard CSR, you can use MatCreateSeqAIJWithArrays to avoid a copy, but other formats perform better in certain contexts, so we don't recommend cooking that into an interface.)

Finally, what is your story on parallelism? Is distributed memory via MPI something that you want to target? Do you want to call PETSc from within threads, or is it enough to call PETSc from one thread but have PETSc use threads internally? We also have some GPU support that may be interesting. I can answer further questions here, on your mailing list (I subscribe), or on ours ([email protected]).
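To make the zero-copy CSR path mentioned above concrete, here is a very rough ccall sketch, not taken from any existing package. Everything build-specific is an assumption: the shared-library name `libpetsc`, real double-precision scalars, 32-bit `PetscInt`, and an MPICH build (so `MPI_Comm` is a C int and `MPI_COMM_SELF` is the constant `0x44000001`).

```julia
const libpetsc = "libpetsc"              # assumed library name
const MPI_COMM_SELF = Cint(0x44000001)   # MPICH-specific constant

# PETSc must be initialized before any other call.
err = ccall((:PetscInitializeNoArguments, libpetsc), Cint, ())
@assert err == 0

# 0-based CSR arrays for the 2x2 matrix [1.0 2.0; 0.0 3.0]
rowptr = Cint[0, 2, 3]
colind = Cint[0, 1, 1]
vals   = Cdouble[1.0, 2.0, 3.0]

mat = Ref{Ptr{Cvoid}}()   # opaque Mat handle
err = ccall((:MatCreateSeqAIJWithArrays, libpetsc), Cint,
            (Cint, Cint, Cint, Ptr{Cint}, Ptr{Cint}, Ptr{Cdouble}, Ptr{Ptr{Cvoid}}),
            MPI_COMM_SELF, 2, 2, rowptr, colind, vals, mat)
@assert err == 0
# rowptr/colind/vals are now referenced by PETSc, so they must stay alive
# for as long as the Mat does.
```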
@jedbrown, thanks for chiming in. I think what we had in mind was precisely what you are suggesting: implement a new sparse-matrix type in Julia (a subtype of Julia's abstract matrix type) that wraps PETSc's own data structures and assembly API.

In the longer run, it would be nice to also support passing an arbitrary user-defined matrix type to PETSc by supplying the matrix kernels as callbacks, along the lines you describe.

Regarding threads, in the short run (since Julia does not support shared-memory parallelism, issue #1790) we would probably just be calling PETSc from a single thread. PETSc can and should use threads internally (it might be nice to use OpenMP so that it can share the same thread pool as OpenBLAS and FFTW). It is also important in the longer run to support distributed-memory parallelism, but that is probably on a back burner for now. Julia can call MPI, of course (like any C library), but Julia also has its own distributed-memory and distributed-array interface, and figuring out how that should interact with MPI and PETSc distributed arrays is a longer-run problem.
Good to hear that callbacks are fully supported now. That was @BarrySmith's roadblock with petsc.jl: nonlinear solvers need callbacks. Let us know if there is anything specific that you'd like our assistance with. How would you prefer to package a PETSc interface? It could be in the PETSc repository, in the Julia repository, or as an independent package.
Probably it would make the most sense as a separate repository (hosted wherever is convenient for you), but pointed to from Julia's package-system metadata so that users can install it with the package manager.
If you only change the ABI on the decade scale, that is easy enough to keep up with. I've maintained plenty of C libraries (e.g. you are using my FFTW library), and there is no reason for things like enums or basic function signatures to change unless there is a major API upheaval, and in that case we would have to modify our wrappers anyway.

We can certainly parse the header files to generate the corresponding C interfaces (we even have a whole Clang package to do just that using the C compiler). But if you don't maintain a modicum of ABI stability, even this is extremely fragile: we cannot distribute pre-generated code, since it might not match the PETSc on the user's system, and it is dangerous not to regenerate at runtime in case the library ABI has changed out from under them.

Moreover, I don't want to just munge your header files and present the raw C interface to Julia users. That would be doable with Clang in a fully automated way. I want a high-level interface that integrates nicely with the rest of Julia. But this has to be hand-written, and if you have any major API changes it will have to be hand-modified to keep up. As long as you don't have ABI changes for no reason (i.e. not unless there is a corresponding API change), then it is not really any more work to keep up with ABI changes than it is to keep up with API changes.
At the very least, we have to be cautious at runtime about the possibility that the installed PETSc does not match what the wrappers were generated from.
@stevengj I humbly submit that your comment under-estimates the size of the design space for scalable algebraic solvers by some orders of magnitude. FFTW is a great library, but it restricts itself to doing essentially one well-studied thing very well. The number of papers per year on new algorithmic variants, often with order-of-magnitude impact on problem classes appearing in various applications, is one way to measure the difference in scope. PETSc's base APIs rarely change, but the advanced APIs will probably continue to change occasionally until our field stops finding better algorithms and different forms of exploitable structure.

I understand that the Julia interface will need updates in these cases (if the Julia interface tries to be "complete", i.e. if your target audience includes people developing new methods of the sort that might "normally" go inside PETSc), and I think that compiler type checking of some sort is extremely valuable in this setting. If you can use Clang to verify that the calls made by the Julia interface are compatible with the interface in the header, perhaps even optionally, I think it would be very worthwhile. This is the sort of type checking that Cython offers to petsc4py, which also exposes a more pythonic (and complete) interface to PETSc. (The first generation of petsc4py used generated bindings and another layer to provide a pythonic interface.)

We'll put Barry's patch into the next maintenance release (3.4.3) so you can use it.
The approach I usually prefer is to wrap the C API using Clang if possible, giving a raw interface, and then to write a Julian API on top of that. Users can then work at either layer.
@ViralBShah, the thing to realize is that the PETSc C interface is huge, and the vast majority of it consists of things that most Julia users will never call directly if we have a native high-level interface. Until we get precompiled modules, wrapping the entire thing with Clang will impose a big time penalty for little benefit on anyone loading PETSc.jl... the tiny number of very advanced users who need some inner PETSc functionality can always use ccall directly. (An analogous thing would have been for my PyCall library to first use Clang to wrap the entire Python C API, which almost no one would use directly from Julia, and then write the higher-level interface on top of that.)

@jedbrown, don't your API changes consist mainly of new APIs for new structures etcetera, rather than changes to the old ones? The thing to realize is that writing wrappers is different from using the C API directly. In writing wrappers, I only call each C function once in the code, and in that case having an automatic translation of the C header is at best a minor convenience; I have to look carefully at your documentation (and occasionally your source code) in order to figure out how best to call it, and typing in the parameters is the least of the difficulties.
(And there is absolutely no technical excuse for ever changing the value of an existing enum.)
Also, writing wrappers is different from using the C API directly. When one writes wrappers, one only calls each C API function once. Compared to the effort required to look in the PETSc manual and carefully scrutinize the documentation in order to determine the best way to call it, the effort of typing in the parameters is minimal; automatic translation of the headers is only a minor convenience. And keeping up with occasional API changes in wrappers is not really a burden, as long as you aren't changing existing APIs gratuitously (e.g. renumbering enums on every release).
But since people seem to feel strongly about this, I'll run Clang on the PETSc header files and include an auto-generated C API in addition to the high-level wrappers.
@stevengj I can think of at least one case where the arguments to a function changed because the old interface implied that the caller would need to have done a non-scalable operation. Occasionally, a function is renamed or arguments are changed to be consistent with similar operations elsewhere. (We value consistency and simplicity above long-term binary stability.) One reason for an enum value to change is if making the values bitwise-ORable makes the interface simpler and more powerful, or if it is converted to dynamic registration (the macros become string constants, and users/plugins can register more). These changes aren't common, but we're not going to promise to never make such changes.

If the wrapper is distinctly higher level than the underlying interface, how is the wrapper going to maintain a more stable API while still being feature-complete? As you say, PETSc's interface is large, but most users start out using only a few basic routines, and as the demands of their application grow, they incrementally use more. If the wrapper only provides a conservative subset of the functionality, chances are that the interfaces it touches won't ever change, but a large fraction of users will need to dip into the lower-level interface.

I look forward to seeing how the Clang-generated API is done. That approach may be very useful for us in other projects.
@stevengj Note that PETSc already contains string representations for all enum types that you can use in Julia, so you don't need to re-represent them all in your Julia bindings (see, for example, the man pages around KSPCGSetType()). If these don't suit your needs, they can always be extended or modified for more general usability.
Steve, you are right that auto-generated headers may take forever to load.
It's about 6 seconds, which is not great, but better than I expected (it should be a bit faster with proper exclusion criteria so that it only pulls in the public API). See JuliaInterop/Clang.jl@1489ab3#commitcomment-3937825. Edit: also, this doesn't help with the issue of …
6 seconds is actually OK.
If the work to reduce loading times of Julia itself can also be extended to packages (presumably by building them along with Julia itself, at the user's discretion), then such issues will become much less important. And I'd agree that 6 seconds for a single package is tolerable, as long as there are few such packages you need to load to get work done. It doesn't take many such packages to yield 30s load times, which is why (currently) it's important to think about keeping things lean when possible.
Note that on top of this 6 seconds you will have to add the time to load an MPI package if you want to do any kind of extensive parallel work with PETSc (the basic PETSc interface is usable without directly calling MPI, but in my experience real applications often need a few direct MPI calls outside the PETSc interface). In the long run, it is really essential that Julia gain the ability to cache precompiled copies of modules, but it looks like we have a ways to go. (Precompiled Julia is on the horizon with #3892, but it seems a big jump from there to precompiling modules in such a way that they can still be loaded in arbitrary order at runtime.)
Hi everyone. Great initiative interfacing PETSc with Julia. Since all the posts on the subject are more than a year old, I was wondering if any progress has been made since?
I'm afraid that I haven't had a chance to work on it.
Thanks for the update, Steven.
@JaredCrean2 and @stevengj have started work on PETSc.jl, so let's move further discussion over there.
Note that PETSc.jl seems to be progressing nicely. @JaredCrean2 and @ihnorton put together auto-generated Clang wrappers of the entire PETSc API, so the low-level C API is completely wrapped and available, and we are slowly progressing towards a nicer high-level API built on top of that. Also note that we are wrapping/linking the double, single, and complex-double versions of PETSc simultaneously, so that you don't need to recompile to switch precisions or from real to complex.
Nice. How are you managing the fact that in our C code we use the same symbol, for example VecAXPY(), independent of the data type? Are you somehow automatically mapping to different symbols? Maybe we should use that same technology in PETSc as well? Does this mean I can remove the small amount of .jl code I put in PETSc a few years ago as a test prototype? Barry
Hello Barry, I think it's safe to remove the .jl code in PETSc at this point.
Fun fact: that little bit of code was the first place I ever heard of Julia. I didn't actually start using it for another 1.5-2 years, but oh well.
@BarrySmith, to clarify: when you call a C function in Julia, you supply both the symbol and the library name via ccall, so the same symbol can be resolved in several different shared libraries loaded into the same session. (The auto-generated wrappers on top of this mean that the user never needs to see the low-level ccalls.) The analogous thing in C would be to dlopen each library and look up the same symbol with dlsym.
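A hedged sketch of the mechanism described above (the library names are made up, and `x_d`/`y_d`/`x_s`/`y_s` stand for Vec handles assumed to have been created earlier through the corresponding library): because ccall resolves a (symbol, library) pair, the single C symbol `VecAXPY` can be called in both a double-precision and a single-precision build of PETSc within one session.

```julia
# VecAXPY(Vec y, PetscScalar alpha, Vec x): y <- y + alpha*x
err = ccall((:VecAXPY, "libpetsc_double"), Cint,
            (Ptr{Cvoid}, Cdouble, Ptr{Cvoid}), y_d, 2.0, x_d)

err = ccall((:VecAXPY, "libpetsc_single"), Cint,
            (Ptr{Cvoid}, Cfloat, Ptr{Cvoid}), y_s, 2.0f0, x_s)
```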
Rather than rolling our own sparse-matrix library, it seems like it would be much better to use PETSc, which is the de-facto standard here.
PETSc would give us access to a huge number of sparse-direct back-ends (#2632), iterative solvers, sparse eigensolvers (#1573) via the SLEPc library, optimized implementations (#942), multiple output formats, and parallel support (currently completely missing and quite nontrivial to implement well).
Re-inventing the wheel here is a little like rewriting LAPACK.