Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using scikit-fem on pip: solver remark #690

Closed
bhaveshshrimali opened this issue Aug 17, 2021 · 34 comments
Closed

Using scikit-fem on pip: solver remark #690

bhaveshshrimali opened this issue Aug 17, 2021 · 34 comments

Comments

@bhaveshshrimali
Copy link
Contributor

Although this is tangentially related to scikit-fem since the primary objective of scikit-fem is assembly and not solving the resulting systems, it maybe important to note that when working with a default pip installation on Windows, there is a significant performance degradation in solving, since the default sparse solver is SuperLU which is super slow. I think the conda installation may not suffer from this since the default solver is UMFPACK (?) which can be faster. There maybe further improvements possible when numpy is compiled against an optimized BLAS but that is something most users wouldn't care for.

I managed to get Pardiso working on Windows and with pypardiso things are significantly faster. See the timings on example 36 with nthreads=8 (and Mesh.refined(3)) and a slight addition at

def solver_direct_scipy(**kwargs) -> LinearSolver:

namely simply calling pypardiso.spsolve

def solver_direct_pypardiso(**kwargs) -> LinearSolver:
    """Linear solver provided by Pardiso."""

    def solver(A, b):
        return pypardiso.spsolve(A, b)

    return solver

Timings

Pardiso

Solve time: 0.3992440700531006
1, norm_du: 45.481410024640475, norm_dp: 14.934126744945504
Solve time: 0.31940722465515137
2, norm_du: 16.15926762709032, norm_dp: 20.93859584370273
Solve time: 0.3229987621307373
3, norm_du: 3.2471798205287667, norm_dp: 15.053533831714226
Solve time: 0.35599684715270996
4, norm_du: 0.4538801110428914, norm_dp: 2.6708267761548887
Solve time: 0.3192098140716553
5, norm_du: 0.027865264212750027, norm_dp: 0.1630714233226417
Solve time: 0.328704833984375
6, norm_du: 0.002041324778791658, norm_dp: 0.016283691403898768
Solve time: 0.3122262954711914
7, norm_du: 1.6033937733037654e-05, norm_dp: 0.0003318753774227752
Solve time: 0.33296942710876465
8, norm_du: 1.6887613490335165e-09, norm_dp: 4.315368227582554e-08
Solve time: 0.35301685333251953
9, norm_du: 4.4086266413504256e-15, norm_dp: 1.7099154835473912e-14
Total time: 98.29319977760315

SuperLU

Solve time: 19.864436626434326
1, norm_du: 45.48141002464045, norm_dp: 14.934126744945653
Solve time: 17.160115242004395
2, norm_du: 16.159267627090234, norm_dp: 20.93859584370295
Solve time: 18.916175842285156
3, norm_du: 3.247179820528735, norm_dp: 15.053533831714496
Solve time: 17.143075704574585
4, norm_du: 0.45388011104290815, norm_dp: 2.670826776154936
Solve time: 17.078906536102295
5, norm_du: 0.027865264212752802, norm_dp: 0.16307142332264873
Solve time: 17.101752281188965
6, norm_du: 0.0020413247787918446, norm_dp: 0.016283691403900905
Solve time: 17.19295907020569
7, norm_du: 1.6033937733110564e-05, norm_dp: 0.00033187537742298715
Solve time: 17.220698833465576
8, norm_du: 1.688761318114545e-09, norm_dp: 4.3153681931898745e-08
Solve time: 16.51562809944153
9, norm_du: 4.3255823949057516e-15, norm_dp: 1.5848762808515266e-14
Total time: 251.57028770446777

@kinnala
Copy link
Owner

kinnala commented Aug 17, 2021

@gdmcbain weren't you using Windows at some point? Are you qualified to comment on this? Is there anything we can do here?

@bhaveshshrimali
Copy link
Contributor Author

Just as a sidenote, why does conda-forge still have version 2.4.0?

@kinnala
Copy link
Owner

kinnala commented Aug 17, 2021

I believe it's caused by this line: https://github.com/conda-forge/scikit-fem-feedstock/blob/master/recipe/meta.yaml#L39

skfem.importers was renamed to skfem.io.

@gdmcbain
Copy link
Contributor

@gdmcbain weren't you using Windows at some point?

Yes, I use Ubuntu 20.04.2 LTS, Pop!_OS 21.04, and MS-Windows 10 regularly, all with Python installed from Miniconda and scikit-fem then pip-installed from master or other branch of this repo or sometimes PyPI.

I do have the anecdotal finding that when I rewrote the unsteady Navier–Stokes tutorial example from FEniCS in scikit-fem in gdmcbain/fenics-tuto-in-skfem#5, the timings were roughly: FEniCS Linux half an hour, scikit-fem Linux twelve minutes, scikit-fem MS-Windows five minutes, but the Windows and Linux were different machines, so really not a very useful comparison.

I find that I can conda install -c haasad pypardiso and then import spsolve from pypardiso on Pop!_OS too, so I'll be giving that a go. Thanks, Bhavesh!

@kinnala
Copy link
Owner

kinnala commented Aug 17, 2021

Trying to fix conda-forge in a PR: conda-forge/scikit-fem-feedstock#17

Edit: Oops, used wrong branch.

@bhaveshshrimali
Copy link
Contributor Author

Perfect!! Thanks! I have been meaning to switch to conda for quite sometime, but have resisted so far since another package netgen that I use for mesh generation wasn't available with anaconda.

@bhaveshshrimali
Copy link
Contributor Author

Yes, I use Ubuntu 20.04.2 LTS, Pop!_OS 21.04, and MS-Windows 10 regularly, all with Python installed from Miniconda and scikit-fem then pip-installed from master or other branch of this repo or sometimes PyPI.

I do have the anecdotal finding that when I rewrote the unsteady Navier–Stokes tutorial example from FEniCS in scikit-fem in gdmcbain/fenics-tuto-in-skfem#5, the timings were roughly: FEniCS Linux half an hour, scikit-fem Linux twelve minutes, scikit-fem MS-Windows five minutes, but the Windows and Linux were different machines, so really not a very useful comparison.

I find that I can conda install -c haasad pypardiso and then import spsolve from pypardiso on Pop!_OS too, so I'll be giving that a go. Thanks, Bhavesh!

Thanks, I think conda uses UMFPACK so solving should be just as fast, but yes definitely worthwhile seeing if pypardiso improves things. SuperLU is horribly slow. I recall observing this in #439 but didn't pay much attention at that time. I have been using PyPI for the most part because of netgen, but will try out conda in the future.

@gdmcbain
Copy link
Contributor

Well, I tried this out on my System 76 laptop running Pop!_OS 21.04—not on that unsteady Navier–Stokes solver, that's all preconditioned Krylov, but for the steady Navier–Stokes solver in ex27 which (currently still) needs direct solvers to track the steady solution branch past the supercritical Hopf bifurcation, and got the opposite result! The default SuperLU was significantly quicker than pypardiso! (The latter, changing only ex27, is in gdmcbain@bd2553b.)

Here's how SnakeViz saw it, run as, e.g.

python -m cProfile -o ex27.pardiso.prof -s cumtime docs/examples/ex27.py 
snakeviz ex27.pardiso.prof

Master (SuperLU):

ex27-master-snakeviz

pypardiso:

ex27-pardiso-snakeviz

@gdmcbain
Copy link
Contributor

I tried the same comparison, master v. bd2553b, on a Hewlett Packard laptop running MS-Windows 10 and it gave similar pictures, the big blocks again being linsolve.spsolve at 20.5 s v. pardiso_wrapper.factorize at 45.6 s.

Did I muck something up in bd2553b?

@bhaveshshrimali
Copy link
Contributor Author

linsolve.spsolve at 20.5 s v. pardiso_wrapper.factorize at 45.6 s.

Hmm interesting and opposite of what I've read so far. I can try running ex27 as well just to double check. It is possible that something is peculiar about the problem I'm solving in ex36.

@gdmcbain
Copy link
Contributor

Yes, please do. Maybe ex27 is too small? I did reduce the number of elements so that it wouldn't take so long every time tests.test_examples.TestEx27 is run.

@bhaveshshrimali
Copy link
Contributor Author

To that effect, here is the snakeviz from running ex36 with the two solvers

PyPardiso

image

Scipy

image

The assembly time is also large for my case, but the difference in solving is clear. Now onto, ex27.

Code

import numpy as np
from scipy.sparse import bmat
from skfem.helpers import grad, transpose, det, inv, identity
from skfem import *


mu, lmbda = 1., 1.e3


def F1(w):
    u = w["disp"]
    p = w["press"]
    F = grad(u) + identity(u)
    J = det(F)
    Finv = inv(F)
    return p * J * transpose(Finv) + mu * F


def F2(w):
    u = w["disp"]
    p = w["press"].value
    F = grad(u) + identity(u)
    J = det(F)
    Js = .5 * (lmbda + p + 2. * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2)) / lmbda
    dJsdp = ((.25 * lmbda + .25 * p + .5 * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2))
             / (lmbda * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2)))
    return J - (Js + (p + mu / Js - lmbda * (Js - 1)) * dJsdp)


def A11(w):
    u = w["disp"]
    p = w["press"]
    eye = identity(u)
    F = grad(u) + eye
    J = det(F)
    Finv = inv(F)
    L = (p * J * np.einsum("lk...,ji...->ijkl...", Finv, Finv)
         - p * J * np.einsum("jk...,li...->ijkl...", Finv, Finv)
         + mu * np.einsum("ik...,jl...->ijkl...", eye, eye))
    return L


def A12(w):
    u = w["disp"]
    F = grad(u) + identity(u)
    J = det(F)
    Finv = inv(F)
    return J * transpose(Finv)


def A22(w):
    u = w["disp"]
    p = w["press"].value
    Js = .5 * (lmbda + p + 2. * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2)) / lmbda
    dJsdp = ((.25 * lmbda + .25 * p + .5 * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2))
             / (lmbda * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2)))
    d2Jdp2 = .25 * mu / (lmbda * mu + .25 * (lmbda + p) ** 2) ** (3/2)
    L = (-2. * dJsdp - p * d2Jdp2 + mu / Js ** 2 * dJsdp ** 2 - mu / Js * d2Jdp2
         + lmbda * (Js - 1.) * d2Jdp2 + lmbda * dJsdp ** 2)
    return L


mesh = MeshTet().refined(3)
uelem = ElementVectorH1(ElementTetP2())
pelem = ElementTetP1()
elems = {
    "u": uelem,
    "p": pelem
}
basis = {
    field: Basis(mesh, e, intorder=2)
    for field, e in elems.items()
}

du = basis["u"].zeros()
dp = basis["p"].zeros()
stretch_ = 1.

ddofs = [
    basis["u"].find_dofs(
        {"left": mesh.facets_satisfying(lambda x: x[0] < 1.e-6)},
        skip=["u^2", "u^3"]
    ),
    basis["u"].find_dofs(
        {"bottom": mesh.facets_satisfying(lambda x: x[1] < 1.e-6)},
        skip=["u^1", "u^3"]
    ),
    basis["u"].find_dofs(
        {"back": mesh.facets_satisfying(lambda x: x[2] < 1.e-6)},
        skip=["u^1", "u^2"]
    ),
    basis["u"].find_dofs(
        {"front": mesh.facets_satisfying(lambda x: np.abs(x[2] - 1.) < 1e-6)},
        skip=["u^1", "u^2"]
    )
]

dofs = {}
for dof in ddofs:
    dofs.update(dof)

du[dofs["left"].all()] = 0.
du[dofs["bottom"].all()] = 0.
du[dofs["back"].all()] = 0.
du[dofs["front"].all()] = stretch_

I = np.hstack((
    basis["u"].complement_dofs(dofs),
    basis["u"].N + np.arange(basis["p"].N)
))


@LinearForm(nthreads=8)
def a1(v, w):
    return np.einsum("ij...,ij...", F1(w), grad(v))


@LinearForm(nthreads=8)
def a2(v, w):
    return F2(w) * v


@BilinearForm(nthreads=8)
def b11(u, v, w):
    return np.einsum("ijkl...,ij...,kl...", A11(w), grad(u), grad(v))


@BilinearForm(nthreads=8)
def b12(u, v, w):
    return np.einsum("ij...,ij...", A12(w), grad(v)) * u


@BilinearForm(nthreads=8)
def b22(u, v, w):
    return A22(w) * u * v

from time import time
t0 = time()
for itr in range(12):
    uv = basis["u"].interpolate(du)
    pv = basis["p"].interpolate(dp) 

    K11 = asm(b11, basis["u"], basis["u"], disp=uv, press=pv)
    K12 = asm(b12, basis["p"], basis["u"], disp=uv, press=pv)
    K22 = asm(b22, basis["p"], basis["p"], disp=uv, press=pv)
    f = np.concatenate((
        asm(a1, basis["u"], disp=uv, press=pv),
        asm(a2, basis["p"], disp=uv, press=pv)
    ))
    K = bmat(
        [[K11, K12],
         [K12.T, K22]], "csr"
    )
    t1 = time()
    uvp = solve(*condense(K, -f, I=I), solver="pardiso")
    print(f"Solve time: {time() - t1}")
    delu, delp = np.split(uvp, [du.shape[0]])
    du += delu
    dp += delp
    normu = np.linalg.norm(delu)
    normp = np.linalg.norm(delp)
    print(f"{itr+1}, norm_du: {normu}, norm_dp: {normp}")
    if normu < 1.e-8 and normp < 1.e-8:
        break

print(f"Total time: {time() - t0}")

if __name__ == "__main__":
    mesh.save(
        "example36_results.xdmf",
        {"u": du[basis["u"].nodal_dofs].T, "p": dp[basis["p"].nodal_dofs[0]]},
    )

@bhaveshshrimali
Copy link
Contributor Author

With ex27, indeed SuperLU is faster.

scipy

image

pardiso

image

I guess what I said might not be true in general then. It highly depends on the specific characteristics of the problem (associated jacobian and residual). Just out of curiosity, have you tried ex27 with larger meshes?

@bhaveshshrimali
Copy link
Contributor Author

I have tried now with self.mesh.refined(4) and the results are still in favor of SuperLU.
image

vs

image

Just to be definite, this is the code that I am running

from skfem import *
from skfem.helpers import grad, dot
from skfem.models.poisson import vector_laplace, laplace
from skfem.models.general import divergence, rot
from skfem.utils import LinearSolver
import skfem.io.json
from functools import partial
 
from pathlib import Path
from typing import Tuple, Iterable
from matplotlib.pyplot import subplots
from matplotlib.tri import Triangulation
import numpy as np
from scipy.sparse import bmat, block_diag, csr_matrix

from pacopy import natural
from pypardiso import spsolve


def solver_direct_pypardiso() -> LinearSolver:
    """Linear solver provided by Pardiso."""

    return spsolve


# solve = partial(solve, solver=solver_direct_pypardiso())


@LinearForm(nthreads=8)
def acceleration(v, w):
    """Compute the vector (v, u . grad u) for given velocity u
    passed in via `wind` after having been interpolated onto its
    quadrature points.
    In Cartesian tensorial indicial notation, the integrand is
    .. math::
        u_j u_{i,j} v_i.
    """
    return np.einsum('j...,ij...,i...', w['wind'], grad(w['wind']), v)

@BilinearForm(nthreads=8)
def acceleration_jacobian(u, v, w):
    """Compute (v, w . grad u + u . grad w) for given velocity w
    passed in via w after having been interpolated onto its quadrature
    points.
    In Cartesian tensorial indicial notation, the integrand is
    .. math::
       (w_j du_{i,j} + u_j dw_{i,j}) v_i
    """
    return dot(np.einsum('j...,ij...->i...', w['wind'], grad(u))
               + np.einsum('j...,ij...->i...', u, grad(w['wind'])), v)

class BackwardFacingStep:
    element = {'u': ElementVectorH1(ElementTriP2()),
               'p': ElementTriP1()}
    def __init__(self,
                 length: float = 35.):
        self.mesh = skfem.io.json.from_file(Path(__file__).parent / 'meshes' / 'backward-facing_step.json')
        self.mesh.refined(6)
        self.basis = {variable: Basis(self.mesh, e, intorder=3)
                      for variable, e in self.element.items()}
        self.basis['inlet'] = FacetBasis(self.mesh, self.element['u'],
                                         facets=self.mesh.boundaries['inlet'])
        self.basis['psi'] = self.basis['u'].with_element(ElementTriP2())
        self.D = np.concatenate([b.all() for b in self.basis['u'].find_dofs().values()])
        A = asm(vector_laplace, self.basis['u'])
        B = asm(divergence, self.basis['u'], self.basis['p'])
        self.S = bmat([[A, -B.T],
                       [-B, None]], 'csr')
        self.I = np.setdiff1d(np.arange(self.S.shape[0]), self.D)
    def inlet_dofs(self):
        return self.basis['inlet'].find_dofs()['inlet'].all()

    @staticmethod
    def parabolic(x):
        """return the plane Poiseuille parabolic inlet profile"""
        return np.stack([4 * x[1] * (1. - x[1]), np.zeros_like(x[0])])
    def make_vector(self):
        """prepopulate solution vector with Dirichlet conditions"""
        uvp = np.zeros(self.basis['u'].N + self.basis['p'].N)
        I = self.inlet_dofs()
        uvp[I] = projection(self.parabolic, self.basis['inlet'], I=I)
        return uvp
    def split(self, solution: np.ndarray) -> Tuple[np.ndarray,
                                                   np.ndarray]:
        """return velocity and pressure separately"""
        return np.split(solution, [self.basis['u'].N])

    def streamfunction(self, velocity: np.ndarray) -> np.ndarray:
        A = asm(laplace, self.basis['psi'])
        psi = self.basis['psi'].zeros()
        vorticity = asm(rot, self.basis['psi'],
                        w=self.basis['u'].interpolate(velocity))
        psi = solve(*condense(A, vorticity, D=self.basis['psi'].find_dofs()['floor'].all()))
        return psi

    def mesh_plot(self):
        """return Axes showing boundary of mesh"""
        termini = self.mesh.facets[:, np.concatenate(list(
            self.mesh.boundaries.values()))]
        _, ax = subplots()
        ax.plot(*self.mesh.p[:, termini], color='k')
        return ax

    def triangulation(self):
        return Triangulation(*self.mesh.p, self.mesh.t.T)

    def streamlines(self, psi: np.ndarray, n: int = 11, ax=None):
        if ax is None:
            ax = self.mesh_plot()
        n_streamlines = n
        plot = partial(ax.tricontour,
                       self.triangulation(),
                       psi[self.basis['psi'].nodal_dofs.flatten()],
                       linewidths=.5)
        for levels, color, style in [
            (np.linspace(0, 2/3, n_streamlines),
             'k',
             ['dashed'] + ['solid']*(n_streamlines - 2) + ['dashed']),
            (np.linspace(2/3, max(psi), n_streamlines)[0:],
             'r', 'solid'),
            (np.linspace(min(psi), 0, n_streamlines)[:-1],
             'g', 'solid')]:
            plot(levels=levels, colors=color, linestyles=style)
        ax.set_aspect(1.)
        ax.axis('off')
        return ax

    def inner(self, u: np.ndarray,  v: np.ndarray) -> float:
        """return the inner product of two solutions
        using just the velocity, ignoring the pressure
        """
        return self.split(u)[0] @ self.split(v)[0]

    def norm2_r(self, u: np.ndarray) -> float:
        return self.inner(u, u)

    def N(self, uvp: np.ndarray) -> np.ndarray:
        u = self.basis['u'].interpolate(self.split(uvp)[0])
        return np.hstack([asm(acceleration, self.basis['u'], wind=u),
                          self.basis['p'].zeros()])

    def f(self, uvp: np.ndarray, reynolds: float) -> np.ndarray:
        """Return the residual of a given velocity-pressure solution"""
        out = self.S @ uvp + reynolds * self.N(uvp)
        out[self.D] = uvp[self.D] - self.make_vector()[self.D]
        return out

    def df_dlmbda(self, uvp: np.ndarray, reynolds: float) -> np.ndarray:
        out = self.N(uvp)
        out[self.D] = 0.
        return out

    def jacobian_solver(self,
                        uvp: np.ndarray,
                        reynolds: float,
                        rhs: np.ndarray) -> np.ndarray:
        u = self.basis['u'].interpolate(self.split(uvp)[0])
        duvp = solve(*condense(
            self.S +
            reynolds
            * block_diag([asm(acceleration_jacobian, self.basis['u'], wind=u),
                          csr_matrix((self.basis['p'].N,)*2)]),
            rhs, self.make_vector() - uvp, I=self.I))
        return duvp

bfs = BackwardFacingStep()
psi = {}

def callback(_: int,
             reynolds: float,
             uvp: np.ndarray,
             milestones=Iterable[float]):
    """Echo the Reynolds number and plot streamlines at milestones"""
    print(f'Re = {reynolds}')
    if reynolds in milestones:
        psi[reynolds] = bfs.streamfunction(bfs.split(uvp)[0])
        if __name__ == '__main__':
            ax = bfs.streamlines(psi[reynolds])
            ax.set_title(f'Re = {reynolds}')
            ax.get_figure().savefig(Path(__file__).with_name(
                f'{Path(__file__).stem}-{reynolds}-psi.png'),
                                    bbox_inches="tight", pad_inches=0)

if __name__ == '__main__':
    milestones = [50., 150., 450., 750.]
else:
    milestones = [50.]
natural(bfs, bfs.make_vector(), 0.,
        partial(callback,
                milestones=milestones),
        lambda_stepsize0=50.,
        milestones=milestones)

@bhaveshshrimali
Copy link
Contributor Author

bhaveshshrimali commented Aug 18, 2021

A weird thing that I notice is that when I try self.mesh.refined(6) instead of self.mesh.refined(4) here

self.mesh = skfem.io.json.from_file(Path(__file__).parent / 'meshes' / 'backward-facing_step.json')

I get a smaller time for the solve in SuperLU whereas Pardiso is constant. Is this how it is supposed to be? Or am i doing something wrong here? (49.69 s for SuperLU vs 68.75 for Pardiso).

@gdmcbain
Copy link
Contributor

Ah, I have previously, but since pygmsh switched first its licence and then its API, the code referring to it was removed and replaced by a fixed mesh...

self.mesh = skfem.io.json.from_file(Path(__file__).parent / 'meshes' / 'backward-facing_step.json')

Unfortunately appending .refined() fails, raising

  File "/home/gdmcbain/src/scikit-fem/docs/examples/ex27.py", line 124, in __init__
    facets=self.mesh.boundaries['inlet'])
TypeError: 'NoneType' object is not subscriptable

i.e. boundaries are not preserved during refinement. Let me see if I can dig out the old mesh-generating function and pip install pygmsh\<7. …Nope, that doesn't work any more, too many upstream changes in pygmsh and meshio.

@gdmcbain
Copy link
Contributor

A weird thing that I notice is that when I try self.mesh.refined(6) instead of self.mesh.refined(4) here

Oh, that worked for you?

@gdmcbain
Copy link
Contributor

Ah, hang on: what is your changed line? Note that the .refined method is functional, it returns the refined mesh rather than mutating self. There used to be a .refine method that mutated self and returned None. That is, if you merely called

    self.mesh.refined(n)

it would not have done anything; that would be consistent with

whereas Pardiso is constant

@gdmcbain
Copy link
Contributor

See #536 for discussion of the change from .refine to .refined.

@bhaveshshrimali
Copy link
Contributor Author

You're right, I seem to have confounded the two. And there is the error you posted.

@bhaveshshrimali
Copy link
Contributor Author

But then, if I naively do

In [1]: from scipy.sparse import random as rand

In [2]: import numpy as np

In [3]: import pypardiso

In [4]: from scipy.sparse.linalg import spsolve as spl

In [5]: A = rand(10000, 10000, density=0.01, format="csr", dtype= np.float64)

In [6]: A += identity(A.shape[0])

In [7]: b = np.random.randn(A.shape[0])

In [8]: %timeit pypardiso.spsolve(A,b)
45.6 ms ± 8.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [9]: %timeit spl(A, b)

scipy.sparse.linalg.spsolve has been running for quite sometime now. I tested on a smaller matrix and found

In [12]: A = rand(1000, 1000, density=0.01, format="csr", dtype= np.float64) + identity(1000)

In [13]: %timeit spl(A, b)
286 ms ± 3.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [14]: %timeit pypardiso.spsolve(A,b)
330 µs ± 2.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Any thoughts @gdmcbain ?

@gdmcbain
Copy link
Contributor

Well, I'm not an expert but my intuition is that the sparse matrices assembled from finite element problems do have some kind of structure compared to rand and maybe the SuperLU is designed to do better on that kind of thing? I have no actual basis for believing that though.

How do we go with simple Laplace + Dirichlet, as in ex01?

@kinnala
Copy link
Owner

kinnala commented Aug 18, 2021

You could try reordering the rows and the columns, e.g., using skfem.utils.rcm. This kind of reordering is done to reduce the NNZ entries in the LU/Cholesky decomposition. I guess some linear algebra packages may do that automatically.

@gdmcbain
Copy link
Contributor

Ah, yes, reordering #168.

@kinnala
Copy link
Owner

kinnala commented Aug 18, 2021

I have been using PyPI for the most part because of netgen, but will try out conda in the future.

I don't use conda too much, but I think you can install pip inside conda and use both tools to install packages.

@bhaveshshrimali
Copy link
Contributor Author

bhaveshshrimali commented Aug 18, 2021

How do we go with simple Laplace + Dirichlet, as in ex01?

0.41s (pardiso) vs 1.495s (SuperLU) on my laptop. Will check skfem.utils.rcm

from skfem import *
from skfem.helpers import dot, grad

# create the mesh
m = MeshTri().refined(8)
# or, with your own points and cells:
# m = MeshTri(points, cells)

e = ElementTriP1()
basis = Basis(m, e)  # shorthand for CellBasis

# this method could also be imported from skfem.models.laplace
@BilinearForm
def laplace(u, v, _):
    return dot(grad(u), grad(v))


# this method could also be imported from skfem.models.unit_load
@LinearForm
def rhs(v, _):
    return 1.0 * v

A = asm(laplace, basis)
b = asm(rhs, basis)
# or:
# A = laplace.assemble(basis)
# b = rhs.assemble(basis)

# enforce Dirichlet boundary conditions
A, b = enforce(A, b, D=m.boundary_nodes())

# solve -- can be anything that takes a sparse matrix and a right-hand side
x = solve(A, b) # solve(A, b, solver="pardiso")

# plot the solution
from skfem.visuals.matplotlib import plot, savefig
plot(m, x, shading='gouraud', colorbar=True)
savefig('solution.png')

@bhaveshshrimali
Copy link
Contributor Author

I don't use conda too much, but I think you can install pip inside conda and use both tools to install packages.

Thanks! I have been meaning to do this for sometime. Back then I did not have a good understanding of package management and virtual environments.

@gdmcbain
Copy link
Contributor

I don't use conda too much, but I think you can install pip inside conda and use both tools to install packages.

This is correct. These days a conda env includes its own pip command that just pip-installs inside that conda env.

@gdmcbain
Copy link
Contributor

O. K., with the proposed fix #693 to #691, I can now read MSH 4.1 (or rather VTU recoded & saved from MSH 4.1) in ex27 and so can generate meshes of different fineness to see how pardiso scales compared to the default solver.

Halving the mesh-size with (see #693 for original generation instructions at -clscale 0.2)

gmsh -2 -clscale 0.1 backward-facing_step.geo

and recoding & saving with Mesh.load(...).save gives 8685 vertices and pardiso does now do slightly better than the default: 145 v. 168 s.

ex27-pardiso-snakeviz
ex27-scipy-snakeviz

@gdmcbain
Copy link
Contributor

Halving again (-clscale 0.05, 33751 vertices), the default solver used 1.96×10³ s but after doing fine at first (up to Reynolds number about 545), pardiso stalled in the natural-parameter continuation (observe that the continuation step in Reynolds number has dropped to 0.1189) and then failed

Step 20: lambda  5.742e+02 + 1.189e-01  ->  5.743e+02
||F(u)|| = 1.933075e-01
||F(u)|| = 3.569919e-03
Traceback (most recent call last):
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/cProfile.py", line 185, in <module>
    main()
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/cProfile.py", line 178, in main
    runctx(code, globs, None, options.outfile, options.sort)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/cProfile.py", line 19, in runctx
    return _pyprofile._Utils(Profile).runctx(statement, globals, locals,
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/profile.py", line 62, in runctx
    prof.runctx(statement, globals, locals)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/cProfile.py", line 100, in runctx
    exec(cmd, globals, locals)
  File "docs/examples/ex27.py", line 266, in <module>
    natural(bfs, bfs.make_vector(), 0.,
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pacopy/natural.py", line 93, in natural
    u, newton_steps = newton(
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pacopy/newton.py", line 20, in newton
    du = jacobian_solver(u, -fu)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pacopy/natural.py", line 95, in <lambda>
    lambda u, rhs: problem.jacobian_solver(u, lmbda, rhs),
  File "docs/examples/ex27.py", line 229, in jacobian_solver
    duvp = solve(*condense(
  File "/home/gdmcbain/src/scikit-fem/skfem/utils.py", line 226, in solve
    return solve_linear(A, b, x, I, solver, **kwargs)  # type: ignore
  File "/home/gdmcbain/src/scikit-fem/skfem/utils.py", line 193, in solve_linear
    y[I] = solver(A, b, **kwargs)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pypardiso/scipy_aliases.py", line 48, in spsolve
    x = solver.solve(A, b)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pypardiso/pardiso_wrapper.py", line 155, in solve
    x = self._call_pardiso(A, b)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pypardiso/pardiso_wrapper.py", line 262, in _call_pardiso
    raise PyPardisoError(pardiso_error.value)
pypardiso.pardiso_wrapper.PyPardisoError: The Pardiso solver failed with error code -4. See Pardiso documentation for details.

Hitherto I had not noticed any difference in the solutions obtained by pardiso & scipy, but I think this says that there must be. It's true that this is a fairly challenging nonlinear problem.

@gdmcbain
Copy link
Contributor

Hmm, the failure isn't deterministic either; here it was rerun (without profiling); same PyPardisoError but at a slightly different Reynolds number:

No convergence for lambda=572.6578742265701.
Step 13: lambda  5.584e+02 + 7.127e+00  ->  5.655e+02
||F(u)|| = 1.812719e-03
||F(u)|| = 3.381340e-06
||F(u)|| = 5.901106e-08
||F(u)|| = 2.683568e-11
||F(u)|| = 1.979994e-13
Re = 565.5306950211525
Step 14: lambda  5.655e+02 + 8.018e+00  ->  5.735e+02
||F(u)|| = 2.343903e-03
||F(u)|| = 2.885877e-06
||F(u)|| = 1.470555e-09
||F(u)|| = 2.115316e-09
Traceback (most recent call last):
  File "/home/gdmcbain/src/scikit-fem/docs/examples/ex27.py", line 266, in <module>
    natural(bfs, bfs.make_vector(), 0.,
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pacopy/natural.py", line 93, in natural
    u, newton_steps = newton(
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pacopy/newton.py", line 20, in newton
    du = jacobian_solver(u, -fu)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pacopy/natural.py", line 95, in <lambda>
    lambda u, rhs: problem.jacobian_solver(u, lmbda, rhs),
  File "/home/gdmcbain/src/scikit-fem/docs/examples/ex27.py", line 229, in jacobian_solver
    duvp = solve(*condense(
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/skfem/utils.py", line 226, in solve
    return solve_linear(A, b, x, I, solver, **kwargs)  # type: ignore
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/skfem/utils.py", line 193, in solve_linear
    y[I] = solver(A, b, **kwargs)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pypardiso/scipy_aliases.py", line 48, in spsolve
    x = solver.solve(A, b)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pypardiso/pardiso_wrapper.py", line 155, in solve
    x = self._call_pardiso(A, b)
  File "/home/gdmcbain/miniconda3/envs/skfem/lib/python3.9/site-packages/pypardiso/pardiso_wrapper.py", line 262, in _call_pardiso
    raise PyPardisoError(pardiso_error.value)
pypardiso.pardiso_wrapper.PyPardisoError: The Pardiso solver failed with error code -4. See Pardiso documentation for details.

@gdmcbain
Copy link
Contributor

Another rerun kept going, but the continuation really wasn't getting anywhere:

No convergence for lambda=575.1003060782954.
Step 57: lambda 5.751e+02 + 1.467e-04 -> 5.751e+02

@bhaveshshrimali
Copy link
Contributor Author

Thanks for the detailed analysis @gdmcbain. Indeed its a bit htereogeneous as to when Pardiso outperforms SuperLU and vice-versa. I will check things on my laptop too this week. The significant speed up in incompressible hyperelsaticity led to me believe that it should carry over to other nonlinear problems as well, but not that simple I guess.

@gdmcbain
Copy link
Contributor

Yes, so far Pardiso looks better on everything except ex27 (lucky me), so I do think it's worth persisting with, particularly as it's so easy to use in scikit-fem via pypardiso. I'd say it was well worth including in the suite of integrations #474. I don't know why ex27 was problematic or why Pardiso gave different results to SciPy and different results from run to run, but would be interesting in learning more if this is expected or other evidence is found.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants