Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqrtm is not type-stable #21

Closed
johnmyleswhite opened this issue Aug 10, 2013 · 14 comments
Closed

sqrtm is not type-stable #21

johnmyleswhite opened this issue Aug 10, 2013 · 14 comments

Comments

@johnmyleswhite
Copy link
Member

Maybe we should change sqrtm so that it always returns a complex array?

julia> X = sqrtm([1.0 -0.9; -0.9 1.0])
2x2 Float64 Array:
  0.847316  -0.531089
 -0.531089   0.847316

julia> X * X
2x2 Float64 Array:
  1.0  -0.9
 -0.9   1.0

julia> X = sqrtm([1.0 -0.9; 0.9 1.0])
2x2 Complex{Float64} Array:
   1.0829+0.0im  -0.415549+1.11022e-16im
 0.415549+0.0im     1.0829+1.38778e-16im

julia> X * X
2x2 Complex{Float64} Array:
 1.0+4.61352e-17im  -0.9+1.82784e-16im
  0.9+5.7669e-17im   1.0+3.46701e-16im
@ViralBShah
Copy link
Member

This is also the case with sqrt. This is what dynamic languages are meant for. The types also look fine. We can keep it the way it is IMO.

@johnmyleswhite
Copy link
Member Author

That's not how sqrt behaves:

julia> sqrt(1.0)
1.0

julia> sqrt(-1.0)
ERROR: DomainError()
 in sqrt at math.jl:118

julia> sqrt(1.0 + 0.0 * im)
1.0 + 0.0im

julia> sqrt(-1.0 + 0.0 * im)
0.0 + 1.0im

@JeffBezanson
Copy link
Member

eig behaves the same as sqrtm.

@ViralBShah
Copy link
Member

Oops, sqrt was a bad example. But yes, eig does this.

@johnmyleswhite
Copy link
Member Author

Ok. This behavior seems undesirable to me, but I suppose this is traditional enough that I should just learn to accept it. I would have thought we'd want behavior like fft has.

@StefanKarpinski
Copy link
Member

I agree that this does not seem great from the type-stability perspective.

@StefanKarpinski
Copy link
Member

One option would be to return a real matrix when the input is a matrix of Symmetric type and a complex matrix otherwise. That also has the advantage of avoiding the check when you've constructed the matrix to be symmetric. A big disadvantage of the current arrangement is that there isn't any way to call sqrtm type stably. We could do a similar scheme for eig too. The nice: the returned array is always complex and it's always correct but it might happen to be a purely real-valued complex array. If you know the matrix is symmetric and you want a real matrix back, you can wrap it in Symmetric first and get exactly what you want.

@jiahao
Copy link
Member

jiahao commented Jan 24, 2014

The underlying issue is that special symmetries of the matrix (Hermitian, real symmetric etc) allow for special-cased algorithms that are significantly faster than the generic algorithms. Functions like sqrtm can be computed more stably and quickly for complex Hermitian than for general unsymmetric complex, and so the current methods actually check for special symmetries and explicitly dispatch appropriate specialized methods. If we turn off these symmetry detection checks, we compromise performance. If we explicitly cast the return into a Complex matrix type, we also potentially compromise performance if we have to force memory copies to create the desired type.

If a user wants to explicitly cast the return into a Complex matrix type, they should feel free to do so. But there seems no way to force the issue without compromising performance.

@ivarne
Copy link
Member

ivarne commented Jan 25, 2014

It might be nice to mention the type instability issue in the documentation for sqrtm?

@jiahao
Copy link
Member

jiahao commented Jan 25, 2014

Yeah, we should document this. This behavior isn't specific to sqrtm however; rather it is also a feature of other functions, most notably linear solve and eig.

Reopening as documentation issue.

@johnmyleswhite
Copy link
Member Author

We should really be documenting the return type of every function.

@jiahao
Copy link
Member

jiahao commented Jan 26, 2014

My earlier analysis of the sqrtm polyalgorithm turns out to be not quite right. This is what actually happens when sqrtm(A) is called:

  1. check if A is Hermitian or symmetric.
  2. If A is neither Hermitian nor symmetric, triangularize A by computing its Schur factorization. Compute the matrix square root of the triangular factor, then rotate the new matrix back into the original basis.
  3. If A is either Hermitian or symmetric, diagonalize A by computing its eigendecomposition. Take the square roots of the eigenvalues, converting them as necessary to handle the possibility of complex values being produced, then rotate the new matrix back into the original basis.

A matrix that is neither Hermitian nor symmetric will in general have a square root with complex elements. A matrix that is Hermitian or symmetric will also have a square root with complex elements unless it is positive definite, in which case we can specialize to the case of square root with real elements. Since you can determine whether a Hermitian/symmetric matrix is positive definite essentially for free after computing its eigenvalues, and recomposing a real matrix as a product of real factors is cheaper than in the complex case, I still think that the current polyalgorithm is optimally performant.

The current implementation of the individual components, however, suggests some beneficial refactoring. For example:

  • The computation of the square root of a triangular matrix is buried in the general method and could be usefully exposed to compute square roots of Triangulars quickly.
  • The current check for positive definiteness converts all the eigenvalues to complex, then takes their square root, then checks if the result has zero imaginary component. This could be done more simply.

@GunnarFarneback
Copy link

A matrix that is neither Hermitian nor symmetric will in general have a square root with complex elements.

In general yes, but e.g. the special case of real matrices with no eigenvalues on the non-positive real axis (positive and conjugate pairs are fine) have real square roots. I wrote about some of the peculiarities of the matrix square root in a mailing list thread about sqrtm when it was implemented:

Like the scalar square root, the matrix square root often has multiple solutions, and even more so than in the scalar case. E.g. a positive definite nxn Hermitian matrix with distinct eigenvalues has 2^n different square roots, whereas repeated eigenvalues can allow infinitely many square roots. E.g. [-1 0;0 -1] has the obvious square roots [i 0;0 i], [i 0;0 -i], [-i 0;0 i], and [-i 0;0 i] but also non-diagonal solutions like [1 2;-1 -1], which has the property of being real. In fact real matrices often have real square roots, and it could be interesting to have a real variation of sqrtm. This is thoroughly investigated, at least in the non-singular case, in

N. J. Higham, Computing real square roots of a real matrix, Linear Algebra Appl. 88/89 (1987) 405–430.

As a special case, I believe that the implemented algorithm does compute a real solution, up to numerical noise, for real matrices with no eigenvalues on the non-positive real axis. This is easily checked in the Schur decomposition and the imaginary part could be dropped in that case. It would be good if someone could verify that this is correct.

The singular non-diagonalizable case can be quite tricky. E.g. [0 1;0 0] has no square root at all, neither real nor complex, but if we add another dimension we have [0 0 1;0 0 0;0 1 0]^2 = [0 1 0;0 0 0;0 0 0]. The implemented method not-a-numbers both of these.

I did write a Julia implementation of Higham's real square root method (starting from the real Schur decomposition instead of the complex Schur decomposition) but it's really not worth the code complexity.

@jiahao
Copy link
Member

jiahao commented Jan 27, 2014

@GunnarFarneback thanks for the back history. I'll have to think for a bit about how easy it would be to check for the special case you mentioned in the triangular sqrtm routine.

The mailing list discussion belies a much more complicated issue, namely that of defining a principal square root. It bears repeating that matrices don't have unique square roots. I think most people would expect matrix-valued functions to produce the "canonical" version that is implemented, which is to factorize, square root, and rotate back to the original basis. But perhaps this is worth mentioning in the docs.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants