Make ishermitian and issym test for approx. symmetry for floats; fixes #10298 (#10369)
Conversation
@dhoegh, In particular, consider … (In general, I think having a default …
Force-pushed from 19476f0 to 5750d66
Ok, kind of already knew that the …
Thanks for working on this. As @stevengj points out, the metric for this should take the whole matrix into account instead of comparing element by element. I think something along the lines of what @stevengj proposed, e.g. …
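A minimal sketch of the kind of whole-matrix check being discussed (illustrative only, in current Julia syntax; the function name and the exact tolerance are assumptions, not the PR's final code, and `opnorm(A, 1)` is today's spelling of the induced 1-norm that the PR's `norm(A, 1)` referred to):

```julia
using LinearAlgebra

# Accept A as "Hermitian" when its deviation from A' is small relative to A in
# the induced 1-norm, instead of requiring every entry to match exactly.
function approx_ishermitian(A::AbstractMatrix; tol = eps(real(eltype(A))) * opnorm(A, 1))
    size(A, 1) == size(A, 2) || return false
    return opnorm(A - A', 1) <= tol
end

A = [1.0 2.0; 2.0 + 5e-16 3.0]
A == A'                # false
approx_ishermitian(A)  # true under the norm-based tolerance
```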
Ok, I will work towards doing the calculation without allocation. One issue though: I can't seem to replicate …
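For reference, one way the deviation could be measured without forming `A - A'` (again only a sketch under the same assumptions, not the code that ended up in the PR):

```julia
# Induced 1-norm of A - A', accumulated column by column so that no temporary
# matrix is allocated.
function hermitian_deviation(A::AbstractMatrix)
    n = size(A, 1)
    dev = zero(real(eltype(A)))
    for j in 1:n
        colsum = zero(real(eltype(A)))
        for i in 1:n
            colsum += abs(A[i, j] - conj(A[j, i]))
        end
        dev = max(dev, colsum)
    end
    return dev
end
```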
No. It's not true. I don't know why I got that wrong.
Force-pushed from c612fff to 9739ab6
I have updated the pull where … The last thing missing in this pull is probably better test cases; does anyone have a good idea for a test case?
This still seems like a can of worms to determine what …
The failures are due to the tol for … If we abandon the test of … (line 28 in 5e2a617), isposdef will just return false, because only very few arrays containing floats can pass the ishermitian check before LAPACK.potrf! is called.
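For what it's worth, the failure mode being described can be reproduced along these lines (a hypothetical illustration, not code from the PR or the issue):

```julia
using LinearAlgebra

# B is symmetric positive definite in exact arithmetic, but the floating-point
# product is usually not exactly symmetric, so the strict ishermitian test fails
# and isposdef gives up before ever calling the Cholesky factorization.
X = randn(50, 50)
B = X * Diagonal(rand(50) .+ 1) * X'

ishermitian(B)          # typically false: entries differ in the last bits
isposdef(B)             # therefore false as well
isposdef(Symmetric(B))  # true: factor an explicitly symmetrized view instead
```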
@stevengj Isn't this similar to having tolerances in …
Bump. What is the way forward? Should I take the implementation of … From my point of view it seems misleading to have implemented an …
Btw, Matlab does not supply an …
No. Let's continue along this way. I agree with you, and I think it is the correct thing to do. The old behavior will be a special case of the new one with … The problem is to find the right default value for the tolerance, and here it would be great to analyse the problem a bit. Some people reading this might find such analyses very easy, so it would be great to have a reference, some plots, and/or some mathematical arguments for the choice of default tolerance.
I'm used to the point of view that …
It's a matter of culture. In my first-year math exam I'd get an error if I concluded that a non-symmetric matrix was positive definite. So far we have followed that tradition here. I guess there is a computational argument: you can check for my kind of definiteness with the Cholesky factorization, whereas you'd have to calculate the eigendecomposition to check for your kind of definiteness, right? Even if we decide to use the other definition of definiteness, I think it is too restrictive to require an exact match in …
For my kind of definiteness, you have to symmetrize the matrix before calling Cholesky, which is an additional step, of course. I still don't like that you could be misled into believing that your matrix was not positive definite because it was not Hermitian enough, even if the Hermitian part was positive definite.
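In code, the extra symmetrization step mentioned above might look like this (illustrative values, current Julia syntax, not a prescription from the thread):

```julia
using LinearAlgebra

A = [4.0 1.0; -1.0 3.0]        # not symmetric, but x'Ax > 0 for all x ≠ 0
H = Symmetric((A + A') / 2)    # the symmetric part is what determines x'Ax

isposdef(A)   # false: the strict definition used in Base rejects non-Hermitian A
isposdef(H)   # true: the Cholesky factorization of the symmetric part succeeds
```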
Force-pushed from 58f16ae to 753f81f
According to my favorite matrix analysis book, Hermitianity is actually a consequence of …
@toivoh, if … But I think that most people would currently think of it as a generalization of the positive-definite concept, not positive-definiteness per se. If anyone wants to check for this they can always call …
Yes, that's true.
So my point was really that it seems it could be quite confusing to people if …
It wouldn't be crazy for …
Consider the matrix …
I'm fine with either of the two solutions. When we have …
@stevengj, I wrote: "According to my favorite matrix analysis book, Hermitianity is actually a consequence of x'Ax > 0 for all x. There is more room in the real case, where x'Ax > 0 only restricts the symmetric part." So you are not providing a counterexample, because you have restricted …
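A quick numerical illustration of this point (a hypothetical example matrix, not one from the thread): for real x the skew part of A drops out of the quadratic form, while over the complex numbers a non-Hermitian A generally gives a non-real value of x'Ax.

```julia
using LinearAlgebra

A = [1.0 1.0; -1.0 1.0]          # real, positive quadratic form, not symmetric

x = randn(2)
x' * A * x ≈ x' * ((A + A') / 2) * x   # true: only the symmetric part contributes

z = randn(ComplexF64, 2)
isreal(z' * A * z)                      # generally false, so requiring x'Ax to be
                                        # real and positive for all complex x
                                        # forces A to be Hermitian
```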
This came up again in #13753, so I think we should finalize this PR. It would be good if we could find some theory on this, but it seems like … cc: @bnels
With respect to the discussion on the definition of "Hermitian": if we're talking about matrices and not (possibly infinite-dimensional) linear operators, I think that the important property is that there exists a unitary spectral decomposition with real eigenvalues and, of course, equivalently, that A is equal to its adjoint. But I think that the discussion about …
Force-pushed from 753f81f to b45d5f4
Force-pushed from b45d5f4 to 80b7c20
I have rebased the PR, added …
I guess we need to add the same functionality on the sparse side?
```diff
@@ -7211,9 +7211,9 @@ Given `@osx? a : b`, do `a` on OS X and `b` elsewhere. See documentation for Han
 :@osx

 doc"""
-ishermitian(A) -> Bool
+ishermitian(A; tol=eps(eltype(real(A)))*norm(A,1)) -> Bool
```
I think you mean `real(eltype(A))`. Doing `real(A)` makes a real copy of the matrix if it is complex.
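To spell out the distinction (a small illustrative snippet):

```julia
A = rand(ComplexF64, 3, 3)

real(eltype(A))   # Float64: computed from the element type alone, no copy
eltype(real(A))   # also Float64, but real(A) first allocates a real copy of A
```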
The default … Let's consider the application of this to … The other main use of checking whether a matrix is nearly Hermitian is to decide whether to use the Cholesky factorization for solving … So, it seems like answering the question of whether a matrix is "close enough" to Hermitian may depend on the application.
@stevengj Hermitian eigenvalue solvers typically only provide high absolute accuracy, not high relative accuracy, so I think that this is asking too much. The backwards error of a Hermitian eigensolver is typically on the order of epsilon times the two-norm (see, for example, http://www.netlib.org/lapack/lawnspdf/lawn84.pdf). I therefore think that a deviation from Hermitian of order epsilon times the two-norm is acceptable for the projection. Similar approximations are made to deflate the QR iteration for upper Hessenberg matrices. With that said, it is sometimes possible to achieve high relative accuracy, so perhaps there should be a flag in the eigensolver to make such an attempt, in which case such approximations could be avoided (but, with the exception of the bidiagonal SVD and the symmetric tridiagonal EVP, I think custom software would need to be written). EDIT: Sorry for misreading your statement; I didn't realize you were using lambda as the largest eigenvalue.
@poulson, saying that |δλ| = O(ε)*norm(A,2) is equivalent to saying that the relative error δλ/|λₘ| = O(ε), where |λₘ| is the maximum eigenvalue magnitude, which equals norm(A,2) for Hermitian matrices. So, as I said, if you don't want to lose this property then you need norm(δB) = O(ε) norm(A) in the induced 2-norm, where δB is the anti-symmetric part of A. You can convert this into a statement about the 1-norm with appropriate constant factors, but I'm not sure whether the current test … And, as I said, the application to … I'd like to see (a) some more careful analysis and (b) some numerical experiments on the accuracy impact of this PR. Or at least (b).
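As a small experiment in the direction of (b) (a sketch under my own assumptions, not results from the PR): perturb a symmetric matrix by a skew-symmetric δB with ‖δB‖₂ ≈ ε‖A‖₂ and look at how far the eigenvalues move relative to the largest eigenvalue magnitude.

```julia
using LinearAlgebra

n = 200
B = randn(n, n)
A = (B + B') / 2                              # symmetric test matrix
S = randn(n, n); S = (S - S') / 2             # skew-symmetric direction
δB = (eps() * opnorm(A) / opnorm(S)) * S      # ‖δB‖₂ = ε‖A‖₂

λ  = eigvals(Symmetric(A))                    # sorted ascending
λp = sort(real(eigvals(A + δB)))              # eigenvalues of the perturbed matrix

maximum(abs.(λp - λ)) / maximum(abs.(λ))      # on the order of machine epsilon
```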
@stevengj I completely agree that the tolerance should be different for solving a linear system; my comment was only meant to relate to eigenvalue problems. And I agree that we should not use a constant of one; I would lean towards using tighter tolerances for linear systems, since the relative cost of solving a symmetric versus a non-symmetric linear system is small compared with the cost of a Hermitian eigenvalue problem relative to a Schur decomposition. EDIT: The font works in another browser.
Probably your font is missing some glyphs for Unicode codepoints. (I'm switching back and forth between English and Greek keyboard inputs, so the codepoints might be unusual.)
In summary, I would recommend that we try to preserve high absolute accuracy in projections to more structured forms for eigenvalue problems (until Julia supports algorithms which are likely to yield high relative accuracy), and that projections preserve high relative accuracy when solving linear systems. As @stevengj said, the latter requires that the perturbations be on the order of epsilon times the smallest singular value rather than epsilon times the largest (consider a right-hand side living in the direction of the largest right singular vector and a perturbation of the matrix affecting its column space in the direction of the smallest right singular vector). Since knowing the smallest singular value is too expensive, we would need to decide on a condition estimator, but those typically require an initial (in general, LU) factorization...
On the other hand, we could always be aggressive and attempt to solve the projected linear system using a heuristic projection tolerance, check the residual after two steps of iterative refinement (with the original matrix), and, if it is too large, fall back to the more expensive factorization of the original matrix. The caveat to this approach is the extra memory requirement, though it is no more than is typically required by iterative refinement, plus potentially performing an extra structured/symmetric factorization when an unstructured/unsymmetric factorization was required.
@poulson, what you're calling "absolute" error in the eigenvalue is still essentially a relative error δλ / (max |λ|) = O(ε). (A "small absolute error" condition would be something like δλ ≤ 1e-8, independent of A, which is impossible to enforce. Absolute errors are dimensionful.) Also, as I said, requiring small relative error |δx|/|x| = O(ε) in the solution … So |δB| = |A| O(ε) seems right. The only question in my mind is what coefficient to use in the O(ε) for a chosen norm.
In particular, the "constant" coefficient may depend on the matrix size, because the relationships between the norms depend on the matrix size. The error analysis (particularly for the eigenvalues) is most natural in the induced 2-norm. Suppose we want …
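(For reference, for an n×n matrix the standard equivalences are ‖B‖₂ ≤ √n·‖B‖₁ and ‖B‖₁ ≤ √n·‖B‖₂, so a test of the form norm(δB,1) ≤ ε·norm(A,1) only guarantees norm(δB,2) ≤ n·ε·norm(A,2) in the worst case; that is where a size-dependent constant would come from.)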
I agree. So we want the perturbation to satisfy … At this point, what I am most worried about is that I might have pushed us towards a confrontational tone. That was unintentional and perhaps the result of me firing off these responses while exhausted. With that said, I completely agree that, by definition, the worst-case relative error of the solution from a stable linear solver is O(eps*cond(A)), but I think we can avoid always forcing ourselves into the worst-case regime by recognizing that, for example, it can be better to look at the condition number after a diagonal scaling (e.g., http://www.netlib.org/lapack/lawnspdf/lawn14.pdf). One example showing a large difference in the relative residual after such a perturbation (which is recovered via iterative refinement with the unperturbed system) is … which outputs (in my instance) …

Another example, which doesn't use singular vectors but demonstrates that the relative error in the solution of an HPD system follows the condition number of the equilibrated matrix more closely than the condition number of the original matrix, is the following, which would actually be a typical use case. In particular, I construct a random matrix which is ill-conditioned (due to diagonal scaling) but close to HPD in the sense that its skew-Hermitian component has a two-norm on the order of epsilon times the two-norm of … which has the output … which shows that iterative refinement using the original system recovers the same accuracy after three steps (using higher precision for the IR would be better). So perhaps we need to consider diagonal equilibrations as well (or at least iteratively refine using the unprojected original system). I apologize for all of the edits, as I had to teach two classes this morning.
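Since the code and output above did not survive in this transcript, here is a sketch of the second experiment as I read it (my own construction and parameter choices, not the original snippet):

```julia
using LinearAlgebra

n = 200
B = randn(n, n)
H = B * B' + n * I                            # well-conditioned SPD core
D = Diagonal(exp10.(range(-3, 3, length = n)))
A = Matrix(Hermitian(D * H * D))              # SPD, ill-conditioned only via scaling
S = randn(n, n); S = (S - S') / 2
A += (eps() * opnorm(A) / opnorm(S)) * S      # skew part of size ~ eps * ||A||_2

xtrue = randn(n); b = A * xtrue

F = cholesky(Symmetric((A + A') / 2))         # factor the Hermitian projection
x = F \ b
@show norm(x - xtrue) / norm(xtrue)           # error of the projected solve

r = b - A * x                                 # refine against the *original*,
x += F \ r                                    # unprojected system
@show norm(x - xtrue) / norm(xtrue)           # error after one refinement step
```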
@poulson, I think the norm equivalence works both ways. And you have to apply the norm bounds to both the left- and right-hand sides of the inequality (in opposite directions), since we're using the 1-norm for both A and δB, hence the factor of … (For the eigenvalues, the Hermitian perturbation theory I used is derived via the usual Euclidean inner product, which is why you are forced to use the 2-norm. For …) However, there is also an …
@poulson, …
You're right that my example was rushed; I've modified it to show the phenomenon I was referring to. As can be seen, the relative error in the unperturbed solution is correct to fifteen digits, but the relative error in the perturbed solution is only correct to three.
I must admit this discussion surpasses my linear algebra understanding, and I cannot follow through with this PR. Please feel free to use the commit from here if it is useful. The tests fail because there is no check for …
@dhoegh I would argue that the conclusion (after my flurry of edits) is that a tolerance of …
@poulson, I think your conclusion is partially incorrect, because: …
Good point on …
@poulson, the exponential worst-case growth is for LU factorization on arbitrary matrices; it doesn't apply to Cholesky anyway.
No, but it does apply to symmetric-indefinite matrices: http://epubs.siam.org/doi/pdf/10.1137/100801548
This fixes JuliaLang/LinearAlgebra.jl#182. I do not know if the `atol`, `rtol` keywords should be used on `isapprox`, nor how they should be determined for the matrix in `issym` and `ishermitian`. @andreasnoack, would you give this a thorough review, since I am not familiar with this part of base?
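As a concrete version of that question (a hypothetical sketch in current Julia syntax, where `issym` has since been renamed `issymmetric`; the default `rtol` here is just `isapprox`'s usual array default, not a recommendation):

```julia
using LinearAlgebra

# issym written in terms of isapprox; the open question is what atol/rtol
# should default to for a given matrix.
approx_issym(A::AbstractMatrix; rtol = sqrt(eps(real(eltype(A)))), atol = 0) =
    isapprox(A, transpose(A); rtol = rtol, atol = atol)

A = [1.0 2.0; 2.0 + 1e-9 3.0]
issymmetric(A)    # false: the exact check currently in LinearAlgebra
approx_issym(A)   # true with this illustrative tolerance
```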