Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove sqrt from the volatile list #43786

Merged
merged 1 commit into from
Jan 20, 2022
Merged

Remove sqrt from the volatile list #43786

merged 1 commit into from
Jan 20, 2022

Conversation

Keno
Copy link
Member

@Keno Keno commented Jan 12, 2022

The LLVM IR spec now explicitly says that this intrinsic is required
to be rounded correctly. That means that without fasthmath flag
(which we do not set here), this intrinsic must have the bitwise
correctly rounded answer and should thus not differ between compile
and runtime. If there is still a case where it does differ, that is
likely some other underlying bug that we should fix instead.

Before:

julia> f() = sqrt(2)
f (generic function with 1 method)

julia> @code_typed f()
CodeInfo(
1 ─ %1 = Base.Math.sqrt_llvm(2.0)::Float64
└──      return %1
) => Float64

After:

julia> @code_typed f()
CodeInfo(
1 ─     return 1.4142135623730951
) => Float64

@oscardssmith oscardssmith added the performance Must go faster label Jan 12, 2022
@oscardssmith
Copy link
Member

Nice to get this fixed!

@Keno
Copy link
Member Author

Keno commented Jan 13, 2022

my guess is that the common way this could fail is if a libm uses fqsrt without accounting for double rounding. glibc has test vectors for this case at https://sourceware.org/bugzilla/show_bug.cgi?id=14032, so we could add those as a test to see if anybody reports issues

@oscardssmith
Copy link
Member

If anyone fails this, it would probably be windows. Their libm isn't great in general.

@N5N3
Copy link
Member

N5N3 commented Jan 13, 2022

Random test passed locally on win64:

for I in 1:10000000
    r = reinterpret(Float64,rand(UInt))
    @test sqrt(abs(r)) === Base.sqrt_llvm(abs(r))
end

Edit: glibc test also passed.
So it should be OK for modern CPU? (with vsqrt support)

@Keno
Copy link
Member Author

Keno commented Jan 13, 2022

Base.sqrt is defined as sqrt_llvm, so unfortunately that test does not say anything. As I said, the interesting case are probably the test vectors in that glibc issue.

@N5N3
Copy link
Member

N5N3 commented Jan 13, 2022

IIUC, Base.sqrt_llvm always calls runtime sqrt like Base.fma_float. (#43530 use such feature to test julia_fma(f)'s precision).
Just for example: on offical 1.7.1 Win64

julia> fma(-1.9369631f13, 2.1513551f-7, -1.7354427f-24)
-4.1670958f6

julia> Base.fma_float(-1.9369631f13, 2.1513551f-7, -1.7354427f-24)
-4.1670955f6

@Keno Keno force-pushed the kf/sqrtnovolatile branch from 7922ce9 to b7a9051 Compare January 17, 2022 00:47
@Keno Keno changed the title RFC: Remove sqrt from the volatile list Remove sqrt from the volatile list Jan 17, 2022
@Keno
Copy link
Member Author

Keno commented Jan 17, 2022

Alright, I have added a test to detect if there's any platform with the fp80 double rounding problem. Otherwise, I think we should try this and see what happens.

@Keno
Copy link
Member Author

Keno commented Jan 17, 2022

Sigh, sqrt test failed on builtkite i686 Linux, so looks like whatever we're using there is buggy.

@Keno
Copy link
Member Author

Keno commented Jan 19, 2022

This might be our fault because it looks like openlibm has the double rounding bug on 32bit: https://github.com/JuliaMath/openlibm/blob/master/i387/e_sqrt.S#L12.

@oscardssmith wanna have a go at fixing this in openlibm?

That said, we assume a minimum of SSE2, so our minimum supported ISA has sqrtsd - we should probably just use that.

@Keno
Copy link
Member Author

Keno commented Jan 19, 2022

IIUC, Base.sqrt_llvm always calls runtime sqrt like Base.fma_float. (#43530 use such feature to test julia_fma(f)'s precision).

There's three cases that all need to be consistent:

  1. The code generated by LLVM for the intrinsic
  2. The code LLVM uses for constant folding the intrinsic
  3. Our own runtime intrinsic used by the interpreter

Depending on how things are built, these can all pick up different version of the intrinsics.

@oscardssmith
Copy link
Member

I'd honestly rather us move away from openlibm. can we just make llvm use the intrinsic?

@Keno
Copy link
Member Author

Keno commented Jan 19, 2022

Actually, looks like in this case LLVM doesn't actually constant fold sqrt, so we'll be ok once we fix runtime intrinsics (and since we require SSE2, it should be fine to do that using the SSE intrinsic). However, just for completeness, I do think we should also fix openlibm. Should be a 5 line change to set the precision flag in the control word appropriately.

@oscardssmith
Copy link
Member

That makes sense. (I am, however unlikely to touch the openlibm version).

@Keno
Copy link
Member Author

Keno commented Jan 19, 2022

Alright, do you want to change the runtime intrinsics here and I'll make the patch to openlibm (but we won't block on an openlibm upgrade)?

Keno added a commit to JuliaMath/openlibm that referenced this pull request Jan 19, 2022
As discussed in JuliaLang/julia#43786, openlibm's sqrt function is incorrectly rounded for i387. IEEE requires correct rounding for these functions and LLVM relies on it. Fix that by setting the precision in the FPU control word (see e.g. e_ceil.S for similar FPU modifications).
Keno added a commit to JuliaMath/openlibm that referenced this pull request Jan 19, 2022
As discussed in JuliaLang/julia#43786, openlibm's sqrt function is incorrectly rounded for i387. IEEE requires correct rounding for these functions and LLVM relies on it. Fix that by setting the precision in the FPU control word (see e.g. e_ceil.S for similar FPU modifications).
@Keno
Copy link
Member Author

Keno commented Jan 19, 2022

Openlibm fix is JuliaMath/openlibm#256.

ViralBShah pushed a commit to JuliaMath/openlibm that referenced this pull request Jan 19, 2022
As discussed in JuliaLang/julia#43786, openlibm's sqrt function is incorrectly rounded for i387. IEEE requires correct rounding for these functions and LLVM relies on it. Fix that by setting the precision in the FPU control word (see e.g. e_ceil.S for similar FPU modifications).
Keno added a commit that referenced this pull request Jan 19, 2022
This is more of a "Do we want to move in this direction RFC". As mentioned in #43786,
we currently have three implementations of these intrinsics:

1. The code generated by LLVM for the intrinsic
2. The code LLVM uses for constant folding the intrinsic
3. Our own runtime intrinsic used by the interpreter

This basically removes the third one, which will be required if we want to do
something about #26434 because we just forward these to libm. Of course we'll
still have to do something to teach LLVM how to constant fold these in a manner
compatible with what will actually end up running, but that's a separate issue.
@Keno
Copy link
Member Author

Keno commented Jan 19, 2022

As discussed, for now we won't touch runtime_intrinsic and just pull in the openlibm update to enable this. #43869 tracks just getting rid of the intrinsic entirely to reduce the complexity.

Keno added a commit that referenced this pull request Jan 20, 2022
This is more of a "Do we want to move in this direction RFC". As mentioned in #43786,
we currently have three implementations of these intrinsics:

1. The code generated by LLVM for the intrinsic
2. The code LLVM uses for constant folding the intrinsic
3. Our own runtime intrinsic used by the interpreter

This basically removes the third one, which will be required if we want to do
something about #26434 because we just forward these to libm. Of course we'll
still have to do something to teach LLVM how to constant fold these in a manner
compatible with what will actually end up running, but that's a separate issue.
@Keno Keno mentioned this pull request Jan 20, 2022
The LLVM IR spec now explicitly says that this intrinsic is required
to be rounded correctly. That means that without fasthmath flag
(which we do not set here), this intrinsic must have the bitwise
correctly rounded answer and should thus not differ between compile
and runtime. If there is still a case where it does differ, that is
likely some other underlying bug that we should fix instead.

Before:
```
julia> f() = sqrt(2)
f (generic function with 1 method)

julia> @code_typed f()
CodeInfo(
1 ─ %1 = Base.Math.sqrt_llvm(2.0)::Float64
└──      return %1
) => Float64
```

After:
```
julia> @code_typed f()
CodeInfo(
1 ─     return 1.4142135623730951
) => Float64
```
@Keno Keno force-pushed the kf/sqrtnovolatile branch from b7a9051 to bb2e1fb Compare January 20, 2022 15:18
@Keno Keno merged commit cc96240 into master Jan 20, 2022
@Keno Keno deleted the kf/sqrtnovolatile branch January 20, 2022 22:34
simeonschaub added a commit that referenced this pull request Jan 31, 2022
simeonschaub added a commit that referenced this pull request Feb 4, 2022
simeonschaub added a commit that referenced this pull request Feb 4, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
The LLVM IR spec now explicitly says that this intrinsic is required
to be rounded correctly. That means that without fasthmath flag
(which we do not set here), this intrinsic must have the bitwise
correctly rounded answer and should thus not differ between compile
and runtime. If there is still a case where it does differ, that is
likely some other underlying bug that we should fix instead.

Before:
```
julia> f() = sqrt(2)
f (generic function with 1 method)

julia> @code_typed f()
CodeInfo(
1 ─ %1 = Base.Math.sqrt_llvm(2.0)::Float64
└──      return %1
) => Float64
```

After:
```
julia> @code_typed f()
CodeInfo(
1 ─     return 1.4142135623730951
) => Float64
```
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
The LLVM IR spec now explicitly says that this intrinsic is required
to be rounded correctly. That means that without fasthmath flag
(which we do not set here), this intrinsic must have the bitwise
correctly rounded answer and should thus not differ between compile
and runtime. If there is still a case where it does differ, that is
likely some other underlying bug that we should fix instead.

Before:
```
julia> f() = sqrt(2)
f (generic function with 1 method)

julia> @code_typed f()
CodeInfo(
1 ─ %1 = Base.Math.sqrt_llvm(2.0)::Float64
└──      return %1
) => Float64
```

After:
```
julia> @code_typed f()
CodeInfo(
1 ─     return 1.4142135623730951
) => Float64
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants