
Performance regression in Normed -> Float conversions on Julia v1.3.0 #144

Closed
kimikage opened this issue Nov 27, 2019 · 2 comments · Fixed by #145
Comments

@kimikage (Collaborator) commented Nov 27, 2019

I have confirmed that Julia v1.2.0 and v1.3.0 give nearly identical results for Normed->Float conversions (#129, #138). However, I found a performance regression (~2x - 3x slower) on x86_64 machines in the following cases:

  • Vec4{N0f32} -> Vec4{Float32}
  • Vec4{N0f64} -> Vec4{Float32}
  • Vec4{N0f64} -> Vec4{Float64}

(cf. #129 (comment))

I'm not going to rush to investigate the cause or fix this problem. I'm filing this issue as a placeholder in case any useful information turns up.
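For anyone who wants to reproduce this, a minimal benchmark sketch along the following lines should show the gap. It assumes BenchmarkTools, and uses a plain NTuple as a hypothetical stand-in for Vec4; the actual harness in #129 may differ.

using FixedPointNumbers, BenchmarkTools

# Hypothetical stand-in for Vec4; the benchmark code in #129 may
# define it differently.
const Vec4{T} = NTuple{4,T}

src = Vec4{N0f32}[ntuple(_ -> rand(N0f32), 4) for _ in 1:1024]

# Time the bulk Vec4{N0f32} -> Vec4{Float32} conversion.
@btime [Float32.(v) for v in $src];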

@timholy (Member) commented Nov 27, 2019

I think those types are very niche. I'm not that worried.

@kimikage (Collaborator, Author) commented Nov 29, 2019

I agree, but my concern is the cause rather than the result. Investigating it may help improve other methods (e.g. Fixed -> Float conversions).

Benchmark

julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Matrix of Vec4 (unit: μs)

| Source type (w64) | Float32, v1.2.0 | Float32, v1.3.0 | Float64, v1.2.0 | Float64, v1.3.0 |
|:------------------|----------------:|----------------:|----------------:|----------------:|
| N0f8              |           3.814 |           3.571 |           4.499 |           5.725 |
| N5f3              |           3.786 |           3.457 |           5.400 |           5.533 |
| N0f16             |           4.000 |           3.871 |           5.100 |           6.100 |
| N13f3             |           3.800 |           3.700 |           4.800 |           6.333 |
| N0f32             |           4.583 |          13.599 |           5.599 |           6.767 |
| N8f24             |           5.033 |           4.243 |           7.800 |           8.134 |
| N29f3             |           4.933 |           4.300 |           6.600 |           6.367 |
| N0f64             |          13.399 |          23.000 |          12.600 |          21.699 |
| N61f3             |          13.200 |          12.199 |          11.400 |          11.599 |
| N0f128            |          38.800 |          37.099 |          35.600 |          35.200 |
| N125f3            |          44.099 |          40.199 |          38.500 |          40.299 |

@code_typed

julia> Base.VERSION
v"1.2.0"

julia> @code_typed Float32(1N0f32)
CodeInfo(
1 ─       goto #3 if not false
2 ─       nothing::Nothing
3 ┄ %3  = Base.getfield(x, :i)::UInt32
│   %4  = Base.bitcast(Int32, %3)::Int32
│   %5  = Base.lshr_int(%4, 0x0000000000000010)::Int32
│   %6  = Base.shl_int(%4, 0xfffffffffffffff0)::Int32
│   %7  = Base.ifelse(true, %5, %6)::Int32
│   %8  = Base.sitofp(Float32, %7)::Float32
│   %9  = Base.and_int(%4, 65535)::Int32
│   %10 = Base.shl_int(%9, 0x0000000000000008)::Int32
│   %11 = Base.ashr_int(%9, 0xfffffffffffffff8)::Int32
│   %12 = Base.ifelse(true, %10, %11)::Int32
│   %13 = Base.lshr_int(%4, 0x0000000000000018)::Int32
│   %14 = Base.shl_int(%4, 0xffffffffffffffe8)::Int32
│   %15 = Base.ifelse(true, %13, %14)::Int32
│   %16 = Base.or_int(%12, %15)::Int32
│   %17 = Base.sitofp(Float32, %16)::Float32
│   %18 = Base.mul_float(%17, 9.094947f-13)::Float32
│   %19 = Base.muladd_float(%8, 1.5258789f-5, %18)::Float32
└──       return %19
) => Float32
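In surface-level Julia, that v1.2.0 IR corresponds to roughly the following (my transcription from the dump above, not the actual FixedPointNumbers source): the raw UInt32 is split into a high 16-bit part and a rearranged low part, each converted via sitofp, then recombined with muladd using the scale factors 2^-16 ≈ 1.5258789f-5 and 2^-40 ≈ 9.094947f-13.

# Transcription of the v1.2.0 IR above (not the package source).
function n0f32_to_float32(i::UInt32)
    s  = reinterpret(Int32, i)
    hi = s >>> 16                          # %5/%7: high 16 bits
    lo = ((s & 0xFFFF) << 8) | (s >>> 24)  # %9-%16: rearranged low bits
    muladd(Float32(hi), 1.5258789f-5, Float32(lo) * 9.094947f-13)  # %17-%19
end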
julia> Base.VERSION
v"1.3.0"

julia> @code_typed Float32(1N0f32)
CodeInfo(
1 ─       goto #3 if not false
2 ─       nothing::Nothing
3 ┄ %3  = Base.getfield(x, :i)::UInt32
│   %4  = Base.bitcast(Int32, %3)::Int32
│   %5  = Base.lshr_int(%4, 0x0000000000000010)::Int32
│   %6  = Base.shl_int(%4, 0xfffffffffffffff0)::Int32
│   %7  = Base.ifelse(true, %5, %6)::Int32
│   %8  = Base.sitofp(Float32, %7)::Float32
│   %9  = Base.and_int(%4, 65535)::Int32
│   %10 = Base.sle_int(0, 8)::Bool
│   %11 = Base.bitcast(UInt64, 8)::UInt64
│   %12 = Base.shl_int(%9, %11)::Int32
│   %13 = Base.neg_int(8)::Int64
│   %14 = Base.bitcast(UInt64, %13)::UInt64
│   %15 = Base.ashr_int(%9, %14)::Int32
│   %16 = Base.ifelse(%10, %12, %15)::Int32
│   %17 = Base.lshr_int(%4, 0x0000000000000018)::Int32
│   %18 = Base.shl_int(%4, 0xffffffffffffffe8)::Int32
│   %19 = Base.ifelse(true, %17, %18)::Int32
│   %20 = Base.or_int(%16, %19)::Int32
│   %21 = Base.sitofp(Float32, %20)::Float32
│   %22 = Base.mul_float(%21, 9.094947f-13)::Float32
│   %23 = Base.muladd_float(%8, 1.5258789f-5, %22)::Float32
└──       return %23
) => Float32

Oh gosh...
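For context: the extra %10-%15 in the v1.3.0 dump come from Base's shift-by-signed-count methods, which are defined roughly as `<<(x::BitInteger, y::Int) = ifelse(0 <= y, x << unsigned(y), x >> unsigned(-y))` to handle negative counts. On v1.2.0 the literal count 8 is folded before the typed IR stage (only `ifelse(true, …)` remains), whereas on v1.3.0 the `sle_int`/`neg_int`/`bitcast` survive into the typed IR and inflate the inlining cost of the conversion. A small probe to see this in isolation (hypothetical, not from the issue):

# Compare @code_typed on these two: the signed count goes through
# the sign-checked method above, while an unsigned count dispatches
# directly to the shl_int intrinsic.
f_signed(x::Int32)   = x << 8
f_unsigned(x::Int32) = x << (8 % UInt)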

kimikage added a commit that referenced this issue Nov 30, 2019
#144) (#145)

Julia 1.3.0 generates more redundant intermediate code, which is eventually optimized away.
This trivial change reduces the redundant intermediate code to promote inlining.
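As a hypothetical illustration of a change of that flavor (not necessarily the actual diff in #145): making the shift count unsigned lets the shift lower straight to shl_int, with no sign test or negative-count branch in the intermediate code.

# Hypothetical sketch only; see the diff of #145 for the real change.
function n0f32_to_float32_v2(i::UInt32)
    s  = reinterpret(Int32, i)
    hi = s >>> 16
    lo = ((s & 0xFFFF) << (8 % UInt)) | (s >>> 24)  # unsigned shift count
    muladd(Float32(hi), 1.5258789f-5, Float32(lo) * 9.094947f-13)
end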