Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated executable is slow (doing JIT?) #292

Closed
PGS62 opened this issue Nov 21, 2019 · 13 comments
Closed

Generated executable is slow (doing JIT?) #292

PGS62 opened this issue Nov 21, 2019 · 13 comments

Comments

@PGS62
Copy link

PGS62 commented Nov 21, 2019

This is a cross-post from StackOverflow.

Though I can get build_executable to generate an exe file, that exe's performance is poor, consistent with it carrying out a JIT compilation every time it's called, despite my having used a snoop file designed to avoid that. This may be a bug (perhaps restricted to Windows?) or I may be failing correctly to use the functionality provided by PackageCompiler.

Here is my MWE:

TestBuildExecutable.jl contains:

module TestBuildExecutable

Base.@ccallable function julia_main(ARGS::Vector{String}=[""])::Cint
    @show sum(myarray())
    return 0
end

#Function which takes approx 8 seconds to compile. Returns a 500 x 20 array of 1s
function myarray()
    [1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1;
     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1;
	 
# PLEASE EDIT TO INSERT THE MISSING 496 LINES, EACH IDENTICAL TO THE LINE ABOVE!

     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1;
     1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1]
end

end #module

SnoopFile.jl contains:

module SnoopFile

currentpath = dirname(@__FILE__)
push!(LOAD_PATH, currentpath)
unique!(LOAD_PATH)

using TestBuildExecutable

println("Start of snoop file execution!")
TestBuildExecutable.julia_main()
println("End of snoop file execution!")

end # module

In a fresh Julia instance, julia_main takes 8.3 seconds for the first execution and half a millisecond for the second execution:

julia> @time TestBuildExecutable.julia_main()
sum(myarray()) = 10000
  8.355108 seconds (425.36 k allocations: 25.831 MiB, 0.06% gc time)
0

julia> @time TestBuildExecutable.julia_main()
sum(myarray()) = 10000
  0.000537 seconds (25 allocations: 82.906 KiB)
0

Call build_executable:

julia> using PackageCompiler

julia> build_executable("d:/philip/source/script/julia/jsource/TestBuildExecutable.jl",
                       "testexecutable",
                       builddir = "d:/temp/builddir15",
                       snoopfile = "d:/philip/source/script/julia/jsource/SnoopFile.jl",
                       verbose = false)
Julia program file:
  "d:\philip\source\script\julia\jsource\TestBuildExecutable.jl"
C program file:
  "C:\Users\Philip\.julia\packages\PackageCompiler\CJQcs\examples\program.c"
Build directory:
  "d:\temp\builddir15"
Start of snoop file execution!
sum(myarray()) = 10000
End of snoop file execution!
[ Info: used 79 out of 79 precompile statements
All done

Finally, in a Windows Command Prompt:

D:\temp\builddir15>testexecutable
sum(myarray()) = 1000

D:\temp\builddir15>

which took (by my stopwatch) 8 seconds to run, and it takes 8 seconds to run every time it's executed, not just the first time. This is consistent with the executable doing a JIT compile every time it's run, but the snoop file is designed to avoid that!

Version information:

julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = "C:\Users\Philip\AppData\Local\Programs\Microsoft VS Code\Code.exe"
@PGS62 PGS62 changed the title Windows. Generated executable is slow (doing JIT?) Generated executable is slow (doing JIT?) Nov 26, 2019
@PGS62
Copy link
Author

PGS62 commented Nov 26, 2019

Unfortunately, my guess that this problem is restricted to Windows is not correct. I have now run the MWE on MacOS and on Linux, with the same result: The generated executable is slow, consistent with it carrying out JIT compilation every time it's run.

versioninfo() for linux was:

julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

@KristofferC
Copy link
Member

Please try again with the new API. If it still doesn't work, it would be useful to get a reproduces so I can figure out what is happening.

@MasonProtter
Copy link

MasonProtter commented Feb 25, 2020

I've got a very simple reproducer (I think) on the current master:

Masons-MacBook-Pro:Foo mason$ ls
Manifest.toml	Project.toml	build		build.jl	precompile.jl	src

Masons-MacBook-Pro:Foo mason$ cat src/Foo.jl
module Foo

fib(n::Int) = n < 2 ? n : fib(n-1) + fib(n-2)

function julia_main()::Cint
    n = isempty(ARGS) ? 30 : parse(Int, ARGS[1])
    @time @show fib(n)
    @time @show fib(n)
    return 0
end

end # module
Masons-MacBook-Pro:Foo mason$ cat precompile.jl
using Foo

Foo.fib(10)
Foo.fib(20)
Foo.fib(30)

Foo.julia_main()
Masons-MacBook-Pro:Foo mason$ cat build.jl
using PackageCompiler
create_app(".", "build", precompile_execution_file="precompile.jl", force=true)
Masons-MacBook-Pro:Foo mason$ julia --project=. build.jl
┌ Warning: it is not recommended to create an app without a preexisting manifest
└ @ PackageCompiler ~/.julia/dev/PackageCompiler/src/PackageCompiler.jl:581
[ Info: PackageCompiler: creating base system image (incremental=false)...
[ Info: PackageCompiler: creating system image object file, this might take a while...
[ Info: PackageCompiler: creating system image object file, this might take a while...

Masons-MacBook-Pro:Foo mason$ time build/bin/Foo
fib(n) = 832040
  0.069544 seconds (90.83 k allocations: 4.465 MiB)
fib(n) = 832040
  0.004308 seconds (12 allocations: 384 bytes)

real	0m4.225s
user	0m0.305s
sys	0m0.099s

Masons-MacBook-Pro:Foo mason$ time build/bin/Foo
fib(n) = 832040
  0.062582 seconds (90.83 k allocations: 4.465 MiB)
fib(n) = 832040
  0.003901 seconds (12 allocations: 384 bytes)

real	0m0.318s
user	0m0.278s
sys	0m0.089s

Notice that the first invocation of fib takes longer than the second and has more allocations, furthermore there's a large overhead of ~4 seconds the first time the executable is run but not the second. Precompilation? What's more is that it's actually worse than running julia_main() outside the app:

Masons-MacBook-Pro:Foo mason$ julia --project=. --startup=no -q

julia> using Foo; Foo.julia_main()
[ Info: Precompiling Foo [c0d64c21-6ac2-435c-84ed-380d3bba6884]
fib(n) = 832040
  0.018421 seconds (25.66 k allocations: 1.218 MiB)
fib(n) = 832040
  0.004355 seconds (14 allocations: 736 bytes)
0

Am I doing something wrong in the precompile.jl file?

@MasonProtter
Copy link

Here's my version info:

julia> versioninfo()
Julia Version 1.4.0-rc1.0
Commit b0c33b0cf5 (2020-01-23 17:23 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.2.0)
  CPU: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 2

@KristofferC
Copy link
Member

I don't really think that is the same issue. The issue in OP was about something taking 8 seconds the first time and then being instant.

Here we are talking about something that takes 0.02 seconds the first time. I think the reason for the short first delay is that when we precompile stuff, stdout is not a TTY, so our statements look like

precompile(Tuple{typeof(Base.Printf.print_fixed), Base.PipeEndpoint, Int64, Int32, Int32, Bool, Array{UInt8, 1}})

When we then run the executable, we do output to a TTY so what actually ends up getting compiled is:

precompile(Tuple{typeof(Base.Printf.print_fixed), Base.TTY, Int64, Int32, Int32, Bool, Array{UInt8, 1}})

@KristofferC
Copy link
Member

Also, a majority of startup time is spent from OpenBLAS being really slow to start up...

Screenshot 2020-02-26 at 09 19 34

You can run with filter_stdlibs=true to remove e.g. LinearAlgebra and stuff from the sysimage.

@KristofferC
Copy link
Member

KristofferC commented Feb 26, 2020

If I change the read call to run in

I get the correct IOtype and the first call is fast:

❯ time ./Foo
fib(n) = 832040
  0.005339 seconds (67 allocations: 4.266 KiB)
fib(n) = 832040
  0.003579 seconds (15 allocations: 432 bytes)
./Foo  0.11s user 0.05s system 81% cpu 0.193 total

That would print all the output from the precompiling though. I could also use the same was a Base does by creating a fake TTY (https://github.com/JuliaLang/julia/blob/0fe45ef7349b6ad25fdba7dce169f9b9efa48620/contrib/generate_precompile.jl#L82) and use that. Or I could just try to compile all methods for PipeEndPoint also for TTY. Not sure which is best...

@MasonProtter
Copy link

MasonProtter commented Feb 26, 2020

If I change the read call to run in Line 170 in 68530ef PackageCompiler.jl/src/PackageCompiler.jl
I get the correct IOtype and the first call is fast. That would print all the output from the precompiling though.

I just tried that and indeed it's fast! Thanks for the pointers and investigation. Personally, I don't see a huge problem with printing the precompile statements, but maybe there's a way to filter them out?

@MasonProtter
Copy link

I don't really think that is the same issue. The issue in OP was about something taking 8 seconds the first time and then being instant.

The OP said:

which took (by my stopwatch) 8 seconds to run, and it takes 8 seconds to run every time it's executed, not just the first time. This is consistent with the executable doing a JIT compile every time it's run, but the snoop file is designed to avoid that!

So I suspected that it was something similar where there was a precompile statement that wasn't doing what the OP intended it to do. In this case, I thought it might have to do with the machinery to parse the array, but I'm out of my depth.

@KristofferC
Copy link
Member

KristofferC commented Feb 26, 2020

Yeah, I understand it is easy to think they are related but your case is more of a small inaccuracy while the case in OP seems to be a complete absence of precompilation.

but maybe there's a way to filter them out?

Well, the tricky part is that e.g. using read is exactly filtering them out but then they are no longer printed to a TTY so the type of the IO is no longer representative for normal use. I'm pretty sure I could get the Fake TTY thing we use in Base to work bt it will be a bit of code and I am wondering if the code complexity is worth it.

furthermore there's a large overhead of ~4 seconds the first time the executable is run but not the second. Precompilation?

My guess is that this is the OS running some scan (security?) or something the first time it is loaded. There should not be any precompilation.

@MasonProtter
Copy link

If we use run, things only get printed if the actual functions getting precompiled print, right? I think that's more or less acceptable. If someone wants to run create_app or whatever on a package that does a lot of printing they can alway just quiet the output.

Maybe I should open a PR with that read changed to run and see if anyone else has feelings or suggestions?

@PGS62
Copy link
Author

PGS62 commented Mar 2, 2020

Please try again with the new API. If it still doesn't work, it would be useful to get a reproduces so I can figure out what is happening.

I've done that and I'm happy to report that the new API works correctly against my MWE (mutated as necessary). So I've closed this issue (since I see that @MasonProtter has opened a separate one).

@PGS62 PGS62 closed this as completed Mar 2, 2020
@KristofferC
Copy link
Member

KristofferC commented Mar 2, 2020

Thanks for the update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants