Very slow precompile (>50min) on julia 1.6.0 on Windows #1554

Closed
yha opened this issue Mar 30, 2021 · 35 comments · Fixed by #1711

Comments

@yha

yha commented Mar 30, 2021

    Updating registry at `C:\Users\sternlab\.julia\registries\General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Resolving package versions...
   Installed Requires ─────── v1.1.3
   Installed FillArrays ───── v0.11.7
   Installed ColorTypes ───── v0.10.12
   Installed StatsBase ────── v0.33.4
   Installed ChainRulesCore ─ v0.9.34
   Installed StaticArrays ─── v1.1.0
   Installed Zygote ───────── v0.6.7
   Installed Memoize ──────── v0.4.4
   Installed GPUCompiler ──── v0.10.0
   Installed CUDA ─────────── v2.6.2
    Updating `C:\Users\sternlab\.julia\environments\v1.6\Project.toml`
  [587475ba] + Flux v0.12.0
  [e88e6eb3] + Zygote v0.6.7
  [700de1a5] + ZygoteRules v0.2.1
    Updating `C:\Users\sternlab\.julia\environments\v1.6\Manifest.toml`
  [621f4979] + AbstractFFTs v1.0.1
  [1520ce14] + AbstractTrees v0.3.4
  [79e6a3ab] + Adapt v3.2.0
  [ab4f0b2a] + BFloat16s v0.1.0
  [fa961155] + CEnum v0.4.1
  [052768ef] + CUDA v2.6.2
  [082447d4] + ChainRules v0.7.56
  [d360d2e6] + ChainRulesCore v0.9.34
  [944b1d66] + CodecZlib v0.7.0
  [3da002f7] + ColorTypes v0.10.12
  [5ae59095] + Colors v0.12.6
  [bbf7d656] + CommonSubexpressions v0.3.0
  [34da2185] + Compat v3.25.0
  [9a962f9c] + DataAPI v1.6.0
  [864edb3b] + DataStructures v0.18.9
  [163ba53b] + DiffResults v1.0.3
  [b552c78f] + DiffRules v1.0.2
  [e2ba6199] + ExprTools v0.1.3
  [1a297f60] + FillArrays v0.11.7
  [53c48c17] + FixedPointNumbers v0.8.4
  [587475ba] + Flux v0.12.0
  [f6369f11] + ForwardDiff v0.10.17
  [d9f16b24] + Functors v0.2.1
  [0c68f7d7] + GPUArrays v6.2.0
  [61eb1bfa] + GPUCompiler v0.10.0
  [7869d1d1] + IRTools v0.4.2
  [692b3bcd] + JLLWrappers v1.2.0
  [e5e0dc1b] + Juno v0.8.4
  [929cbde3] + LLVM v3.6.0
  [1914dd2f] + MacroTools v0.5.6
  [e89f7d12] + Media v0.5.0
  [c03570c3] + Memoize v0.4.4
  [e1d29d7a] + Missings v0.4.5
  [872c559c] + NNlib v0.7.17
  [77ba4419] + NaNMath v0.3.5
  [bac558e1] + OrderedCollections v1.4.0
  [189a3867] + Reexport v1.0.0
  [ae029012] + Requires v1.1.3
  [6c6a2e73] + Scratch v1.0.3
  [a2af1166] + SortingAlgorithms v0.3.1
  [276daf66] + SpecialFunctions v1.3.0
  [90137ffa] + StaticArrays v1.1.0
  [2913bbd2] + StatsBase v0.33.4
  [a759f4b9] + TimerOutputs v0.5.8
  [3bb67fe8] + TranscodingStreams v0.9.5
  [a5390f91] + ZipFile v0.9.3
  [e88e6eb3] + Zygote v0.6.7
  [700de1a5] + ZygoteRules v0.2.1
  [efe28fd5] + OpenSpecFun_jll v0.5.3+4
  [0dad84c5] + ArgTools
  [56f22d72] + Artifacts
  [2a0f44e3] + Base64
  [ade2ca70] + Dates
  [8bb1440f] + DelimitedFiles
  [8ba89e20] + Distributed
  [f43a241f] + Downloads
  [b77e0a4c] + InteractiveUtils
  [4af54fe1] + LazyArtifacts
  [b27032c2] + LibCURL
  [76f85450] + LibGit2
  [8f399da3] + Libdl
  [37e2e46d] + LinearAlgebra
  [56ddb016] + Logging
  [d6f4376e] + Markdown
  [a63ad114] + Mmap
  [ca575930] + NetworkOptions
  [44cfe95a] + Pkg
  [de0858da] + Printf
  [9abbd945] + Profile
  [3fa0cd96] + REPL
  [9a3f8284] + Random
  [ea8e919c] + SHA
  [9e88b42a] + Serialization
  [1a1011a3] + SharedArrays
  [6462fe0b] + Sockets
  [2f01184e] + SparseArrays
  [10745b16] + Statistics
  [fa267f1f] + TOML
  [a4e569a6] + Tar
  [8dfed614] + Test
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
  [e66e0078] + CompilerSupportLibraries_jll
  [deac9b47] + LibCURL_jll
  [29816b5a] + LibSSH2_jll
  [c8ffd9c3] + MbedTLS_jll
  [14a3606d] + MozillaCACerts_jll
  [83775a58] + Zlib_jll
  [8e850ede] + nghttp2_jll
  [3f19e933] + p7zip_jll
Precompiling project...
  Progress [========================================>]  52/53
  ◐ Flux
  Progress [========================================>]  53/53
53 dependencies successfully precompiled in 3367 seconds

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Most of that time the progress bar was on 52/53, with only Flux still precompiling.
On WSL on the same machine, precompilation finished in 59 seconds (but the environment was not identical).

@DhairyaLGandhi
Member

What happens if you try to precompile Flux on Windows again? Is this behavior reproducible?

I should reinstate Windows CI too.

@yha
Author

yha commented Mar 31, 2021

I tried deleting everything under .julia\compiled\v1.6 and precompiling the same environment again and could not reproduce: this time it finished in under a minute.
Is there anything else to check?
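
For anyone repeating this check, here is a minimal sketch of the "clear the compiled cache, then precompile again" step. It assumes a POSIX shell (for example WSL or Git Bash, whereas the report above is from native Windows), and `stash_compiled_cache` is a hypothetical helper name. It renames the cache rather than deleting it, so the old files remain available for comparison.

```shell
#!/bin/sh
# Move DEPOT/compiled/VERSION aside so the next precompile starts from scratch.
# stash_compiled_cache is a hypothetical helper, not part of Julia or Pkg.
stash_compiled_cache() {
    depot=$1
    version=$2
    cache="$depot/compiled/$version"
    # Bail out if there is nothing to move.
    [ -d "$cache" ] || { echo "no cache at $cache" >&2; return 1; }
    # Rename with a timestamp suffix instead of deleting.
    mv "$cache" "$cache.bak.$(date +%s)"
}

# Typical use (assumes the default depot location and julia on PATH),
# left commented so the sketch has no side effects:
# stash_compiled_cache "$HOME/.julia" v1.6
# time julia -e 'using Pkg; Pkg.precompile()'
```

Comparing the wall-clock time of a few such runs is one way to see whether the slow case can be reproduced.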

@DhairyaLGandhi
Member

Weird, it would be difficult to address if we can't reliably reproduce this, and CI hasn't shown that behaviour either.

@yha
Author

yha commented Apr 5, 2021

I have now seen a similar thing on a different machine:

   Installed ForwardDiff ────────── v0.10.17
   Installed ChainRulesCore ─────── v0.9.37
   Installed StaticArrays ───────── v1.1.0
   Installed Zygote ─────────────── v0.6.8
   Installed CUDA ───────────────── v2.6.2
   Installed OpenSpecFun_jll ────── v0.5.3+4
   Installed CEnum ──────────────── v0.4.1
   Installed Memoize ────────────── v0.4.4
   Installed DiffRules ──────────── v1.0.2
   Installed TranscodingStreams ─── v0.9.5
   Installed AbstractFFTs ───────── v1.0.1
   Installed GPUCompiler ────────── v0.10.0
   Installed SpecialFunctions ───── v1.3.0
   Installed ZygoteRules ────────── v0.2.1
   Installed Functors ───────────── v0.2.1
   Installed TimerOutputs ───────── v0.5.8
   Installed ChainRules ─────────── v0.7.57
  Downloaded artifact: OpenSpecFun
    Updating `C:\Users\Yuval\.julia\environments\v1.6\Project.toml`
  [587475ba] + Flux v0.12.1
    Updating `C:\Users\Yuval\.julia\environments\v1.6\Manifest.toml`
  [621f4979] + AbstractFFTs v1.0.1
  [79e6a3ab] + Adapt v3.2.0
  [ab4f0b2a] + BFloat16s v0.1.0
  [fa961155] + CEnum v0.4.1
  [052768ef] + CUDA v2.6.2
  [082447d4] + ChainRules v0.7.57
  [d360d2e6] + ChainRulesCore v0.9.37
  [944b1d66] + CodecZlib v0.7.0
  [bbf7d656] + CommonSubexpressions v0.3.0
  [9a962f9c] + DataAPI v1.6.0
  [163ba53b] + DiffResults v1.0.3
  [b552c78f] + DiffRules v1.0.2
  [e2ba6199] + ExprTools v0.1.3
  [1a297f60] + FillArrays v0.11.7
  [587475ba] + Flux v0.12.1
  [f6369f11] + ForwardDiff v0.10.17
  [d9f16b24] + Functors v0.2.1
  [0c68f7d7] + GPUArrays v6.2.1
  [61eb1bfa] + GPUCompiler v0.10.0
  [7869d1d1] + IRTools v0.4.2
  [692b3bcd] + JLLWrappers v1.2.0
  [929cbde3] + LLVM v3.6.0
  [c03570c3] + Memoize v0.4.4
  [e1d29d7a] + Missings v0.4.5
  [872c559c] + NNlib v0.7.17
  [77ba4419] + NaNMath v0.3.5
  [6c6a2e73] + Scratch v1.0.3
  [a2af1166] + SortingAlgorithms v0.3.1
  [276daf66] + SpecialFunctions v1.3.0
  [90137ffa] + StaticArrays v1.1.0
  [2913bbd2] + StatsBase v0.33.4
  [a759f4b9] + TimerOutputs v0.5.8
  [3bb67fe8] + TranscodingStreams v0.9.5
  [a5390f91] + ZipFile v0.9.3
  [e88e6eb3] + Zygote v0.6.8
  [700de1a5] + ZygoteRules v0.2.1
  [efe28fd5] + OpenSpecFun_jll v0.5.3+4
  [4af54fe1] + LazyArtifacts
  [e66e0078] + CompilerSupportLibraries_jll
  Progress [========================================>]  90/91
  ◑ Flux

  Progress [========================================>]  91/91
91 dependencies successfully precompiled in 10453 seconds

(@v1.6) pkg>

(@v1.6) pkg>

julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 3 3100 4-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

(the top part of this was no longer available in the terminal by scrolling up)
Again, not reproduced after deleting .julia\compiled\v1.6 and running ]precompile again.

@ToucheSir
Member

Are you possibly running a slow disk? I've found that precompiling (not Flux, but in general) on a cold disk cache on an HDD can be very, very slow. Other avenues to explore might be whether the AV on Windows is interfering with this somehow.

@t-milan

t-milan commented Apr 27, 2021

I'm having the same issue (~20 mins on my machine). I'm using an SSD and disabled my AV.

  ...
  [e5e0dc1b] + Juno v0.8.4
  [929cbde3] + LLVM v3.6.0
  [e89f7d12] + Media v0.5.0
  [c03570c3] + Memoize v0.4.4
  [a00861dc] + NNlibCUDA v0.1.0
  [e6cf234a] + RandomNumbers v1.4.0
  [a759f4b9] + TimerOutputs v0.5.8
  [e88e6eb3] + Zygote v0.6.10
  [9abbd945] + Profile
Precompiling project...
  15 dependencies successfully precompiled in 1223 seconds (259 already precompiled)

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 7 3700X 8-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

@DhairyaLGandhi
Member

These issues are hard to fix without reproducers. Do you observe the same behavior consistently if you nuke the compiled versions of all the dependencies?

@t-milan

t-milan commented Apr 27, 2021

No, I just deleted .julia\compiled\v1.6 and ran ]precompile, and it finished in 82 seconds.

@femtomc

femtomc commented May 5, 2021

I'm having the same issue on an AWS box with Ubuntu 18.04 -- precompilation seems to take upwards of 20 minutes. Honestly, I haven't finished it yet because I keep getting frustrated.

I've repeatedly nuked .julia as well, and it occurs each time. I'm now trying a 4th time and will report timing.

Box info:
image

@femtomc

femtomc commented May 6, 2021

Ah -- I think I may realize the issue. It's possible that it's pulling the CUDA drivers in > 1.6 but not showing that it is doing it.

@lmiq

lmiq commented May 12, 2021

I think I have the same problem of pulling CUDA drivers. But no message is shown, and this process takes a lot of time (even if my connection is not bad).

But I did leave it precompiling for two hours, with no success (Linux 64-bit, Julia 1.6.1, SSD):

(@v1.6) pkg> st
      Status `~/.julia/environments/v1.6/Project.toml`
  [6e4b80f9] BenchmarkTools v0.7.0
  [336ed68f] CSV v0.8.4
  [b630d9fa] CheapThreads v0.2.3
  [46823bd8] Chemfiles v0.9.3
  [35d6a980] ColorSchemes v3.12.0
  [6f35c628] ComplexMixtures v0.4.16
  [a93c6f00] DataFrames v1.0.1
  [b4f34e82] Distances v0.10.2
  [e30172f5] Documenter v0.26.3
  [35a29f4d] DocumenterTools v0.1.10
  [fde71243] EasyFit v0.5.4
  [7a1cc6ca] FFTW v1.4.0
  [cc61a311] FLoops v0.1.10
  [587475ba] Flux v0.12.3
  [59287772] Formatting v0.4.2
  [c58ffaec] FortranFiles v0.6.0
  [28b8d3ca] GR v0.57.4
  [4c0ca9eb] Gtk v1.1.7
  [f67ccb44] HDF5 v0.15.4
  [cd3eb016] HTTP v0.9.0
  [7073ff75] IJulia v1.23.2
  [18364772] IPython v0.5.0
  [916415d5] Images v0.24.1
  [0f8b85d8] JSON3 v1.8.1
  [b964fa9f] LaTeXStrings v1.2.1
  [23fbe1c1] Latexify v0.15.5
  [bdcacae8] LoopVectorization v0.12.12
  [33e6dc65] MKL v0.4.0
  [6741aa20] Neptune v0.14.0
  [6fd5a793] Octavian v0.2.13
  [6fe1bfb0] OffsetArrays v1.7.0
  [429524aa] Optim v1.3.0
  [e29189f1] PDBTools v0.12.9
  [8314cec4] PGFPlotsX v1.2.10
  [d96e819e] Parameters v0.12.2
  [91a5bcdd] Plots v1.12.0
  [c3e4b0f8] Pluto v0.14.3
  [7f904dfe] PlutoUI v0.7.1
  [08abe8d2] PrettyTables v1.0.0
  [92933f4c] ProgressMeter v1.5.0
  [d330b81b] PyPlot v2.9.0
  [1fd47b50] QuadGK v2.4.1
  [295af30f] Revise v3.1.15
  [efcf1570] Setfield v0.7.0
  [f62ebe17] ShortCodes v0.3.2
  [aa65fe97] SnoopCompile v2.6.0
  [e2b509da] SnoopCompileCore v2.5.2
  [860ef19b] StableRNGs v1.0.0
  [90137ffa] StaticArrays v1.1.1
  [2913bbd2] StatsBase v0.33.7
  [09ab397b] StructArrays v0.5.1
  [856f2bd8] StructTypes v1.7.2
  [0c5d862f] Symbolics v0.1.24
  [b189fb0b] ThreadPools v1.2.1
  [ac1d9e8a] ThreadsX v0.1.7
  [a759f4b9] TimerOutputs v0.5.8
  [bc48ee85] Tullio v0.2.14
  [1986cc42] Unitful v1.7.0
  [45397f5d] UnitfulLatexify v1.5.1
  [42071c24] UnitfulRecipes v1.2.0
  [33b4df10] VectorizedRNG v0.2.8
  [9fa8497b] Future
  [de0858da] Printf

Now when I try to use Flux, this occurs:


julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
 Downloading artifact: CUDA
    Downloading [=============>                           ]  30.6 %

(this has been running for more than an hour now)

@DhairyaLGandhi
Member

Cc @maleadt

@maleadt
Collaborator

maleadt commented May 12, 2021

Do you have slow internet? If not, the package servers may be experiencing an issue again (cc @staticfloat).

That said, Flux.jl should not initialize CUDA.jl when loading the package, but only when the user calls the gpu function for the first time. Alternatively, or in addition, it could leave it up to the user to select whether CUDA support is required and implement that using Preferences.jl.

@lmiq

lmiq commented May 12, 2021

No, my internet is not particularly slow. It is broadband, and I can download gigabytes in a few minutes.

Now it is like this:

julia> using Flux
[ Info: Precompiling Flux [587475ba-b771-5e3f-ad9e-33799f191a9c]
 Downloading artifact: CUDA  Downloaded artifact: CUDA
 Downloading artifact: CUDNN
    Downloading [============================>            ]  69.0 %

@lmiq

lmiq commented May 12, 2021

Just for the record, it finally finished and apparently worked.

@maleadt
Collaborator

maleadt commented May 12, 2021

You can always try with JULIA_PKG_SERVER="" to download directly from GitHub. If that helps (after deleting / moving your .julia folder to trigger the issue again), it might be a package server issue again.
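
A minimal sketch of applying that setting to a single command, without changing the rest of the shell session. The wrapper name `without_pkg_server` is made up for illustration, and it relies on the standard `env` utility to set the variable to the empty string for the child process only.

```shell
#!/bin/sh
# Run a command with JULIA_PKG_SERVER set to the empty string for that command only.
# without_pkg_server is a hypothetical wrapper name.
without_pkg_server() {
    env JULIA_PKG_SERVER="" "$@"
}

# Example (requires julia on PATH), left commented so the sketch has no side effects:
# without_pkg_server julia -e 'using Pkg; Pkg.add("Flux")'
```

Scoping the variable to one invocation makes it easy to compare a run with and without the package server from the same terminal.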

@lmiq

lmiq commented May 12, 2021

If I understand correctly, I did export JULIA_PKG_SERVER="" in bash, and then launched Julia and ran ] add Flux.

I am now at this:

image

It is unclear whether it is downloading anything behind the scenes. If the problem is the download speed, this should probably be reported somewhere else, to improve the user experience at least to the point where the user knows that something is going on rather than seeing what looks like a stall.

edit: finished in ~5 minutes.

@maleadt
Collaborator

maleadt commented May 12, 2021

It never downloads CUDA artifacts during an add, but when you do using Flux.
@DhairyaLGandhi This can't be due to downloading CUDA artifacts then (unless Flux calls CUDA functions from global scope, which it shouldn't).

@lmiq

lmiq commented May 12, 2021

Uhm... so maybe there are two separate issues here. Last night I had to Control-C the precompilation (which was in the state shown above) because it seemed to never end (>2 hours). Today I tried using Flux and it took a long time, apparently downloading CUDA, as I have shown above.

After removing .julia and following your advice on JULIA_PKG_SERVER, installation went smoothly (5 minutes), but maybe that had nothing to do with the package server and everything to do with removing the .julia directory.

@staticfloat
Contributor

Can you also show me the output of curl -v -L https://pkg.julialang.org/meta?

@lmiq

lmiq commented May 12, 2021

curl -v -L https://pkg.julialang.org/meta
*   Trying 2a04:4e42:2a::729:443...
* TCP_NODELAY set
* Connected to pkg.julialang.org (2a04:4e42:2a::729) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=pkg.julialang.org
*  start date: Nov  2 18:34:16 2020 GMT
*  expire date: Dec  4 18:34:16 2021 GMT
*  subjectAltName: host "pkg.julialang.org" matched cert's "pkg.julialang.org"
*  issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 DV TLS CA 2020
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5601c0e98e10)
> GET /meta HTTP/2
> Host: pkg.julialang.org
> user-agent: curl/7.68.0
> accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 301 
< server: Varnish
< retry-after: 0
< location: https://us-east.pkg.julialang.org/meta
< x-geo-continent: SA
< x-geo-country: BR
< x-geo-region: SP
< accept-ranges: bytes
< date: Wed, 12 May 2021 15:38:46 GMT
< via: 1.1 varnish
< x-served-by: cache-gig17026-GIG
< x-cache: HIT
< x-cache-hits: 0
< x-timer: S1620833926.449236,VS0,VE0
< content-length: 0
< 
* Connection #0 to host pkg.julialang.org left intact
* Issue another request to this URL: 'https://us-east.pkg.julialang.org/meta'
*   Trying 2600:1f18:18bc:3800:8d39:de54:21e4:22d5:443...
* TCP_NODELAY set
*   Trying 54.144.24.222:443...
* TCP_NODELAY set
* Connected to us-east.pkg.julialang.org (2600:1f18:18bc:3800:8d39:de54:21e4:22d5) port 443 (#1)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=us-east.pkg.julialang.org
*  start date: May  4 15:44:20 2021 GMT
*  expire date: Aug  2 15:44:20 2021 GMT
*  subjectAltName: host "us-east.pkg.julialang.org" matched cert's "us-east.pkg.julialang.org"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5601c0e98e10)
> GET /meta HTTP/2
> Host: us-east.pkg.julialang.org
> user-agent: curl/7.68.0
> accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200 
< server: nginx/1.19.10
< date: Wed, 12 May 2021 15:38:47 GMT
< content-type: application/json
< content-length: 321
< 
* Connection #1 to host us-east.pkg.julialang.org left intact
{"start_time":"2021-05-11T21:15:30.409","pkgserver_version":"0.2.0-bd2b1e78605028ab840f8f5988f969a7f5a1beda","pkgserver_url":"https://us-east.pkg.julialang.org","julia_version":"1.6.1","registry_watchdog_task":"started","last_registry_update":"2021-05-12T15:38:46.435","registry_update_task":"started","maxrss":502829056}

@staticfloat
Contributor

It never downloads CUDA artifacts during an add, but when you do using Flux.

This actually does not seem to be true anymore. With Julia v1.6.1 and a fresh depot on cyclops, I see Flux automatically installing this artifact during precompilation. My guess is that this is happening due to the automatic precompilation in v1.6.

@lmiq can you test how long it takes to download this artifact on your machine? You can test by doing something like:

curl -L https://pkg.julialang.org/artifact/d90103bfc06085e90c69bccc59c894e1ea6262db -O
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  824M  100  824M    0     0  8218k      0  0:01:42  0:01:42 --:--:-- 8540k

On my machine, I get an average download speed of ~8MB/s, which means it takes a little under two minutes to download the full artifact.
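
The "little under two minutes" figure follows from the curl summary line: total size divided by average rate. As a rough check, here is a minimal sketch. The helper name `transfer_seconds` is made up for illustration, and it assumes curl's `M` and `k` suffixes denote MiB and KiB (1024-based).

```shell
#!/bin/sh
# Estimate how long a download takes: size (MiB) divided by average rate (KiB/s).
# transfer_seconds is a hypothetical helper, not part of curl or Pkg.
transfer_seconds() {
    size_mib=$1
    rate_kib_s=$2
    # Convert MiB to KiB, divide by the rate, and truncate to whole seconds.
    awk -v s="$size_mib" -v r="$rate_kib_s" 'BEGIN { printf "%d\n", (s * 1024) / r }'
}

# 824 MiB at 8218 KiB/s, the values from the curl output above.
transfer_seconds 824 8218   # prints 102, i.e. 1 min 42 s
```

The result matches the 0:01:42 that curl reports in its "Time Total" column.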

@maleadt
Collaborator

maleadt commented May 12, 2021

This actually does not seem to be true anymore.

Ah yes, Flux does bad things. I've created #1597 to track this.

@lmiq

lmiq commented May 12, 2021

My connection is not as good but, for example, this download ran at about 0.8Mb/s:

curl -L https://s3.amazonaws.com/nist-srd/SD19/by_class.zip -O
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  983M  100  983M    0     0   791k      0  0:21:13  0:21:13 --:--:-- 3305k

while the artifact is still downloading:

curl -L https://pkg.julialang.org/artifact/d90103bfc06085e90c69bccc59c894e1ea6262db -O
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 66  824M   66  550M    0     0   370k      0  0:37:59  0:25:21  0:12:38 1245k

which is, for the moment, ~0.25Mb/s on average, if I computed it correctly.

Neither is very good, but fetching the file from the Julia server is where we suffer most.
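
To make the two transfers directly comparable, the average rate can be recomputed from the amount received and the elapsed time in each curl summary. This is only a sketch: `avg_rate_kib_s` is a hypothetical helper name, and it again assumes curl's 1024-based `M` and `k` units.

```shell
#!/bin/sh
# Average rate in KiB/s from the amount received (MiB) and elapsed time (h m s).
# avg_rate_kib_s is a hypothetical helper name used only for this sketch.
avg_rate_kib_s() {
    awk -v mib="$1" -v h="$2" -v m="$3" -v s="$4" \
        'BEGIN { printf "%.0f\n", (mib * 1024) / (h * 3600 + m * 60 + s) }'
}

avg_rate_kib_s 983 0 21 13   # NIST archive, complete: prints 791
avg_rate_kib_s 550 0 25 21   # Julia artifact, 66% through: prints 370
```

Both values agree with the "Average Speed Dload" column curl printed (791k and 370k). At the moment of the snapshot, 370 KiB/s is about 0.36 MiB/s, so the artifact download was running at a bit under half the rate of the S3 download.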

@staticfloat
Contributor

Hmmm, I can download from the same server at much higher speeds, so I'm afraid this is most likely an issue due to your internet connection. Any way you cut it, downloading an 800MB file at <1MB/s is going to take a long time, and many things can go wrong during a download that takes >10 minutes. Not sure there's much we can do to improve this right now.

@lmiq

lmiq commented May 12, 2021

Thanks, I completely understand. I will check if there is a Julia server available nearby (at one of the Brazilian universities) from which I can fetch things at higher speed. If not, perhaps I will think about building one.

So even the precompilation time was associated with that, as pointed out by @maleadt?

Should I/we/someone open a general issue asking for downloads to be decoupled from precompilation, or at least for a progress bar to be shown?

@staticfloat
Contributor

staticfloat commented May 12, 2021 via email

@lmiq

lmiq commented May 12, 2021

I could easily give you access to a machine with those characteristics, and that would be fine if the traffic is not too high to/from it. But for a more persistent and supported solution I need to talk to the university staff. But it may be possible.

Another university here that traditionally hosts repositories is UFPR. I will try to link one colleague from there: @adolfont

Adolfo, how are you? The UFPR has a long tradition of hosting repositories for Linux distributions. Do you think it would be easy to host a repository for Julia there?

@staticfloat
Contributor

@lmiq I actually ended up being able to set up an EC2 instance in Sao Paulo today; so in the short term, we won't need to partner with a university (although they are of course welcome to set up a pkg server for their own internal use; it's quite easy).

Can you try some Pkg operations again now and see if things are any faster? You can check that the Pkg server you're connecting to is the new South American one by looking at the /meta endpoint of pkg.julialang.org:

$ curl -L https://pkg.julialang.org/meta | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   316  100   316    0     0   3807      0 --:--:-- --:--:-- --:--:--  3807
{
  "start_time": "2021-05-13T21:08:07.092",
  "pkgserver_version": "0.2.0-2cfd18ead7e9c1219d9efc3f071755d00c3538a2",
  "pkgserver_url": "https://sa.pkg.julialang.org",
  "julia_version": "1.6.1",
  "registry_watchdog_task": "started",
  "last_registry_update": "2021-05-13T21:37:12.111",
  "registry_update_task": "started",
  "maxrss": 373772288
}
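
If jq is not installed, the same check can be approximated with standard tools. This is a sketch only: `pkgserver_url` is a hypothetical helper that does a plain text match on the `pkgserver_url` field (it is not a JSON parser), and the sample input is a shortened copy of the response above.

```shell
#!/bin/sh
# Print the pkgserver_url field from a /meta response read on stdin.
# Handles both the compact and the jq-formatted JSON shown in this thread.
# pkgserver_url is a hypothetical helper: a text match, not a JSON parser.
pkgserver_url() {
    grep -o '"pkgserver_url": *"[^"]*"' |
        sed -E 's/^"pkgserver_url": *"([^"]*)"$/\1/'
}

# Offline sample taken from the response above; live use would be:
#   curl -sL https://pkg.julialang.org/meta | pkgserver_url
printf '%s\n' '{"pkgserver_url": "https://sa.pkg.julialang.org", "julia_version": "1.6.1"}' | pkgserver_url
```

Seeing `https://sa.pkg.julialang.org` here rather than `https://us-east.pkg.julialang.org` indicates that requests are being routed to the new server.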

@lmiq

lmiq commented May 14, 2021

That download is definitely (much) faster:

curl -L https://pkg.julialang.org/meta | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   316  100   316    0     0    218      0  0:00:01  0:00:01 --:--:--   608
{
  "start_time": "2021-05-13T21:08:07.092",
  "pkgserver_version": "0.2.0-2cfd18ead7e9c1219d9efc3f071755d00c3538a2",
  "pkgserver_url": "https://sa.pkg.julialang.org",
  "julia_version": "1.6.1",
  "registry_watchdog_task": "started",
  "last_registry_update": "2021-05-14T00:03:52.027",
  "registry_update_task": "started",
  "maxrss": 403759104
}
leandro@pitico:~/Downloads% curl -L https://pkg.julialang.org/artifact/d90103bfc06085e90c69bccc59c894e1ea6262db -O
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  824M  100  824M    0     0  11.8M      0  0:01:09  0:01:09 --:--:-- 11.1M

Package operations appear to be faster now (although it is night now; I will try again tomorrow during working hours).

edit: Daytime, and still very, very fast! That significantly changes the user experience here. Thank you! @staticfloat

curl -L https://pkg.julialang.org/artifact/d90103bfc06085e90c69bccc59c894e1ea6262db -O
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  824M  100  824M    0     0  11.1M      0  0:01:13  0:01:13 --:--:-- 13.3M

@adolfont

I could easily give you access to a machine with those characteristics, and that would be fine if the traffic is not too high to/from it. But for a more persistent and supported solution I need to talk to the university staff. But it may be possible.

Another university here that traditionally hosts repositories is UFPR. I will try to link one colleague from there: @adolfont

Adolfo, how are you? The UFPR has a long tradition of hosting repositories for Linux distributions. Do you think it would be easy to host a repository for Julia there?

Hi, @lmiq. I actually work at UTFPR Curitiba but I have friends at UFPR Curitiba.

I understand part of the problem was already solved ("in the short term, we won't need to partner with a university", @staticfloat), but that we/they could still set up a pkg server for our/their own internal use. I am not sure there are enough people using Julia here to justify that. In case there are, I will point them here.

@lmiq

lmiq commented May 14, 2021

@adolfont Thank you Adolfo! Apparently that is solved indeed. We'll keep one eye on those other possibilities if they become useful.

Grande abraço!

@adolfont

@abelsiqueira works at UFPR and has several Julia repos https://github.com/abelsiqueira?tab=repositories

@abelsiqueira

Unfortunately, I'm not involved with the mirrors, they are maintained by a research group: https://www.c3sl.ufpr.br/espelhos/
If the interest arises again, I can help ask them if they're willing to do it.

@lmiq

lmiq commented May 17, 2021

Thank you @abelsiqueira. I think that for the moment a good solution was found by establishing a server in São Paulo. If Julia becomes widely used here (and hopefully it will :-) ), it may be interesting to have a server within the university network.
