Intermittent failures in bitarray tests #9176

Closed
tkelman opened this issue Nov 27, 2014 · 11 comments
Labels
test This change adds or pertains to unit tests
Comments

@tkelman
Contributor

tkelman commented Nov 27, 2014

This seems to have started within the last week, and it is hitting a lot of unrelated PRs, merge commits, etc. It appears to be Linux-only. Can anyone reproduce this locally?

https://travis-ci.org/JuliaLang/julia/jobs/41787654 (failure in examples, maybe different)
https://travis-ci.org/JuliaLang/julia/jobs/41932980
https://travis-ci.org/JuliaLang/julia/jobs/42137195
https://travis-ci.org/JuliaLang/julia/jobs/42277643
https://travis-ci.org/JuliaLang/julia/jobs/42121382
https://travis-ci.org/JuliaLang/julia/jobs/41944568
https://travis-ci.org/JuliaLang/julia/jobs/41942488
https://travis-ci.org/JuliaLang/julia/jobs/41764444 (failure in subarray, maybe different)
https://travis-ci.org/JuliaLang/julia/jobs/41756704 (failure in linalg3, maybe different)
https://travis-ci.org/JuliaLang/julia/jobs/41743584 (failure in linalg3, maybe different)
https://travis-ci.org/JuliaLang/julia/jobs/41726550 (failure in linalg3, maybe different)
https://travis-ci.org/JuliaLang/julia/jobs/41682731 (failure in readdlm, maybe different)
https://travis-ci.org/JuliaLang/julia/jobs/41489707 (failure in reflection, maybe different)
https://travis-ci.org/JuliaLang/julia/jobs/42198614
https://travis-ci.org/JuliaLang/julia/jobs/42123010
https://travis-ci.org/JuliaLang/julia/jobs/42054331
https://travis-ci.org/JuliaLang/julia/jobs/42053919
https://travis-ci.org/JuliaLang/julia/jobs/41876291
https://travis-ci.org/JuliaLang/julia/jobs/41737821 (failure in linalg1, maybe different)
https://travis-ci.org/JuliaLang/julia/jobs/41764920 (failure in arrayops, maybe different)
https://travis-ci.org/JuliaLang/julia/jobs/41808489 (failures in linalg3 and 4, maybe different)
https://travis-ci.org/JuliaLang/julia/jobs/42415835
https://travis-ci.org/JuliaLang/julia/jobs/42441603
https://travis-ci.org/JuliaLang/julia/jobs/42706154
https://travis-ci.org/JuliaLang/julia/jobs/43031974

@tkelman tkelman added the "system:linux" (Affects only Linux) and "test" (This change adds or pertains to unit tests) labels Nov 27, 2014
@timholy
Member

timholy commented Dec 1, 2014

I don't know if this is related, but as part of Travis-testing ImageView, I've been getting a crazy-large number of segfaults upon building TestImages, both from nightly and release. Likewise, it's intermittent, and likewise, I can't reproduce the failure locally.

Builds are at https://travis-ci.org/timholy/ImageView.jl/builds, specifically 61-64. Curiously, when I stopped relying on the "test/REQUIRE" file and instead inserted a specific Pkg.add in .travis.yml (build 65), I didn't get a segfault. Whether this is a coincidence or something systematic, I suppose time will tell.

@tkelman
Contributor Author

tkelman commented Feb 21, 2015

Here is a bitarray failure on win64 AppVeyor that may or may not be related. It looks a bit different from the other failures I've seen: https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.2703/job/0p4en8aw6o9wi7ss

@tkelman
Contributor Author

tkelman commented Apr 18, 2015

Really odd bitarray failure here https://travis-ci.org/JuliaLang/julia/jobs/59006825:

signal (11): Segmentation fault
type_eqv__ at /home/travis/build/JuliaLang/julia/src/jltypes.c:1525
typekey_compare at /home/travis/build/JuliaLang/julia/src/jltypes.c:1735
lookup_type_idx at /home/travis/build/JuliaLang/julia/src/jltypes.c:1761
lookup_type at /home/travis/build/JuliaLang/julia/src/jltypes.c:1773
inst_datatype at /home/travis/build/JuliaLang/julia/src/jltypes.c:1874
jl_inst_concrete_tupletype at /home/travis/build/JuliaLang/julia/src/jltypes.c:1992
jl_f_tuple at /home/travis/build/JuliaLang/julia/src/builtins.c:623
getindex at ./array.jl:165
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_f_apply at /home/travis/build/JuliaLang/julia/src/builtins.c:470
_methods at ./reflection.jl:100
unknown function (ip: -1240917776)
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
abstract_call_gf at ./inference.jl:629
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
abstract_call at ./inference.jl:800
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
abstract_eval_call at ./inference.jl:900
abstract_eval at ./inference.jl:927
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
abstract_interpret at ./inference.jl:1076
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
typeinf_uncached at ./inference.jl:1514
unknown function (ip: -1240830978)
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
typeinf at ./inference.jl:1303
unknown function (ip: -1240827567)
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
typeinf_ext at ./inference.jl:1247
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_type_infer at /home/travis/build/JuliaLang/julia/src/gf.c:412
cache_method at /home/travis/build/JuliaLang/julia/src/gf.c:906
jl_mt_assoc_by_type at /home/travis/build/JuliaLang/julia/src/gf.c:1051
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1663
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_type_infer at /home/travis/build/JuliaLang/julia/src/gf.c:412
cache_method at /home/travis/build/JuliaLang/julia/src/gf.c:906
jl_mt_assoc_by_type at /home/travis/build/JuliaLang/julia/src/gf.c:1051
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1663
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_f_apply at /home/travis/build/JuliaLang/julia/src/builtins.c:470
cat at bitarray.jl:1860
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_trampoline at /home/travis/build/JuliaLang/julia/src/builtins.c:1009
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1679
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_f_apply at /home/travis/build/JuliaLang/julia/src/builtins.c:470
check_bitop at bitarray.jl:7
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
do_call at /home/travis/build/JuliaLang/julia/src/interpreter.c:63
eval at /home/travis/build/JuliaLang/julia/src/interpreter.c:210
jl_interpret_toplevel_expr at /home/travis/build/JuliaLang/julia/src/interpreter.c:26
jl_toplevel_eval_flex at /home/travis/build/JuliaLang/julia/src/toplevel.c:504
jl_parse_eval_all at /home/travis/build/JuliaLang/julia/src/toplevel.c:552
jl_load at /home/travis/build/JuliaLang/julia/src/toplevel.c:591
jl_load_ at /home/travis/build/JuliaLang/julia/src/toplevel.c:600
runtests at /tmp/julia/share/julia/test/testdefs.jl:77
jlcall_runtests_66639 at  (unknown line)
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_f_apply at /home/travis/build/JuliaLang/julia/src/builtins.c:470
anonymous at multi.jl:836
run_work_thunk at multi.jl:587
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
jl_apply_generic at /home/travis/build/JuliaLang/julia/src/gf.c:1655
anonymous at task.jl:836
jl_apply at /home/travis/build/JuliaLang/julia/src/julia.h:1281
start_task at /home/travis/build/JuliaLang/julia/src/task.c:232
    From worker 9:       * bitarray            exception on 2: Worker 9 terminated.
ERROR (unhandled task failure): EOFError: read end of file
 in wait at ./task.jl:305
 in wait at ./task.jl:221
 in wait_full at ./multi.jl:570
 in remotecall_fetch at multi.jl:670
 in remotecall_fetch at multi.jl:675
 in anonymous at task.jl:1386
ERROR: LoadError: test error in expression: fetch(@spawnat id_other myid()) == id_other
ProcessExitedException()
 in wait at ./task.jl:305
 in wait at ./task.jl:221
 in wait_full at ./multi.jl:570
 in remotecall_fetch at multi.jl:670
 in call_on_owner at ./multi.jl:717
 in fetch at multi.jl:725
 in anonymous at test.jl:87
 in do_test at test.jl:47
 in runtests at /tmp/julia/share/julia/test/testdefs.jl:77
 in anonymous at multi.jl:836
 in run_work_thunk at multi.jl:587
 in anonymous at task.jl:836
while loading parallel.jl, in expression starting on line 11
    From worker 7:       * examples             in 146.69 seconds
    From worker 8:       * subarray             in 453.01 seconds
    From worker 4:       * ranges               in 372.09 seconds
ERROR: LoadError: ProcessExitedException()
 in anonymous at task.jl:1388
while loading /tmp/julia/share/julia/test/runtests.jl, in expression starting on line 3
    From worker 2:       * parallel   

And a possibly related subarray failure here: https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.3895/job/cbasxe5fj9408wcn

    From worker 3:       * sparse               in  43.76 seconds

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x68cc4203 -- unknown function (ip: 1758216707)
unknown function (ip: 1758216707)
unknown function (ip: 1758216707)
jl_inst_concrete_tupletype at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_f_tuple at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1659055879)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_f_apply at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1659063123)
unknown function (ip: 1659063667)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1658584129)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1659069203)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1659085863)
unknown function (ip: 1659087539)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1658535338)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1659123226)
unknown function (ip: 1659133413)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1659134619)
unknown function (ip: 1659136457)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1659136632)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_method_cache_insert at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_method_cache_insert at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_method_cache_insert at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_method_cache_insert at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_get_specialization at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_compile at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_trampoline at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
anonymous at test.jl:87
do_test at test.jl:47
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_interpret_toplevel_expr at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_interpret_toplevel_thunk_with at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_eval_with_compiler_p at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_parse_eval_all at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_ at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
runtests at C:\projects\julia\test\testdefs.jl:77
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
anonymous at task.jl:11
jl_f_apply at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
anonymous at multi.jl:836
run_work_thunk at multi.jl:587
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
anonymous at task.jl:836
jl_unprotect_stack at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 4203385)
Worker 2 terminated.    From worker 2:       * subarray             
ERROR (unhandled task failure): EOFError: read end of file
 in remotecall_fetch at multi.jl:670
 in remotecall_fetch at multi.jl:675
 in anonymous at task.jl:1386
    From worker 3:       * bitarray             in  57.47 seconds

@JeffBezanson
Member

Travis seems to be pretty stable now.

@tkelman
Contributor Author

tkelman commented Apr 25, 2015

Dunno, this one has been pretty nasty and long-standing. What do you think would have fixed this?

@tkelman
Contributor Author

tkelman commented Apr 27, 2015

potentially related? https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.4189/job/khj3vl1bnlq21lxf

    From worker 3:       * linalg/lapack        in   1.35 seconds

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x68cc16ab -- unknown function (ip: 1758205611)
unknown function (ip: 1758205611)
unknown function (ip: 1758205611)
jl_f_isa at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1659056467)
unknown function (ip: 1659073461)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1659074654)
unknown function (ip: 1659076441)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 1659076616)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_method_cache_insert at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_method_cache_insert at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_method_cache_insert at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_method_cache_insert at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_get_specialization at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_compile at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_get_specialization at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_and_lookup at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_compile at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_trampoline at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
anonymous at no file:62
jl_eval_with_compiler_p at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_parse_eval_all at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_load_ at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
runtests at C:\projects\julia\test\testdefs.jl:77
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
jl_f_apply at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
anonymous at multi.jl:836
run_work_thunk at multi.jl:587
jl_apply_generic at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
anonymous at task.jl:836
jl_unprotect_stack at C:\projects\julia\usr\bin\libjulia.dll (unknown line)
unknown function (ip: 4203385)
Worker 2 terminated.    From worker 2:       * linalg1              
ERROR (unhandled task failure): EOFError: read end of file
 in yieldto at task.jl:19
 in remotecall_fetch at multi.jl:670
 in remotecall_fetch at multi.jl:675
 in anonymous at task.jl:1390
    From worker 3:       * linalg/triangular    in  62.42 seconds
ERROR: LoadError: ProcessExitedException()

@tkelman
Contributor Author

tkelman commented Apr 27, 2015

@tkelman tkelman reopened this Apr 27, 2015
@tkelman tkelman changed the title from "Intermittent failures in parallel or bitarray tests on Travis" to "Intermittent failures in parallel or bitarray tests" Apr 27, 2015
@tkelman tkelman removed the "system:linux" (Affects only Linux) label Apr 27, 2015
@tkelman tkelman changed the title from "Intermittent failures in parallel or bitarray tests" to "Intermittent failures in bitarray tests" Apr 27, 2015
@carnaval
Contributor

carnaval commented May 6, 2015

Hey, I may have fixed this. The missing gc root from b61f46f also triggered the bug in bitarray, and the timeline roughly corresponds to the introduction of staged functions (we could probably be more definitive about this by looking at the precise history, which I have not done).

What about trying a wishful closing again?

@tkelman
Contributor Author

tkelman commented May 6, 2015

Sounds good to me, that's not merged yet but presumably will be once CI's green.

I thought staged functions were merged well before November?

@carnaval
Contributor

carnaval commented May 6, 2015

Yep, but this only happens if something in the bitarray test ends up calling a staged function for the first time with this particular arg tuple, and only when the right kind of memory pressure is on the gc, of course. What I meant is that if the failures started before the introduction of staged functions, it would mean that another bug was hiding here.

Still, no proof of anything, so we'll have to wait and see.
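
To illustrate why a missing gc root would only fail intermittently, here is a toy sketch in Python. It is not Julia's collector and not the code changed in b61f46f; the names Heap and build_pair are hypothetical, and the model is a bare mark-and-sweep heap. It shows that an unrooted temporary survives unless a collection happens to run between its allocation and the moment it is stored in a rooted object.

```python
# Toy mark-and-sweep heap. Hypothetical illustration only, not Julia's GC.
class Heap:
    def __init__(self):
        self.objects = {}   # object id -> list of ids it references
        self.roots = set()  # ids the collector treats as reachable
        self.next_id = 0

    def alloc(self, refs=()):
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = list(refs)
        return oid

    def collect(self):
        # Mark everything reachable from the roots.
        live = set()
        stack = list(self.roots)
        while stack:
            oid = stack.pop()
            if oid in live:
                continue
            live.add(oid)
            stack.extend(self.objects[oid])
        # Sweep everything else.
        for oid in list(self.objects):
            if oid not in live:
                del self.objects[oid]


def build_pair(heap, root_temporary, gc_during_build):
    """Allocate a temporary, then a rooted object that points at it.

    Returns (pair_id, temporary_still_alive).
    """
    first = heap.alloc()
    if root_temporary:
        heap.roots.add(first)      # the root that was missing in the buggy case
    if gc_during_build:
        heap.collect()             # memory pressure triggers a collection here
    pair = heap.alloc(refs=[first])
    heap.roots.add(pair)
    heap.roots.discard(first)      # temporary is now reachable through pair
    return pair, first in heap.objects


if __name__ == "__main__":
    for rooted in (False, True):
        for gc in (False, True):
            _, alive = build_pair(Heap(), rooted, gc)
            print(f"rooted={rooted} gc_during_build={gc} temporary_alive={alive}")
```

In this model, only the combination of an unrooted temporary and a collection at the wrong moment leaves the new object pointing at freed memory, which matches the "right kind of memory pressure" condition described above.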

@tkelman tkelman closed this as completed May 13, 2015
@timholy
Member

timholy commented May 13, 2015

🍰
