Segfault in OCaml gc #103

Open
nathanfarlow opened this issue Oct 6, 2024 · 3 comments

Comments

@nathanfarlow

The following code segfaults on my system.

Python 3.11.4
OCaml 5.1.1
pyml 20231101

let () =
  Py.initialize ();
  let m =
    Py.Import.exec_code_module_from_string
      ~name:"go.py"
      "import numpy as np\na = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)"
  in
  let a = Py.Module.get m "a" |> Numpy.to_bigarray Float32 C_layout in
  Owl.Dense.Ndarray.S.print a;
  Gc.full_major ()
;;

It dies in the OCaml garbage collector:

(gdb) bt
#0  0x0000555555f84058 in custom_finalize_minor (domain=0x55555628bab0) at runtime/minor_gc.c:694
#1  caml_stw_empty_minor_heap_no_major_slice (domain=0x55555628bab0, participating_count=1, participating=<optimized out>, 
    unused=<optimized out>) at runtime/minor_gc.c:743
#2  0x0000555555f6fb19 in caml_try_run_on_all_domains_with_spin_work (sync=sync@entry=1, 
    handler=handler@entry=0x555555f840a0 <caml_stw_empty_minor_heap>, data=data@entry=0x0, 
    leader_setup=leader_setup@entry=0x555555f82e40 <caml_empty_minor_heap_setup>, 
    enter_spin_callback=enter_spin_callback@entry=0x555555f82ff0 <caml_do_opportunistic_major_slice>, enter_spin_data=enter_spin_data@entry=0x0)
    at runtime/domain.c:1483
#3  0x0000555555f841b2 in caml_try_stw_empty_minor_heap_on_all_domains () at runtime/minor_gc.c:799
#4  caml_empty_minor_heaps_once () at runtime/minor_gc.c:820
#5  0x0000555555f76058 in gc_full_major_exn () at runtime/gc_ctrl.c:269
#6  0x0000555555f76b0b in caml_gc_full_major (v=<optimized out>) at runtime/gc_ctrl.c:283
#7  <signal handler called>
#8  0x00005555557e96d0 in camlDune__exe__Main.entry () at bin/main.ml:15
#9  0x00005555557db0db in caml_program ()
#10 <signal handler called>
#11 0x0000555555f8ecfd in caml_startup_common (pooling=<optimized out>, argv=0x7fffffffd6a8) at runtime/startup_nat.c:132
#12 caml_startup_common (argv=0x7fffffffd6a8, pooling=<optimized out>) at runtime/startup_nat.c:88
#13 0x0000555555f8ed6f in caml_startup_exn (argv=<optimized out>) at runtime/startup_nat.c:139
#14 caml_startup (argv=<optimized out>) at runtime/startup_nat.c:144
#15 caml_main (argv=<optimized out>) at runtime/startup_nat.c:151
#16 0x00005555557da642 in main (argc=<optimized out>, argv=<optimized out>) at runtime/main.c:37

It might be related to this issue given that both code examples use Numpy.to_bigarray.

@nathanfarlow
Author

The problem is that numpy_finalize is called more than once with the same ops pointer, so the same resources get freed repeatedly. For example,

open! Core

let () =
  Py.initialize ();
  let m =
    Py.Import.exec_code_module_from_string
      ~name:"go.py"
      "import numpy as np\na = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)"
  in
  let big_array = Py.Module.get m "a" |> Numpy.to_bigarray Float32 C_layout in
  let arr = List.init 10 ~f:(Fn.const big_array) in
  List.iter arr ~f:Owl.Dense.Ndarray.S.print;
  Gc.full_major ()
;;

will cause numpy_finalize to be called 11 times, each with a different v, but Custom_ops_val(v) is the same across calls.

@nathanfarlow
Author

Aha, in bigarray.c, certain operations copy the Custom_ops_val from the original array into the new one. One example is caml_ba_slice.

#98 mentions that slicing is bugged, which makes sense.
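
For anyone following along, here is a minimal sketch of the sharing that slicing introduces, using only the standard Bigarray module. It assumes a plain OCaml-allocated array in place of the NumPy-backed one, so it does not crash. It only shows that a slice is a separate bigarray value over the same data; the copied custom ops pointer is not observable from OCaml code.

```ocaml
open Bigarray

(* Assumption: an OCaml-allocated 2x3 float32 array stands in for the
   NumPy-backed array from the report. *)
let parent = Array2.create float32 c_layout 2 3
let () = Array2.fill parent 0.0

(* slice_left builds a new bigarray value for row 1 without copying the
   underlying data (to my knowledge this goes through caml_ba_slice). *)
let row = Array2.slice_left parent 1

(* A write through the slice is visible through the parent, so both
   values refer to the same buffer. *)
let () = row.{2} <- 42.0
let () = assert (parent.{1, 2} = 42.0)
```

Each such value is its own custom block with its own finalizer call, which would match the repeated numpy_finalize calls described above.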

@nathanfarlow
Author

I'm not sure how to fix this. It seems like we need to do some ref counting in the finalizer unless we know that the subarrays are dead when the original array is finalized. From skimming bigarray.c, I couldn't conclude that since the subarrays don't hold a reference to the original array. In the refcounting case, we could decrement in the finalizer, but I'm not sure where we'd increment the refcount without modifying bigarray.c.
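
To make the ref counting idea concrete, here is a rough model in plain OCaml. It is not pyml code: the `shared` record, `retain`, and `release` are hypothetical names, and the real change would have to live in the C stubs. The explicit `retain` calls are exactly the part I don't know how to hook in, since the copies are made inside bigarray.c.

```ocaml
(* Hypothetical model: [shared] stands in for whatever numpy_finalize
   releases. [releases] counts how often the real release would run. *)
type shared = { mutable refs : int; mutable releases : int }

(* Would have to be called whenever a bigarray over the same data is
   created (the original, plus every slice or sub-array). *)
let retain s = s.refs <- s.refs + 1

(* What the finalizer would do: decrement, and release the underlying
   resource only when the last bigarray goes away. *)
let release s =
  assert (s.refs > 0);
  s.refs <- s.refs - 1;
  if s.refs = 0 then s.releases <- s.releases + 1

let shared = { refs = 0; releases = 0 }

let () =
  (* 11 bigarray values, matching the 11 finalizer calls seen above. *)
  for _i = 1 to 11 do retain shared done;
  (* The GC finalizes each of them, in some order. *)
  for _i = 1 to 11 do release shared done
```

With balanced retain and release calls, the underlying release runs once instead of 11 times.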
