Processing fst text lines #1055
Comments
Are you using the latest master? |
Right, after a merge it worked. Thanks. By the way, this was when trying to save an ARPA LM for use in LM rescoring. If I prune it down to 1 GB, I have no issues. |
Can you show some debug info for the segmentation fault? |
Contrary to what I said, the segmentation fault is caused by the call to G.as_dict(). I'm not sure if it helps, but I ran the Python script with gdb: Thread 1 "python" received signal SIGSEGV, Segmentation fault. |
gdb --args python /path/to/xxx.py
(gdb) catch throw
(gdb) run
# When it segfaults
(gdb) backtrace
Please show the backtrace. |
#0 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:500
Does that help? |
Thanks! Could you build a debug version of k2 and show the information about
|
It calls Cat on 4 arrays, including the arcs linearized to where each arc is 4 int32_t's. The size of that could definitely overflow int32_t, if the number of arcs were more than about 2**(32 - 3) [-1 because it's signed, -2 because of the factor of 4]. |
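The arithmetic in the comment above can be checked with a short sketch. It assumes only what the comment states (4 int32 values per linearized arc, a signed 32-bit size); the helper names are hypothetical and are not part of k2.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold
INT32_PER_ARC = 4      # each arc is linearized to 4 int32_t values (per the comment above)


def linearized_arc_elements(num_arcs: int) -> int:
    """Number of int32 elements needed to store num_arcs linearized arcs."""
    return num_arcs * INT32_PER_ARC


def fits_in_int32(value: int) -> bool:
    """True if value can be stored in a signed 32-bit integer."""
    return -(2**31) <= value <= INT32_MAX


# Largest arc count whose linearized element count still fits in int32.
max_safe_arcs = INT32_MAX // INT32_PER_ARC

print(max_safe_arcs == 2**29 - 1)                         # threshold is just under 2**(32 - 3)
print(fits_in_int32(linearized_arc_elements(2**29 - 1)))  # still representable
print(fits_in_int32(linearized_arc_elements(2**29)))      # one more arc no longer fits
```

Python integers do not overflow, so the sketch only compares against the int32 range rather than reproducing the crash.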
.. I do see a problem though, at array_ops_in.h:349, |
(gdb) frame 1 |
Replacing int32_t with int64_t has indeed solved the problem in my case (6 GB 4-gram FST). |
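To illustrate why widening the type helps, here is a minimal sketch of a size product wrapping in 32 bits. The names elem_size and this_dim are borrowed from the gdb commands in this thread, but the values are made up, and ctypes is used only to mimic two's-complement truncation; this is not the k2 code.

```python
import ctypes


def as_int32(value: int) -> int:
    """Truncate value to a signed 32-bit integer (two's-complement wraparound)."""
    return ctypes.c_int32(value).value


def as_int64(value: int) -> int:
    """Truncate value to a signed 64-bit integer."""
    return ctypes.c_int64(value).value


# Hypothetical values, chosen only so that the product exceeds 2**31 - 1.
elem_size = 4           # bytes per element, e.g. sizeof(int32_t)
this_dim = 600_000_000  # number of elements in one array

num_bytes = elem_size * this_dim  # exact product (Python ints are unbounded)

print("exact:", num_bytes)
print("int32:", as_int32(num_bytes))  # wraps to a negative number
print("int64:", as_int64(num_bytes))  # matches the exact product
```

A negative byte count passed to a copy routine would be one plausible way to end up inside memmove with a bad length, though the thread does not confirm that this is the exact mechanism.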
(gdb) print elem_size
(gdb) print this_dim
(gdb) print elem_size * this_dim to see whether |
so anything bigger than 4GB would fail? |
Not sure if it's related, but I think the Kaldi arpa2fst code is smart
and uses a different data type depending on how big the LM is.
I could imagine this causing issues wherever the graph is
processed by another tool that doesn't know about this.
y.
…On Wed, Sep 14, 2022 at 12:37 PM armusc ***@***.***> wrote:
so anything bigger than 4GB would fail?
a 4gram LM of that size is probably something that can be pruned, but I
have seen big HLG
|
(gdb) print elem_size |
@armusc can you please make a PR?
Hi,
I didn't have this problem before, when I installed k2 with conda.
I have recently cloned and compiled k2 directly from source, and now I get this error when reading an FST (created by kaldilm):
k2/build_release_cpu_torch_cpu/k2/csrc/fsa_utils.cc:295:void k2::OpenFstStreamReader::ProcessLine(std::string&) Invalid line: 5 0 4
99458 0, eof=true, fail=true, src_state=5, dest_state=0
It looks to me like the absence of a cost field in the line causes this issue (i.e. fail=true).
If I add 0.0 as a 5th field, the error does not occur.
Any suggestions?
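For reference, in the OpenFst text format the weight on an arc line is optional, and a missing weight means the semiring's "one" (0.0 for tropical costs). Below is a minimal sketch of a reader that tolerates the missing field. The Arc type and parse_transducer_arc are hypothetical names for illustration, not k2's OpenFstStreamReader, and the input values are made up.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    src_state: int
    dest_state: int
    ilabel: int
    olabel: int
    cost: float


def parse_transducer_arc(line: str, default_cost: float = 0.0) -> Arc:
    """Parse 'src dest ilabel olabel [cost]'; cost falls back to default_cost."""
    fields = line.split()
    if len(fields) not in (4, 5):
        raise ValueError(f"Invalid arc line: {line!r}")
    src, dest, ilabel, olabel = (int(f) for f in fields[:4])
    # The 5th field is optional; treat its absence as zero cost.
    cost = float(fields[4]) if len(fields) == 5 else default_cost
    return Arc(src, dest, ilabel, olabel, cost)


# Illustrative lines: both forms parse to the same arc.
print(parse_transducer_arc("1 2 3 4"))
print(parse_transducer_arc("1 2 3 4 0.0"))
```

Final-state lines (a state with an optional weight) and acceptor arcs (a single label) have fewer fields and would need their own handling; the sketch covers transducer arcs only.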