
Processing fst text lines #1055

Open
armusc opened this issue Sep 11, 2022 · 16 comments
@armusc (Contributor) commented Sep 11, 2022

Hi,

I didn't have this problem before, when I installed k2 with conda. I recently cloned and compiled directly from source, and now I get this error when reading an FST (created by kaldilm):

k2/build_release_cpu_torch_cpu/k2/csrc/fsa_utils.cc:295:void k2::OpenFstStreamReader::ProcessLine(std::string&) Invalid line: 5 0 4 99458 0, eof=true, fail=true, src_state=5, dest_state=0

It looks to me like the absence of a cost field on the line causes this failure (i.e. fail=true). If I add a 0.0 as a fifth field, the error does not happen.

Any suggestions?
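For reference, OpenFST's text format treats the trailing weight field as optional (an absent weight means 0.0), so a reader can accept both four- and five-field arc lines. A minimal sketch of such tolerant parsing (a hypothetical helper for illustration, not the actual k2 reader):

```python
def parse_arc_line(line):
    """Parse an OpenFST text-format arc line:

        src dest ilabel olabel [weight]

    The weight field is optional and defaults to 0.0 when absent.
    """
    fields = line.split()
    if len(fields) not in (4, 5):
        raise ValueError(f"Invalid line: {line!r}")
    src, dest, ilabel, olabel = map(int, fields[:4])
    weight = float(fields[4]) if len(fields) == 5 else 0.0
    return src, dest, ilabel, olabel, weight

# The four-field line from the error message above, weight defaulting to 0.0:
arc = parse_arc_line("5 0 4 99458")
```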

@csukuangfj (Collaborator)

Are you using the latest master?

@armusc (Contributor, Author) commented Sep 12, 2022

Right, after a merge it worked. Thanks.

By the way, this happened when trying to save an ARPA LM for use in LM rescoring. The unpruned ARPA is 6 GB, and it causes a segmentation fault when saving with torch:

torch.save(G.as_dict(), f"{args.lm_dir}/G_4_gram_asdict.new.pt")

If I prune it down to 1 GB, I have no issues. I understand it's not k2's fault, but maybe you are aware of this issue? Is this expected with LMs of that size?

@danpovey (Collaborator)

Can you show some debug info for the segmentation fault?

@armusc (Contributor, Author) commented Sep 12, 2022

Contrary to what I said, the segmentation fault is caused by the call to

G.as_dict()

rather than by torch.save.

I'm not sure if it helps, but I ran the python script with gdb:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
__memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:500
500 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.

@csukuangfj (Collaborator)

gdb --args python /path/to/xxx.py
(gdb) catch throw
(gdb) run
# When it segfaults
(gdb) backtrace

Please show the backtrace.

@armusc (Contributor, Author) commented Sep 13, 2022

#0 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:500
#1 0x00007fff92b8cfcf in k2::Array1 k2::Cat(std::shared_ptr&lt;k2::Context&gt;, int, k2::Array1 const**) ()
from /home/amuscariello/mediaspeech/k2/build_release_cpu_torch_cpu/lib/libk2context.so
#2 0x00007fff92b85471 in k2::FsaVecToTensor(k2::Ragged&lt;k2::Arc&gt; const&amp;) () from /home/amuscariello/mediaspeech/k2/build_release_cpu_torch_cpu/lib/libk2context.so
#3 0x00007fff92ea21bb in ?? () from /home/amuscariello/mediaspeech/k2/build_debug_cpu_torch_cpu/lib/_k2.cpython-38-x86_64-linux-gnu.so
#4 0x00007fff92ec6d85 in ?? () from /home/amuscariello/mediaspeech/k2/build_debug_cpu_torch_cpu/lib/_k2.cpython-38-x86_64-linux-gnu.so
#5 0x000055555568ff8e in cfunction_call_varargs (kwargs=0x0, args=0x7ffff7963400, func=0x7fff92f48590) at /usr/local/src/conda/python-3.8.13/Objects/call.c:743
#6 PyCFunction_Call (func=0x7fff92f48590, args=0x7ffff7963400, kwargs=0x0) at /usr/local/src/conda/python-3.8.13/Objects/call.c:773
#7 0x0000555555678651 in _PyObject_MakeTpCall (callable=0x7fff92f48590, args=, nargs=, keywords=)
at /usr/local/src/conda/python-3.8.13/Python/errors.c:219
#8 0x0000555555674471 in _PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x7fff8b097fd8, callable=0x7fff92f48590)
at /usr/local/src/conda/python-3.8.13/Include/cpython/abstract.h:125
#9 _PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x7fff8b097fd8, callable=0x7fff92f48590)
at /usr/local/src/conda/python-3.8.13/Include/cpython/abstract.h:115
#10 call_function (kwnames=0x0, oparg=, pp_stack=, tstate=0x5555558e64a0) at /usr/local/src/conda/python-3.8.13/Python/ceval.c:4963
#11 _PyEval_EvalFrameDefault (f=, throwflag=) at /usr/local/src/conda/python-3.8.13/Python/ceval.c:3469
#12 0x000055555568f886 in PyEval_EvalFrameEx (throwflag=0, f=0x7fff8b097e40) at /usr/local/src/conda/python-3.8.13/Python/ceval.c:738
#13 function_code_fastcall (globals=, nargs=, args=, co=) at /usr/local/src/conda/python-3.8.13/Objects/call.c:284
#14 _PyFunction_Vectorcall (kwnames=, nargsf=, stack=0x555557ca93b8, func=0x7fff8e377b80) at /usr/local/src/conda/python-3.8.13/Objects/call.c:411
#15 _PyObject_Vectorcall (kwnames=, nargsf=, args=0x555557ca93b8, callable=0x7fff8e377b80)
at /usr/local/src/conda/python-3.8.13/Include/cpython/abstract.h:127
#16 method_vectorcall (method=, args=0x555557ca93c0, nargsf=, kwnames=)
at /usr/local/src/conda/python-3.8.13/Objects/classobject.c:60

does that help?

@csukuangfj (Collaborator)

does that help?

Thanks!

Could you build a debug version of k2 and show the output of

(gdb) frame 1
(gdb) list

@danpovey (Collaborator)

It calls Cat on 4 arrays, including the arcs linearized so that each arc is 4 int32_t's. The size of that could definitely overflow int32_t if the number of arcs were more than about 2**(32 - 3) [-1 because it's signed, -2 because of the factor of 4].
I can't see an easy way to fix that without breaking older formats or introducing redundant formats.
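To put numbers on that overflow (a back-of-the-envelope sketch, not k2 code): the product elem_size * this_dim inside Cat is a byte count computed in 32-bit arithmetic, where elem_size is sizeof(int32_t) == 4 bytes, so it exceeds INT32_MAX once a single array holds about 2**29 int32 elements.

```python
# Rough check of the overflow threshold; the names mirror the variables
# discussed in the thread, the values are mine.
INT32_MAX = 2**31 - 1
elem_size = 4  # bytes per int32_t element

# Largest element count whose byte size still fits in int32_t.
max_safe_dim = INT32_MAX // elem_size  # 536870911, i.e. 2**29 - 1

# One more element and the 32-bit byte count would wrap.
overflows = (max_safe_dim + 1) * elem_size > INT32_MAX
```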

@danpovey (Collaborator)

... I do see a problem though, at array_ops_inl.h:349:

int32_t elem_size = src[0]->ElementSize();

This should be int64_t, so that when we multiply by the size it doesn't overflow.

@armusc (Contributor, Author) commented Sep 14, 2022

(gdb) frame 1
#1 0x00007fff928de83d in k2::Cat (c=..., num_arrays=4, src=0x7fffffffd020) at /home/amuscariello/mediaspeech/k2/k2/csrc/array_ops_inl.h:353
353 memcpy(static_cast<void *>(ans_data),
(gdb) list
348 // CPU.
349 int32_t elem_size = src[0]->ElementSize();
350 for (int32_t i = 0; i < num_arrays; ++i) {
351 int32_t this_dim = src[i]->Dim();
352 const T *this_src_data = src[i]->Data();
353 memcpy(static_cast<void *>(ans_data),
354 static_cast<const void *>(this_src_data), elem_size * this_dim);
355 ans_data += this_dim;
356 }
357 } else {
(gdb)

@armusc (Contributor, Author) commented Sep 14, 2022

Replacing int32_t with int64_t has indeed solved the problem in my case (6 GB 4-gram FST).

@csukuangfj (Collaborator)

(gdb) print elem_size
(gdb) print this_dim
(gdb) print elem_size * this_dim

to see whether elem_size * this_dim overflows.

@armusc (Contributor, Author) commented Sep 14, 2022

So anything bigger than 4 GB would fail? A 4-gram LM of that size can probably be pruned, but I have seen big HLGs.

@jtrmal (Contributor) commented Sep 14, 2022 via email

@armusc (Contributor, Author) commented Sep 14, 2022

(gdb) print elem_size
(gdb) print this_dim
(gdb) print elem_size * this_dim

to see whether elem_size * this_dim overflows.

(gdb) print elem_size
$2 = 4
(gdb) print this_dim
$3 = 771885848
(gdb) print elem_size * this_dim
$4 = -1207423904
(gdb)
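Those values confirm a 32-bit overflow: reproducing the multiplication with int32_t wraparound semantics in plain Python (values copied from the gdb session above) yields the same negative byte count, while the full-width product, as the proposed int64_t fix would compute it, stays positive.

```python
def as_int32(x):
    # Emulate C int32_t two's-complement wraparound.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

elem_size = 4          # gdb: $2
this_dim = 771885848   # gdb: $3

wrapped = as_int32(elem_size * this_dim)  # -1207423904, matching gdb's $4
full = elem_size * this_dim               # 3087543392 bytes (~2.9 GB) in 64-bit
```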

@danpovey (Collaborator)

@armusc can you please make a PR?
