Validate alignments - Segmentation fault with addAlignmentsToBatch #450

gregtatum · 2024-02-15T14:33:40Z

In the en-ca training (#384), I'm getting a crash in Marian while it adds the alignments to a batch:

Error: Segmentation fault
Error: Aborted from setErrorHandlers()::<lambda(int, siginfo_t*, void*)> in /builds/worker/fetches/marian-source/src/common/logging.cpp:130

marian::data::CorpusBase::addAlignmentsToBatch(std::shared_ptr<marian::data::CorpusBatch>, std::vector<marian::data::SentenceTuple,std::allocator<marian::data::SentenceTuple>> const&) + 0x438
marian::data::Corpus::toBatch(std::vector<marian::data::SentenceTuple,std::allocator<marian::data::SentenceTuple>> const&) + 0x1252
marian::data::BatchGenerator<marian::data::CorpusBase>::fetchBatches() + 0x1204
marian::ThreadPool::enqueue<marian::data::BatchGenerator<marian::data::CorpusBase>::fetchBatchesAsync()::{lambda()#1}>(marian::data::BatchGenerator<marian::data::CorpusBase>::fetchBatchesAsync()::{lambda()#1}&&)::{lambda()#1}::operator()() const + 0x33
std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> (),std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::deque<std::shared_ptr<marian::data::CorpusBatch>,std::allocator<std::shared_ptr<marian::data::CorpusBatch>>>>,std::__future_base::_Result_base::_Deleter>,std::__future_base::_Task_state<marian::ThreadPool::enqueue<marian::data::BatchGenerator<marian::data::CorpusBase>::fetchBatchesAsync()::{lambda()#1}>(marian::data::BatchGenerator<marian::data::CorpusBase>::fetchBatchesAsync()::{lambda()#1}&&)::{lambda()#1},std::allocator<int>,std::deque<std::shared_ptr<marian::data::CorpusBatch>,std::allocator<std::shared_ptr<marian::data::CorpusBatch>>> ()>::_M_run()::{lambda()#1},std::deque<std::shared_ptr<marian::data::CorpusBatch>,std::allocator<std::shared_ptr<marian::data::CorpusBatch>>>>>::_M_invoke(std::_Any_data const&) + 0x51
std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base,std::__future_base::_Result_base::_Deleter> ()>*, bool*) + 0x2d
std::__future_base::_Task_state<marian::ThreadPool::enqueue<marian::data::BatchGenerator<marian::data::CorpusBase>::fetchBatchesAsync()::{lambda()#1}>(marian::data::BatchGenerator<marian::data::CorpusBase>::fetchBatchesAsync()::{lambda()#1}&&)::{lambda()#1},std::allocator<int>,std::deque<std::shared_ptr<marian::data::CorpusBatch>,std::allocator<std::shared_ptr<marian::data::CorpusBatch>>> ()>::_M_run() + 0xf0
std::thread::_State_impl<std::thread::_Invoker<std::tuple<marian::ThreadPool::reserve(unsigned long)::{lambda()#1}>>>::_M_run() + 0x1a5

Jorg wrote this on the marian-nmt Google Group: https://groups.google.com/g/marian-nmt/c/PjA-rQJ3Oio

"To answer my own post: it was indeed a bug in my alignment. I had some links outside of the range of tokens in one language."

So I suspect our code is producing a broken alignment somewhere, most likely a link that points past the end of the source or target sentence. We should validate the alignments. I'll investigate.
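A minimal validator along these lines could catch the kind of bug Jorg describes before training starts. It is a sketch under stated assumptions: alignments are in Pharaoh format (space-separated `src-tgt` index pairs, 0-based), and token counts come from whitespace splitting, which only matches if the alignments were computed on the same whitespace-tokenized text (if they were computed on subword tokens, count those instead). The function names are hypothetical, not part of the pipeline.

```python
from typing import List, Tuple

# A link is a (source_index, target_index) pair, both 0-based.
Alignment = List[Tuple[int, int]]


def parse_alignment(line: str) -> Alignment:
    """Parse a Pharaoh-format line such as "0-0 1-2 2-1" into index pairs."""
    pairs: Alignment = []
    for token in line.split():
        src, sep, tgt = token.partition("-")
        # Reject tokens like "1-", "-1" or "1-2-3" instead of guessing.
        if not sep or not src.isdecimal() or not tgt.isdecimal():
            raise ValueError(f"malformed alignment token: {token!r}")
        pairs.append((int(src), int(tgt)))
    return pairs


def find_alignment_errors(src: str, tgt: str, alignment: str) -> List[str]:
    """Return a description of every problem on one line; empty if valid."""
    # Assumption: token counts come from whitespace splitting.
    n_src = len(src.split())
    n_tgt = len(tgt.split())
    try:
        pairs = parse_alignment(alignment)
    except ValueError as err:
        return [str(err)]
    errors = []
    for s, t in pairs:
        if s >= n_src:
            errors.append(f"source index {s} out of range ({n_src} tokens)")
        if t >= n_tgt:
            errors.append(f"target index {t} out of range ({n_tgt} tokens)")
    return errors


def validate_corpus(src_lines: List[str], tgt_lines: List[str],
                    aln_lines: List[str]) -> List[Tuple[int, List[str]]]:
    """Return (1-based line number, errors) for every invalid line."""
    if not (len(src_lines) == len(tgt_lines) == len(aln_lines)):
        raise ValueError("source, target and alignment files differ in length")
    bad = []
    for i, (s, t, a) in enumerate(zip(src_lines, tgt_lines, aln_lines), start=1):
        errors = find_alignment_errors(s, t, a)
        if errors:
            bad.append((i, errors))
    return bad
```

Running something like `validate_corpus` over the corpus before training, and logging or dropping the reported lines, would turn an opaque segfault into a list of specific bad sentence pairs.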


gregtatum · 2024-02-15T14:39:10Z

The crash appears to be in inlined code from: https://github.com/marian-nmt/marian/blob/65bf82ffce52f4854295d8b98482534f176d494e/src/data/corpus_base.cpp#L468-L487
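To see why a single bad link can take down the whole process instead of raising a clean error, here is a simplified, hypothetical model of filling a dense guided-alignment matrix for a batch. The `[tgt_len][src_len][batch]` layout and the helper names are assumptions for illustration only, not Marian's actual code at the linked lines. With a flat index, an out-of-range source position either lands on a cell that belongs to a different (target, source) pair or, on the last target row, runs past the end of the buffer. In C++, an unchecked `std::vector` write past the end is undefined behaviour, which is consistent with a segfault.

```python
from typing import List, Tuple


def flat_index(src_pos: int, tgt_pos: int, b: int,
               src_len: int, batch_size: int) -> int:
    """Offset of cell (tgt_pos, src_pos, b) in a hypothetical row-major
    [tgt_len][src_len][batch] buffer."""
    return (tgt_pos * src_len + src_pos) * batch_size + b


def fill_alignment_matrix(batch_alignments: List[List[Tuple[int, int]]],
                          src_len: int, tgt_len: int) -> List[float]:
    """Build the dense 0/1 matrix, refusing links outside the sentence."""
    batch_size = len(batch_alignments)
    matrix = [0.0] * (tgt_len * src_len * batch_size)
    for b, pairs in enumerate(batch_alignments):
        for src_pos, tgt_pos in pairs:
            # Without this check an out-of-range link would either overwrite
            # another cell or index past the end of the buffer.
            if not (0 <= src_pos < src_len and 0 <= tgt_pos < tgt_len):
                raise IndexError(
                    f"batch item {b}: link {src_pos}-{tgt_pos} outside "
                    f"{src_len} source x {tgt_len} target tokens")
            matrix[flat_index(src_pos, tgt_pos, b, src_len, batch_size)] = 1.0
    return matrix
```

For example, with `src_len=3` and a batch of one, the link `3-0` (one past the end of the source) maps to the same flat index as the valid link `0-1`, so without the check the write would silently mark the wrong cell.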

gregtatum · 2024-02-21T20:07:31Z

~~Edit: The pruned lexical list appears broken.~~

Just kidding; I just didn't understand the format.

gregtatum · 2024-02-22T22:00:35Z

I've validated the correctness of the generated alignments.

https://firefox-ci-tc.services.mozilla.com/tasks/b-7CDsKNQ_Cn7wf3RcxDXw#artifacts

gregtatum · 2024-03-06T20:04:08Z

This is still happening with OpusTrainer, even with no augmentation. I worked around it by removing OpusTrainer and using Marian directly in my training, but we should fix this so that teachers can be trained with augmentation.
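Supporting augmentation means every modifier that changes a sentence's tokens must also rewrite that sentence's alignment. As a hypothetical illustration, assuming a modifier that prepends tokens to the source side (roughly the situation in the OpusTrainer PrefixModifier issue linked in this thread), every source index has to shift by the number of prepended tokens; otherwise the links point at the wrong tokens. The function names and the placeholder prefix tokens are illustrative, not OpusTrainer's API.

```python
from typing import List, Tuple

Alignment = List[Tuple[int, int]]


def shift_source_alignment(pairs: Alignment, n_prefix: int) -> Alignment:
    """Move every source index right by n_prefix positions."""
    return [(s + n_prefix, t) for s, t in pairs]


def prepend_source_tokens(src: str, prefix_tokens: List[str],
                          pairs: Alignment) -> Tuple[str, Alignment]:
    """Prepend tokens to a whitespace-tokenized source sentence and keep its
    alignment consistent. The prepended tokens are left unaligned."""
    new_src = " ".join(prefix_tokens + src.split())
    return new_src, shift_source_alignment(pairs, len(prefix_tokens))
```

Leaving the prepended tokens unaligned is a choice made for this sketch; aligning them, for example to the target words they were copied from, would be a separate design decision.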

marco-c · 2024-03-29T09:38:12Z

Was this fixed by #491?

eu9ene · 2024-04-01T17:39:04Z

Yes. The students are training now, and we haven't seen this error since the Marian update.


gregtatum added the bug label (Something is broken or not correct) on Feb 15, 2024

gregtatum mentioned this issue on Feb 15, 2024: [Experiment] Train en-ca - Feb 2024 #384 (Closed)

gregtatum self-assigned this Feb 15, 2024

gregtatum mentioned this issue on Mar 4, 2024: Alignments are not updated for the PrefixModifier hplt-project/OpusTrainer#57 (Open)

gregtatum mentioned this issue on Mar 6, 2024: [meta] Ship 30 languages #369 (Closed)

eu9ene closed this as completed Apr 1, 2024
