Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory issues in ScrollView::MessageReceiver #3872

Merged
merged 3 commits into from
Jul 18, 2022

Conversation

p12tic
Copy link
Contributor

@p12tic p12tic commented Jul 18, 2022

This PR fixes memory issues that caused #3869. The first commit fixes the actual bug (unique_ptr::release() should be used instead of unique_ptr::get()). The second fixes the underlying problem of using plain pointers that caused the bug in the first place.

Fixes #3869.

Supersedes #3870

FYI @rmast @stweil.

@p12tic
Copy link
Contributor Author

p12tic commented Jul 18, 2022

@rmast Could you please check whether this PR fixes the issue you were seeing. I didn't try to reproduce the problem, only diagnosed the problem by looking into source code.

@stweil
Copy link
Member

stweil commented Jul 18, 2022

I still get an error when running with valgrind:

$ valgrind bin/debug/clang/tesseract -c invert_threshold=0.9 --dpi 300 -l Latin -c textord_debug_tabfind=1 -c textord_debug_bugs=1 -c tessedit_reject_block_percent=1 -c tessedit_reject_row_percent=1 -c debug_noise_removal=1 https://user-images.githubusercontent.com/3341558/179373903-ef6cc246-f4e5-4633-a762-ded4dd22708f.jpg output82
==1212620== Memcheck, a memory error detector
==1212620== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1212620== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==1212620== Command: bin/debug/clang/tesseract -c invert_threshold=0.9 --dpi 300 -l tessdata_fast/script/Latin -c textord_debug_tabfind=1 -c textord_debug_bugs=1 -c tessedit_reject_block_percent=1 -c tessedit_reject_row_percent=1 -c debug_noise_removal=1 https://user-images.githubusercontent.com/3341558/179373903-ef6cc246-f4e5-4633-a762-ded4dd22708f.jpg output81
==1212620== 
Vertical skew vector=(7,13173)
Starting sh -c "trap 'kill %1' 0 1 2 ; java -Xms1024m -Xmx2048m -jar ./ScrollView.jar & wait"
ScrollView: Waiting for server...
Socket started on port 8461
Client connected
Click at (855, 2749)
Click at (855, 2749)
==1212620== Thread 2:
==1212620== Invalid read of size 8
==1212620==    at 0x54AF90: tesseract::SVEvent::~SVEvent() (scrollview.h:70)
==1212620==    by 0x6013EA: std::default_delete<tesseract::SVEvent>::operator()(tesseract::SVEvent*) const (unique_ptr.h:85)
==1212620==    by 0x60050C: std::unique_ptr<tesseract::SVEvent, std::default_delete<tesseract::SVEvent> >::~unique_ptr() (unique_ptr.h:361)
==1212620==    by 0x5FD91F: tesseract::ScrollView::MessageReceiver() (scrollview.cpp:184)
==1212620==    by 0x604056: void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) (invoke.h:60)
==1212620==    by 0x603FEC: std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) (invoke.h:95)
==1212620==    by 0x603FC4: void std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (thread:264)
==1212620==    by 0x603F94: std::thread::_Invoker<std::tuple<void (*)()> >::operator()() (thread:271)
==1212620==    by 0x603EED: std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() (thread:215)
==1212620==    by 0x50D3ECF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1212620==    by 0x4FEBEA6: start_thread (pthread_create.c:477)
==1212620==    by 0x52E9DEE: clone (clone.S:95)
==1212620==  Address 0xc15e4a0 is 16 bytes inside a block of size 48 free'd

@stweil
Copy link
Member

stweil commented Jul 18, 2022

With this additional change Tesseract no longer crashes:

diff --git a/src/viewer/scrollview.cpp b/src/viewer/scrollview.cpp
index 087dabe8..34d339f7 100644
--- a/src/viewer/scrollview.cpp
+++ b/src/viewer/scrollview.cpp
@@ -347,7 +347,7 @@ void ScrollView::StartEventHandler() {
       if (new_event->type == SVET_DESTROY) {
         // Signal the destructor that it is safe to terminate.
         event_handler_ended_ = true;
-        delete new_event; // Delete the pointer after it has been processed.
+        //delete new_event; // Delete the pointer after it has been processed.
         return;
       }
       delete new_event; // Delete the pointer after it has been processed.

I still don't know whether memory leaks remain.

@rmast
Copy link

rmast commented Jul 18, 2022 via email

@stweil
Copy link
Member

stweil commented Jul 18, 2022

I still don't know whether memory leaks remain.

There remain two direct leaks (48 + 16 bytes) and an indirect leak (4 bytes). They are shown when I build tesseract with sanitizers (./configure --disable-shared --enable-debug CXX=clang++-11 'CXXFLAGS=-D_GLIBCXX_DEBUG -Wall -g -O0 -fsanitize=address,undefined -fstack-protector-strong').

p12tic added 3 commits July 18, 2022 18:04
waiting_for_events takes ownership of the passed event which is later
deleted. Since we use unique_ptr::get() to acquire the pointer, we cause
double free: one free happens in the code path where the event from
waiting_for_events goes and the other free happens in unique_ptr
destructor.

The fix is to move ownership out of unique_ptr by unique_ptr::release().

Fixes: tesseract-ocr#3869
Fixes: 37b3374
The current usage of waiting_for_events is taking ownership of SVEvent
pointer from a unique_ptr. This is error prone as all code paths using
waiting_for_events need to ensure deletion. We fix it by using
unique_ptr in waiting_for_events and all dependent code paths.
@p12tic p12tic force-pushed the fix-scrollview-double-free branch from bd7e57b to 0107687 Compare July 18, 2022 15:04
@p12tic
Copy link
Contributor Author

p12tic commented Jul 18, 2022

I still get an error when running with valgrind

@stweil I couldn't reproduce the problem and looking into code this crash seems to be something different than what the initial two commits fix.

Anyway, I have converted event_table_ to std::unique_ptr too. This should fix any memory handling bugs in that area.

@rmast
Copy link

rmast commented Jul 18, 2022 via email

@p12tic
Copy link
Contributor Author

p12tic commented Jul 18, 2022

Thanks @rmast for testing.

Copy link
Member

@stweil stweil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My test also passes without problems, and there remains no memory leak.

Thank you very much for this contribution!

@stweil stweil merged commit e589bfa into tesseract-ocr:main Jul 18, 2022
@amitdo
Copy link
Collaborator

amitdo commented Jul 18, 2022

Nice work, thanks.

@p12tic p12tic deleted the fix-scrollview-double-free branch August 11, 2022 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression: 4.1.3 completes run of ScrollView-debugger. Both 5.0.0 and 5.2.0 dump.
4 participants