Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce clock syscalls #4303

Merged
merged 2 commits into from
Aug 23, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 5 additions & 2 deletions src/ccmain/control.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1312,7 +1312,10 @@ void Tesseract::classify_word_and_language(int pass_n, PAGE_RES_IT *pr_it, WordD
PointerVector<WERD_RES> best_words;
// Points to the best result. May be word or in lang_words.
const WERD_RES *word = word_data->word;
clock_t start_t = clock();
clock_t start_t;
heshpdx marked this conversation as resolved.
Show resolved Hide resolved
if (tessedit_timing_debug) {
start_t = clock();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that the if statement is less expensive than the clock() call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to clock() results in a vdso syscall to the operating system. If you are running tesseract in batch mode with multiple copies running on the same machine, this seemingly innocuous call could result in a storm of useless syscalls to the OS. It is definitely more expensive than a single conditional that all branch predictors would get correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of whether one believes this is slower or faster, the start_t variable has no consumers unless tessedit_timing_debug is enabled.

const bool debug = classify_debug_level > 0 || multilang_debug_level > 0;
if (debug) {
tprintf("%s word with lang %s at:", word->done ? "Already done" : "Processing",
Expand Down Expand Up @@ -1364,8 +1367,8 @@ void Tesseract::classify_word_and_language(int pass_n, PAGE_RES_IT *pr_it, WordD
} else {
tprintf("no best words!!\n");
}
clock_t ocr_t = clock();
if (tessedit_timing_debug) {
clock_t ocr_t = clock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the commit is a good change.

tprintf("%s (ocr took %.2f sec)\n", word_data->word->best_choice->unichar_string().c_str(),
static_cast<double>(ocr_t - start_t) / CLOCKS_PER_SEC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bad news is that g++ is not clever enough and produces a warning with this PR:

src/ccmain/control.cpp:1373:39: warning: 'start_t' may be used uninitialized [-Wmaybe-uninitialized]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What g++ version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. I will initialize it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But init is exactly wasted cycles again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use [[maybe_unused]] attribute here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What g++ version?

g++ (Debian 12.2.0-14) 12.2.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting to check it on gcc-14

Copy link
Member

@stweil stweil Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apple clang version 15.0.0 (clang-1500.3.9.4) and g++-14 (Homebrew GCC 14.1.0_2) 14.1.0 show the same warning. This is not surprising because the compiler must assume that tessedit_timing_debug might be changed between the two conditional statements. Therefore a local assignment const bool timing_debug = tessedit_timing_debug; helps. It fixes the warning for g++-14, but not for clang 15.

}
Expand Down
Loading