Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make list classes templated #4356

Merged
merged 17 commits into from
Nov 22, 2024
Merged

Make list classes templated #4356

merged 17 commits into from
Nov 22, 2024

Conversation

egorpugin
Copy link
Contributor

@egorpugin egorpugin commented Nov 22, 2024

This is a part of lists de-entanglement PR series.

In order to simplify lists (old old C style code), we need to make them templates instead of types emulation using void * pointers.

This PR handles the simplest class - CLIST.
See commit messages for details on changes.
No functional changes intended.

Make CLIST templated. Move member methods inside the class. Move help…
Use real CLASSNAME type for list. Update sorting callback signatures.
Make simple classes simpler.

cc: @stweil

ps.: I'd like to run tests on CI for this change, how can I do that now?

…er classes (CLIST_LINK and CLIST_ITERATOR inside the list class).

This allows us to use real C++ templates for different instantiations instead of void * emulation.
@egorpugin
Copy link
Contributor Author

egorpugin commented Nov 22, 2024

And we need this as part of updating memory management to unique ptrs.
So types are real types and not void *.

Feel free to contribute and push changes to this PR, so we can update things together (like Makefile changes).

@egorpugin egorpugin requested review from stweil and amitdo November 22, 2024 00:59
@egorpugin
Copy link
Contributor Author

Tested on windows, no regressions.

@egorpugin
Copy link
Contributor Author

egorpugin commented Nov 22, 2024

part2

Converted ELIST, not pushing it at the moment to keep this PR more clear.

Tested on windows.

Test results:
TOTAL:   63
PASSED:  60
FAILED:  0
SKIPPED: 3

@egorpugin
Copy link
Contributor Author

And ELIST2 is completed.

src/ccutil/clst.cpp Outdated Show resolved Hide resolved
@egorpugin egorpugin changed the title Make CLIST templated Make list classes templated Nov 22, 2024
@stweil
Copy link
Member

stweil commented Nov 22, 2024

CI tests fail. Here is the first failure from make check:

% ./layout_test 
layout_test(46416,0x1ec96f840) malloc: nano zone abandoned due to inability to reserve vm space.
Running main() from ../../../unittest/third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from LayoutTest
[ RUN      ] LayoutTest.ArraySizeTest
[       OK ] LayoutTest.ArraySizeTest (0 ms)
[ RUN      ] LayoutTest.UNLV8087_054
=================================================================
==46416==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x613000099e28 at pc 0x000105951a08 bp 0x00016b412010 sp 0x00016b412008
READ of size 8 at 0x613000099e28 thread T0

resultiterator_test also shows a heap-buffer-overflow.

@egorpugin
Copy link
Contributor Author

I see them running currently. Where is that error from?

@egorpugin
Copy link
Contributor Author

Ok, seems CIs are happy. Atleast what I see in this PR.

@egorpugin
Copy link
Contributor Author

egorpugin commented Nov 22, 2024

It is probably worth to rename those lists to more clear names.

CLIST -> ConsList
ELIST -> IntrusiveList or IntrusiveLinkedList or IntrusiveForwardList (like in STL one directional list)
ELIST2 -> IntrusiveDoublyLinkedList or IntrusiveBidirectionalList or IntrusiveList (like in STL bidi list)

@stweil
Copy link
Member

stweil commented Nov 22, 2024

With the latest code (c3bf6a2) I now get a lot of compiler warnings like this one:

../../../src/ccstruct/stepblob.h:38:1: warning: 'C_BLOB_IT' defined as a struct here but previously declared as a class; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Wmismatched-tags]

The unittests now seem to pass with sanitizers, too (still running).

@egorpugin
Copy link
Contributor Author

Reverted to classes.

@stweil
Copy link
Member

stweil commented Nov 22, 2024

@egorpugin, are you planning to add more commits, or is your pull request now ready for review and merging?

Did you compare old and new code size and performance?

@@ -34,7 +34,7 @@ namespace tesseract {

#define INTERSECTING INT16_MAX

int lessthan(const void *first, const void *second);
int lessthan(const ICOORDELT *first, const ICOORDELT *second);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int lessthan(const ICOORDELT *first, const ICOORDELT *second);
static int lessthan(const ICOORDELT *first, const ICOORDELT *second);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll reconsider those functions. Maybe change to lambdas one-use funcs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be fine for if this is done later. If you want, the PR can be merged as soon as the missing include statements which currently break my local build were added. Should we squash the commits, or do you prefer to keep them all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR has 2-3 unrelated tiny fixes around msvc warnings, sw build.

Not sure what is better here.
I'd prefer to keep separate commits because of separate changes to each of three lists.

const ICOORDELT *p1 = *reinterpret_cast<const ICOORDELT *const *>(first);
const ICOORDELT *p2 = *reinterpret_cast<const ICOORDELT *const *>(second);

int lessthan(const ICOORDELT *p1, const ICOORDELT *p2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int lessthan(const ICOORDELT *p1, const ICOORDELT *p2) {
static int lessthan(const ICOORDELT *p1, const ICOORDELT *p2) {

@@ -19,16 +19,13 @@
#ifndef ELST_H
#define ELST_H

#include "list.h"
#include "lsterr.h"
#include "serialis.h"

#include <cstdio>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <cstdio>
#include <algorithm> // for std::sort
#include <cstdio>

@@ -19,696 +19,980 @@
#ifndef CLST_H
#define CLST_H

#include "list.h"
#include "lsterr.h"
#include "serialis.h"

#include <cstdio>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <cstdio>
#include <algorithm> // for std::sort
#include <cstdio>

@@ -19,16 +19,13 @@
#ifndef ELST2_H
#define ELST2_H

#include "list.h"
#include "lsterr.h"
#include "serialis.h"

#include <cstdio>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <cstdio>
#include <algorithm> // for std::sort
#include <cstdio>

@egorpugin
Copy link
Contributor Author

I'll address your comments and we can merge after.

I did not measure perf and size.
I think size will increase because of templates.
And perf is also interesting for me to see, but I never run tess benchmarks.
So if you have anything easily runnable on your system/setup, would be nice to see.

@egorpugin
Copy link
Contributor Author

Should be done now.

Renaming lists, if necessary, and other changes can go into separate PRs.

Copy link
Member

@stweil stweil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this nice work.

@egorpugin egorpugin merged commit 66cf74f into tesseract-ocr:main Nov 22, 2024
2 checks passed
@stweil
Copy link
Member

stweil commented Nov 22, 2024

@egorpugin, I'm sorry, but I'm afraid that I was too fast: recodebeam_test and intsimdmatrix_test fail now.

@egorpugin
Copy link
Contributor Author

They fail before my changes.
After commit 32fee19

@stweil
Copy link
Member

stweil commented Nov 22, 2024

Yes, indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants