eliminate_duplicate_disjuncts(): Return the discarded disjunct count #1518

ampli · 2024-05-02T16:01:50Z

I created this branch to keep the number of disjuncts up to date by adding to or subtracting from it every time a disjunct is added or discarded. The idea was to avoid wasting time on CPU cache misses while iterating over long disjunct lists. However, it became very cumbersome, so I discarded that code.

What remained is this change in eliminate_duplicate_disjuncts() and the addition of the number of discarded disjuncts to the -verbosity=2 output.

linas · 2024-05-03T16:00:17Z

link-grammar/dict-common/print-dict.c

-		d = eliminate_duplicate_disjuncts(d, false);
-		unsigned int dnum1 = count_disjuncts(d);
+		eliminate_duplicate_disjuncts(d, false);
+		unsigned int dnum1 = dnum0 - eliminate_duplicate_disjuncts(d, false);



If I'm reading this correctly, eliminate_duplicate_disjuncts() is being called twice. Is that right?

Elsewhere, you call it twice, with arguments false and true, but here it is false both times.

You caught a bug. The intention was to call it only once.
The idea here is to provide information on the number of duplicate disjuncts in the expression of the given word. With two calls, it would never report any duplicates. I will force-push a fix.

Without this fix:

    linkparser> !!was.w-d//
    Token "was.w-d" matches:
        was.w-d 448 disjuncts
    Token "was.w-d" disjuncts:
        was.w-d 448/448 disjuncts
        was.w-d: [0] 1.000= @E- Ss- <> Xc+ Vv+ VC+
        was.w-d: [1] 0.000= @E- Ss- dIV- <> Xc+ Vv+ VC+
        was.w-d: [2] 0.000= @E- Ss- dIV- <> Xc+ Vv+ VC+ VC+
        ...

With the fix:

    linkparser> !!was.w-d//
    Token "was.w-d" matches:
        was.w-d 448 disjuncts
    Token "was.w-d" disjuncts:
        was.w-d 352/448 disjuncts
        was.w-d: [0] 1.000= @E- Ss- <> Xc+ Vv+ VC+
        ...

Regarding the calls with false and true: this is done on a wildcard word, first to eliminate duplicate disjuncts with identical categories and then to condense all the remaining duplicate disjuncts (disregarding the category). However, there is code rot there, seemingly because I changed the implementation without fixing this code. It is inefficient, and when I look at this code again from a distance, I find nasty bugs, which may explain some strange things. I have already rewritten part of the generation code for speed (and I know how to fix finding unused disjuncts), and I still need to continue that work. Instead, I'm now working on rewriting the counting stuff because I saw how to speed it up by a significant factor.

and when I look at this code again from a distance, I find nasty bugs, which may explain some strange things.

To clarify, I was referring to generation mode. I didn't find bugs in parsing mode. However, its speed can be improved (future PR).

There is no need to return the disjunct list because the deletion is implemented "in place" and the first disjunct is never deleted. Thus the disjunct-list argument still points to the list after duplicate removal. Returning the count is more useful since, e.g., it makes it possible to find the total number of disjuncts removed in the whole sentence.


linas reviewed May 3, 2024


ampli added 6 commits May 3, 2024 21:30

prepare_to_parse(): In -v=2, print the number of deleted disjuncts

de4c1d6

Align the end of the print_time() output to a more distance column

1b18695

word-structures.h: Add include files for integer types

7588853

Just for eliminating IDE errors.

wildcard word: Print the number of deleted disjuncts

dc8fb5b

display_disjuncts(): Use the returned value of eliminate_duplicate_disjuncts()

a9f081c

ampli force-pushed the disjunct-count branch from 0a030cb to a9f081c Compare May 3, 2024 18:31

linas merged commit ea8c04e into opencog:master May 3, 2024
1 check passed
