Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eliminate_duplicate_disjuncts(): Return the discarded disjunct count #1518

Merged
merged 6 commits into from
May 3, 2024

Conversation

ampli
Copy link
Member

@ampli ampli commented May 2, 2024

I created this branch to keep the number of disjuncts up-to-date by adding or substructing it every time a disjunct is added/discarded. The idea was to avoid wasting time on CPU cache misses while iterating long disjunct lists. However, it became very cumbersome, so I discarded this code.

This change in eliminate_duplicate_disjuncts() and the addition of the number of discarded disjuncts to - verbosity=2 remained.

d = eliminate_duplicate_disjuncts(d, false);
unsigned int dnum1 = count_disjuncts(d);
eliminate_duplicate_disjuncts(d, false);
unsigned int dnum1 = dnum0 - eliminate_duplicate_disjuncts(d, false);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, eliminate_duplicate_disjuncts() is being called twice. Is that right?

Elsewhere, you call it twice, with arguments false and true but here it is false both times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You caught a bug. The intention was to call it only once.
The idea here is to provide information on the number of duplicate disjuncts in the expression of the given word. With two calls, it would never report any duplicates. I will force-push a fix.

Without this fix:

linkparser> !!was.w-d//
Token "was.w-d" matches:
    was.w-d                        448 disjuncts


Token "was.w-d" disjuncts:
    was.w-d                        448/448 disjuncts

         was.w-d: [0] 1.000= @E- Ss- <> Xc+ Vv+ VC+
         was.w-d: [1] 0.000= @E- Ss- dIV- <> Xc+ Vv+ VC+
         was.w-d: [2] 0.000= @E- Ss- dIV- <> Xc+ Vv+ VC+ VC+
         ...

With the fix:

linkparser> !!was.w-d//
Token "was.w-d" matches:
    was.w-d                        448 disjuncts


Token "was.w-d" disjuncts:
    was.w-d                        352/448 disjuncts

         was.w-d: [0] 1.000= @E- Ss- <> Xc+ Vv+ VC+
         ...

Regarding the calls with false and true, this is done on a wildcard word to eliminate duplicate disjuncts with identical categories and then condense all the remaining duplicate disjuncts (disregarding the category). However, there is a code rot there, seemingly because I changed the implementation w/o fixing this code. It is inefficient, and when I look at this code again from a distance, I find nasty bugs, which may explain some strange things. I already rewrote part of the generation code for speed (and I know how to fix finding unused disjuncts), which I still need to continue. Instead, I'm now working on rewriting the counting stuff because I saw how to speed it up by a significant factor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and when I look at this code again from a distance, I find nasty bugs, which may explain some strange things.

To clarify, I referred to generation mode. I didn't find bugs regarding parsing mode. However, the speed can be improved (future PR).

ampli added 6 commits May 3, 2024 21:30
No need to return the disjunct list because the deletion is implemented
"in place", and the first disjunct is never deleted. Thus the given
argument of disjunct list still points to the list after duplicate
removal. Returning the count is more useful since it, e.g., enables to
find out the total disjuncts removed in the whole sentence.
@ampli ampli force-pushed the disjunct-count branch from 0a030cb to a9f081c Compare May 3, 2024 18:31
@linas linas merged commit ea8c04e into opencog:master May 3, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants