Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rule equality and hash codes #14

Open
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

Palatis
Copy link
Contributor

@Palatis Palatis commented Dec 7, 2024

  • also hash the Type for various rule classes
    • BooleanRule
    • OnFail
    • AtRule
    • NotAtRule
    • SequenceRule (wasn't able to distinguish between a SequenceRule and a ChoiceRule with same inner Rules by hash)
    • ChoiceRule (wasn't able to distinguish between a SequenceRule and a ChoiceRule with same inner Rules by hash)
    • NamedRule (wasn't able to distinguish between a NamedRule and a NodeRule with same inner Rule by hash)
    • NodeRule (wasn't able to distinguish between a NamedRule and a NodeRule with same inner Rule by hash)
    • StringRule (wasn't able to distinguish between a StringRule and CaseInvariantStringRule with same Pattern by hash)
    • CaseInvariantStringRule (wasn't able to distinguish between a StringRule and CaseInvariantStringRule with same Pattern by hash)
    • CharRule
  • hash against the Type instead of name
    • AnyCharRule
    • EndOfInputRule
    • CharSetRule
  • hash the charset for CharSetRule (wasn't able to distinguish between two CharSetRules with different charset)
  • add Type to the hash for CountedRule and its variants (wasn't able to distinguish between these types with same inner Rules by hash)
    • ZeroOrMoreRule
    • OneOrMoreRule
    • OptionalRule
    • CountedRule
  • hash Min and Max for CountedRule (wasn't able to distinguish between two CountedRule with different Min and/or Max but same inner Rules by hash)
  • rework GetHashCode() and Equals() for RecursiveRule
    • now it works pretty well as expected, see the tests for more information.

also hash the `Type` to get a more stable and unique hash.
also hash the `Type` to get a more stable and unique hash.
also hash the `Type` to get a more stable and unique hash.
also hash the `Type` to get a more stable and unique hash.
also hash the `Type` to get a more stable and unique hash.

currently `SequenceRule` and `ChoiceRule` with same child rules will return the same hash code.
this is probably not the desired behavior.
also hash the `Type` to get a more stable and unique hash.

currently `SequenceRule` and `ChoiceRule` with same child rules will return the same hash code.
this is probably not the desired behavior.
also hash the `Type` to get a more stable and unique hash.

currently `NamedRule` and `NodeRule` with same name and child rule will return the same hash code.
this is probably not the desired behavior.
also hash the `Type` to get a more stable and unique hash.

currently `NamedRule` and `NodeRule` with same name and child rule will return the same hash code.
this is probably not the desired behavior.
also hash the `Type` to get a more stable and unique hash.
…HashCode()`

also hash the `Type` to get a more stable and unique hash.
also hash the `Type` to get a more stable and unique hash.
hash the `Type` to get a more stable and unique hash.
hash the `Type` to get a more stable and unique hash.
hash the `Type` to get a more stable and unique hashHash(Chars)
hash `Chars` to distinguish between `CharSetRule`s with different charset.

complies with `CharSetRule.Equals()` where it also compare the `Chars` sequence.
has to cast it to object[] before passing it to the `Hash()` method, else we're hashing the `byte[]` instead of each character within.
also hash the `Type` to get a more stable and unique hash.
`Min` / `Max` should also be considered when generating a hash code.
`Min` / `Max` values should be considered when checking the equality of two `CountedRule`s.
also hash the `Type` to get a more stable and unique hash.

currently, `ZeroOrMoreRule`, `OneOrMoreRule`, and `OptionalRule` all return the same hash if they sharing the same inner `Rule`.
with this patch, `ZeroOrMoreRule` will return a different hash to the other 2 rule types even with same inner `Rule`.
also hash the `Type` to get a more stable and unique hash.

currently, `ZeroOrMoreRule`, `OneOrMoreRule`, and `OptionalRule` all return the same hash if they sharing the same inner `Rule`.
with this patch, `OneOrMoreRule` will return a different hash to the other 2 rule types even with same inner `Rule`.
also hash the `Type` to get a more stable and unique hash.

currently, `ZeroOrMoreRule`, `OneOrMoreRule`, and `OptionalRule` all return the same hash if they sharing the same inner `Rule`.
with this patch, `OptionalRule` will return a different hash to the other 2 rule types even with same inner `Rule`.
also hash the `Type` to get a more stable and unique hash.
just comparing `RuleFunc` will give none-desired result when `RuleFunc` are from different lambda expressions.
for example,
```C#
var a = new RecursiveRule(() => rule);
var b = new RecursiveRule(() => rule);
```
will not compare the same.

so, try compare the `RuleFunc.Method` and `RuleFunc.Target`.

also, change the `GetHashCode()` method to return a same hash for logically equals `RecursiveRule`s by hashing their `RuleFunc.Method` and `RuleFunc.Target`.
@Palatis
Copy link
Contributor Author

Palatis commented Dec 7, 2024

bigger problems are with the RecursiveRule

due to the nature of .Net handles delegate,

  1. two lambda expressions with same expression are different if they're on different location of codes:
var a = () => x;
var b = () => x;
Equals(a, b); // false
a == b; // false

this would become a problem when we're referencing the same rule by lambda expression, for example:

public Rule Identifier => Node(...);
public Rule Term => Node(Identifier | Recursive(() => Term));
public Rule Term2 => Node(Identifier | Recursive(() => Term));
// the two inner `RecursiveRule`s are considered different, even they shares the same lambda expression.
  1. lambda expressions returned by another method with same arguments are different:
public Func<int> GetFunc(int n) => () => n;
/* ... */
var a = GetFunc(0);
var b = GetFunc(0);
Equals(a, b); // false
a == b; // false

this would become a problem when we're reference the same rule by it's name, for example:

public Rule Identifier => Node(...);
public Rule Term => Node(Identifier | Recursive(nameof(Term));
public Rule Term2 => Node(Identifier | Recursive(nameof(Term));
// the two inner `RecursiveRule`s are considered different, even though they shares the same lookup method `() => GetRuleFromName(inner)` defined in `Grammar`.
// because `inner` is a captured variable.

thus, we consider the equality of delegate's Method and Target, also calculate the hash against them.

@Palatis
Copy link
Contributor Author

Palatis commented Dec 7, 2024

i had problems working on the optimizer when hash code and equality are not implemented correctly.

we had to put the rules and optimized rules in a LUT to avoid redundant optimization, but because the hash code are weird the LUT dictionary is working weirdly...

do the faster checks first, only fallback to the more intense Method IL code and Target content checks when absolutely necessary.
because we're actually doing a proper hash codes now, it takes dramatically more time to calculate them.
caculte and cache the hash code value and re-use it after, since `Rule` object is none-mutable, the hash code shouldn't change.
we sealed `Rule.GetHashCode()` and ask the derived class to implement `GetHashCodeInternal()`, and cache the computed hash code to avoid redundant recomputations.

disable the warning to make intellisense less paranoid.
else a `NamedRule` and another `NodeRule` with the same would compare the same.
we probably don't care about that, anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant