improved BDD Unicode table representation in NonBacktracking engine #61142
Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
...ext.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/Algebras/CharSetSolver.cs
BDD bdd = BDD.True;
for (int k = 0; k < 16; k++)
{
    bdd = (c & (1 << k)) == 0 ? GetOrCreateBDD(k, BDD.False, bdd) : GetOrCreateBDD(k, bdd, BDD.False);
Are there cheaper ways to build up a BDD? Maybe the caching involved helps, but it seems like otherwise this is going to incrementally build up the BDD by creating 15 intermediate ones that are then thrown away?
There are better ways, I think, but they would involve using e.g. a designated array and a non-object-based representation with its own memory management over that array.
However, this incremental build only happens once per ASCII character, so I think the cost is negligible.
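To make the cost discussed in this thread concrete, here is a minimal, self-contained sketch of the same incremental construction. It is not the code from this PR: `Node`, `NodeBuilder`, `FromChar`, `Build`, and `Contains` are hypothetical names, the `(ordinal, one-branch, zero-branch)` argument order is an assumption inferred from the snippet above, and the 128-entry table is only a guess at what "once per ASCII character" looks like in practice.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the BDD type discussed above.
public sealed class Node
{
    public static readonly Node True = new Node(-1, null, null);
    public static readonly Node False = new Node(-2, null, null);

    public readonly int Ordinal;  // bit index tested here; negative for the two leaves
    public readonly Node? One;    // branch followed when that bit is 1
    public readonly Node? Zero;   // branch followed when that bit is 0

    public Node(int ordinal, Node? one, Node? zero)
    {
        Ordinal = ordinal;
        One = one;
        Zero = zero;
    }
}

public sealed class NodeBuilder
{
    // Hash-consing cache: identical (ordinal, one, zero) triples map to one shared node.
    private readonly Dictionary<(int, Node, Node), Node> _nodes =
        new Dictionary<(int, Node, Node), Node>();

    // Assumption: finished results are remembered for ASCII characters only.
    private readonly Node?[] _ascii = new Node?[128];

    public int NodeCount => _nodes.Count;

    public Node GetOrCreate(int ordinal, Node one, Node zero)
    {
        var key = (ordinal, one, zero);
        if (!_nodes.TryGetValue(key, out var node))
        {
            node = new Node(ordinal, one, zero);
            _nodes[key] = node;
        }
        return node;
    }

    // ASCII results come from the table after the first call; other chars are rebuilt.
    public Node FromChar(char c) => c < 128 ? (_ascii[c] ??= Build(c)) : Build(c);

    // Same loop shape as the snippet under review: one node per bit, low bit first.
    private Node Build(char c)
    {
        Node bdd = Node.True;
        for (int k = 0; k < 16; k++)
        {
            bdd = (c & (1 << k)) == 0
                ? GetOrCreate(k, Node.False, bdd)
                : GetOrCreate(k, bdd, Node.False);
        }
        return bdd;
    }

    // Walks from the root (bit 15) towards a leaf, following the bits of c.
    public static bool Contains(Node bdd, char c)
    {
        while (bdd.Ordinal >= 0)
        {
            bdd = ((c >> bdd.Ordinal) & 1) == 1 ? bdd.One! : bdd.Zero!;
        }
        return ReferenceEquals(bdd, Node.True);
    }
}
```

In this sketch each of the 16 nodes becomes a child of the next one, so the chain is kept rather than discarded, and a second character reuses whatever low-bit prefix it shares with an earlier one through the cache. Whether the real implementation behaves the same way depends on details not shown in this thread.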
Force-pushed from edd1b5a to f67d79c.
Co-authored-by: Dan Moseley <[email protected]>
Co-authored-by: Stephen Toub <[email protected]>
Co-authored-by: Dan Moseley <[email protected]>
Force-pushed from 6d154a4 to a4c1e04.
Main updates:
- Use `byte[]` instead of `long[]` to save the serialization space used for these arrays. Overall this cuts space requirements by at least half.
- No longer store `\w` separately, instead deriving it from the 8 Unicode categories 0, 1, 2, 3, 4, 5, 8, 18.
- Limit `CharSetSolver._charPredTable` to ASCII only, as it is almost never used for non-ASCII characters but took up 128 kB for all Unicode characters for essentially no good reason.
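For readers who do not have the numbering memorized, the eight values in the second item correspond to the `System.Globalization.UnicodeCategory` members `UppercaseLetter` (0), `LowercaseLetter` (1), `TitlecaseLetter` (2), `ModifierLetter` (3), `OtherLetter` (4), `NonSpacingMark` (5), `DecimalDigitNumber` (8), and `ConnectorPunctuation` (18). The sketch below illustrates that union with a plain bitmask over `char.GetUnicodeCategory`; it does not build or combine BDDs the way the engine does, it ignores any special cases the regex engine may layer on top of the category union, and `WordClassSketch` and `IsWordChar` are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class WordClassSketch
{
    // One bit per UnicodeCategory value: 0, 1, 2, 3, 4, 5, 8 and 18.
    private const int WordCategoryMask =
        (1 << (int)UnicodeCategory.UppercaseLetter) |
        (1 << (int)UnicodeCategory.LowercaseLetter) |
        (1 << (int)UnicodeCategory.TitlecaseLetter) |
        (1 << (int)UnicodeCategory.ModifierLetter) |
        (1 << (int)UnicodeCategory.OtherLetter) |
        (1 << (int)UnicodeCategory.NonSpacingMark) |
        (1 << (int)UnicodeCategory.DecimalDigitNumber) |
        (1 << (int)UnicodeCategory.ConnectorPunctuation);

    // True when the character's category is one of the eight listed above.
    public static bool IsWordChar(char c)
    {
        int category = (int)char.GetUnicodeCategory(c);
        return ((WordCategoryMask >> category) & 1) != 0;
    }
}
```

The point of the sketch is only that the word class carries no information beyond the category data, which is why storing it separately is redundant once the per-category tables exist.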