More fixes and improvements to pattern matching #64

ViRb3 · 2024-08-17T20:03:58Z

Support pattern negated byte

This was a very simple one.

Optimize pattern sub-matches

With the current implementation of needle search plus "truncate right" to handle sub-matches, we end up re-scanning the same regions multiple times. In some cases this is negligible; in others it's really bad. There's probably a better way to handle this, but to fix the most basic cases, we now cache each region (start + end address) and skip regex matching if the exact same region was processed before.
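A minimal sketch of the caching idea described above, not the code from this PR. The names `region` and `scanRegions` are hypothetical, and the standard library `regexp` package stands in for whatever binary-safe regex engine the project actually uses (the standard package is UTF-8 oriented, so the sketch is only meaningful for ASCII-range data).

```go
package main

import "regexp"

// region identifies a scan window by its start and end offsets.
// Hypothetical type, introduced only for this illustration.
type region struct{ start, end int }

// scanRegions runs pattern over each window of data, skipping any window
// whose exact (start, end) pair was already processed. It returns the
// absolute [start, end) offsets of every match and the number of regex
// scans that were actually performed.
func scanRegions(data []byte, windows []region, pattern string) ([][2]int, int, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, 0, err
	}
	seen := make(map[region]struct{}) // grows with the number of distinct windows
	var matches [][2]int
	scans := 0
	for _, w := range windows {
		if _, ok := seen[w]; ok {
			continue // identical window: the regex result would be identical too
		}
		seen[w] = struct{}{}
		scans++
		for _, loc := range re.FindAllIndex(data[w.start:w.end], -1) {
			matches = append(matches, [2]int{w.start + loc[0], w.start + loc[1]})
		}
	}
	return matches, scans, nil
}
```

Calling `scanRegions([]byte("xxABxxAB"), []region{{0, 4}, {0, 4}, {4, 8}}, "AB")` scans only two of the three windows. Note that the `seen` map holds one entry per distinct window, so its memory cost scales with the number of needle hits.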

Fix one-off pattern mismatch

This prevented one of my test cases from matching.

Return end index

Changes the matching function's signature to also return end indexes, in preparation for unit tests.

Deduplicate and sort results

This is a workaround for the second issue above.
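As a rough illustration of what deduplicating and sorting results can look like, here is a sketch under assumptions: the `match` type and the `dedupeAndSort` function are hypothetical, and the real result type in the PR may differ.

```go
package main

import "sort"

// match is a hypothetical result type holding absolute start and end offsets.
type match struct{ Start, End int }

// dedupeAndSort drops repeated matches and orders the rest by Start,
// breaking ties by End.
func dedupeAndSort(in []match) []match {
	seen := make(map[match]struct{}, len(in))
	out := make([]match, 0, len(in))
	for _, m := range in {
		if _, dup := seen[m]; dup {
			continue
		}
		seen[m] = struct{}{}
		out = append(out, m)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Start != out[j].Start {
			return out[i].Start < out[j].Start
		}
		return out[i].End < out[j].End
	})
	return out
}
```

Because overlapping scan windows can report the same match more than once, a pass like this gives callers (and unit tests) a stable, duplicate-free list to compare against.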

stevemk14ebr · 2024-08-19T14:33:52Z

I am working on reviewing this. I don't quite understand why we need the caches: they come with a very high overhead if the needle is common and the region being scanned contains the needle many times. We already have OOM issues on large samples, so I am wary of including that particular commit TBH.

Our needle scan shouldn't really return duplicate regions (I could be missing something). Let's say the needle is a simple AA BB and the memory is FF AA 00 BB DD AA BB 00 AA BB 11: we'd get needMatches at 5 and 8. Then let's say the regex the needle was picked from is [0-2] AA BB 00: we'd start each scan at the most pessimistic position, needleOffset - 2, giving BB DD AA BB 00 for the first range and BB 00 AA BB 11 for the second range. I can't see how these would ever land on the exact same start, since our searches start from multiple needle locations.
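The worked example above can be reproduced with a short sketch. The helpers `findNeedle` and `scanWindow` are hypothetical and are not the project's functions. The window end (needle offset plus the length of the needle and the fixed bytes after it) is an assumption inferred from the byte ranges quoted in the comment.

```go
package main

import "bytes"

// findNeedle returns the offset of every occurrence of needle in mem,
// including overlapping ones.
func findNeedle(mem, needle []byte) []int {
	var offs []int
	for from := 0; from <= len(mem)-len(needle); {
		i := bytes.Index(mem[from:], needle)
		if i < 0 {
			break
		}
		offs = append(offs, from+i)
		from += i + 1 // resume one byte past this hit
	}
	return offs
}

// scanWindow returns the [start, end) bounds to scan for a needle hit at off.
// maxPrefix is the largest number of bytes the pattern may consume before the
// needle (2 for "[0-2] AA BB 00"); tailLen is the needle length plus the fixed
// bytes after it (3 for "AA BB 00"). Both bounds are clamped to the buffer.
func scanWindow(memLen, off, maxPrefix, tailLen int) (start, end int) {
	start = off - maxPrefix
	if start < 0 {
		start = 0
	}
	end = off + tailLen
	if end > memLen {
		end = memLen
	}
	return start, end
}
```

For the memory in the example, `findNeedle` reports offsets 5 and 8, and `scanWindow(len(mem), off, 2, 3)` yields the windows [3, 8) and [6, 11), which are the two byte ranges quoted in the comment. Each window start is derived from a distinct needle offset, so two windows share a start only if two needle hits share an offset.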

This reverts commit 6f4badc.

stevemk14ebr · 2024-08-19T15:15:20Z

I've reverted the cache for now. I would consider adding it back if you can prove to me (ideally via a unit test in pattern_test.go) that it is necessary. Right now I cannot justify the memory overhead it introduces, and it doesn't appear to be necessary for the kind of patterns used by GoReSym itself.

Thanks for the continued improvements!

ViRb3 added 3 commits August 17, 2024 15:15

Support pattern negated byte

750bb57

Optimize pattern sub-matches

6f4badc

Fix one-off pattern mismatch

3b16cc5

stevemk14ebr and others added 5 commits August 19, 2024 15:07

Fix test

f6fc21c

Fix test

74926fe

Merge remote-tracking branch 'refs/remotes/origin/master'

e3a2297

Revert "Optimize pattern sub-matches"

1b96965

This reverts commit 6f4badc.

Simplify data_end calculation

8f2d56b

stevemk14ebr approved these changes Aug 19, 2024


stevemk14ebr merged commit d7d9a98 into mandiant:master Aug 19, 2024
2 checks passed

ViRb3 mentioned this pull request Aug 25, 2024

Optimize and omit duplicate pattern matches #66

Merged
