Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heuristic #1 is very restricted and result in false negatives #49

Open
raghavgarg1257 opened this issue Jul 19, 2021 · 4 comments
Open

Comments

@raghavgarg1257
Copy link

Context: Heuristic#1: Star height > 1, dictates there should be no repetition inside of repetition.

Issue: The regex in question is /abcd(-[0-9a-z]{10,20}){2}/, which has repetition inside of repetition but it is not a vulnerable pattern because of fixed range quantifier.

Probable Improvements:

  • Make the Heuristic#1 configurable like the other Heuristic#2, which takes in options from the user and matches the start height to that particular config.
  • Usage of Range Quantifier as a factor for vulnerability.

Please share your thoughts on the above improvements and their feasibility. I will be happy to raise a PR for it. :)

@raghavgarg1257
Copy link
Author

@davisjam Can you please share your views on this?

@davisjam
Copy link
Owner

davisjam commented Jul 30, 2021

Hi @raghavgarg1257. Thanks for your interest!

For a sound approach, see #17. I have some starter code I can share for this.

For a simpler improvement to the heuristic, I would be happy to give feedback on an approach that:

  • Preserves the existing behavior.
  • Can be configured to treat as safe any nested bounded repetition for which the total amount of ambiguity is "not too big", measured using the product of the size of the ranges involved. For example, ((a{5,10}){5,50}){0,12} would have a total amount of ambiguity of (10-5)(50-5)(12-0) = 2700.
  • Can be configured to treat as safe any nested bounded repetition for which each part has an upper bound -- as in your example. This would be a special case of the more general bounding strategy described in the previous bullet point.

Please @ me in any replies or PRs.

@raghavgarg1257
Copy link
Author

Hello @davisjam, Thanks for sharing your thoughts. :)

I will share an approach keeping the above points in mind and will try to create a poc for the same, before that I have a couple of doubts.

  • How can we define "not too big ambiguity", in your opinion? I mean coming up with a static upper limit might not be the best thing to do, right?
  • Readme #17 is a PR related to the updation of Readme. I guess you meant to share something else.

@davisjam
Copy link
Owner

davisjam commented Aug 4, 2021

Not too big ambiguity

Well, you can put an upper bound on the ambiguity (possibly a very loose upper bound) as I described above. Then the user can provide their desired bound. A little desktop benchmarking can be used to find a "recommended" value, e.g. 50 or 100 or etc.

#17

Oops, I meant #27.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants