Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large string alternations cause exponential preprocessing time #97

Closed
mnxn opened this issue Feb 10, 2021 · 0 comments · Fixed by #106
Closed

Large string alternations cause exponential preprocessing time #97

mnxn opened this issue Feb 10, 2021 · 0 comments · Fixed by #106

Comments

@mnxn
Copy link

mnxn commented Feb 10, 2021

Hi,

I've been trying to port the FStar lexer from Ulex to Sedlex (see FStarLang/FStar#2203) and the ppx hangs when trying to compile a rule that uses this token:

let op_token = [%sedlex.regexp?
  "~" | "-" | "/\\" | "\\/" | "<:" | "<@" | "(|" | "|)" | "#" |
  "u#" | "&" | "()" | "(" | ")" | "," | "~>" | "->" | "<--" |
  "<-" | "<==>" | "==>" | "." | "?." | "?" | ".[|" | ".[" | ".(|" | ".(" |
  "$" | "{:pattern" | ":" | "::" | ":=" | ";;" | ";" | "=" | "%[" |
  "!{" | "[@@" | "[@" | "[|" | "{|" | "[" | "|>" | "]" | "|]" | "|}" | "{" | "|" | "}"]

Surprisingly, splitting this token into 5 smaller parts and matching it as op_token_1 | ... | op_token_5 in the lexer rule compiles fine.

I managed to reproduce the issue with the following file:

let op_token =
  [%sedlex.regexp?
       "1" |  "2" |  "3" |  "4" |  "5" |  "6" |  "7" |  "8" |  "9" | "10"
    | "11" | "12" | "13" | "14" | "15" | "16" | "17" | "18" | "19" | "20"
    | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28" | "29" | "30"
    | "31" | "32" | "33" | "34" | "35" | "36" | "37" | "38" | "39" | "40" ]

let token buf =
  match%sedlex buf with
  | op_token -> print_endline (Sedlexing.Latin1.lexeme buf)
  | eof -> print_endline "EOF"
  | _ -> failwith "unexpected"

let () =
  let lexbuf = Sedlexing.Latin1.from_string "1234" in
  token lexbuf

I recorded some data from running time dune build that shows build time doubling for every string added:

Number of strings Time
30 1.555
31 2.372
32 3.181
33 5.387
34 9.819
35 18.733
36 36.529
37 72.960
38 147.23
39 304.89
40 614.28

While the ppx is running, my CPU temperature goes up, so it's definitely doing something, but I'm not sure what that is.

Is this expected behavior?

@toots toots closed this as completed in #106 Dec 3, 2021
toots pushed a commit that referenced this issue Dec 3, 2021
Fixes #97 by removing a duplicate evaluation

Signed-off-by: Fangyi Zhou <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant