Catastrophic performances on OCaml files #519
Comments
Nearly all of the time is spent initializing the query in the tree-sitter crate. I don't think our OCaml queries are so crazy that they should cause such an issue, but we should investigate anyway. Otherwise this is an upstream issue that we've run into (tree-sitter or tree-sitter-ocaml). |
For a tiny bit of context: initially, I was just complaining to @nbacquey that Topiary was taking more than 1s per file for simple files. He then noticed that I was using v0.2.1 and asked me to try the latest version... which happened to be much much much worse. After we fix this, we should still investigate ways to make normal runs faster. cf. the issue by @aspiwack above. |
…caml-files Resolve #519 catastrophic performances on ocaml files
Was this closed by mistake? Or is there another tracking issue now that #522 is merged? |
I think I can now open the issue I had initially about Topiary being quite slow (with quite slow = one second per file). edit: done in #525 |
You can (though there is #523, which tracks some of it). But we still need to understand what's going on with the latest version of the OCaml grammar and query, and address this issue. |
This was closed automatically by GitHub. |
Quick benchmark of running Topiary (as of 738a178) compiled in release mode on all the
a few times while bisecting on tree-sitter/tree-sitter-ocaml. I was expecting one commit changing the behaviour violently, but I actually found two so there were two different bisections. Here are the results:
The column “query” tells whether I use Topiary as of 738a178 or if I include the new query as per 738a178. The columns 1 and 2 represent the two bisections and what I told git. The column “time” is to be taken with a grain of salt, especially for the longer times, which I repeated less often. There seem to be two different performance drops making Topiary approximately 3x slower (which, as the flamegraph suggests, should be almost linear in the time spent in the query initialisation).
|
The first change (tree-sitter/tree-sitter-ocaml@d025214) merely adds the ability to write |
The changes are massive, though, so I guess it all depends on what tools were used to re-generate the parsers from the grammar. Maybe those tools too were updated? |
(It is also entirely possible that my measurements are flawed in one way or another.) |
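For anyone who wants to repeat these timings with a bit less noise, here is a minimal sketch of a harness that runs a command several times and reports the median wall-clock time. It is not what was used for the numbers above. The names `time_command` and `summarize` are made up for illustration, and the default command is a placeholder; pass the actual Topiary invocation on the command line.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Run `cmd` `repeats` times; return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded so terminal I/O does not skew the measurement.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Median is less sensitive to a single slow outlier than the mean."""
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the formatter invocation
    # under test by passing it as arguments to this script.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(summarize(time_command(cmd)))
```

Comparing medians across commits during a bisection should make a 3x jump easy to tell apart from run-to-run variation.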
Ah, yes, the generated C files both advertise 500k line diffs (whereas |
Between those two commits, where the time blows up, the number of internal tree-sitter state machine states increases from 9769 to 24361. So it is probably worth the Topiary team investing some time in optimizing the OCaml grammar. |
It looks like I was using tree-sitter-cli 0.20.7 to generate the parser before tree-sitter/tree-sitter-ocaml@d025214, and accidentally switched back to 0.20.6. That can explain the difference, since the actual change in the grammar is very small. The number of states has indeed increased a lot because of tree-sitter/tree-sitter-ocaml@f4214a1. The actual structure of the grammar didn't change that much, but a lot of anonymous nodes became named nodes. It can be a combination of slower parsing (although more states doesn't necessarily mean slower parsing) and the more complicated query. Operators are used a lot, so if the query becomes a bit slower that can have a big impact. |
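To put the state counts in perspective, here is a small back-of-the-envelope sketch. The two state counts come from the comment above; `implied_exponent` is a hypothetical helper that assumes time grows as a power of the state count, which is only a guess about how query compilation scales.

```python
import math

# State counts quoted above for the two tree-sitter-ocaml commits.
STATES_BEFORE = 9769
STATES_AFTER = 24361


def growth_factor(before, after):
    """How many times larger `after` is than `before`."""
    return after / before


def implied_exponent(state_ratio, time_ratio):
    """Exponent k such that time_ratio == state_ratio ** k.

    Assumes a power-law relation between state count and time; this is an
    illustrative assumption, not something measured in this thread.
    """
    return math.log(time_ratio) / math.log(state_ratio)


ratio = growth_factor(STATES_BEFORE, STATES_AFTER)
print(f"state count grew by a factor of {ratio:.2f}")
```

Plugging a measured slowdown into `implied_exponent(ratio, slowdown)` gives a rough exponent: a value near 1 would point to linear scaling, a value clearly above 1 to super-linear scaling.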
Hej! I've done some experimentation using different tree-sitter versions, I'll post my results in a bit. |
@314eter for the record, in our current measurements, we basically can't see the time spent parsing. The time is overwhelmingly dominated by compiling the query. Maybe compiling queries scales super-linearly with the number of states of the grammar? This is what our current understanding would suggest. |
@314eter Would you be able to re-generate the OCaml |
I published a new release: https://github.com/tree-sitter/tree-sitter-ocaml/releases/tag/v0.20.3 |
@ErinvanderVeen Now that #533 is merged, is there still a reason to keep this one open? |
Describe the bug
On trivial OCaml files, Topiary in its latest version takes forever to run.
To Reproduce
On my machine, this takes about 20s:
It was confirmed on @aspiwack's machine as well.
Expected behavior
I would expect Topiary to run under one second.
Environment
v0.2.1, v0.2.2 and master all seem affected. Edit: v0.2.1 is slow, but not catastrophically slow; cf. "Topiary is pretty slow on formatting OCaml codebases" (#525).
Additional context
Bisect and flamegraph coming soon.