[ruby] ANTLR Profiler Summary #4950

Merged
merged 3 commits on Sep 27, 2024

Conversation

DavidBakerEffendi
Collaborator

  • Introduced global profiling to summarize rule and parse performance across the project
  • Added a shutdown hook that writes this summary of the profiled rules to the file antlr_summary.log at the root of the project
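A minimal sketch of the idea in these two bullets, not the code merged in this PR. Only the `antlr_summary.log` file name and the summary line labels come from this page; the object name, the method names, and the choice to store nanoseconds per file are assumptions for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.Locale
import scala.collection.concurrent.TrieMap

// Hypothetical names throughout; this only illustrates "accumulate globally, dump on shutdown".
object ProfilerSummarySketch {

  // Parse cost per file in nanoseconds (assumed unit); TrieMap tolerates concurrent writers.
  private val fileCost = TrieMap.empty[String, Long]

  def recordFile(path: String, nanos: Long): Unit = fileCost.update(path, nanos)

  // Formats nanoseconds as milliseconds with two decimals, independent of the default locale.
  private def ms(nanos: Long): String = "%.2f".formatLocal(Locale.ROOT, nanos / 1e6)

  def render(project: String): String = {
    val snapshot = fileCost.readOnlySnapshot()
    val total    = snapshot.values.sum
    val avg      = if (snapshot.isEmpty) 0L else total / snapshot.size
    val worst = snapshot
      .maxByOption(_._2)
      .map { case (file, nanos) => s"$file (${ms(nanos)} ms)" }
      .getOrElse("n/a")
    Seq(
      s"Summary for project at '$project'",
      s"Total Parsed Files: ${snapshot.size}",
      s"Total Parse Time (CPU): ${ms(total)} (ms)",
      s"Avg. Parse Time Per File: ${ms(avg)} (ms)",
      s"Most Expensive File: $worst"
    ).mkString("\n")
  }

  // Registers a JVM shutdown hook that writes the summary to <projectRoot>/antlr_summary.log.
  def installHook(projectRoot: Path, project: String): scala.sys.ShutdownHookThread =
    sys.addShutdownHook {
      val out = projectRoot.resolve("antlr_summary.log")
      Files.write(out, render(project).getBytes(StandardCharsets.UTF_8))
    }
}
```

Because the hook runs on JVM shutdown, the summary is also written when the process is interrupted, which is the behaviour discussed in the review thread.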

The examples below give a good starting point for identifying the currently most expensive rules:

Railsgoat

Summary for project at 'railsgoat'
Total Parsed Files: 85
Total Parse Time (CPU): 47960.30 (ms)
Avg. Parse Time Per File: 564.24 (ms)
Most Expensive File: app/controllers/admin_controller.rb (4232.11 ms)

Most Expensive Rules By Time in Prediction
==========================================
┌──────────────────────────────────┬────────────────────┬────────────────┐
│Rule Name                         │Prediction Time (ms)│Total Lookaheads│
├──────────────────────────────────┼────────────────────┼────────────────┤
│statement                         │21423.95            │26903           │
│expressionOrCommand               │15598.24            │32143           │
│primaryValue                      │4549.64             │31558           │
│methodCallsWithParentheses        │793.60              │3668            │
│methodInvocationWithoutParentheses│729.78              │3426            │
│bracketedArrayElement             │630.86              │9194            │
│command                           │525.43              │1500            │
│methodIdentifier                  │467.97              │2749            │
│commandArgument                   │365.44              │1387            │
│classPath                         │351.54              │462             │
└──────────────────────────────────┴────────────────────┴────────────────┘

Most Expensive Rules By Total SLL & LL Lookaheads
=================================================
┌──────────────────────────────────┬────────────────────┬────────────────┐
│Rule Name                         │Prediction Time (ms)│Total Lookaheads│
├──────────────────────────────────┼────────────────────┼────────────────┤
│expressionOrCommand               │15598.24            │32143           │
│primaryValue                      │4549.64             │31558           │
│statement                         │21423.95            │26903           │
│bracketedArrayElement             │630.86              │9194            │
│methodCallsWithParentheses        │793.60              │3668            │
│methodInvocationWithoutParentheses│729.78              │3426            │
│methodIdentifier                  │467.97              │2749            │
│associationList                   │67.32               │2340            │
│operatorExpression                │39.38               │2235            │
│statements                        │4.35                │1965            │
└──────────────────────────────────┴────────────────────┴────────────────┘

ChatWoot

Summary for project at 'chatwoot'
Total Parsed Files: 760
Total Parse Time (CPU): 1821977.80 (ms)
Avg. Parse Time Per File: 2397.34 (ms)
Most Expensive File: app/models/message.rb (37559.18 ms)

Most Expensive Rules By Time in Prediction
==========================================
┌──────────────────────────────────┬────────────────────┬────────────────┐
│Rule Name                         │Prediction Time (ms)│Total Lookaheads│
├──────────────────────────────────┼────────────────────┼────────────────┤
│expressionOrCommand               │743669.31           │915973          │
│statement                         │705930.12           │641483          │
│primaryValue                      │131346.63           │617939          │
│classPath                         │72364.52            │78083           │
│primaryValueListWithAssociation   │29789.80            │20934           │
│methodCallsWithParentheses        │25917.01            │81873           │
│bracketedArrayElement             │16748.04            │11456           │
│methodInvocationWithoutParentheses│13262.68            │72987           │
│command                           │10633.65            │27423           │
│methodIdentifier                  │10165.80            │64603           │
└──────────────────────────────────┴────────────────────┴────────────────┘

Most Expensive Rules By Total SLL & LL Lookaheads
=================================================
┌──────────────────────────────────┬────────────────────┬────────────────┐
│Rule Name                         │Prediction Time (ms)│Total Lookaheads│
├──────────────────────────────────┼────────────────────┼────────────────┤
│expressionOrCommand               │743669.31           │915973          │
│statement                         │705930.12           │641483          │
│primaryValue                      │131346.63           │617939          │
│methodCallsWithParentheses        │25917.01            │81873           │
│classPath                         │72364.52            │78083           │
│methodInvocationWithoutParentheses│13262.68            │72987           │
│methodIdentifier                  │10165.80            │64603           │
│operatorExpression                │657.86              │43449           │
│associationList                   │7340.97             │38729           │
│statements                        │17.21               │36336           │
└──────────────────────────────────┴────────────────────┴────────────────┘
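Each summary ranks the same per-rule totals twice, once by prediction time and once by total lookaheads. A small sketch of that ranking, assuming the per-rule totals have already been aggregated; `RuleCost` and the function names are hypothetical, and the rows are copied from the railsgoat tables above.

```scala
// Hypothetical record type; the fields mirror the two numeric table columns.
final case class RuleCost(rule: String, predictionMs: Double, lookaheads: Long)

object RuleRankingSketch {

  // A subset of the railsgoat rows shown above.
  val railsgoat: Seq[RuleCost] = Seq(
    RuleCost("statement", 21423.95, 26903L),
    RuleCost("expressionOrCommand", 15598.24, 32143L),
    RuleCost("primaryValue", 4549.64, 31558L),
    RuleCost("bracketedArrayElement", 630.86, 9194L),
    RuleCost("statements", 4.35, 1965L)
  )

  // Top n rules by time spent in prediction, most expensive first.
  def byPredictionTime(rules: Seq[RuleCost], n: Int): Seq[RuleCost] =
    rules.sortBy(r => -r.predictionMs).take(n)

  // Top n rules by total SLL and LL lookaheads, most expensive first.
  def byLookaheads(rules: Seq[RuleCost], n: Int): Seq[RuleCost] =
    rules.sortBy(r => -r.lookaheads).take(n)

  def main(args: Array[String]): Unit = {
    println(byPredictionTime(railsgoat, 3).map(_.rule).mkString(", "))
    println(byLookaheads(railsgoat, 3).map(_.rule).mkString(", "))
  }
}
```

The two orderings differ: `statement` leads by prediction time, while `expressionOrCommand` leads by lookaheads, as in the railsgoat tables.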

@DavidBakerEffendi added the ruby label (Relates to rubysrc2cpg) on Sep 25, 2024
@DavidBakerEffendi self-assigned this on Sep 25, 2024
private val fileCost = TrieMap.empty[String, Long]
private var projectRoot: Option[Path] = None

sys.addShutdownHook {
Contributor

Hmm, I wonder if a shutdown hook is really the best option here - feels a bit messy, since we're mixing concerns: the lifetime of the JVM vs. the completion of a scan.
Sure, that's identical in all current use cases, but are we certain that's going to stay like that forever?
I'd feel better if we collect the results somewhere, maybe even pass them properly, and explicitly print them at the end of the analysis.
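A rough sketch of what that alternative could look like, assuming per-file results are returned as values and folded into a summary that the caller prints once the analysis is done. All type and function names here are hypothetical, and `parseFile` is a stand-in that fakes a cost instead of running the parser.

```scala
// Hypothetical types; none of these names appear in the PR.
final case class FileResult(path: String, parseNanos: Long)

final case class ScanSummary(files: Vector[FileResult]) {
  def add(r: FileResult): ScanSummary   = copy(files = files :+ r)
  def totalNanos: Long                  = files.map(_.parseNanos).sum
  def mostExpensive: Option[FileResult] = files.maxByOption(_.parseNanos)
}

object ExplicitSummarySketch {

  // Stand-in for the real parser: derives a fake cost from the path length.
  def parseFile(path: String): FileResult = FileResult(path, path.length * 1000L)

  // Results are passed along explicitly instead of being stored in global state.
  def scan(paths: Seq[String]): ScanSummary =
    paths.foldLeft(ScanSummary(Vector.empty))((acc, p) => acc.add(parseFile(p)))

  def main(args: Array[String]): Unit = {
    val summary = scan(Seq("a.rb", "lib/b.rb"))
    // Reporting is tied to the end of the analysis, not to the lifetime of the JVM.
    println(s"Total Parsed Files: ${summary.files.size}")
    println(s"Total Parse Time (ns): ${summary.totalNanos}")
    summary.mostExpensive.foreach(r => println(s"Most Expensive File: ${r.path}"))
  }
}
```

The trade-off is that nothing is printed if the process is killed before `scan` returns.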

Collaborator Author

@DavidBakerEffendi DavidBakerEffendi Sep 25, 2024

Right, my concern was getting metrics on projects that currently don't finish within a couple of minutes, e.g., GitLab. I do, however, understand the concern that shutdown hooks generally aren't for things like this.

Perhaps I could add a flag to detect an early exit, and have the shutdown hook dump the summary only if that is the case? Right now the results for each file are written next to that file, so they can be recovered if the scan fails.
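One way such a flag could work, as a sketch only: the scan sets a flag when it finishes normally, and the hook dumps the summary only when the flag is still unset. The object and method names are hypothetical, and `dump` stands for whatever writes the summary.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical names; illustrates "dump from the hook only on early exit".
object EarlyExitSketch {

  // Set once the scan has finished and printed its own summary.
  private val completed = new AtomicBoolean(false)

  def markCompleted(): Unit = completed.set(true)

  // Kept separate so the policy can be checked without shutting down the JVM.
  def shouldDumpFromHook: Boolean = !completed.get()

  // `dump` is supplied by the caller, e.g. a function that writes the summary file.
  def install(dump: () => Unit): scala.sys.ShutdownHookThread =
    sys.addShutdownHook {
      if (shouldDumpFromHook) dump()
    }
}
```

On a normal run the caller would print the summary itself and then call `markCompleted()`, so the hook becomes a no-op. After Ctrl+C the flag is still unset and the hook writes the partial results.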

Contributor

Not sure I understand. If a scan doesn't finish within a couple of minutes, how does a shutdown hook help with that?

Collaborator Author

If, for example, we want a summary of the most expensive rules so far (rather than waiting for hours) and use Ctrl+C to stop joern-parse prematurely, we can still get the results.

Collaborator Author

@mpollmeier Just following up on this. If I remove the hook, it's not a big issue, as the conclusion about which rules are expensive is pretty universal across repos. Would that be preferable?

Contributor

Ah ok, understood. Given that for current use cases the hook is not a problem, we can just leave it there and add a comment saying it's optional and just for debug output and can be removed if need be. Or something like that...

@DavidBakerEffendi DavidBakerEffendi merged commit f141437 into master Sep 27, 2024
5 checks passed
@DavidBakerEffendi DavidBakerEffendi deleted the dave/ruby/more-profiling branch September 27, 2024 12:49
Labels
ruby Relates to rubysrc2cpg
2 participants