
'JSON Compilation Database' sensor slow #2383

Closed
thyros opened this issue May 31, 2022 · 21 comments · Fixed by #2410

@thyros

thyros commented May 31, 2022

Describe the bug
Hi, I'm trying to run sonar-cxx on my project.
I'm having problems getting it to work when I set the sonar.cxx.jsonCompilationDatabase parameter.
Looking at the logs, the execution never gets past the parsing step:

04:51:25 03:51:25.433 DEBUG: Sensors : CXX -> CXX Clang-Tidy report import -> Zero Coverage Sensor -> Java CPD Block Indexer
04:51:25 03:51:25.433 INFO: Sensor CXX [cxx]
04:51:25 03:51:25.479 DEBUG: Parsing 'JSON Compilation Database' format

With the above parameter removed, the whole scan succeeds.

My compile_commands.json is about 130KB and 35K lines.

Thanks

To Reproduce
Steps to reproduce the behavior:

  1. Generate compile_commands.json using e.g. Ninja Generator
  2. Configure sonar-cxx by setting sonar.cxx.jsonCompilationDatabase
  3. Run sonar-scanner

Expected behavior
The compilation database is parsed, and therefore the source files are also parsed correctly.

Desktop (please complete the following information):

  • OS: Linux
  • SonarQube version: 8.9.2.46101
  • cxx plugin version: 2.0.7.3119
  • sonar-scanner version: 4.6.2.2472
@thyros thyros changed the title Parsing 'JSON Compilation Database' takes forecer Parsing 'JSON Compilation Database' takes forever May 31, 2022
@thyros
Author

thyros commented May 31, 2022

After having the job running overnight, the compilation database finally got parsed.

23:43:46 22:43:46.808 DEBUG: Parsing 'JSON Compilation Database' format
04:50:32 03:50:32.057 DEBUG: sonar.cxx.metric.api.file.suffixes: [.hxx, .hpp, .hh, .h]

Has anyone witnessed a similar issue as well?

@guwirth
Collaborator

guwirth commented May 31, 2022

Hello @thyros,

Thanks for your feedback!

I think there was already a discussion about your problem in:

Most likely the problem is #2279

Regards,

@thyros
Author

thyros commented Jun 6, 2022

Hi @guwirth,
Thanks for your reply. I read the above threads and yes, it looks like I'm suffering from the preprocessor being slow.

Our use case is that we generate the compilation database for our project and then we remove all compilation units we're not interested in. For example external libraries.
With this step, our final compilation database contains only files relevant to the project, and I'm not sure we could strip it any further. One thing we can do on our side is to try to optimize includes.
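The filtering step described above (dropping compilation units such as external libraries) could look roughly like the sketch below. It is only an illustration, not the tooling actually used here: `CompileCommand` is a hypothetical in-memory record for one compile_commands.json entry, JSON parsing is left out (it would need a library such as Jackson or Gson), and the path prefixes are made up.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilterCompilationDatabase {

    // Hypothetical in-memory form of one compile_commands.json entry.
    // Reading/writing the JSON file itself is intentionally omitted.
    record CompileCommand(String directory, String file, String command) {}

    // Keep only the entries whose "file" lies below one of the given project prefixes;
    // everything else (e.g. external libraries) is dropped.
    static List<CompileCommand> keepProjectUnits(List<CompileCommand> entries, List<String> prefixes) {
        return entries.stream()
                .filter(e -> prefixes.stream().anyMatch(e.file()::startsWith))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CompileCommand> db = List.of(
                new CompileCommand("/build", "/src/game/Player.cpp", "c++ -c Player.cpp"),
                new CompileCommand("/build", "/src/engine/Renderer.cpp", "c++ -c Renderer.cpp"),
                new CompileCommand("/build", "/src/external/zlib/inflate.c", "cc -c inflate.c"));
        // Only the game sources are of interest in this made-up example.
        keepProjectUnits(db, List.of("/src/game/")).forEach(e -> System.out.println(e.file()));
    }
}
```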

@guwirth
Collaborator

guwirth commented Jun 6, 2022

Hi @thyros,

I think there are several problems:

  1. SQ works with a root folder and analyzes all .CPP files below the root, regardless of whether they are in the Compilation Database or not. I’m not aware of an API to change this behavior.
  2. Too many includes.

Regarding 1: Are there a lot of files you don’t want or need to analyze?

Our use case is that we generate the compilation database for our project and then we remove all compilation units we're not interested in

Does the long runtime occur even after removing the ‘not interested’ includes?

To better isolate the problem, can you play around with your Compilation Database:

  1. one try: without Compilation Database => resulting time?
  2. one try: https://github.com/SonarOpenCommunity/sonar-cxx/wiki/sonar.cxx.squid.disabled => resulting time?
  3. one try: remove all includes => resulting time?
  4. one try: remove all defines => resulting time?

By the way: How much LOC has the analyzed code base?

Regards,

@guwirth
Collaborator

guwirth commented Jun 6, 2022

Hi @thyros,

Seems SonarSource is/was also working on a solution:
https://blog.sonarsource.com/alternative-way-to-configure-c-and-cpp-analysis/
https://docs.sonarqube.org/latest/analysis/languages/cfamily/

Maybe they did some API extensions we can also use?

But I still found: “It is recommended to gather all your code tree in a subdirectory of your project to avoid analysing irrelevant source files like third party dependencies.” I understand this to mean that everything below the root is read?

Regards,

@thyros
Author

thyros commented Jun 7, 2022

Hi @guwirth
I've already seen that cfamily supports compilation databases, but I'm not sure about the performance of their solution.

Sorry, I forgot to mention we also specify sonar.sources and sonar.tests to target code which we're only interested in (which also matches files in the compilation database). Luckily, they are already grouped in folders which means all the content of those folders should be scanned.

The size of the project is more or less like that:

  • Lines of Code: 387,555
  • Lines: 536,581
  • Statements: 154,710
  • Functions: 31,236
  • Classes: 3,202
  • Files: 4,921

Regarding too many includes, that's something we can definitely look at. We could probably run include-what-you-use as a first step but anything more will require quite a lot of time.

I can measure different configurations.
What do you mean by 3, 4? Removing includes and definitions from the compilation database?

@guwirth
Collaborator

guwirth commented Jun 7, 2022

Hi @thyros,

they are already grouped

That means this is not an issue for you.

The size of the project is more or less like that: LOC 387,555

We scan comparable projects in about 5 minutes.
There is also one bug that slows down scanning: #2131?
Turning on debug info and looking into the LOG file shows you where you are losing the time:

Why these tests:

1. one try: without Compilation Database => resulting time?

With this we can see whether the Compilation Database or the parsing of the code is the issue. Pass an empty Compilation Database or remove the https://github.com/SonarOpenCommunity/sonar-cxx/wiki/sonar.cxx.jsonCompilationDatabase parameter.

2. one try: https://github.com/SonarOpenCommunity/sonar-cxx/wiki/sonar.cxx.squid.disabled => resulting time?

This is a test without parsing the code, reading only your reports. Maybe reading reports is slow?

What do you mean by 3, 4? Removing includes and definitions from the compilation database?

Yes, to see if the includes or the defines in the compilation database are the problem.

Regards,

@guwirth
Collaborator

guwirth commented Jun 8, 2022

Hi @thyros,

maybe you can provide the LOG file or compilation database?

Regards

@thyros
Author

thyros commented Jun 10, 2022

Hi @guwirth ,
Sorry for the lack of responsiveness. I got pulled into other tasks at work.
I will come back to the problem next week.
Regards,

@guwirth guwirth pinned this issue Jun 11, 2022
@thyros
Author

thyros commented Jun 23, 2022

Hi @guwirth,
It took me some time to come back to you, sorry for the delay.

With sonar.cxx.jsonCompilationDatabase disabled Sensor CXX finishes the job almost in no time

14:58:10.061 INFO: Sensor CXX [cxx]
14:59:02.348 INFO: Sensor CXX [cxx] (done) | time=52287ms
15:00:01.046 INFO: Sensor CXX [cxx]
15:00:54.122 INFO: Sensor CXX [cxx] (done) | time=53076ms

With the property enabled, I was not patient enough to wait longer than 1 hour for the scan to finish.
Removing all includes or defines from the compile_commands.json did not make any visible difference (Sensor CXX did not finish under 1 hour).

I managed to create a censored version of the log file. Maybe they can give us a hint. build.log
In the meantime, I'll have a look at what I can do regarding the compilation database.

Regards,
Thyros

@guwirth
Collaborator

guwirth commented Jun 24, 2022

Hi @thyros,

Thanks for your investigations!

With sonar.cxx.jsonCompilationDatabase disabled Sensor CXX finishes the job almost in no time

This could mean two things:

  1. The jsonCompilationDatabase sensor needs a lot of time to read the database.
  2. As a result of the defines and includes the subsequent parsing of the source code becomes slower.

Removing all includes or defines from the compile_commands.json did not make any visible difference (Sensor CXX did not finish under 1 hour).

This points more towards 1).

I managed to create a censored version of the log file. Maybe they can give us a hint. build.log

The LOG file stops just when it gets interesting: it starts parsing the compilation database. This is also an indication of 1).

11:32:01.259 INFO: Sensor CXX [cxx]
11:32:01.365 DEBUG: Parsing 'JSON Compilation Database' format
// the execution stops here

Is there something special in your compilation database? Can you provide the database for testing?
The code reading the database is here:

LOG.debug("Parsing 'JSON Compilation Database' format");

It should proceed like this:

18:23:55.889 INFO: Sensor CXX [cxx]
18:23:55.983 DEBUG: sonar.cxx.metric.api.file.suffixes: [.h, .hh, .hpp]
18:23:55.985 DEBUG: 'Complex Functions' metric threshold (cyclomatic complexity): 10
18:23:55.986 DEBUG: 'Big Functions' metric threshold (LOC): 20
18:23:55.991 DEBUG: Evaluate issue exclusions for 'XXX'
18:23:55.993 DEBUG: 'XXX.cpp' generated metadata with charset 'windows-1252'
18:23:56.003 DEBUG: parsing force include: 'YYY.h'
18:23:56.004 DEBUG: process include file 'ZZZ.h'
18:23:56.046 DEBUG: parsing force include: 'Etas.h'
18:23:56.046 DEBUG: process include file 'AAA.h'
18:23:56.217 DEBUG: global include directories: ...
18:23:56.220 DEBUG: global macros: ...

Regards,

@guwirth
Collaborator

guwirth commented Jun 25, 2022

Hi @thyros,

in case you can’t provide the compilation database, it would be helpful to run the scan overnight so that we also get the LOG entries after Parsing 'JSON Compilation Database' format. With the timestamps we can see whether parsing the db is really the root cause.

Regards,

@thyros
Author

thyros commented Jun 27, 2022

Hi,
I played around with the settings a bit and made sure the compilation database contains only source files that belong to the game (e.g. the engine code is removed).
I wasn't sure that step was needed, since I had already specified the subset of the code to analyze using sonar.sources and sonar.tests. Anyway, doing that seems to decrease the execution time.
We are also looking at removing redundant -I from the compilation database but that will take some time.
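Removing redundant `-I` options from one command's argument list could be sketched as below, assuming the arguments are already split into tokens. The class and method names are hypothetical, and directories are compared as plain strings (so `/a/b` and `/a/b/` count as different). Keeping only the first occurrence is intended to leave the include search order unchanged; as far as I know, compilers such as GCC already skip a repeated include directory anyway, so this mainly shrinks the database.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupIncludes {

    // Drop repeated include directories from one compiler argument list.
    // Handles both "-Ipath" and "-I path"; the first occurrence of each directory wins.
    static List<String> dedupIncludeDirs(List<String> args) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < args.size(); i++) {
            String arg = args.get(i);
            if (arg.equals("-I") && i + 1 < args.size()) {
                String dir = args.get(++i);          // separate-token form: "-I" "dir"
                if (seen.add(dir)) {
                    out.add("-I");
                    out.add(dir);
                }
            } else if (arg.startsWith("-I") && arg.length() > 2) {
                if (seen.add(arg.substring(2))) {    // joined form: "-Idir"
                    out.add(arg);
                }
            } else {
                out.add(arg);                        // any other argument is kept as-is
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cmd = List.of("c++", "-Iinclude", "-I", "third_party",
                "-Iinclude", "-I", "third_party", "-c", "main.cpp");
        System.out.println(String.join(" ", dedupIncludeDirs(cmd)));
    }
}
```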

I've attached a log from the sensor-cxx, obfuscated a bit but should still be readable.
sensor-cxx.log

Regards,

@guwirth
Collaborator

guwirth commented Jun 27, 2022

Hi @thyros,

thank you very much.

14:21:14.747 INFO: Sensor CXX [cxx]
14:21:14.894 DEBUG: Parsing 'JSON Compilation Database' format
15:07:29.415 DEBUG: sonar.cxx.metric.api.file.suffixes: [.hxx, .hpp, .hh, .h]

The first issue is that reading the JSON db takes nearly 50 minutes. I will analyze why this is so slow.
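To find where the time goes in a log like the one above, one can look for the largest gap between consecutive timestamps. A small sketch, assuming each line starts with an `HH:mm:ss.SSS` timestamp as in sensor-cxx.log (lines with an extra CI timestamp in front, as in the first post, would need that prefix stripped first) and that the run does not cross midnight. `LogGap`, `Gap` and `largestGap` are hypothetical names.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class LogGap {

    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss.SSS");

    // The line after which the largest pause occurred, and how long that pause was.
    record Gap(String before, Duration length) {}

    // Assumes the first 12 characters of each line are the timestamp.
    static LocalTime timestamp(String line) {
        return LocalTime.parse(line.substring(0, 12), TS);
    }

    static Gap largestGap(List<String> lines) {
        Gap best = new Gap("", Duration.ZERO);
        for (int i = 1; i < lines.size(); i++) {
            Duration d = Duration.between(timestamp(lines.get(i - 1)), timestamp(lines.get(i)));
            if (d.compareTo(best.length()) > 0) {
                best = new Gap(lines.get(i - 1), d);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "14:21:14.747 INFO: Sensor CXX [cxx]",
                "14:21:14.894 DEBUG: Parsing 'JSON Compilation Database' format",
                "15:07:29.415 DEBUG: sonar.cxx.metric.api.file.suffixes: [.hxx, .hpp, .hh, .h]");
        Gap g = largestGap(log);
        System.out.println(g.length() + " after: " + g.before());
    }
}
```

Applied to the three lines quoted above, it reports a gap of about 46 minutes right after the Parsing 'JSON Compilation Database' format line.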

15:07:30.656 DEBUG: unit macros: [{MACRO:1}, {MACRO:1}, {MACRO:1}, {IMGUI_USER_CONFIG:\"/path/file.h\"}, {BUILD_VARIATION:DEV}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {__cplusplus:201402L}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {__FILE__:"file"}, {__DATE__:"??? ?? ????"}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:0}, {MACRO:1}, {MACRO:1}, {__TIME__:"??:??:??"}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}, {MACRO:1}]

A second point that could be optimized is moving unit macros that are the same for all units to the global macros (which are set only once). This could be done in the JSON db sensor.
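The idea of hoisting macros that are identical in every unit into the global macros could be sketched as follows. This is not the sonar-cxx implementation: each unit's macros are modelled as a plain `Map` from name to value, and the class and method names are hypothetical. Macros whose value differs between units (for example `__FILE__`) stay unit-specific.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HoistCommonMacros {

    // Macros (name -> value) that occur with the same value in every unit.
    static Map<String, String> commonMacros(List<Map<String, String>> units) {
        if (units.isEmpty()) {
            return Map.of();
        }
        Map<String, String> common = new LinkedHashMap<>(units.get(0));
        for (Map<String, String> unit : units) {
            // Drop every candidate that is missing or has a different value in this unit.
            common.entrySet().removeIf(e -> !e.getValue().equals(unit.get(e.getKey())));
        }
        return common;
    }

    // Per-unit macros with the hoisted (global) ones removed.
    static List<Map<String, String>> unitSpecific(List<Map<String, String>> units, Map<String, String> common) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> unit : units) {
            Map<String, String> rest = new LinkedHashMap<>(unit);
            rest.keySet().removeAll(common.keySet());
            result.add(rest);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up units loosely modelled on the unit macros in the log above.
        List<Map<String, String>> units = List.of(
                Map.of("BUILD_VARIATION", "DEV", "__cplusplus", "201402L", "__FILE__", "\"a.cpp\""),
                Map.of("BUILD_VARIATION", "DEV", "__cplusplus", "201402L", "__FILE__", "\"b.cpp\""));
        Map<String, String> global = commonMacros(units);
        System.out.println("global: " + global);
        System.out.println("per unit: " + unitSpecific(units, global));
    }
}
```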

I will have a look at it, but it can take some time.

Regards,

@guwirth guwirth added this to the 2.1.0 milestone Jun 30, 2022
guwirth added a commit to guwirth/sonar-cxx that referenced this issue Aug 2, 2022
- improve performance: search in XML tree with XPath is very slow (close SonarOpenCommunity#2383)
- add better Visual Studio support
  - support apostrophes around arguments
  - support include paths with apostrophes
  - optimize parsing arguments
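The commit mentions support for apostrophes around arguments and include paths. A simplified tokenizer with that behaviour might look like this; it is an illustrative sketch, not the code from the commit. Whitespace separates arguments except inside single or double quotes, the quote characters are dropped, and backslash escapes are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class ArgTokenizer {

    // Split a command line into arguments, honouring '...' and "..." quoting.
    static List<String> tokenize(String cmd) {
        List<String> args = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;          // 0 = not inside quotes, otherwise the open quote char
        boolean inArg = false;   // true once the current argument has started
        for (char c : cmd.toCharArray()) {
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;       // closing quote: leave quoted mode, drop the char
                } else {
                    cur.append(c);   // whitespace inside quotes is kept
                }
            } else if (c == '"' || c == '\'') {
                quote = c;           // opening quote: drop the char
                inArg = true;
            } else if (Character.isWhitespace(c)) {
                if (inArg) {         // end of the current argument
                    args.add(cur.toString());
                    cur.setLength(0);
                    inArg = false;
                }
            } else {
                cur.append(c);
                inArg = true;
            }
        }
        if (inArg) {
            args.add(cur.toString());
        }
        return args;
    }

    public static void main(String[] args) {
        tokenize("cl.exe /I'C:/My Libs/include' -DX=1 \"src/a b.cpp\"").forEach(System.out::println);
    }
}
```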
@guwirth
Collaborator

guwirth commented Aug 2, 2022

The root cause of the slow JSON db reading is a slow XPath search in CxxSquidConfiguration:

private Element findLevel(String level, @Nullable Element defaultElement) {

The problem occurs especially with many files (4,921 in the sample above), because each file has its own entry in the XML file. The XPath search works with an iterator that iterates serially over all elements, and it does so many times.
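To illustrate the effect, the sketch below compares one XPath lookup per file with a one-time `HashMap` index over the same DOM tree. The element and attribute names (`CompilationDatabase`, `File`, `path`) are hypothetical and do not reflect the real CxxSquidConfiguration format or the change in #2410. The point is only that a predicate such as `File[@path='...']` is typically evaluated by visiting the sibling elements one by one, so looking up each of n files costs O(n²) overall, while the index turns each lookup into a hash access.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FileLevelLookup {

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Build a hypothetical tree with one <File path="..."> element per unit.
    static Document buildConfig(int files) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("CompilationDatabase");
        doc.appendChild(root);
        for (int i = 0; i < files; i++) {
            Element file = doc.createElement("File");
            file.setAttribute("path", "src/file" + i + ".cpp");
            root.appendChild(file);
        }
        return doc;
    }

    // Slow pattern: each call scans the <File> siblings again (O(n) per lookup).
    // Note: a path containing an apostrophe would break this XPath string literal.
    static Element findByXPath(Document doc, String path) throws XPathExpressionException {
        return (Element) XPATH.evaluate(
                "/CompilationDatabase/File[@path='" + path + "']", doc, XPathConstants.NODE);
    }

    // Faster pattern: scan the tree once and keep a path -> element index.
    static Map<String, Element> buildIndex(Document doc) {
        Map<String, Element> index = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("File");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            index.put(e.getAttribute("path"), e);
        }
        return index;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildConfig(5000);
        Map<String, Element> index = buildIndex(doc);
        // Rough timing only (no JIT warm-up); look up every 50th file both ways.
        long t0 = System.nanoTime();
        for (int i = 0; i < 5000; i += 50) findByXPath(doc, "src/file" + i + ".cpp");
        long t1 = System.nanoTime();
        for (int i = 0; i < 5000; i += 50) index.get("src/file" + i + ".cpp");
        long t2 = System.nanoTime();
        System.out.printf("XPath: %d ms, index: %d ms%n", (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```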

guwirth added a commit to guwirth/sonar-cxx that referenced this issue Aug 3, 2022
- improve performance: search in XML tree with XPath is very slow (close SonarOpenCommunity#2383)
- add better Visual Studio support
  - support apostrophes around arguments
  - support include paths with apostrophes
  - optimize parsing arguments
@guwirth guwirth changed the title Parsing 'JSON Compilation Database' takes forever 'JSON Compilation Database' sensor slow Aug 3, 2022
@guwirth
Collaborator

guwirth commented Aug 3, 2022

@thyros We found out why the JSON DB sensor is so slow; #2410 makes it faster. Please try the latest snapshot and give feedback: https://github.com/SonarOpenCommunity/sonar-cxx/releases/tag/latest-snapshot

@thyros
Author

thyros commented Aug 5, 2022

Hi @guwirth, that's great news. I'll do my best to test your latest snapshot but it may take a while.
Thanks for your support.

@guwirth guwirth unpinned this issue Aug 9, 2022
@Kinokin

Kinokin commented Sep 20, 2022

Any chance of a v2.0.8 with this fix? This bug caused more than an hour of delay in our analysis runs...

@guwirth
Collaborator

guwirth commented Sep 20, 2022

Hi @Kinokin,

did you try https://github.com/SonarOpenCommunity/sonar-cxx/releases/tag/latest-snapshot with #2410? Should be faster...

Regards,

@Kinokin

Kinokin commented Sep 21, 2022

Use of un-versioned software is prohibited at our site :-(

@guwirth
Collaborator

guwirth commented Sep 21, 2022

Hi @Kinokin,

There is no difference between released versions and the snapshots. Both versions go through the same testing process and have a unique version number.

The next release will be v2.1, together with the next SQ 9.x LTS.

Regards,
