Analysing the project takes extremely long time #2267

Closed
Artalus opened this issue Oct 22, 2021 · 6 comments

Comments

@Artalus

Artalus commented Oct 22, 2021

Describe the bug
Basically, see #2228. I ran sonar-scanner on our project overnight, and it had not finished after 18 hours. I used this command:

sonar-scanner \
  -Dsonar.projectKey=trassir \
  -Dsonar.sources=./source \
  -Dsonar.host.url=**** \
  -Dsonar.login=**** \
  -Dsonar.password=**** \
  -Dsonar.pvs-studio.reportPath=./pvs/pvs-report.xml \
  -Dsonar.pvs-studio.licensePath=./pvs/pvs.license \
  -Dsonar.cxx.jsonCompilationDatabase=./build/compile_commands.json \
  -Dsonar.inclusions='**/*.cpp,**/*.h,**/*.c' \
  -Dsonar.exclusions='...some paths...'

From my understanding, the scanner completely ignores the compile_commands.json file and tries to analyze everything in the working directory. Until I provided exclusions=, I could see multiple parsing errors in unrelated files, such as the ffmpeg sources that are checked out with our dependencies repo. It also does this in single-threaded mode.

When using -Dsonar.cxx.squid.disabled=true, the scanner finishes in about 1-2 minutes, but does not provide any syntax highlighting (as expected).

Ways to improve the situation that I can see:

  • optimize the scan process itself (preprocessing, etc.), as was mentioned in Ability to disable CxxSquidSensor #2228 (comment)
  • process files in parallel to utilize all cores available on the machine (16 in our case)
  • process only the files mentioned in compilation database and ignore any thirdparty headers.
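
To check the third point on a given checkout, a small script can compare what lies under the source directory with what the compilation database lists. This is only a sketch: `listed_files` and `unlisted_sources` are hypothetical helper names, not part of any Sonar tooling, and the suffix set is an assumption. Headers are left out on purpose, because a compilation database lists translation units only.

```python
import json
from pathlib import Path

# Assumption: translation units use these suffixes. Headers are skipped
# because compile_commands.json never lists them.
TRANSLATION_UNIT_SUFFIXES = {".cpp", ".c"}


def listed_files(db_path):
    """Return the resolved paths of all "file" entries in a compilation database."""
    entries = json.loads(Path(db_path).read_text())
    files = set()
    for entry in entries:
        # "file" may be relative to the entry's "directory"; an absolute
        # "file" simply overrides the directory when the paths are joined.
        path = Path(entry.get("directory", ".")) / entry["file"]
        files.add(path.resolve())
    return files


def unlisted_sources(source_root, db_path):
    """Return files under source_root that the compilation database never mentions."""
    known = listed_files(db_path)
    found = {
        p.resolve()
        for p in Path(source_root).rglob("*")
        if p.is_file() and p.suffix.lower() in TRANSLATION_UNIT_SUFFIXES
    }
    return sorted(found - known)
```

Calling `unlisted_sources("./source", "./build/compile_commands.json")` with the paths from the command above would list the candidates for `sonar.exclusions`.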

To Reproduce
Run the scanner over several million LOC, I guess?..

Expected behavior
The scan completes in a reasonable amount of time.

Desktop (please complete the following information):

  • OS: Linux
  • SonarQube version: Community Edition 8.9.2 (build 46101)
  • cxx plugin version: 2.0.6 master snapshot f267915ee0878040af16339f6e3e4bd4544ffe38
  • sonar-scanner version: 4.6.2.2472
@guwirth
Collaborator

guwirth commented Oct 22, 2021

Hi @Artalus,

thanks for your feedback.

process only the files mentioned in compilation database and ignore any thirdparty headers

I'd like to start with the last point, because that would clearly be a bug in our plugin. Maybe you can turn on debug info and verify it: https://github.com/SonarOpenCommunity/sonar-cxx/wiki/Get-Debug-Information.

I understand you are using https://github.com/SonarOpenCommunity/sonar-cxx/wiki/sonar.cxx.jsonCompilationDatabase? That would mean all .CPP files listed there under "file" are handled. If the include files of those .CPP files can be found (that is, an include path is provided), they are handled as well.

  • Are more .CPP files analysed than expected?

Ignoring third-party headers would result in syntax errors:

One last hint: #2131 could also be a root cause of the slowness.

Regards,

@guwirth
Collaborator

guwirth commented Oct 22, 2021

...and another comment: https://docs.sonarqube.org/latest/project-administration/narrowing-the-focus/ is only about filtering source code files and has nothing to do with including or excluding include files.

@Artalus
Author

Artalus commented Oct 25, 2021

Are more .CPP files analysed than expected?

From my understanding, yes - it seemed to me that the plugin ultimately collected everything that matched for C++ file extensions, and tried to parse even files not mentioned in the compilation database. I cannot test this behavior right now, but I will try to provide additional info later.

Ignoring third-party headers would result in syntax errors

Is it not possible to provide source code "as is", without parsing each and every function and identifier in every header? While it is definitely a nice feature for navigating the source tree, it seems to me that the plugin bites off more than it can chew. At least for our needs, the very basic syntax highlight like "this is keyword, this is operator and this is string literal" would surely work.

If this is needed in any case because of some SonarQube quirks, can this parsing be done concurrently, at least with different files being parsed in parallel? Right now it seems that the plugin processes files one by one, which is extremely slow in a large codebase.

@guwirth
Collaborator

guwirth commented Oct 25, 2021

Hi @Artalus,

From my understanding, yes - it seemed to me that the plugin ultimately collected everything that matched for C++ file extensions

Easy to verify: Which files do you see in the UI?


SonarQube (SQ) works like this:

  1. All file extensions assigned to https://github.com/SonarOpenCommunity/sonar-cxx/wiki/sonar.cxx.file.suffixes define one language. Files with these extensions are indexed and uploaded to the SQ UI. sonar-project.properties defines the project base directory; more here: https://github.com/SonarOpenCommunity/sonar-cxx/wiki/Troubleshooting-Configuration.
  2. You can further narrow the scope with https://docs.sonarqube.org/latest/project-administration/narrowing-the-focus/. This filters the source code files (files visible in the UI).
  3. To create accurate software metrics and do the syntax highlighting, we have to parse the code. To get meaningful results there should be no syntax errors in the code (https://github.com/SonarOpenCommunity/sonar-cxx/wiki/Detect-and-fix-parsing-errors). That is why we have also implemented a preprocessor. Reading and handling all the header files is the most time-consuming part.
    • Hint: Since the cxx plugin only reads macros from the include files in order to parse the code syntactically in a correct way, it is often better to provide the missing macros instead of reading in all include files. Reading many include files (e.g. STL, Boost, MFC, ATL, ...) often slows down the analysis considerably. The recommendation is to include only include files for your own code.
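
One way to act on this hint is to collect the macros the build already defines and hand only those to the plugin, rather than every include path. The sketch below gathers the `-D` flags from a compilation database. `load_entries` and `collect_defines` are hypothetical helper names, and GCC/Clang-style flags are assumed. How the result is then passed to the plugin (for example through `sonar.cxx.defines`) should be checked against the plugin wiki.

```python
import json
import shlex
from pathlib import Path


def load_entries(db_path):
    """Read the list of entries from a compile_commands.json file."""
    return json.loads(Path(db_path).read_text())


def entry_arguments(entry):
    """Return the compiler arguments of one entry as a list of strings."""
    if "arguments" in entry:
        return list(entry["arguments"])
    return shlex.split(entry["command"])


def collect_defines(entries):
    """Collect macro definitions from -D flags as a {name: value} dict.

    If the same macro is defined differently in several entries,
    the last definition wins.
    """
    defines = {}
    for entry in entries:
        args = iter(entry_arguments(entry))
        for arg in args:
            if arg == "-D":
                # Separate form: "-D", "NAME=VALUE".
                definition = next(args, None)
            elif arg.startswith("-D"):
                # Joined form: "-DNAME=VALUE".
                definition = arg[2:]
            else:
                continue
            if not definition:
                continue
            name, sep, value = definition.partition("=")
            # GCC and Clang define a macro given without a value as 1.
            defines[name] = value if sep else "1"
    return defines
```

Macros that come from headers rather than from the command line are not covered by this and would still have to be supplied by hand.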

Thinking about this, the problem seems to be that compile_commands.json contains all the macros and include paths needed to compile each file. This is more than the CXX plugin needs. On the other hand, with this use case you have no way to filter out include paths, since sonar.exclusions does not work for header files. A solution could be to add something like

sonar.cxx.excludeIncludeDirectories=...
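
The effect such an option would have can be approximated by filtering the compilation database before the scan. The following sketch drops `-I` and `-isystem` paths that fall under given root directories. `is_under`, `strip_include_dirs` and `filter_entry` are hypothetical names, POSIX paths and GCC/Clang-style arguments are assumed, and whether the code still parses cleanly afterwards depends on which macros the dropped headers provided.

```python
import shlex
from pathlib import PurePosixPath

# Check the longer flag first so that each argument matches at most one flag.
INCLUDE_FLAGS = ("-isystem", "-I")


def is_under(path, roots):
    """True if path equals one of roots or lies somewhere below it."""
    p = PurePosixPath(path)
    return any(p == PurePosixPath(r) or PurePosixPath(r) in p.parents for r in roots)


def strip_include_dirs(args, roots):
    """Return args without include flags whose directory is under any root."""
    kept = []
    it = iter(args)
    for arg in it:
        flag = next((f for f in INCLUDE_FLAGS if arg.startswith(f)), None)
        if flag is None:
            kept.append(arg)
            continue
        if arg == flag:
            # Separate form: "-I", "/path".
            directory = next(it, None)
            if directory is None:
                kept.append(arg)
                continue
            pair = [arg, directory]
        else:
            # Joined form: "-I/path".
            directory = arg[len(flag):]
            pair = [arg]
        if not is_under(directory, roots):
            kept.extend(pair)
    return kept


def filter_entry(entry, roots):
    """Return a copy of one compilation database entry with filtered include flags."""
    result = dict(entry)
    if "arguments" in entry:
        result["arguments"] = strip_include_dirs(entry["arguments"], roots)
    else:
        args = strip_include_dirs(shlex.split(entry["command"]), roots)
        result["command"] = shlex.join(args)
    return result
```

Matching is done on whole path components, so a root of `/usr` covers `/usr/include/boost` but not `/usrlocal/include`.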

At least for our needs, the very basic syntax highlight like "this is keyword

That's also something we discussed with SonarSource. A possible solution could look like this:
https://community.sonarsource.com/t/generic-language-sensor/41185


parsing be done concurrently

Our solution is built on top of https://github.com/SonarSource/sslr, and we are not sure whether the library is thread-safe. This would certainly be a bigger effort.


There are still some other possibilities to speed up the preprocessor:

  • cache invalid header file accesses
  • support pragma once
  • add precompiled header support
  • add this "generic approach"
  • ...
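
The first two items can be illustrated with a toy include resolver. This is not the plugin's preprocessor, which is written in Java on top of SSLR. It is a minimal Python sketch with hypothetical names (`IncludeResolver`, `resolve`, `should_process`) that shows what caching failed header lookups and honouring `#pragma once` would save. The `#pragma once` detection is a naive substring check, chosen only to keep the example short.

```python
from pathlib import Path


class IncludeResolver:
    """Resolve #include names against search directories, remembering results."""

    def __init__(self, include_dirs):
        self.include_dirs = [Path(d) for d in include_dirs]
        self._cache = {}        # header name -> resolved Path, or None for a miss
        self._once = set()      # headers containing #pragma once, already processed
        self.disk_lookups = 0   # counts file system probes, for illustration only

    def resolve(self, name):
        """Return the path of header `name`, or None if no directory has it."""
        if name in self._cache:
            # Hits and misses are both answered without touching the disk.
            return self._cache[name]
        found = None
        for directory in self.include_dirs:
            self.disk_lookups += 1
            candidate = directory / name
            if candidate.is_file():
                found = candidate.resolve()
                break
        self._cache[name] = found
        return found

    def should_process(self, path):
        """Return False for a #pragma once header that was already processed."""
        if path in self._once:
            return False
        # Naive check: a real preprocessor would look at directives only.
        if "#pragma once" in path.read_text(errors="replace"):
            self._once.add(path)
        return True
```

With many translation units that include the same missing or guarded headers, the second and later requests cost a dictionary or set lookup instead of a directory scan and a re-parse.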

In the end it's always about the effort we are able to spend. Interestingly, I don't think the commercial C++ plugin based on the clang parser is really faster either :-).

A solution with manageable effort seems to be: sonar.cxx.excludeIncludeDirectories. In an extreme case you could use this to suppress all #includes. Could you support developing (or at least testing) such a solution?

Regards,

@Artalus
Author

Artalus commented Oct 26, 2021

Easy to verify: Which files do you see in the UI?

Both related and unrelated ones. For example, we have a subfolder that is not mentioned in cmake and not used by it at all, yet in the SQ UI I still see sources and headers from it, since I forgot to add it to the exclusions list. If this is how SQ works by default ("find everything that ends in .cpp and treat it as project code, unless manually excluded"), then all I have for it are swear words.

A solution could be to add something like sonar.cxx.excludeIncludeDirectories=...

Given that unrelated headers are mostly grouped under some common directories, like /usr/share/include/boost or /conan/data/qt5/include/qt, is it possible to provide only the roots of these directories, like excludeIncludeDirectories=/usr,/conan?

Interestingly, I don't think the commercial C++ plugin based on the clang parser is really faster either :-).

This IMO does not mean that "plugin's parser is on par with clang-based one", but rather "both are slow and single-threaded" 😞

Could you support developing (or at least testing) such a solution?

My current task is basically "provide means to view PVS-Studio reports conveniently". For now I have decided that this is easier to do in CI via Jenkins and its "warnings" plugin than to jump through hoops with SQ. Once I am done with this task and have some spare time, I will try to test whether excludeIncludeDirectories helps on our codebase.

@guwirth
Collaborator

guwirth commented Aug 9, 2022

Should be improved with #2410

@guwirth guwirth closed this as completed Aug 9, 2022