"Line xxx is out of range" error on UTF-16LE files #2139

Closed
cheide opened this issue May 5, 2021 · 17 comments · Fixed by #2142 or #2145

cheide commented May 5, 2021

When scanning a source tree that contains header files stored in UTF-16LE, the scan fails with an error like:

10:44:37.920 DEBUG: 'src/securpass/PasswordReset/Projects/CoreResources/CoreResource.h' generated metadata with charset 'UTF-16LE'
10:44:40.781 INFO: ------------------------------------------------------------------------
10:44:40.781 INFO: EXECUTION FAILURE
10:44:40.781 INFO: ------------------------------------------------------------------------
10:44:40.781 INFO: Total time: 5:00.246s
10:44:40.854 INFO: Final Memory: 20M/100M
10:44:40.854 INFO: ------------------------------------------------------------------------
10:44:40.854 ERROR: Error during SonarScanner execution
java.lang.IllegalArgumentException: Line 600 is out of range for file src/securpass/PasswordReset/Projects/CoreResources/CoreResource.h. File has 599 lines.
	at org.sonar.api.utils.Preconditions.checkArgument(Preconditions.java:43)
	at org.sonar.scanner.DefaultFileLinesContext.checkLineRange(DefaultFileLinesContext.java:63)
	at org.sonar.scanner.DefaultFileLinesContext.setIntValue(DefaultFileLinesContext.java:56)
	at org.sonar.plugins.cxx.CxxSquidSensor.lambda$saveFileLinesContext$0(CxxSquidSensor.java:468)

There is a previous syntax error when parsing this file in the scan log:

10:42:20.902 DEBUG: process unit 'C:\work\xxx\src\securpass\PasswordReset\Projects\CoreResources\CoreResource.h'
10:42:21.093 DEBUG: [C:\work\xxx\src\securpass\PasswordReset\Projects\CoreResources\CoreResource.h:1177]:    syntax error:       /   /     M   i   c   r   o   s   o   f   t     V   i   s   u   a   l     C   +   +     g   e   n   e   r   a   t   e   d     i   n   c   l   u   d   e     f   i   l   e   .       /   /     U   s   e   d     b   y     C   o   r   e   R   e   s   o   u   r   c   e   .   r   c       /   /

where you can see embedded nulls as "^@" in the log file between each character.
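
The embedded nulls are what one would expect when UTF-16LE bytes are decoded with a single-byte-oriented charset such as UTF-8: every ASCII character is stored as the character byte followed by a 0x00 byte. A minimal sketch of that effect follows; the class and method names are hypothetical and this is not the plugin's code.

```java
import java.nio.charset.StandardCharsets;

final class MisdecodeDemo {
  // Encodes text as UTF-16LE (no BOM is written) and then decodes the bytes
  // with the wrong charset, UTF-8, to reproduce the embedded NUL characters.
  static String misdecode(String text) {
    byte[] utf16le = text.getBytes(StandardCharsets.UTF_16LE);
    return new String(utf16le, StandardCharsets.UTF_8);
  }

  // Makes NUL characters visible the way many editors print them ("^@").
  static String showNuls(String s) {
    return s.replace("\0", "^@");
  }
}
```

For example, `MisdecodeDemo.showNuls(MisdecodeDemo.misdecode("//"))` yields `/^@/^@`, which matches the pattern seen in the log line above.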

Scanning just this file by itself reproduces the syntax error, but not the line out of range error; I do not have a reduced case for that yet.

Environment:

  • OS: Windows 10
  • SonarQube version: 8.9.0.43852 Community Edition
  • cxx plugin version: 2.0.0.2650
  • sonar-scanner version: 4.6.1.2450
@guwirth guwirth added the bug label May 5, 2021
Collaborator

guwirth commented May 5, 2021

Hi @cheide,

Thanks for your feedback. Up to now the plugin has only been tested with UTF-8. I will have a look.
It should work, but this has never been a use case so far.

Regards,

Collaborator

guwirth commented May 6, 2021

Root cause:

IncludeLexer.create(this).lex(getCodeProvider().getSourceCode(includedFile, charset));

public String getSourceCode(File file, Charset charset) throws IOException {

Collaborator

guwirth commented May 6, 2021

Hi @cheide,

I think the method that reads include files uses the predefined source encoding rather than the information from the BOM. Can you confirm that the file has a BOM?

Regards,

@guwirth guwirth added this to the 2.0.1 milestone May 6, 2021
guwirth added a commit to guwirth/sonar-cxx that referenced this issue May 6, 2021
- try to read `BOM` and use this encoding if available
- otherwise if defined use `sonar.sourceEncoding`
- otherwise `UTF-8`
- close SonarOpenCommunity#2139
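
The fallback order described in this commit message (BOM first, then `sonar.sourceEncoding`, then UTF-8) could look roughly like the sketch below. This is an illustration under assumptions, not the code from the referenced commit: `BomSniffer`, `detect`, and `decode` are hypothetical names, the `configured` parameter stands in for the value of `sonar.sourceEncoding`, and UTF-32 BOMs are deliberately ignored.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomSniffer {
  // Picks a charset: BOM if present, else the configured one, else UTF-8.
  // 'configured' models sonar.sourceEncoding and may be null (assumption).
  static Charset detect(byte[] head, Charset configured) {
    if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
    if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
    if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
    return configured != null ? configured : StandardCharsets.UTF_8;
  }

  // Decodes the content and drops a leading U+FEFF, because Java's UTF-8,
  // UTF-16LE and UTF-16BE decoders keep the BOM as a character.
  static String decode(byte[] content, Charset configured) {
    String text = new String(content, detect(content, configured));
    return text.startsWith("\uFEFF") ? text.substring(1) : text;
  }

  // True if 'data' begins with the given unsigned byte values.
  private static boolean startsWith(byte[] data, int... prefix) {
    if (data.length < prefix.length) return false;
    for (int i = 0; i < prefix.length; i++) {
      if ((data[i] & 0xFF) != prefix[i]) return false;
    }
    return true;
  }
}
```

With this sketch, a file starting with the bytes 0xFF 0xFE is decoded as UTF-16LE even when the configured encoding is UTF-8, which is the per-file behaviour the commit message aims for.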
@guwirth guwirth self-assigned this May 6, 2021
Author

cheide commented May 6, 2021

Yes, the file has the 0xFF 0xFE BOM.

I have a reduced test case, but it's...strange. Whether the "Line xxx is out of range" error occurs depends on the contents of an unrelated source file that doesn't even include a UTF-16LE header.

I've attached a zip file containing that test case and the log file from running it (with the snapshot build 2659). If I comment out the 'add' method in the struct in the .cpp file, the error no longer occurs. It only happens with this specific struct too, after eliminating various other structs that were in the same source file.

sonar_2139.zip

Collaborator

guwirth commented May 6, 2021

Collaborator

guwirth commented May 6, 2021

Regarding "error occurs depends on the contents of an unrelated source file that doesn't even include a UTF-16LE header":

I think that has to do with sonar.cxx.file.suffixes. Both files are indexed:

08:45:56.640 INFO: Indexing files...
08:45:56.653 DEBUG: 'test\GetExpiringLicenses.cpp' indexed with language 'cxx'
08:45:56.654 DEBUG: 'test\test2.h' indexed with language 'cxx'
08:45:56.655 INFO: 2 files indexed

Author

cheide commented May 6, 2021

With build 2662, both the syntax error and the "line xxx is out of range" error still occur with a full scan. With the reduced test case, it doesn't generate the syntax error (looks like build 2659 didn't either), but does still have the "line xxx is out of range" error.

The strange thing with the reduced test case is that the error only occurs if both files are included in the scan, even though the include file isn't actually included by anything. Scanning just the UTF-16LE header by itself doesn't cause the error.

There's another UTF-16LE header file where, if I substitute it into the reduced test case, it doesn't generate the "line xxx is out of range" error, but does generate the syntax error, and also notes:

10:40:06.117 DEBUG: 'test/Resource.h' generated metadata with charset 'UTF-16LE'
10:40:06.130 DEBUG: Not enough content in 'test/Resource.h' to have CPD blocks, it will not be part of the duplication detection
10:40:06.130 DEBUG: Highlighting error in file 'Resource.h' at start:9:1 end:9:1
10:40:06.130 DEBUG: Highlighting error in file 'Resource.h' at start:17:1 end:17:1
10:40:06.130 DEBUG: Highlighting error in file 'Resource.h' at start:19:1 end:19:1
10:40:06.131 DEBUG: Highlighting error in file 'Resource.h' at start:21:1 end:21:1
10:40:06.131 DEBUG: Highlighting error in file 'Resource.h' at start:23:1 end:23:1
10:40:06.131 DEBUG: Highlighting error in file 'Resource.h' at start:25:1 end:25:1
10:40:06.131 DEBUG: Highlighting error in file 'Resource.h' at start:27:1 end:27:1
10:40:06.131 DEBUG: Highlighting error in file 'Resource.h' at start:29:1 end:29:1
10:40:06.131 DEBUG: Highlighting error in file 'Resource.h' at start:31:1 end:31:1

There are only 16 lines in this Resource.h, so it seems like something might be double-counting lines somewhere. This test case is attached below.

sonar_2139_2.zip
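
One possible explanation for the apparent double counting, which the thread does not confirm, is the line terminator itself: in UTF-16LE, "\r\n" is stored as 0D 00 0A 00, so a byte-oriented decode separates "\r" and "\n" with a NUL and each of them then ends a line on its own. The sketch below illustrates this with hypothetical names (`LineCountDemo`, `countLines`); it is not the plugin's line-counting code.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.nio.charset.Charset;

final class LineCountDemo {
  // Decodes the bytes with the given charset and counts lines the way
  // BufferedReader does: "\n", "\r", and "\r\n" each terminate one line.
  static long countLines(byte[] content, Charset charset) {
    String text = new String(content, charset);
    // A StringReader holds no external resource, so it is not closed here.
    return new BufferedReader(new StringReader(text)).lines().count();
  }
}
```

For the UTF-16LE bytes of a two-line text such as "a\r\nb\r\n", decoding with UTF-16LE gives 2 lines, while decoding the same bytes as UTF-8 gives more than twice as many, which is in line with a 16-line header producing positions up to line 31.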

Collaborator

guwirth commented May 6, 2021

Hi @cheide,

Thanks for testing. I will set up a complete sample with UTF-16LE source code, headers and reports. This has not really been a use case up to now.

Regards,

@guwirth guwirth reopened this May 6, 2021
@guwirth guwirth changed the title "Line xxx is out of range" error on UTF-16LE header files "Line xxx is out of range" error on UTF-16LE files May 6, 2021
Collaborator

guwirth commented May 7, 2021

Hi @cheide,

The problem is reproducible, see #2143.
We will try to fix it.

Regards,

Collaborator

guwirth commented May 10, 2021

Hi @cheide,

I have looked into this again. Unfortunately, the plugin does not support individual UTF-16 files if sonar.sourceEncoding is not set accordingly. What works is to set sonar.sourceEncoding=UTF-16LE if all files are in this format. The change needed is unfortunately larger, so the problem cannot be solved in the short term.

What works:

  • Source code files with encoding UTF-8 with/without BOM are processed correctly regardless of the sonar.sourceEncoding setting (cxx plugin v2.0.1++).
  • Source code that is entirely in UTF-16 is supported when sonar.sourceEncoding is set correctly.
  • Reports (.TXT or .XML) are processed correctly with encoding UTF-8 and UTF-16 (with/without BOM).

Regards,

@guwirth guwirth removed this from the 2.0.1 milestone May 10, 2021
Author

cheide commented May 10, 2021

We've had to roll back to SonarQube 7.9 for other reasons anyway so this isn't critical for us, but thanks for looking into it!

Collaborator

guwirth commented May 10, 2021

Hi @cheide,

some hints:

  • cxx plugin 1.3 should have the same problem?
  • cxx plugin 2.0 also works with SonarQube 7.9
  • If you find more 2.0 issues, you are welcome to report them. Sooner or later you will also have to move to 8.9 LTS ...

Regards,

Author

cheide commented May 10, 2021

Version 1.3.3 of the plugin has generally been working for us. It does have the same syntax error on those files, but it doesn't experience the same "Line xxx is out of range" error and runs to completion. Our main goal is really just to import a Cppcheck report, and 1.3 does that successfully.

Collaborator

guwirth commented May 10, 2021

Hi @cheide,

Supporting and suppressing are two different things. Catching and ignoring the error should not be a problem.

Regards,

@guwirth guwirth mentioned this issue May 12, 2021
@guwirth guwirth added this to the 2.0.1 milestone May 12, 2021
Collaborator

guwirth commented May 12, 2021

@cheide: starting with 2.0.1.2678 the plugin behaves similarly to cxx plugin 1.3:

  • after an error it continues with a debug message in the .LOG file
  • the final solution (supporting UTF-16LE) is still open

Collaborator

guwirth commented May 17, 2021

@cheide: with cxx plugin 2.0.2 the problem is now completely fixed (#2143).

Author

cheide commented May 17, 2021

Thank you!

I wound up just converting these files to plain ASCII since there turned out to only be a small handful of them, but it could pop up again since it looks like certain versions of Visual Studio created new projects as UTF-16 source files by default for a while.
