Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn about use of String charset methods #941

Closed
wants to merge 1 commit into from

Conversation

gaul
Copy link
Contributor

@gaul gaul commented Feb 19, 2018

Using typed Charset methods avoids unneeded
UnsupportedEncodingException handling.

@gaul
Copy link
Contributor Author

gaul commented Feb 19, 2018

CI fails due to #942.

Using typed Charset methods avoids unneeded
UnsupportedEncodingException handling.
@madrob
Copy link
Contributor

madrob commented Mar 20, 2018

For a while using the typed charset was about 15% slower than using the string charset, has this gone away? I know people were explicitly using string methods because of this before.

@gaul
Copy link
Contributor Author

gaul commented Mar 20, 2018

@madrob Can you provide a reference? String.getBytes(String) basically calls String.getBytes(Charset) after consulting a custom two-element cache. Some more background:

https://blog.codecentric.de/en/2014/04/faster-cleaner-code-since-java-7/

@madrob
Copy link
Contributor

madrob commented Mar 20, 2018

Sure, here's experimental results from several communities:

https://issues.apache.org/jira/browse/LOG4J2-935 <-- description claims Java 6, 7, and 8 affected, benchmark shows only 6?
https://bugs.openjdk.java.net/browse/JDK-6764325 <-- Looks like JDK6 only maybe, but issue is still open.

Unsure what to make of these results, I figured to run my own jmh tests.

https://gist.github.com/madrob/f0688f7148aa66a5406ce1b0fc1964c9

The charset usage is slower but within the error bounds for single encoding and much worse for various encodings (an idea borrowed from the blog post you link).

Edit: And much faster in Java 9!

@cushon
Copy link
Collaborator

cushon commented Sep 19, 2023

(Sorry for the extremely slow response to this PR.)

Warning on these APIs SGTM overall.

I think it would be nice to have a suggested fix. If the charset is a string literal, we could recommend the corresponding StandardCharsets constant. And if it isn't a literal, I wonder if it's still worth emitting the diagnostic, if the fix is to add an explicit call to Charset.forName() is that still worthwhile?

Also, this might be worth splitting out into a separate check from JdkObsolete. These APIs are arguably less obsolete than the other APIs here, and JdkObsolete has been around for a while and having the new diagnostics in a separate check makes it easier to roll out.

copybara-service bot pushed a commit that referenced this pull request Sep 22, 2023
Related to #941

PiperOrigin-RevId: 567623165
copybara-service bot pushed a commit that referenced this pull request Sep 22, 2023
Related to #941

PiperOrigin-RevId: 567623165
copybara-service bot pushed a commit that referenced this pull request Sep 22, 2023
Related to #941

PiperOrigin-RevId: 567719035
@cushon
Copy link
Collaborator

cushon commented Sep 22, 2023

(Sorry for the extremely slow response to this PR.)

Warning on these APIs SGTM overall.

I think it would be nice to have a suggested fix. If the charset is a string literal, we could recommend the corresponding StandardCharsets constant. And if it isn't a literal, I wonder if it's still worth emitting the diagnostic, if the fix is to add an explicit call to Charset.forName() is that still worthwhile?

Also, this might be worth splitting out into a separate check from JdkObsolete. These APIs are arguably less obsolete than the other APIs here, and JdkObsolete has been around for a while and having the new diagnostics in a separate check makes it easier to roll out.

I added a separate check with a suggested fix in 1147184

@cushon cushon closed this Sep 22, 2023
@Marcono1234
Copy link
Contributor

@cushon, this new pattern only covers String(byte[], String) and String.getBytes(String), right?

Would it make sense to cover some other common methods and constructors which take a string charset name as well, at least Charset.forName? (incidentally one of the test templates has an example for this)

Here is SonarSource's corresponding implementation, though I guess covering all of these cases is out of scope for Error Prone; especially since some of those methods and constructors might be used quite rarely.

@cushon
Copy link
Collaborator

cushon commented Sep 22, 2023

@Marcono1234 yep, it's just those two APIs for now. I think covering more APIs definitely makes sense, I'd be happy to review PRs for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants