Default (implicit) workspace encoding becomes UTF-8 when running Eclipse on Java 18 #154
I think this is to be expected: |
But we should have a fix for that, so that we properly determine the native charset and an encoding that is not explicitly set stays the system one, not UTF-8. The whole point of the changes we made via https://bugs.eclipse.org/bugs/show_bug.cgi?id=516583 and linked bugs was to make sure workspaces and projects have an explicit encoding specified, to avoid unintended encoding changes resulting in data corruption. |
To get the native encoding, the system property file.encoding has been used so far:
String nativeEncoding = System.getProperty("file.encoding");
To get the native encoding in Java 18 as well as in older Java versions, something like the following has to be used instead:
String nativeEncodingSinceJava17 = System.getProperty("native.encoding");
String nativeEncoding = nativeEncodingSinceJava17 == null ? System.getProperty("file.encoding") : nativeEncodingSinceJava17; |
I could add this |
Yes, that would work around the issue for all Eclipse versions we have shipped up to now, and yes, that would be needed for Java 18+ only. |
But wouldn't that become a problem on its own? I understand your suggestion as "Oomph-packaged Eclipse installations remain with the Cp1252 encoding on Windows for eternity", which would make a default installation quite surprising for new developers sometime in the future (since it would not follow the then-established UTF-8 everywhere principle). And if it is not for eternity, then we only defer the issue and struggle with the same problems again at some point in the future. That's why I would rather be bold and go with the new encoding. If people didn't care about setting an encoding until now, some of them will be confused by either choice that we make. That's enough reason for me to take the simpler route. :) |
Or we leave the installer on Java 17 until we have a fix here (I'm on it). |
With https://openjdk.java.net/jeps/400 implemented in Java 18, the "file.encoding" system property became meaningless and can no longer be used to determine the system native encoding. Unfortunately, that property was widely used in Eclipse APIs and was the standard way to get the default system encoding. Both org.eclipse.ui.WorkbenchEncoding.getWorkbenchDefaultEncoding() and org.eclipse.core.resources.ResourcesPlugin.getEncoding() were using this property and now need a proper replacement. The new API first tries to get the value of the "native.encoding" property (populated by Java 18) and, if it is not there, uses the internal "sun.jnu.encoding" property (used in all supported Java versions). In case neither property is set, Charset.defaultCharset() is used as a fallback. See eclipse-platform/eclipse.platform.resources#154
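The lookup order described in this commit message can be sketched roughly as follows. This is only an illustration, not the code from the linked PRs: the class name NativeEncodingSketch and its resolve methods are hypothetical, and treating a blank property value like a missing one is an assumption of the sketch.

```java
import java.nio.charset.Charset;
import java.util.Properties;

/** Hypothetical helper for illustration; not the actual Eclipse implementation. */
final class NativeEncodingSketch {

    private NativeEncodingSketch() {
    }

    /** Resolves a native encoding name from the given properties, following the order in the commit message. */
    static String resolve(Properties props) {
        // 1. "native.encoding" is a documented system property since Java 17.
        String encoding = props.getProperty("native.encoding");
        if (encoding == null || encoding.isBlank()) {
            // 2. "sun.jnu.encoding" is an internal property, so it is only a best-effort fallback.
            encoding = props.getProperty("sun.jnu.encoding");
        }
        if (encoding == null || encoding.isBlank()) {
            // 3. Last resort: whatever the JVM considers its default charset.
            encoding = Charset.defaultCharset().name();
        }
        return encoding;
    }

    /** Convenience overload that reads the live JVM system properties. */
    static String resolve() {
        return resolve(System.getProperties());
    }
}
```

Passing the properties in as a parameter keeps the order of the three steps easy to check in isolation, without depending on the JVM the check happens to run on.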
With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getNativeEncoding() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. See eclipse-platform#154
With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getNativeEncoding() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. This PR fixes eclipse-platform/eclipse.platform.resources#154
I've pushed the API proposal that we need to fix this issue, see eclipse-platform/eclipse.platform.runtime#63, and two other PRs in platform UI / resources to accommodate this change. The platform UI one is the one that actually determines the workspace default encoding (!), not #156, which is crazy enough, because the IDE preference page that defines workspace defaults uses a workbench API that has no dependency on the resources bundle itself. OMG. But now they will both use the same API. It would be good if interested committers could review that. |
I just wanted to note that I know from some users using |
With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getNativeEncoding() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. Note: the proposed change makes sure that IF users have specified encoding via command line arguments -Dfile.encoding=XYZ, it will be still used. This PR fixes eclipse-platform/eclipse.platform.resources#154
With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getNativeEncoding() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. Note: the proposed change makes sure that IF users have specified encoding via command line arguments -Dfile.encoding=XYZ, it will be still used. See eclipse-platform#154
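How the proposed change recognizes an explicitly passed -Dfile.encoding=XYZ is not shown in this thread. One possible approach, sketched below under stated assumptions, is to inspect the JVM input arguments rather than the property value, because on Java 18+ the property reads UTF-8 even when nobody set it. The class ExplicitFileEncodingSketch and its methods are hypothetical, and letting the last occurrence win is a choice of this sketch. The special value COMPAT comes from JEP 400 and is treated here as "use the native encoding".

```java
import java.lang.management.ManagementFactory;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for illustration; the real change may detect the argument differently. */
final class ExplicitFileEncodingSketch {

    private static final String PREFIX = "-Dfile.encoding=";

    private ExplicitFileEncodingSketch() {
    }

    /** Returns the value of an explicit -Dfile.encoding=... argument, if one is present. */
    static Optional<String> explicitFileEncoding(List<String> jvmArgs) {
        Optional<String> result = Optional.empty();
        for (String arg : jvmArgs) {
            if (arg.startsWith(PREFIX) && arg.length() > PREFIX.length()) {
                // Assumption of this sketch: the last occurrence wins.
                result = Optional.of(arg.substring(PREFIX.length()));
            }
        }
        return result;
    }

    /** Prefers an explicit user choice and otherwise falls back to the given native encoding. */
    static String systemCharsetName(List<String> jvmArgs, String nativeEncoding) {
        return explicitFileEncoding(jvmArgs)
                // JEP 400: COMPAT asks for the pre-Java-18 behavior, i.e. the native encoding.
                .filter(value -> !"COMPAT".equals(value))
                .orElse(nativeEncoding);
    }

    /** Convenience overload that uses the arguments and properties of the running JVM. */
    static String systemCharsetName() {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String nativeEncoding = System.getProperty("native.encoding", Charset.defaultCharset().name());
        return systemCharsetName(jvmArgs, nativeEncoding);
    }
}
```

Whether arguments passed through eclipse.ini or the launcher show up in the JVM input arguments in every launch mode is not verified here, so this should be read as a starting point only.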
This all is a nightmare between compatibility to old non-standards, new non-standards, existing workspaces with no explicit encoding set and weird workbench/resources interdependencies. @tjwatson : WDYT about a new eclipse command line argument We would then ignore
Alternative: we have to "pimp" both ResourcesPlugin and WorkbenchEncoding classes to "understand" this Line 2290 in d914d91
Pros: no "silent" data corruption for those using Changes for the alternative (compatible) proposal would look like: |
Just another alternative: we switch to always using UTF-8, as Java does, and add a warning popup for workspaces without an explicit encoding, asking the user what to set from now on for this workspace? |
Note: I will be mostly offline for a week + 1 day from now on, so don't expect any feedback from me in that time. |
I think the "Workspace selection dialog" would be a good place for this, one could even think about a system property to disable this (e.g. if a
One might even detect if a file is stored in another encoding (I once wrote a charset detector class for that) and ask the user to convert that file if it is different. |
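A very small version of such a check, not the charset detector class mentioned above, could simply test whether a file's bytes decode cleanly in the encoding the workspace is about to assume. The class EncodingCheckSketch and the method isDecodableAs are hypothetical names. Note that this only detects content that is invalid in the target charset; it cannot tell which encoding the file was really written in.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

/** Hypothetical helper for illustration; a real detector would need more heuristics. */
final class EncodingCheckSketch {

    private EncodingCheckSketch() {
    }

    /** Returns true if the bytes decode in the given charset without malformed or unmappable input. */
    static boolean isDecodableAs(byte[] content, Charset charset) {
        try {
            charset.newDecoder()
                    // REPORT makes the decoder throw instead of silently substituting characters.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A file for which isDecodableAs(bytes, StandardCharsets.UTF_8) returns false would be a candidate for the "ask the user to convert" prompt. Pure ASCII files pass for nearly every charset, so they would never trigger it.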
With https://openjdk.java.net/jeps/400 implemented in Java 18, the "file.encoding" system property became meaningless and can no longer be used to determine the system native encoding. Unfortunately, that property was widely used in Eclipse APIs and was the standard way to get the default system encoding. Both org.eclipse.ui.WorkbenchEncoding.getWorkbenchDefaultEncoding() and org.eclipse.core.resources.ResourcesPlugin.getEncoding() were using this property and now need a proper replacement. The new API first tries to get the value of the "native.encoding" property (populated by Java 17+) and, if it is not there, uses the internal "sun.jnu.encoding" property (used in all supported Java versions). In case neither property is set, Charset.defaultCharset() is used as a fallback. See eclipse-platform/eclipse.platform.resources#154
With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getSystemCharset() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. Note: the proposed change makes sure that IF users have specified encoding via command line arguments -Dfile.encoding=XYZ, it will be still used. This PR fixes eclipse-platform/eclipse.platform.resources#154
With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getSystemCharset() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. Note: the proposed change makes sure that IF users have specified encoding via command line arguments -Dfile.encoding=XYZ, it will be still used. See eclipse-platform#154
With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getSystemCharset() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. Note: the proposed change makes sure that IF users have specified encoding via command line arguments -Dfile.encoding=XYZ, it will be still used. See #154
When running Eclipse 4.24 with Java 18 (tested with JustJ 18.0.1 and Temurin-18.0.1+10 on Windows 10), the encoding of a workspace that was created with Eclipse 4.23 (with the default encoding Cp1252) becomes UTF-8. The same does not happen when running Eclipse 4.24 with Java 17.
Steps to reproduce:
→ preferences General > Workspace: Text file encoding: Default (Cp1252)
→ Project > Properties: Resource: Text file encoding: Inherited from container (Cp1252)
Äpp
→ right-click Äpp.java: Properties: Resource: Text file encoding: Default (inherited from container: Cp1252)
→ preferences General > Workspace: Text file encoding: Default (UTF-8)
→ Project > Properties: Resource: Text file encoding: Inherited from container (UTF-8)
→ right-click Äpp.java: Properties: Resource: Text file encoding: Default (inherited from container: UTF-8)
→ "Project 'Hello Wörld' has no explicit encoding set" warning
→ "Syntax error on token "Invalid Character", delete this token" error in Äpp.java for the first character of the class name
See also eclipse-platform/eclipse.platform#86
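The "Invalid Character" error for the first character of the class name is consistent with a Cp1252 file being read as UTF-8. The following sketch reproduces that effect outside of Eclipse; the class MisdecodeSketch and the method reinterpret are hypothetical names, and the sketch assumes the windows-1252 charset is available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only. */
final class MisdecodeSketch {

    private MisdecodeSketch() {
    }

    /** Encodes the text with one charset and decodes the resulting bytes with another. */
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        byte[] onDisk = text.getBytes(writtenAs);
        // new String(byte[], Charset) replaces malformed input with U+FFFD instead of failing.
        return new String(onDisk, readAs);
    }

    public static void main(String[] args) {
        String seen = reinterpret("\u00C4pp", Charset.forName("windows-1252"), StandardCharsets.UTF_8);
        // Print the code points so the result does not depend on the console encoding.
        seen.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
        System.out.println(Character.isJavaIdentifierStart(seen.charAt(0)));
    }
}
```

In Cp1252 the letter Ä is the single byte 0xC4, which UTF-8 treats as the start of a two-byte sequence. Since the next byte is a plain "p", the decoder substitutes the replacement character U+FFFD, which is not a valid start of a Java identifier.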