Default (implicit) workspace encoding becomes UTF-8 when running Eclipse on Java 18 #154

howlger · 2022-06-09T17:41:02Z

When running Eclipse 4.24 with Java 18 (tested with JustJ 18.0.1 and Temurin-18.0.1+10 on Windows 10), the encoding of a workspace that was created with Eclipse 4.23 (with the default encoding Cp1252) becomes UTF-8. The same does not happen when running Eclipse 4.24 with Java 17.

Steps to reproduce:

With Eclipse IDE for Java Developers 2022-03:
1. Create a new workspace
  → preferences General > Workspace: Text file encoding: Default (Cp1252)
2. Inside the new workspace create a new project Hello Wörld
  → Project > Properties: Resource: Text file encoding: Inherited from container (Cp1252)
3. Inside the new project create a new class Äpp
  → right-click Äpp.java: Properties: Resource: Text file encoding: Default (inherited from container: Cp1252)
Open the workspace with Eclipse IDE for Java Developers 2022-06 with an installed JustJ Adoptium OpenJDK Hotspot JRE Complete 18.0.1:
→ preferences General > Workspace: Text file encoding: Default (UTF-8)
→ Project > Properties: Resource: Text file encoding: Inherited from container (UTF-8)
→ right-click Äpp.java: Properties: Resource: Text file encoding: Default (inherited from container: UTF-8)
→ "Project 'Hello Wörld' has no explicit encoding set" warning
→ "Syntax error on token "Invalid Character", delete this token" error in Äpp.java for the first character of the class name

See also eclipse-platform/eclipse.platform#86

The text was updated successfully, but these errors were encountered:

merks · 2022-06-09T17:49:39Z

I think this is to be expected:

https://openjdk.java.net/jeps/400

iloveeclipse · 2022-06-09T18:17:35Z

I think this is to be expected:

https://openjdk.java.net/jeps/400

But we should have a fix for that, so that we properly determine native charset, and the not explicitly set encoding stays the system one, not UTF8.

The whole point of changes we did via https://bugs.eclipse.org/bugs/show_bug.cgi?id=516583 and linked bugs was to make sure workspaces and projects have explicit encoding specified, to avoid unattended encoding changes resulting in data corruption.
We (me) just missed the point that native charset detection has to change too because of breaking JEP-400 changes.

howlger · 2022-06-09T21:41:22Z

As @merks already pointed out, this is caused by JEP 400, so the following can be used as workaround.

Workaround when running Eclipse 2022-06 (4.24) with Java 18:
In eclipse.ini add the following line after the line -vmargs:

-Dfile.encoding=COMPAT

howlger · 2022-06-09T21:55:24Z

To get the native encoding, the system property file.encoding can be used in Java 17 or lower:

String nativeEncoding = System.getProperty("file.encoding");

To get the native encoding in Java 18 as well as in older Java versions, something like the following has to be used instead:

String nativeEncodingSinceJava17 = System.getProperty("native.encoding");
String nativeEncoding = nativeEncodingSinceJava17 == null ? System.getProperty("file.encoding") : nativeEncodingSinceJava17;

merks · 2022-06-10T09:09:54Z

I could add this -Dfile.encoding=COMPAT to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?

iloveeclipse · 2022-06-10T09:15:38Z

I could add this -Dfile.encoding=COMPAT to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?

Yes, that would work around the issue for all Eclipses we shipped up to now, and yes, that would needed for Java 18+ only.

Bananeweizen · 2022-06-10T09:29:55Z

I could add this -Dfile.encoding=COMPAT to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?

But wouldn't that become a problem on its own? I understand your suggestion as "Oomph packaged Eclipse installations remain with the Cp1252 encoding on Windows for eternity", which would make a default installation quite surprising for new developers sometime in the future (since it would not follow the then established UTF-8 everywhere principle).

And if it is not for eternity, then we only defer the issue and struggle with the same problems again at some point in the future. That's why I would rather be bold and go with the new encoding. If people didn't care about setting an encoding until now, some of them will be confused by either choice that we take. That's enough reason for me to take the more simple route. :)

iloveeclipse · 2022-06-10T09:32:23Z

Or we leave installer with java 17 till we have a fix here (I'm on it).

With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Unfortunately, that property was widely used in Eclipse API's and was the standard way to get default system encoding. So both org.eclipse.ui.WorkbenchEncoding.getWorkbenchDefaultEncoding() and org.eclipse.core.resources.ResourcesPlugin.getEncoding() were using this property and need now a proper replacement. The new API tries first to get the value of the "native.encoding" property (populated by Java 18), and if not there, uses internal "sun.jnu.encoding" property (used in all supported Java versions). In case neither property is set, Charset.defaultCharset() is used as fallback solution. See eclipse-platform/eclipse.platform.resources#154

With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getNativeEncoding() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. See eclipse-platform#154

With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getNativeEncoding() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. This PR fixes eclipse-platform/eclipse.platform.resources#154

iloveeclipse · 2022-06-10T12:01:11Z

I've pushed API proposal that we need to fix this issue, see eclipse-platform/eclipse.platform.runtime#63, and two other PR's in platform UI / resources to accommodate this change.

The one is the one that actually determines the workspace default encoding (!), not the #156, which is crazy enough, because IDE preference page that defines workspace defaults uses workbench API that has no dependencies to resources bundle itself. OMG. But now they both will use same API.

Would be good if interested committers could review that.

laeubi · 2022-06-10T12:07:55Z

I just wanted to note that I know from some users using -Dfile.encoding=... in their eclipse ini, so even if that would work for the 'default user' other will still be broken.

With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getNativeEncoding() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. Note: the proposed change makes sure that IF users have specified encoding via command line arguments -Dfile.encoding=XYZ, it will be still used. This PR fixes eclipse-platform/eclipse.platform.resources#154

With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getNativeEncoding() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. Note: the proposed change makes sure that IF users have specified encoding via command line arguments -Dfile.encoding=XYZ, it will be still used. See eclipse-platform#154

iloveeclipse · 2022-06-10T14:45:06Z

I just wanted to note that I know from some users using -Dfile.encoding=... in their eclipse ini, so even if that would work for the 'default user' other will still be broken.

This all is a nightmare between compatibility to old non-standards, new non-standards, existing workspaces with no explicit encoding set and weird workbench/resources interdependencies.

@tjwatson : WDYT about a new eclipse command line argument -defaultWorkspaceEncoding ?
That would replace -Dfile.encoding=... for users who want specify default value for the workspace encoding for new workspaces and for workspaces without explicit encoding set.

We would then ignore -Dfile.encoding=... completely.

Pros: we get rid of using non-supported / broken -Dfile.encoding=... property.
Cons: all existing installation that relied on -Dfile.encoding=... set to some value would get native encoding instead, silent data corruption possible.

Alternative: we have to "pimp" both ResourcesPlugin and WorkbenchEncoding classes to "understand" this -Dfile.encoding=... by reading JVM command line arguments (and not the system property) like here:

eclipse.platform.resources/bundles/org.eclipse.core.resources/src/org/eclipse/core/internal/resources/Workspace.java

Line 2290 in d914d91

    
           List<String> commandLineArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

Pros: no "silent" data corruption for those using -Dfile.encoding=... property.
Cons: ugly code written to support non standard property obsoleted in Java 18...

Changes for the alternative (compatible) proposal would look like:

laeubi · 2022-06-10T14:51:14Z

Just another alternative: We switch to always use UTF-8 as java does, and add a warning popup for workspace without an encoding asking the user what to set from now on for this workspace?

iloveeclipse · 2022-06-10T15:22:20Z

add a warning popup for workspace without an encoding asking the user what to set from now on for this workspace?

Interesting idea.
Instead a warning dialog after the fact, one can also ask to confirm / change the decision with the help of similar dialog:

Something like: Your workspace haven't explicitly specified file encoding, so "XYZ native one" was used. Do you want to continue using "XYZ native one"?

The problem with "we switch to UTF-8" is that users may not realize that they data will be corrupted from that moment on. So we have to propose native encoding OR the one specified via -Dfile.encoding=... command line anyway, because that was the encoding with which the data was saved.

I see a problem here with all the "custom" products based on platform, where "workspace" doesn't mean anything or mean something completely different and they will now get a completely unrelated / strange popup saying something about "workspace", "encoding", bla bla.

So if we go this way, we probably should add this in the IDEApplication, but in that case those products that are not using IDEApplication and care about workspace encoding, will miss the warning. But I think that's OK considered that every product owner is supposed to read N&N before product update to the new platform version. At least responsible products owners do that.

iloveeclipse · 2022-06-10T15:32:18Z

Note: I will be mostly offline for a week + 1 day from now on, so don't expect any feedback from me in that time.

laeubi · 2022-06-11T03:58:17Z

So if we go this way, we probably should add this in the IDEApplication

I think the "Workspace selection dialog" would be a good place for this, one could even think about a system property to disable this (e.g. if a -DeclipseEncoding=... is set then always use that), ein think it should the has three choices where the first is the default selected:

Detected System Encoding
UTF-8 (recommended for interoperability)
Other

One might even detect if a file is stored in another encoding (I once wrote a charset detector class for that) and ask the user to convert that file if it is different.

With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Unfortunately, that property was widely used in Eclipse API's and was the standard way to get default system encoding. So both org.eclipse.ui.WorkbenchEncoding.getWorkbenchDefaultEncoding() and org.eclipse.core.resources.ResourcesPlugin.getEncoding() were using this property and need now a proper replacement. The new API tries first to get the value of the "native.encoding" property (populated by Java 17+), and if not there, uses internal "sun.jnu.encoding" property (used in all supported Java versions). In case neither property is set, Charset.defaultCharset() is used as fallback solution. See eclipse-platform/eclipse.platform.resources#154

With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getSystemCharset() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. Note: the proposed change makes sure that IF users have specified encoding via command line arguments -Dfile.encoding=XYZ, it will be still used. This PR fixes eclipse-platform/eclipse.platform.resources#154

With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getSystemCharset() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. Note: the proposed change makes sure that IF users have specified encoding via command line arguments -Dfile.encoding=XYZ, it will be still used. See eclipse-platform#154

With https://openjdk.java.net/jeps/400 implemented in Java 18, "file.encoding" system property became meaningless and can't be used anymore to determine system native encoding. Instead, use new Platform.getSystemCharset() API that tries to provide a suitable replacement compatible with Java 18+ and previous Java versions. Note: the proposed change makes sure that IF users have specified encoding via command line arguments -Dfile.encoding=XYZ, it will be still used. See #154

iloveeclipse changed the title ~~Workspace encoding becomes UTF-8 when running Eclipse 4.24 with Java 18 on Windows 10~~ Default (implicit) workspace encoding becomes UTF-8 when running Eclipse on Java 18 Jun 9, 2022

iloveeclipse self-assigned this Jun 9, 2022

iloveeclipse added the plan Planned bugs/enhancements for a release label Jun 9, 2022

iloveeclipse added this to the 4.25 milestone Jun 9, 2022

Bananeweizen mentioned this issue Jun 10, 2022

In "Project > Properties: Resource" "Restore Defaults" removes explicit project encoding eclipse-platform/eclipse.platform#86

Open

iloveeclipse mentioned this issue Jun 10, 2022

Add Platform.getNativeEncoding() API eclipse-platform/eclipse.platform.runtime#63

Closed

iloveeclipse mentioned this issue Jun 10, 2022

Use System Encoding for java command line argument files (fixes #45) eclipse-jdt/eclipse.jdt.debug#47

Merged

iloveeclipse mentioned this issue Jun 10, 2022

Don't use "file.encoding", use Platform.getSystemCharset() instead #156

Merged

iloveeclipse mentioned this issue Jun 10, 2022

Don't use "file.encoding", use Platform.getSystemCharset() instead eclipse-platform/eclipse.platform.ui#125

Merged

iloveeclipse mentioned this issue Jun 21, 2022

Add Platform.getSystemCharset() API eclipse-platform/eclipse.platform#38

Merged

iloveeclipse closed this as completed in eclipse-platform/eclipse.platform.ui@51ff449 Jun 22, 2022

iloveeclipse modified the milestones: 4.25, 4.25 M1 Jun 30, 2022

vogella mentioned this issue Jul 2, 2024

Reset preference should set the encoding to UTF-8 eclipse-platform/eclipse.platform#1458

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default (implicit) workspace encoding becomes UTF-8 when running Eclipse on Java 18 #154

Default (implicit) workspace encoding becomes UTF-8 when running Eclipse on Java 18 #154

howlger commented Jun 9, 2022

merks commented Jun 9, 2022

iloveeclipse commented Jun 9, 2022 •

edited

Loading

howlger commented Jun 9, 2022

howlger commented Jun 9, 2022

merks commented Jun 10, 2022

iloveeclipse commented Jun 10, 2022

Bananeweizen commented Jun 10, 2022

iloveeclipse commented Jun 10, 2022

iloveeclipse commented Jun 10, 2022

laeubi commented Jun 10, 2022

iloveeclipse commented Jun 10, 2022

laeubi commented Jun 10, 2022

iloveeclipse commented Jun 10, 2022 •

edited

Loading

iloveeclipse commented Jun 10, 2022

laeubi commented Jun 11, 2022

Default (implicit) workspace encoding becomes UTF-8 when running Eclipse on Java 18 #154

Default (implicit) workspace encoding becomes UTF-8 when running Eclipse on Java 18 #154

Comments

howlger commented Jun 9, 2022

merks commented Jun 9, 2022

iloveeclipse commented Jun 9, 2022 • edited Loading

howlger commented Jun 9, 2022

howlger commented Jun 9, 2022

merks commented Jun 10, 2022

iloveeclipse commented Jun 10, 2022

Bananeweizen commented Jun 10, 2022

iloveeclipse commented Jun 10, 2022

iloveeclipse commented Jun 10, 2022

laeubi commented Jun 10, 2022

iloveeclipse commented Jun 10, 2022

laeubi commented Jun 10, 2022

iloveeclipse commented Jun 10, 2022 • edited Loading

iloveeclipse commented Jun 10, 2022

laeubi commented Jun 11, 2022

iloveeclipse commented Jun 9, 2022 •

edited

Loading

iloveeclipse commented Jun 10, 2022 •

edited

Loading