Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default (implicit) workspace encoding becomes UTF-8 when running Eclipse on Java 18 #154

Closed
howlger opened this issue Jun 9, 2022 · 15 comments
Assignees
Labels
plan Planned bugs/enhancements for a release
Milestone

Comments

@howlger
Copy link

howlger commented Jun 9, 2022

When running Eclipse 4.24 with Java 18 (tested with JustJ 18.0.1 and Temurin-18.0.1+10 on Windows 10), the encoding of a workspace that was created with Eclipse 4.23 (with the default encoding Cp1252) becomes UTF-8. The same does not happen when running Eclipse 4.24 with Java 17.

Steps to reproduce:

  1. With Eclipse IDE for Java Developers 2022-03:
    1. Create a new workspace
      → preferences General > Workspace: Text file encoding: Default (Cp1252)
    2. Inside the new workspace create a new project Hello Wörld
      Project > Properties: Resource: Text file encoding: Inherited from container (Cp1252)
    3. Inside the new project create a new class Äpp
      → right-click Äpp.java: Properties: Resource: Text file encoding: Default (inherited from container: Cp1252)
  2. Open the workspace with Eclipse IDE for Java Developers 2022-06 with an installed JustJ Adoptium OpenJDK Hotspot JRE Complete 18.0.1:
    → preferences General > Workspace: Text file encoding: Default (UTF-8)
    Project > Properties: Resource: Text file encoding: Inherited from container (UTF-8)
    → right-click Äpp.java: Properties: Resource: Text file encoding: Default (inherited from container: UTF-8)
    "Project 'Hello Wörld' has no explicit encoding set" warning
    "Syntax error on token "Invalid Character", delete this token" error in Äpp.java for the first character of the class name

See also eclipse-platform/eclipse.platform#86

@merks
Copy link
Contributor

merks commented Jun 9, 2022

I think this is to be expected:

https://openjdk.java.net/jeps/400

@iloveeclipse
Copy link
Member

iloveeclipse commented Jun 9, 2022

I think this is to be expected:

https://openjdk.java.net/jeps/400

But we should have a fix for that, so that we properly determine native charset, and the not explicitly set encoding stays the system one, not UTF8.

The whole point of changes we did via https://bugs.eclipse.org/bugs/show_bug.cgi?id=516583 and linked bugs was to make sure workspaces and projects have explicit encoding specified, to avoid unattended encoding changes resulting in data corruption.
We (me) just missed the point that native charset detection has to change too because of breaking JEP-400 changes.

@iloveeclipse iloveeclipse changed the title Workspace encoding becomes UTF-8 when running Eclipse 4.24 with Java 18 on Windows 10 Default (implicit) workspace encoding becomes UTF-8 when running Eclipse on Java 18 Jun 9, 2022
@iloveeclipse iloveeclipse self-assigned this Jun 9, 2022
@iloveeclipse iloveeclipse added the plan Planned bugs/enhancements for a release label Jun 9, 2022
@iloveeclipse iloveeclipse added this to the 4.25 milestone Jun 9, 2022
@howlger
Copy link
Author

howlger commented Jun 9, 2022

As @merks already pointed out, this is caused by JEP 400, so the following can be used as workaround.

Workaround when running Eclipse 2022-06 (4.24) with Java 18:
In eclipse.ini add the following line after the line -vmargs:

-Dfile.encoding=COMPAT

@howlger
Copy link
Author

howlger commented Jun 9, 2022

To get the native encoding, the system property file.encoding can be used in Java 17 or lower:

String nativeEncoding = System.getProperty("file.encoding");

To get the native encoding in Java 18 as well as in older Java versions, something like the following has to be used instead:

String nativeEncodingSinceJava17 = System.getProperty("native.encoding");
String nativeEncoding = nativeEncodingSinceJava17 == null ? System.getProperty("file.encoding") : nativeEncodingSinceJava17;

@merks
Copy link
Contributor

merks commented Jun 10, 2022

I could add this -Dfile.encoding=COMPAT to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?

@iloveeclipse
Copy link
Member

I could add this -Dfile.encoding=COMPAT to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?

Yes, that would work around the issue for all Eclipses we shipped up to now, and yes, that would needed for Java 18+ only.

@Bananeweizen
Copy link
Contributor

I could add this -Dfile.encoding=COMPAT to the JRE-specific tasks for the 18.x JRE in the catalog. I suppose old VMs would not like this option, right?

But wouldn't that become a problem on its own? I understand your suggestion as "Oomph packaged Eclipse installations remain with the Cp1252 encoding on Windows for eternity", which would make a default installation quite surprising for new developers sometime in the future (since it would not follow the then established UTF-8 everywhere principle).

And if it is not for eternity, then we only defer the issue and struggle with the same problems again at some point in the future. That's why I would rather be bold and go with the new encoding. If people didn't care about setting an encoding until now, some of them will be confused by either choice that we take. That's enough reason for me to take the more simple route. :)

@iloveeclipse
Copy link
Member

Or we leave installer with java 17 till we have a fix here (I'm on it).

iloveeclipse added a commit to iloveeclipse/eclipse.platform.runtime that referenced this issue Jun 10, 2022
With https://openjdk.java.net/jeps/400 implemented in Java 18,
"file.encoding" system property became meaningless and can't be used
anymore to determine system native encoding.

Unfortunately, that property was widely used in Eclipse API's and was
the standard way to get default system encoding. So both
org.eclipse.ui.WorkbenchEncoding.getWorkbenchDefaultEncoding() and
org.eclipse.core.resources.ResourcesPlugin.getEncoding() were using this
property and need now a proper replacement.

The new API tries first to get the value of the "native.encoding"
property (populated by Java 18), and if not there, uses internal
"sun.jnu.encoding" property (used in all supported Java versions).
In case neither property is set, Charset.defaultCharset() is used as
fallback solution.

See eclipse-platform/eclipse.platform.resources#154
iloveeclipse added a commit to iloveeclipse/eclipse.platform.runtime that referenced this issue Jun 10, 2022
With https://openjdk.java.net/jeps/400 implemented in Java 18,
"file.encoding" system property became meaningless and can't be used
anymore to determine system native encoding.

Unfortunately, that property was widely used in Eclipse API's and was
the standard way to get default system encoding. So both
org.eclipse.ui.WorkbenchEncoding.getWorkbenchDefaultEncoding() and
org.eclipse.core.resources.ResourcesPlugin.getEncoding() were using this
property and need now a proper replacement.

The new API tries first to get the value of the "native.encoding"
property (populated by Java 18), and if not there, uses internal
"sun.jnu.encoding" property (used in all supported Java versions).
In case neither property is set, Charset.defaultCharset() is used as
fallback solution.

See eclipse-platform/eclipse.platform.resources#154
iloveeclipse added a commit to iloveeclipse/eclipse.platform.resources that referenced this issue Jun 10, 2022
With https://openjdk.java.net/jeps/400 implemented in Java 18,
"file.encoding" system property became meaningless and can't be used
anymore to determine system native encoding.

Instead, use new Platform.getNativeEncoding() API that tries to provide
a suitable replacement compatible with Java 18+ and previous Java
versions.

See eclipse-platform#154
iloveeclipse added a commit to iloveeclipse/eclipse.platform.ui that referenced this issue Jun 10, 2022
With https://openjdk.java.net/jeps/400 implemented in Java 18,
"file.encoding" system property became meaningless and can't be used
anymore to determine system native encoding.

Instead, use new Platform.getNativeEncoding() API that tries to provide
a suitable replacement compatible with Java 18+ and previous Java
versions.

This PR fixes
eclipse-platform/eclipse.platform.resources#154
@iloveeclipse
Copy link
Member

I've pushed API proposal that we need to fix this issue, see eclipse-platform/eclipse.platform.runtime#63, and two other PR's in platform UI / resources to accommodate this change.

The one is the one that actually determines the workspace default encoding (!), not the #156, which is crazy enough, because IDE preference page that defines workspace defaults uses workbench API that has no dependencies to resources bundle itself. OMG. But now they both will use same API.

Would be good if interested committers could review that.

@laeubi
Copy link
Contributor

laeubi commented Jun 10, 2022

I just wanted to note that I know from some users using -Dfile.encoding=... in their eclipse ini, so even if that would work for the 'default user' other will still be broken.

iloveeclipse added a commit to iloveeclipse/eclipse.platform.ui that referenced this issue Jun 10, 2022
With https://openjdk.java.net/jeps/400 implemented in Java 18,
"file.encoding" system property became meaningless and can't be used
anymore to determine system native encoding.

Instead, use new Platform.getNativeEncoding() API that tries to provide
a suitable replacement compatible with Java 18+ and previous Java
versions.

Note: the proposed change makes sure that IF users have specified
encoding via command line arguments -Dfile.encoding=XYZ, it will be
still used.

This PR fixes
eclipse-platform/eclipse.platform.resources#154
iloveeclipse added a commit to iloveeclipse/eclipse.platform.resources that referenced this issue Jun 10, 2022
With https://openjdk.java.net/jeps/400 implemented in Java 18,
"file.encoding" system property became meaningless and can't be used
anymore to determine system native encoding.

Instead, use new Platform.getNativeEncoding() API that tries to provide
a suitable replacement compatible with Java 18+ and previous Java
versions.

Note: the proposed change makes sure that IF users have specified
encoding via command line arguments -Dfile.encoding=XYZ, it will be
still used.

See eclipse-platform#154
@iloveeclipse
Copy link
Member

I just wanted to note that I know from some users using -Dfile.encoding=... in their eclipse ini, so even if that would work for the 'default user' other will still be broken.

This all is a nightmare between compatibility to old non-standards, new non-standards, existing workspaces with no explicit encoding set and weird workbench/resources interdependencies.

@tjwatson : WDYT about a new eclipse command line argument -defaultWorkspaceEncoding ?
That would replace -Dfile.encoding=... for users who want specify default value for the workspace encoding for new workspaces and for workspaces without explicit encoding set.

We would then ignore -Dfile.encoding=... completely.

  • Pros: we get rid of using non-supported / broken -Dfile.encoding=... property.
  • Cons: all existing installation that relied on -Dfile.encoding=... set to some value would get native encoding instead, silent data corruption possible.

Alternative: we have to "pimp" both ResourcesPlugin and WorkbenchEncoding classes to "understand" this -Dfile.encoding=... by reading JVM command line arguments (and not the system property) like here:

List<String> commandLineArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

Pros: no "silent" data corruption for those using -Dfile.encoding=... property.
Cons: ugly code written to support non standard property obsoleted in Java 18...

Changes for the alternative (compatible) proposal would look like:

@laeubi
Copy link
Contributor

laeubi commented Jun 10, 2022

Just another alternative: We switch to always use UTF-8 as java does, and add a warning popup for workspace without an encoding asking the user what to set from now on for this workspace?

@iloveeclipse
Copy link
Member

iloveeclipse commented Jun 10, 2022

add a warning popup for workspace without an encoding asking the user what to set from now on for this workspace?

Interesting idea.
Instead a warning dialog after the fact, one can also ask to confirm / change the decision with the help of similar dialog:
image

Something like: Your workspace haven't explicitly specified file encoding, so "XYZ native one" was used. Do you want to continue using "XYZ native one"?

The problem with "we switch to UTF-8" is that users may not realize that they data will be corrupted from that moment on. So we have to propose native encoding OR the one specified via -Dfile.encoding=... command line anyway, because that was the encoding with which the data was saved.

I see a problem here with all the "custom" products based on platform, where "workspace" doesn't mean anything or mean something completely different and they will now get a completely unrelated / strange popup saying something about "workspace", "encoding", bla bla.

So if we go this way, we probably should add this in the IDEApplication, but in that case those products that are not using IDEApplication and care about workspace encoding, will miss the warning. But I think that's OK considered that every product owner is supposed to read N&N before product update to the new platform version. At least responsible products owners do that.

@iloveeclipse
Copy link
Member

Note: I will be mostly offline for a week + 1 day from now on, so don't expect any feedback from me in that time.

@laeubi
Copy link
Contributor

laeubi commented Jun 11, 2022

So if we go this way, we probably should add this in the IDEApplication

I think the "Workspace selection dialog" would be a good place for this, one could even think about a system property to disable this (e.g. if a -DeclipseEncoding=... is set then always use that), ein think it should the has three choices where the first is the default selected:

  1. Detected System Encoding
  2. UTF-8 (recommended for interoperability)
  3. Other

One might even detect if a file is stored in another encoding (I once wrote a charset detector class for that) and ask the user to convert that file if it is different.

iloveeclipse added a commit to eclipse-platform/eclipse.platform that referenced this issue Jun 22, 2022
With https://openjdk.java.net/jeps/400 implemented in Java 18,
"file.encoding" system property became meaningless and can't be used
anymore to determine system native encoding.

Unfortunately, that property was widely used in Eclipse API's and was
the standard way to get default system encoding. So both
org.eclipse.ui.WorkbenchEncoding.getWorkbenchDefaultEncoding() and
org.eclipse.core.resources.ResourcesPlugin.getEncoding() were using this
property and need now a proper replacement.

The new API tries first to get the value of the "native.encoding"
property (populated by Java 17+), and if not there, uses internal
"sun.jnu.encoding" property (used in all supported Java versions).
In case neither property is set, Charset.defaultCharset() is used as
fallback solution.

See eclipse-platform/eclipse.platform.resources#154
iloveeclipse added a commit to iloveeclipse/eclipse.platform.ui that referenced this issue Jun 22, 2022
With https://openjdk.java.net/jeps/400 implemented in Java 18,
"file.encoding" system property became meaningless and can't be used
anymore to determine system native encoding.

Instead, use new Platform.getSystemCharset() API that tries to provide
a suitable replacement compatible with Java 18+ and previous Java
versions.

Note: the proposed change makes sure that IF users have specified
encoding via command line arguments -Dfile.encoding=XYZ, it will be
still used.

This PR fixes
eclipse-platform/eclipse.platform.resources#154
iloveeclipse added a commit to iloveeclipse/eclipse.platform.resources that referenced this issue Jun 22, 2022
With https://openjdk.java.net/jeps/400 implemented in Java 18,
"file.encoding" system property became meaningless and can't be used
anymore to determine system native encoding.

Instead, use new Platform.getSystemCharset() API that tries to provide
a suitable replacement compatible with Java 18+ and previous Java
versions.

Note: the proposed change makes sure that IF users have specified
encoding via command line arguments -Dfile.encoding=XYZ, it will be
still used.

See eclipse-platform#154
iloveeclipse added a commit that referenced this issue Jun 22, 2022
With https://openjdk.java.net/jeps/400 implemented in Java 18,
"file.encoding" system property became meaningless and can't be used
anymore to determine system native encoding.

Instead, use new Platform.getSystemCharset() API that tries to provide
a suitable replacement compatible with Java 18+ and previous Java
versions.

Note: the proposed change makes sure that IF users have specified
encoding via command line arguments -Dfile.encoding=XYZ, it will be
still used.

See #154
@iloveeclipse iloveeclipse modified the milestones: 4.25, 4.25 M1 Jun 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan Planned bugs/enhancements for a release
Projects
None yet
Development

No branches or pull requests

5 participants