-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Charset terribleness breaks Electron apps #6011
Comments
looks like you want the https://github.com/mishoo/UglifyJS2 https://github.com/requirejs/r.js/blob/master/build/example.build.js#L179 |
also electron/electron#678 suggests that |
Thanks @shunter! I do have the I tried another thing that appears to work: Double slashes. I believe this leaves it to the client JS parser to interpret the codes, instead of our build process. Would this be OK? Or would it be better to add the
|
@shunter I tried to enable this but the build has no diffs, so it didn't take effect. |
Looks like |
@emackey I'm confused as to why this is a problem for you. The latest Cesium release works fine in Electron for me. Here's the HTML (index.js just creates the browser window and loads this HTML page and that's the entire app). <!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no">
<title>Cesium</title>
<style>
@import url(./node_modules/cesium/Build/Cesium/Widgets/widgets.css);
html,
body,
#cesiumContainer {
width: 100%;
height: 100%;
margin: 0;
padding: 0;
overflow: hidden;
}
</style>
</head>
<body>
<div id="cesiumContainer"></div>
<script type="text/javascript" src="./node_modules/cesium/Build/Cesium/Cesium.js"></script>
<script>
window.viewer = new Cesium.CesiumWidget('cesiumContainer');
</script>
</body>
</html> |
It breaks in VSCode's HTML preview window. Scott's fix is going to work. Also I found some minus signs that weren't ASCII minus signs. PR soon. |
This seems like a bug in VSCode though, doesn't it? Even if we "fix" it in Cesium, sounds like we should opena a VSCode issue for it. I have no idea if other build systems, like webpack, have ascii only options. |
It used to work though, in VSCode. The hebrew/arabic stuff is what broke it specifically. And oddly, that's nowhere near the only place in Cesium where we've been using UTF-8 codes. I don't yet fully understand what's different about the new RTL regex from all the other old places we encode non-ascii things. |
Even more of a reason to open an issue in VSCode, they might be able to pinpoint the problem a lot faster since it works in browsers and in Electron and from our understanding we are doing nothing incorrect. I just want to make sure we full understand the problem before we implement a solution. |
Well. New clue: All of the existing non-ascii chars in Cesium are wrong in VSCode. But I'm only using the Scene class, and it doesn't depend on most of that, so you can't see they're wrong. The RegExp however contains a sanity check, so when you do something like |
Also I confirmed that I can manually add a BOM to the top of Cesium.js to fix VSCode. Naturally I don't want to do that, as it's a manual edit with each Cesium upgrade, and would prevent using the npm package eventually. |
Switching to an ASCII build adds 1330 bytes to the 2MB |
Well, dang. Almost too easy:
|
Cool, just to close the loop on this, the charset specified at the HTML level should normally take care of this, but for whatever reason in VSCode you need to specify the charset directly on the script? |
I blame the And if you think about it, the charset specified on the HTML technically should only apply to the HTML itself, not any included files. The trick of applying it to included JS files that you mention above is basically just an assumption on the part of the loader, that two separate files "probably" use the same charset. I would expect the real answer to come from HTTP headers on the JS itself, but, there's no HTTP in my environment here. Also, one more loop needs closing: Two of your minus signs aren't real minus signs. Do these chars act as actual RegExp range characters, or are they just loose UTF-8 things mucking up your language detection? We should write a unit test to test a char in the middle of each affected range. |
I was curious so I tested it out. The not-real minus sign is actually an en dash,
So, yes, the current code does not actually work. Looks like a unit test is missing. |
@shunter Your sample above is missing square brackets, but the findings are correct.
|
This is going to be a strange one.
Quick refresher: UTF-8 is, sadly, not the default charset, and must be manually specified. On the web, this is done with HTTP headers (or in HTML, a
<meta>
tag that emulates the correct HTTP header). On the desktop however, this is done with a BOM (Byte Order Mark, about 3 bytes at the start of a text file). The BOM is tolerated by most browsers, but is not intended to be transmitted over the web, as the web is supposed to be using the charset encoding HTTP headers.The built, combined/minified
Cesium.js
file now contains some UTF-8 range characters, as of this month's 1.39 release, due to Arabic and Hebrew chars added to a RegExp in #5931. Cesium's minifier dutifully makes this smaller by taking out the\u
codes and replacing them with actual UTF-8 chars.On the web, this is no problem. Cesium.js does not start with a BOM, since the web does not want a BOM. Most web servers are configured to serve the JS file as UTF-8, and so they interpret these characters correctly.
In a desktop Electron environment though, we're hosed. Electron does not automatically pick this up as UTF-8. The entire Cesium.js file fails to parse, and breaks the entire app even if Cesium is not being used.
One possible solution would be to somehow prevent the minifier from converting these
\u
codes to actual UTF-8. It would be nice to keepCesium.js
free of live UTF-8 chars for the sake of transmission through unorthodox mechanisms that can't handle the extended character set. The end client's own JS implementation should be able to recover the needed chars from their\u
codes that can survive an ASCII transmission. Thoughts?/cc @mramato
The text was updated successfully, but these errors were encountered: