Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate non-ASCII build file path on Windows #17077

Closed
mmarchini opened this issue Nov 16, 2017 · 18 comments · Fixed by #19756
Closed

Investigate non-ASCII build file path on Windows #17077

mmarchini opened this issue Nov 16, 2017 · 18 comments · Fixed by #19756
Assignees
Labels
addons Issues and PRs related to native addons. build Issues and PRs related to build files or the CI. i18n-api Issues and PRs related to the i18n implementation. windows Issues and PRs related to the Windows platform.

Comments

@mmarchini
Copy link
Contributor

  • Version: master
  • Platform: Windows
  • Subsystem: build

If there's a non-ASCII character in the path to the node directory, the build will break. This issue is only happening on Windows as far as I'm aware (there's a similar bug on POSIX systems, but it was already fixed). See the discussion on #16735 and the errors below:

image
(https://gist.github.com/anonymous/875aac0ab3e4dfa219854b691f8d3431)

@mscdex mscdex added build Issues and PRs related to build files or the CI. windows Issues and PRs related to the Windows platform. labels Nov 16, 2017
@seishun
Copy link
Contributor

seishun commented Feb 3, 2018

It's ironic that ICU doesn't support Unicode properly.

The issue here is that icutrim.py fails with a non-ASCII path:

C:\Users\Nikolai\WindÒo\node\tools\icu>python icutrim.py -P C:\Users\Nikolai\WindÒo\node\Debug\ -D ..\..\deps\icu-small\source\data\in\icudt60l.dat --delete-tmp -T C:\Users\Nikolai\WindÒo\node\Debug\obj\global_intermediate\icutmp -F icu_small.json -O icudt60l.dat -v -L en,root
Options: {'verbose': 1, 'filterfile': 'icu_small.json', 'toolpath': 'C:\\Users\\Nikolai\\Wind\xd2o\\node\\Debug\\', 'deltmpdir': 1, 'outfile': 'icudt60l.dat', 'datfile': '..\\..\\deps\\icu-small\\source\\data\\in\\icudt60l.dat', 'locales': 'en,root', 'endian': 'little', 'tmpdir': 'C:\\Users\\Nikolai\\Wind\xd2o\\node\\Debug\\obj\\global_intermediate\\icutmp'}
icu_small.json: icutrim.py config: Trim down ICU to just a certain locale set, needed for node.js use.
* converters: 0 items
* stringprep: 0 items
* translit: 2 items
* brkfiles: 0 items
* brkdict: 0 items
* confusables: 0 items
* brkitr: 0 items
* coll: 1 items
* curr: 1 items
* lang: 0 items
* rbnf: 0 items
* region: 0 items
* ROOT: 1 items
* unit: 0 items
* zone: 1 items

###WARNING: Encountered abnormal bytes while converting input stream to target encoding: U_ILLEGAL_CHAR_FOUND
        Pre-context: \Users\Nikolai\Win
        Context: Ê
        Post-context: o\node\Debug\iculs
An error occurred processing file C:\Users\Nikolai\WindÊo\node\Debug\obj\global_intermediate\icutmp\lang/\res_index.txt. Error: U_ILLEGAL_CHAR_FOUND
FAILED: C:\Users\Nikolai\WindÊo\node\Debug\genrb -d C:\Users\Nikolai\WindÊo\node\Debug\obj\global_intermediate\icutmp\lang/ -s C:\Users\Nikolai\WindÊo\node\Debug\obj\global_intermediate\icutmp\lang/ res_index.txt

Who maintains this file? It contains a link to http://bugs.icu-project.org/, but its history consists only of Node.js-local changes. cc @srl295

@srl295 srl295 self-assigned this Feb 3, 2018
@srl295 srl295 added i18n-api Issues and PRs related to the i18n implementation. addons Issues and PRs related to native addons. labels Feb 3, 2018
@srl295
Copy link
Member

srl295 commented Feb 3, 2018

Welcome to the irony of my life. You can workaround with full-icu as an option. Otherwise will look at this next week when I am at a windows box. And sorry.

@srl295
Copy link
Member

srl295 commented Feb 13, 2018

@seishun I can't even get this far. I get:

C:\Users\StevenLoomis\Documents\GitHub\NÒDE>vcbuild.bat
ERROR: The system was unable to find the specified registry key or value.
ERROR: The system was unable to find the specified registry key or value.
Cannot determine current version of Node.js

C:\Users\StevenLoomis\Documents\GitHub\NÒDE>

… and then an editor is popped up to edit getnodeversion.py

(running node --version returns v8.9.4)

I ran this from in powershell and cmd.exe, in and out of the visual studio command tools for 2015 and 2017.

@srl295
Copy link
Member

srl295 commented Feb 13, 2018

@mmarchini what is the actual path of your node dir ? I see both:

C:\Users\Nikolai\WindÒo

and

C:\Users\Nikolai\WindÊo

@seishun
Copy link
Contributor

seishun commented Feb 13, 2018

@srl295 That looks like you don't have Python installed (see #16864).

what is the actual path of your node dir ? I see both:

I reproduced it by placing it in "WindÒo". I don't know why Python outputs "WindÊo".

@srl295
Copy link
Member

srl295 commented Feb 17, 2018

Thx. Will retry when back at that box.

@seishun
Copy link
Contributor

seishun commented Feb 25, 2018

@srl295 do you think you'll have to time to look into it any time soon? I could try fixing it myself, but I might not have enough knowledge about it.

@srl295
Copy link
Member

srl295 commented Feb 26, 2018

@seishun I'll have to see, I just reinstalled my windows env and have some things coming up. So not necessarily soon.

@seishun
Copy link
Contributor

seishun commented Mar 31, 2018

The issue is in iculslocs. When run from a non-ASCII path, it generates a file with invalid UTF-8, which genrb chokes on.

To reproduce, run:

C:\<non-ascii path>\iculslocs -i icudt60l.dat -N icudt60l -T lang -b res_index.txt

@srl295 how would you suggest to proceed with this? It seems iculslocs is a third-party project.

@srl295
Copy link
Member

srl295 commented Mar 31, 2018 via email

@seishun
Copy link
Contributor

seishun commented Apr 1, 2018

It looks like you posted an unfinished comment.

@srl295
Copy link
Member

srl295 commented Apr 2, 2018

@seishun can you attach the res_index.txt file here? There are multiple files under ./out/Release/gen/icutmp/

@seishun
Copy link
Contributor

seishun commented Apr 2, 2018

You mean the file iculslocs generates? Here you go: res_index.txt

@srl295
Copy link
Member

srl295 commented Apr 2, 2018

^ that would still be interesting, but I think I found the problem.

iculslocs writes out:

(BOM)// -*- Coding: utf-8; -*-
//
// Warning this file is automatically generated
// Updated by ../../out/Release/iculslocs based on icudt60l:res_index.txt

… but the path given is based on argv[0] which is in process encoding, not necessarily utf-8. The path doesn't really add any value here. I'll change it to the following (note that the ../.. path is gone)

// -*- Coding: utf-8; -*-
//
// NOTE: This file was generated during the build process.
// Generator: tools/icu/iculslocs.cc
// Input package-tree/item: icudt60l-lang/res_index.res

(package icudt60l, tree lang, item res_index.res - probably only of interest to someone debugging )

@srl295
Copy link
Member

srl295 commented Apr 2, 2018

@seishun are you able to test the commit I made to https://github.com/srl295/node/tree/fix-iculslocs ?

@srl295
Copy link
Member

srl295 commented Apr 2, 2018

@seishun perfect, that confirms what I thought. I think the above fix will solve the issue. Thanks for your investigation.

For the record, your file had broken utf-8 in the comment mentioned above.

// Updated by C:\Users\Nikolai\Wind“o\node\Debug\iculslocs based on icudt60l-lang:res_index.txt

@seishun
Copy link
Contributor

seishun commented Apr 2, 2018

That fixes the issue with iculslocs, but doesn't completely fix the build issue because there is another similar issue with "protocol_generated_sources" from V8.

In addition, fixing both issues would allow builds from Latin-1 paths, but not for all Unicode paths. Fixing it generally probably requires moving to a different build system (#18560 (comment)).

@srl295
Copy link
Member

srl295 commented Apr 2, 2018

@seishun OK. this fix should at least make the ICU tooling subsystem not dependent on the path (whether latin1 or all unicode). I'll open a PR.

srl295 added a commit to srl295/node that referenced this issue Apr 3, 2018
- argv[0] was being emitted into a utf-8 stream, but argv[0] may
not be legal utf-8
- fix by not emitting argv[0] (was only for a source comment)
- partially resolves nodejs#17077

PR-URL: nodejs#19756
Fixes: nodejs#17077
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Nikolai Vavilov <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
targos pushed a commit that referenced this issue Apr 3, 2018
- argv[0] was being emitted into a utf-8 stream, but argv[0] may
not be legal utf-8
- fix by not emitting argv[0] (was only for a source comment)
- partially resolves #17077

PR-URL: #19756
Fixes: #17077
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Nikolai Vavilov <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
MylesBorins pushed a commit that referenced this issue Aug 17, 2018
- argv[0] was being emitted into a utf-8 stream, but argv[0] may
not be legal utf-8
- fix by not emitting argv[0] (was only for a source comment)
- partially resolves #17077

PR-URL: #19756
Fixes: #17077
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Nikolai Vavilov <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addons Issues and PRs related to native addons. build Issues and PRs related to build files or the CI. i18n-api Issues and PRs related to the i18n implementation. windows Issues and PRs related to the Windows platform.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants