Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deps: add postject #45066

Closed
wants to merge 2 commits into from
Closed

deps: add postject #45066

wants to merge 2 commits into from

Conversation

RaisinTen
Copy link
Contributor

This change has been extracted from the single executable application PR - #45038.

I'm doing this in a separate PR because as James said in #45038 (comment), it is a rather sizeable addition on its own, so it should be easier to review and address related changes in this PR.

This dependency takes care of the resource injection part of the single executable application work.

Refs: https://github.com/nodejs/single-executable/blob/34b6765a871222e142f9dbac2bcde5b4e79ebfb2/blog/2022-08-05-an-overview-of-the-current-state.md#resource-injection
Signed-off-by: Darshan Sen [email protected]

cc @nodejs/single-executable

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/actions
  • @nodejs/tsc

@nodejs-github-bot nodejs-github-bot added dependencies Pull requests that update a dependency file. meta Issues and PRs related to the general management of the project. needs-ci PRs that need a full CI run. tools Issues and PRs related to the tools directory. labels Oct 19, 2022
This change has been extracted from the single executable application
PR - nodejs#45038.

I'm doing this in a separate PR because as James said in
nodejs#45038 (comment),
it is a rather sizeable addition on its own, so it should be easier to
review and address related changes in this PR.

This dependency takes care of the resource injection part of the single
executable application work.

Refs: https://github.com/nodejs/single-executable/blob/34b6765a871222e142f9dbac2bcde5b4e79ebfb2/blog/2022-08-05-an-overview-of-the-current-state.md#resource-injection
Signed-off-by: Darshan Sen <[email protected]>
@targos
Copy link
Member

targos commented Oct 19, 2022

deps/postject/postject.js is 4MB, that's huge!

@RaisinTen
Copy link
Contributor Author

RaisinTen commented Oct 19, 2022

@targos yea that's mostly because of the webassembly code that emscripten generates in https://github.com/postmanlabs/postject/blob/f15198b88b7882116c43ff1ea1db969155dcf2d9/scripts/build.mjs#L33. The .wasm code is 3.7 MB and when it is compiled with -sSINGLE_FILE in https://github.com/postmanlabs/postject/blob/f15198b88b7882116c43ff1ea1db969155dcf2d9/CMakeLists.txt#L23 to generate a JS file with the embedded base64 wasm code, it becomes 4.5MB. The final size is 4.6 MB because of running esbuild which bundles all of this to this file - https://github.com/postmanlabs/postject/blob/52502e648bd5b0eb53c330ff43b0e30f12a1a0aa/src/api.js.

@targos
Copy link
Member

targos commented Oct 19, 2022

Would it be similarly large if we directly implemented it in C++ in src/ ?

@RaisinTen
Copy link
Contributor Author

I'm not sure about that because the LIEF static library, which we build in the postject repo, is 20MB:

$ du -sh build/vendor/lief/libLIEF.a
 20M	build/vendor/lief/libLIEF.a

@targos
Copy link
Member

targos commented Oct 19, 2022

Ah, I didn't know postject depended on an external library. I think we need to include the license of libLIEF ?

@ovflowd
Copy link
Member

ovflowd commented Oct 19, 2022

A 4MB source file, that is statically versioned. Is there a reason we're actually not using git submodules here for this specific dependency?

@RaisinTen
Copy link
Contributor Author

Done, I've added the LICENSE file of LIEF to deps but excluded the rest of the source code because I don't think it is needed here but I can include that too if anyone wants.

@RaisinTen
Copy link
Contributor Author

RaisinTen commented Oct 19, 2022

A 4MB source file, that is statically versioned. Is there a reason we're actually not using git submodules here for this specific dependency?

The 4MB file is actually not tracked by git. It is bundled and published to npm only, so I don't think git submodules would help here.

SOFTWARE.
"""

- LIEF, located at deps/LIEF, is licensed as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect -- deps/LIEF doesn't contain a copy of LIEF, only its license.

Since postject bundles a WASM compiled version of LIEF in it then I would have expected LIEF's license to be mentioned in either postject's license or some form of SBOM in postject's repository.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardlau sent a PR to add LIEF's license to the main license in nodejs/postject#55 just like how we do it in this repo. Does that look good? If so, I'll pull that in here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you.

RaisinTen added a commit to nodejs/postject that referenced this pull request Oct 19, 2022
@mhdawson
Copy link
Member

mhdawson commented Oct 19, 2022

My understanding from the discussion in the collaborator summit was that we'd move postject to be under a repo in the nodejs org, not necessarily integrate into Node.js itself.

My personal thought is that the tools to inject the content make sense as separate tools from Node.js itself, while what gets integrated into Node.js should just be enough to support what those tools need (which I think is what is in the PR this was extracted from but have not had time to check yet).

@jasnell
Copy link
Member

jasnell commented Oct 19, 2022

My key question is: is the dependency actually necessary to achieve the goal? What is postject giving us that we can't do any other way that doesn't involve bringing in such a massive new dependency?

@bnoordhuis
Copy link
Member

What postject does on macOS and Linux isn't very hard to reimplement, on the order of a few hundred lines of C++. It doesn't need to be very general, merely Good Enough for our specific use case.

Less sure about Windows, although I suspect it'd be fairly similar to the other two. I've written a PE/COFF loader in the past and it wasn't super complicated but it's long ago enough that I don't recall all the details anymore.

(ELF and Mach-O are fresh in my memory though!)

RaisinTen added a commit to nodejs/postject that referenced this pull request Oct 20, 2022
Refs: nodejs/node#45066 (comment)
Signed-off-by: Darshan Sen <[email protected]>

Signed-off-by: Darshan Sen <[email protected]>
@RaisinTen
Copy link
Contributor Author

RaisinTen commented Oct 20, 2022

Postject contains 2 components:

  • deps/postject/postject.js - This exports a JS function which takes a resource (for PE) / section (for ELF and Macho-O) name and a buffer and injects that resource into the node binary at that section / resource. This one takes up 4MB when the LIEF WASM code gets bundled into it.
  • deps/postject/src/dist/postject-api.h - This is a C header file which contains a function that takes a resource / section name and reads the injected buffer from the binary.

Compiling the C header file and using the function inside Node.js' src/ is necessary to achieve the goal because otherwise, the resource would get injected but it won't get used.

I think the JS file, which is supposed to do the injection, also makes sense to be internally used behind something like the --compile flag. I can think of these reasons:

  • Currently src: add initial support for single executable applications #45038 uses the NODE_JS_CODE as the resource name for the embedded JS file. This might change and if it does, it would be a breaking change. Hiding the resource names behind a flag would make it possible to do such things without causing any breakage because users wouldn't have to directly deal with the resource names in the binary.

  • I was thinking about having 3 separate resources in the node binary:

    • NODE_JS_CODE - The application code which would eventually become an archive of multiple files, like a ZIP, ASAR or any other desired format
    • NODE_JS_BOOTSTRAPPER - The code that would contain the VFS implementation, that can be used to read and execute the contents of NODE_JS_CODE. Projects like Pkg could use their own VFS implementation, Electron could use ASAR, Yarn could use fslib, etc.
    • NODE_JS_ARGUMENTS - This is the list of command line arguments that would be passed to Node.js internally every time the binary is run.

    This list might grow even more if some more use cases come up and this would make user code more error prone. Users might unintentionally omit adding some of the resources necessary for SEAs to function correctly. They might even make typos in the resource names in which case they would think that the resource has been injected but in reality it has been added under a different resource name. Having this dependency in here behind the --compile flag would also solve this problem.

  • It would also be possible to write end-to-end tests for single-executables that would run on CI if we have the injector inside deps/.


What postject does on macOS and Linux isn't very hard to reimplement, on the order of a few hundred lines of C++.

@bnoordhuis are you referring to the resource injection part or the reading part?

@bnoordhuis
Copy link
Member

@bnoordhuis are you referring to the resource injection part or the reading part?

Both. What you call "resource injection" is relatively straightforward if you reserve the extra section upfront through a linker script (can be zero-sized if you put it at the end of the executable.)

It only gets hairy when you add a section after the fact and have to renumber/relocate everything.

@mhdawson
Copy link
Member

My key question is: is the dependency actually necessary to achieve the goal? What is postject giving us that we can't do any other way that doesn't involve bringing in such a massive new dependency?

And does it need to be part of the Node.js binary? The model I have in mind is that we build into the Node.js binary what is needed to find/run what's been bundled in, but the bundler is a separate project (maybe npm) that could be mainted in something like nodejs/bundler (or nodejs/postject) if that made sense? @RaisinTen would it make any sense for use to have a short call to discuss?

@ruyadorno
Copy link
Member

My understanding from the discussion in the collaborator summit was that we'd move postject to be under a repo in the nodejs org, not necessarily integrate into Node.js itself.

I had the same takeaway from that discussion. Now I'm also confused if it's going to be the postject repo itself that will be transferred to the nodejs org or is that work going to happen in a new repo for the bundler that other collaborators could work in?

@RaisinTen
Copy link
Contributor Author

Both. What you call "resource injection" is relatively straightforward if you reserve the extra section upfront through a linker script (can be zero-sized if you put it at the end of the executable.)

It only gets hairy when you add a section after the fact and have to renumber/relocate everything.

Isn't that what PKG is doing currently for macOS in https://github.com/vercel/pkg/pull/1164/files#diff-33529dcbe9d6f9623eb6b9d1308153c118fd606cf2d809ad7006903b3b08aa0cR33? The ARM64 binary that PKG generates is not valid for codesign to work on in some cases. That's why we thought that adding the resource as a new section would be a better approach.

@RaisinTen would it make any sense for use to have a short call to discuss?

Sure, let's discuss these in the call. I have opened a discussion in the single-executable repo for the meeting - nodejs/single-executable#53.

Now I'm also confused if it's going to be the postject repo itself that will be transferred to the nodejs org

@ruyadorno I think it's going to be just the postject repo, see nodejs/admin#739.

@bnoordhuis
Copy link
Member

Isn't that what PKG is doing currently for macOS [..]? The ARM64 binary that PKG generates is not valid for codesign to work on in some cases. That's why we thought that adding the resource as a new section would be a better approach.

I think we're saying the same thing: use a new section?

The codesign section (LC_CODE_SIGNATURE) needs to be last but that's added post facto by the codesign tool.

What the pull request you link to seems to be doing is extend the existing LC_SYMTAB section to include the source as a string. That works as long as the symtab is always the last section in the file (and the source is < 4 GB.)

@RaisinTen
Copy link
Contributor Author

@bnoordhuis

Both. What you call "resource injection" is relatively straightforward if you reserve the extra section upfront through a linker script (can be zero-sized if you put it at the end of the executable.)

It only gets hairy when you add a section after the fact and have to renumber/relocate everything.

It would probably require the users of postject to create and maintain linker scripts for all supported platforms that would create the new section. This might make postject harder to use, so we were also thinking about expecting users to add code like this to reserve the section:

__attribute__((section("__CUSTOM_SEGMENT,__CUSTOM_SECTION")))
static const char custom_section[] __attribute__((used)) = "custom section";

but then we realized that if we want to modify this section to inject the new contents, this too might require us to renumber/relocate everything.

What the pull request you link to seems to be doing is extend the existing LC_SYMTAB section to include the source as a string. That works as long as the symtab is always the last section in the file (and the source is < 4 GB.)

I don't remember why exactly this didn't work before but I'm sure that we weren't trying to inject anything greater than 20 MB.


I'll move this PR to draft for now and we'll try to come up with a LIEF-free implementation for at least one binary format in JS because that would significantly reduce the size of the deps/postject/postject.js file.

@RaisinTen
Copy link
Contributor Author

We discussed this in the single-executable meeting and we had agreement on not adding postject to the node binary at this point. Refs - https://github.com/nodejs/single-executable/blob/1840f3d9c5f4fa0d29aabd5618c4ff9745f7be87/meetings/2022-10-31.md.

@RaisinTen RaisinTen closed this Nov 3, 2022
@RaisinTen RaisinTen deleted the deps/add-postject branch November 3, 2022 11:28
@RaisinTen RaisinTen added the single-executable Issues and PRs related to single-executable applications label May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file. meta Issues and PRs related to the general management of the project. needs-ci PRs that need a full CI run. single-executable Issues and PRs related to single-executable applications tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants