Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node_mksnapshot.cc: code cache generated based on temporary RO heap layout #47636

Closed
schuay opened this issue Apr 20, 2023 · 7 comments
Closed

Comments

@schuay
Copy link

schuay commented Apr 20, 2023

Version

No response

Platform

No response

Subsystem

No response

What steps will reproduce the bug?

No response

How often does it reproduce? Is there a required condition?

No response

What is the expected behavior? Why is that the expected behavior?

No response

What do you see instead?

fbe1478 moved code cache generation into node_mksnapshot, but I suspect this process is (and has always been) unsupported by V8. It has just worked somewhat by accident til now.

A hidden requirement of V8's code serializer is that it runs inside an Isolate that has been deserialized from the final snapshot, i.e. the same snapshot that is used in the node binary itself. Why? Because SerializeReadOnlyObjectReference encodes the RO object ref as {chunk index, offset}, but RO space layout can and does change between RO space serialization- and deserialization-time (and thus the serialized encoded RO ref may be invalid).

I'm hitting this locally (based on a V8 change I'm currently working on), note how the offset of the "load" string changes:
in node_mksnapshot: 0x0a78926543d9 <String[4]: #load> offset 82904
in node 0x32bf83c14789 <String[4]: #load> offset 83848.

This results in the code-deserializer creating a pointer pointing at a random location inside RO space.

We could fix this in several ways,

  1. change the snapshot-generation process in Node s.t. code serialization happens after heap serialization,
  2. disable node_use_node_code_cache
  3. (long-term vision) change V8 to keep RO space layout fixed, or
  4. (pragmatic hack) have Node pass a parameter to ScriptCompiler::CompileFunction to enable slow-but-safe RO object serialization (although I hesitate to expose this quirk through the API)

@joyeecheung wdyt?

Additional information

No response

@joyeecheung
Copy link
Member

joyeecheung commented Apr 20, 2023

Thanks for reporting this, I did fbe1478 because of the same issue but I didn't realize that the RO heap could also change after serialization. I think what I was seeing from https://bugs.chromium.org/p/v8/issues/detail?id=12718 was that the code cache should be serialized from the same heap where the snapshot was taken to match the RO heap layout, but the order didn't matter as much.

Option 1 sounds good enough for me for now, thanks (3 would be better in the long term, or maybe some cleverer way to encode the RO heap positions can be invented to avoid requiring the layout to be fixed). Is it safe to still use the serialized isolate for compiling code? If so I think maybe simply moving

#ifdef NODE_USE_NODE_CODE_CACHE
to somewhere after the CreateBlob() would be enough? (I tried it in this branch, which seems to work fine with the V8 on main branch, but I'm not sure if it works with your patch)

@schuay
Copy link
Author

schuay commented Apr 25, 2023

Re option 1: no, unfortunately it's not enough to move code-cache serialization til after CreateBlob. It has to use precisely the same initial heap shape as 'production' isolates, i.e. an Isolate deserialized from the final snapshot. The simplest solution is to do code-cache serialization in a separate binary.

Any opinions on how we should proceed on the V8 side until this is fixed? I would like to land my change (crrev.com/c/4440686) soon, but am currently blocked by the node failure at https://logs.chromium.org/logs/node-ci/buildbucket/cr-buildbucket/8783294649412924721/+/u/build/compile/stdout. Suggestion: disable node_use_node_code_cache in our build until your fix trickles down.

@joyeecheung
Copy link
Member

joyeecheung commented Apr 25, 2023

To clarify, is it safe to deserialize that snapshot in the same process later and use the deserialized isolate to generate code cache? Or does it have to be deserialized from a new process? Does turning off shared RO heap make a difference here?

I am concerned that more differences in build configs e.g. disabling code cache in V8's build would result in more breakages that go unnoticed until it gets fixed properly, it depends on how long it's going to take to fix it though.

@joyeecheung
Copy link
Member

joyeecheung commented Apr 25, 2023

Or, from what I can tell from https://chromium-review.googlesource.com/c/v8/v8/+/4440686, it seems possible to make the timing of the sealing of the RO space configurable in V8 internally? If V8 keeps the old timing of sealing the RO heap for the embedder API, and uses the new extended timing for the internal default snapshot, we could have some time to tackle it without turning off code cache in Node.js while the V8 default snapshot gets the benefit. We still need to figure out how to address the potential mismatch of the code cache, but has always been the case anyway, and having the V8 default snapshot built somewhat differently from the snapshots built using the embedder API seems reasonable to me. Setting aside the code cache mismatch issue, I can see that maybe for the default V8 snapshot, the more moved from the isolate snapshot to the RO snapshot, the better, but that's not necessarily the case for snapshots built from the embedder API.

@schuay
Copy link
Author

schuay commented Apr 26, 2023

To clarify, is it safe to deserialize that snapshot in the same process later and use the deserialized isolate to generate code cache? Or does it have to be deserialized from a new process? Does turning off shared RO heap make a difference here?

With some custom logic (which would fully tear down RO space prior to deserialization) that'd work, but not out of the box. Currently we only fully support this in a new process.

it seems possible to make the timing of the sealing of the RO space configurable in V8 internally?

In V8, I've moved the Seal call around a bit to include initial Context setup as well. But I think that's unrelated to this issue - the RO space layout changes between the serialize- and deserialize steps due to the way we currently visit heap objects for serialization.

Also note RO space currently isn't unsealed when building custom snapshots. The steps that that led to the crash I've been seeing were:

  1. when building V8's default snapshot, we alloc String Foo in RO space.
  2. node_mksnapshot deserializes this default snapshot and creates a code cache entry referencing string Foo.
  3. node_mksnapshot serializes a custom snapshot, changing RO space layout.
  4. node deserializes the custom snapshot and the code cache entry, the reference to Foo is invalid.

@joyeecheung
Copy link
Member

With some custom logic (which would fully tear down RO space prior to deserialization) that'd work, but not out of the box. Currently we only fully support this in a new process.

I think for Node.js that might be an easier way out if we can still do it in the same process, or otherwise we need to update node.gyp to allow another circular dependency in the current circular dependency (libnode -> mkcodecache -> libnode + mksnapshot ->libnode), which isn't necessarily impossible, but would be more complicated to implement.

@schuay
Copy link
Author

schuay commented Aug 2, 2023

We fixed this on the V8 side by preserving RO space layout through serialization-deserialization.

@schuay schuay closed this as completed Aug 2, 2023
zcbenz pushed a commit to photoionization/node_ci that referenced this issue Aug 14, 2023
See github.com/nodejs/node/issues/47636.

If enabled, node creates the code cache on a transitory read-only heap
state that may be invalidated when the heap snapshot is taken. This
has worked so far, but will break in the near future. Disable this
option until we have a solution in node.

Note also the slight confusion between node_use_node_code_cache
and node_use_code_cache: gn/node/BUILD.gn uses `node_use_code_cache`
to set `NODE_USE_NODE_CODE_CACHE=1`.

Bug: v8:13949
Change-Id: Idbe1b18a36ff6eb6508dc6265fdb69418de04881
Reviewed-on: https://chromium-review.googlesource.com/c/v8/node-ci/+/4474327
Commit-Queue: Patrick Thier <[email protected]>
Reviewed-by: Patrick Thier <[email protected]>
Auto-Submit: Jakob Linke <[email protected]>
zcbenz pushed a commit to photoionization/node_ci that referenced this issue Aug 14, 2023
This reverts commit 742268f.

Reason for revert: V8 now preserves RO space layout through
serialization crrev.com/c/4543306

Original change's description:
> Temporarily disable node_use_node_code_cache
>
> See github.com/nodejs/node/issues/47636.
>
> If enabled, node creates the code cache on a transitory read-only heap
> state that may be invalidated when the heap snapshot is taken. This
> has worked so far, but will break in the near future. Disable this
> option until we have a solution in node.
>
> Note also the slight confusion between node_use_node_code_cache
> and node_use_code_cache: gn/node/BUILD.gn uses `node_use_code_cache`
> to set `NODE_USE_NODE_CODE_CACHE=1`.
>
> Bug: v8:13949
> Change-Id: Idbe1b18a36ff6eb6508dc6265fdb69418de04881
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/node-ci/+/4474327
> Commit-Queue: Patrick Thier <[email protected]>
> Reviewed-by: Patrick Thier <[email protected]>
> Auto-Submit: Jakob Linke <[email protected]>

Fixed: v8:13949
Change-Id: I454809c6393f41ff80fc593cb2972a061d614d1c
Reviewed-on: https://chromium-review.googlesource.com/c/v8/node-ci/+/4577062
Auto-Submit: Jakob Linke <[email protected]>
Reviewed-by: Patrick Thier <[email protected]>
Commit-Queue: Patrick Thier <[email protected]>
nodejs-github-bot pushed a commit that referenced this issue Aug 18, 2023
V8 now requires the code cache to be compiled with a finalized
read-only space, so we need to serialize the snapshot to get
a finalized read-only space first, then deserialize it to compile
the code cache.

PR-URL: #49099
Refs: #47636
Refs: https://bugs.chromium.org/p/v8/issues/detail?id=13789
Reviewed-By: Yagiz Nizipli <[email protected]>
UlisesGascon pushed a commit that referenced this issue Sep 10, 2023
V8 now requires the code cache to be compiled with a finalized
read-only space, so we need to serialize the snapshot to get
a finalized read-only space first, then deserialize it to compile
the code cache.

PR-URL: #49099
Refs: #47636
Refs: https://bugs.chromium.org/p/v8/issues/detail?id=13789
Reviewed-By: Yagiz Nizipli <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants