This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

Heap snapshots #39

Open

@GustavoCaso GustavoCaso commented Jan 4, 2022

Adding snapshot functionality to v8go.

We use the provided API from v8 to create a snapshot from the SnapshotCreator class.

This PR adds:

  • v8.CreateSnapshot function in Go to create a snapshot from a script
  • v8.NewIsolate now accepts a list of createOptions. This PR provides the WithStartupData option
  • We created some new C functions: CreateSnapshot, SnapshotBlobDelete, NewIsolateWithCreateParams, NewContextFromSnapShot

We can explore adding the AddData functionality as well. I wanted to get some thoughts before adding more code.

const (
FunctionCodeHandlingKlear FunctionCodeHandling = iota
FunctionCodeHandlingKeep

  //  kKeep - keeps any compiled functions
  //  kClear - does not keep any compiled functions

We should:

  1. Document what each one means here
  2. Decide if we want to keep the k prefix, and correct the typo FunctionCodeHandlingKlear -> FunctionCodeHandlingClear
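
As a rough sketch of what both points could look like together: the constants below are documented individually and drop the k prefix. The final names are an assumption (the prefix question is still open in this thread), and the String method is purely illustrative, not part of the PR.

```go
package main

import "fmt"

// FunctionCodeHandling mirrors V8's SnapshotCreator::FunctionCodeHandling enum.
type FunctionCodeHandling int

const (
	// FunctionCodeHandlingClear does not keep any compiled functions
	// in the snapshot (V8's kClear).
	FunctionCodeHandlingClear FunctionCodeHandling = iota
	// FunctionCodeHandlingKeep keeps any compiled functions in the
	// snapshot (V8's kKeep).
	FunctionCodeHandlingKeep
)

// String returns the V8 enumerator name. Illustrative helper only.
func (f FunctionCodeHandling) String() string {
	switch f {
	case FunctionCodeHandlingClear:
		return "kClear"
	case FunctionCodeHandlingKeep:
		return "kKeep"
	default:
		return fmt.Sprintf("FunctionCodeHandling(%d)", int(f))
	}
}

func main() {
	fmt.Println(FunctionCodeHandlingClear, FunctionCodeHandlingKeep)
}
```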

@genevieve

A general question to consider about the API is the indexing of contexts in the startup data. Right now the user can only pass a single script into a snapshot blob. If we want to support multiple scripts being executed in a single snapshot blob, so that all of them are available to any new context created from it, we could either (1) have the user concatenate all their scripts into one, or (2) have CreateSnapshot accept a variadic list of Script values, where Script follows the structure below. In our C++ code we can compile and run each script and add it to the snapshot creator as we currently do (index = creator.AddContext(ctx)), so that each script maps to the index at which it was passed in, and a user can choose which script's context to use from a snapshot blob when creating a new context. This might be a more complicated use case and possibly not worth striving for right now.

type Script struct {
  Code string
  Origin string
}

@dylanahsmith

We should split creating the snapshot creator from capturing the snapshot so that we can actually get the full benefit of the V8 API (e.g. as NewSnapshotCreator and SnapshotCreator.CreateBlob).

For instance, this way we could not only capture running a single script (e.g. for builtins) but we could take a snapshot after loading an application script as well. It would also not just be limited to scripts, since we would also like to support JS and WASM modules in the future. It would also allow properties to be set, functions to be called, etc. Following the V8 API also makes it easier to extend v8go to add support for new features.


In trying to provide the full benefits of the V8 API, it does mean we will need to consider how to adapt a feature like external references to work in the Go API. V8 allows an array of external references to be passed in when creating the snapshot, which need to match the ones given to create the isolate from a snapshot. It looks like V8 uses this feature to handle function callbacks. To V8, it looks like the only v8go external reference would currently be FunctionTemplateCallback, but v8go would still need the corresponding Go references that get registered with Isolate.registerCallback.

It is worth considering whether we can initially only support creating the snapshot when there are no registered Go callbacks (e.g. checking if Isolate.cbSeq is 0), but this would still require considering how the API can be designed so that this feature can be added later without a breaking change.

Unfortunately, it doesn't look like the way callbacks are currently registered in v8go will even work with external references, since a func value can't be compared for equality or used as a map key. If the callback were provided as an interface or a *func then it could work, but the stable reference would need to be created outside v8go, so it could be looked up to make sure it was provided as an external reference when the snapshot is created or when the isolate is created from the snapshot.
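
To illustrate the *func idea, here is a minimal sketch in plain Go. The Callback type and the externalRefs registry are hypothetical stand-ins, not v8go types; the only point is that a pointer to a func variable is comparable and can key a map, while the func value itself cannot.

```go
package main

import "fmt"

// Callback is a stand-in for a v8go function callback; the real
// signature is different.
type Callback func(args ...int) int

// externalRefs is a hypothetical registry that gives each callback a
// stable index, keyed by the address of the caller-owned variable.
type externalRefs struct {
	index map[*Callback]int
	refs  []*Callback
}

func newExternalRefs() *externalRefs {
	return &externalRefs{index: make(map[*Callback]int)}
}

// Register returns the index for cb, adding it on first use.
func (r *externalRefs) Register(cb *Callback) int {
	if i, ok := r.index[cb]; ok {
		return i
	}
	i := len(r.refs)
	r.index[cb] = i
	r.refs = append(r.refs, cb)
	return i
}

// Lookup reports whether cb was registered, and at which index.
func (r *externalRefs) Lookup(cb *Callback) (int, bool) {
	i, ok := r.index[cb]
	return i, ok
}

func main() {
	var sum Callback = func(args ...int) int {
		total := 0
		for _, a := range args {
			total += a
		}
		return total
	}
	refs := newExternalRefs()
	// Registering the same pointer twice yields the same index.
	fmt.Println(refs.Register(&sum), refs.Register(&sum))
}
```

The caller has to keep the variable alive and pass the same pointer both when creating the snapshot and when creating the isolate from it, which matches the point above that the stable reference needs to be created outside v8go.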

It does look like the functions that register callbacks are already limited in other ways:

  • v8::FunctionTemplate::New has several optional arguments that v8go doesn't support, so it would need to be extended into a variadic function or another function would need to be provided that takes an options struct
  • v8::Promise::Then & v8::Promise::Catch actually take Local<Function> arguments, which would allow a JS function to be called without having to register another function callback.

Of course, we could introduce breaking changes since v8go hasn't had a stable release. It would just be good to avoid introducing new parts of the API that we know will need to break in the foreseeable future, or we will have a hard time actually stabilizing the API.

@genevieve

WRT splitting the constructor for the creator from creating snapshots, there is an older commit on this branch that had that implementation but things were condensed into a single function for the purposes of getting something functional. Now that tests are passing, should be straightforward to return to that format: 718b639

Comment on lines 19 to 31
type StartupData struct {
ptr *C.SnapshotBlob
}

The user of the library will need to be able to both get a byte array for serialization and to be able to construct the type for passing back for creating an isolate from it. So perhaps this should just be a byte slice.

Suggested change
- type StartupData struct {
- 	ptr *C.SnapshotBlob
- }
+ type StartupData []byte

That would make it simpler to use, since there would be no need to garbage collect the data. When creating the isolate from this data, we would just need to store a reference to the byte slice in the isolate so that it isn't garbage collected while it is in use (assuming it needs to outlive the isolate it was used to initialize).
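
A minimal sketch of that idea, assuming a functional-option shape for NewIsolate. The Isolate struct here is a stand-in with no native state, and the internals of WithStartupData are an assumption; only the names StartupData, NewIsolate and WithStartupData come from this PR.

```go
package main

import "fmt"

// StartupData is the serialized snapshot, as suggested above.
type StartupData []byte

// Isolate is a stand-in for v8go's Isolate; the native pointer is omitted.
type Isolate struct {
	// startupData keeps the slice reachable for as long as the
	// isolate is, so the GC cannot collect it while it is in use.
	startupData StartupData
}

// createOption configures an Isolate at construction time (assumed shape).
type createOption func(*Isolate)

// WithStartupData stores a reference to data on the new isolate.
func WithStartupData(data StartupData) createOption {
	return func(iso *Isolate) { iso.startupData = data }
}

// NewIsolate applies the options in order.
func NewIsolate(opts ...createOption) *Isolate {
	iso := &Isolate{}
	for _, opt := range opts {
		opt(iso)
	}
	return iso
}

func main() {
	data := StartupData{0x01, 0x02, 0x03} // would come from a snapshot creator
	iso := NewIsolate(WithStartupData(data))
	fmt.Println(len(iso.startupData))
}
```

Because the type is just a byte slice, callers can write it to a file or cache and read it back with no extra conversion. A real cgo binding would additionally have to respect cgo's pointer-passing rules when handing the bytes to V8, which this sketch does not model.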

@GustavoCaso
Author

GustavoCaso commented Jan 17, 2022

@dylanahsmith @genevieve I updated the code so that creating the snapshot creator is split from actually creating the snapshot.

This is the new API:

snapshotCreator := v8.NewSnapshotCreator()

data, err := snapshotCreator.Create("function run() { return 1 };", "script.js", v8.FunctionCodeHandlingKlear)
fatalIf(t, err)

iso := v8.NewIsolate(v8.WithStartupData(data))
defer iso.Dispose()
defer snapshotCreator.Dispose()

ctx := v8.NewContext(iso)
defer ctx.Close()

runVal, _ := ctx.Global().Get("run")

fn, _ := runVal.AsFunction()
val, err := fn.Call(v8.Undefined(iso))
fatalIf(t, err)

if val.String() != "1" {
	t.Fatal("invalid val")
}

I also added a new Dispose method that takes care of deleting all of the snapshot creator's state, such as the snapshot creator pointer and the snapshot blob.
So when callers are done with the snapshot creator, they have to call snapshotCreator.Dispose(), as illustrated in the code above.

I also played with the idea of passing options when creating the SnapshotCreator. As of today, V8 allows passing an existing Isolate, external references, and an existing blob (see the V8 API).

One issue I found when testing with an existing Isolate is that the SnapshotCreator destructor takes care of exiting and disposing the Isolate (https://github.com/v8/v8/blob/lkgr/src/api/api.cc#L514-L515). Calling iso.Dispose() from Go then causes a crash, since the isolate has already been exited and disposed by the time we delete the SnapshotCreator.

To avoid that crash I made sure to set the Isolate ptr to nil after creating the Blob:

if s.snapshotCreatorOptions.iso != nil {
	s.snapshotCreatorOptions.iso.ptr = nil
}
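
The ownership hand-off can be modeled without V8 at all. In the sketch below, nativeIsolate is a hypothetical stand-in that just counts how many times it was disposed; the point is that once the creator has taken the isolate down, clearing the Go-side pointer turns a later Dispose into a no-op instead of a double dispose.

```go
package main

import "fmt"

// nativeIsolate stands in for the C++ v8::Isolate. disposed counts
// how many times it was disposed; anything above 1 models the crash.
type nativeIsolate struct{ disposed int }

// Isolate is a stand-in for the Go wrapper around the native isolate.
type Isolate struct{ ptr *nativeIsolate }

// Dispose releases the native isolate once and is a no-op when the
// pointer has already been cleared.
func (i *Isolate) Dispose() {
	if i.ptr == nil {
		return
	}
	i.ptr.disposed++
	i.ptr = nil
}

// SnapshotCreator is a stand-in that borrows an existing isolate.
type SnapshotCreator struct{ iso *Isolate }

// CreateBlob models the native side disposing the isolate when the
// creator is deleted, then clears the Go handle so that a deferred
// iso.Dispose() does not dispose it a second time.
func (s *SnapshotCreator) CreateBlob() []byte {
	if s.iso != nil && s.iso.ptr != nil {
		s.iso.ptr.disposed++ // done by ~SnapshotCreator in the real code
		s.iso.ptr = nil
	}
	return []byte{}
}

func main() {
	native := &nativeIsolate{}
	iso := &Isolate{ptr: native}
	creator := &SnapshotCreator{iso: iso}

	creator.CreateBlob()
	iso.Dispose() // safe: the pointer was cleared by CreateBlob
	fmt.Println(native.disposed)
}
```

This only covers the double dispose. Any per-isolate data allocated on the Go/C side still needs its own cleanup path.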

This stops v8go from crashing, but we are getting a leak error on CI (https://github.com/Shopify/v8go/runs/4841647995?check_suite_focus=true), because creating a new Isolate from Go allocates some internal data:

// Create a Context for internal use
m_ctx* ctx = new m_ctx;
ctx->ptr.Reset(iso, Context::New(iso));
ctx->iso = iso;
ctx->startup_data = nullptr;
iso->SetData(0, ctx);

@GustavoCaso
Author

GustavoCaso commented Jan 17, 2022

I was chatting with @davars and we thought a nice API would be one that removes all logic for running scripts from the SnapshotCreator and simply accepts a context on which all the scripts have already been executed. The SnapshotCreator is then only responsible for creating a blob from that context and returning that data to the caller.

Something like this:

snapshotCreator := v8.NewSnapshotCreator()
iso := v8.NewIsolate()
defer iso.Dispose()
ctx := v8.NewContext(iso)
defer ctx.Close()

ctx.RunScript(`const add = (a, b) => a + b`, "add.js")
ctx.RunScript(`function run() { return add(3, 4); }`, "main.js")

snapshotCreator.SetContext(ctx)
data, err := snapshotCreator.CreateBlob(v8.FunctionCodeHandlingKlear)
if err != nil {
	snapshotCreator.Dispose()
	panic(err)
}
defer data.Dispose()

iso2 := v8.NewIsolate(v8.WithStartupData(data))
defer iso2.Dispose()

// Read from a context on the isolate that was created from the snapshot.
ctx2 := v8.NewContext(iso2)
defer ctx2.Close()

runVal, err := ctx2.Global().Get("run")
if err != nil {
	panic(err)
}

fn, err := runVal.AsFunction()
if err != nil {
	panic(err)
}
val, err := fn.Call(v8.Undefined(iso2))
if err != nil {
	panic(err)
}
if val.String() != "7" {
	t.Fatal("invalid val")
}

The API is more similar to the one in V8

Here is the PR that works out this proposal #43

@GustavoCaso GustavoCaso force-pushed the heap-snapshots branch 2 times, most recently from f0c8e99 to c43c801 Compare January 19, 2022 15:07
@GustavoCaso
Author

GustavoCaso commented Jan 19, 2022

@dylanahsmith @genevieve @davars These changes are ready for another review.

I addressed all the concerns outlined above:

  • Split the creation of the SnapshotCreator from the creation of the blob, using the AddContext and Create methods
  • The memory management of StartupData is no longer the responsibility of the user; we take care of that for them (ed2098d)
  • I added Isolate options to the C function NewIsolate, so we do not have to add a new function NewIsolateWithCreateParams

There are a lot of commits and PR merges from experiments. If we are happy with the end result after another round of feedback I will squash the commits into one and we can open it upstream or use it in our fork to test

We can see how it could be used in oxygen-sws here: https://github.com/Shopify/oxygen-sws/pull/713

Thank you all for the invaluable feedback ❤️

@GustavoCaso
Author

After working on this for some days I have some thoughts regarding the API and some ways we might be able to improve it.

Two points:

Supporting multiple contexts:

The snapshot creator supports adding multiple contexts. If we think we are going to add support for that in the future, we might want to change what the API looks like now. The idea behind adding multiple contexts is being able to select which one to use when creating a new context from an isolate that has the snapshot blob associated.
Consider the snapshot blob binary format:

// Snapshot blob layout:
  // [0] number of contexts N
  // [1] rehashability
  // [2] checksum
  // [3] (128 bytes) version string
  // [4] offset to readonly
  // [5] offset to context 0
  // [6] offset to context 1
  // ...
  // ... offset to context N - 1
  // ... startup snapshot data
  // ... read-only snapshot data
  // ... context 0 snapshot data
  // ... context 1 snapshot data

We can see that it stores all the contexts. The current API only supports adding one context, but that could change if we think having multiple contexts is something that we should implement.
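
To make the layout concrete, here is a rough sketch that reads the context count and the per-context offset table from a blob header. It assumes each numbered slot is a 32-bit little-endian integer and takes the 128-byte version string from the comment above. The layout comment does not state field widths or byte order, so treat both as assumptions. The header in the example is synthetic, built by a helper, not real V8 output.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	fieldSize        = 4   // assumed width of each numbered slot
	versionStringLen = 128 // taken from the layout comment
	numContextsOff   = 0
	readOnlyOff      = 3*fieldSize + versionStringLen
	firstContextOff  = readOnlyOff + fieldSize
)

// contextOffsets returns the "offset to context i" entries of a blob.
func contextOffsets(blob []byte) ([]uint32, error) {
	if len(blob) < firstContextOff {
		return nil, errors.New("blob shorter than fixed header")
	}
	n := binary.LittleEndian.Uint32(blob[numContextsOff:])
	if uint64(len(blob)-firstContextOff) < uint64(n)*fieldSize {
		return nil, errors.New("blob shorter than context offset table")
	}
	offsets := make([]uint32, n)
	for i := range offsets {
		offsets[i] = binary.LittleEndian.Uint32(blob[firstContextOff+i*fieldSize:])
	}
	return offsets, nil
}

// buildHeader makes a synthetic header (zeroed apart from the context
// count and offset table) so the parser can be exercised.
func buildHeader(offsets ...uint32) []byte {
	blob := make([]byte, firstContextOff+len(offsets)*fieldSize)
	binary.LittleEndian.PutUint32(blob[numContextsOff:], uint32(len(offsets)))
	for i, off := range offsets {
		binary.LittleEndian.PutUint32(blob[firstContextOff+i*fieldSize:], off)
	}
	return blob
}

func main() {
	offsets, err := contextOffsets(buildHeader(400, 900))
	fmt.Println(offsets, err)
}
```

The position of a context in that table is the same index that would be handed back when the context is added and later passed in to pick a context from the snapshot.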

The V8 API for getting a context from a snapshot supports passing the context snapshot index:

MaybeLocal<Context> v8::Context::FromSnapshot(
    v8::Isolate* external_isolate, size_t context_snapshot_index,
    v8::DeserializeInternalFieldsCallback embedder_fields_deserializer,
    v8::ExtensionConfiguration* extensions, MaybeLocal<Value> global_object,
    v8::MicrotaskQueue* microtask_queue)

Knowing this we could have an API in v8go that looks something like:

snapshotCreator := v8.NewSnapshotCreator()
snapshotCreatorIso, err := snapshotCreator.GetIsolate()
fatalIf(t, err)

ctx1 := v8.NewContext(snapshotCreatorIso)
defer ctx1.Close()

ctx2 := v8.NewContext(snapshotCreatorIso)
defer ctx2.Close()

ctx1.RunScript(`function run() { return 1; }`, "run1.js")
ctx2.RunScript(`function run() { return 2 }`, "run2.js")
snapshotIndex1 := snapshotCreator.AddContext(ctx1)
snapshotIndex2 := snapshotCreator.AddContext(ctx2)

data, err := snapshotCreator.Create(v8.FunctionCodeHandlingKlear)

if err != nil {
	panic(err)
}

iso := v8.NewIsolate(v8.WithStartupData(data))
defer iso.Dispose()

contextFromSnapshot1 := v8.NewContextFromSnapShot(iso, snapshotIndex1)
defer contextFromSnapshot1.Close()

runVal, err := contextFromSnapshot1.Global().Get("run")
if err != nil {
	panic(err)
}

fn, err := runVal.AsFunction()
if err != nil {
	panic(err)
}
val, err := fn.Call(v8.Undefined(iso))
if err != nil {
	panic(err)
}
// val.String() == "1"

contextFromSnapshot2 := v8.NewContextFromSnapShot(iso, snapshotIndex2)
defer contextFromSnapshot2.Close()

runVal2, err := contextFromSnapshot2.Global().Get("run")
if err != nil {
	panic(err)
}

fn2, err := runVal2.AsFunction()
if err != nil {
	panic(err)
}
val2, err := fn2.Call(v8.Undefined(iso))
if err != nil {
	panic(err)
}
// val2.String() == "2"

This would bring the v8go API closer to the V8 API for creating contexts, at the cost of developers having to keep track of the different snapshot context indexes.

Running scripts on the context

The current API supports passing a v8go context to AddContext. That context has most probably used the v8go API context.RunScript to run scripts, which executes the script and returns a value to the caller. v8go internally stores those values inside the context:

v8go/v8go.cc

Lines 733 to 743 in ed2098d

Local<Value> result;
if (!script->Run(local_ctx).ToLocal(&result)) {
  rtn.error = ExceptionError(try_catch, iso, local_ctx);
  return rtn;
}
m_value* val = new m_value;
val->iso = iso;
val->ctx = ctx;
val->ptr = Persistent<Value, CopyablePersistentTraits<Value>>(iso, result);
rtn.value = tracked_value(ctx, val);

Also, before creating the snapshot we need to make sure to remove those internal values so v8 can create the blob.

v8go/v8go.cc

Line 314 in ed2098d

ContextFree(ctx);

If we envision people using the snapshot creator API similar to the way we might use it in oxygen where we run a bunch of scripts on a context and then we create a snapshot from that context, do we really need to keep track of the result from running the script?

We could change the API in a way that we do not allocate those internal values when running the scripts, that way we could be more efficient.

Something like:

snapshotCreator := v8.NewSnapshotCreator()

ctx1 := snapshotCreator.NewContext()
defer ctx1.Close()


ctx1.RunScript(`const add = (a, b) => a + b`)
ctx1.RunScript(`function run() { return add(3, 4); }`)
err := snapshotCreator.AddContext(ctx1)
if err != nil {
	panic(err)
}

data, err := snapshotCreator.Create(v8.FunctionCodeHandlingKlear)

if err != nil {
	panic(err)
}

iso := v8.NewIsolate(v8.WithStartupData(data))
defer iso.Dispose()

contextFromSnapshot := v8.NewContext(iso)
defer contextFromSnapshot.Close()

runVal, err := contextFromSnapshot.Global().Get("run")
if err != nil {
	panic(err)
}

fn, err := runVal.AsFunction()
if err != nil {
	panic(err)
}
val, err := fn.Call(v8.Undefined(iso))
if err != nil {
	panic(err)
}
// val.String() == "7"

The RunScript function on a context provided by the snapshot creator would not store those values inside the local context, saving us those allocations and the need to free the context before creating the blob.

Any thoughts?

@dylanahsmith

The RunScript function on a context provided by the snapshot creator would not store those values inside the local context, saving us those allocations and the need to free the context before creating the blob.

I don't think those allocations would be that significant. Also, that isn't unique to snapshots, since there are lots of cases where a script is only run for its side-effects of defining functions, classes, variables, etc.

Comment on lines 72 to 90
if s.ctx == nil {
	return nil, errors.New("v8go: Cannot create a snapshot without first adding a context")
}

rtn := C.CreateBlob(s.ptr, s.ctx.ptr, C.int(functionCode))

s.ptr = nil
s.ctx.ptr = nil
s.iso.ptr = nil

So, the DCHECK in ~SnapshotCreator seems to lock us into a requirement that we call CreateBlob on it before we can delete it.

What if we made Create always call CreateBlob, and made calling Create the only way to delete the snapshot creator (which, again, is forced on us by the debug check)?

Ugly, but lets us also remove DeleteSnapshotCreator and its caller:

Suggested change
- if s.ctx == nil {
- 	return nil, errors.New("v8go: Cannot create a snapshot without first adding a context")
- }
- rtn := C.CreateBlob(s.ptr, s.ctx.ptr, C.int(functionCode))
- s.ptr = nil
- s.ctx.ptr = nil
- s.iso.ptr = nil
+ var ctxPtr C.ContextPtr // assuming this is a null *Context
+ if s.ctx != nil {
+ 	ctxPtr = s.ctx.ptr
+ }
+ rtn := C.CreateBlob(s.ptr, ctxPtr, C.int(functionCode))
+ defer C.SnapshotBlobDelete(rtn)
+ s.ptr = nil
+ s.iso.ptr = nil
+ if s.ctx == nil {
+ 	return nil, errors.New("v8go: Cannot create a snapshot without first adding a context")
+ } else {
+ 	s.ctx.ptr = nil
+ }
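
The control flow being suggested, reduced to plain Go with no cgo: Create always consumes the creator, even when it has to report that no context was added. The types and fields below are hypothetical stand-ins, and the "blob" is a placeholder value.

```go
package main

import (
	"errors"
	"fmt"
)

// SnapshotCreator is a stand-in; live models the native pointer
// being non-nil, hasCtx models a context having been added.
type SnapshotCreator struct {
	live   bool
	hasCtx bool
}

func NewSnapshotCreator() *SnapshotCreator { return &SnapshotCreator{live: true} }

// AddContext records that a context was added.
func (s *SnapshotCreator) AddContext() { s.hasCtx = true }

// Create is the only way to release the creator. It consumes the
// creator on every call, including the error path, because the
// native object must have CreateBlob called before it is deleted.
func (s *SnapshotCreator) Create() ([]byte, error) {
	if !s.live {
		return nil, errors.New("snapshot creator already consumed")
	}
	s.live = false // native CreateBlob and delete would happen here
	if !s.hasCtx {
		return nil, errors.New("cannot create a snapshot without first adding a context")
	}
	return []byte("blob"), nil
}

func main() {
	creator := NewSnapshotCreator()
	_, err := creator.Create() // no context: error, but still consumed
	fmt.Println(err)
	_, err = creator.Create()
	fmt.Println(err)
}
```

The trade-off is that a failed Create cannot be retried on the same creator; the caller has to start over with a new one.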
