This issue was moved to a discussion.

Freeze initialized runtime state for use in subsequent executions. #78

Closed
ericsnowcurrently opened this issue Jul 29, 2021 · 3 comments

@ericsnowcurrently
Collaborator

This is based on a discussion @markshannon and I had the other day, but it also relates to discussions I've had with other core devs periodically for several years.

The idea is to start up the runtime, finish initialization, and then take a snapshot of the process memory (or a subset). That snapshot is then rendered as a header file (a la frozen modules) which the runtime can use on subsequent executions to get to that initialized state instead of executing all the usual runtime code. (This is reminiscent of a technique XEmacs uses.)
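
To make the "rendered as a header file" step concrete, here is a minimal sketch of a build-time helper that turns raw snapshot bytes into a C array definition. The function name `render_header` and the array name `snapshot_data` are made up for illustration; nothing here reflects an existing CPython tool, and it says nothing about how the bytes would be captured in the first place.

```python
def render_header(snapshot: bytes, name: str = "snapshot_data", per_line: int = 12) -> str:
    """Render raw snapshot bytes as a C array definition (hypothetical format)."""
    if not snapshot:
        # A zero-length array is not valid standard C, so refuse empty input.
        raise ValueError("snapshot must not be empty")
    lines = [
        "/* Auto-generated at build time; do not edit. */",
        f"static const unsigned char {name}[{len(snapshot)}] = {{",
    ]
    # Emit the bytes as hex literals, a fixed number per line.
    for start in range(0, len(snapshot), per_line):
        chunk = snapshot[start:start + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_header(b"\x00\x01\xfe\xff"), end="")
```

A const array like this would normally end up in a read-only data section of the binary, which is roughly the "avoid allocation altogether" variant mentioned under Benefits, assuming the data needs no fix-ups at load time.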

Benefits

  • possibly skip most of runtime init, getting us to running user code much faster
  • allow us to do one allocation (for the whole snapshot) instead of the many we normally do
  • (we may be able to get that snapshot into the DATA section to avoid allocation altogether, though likely not worth the trouble)
  • the snapshot could be re-used to speed up creating subinterpreters
  • if we make the snapshot dump human-readable, it could be a useful diagnostic tool

Caveats and Challenges

  • (?) most Python processes, other than relatively short-lived ones, won't benefit all that much
  • must be part of the build process (probably not realistic to do at runtime)
  • taking the snapshot might not be so easy
  • turning the snapshot back into a fully initialized runtime might not be so easy
  • there are lots of things to fix up (e.g. offsets, pointers, object hashes, maybe refcounts), which may make it too complex or otherwise neutralize any performance gains
  • command-line options and env vars can invalidate the snapshot
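
As a rough illustration of the pointer fix-up problem in the list above: one common approach (not something decided in this thread) is to store pointers as offsets from the snapshot's base address, together with a table of which slots hold pointers, and then rebase them at load time. The sketch below assumes 64-bit little-endian pointers and that every pointer targets an address at or above the base of the image; `make_relocatable` and `apply_fixups` are hypothetical names.

```python
import struct

# Assumption: pointers are 64-bit, little-endian words.
WORD = struct.Struct("<Q")


def make_relocatable(image: bytes, old_base: int, pointer_slots: list[int]) -> bytes:
    """At snapshot time: rewrite each absolute pointer as an offset from old_base."""
    buf = bytearray(image)
    for slot in pointer_slots:
        (addr,) = WORD.unpack_from(buf, slot)
        WORD.pack_into(buf, slot, addr - old_base)
    return bytes(buf)


def apply_fixups(image: bytes, new_base: int, pointer_slots: list[int]) -> bytes:
    """At load time: turn each stored offset back into an absolute pointer."""
    buf = bytearray(image)
    for slot in pointer_slots:
        (offset,) = WORD.unpack_from(buf, slot)
        WORD.pack_into(buf, slot, new_base + offset)
    return bytes(buf)


if __name__ == "__main__":
    old_base = 0x1000
    # Word 0 is a pointer to word 1; word 1 is plain data.
    image = WORD.pack(old_base + 8) + WORD.pack(42)
    stored = make_relocatable(image, old_base, [0])
    loaded = apply_fixups(stored, 0x2000, [0])
    print(hex(WORD.unpack_from(loaded, 0)[0]), WORD.unpack_from(loaded, 8)[0])
```

The load-time loop touches every pointer slot, which is exactly the cost that could "neutralize any performance gains" if the snapshot contains many pointers.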

Open Questions

  • is it worth it?
  • is it worth the time to figure out if it's worth it?
  • would it make sense to do this with a subset of runtime initialization?
  • what should be in the snapshot?
  • what should the format be for the snapshot dump?
    • make it human-readable?
  • how to turn the initialized runtime into a snapshot?
    • in-proc vs. external
    • stdout vs. outfile
  • what should the format be for the data we will use to initialize the runtime? (e.g. in a header file)
  • how to render the snapshot dump as that data?
  • how to go from that data to a fully initialized runtime?

@ericsnowcurrently
Collaborator Author

One thing @markshannon suggested is that we start off with the snapshot as just the initial graph of PyObject *, rather than the full runtime state.
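
A Python-level sketch of what "just the graph of objects" could look like: walk everything reachable from a set of roots and record each object once, with references expressed as indexes into the node table rather than as addresses. This is only an illustration of the shape of the data. A real snapshot would work on `PyObject *` in C and would also have to capture object contents; `snapshot_graph` is a hypothetical name, and the example sticks to built-in containers.

```python
import gc


def snapshot_graph(roots):
    """Return a list of (type name, [child indexes]) for objects reachable from roots."""
    index = {}   # id(obj) -> position in nodes
    nodes = []   # also keeps the objects alive, so ids stay unique during the walk
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if id(obj) in index:
            continue  # already recorded (shared object or cycle)
        index[id(obj)] = len(nodes)
        nodes.append(obj)
        # gc.get_referents() reports the objects this object directly refers to.
        pending.extend(gc.get_referents(obj))
    return [
        (type(obj).__name__, [index[id(child)] for child in gc.get_referents(obj)])
        for obj in nodes
    ]


if __name__ == "__main__":
    shared = [1]
    for position, (type_name, children) in enumerate(snapshot_graph([[shared, shared]])):
        print(position, type_name, children)
```

Because edges are indexes rather than pointers, a table like this does not depend on where the objects lived in the process that produced it.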

@iritkatriel
Collaborator

  • must be part of the build process (probably not realistic to do at runtime)

Could it not be impacted by the runtime environment, like ENV variables?

@ericsnowcurrently
Collaborator Author

They would definitely impact the solution. We'd have to figure out how to deal with that.
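
One possible way to deal with it, sketched here purely as an assumption: record a fingerprint of the configuration inputs the snapshot was built under, recompute it at startup, and fall back to normal initialization when they differ. The set of environment variables below is illustrative, not a complete list of what affects runtime init, and `config_fingerprint` and `snapshot_is_usable` are hypothetical names.

```python
import hashlib
import json

# Illustrative subset of environment variables that influence startup.
RELEVANT_ENV = ("PYTHONHASHSEED", "PYTHONHOME", "PYTHONPATH", "PYTHONUTF8")


def config_fingerprint(options, environ):
    """Hash the command-line options and relevant env vars into a stable digest."""
    payload = {
        "options": list(options),
        "env": {name: environ.get(name) for name in RELEVANT_ENV},
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def snapshot_is_usable(build_fingerprint, options, environ):
    """True if the current configuration matches the one the snapshot was built with."""
    return config_fingerprint(options, environ) == build_fingerprint


if __name__ == "__main__":
    baseline = config_fingerprint([], {})
    print(snapshot_is_usable(baseline, [], {"HOME": "/home/user"}))
    print(snapshot_is_usable(baseline, [], {"PYTHONHASHSEED": "0"}))
```

A check like this only detects a mismatch. It does not make the snapshot adapt to the new settings, so any setting in the fingerprint that changes means paying for full initialization again.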
