Add relocatable root compression #43881
@nanosoldier: Your package evaluation job has completed - possible new issues were detected. A full report can be found here.
@nanosoldier: Your package evaluation job has completed - no new issues were detected. A full report can be found here.
This seems ready. I stripped out the asan work as it was submitted independently in #43885, but there are no other changes.
I think the main thing we need to do to make this safe is to alter the system to avoid representing external objects directly in the IR, and instead use indirection descriptions (such as GlobalRef). To do that, we need to add a similar representation for a
Do you mean if we want to handle the roots that are currently marked with
PR JuliaLang#43793 passed the Buildkite test, but the logs for JuliaLang#43881 show an address sanitizer failure. Removing jl_precompile_toplevel_module from jl_exported_data.inc fixes the error. For good measure, also set it to NULL at the point of definition, even though it gets nulled during initialization.
Currently we can't cache "external" CodeInstances, i.e., those generated by compiling other modules' methods with externally-defined types. For example, consider `push!([], MyPkg.MyType())`: Base owns the method `push!(::Vector{Any}, ::Any)` but doesn't know about `MyType`.

While there are several obstacles to caching external CodeInstances, the primary one is that in compressed IR, method roots are referenced from a list by index, and the index is defined by order of insertion. This order might change depending on package-loading sequence or other history-dependent factors. If the order isn't consistent, decompressing with our current serialization techniques would resolve indices to the wrong roots, producing corrupted code that would generally trigger catastrophic failure. To avoid this problem, we currently don't cache such CodeInstances at all.
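To make the failure mode concrete, here is a minimal C sketch (C being the language of Julia's runtime) of a roots list indexed purely by insertion order. The names `roots_t`, `add_root`, and `lookup_root` are hypothetical, strings stand in for Julia objects, and this is not Julia's actual implementation. It only shows that an index recorded in one session can resolve to a different object in a session that added the same roots in another order.

```c
#include <assert.h>
#include <string.h>

#define MAX_ROOTS 16

/* Hypothetical stand-in for a method's roots list: objects are modeled
   as strings, and a root's index is simply its position, i.e. its
   order of insertion. */
typedef struct {
    const char *items[MAX_ROOTS];
    int n;
} roots_t;

/* Append `obj` unless it is already present; return its index. */
static int add_root(roots_t *r, const char *obj) {
    for (int i = 0; i < r->n; i++)
        if (strcmp(r->items[i], obj) == 0)
            return i;
    assert(r->n < MAX_ROOTS);
    r->items[r->n] = obj;
    return r->n++;   /* index of the newly appended root */
}

/* "Decompress" a reference that was serialized as a bare index. */
static const char *lookup_root(const roots_t *r, int index) {
    assert(index >= 0 && index < r->n);
    return r->items[index];
}
```

For example, if one session adds `PkgA.TypeA` and then `PkgB.TypeB`, the cached index for `PkgB.TypeB` is 1; a session that loaded PkgB first resolves index 1 to `PkgA.TypeA` instead.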
This PR enables roots to be referenced with respect to a `(key, index)` pair, where `key` identifies the module and `index` numbers just those roots with the same `key`. Because `index` counts only roots sharing a `key`, roots added under other keys do not shift it, whatever the loading order. Roots with `key = 0` are considered to be of unknown origin, and CodeInstances referencing such roots will remain unserializable unless all such roots were added at the time of system image creation. To track this additional data, this PR adds two fields to core types:
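The `(key, index)` scheme can be sketched the same way, again as a hypothetical C illustration rather than the PR's actual code: each root stores the key of the module it belongs to (0 meaning unknown origin), `index` counts only roots with the same key, and lookup scans for the `index`-th root carrying that key. The integer key values and the names `keyed_roots_t`, `root_ref_t`, `add_keyed_root`, and `lookup_keyed_root` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ROOTS 16

/* Hypothetical layout: each root records the key of its module
   (0 = unknown origin). Not Julia's actual data structures. */
typedef struct {
    const char *obj[MAX_ROOTS];
    uint64_t key[MAX_ROOTS];
    int n;
} keyed_roots_t;

/* A relocatable reference: the index-th root tagged with `key`. */
typedef struct {
    uint64_t key;
    int index;
} root_ref_t;

/* Add `obj` under `key` (or find it if already present) and return its
   (key, index) reference; index counts only roots with the same key. */
static root_ref_t add_keyed_root(keyed_roots_t *r, uint64_t key, const char *obj) {
    int index = 0;
    for (int i = 0; i < r->n; i++) {
        if (r->key[i] != key)
            continue;                      /* other modules' roots don't count */
        if (strcmp(r->obj[i], obj) == 0)
            return (root_ref_t){ key, index };
        index++;
    }
    assert(r->n < MAX_ROOTS);
    r->obj[r->n] = obj;
    r->key[r->n] = key;
    r->n++;
    return (root_ref_t){ key, index };
}

/* Resolve a (key, index) reference; NULL if no such root exists. */
static const char *lookup_keyed_root(const keyed_roots_t *r, root_ref_t ref) {
    int seen = 0;
    for (int i = 0; i < r->n; i++) {
        if (r->key[i] != ref.key)
            continue;
        if (seen == ref.index)
            return r->obj[i];
        seen++;
    }
    return NULL;
}
```

With this layout, a reference to `PkgB.TypeB` recorded as `(KEY_B, 0)` still resolves to `PkgB.TypeB` in a session that loaded the packages in the opposite order, because roots tagged with other keys are skipped when counting. The linear scan is used only to keep the sketch short.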
- to methods, it adds an `nroots_sysimg` field to count the number of roots defined at the time of writing the system image (these occur first in the list of `roots`)
- to CodeInstances, it adds a flag `relocatability` having value 1 if every root is "safe," meaning it was either added at sysimg creation or is tagged with a non-zero `key`. Even a single unsafe root will cause this to have value 0.
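The rule for the `relocatability` flag can likewise be sketched in C. The hypothetical `method_roots_t` below records each root's key in insertion order together with an `nroots_sysimg` count, relying on sysimg roots occurring first in the list. `root_is_safe` and `compute_relocatability` are illustrative names, and computing the flag over the roots a CodeInstance actually references is an assumption drawn from the description, not taken from the PR's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-method root metadata (not Julia's actual structs). */
typedef struct {
    const uint64_t *root_keys;   /* key of each root, in insertion order */
    size_t nroots;
    size_t nroots_sysimg;        /* roots [0, nroots_sysimg) came from the sysimg */
} method_roots_t;

/* A root is "safe" if it was added at sysimg creation or has a non-zero key. */
static int root_is_safe(const method_roots_t *m, size_t i) {
    assert(i < m->nroots);
    return i < m->nroots_sysimg || m->root_keys[i] != 0;
}

/* Returns 1 if every root position in `used` is safe, else 0. */
static uint8_t compute_relocatability(const method_roots_t *m,
                                      const size_t *used, size_t nused) {
    for (size_t j = 0; j < nused; j++)
        if (!root_is_safe(m, used[j]))
            return 0;            /* a single unsafe root is enough */
    return 1;
}
```

For instance, with keys `{0, 0, 0xA, 0}` and `nroots_sysimg = 2`, code referencing roots 0, 1, and 2 gets relocatability 1, while code that also references root 3 (key 0, added after the sysimg) gets 0.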
This is part 2 of a series that started in #43793. The change in system image size is less than half a percent.