output the system image as static data #22254

Merged: vtjnash merged 3 commits into master from jn/staticdata on Jun 26, 2017

vtjnash
Member

@vtjnash vtjnash commented Jun 6, 2017

This replaces the current lisp-y system image compression with a static image format. This is more similar to how it would be handled in a static compiler.

benefits:

  • faster gc (fewer live objects)
  • smaller memory footprint (static mmap image allows OS to share pages and delay loading some)
  • uses less stack space (and doesn't require the various heuristic hacks that we have in dump.c to keep stack size in check)
  • simpler format (I think this'll make it easier to extend)
  • faster startup (less work to do)
  • expected to be easier to track how various optimizations affect memory usage (since on-disk size equals in-memory size)

fixes #19199
fixes #22320

src/staticdata.c Outdated
// array of definitions for the predefined function pointers
// (reverse of fptr_to_id)
static const jl_fptr_t id_to_fptrs[] = {
NULL, NULL,
Member

Should be 4 space indents (plus closing } on its own line)

@ararslan ararslan requested review from yuyichao and JeffBezanson June 6, 2017 20:01
@JeffBezanson
Member

Cool! You can probably guess what I'm going to ask but here goes: any data on sysimg size and startup time?

src/gc.c Outdated
if (update_meta)
gc_setmark(ptls, o, bits, l * sizeof(void*) + sizeof(jl_svec_t));
if (update_meta) {
if (foreign_alloc)
Contributor

Since objprofile_count is a non-default debug option, I think it would be better to set update_meta = 0 in this case and do this in an else branch, so that it's clearer to the compiler that the two branches can be merged. It also seems to be exactly the same code, so maybe just do this at the very beginning:

if (update_meta) {
    if (((void*)o >= sysimg_base && (void*)o < sysimg_end)) {
        update_meta = 0;
        objprofile_count(vt, bits == GC_OLD_MARKED, jl_datasize_size(vt));
    }
}

Member Author

That's a copy-paste error (they weren't supposed to be the same). Sadly, I had implemented this the other way first (as you suggested), then forgot what I was doing and changed them all back to this version.

@vtjnash
Member Author

vtjnash commented Jun 6, 2017

$ time ./julia-80369bf7b4/bin/julia -e 0
real	0m0.433s
user	0m0.420s
sys	0m0.476s

$ time ./julia -e 0
real	0m0.333s
user	0m0.320s
sys	0m0.648s
$ time ./julia-80369bf7b4/bin/julia -e 'for i = 1:10; @time gc(false); end'
  0.133410 seconds, 99.95% gc time
  0.129873 seconds, 99.99% gc time
  0.127389 seconds, 99.99% gc time
  0.103277 seconds, 99.99% gc time
  0.000170 seconds, 98.40% gc time
  0.000152 seconds, 98.71% gc time
  0.000152 seconds, 98.76% gc time
  0.000151 seconds, 98.75% gc time
  0.000152 seconds, 98.76% gc time
  0.000152 seconds, 98.72% gc time
real	0m1.510s
user	0m1.616s
sys	0m1.500s

$ time ./julia -e 'for i = 1:10; @time gc(false); end'
  0.004610 seconds, 99.40% gc time
  0.002450 seconds, 99.89% gc time
  0.001323 seconds, 99.81% gc time
  0.000376 seconds, 99.35% gc time
  0.000110 seconds, 98.10% gc time
  0.000110 seconds, 98.03% gc time
  0.000109 seconds, 98.07% gc time
  0.000109 seconds, 98.15% gc time
  0.000109 seconds, 98.01% gc time
  0.000109 seconds, 98.07% gc time

real	0m0.930s
user	0m1.144s
sys	0m1.788s
$ time ./julia-80369bf7b4/bin/julia -e 'for i = 1:10; @time gc(); end'
  0.091677 seconds, 99.94% gc time
  0.093045 seconds, 99.99% gc time
  0.166092 seconds, 99.99% gc time
  0.090291 seconds, 99.99% gc time
  0.090208 seconds, 99.99% gc time
  0.090224 seconds, 99.99% gc time
  0.090873 seconds, 99.99% gc time
  0.090771 seconds, 99.99% gc time
  0.090349 seconds, 99.99% gc time
  0.090110 seconds, 99.99% gc time
real	0m1.991s
user	0m2.212s
sys	0m1.788s

$ time ./julia -e 'for i = 1:10; @time gc(); end'
  0.004310 seconds, 99.42% gc time
  0.002271 seconds, 99.89% gc time
  0.071036 seconds, 99.99% gc time
  0.069055 seconds, 99.99% gc time
  0.068984 seconds, 99.99% gc time
  0.069206 seconds, 99.99% gc time
  0.069419 seconds, 99.99% gc time
  0.069340 seconds, 99.99% gc time
  0.069038 seconds, 99.99% gc time
  0.069223 seconds, 99.99% gc time
real	0m1.478s
user	0m1.548s
sys	0m1.516s

RSS 148M from ./julia-80369bf7b4/bin/julia -e 'run(`htop`)'
RSS 126M from ./julia -e 'run(`htop`)'

size -A usr/lib/julia/sys.o
.data 73132035 0

sysimg size breakdown:

     sys data: 45716212 # random mutable data
  isbits data: 15671784 # mostly AST data
      symbols:   249675
    tags list:  2345872 # this can be compressed much better
   reloc list:  8834488 # this can be compressed much better
    gvar list:    79592 # 19898 pointers inserted into our generated code
    fptr list:   215464 # approximately 13500 methods compiled

$ ls -lh usr/lib/julia/
total 367M
-rw-rw-r-- 1 vtjnash vtjnash 6.8M Jun 6 18:23 inference.ji
-rw-rw-r-- 1 vtjnash vtjnash 100M Jun 6 18:27 sys-debug.o
-rwxrwxr-x 1 vtjnash vtjnash 88M Jun 6 18:28 sys-debug.so
-rw-rw-r-- 1 vtjnash vtjnash 91M Jun 6 18:27 sys.o
-rwxrwxr-x 1 vtjnash vtjnash 83M Jun 6 18:27 sys.so

@vtjnash vtjnash force-pushed the jn/staticdata branch 5 times, most recently from b192385 to 5c0c93f Compare June 8, 2017 06:47
src/staticdata.c Outdated
size_t item = backref_table_numel++;
char *pos = (char*)HT_NOTFOUND + 1 + item;
size_t item = ++backref_table_numel;
assert(item < (1 << 28) && "too many items to serialize");
Contributor

@tkelman tkelman Jun 8, 2017

28 appears often here, but its significance isn't obvious. Give it a name?
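As a rough illustration of the naming suggestion, here is a minimal sketch. It assumes the 28 is the number of low bits reserved for an item index, with the remaining high bits of a 32-bit word used as a small tag. The PR excerpt does not confirm that layout, and every name below (`ITEM_INDEX_BITS`, `ITEM_INDEX_LIMIT`, `pack_ref`, `ref_tag`, `ref_item`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: the diff above only shows the bare literal 28. */
#define ITEM_INDEX_BITS  28
#define ITEM_INDEX_LIMIT ((size_t)1 << ITEM_INDEX_BITS)

/* Assumption: the bits above ITEM_INDEX_BITS carry a small tag (4 bits here). */
static uint32_t pack_ref(uint32_t tag, size_t item)
{
    assert(tag < 16 && "tag must fit in the top 4 bits");
    assert(item < ITEM_INDEX_LIMIT && "too many items to serialize");
    return (tag << ITEM_INDEX_BITS) | (uint32_t)item;
}

/* Recover the tag from the high bits. */
static uint32_t ref_tag(uint32_t ref)
{
    return ref >> ITEM_INDEX_BITS;
}

/* Recover the item index from the low ITEM_INDEX_BITS bits. */
static size_t ref_item(uint32_t ref)
{
    return ref & (ITEM_INDEX_LIMIT - 1);
}
```

With the limit and the shift derived from one macro, the assert and the packing code cannot drift apart if the width ever changes.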

@JeffBezanson
Member

Nice. Sysimg is larger as one would expect, but seems to be totally worth it.

src/staticdata.c Outdated
static void jl_update_all_gvars(jl_serializer_state *s)
{
if (sysimg_gvars == NULL)
return;
Contributor

one too few spaces

src/staticdata.c Outdated
uint32_t b2 = *(uint8_t*)(*base)++;
uint32_t b1 = *(uint8_t*)(*base)++;
uint32_t b0 = *(uint8_t*)(*base)++;
return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
Contributor

    uint32_t v;
    memcpy(&v, base, 4);
    return ntohl(v);

This is much easier for the compiler to optimize.
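For reference, a self-contained sketch of the memcpy-plus-ntohl approach suggested above. The function name and signature (`read_uint32_be`, a `const uint8_t **` cursor) are assumptions and not the ones used in the PR. Since the diff dereferences the cursor as `*base`, a complete version presumably copies from `*base` and then advances it, which is what this sketch does.

```c
#include <arpa/inet.h> /* ntohl (POSIX; on Windows it is declared in <winsock2.h>) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a big-endian 32-bit value and advance the cursor. */
static uint32_t read_uint32_be(const uint8_t **base)
{
    assert(base != NULL && *base != NULL);
    uint32_t v;
    memcpy(&v, *base, sizeof(v)); /* copy from the cursor, not from its address */
    *base += sizeof(v);           /* step past the four bytes just read */
    return ntohl(v);              /* convert big-endian to host byte order */
}
```

The memcpy avoids unaligned or aliasing-unsafe loads, and compilers typically lower the pair to a single load plus a byte swap on little-endian targets.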

@@ -644,6 +644,37 @@ void gc_setmark_buf(jl_ptls_t ptls, void *o, uint8_t mark_mode, size_t minsz)
gc_setmark_buf_(ptls, o, mark_mode, minsz);
}

void jl_gc_force_mark_old(jl_ptls_t ptls, jl_value_t *v)
Contributor

This (and gc_sweep_sysimg etc. below) should go into a header. Too often we put declarations in C files, and they get out of sync when the implementation is updated.

@vtjnash
Member Author

vtjnash commented Jun 23, 2017

just for fun: @nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

vtjnash added 3 commits June 24, 2017 16:24
we use this step to try to maximize the number of shared mmap pages
and minimize memory (virtual and physical) that is required.

the sorting order appears to have a strong impact on the time for a full gc
@vtjnash vtjnash merged commit c2c37ef into master Jun 26, 2017
@vtjnash vtjnash deleted the jn/staticdata branch June 26, 2017 07:41
@ihnorton
Member

ihnorton commented Jun 26, 2017

Backport? @vtjnash said that would be the simplest fix for stack size issues in 0.6 (#22320)...

@tkelman
Contributor

tkelman commented Jun 26, 2017

This is a pretty large change. What about the stack size compiler flag?

@ihnorton
Member

That should work for embedders, more-or-less, but it's probably unreasonable to ask Python to recompile. That said, I have no idea how many people use pyjulia.

Successfully merging this pull request may close these issues.

embedding crash (?) on Windows
SIGSEGV on arm (Raspberry Pi 3)
7 participants