Allocations of Julia Tasks' stacks are missed by @time and Allocation Profiler, and at least 9 other allocations missed by Allocation Profiler #48216

Open
NHDaly opened this issue Jan 10, 2023 · 12 comments
Labels
observability metrics, timing, understandability, reflection, logging, ... profiler

Comments

@NHDaly
Member

NHDaly commented Jan 10, 2023

There are 9 objects allocated when creating a Julia Task that are missed by the Allocation Profiler (#42768).

Additionally, both the Allocation Profiler and @time miss the allocation of the stack for a Julia Task.

You can see in the following example that the Allocation Profiler misses around 9 allocations that @time does track, and (per the EDIT below) the Task stack allocation isn't counted by either:

julia> using Profile

julia> @time Threads.@spawn sleep(10)  # warmup
  0.025083 seconds (23.00 k allocations: 1.436 MiB, 67.28% compilation time)
Task (runnable) @0x000000013056ab60

julia> @time Threads.@spawn sleep(10)  # warmup (again?)
  0.000061 seconds (30 allocations: 2.102 KiB)
Task (runnable) @0x0000000110b73340

julia> @time Threads.@spawn sleep(10)
  0.000064 seconds (29 allocations: 2.070 KiB)
Task (runnable) @0x0000000130568fd0

julia> @time Threads.@spawn sleep(10)
  0.000072 seconds (29 allocations: 2.070 KiB)
Task (runnable) @0x0000000130569270

julia> @time Threads.@spawn sleep(10)
  0.000066 seconds (29 allocations: 2.070 KiB)
Task (runnable) @0x0000000130569510

julia> Profile.Allocs.clear(); begin
           Profile.Allocs.@profile sample_rate=1 Threads.@spawn sleep(10)
       end
Task (runnable) @0x0000000130569660

julia> [a.size for a in Profile.Allocs.fetch().allocs]
20-element Vector{Int64}:
  48
  56
 104
  80
  16
  32
  56
 200
  32
  56
 128
   7
  40
  64
  89
  64
  32
  16
  32
 328
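
As a side note, here is a minimal sketch (not from the original report; compare_alloc_counts is a hypothetical helper) of how the gap can be quantified programmatically: compare the GC allocation counters that @time reads against what the Allocation Profiler records for the same @spawn. Base.gc_num, Base.GC_Diff, and Base.gc_alloc_count are the unexported internals that @time itself uses, and the profiling machinery adds a few allocations of its own, so the numbers won't exactly match the session above.

using Profile

function compare_alloc_counts()
    Threads.@spawn sleep(0)   # warm up the @spawn path so compilation isn't counted below
    Profile.Allocs.clear()
    stats_before = Base.gc_num()
    Profile.Allocs.@profile sample_rate=1 Threads.@spawn sleep(10)
    stats_after = Base.gc_num()
    # allocations according to the GC counters (the same source @time reports from)
    counted = Base.gc_alloc_count(Base.GC_Diff(stats_after, stats_before))
    # allocations the Allocation Profiler actually recorded
    recorded = length(Profile.Allocs.fetch().allocs)
    return (counted_by_gc_stats = counted, recorded_by_profiler = recorded)
end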

I think the stack allocation happens here, but I'm not 100% sure:

julia/src/gc-stacks.c

Lines 150 to 182 in b7201d6

JL_DLLEXPORT void *jl_malloc_stack(size_t *bufsz, jl_task_t *owner) JL_NOTSAFEPOINT
{
    jl_task_t *ct = jl_current_task;
    jl_ptls_t ptls = ct->ptls;
    size_t ssize = *bufsz;
    void *stk = NULL;
    if (ssize <= pool_sizes[JL_N_STACK_POOLS - 1]) {
        unsigned pool_id = select_pool(ssize);
        ssize = pool_sizes[pool_id];
        arraylist_t *pool = &ptls->heap.free_stacks[pool_id];
        if (pool->len > 0) {
            stk = arraylist_pop(pool);
        }
    }
    else {
        ssize = LLT_ALIGN(ssize, jl_page_size);
    }
    if (stk == NULL) {
        if (jl_atomic_load_relaxed(&num_stack_mappings) >= MAX_STACK_MAPPINGS)
            // we accept that this can go over by as much as nthreads since it's not a CAS
            return NULL;
        // TODO: allocate blocks of stacks? but need to mprotect individually anyways
        stk = malloc_stack(ssize);
        if (stk == MAP_FAILED)
            return NULL;
    }
    *bufsz = ssize;
    if (owner) {
        arraylist_t *live_tasks = &ptls->heap.live_tasks;
        arraylist_push(live_tasks, owner);
    }
    return stk;
}

Or maybe this is a better place to instrument, since it is the actual site of the mmap:

julia/src/gc-stacks.c

Lines 53 to 67 in b7201d6

static void *malloc_stack(size_t bufsz) JL_NOTSAFEPOINT
{
    void* stk = mmap(0, bufsz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stk == MAP_FAILED)
        return MAP_FAILED;
#if !defined(JL_HAVE_UCONTEXT) && !defined(JL_HAVE_SIGALTSTACK)
    // setup a guard page to detect stack overflow
    if (mprotect(stk, jl_guard_size, PROT_NONE) == -1) {
        munmap(stk, bufsz);
        return MAP_FAILED;
    }
#endif
    jl_atomic_fetch_add(&num_stack_mappings, 1);
    return stk;
}

Where does this get tracked and added to the allocs reported in @time?

EDIT: I believe that the stack allocation is also not being captured by @time, since the stacks are more like 4 MiB each and I don't see any accounting for the allocation in the code above.

@NHDaly
Member Author

NHDaly commented Jan 10, 2023

I'm not sure what the other 8 missing allocations are yet. This was the most obvious one, since there's around 2.00 KiB unaccounted for.

EDIT: Oh wait, duh, nvm, the existing allocations already sum to 1480 bytes, so the stack allocation isn't currently being reported in @time, either! 💡
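
For reference, that sum can be read straight off the profile captured above; the 20 sizes listed earlier add up to 1480 bytes, versus the 2.070 KiB (≈ 2120 bytes) reported by @time:

julia> sum(a.size for a in Profile.Allocs.fetch().allocs)
1480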

@NHDaly
Member Author

NHDaly commented Jan 10, 2023

CC: @gbaraldi, @IanButterworth

@NHDaly NHDaly changed the title Allocation Profiler missing allocations for Julia Tasks' stacks, and 8 other allocations Allocation Profiler missing allocations for Julia Tasks' stacks, and 9 other allocations Jan 10, 2023
@NHDaly NHDaly changed the title Allocation Profiler missing allocations for Julia Tasks' stacks, and 9 other allocations Allocation Profiler missing allocations for Julia Tasks' stacks, and at least 9 other allocations Jan 10, 2023
@NHDaly NHDaly changed the title Allocation Profiler missing allocations for Julia Tasks' stacks, and at least 9 other allocations Allocations of Julia Tasks' stacks are missed by @time and Allocation Profiler, and at least 9 other allocations missed by Allocation Profiler Jan 10, 2023
@NHDaly
Member Author

NHDaly commented Jan 10, 2023

OK, I updated the description per my new understanding.

@gbaraldi
Member

We might not want to report the stack allocation anyway; it's kind of a deep internals thing that might make a user go crazy. I do wonder where the other allocations are coming from. I guess an rr trace could show when the allocation counter goes up but we don't record the allocation.

@NHDaly
Member Author

NHDaly commented Jan 10, 2023

Yeah, I'm not sure either. I think it might be useful though, since it is ultimately part of the user's program, and it's something that they have control over. E.g. what if you find you're unexpectedly spawning 10,000,000 tasks or something, and that's your main contributor to allocations?

But I can see the argument for not reporting them as well...

I guess an rr trace could show when the allocation counter goes up but we don't record the allocation.

Great idea, yeah! :) I don't think we have the cycles to push on this now, but I'll leave the idea here for the future, or in case you or anyone there has the cycles and interest to hunt these down. Thanks! 😊

@NHDaly
Member Author

NHDaly commented Jan 10, 2023

If it were solely up to me, I think I'd lean towards reporting the allocations of the Task stacks as well.

I'd also be interested to check what Go's allocation profiler does, and whether or not it reports them.

@NHDaly
Member Author

NHDaly commented Jan 10, 2023

CC: @vilterp as well

@vchuravy
Member

The task stack is anonymous memory: we ask the OS for 4 MiB, but the actual allocation does not happen up front. So this is potential memory use, not actual memory use. The cost of that memory is only incurred when the stack actually grows to that size.

So I don't think we should report it as part of @time.
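
A rough sketch of how to see this in practice (reserved_vs_resident is a hypothetical helper, and whether every task gets its own mmap'ed stack depends on the platform/build): spawn a batch of tasks and compare the nominal stack reservation against the growth in resident memory. Sys.maxrss only reports the peak resident set size, so this is just an approximation; the ~4 MiB per-stack figure is the one quoted above.

function reserved_vs_resident(ntasks = 1_000)
    rss_before = Sys.maxrss()                      # peak resident set size so far, in bytes
    tasks = [Threads.@spawn sleep(5) for _ in 1:ntasks]
    sleep(1)                                       # let the tasks start, so their stacks get mapped
    rss_after = Sys.maxrss()
    nominal = ntasks * 4 * 2^20                    # ~4 MiB reserved per task stack (figure from above)
    println("nominal stack reservation: ", nominal ÷ 2^20, " MiB")
    println("observed RSS growth:       ", (rss_after - rss_before) ÷ 2^20, " MiB")
    foreach(wait, tasks)                           # wait for the sleeps to finish before returning
end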

@NHDaly
Member Author

NHDaly commented Jan 12, 2023

💡 Oh cool, thanks! I didn't know that. That's nifty. Okay, yeah, I think that sounds reasonable then. 👍

Thanks! Should I repurpose this issue for the remaining 9 other missing allocs, or close it and open a new one?

@charleskawczynski
Contributor

I'm curious if this is related to #50187.

@NHDaly
Member Author

NHDaly commented Sep 30, 2023

🤔 It doesn't look like it's exactly the same, since there are no tasks involved, but agreed that it's the same symptom... Yeah, maybe they're related! 💡 Thanks @charleskawczynski. I commented on the other thread too 👍

@NHDaly NHDaly added the observability metrics, timing, understandability, reflection, logging, ... label Jan 9, 2024