Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocate objects in pools with correct alignment #21959

Closed
wants to merge 1 commit into from

Conversation

vchuravy
Copy link
Member

Fixes #21918.

The issue was that out gc pools only guaranteed 16 bytes alignment and the pool pages are only aligned by 16 bytes as well.
This PR changes the alignment of the pool pages to 64 bytes (enough for avx512) and takes the alignment information into account when selecting the szclass/pool.
It correctly handles unaligned allocations and allocations that are near the maximum pool size.

This should be squashed a bit before we merge.

@vchuravy vchuravy added backport pending 0.6 bugfix This change fixes an existing bug GC Garbage collector labels May 19, 2017
@vchuravy vchuravy requested a review from yuyichao May 19, 2017 10:09

STATIC_INLINE jl_value_t *jl_gc_alloc_(jl_ptls_t ptls, size_t sz, void *ty)
{
const size_t allocsz = sz + sizeof(jl_taggedvalue_t);
if (allocsz < sz) // overflow in adding offs, size was "negative"
jl_throw(jl_memory_exception);
size_t alignment = JL_SMALL_BYTE_ALIGNMENT;
if (ty && ((uintptr_t)ty != jl_buff_tag) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function shouldn't read ty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I need access to the alignment at this point and ty is used in jl_set_typeof(v, ty) later in the function.
Otherwise I will need to pass in the alignment from the outside.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size is also passed in explicitly. I think we can just have an aligned version

base/atomics.jl Outdated
@@ -322,7 +322,7 @@ inttype(::Type{Float32}) = Int32
inttype(::Type{Float64}) = Int64


alignment(::Type{T}) where {T} = ccall(:jl_alignment, Cint, (Csize_t,), sizeof(T))
alignment(::Type{T}) where {T} = ccall(:jl_alignment, Cint, (Any,), T)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it's just datatype_alignment now.)

@@ -170,30 +176,25 @@ static const int jl_gc_sizeclasses[JL_GC_N_POOLS] = {
// 64, 32, 160, 64, 16, 64, 112, 128, bytes lost
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should change this list for this.

// szclass 16+
return 16;
#endif
// The pools are aligned wit JL_CACHE_BYTE_ALIGNMENT (typically 64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with

@vchuravy vchuravy force-pushed the vc/pool_alignment branch 2 times, most recently from 19b847c to ab60fc6 Compare May 20, 2017 06:03
@vchuravy vchuravy force-pushed the vc/pool_alignment branch 2 times, most recently from 19ee5f9 to dcf3920 Compare May 25, 2017 05:30
@vchuravy vchuravy force-pushed the vc/pool_alignment branch from 7e38329 to fe5bd9f Compare June 1, 2017 05:36
#define JL_CACHE_BYTE_ALIGNMENT 64
// JL_HEAP_ALIGNMENT is the maximum alignment that the GC can provide
#define JL_HEAP_ALIGNMENT JL_SMALL_BYTE_ALIGNMENT
#define GC_MAX_SZCLASS (2032-sizeof(void*))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the significance of the 2032?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the largest object size that we use pools for. Also, see the last entry in jl_gc_sizeclasses

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the chances of it changing in either place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the sizeof(void*) comes from, but this PR is supposed to be changing this list (#21959 (comment))

src/ccall.cpp Outdated
@@ -1214,8 +1214,9 @@ static jl_cgval_t mark_or_box_ccall_result(Value *result, bool isboxed, jl_value
const DataLayout &DL = *jl_ExecutionEngine->getDataLayout();
#endif
unsigned nb = DL.getTypeStoreSize(result->getType());
unsigned alignment = DL.getPrefTypeAlignment(result->getType());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be DL.getABITypeAlignment(result->getType())

#define JL_CACHE_BYTE_ALIGNMENT 64
// JL_HEAP_ALIGNMENT is the maximum alignment that the GC can provide
#define JL_HEAP_ALIGNMENT JL_SMALL_BYTE_ALIGNMENT
#define GC_MAX_SZCLASS (2032-sizeof(void*))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the largest object size that we use pools for. Also, see the last entry in jl_gc_sizeclasses

return 16 - 16376 / 2 / LLT_ALIGN(sz, 16 * 2) + 24 + N;
return 16 - 16376 / 1 / LLT_ALIGN(sz, 16 * 1) + 32 + N;
size_t klass = 0;
while (klass < JL_GC_N_POOLS) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems way more expensive (also given that this function is already a pretty significant percentage of the time to allocate an object)

Copy link
Member Author

@vchuravy vchuravy Jun 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I know I wanted a fast way to check correctness before going back to the jump table.

#define JL_CACHE_BYTE_ALIGNMENT 64
// JL_HEAP_ALIGNMENT is the maximum alignment that the GC can provide
#define JL_HEAP_ALIGNMENT JL_SMALL_BYTE_ALIGNMENT
#define GC_MAX_SZCLASS (2032-sizeof(void*))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the sizeof(void*) comes from, but this PR is supposed to be changing this list (#21959 (comment))

@vchuravy vchuravy force-pushed the vc/pool_alignment branch 2 times, most recently from 33b89c5 to 06ed45f Compare June 1, 2017 08:36
@vchuravy
Copy link
Member Author

vchuravy commented Jun 2, 2017

@jlbuild !filter=linux

@jlbuild
Copy link

jlbuild commented Jun 2, 2017

Status of f487f73 builds:

Builder Name Build Download
linux32 COMPLETE Download
linux64 COMPLETE Download
linuxaarch64 ERRORED N/A
linuxarmv7l COMPLETE Download
linuxppc64le COMPLETE Download

@vchuravy
Copy link
Member Author

vchuravy commented Jun 2, 2017

AppVeyor failure is #20152 (comment) ...

@vtjnash
Copy link
Member

vtjnash commented Jul 15, 2017

Why did this stall out? Are we close in it?

@vchuravy
Copy link
Member Author

vchuravy commented Jul 15, 2017 via email

@JeffBezanson
Copy link
Member

Bump. This is becoming a bit urgent, as we have frequent CI failures due to this issue.

@vchuravy
Copy link
Member Author

vchuravy commented Sep 24, 2017

OK, I rebased this so it is on the same stand as earlier this year. I will need help with making sure that the logic for jl_gc_szclass is sound and performant.

-- edit I hope I got the rebase right and complete, so a review would be appreciated

@vchuravy
Copy link
Member Author

freebsd CI is a wonderful LLVM ERROR: Program used external function '__atomic_store' which could not be resolved!. This means probably two things.

  1. @ararslan & @iblis17 are we using libatomic on FreeBSD?
  2. The alignments for one of the atomic types is off, and we are outputting a generic atomic_store

AppVeyor is a lovely:

From worker 16:	Warning: threaded loop executed in order
	From worker 16:	
	From worker 16:	Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
	From worker 16:	Exception: EXCEPTION_ACCESS_VIOLATION at 0x6c6c15c0 -- _atomic_load at C:\projects\julia\julia-f569b7bf15\bin\libatomic-1.DLL (unknown line)
	From worker 16:	in expression starting at C:\projects\julia\julia-f569b7bf15\share\julia\test\threads.jl:271
	From worker 16:	_atomic_load at C:\projects\julia\julia-f

Which shouldn't have seqfaulted, but is probably a wrong alignment as well.

and CircleCI seems unrelated.

@yuyichao
Copy link
Contributor

are we using libatomic on FreeBSD?

That code is not turned on for FreeBSD.

@vchuravy
Copy link
Member Author

vchuravy commented Sep 25, 2017

One new test failure from Travis Mac 64bit

	From worker 8:	union_isbits at /Users/travis/build/JuliaLang/julia/src/datatype.c:244
	From worker 8:	union_isbits at /Users/travis/build/JuliaLang/julia/src/datatype.c:248
	From worker 8:	jl_islayout_inline at /Users/travis/build/JuliaLang/julia/src/datatype.c:267
	From worker 8:	_new_array at /Users/travis/build/JuliaLang/julia/src/array.c:146 [inlined]
	From worker 8:	jl_alloc_array_1d at /Users/travis/build/JuliaLang/julia/src/array.c:415

and #23371 on Travis Linux 32bit.

@vchuravy
Copy link
Member Author

As far as I am aware one wants to align atomic types by their natural alignment e.g. sizeof == alignment.

On current master this invariant in correct:

T = Base.Threads.Atomic{Int128}
julia> Base.Threads.alignment(T)
16
julia> Base.Threads.alignment(T) == Base.sizeof(T)
true
julia> Base.datatype_alignment(T)
8
julia> Base.datatype_alignment(Int128)
8

On this branch there is no longer a special alignment query function since the ideas was to only have one source of truths for all alignments and as one can see already on master datatype_alignment and Threads.alignment disagree for (U)Int128, which turns out to be related to #22649 (a classic case of mobius #yak shaving)

src/julia.h Outdated
@@ -155,6 +155,7 @@ JL_EXTENSION typedef struct {
#endif
jl_array_flags_t flags;
uint16_t elsize;
uint16_t elalign;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? we don't have any spare bits right here, but you can steal some from offset

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in multiple places like jl_array_copy and during deserialization to create a new array with the right alignment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but those aren't performance sensitive. How does this differ from
el->isptr ? sizeof(void*) ? jl_gc_align(jl_tparam0(jl_typeof(el))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that should work, I was mostly mirroring what elsize is doing.

#define JL_CACHE_BYTE_ALIGNMENT 64
// JL_HEAP_ALIGNMENT is the maximum alignment that the GC can provide
#define JL_HEAP_ALIGNMENT JL_CACHE_BYTE_ALIGNMENT
#define GC_MAX_SZCLASS (2032-sizeof(void*))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2032 has no significance. It is simply:
(((GC_PAGE_SZ - ALIGN + sizeof(jl_taggedvalue_t)) ÷ 8) ÷ ALIGN) * ALIGN
per the formula below (the 8 is arbitrary)

// An alignment of 0 or 1 means unaligned and we can use sz directly.
if (alignment != 0 && alignment != 1 && alignment != sz)
sz = ((sz / alignment) + 1) * alignment;
return sz;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_t alsz = LLT_ALIGN(sz, alignment);
return alignment ? alsz : sz;

jl_value_t *v;
if (allocsz <= GC_MAX_SZCLASS + sizeof(jl_taggedvalue_t)) {
int pool_id = jl_gc_szclass(allocsz);
if (klass != -1 && alignsz <= GC_MAX_SZCLASS + sizeof(jl_taggedvalue_t)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size test seems unnecessary (redundant with klass test)

osize = jl_gc_sizeclasses[pool_id];
}
else {
osize = p->osize;
}
assert((size_t)osize >= alignment &&
(alignment == 0 || alignment == 1 || osize % alignment == 0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alignment == 0 || (osize & (alignment - 1)) == 0 (if you want, also can add && (alignment & (alignment - 1) == 0) to assert that alignment is a power-of-two)

#else
# define jl_gc_alloc(ptls, sz, ty) jl_gc_alloc_(ptls, sz, ty)
# define jl_gc_alloc(ptls, sz, align, ty) jl_gc_alloc_(ptls, sz, align, ty)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should keep the same names on both sides of the conditional

test/threads.jl Outdated
@@ -280,6 +280,10 @@ let atomic_types = [Int8, Int16, Int32, Int64, Int128,
filter!(T -> sizeof(T)<=8, atomic_types)
end
for T in atomic_types
# Check that alignment is natural alignment
@test Base.datatype_alignment(T) == Base.sizeof(T)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is currently broken for Atomic{(U)Int128}

src/dump.c Outdated
@@ -1349,7 +1355,7 @@ static jl_value_t *jl_deserialize_value_array(jl_serializer_state *s, jl_value_t
for (i = 0; i < ndims; i++) {
dims[i] = jl_unbox_long(jl_deserialize_value(s, NULL));
}
jl_array_t *a = jl_new_array_for_deserialization((jl_value_t*)NULL, ndims, dims, isunboxed, elsize);
jl_array_t *a = jl_new_array_for_deserialization((jl_value_t*)NULL, ndims, dims, isunboxed, elsize, elalign);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash Any idea how to get elalign at this point of the serialisation state? I don't have a eltype here yet, right? For that I need aty and that is depended on a

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compute it during serialization

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok just not keep it around ;) Gotcha

if (allocsz <= GC_MAX_SZCLASS + sizeof(jl_taggedvalue_t)) {
int pool_id = jl_gc_szclass(allocsz);
if (klass != -1){
assert(alignsz <= GC_MAX_SZCLASS + sizeof(jl_taggedvalue_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation here (should be two four-space indents rather than a tab)

continue;
it.first->eraseFromParent();
std::get<0>(it)->eraseFromParent();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation here too

test/threads.jl Outdated
@@ -280,6 +280,14 @@ let atomic_types = [Int8, Int16, Int32, Int64, Int128,
filter!(T -> sizeof(T)<=8, atomic_types)
end
for T in atomic_types
# Check that alignment is natural alignment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines too

@vchuravy
Copy link
Member Author

vchuravy commented Sep 27, 2017

OK todos after the recent rebase:

  • address Jameson comment and use jl_islayout_inline
  • cleanup the mess that is jl_alignment,julia_alignment, datype_alignment, and jl_gc_alignment
  • optimise the pool osize table and selection
  • stress test this

After my discussion with Jameson yesterday I am worried about the difference between stack-alignment and heap-alignment. Currently they both agree at 16, but part of the reason for this whole PR is that I would like to bump heap alignment to 64 so that we can support alignments of 32 and 64 (avx256, avx512) Do we also need to bump the stack alignment?

if (!ignore_tag) {
align = sz <= 8 ? 8 : JL_SMALL_BYTE_ALIGNMENT;
sz += align;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should change alignment to pointer size and increase size by that much.

@JeffreySarnoff
Copy link
Contributor

The memory alignment used by Julia should be 64. No good comes from bifurcating the manner of [co]locating information. Stack and heap should be transparently egalitarian in their conferred alignment. This provides palpable future-proofing and levers performance (which follows less mess).

* TODO: select bucket that fits multiple of alignment
* TODO: allow alignment up to 64
* TODO: make arrays follow alignment as well
* TODO: cleanup the mess that is jl_alignment, julia_alignment, datype_alignment, and jl_gc_alignment
* TODO: teach jl_gc_alloc_ to do something with the alignment request
@vtjnash
Copy link
Member

vtjnash commented May 5, 2021

Discussing with @vchuravy today, the last missing bit is computing the object padding required to provide the API here. It is a slight change to the way this works now, but should independently help finish #22649 too.

@vchuravy
Copy link
Member Author

Very stale

@vchuravy vchuravy closed this Apr 26, 2023
@vtjnash
Copy link
Member

vtjnash commented Apr 27, 2023

I thought we had almost finished this. Just needed a small rebase

@vchuravy
Copy link
Member Author

Just needed a small rebase

Just doing a lot of heavy lifting here. :)
I won't be working on this anytime soon, so in some sense I am unlicking the cookie.

@giordano giordano deleted the vc/pool_alignment branch March 12, 2024 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This change fixes an existing bug GC Garbage collector
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loading a NTuple{N, VecElement{T}} can seqfault
8 participants