Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer types for array backend #48728

Closed
wants to merge 87 commits into from
Closed

Conversation

Tokazama
Copy link
Contributor

@Tokazama Tokazama commented Feb 19, 2023

Motivation

After a conversation on slack with @StefanKarpinski and @brenhinkeller it seemed like a lot of other PRs on improving arrays stalled out and the first fundamental step is having a generic buffer type. I still have crashes in the REPL with this current implementation and it appears I haven't fully implemented the GC side correctly, but I figured it was better to put this out there for someone else to start with then completely abandon this.

The goal here is to provide the core functionality for bare-bones storage of multiple repeated elements.

Implementation

There are three primary storage types this PR is intended to support:

  • Buffer{T}: has a fixed size and mutable elements

  • DynamicBuffer{T}: has a dynamic size and mutable elements.

  • ImmutableBuffer{T}: has a fixed size and immutable elements (not implemented here but should be as a future PR for with the freeze/thaw compiler optimizations worked on in prior ImmutableArray PRs).

The type layout is modeled after that of jl_array_t and is similar to the following native Julia structure:

struct MemoryChunk{T}
    length::Int
    data::Ptr{T}
end

Just like Array, the number of bytes stored determines if data points to inline allocations that extends the size of MemoryChunk via jl_gc_alloc or a seperately stored chunk via jl_gc_managed_malloc and jl_gc_track_malloced_buffer.

Storage of elements is similar as bit unions, bits, or pointers to boxed types is identical to that of Array or jl_array_t.

Note that there are not explicitly stored flags, offsets, additional dimensions, or element size. This means that:

  • Unlike jl_array_t, all information about the element type must be derived again everytime it's needed when running C code. Of course, this is not an issue once we are working with TBAA and that is all optimized away. However, I'm uncertain whether that will slow things down that don't often interact with TBAA directly (such as the garbage collector).

  • We don't have flags.how == 1 to mark a julia-allocated buffer that needs to be marked or flags.isshared. I've tried to use the last two its of the data pointer to mark these (see jl_buffer_isshared and jl_buffer_isunmarked). It may make more sense to have a type that explicitly wraps a shared buffer, preserving that bit for some other future use and allowing more explicit representation of data storage through the type system. I've yet to spend much time grocking the unmarked allocated data aspect of this, so there may be a better approach that I've yet to consider.

  • The lack of an offset or maximum size means that resizing for DynamicBuffer will always result in a reallocation or new allocation. Feature parity with Vector is intended to be accomplished through Julia code with something like

    mutable struct DynamicVector{T} <: DenseVector{T}
        buffer::DynamicBuffer{T}
        offset::Int
        length::Int
    end

    where the length of DynamicBuffer here is functionally equivalent to Arrays maxsize.

I'm in the process of moving the implementation to native Julia. This is a bit challenging since this is essentially an attempt to rewrite large portions of "array.c" where a lot of things aren't implemented in native Julia code (interacting with the GC, traversing Unions) and sometimes working efficiently with a pointer get tricky (storing pointers to immutable types).

Remaining Work

  • Names may not be sufficiently self documenting and may even be misleading. The term "buffer" is often used to refer to something more akin to what our current Vector is with offsets and resizing. Perhaps something that is more clearly describing a continuous chunk of memory such as MemoryChunk, ResizableMemoryChunk, and ImmutableMemoryChunk.

  • What needs to be done here to support better allocation practices? There's a lot of interest in providing users with the ability specify how things are allocated (allocating to the stack, bump allocators, smart pointers). I've assumed that most of that would be better addressed in future PRs by those who really understand how to optimize that sort of thing, but I'd also like to ensure that the implementation here is not prohibittive to future developments.

  • Could we provide better support for Bool here so that we don't need an explicit BitBuffer type?

  • More support for future implementation of DynamicVector with user friendly resizing methods

  • The assumed effects for methods should change based on the buffer variant. This is currently buggy. For example, the effects for length(::Buffer) should not be the same as length(::DynamicBuffer), but currently are.

  • Would it make sense to implement ImmutableBuffer here without any of the thaw/freeze stuff that would complicate things?

This is almost entirely copying SimpleVector implementations. When
calling `Core.sbuf(...)` it still throws a segmentation fault 11. Right
now my only guess is that I inadvertently have miscounted the amount of
builtin tags that I inserted and that it's trying to access memory that
isn't there. I'm still figuring out that entire bit of code though, so
it's easier said than done.
@quinnj
Copy link
Member

quinnj commented Feb 20, 2023

Ha, @oscardssmith and I were just chatting about this about a month ago and we revived a branch I had started a couple years ago doing something similar: https://github.com/JuliaLang/julia/tree/add-buffer-type2. It takes a slightly different approach in basically copying jl_array_t (allowing any eltype) but having it be fixed size. We got it mostly working, but there were still a few GC-related pieces to fix.

(not saying this PR isn't good/right; just wanted to raise the fact that a parallel effort was restarted recently)

@oscardssmith
Copy link
Member

@Tokazama I think a mix between our approaches is probably best. Specifically, I like that our version let you have a non UInt8 eltype, but I like your data layout for the struct a lot better.

@Tokazama
Copy link
Contributor Author

I think allowing the eltype to be included is a good idea if we can keep the structure itself the same. This does complicate things quite a bit (particularly with the garbage collector) because you have to figure out size and alignment manually a lot more without fields that explicitly reference that. I wasn't confident enough in everything else I was doing to spend more time implementing that too.

@oscardssmith
Copy link
Member

I think all you need to do is make a type parameter for the struct that stores the type, and then you can just get the size information from the type parameter.

@Tokazama
Copy link
Contributor Author

If someone figured out the shared pointer thing so we could have a type that is truly transparent then it should be fairly simple. We'd basically have three pipelines for allocation:

  1. SimplBuffer{BitType}
  2. SimpleBuffer{UnionBitType}
  3. SimpleBuffer{SharedPtr{BigType}}

Then we could have a ThinArray type that wraps it, hiding some of the ugliness from users without losing the transparency for other use cases.

I think all you need to do is make a type parameter for the struct that stores the type, and then you can just get the size information from the type parameter.

That would definitely work, but since I was still getting errors with a very basic implementation I wasn't sure it was a good idea to add more complexity yet.

@Tokazama
Copy link
Contributor Author

While it's on my mind I'll change around a couple things to make the transition to different types more approachable.

@quinnj
Copy link
Member

quinnj commented Feb 20, 2023

I might not quite understand the details well enough, but I thought @vtjnash was against doing any kind of shared pointer exposure in a buffer type; jl_value_ts should only ever live in /src code and never exposed on the Julia side.

@Tokazama
Copy link
Contributor Author

Tokazama commented Feb 20, 2023

I might not quite understand the details well enough, but I thought @vtjnash was against doing any kind of shared pointer exposure in a buffer type; jl_value_ts should only ever live in /src code and never exposed on the Julia side.

I don't mean we expose the pointer in its current state. I mean we need a new dedicated type that can serve the same role but is more transparent about what it's doing. So we'd need to declare a new type in "julia.h" as

typedef struct {
    JL_DATA_TYPE
    void *ptr
} jl_shared_ptr_t;

There then we could have something that reference counts in the garbage collector to ensure it's properly rooted.

EDIT: part of the strategy in #47735 is what I'm trying to get at here

@giordano giordano added the arrays [a, r, r, a, y, s] label Feb 20, 2023
@Tokazama
Copy link
Contributor Author

latest has a field for the type and the element size. I think I fixed all the code that is dependent on how this changes the amount of memory consumed but I could be wrong. I haven't had errors in my naive tests in repl but once we have settled on some other design decisions we clearly need tests here.

One of the current assumptions here is that SimpleBuffer always owns its data. So there's no pointer to another one containing the actual data. I prefer that this sort of transparency exists at this level, but I figured it needed to be explicitly pointed out given how this is different than Array.

@oscardssmith
Copy link
Member

Is it really a good idea to add the immutable buffer in this PR? To me that seems like a lot of complexity that's mostly orthogonal.

@Tokazama
Copy link
Contributor Author

Tokazama commented Feb 21, 2023

For me it was easier to implement them together because the compiler and builtin functions have to be different to accommodate the typename switch built on the same jl_buffer_t. If it's too much I can just remove it though.

Also, I'm not worrying about the behemoth of code for optimizations that replace creating a new immutable buffer with mutating the typename of a mutable buffer instead. In terms of compiler work I've only done the basic escape analysis for builtin functions so that derivatives of jl_buffer_t work.

@Tokazama Tokazama changed the title SimpleBuffer Draft for array backend Buffer types for array backend Feb 21, 2023
@Tokazama
Copy link
Contributor Author

One thing I was thinking about with this PR is that we might want to make it have a ccall barrier, and move most of the logic either to an LLVM intrinsic or julia itself. Doing that would allow for further constant propagation and might make it easier for LLVM to do things like turning a heap allocation into a stack one.

@gbaraldi, I think you're more familiar with the internal mechanics behind allocation and GC here. Do you have any suggestions concerning what entry points into the GC I should be using to track newly assigned pointers through a ccall on the Julia side and facilitating stack allocations?

Tokazama added 13 commits April 15, 2023 15:53
Previously boxed values weren't being tracked by the GC when set in the
native Julia code the set values and we didn't even bother worrying
about immutables that contained pointer fields at all. This should
appropriately track those values and ensures that we use a pointer safe
variant of `memmove` for pointer fields
Use "kinds" instead of multiple fields to describe more of the layout.
This could potentially be used to provide a clear way of switching up
storage in the future without being disruptive to existing code, since a
lot of this could change in the future anyway.
`Core.allocate_buffer` is intended to allocate buffer types, only
requiring direct interaction with C when the GC collector needs to
be notified. This isn't yet the default way to allocate because I'm
still having trouble getting the GC to properly mark, track, and
preserve stuff despite using exported C calls like `jl_gc_alloc`
Technically this executes but there are still some issues with rooting
Previous commit got resizing methods working in almost Pure Julia, but
we can't allocate anything on the Julia side reliably. Unfortunately,
this means any growth or deletion method must be predominantly done in C
so that data at provided indices can actually be shifted around before
freeing old data.
These don't need specific code generation anymore since this is all done
using Julia code now
@Tokazama Tokazama marked this pull request as ready for review April 27, 2023 01:18
@Tokazama
Copy link
Contributor Author

Despite some rough edges, I think this is ready for review. I'm still getting some odd errors when resizing DynamicBuffer, but besides that all else seems to work fine.

@mkitti
Copy link
Contributor

mkitti commented Apr 27, 2023

Regarding garbage collection, is anyone here talking to the folks (e.g. @NHDaly et al) working on memory management toolkit.

https://github.com/mmtk/mmtk-julia or https://github.com/mmtk/julia

@JeffBezanson
Copy link
Member

This is a large and important enough change that I think we need a detailed design document with everything thought through, and with wide buy-in before a lot of code is written. Every detail here can have wide-reaching implications. Some tricky points are how to accommodate non-moving resizing (which becomes very important for large data), and the existing Array feature of wrapping foreign pointers. Do we really want both growable and non-growable buffers? We need everybody to be comfortable with the trade-offs chosen here, and compare some alternate designs. @vtjnash, @oscardssmith and I started working on a proposal, but this will take some time.

@NHDaly
Copy link
Member

NHDaly commented May 18, 2023

Thanks for the ping, @mkitti. CC: @diogo and @kpamnany

@jpsamaroo
Copy link
Member

👍 to growable and non-growable buffers (especially in the context of wrapping foreign pointers, where the underlying memory may not support it)

@oscardssmith
Copy link
Member

@jpsamaroo to be clear, when we say "growable" we don't mean that it is guaranteed to be growable, but just that you can make a realloc-like call that might be able to grow the size of the buffer (if it happens to be feasible). This is unfortunately necessary to support efficient resizing of large arrays (where the OS can just give you more pages of memory), but since it is only sometimes possible, it wouldn't be a problem if some buffers don't successfully resize.

Copy link
Contributor

@mkitti mkitti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I highlighted the remaining fixme and Todo comments in the code

@@ -724,6 +724,7 @@ function perform_lifting!(compact::IncrementalCompact,
return stmt_val # N.B. should never happen
end

# FIXME: Buffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixme

@@ -1017,6 +1017,7 @@ end

rewrite_invoke_exprargs!(expr::Expr) = (expr.args = invoke_rewrite(expr.args); expr)

# DECISION: Buffer?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decision

Comment on lines +1780 to +1782
# TODO Buffer: escape analysis for jl_buffer_copy
# is_buffer_copy(name::Symbol) = name === :jl_buffer_copy
# # FIXME this implementation is very conservative, improve the accuracy and solve broken test cases
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo and fixme

@@ -442,7 +447,8 @@ static size_t dereferenceable_size(jl_value_t *jt)
// Return the min required / expected alignment of jltype (on the stack or heap)
static unsigned julia_alignment(jl_value_t *jt)
{
if (jl_is_array_type(jt)) {
// TODO BUffer: is it safe to assume will also always have this alignment?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo

@@ -694,6 +694,10 @@ function statement_cost(ex::Expr, line::Int, src::Union{CodeInfo, IRCode}, sptyp
elseif (f === Core.arrayref || f === Core.const_arrayref || f === Core.arrayset) && length(ex.args) >= 3
atyp = argextype(ex.args[3], src, sptypes)
return isknowntype(atyp) ? 4 : error_path ? params.inline_error_path_cost : params.inline_nonleaf_penalty
# TODO optimize get_buffer_index && set_buffer_index!?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo

@Tokazama
Copy link
Contributor Author

This is a large and important enough change that I think we need a detailed design document with everything thought through, and with wide buy-in before a lot of code is written. Every detail here can have wide-reaching implications. Some tricky points are how to accommodate non-moving resizing (which becomes very important for large data), and the existing Array feature of wrapping foreign pointers. Do we really want both growable and non-growable buffers? We need everybody to be comfortable with the trade-offs chosen here, and compare some alternate designs. @vtjnash, @oscardssmith and I started working on a proposal, but this will take some time.

Is this proposal in progress located anywhere or is the goal to have something more complete put together before getting too many people involved?

@JeffBezanson
Copy link
Member

Draft proposal in progress here: https://gist.github.com/JeffBezanson/a25dde3bebb5a734af87bb5ddcf31fb0

@Tokazama
Copy link
Contributor Author

Tokazama commented Jun 6, 2023

I'm going to close this PR since Jeff's proposal already has some clear advantages over this approach. I'd be happy to help further work in this area if it would be helpful going forward. On the other hand, I understand too much feedback at this stage may be counterproductive so I'll stand by until further involvement is solicited.

@Tokazama Tokazama closed this Jun 6, 2023
@brenhinkeller brenhinkeller removed the triage This should be discussed on a triage call label Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s]
Projects
None yet
Development

Successfully merging this pull request may close these issues.