Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a hashtable to lookup method roots #52073

Draft
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

BioTurboNick
Copy link
Contributor

@BioTurboNick BioTurboNick commented Nov 8, 2023

This PR implements a hash table for looking up method roots. Compression of methods with many roots should be sped up, removing a barrier to adding additional roots (such as to address #41099, #50082).

Currently, each time a root is added, all previous roots have to be checked, leading to O(n^2) runtime. The hash table adds minimal memory overhead for each method (as a hidden field) to convert this step to O(n). The table is implemented with the existing htable_t type, using jl_object_id as the hash function and jl_egal as the equality function. Implemented with an IdDict as a field of the temporary jl_ircode_state struct generated during compression.

Some notes:

  • Sysimage (sys.so) size before: 129.7 MB; after: 130.9 MB
  • Compilation of Base + Stdlibs ~10% faster (n = 2, YMMV)
  • Should look into smallintset/IdSet, after planned updates

@oscardssmith
Copy link
Member

Is there a reason this isn't just using an IDDict

@BioTurboNick
Copy link
Contributor Author

Didn't know I could in C.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Nov 8, 2023

I suppose I can draw from how jl_global_roots_table operates -- though it's not an IdDict proper, just a jl_genericmemory_t.

@oscardssmith
Copy link
Member

so on the C side, IDDicts are just jl_genericmemory_t and the iddict functions.

@BioTurboNick
Copy link
Contributor Author

I tried to move to the iddict, and it seemed to be working, but then when I cleaned and rebuilt I started getting segfaults. Probably some subtle aspect of the API I overlooked?

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Nov 9, 2023

The GC checker tells me I am "Passing non-rooted value as argument to function that may GC" but I'm not sure what I should be doing there.

src/ircode.c Outdated Show resolved Hide resolved
src/julia.h Outdated Show resolved Hide resolved
src/ircode.c Outdated Show resolved Hide resolved
src/method.c Outdated Show resolved Hide resolved
@BioTurboNick
Copy link
Contributor Author

Hmm. Using iddict and making it part of the type, the sysimage expands 15 MB.

Is it possible to avoid serializing this entry and regenerate it on load? I presume I was accidentally doing that previously, with the table not included as part of the Julia type.

Also, unlike the C htable_t variant, it seems I can't fool the iddict into storing an int inline, which is why I'm boxing them. Is there an alternative? Like, a set of singleton boxed ints that I can refer to? Or I could make them myself.

@vtjnash
Copy link
Member

vtjnash commented Nov 10, 2023

I suspect not storing this will be the actual answer (regenerating it at the start of compressing IR, and discarding at the end), though it is still questionable that we should be having any extra roots being created, as those can interact very badly with pkgimage too.

However, I also happened to be simultaneously working on an update to the smallintset code (which is a general wrapper that allows adding a compact index to an exist array) to make it a general-purpose IdSet implementation, with the intent to save on space. I hope to have that in a PR form very soon.

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Nov 11, 2023

though it is still questionable that we should be having any extra roots being created, as those can interact very badly with pkgimage too.

Fair enough, not married to that idea.

Moved to generating the iddict at the top of jl_compress_ir and storing in jl_ircode_state... and the GC analyzer looks happy with me for once. Sysimage size back down to 130.9 MB

@BioTurboNick
Copy link
Contributor Author

I just noticed that there are some arrays storing reusable boxed integers in datatype.c, though the implementation is eluding me. I see jl_box_int8() explicitly pulling from its cache, but I don't see any implementations for higher-bit int types. E.g. boxed_int64_cache doesn't seem to be accessed anywhere else in /src?

Was curious to find out if the boxing operation in here is less costly than I assumed.

@vtjnash
Copy link
Member

vtjnash commented Nov 13, 2023

This is looking quite reasonable. It is pretty cheap to box them, but this exact use is also what smallintset is optimized at doing. If you take a looking at the new idset.c file in https://github.com/JuliaLang/julia/pull/52114/files#diff-729a5f792d3a56d7d66a236816bf811c87325e47ccd28d5620872c88aa82db4e, that might give you some basic idea of how to use its API. (once that PR is merged, you can just call jl_idset_rehash with this array for the keys and an empty array for the idxs, and it will populate idxs with something that can be used for the rest of the APIs such as jl_idset_get)

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Nov 26, 2023

@vtjnash I'm trying the idset out, and maybe I'm using it wrong, but values 254 and over aren't being stored or retrieved properly? If I try to retrieve an element with values in that range immediately after I store it, _peek_bp returns -1. (I haven't looked at what you're suggesting w/_rehash yet)

@vtjnash
Copy link
Member

vtjnash commented Nov 26, 2023

Seems to work for me?

julia> x = Base.IdSet()
Base.IdSet{Any}(Any[], UInt8[], 0, 0)

julia> using Test; @testset begin
       @testset for i in 1:1000
       @test !in(i, x)
       push!(x, i)
       @test in(i, x)
       end
       end;
Test Summary: | Pass  Total  Time
test set      | 2000   2000  0.0s

Anyways, typical usage for your case would be:

assert(m->roots->ref.mem->ptr == m->roots->ref.ptr_or_offset); // expected offset == 0
s.roots_ids = jl_idset_put_idx(m->roots->ref.mem, jl_an_empty_memory_any, -jl_array_nrows(m->roots));

@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Nov 26, 2023

Hmm, I don't know...

static void literal_val_id(rle_reference *rr, jl_ircode_state *s, jl_value_t *v) JL_GC_DISABLED
{
    jl_array_t *rs = s->method->roots;
    jl_genericmemory_t *rt = s->roots_ids;
    jl_genericmemory_t *rt1 = s->roots_ids1; // test field
    int i;

    jl_value_t *ival = jl_eqtable_get(rt, v, NULL);
    if (!ival) {
        i = jl_array_nrows(rs);
        jl_add_method_root(s->method, jl_precompile_toplevel_module, v);
        s->roots_ids = rt = jl_eqtable_put(rt, v, jl_box_long(i), NULL);
        assert(rs->ref.mem->ptr == rs->ref.ptr_or_offset);
        s->roots_ids1 = rt1 = jl_idset_put_idx(rs->ref.mem, rt1, -i);
        jl_value_t *iival = jl_idset_get(rs->ref.mem, rt1, v);
        if (!iival)
            jl_safe_printf("root was not found in idset despite just being inserted: i: %ld, function: %s.\n", i, jl_symbol_name(s->method->name));
    }
    else
        i = jl_unbox_long(ival);

    return tagged_root(rr, s, i);
}

If I pass i, I start seeing the printed line once the number of roots for the method reaches 254. If I pass -i, it prints on every iteration.

@vtjnash
Copy link
Member

vtjnash commented Nov 27, 2023

You need -i - 1, since that is a length, not an index, but then you won't have the behavior of a Set since it rehashes on every insertion

@BioTurboNick
Copy link
Contributor Author

You need -i - 1, since that is a length, not an index, but then you won't have the behavior of a Set since it rehashes on every insertion

i is the length of the array prior to the pushing the new root at the end of the array, so it is the index of the added root.

Sorry for the confusion, I was just going with the negative because you included it above.

So breaking down the conditions further, it isn't all values that fail... 2 or 3 insert/retrieve fine, then one will fail to be found. And it appears that the roots that fail are always DataTypes and specifically GenericMemory. But only when the value is 254 or larger; less than that and it appears GenericMemory is inserted/retrieved just fine.

root was not found in idset despite just being inserted: i: 276, function: Array, root type: DataType, value: GenericMemory.
root was found with i: 277, function: Array, root type: DataType, value: Ref.
root was found with i: 278, function: Array, root type: DataType, value: GenericMemoryRef.
root was found with i: 279, function: Array, root type: DataType, value: SpinLock.
task.jl
root was not found in idset despite just being inserted: i: 280, function: Array, root type: DataType, value: GenericMemory.
root was found with i: 281, function: Array, root type: DataType, value: Ref.
root was found with i: 282, function: Array, root type: DataType, value: GenericMemoryRef.
root was found with i: 283, function: Array, root type: DataType, value: IntrusiveLinkedListSynchronized.

@vtjnash
Copy link
Member

vtjnash commented Nov 27, 2023

I can't tell what is incorrect about it without a link to the code

@BioTurboNick
Copy link
Contributor Author

Thanks, but no worries I got it working. Not entirely sure what was wrong, but it got there. I love C 😶. I'll benchmark later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants