Custom allocator context #4

fonghou · 2023-12-28T17:30:56Z

Hi, thanks for sharing an awesome library. love the efficient design and clean api.

Regarding the topic of custom allocator context raised by @skeeto here https://old.reddit.com/r/C_Programming/comments/18gnxkd/verstable_a_versatile_highperformance_generic/kd25s9z/

Would you think the approach https://github.com/stclib/STC?tab=readme-ov-file#per-container-instance-customization fit Verstable well? It seems "zero cost" at runtime if not opted in, also less intrusive than what @skeeto's approach.

Regards!

JacksonAllan · 2023-12-28T21:46:35Z

Hi! Glad you’re enjoying the library :)

I haven’t had time to think much about the custom allocator issue yet, although I did bookmark @skeeto’s recent article on the topic.

I did, however, realize that a context pointer parameter for the allocation function would presumably require the hash-table struct to internally carry around context pointer that the user sets when her or she initializes it. Is that right? If so, that would be an unacceptable cost to impose on users who do not need this functionality, and this seems to be the problem that STC (@stclib) attempts to solve.

One solution, I think, would be to use the preprocessor to have the struct only include such a pointer if the user actually needs it (just as the bucket struct only includes a val field if the user defines VAL_TY). In that case, the API would look something like this:

#define MALLOC_FN <name of a user-provided function with the same signature as malloc>
// OR
#define EXTENDED_MALLOC_FN <name of a user-provided function that includes a context pointer parameter>

What do you think of that solution? It seems simpler than STC's approach.

In any case, it will be a while before I can introduce this functionality. The priorities are the moment are to finish refactoring and share my hash-table benchmarking project, then integrate Verstable into CC (as a separate implementation, not a replacement for the standalone library), and then publish an article on the benchmarks that also includes most popular C hash-table libraries. So it will probably be a few months before I get to the custom allocators, unfortunately.

Edit: Maybe it makes more sense to prioritize addressing allocator question, since that will take less time than those other things I mentioned.

yrashk · 2024-01-05T17:18:54Z

I think this is similar to what I propose in #5

JacksonAllan · 2024-01-06T23:15:06Z

Hi @yrashk and @fonghou. I’m going to make resolving the customer allocator issue my top priority. A few thoughts and questions before I implement anything:

The idea I currently have, as I mentioned earlier, is to provide the user with two options for plugging in a custom allocator:
```
// Option 1
#define MALLOC_FN <name of a function with the signature void *( size_t size )>
#define FREE_FN <name of a function with the signature void ( void *ptr )>

// Option 2
#define EXTENDED_MALLOC_FN <name of a function with the signature void *( size_t size, void *ctx )>
#define EXTENDED_FREE_FN <name of a function with the signature void ( void *ptr, size_t size, void *ctx )>
```
Option 1 is primarily for users who just want to plug in an allocator that quits on out-of-memory.

Only in the case of option 2 would the hash table struct actually contain a context pointer (set by the user via _init).

Would those two options be sufficient? Or is there actually a use case in which the user would need their free function to receive the size of the memory to free but not need a context pointer?
@yrashk mentioned:

An alternative worth exploring is using realloc instead of malloc so that at least the previous allocation is provided to the allocator, so at least some context can be shoehorned into it.

Verstabe couldn’t use a true realloc-like function because while resizing, both the old and new tables must, of course, exist in parallel. Nevertheless, having the allocation function see the existing allocation, would—in theory—allow it to transfer context information stored in the allocation (in memory before the address that the allocation function returns to Verstable) to the new allocation. In practice, I don’t think this approach could work because Verstable lazy-initializes (i.e. it only allocates upon the first _insert, not upon _init).
@yrashk’s pull request has Verstable pass an enum into the allocator function indicating whether the allocation is for the buckets array or metadata array (but curiously no context pointer?). Is there a good use case for distinguishing between these two allocation purposes (i.e. buckets vs metadata)? Another, possibly better approach is to have the two arrays live in the same allocation, as is done by Swiss/Abseil, for example. This would be slightly more memory efficient (particularly for hash tables containing only a few keys) because each separate allocation—at least by malloc—gets prefixed with malloc metadata and postfixed with padding.

I'll tag @skeeto again, too, just in case he has some sage advice to offer.

fonghou · 2024-01-07T22:12:46Z

Hi @JacksonAllan, I'd prefer Option 2 because it integrates better with Arena use cases per @skeeto example. Thanks!

N-R-K · 2024-01-12T18:24:45Z

My 2 cent on this - as someone who uses custom allocators a lot - is that a user defined context pointer is a must. One key benefit of custom allocators is the ability to avoid global state. But without a context pointer, it becomes necessary to smuggle the allocator metadata through global state - which is very much undesirable.

In that regard, I don't find #5 to be useful. In fact, from a library design standpoint, I find it to be harmful because it leaks unnecessary internal information out through the interface. For example, if you want to merge the two allocations into one in the future then all of a sudden it will require an API breakage.

If so, that would be an unacceptable cost to impose on users who do not need this functionality

A context pointer and a function pointer is merely 8+8=16 bytes on 64 bit platform (also see the note below). And this cost will occur per hashtable instance, not per node/item in the hashtable. I'm willing to bet that the items that will be inserted into the table will weight multiple orders of magnitude more than this pesky 16 byte.

If you can come up with a way to avoid it, then that's fine. But I can't imagine this being a deal breaker for anyone.

Note: If you want, you can avoid taking in the function pointer. A single context pointer is enough, the user can embed the function pointer into the context pointer and dispatch from there. Not super convenient, but not a deal breaker in my eyes.

my_alloc(..., void *ctx)
{
   MyStruct *a = ctx;
   return a->alloc(...); // allocator function pointer embedded in the context pointer
}

TL;DR: Proper support for custom allocators requires each hashtable to carry a context pointer at least.

JacksonAllan · 2024-01-12T19:26:26Z

@N-R-K

Thanks for weighing in!

A context pointer and a function pointer is merely 8+8=16 bytes on 64 bit platform (also see the note below).

It could make a difference for a user that has many tables that are small or empty. But in any case, the approach I outlined in my earlier comment addresses this issue by making the context pointer an opt-in mechanism.

My plan is still the one I outlined in that comment. To use a custom allocator with a context pointer, users will need to define EXTENDED_MALLOC_FN and EXTENDED_FREE_FN when instantiating the hash table template and then provide the context pointer when calling _init. Users who really need the same hash table type to use different allocator functions (i.e. they can't just instantiate the template once for every allocator) can follow the approach you outlined (i.e. embed the function pointer in the context).

The buckets and metadata arrays will indeed be combined into one allocation.

Whether or not FREE_FN (i.e. a user-defined free function without a context pointer parameter) should still take a size parameter is an open question. I'm inclined to say yes as someone might find it useful (?) and the compiler will optimize away the parameter if goes unused. Technically, this would be a breaking API change, but maybe worrying about minor changes is at this time a little premature.

fonghou · 2024-01-12T20:17:19Z

@JacksonAllan, one thing not clear to me with
#dedfine EXTENDED_MALLOC_FN/EXTENDED_FREE_FN option is that when the context pointer field is declared in hashtable struct itself. Will it be opted in #ifdef at compile time or void* field already declared (NULL with default malloc/free alloctor). If it's the later, carrying around an extra (void*) NULL field seems not "zero cost".

I personally like the STC's approach as mentioned previously. The key bits are in the source linked and copied below.

https://github.com/stclib/STC/blob/master/include/stc/extend.h#L54

typedef struct {
    i_extend
    i_type get;
} c_JOIN(i_type, _ext);

#define c_extend() c_container_of(self, _c_MEMB(_ext), get)

Just to demonstrate how that design works, I wrote a custom allocator based on an arena implementation (another header file "mem/arena.h" in my forked STC repo) described in @skeeto's blogs.

One difference you may notice, STC defines those custom allocator macros as function call expressions, instead of function names (ie. pointers). One advantage of it is that both struct field and function arg are static typed instead of void*.

https://github.com/fonghou/STC/blob/master/include/stc/arena.h

Below is an example how it looks like in client code.

https://github.com/fonghou/STC/blob/master/misc/examples/mixed/arena.c

JacksonAllan · 2024-01-19T03:00:31Z

@fonghou

Sorry for the belated response—I've been super busy.

Will it be opted in #ifdef at compile time

Yes, exactly. With this approach, the hash table struct would look like this:

typedef struct
{
  size_t key_count;
  size_t bucket_count;

  uint16_t *metadata;
  VT_CAT( NAME, _bucket ) *buckets;

  #ifdef EXTENDED_MALLOC_FN
  void *allocator_context;
  #endif
} NAME;

As for the STC approach, initially I had some reservations. But the more I think about it, the more I like it.

Advantages:

The context pointer can be correctly typed, rather than void *.
It's more flexible because the user can add any variables they like to the wrapper struct.
It's neat and easy to implement: All it fundamentally requires is that the allocator function takes a pointer to the hash table struct, and the user can handle the rest. With this approach, there's no need for Verstable to provide separate #define MALLOC_FN and #define EXTENDED_MALLOC_FN options.

Disadvantages:

The hash table now has to be accessed via a member of the wrapper struct, which is a little awkward.
The container_of idiom is generally recognized not to be strictly conforming C due to ambiguities in the Standard about pointer provenance and arithmetic. However, I think the user can dodge this "issue" by making the hash table struct the initial member of the wrapper struct. Under 6.7.2.1 15, which states that "A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa" (emphasis added), deriving a pointer to the wrapper struct from its initial member should always be allowed. (For this reason, perhaps @tylov should consider switching the order of i_extend and i_type get; in STC?)

tylov · 2024-01-19T23:07:05Z

@JacksonAllan Hi, I prefer strict conforming C, but I personally don't worry much about this "possible" non-conformance of container_of(). I use it also in the linked list implementation. However, I still think it is a good idea to have the "contained object" as the initial member, so casting can be used instead (actually container_of() will collapse into a regular cast). I'll likely do it with both the list node and the extend wrapped struct.

tylov · 2024-01-20T17:19:11Z

@JacksonAllan I have committed an update to STC in the v50dev branch, and have removed the dependency on container_of in the library, and added a clone function for the extended container struct.
That said, I still think "container_of" is an important feature that can be used. The linux kernel uses it quite extensively (I belive), e.g. via Thorvalds intrusive linked list. The linux kernel is arguably the "Mother of all C programs", so I think it is highly unlikely that any compiler vendor will ever dare to make a compiler which renders that code broken.

fonghou · 2024-01-20T22:46:37Z

Hi @tylov, I've tested out this change.

so casting can be used instead (actually container_of() will collapse into a regular cast)

I find it makes below code work as expected (and api more ergonomic).

  hmap_str_ext table = {.arena = &arena};
  hmap_str_emplace(&table, "Map", "test");

However, both gcc and clang give incompatible pointer type warning like below. Do you recommend this usage? or stick with old way of table.get?

 warning: passing argument 1 of ‘hmap_str_emplace’ from incompatible pointer type [-Wincompatible-pointer-types]
  188 |   hmap_str_emplace(&table, "Map", "test");
      |                    ^~~~~~
      |                    |
      |                    hmap_str_ext *
In

tylov · 2024-01-20T23:20:52Z

I would go with the type-correct version, i.e. the .get doesn't bother me much. You can always make an extra pointer directly to the container if you use it a lot.

JacksonAllan · 2024-01-20T23:50:48Z

@tylov

That said, I still think "container_of" is an important feature that can be used. The linux kernel uses it quite extensively (I belive), e.g. via Thorvalds intrusive linked list. The linux kernel is arguably the "Mother of all C programs", so I think it is highly unlikely that any compiler vendor will ever dare to make a compiler which renders that code broken.

Right, the arguments in favor of container_of seem to be 1) the contemporary and historical prevalence of this idiom and 2) the ambiguities in 6.3.2.3 7, which governs char pointer arithmetic within an object (e.g. the exact meaning of "object" in this context and the fact that the text only technically only allows "successive increments" of the char pointer, not general addition to or subtraction from it). To these arguments I might add that at a certain point, we have to let pointers be pointers or we'll all lose our minds. But the unclear status of container_of seems to be well recognized, so if we can transparently avoid it simply be changing the order of our struct members, I say that we might as well do so.

JacksonAllan · 2024-01-27T00:50:10Z

@fonghou
@yrashk
@N-R-K

I've now implemented the extended custom allocator support in the newly created dev branch.

Initially, I took the STC wrapper-struct approach. However, that experiment highlighted a few drawbacks of this approach that caused me to abandon it:

The hash table struct and the wrapper struct are very tightly coupled. When the hash table is defined with allocation and free functions that assume that it is the first member of a wrapper struct, it can only ever be used as part of that wrapper struct. To me, this raises the question, why should there be two separate structs?
A separate wrapper struct, allocation function, and free function must be defined for every hash table type. This is rather cumbersome for anyone wanting to use essentially the same allocator with multiple table types, which seems like a common scenario.

Hence, the approach I ultimately followed is a combination of my original idea (in that a member is conditionally added to the hash table struct) and the STC approach (in that the user fully controls the type of that member).

It works as follows:

To enable and specify the type of the hash table struct’s ctx member, define CTX_TY before including verstable.h.

If CTX_TY has been defined, the custom allocation and free functions supplied via MALLOC_FN and FREE_FN should each take an extra argument, namely a pointer to the table's ctx member: void *( size_t size, CTX_TY *ctx ) for MALLOC_FN and void ( void *ptr, size_t size, CTX_TY *ctx ) and for FREE_FN.

If CTX_TY has been defined, init and init_clone also each take an extra argument that sets the ctx member on the newly initialized table.

Here's how this system might look in practice:

#include <stddef.h>

// Example context data as a struct.
typedef struct
{
  // Context data...
} context;

void *malloc_with_ctx( size_t size, context *ctx )
{
  // Allocate size bytes, taking into account ctx, and return a pointer to the allocated memory or NULL in the case of
  // failure...
}

void free_with_ctx( void *ptr, size_t size, context *ctx )
{
  // Free size bytes, taking into account ctx...
}

#define NAME int_int_map
#define KEY_TY int
#define VAL_TY int
#define CTX_TY context
#define MALLOC_FN malloc_with_ctx
#define FREE_FN free_with_ctx
#include "verstable.h"

int main( void )
{
  context our_ctx;

  // Initialize our_ctx...

  int_int_map our_map;
  vt_init( &our_map, our_ctx );

  // Do things with our_map...

  vt_cleanup( &our_map );
}

One thing I wasn't sure about was whether init_clone should accept a context argument or simply copy the ctx member of the source table. For now, I've implemented the former.

Edit: Fixed above example.

JacksonAllan · 2024-02-07T00:21:23Z

@fonghou
@yrashk
@N-R-K

The above changes are now part of the main branch (v2.0.0). I recommend updating to this new version because, besides introducing optimizations, it also fixes a significant bug whereby a key was hashed after its user-supplied destructor was called during erasure.

I'll leave this issue open for now in case there's any feedback.

tylov · 2024-02-07T13:44:02Z

Just letting you know, in STC branch v50dev, there is a include/hmap-robin.h. Would be great if you could include it in your benchmarks if you have time at some point. It appears to be quite fast (I think insert is slightly slower, but erase and lookup are faster and probably more consistent). I will likely replace it with hmap.h if it works well.

JacksonAllan · 2024-02-07T14:03:21Z

@tylov

Will do ASAP. I'll probably post the results in the discussion in order to keep talk of design and benchmarks centralized there.

JacksonAllan closed this as completed May 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom allocator context #4

Custom allocator context #4

fonghou commented Dec 28, 2023

JacksonAllan commented Dec 28, 2023 •

edited

Loading

yrashk commented Jan 5, 2024

JacksonAllan commented Jan 6, 2024 •

edited

Loading

fonghou commented Jan 7, 2024

N-R-K commented Jan 12, 2024

JacksonAllan commented Jan 12, 2024 •

edited

Loading

fonghou commented Jan 12, 2024

JacksonAllan commented Jan 19, 2024 •

edited

Loading

tylov commented Jan 19, 2024

tylov commented Jan 20, 2024

fonghou commented Jan 20, 2024

tylov commented Jan 20, 2024

JacksonAllan commented Jan 20, 2024 •

edited

Loading

JacksonAllan commented Jan 27, 2024 •

edited

Loading

JacksonAllan commented Feb 7, 2024

tylov commented Feb 7, 2024

JacksonAllan commented Feb 7, 2024

Custom allocator context #4

Custom allocator context #4

Comments

fonghou commented Dec 28, 2023

JacksonAllan commented Dec 28, 2023 • edited Loading

yrashk commented Jan 5, 2024

JacksonAllan commented Jan 6, 2024 • edited Loading

fonghou commented Jan 7, 2024

N-R-K commented Jan 12, 2024

JacksonAllan commented Jan 12, 2024 • edited Loading

fonghou commented Jan 12, 2024

JacksonAllan commented Jan 19, 2024 • edited Loading

tylov commented Jan 19, 2024

tylov commented Jan 20, 2024

fonghou commented Jan 20, 2024

tylov commented Jan 20, 2024

JacksonAllan commented Jan 20, 2024 • edited Loading

JacksonAllan commented Jan 27, 2024 • edited Loading

JacksonAllan commented Feb 7, 2024

tylov commented Feb 7, 2024

JacksonAllan commented Feb 7, 2024

JacksonAllan commented Dec 28, 2023 •

edited

Loading

JacksonAllan commented Jan 6, 2024 •

edited

Loading

JacksonAllan commented Jan 12, 2024 •

edited

Loading

JacksonAllan commented Jan 19, 2024 •

edited

Loading

JacksonAllan commented Jan 20, 2024 •

edited

Loading

JacksonAllan commented Jan 27, 2024 •

edited

Loading