This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Fix issue 2504 - AA.reserve #1929

Closed · wants to merge 1 commit

Conversation

Opened by @Darredevil (Contributor)

@dlang-bot (Contributor) commented:

Thanks for your pull request, @Darredevil! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

  • smaller, focused PRs are easier to review than big ones

  • try not to mix up refactoring or style changes with bug fixes or feature enhancements

  • provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Auto-close: Bugzilla 2504, "Reserve for associative arrays"

@dlang-bot added the Enhancement (New functionality) label on Oct 5, 2017
@Darredevil force-pushed the issue-2504-aa-reserve branch from bbe08b3 to a276e47 on October 5, 2017 14:28
@@ -1974,6 +1975,11 @@ void clear(T : Value[Key], Value, Key)(T* aa)
    _aaClear(*cast(void **) aa);
}

void reserve(T : Value[Key], Value, Key)(ref T aa, size_t ndim)
{
    _aaReserve(cast(void **)&aa, typeid(Value[Key]), ndim);
Member:

You can use typeid(T) here since T == Value[Key].

@Darredevil (author):

True, however I used Value[Key] to be consistent with the rest of the file.

Member:

There are a few kinks we could do better in this area, so @Darredevil let's go with the cleaner way here and we'll update the rest in good time.

@@ -1974,6 +1975,11 @@ void clear(T : Value[Key], Value, Key)(T* aa)
    _aaClear(*cast(void **) aa);
}

void reserve(T : Value[Key], Value, Key)(ref T aa, size_t ndim)
Contributor:

Could you please add an overload for T* as well?

@Darredevil (author):

Done

@wilzbach (Member):

Ouch, do we really need this T* for new overloads? After all, aren't we trying to move away from the ugly C-like pointer syntax?
Also, while you are at it, note that most AA methods still aren't @safe :/

Member:

@wilzbach yes, it needs to be there. Otherwise it operates inconsistently with other normal objects that can call "members" based on a pointer.

This is a special case because AAs are builtins. If they were fully library types, this wouldn't be needed.
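
For illustration (not part of the PR), UFCS is what makes the pointer-based call style work, mirroring the clear overload in the diff above; reserve here is the function this PR adds:

void main()
{
    int[int] aa;
    auto p = &aa;
    p.clear();         // works today: clear has a T* overload (see the diff above)
    p.reserve(1024);   // pointer-based call; this is what the T* overload enables
    aa.reserve(1024);  // the ref overload covers the direct case
}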

Member:

sux but ok

@Darredevil force-pushed the issue-2504-aa-reserve branch from a276e47 to 7cceafc on October 5, 2017 14:57
@schveiguy (Member) left a comment:

Needs unit tests and documentation. Will do complete review when those are there. Looks good so far!

@wilzbach (Member) commented Oct 5, 2017:

Btw please don't forget to open a PR at dlang.org for the spec update -> https://dlang.org/spec/hash-map.html

@schveiguy (Member):

A design question on this -- reserve (as you have implemented it) is going to reserve space for the buckets, but not space for the actual elements. That is, as you assign data, allocations will still need to occur.

What is the expectation we want for reserve? When you use it on a normal array, the reservation is to pre-allocate space for the data.

You could potentially do both the buckets and the data items if you preallocate the data elements into some sort of array or block, but that can have an effect on many things, including when a data item is collected by the GC.

There are advantages to both ways, so it truly is a question of what we want, and not of what is obvious. Again, another reason to favor a configurable library type vs. a language type.

Make sure the documentation is crystal clear what is being reserved.

src/rt/aaA.d Outdated
/// Reserve AA
extern (C) void _aaReserve(AA* aa, const TypeInfo_AssociativeArray ti, size_t ndim)
{
    // lazily alloc implementation
Member:

The dimension has to be a power of 2, so anything else should be rejected or rounded up.

@Darredevil (author), Oct 6, 2017:

Good catch, this actually saved me some headache I had with a few tests. Rounding up by default now. Thanks!
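
For reference, a naive stand-in for druntime's internal nextpow2 (illustrative only, not the actual implementation):

size_t nextPow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

unittest
{
    assert(nextPow2(500) == 512);
    assert(nextPow2(512) == 512);
    assert(nextPow2(20_000_000) == 33_554_432);  // 2^25
}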

@rainers (Member):

If the resize call is supposed to avoid rehashing for the given number of entries, you need to include the "grow ratio", too. The AA rehashes if the number of entries exceeds 4/5 of the dimension of the bucket array: https://github.com/Darredevil/druntime/blob/eb9865e63b1205e36f28c52aeffbb08d02f96b68/src/rt/aaA.d#L386.

@schveiguy (Member):

@rainers that seems to explain the efficiency issue. Also, @Darredevil what happens if reserve is called with a smaller number than the current number of elements? Seems to me the correct code should be:

if (ndim <= aa.dim) return;
if (aa.used * GROW_DEN > ndim * GROW_NUM)
    ndim = (aa.used * GROW_DEN + GROW_NUM - 1) / GROW_NUM;
assert(aa.used * GROW_DEN <= ndim * GROW_NUM);
ndim = nextpow2(ndim);
if (aa.impl is null)
    aa.impl = new Impl(ti, ndim);
else
    aa.resize(ndim);

Please verify my math.
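
For concreteness, a hedged walk-through of that logic with druntime's grow constants (GROW_NUM = 4, GROW_DEN = 5, i.e. rehash when more than 4/5 full) and a hypothetical AA holding 7,000 entries when reserve asks for a dim of 8,192:

void main()
{
    enum GROW_NUM = 4, GROW_DEN = 5;    // rehash when used > 4/5 of dim
    size_t used = 7_000, ndim = 8_192;  // hypothetical AA state and request

    // 7_000 * 5 = 35_000 > 8_192 * 4 = 32_768, so ndim gets bumped:
    if (used * GROW_DEN > ndim * GROW_NUM)
        ndim = (used * GROW_DEN + GROW_NUM - 1) / GROW_NUM;  // now 8_750
    assert(used * GROW_DEN <= ndim * GROW_NUM);              // 35_000 <= 35_000

    // nextpow2(8_750) is 16_384, leaving 7_000 entries at about 43% load:
    size_t p = 1;
    while (p < ndim) p <<= 1;
    assert(p == 16_384);
}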

@schveiguy (Member), Oct 6, 2017:

FYI, 20 million rounds up to 33 million as a power of 2. 20/33 < 4/5. So no rehashing should be happening.

Still a good idea to take into account the load factor.

Edit: this is in relation to the current tests, so what I mean in this context is it doesn't "explain the efficiency issue"

@Darredevil force-pushed the issue-2504-aa-reserve branch from 7cceafc to eb9865e on October 6, 2017 13:33
@Darredevil (author) commented Oct 6, 2017:

@schveiguy regarding the design question, I noticed after several tests that the current implementation seems to run faster without a reserve() call.

int[int] aa;
aa.reserve(20_000_000);
foreach (i; 0 .. 20_000_000) aa[i] = i;

This code runs several hundred milliseconds slower with the reserve(). Would it be faster if we preallocated the data elements as well?

src/rt/aaA.d Outdated
assert(aa[31_133] == 31_133);

foreach(i;0..20_000_000) aa[i] = i;
assert(aa[19_999_133] == 19_999_133);
@schveiguy (Member), Oct 6, 2017:

These aren't actually testing the functionality of reserve, but just that reserve doesn't break the current functionality.

You should check that aa.impl.buckets doesn't change at some point while filling. Something like:

auto b = aa.impl.buckets;
assert(b.length >= 20_000_000);
// fill, rehashing won't happen
...
assert(aa.impl.buckets is b); // check both pointer and length still match
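
A hedged completion of that sketch; Impl and buckets are druntime internals, and the cast assumes the builtin AA is a single Impl pointer (rt/aaA.d's layout), so this only belongs in a druntime unittest:

unittest
{
    int[int] aa;
    aa.reserve(2_048);
    auto impl = *cast(Impl**) &aa;   // layout assumption: builtin AA wraps one Impl*
    auto b = impl.buckets;
    assert(b.length >= 2_048);
    foreach (i; 0 .. 2_048)
        aa[i] = i;                   // fill; no rehash expected after reserve
    assert((*cast(Impl**) &aa).buckets is b);  // same pointer and length
}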

@schveiguy (Member):

This code runs several hundred milliseconds slower with the reserve()

I can't see why it would be slower.

Without reserve, it has to rehash several times during the insertion of elements. With reserve, it shouldn't rehash at all, as 20 million elements will round up to a dim size of 2^25, or 33 million. It shouldn't ever rehash while adding the elements.

Perhaps the issue is that the GC scans take longer when you are allocating because you have this humongous array of pointers?

@schveiguy (Member):

Note that preallocating the elements would stop the GC scans during insertion, so if that is the issue, preallocation of elements would solve it.

@rainers (Member) commented Oct 6, 2017:

Perhaps the issue is that the GC scans take longer when you are allocating because you have this humongous array of pointers?

Could be true; GC.disable() might be used to verify it. Maybe caching also has effects, i.e. inserting into the AA might be faster with a smaller bucket array, more than compensating for the eventual rehashing.

Note that preallocating the elements would stop the GC scans during insertion, so if that is the issue, preallocation of elements would solve it.

It would move allocation and GC scan time into the reserve() call, but I suspect overall performance would not change.

@schveiguy (Member):

It would move allocation and GC scan time into the reserve() call

Unless you preallocated as an array of elements. Then it's only one allocation, and one potential scan (2 if you preallocate the buckets).

@rainers (Member) commented Oct 6, 2017:

Unless you preallocated as an array of elements.

Yes, but it would be a large memory block staying in memory unless all entries are removed and all references to it are gone. That's not very GC friendly. Please note that there can be "external" references to the values via auto p = key in aa.
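
A quick example of such an external reference (standard D semantics):

void main()
{
    int[string] aa = ["x": 1];
    auto p = "x" in aa;    // p points into the AA's entry storage
    *p = 2;                // mutates the stored value in place
    assert(aa["x"] == 2);  // the entry p points at must stay alive
}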


src/object.d Outdated

void reserve(T : Value[Key], Value, Key)(T* aa, size_t ndim)
{
    _aaReserve(cast(void **)aa, typeid(Value[Key]), ndim);
Member:

Let's rather use simple forwarding instead of the ugly cast twice. That way we only need to maintain the tricky code once:

reserve(*aa, ndim);
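
Spelled out, the suggested forwarding overload is just:

void reserve(T : Value[Key], Value, Key)(T* aa, size_t ndim)
{
    reserve(*aa, ndim);  // forward to the ref overload; the cast lives in one place
}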



src/rt/aaA.d Outdated

foreach(i;0..20_000_000) aa[i] = i;
assert(aa[19_999_133] == 19_999_133);
}
Member:

There should be a test that reserves less than the current length (which will probably fail with the current implementation).

@schveiguy (Member):

Yes, I know. That is why I said earlier "can have an effect on many things, including when a data item is collected by the GC."

But the bottom line is -- if reserve makes things slower, there's no point. There's only a point to reserve if it improves performance.

The design issue with the current proposed PR is that it only preallocates the bucket array, which is something that isn't very costly to begin with.

The problem with preallocating the elements as individual allocations is that each of them must be put somewhere so it isn't collected, but without knowing where they will go (as that is based on the hash), you would then have to store them in a temporary array.

What we need is a way to "preallocate" from the GC, and then use that cache for inserting elements. Each element would be able to be collected once it's used, but before actually being inserted it would be considered one block. You could probably do it by manipulating the bits for the page as each one is consumed. That's significantly more work than just this PR.

@schveiguy (Member):

What we need is a way to "preallocate" from the GC

https://issues.dlang.org/show_bug.cgi?id=17881

@andralex (Member) commented Oct 6, 2017:

If this partial solution is not helping, we should figure out whether there's enough need for a more elaborate solution, or scrap reserve() altogether.

The proper solution would be for the hashtable to maintain a private freelist. Then reserve(n) makes sure there are at least n elements in the freelist.

Again, we need to make sure there are solid use cases for this complication. All, please advise.
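
A minimal sketch of the private-freelist idea (hypothetical names and layout; not druntime code):

struct FreeList(Entry)
{
    private Entry* head;
    private size_t count;

    // reserve(n): make sure at least n entries are parked in the list.
    void reserve(size_t n)
    {
        while (count < n)
        {
            auto e = new Entry;
            e.next = head;
            head = e;
            ++count;
        }
    }

    // allocate: recycle a parked entry, falling back to the GC when empty.
    Entry* allocate()
    {
        if (head is null)
            return new Entry;
        auto e = head;
        head = e.next;
        --count;
        return e;
    }
}

struct Node { Node* next; int key; int value; }  // hypothetical AA entry

void main()
{
    FreeList!Node pool;
    pool.reserve(1_000);       // no allocation traffic at insert time...
    auto e = pool.allocate();  // ...entries come from the reserved pool
    assert(e !is null);
}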

@Darredevil force-pushed the issue-2504-aa-reserve branch 3 times, most recently from 4b6127f to d1e0860 on October 9, 2017 14:39
@Darredevil (author) commented Oct 9, 2017:

After taking the grow threshold into account, I measured roughly a 20% speedup across several tests on my machine. I would appreciate it if others could test this as well and confirm the performance increase; example test code:

import std.datetime : StopWatch;
import std.stdio : writefln;

StopWatch sw;
int[int] a, b;

sw.start();
a.reserve(20_000_000);
sw.stop();
writefln("Elapsed time = %d milliseconds", sw.peek().msecs);
sw.reset();

sw.start();
foreach (i; 0 .. 20_000_000) a[i] = i;
sw.stop();
writefln("Elapsed time = %d milliseconds", sw.peek().msecs);
sw.reset();

sw.start();
foreach (i; 0 .. 20_000_000) b[i] = i;
sw.stop();
writefln("Elapsed time = %d milliseconds", sw.peek().msecs);

Thank you everyone for the assistance, please let me know if you agree with this implementation so we can merge it. Once merged I'll create a PR for the docs as well.

@edi33416 (Contributor) commented Oct 9, 2017:

I've got a slower machine, but I measured an average speedup of about 500ms over 10 runs :)

@Darredevil force-pushed the issue-2504-aa-reserve branch from d1e0860 to 3f9f357 on October 9, 2017 16:15
@Darredevil force-pushed the issue-2504-aa-reserve branch from 3f9f357 to af6d479 on October 9, 2017 16:16
@jondegenhardt commented Oct 9, 2017:

I'll try it this evening (California time) on my standard benchmark tests.

In case anyone wishes to try it, there's a documented test here: eBay TSV utils: Join Benchmark. Instructions for creating the test files are included, and there is a link to the source data file under the "Details" section. To try it, call reserve on filterHash with a value of 10 million after the filterHash declaration. Build by running make or make DCOMPILER=<path> in the tsv-join directory. Run the test shown in the performance report a couple of times, timing with the time command.

It will also be interesting to check the max GC pause time. Add the command line arg "--DRT-gcopt=profile:1". (Instead of piping output to /dev/null, pipe to tail -n 20). On my Macbook Pro (16GB, SSD), the max GC pause time for this test exceeds 2000ms. That should drop significantly.

@Darredevil (author):

@jondegenhardt I tried to run your benchmark but I ran into some issues when trying to use my local build of dmd/druntime/phobos with tsv utils. Could you please provide the results from your machine?

@jondegenhardt commented Oct 10, 2017:

tsv-join benchmark: My benchmarks show no material change. Very surprising. I need to take a more in-depth look to make sure I haven't made a mistake somewhere. Here are timings from three builds: this PR with no reserve; this PR with a reserve of 8 million (the AA uses 7 million in this case); and a build of 2.076 (no reserve):

# dmd.pr1929 no reserve
real 39.64; user 36.66; sys: 2.87;
real 39.27; user 36.42; sys: 2.83;
real 39.46; user 36.58; sys: 2.86;
real 39.36; user 36.51; sys: 2.83;
real 39.29; user 36.39; sys: 2.87;

# dmd.pr1929 8m reserve
real 39.53; user 36.72; sys: 2.79;
real 39.76; user 36.93; sys: 2.82;
real 39.86; user 36.96; sys: 2.88;
real 39.78; user 36.93; sys: 2.84;
real 39.74; user 36.86; sys: 2.87;

# dmd.2076 (no reserve)
real 39.26; user 36.28; sys: 2.95;
real 39.18; user 36.18; sys: 2.98;
real 39.05; user 36.09; sys: 2.95;
real 39.13; user 36.18; sys: 2.94;
real 39.07; user 36.04; sys: 3.01;

@schveiguy (Member) commented Oct 10, 2017:

I would appreciate it if others could test this as well and confirm there is an increase in performance

There is a large flaw in your benchmark: GC tests largely depend on the state of the GC. In each test, the GC is in a very different state after having allocated some number of AA elements for the previous tests.

I recommend instead compiling two separate versions to do an accurate comparison.

Edit: I realized the first "test" is just to check how long reserve takes.

@jondegenhardt commented Oct 10, 2017:

tsv-join benchmark: Here are the GC profile stats. These vary more run-to-run; those below are the smallest "Grand total GC time" of two runs. Here also, no material difference. Like the raw timing numbers, this is a surprise, and it makes me question the validity. I'll take a look again this evening and see if I can identify anything. The other thing I might try is an LDC build; LDC builds run about twice as fast on this benchmark.

                               dmd.pr1929 no reserve   dmd.pr1929 8m reserve   dmd.2076 (no reserve)
Number of collections          12                      12                      12
Total GC prep time (ms)        77                      81                      84
Total mark time (ms)           4006                    3992                    4130
Total sweep time (ms)          333                     334                     344
Total page recovery time (ms)  260                     257                     273
Max Pause Time (ms)            2432                    2374                    2474
Grand total GC time (ms)       4678                    4666                    4833
GC summary (MB)                4045                    4039                    4045

/// Reserve AA
extern (C) void _aaReserve(AA* aa, const TypeInfo_AssociativeArray ti, size_t ndim)
{
    ndim = nextpow2(ndim);


The bucket size used should be large enough that if ndim elements are added (ndim being the function arg value), the bucket array will not be grown further. However, the choices here don't achieve that. Specifically, the bucket array will be grown once it becomes 80% full (GROW_NUM/GROW_DEN). nextpow2(ndim) will not always meet this requirement; it will sometimes be necessary to move one power of two further. E.g. if the arg is 500, the bucket size selected should be 1024, not 512. I suggest testing for this after the ndim = nextpow2(ndim) line; if the requirement isn't met, take the next power of two. That should handle the first two cases (aa.impl is null, ndim <= aa.dim). For the third case tested (aa.used * GROW_DEN > ndim * GROW_NUM), it is less clear to me when that would get triggered.

Member:

I think applying ndim = (ndim * GROW_DEN + GROW_NUM - 1) / GROW_NUM; before finding the next power of 2 should be good enough.
I'm not 100% sure, but you might also have to take the deleted entries into account when comparing with aa.dim or aa.used.
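
Putting the two steps together, a sketch (GROW_NUM/GROW_DEN as in druntime's aaA.d, nextpow2 inlined):

enum GROW_NUM = 4, GROW_DEN = 5;  // rehash when used > 4/5 of dim

size_t bucketDim(size_t ndim)
{
    // scale the requested entry count by 5/4, rounding up...
    ndim = (ndim * GROW_DEN + GROW_NUM - 1) / GROW_NUM;
    // ...then round up to the power of two the slot mask requires
    size_t p = 1;
    while (p < ndim) p <<= 1;
    return p;
}

unittest
{
    assert(bucketDim(500) == 1_024);             // not 512: 500/512 exceeds 4/5
    assert(bucketDim(7_000_000) == 16_777_216);  // 2^24 suffices for 7M entries
}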

Member:

One thing to be cautious of is that reserve should never remove buckets. Due to the logic below, it won't, but if the logic is moved around, it could easily happen.

@jondegenhardt, Oct 12, 2017:

Regarding the power-of-two allocation sizes: best I can tell, all the existing bucket allocation code paths use nextpow2 to set the bucket size when allocating. It's not clear whether that is important. If it is, the

ndim = (aa.used * GROW_DEN + GROW_NUM - 1) / GROW_NUM;

line is not following it. However, if the power-of-two component is not important, there may be an opportunity with reserve. Assuming that a call to reserve is a hint about the expected element count, it might be better to choose the size based on fill ratio rather than power-of-two. For example, reserving 7 million entries will pick a power-of-two of 16777216, a 42% fill ratio. A 70% fill ratio of 10 million would save space and improve locality of reference.

Update: findSlotLookup and findSlotInsert identify the slot based on a mask created by buckets.length - 1, so it looks like power-of-two is expected. The code line in reserve should probably be changed to conform.
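
The mask in question, sketched (equivalent to hash % dim only when dim is a power of two):

size_t firstSlot(size_t hash, size_t dim)
{
    assert(dim != 0 && (dim & (dim - 1)) == 0);  // dim must be a power of two
    return hash & (dim - 1);
}

unittest
{
    assert(firstSlot(19, 16) == 3);  // same result as 19 % 16
}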

@jondegenhardt:
tsv-join benchmark: I was able to look into this a little, but I still need to examine further to understand why there is no material benefit. It was still re-allocating the bucket array one more time than necessary, but using a large enough initial size to avoid that doesn't produce a material change. I think I'll need to modify the benchmark a bit to get a more accurate read.

@jondegenhardt commented Oct 12, 2017:

tsv-join benchmark: Did some further investigation. I'm seeing something unexpected: Using reserve, the time to populate the AA materially improves (expected), but the time accessing the AA is materially worse (unexpected).

Normally I would spend more time validating and investigating before reporting a result like this, but my time is quite fragmented right now, so I wanted to give others an opportunity to try reproducing this before I get back to it. At this point I've run this enough times to be sure there is a real behavior change in my tests. However, it would be preferable to create a more specific test to ensure I haven't made a mistake, that there are no unintended side effects involved, etc. I will do this as soon as I can.

My benchmark uses a string[string] AA. Keys are a 1-upped counter, converted to a string, 7 million of them. Reserve is set at 10 million. The only hypothesis I have is that something is causing a higher collision rate between hash keys. (Or that I've made an error somewhere.)

Update: In a simplified version of this benchmark I'm seeing a 9-10% improvement in AA population time with no degradation in lookup time. This is with the DMD compiler.

@jondegenhardt commented Oct 16, 2017:

tsv-join benchmark: I tried a number of standalone tests using string[string] AAs similar to those used in the tsv-join benchmark. I ran these with a current LDC build. In all cases there was a 9% improvement in AA population time and no degradation in lookup time.

For some reason I don't yet understand, the full tsv-join benchmark is materially slower when it calls reserve. Same build, with the call to reserve conditioned on a run-time argument. However, another of my tools, tsv-uniq, is materially faster when it calls reserve first.

There is some evidence that using reserve reduces GC impact, but the reported numbers have high variance; it would take more study to validate and quantify the deltas. In any case, the max pause times remain relatively long.

Another benefit of using reserve: it chooses a more appropriate power-of-two size for the bucket array. In the current scheme, the bucket array quadruples in size when it grows. This leaves a 50-50 chance of using the power of two that best fits the data size; otherwise it's 2 times larger than needed. With default allocation, 7 million keys results in a bucket array of 2^25 (33.5 million). Using reserve(7_000_000) results in a bucket size of 2^24 (16.7 million).
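
A hedged check of that arithmetic, assuming druntime's initial bucket count of 8 and quadrupling growth: starting from 2^3, growth only lands on odd powers of two, so 2^24 is unreachable without reserve:

void main()
{
    size_t dim = 8;                  // assumed initial bucket count
    while (dim * 4 / 5 < 7_000_000)  // grow while 7M keys would exceed the 4/5 threshold
        dim *= 4;                    // quadrupling growth
    assert(dim == 33_554_432);       // 2^25, versus 2^24 with reserve(7_000_000)
}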

My suggestion: Make a decision about proceeding without worrying about the degradation in the full tsv-join benchmark. I'll keep trying to identify a simplified case, but whatever is going on appears to be somewhat more complicated.

@andralex (Member):

Thanks for this work.

reserve is a large implied commitment. I suspect it will be typically used by folks who want to make sure there is no allocation traffic later in the application.

My understanding (correct me if wrong) is that what we implement now reserves the bucket array but not the nodes. That does not fulfill the contract above and improves performance only marginally. Even the description is difficult ("reserves some of the memory needed later, but not all", etc.). We risk disappointing folks more than helping them.

This should do what some code in std.experimental.allocator does, i.e. allocate relatively large blocks and then thread a freelist through them. A freelist would in fact help the hashtable implementation a whole lot. If that's deemed too difficult, let's close this and work on something more impactful.

@jondegenhardt:
I'm inclined to agree. A 9% gain is material, but frankly, I was hoping for a good bit more. Relative to the other tools I wrote, those using AAs are quite a bit slower for the amount of work they are doing. That is, my assessment is that there is a fair bit of opportunity to improve AAs; reserve as implemented only captures part of it.

The memory savings from selecting a more appropriate power-of-two size are probably more worthwhile than the speed gains, but there are other ways to achieve this.

I was also hoping that the multi-second GC pauses would be further reduced. My assumption is that the large array of pointers in these AAs is a major contributor. Clearly an alternate approach is needed to address this.

@jondegenhardt:
In the interests of moving this along...

To me the performance gain measured doesn't justify adding a method to the API. I recommend closing this. An alternate PR could certainly be put together in the future, incorporating this work as appropriate.

I have a couple follow-up questions:

  • Is there justification for changing the grow rate from quadrupling to doubling as the sizes get larger? E.g. after allocating for 2^20 entries, switch to doubling in size rather than quadrupling?
  • Is it worth investigating/understanding the long GC pause times that seem associated with large AAs? Perhaps this is well understood; if not, it might be worth some exploration.
  • I'm a little unclear about how preallocation of keys and values would be done for variable-length keys and values like immutable strings. If there's a short explanation, I'd be interested in how it would work.

@Darredevil (author) commented Oct 24, 2017:

After discussing the issue in more depth with @andralex, we decided to postpone this enhancement for now and revisit it later.

@Darredevil closed this on Oct 24, 2017
@schveiguy (Member):

Is it worth investigating/understanding the long GC pause times that seem associated with large AAs? Perhaps this is well understood; if not, it might be worth some exploration.

It's always worth exploring: we can guess at the reason, but testing can reveal something we didn't think of (my theory would be that there is a huge number of small blocks involved, with a large array that points at them, meaning you are following pointers all over the place, looking up each block as you process the large array, killing the cache).

I'm a little unclear about how preallocation of keys and values would be done for variable length keys and values like immutable strings. If there's a short explanation I'd be interested in how it would work.

You pre-allocate the blocks that would be allocated by the AA. You would not preallocate the values themselves. Any string data would be allocated elsewhere; the AA does not duplicate it, it just stores the string reference itself.
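
For example, with string values only the slice (pointer and length) lands in the AA:

void main()
{
    string s = "hello";
    string[int] aa;
    aa[0] = s;
    assert(aa[0].ptr is s.ptr);  // same character data; only the reference was stored
}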
