Use Vec<T> for keeping track of gc objects #3493

Merged
2 commits merged into main from gc-use-vec-instead-linked-list on Dec 4, 2023

Conversation

HalidOdat
Member

This does not reduce memory usage, since we still keep a pointer to each GC object, but it does reduce the overhead of jumping through the intrusive linked list and also simplifies the GC collection functions.
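
For context, here is a minimal sketch of the difference; GcBox, Heap, and strongs are placeholder names for illustration, not Boa's actual internals. Instead of re-linking next pointers while walking an intrusive list, the sweep phase becomes a single retain pass over a dense Vec of pointers.

// Illustrative sketch only: `GcBox`, `Heap`, and `strongs` are placeholders,
// not Boa's real types.
use std::ptr::NonNull;

struct GcBox {
    marked: bool,
    // Intrusive-list version: every allocation carries the link itself,
    // e.g. `next: Option<NonNull<GcBox>>`.
}

struct Heap {
    // Vec-based version: one dense array of pointers to the live allocations.
    strongs: Vec<NonNull<GcBox>>,
}

impl Heap {
    // Sweep phase: free unmarked boxes and keep the rest. With a Vec this is
    // a single `retain` pass over contiguous memory instead of re-linking
    // `next` pointers while walking a list.
    fn sweep(&mut self) {
        self.strongs.retain(|ptr| {
            // SAFETY: every pointer in `strongs` points to a live, leaked Box.
            let node = unsafe { ptr.as_ref() };
            if node.marked {
                true
            } else {
                // Rebuild the Box so the allocation is actually freed.
                unsafe { drop(Box::from_raw(ptr.as_ptr())) };
                false
            }
        });
    }
}

fn main() {
    let mut heap = Heap {
        strongs: (0..4)
            .map(|i| NonNull::from(Box::leak(Box::new(GcBox { marked: i % 2 == 0 }))))
            .collect(),
    };
    heap.sweep();
    assert_eq!(heap.strongs.len(), 2);
}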


Benchmarks

Main

PROGRESS Richards
RESULT Richards 36.5
PROGRESS DeltaBlue
RESULT DeltaBlue 41.7
PROGRESS Encrypt
PROGRESS Decrypt
RESULT Crypto 55.5
PROGRESS RayTrace
RESULT RayTrace 147
PROGRESS Earley
PROGRESS Boyer
RESULT EarleyBoyer 138
PROGRESS Splay
RESULT Splay 120
PROGRESS NavierStokes
RESULT NavierStokes 111
SCORE 81.0

PR

PROGRESS Richards
RESULT Richards 39.1
PROGRESS DeltaBlue
RESULT DeltaBlue 46.4
PROGRESS Encrypt
PROGRESS Decrypt
RESULT Crypto 56.3
PROGRESS RayTrace
RESULT RayTrace 150
PROGRESS Earley
PROGRESS Boyer
RESULT EarleyBoyer 153
PROGRESS Splay
RESULT Splay 166
PROGRESS NavierStokes
RESULT NavierStokes 112
SCORE 88.9

HalidOdat added the performance and gc labels on Dec 1, 2023
HalidOdat added this to the v0.18.0 milestone on Dec 1, 2023
HalidOdat requested a review from a team on Dec 1, 2023 at 08:33

github-actions bot commented Dec 1, 2023

Test262 conformance changes

Test result    main count    PR count    difference
Total              95,609      95,609             0
Passed             76,526      76,526             0
Ignored            18,132      18,132             0
Failed                951         951             0
Panics                  0           0             0
Conformance        80.04%      80.04%         0.00%


codecov bot commented Dec 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9f181ef) 44.57% compared to head (8330f9d) 44.54%.

❗ Current head 8330f9d differs from pull request most recent head 685b29d. Consider uploading reports for the commit 685b29d to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3493      +/-   ##
==========================================
- Coverage   44.57%   44.54%   -0.03%     
==========================================
  Files         487      487              
  Lines       50621    50598      -23     
==========================================
- Hits        22563    22539      -24     
- Misses      28058    28059       +1     


@jedel1043
Member

jedel1043 commented Dec 1, 2023

Um, I'm pretty sure the point of using intrusive linked lists here is to avoid having live allocations on the GC state. This does make it faster, but consider that the Vec will keep expanding if the number of objects keeps growing, and it'll never reclaim the allocated buffer. This can be seen using our heap profiling tools:

Sample

{
    let array = []
    for (let i = 0; i <= 100000; i++) {
        array.push({data: i});
    }
}

$boa.gc.collect()

{
    let array = []
    for (let i = 0; i <= 100000; i++) {
        array.push({data: i});
    }
}

$boa.gc.collect()

Running cargo run --profile release-dbg --features dhat -- test.js --debug-object

Main

dhat: Total:     248,322,251 bytes in 1,984,424 blocks
dhat: At t-gmax: 53,298,622 bytes in 274,607 blocks
dhat: At t-end:  499,999 bytes in 2,648 blocks

This PR

dhat: Total:     217,362,347 bytes in 1,984,405 blocks
dhat: At t-gmax: 57,012,374 bytes in 283,328 blocks
dhat: At t-end:  4,678,511 bytes in 2,650 blocks

Note how main uses only 500k bytes of heap at the end of the program, while this PR uses 4.5M bytes.

@jedel1043
Member

The alternative would be to resize the Vec on each collection, but that would probably impact performance if the GC needs to run several collections under a high workload, and we'd still have half of the Vec's buffer unused at some points in time.

@HalidOdat
Member Author

HalidOdat commented Dec 1, 2023

In that case we can shrink_to with some_grace_value (maybe len/2 or len/3) on every collect.
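
Roughly something like this (a sketch only; the function name and the exact grace factor are placeholders, not the actual change):

// Sketch of the idea only; `shrink_after_collect` and the 50% grace value
// are made up for illustration.
fn shrink_after_collect<T>(strongs: &mut Vec<T>) {
    // Keep up to 50% headroom over the surviving objects so the buffer
    // can't stay pinned at its high-water mark forever.
    let grace = strongs.len() / 2;
    strongs.shrink_to(strongs.len() + grace);
}

fn main() {
    // Simulate a collection that freed most objects out of a big buffer.
    let mut strongs: Vec<u32> = Vec::with_capacity(100_000);
    strongs.extend(0..10);
    shrink_after_collect(&mut strongs);
    println!("len = {}, capacity = {}", strongs.len(), strongs.capacity());
}

With retain alone the buffer keeps its high-water-mark capacity; shrink_to caps it relative to the number of surviving objects instead.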

@HalidOdat
Member Author

Had the same thought 😆

@HalidOdat
Member Author

Fixed in 685b29d :)

dhat: Total:     259,991,433 bytes in 1,984,563 blocks
dhat: At t-gmax: 57,561,904 bytes in 283,328 blocks
dhat: At t-end:  499,999 bytes in 2,650 blocks

Previous implementation

PROGRESS Richards
RESULT Richards 38.7
PROGRESS DeltaBlue
RESULT DeltaBlue 46.1
PROGRESS Encrypt
PROGRESS Decrypt
RESULT Crypto 55.8
PROGRESS RayTrace
RESULT RayTrace 150
PROGRESS Earley
PROGRESS Boyer
RESULT EarleyBoyer 146
PROGRESS Splay
RESULT Splay 159
PROGRESS NavierStokes
RESULT NavierStokes 108
SCORE 86.9

Current implementation

PROGRESS Richards
RESULT Richards 38.8
PROGRESS DeltaBlue
RESULT DeltaBlue 45.7
PROGRESS Encrypt
PROGRESS Decrypt
RESULT Crypto 56.7
PROGRESS RayTrace
RESULT RayTrace 149
PROGRESS Earley
PROGRESS Boyer
RESULT EarleyBoyer 146
PROGRESS Splay
RESULT Splay 160
PROGRESS NavierStokes
RESULT NavierStokes 110
SCORE 87.2

@jedel1043
Member

jedel1043 commented Dec 1, 2023

Yeah, that's what I expected. Perf is virtually the same with that change. Clarified that the new bench compares two implementations of this PR.

I'd personally keep the current linked list just because it could enable some optimizations in the future; I think CPython uses the pointer as a tagged pointer to store the ref count.

Since I'm tending towards not merging this, I'd probably defer the review to @nekevss and @raskad for an unbiased opinion.

@HalidOdat
Member Author

Yeah, that's what I expected. Perf is virtually the same with that change.

Umm.. note that the "previous implementation" is the previous implementation of this PR (without 685b29d), not the main branch, so there is still a pretty nice perf increase. (I should have added a note on it 😅). I wanted to show that limiting the capacity allocation does not affect the performance; oddly, I saw a very slight improvement (😕)?

There is a ~40-point score increase in Splay over main's implementation, probably because it creates so many objects to construct the tree.

@jedel1043
Member

Yeah, that's what I expected. Perf is virtually the same with that change.

Umm.. note that the "previous implementation" is the previous implementation of this PR (without 685b29d), not the main branch, so there is still a pretty nice perf increase. (I should have added a note on it 😅). I wanted to show that limiting the capacity allocation does not affect the performance; oddly, I saw a very slight improvement (😕)?

There is a ~40-point score increase in Splay over main's implementation, probably because it creates so many objects to construct the tree.

Noted 👍

I still feel it's "wrong" to use allocations to keep track of allocations, so I'd still defer to the rest of the team for more opinions on this.

@nekevss
Member

nekevss commented Dec 2, 2023

I'd tend to agree a bit more with Jedel that using the extra allocation of Vec feels off in this instance. 😕

@HalidOdat
Member Author

This just moves the memory that was used to track the GC pointers (the next field) into a dense array so that the CPU can cache it better, hence the speed improvement.

Why is this weird?

In the future, if we aspire to achieve significant improvements in speed, it may be necessary to develop our own "malloc" and take charge of managing the heap pages ourselves. Perhaps my exposure to a considerable amount of V8/SpiderMonkey code and techniques has influenced my perspective of what is weird, making this proposed change seem normal. 😆

@jedel1043
Member

jedel1043 commented Dec 2, 2023

Perhaps my exposure to a considerable amount of V8/SpiderMonkey code and techniques has influenced my perspective of what is weird, making this proposed change seem normal.

Ooh, does V8/SM use dynamic arrays to keep track of allocated memory? If that's the case, it'll remove my uncertainty completely.

@nekevss
Member

nekevss commented Dec 2, 2023

Most of my hesitation here is around strict provenance and how loading the NonNulls into a Vec may affect their provenance. It may be totally fine; I'm just not 100% certain, hence the hesitation. Whereas Cell helps to enforce provenance.

I agree on the malloc/allocator point, and it may be worth looking into using the Allocator API via allocator_api2 early to handle page allocation. But if we did that, we'd be dealing with two allocations via Box and Vec, rather than just Box.

That all being said, I could be wrong. If there's consensus to move forward with this, we can.

@HalidOdat
Member Author

HalidOdat commented Dec 2, 2023

Ooh, does V8/SM use dynamic arrays to keep track of allocated memory? If that's the case, it'll remove my uncertainty completely.

They preallocate large slabs of memory and keep track of what's free instead of what is allocated. I think they use an in-place free list.

You can find some information here
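
As a toy illustration of the in-place free-list idea mentioned above (not V8/SpiderMonkey's actual code): free slots of a preallocated slab store the index of the next free slot, so tracking free memory needs no extra allocation.

// Toy in-place free list over a fixed-size slab. This only illustrates the
// technique; it is not how V8/SpiderMonkey implement it.
const SLAB_SIZE: usize = 8;

enum Slot {
    Free { next_free: Option<usize> },
    Used(u64),
}

struct Slab {
    slots: [Slot; SLAB_SIZE],
    free_head: Option<usize>,
}

impl Slab {
    fn new() -> Self {
        // Initially every slot is free and points to its neighbour.
        let slots = std::array::from_fn(|i| Slot::Free {
            next_free: if i + 1 < SLAB_SIZE { Some(i + 1) } else { None },
        });
        Self { slots, free_head: Some(0) }
    }

    // Pop a slot off the free list: O(1) and no heap allocation.
    fn alloc(&mut self, value: u64) -> Option<usize> {
        let index = self.free_head?;
        if let Slot::Free { next_free } = &self.slots[index] {
            self.free_head = *next_free;
        }
        self.slots[index] = Slot::Used(value);
        Some(index)
    }

    // Push the slot back onto the free list.
    fn free(&mut self, index: usize) {
        self.slots[index] = Slot::Free { next_free: self.free_head };
        self.free_head = Some(index);
    }
}

fn main() {
    let mut slab = Slab::new();
    let a = slab.alloc(1).unwrap();
    let b = slab.alloc(2).unwrap();
    assert_ne!(a, b);
    slab.free(a);
    // The freed slot is handed out again before any untouched one.
    assert_eq!(slab.alloc(3), Some(a));
}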


Anyway, I just felt it was a waste to leave that big of a performance boost on the table. Though I'm totally fine with closing this if there is any cause for concern or a consensus has been reached. :)

@raskad what are your thoughts on this?

@raskad
Member

raskad commented Dec 4, 2023

To be honest, I thought of this exact change multiple times myself already but was never confident enough to actually try it :D

I think this is the right step forward. As far as I know, storing GC allocations in one or more contiguous blocks of memory is probably the best approach, and starting out with a simple Vec seems like the most natural solution to me.

I'm curious about the strict provenance argument. As far as I know, miri has some strict provenance checks, correct? @HalidOdat did you run miri on the boa_gc tests with this change?

@jedel1043
Member

jedel1043 commented Dec 4, 2023

Did some research, and SpiderMonkey uses a mixed architecture of "Zones", which are mini heaps of memory containing many objects that are tracked using linked lists. Their GC collects at the granularity of zones, and it keeps track of the active zones using a Vec (technically a SmallVec that starts as an inline array but expands into a Vec when exceeding 4 zones).
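
For reference, the pattern described above looks roughly like this with the smallvec crate; Zone is a stand-in struct, not SpiderMonkey's actual zone type.

// Rough illustration of the "inline array that spills to the heap" pattern,
// using the `smallvec` crate (assumes `smallvec` is added as a dependency).
// `Zone` is a stand-in, not SpiderMonkey's real zone type.
use smallvec::SmallVec;

struct Zone {
    id: u32,
}

fn main() {
    // Stays inline (no heap allocation) for up to 4 zones, then spills to a heap buffer.
    let mut zones: SmallVec<[Zone; 4]> = SmallVec::new();
    for id in 0..6 {
        zones.push(Zone { id });
    }
    assert!(zones.spilled()); // more than 4 zones, so the storage moved to the heap
    assert_eq!(zones.last().map(|z| z.id), Some(5));
}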

I think this should be enough evidence to accept this change.

@HalidOdat
Member Author

I'm curious about the strict provenance argument. As far as I know, miri has some strict provenance checks, correct? @HalidOdat did you run miri on the boa_gc tests with this change?

Ran it with the -Zmiri-strict-provenance flag set via MIRIFLAGS and miri did not complain; everything ran fine:

MIRIFLAGS="-Zmiri-strict-provenance" cargo +nightly miri test -p boa_gc

Member

@nekevss nekevss left a comment


If SM is using a Vec equivalent and miri isn't throwing any errors, then I'm fine with moving forward with this.

raskad added this pull request to the merge queue on Dec 4, 2023
Merged via the queue into main with commit 6506f65 on Dec 4, 2023
13 checks passed
raskad deleted the gc-use-vec-instead-linked-list branch on Dec 5, 2023 at 12:15