Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC mark-loop rewrite #47292

Merged
merged 4 commits into from
Jan 24, 2023
Merged

GC mark-loop rewrite #47292

merged 4 commits into from
Jan 24, 2023

Conversation

d-netto
Copy link
Member

@d-netto d-netto commented Oct 22, 2022

Previous work

Since #21590, the GC mark-loop was implemented by keeping two manually managed stacks: one of which contained iterator states used to keep track of the object currently being marked. As an example, to mark arrays, we would pop the corresponding iterator state from the stack, iterate over the array until we found an unmarked reference, and if so, we would update the iterator state (to reflect the index we left off), "repush" it into the stack and proceed with marking the reference we just found.

This PR

This PR eliminates the need of keeping the iterator states by modifying the object graph traversal code. We keep a single stack of jl_value_t * currently being processed. To mark an object, we first pop it from the stack, push all unmarked references into the stack and proceed with marking.

I believe this doesn't break any invariant from the generational GC. Indeed, the age bits are set after marking (if the object survived one GC cycle it's moved to the old generation), so this new traversal scheme wouldn't change the fact of whether an object had references to old objects or not. Furthermore, we must not update GC metadata for objects in the remset, and we ensure this by calling gc_mark_outrefs in gc_queue_remset with meta_updated set to 1.

Additional advantages

  1. There are no recursive function calls in the GC mark-loop code (one of the reasons why Implement GC mark loop #21590 was implemented).
  2. Keeping a single GC queue will greatly simplify work-stealing in the multi-threaded GC we are working on (c.f. [rfc] parallel marking #45639).
  3. Arrays of references, for example, are now marked on a regular stride fashion, which could help with hardware prefetching.
  4. We can easily modify the traversal mode (to breadth first, for example) by only changing the jl_gc_markqueue_t(from LIFO to FIFO, for example) methods without touching the mark-loop itself, which could enable further exploration on the GC in the future.

Since this PR changes the mark-loop graph traversal code, there are some changes in the heap-snapshot, though I'm not familiar with that PR.

Some benchmark results are here: https://hackmd.io/@Idnmfpb3SxK98-OsBtRD5A/H1V6QSzvs.

@vchuravy
Copy link
Member

@nanosoldier runtests(ALL, vs = ":master")

@vchuravy
Copy link
Member

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@d-netto d-netto added the GC Garbage collector label Oct 27, 2022
@vchuravy
Copy link
Member

vchuravy commented Nov 3, 2022

@nanosoldier runtests(["AbstractLogic", "AutomotiveVisualization", "Bonsai", "Contour", "FastTabular", "HydrophoneCalibrations", "IntervalTrees", "JET", "LuxLib", "MITgcmTools", "MixedAnova", "MixedModels", "OceanStateEstimation", "OutlierDetectionData", "PalmerPenguins", "Parsers", "Permanents", "ProtoBuf", "SlackThreads", "SoapySDR", "WaterWaves1D"], vs = ":master")

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@KristofferC
Copy link
Member

KristofferC commented Nov 3, 2022

The IOError thing is a Pkg bug (which might hide some other real bug).

Edit: Actually, a rebase should probably update to a Pkg where it is fixed.

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments from looking this over with you, but it seems to look good to me as a whole. Really solid work, Diogo.

src/gc.c Outdated Show resolved Hide resolved
src/gc.c Outdated Show resolved Hide resolved
src/gc.c Outdated Show resolved Hide resolved
src/gc.c Show resolved Hide resolved
size_t elsize = ((jl_array_t *)ary8_parent)->elsize / sizeof(jl_value_t *);
#ifndef GC_VERIFY
// Decide whether need to chunk ary8
size_t nrefs = (ary8_end - ary8_begin) / elsize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we multiply by * (elem_end - elem_begin) for the number of pointers in each element?

gc_cache->data_stack = (jl_gc_mark_data_t *)malloc_s(init_size * sizeof(jl_gc_mark_data_t));

// Initialize GC mark-queue
size_t init_size = (1 << 18);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 MB seems rather large (though 8k does seem a little small)

Copy link
Member Author

@d-netto d-netto Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the chunk-queue shouldn't be large (the number of chunks grows with the depth of graphs of objarrays).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is over 2 MB per thread though, which seems large

@d-netto d-netto force-pushed the dcn/ml2 branch 2 times, most recently from f594a8d to e82cd9d Compare November 17, 2022 15:50
@d-netto
Copy link
Member Author

d-netto commented Nov 18, 2022

@vtjnash any idea of what could be causing the error in analyzegc?

@vtjnash
Copy link
Member

vtjnash commented Nov 18, 2022

Try annotating the jl_raise_debugger function declaration with JL_NOTSAFEPOINT:

JL_DLLEXPORT void jl_raise_debugger(void) JL_NOTSAFEPOINT;

It is a heuristic evaluation tool, so any changes to the code can sometime arbitrarily change what branches it considers and what errors it thinks it might have found. You can run it locally with make -C src install-analysis-deps clang-sagc-gc

@d-netto d-netto force-pushed the dcn/ml2 branch 3 times, most recently from c0cf1ed to 5bb5bab Compare November 18, 2022 21:24
@vchuravy
Copy link
Member

@nanosoldier runtests(ALL)

1 similar comment
@d-netto
Copy link
Member Author

d-netto commented Dec 1, 2022

@nanosoldier runtests(ALL)

@maleadt
Copy link
Member

maleadt commented Dec 12, 2022

Hm, not sure why this didn't went through. Let's try again:

@nanosoldier runtests(ALL)

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@d-netto d-netto force-pushed the dcn/ml2 branch 2 times, most recently from 80f71bb to be4b416 Compare January 16, 2023 21:21
@vchuravy vchuravy changed the title Rebasing mark-loop changes GC mark-loop rewrite Jan 18, 2023
@d-netto
Copy link
Member Author

d-netto commented Jan 19, 2023

I'm seeing some regressions after the rebase, so not good to merge yet.

@d-netto
Copy link
Member Author

d-netto commented Jan 19, 2023

NVM, I was running with obj pools disabled.

@vchuravy
Copy link
Member

@nanosoldier runtests()

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@d-netto d-netto closed this Jan 23, 2023
@d-netto d-netto reopened this Jan 23, 2023
@d-netto
Copy link
Member Author

d-netto commented Jan 24, 2023

Bump

@vtjnash vtjnash merged commit dfab7be into JuliaLang:master Jan 24, 2023
@@ -117,6 +117,7 @@ typedef intptr_t ssize_t;
#define LLT_FREE(x) free(x)

#define STATIC_INLINE static inline
#define FORCE_INLINE static inline __attribute__((always_inline))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already defined in

#define FORCE_INLINE inline __attribute__((always_inline))
and now causes a warning during compilation:

In file included from /home/mose/repo/julia/src/support/hashing.c:51:
/home/mose/repo/julia/src/support/MurmurHash3.c:15: warning: "FORCE_INLINE" redefined
   15 | #define FORCE_INLINE inline __attribute__((always_inline))
      | 
In file included from /home/mose/repo/julia/src/support/hashing.c:7:
/home/mose/repo/julia/src/support/dtypes.h:120: note: this is the location of the previous definition
  120 | #define FORCE_INLINE static inline __attribute__((always_inline))
      | 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GC Garbage collector
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants