-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Conversation
src/gc/gc.h
Outdated
@@ -105,6 +106,7 @@ extern "C" uint8_t* g_gc_highest_address; | |||
extern "C" GCHeapType g_gc_heap_type; | |||
extern "C" uint32_t g_max_generation; | |||
extern "C" MethodTable* g_gc_pFreeObjectMethodTable; | |||
extern "C" MethodTable** g_gc_pStringClass; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats lame to have nested pointer, but gc is initialized before EE sets pointers to method tables for such types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should set it lazily, then?
src/gc/gc.cpp
Outdated
{ | ||
return false; | ||
} | ||
// it's sad we can't leverage mark bit to indicate that string is duplicate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick thought: the lowest two bits of the method table pointer are usable and we only use one of them. We could in theory use the other bit for this.
src/gc/stringdedup.cpp
Outdated
size_t number_of_heaps | ||
) | ||
{ | ||
table = new StringDedupTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beware of allocation failures here. All of these will throw on failure and the GC is not allowed to throw. You should use the nothrow operator new
and return a bool
or something to indicate whether or not the initialization was successful.
It is a huge pain that we can't use std::unique_ptr
here, and I'm sorry about that. I think that's something we definitely should address in the GC codebase as a whole going forward.
} | ||
if (*data) | ||
{ | ||
return (*data)->Write(item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if this string is interned? does this still work?
|
||
uint32_t GCHashTableBase::GetHash(GCStringData* key) | ||
{ | ||
uint32_t hash = 5381; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd imagine we'd want to use a cryptographic hash if we want to avoid potential denial-of-service problems here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String itnern pool routines uses this algorithm.
https://github.com/dotnet/coreclr/blob/master/src/vm/eehash.cpp#L289
https://github.com/dotnet/coreclr/blob/master/src/inc/utilcode.h#L3020
And I just simply copied it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intuition feels like this hash function is OK for string interning. Strings that get interned are only attacker-controlled when passed verbatim to string.Intern
(assuming our threat model doesn't consider string literals in metadata to be attacker controlled). In this case, though, any string on the heap (and therefore passing through this code path) is potentially attacker controlled and large (<85kb since LOH strings aren't eligible for this). I'd be worried of attackers repeatedly colliding strings and forcing the slow string comparison path on potentially large numbers of large strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@swgillespie could you please explain this:
<85kb since LOH strings aren't eligible for this
I'm agree with your concerns, but keep in mind that this code meant to happen concurrent with mutators, i.e. not affecting STW pause. Also stronger hash function may affect performance more than collision-induced long chains of entries. And finally such malicious strings have to survive probably several gen0 and following gen1 collections.
I'm not against your suggestion, just want all trade-offs to be considered.
And last, we can try different structures like trie-based ones and trade hashing issues for cache thrashing issues 🤣 . Of course this too exotic, just for the record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collision-induced long chains of entries
You can give up adding strings to the hashtable when you get collision-induced long chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Rattenkrieg Correct me if I'm wrong, but my understanding of your approach is that strings only get deduped when they are promoted into gen 2, correct? In that case, LOH strings won't ever get promoted into gen 2 (since they are allocated in Gen2/LOH) and won't trigger this logic. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@swgillespie that's right! I've completely missed the way how large strings instantly get promoted. Though I don't think there are going to be duplicates among large strings, towards the finish of this PR we may consider knobs to probe large heap too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating collisions with this algorithm and a hash table with power-of-two buckets is trivial -- you just need to have the last few characters of the string be the same. We should assume an attacker does have the ability to create a bunch of strings and have them last for a little while. That's not particularly far-fetched when they can submit things like a json document with lots of strings in it to a server.
We've taken two approaches to this that would be acceptable:
- Always use a randomly seeded strong hash. In the VM, we use https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/vm/marvin32.cpp . Marvin is about half the speed of algorithms like HashBytes (though it may make up some of that with a better distribution of results, especially in the power-of-two hash table). I'd recommend testing with this to see if it's good enough to always use (it's how we implement String.GetHashCode).
- Detect attacks and change strategies. We do that in Dictionary by counting collisions and switching to Marvin if we hit a threshold. You might also be able to detect attacks and just skip putting the colliding strings in the table, but that would have a couple of issues:
a. Strings would still have to be hashed/checked to see that they are more collisions. That might still add some load, especially if they're long.
b. If a service depends on the improved performance of deduplication, an attack that disables it might still act as a denial-of-service.
src/gc/stringobjhashtable.cpp
Outdated
if (entryKey->GetCharCount() != key->GetCharCount()) | ||
return false; | ||
|
||
return !memcmp(entryKey->GetStringBuffer(), key->GetStringBuffer(), entryKey->GetCharCount() * sizeof(wchar_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sizeof(wchar_t)
is platform dependent, so this looks scary to me.
src/gc/env/gcenv.object.h
Outdated
{ | ||
private: | ||
uint32_t m_StringLength; | ||
wchar_t m_Characters[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wchar_t
is 32-bits on Linux, which wouldn't be correct for this definition.
On interference with intern pool:Right now the algorithm doesn't give special treatment to interned strings, i.e it can relocate pointer to interned string to regular string or vice-versa. That may broke reference counting invariants. These are the places where I've found dependencies on correct RC: On interference with strings under lock:At first I completely ignored such issue since locking on marshal-by-bleed instances is conventially forbidden. But then I've found this: "FxCop: System.Uri locks on a string." dotnet/corefx#13107 So since even BCL contains evil code like that I came with conclusion to examine object header and potentially sync-block entries and reschedule i.e |
@@ -15357,6 +15377,11 @@ void gc_heap::gc1() | |||
assert (g_gc_card_bundle_table == card_bundle_table); | |||
#endif | |||
|
|||
if (!g_gc_pStringClass) | |||
{ | |||
g_gc_pStringClass = GCToEEInterface::GetStringMethodTable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the logically closest place before actual usage of g_gc_pStringClass.
But we may put it higher on the callstack ie closer to collection begining.
I have tried @swgillespie 's idea
... and it worked! つ ◕_◕ ༽つ Fun fact: there are some amount of duplicated strings in CLR like "0", "1", "2", "%" etc |
src/gc/stringobjhashtable.h
Outdated
#else | ||
|
||
#ifndef _INC_WINDOWS | ||
typedef wchar_t TCHAR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That copied from gcenv.structs.h
, which is included in gcenv.h
and it's ok to use gcenv.h
in .cpp's but not in headers
typedef char TCHAR
for PLATFORM_UNIX
is super confusing. Isn't sizeof(char) == 1
everywhere according to standard?
Personally I'd like to use char16_t
, but it raising type errors when interoping with StringObject
methods which are WCHAR
typed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TCHAR
is Windowsism: https://msdn.microsoft.com/en-us/library/office/cc842072.aspx
src/gc/env/gcenv.structs.h
Outdated
#else | ||
|
||
#ifndef _INC_WINDOWS | ||
typedef wchar_t TCHAR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes were introduced in 0462be1
None of usages of TCHAR have survived to this day. So I'm renaming TCHAR
to WCHAR
and replacing char
with char16_t
for PLATFORM_UNIX
.
sorry to jump in this late - I was mostly out for the past few weeks. this really shouldn't be done as a GC feature - there's no reason to - you can do this outside GC so you don't affect GC STW pauses and don't affect critical code paths in GC penalizing everything else. the algorithm is this - go through the heap for old generations (gen2/3) linearly as we only care about those objects; objects in gen0/1 might die quickly and even if they point to strings in the old generations we don't care about deduping those. if we see an object contains a reference to a string object -
if an object doesn't contain any references to string objects, simply skip (this includes skipping string objects themselves). note that going through the gen3 heap (LOH) requires cooperation with the LOH allocator. this is the same way BGC does it (with bgc_mark_set/bgc_mark_done). you can just leave this part out for now as you mentioned - we know that objects on LOH usually aren't nearly as reference rich as the ones on SOH. most of this could actually be done in managed code. we do need to expose some FCalls to facilitate this - we need to expose the relevant refs to managed code. the benefit of this is you don't need to write the kind of "manually managed code" that you see in the runtime (eg, you don't need to manually relocate the addresses in a gen2 compacting GC if you just store a ref to the string object; you don't need to probe for GC to allow it to happen). I am fine if you want to implement this in native code first and convert most of it to managed code later; or if you want to try the managed implementation first. again, sorry to comment so late... |
This would work only if all updates of the ref in the app were done using interlocked operations as well. It is hard to do, and even if it was doable - the performance impact would be prohibitive. Updating of the string references to deduped copy has to be done during STW. I do not see any other way. |
I'm fine, if even this will be rejected completely, digging through gc internals was fun for me and I learned couple of things from sources.
From the very begining I thought this planned to be opt-in via configuration knobs. Though it's obvious that additional work during gen1 and gen2 would impact on highest percentiles, there are several types of throughput oriented applications, (compilers, test runners, everything CI related etc) which should benefit from such feature. Also amount of collections may be reduced as well due to smaller number of live objects. As variation of your algorithm we can try to perform references relocation during STW while keep scanning phase running concurrently with mutators. |
why do you think that? on Intel "lock cmpxchg" will prevent other processors from accessing the memory location that's being locked; and loading/storing a ptr sized word is an atomic operation itself; this means if there's an update from another processor it will either be completely observed by lock cmpxchg before or after. on arm ldrex/strex or ldxr/stxr also provide exclusive access which is reset by normal stores so it achieves the same purpose. |
@Maoni0 given that mutators can pin or lock on string being processed by dedup thread, does that imply that we need atomic conditional update of two adjacent pointer sized words? |
@Maoni0 You are right that doing all updates using interlocked operations won't make difference. We would actually need read barriers to address the concern I was worried about. If string reference updates were done outside STW, the programs and libraries out there would need to be prepared to handle the possibility of one string object becoming two string objects. It is even more breaking in very subtle ways than simple dedupping where two string objects become one string object. It is an interesting option to experiment with, but I would think the less breaking option is the more useful one. In other words, is the following test guaranteed to pass with string deduping turned on? class Test
{
string _field;
Test()
{
_field = new string('a', 1);
string a1 = _field;
Thread.Sleep(1000);
string a2 = _field;
Debug.Assert(Object.ReferenceEquals(a1,a2));
}
} |
Maybe we can structure this such that the string deduping logic is outside the GC, and the GC just provides methods to fold one object into another one atomically during Gen2 GC? Such callback can be then also used e.g. to fold all empty arrays together, etc. There seems to be a lot out there written about deduplicating garbage collection: https://www.bing.com/search?q=deduplicating+GC. |
What's the plan here? 1+ year no update ... should we close the PR? |
I agree that the PR should be closed until somebody is interested in picking this up again. @Rattenkrieg Thank you anyway! |
PoC for #14208
Currently able to deduplicate strings in non-concurrent workstation mode.
Algorithm in brief:
pointer is string && is evacuating from gen1 && at least has sizeof ptr characters;
StringDedupQueue
;is_string_and_about_to_be_promoted_to_gen2
call sites. That is double work, but I see no way to know in advance whether we are going to compact (with exception of rare cases when user ask for compaction byGCCollectionMode.Forced
).StringDedupThread
populatingStringDedupTable
with strings fromStringDedupQueue
:StringDedupThread
is running synchronously after STW gc to ease my debugging.adjust_string_dups_reloc_pointers
is called right before relocating pointers - for each group of duplicates fromStringDedupTable
we seek for first live string in the heap and destructively adjusting other strings by marking their object headers with0xFF...
and writing pointer to original into its first sizeof ptr characters;try_relocate_duplicate_string
is made to relocate pointers to duplicates to their original string.Things to do:
WCHAR
vswchar_t
,BOOL
vsbool
etc - I need help here;StringDedupThread
concurrently, handle related routines like cooperative join;