-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fastpoll Optimization #28
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but please clarify several things.
expireInterval = 5 * time.Minute | ||
) | ||
|
||
// Client is a cache of recently connected clients. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is pretty confusing. I presume you didn't call this "ClientCache", because the package is already named "cache".
At any rate, "Client" sounds to me like a type that represents a single client. I'd call this e.g. "Clients", "ClientCache", "ClientMapping", what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the package is named cache, as I figure we might put other caches here if we ever need them.
So everybody will see it as cache.Client, which I agree is backwards, but otherwise perfect.
cache.Clients leads to absurdities like cache.NewClients
cache.ClientCache seems overly redundant
cache.ClientMapping makes it sound less like a cache and more like a map.
Maybe cache.ClientData? But it isn't less backwards that cache.Client.
Suggest leaving as-is for now, we can fix up the name later if we find one that actually seems nice. Of course it makes sense that we have to worry about both of the hard problems in CS within the same PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One being caching and the other naming? :)
NewClients doesn't sound bad to me.
ClientData is also better than Client.
I wouldn't leave this as-is; you're the TL, you decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://martinfowler.com/bliki/TwoHardThings.html
Maybe Clients. I'll think about it over lunch, see if any better names come to mind.
|
||
for k, e := range c.m { | ||
if db.Now().Sub(e.u) > maxAge { | ||
delete(c.m, k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure modifying maps while iterating over them is allowed? In Python it raises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
e := c.m[id] | ||
if e == nil || db.Now().Sub(e.u) > maxAge { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if e == nil {
return nil
}
if db.Now().Sub(e.u) > maxAge {
delete(c.m, id)
return nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only have a read lock at this point, so we cannot change the map.
} | ||
|
||
// expire prunes the cache to clean out clients that are no longer up to date. | ||
func (c *Client) expire() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How big of a hiccup can this be worst case? 100ms? 500ms? Are we okay with having all callers of the cache wait this long? Just asking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think in terms of read/write locks previously. Having an option to read-lock helps a lot here - we can do one read-lock pass noting all the keys to be deleted, then acquire write-lock and delete the noted keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unlikely to take even 100ms - how long do you think it takes to walk a 100k element has table? And even if it takes a bit, a hiccoup every 5 minutes is pretty minor. A two pass solution requires new allocations, is therefore less cache friendly, and subtle to get right - once you release the initial read lock, the records you plan to delete could be updated, so under the write lock you need to check again the expiry condition for the records you plan to remove.
Actually though, I just found golang/go#20135. So the right answer may be to either do nothing, or to rebuild the map entirely. (of course there is a cost for allowing deletion during iteration)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to promote a read lock to a write lock, no? Then the described race condition doesn't occur.
At any rate, not cleaning up sounds like the best way here. There are no bugs if there is no code & this structure will never get significantly large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sane Mutex implementation normally just disallow lock promotion altogether as it makes deadlocks too easy. (suppose two threads try to promote at once)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A memory leak is a bug, and the usual definition of a leak is that you hold on to data that you won't use. When leaks become a problem, they can be really hard to debug. Also, while deleting from the map might not shrink it, it still clears out the records so that the map space can be reused.
return len(c.m) | ||
} | ||
|
||
func (c *Client) expireLoop() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making cleanup periodical is surprising to me. While this works, another obvious way that seems cleaner to me is triggering cleanup once every 100000 (or a different constant) operations, or when len(c.m) % 10000 == 0. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have all the clients in the system cached without noticing the ram increase. (100k clients, 250 bytes/client ->25MB? Our least loaded server tasks are using 1GB) This is really just to be tidy, not to keep memory size down.
A client will take a few minutes to fall out of fastpoll, this is independent of the number of operations that occur.
Adding comments to explain the design choices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't refer to RAM. Adding a garbage collector goroutine isn't tidy - is all I'm saying. Do we need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making the cleanup time-based, when the need for cleanup tends to happen after a certain amount of time, is in my mind tidy. We could avoid the goroutine by checking the time on every op, but these are goroutines, not pthreads. They are cheap and meant to be freely used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's the tidiest to not have a GC at all. See the comment below.
// Controls cache timing behavior, variable only to support unit tests. | ||
var ( | ||
maxAge = 30 * time.Second | ||
expireInterval = 5 * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are 30s and 5m the best values here? Should be clarified in a comment.
fleetspeak/src/server/comms.go
Outdated
return nil, nil | ||
var cld *db.ClientData | ||
var err error | ||
var ch bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "ch" stand for "cached"? It took me a minute and a look at the return at the end of this function to figure it out. Just naming this "cached" would be much better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to cacheHit
|
||
// Update updates or sets the cached data for a particular client. If data is | ||
// nil, it clears the data for the client. | ||
func (c *Client) Update(id common.ClientID, data *db.ClientData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only called in one place, with data=nil. Iiuc, that means the cache is always empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Of course commsContext.GetClientInfo should update the cache on load. Fixed.
defer c.l.Unlock() | ||
|
||
if data == nil { | ||
delete(c.m, id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay to call delete on a non-existent key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I double delete in the unit test just to make sure nothing crazy happens. But also is mentioned at: https://blog.golang.org/go-maps-in-action
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go.
Comment nits only.
d *db.ClientData | ||
} | ||
|
||
// NewClient returns a new cache of client data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NewClients
} | ||
|
||
// Stop releases the resources required for background cache maintenence. The | ||
// Cache should not be used once Stop has been called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache -- lowercase?
Optimize the fastpoll situation from the server side by skipping certain database operations if we've done them recently.