Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fastpoll Optimization #28

Merged
merged 6 commits into from
Jan 11, 2018
Merged

Fastpoll Optimization #28

merged 6 commits into from
Jan 11, 2018

Conversation

bgalehouse
Copy link
Contributor

Optimize the fastpoll situation from the server side by skipping certain database operations if we've done them recently.

@bgalehouse bgalehouse changed the title Fastpoll Fastpoll Optimization Jan 11, 2018
Copy link
Contributor

@mlkm mlkm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but please clarify several things.

expireInterval = 5 * time.Minute
)

// Client is a cache of recently connected clients.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is pretty confusing. I presume you didn't call this "ClientCache", because the package is already named "cache".
At any rate, "Client" sounds to me like a type that represents a single client. I'd call this e.g. "Clients", "ClientCache", "ClientMapping", what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the package is named cache, as I figure we might put other caches here if we ever need them.

So everybody will see it as cache.Client, which I agree is backwards, but otherwise perfect.

cache.Clients leads to absurdities like cache.NewClients
cache.ClientCache seems overly redundant
cache.ClientMapping makes it sound less like a cache and more like a map.

Maybe cache.ClientData? But it isn't less backwards that cache.Client.

Suggest leaving as-is for now, we can fix up the name later if we find one that actually seems nice. Of course it makes sense that we have to worry about both of the hard problems in CS within the same PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One being caching and the other naming? :)
NewClients doesn't sound bad to me.
ClientData is also better than Client.
I wouldn't leave this as-is; you're the TL, you decide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://martinfowler.com/bliki/TwoHardThings.html

Maybe Clients. I'll think about it over lunch, see if any better names come to mind.


for k, e := range c.m {
if db.Now().Sub(e.u) > maxAge {
delete(c.m, k)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure modifying maps while iterating over them is allowed? In Python it raises.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


e := c.m[id]
if e == nil || db.Now().Sub(e.u) > maxAge {
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if e == nil {
  return nil
}
if db.Now().Sub(e.u) > maxAge {
  delete(c.m, id)
  return nil
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only have a read lock at this point, so we cannot change the map.

}

// expire prunes the cache to clean out clients that are no longer up to date.
func (c *Client) expire() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How big of a hiccup can this be worst case? 100ms? 500ms? Are we okay with having all callers of the cache wait this long? Just asking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think in terms of read/write locks previously. Having an option to read-lock helps a lot here - we can do one read-lock pass noting all the keys to be deleted, then acquire write-lock and delete the noted keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unlikely to take even 100ms - how long do you think it takes to walk a 100k element has table? And even if it takes a bit, a hiccoup every 5 minutes is pretty minor. A two pass solution requires new allocations, is therefore less cache friendly, and subtle to get right - once you release the initial read lock, the records you plan to delete could be updated, so under the write lock you need to check again the expiry condition for the records you plan to remove.

Actually though, I just found golang/go#20135. So the right answer may be to either do nothing, or to rebuild the map entirely. (of course there is a cost for allowing deletion during iteration)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to promote a read lock to a write lock, no? Then the described race condition doesn't occur.

At any rate, not cleaning up sounds like the best way here. There are no bugs if there is no code & this structure will never get significantly large.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sane Mutex implementation normally just disallow lock promotion altogether as it makes deadlocks too easy. (suppose two threads try to promote at once)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A memory leak is a bug, and the usual definition of a leak is that you hold on to data that you won't use. When leaks become a problem, they can be really hard to debug. Also, while deleting from the map might not shrink it, it still clears out the records so that the map space can be reused.

return len(c.m)
}

func (c *Client) expireLoop() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making cleanup periodical is surprising to me. While this works, another obvious way that seems cleaner to me is triggering cleanup once every 100000 (or a different constant) operations, or when len(c.m) % 10000 == 0. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have all the clients in the system cached without noticing the ram increase. (100k clients, 250 bytes/client ->25MB? Our least loaded server tasks are using 1GB) This is really just to be tidy, not to keep memory size down.

A client will take a few minutes to fall out of fastpoll, this is independent of the number of operations that occur.

Adding comments to explain the design choices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't refer to RAM. Adding a garbage collector goroutine isn't tidy - is all I'm saying. Do we need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the cleanup time-based, when the need for cleanup tends to happen after a certain amount of time, is in my mind tidy. We could avoid the goroutine by checking the time on every op, but these are goroutines, not pthreads. They are cheap and meant to be freely used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the tidiest to not have a GC at all. See the comment below.

// Controls cache timing behavior, variable only to support unit tests.
var (
maxAge = 30 * time.Second
expireInterval = 5 * time.Minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are 30s and 5m the best values here? Should be clarified in a comment.

return nil, nil
var cld *db.ClientData
var err error
var ch bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "ch" stand for "cached"? It took me a minute and a look at the return at the end of this function to figure it out. Just naming this "cached" would be much better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to cacheHit


// Update updates or sets the cached data for a particular client. If data is
// nil, it clears the data for the client.
func (c *Client) Update(id common.ClientID, data *db.ClientData) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only called in one place, with data=nil. Iiuc, that means the cache is always empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Of course commsContext.GetClientInfo should update the cache on load. Fixed.

defer c.l.Unlock()

if data == nil {
delete(c.m, id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to call delete on a non-existent key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I double delete in the unit test just to make sure nothing crazy happens. But also is mentioned at: https://blog.golang.org/go-maps-in-action

Copy link
Contributor

@mlkm mlkm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go.
Comment nits only.

d *db.ClientData
}

// NewClient returns a new cache of client data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewClients

}

// Stop releases the resources required for background cache maintenence. The
// Cache should not be used once Stop has been called.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache -- lowercase?

@bgalehouse bgalehouse merged commit 23c8fa3 into google:master Jan 11, 2018
@bgalehouse bgalehouse deleted the fastpoll branch January 12, 2018 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants