-
Notifications
You must be signed in to change notification settings - Fork 2
Implement Tiered Hashing with small pool sizes, kick out nodes for correctness, talk more to fast nodes and reduce pool churn #86
Conversation
caboose.go
Outdated
const ( | ||
BifrostProd = "bifrost-prod" | ||
BifrostStaging = "bifrost-staging" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is coming in as a tag from the environment, why are we making this bifrost specific?
isn't the intention to allow caboose to be re-used for other clients, not just bifrost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott Yeah, had this for local testing. Will move to using env vars once we set them on Bifrost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine for a next iteration
The "speed" calculation I think needs to look at bytes received versus wall clock time, rather than the throughput of individual requests
@willscott Not sure what you mean by that speed comment. Speed is calculated as (bytes recieved)/(end - start time). What change do we need to make here ? |
That the speed we care about is how many bytes we got over the last minute - we'll get more accuracy by adding receiver bytes into a bucket per peer and looking at it overall, rather than sampling each individual request as individual throughput measurements |
@willscott But we add the speed of the individual request to a bucket for that peer and then use the P25 of that bucket to rank. Isn't that what you are saying here ? |
You wants be directly summing Bytes received. - by dividing by time of individual request you aren't going to measure the overall bandwidth well |
caboose_test.go
Outdated
|
||
type HarnessOption func(config *caboose.Config) | ||
|
||
func WithTieredHashingOpts(opts []tieredhashing.Option) func(config *caboose.Config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func WithTieredHashingOpts(opts []tieredhashing.Option) func(config *caboose.Config) { | |
func WithTieredHashingOpts(opts []tieredhashing.Option) HarnessOption { |
use the type above for these options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -260,6 +188,7 @@ func (e *ep) Setup() { | |||
e.valid = true | |||
e.resp = testBlock | |||
e.server = httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | |||
time.Sleep(time.Millisecond * 20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? / why are responses delayed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to introduce some form of delay in the tests as this is the "L1 peer that sends us blocks".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doing so masks race conditions / makes tests artificially slow - we ideally shouldn't need it - add TODO to further clean up tests
fetcher.go
Outdated
@@ -74,14 +75,17 @@ func (p *pool) doFetch(ctx context.Context, from string, c cid.Cid, attempt int) | |||
} | |||
|
|||
// TODO Refactor to use a metrics collector that separates the collection of metrics from the actual fetching | |||
func (p *pool) fetchResource(ctx context.Context, from string, resource string, mime string, attempt int, cb DataCallback) (err error) { | |||
// rm will be nil only for context cancellation errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is confusing - it looks like it isn't nil even in the context cancellation case - it's just an empty ResponseMetrics objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a remanant of a previous iteration. Fixed.
fetcher.go
Outdated
@@ -232,22 +227,35 @@ func (p *pool) fetchResource(ctx context.Context, from string, resource string, | |||
} | |||
} | |||
} | |||
req.Header.Add("User-Agent", "bifrost-"+os.Getenv(EnvironmentKey)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't over-ride the provided user agent - there's already a more specific user agent set on the client that bifrost passes in and this will over-write it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott Can we append this to the existing UserAgent
that the bifrost client has on it ? I believe L1s need this to be able to bifurcate metrics etc on their side based on environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appending is okay, sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or you could set environment as another header, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appended to existing
out.car
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shouldn't be committed / in history
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
pool.go
Outdated
poolSizeMetric.WithLabelValues("unknown").Set(float64(mt.Unknown)) | ||
poolSizeMetric.WithLabelValues("main").Set(float64(mt.Main)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"unknown" / "main" also as consts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pool.go
Outdated
aff = cidToKey(c) | ||
} | ||
|
||
p.lk.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this an exclusive lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
pool.go
Outdated
aff = path | ||
} | ||
|
||
p.lk.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise: why is this an exclusive lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah agreed this can be just a read lock. Fixed.
tieredhashing/sorting.go
Outdated
var nodes []nodeWithLatency | ||
|
||
for n, perf := range t.nodes { | ||
perf := perf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
"github.com/serialx/hashring" | ||
) | ||
|
||
// TODO Make env vars for tuning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file an issue for this if it's not in the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
TODO
Subsequent PRs: