This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Implement Tiered Hashing with small pool sizes, kick out nodes for correctness, talk more to fast nodes and reduce pool churn #86

Merged
merged 33 commits into main from feat/reduce-pool-churn
May 2, 2023

Conversation

aarshkshah1992
Contributor

@aarshkshah1992 aarshkshah1992 commented Apr 12, 2023

TODO

Subsequent PRs:

  • Traffic mirroring to probe the unknown set instead of the current probabilistic mechanism?
  • Use exponential decay over the entire history of the node rather than sliding window percentiles to rank nodes

caboose.go Outdated
Comment on lines 19 to 23
const (
BifrostProd = "bifrost-prod"
BifrostStaging = "bifrost-staging"
)

Contributor

if this is coming in as a tag from the environment, why are we making this bifrost specific?

isn't the intention to allow caboose to be re-used for other clients, not just bifrost?

Contributor Author

@willscott Yeah, had this for local testing. Will move to using env vars once we set them on Bifrost.

Contributor Author

Done.

Contributor

@willscott willscott left a comment

This seems fine for a next iteration

The "speed" calculation, I think, needs to look at bytes received versus wall-clock time, rather than the throughput of individual requests

@aarshkshah1992
Contributor Author

@willscott Not sure what you mean by that speed comment. Speed is calculated as (bytes received)/(end time - start time).

What change do we need to make here?

@willscott
Contributor

That the speed we care about is how many bytes we got over the last minute - we'll get more accuracy by adding received bytes into a bucket per peer and looking at it overall, rather than sampling each individual request as an individual throughput measurement

@aarshkshah1992
Contributor Author

@willscott But we add the speed of the individual request to a bucket for that peer and then use the P25 of that bucket to rank. Isn't that what you are saying here?

@willscott
Contributor

You want to be directly summing bytes received - by dividing by the time of each individual request you aren't going to measure the overall bandwidth well

lidel added a commit to ipfs-inactive/bifrost-gateway that referenced this pull request Apr 12, 2023
fetcher.go Outdated
@aarshkshah1992 aarshkshah1992 changed the title [WIP] Aggressive and better pool management Implement Tiered Hashing will small pool sizes May 1, 2023
@aarshkshah1992 aarshkshah1992 changed the title Implement Tiered Hashing will small pool sizes Implement Tiered Hashing with small pool sizes, kick out nodes for correctness, talk more to fast nodes and reduce pool churn May 1, 2023
caboose_test.go Outdated

type HarnessOption func(config *caboose.Config)

func WithTieredHashingOpts(opts []tieredhashing.Option) func(config *caboose.Config) {
Contributor

Suggested change
func WithTieredHashingOpts(opts []tieredhashing.Option) func(config *caboose.Config) {
func WithTieredHashingOpts(opts []tieredhashing.Option) HarnessOption {

use the type above for these options

Contributor Author

Done.

@@ -260,6 +188,7 @@ func (e *ep) Setup() {
e.valid = true
e.resp = testBlock
e.server = httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
time.Sleep(time.Millisecond * 20)
Contributor

why is this needed? / why are responses delayed?

Contributor Author

Just want to introduce some form of delay in the tests as this is the "L1 peer that sends us blocks".

Contributor

doing so masks race conditions / makes tests artificially slow - we ideally shouldn't need it - add TODO to further clean up tests

fetcher.go Outdated
@@ -74,14 +75,17 @@ func (p *pool) doFetch(ctx context.Context, from string, c cid.Cid, attempt int)
}

// TODO Refactor to use a metrics collector that separates the collection of metrics from the actual fetching
func (p *pool) fetchResource(ctx context.Context, from string, resource string, mime string, attempt int, cb DataCallback) (err error) {
// rm will be nil only for context cancellation errors
Contributor

this is confusing - it looks like it isn't nil even in the context cancellation case - it's just an empty ResponseMetrics object?

Contributor Author

Yeah, this is a remnant of a previous iteration. Fixed.

fetcher.go Outdated
@@ -232,22 +227,35 @@ func (p *pool) fetchResource(ctx context.Context, from string, resource string,
}
}
}
req.Header.Add("User-Agent", "bifrost-"+os.Getenv(EnvironmentKey))
Contributor

don't over-ride the provided user agent - there's already a more specific user agent set on the client that bifrost passes in and this will over-write it.

Contributor Author

@willscott Can we append this to the existing User-Agent that the bifrost client has on it? I believe L1s need this to be able to bifurcate metrics etc. on their side based on environment.

Contributor

appending is okay, sure

Contributor

or you could set environment as another header, right?

Contributor Author

Appended to existing

out.car Outdated
Contributor

this shouldn't be committed / in history

Contributor Author

Removed.

pool.go Outdated
Comment on lines 114 to 115
poolSizeMetric.WithLabelValues("unknown").Set(float64(mt.Unknown))
poolSizeMetric.WithLabelValues("main").Set(float64(mt.Main))
Contributor

"unknown" / "main" also as consts

Contributor Author

Done.

pool.go Outdated
aff = cidToKey(c)
}

p.lk.Lock()
Contributor

why is this an exclusive lock?

Contributor Author

Fixed.

pool.go Outdated
aff = path
}

p.lk.Lock()
Contributor

likewise: why is this an exclusive lock?

Contributor Author

Yeah agreed this can be just a read lock. Fixed.

var nodes []nodeWithLatency

for n, perf := range t.nodes {
perf := perf
Contributor

?

"github.com/serialx/hashring"
)

// TODO Make env vars for tuning
Contributor

file an issue for this if it's not in the PR

Contributor Author

#86

Contributor Author

Done.

@willscott willscott merged commit 34094f8 into main May 2, 2023
@willscott willscott deleted the feat/reduce-pool-churn branch May 2, 2023 07:48
guanzo pushed a commit to guanzo/bifrost-gateway that referenced this pull request May 12, 2023