Add a storetheindex delegated provider #158
Conversation
This seems fine. One small bug/typo which I noted. Next steps:
- We keep this PR in this branch and don't merge it until we test it in production.
- Tommy has a way of deploying a target commit to a single Hydra machine.
Just did a very cursory look at this and found a config option bug. I'd recommend running this locally and testing that it works before trying to deploy this into production.
FWIW you may want to use something like https://github.com/aschmahmann/vole to issue DHT queries to a head.
Co-authored-by: Adin Schmahmann <[email protected]>
Tested locally using vole.
@thattommyhall when you get a chance, can you test with the latest commit in this branch, using the config flag or the env setting?
@@ -80,7 +80,9 @@ func mergeAddrInfos(infos []peer.AddrInfo) []peer.AddrInfo {
	}
	var r []peer.AddrInfo
	for k, v := range m {
		r = append(r, peer.AddrInfo{ID: k, Addrs: v})
		if k.Validate() == nil {
This seems fine, but why wouldn't we do this check further up, next to the 'if r.Err == nil' check, when accumulating the addresses before merging them?
Because we aren't doing an explicit iteration through the AddrInfos/keys during that part of the accumulation.
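For context, a minimal sketch of what the merged helper could look like with this check in place, reconstructed from the diff above (the surrounding map-building code is assumed, not copied from the file):

package providers

import (
	"github.com/libp2p/go-libp2p-core/peer"
	"github.com/multiformats/go-multiaddr"
)

// mergeAddrInfos collapses duplicate peer IDs into a single AddrInfo and,
// per the change above, drops any entry whose peer ID fails validation.
func mergeAddrInfos(infos []peer.AddrInfo) []peer.AddrInfo {
	m := make(map[peer.ID][]multiaddr.Multiaddr)
	for _, info := range infos {
		m[info.ID] = append(m[info.ID], info.Addrs...)
	}
	var r []peer.AddrInfo
	for k, v := range m {
		if k.Validate() == nil { // skip invalid/empty peer IDs
			r = append(r, peer.AddrInfo{ID: k, Addrs: v})
		}
	}
	return r
}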
This needs to be rebased on master before we can merge it, since it has conflicts.
That's the merge commit that got pushed this morning, no? GitHub says there are no conflicts.
func (c *client) FindProviders(ctx context.Context, mh multihash.Multihash) ([]peer.AddrInfo, error) {
	// encode request in URL
	u := fmt.Sprint(c.endpoint.String(), "/", mh.B58String())
would it make sense to use multibase b58 encoding here?
yes, but this is an existing endpoint which we are planning on replacing with the delegated routing one anyway.
Side note: @willscott you probably want to change the endpoint at some point in the future to use multibase. The cost of not having that one extra character is almost never worth it.
> yes, but this is an existing endpoint which we are planning on replacing with the delegated routing one anyway.

Yeah, I was just wondering if we should change the endpoint to use multibase encoding, if it's not too late. If it's getting replaced soon, then disregard :) (and consider doing this for the replacement).
This whole setup is for the 'works now' variant until go-delegated-routing is solidified and migrated to.
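As a reference for the multibase point above, a small hedged sketch showing the difference between the raw base58 string used today and a multibase-prefixed one (illustrative only; this is not the endpoint's actual contract):

package main

import (
	"fmt"

	"github.com/multiformats/go-multibase"
	"github.com/multiformats/go-multihash"
)

func main() {
	// Build an example multihash (illustrative payload).
	mh, err := multihash.Sum([]byte("example"), multihash.SHA2_256, -1)
	if err != nil {
		panic(err)
	}

	// Current endpoint style: raw base58btc, no self-describing prefix.
	raw := mh.B58String()

	// Multibase style: the same base58btc encoding, prefixed with 'z',
	// so the base can be changed later without ambiguity.
	prefixed, err := multibase.Encode(multibase.Base58BTC, mh)
	if err != nil {
		panic(err)
	}

	fmt.Println(raw)      // e.g. Qm...
	fmt.Println(prefixed) // "z" + the same string
}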
if len(parsedResponse.MultihashResults) != 1 {
	return nil, fmt.Errorf("unexpected number of responses")
}
If this always has one, then why is it in an array? Is it expected to change in the future? If so, can we just loop over the array so this can be forwards compatible?
The query endpoint allows an array of multihashes to be queried; this client only queries for an individual one at a time.
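If forwards compatibility is wanted later, a hedged sketch of looping over the results instead of asserting exactly one (the struct shapes here are assumptions based on this thread, not the exact response model):

package storetheindex

import "github.com/libp2p/go-libp2p-core/peer"

// Minimal shapes assumed from the discussion above.
type providerResult struct {
	Provider peer.AddrInfo
}

type multihashResult struct {
	ProviderResults []providerResult
}

// collectProviders walks every multihash result rather than requiring exactly
// one, so the same code keeps working if the client ever batches multiple
// multihashes into a single query.
func collectProviders(results []multihashResult) []peer.AddrInfo {
	var infos []peer.AddrInfo
	for _, res := range results {
		for _, p := range res.ProviderResults {
			infos = append(infos, p.Provider)
		}
	}
	return infos
}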
ContextID []byte
Metadata  json.RawMessage
What are these fields for? They look unused
The response from the indexer node contains these fields, which are used by some providers; they're here for completeness of the message format. There have been some conversations about providers using them for cases where they could be relevant here, for instance expressing priorities, or indicating that multiple records with the same ContextID should be de-duplicated.
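To make that concrete, a hedged sketch of the provider-result shape described above, expanding the earlier sketch with the two fields in question (field set assumed from this thread, not verified against the indexer's code):

package storetheindex

import (
	"encoding/json"

	"github.com/libp2p/go-libp2p-core/peer"
)

// providerResult mirrors one provider record returned by the indexer.
// ContextID and Metadata are kept for completeness of the message format:
// ContextID is an opaque grouping key some providers use (e.g. to de-duplicate
// records), and Metadata is a provider-defined blob that could carry things
// like retrieval-protocol or priority hints.
type providerResult struct {
	ContextID []byte
	Metadata  json.RawMessage
	Provider  peer.AddrInfo
}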
head/head.go (outdated)
if err != nil {
	return nil, nil, fmt.Errorf("failed to instantiate delegation client (%w)", err)
}
providerStore = hproviders.CombineProviders(providerStore, hproviders.AddProviderNotSupported(stiProvider))
IIUC this will try the caching provider store concurrently, which we expect to fail, which will then enqueue an async DHT lookup. Those are expensive, will always fail, and will contend for resources (the work queue) with regular DHT queries...is there a way to avoid that?
Are you saying you're concerned that all the requests that end up being fulfilled by the indexers will result in DHT queries that will likely fail, and that the resulting load is the worry?

If so, we have some options here depending on the semantics we want. One might be to make a "fallback provider" that, instead of trying all the providers in parallel, tries them sequentially, only moving on if the previous ones fail to return anything. In that case we could decide to only do a DHT lookup in the event the Datastore and Indexer systems returned no records (see the sketch below).

This wouldn't cover every case (e.g. if there's some record in the DHT that we're missing for a given multihash, but the data is separately advertised by the indexers).
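For illustration, a minimal sketch of that sequential "fallback provider" idea, under assumed names and an assumed provider-store interface (not hydra-booster's actual API):

package providers

import (
	"context"

	"github.com/libp2p/go-libp2p-core/peer"
	"github.com/multiformats/go-multihash"
)

// providerStore is a stand-in for the provider store interface.
type providerStore interface {
	GetProviders(ctx context.Context, key multihash.Multihash) ([]peer.AddrInfo, error)
}

// fallbackProviderStore queries its backing stores in order and stops at the
// first one that returns results, so an expensive backend (such as a DHT
// lookup) only runs when the cheaper ones (datastore, indexer) came up empty.
type fallbackProviderStore struct {
	stores []providerStore
}

func (f *fallbackProviderStore) GetProviders(ctx context.Context, key multihash.Multihash) ([]peer.AddrInfo, error) {
	for _, s := range f.stores {
		infos, err := s.GetProviders(ctx, key)
		if err != nil {
			return nil, err
		}
		if len(infos) > 0 {
			return infos, nil
		}
	}
	return nil, nil
}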
if there's a change in logic here it should be in a different PR in order to keep the scope of this one reasonable.
Sounds good, but can we agree to not deploy this to the Hydras until this is fixed?
This change is not making the current situation any worse right?
Do we have a consensus agreement for something better?
I think this change does make it worse, for the reasons listed above.
What @aschmahmann brought up seems like a good compromise. I can make the change if it helps.
Agreed with Gus. In practice merging this code will make it worse. The change I proposed should be pretty small though
I think I wasn't clear earlier - what I meant by 'this change' was that this PR uses the same composition structure as the already-merged delegated routing structure. I agree that spinning up load in this path is something we need to watch in case it leads to lots of failing DHT queries, and that the proposed change to the composition structure seems good.

We don't expect a substantial amount of upstream bitswap data to be loaded into storetheindex in the coming week. It would be useful for providers to begin testing the end-to-end flow though, so if the additional change is going to take more than this coming week, we should consider whether we can get away without it temporarily.

@guseggert if you're able to make the proposed change, that would be great!
I think I've responded to the actionable things brought up. Please take another look.
I pushed some changes to implement the logic laid out above and fix a few other things; let me know if this works for you.
Actually, there are some problems with my commit, let me fix them.
Force-pushed from 3bac15c to dfa283a
Force-pushed from dfa283a to 598cc78
Okay I think it's correct now, and fixed up the end-to-end test to exercise the StoreTheIndex code path.
Also:
* Reuse the delegated routing HTTP client across all heads
* Don't set arbitrary error string as Prometheus label, to avoid hitting time series limit
* Unexport some structs
* Rip out the other delegated routing stuff since it's unused & dangerous
Force-pushed from 598cc78 to 222ad10
2022-02-15: @petar will review. Potentially sync with @guseggert.
lgtm + fyi notes
}
if cfg.ProvidersFinder != nil && cfg.StoreTheIndexAddr != "" {
I think this sequence of if statements is correct, but for future reference: when your intention is to have mutually-exclusive cases, both for readability and safety it is best to capture them either as a switch statement or as an "if else" chain. Here a switch statement would be best:
switch {
case cfg.ProvidersFinder != nil && cfg.StoreTheIndexAddr == "":
	...
case cfg.ProvidersFinder != nil && cfg.StoreTheIndexAddr != "":
	...
case cfg.ProvidersFinder == nil && cfg.StoreTheIndexAddr != "":
	...
default:
	... something is not right ...
}
if err != nil {
	return addrInfos, err
}

if len(addrInfos) > 0 {
	recordPrefetches(ctx, "local")
	return addrInfos, nil
}

return nil, d.Finder.Find(ctx, d.Router, key, func(ai peer.AddrInfo) {
This is probably not worth the effort, but for future reference: it is bad style to reuse the error return value for two different purposes. Above, returned errors indicate errors in the provider stores; here, errors indicate errors in the finder. Considering that the finder is asynchronous functionality, independent of the GetProviders execution path, its error should be logged, not returned.
Otherwise this function seems to be a correct implementation of the logic we discussed in colo.
I haven't heard of that style guideline before; if a method is supposed to do two things and it can't do one of them, then it's normal to return an error, regardless of which one failed. The finder is best-effort, so an error from the finder here doesn't represent an error in the finding itself; it means that we couldn't even try to find (e.g. in the async case, that the work couldn't be queued for some reason). My intention was for the CachingProviderStore to not know nor care that things are happening async, but given that we don't have a sync version of this, I can see how that just adds confusion, so I can rename things a bit here to clarify.
No need to make any changes. I guess it’s a matter of preference/taste.
This work will be undone in the future as part of #162.
This looks a lot like the current delegated provider, but makes the request with the JSON/URL format spoken by the current storetheindex find HTTP server.
This PR also connects up metrics to views so that stats on these delegated providers become visible to Prometheus.
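As an illustration of that metrics-to-views wiring (the measure, view, and tag names below are made up for the sketch, not the ones defined in this PR), registering an OpenCensus view is what makes a stat visible to the Prometheus exporter:

package metrics

import (
	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

// Hypothetical measure counting delegated-provider lookups.
var (
	delegatedFinds = stats.Int64("delegated_provider_finds_total", "Delegated provider lookups", stats.UnitDimensionless)
	statusKey, _   = tag.NewKey("status")

	delegatedFindsView = &view.View{
		Name:        "delegated_provider_finds_total",
		Measure:     delegatedFinds,
		Description: "Number of delegated provider lookups",
		TagKeys:     []tag.Key{statusKey},
		Aggregation: view.Count(),
	}
)

// RegisterViews must be called once at startup; only registered views are
// exported to Prometheus.
func RegisterViews() error {
	return view.Register(delegatedFindsView)
}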
The code in providers/storetheindex is a re-homing of this PR, which has an end-to-end test. The go-delegated-routing repo isn't a good home for it, as this is more of a current kludge than the long-term protocol we want to support.

I'm not including the test from that PR in this repo, as it depends on the storetheindex codebase, which uses a newer/incompatible version of all the libp2p core dependencies.