-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic peer SIG discovery #2155
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 8 files at r1.
Reviewable status: 7 of 8 files reviewed, 4 unresolved discussions (waiting on @sustrik)
go/sig/egress/session/session.go, line 174 at r1 (raw file):
} func (s *Session) Healthy() bool {
I assume this move was accidental, please revert :)
go/sig/egress/session/sessmon.go, line 58 at r1 (raw file):
// When a successful response is received, this flag will cause the // sessions remoteInfo to be updated from smRemote. needUpdate bool
This is now obsolete.
go/sig/egress/session/sessmon.go, line 152 at r1 (raw file):
if !ok { sm.Debug("Current path was invalidated", "remote", sm.smRemote) sm.sess.healthy.Store(true)
I'm not sure that setting the health explicitly here does what we want. I think it's a no-op in any case, but if it isn't, having a unhealthy->healthy transition isn't helpful. If we're already in unhealthy, it's better for the health check to do the transition upon a successful probe.
go/sig/egress/session/sessmon.go, line 250 at r1 (raw file):
currSessRemote := sm.sess.Remote() if sm.updateMsgId == rpld.Id { if !currSessRemote.Sig.Host.Eq(rpld.Addr.Host.L3) {
We should probably also support updating CtrlL4Port
and EncapL4Port
. Oh. That is not the correct field to be using. rpld.Addr
is the src address of the reply. What we want is the contents of the reply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 8 files at r1.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @sustrik)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 11 files reviewed, 4 unresolved discussions (waiting on @kormat)
go/sig/egress/session/session.go, line 174 at r1 (raw file):
Previously, kormat (Stephen Shirley) wrote…
I assume this move was accidental, please revert :)
Done.
go/sig/egress/session/sessmon.go, line 58 at r1 (raw file):
Previously, kormat (Stephen Shirley) wrote…
This is now obsolete.
Done.
go/sig/egress/session/sessmon.go, line 152 at r1 (raw file):
Previously, kormat (Stephen Shirley) wrote…
I'm not sure that setting the health explicitly here does what we want. I think it's a no-op in any case, but if it isn't, having a unhealthy->healthy transition isn't helpful. If we're already in unhealthy, it's better for the health check to do the transition upon a successful probe.
Done.
go/sig/egress/session/sessmon.go, line 250 at r1 (raw file):
Previously, kormat (Stephen Shirley) wrote…
We should probably also support updating
CtrlL4Port
andEncapL4Port
. Oh. That is not the correct field to be using.rpld.Addr
is the src address of the reply. What we want is the contents of the reply.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 8 files at r1, 6 of 6 files at r2.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @kormat and @sustrik)
go/sig/egress/session/session.go, line 151 at r2 (raw file):
} func (s *Session) Healthy() bool {
Can you move this up to where it originally was (between ID()
and PathPool()
) so that we don't have a diff for something that didn't change.
go/sig/egress/session/sessmon.go, line 182 at r2 (raw file):
func (sm *sessMonitor) getNewPath(old *egress.SessPath) *egress.SessPath { // TODO(sustrik): Don't pick paths that are about to expire.
I think it would make sense if the SessPathPool would not only consider but also Expiry. (But this should definitely not change in this PR)
go/sig/egress/session/sessmon.go, line 273 at r2 (raw file):
} else { // This is going to happen if latency of the path is greater than the poll ticker period. sm.Debug("Reply to an old request received", "request", sm.updateMsgId, "reply", rpld.Id)
I wonder if that shouldn't be an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 11 files reviewed, 7 unresolved discussions (waiting on @lukedirtwalker, @kormat, and @sustrik)
go/sig/egress/session/session.go, line 151 at r2 (raw file):
Previously, lukedirtwalker (Lukas Vogel) wrote…
Can you move this up to where it originally was (between
ID()
andPathPool()
) so that we don't have a diff for something that didn't change.
Done.
go/sig/egress/session/sessmon.go, line 273 at r2 (raw file):
Previously, lukedirtwalker (Lukas Vogel) wrote…
I wonder if that shouldn't be an error?
If it's just one packet, then no, it may have just been delayed, so no big problem. If it happens consistently the path latency may be generally too high for us to cope with, e.g. if it's routed through a satellite. We should probably add some monitoring to catch the later case. Adding a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 8 of 11 files reviewed, 7 unresolved discussions (waiting on @lukedirtwalker and @kormat)
go/sig/egress/session/sessmon.go, line 182 at r2 (raw file):
Previously, lukedirtwalker (Lukas Vogel) wrote…
I think it would make sense if the SessPathPool would not only consider but also Expiry. (But this should definitely not change in this PR)
Actually, it may be better if that's part of this PR. I've fixed the function to do more nuanced path selection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @kormat and @sustrik)
go/sig/egress/interface.go, line 107 at r3 (raw file):
type SessPathPool map[spathmeta.PathKey]*SessPath // Return the most suitable path. Exclude a specific path, if possible.
I think it should probably say what the metric for most suitable is. AFAIU it is non-expiring with least failures first, otherwise least failures first, otherwise the same again.
Is it intended that we possibly chose a path with a high amount of error only because some other path is expiring soon? I'm not sure what the best approach is here.
go/sig/egress/session/sessmon.go, line 177 at r3 (raw file):
func (sm *sessMonitor) getNewPath(old *egress.SessPath) *egress.SessPath { key := old.Key()
Is old
guaranteed to be non-nil ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 11 files reviewed, 6 unresolved discussions (waiting on @lukedirtwalker and @kormat)
go/sig/egress/interface.go, line 107 at r3 (raw file):
Previously, lukedirtwalker (Lukas Vogel) wrote…
I think it should probably say what the metric for most suitable is. AFAIU it is non-expiring with least failures first, otherwise least failures first, otherwise the same again.
Is it intended that we possibly chose a path with a high amount of error only because some other path is expiring soon? I'm not sure what the best approach is here.
As for the comment, I would keep it succint. Too much detail and the comment would eventually diverge from the implementation.
As for the best strategy, I have no strong perferences and I am open to suggestions. @kormat?
go/sig/egress/session/sessmon.go, line 177 at r3 (raw file):
Previously, lukedirtwalker (Lukas Vogel) wrote…
Is
old
guaranteed to be non-nil ?
Well spotted. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 10 of 11 files reviewed, 4 unresolved discussions (waiting on @lukedirtwalker and @kormat)
go/sig/egress/interface.go, line 107 at r3 (raw file):
Previously, sustrik (Martin Sustrik) wrote…
As for the comment, I would keep it succint. Too much detail and the comment would eventually diverge from the implementation.
As for the best strategy, I have no strong perferences and I am open to suggestions. @kormat?
Ok works for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 6 files at r2, 1 of 3 files at r3.
Reviewable status: 10 of 11 files reviewed, 5 unresolved discussions (waiting on @lukedirtwalker, @kormat, and @sustrik)
go/sig/egress/interface.go, line 84 at r4 (raw file):
type RemoteInfo struct { Sig siginfo.Sig
I've noticed in a few places that you're using a siginfo.Sig
value instead of pointer - why the change?
go/sig/egress/interface.go, line 108 at r4 (raw file):
// Return the most suitable path. Exclude a specific path, if possible.
Blank line.
go/sig/egress/interface.go, line 135 at r4 (raw file):
return bestSessPath } // In the worst case return the excluded path. Given that the caller asked to exclude it
I'm not sure how you can end up here. If bestSessPath
is nil
, that implies there are no paths in the pool, so doing spp[*exclude]
will panic.
TL;DR, i think you can just do:
// Return a non-expiring path with least failures.
if bestNonExpiringSessPath != nil {
return bestNonExpiringSessPath
}
return bestSessPath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r2, 1 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @sustrik)
go/sig/egress/session/session.go, line 51 at r4 (raw file):
// pool of paths, managed by pathmgr pool egress.PathPool // remote SIG
This isn't used anymore.
go/sig/egress/session/sessmon.go, line 121 at r4 (raw file):
if since > tout { // FIXME(kormat): these debug statements should be converted to prom metrics. sm.Debug("isessMonitor: Remote SIG timeout", "remote", sm.smRemote, "duration", since)
sessMonitor
go/sig/egress/session/sessmon.go, line 165 at r4 (raw file):
if sm.smRemote.SessPath.IsCloseToExpiry() { sm.Debug("sessMonitor: Current path is about to expire", "remote", sm.smRemote) sm.sess.healthy.Store(true)
I don't think we need to change the health state here either.
go/sig/egress/session/sessmon.go, line 181 at r4 (raw file):
} key := old.Key() return sm.sessPathPool.Get(&key)
Changing from PathKey
to*PathKey
is making things weeeird.
go/sig/egress/session/sessmon.go, line 240 at r4 (raw file):
return } sm.lastReply = time.Now()
Hmm. We should only update this if the reply is a match. Otherwise bogus replies will stop timeouts from firing.
go/sig/egress/session/sessmon.go, line 256 at r4 (raw file):
if sessRemote == nil || !sm.smRemote.Sig.Host.Eq(sessRemote.Sig.Host) || sm.smRemote.Sig.EncapL4Port != sessRemote.Sig.EncapL4Port {
It would be easier to implement an Eq
method on siginfo.Sig
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 11 files reviewed, 9 unresolved discussions (waiting on @kormat and @lukedirtwalker)
go/sig/egress/interface.go, line 84 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
I've noticed in a few places that you're using a
siginfo.Sig
value instead of pointer - why the change?
Because now it's always set, never nil.
go/sig/egress/interface.go, line 108 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
Blank line.
Done.
go/sig/egress/interface.go, line 135 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
I'm not sure how you can end up here. If
bestSessPath
isnil
, that implies there are no paths in the pool, so doingspp[*exclude]
will panic.TL;DR, i think you can just do:
// Return a non-expiring path with least failures. if bestNonExpiringSessPath != nil { return bestNonExpiringSessPath } return bestSessPath
There can still be the excluded path in the pool as it was skipped in the loop above. We want to use it if there's no other option. No?
go/sig/egress/session/session.go, line 51 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
This isn't used anymore.
Done.
go/sig/egress/session/sessmon.go, line 121 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
sessMonitor
Done.
go/sig/egress/session/sessmon.go, line 165 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
I don't think we need to change the health state here either.
Done.
go/sig/egress/session/sessmon.go, line 181 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
Changing from
PathKey
to*PathKey
is making things weeeird.
OK, fair enough. Reverted to the previous way of doing things.
go/sig/egress/session/sessmon.go, line 240 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
Hmm. We should only update this if the reply is a match. Otherwise bogus replies will stop timeouts from firing.
Done.
go/sig/egress/session/sessmon.go, line 256 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
It would be easier to implement an
Eq
method onsiginfo.Sig
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r5.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @sustrik)
go/sig/egress/interface.go, line 84 at r4 (raw file):
Previously, sustrik (Martin Sustrik) wrote…
Because now it's always set, never nil.
Ah. That's not the only reason we use pointers, though. As you've noticed, all the methods on siginfo.Sig
have pointer receivers. The struct itself is also quite large, making a pointer reference probably more efficient to pass around. Plus, it contains a sync.RWMutex, which "must not be copied after first use" according to the docs - using a value siginfo.Sig makes it much easier to accidentally copy the struct, and break the mutex.
go/sig/egress/interface.go, line 135 at r4 (raw file):
Previously, sustrik (Martin Sustrik) wrote…
There can still be the excluded path in the pool as it was skipped in the loop above. We want to use it if there's no other option. No?
Ah, i see your point. There's no guarantee it is in the pool, though, so there needs to be another check.
go/sig/siginfo/sig.go, line 41 at r5 (raw file):
} func (s *Sig) Eq(x *Sig) bool {
If you add one more bit, it can work safely on potentially nil Sig
s:
if s == nil || x == nil {
return s == x
}
Then you don't need to check for == nil
before callingEq()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 7 of 10 files reviewed, 3 unresolved discussions (waiting on @kormat and @sustrik)
go/sig/egress/interface.go, line 84 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
Ah. That's not the only reason we use pointers, though. As you've noticed, all the methods on
siginfo.Sig
have pointer receivers. The struct itself is also quite large, making a pointer reference probably more efficient to pass around. Plus, it contains a sync.RWMutex, which "must not be copied after first use" according to the docs - using a value siginfo.Sig makes it much easier to accidentally copy the struct, and break the mutex.
I don't think it's being copied. Anyway, I think that may have been a bit of C-style thinking on my part. Reverted.
go/sig/egress/interface.go, line 135 at r4 (raw file):
Previously, kormat (Stephen Shirley) wrote…
Ah, i see your point. There's no guarantee it is in the pool, though, so there needs to be another check.
This is what I found in Go docs: "If the requested key doesn't exist, we get the value type's zero value." In this case the map's value is a pointer, so the result will be nil, which is exactly what we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 7 of 10 files reviewed, 3 unresolved discussions (waiting on @kormat)
go/sig/siginfo/sig.go, line 41 at r5 (raw file):
Previously, kormat (Stephen Shirley) wrote…
If you add one more bit, it can work safely on potentially nil
Sig
s:if s == nil || x == nil { return s == x }
Then you don't need to check for
== nil
before callingEq()
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 5 files at r6.
Reviewable status:complete! all files reviewed, all discussions resolved
go/sig/egress/interface.go, line 135 at r4 (raw file):
Previously, sustrik (Martin Sustrik) wrote…
This is what I found in Go docs: "If the requested key doesn't exist, we get the value type's zero value." In this case the map's value is a pointer, so the result will be nil, which is exactly what we want.
Gah, ok, i keep getting that one wrong. I withdraw all of this thread :)
The tests here will be somewhat lacking because of #1390, but that's out of scope of this issue. |
This is a breaking change. For SIGs to work properly they should be added to the topology file, so that BR is aware of them when it gets a packet sent to SvcSIG. Fixes #2121
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 5 files at r6, 6 of 6 files at r7.
Reviewable status:complete! all files reviewed, all discussions resolved
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)