Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic peer SIG discovery #2155

Merged
merged 1 commit into from
Dec 13, 2018
Merged

Dynamic peer SIG discovery #2155

merged 1 commit into from
Dec 13, 2018

Conversation

sustrik
Copy link
Contributor

@sustrik sustrik commented Nov 22, 2018

This change is Reviewable

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 8 files at r1.
Reviewable status: 7 of 8 files reviewed, 4 unresolved discussions (waiting on @sustrik)


go/sig/egress/session/session.go, line 174 at r1 (raw file):

}

func (s *Session) Healthy() bool {

I assume this move was accidental, please revert :)


go/sig/egress/session/sessmon.go, line 58 at r1 (raw file):

	// When a successful response is received, this flag will cause the
	// sessions remoteInfo to be updated from smRemote.
	needUpdate bool

This is now obsolete.


go/sig/egress/session/sessmon.go, line 152 at r1 (raw file):

	if !ok {
		sm.Debug("Current path was invalidated", "remote", sm.smRemote)
		sm.sess.healthy.Store(true)

I'm not sure that setting the health explicitly here does what we want. I think it's a no-op in any case, but if it isn't, having a unhealthy->healthy transition isn't helpful. If we're already in unhealthy, it's better for the health check to do the transition upon a successful probe.


go/sig/egress/session/sessmon.go, line 250 at r1 (raw file):

	currSessRemote := sm.sess.Remote()
	if sm.updateMsgId == rpld.Id {
		if !currSessRemote.Sig.Host.Eq(rpld.Addr.Host.L3) {

We should probably also support updating CtrlL4Port and EncapL4Port. Oh. That is not the correct field to be using. rpld.Addr is the src address of the reply. What we want is the contents of the reply.

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 8 files at r1.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @sustrik)

Copy link
Contributor Author

@sustrik sustrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 11 files reviewed, 4 unresolved discussions (waiting on @kormat)


go/sig/egress/session/session.go, line 174 at r1 (raw file):

Previously, kormat (Stephen Shirley) wrote…

I assume this move was accidental, please revert :)

Done.


go/sig/egress/session/sessmon.go, line 58 at r1 (raw file):

Previously, kormat (Stephen Shirley) wrote…

This is now obsolete.

Done.


go/sig/egress/session/sessmon.go, line 152 at r1 (raw file):

Previously, kormat (Stephen Shirley) wrote…

I'm not sure that setting the health explicitly here does what we want. I think it's a no-op in any case, but if it isn't, having a unhealthy->healthy transition isn't helpful. If we're already in unhealthy, it's better for the health check to do the transition upon a successful probe.

Done.


go/sig/egress/session/sessmon.go, line 250 at r1 (raw file):

Previously, kormat (Stephen Shirley) wrote…

We should probably also support updating CtrlL4Port and EncapL4Port. Oh. That is not the correct field to be using. rpld.Addr is the src address of the reply. What we want is the contents of the reply.

Done.

Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 8 files at r1, 6 of 6 files at r2.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @kormat and @sustrik)


go/sig/egress/session/session.go, line 151 at r2 (raw file):

}

func (s *Session) Healthy() bool {

Can you move this up to where it originally was (between ID() and PathPool()) so that we don't have a diff for something that didn't change.


go/sig/egress/session/sessmon.go, line 182 at r2 (raw file):

func (sm *sessMonitor) getNewPath(old *egress.SessPath) *egress.SessPath {
	// TODO(sustrik): Don't pick paths that are about to expire.

I think it would make sense if the SessPathPool would not only consider but also Expiry. (But this should definitely not change in this PR)


go/sig/egress/session/sessmon.go, line 273 at r2 (raw file):

	} else {
		// This is going to happen if latency of the path is greater than the poll ticker period.
		sm.Debug("Reply to an old request received", "request", sm.updateMsgId, "reply", rpld.Id)

I wonder if that shouldn't be an error?

@sustrik sustrik added the i/breaking change PR that breaks forwards or backwards compatibility label Dec 4, 2018
Copy link
Contributor Author

@sustrik sustrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 11 files reviewed, 7 unresolved discussions (waiting on @lukedirtwalker, @kormat, and @sustrik)


go/sig/egress/session/session.go, line 151 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Can you move this up to where it originally was (between ID() and PathPool()) so that we don't have a diff for something that didn't change.

Done.


go/sig/egress/session/sessmon.go, line 273 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

I wonder if that shouldn't be an error?

If it's just one packet, then no, it may have just been delayed, so no big problem. If it happens consistently the path latency may be generally too high for us to cope with, e.g. if it's routed through a satellite. We should probably add some monitoring to catch the later case. Adding a comment.

Copy link
Contributor Author

@sustrik sustrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 11 files reviewed, 7 unresolved discussions (waiting on @lukedirtwalker and @kormat)


go/sig/egress/session/sessmon.go, line 182 at r2 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

I think it would make sense if the SessPathPool would not only consider but also Expiry. (But this should definitely not change in this PR)

Actually, it may be better if that's part of this PR. I've fixed the function to do more nuanced path selection.

Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @kormat and @sustrik)


go/sig/egress/interface.go, line 107 at r3 (raw file):

type SessPathPool map[spathmeta.PathKey]*SessPath

// Return the most suitable path. Exclude a specific path, if possible.

I think it should probably say what the metric for most suitable is. AFAIU it is non-expiring with least failures first, otherwise least failures first, otherwise the same again.
Is it intended that we possibly chose a path with a high amount of error only because some other path is expiring soon? I'm not sure what the best approach is here.


go/sig/egress/session/sessmon.go, line 177 at r3 (raw file):

func (sm *sessMonitor) getNewPath(old *egress.SessPath) *egress.SessPath {
	key := old.Key()

Is old guaranteed to be non-nil ?

Copy link
Contributor Author

@sustrik sustrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 11 files reviewed, 6 unresolved discussions (waiting on @lukedirtwalker and @kormat)


go/sig/egress/interface.go, line 107 at r3 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

I think it should probably say what the metric for most suitable is. AFAIU it is non-expiring with least failures first, otherwise least failures first, otherwise the same again.
Is it intended that we possibly chose a path with a high amount of error only because some other path is expiring soon? I'm not sure what the best approach is here.

As for the comment, I would keep it succint. Too much detail and the comment would eventually diverge from the implementation.

As for the best strategy, I have no strong perferences and I am open to suggestions. @kormat?


go/sig/egress/session/sessmon.go, line 177 at r3 (raw file):

Previously, lukedirtwalker (Lukas Vogel) wrote…

Is old guaranteed to be non-nil ?

Well spotted. Fixed.

Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 11 files reviewed, 4 unresolved discussions (waiting on @lukedirtwalker and @kormat)


go/sig/egress/interface.go, line 107 at r3 (raw file):

Previously, sustrik (Martin Sustrik) wrote…

As for the comment, I would keep it succint. Too much detail and the comment would eventually diverge from the implementation.

As for the best strategy, I have no strong perferences and I am open to suggestions. @kormat?

Ok works for me.

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 6 files at r2, 1 of 3 files at r3.
Reviewable status: 10 of 11 files reviewed, 5 unresolved discussions (waiting on @lukedirtwalker, @kormat, and @sustrik)


go/sig/egress/interface.go, line 84 at r4 (raw file):

type RemoteInfo struct {
	Sig      siginfo.Sig

I've noticed in a few places that you're using a siginfo.Sig value instead of pointer - why the change?


go/sig/egress/interface.go, line 108 at r4 (raw file):

// Return the most suitable path. Exclude a specific path, if possible.

Blank line.


go/sig/egress/interface.go, line 135 at r4 (raw file):

		return bestSessPath
	}
	// In the worst case return the excluded path. Given that the caller asked to exclude it

I'm not sure how you can end up here. If bestSessPath is nil, that implies there are no paths in the pool, so doing spp[*exclude] will panic.

TL;DR, i think you can just do:

	// Return a non-expiring path with least failures.
	if bestNonExpiringSessPath != nil {
		return bestNonExpiringSessPath
	}
        return bestSessPath

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r2, 1 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @sustrik)


go/sig/egress/session/session.go, line 51 at r4 (raw file):

	// pool of paths, managed by pathmgr
	pool egress.PathPool
	// remote SIG

This isn't used anymore.


go/sig/egress/session/sessmon.go, line 121 at r4 (raw file):

	if since > tout {
		// FIXME(kormat): these debug statements should be converted to prom metrics.
		sm.Debug("isessMonitor: Remote SIG timeout", "remote", sm.smRemote, "duration", since)

sessMonitor


go/sig/egress/session/sessmon.go, line 165 at r4 (raw file):

	if sm.smRemote.SessPath.IsCloseToExpiry() {
		sm.Debug("sessMonitor: Current path is about to expire", "remote", sm.smRemote)
		sm.sess.healthy.Store(true)

I don't think we need to change the health state here either.


go/sig/egress/session/sessmon.go, line 181 at r4 (raw file):

	}
	key := old.Key()
	return sm.sessPathPool.Get(&key)

Changing from PathKey to*PathKey is making things weeeird.


go/sig/egress/session/sessmon.go, line 240 at r4 (raw file):

		return
	}
	sm.lastReply = time.Now()

Hmm. We should only update this if the reply is a match. Otherwise bogus replies will stop timeouts from firing.


go/sig/egress/session/sessmon.go, line 256 at r4 (raw file):

		if sessRemote == nil ||
			!sm.smRemote.Sig.Host.Eq(sessRemote.Sig.Host) ||
			sm.smRemote.Sig.EncapL4Port != sessRemote.Sig.EncapL4Port {

It would be easier to implement an Eq method on siginfo.Sig.

Copy link
Contributor Author

@sustrik sustrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 11 files reviewed, 9 unresolved discussions (waiting on @kormat and @lukedirtwalker)


go/sig/egress/interface.go, line 84 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

I've noticed in a few places that you're using a siginfo.Sig value instead of pointer - why the change?

Because now it's always set, never nil.


go/sig/egress/interface.go, line 108 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Blank line.

Done.


go/sig/egress/interface.go, line 135 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

I'm not sure how you can end up here. If bestSessPath is nil, that implies there are no paths in the pool, so doing spp[*exclude] will panic.

TL;DR, i think you can just do:

	// Return a non-expiring path with least failures.
	if bestNonExpiringSessPath != nil {
		return bestNonExpiringSessPath
	}
        return bestSessPath

There can still be the excluded path in the pool as it was skipped in the loop above. We want to use it if there's no other option. No?


go/sig/egress/session/session.go, line 51 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

This isn't used anymore.

Done.


go/sig/egress/session/sessmon.go, line 121 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

sessMonitor

Done.


go/sig/egress/session/sessmon.go, line 165 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

I don't think we need to change the health state here either.

Done.


go/sig/egress/session/sessmon.go, line 181 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Changing from PathKey to*PathKey is making things weeeird.

OK, fair enough. Reverted to the previous way of doing things.


go/sig/egress/session/sessmon.go, line 240 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Hmm. We should only update this if the reply is a match. Otherwise bogus replies will stop timeouts from firing.

Done.


go/sig/egress/session/sessmon.go, line 256 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

It would be easier to implement an Eq method on siginfo.Sig.

Done.

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r5.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @sustrik)


go/sig/egress/interface.go, line 84 at r4 (raw file):

Previously, sustrik (Martin Sustrik) wrote…

Because now it's always set, never nil.

Ah. That's not the only reason we use pointers, though. As you've noticed, all the methods on siginfo.Sig have pointer receivers. The struct itself is also quite large, making a pointer reference probably more efficient to pass around. Plus, it contains a sync.RWMutex, which "must not be copied after first use" according to the docs - using a value siginfo.Sig makes it much easier to accidentally copy the struct, and break the mutex.


go/sig/egress/interface.go, line 135 at r4 (raw file):

Previously, sustrik (Martin Sustrik) wrote…

There can still be the excluded path in the pool as it was skipped in the loop above. We want to use it if there's no other option. No?

Ah, i see your point. There's no guarantee it is in the pool, though, so there needs to be another check.


go/sig/siginfo/sig.go, line 41 at r5 (raw file):

}

func (s *Sig) Eq(x *Sig) bool {

If you add one more bit, it can work safely on potentially nil Sigs:

if s == nil || x == nil {
    return s == x
}

Then you don't need to check for == nil before callingEq()

Copy link
Contributor Author

@sustrik sustrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 10 files reviewed, 3 unresolved discussions (waiting on @kormat and @sustrik)


go/sig/egress/interface.go, line 84 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Ah. That's not the only reason we use pointers, though. As you've noticed, all the methods on siginfo.Sig have pointer receivers. The struct itself is also quite large, making a pointer reference probably more efficient to pass around. Plus, it contains a sync.RWMutex, which "must not be copied after first use" according to the docs - using a value siginfo.Sig makes it much easier to accidentally copy the struct, and break the mutex.

I don't think it's being copied. Anyway, I think that may have been a bit of C-style thinking on my part. Reverted.


go/sig/egress/interface.go, line 135 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Ah, i see your point. There's no guarantee it is in the pool, though, so there needs to be another check.

This is what I found in Go docs: "If the requested key doesn't exist, we get the value type's zero value." In this case the map's value is a pointer, so the result will be nil, which is exactly what we want.

Copy link
Contributor Author

@sustrik sustrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 10 files reviewed, 3 unresolved discussions (waiting on @kormat)


go/sig/siginfo/sig.go, line 41 at r5 (raw file):

Previously, kormat (Stephen Shirley) wrote…

If you add one more bit, it can work safely on potentially nil Sigs:

if s == nil || x == nil {
    return s == x
}

Then you don't need to check for == nil before callingEq()

Done.

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 5 files at r6.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved


go/sig/egress/interface.go, line 135 at r4 (raw file):

Previously, sustrik (Martin Sustrik) wrote…

This is what I found in Go docs: "If the requested key doesn't exist, we get the value type's zero value." In this case the map's value is a pointer, so the result will be nil, which is exactly what we want.

Gah, ok, i keep getting that one wrong. I withdraw all of this thread :)

@sustrik
Copy link
Contributor Author

sustrik commented Dec 12, 2018

The tests here will be somewhat lacking because of #1390, but that's out of scope of this issue.

This is a breaking change. For SIGs to work properly they should be
added to the topology file, so that BR is aware of them when it
gets a packet sent to SvcSIG.

Fixes #2121
Copy link
Collaborator

@lukedirtwalker lukedirtwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 5 files at r6, 6 of 6 files at r7.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@sustrik sustrik merged commit af6cd69 into scionproto:master Dec 13, 2018
@sustrik sustrik deleted the use-svc-sig2 branch December 13, 2018 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i/breaking change PR that breaks forwards or backwards compatibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants