Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distsql: introduce SpanResolver #10575

Merged
merged 2 commits into from
Nov 22, 2016

Conversation

andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Nov 9, 2016

This change is Reviewable

@andreimatei
Copy link
Contributor Author

still working on more tests


Review status: 0 of 18 files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@RaduBerinde
Copy link
Member

Thanks a lot for getting this done!! WOOOHOO

Most comments I have are from the perspective of a prospective user. Someone more familiar with the lower layers that we're using should also take a look


Review status: 0 of 18 files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.


pkg/sql/distsql/span_resolver.go, line 74 at r2 (raw file):

const (
  // Ascending means Key is inclusive and EndKey is exclusive.
  Ascending ScanType = iota

Can't we use an existing type for this (like encoding.Direction)? Or if the direction is actually not important and we only care about which end is inclusive, maybe name them accordingly (eg StartInclusive, EndInclusive)


pkg/sql/distsql/span_resolver.go, line 93 at r2 (raw file):

  case BinPackingLHGuessingPolicy:
      oracle = &binPackingOracle{
          // This number is based on nothing.

duplicate comment


pkg/sql/distsql/span_resolver.go, line 108 at r2 (raw file):

}

func (sr *SpanResolver) resolveSpan(

could use a comment


pkg/sql/distsql/span_resolver.go, line 132 at r2 (raw file):

  var leaseHolders []kv.ReplicaInfo
  ri := kv.NewRangeIterator(sr.distSender, scanType == Descending)

can/should we reuse the iterator? it may make sense to continue seeking from where we left off


pkg/sql/distsql/span_resolver.go, line 160 at r2 (raw file):

      }
  }
  return leaseHolders, nil

If ri.Seek() runs into an error, shouldn't we return ri.Error()?


pkg/sql/distsql/span_resolver.go, line 166 at r2 (raw file):

// spans are distributed across kv ranges.
// For each input span it returns an array of results containing information
// about all the ranges spanned by the respective span. Each range is

What is not ideal here is that if we have a lot of spans within a replica, we have a separate copy of the same ReplicaInfo for each span. I'm thinking that perhaps in the future we will want have a route-by-range that uses this API with a lot of small spans (rows).

The calling code will have to go through all the spans and do some work to trim the spans to the range boundaries, so it could just as easily handle a []kv.ReplicaInfo which just contains all ranges touched by the spans (in order).


pkg/sql/distsql/span_resolver.go, line 169 at r2 (raw file):

// represented by a RangeInfo, which identifies the replica that is believed
// to be the current lease holder of the range. Information about connecting to
// the node owning that replica is also included.

Is it guaranteed that the ranges for each span cover the span without gaps? (is the end key of each range the same with the start key of the next?)


pkg/sql/distsql/span_resolver.go, line 177 at r2 (raw file):

// spans but not necessarily the EndKey (so StartKey is inclusive, EndKey is
// exclusive). If it's Descending then the opposite holds. Regardless of
// scanType, the input spans still need to be sorted ascendingly.

should we verify the sort order and error out or panic, in case the higher layers are doing something wrong?


pkg/sql/distsql/span_resolver.go, line 206 at r2 (raw file):

  } else {
      // Iterate backwards so that prefetching helps us across spans.
      for i := len(spans) - 1; i >= 0; i-- {

we could have a single loop for i := start; i != end; i += step {} and initialize start, end, step accordingly


pkg/sql/distsql/span_resolver.go, line 208 at r2 (raw file):

      for i := len(spans) - 1; i >= 0; i-- {
          var err error
          leaseHolders[i], err = sr.resolveSpan(ctx, spans[i], scanType, rangesPerNode)

We could check if the span is already completely inside the last kv.ReplicaInfo and skip the call in that case. This ties in with the comment above to maybe just return a single list of replica infos.


pkg/sql/distsql/span_resolver.go, line 233 at r2 (raw file):

  // about any of the nodes that might be tried.
  GuessLeaseHolder(
      desc roachpb.RangeDescriptor, rangesPerLeaseHolder map[roachpb.NodeID]uint,

Maybe rangesPerLeaseHolder belongs in a separate struct where we explain that it contains stuff that oracle implementations can use to make their decisions. It would make it cleaner and easier to add more stuff.

Also, why uint? I'd prefer int if you don't have a strong reason (unexpected negative values are usually easier to debug vs underflows).


pkg/sql/distsql/span_resolver.go, line 298 at r2 (raw file):

  // full. Use the least-loaded one.
  return replicas[leastLoadedIdx], nil

[nit] extra line


pkg/sql/distsql/span_resolver_test.go, line 199 at r2 (raw file):

// `CREATE TABLE test (k INT PRIMARY KEY)` at row with value pk (the row will be
// the first on the right of the split).
func splitRangeAtKey(

We already have SQL for this, why not use that?


pkg/sql/distsql/span_resolver_test.go, line 220 at r2 (raw file):

  pik := primaryIndexKey(tableDesc, parser.NewDInt(parser.DInt(pk)))
  startKey := keys.MakeFamilyKey(pik, uint32(tableDesc.Families[0].ID))
  leftRange, rightRange, err := ts.SplitRange(startKey)

note that in the implementation of SPLIT AT, we had to tolerate occasional conflict updating range descriptors errors.. If you keep this code please run the tests under stress to make sure they're not flaky. Come to think of it, I believe @tamird ensured that it's done for you now :)


Comments from Reviewable

@andreimatei
Copy link
Contributor Author

Transformed the code into an iterator, see if you like it. I think it works more naturally with the lower layer, that's also an iterator, and it makes more clear the significance of that "scan direction" argument, and also makes it more natural for a client to mix directions between spans... Plus I also think we'll need to use an iteration model for SELECT * FROM t LIMIT 1 from day one... And probably also that query + a filter.


Review status: 0 of 19 files reviewed at latest revision, 14 unresolved discussions, some commit checks failed.


pkg/sql/distsql/span_resolver.go, line 74 at r2 (raw file):

Previously, RaduBerinde wrote…

Can't we use an existing type for this (like encoding.Direction)? Or if the direction is actually not important and we only care about which end is inclusive, maybe name them accordingly (eg StartInclusive, EndInclusive)

I've considered various alternatives... - `encoding.Direction` is a much lower-level concept. - the direction is actually relevant to the implementation (as an optimization), and might also become important to the interface too if we move this Resolver to an iterator model in the future. - the "scan direction" concept with its associated semantics about which end is inclusive is very pervasive in our code base, for better or worse. Shows up all over the place, in different APIs and caches. Everybody and their dog takes a `reverse bool`, which nobody explains. I couldn't bare that bool any more, with all the comments at call sites about what it is. The enum is more evident and also gives you a place to hang documentation off of. I think it'd be nice if some of the lower levels moved to more explicit arguments, but at this level here I think it's fine.

pkg/sql/distsql/span_resolver.go, line 166 at r2 (raw file):

Previously, RaduBerinde wrote…

What is not ideal here is that if we have a lot of spans within a replica, we have a separate copy of the same ReplicaInfo for each span. I'm thinking that perhaps in the future we will want have a route-by-range that uses this API with a lot of small spans (rows).

The calling code will have to go through all the spans and do some work to trim the spans to the range boundaries, so it could just as easily handle a []kv.ReplicaInfo which just contains all ranges touched by the spans (in order).

the iterator now makes this optimization

pkg/sql/distsql/span_resolver.go, line 169 at r2 (raw file):

Previously, RaduBerinde wrote…

Is it guaranteed that the ranges for each span cover the span without gaps? (is the end key of each range the same with the start key of the next?)

Yes. The lower levels take care of this, in the face of inconsistent range descriptor reads racing with splits.

pkg/sql/distsql/span_resolver.go, line 233 at r2 (raw file):

Previously, RaduBerinde wrote…

Maybe rangesPerLeaseHolder belongs in a separate struct where we explain that it contains stuff that oracle implementations can use to make their decisions. It would make it cleaner and easier to add more stuff.

Also, why uint? I'd prefer int if you don't have a strong reason (unexpected negative values are usually easier to debug vs underflows).

Done. I agree with the separate struct. Initially I wanted to make it a black box implemented by each oracle, but the code above the iterator needs some access to updating it because, if a lease holder is known, the oracle is not involved, so... meh.

About the uint... would you then say that uint is pretty much never to be used (unless you need the 1 bit of extra range)? But done.


Comments from Reviewable

@RaduBerinde
Copy link
Member

:lgtm_strong:

But also get a review from someone with deeper `kv` knowledge, maybe @petermattis

Review status: 0 of 19 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


pkg/sql/distsql/span_resolver.go, line 74 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I've considered various alternatives...

  • encoding.Direction is a much lower-level concept.
  • the direction is actually relevant to the implementation (as an optimization), and might also become important to the interface too if we move this Resolver to an iterator model in the future.
  • the "scan direction" concept with its associated semantics about which end is inclusive is very pervasive in our code base, for better or worse. Shows up all over the place, in different APIs and caches. Everybody and their dog takes a reverse bool, which nobody explains. I couldn't bare that bool any more, with all the comments at call sites about what it is. The enum is more evident and also gives you a place to hang documentation off of.
    I think it'd be nice if some of the lower levels moved to more explicit arguments, but at this level here I think it's fine.
👍

pkg/sql/distsql/span_resolver.go, line 169 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Yes.
The lower levels take care of this, in the face of inconsistent range descriptor reads racing with splits.

Woot

pkg/sql/distsql/span_resolver.go, line 233 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Done. I agree with the separate struct. Initially I wanted to make it a black box implemented by each oracle, but the code above the iterator needs some access to updating it because, if a lease holder is known, the oracle is not involved, so... meh.

About the uint... would you then say that uint is pretty much never to be used (unless you need the 1 bit of extra range)? But done.

In general, yeah.. And seems that Go went that way with e.g. `len()` and stuff. Just a suggestion though, feel free to ignore.

pkg/sql/distsql/span_resolver.go, line 171 at r3 (raw file):

// Seek positions the iterator on the start of a span.
// After calling this, ReplicaInfo() will return information about the first

Maybe "the range containing the start of the span" is a bit more explicit


pkg/sql/distsql/span_resolver.go, line 173 at r3 (raw file):

// After calling this, ReplicaInfo() will return information about the first
// range of the span. NeedAnother() will return true until the iterator is
// positioned on the end of the span.

on or after the end of the span?


Comments from Reviewable

@andreimatei
Copy link
Contributor Author

A few changes:

  • added a Desc() method for getting the current RangeDesc
  • removed all language about the scan direction affecting the inclusive nature of EndKey

@RaduBerinde, I've kept the resolving of nodes to addresses inside the iterator, even though it could be done outside and only once per node. This is because, when an oracle is involved, we need to resolve through gossip to get the node attributes. And also because we avoid returning replicas on nodes that are not available in gossip.
The sanity of the latter point is dubious, but it's also what the DistSender is doing.

In any case, hopefully this isn't expensive. If it is, the DistSender has the same problem and we can likely improve the gossip lookup process.


Review status: 0 of 21 files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


pkg/sql/distsql/span_resolver.go, line 93 at r2 (raw file):

Previously, RaduBerinde wrote…

duplicate comment

Done.

pkg/sql/distsql/span_resolver.go, line 108 at r2 (raw file):

Previously, RaduBerinde wrote…

could use a comment

Done.

pkg/sql/distsql/span_resolver.go, line 132 at r2 (raw file):

Previously, RaduBerinde wrote…

can/should we reuse the iterator? it may make sense to continue seeking from where we left off

sorry, I don't follow. If this condition passes, we will reuse the `it.it`.

pkg/sql/distsql/span_resolver.go, line 177 at r2 (raw file):

Previously, RaduBerinde wrote…

should we verify the sort order and error out or panic, in case the higher layers are doing something wrong?

no longer applicable

pkg/sql/distsql/span_resolver.go, line 206 at r2 (raw file):

Previously, RaduBerinde wrote…

we could have a single loop for i := start; i != end; i += step {} and initialize start, end, step accordingly

no longer applicable

pkg/sql/distsql/span_resolver.go, line 208 at r2 (raw file):

Previously, RaduBerinde wrote…

We could check if the span is already completely inside the last kv.ReplicaInfo and skip the call in that case. This ties in with the comment above to maybe just return a single list of replica infos.

no longer applicable

pkg/sql/distsql/span_resolver.go, line 233 at r2 (raw file):

Previously, RaduBerinde wrote…

In general, yeah.. And seems that Go went that way with e.g. len() and stuff. Just a suggestion though, feel free to ignore.

yeah, I don't disagree.

pkg/sql/distsql/span_resolver.go, line 298 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] extra line

Done.

pkg/sql/distsql/span_resolver.go, line 171 at r3 (raw file):

Previously, RaduBerinde wrote…

Maybe "the range containing the start of the span" is a bit more explicit

Done.

pkg/sql/distsql/span_resolver.go, line 173 at r3 (raw file):

Previously, RaduBerinde wrote…

on or after the end of the span?

Done.

pkg/sql/distsql/span_resolver_test.go, line 199 at r2 (raw file):

Previously, RaduBerinde wrote…

We already have SQL for this, why not use that?

I might use it... Let me play around.

pkg/sql/distsql/span_resolver_test.go, line 220 at r2 (raw file):

Previously, RaduBerinde wrote…

note that in the implementation of SPLIT AT, we had to tolerate occasional conflict updating range descriptors errors.. If you keep this code please run the tests under stress to make sure they're not flaky. Come to think of it, I believe @tamird ensured that it's done for you now :)

ack

Comments from Reviewable

@RaduBerinde
Copy link
Member

:lgtm:


Review status: 0 of 21 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/sql/index_selection.go, line 950 at r5 (raw file):

// Encodes datum at the end of key, using direction `dir` for the encoding.
// It takes in an inclusive key and return an inclusive key if

returns


pkg/sql/distsql/span_resolver_test.go, line 450 at r5 (raw file):

      res = append(res, repls)
  }
  return res, nil

If we exit the loop because it.Valid() is false, we should return it.Error() no? Maybe doing if !it.Valid() { return nil. it.Error() in the loop makes more sense.


Comments from Reviewable

@bdarnell
Copy link
Contributor

:lgtm:


Reviewed 6 of 9 files at r1, 3 of 13 files at r4, 12 of 12 files at r5.
Review status: all files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


pkg/sql/distsql/span_resolver.go, line 35 at r5 (raw file):

// When guessing lease holders, we try to guess the same node for all the ranges
// applicable, until we hit this limit. The rationale is that maybe a bunch of
// those ranges don't have an active lease, so our guess is going to be

Once we have epoch-based leases, it will be rare for a range to not have an active lease.


pkg/sql/distsql/span_resolver.go, line 82 at r5 (raw file):

  // preference to the current node and others "close" to it when guessing lease
  // holders.
  // TODO(andrei): can the descriptor change at runtime?

Not currently, and there are other places where we store copies of the NodeDescriptor so we'd have to clean all of those up before we could allow runtime changes to the descriptor. Unlikely to happen any time soon.


pkg/sql/distsql/span_resolver.go, line 304 at r5 (raw file):

  // A RangeUnavailableError can be returned if there's no information in gossip
  // about any of the nodes that might be tried.
  GuessLeaseHolder(

This method seems misnamed. These aren't really trying to guess where the lease holder is, they're choosing where we would prefer the lease go if it's free.


pkg/sql/sqlbase/errors.go, line 427 at r5 (raw file):

}

// ErrRangeUnavailable represents a missing database error.

s/database/range/


pkg/sql/sqlbase/errors.go, line 435 at r5 (raw file):

}

// NewRangeUnavailableError creates a new ErrRangeUnavailable.

I know you're just following the pattern in this file, but it's unusual for the constructor name and the type name to differ. Most of our error types use the suffix Error. (why have I mentioned this in three PRs in the same day after never being an issue before?)


pkg/testutils/testcluster/testcluster.go, line 248 at r5 (raw file):

  splitKey roachpb.Key,
) (roachpb.RangeDescriptor, roachpb.RangeDescriptor, error) {
  return tc.Servers[0].SplitRange(splitKey)

Merge conflict warning: I think @vivekmenezes will need to change this method in #10728.


Comments from Reviewable

@andreimatei
Copy link
Contributor Author

Unfortunately the tests were flaky because of #10751, whose investigation showed a number of weird things in how we manage the RangeCache; I've had to add a sleep in the test :) to make the test not flake. I think I'll merge like this and will work on fixing the various caching problems.


Review status: all files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


pkg/sql/index_selection.go, line 950 at r5 (raw file):

Previously, RaduBerinde wrote…

returns

Done.

pkg/sql/distsql/span_resolver.go, line 160 at r2 (raw file):

Previously, RaduBerinde wrote…

If ri.Seek() runs into an error, shouldn't we return ri.Error()?

no longer applicable

pkg/sql/distsql/span_resolver.go, line 35 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Once we have epoch-based leases, it will be rare for a range to not have an active lease.

yeah, I'm aware of this impedance mismatch. However I don't know what to do about it just now.

My plan is to move the whole planning process for large queries to an iterative model, where we plan the first ranges we want to read, then plan some more as we go. At that point, this SpanResolverIterator would resolve ranges in advance, diminishing the role of "guessing".
Separately, for small queries, I plan to do something so that information about bad planning flows back to the gateway node, updating its caches. We have that today with the DistSender, but we're losing it with DistSQL. This will further diminish the role of guessing for common queries.
And, of course, separate for all of these, we need some new plan for having data be located in accordance with access patterns. Not sure what to do here; generally it's a hard optimization problem. It'd probably be easy to get back the crude what we had today and are losing with epoch-based leases - we could add a queue that releases the leases for ranges that haven't been accessed in a while (I think it'd be easy to adapt the lease transfer code to work for "releasing" a lease too). But that's not actually what we want.


pkg/sql/distsql/span_resolver.go, line 82 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Not currently, and there are other places where we store copies of the NodeDescriptor so we'd have to clean all of those up before we could allow runtime changes to the descriptor. Unlikely to happen any time soon.

Done.

pkg/sql/distsql/span_resolver.go, line 304 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This method seems misnamed. These aren't really trying to guess where the lease holder is, they're choosing where we would prefer the lease go if it's free.

well, the method does both. I'll take another name suggestion.

pkg/sql/distsql/span_resolver_test.go, line 450 at r5 (raw file):

Previously, RaduBerinde wrote…

If we exit the loop because it.Valid() is false, we should return it.Error() no? Maybe doing if !it.Valid() { return nil. it.Error() in the loop makes more sense.

Done. good catch

pkg/sql/sqlbase/errors.go, line 427 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/database/range/

Done.

pkg/sql/sqlbase/errors.go, line 435 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I know you're just following the pattern in this file, but it's unusual for the constructor name and the type name to differ. Most of our error types use the suffix Error. (why have I mentioned this in three PRs in the same day after never being an issue before?)

Done.

pkg/testutils/testcluster/testcluster.go, line 248 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Merge conflict warning: I think @vivekmenezes will need to change this method in #10728.

I'm aware of that change, but I'm not sure what change to this `TestServer` method you have in mind exactly.

Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: 4 of 21 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


pkg/sql/distsql/span_resolver.go, line 304 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

well, the method does both. I'll take another name suggestion.

Where is it guessing? I'd suggest something like ChoosePreferredLeaseHolder for the name.

pkg/sql/distsql/span_resolver_test.go, line 220 at r7 (raw file):

      Knobs: base.TestingKnobs{
          Store: &storage.StoreTestingKnobs{
              // Disable the split queue, as it can rage with the splitting done

s/rage/race/


pkg/testutils/testcluster/testcluster.go, line 248 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm aware of that change, but I'm not sure what change to this TestServer method you have in mind exactly.

I'm arguing there that the callers of AdminSplit should be responsible for retrying instead of the lower levels.

Comments from Reviewable

@andreimatei
Copy link
Contributor Author

Review status: 3 of 20 files reviewed at latest revision, 9 unresolved discussions.


pkg/sql/distsql/span_resolver.go, line 304 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote… > Where is it guessing? I'd suggest something like ChoosePreferredLeaseHolder for the name.
Done.

pkg/sql/distsql/span_resolver_test.go, line 220 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote… > s/rage/race/
Done.

pkg/testutils/testcluster/testcluster.go, line 248 at r5 (raw file):

Previously, bdarnell (Ben Darnell) wrote… > I'm arguing there that the callers of AdminSplit should be responsible for retrying instead of the lower levels.
seems like you're losing that argument :) In any case, if anything, `TestServer.SplitRange` will need to change, not this `TestCluster.SplitRange`.

Comments from Reviewable

... from TestCluster, so that they can be used by tests that only need a
single node. Also change their results from pointers to vals.
SpanResolver will be used by physical planning for figuring out the
mapping between key spans and kv ranges.
@andreimatei andreimatei merged commit ce5c9a2 into cockroachdb:master Nov 22, 2016
@andreimatei andreimatei deleted the lift-leader-cache branch November 22, 2016 19:19
@RaduBerinde
Copy link
Member

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants