Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Fix bug where cluster gets stuck if node restarts without its data #10544

Merged
merged 4 commits into from
Nov 14, 2016

Conversation

a-robinson
Copy link
Contributor

@a-robinson a-robinson commented Nov 8, 2016

gossip: Don't allow two nodes to share the same address
When a new node registers with the same address as an existing node,
assume that the existing node has been replaced by the new one and
clear it out of the infoStore.

storage: Stop processing if target node doesn't contain desired store
Do so by adding a new roachpb error type that can be recognized in such
cases.

I'm open to suggestions on other approaches to the gossip commit. It looks as though our gossip depends on each piece of Info being immutable, so replacing the populated descriptor with an empty descriptor at a later TTL seemed like a better approach than trying to add a different way to remove an entry, but I may be missing a cleaner way.

I'm still working on getting a testcluster test in place. Sending this out now for feedback in the meantime. I've manually verified that this fixes the scenario in practice.

Fixes #10266

@tamird


This change is Reviewable

@a-robinson a-robinson force-pushed the raft branch 2 times, most recently from 2ab7c56 to d827efb Compare November 8, 2016 21:28
@a-robinson
Copy link
Contributor Author

Repushed to fix a broken test

@tamird
Copy link
Contributor

tamird commented Nov 8, 2016

Second commit needs a test?


Reviewed 5 of 5 files at r1, 1 of 1 files at r2, 7 of 7 files at r3.
Review status: all files reviewed at latest revision, 14 unresolved discussions, some commit checks pending.


pkg/gossip/gossip.go, line 193 at r1 (raw file):

  resolversTried map[int]struct{} // Set of attempted resolver indexes
  nodeDescs      map[roachpb.NodeID]*roachpb.NodeDescriptor
  nodeAddrs      map[util.UnresolvedAddr]roachpb.NodeID

is this a superset of bootstrapAddrs?


pkg/gossip/gossip.go, line 584 at r1 (raw file):

  // removed from the cluster. If that's the case, remove it from our map of
  // nodes to prevent other parts of the system from trying to talk to it.
  if desc.NodeID == 0 {

if desc == (roachpb.NodeDescriptor{})?


pkg/gossip/gossip.go, line 605 at r1 (raw file):

  // it from our set of tracked descriptors to ensure we don't attempt to
  // connect to its previous identity (as came up in issue #10266).
  // We ignore empty addresses for the sake of not breaking tests.

how many tests? It might be better to add a test-only node descriptor factory method.


pkg/gossip/gossip.go, line 608 at r1 (raw file):

  if desc.Address != (util.UnresolvedAddr{}) {
      if oldNodeID, ok := g.nodeAddrs[desc.Address]; ok && oldNodeID != desc.NodeID {
          log.Infof(ctx, "removing node %d which was at same address as new node %v", oldNodeID, desc)

log the address as well?


pkg/gossip/gossip.go, line 610 at r1 (raw file):

          log.Infof(ctx, "removing node %d which was at same address as new node %v", oldNodeID, desc)
          delete(g.nodeDescs, oldNodeID)
          bytes, err := protoutil.Marshal(&roachpb.NodeDescriptor{})

i think you should add a comment here to explain what's going on and that this is recursive because this is a gossip callback.


pkg/gossip/gossip.go, line 620 at r1 (raw file):

          }
      }
      g.nodeAddrs[desc.Address] = desc.NodeID

this map grows without bound, yeah? probably fine, just sayin'.


pkg/gossip/gossip_test.go, line 70 at r1 (raw file):

  g.SetNodeDescriptor(node1)
  g.SetNodeDescriptor(node2)
  if val, err := g.GetNodeDescriptor(1); err != nil {

shouldn't these magic numbers be node{1,2}.NodeID?


pkg/gossip/gossip_test.go, line 71 at r1 (raw file):

  g.SetNodeDescriptor(node2)
  if val, err := g.GetNodeDescriptor(1); err != nil {
      t.Errorf("error fetching node 1: %s", err)

nit: all these Errorfs where you already have an error could just be Errors - the extra verbosity doesn't help and only increases the probability of rot.


pkg/gossip/gossip_test.go, line 83 at r1 (raw file):

  // Give node3 the same address as node1, which should cause node1 to be
  // removed from the cluster.
  node3 := &roachpb.NodeDescriptor{NodeID: 3, Address: util.MakeUnresolvedAddr("tcp", "1.1.1.1:1")}

if it's the same as node1, use node1.Address rather than the same magic string.


pkg/gossip/gossip_test.go, line 92 at r1 (raw file):

  // Quiesce the stopper now to ensure that the update has propagated before
  // checking whether node 1 has been removed from the infoStore.

nice!


pkg/gossip/keys.go, line 117 at r1 (raw file):

// Returns an error if the key is not of the correct type or is not parsable.
func NodeIDFromKey(key string) (roachpb.NodeID, error) {
  trimmedKey := strings.TrimPrefix(key, KeyNodeIDPrefix+separator)

oddly, two calls to TrimPrefix would be more efficient, since you wouldn't allocate the concatenated string.


pkg/gossip/keys_test.go, line 50 at r1 (raw file):

      if err != nil {
          if tc.success {
              t.Errorf("%d: expected success, got error: %s", i, err)

nit: you could use subtests named after the keys instead of these indexes. that would produce better errors.


pkg/storage/store.go, line 2958 at r3 (raw file):

}

// HandleRaftResponse handles response messages from the raft transport. It

Just remove this comment and say that it implements whatever interface it implements.


pkg/storage/store.go, line 2970 at r3 (raw file):

          repl, err := s.GetReplica(resp.RangeID)
          if err != nil {
              return nil // not unexpected

NYC but this should really check for roachpb.RangeNotFoundError; anything else is unexpected.


Comments from Reviewable

@petermattis
Copy link
Collaborator

:lgtm:

@spencerkimball might want to take a look at the gossip changes.


Review status: all files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.


pkg/gossip/gossip.go, line 606 at r2 (raw file):

  // connect to its previous identity (as came up in issue #10266).
  // We ignore empty addresses for the sake of not breaking tests.
  if desc.Address != (util.UnresolvedAddr{}) {

Might be cleaner to add an UnresolvedAddr.IsEmpty helper method.


Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: 1 of 12 files reviewed at latest revision, 14 unresolved discussions.


pkg/gossip/gossip.go, line 193 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

is this a superset of bootstrapAddrs?

Yes, effectively. Do you think it's worth refactoring?

pkg/gossip/gossip.go, line 584 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

if desc == (roachpb.NodeDescriptor{})?

Unfortunately roachpb.NodeDescriptor isn't comparable due to its roachpb.Attributes field

pkg/gossip/gossip.go, line 605 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

how many tests? It might be better to add a test-only node descriptor factory method.

It's up in the double digits in just the storage package alone. I can go through and update them all if you'd like, but it'll probably lead to wasted time in the future as more tests are added without addresses specified and the developer has to figure out why their tests are timing out.

pkg/gossip/gossip.go, line 608 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

log the address as well?

Done.

pkg/gossip/gossip.go, line 610 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

i think you should add a comment here to explain what's going on and that this is recursive because this is a gossip callback.

Done.

pkg/gossip/gossip.go, line 620 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this map grows without bound, yeah? probably fine, just sayin'.

Yeah. I also assume it's probably fine. nodeDescs grows without bound as well, otherwise I'd remove entries from here whenever we removed them from nodeDescs

pkg/gossip/gossip.go, line 606 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Might be cleaner to add an UnresolvedAddr.IsEmpty helper method.

Done.

pkg/gossip/gossip_test.go, line 70 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

shouldn't these magic numbers be node{1,2}.NodeID?

I think it's pretty understandable in the context of the test, but sure

pkg/gossip/gossip_test.go, line 71 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: all these Errorfs where you already have an error could just be Errors - the extra verbosity doesn't help and only increases the probability of rot.

Sure, done.

pkg/gossip/gossip_test.go, line 83 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

if it's the same as node1, use node1.Address rather than the same magic string.

Good call, done.

pkg/gossip/keys.go, line 117 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

oddly, two calls to TrimPrefix would be more efficient, since you wouldn't allocate the concatenated string.

Hm, but wouldn't each call to TrimPrefix have to allocate its return value so we'd do that work twice? Also, I'd assume the compiler can concatenate two constants at compilation time.

Doing two separate trims would also mean we'd have to add a second comparison to avoid accidentally accepting keys like ":1" on which the first trim is a no-op.


pkg/gossip/keys_test.go, line 50 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: you could use subtests named after the keys instead of these indexes. that would produce better errors.

Done.

pkg/storage/store.go, line 2958 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Just remove this comment and say that it implements whatever interface it implements.

Done.

pkg/storage/store.go, line 2970 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

NYC but this should really check for roachpb.RangeNotFoundError; anything else is unexpected.

Added a log message for non-RangeNotFound errors. Look alright?

Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: 1 of 12 files reviewed at latest revision, 14 unresolved discussions, some commit checks pending.


pkg/gossip/keys.go, line 117 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Hm, but wouldn't each call to TrimPrefix have to allocate its return value so we'd do that work twice? Also, I'd assume the compiler can concatenate two constants at compilation time.

Doing two separate trims would also mean we'd have to add a second comparison to avoid accidentally accepting keys like ":1" on which the first trim is a no-op.

For fun, I did a quick benchmark of this. It might not be the ideal comparison, but at least in this particular case the concatenation version is better, even leaving out the extra comparison that would be needed for correctness in the trimming twice case:
package bench

import (
  "strings"
  "testing"
)

func BenchmarkTrimAndConcat(b *testing.B) {
  TrimAndConcat("node:33", b.N)
}

func BenchmarkTrimTwice(b *testing.B) {
  TrimTwice("node:33", b.N)
}

const KeyNodeIDPrefix = "node"
const separator = ":"

var foo int

func TrimAndConcat(key string, n int) {
  for k := 0; k < n; k++ {
    trimmedKey := strings.TrimPrefix(key, KeyNodeIDPrefix+separator)
    if trimmedKey != "" {
      foo++
    }
  }
}

func TrimTwice(key string, n int) {
  for k := 0; k < n; k++ {
    trimmedKey := strings.TrimPrefix(strings.TrimPrefix(key, KeyNodeIDPrefix), separator)
    if trimmedKey != "" {
      foo++
    }
  }
}
$ go test -bench=.
testing: warning: no tests to run
BenchmarkTrimAndConcat-8            300000000            4.77 ns/op
BenchmarkTrimTwice-8                200000000           10.2 ns/op
PASS

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 9, 2016

Reviewed 11 of 11 files at r4, 7 of 7 files at r5, 1 of 1 files at r6.
Review status: all files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.


pkg/gossip/gossip.go, line 193 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Yes, effectively. Do you think it's worth refactoring?

Yeah, I think so.

pkg/gossip/gossip.go, line 584 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Unfortunately roachpb.NodeDescriptor isn't comparable due to its roachpb.Attributes field

Gah. OK, can you note that in here? Basically I want to call out that this is not a node that doesn't have a node ID assigned. The comment kind of mentions it, but I still found this confusing when reading the first time through.

pkg/gossip/gossip.go, line 605 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

It's up in the double digits in just the storage package alone. I can go through and update them all if you'd like, but it'll probably lead to wasted time in the future as more tests are added without addresses specified and the developer has to figure out why their tests are timing out.

OK. Note that the comment is now malformed "tests the many tests".

pkg/gossip/gossip.go, line 624 at r4 (raw file):

          } else {
              key := MakeNodeIDKey(oldNodeID)
              if err := g.addInfoLocked(key, bytes, ttlNodeDescriptorGossip); err != nil {

actually, looking at this again, introducing SetNodeDescriptorLocked would save substantial boilerplate here. At the same time, you could update the comment on that method that says "...gossip network and sets the infostore's node ID." - it does not set the inforstore's node ID.


pkg/gossip/gossip_test.go, line 100 at r4 (raw file):

  // checking whether node 1 has been removed from the infoStore.
  stopper.Quiesce()
  if val, err := g.GetNodeDescriptor(node1.NodeID); err == nil {

can you use testutils.IsError or otherwise check for the specific error you expect?


pkg/gossip/keys.go, line 117 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

For fun, I did a quick benchmark of this. It might not be the ideal comparison, but at least in this particular case the concatenation version is better, even leaving out the extra comparison that would be needed for correctness in the trimming twice case:

package bench

import (
  "strings"
  "testing"
)

func BenchmarkTrimAndConcat(b *testing.B) {
  TrimAndConcat("node:33", b.N)
}

func BenchmarkTrimTwice(b *testing.B) {
  TrimTwice("node:33", b.N)
}

const KeyNodeIDPrefix = "node"
const separator = ":"

var foo int

func TrimAndConcat(key string, n int) {
  for k := 0; k < n; k++ {
    trimmedKey := strings.TrimPrefix(key, KeyNodeIDPrefix+separator)
    if trimmedKey != "" {
      foo++
    }
  }
}

func TrimTwice(key string, n int) {
  for k := 0; k < n; k++ {
    trimmedKey := strings.TrimPrefix(strings.TrimPrefix(key, KeyNodeIDPrefix), separator)
    if trimmedKey != "" {
      foo++
    }
  }
}
$ go test -bench=.
testing: warning: no tests to run
BenchmarkTrimAndConcat-8              300000000            4.77 ns/op
BenchmarkTrimTwice-8                  200000000           10.2 ns/op
PASS
Cool, thanks for benchmarking. I was expecting the double-trim to be faster because of my expectation that slicing a string never allocates. Perhaps that's incorrect?

pkg/gossip/keys_test.go, line 53 at r4 (raw file):

                  t.Errorf("expected success, got error: %s", err)
              }
              return

nit: else? also, we generally try to avoid checking for "error or no error" because that can produce false negatives as the implementation evolves. instead, we typically use an expectedErr string to match more precisely (empty meaning nil is expected).


pkg/storage/store.go, line 2970 at r3 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Added a log message for non-RangeNotFound errors. Look alright?

Why not return the unexpected error?

pkg/util/unresolved_addr.go, line 78 at r4 (raw file):

// IsEmpty returns true if the address has no network or address specified.
func (a *UnresolvedAddr) IsEmpty() bool {

value receiver, since you're dereferencing.


Comments from Reviewable

@bdarnell
Copy link
Contributor

:lgtm:


Reviewed 1 of 5 files at r1, 4 of 11 files at r4, 6 of 7 files at r5, 1 of 1 files at r6.
Review status: all files reviewed at latest revision, 13 unresolved discussions, all commit checks successful.


pkg/gossip/gossip.go, line 530 at r6 (raw file):

              return err
          }
          if desc.Address == (util.UnresolvedAddr{}) || desc.Address == g.mu.is.NodeAddr {

Use UnresolvedAddr.IsEmpty here.


pkg/gossip/gossip.go, line 605 at r6 (raw file):

  // it from our set of tracked descriptors to ensure we don't attempt to
  // connect to its previous identity (as came up in issue #10266).
  // We ignore empty addresses for the sake of not breaking tests the many

s/tests//


pkg/gossip/gossip.go, line 619 at r6 (raw file):

          // being updated but won't lead to deadlock because it's called
          // asynchronously.
          bytes, err := protoutil.Marshal(&roachpb.NodeDescriptor{})

As long as the proto has no required fields, you can just use an empty byte string.


pkg/gossip/gossip.go, line 621 at r6 (raw file):

          bytes, err := protoutil.Marshal(&roachpb.NodeDescriptor{})
          if err != nil {
              log.Errorf(ctx, "failed to marshal empty NodeDescriptor proto: %s", err)

This should probably be a fatal (if it's not removed completely) since any failure here should never happen, but if it did it would happen every time and we'd have reintroduced the original bug.


pkg/gossip/keys.go, line 114 at r6 (raw file):

}

// NodeIDFromKey attempts to extract a NodeID from the provided key.

Mention that they key should have been constructed by MakeNodeIDKey (and maybe move it up to be adjacent to that function).


pkg/storage/store.go, line 2970 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Why not return the unexpected error?

Because returning an error shuts down the connection, but an unexpected error doesn't indicate anything wrong with the connection.

Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 14 unresolved discussions, all commit checks successful.


pkg/gossip/gossip.go, line 193 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Yeah, I think so.

Ehhh, after looking at it more closely I'm a little hesitant to mess with it. Storing node IDs in bootstrapAddrs would be awkward since we wouldn't want to store the descriptors to disk along with the addrs (since that could introduce new weird cases where they change) but don't have them yet when we're regenerating bootstrapAddrs on startup. I'm going to keep them separate for simplicity unless you feel strongly about it.

pkg/gossip/gossip.go, line 584 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Gah. OK, can you note that in here? Basically I want to call out that this is not a node that doesn't have a node ID assigned. The comment kind of mentions it, but I still found this confusing when reading the first time through.

Done. I've also added a check that the address is empty, for good measure.

pkg/gossip/gossip.go, line 605 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

OK. Note that the comment is now malformed "tests the many tests".

Done.

pkg/gossip/gossip.go, line 624 at r4 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

actually, looking at this again, introducing SetNodeDescriptorLocked would save substantial boilerplate here. At the same time, you could update the comment on that method that says "...gossip network and sets the infostore's node ID." - it does not set the inforstore's node ID.

Updated the comment, but didn't add the chain of Locked methods (SetNodeDescriptorLocked and AddInfoProtoLocked to go along with addInfoLocked) because I just switched to using an empty byte array literal at Ben's suggestion.

pkg/gossip/gossip.go, line 530 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Use UnresolvedAddr.IsEmpty here.

Done.

pkg/gossip/gossip.go, line 605 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/tests//

Done.

pkg/gossip/gossip.go, line 619 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

As long as the proto has no required fields, you can just use an empty byte string.

That struck me as being a little too clever to be worth the simplification when I was initially writing this, but I've made the change now along with a comment.

pkg/gossip/gossip.go, line 621 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This should probably be a fatal (if it's not removed completely) since any failure here should never happen, but if it did it would happen every time and we'd have reintroduced the original bug.

Done.

pkg/gossip/gossip_test.go, line 100 at r4 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

can you use testutils.IsError or otherwise check for the specific error you expect?

Done.

pkg/gossip/keys.go, line 117 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Cool, thanks for benchmarking. I was expecting the double-trim to be faster because of my expectation that slicing a string never allocates. Perhaps that's incorrect?

Well, I assume constant folding is keeping the concatenation from ever happening at runtime.

Your assumption about slicing is correct -- TrimPrefix doesn't do any allocations. So perhaps it's the function call overhead that's affecting this benchmark.


pkg/gossip/keys.go, line 114 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Mention that they key should have been constructed by MakeNodeIDKey (and maybe move it up to be adjacent to that function).

Done.

pkg/gossip/keys_test.go, line 53 at r4 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: else? also, we generally try to avoid checking for "error or no error" because that can produce false negatives as the implementation evolves. instead, we typically use an expectedErr string to match more precisely (empty meaning nil is expected).

Switched to an else. This is a simple enough function that I'm going to skip changing up the error checking in the test this time, but I'll do that next time.

pkg/storage/store.go, line 2970 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Because returning an error shuts down the connection, but an unexpected error doesn't indicate anything wrong with the connection.

What Ben said.

pkg/util/unresolved_addr.go, line 78 at r4 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

value receiver, since you're dereferencing.

Done.

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 11, 2016

Reviewed 12 of 12 files at r7.
Review status: 5 of 12 files reviewed at latest revision, 11 unresolved discussions, all commit checks successful.


pkg/gossip/gossip.go, line 193 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Ehhh, after looking at it more closely I'm a little hesitant to mess with it. Storing node IDs in bootstrapAddrs would be awkward since we wouldn't want to store the descriptors to disk along with the addrs (since that could introduce new weird cases where they change) but don't have them yet when we're regenerating bootstrapAddrs on startup. I'm going to keep them separate for simplicity unless you feel strongly about it.

I'm not really following you here. What descriptors are you talking about? AFAICT only addresses and node IDs are present here, and it seems easy enough to ignore the node IDs wherever bootstrapAddrs is currently used.

pkg/gossip/gossip.go, line 626 at r7 (raw file):

          // asynchronously.
          key := MakeNodeIDKey(oldNodeID)
          emptyProto := []byte{}

doesn't this still allocate? I'd have expected var emptyProto []byte


pkg/gossip/gossip_test.go, line 103 at r7 (raw file):

  expectedErr := "unable to look up descriptor for node"
  if val, err := g.GetNodeDescriptor(node1.NodeID); !testutils.IsError(err, expectedErr) {
      t.Errorf("expected error %q fetching node %d, got node %+v", expectedErr, node1.NodeID, val)

log the unexpected error


pkg/util/unresolved_addr.go, line 79 at r7 (raw file):

// IsEmpty returns true if the address has no network or address specified.
func (a UnresolvedAddr) IsEmpty() bool {
  return (a == (UnresolvedAddr{}))

return a == (UnresolvedAddr{})


Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: 5 of 12 files reviewed at latest revision, 11 unresolved discussions, all commit checks successful.


pkg/gossip/gossip.go, line 193 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I'm not really following you here. What descriptors are you talking about? AFAICT only addresses and node IDs are present here, and it seems easy enough to ignore the node IDs wherever bootstrapAddrs is currently used.

Sorry, I meant to write "node IDs" where I wrote "descriptors".

It feels like combining the different uses makes the code a little less understandable, but I've thrown one way of refactoring it into an additional commit. Let me know what you think.


pkg/gossip/gossip.go, line 626 at r7 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

doesn't this still allocate? I'd have expected var emptyProto []byte

Yeah, I just tend to like the explicitness of initializing it and the safety of not passing nil into unknown code. Switched over to avoid the allocation though, since I realize my preferences are silly.

pkg/gossip/gossip_test.go, line 103 at r7 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

log the unexpected error

Done.

pkg/util/unresolved_addr.go, line 79 at r7 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

return a == (UnresolvedAddr{})

Done.

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 11, 2016

:lgtm:


Reviewed 2 of 3 files at r8, 6 of 7 files at r9, 1 of 1 files at r10, 1 of 1 files at r11.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


pkg/gossip/gossip.go, line 685 at r11 (raw file):

          return nil, err
      }
      if nodeDescriptor.NodeID != 0 {

this is checking the empty, right? should mirror the other site that checks for empty descriptor


Comments from Reviewable

@a-robinson
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


pkg/gossip/gossip.go, line 685 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this is checking the empty, right? should mirror the other site that checks for empty descriptor

Good catch, done.

Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Nov 11, 2016

Reviewed 6 of 7 files at r13, 1 of 1 files at r14, 1 of 1 files at r15.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


Comments from Reviewable

When a new node registers with the same address as an existing node,
assume that the existing node has been replaced by the new one and
clear it out of the infoStore.
Do so by adding a new roachpb error type that can be recognized in such
cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: restarting node without its data causes other nodes to get stuck
4 participants