Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stop using protos as the expected value in CPut #38308

Closed
danhhz opened this issue Jun 19, 2019 · 1 comment · Fixed by #49135
Closed

stop using protos as the expected value in CPut #38308

danhhz opened this issue Jun 19, 2019 · 1 comment · Fixed by #49135
Assignees
Labels
A-kv-client Relating to the KV client and the KV interface. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.

Comments

@danhhz
Copy link
Contributor

danhhz commented Jun 19, 2019

#38147 and #38004 each introduced a regression that broke RangeDescriptor updates (replica changes, splits, merges) after a cluster was upgraded from a prior version because of a bad interaction between CPut and protos.

Both bugs where the same general idea: A non-nullable field was added to RangeDescriptor (or ReplicaDescriptor, which is stored on RangeDescriptor). If a RangeDescriptor was written by an old version of cockroach, it wouldn't have the new field. Whenever we update RangeDescriptor, we do it with CPut to catch races. So a new binary would read the old serialized descriptor from kv (which doesn't have the new field), parse it into a RangeDescriptor, compute the update, then try to CPut the new RangeDescriptor. Proto2 includes all fields in the serialization, including ones with the default value, so it would include the new field. This wouldn't match what was in kv and the CPut would return a ConditionFailedError.

We could work around these two bugs by adding the fields as nullable, but that has an allocation cost. Worse though, the proto spec is pretty explicit about not relying on serialization of a given message to always produce exactly the same bytes. From https://developers.google.com/protocol-buffers/docs/encoding#implications

    Do not assume the byte output of a serialized message is stable.

Instead, what we should do is fetch the raw roachpb.Value from kv for the old RangeDescriptor, parse it into a RangeDescriptor and use that to compute the update, but also pass along the actual roachpb.Value we got. Then the CPut will use the roachpb.Value, which is guaranteed to match. If we need to do any equality of protos, we should be doing it using the Equal method on the parsed versions.

Further, there are more places where we CPut using a parsed proto message as the expected value, which are also footguns waiting to happen. We'll fix all of them and prevent future occurrences by changing the signature of CPut to accept a *roachpb.Value instead of interface{} for the expected value.

@danhhz danhhz added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. A-kv-client Relating to the KV client and the KV interface. labels Jun 19, 2019
@danhhz danhhz self-assigned this Jun 19, 2019
danhhz added a commit to danhhz/cockroach that referenced this issue Jun 19, 2019
In cockroachdb#38147, a new non-nullable field was added to the ReplicaDescriptor
proto that is serialized inside RangeDescriptors. RangeDescriptors are
updated using CPut to detect races. This means that if a RangeDescriptor
had been written by an old version of cockroach and we then attempt to
update it, the CPut will fail because the encoded version of it is
different (non-nullable proto2 fields are always included).

A similar issue was introduced in cockroachdb#38004 which made the StickyBit field
on RangeDescriptor non-nullable.

We could keep the fields as nullable, but this has allocation costs (and
is also more annoying to work with).

Worse, the proto spec is pretty explicit about not relying on
serialization of a given message to always produce exactly the same
bytes:

    From https://developers.google.com/protocol-buffers/docs/encoding#implications
    Do not assume the byte output of a serialized message is stable.

So instead, we've decided to stop CPut-ing protos and to change the
interface of CPut to take the expected value as bytes. To CPut a proto,
the encoded value will be read from kv, passed around with the decoded
struct, and used in the eventual CPut. This work is upcoming, but the
above two PRs are real breakages, so this commit fixes those first.

Neither of these PRs made the last alpha so no release note.

Touches cockroachdb#38308

Release note: None
danhhz added a commit to danhhz/cockroach that referenced this issue Jun 20, 2019
In cockroachdb#38147, a new non-nullable field was added to the ReplicaDescriptor
proto that is serialized inside RangeDescriptors. RangeDescriptors are
updated using CPut to detect races. This means that if a RangeDescriptor
had been written by an old version of cockroach and we then attempt to
update it, the CPut will fail because the encoded version of it is
different (non-nullable proto2 fields are always included).

A similar issue was introduced in cockroachdb#38004 which made the StickyBit field
on RangeDescriptor non-nullable.

We could keep the fields as nullable, but this has allocation costs (and
is also more annoying to work with).

Worse, the proto spec is pretty explicit about not relying on
serialization of a given message to always produce exactly the same
bytes:

    From https://developers.google.com/protocol-buffers/docs/encoding#implications
    Do not assume the byte output of a serialized message is stable.

So instead, we've decided to stop CPut-ing protos and to change the
interface of CPut to take the expected value as bytes. To CPut a proto,
the encoded value will be read from kv, passed around with the decoded
struct, and used in the eventual CPut. This work is upcoming, but the
above two PRs are real breakages, so this commit fixes those first.

Neither of these PRs made the last alpha so no release note.

Touches cockroachdb#38308

Release note: None
craig bot pushed a commit that referenced this issue Jun 20, 2019
38302: storage: CPut bytes instead of a proto when updating RangeDescriptors r=tbg a=danhhz

In #38147, a new non-nullable field was added to the ReplicaDescriptor
proto that is serialized inside RangeDescriptors. RangeDescriptors are
updated using CPut to detect races. This means that if a RangeDescriptor
had been written by an old version of cockroach and we then attempt to
update it, the CPut will fail because the encoded version of it is
different (non-nullable proto2 fields are always included).

A similar issue was introduced in #38004 which made the StickyBit field
on RangeDescriptor non-nullable.

We could keep the fields as nullable, but this has allocation costs (and
is also more annoying to work with).

Worse, the proto spec is pretty explicit about not relying on
serialization of a given message to always produce exactly the same
bytes:

    From https://developers.google.com/protocol-buffers/docs/encoding#implications
    Do not assume the byte output of a serialized message is stable.

So instead, we've decided to stop CPut-ing protos and to change the
interface of CPut to take the expected value as bytes. To CPut a proto,
the encoded value will be read from kv, passed around with the decoded
struct, and used in the eventual CPut. This work is upcoming, but the
above two PRs are real breakages, so this commit fixes those first.

Neither of these PRs made the last alpha so no release note.

Touches #38308
Closes #38183

Release note: None

Co-authored-by: Daniel Harrison <[email protected]>
danhhz added a commit to danhhz/cockroach that referenced this issue Aug 31, 2019
The signature of CPut was changed so the expected value is now
*roachpb.Value instead of interface{}. This was primarily motivated by
protos because 1) the protobuf spec specifically mentions not to rely on
the serialized version being stable and 2) we had actual issues with
backward compatibility when trying to add new non-nullable fields as
described in cockroachdb#38308. Instead, we now use the roachpb.Value bytes
actually fetched from kv.

Touches cockroachdb#38308

Release note: None
danhhz added a commit to danhhz/cockroach that referenced this issue Aug 31, 2019
The signature of CPut was changed so the expected value is now
*roachpb.Value instead of interface{}. This was primarily motivated by
protos because 1) the protobuf spec specifically mentions not to rely on
the serialized version being stable and 2) we had actual issues with
backward compatibility when trying to add new non-nullable fields as
described in cockroachdb#38308. Instead, we now use the roachpb.Value bytes
actually fetched from kv.

Touches cockroachdb#38308

Release note: None
danhhz added a commit to danhhz/cockroach that referenced this issue Sep 4, 2019
The signature of CPut was changed so the expected value is now
*roachpb.Value instead of interface{}. This was primarily motivated by
protos because 1) the protobuf spec specifically mentions not to rely on
the serialized version being stable and 2) we had actual issues with
backward compatibility when trying to add new non-nullable fields as
described in cockroachdb#38308. Instead, we now use the roachpb.Value bytes
actually fetched from kv.

Touches cockroachdb#38308

Release note: None
danhhz added a commit to danhhz/cockroach that referenced this issue Sep 4, 2019
The signature of CPut was changed so the expected value is now
*roachpb.Value instead of interface{}. This was primarily motivated by
protos because 1) the protobuf spec specifically mentions not to rely on
the serialized version being stable and 2) we had actual issues with
backward compatibility when trying to add new non-nullable fields as
described in cockroachdb#38308. Instead, we now use the roachpb.Value bytes
actually fetched from kv.

Touches cockroachdb#38308

Release note: None
craig bot pushed a commit that referenced this issue Sep 4, 2019
40357: storage/engine: fix unnecessary C.ulonglong conversion r=nvanbenschoten a=nvanbenschoten

Something recently shook up the build, causing the following lint warnings to start firing:

```
--- FAIL: TestLint/TestRoachLint (56.77s)
    lint_test.go:77:
        /Users/nathan/Go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:2216:35: unnecessary conversion
    lint_test.go:77:
        /Users/nathan/Go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:2217:35: unnecessary conversion
```

Release note: None

40397: *: migrate more uses of CPutDeprecated r=tbg,dt a=danhhz

The signature of CPut was changed so the expected value is now
*roachpb.Value instead of interface{}. This was primarily motivated by
protos because 1) the protobuf spec specifically mentions not to rely on
the serialized version being stable and 2) we had actual issues with
backward compatibility when trying to add new non-nullable fields as
described in #38308. Instead, we now use the roachpb.Value bytes
actually fetched from kv.

Touches #38308

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Daniel Harrison <[email protected]>
@andreimatei
Copy link
Contributor

The problem has been fixed for RangeDescriptors, but persists for, at least, Liveness.

andreimatei added a commit to andreimatei/cockroach that referenced this issue May 15, 2020
Fixes cockroachdb#38308

Updates to the liveness records are done as CPuts. Before this patch,
the CPuts' expected value was the re-marshaled proto with the previous
version. As cockroachdb#38308 explains, that's a bad practice since it prevents the
proto's encoding to change in any way (e.g. fields can't be removed).
Instead, this patch moves to keeping track of the raw bytes that have
been read from the DB, besides the unmarshalled liveness protos. The raw
bytes are used as the expected values.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue May 19, 2020
Fixes cockroachdb#38308

Updates to the liveness records are done as CPuts. Before this patch,
the CPuts' expected value was the re-marshaled proto with the previous
version. As cockroachdb#38308 explains, that's a bad practice since it prevents the
proto's encoding to change in any way (e.g. fields can't be removed).
Instead, this patch moves to keeping track of the raw bytes that have
been read from the DB, besides the unmarshalled liveness protos. The raw
bytes are used as the expected values.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue May 21, 2020
Fixes cockroachdb#38308

Updates to the liveness records are done as CPuts. Before this patch,
the CPuts' expected value was the re-marshaled proto with the previous
version. As cockroachdb#38308 explains, that's a bad practice since it prevents the
proto's encoding to change in any way (e.g. fields can't be removed).
Instead, this patch moves to keeping track of the raw bytes that have
been read from the DB, besides the unmarshalled liveness protos. The raw
bytes are used as the expected values.

Release note: None
irfansharif pushed a commit to irfansharif/cockroach that referenced this issue Jun 12, 2020
Fixes cockroachdb#38308

Updates to the liveness records are done as CPuts. Before this patch,
the CPuts' expected value was the re-marshaled proto with the previous
version. As cockroachdb#38308 explains, that's a bad practice since it prevents the
proto's encoding to change in any way (e.g. fields can't be removed).
Instead, this patch moves to keeping track of the raw bytes that have
been read from the DB, besides the unmarshalled liveness protos. The raw
bytes are used as the expected values.

Release note: None
irfansharif pushed a commit to irfansharif/cockroach that referenced this issue Jun 17, 2020
Fixes cockroachdb#38308

Updates to the liveness records are done as CPuts. Before this patch,
the CPuts' expected value was the re-marshaled proto with the previous
version. As cockroachdb#38308 explains, that's a bad practice since it prevents the
proto's encoding to change in any way (e.g. fields can't be removed).
Instead, this patch moves to keeping track of the raw bytes that have
been read from the DB, besides the unmarshalled liveness protos. The raw
bytes are used as the expected values.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this issue Jun 26, 2020
Fixes cockroachdb#38308

Updates to the liveness records are done as CPuts. Before this patch,
the CPuts' expected value was the re-marshaled proto with the previous
version. As cockroachdb#38308 explains, that's a bad practice since it prevents the
proto's encoding to change in any way (e.g. fields can't be removed).
Instead, this patch moves to keeping track of the raw bytes that have
been read from the DB, besides the unmarshalled liveness protos. The raw
bytes are used as the expected values.

Release note: None
craig bot pushed a commit that referenced this issue Jun 29, 2020
49135: kvserver: don't use marshaled proto for liveness CPut r=andreimatei a=andreimatei

Fixes #38308

Updates to the liveness records are done as CPuts. Before this patch,
the CPuts' expected value was the re-marshaled proto with the previous
version. As #38308 explains, that's a bad practice since it prevents the
proto's encoding to change in any way (e.g. fields can't be removed).
Instead, this patch moves to keeping track of the raw bytes that have
been read from the DB, besides the unmarshalled liveness protos. The raw
bytes are used as the expected values.

Release note: None

50658: geopb: rename Shape to ShapeType r=sumeerbhola a=otan

Rename Shape to ShapeType, in preparation for a new Shape protobuf.

Release note: None

50671: build: upgrade to Go 1.14 r=petermattis a=otan

Minor fixes required:
* Upgrade go.mod to prevent build.Import errors.
* Updated instructions to upgrade Go.
* Updated .pb.go files - these are tied with the gofmt of the
  environment and we cannot fix this.
* Update storage use cast DBSlice to DBString or it fails S1016 of
  staticcheck.
* Make lint fail if the import directory is not found.
* Fix acceptance test to include GOROOT or else build.Import fails from
  within the acceptance tests. The acceptance test is weirdly set up to
  run a Go binary from the outside container.

Checklist:
* [x] Adjust version in Docker image ([source](./builder/Dockerfile#L199-L200)).
* [x] Rebuild the Docker image and bump the `version` in `builder.sh` accordingly ([source](./builder.sh#L6)).
* [x] Bump the version in `go-version-check.sh` ([source](./go-version-check.sh)), unless bumping to a new patch release.
* [x] Bump the default installed version of Go in `bootstrap-debian.sh` ([source](./bootstrap/bootstrap-debian.sh#L40-42)).
* [ ] Update the `builder.dockerImage` parameter in the TeamCity [`Cockroach`](https://teamcity.cockroachdb.com/admin/editProject.html?projectId=Cockroach&tab=projectParams) and [`Internal`](https://teamcity.cockroachdb.com/admin/editProject.html?projectId=Internal&tab=projectParams) projects.

(I will do the last step after this has merged)

Resolves #48737.

Release note (general change): Bump the Go version to 1.14.



50675: httputil: remove req_go112.go r=knz a=otan

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
@craig craig bot closed this as completed in 9c0dce9 Jun 29, 2020
craig bot pushed a commit that referenced this issue Aug 1, 2022
84420: sql/logictests: don't fail parent test when using retry r=knz a=stevendanna

testutils.SucceedsSoon calls Fatal() on the passed testing.T.  Here,
we were calling SucceedsSoon with the root T, which in the case of a
subtest resulted in this error showing up in our logs

    testing.go:1169: test executed panic(nil) or runtime.Goexit:
    subtest may have called FailNow on a parent test

This moves to using SucceedsSoonError so that we can process errors
using the same formatting that we use elsewhere.

Release note: None

85120: roachpb: make range/replica descriptor fields non-nullable r=pavelkalinnikov a=erikgrinaker

This patch makes all fields in range/replica descriptors non-nullable,
fixing a long-standing TODO.

Additionally, this fixes a set of bugs where code would use regular
comparison operators (e.g. `==`) to compare replica descriptors, which
with nullable pointer fields would compare the memory locations rather
than the values. One practical consequence of this was that the
DistSender would fail to use a new leaseholder with a non-`VOTER` type
(e.g.  `VOTER_INCOMING`), instead continuing to try other replicas
before backing off. However, there are further issues surrounding this
bug and it will be addressed separately in a way that is more amenable
to backporting.

The preparatory work for this was done in ea720e3.

Touches #85060.
Touches #38308.
Touches #38465.

Release note: None

85352: opt: revert data race fix r=mgartner a=mgartner

This commit reverts #37972. We no longer lazily build filter props and
share them across multiple threads.

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-client Relating to the KV client and the KV interface. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants