
libroach: Migrate IncrementalIterator logic to C++ #37496

Merged (4 commits) on Jun 4, 2019

Conversation

@adityamaru27 (Contributor) commented May 13, 2019

Export was previously executed using an MVCCIncrementalIterator, where the logic handling iteration over the diff of the key range [startKey, endKey) and time range (startTime, endTime] was written in Go. This iteration involved a cgo call to find every key, along with another cgo call to write each key to the sstable. This change resolves that performance bottleneck (#18884) by pushing all the required logic to C++ and exposing a single export method (a rough sketch of that shape follows below).

Based on a performance benchmark running BACKUP on a tpcc database with 1000 warehouses, we observe the following:

  • Over an average of 3 runs, we see a ~1.1x improvement in wall-clock time: the original binary took ~32m04s, while the changed implementation took ~28m55s. This is due to the elimination of a cgo call per key.
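
To make the mechanism above concrete, here is a minimal, hypothetical sketch of the single-call shape in C++. The names (ExportResult, ExportToSst, the toy std::map "engine") are illustrative only and are not the actual libroach API.

// Sketch only: the whole key-range scan and SST construction happen on the
// C++ side, so the Go caller makes one cgo call per export rather than one
// cgo call per key (plus one more per SST write).
#include <cstdint>
#include <map>
#include <string>

struct ExportResult {
  std::string sst;        // stand-in for the serialized SSTable handed back to Go
  int64_t data_size = 0;  // total bytes of exported keys and values
};

// 'engine' stands in for the underlying store; [start, end) bounds the key range.
ExportResult ExportToSst(const std::map<std::string, std::string>& engine,
                         const std::string& start, const std::string& end) {
  ExportResult res;
  for (auto it = engine.lower_bound(start); it != engine.end() && it->first < end; ++it) {
    // Previously, finding each key and appending it to the SST were separate
    // cgo calls; here both happen inside this single C++ loop.
    res.sst += it->first + it->second;  // toy "SST": concatenated bytes
    res.data_size += static_cast<int64_t>(it->first.size() + it->second.size());
  }
  return res;
}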

@CLAassistant commented May 13, 2019

CLA assistant check: all committers have signed the CLA.

@cockroach-teamcity (Member) commented

This change is Reviewable

@adityamaru27 adityamaru27 requested a review from dt May 13, 2019 20:39
@adityamaru27 adityamaru27 marked this pull request as ready for review May 13, 2019 20:39
@adityamaru27 adityamaru27 requested review from a team and removed request for a team May 13, 2019 20:39
@petermattis petermattis requested a review from ajkr May 13, 2019 21:03
@petermattis (Collaborator) commented

Also tagging @ajkr on this as he's familiar with both C++ and RocksDB.

@adityamaru27 adityamaru27 changed the title Migrate IncrementalIterator logic to C++ libroach: Migrate IncrementalIterator logic to C++ May 13, 2019
@dt (Member) commented May 13, 2019

@ajkr you can hold off for a bit; we've discussed this with @danhhz and are going to change the approach slightly and revise this before it's ready for review.

@adityamaru27 adityamaru27 removed request for dt and ajkr May 15, 2019 16:39
@adityamaru27 adityamaru27 force-pushed the port-export-to-c++ branch 2 times, most recently from 997083d to f508ca2 on May 15, 2019 17:31
@adityamaru27 adityamaru27 requested review from dt and ajkr May 15, 2019 17:36
var rows bulk.RowCounter
// TODO(dan): Move all this iteration into cpp to avoid the cgo calls.
// TODO(dan): Consider checking ctx periodically during the MVCCIterate call.
iter := engineccl.NewMVCCIncrementalIterator(batch, engineccl.IterOptions{
A reviewer (Member) commented:
Suggestion: you could pull this code into a Go function that takes the same args as your new C++ function (and keep mvcc.go around for now). Then you could write a unit test that calls both functions on the same engine and compares the results.

@adityamaru27 (Contributor, Author) replied May 20, 2019

Done, but still investigating why the test is taking so long to run.

@adityamaru27 (Contributor, Author) replied

Done.

@dt (Member) left a comment

Reviewed 3 of 3 files at r1, 12 of 14 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru27 and @ajkr)


c-deps/libroach/db.h, line 104 at r2 (raw file):

// preferred function for Go callers.
DBStatus DBSstFileWriterAddRaw(DBSstFileWriter* fw, const ::rocksdb::Slice key,
                               const ::rocksdb::Slice val);

nit: newline at end of file


c-deps/libroach/db.cc, line 950 at r2 (raw file):

    // Skip tombstone (len=0) records when start time is zero (non-incremental)
    // and we are not exporting all versions.
    if (!export_all_revisions && iter.value().size() == 0 && start.wall_time == 0 &&

2¢: I might pull the three conditions that go into "are we skipping deletes?" into a bool before the loop, so we only check "are we skipping deletes && is this a delete"
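
A hedged sketch of what the suggested hoisting might look like; the names here (skip_tombstones, CountExportedKeys, the KV struct) are assumed for illustration and are not taken from the actual patch.

// Sketch only: hoist the parts of the "are we skipping deletes?" test that do
// not change per key, so the loop body checks just two booleans.
#include <cstdint>
#include <string>
#include <vector>

struct KV {
  std::string key;
  std::string value;  // an empty value models a tombstone in this toy example
};

int CountExportedKeys(const std::vector<KV>& kvs, bool export_all_revisions,
                      int64_t start_wall_time, int32_t start_logical) {
  const bool skip_tombstones =
      !export_all_revisions && start_wall_time == 0 && start_logical == 0;
  int exported = 0;
  for (const KV& kv : kvs) {
    const bool is_tombstone = kv.value.empty();
    if (skip_tombstones && is_tombstone) {
      continue;  // non-incremental, latest-revision export: drop deletes
    }
    ++exported;  // stand-in for "add key/value to the SST writer"
  }
  return exported;
}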


c-deps/libroach/db.cc, line 974 at r2 (raw file):

  return res;
}

nit: newline at end of file


c-deps/libroach/incremental_iterator.h, line 1 at r2 (raw file):

// Copyright 2018 The Cockroach Authors.

nit: 2019


c-deps/libroach/incremental_iterator.h, line 52 at r2 (raw file):

  cockroach::util::hlc::LegacyTimestamp start_time;
  cockroach::util::hlc::LegacyTimestamp end_time;
};

nit: newline at end of file


c-deps/libroach/incremental_iterator.cc, line 1 at r2 (raw file):

// Copyright 2018 The Cockroach Authors.

nit: 2019


c-deps/libroach/incremental_iterator.cc, line 223 at r2 (raw file):

const rocksdb::Slice DBIncrementalIterator::key() { return iter.get()->rep->key(); }

const rocksdb::Slice DBIncrementalIterator::value() { return iter.get()->rep->value(); }

nit: newline at end of file


pkg/storage/engine/rocksdb.go, line 3201 at r1 (raw file):

func ExportToSst(
	ctx context.Context, e Reader, start, end MVCCKey, exportAllRevisions bool, io IterOptions,
) ([]byte, C.int64_t, error) {

I'd return an int64 instead of the c type.


pkg/storage/engine/rocksdb.go, line 3213 at r1 (raw file):

	var intentErr C.DBString

	err := statusToError(C.DBExportToSst(goToCKey(start), goToCKey(end), C.bool(exportAllRevisions),

I think you might have inverted the order of the commits?


pkg/storage/engine/rocksdb.go, line 3223 at r1 (raw file):

			}

			return nil, dataSize, &e

I'd return 0 instead of the undefined value (right?) in dataSize.

@adityamaru27 adityamaru27 force-pushed the port-export-to-c++ branch 2 times, most recently from 1d49170 to d8df6cd on May 16, 2019 22:08
@adityamaru27 (Contributor, Author) left a comment

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajkr and @dt)


c-deps/libroach/db.cc, line 950 at r2 (raw file):

Previously, dt (David Taylor) wrote…

2¢: I might pull the three conditions that go into "are we skipping deletes?" into a bool before the loop, so we only check "are we skipping deletes && is this a delete"

Done.


c-deps/libroach/db.cc, line 974 at r2 (raw file):

Previously, dt (David Taylor) wrote…

nit: newline at end of file

Done.


c-deps/libroach/incremental_iterator.h, line 1 at r2 (raw file):

Previously, dt (David Taylor) wrote…

nit: 2019

Done.


c-deps/libroach/incremental_iterator.h, line 52 at r2 (raw file):

Previously, dt (David Taylor) wrote…

nit: newline at end of file

Done.


c-deps/libroach/incremental_iterator.cc, line 1 at r2 (raw file):

Previously, dt (David Taylor) wrote…

nit: 2019

Done.


c-deps/libroach/incremental_iterator.cc, line 223 at r2 (raw file):

Previously, dt (David Taylor) wrote…

nit: newline at end of file

Done.


pkg/storage/engine/rocksdb.go, line 3201 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I'd return an int64 instead of the c type.

Done.


pkg/storage/engine/rocksdb.go, line 3213 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I think you might have inverted the order of the commits?

Done.


pkg/storage/engine/rocksdb.go, line 3223 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I'd return 0 instead of the undefined value (right?) in dataSize.

Done.

@adityamaru27 adityamaru27 force-pushed the port-export-to-c++ branch 4 times, most recently from 964369c to d6628ec on May 20, 2019 20:04
@dt (Member) commented May 20, 2019

@ajkr thanks for waiting -- this is now ready for review!

@ajkr (Contributor) commented May 21, 2019

Thanks, taking a look. Are there any benchmark results on export time/CPU with vs without this PR?

@adityamaru27 (Contributor, Author) commented

> Thanks, taking a look. Are there any benchmark results on export time/CPU with vs without this PR?

Hey @ajkr! Not yet, but I am working on compiling some results from roachprod and existing export/backup benchmarks. I will update the PR as soon as I have them.

@ajkr (Contributor) left a comment

LGTM, all my comments are minor.

A couple of high-level questions:

  • why does ExportToSst return a byte array rather than take a filename as an argument and export to that file?
  • what limits the size of the SST created by ExportToSst?

Reviewed 1 of 18 files at r3, 14 of 14 files at r4, 4 of 4 files at r5, 1 of 1 files at r7, 1 of 1 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru27 and @dt)


c-deps/libroach/db.h, line 103 at r8 (raw file):

// DBSstFileWriterAddRaw is used internally -- DBSstFileWriterAdd is the
// preferred function for Go callers.
DBStatus DBSstFileWriterAddRaw(DBSstFileWriter* fw, const ::rocksdb::Slice key,

Do we need this declaration in db.h? I think it's only used in db.cc, so we should be able to declare and define it in an anonymous namespace there. (Similar to DBIterGetState(), for example.)
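
For context on the pattern being suggested, a hedged sketch of moving a file-local helper into an anonymous namespace in the .cc file instead of declaring it in the shared header; the helper shown is a stand-in, not the real DBSstFileWriterAddRaw signature.

// db.cc (sketch): a helper used only inside this translation unit can live in
// an anonymous namespace, giving it internal linkage, instead of being
// declared in db.h for every includer.
#include <string>

namespace {

// Illustrative stand-in for a writer-add helper; not the real signature.
bool AddRawToWriter(std::string* sst, const std::string& key, const std::string& val) {
  sst->append(key);
  sst->append(val);
  return true;
}

}  // anonymous namespace

// A public function in the same file can still call the helper; nothing
// outside this translation unit (including db.h) ever sees it.
std::string BuildToySst(const std::string& key, const std::string& val) {
  std::string sst;
  AddRawToWriter(&sst, key, val);
  return sst;
}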


c-deps/libroach/db.cc, line 939 at r8 (raw file):

      DBSstFileWriterClose(writer);
      return state.status;
    } else if (!state.valid || kComparator.Compare(iter.key(), EncodeKey(end)) >= 0) {

We could avoid an allocation/memcpy on each iteration by getting the result of EncodeKey(end) before the for-loop. I don't know whether it'll make a noticeable difference, but typically I have seen memcpy/malloc/free near the top of CPU profiles for other range scan workloads.


c-deps/libroach/incremental_iterator.h, line 28 at r8 (raw file):

                        DBString* write_intent);
  ~DBIncrementalIterator();
  void advanceKey();

can advanceKey() be private?


c-deps/libroach/incremental_iterator.cc, line 52 at r8 (raw file):

    assert(!EmptyTimestamp(opts.max_timestamp_hint));
    DBIterOptions nontimebound_opts = DBIterOptions();
    nontimebound_opts.upper_bound = opts.upper_bound;

I'm kind of curious about the relationship between opts.upper_bound and this->end. Will they be the same?


pkg/ccl/storageccl/export.go, line 164 at r8 (raw file):

		}

		rows.BulkOpSummary.DataSize += int64(len(it.UnsafeKey().Key)) + int64(len(it.UnsafeValue()))

Is the result being computed here different from what's stored in the variable dataSize?

@dt (Member) commented May 22, 2019

> why does ExportToSst return a byte array rather than take a filename as an argument and export to that file?

In some cases we return the content in the response rather than write a file at all, and even when we're writing a "file" it is usually to remote cloud storage, not the local file system. The cloud storage accessors are in Go, so passing the content to write back up and letting Go decide what to do with it seems to make sense.

FWIW, prior to this change, we were also passing the same slice back from C++ to Go when we called Finish on the RocksDBSstFileWriter.

> what limits the size of the SST created by ExportToSst?

Range size.

@adityamaru27 (Contributor, Author) left a comment

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajkr and @dt)


c-deps/libroach/db.h, line 103 at r8 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

Do we need this declaration in db.h? I think it's only used in db.cc, so we should be able to declare and define it in an anonymous namespace there. (Similar to DBIterGetState(), for example.)

Done.


c-deps/libroach/db.cc, line 939 at r8 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

We could avoid an allocation/memcpy on each iteration by getting the result of EncodeKey(end) before the for-loop. I don't know whether it'll make a noticeable difference, but typically I have seen memcpy/malloc/free near the top of CPU profiles for other range scan workloads.

makes sense, done.


c-deps/libroach/incremental_iterator.h, line 28 at r8 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

can advanceKey() be private?

yes it can, done.


c-deps/libroach/incremental_iterator.cc, line 52 at r8 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

I'm kind of curious about the relationship between opts.upper_bound and this->end. Will they be the same?

opts.upper_bound is a DBKey with empty wall_time and logical fields (i.e. it only enforces a key upper bound). this->end has those fields populated, and is used in advanceKey() to check that we are within the required time bounds.
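
A hedged sketch of the distinction being described; the struct and field names mirror the thread's terminology but are illustrative, not the actual libroach definitions.

// Sketch only: the iterator's upper bound carries no timestamp (a pure key
// bound), while the incremental iterator's `end` carries one and is consulted
// per key in advanceKey() to enforce the requested time window.
#include <cstdint>
#include <string>

struct ToyTimestamp {
  int64_t wall_time;
  int32_t logical;
};

struct ToyDBKey {
  std::string key;
  ToyTimestamp ts;
};

// opts.upper_bound analogue: only the key matters, the timestamp stays zero.
ToyDBKey MakeKeyUpperBound(const std::string& end_key) {
  return ToyDBKey{end_key, ToyTimestamp{0, 0}};
}

// this->end analogue: same key, but wall_time/logical are populated so the
// iteration can also ask "is this version inside the requested time range?".
ToyDBKey MakeTimeBoundedEnd(const std::string& end_key, int64_t wall, int32_t logical) {
  return ToyDBKey{end_key, ToyTimestamp{wall, logical}};
}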


pkg/ccl/storageccl/export.go, line 164 at r8 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

Is the result being computed here different from what's stored in the variable dataSize?

So using dataSize instead of recomputing it here was causing existing backup tests to fail, as the computed bytes were more than the expected bytes. I initially thought this could be because we are iterating more keys than we should on the C++ side, but the number of rows and the SSTable data itself matched up to what was expected.
I use dataSize to ensure that we are in fact exporting some data (otherwise the levelDB iterator collecting stats after the cgo call panics) and then recompute the actual rows.BulkOpSummary.DataSize as was the case in the previous implementation.
Eventually, we would like to move the stats collection (including RowCounter) to C++, but decided it was not in the scope of this change.

@ajkr (Contributor) left a comment

Reviewed 1 of 20 files at r9, 18 of 19 files at r10.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru27 and @dt)


c-deps/libroach/db.cc, line 937 at r10 (raw file):

  bool skip_current_key_versions = !export_all_revisions;
  DBIterState state;
  const rocksdb::Slice end_key = EncodeKey(end);

I think we need to store the string result otherwise the memory pointed to by the Slice can be freed.
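
A hedged sketch of the lifetime issue being flagged; EncodeKeyStub stands in for EncodeKey and is assumed to return a std::string by value.

// Sketch of the bug: rocksdb::Slice does not own its bytes, so binding one to
// a temporary std::string leaves the Slice dangling once the temporary dies.
#include <rocksdb/slice.h>
#include <string>

// Stand-in for EncodeKey(end): pretend it returns an encoded copy of the key.
std::string EncodeKeyStub(const std::string& key) { return key; }

void ExportLoopSketch(const std::string& end) {
  // BAD: the std::string returned by EncodeKeyStub(end) is a temporary, so
  // end_key would point at freed memory after this statement:
  //   const rocksdb::Slice end_key = EncodeKeyStub(end);

  // OK: keep an owning std::string alive for the whole loop and wrap it in a
  // Slice (which just points at the string's buffer) when comparing keys.
  const std::string end_key_str = EncodeKeyStub(end);
  const rocksdb::Slice end_key(end_key_str);
  (void)end_key;  // ... use end_key for comparisons inside the export loop ...
}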


pkg/ccl/storageccl/export.go, line 164 at r8 (raw file):

Previously, adityamaru27 (Aditya Maru) wrote…

So using dataSize instead of recomputing it here was causing existing backup tests to fail, as the computed bytes were more than the expected bytes. I initially thought this could be because we are iterating more keys than we should on the C++ side, but the number of rows and the SSTable data itself matched up to what was expected.
I use dataSize to ensure that we are in fact exporting some data (otherwise the levelDB iterator collecting stats after the cgo call panics) and then recompute the actual rows.BulkOpSummary.DataSize as was the case in the previous implementation.
Eventually, we would like to move the stats collection (including RowCounter) to C++, but decided it was not in the scope of this change.

Got it, thanks for the explanation.

@adityamaru27 (Contributor, Author) left a comment

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru27, @ajkr, and @dt)


c-deps/libroach/db.cc, line 937 at r10 (raw file):

Previously, ajkr (Andrew Kryczka) wrote…

I think we need to store the string result otherwise the memory pointed to by the Slice can be freed.

Thanks for this catch! I was scratching my head as to why the tests were failing with these changes.

@adityamaru27 (Contributor, Author) commented

@dt changed the headers of the newly added files to BSL. Will merge once you have a quick look!

Commits in this PR:

The export logic requires access to the C.DBEngine of an
engine.Reader. In the case of a spanSetReader this
requires us to recursively access the engine.Reader member
until we reach either a RocksDB or rocksDBReadOnly engine.
A type switch construct is deemed necessary in
storageccl/export.go, requiring the SpanSetReadWriter type to
be exported.

Note: lint fails if we name it SpanSetReadWriter post exporting,
as other packages would have to access it using
spanset.SpanSetReadWriter which introduces a "stutter". It has
been renamed to ReadWriter.

Release note: None
All of the IncrementalIterator logic was previously written in Go.
This change migrates all the required implementations to C++ and
allows the DBExportToSst method to use this iterator to export
the required keys to an SSTable.

Release note: Performance improvement to the BACKUP process.
Export was previously executed using an MVCCIncrementalIterator
where the logic handling iteration over the diff of the key
range [startKey, endKey) and time range (startTime, endTime]
was written in Go. This iteration involved making a cgo call
to find every key, along with another cgo call for writing
each key to the sstable.
This migration resolves the aforementioned performance
bottleneck (cockroachdb#18884), by pushing all the required logic to C++
and exposing a single export method.

Release note: Performance improvement to the BACKUP process.
Added a test which generates random KVPs and timestamps, subsequently
attempting to export the data between provided key and time ranges.
Correctness of the new export logic is checked against the old
implementation, which used the now non-existent MVCCIncrementalIterator.

Release note: None
@adityamaru27 (Contributor, Author) commented

bors r=dt

@craig (craig bot) commented Jun 4, 2019

🔒 Permission denied

Existing reviewers: click here to make adityamaru27 a reviewer

@adityamaru27 (Contributor, Author) commented

bors r=dt

craig bot pushed a commit that referenced this pull request Jun 4, 2019
37496: libroach: Migrate IncrementalIterator logic to C++ r=dt a=adityamaru27


Co-authored-by: Aditya Maru <[email protected]>
@craig (craig bot) commented Jun 4, 2019

Build succeeded
