Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UNSPLIT AT statement page and diagrams #5360

Merged
merged 2 commits into from
Sep 26, 2019
Merged

UNSPLIT AT statement page and diagrams #5360

merged 2 commits into from
Sep 26, 2019

Conversation

ericharmeling
Copy link
Contributor

@ericharmeling ericharmeling commented Sep 3, 2019

  • Added new page for UNSPLIT AT statement.
  • Added new diagrams for UNSPLIT AT statement.
  • Added new diagrams/parameter for SPLIT AT statement for WITH EXPIRATION.
  • Changed all UNSPLIT AT examples to use movr database.

Fixes #4895.
Fixes #5008.
Fixes #5007.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall direction LGTM, I think we have a few things to address re: terminology ("remove split enforcement" vs. "unsplit") and interaction of this with range merge settings which are not mentioned at all here or in SPLIT AT (yet). See below for more details.

@@ -0,0 +1,25 @@
<div><svg width="631" height="81">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file does not appear to be included from any pages. Unless I'm missing it?

@@ -17,3 +17,4 @@ Subcommand | Description
[`CONFIGURE ZONE`](configure-zone.html) | [Configure replication zones](configure-replication-zones.html) for an index.
[`RENAME`](rename-index.html) | Change the name of an index.
[`SPLIT AT`](split-at.html) | Force a key-value layer range split at the specified row in the index.
[`UNSPLIT AT`](unsplit-at.html) | Remove a key-value layer range split enforcement at a specified row in the index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add the "new in 19.2" flag to all of these new links to UNSPLIT AT, the same as we've done on the 'SQL Statements' page.

@@ -35,6 +35,7 @@ Subcommand | Description | Can combine with other subcommands?
[`RENAME CONSTRAINT`](rename-constraint.html) | Change constraints columns. | Yes
[`RENAME TABLE`](rename-table.html) | Change the names of tables. | No
[`SPLIT AT`](split-at.html) | Force a key-value layer range split at the specified row in the table. | No
[`UNSPLIT AT`](unsplit-at.html) | Remove a key-value layer range split enforcement at a specified row in the table. | No
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as previously re: "new in 19.2" flag.

@@ -252,3 +252,4 @@ SHOW RANGES FROM TABLE t;
- [Distribution Layer](architecture/distribution-layer.html)
- [Replication Layer](architecture/replication-layer.html)
- [`SHOW JOBS`](show-jobs.html)
- [`UNSPLIT AT`](unsplit-at.html)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as previously re: "new in 19.2" flag. Also, I think the SPLIT AT page should be updated to link to the Range Merges doc and to note that you have to explicitly turn off Range Merges if you want to use SPLIT AT. Perhaps that should be a separate issue? UPDATE: I did a search and found that #4960 has been filed to address this. I've added it to the 19.2 milestone as a P-1.

@@ -0,0 +1,192 @@
---
title: UNSPLIT AT
summary: The UNSPLIT AT statement reverts a manual key-value layer range split at a specified row in a table or index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This summary could be updated to use the same "split enforcement" terminology as the doc. I must admit I find "reverts a range split" more understandable than "removes a split enforcement", but I guess the latter is a more correct description of what happens?

(4 rows)
~~~

The `split_enforced_until` column is now `NULL` for all ranges in the `users` table. The table is still split into ranges at `'chicago'`, `'new york'`, and `'seattle'`, but the split is no longer enforced.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to mention that "now the system will be free to re-merge the ranges at its convenience once you re-enable the kv.range_merge.queue_enabled setting to true. At least according to the Range Merges doc you will need to have turned off the range merge queue to even be able to SPLIT AT - if I am reading that correctly? I think we should at least make mention of that in the overview (I'll edit the comment on the overview as well).

~~~
~~~
range_id | start_key | start_pretty | end_key | end_pretty | database_name | table_name | index_name | replicas | learner_replicas | split_enforced_until | lease_holder
+----------+-------------------+------------------+-------------------+------------------+---------------+------------+-------------+----------+------------------+----------------------------------+--------------+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we have more updates to do re: split_enforced_until incoming via cockroachdb/cockroach#38004 (namely a WITH EXPIRATION option to SPLIT AT) but that will happen via another ticket.

(5 rows)
~~~

Now unsplit the index:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at this even longer I think we should use the "remove split enforcement, let the range merge queue run AKA unsplit" explanation in the intro and then just say "unsplit" everywhere else down-page.

(5 rows)
~~~

The `split_enforced_until` column is now `NULL` for all ranges in the `rides` table. The table is still split into ranges at `25.00`, `50.00`, and `75.00`, but the split is no longer enforced.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above re: giving a little more color about when/how the system merges the ranges back together.



## See also

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also link to the Range Merges doc here.

Copy link
Contributor Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @rmloveland)


_includes/v19.2/sql/diagrams/unsplit.html, line 1 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

This file does not appear to be included from any pages. Unless I'm missing it?

No you didn't miss anything... it's not included anywhere. I'll delete it.


v19.2/alter-index.md, line 20 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

I think we should add the "new in 19.2" flag to all of these new links to UNSPLIT AT, the same as we've done on the 'SQL Statements' page.

Done.


v19.2/alter-table.md, line 38 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Same comment as previously re: "new in 19.2" flag.

Done.


v19.2/split-at.md, line 255 at r1 (raw file):
Done.
I never ran into any errors or warnings when I tried to split tables manually and I never changed any settings... In #4960 @andreimatei writes:

In 19.2 I believe the above limitation goes away and the new with expiration option should be documented. Introduced in cockroachdb/cockroach#38004 by @jeffrey-xiao.

So I think this requirement has been lifted (consistent with what @andreimatei wrote in #4960). Also, I didn't run into any warnings or any setting to prevent me from splitting any ranges, and I never set any settings prior to splitting/unsplitting. I did this both in an insecure local cluster (cockroach start --insecure) and with cockroach demo.


v19.2/unsplit-at.md, line 3 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

This summary could be updated to use the same "split enforcement" terminology as the doc. I must admit I find "reverts a range split" more understandable than "removes a split enforcement", but I guess the latter is a more correct description of what happens?

Done.


v19.2/unsplit-at.md, line 7 at r1 (raw file):

make "range split" the link text

Done.

describes what happens when the "split enforcement" is removed

Done.

is there some cluster setting that would control how slowly or quickly this merging will happen?

Not that I am aware of... @nvanbenschoten Can you control range merges in any way (other than preventing them with a manual split) with cluster settings or something else?

a callout that links to the Range Merges doc and mentions that the range merge cluster setting.

I think this requirement has been lifted (consistent with what @andreimatei wrote in #4960). Also, I didn't run into any warnings or any setting to prevent me from splitting any ranges, and I never set any settings prior to splitting/unsplitting. I did this both in an insecure local cluster (cockroach start --insecure) and with cockroach demo.


v19.2/unsplit-at.md, line 17 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

These diagrams look great!

:)


v19.2/unsplit-at.md, line 27 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Same comment as above re: standardizing on whether we are saying "unsplit this" vs. "removing split enforcements". If we stick with "remove split enforcement" I think we should describe how that works generally in the overview as noted in previous comment.

Done.


v19.2/unsplit-at.md, line 28 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Same as above re: "unsplit" vs. "remove split enforcement".

Done.


v19.2/unsplit-at.md, line 29 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Same as above re: "unsplit" vs. "remove split enforcement". Another option is to clarify in the intro that "unsplit" actually means remove split enforcement, but hereafter we will just say unsplit", which IMO would be a fine way to address this while simplifying the verbiage.

I just explained a little more what we mean by "unsplit", but also opt for "remove split enforcement" where it is clarifying elsewhere.


v19.2/unsplit-at.md, line 37 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Could the text of --nodes and --demo-locality be made into links to the specific flags? This can be accomplished by adding an anchor <a name="nodes-flag"></a> inside the table on the cockroach demo page. Also, this says "the --nodes ... tags" but I think you meant "the --nodes ... flags" ??? Unless "tags" is an alternative name like "soda" vs. "pop" which I was not aware of. But I would still vote "flag" for consistency with our other docs. Or "options" could work.

Oh no you are totally right that is my bad. I changed the "tags" to "flags" and then added some refs to the "General" flags part of the cockroack demo page. Unfortunately we don't have anchors for those specific arguments at this point (although we could add some).


v19.2/unsplit-at.md, line 41 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

I would reword the last clause slightly to "At this point, the data in the users table can be stored in just one range". The table is the more important "thing" to a SQL user while the range is merely there in service of it.

Done.


v19.2/unsplit-at.md, line 114 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Might be good to mention that "now the system will be free to re-merge the ranges at its convenience once you re-enable the kv.range_merge.queue_enabled setting to true. At least according to the Range Merges doc you will need to have turned off the range merge queue to even be able to SPLIT AT - if I am reading that correctly? I think we should at least make mention of that in the overview (I'll edit the comment on the overview as well).

I think this requirement has been lifted (consistent with what @andreimatei wrote in #4960). Also, I didn't run into any warnings or any setting to prevent me from splitting any ranges, and I never set any settings prior to splitting/unsplitting. I did this both in an insecure local cluster (cockroach start --insecure) and with cockroach demo.


v19.2/unsplit-at.md, line 144 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Note that we have more updates to do re: split_enforced_until incoming via cockroachdb/cockroach#38004 (namely a WITH EXPIRATION option to SPLIT AT) but that will happen via another ticket.

OK, great to know.


v19.2/unsplit-at.md, line 153 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

After looking at this even longer I think we should use the "remove split enforcement, let the range merge queue run AKA unsplit" explanation in the intro and then just say "unsplit" everywhere else down-page.

OK!


v19.2/unsplit-at.md, line 183 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Same comment as above re: giving a little more color about when/how the system merges the ranges back together.

Done.


v19.2/unsplit-at.md, line 187 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

I think we should also link to the Range Merges doc here.

Done.

Copy link
Contributor

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thanks Eric!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @ericharmeling, and @nvanbenschoten)


_includes/v19.2/sql/diagrams/unsplit.html, line 1 at r1 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

No you didn't miss anything... it's not included anywhere. I'll delete it.

Thanks!


v19.2/split-at.md, line 255 at r1 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

Done.
I never ran into any errors or warnings when I tried to split tables manually and I never changed any settings... In #4960 @andreimatei writes:

In 19.2 I believe the above limitation goes away and the new with expiration option should be documented. Introduced in cockroachdb/cockroach#38004 by @jeffrey-xiao.

So I think this requirement has been lifted (consistent with what @andreimatei wrote in #4960). Also, I didn't run into any warnings or any setting to prevent me from splitting any ranges, and I never set any settings prior to splitting/unsplitting. I did this both in an insecure local cluster (cockroach start --insecure) and with cockroach demo.

Ah! thanks for explaining


v19.2/unsplit-at.md, line 41 at r1 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

Done.

It's def not a blocker but I think you forgot to push this change? Or at least I'm not seeing it, sorry.


v19.2/unsplit-at.md, line 114 at r1 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

I think this requirement has been lifted (consistent with what @andreimatei wrote in #4960). Also, I didn't run into any warnings or any setting to prevent me from splitting any ranges, and I never set any settings prior to splitting/unsplitting. I did this both in an insecure local cluster (cockroach start --insecure) and with cockroach demo.

Ah, thanks for reading that more closely than I did. :-}

Copy link
Contributor Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @rmloveland)


v19.2/unsplit-at.md, line 41 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

It's def not a blocker but I think you forgot to push this change? Or at least I'm not seeing it, sorry.

Ah, I changed this, then in the process of making another change, changed the wording again because the data can be stored in one or three or 5 ranges at this point, whatever CRDB decides, correct?

Copy link
Contributor

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @ericharmeling, @nvanbenschoten, and @rmloveland)


v19.2/unsplit-at.md, line 41 at r1 (raw file):

the data can be stored in one or three or 5 ranges at this point, whatever CRDB decides, correct?

AIUI, yes. Sorry for the noise on this one. Gosh I hate having to think about ranges up here in SQL-land.

Sincerely, a dumb user

@ericharmeling
Copy link
Contributor Author

@jseldess Added you for a quick pass before merging.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks fine to me. I can't give it a more in-depth review, nor am I a particular authority on this functionality. I've just commented with some immediate nits.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @ericharmeling, @jseldess, @nvanbenschoten, and @rmloveland)


v19.2/unsplit-at.md, line 3 at r2 (raw file):

---
title: UNSPLIT AT
summary: The UNSPLIT AT statement removes a key-value layer range split enforcement at a specified row in the table or index..

nit: unless we use this "key-value layer" terminology extensively, I wouldn't use it here or on the SPLIT AT page. I'd just say "range split".

On the split at page we say "The key-value layer of CockroachDB is broken into sections of contiguous key...". I'd just say "data is split into ranges".


v19.2/unsplit-at.md, line 18 at r2 (raw file):

<div>
  {% include {{ page.version.version }}/sql/diagrams/unsplit_index_at.html %}

I have an pet peeve with this diagram, as well as the analogous one on the split at page. And I think it applies to other pages as well.
The thing just says select_stmt and that takes you to some monster. The thing I'm always looking for is the values (...) syntax, which is eventually presented on these pages, but it's pretty buried.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 9 files at r1, 8 of 8 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @ericharmeling, @jseldess, and @rmloveland)


v19.2/unsplit-at.md, line 3 at r2 (raw file):

---
title: UNSPLIT AT
summary: The UNSPLIT AT statement removes a key-value layer range split enforcement at a specified row in the table or index..

nit: two periods.


v19.2/unsplit-at.md, line 35 at r2 (raw file):

## Viewing schema changes

{% include {{ page.version.version }}/misc/schema-change-view-job.md %}

I don't think this is true of SPLIT AT or UNSPLIT AT.


v19.2/unsplit-at.md, line 47 at r2 (raw file):

{% include copy-clipboard.html %}
~~~ sql
> SELECT * FROM crdb_internal.ranges WHERE table_name='users';

nit: Consider refining the projection here to just the columns that users might find interesting (e.g. range_id, start_pretty, end_pretty, split_enforced_until).

Copy link
Contributor Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @jseldess, and @rmloveland)


v19.2/unsplit-at.md, line 41 at r1 (raw file):

a dumb user

No such thing!!


v19.2/unsplit-at.md, line 3 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: unless we use this "key-value layer" terminology extensively, I wouldn't use it here or on the SPLIT AT page. I'd just say "range split".

On the split at page we say "The key-value layer of CockroachDB is broken into sections of contiguous key...". I'd just say "data is split into ranges".

Done (x2)


v19.2/unsplit-at.md, line 3 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: two periods.

Done.


v19.2/unsplit-at.md, line 18 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I have an pet peeve with this diagram, as well as the analogous one on the split at page. And I think it applies to other pages as well.
The thing just says select_stmt and that takes you to some monster. The thing I'm always looking for is the values (...) syntax, which is eventually presented on these pages, but it's pretty buried.

That's a good point. On some SQL statement pages (e.g. CREATE TABLE), we have a "basic" diagram view, that is compact and high-level, and then an "expanded" diagram view, where we show diagrams for some of those pieces that contain a lot more information buried down (e.g. your select_stmt's). I'm going to open a separate issue to create that "expand" view for all SQL statements that need it. There are other issues open that deal with the same issue, for other statements (e.g. the INSERT DIAGRAM).


v19.2/unsplit-at.md, line 35 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think this is true of SPLIT AT or UNSPLIT AT.

OK. Removing this from both.


v19.2/unsplit-at.md, line 47 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: Consider refining the projection here to just the columns that users might find interesting (e.g. range_id, start_pretty, end_pretty, split_enforced_until).

Done.

@ericharmeling
Copy link
Contributor Author

@nvanbenschoten @andreimatei Thank you for the feedback!

I do have a final question...

When I create an index, it appears that CRDB performs a manual split, but with a much shorter expiration time on the enforcement.

[email protected]:63845/movr> SELECT range_id, start_pretty, end_pretty, split_enforced_until FROM crdb_internal.ranges WHERE table_name='rides';
  range_id | start_pretty | end_pretty | split_enforced_until
+----------+--------------+------------+----------------------+
        23 | /Table/55    | /Table/56  | NULL
(1 row)

Time: 2.002ms

[email protected]:63845/movr> CREATE INDEX revenue_idx ON rides(revenue);
CREATE INDEX

Time: 91.527ms

[email protected]:63845/movr> SELECT range_id, start_pretty, end_pretty, split_enforced_until FROM crdb_internal.ranges WHERE table_name='rides';
  range_id | start_pretty | end_pretty  |      split_enforced_until
+----------+--------------+-------------+---------------------------------+
        23 | /Table/55    | /Table/55/4 | NULL
        27 | /Table/55/4  | /Table/56   | 2019-09-10 21:37:44.47542+00:00
(2 rows)

Time: 1.346ms

What's going on here? When we create indexes, does this temporary split always happen?

@nvanbenschoten
Copy link
Member

What's going on here? When we create indexes, does this temporary split always happen?

I wasn't aware that it ever happened. @andreimatei do you know about this?

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That must be a schema change thing. I don't know what the deal is. cc @dt

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei, @jseldess, @nvanbenschoten, and @rmloveland)

Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei, @ericharmeling, @nvanbenschoten, and @rmloveland)


v19.2/alter-index.md, line 20 at r3 (raw file):

[`RENAME`](rename-index.html) | Change the name of an index.
[`SPLIT AT`](split-at.html) | Force a key-value layer range split at the specified row in the index.
[`UNSPLIT AT`](unsplit-at.html) | <span class="version-tag">New in v19.2:</span> Remove a key-value layer range split enforcement at a specified row in the index.

nit: If you're changing key-value layer range split to just range split elsewhere, might as well do that here, too. Bot split at and unsplit at.


v19.2/alter-table.md, line 38 at r3 (raw file):

[`RENAME TABLE`](rename-table.html) | Change the names of tables. | No
[`SPLIT AT`](split-at.html) | Force a key-value layer range split at the specified row in the table. | No
[`UNSPLIT AT`](unsplit-at.html) | <span class="version-tag">New in v19.2:</span> Remove a key-value layer range split enforcement at a specified row in the table. | No

Same as above.


v19.2/unsplit-at.md, line 18 at r2 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

That's a good point. On some SQL statement pages (e.g. CREATE TABLE), we have a "basic" diagram view, that is compact and high-level, and then an "expanded" diagram view, where we show diagrams for some of those pieces that contain a lot more information buried down (e.g. your select_stmt's). I'm going to open a separate issue to create that "expand" view for all SQL statements that need it. There are other issues open that deal with the same issue, for other statements (e.g. the INSERT DIAGRAM).

Yep, I think this is the same basic topic as #5396.


v19.2/unsplit-at.md, line 35 at r3 (raw file):

## Examples

{% include {{page.version.version}}/sql/movr-statements-nodes.md %}

@ericharmling, I'd like to test these examples, but it looks like this branch is behind master. Can you rebase this against an up-to-date master and repush and then let me know?

Copy link
Contributor Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei, @jseldess, @nvanbenschoten, and @rmloveland)


v19.2/alter-index.md, line 20 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

nit: If you're changing key-value layer range split to just range split elsewhere, might as well do that here, too. Bot split at and unsplit at.

Done.


v19.2/alter-table.md, line 38 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Same as above.

Done.


v19.2/unsplit-at.md, line 35 at r3 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

@ericharmling, I'd like to test these examples, but it looks like this branch is behind master. Can you rebase this against an up-to-date master and repush and then let me know?

Done.

Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Eric. :lgtm: with a few final nits.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei, @ericharmeling, @nvanbenschoten, and @rmloveland)


v19.2/unsplit-at.md, line 114 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Ah, thanks for reading that more closely than I did. :-}

Can you make merge the data a link to the page on range merges?


v19.2/unsplit-at.md, line 181 at r4 (raw file):

the split_enforced_until column is now NULL for all ranges in the table

That doesn't seem to be true. One range still has a split_enforced_until date. Do you need to change the unsplit command somehow to make all of them null?

Also, can you make merge the data a link to the page on range merges?

@ericharmeling
Copy link
Contributor Author

@jseldess Hate to do this again, but can you re-review? I changed all of the examples for SPLIT AT to use movr, added an example for WITH EXPIRATION, and I updated the diagrams/parameters to reflect the WITH EXPIRATION change.

Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: with some final nits.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei, @ericharmeling, @jseldess, @nvanbenschoten, and @rmloveland)


v19.2/split-at.md, line 61 at r5 (raw file):

## Examples

{% include {{page.version.version}}/sql/movr-statements-partitioning.md %}

How about putting this under a ### Setup heading? Also, why not use the same include that you use for unsplit?

{% include {{page.version.version}}/sql/movr-statements-nodes.md %}

v19.2/split-at.md, line 106 at r5 (raw file):

~~~

### Split a table with a composite primary key

This is a nice addition, but why use a new table rather than one of the built-in movr tables, many of which already have compound primary keys?

Btw, I think you want to use the term compound instead of composite: https://en.wikipedia.org/wiki/Compound_key. That's the term we use in the partitioning docs, I think.


v19.2/split-at.md, line 110 at r5 (raw file):

You may want to split a table with a composite primary key (e.g., when working with [partitions](partitioning.html#partition-using-primary-key)).

Suppose that you want MovR to offer ride-sharing services, in addition to vehicle-sharing services. Some users need sign up to be drivers, so you need a `drivers` table to store driver information.

Some users need sign up... > Some users need to sign up...


v19.2/split-at.md, line 146 at r5 (raw file):

~~~

Now you can split the table based on the composite primary key. Note that you don't have to specify the entire value for the primary key, just the prefix.

Since you emphasize only needing the specify the prefix, why don't you do that in the example? That would just be the city values, right?


v19.2/unsplit-at.md, line 114 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Can you make merge the data a link to the page on range merges?

Still think we should add a link to the range merges doc.


v19.2/unsplit-at.md, line 181 at r4 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

the split_enforced_until column is now NULL for all ranges in the table

That doesn't seem to be true. One range still has a split_enforced_until date. Do you need to change the unsplit command somehow to make all of them null?

Also, can you make merge the data a link to the page on range merges?

Still think we should add a link to the range merges doc.


v19.2/unsplit-at.md, line 52 at r5 (raw file):

~~~

Now split the `users` table ranges based on primary key values:

Maybe make split a link to split-at.html?


v19.2/unsplit-at.md, line 116 at r5 (raw file):

### Unsplit an index

Add a new secondary [index](indexes.html) to the `rides` table, on the `revenue` column, and then split the table ranges by secondary index values:

Maybe make split a link to split-at.html.

Copy link
Contributor Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jseldess See my responses to two questions. We can discuss more offline as well.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei, @jseldess, @nvanbenschoten, and @rmloveland)


v19.2/split-at.md, line 61 at r5 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

How about putting this under a ### Setup heading? Also, why not use the same include that you use for unsplit?

{% include {{page.version.version}}/sql/movr-statements-nodes.md %}

Done.

So my reasoning behind using a different number of nodes/localities for SPLIT AT and UNSPLIT AT is a little convoluted... I'd like to limit the amount of documentation exposure that we give to crdb_internal, especially in SQL statement documentation. For UNSPLIT, the crdb_internal output is kind of important because it shows you whether or not the range splits are enforced. For SPLIT AT, you just need to see the different ranges pre- and post-split. So for the SPLIT AT examples, I'm using SHOW RANGES, which lists out the replicas and the lease holders and the localities. All of those attributes are really interesting if you have a slightly more complex deployment. Also, we mention partitioning when splitting a table with a compound primary key, and partitioning doesn't make much sense with just 3 nodes. Thoughts?


v19.2/split-at.md, line 106 at r5 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

This is a nice addition, but why use a new table rather than one of the built-in movr tables, many of which already have compound primary keys?

Btw, I think you want to use the term compound instead of composite: https://en.wikipedia.org/wiki/Compound_key. That's the term we use in the partitioning docs, I think.

It's true that most, if not all, movr tables have compound keys. Those compound keys are on city (simple and intuitive string) and id (random UUID). A city is an easy value to split at, since there are discrete values that you can map to actual cities. id, however, is a UUID, so it's a much less intuitive value to split at. You need to first print out all of the ID's, sorted alphabetically or numerically, then you need to figure out what kind of distribution you have on them per city, so you can split each city into separate ranges based on those groups of ids. Also, (and probably most importantly) you can't use a prefix when you specify the split value for a UUID because the value type in your SPLIT ... AT <value> needs to match the value type of the primary key. Anyways, it would be better to pick a different, more intuitive and categorizable value that can be prefixed to split at, but since you can't change the primary index of an existing table using an ALTER TABLE statement, you need to create a new table. So since we've already used the drivers table, I figured we'd make a new table with the primary key on city and something else a little more manageable like dl, which is unique and randomly generated in our example, but at least we know that it's a string (can be prefixed) and is at most 8 characters in length.

Apologies for such a lengthy explanation! We can also discuss more offline.


v19.2/split-at.md, line 110 at r5 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Some users need sign up... > Some users need to sign up...

Done.


v19.2/split-at.md, line 146 at r5 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Since you emphasize only needing the specify the prefix, why don't you do that in the example? That would just be the city values, right?

We do that for the driver's license. Which is coincidentally why I chose to create the new table (to make a primary key that is a string that can be prefixed, unlike the UUID id)


v19.2/unsplit-at.md, line 114 at r1 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Still think we should add a link to the range merges doc.

Done. Sorry - I don't know why I skipped over this comment!


v19.2/unsplit-at.md, line 181 at r4 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Still think we should add a link to the range merges doc.

Done.


v19.2/unsplit-at.md, line 52 at r5 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Maybe make split a link to split-at.html?

Done.


v19.2/unsplit-at.md, line 116 at r5 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Maybe make split a link to split-at.html.

Done.

Copy link
Contributor Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @andreimatei, @jseldess, @nvanbenschoten, and @rmloveland)


v19.2/split-at.md, line 106 at r5 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

It's true that most, if not all, movr tables have compound keys. Those compound keys are on city (simple and intuitive string) and id (random UUID). A city is an easy value to split at, since there are discrete values that you can map to actual cities. id, however, is a UUID, so it's a much less intuitive value to split at. You need to first print out all of the ID's, sorted alphabetically or numerically, then you need to figure out what kind of distribution you have on them per city, so you can split each city into separate ranges based on those groups of ids. Also, (and probably most importantly) you can't use a prefix when you specify the split value for a UUID because the value type in your SPLIT ... AT <value> needs to match the value type of the primary key. Anyways, it would be better to pick a different, more intuitive and categorizable value that can be prefixed to split at, but since you can't change the primary index of an existing table using an ALTER TABLE statement, you need to create a new table. So since we've already used the drivers table, I figured we'd make a new table with the primary key on city and something else a little more manageable like dl, which is unique and randomly generated in our example, but at least we know that it's a string (can be prefixed) and is at most 8 characters in length.

Apologies for such a lengthy explanation! We can also discuss more offline.

@nvanbenschoten

Is there a reason why we even have this section? Jesse and I were wondering when you would even want to manually split a table at all.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @andreimatei, @jseldess, @nvanbenschoten, and @rmloveland)


v19.2/split-at.md, line 106 at r5 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

@nvanbenschoten

Is there a reason why we even have this section? Jesse and I were wondering when you would even want to manually split a table at all.

You're asking why we have manual splits at all? Isn't that discussed in the "Why manually split a range?" section? Manually splitting is certainly much less important these days than it used to be, now that we have load based splitting to automatically split hotspots based on load, but it's still an option we provide. Maybe I'm missing your question though.

With respect to the UUID discussion: UUIDs are actually pretty easy to split because we know their distribution - it's exactly uniformly distributed. So splitting at the midway point is something like 80000000-00000000-00000000-00000000.

@jseldess
Copy link
Contributor


v19.2/split-at.md, line 106 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

You're asking why we have manual splits at all? Isn't that discussed in the "Why manually split a range?" section? Manually splitting is certainly much less important these days than it used to be, now that we have load based splitting to automatically split hotspots based on load, but it's still an option we provide. Maybe I'm missing your question though.

With respect to the UUID discussion: UUIDs are actually pretty easy to split because we know their distribution - it's exactly uniformly distributed. So splitting at the midway point is something like 80000000-00000000-00000000-00000000.

Based on my conversation with Eric, I think the question is really: Why should we show users how to split based on the second value in a compound primary key. With partitioning, we partition (auto-split) based on the first value, typically the geo-column. But I don't understand when a user would ever want, or need, to split on the second value. Do you have insights?

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @andreimatei, @jseldess, @nvanbenschoten, and @rmloveland)


v19.2/split-at.md, line 106 at r5 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

Based on my conversation with Eric, I think the question is really: Why should we show users how to split based on the second value in a compound primary key. With partitioning, we partition (auto-split) based on the first value, typically the geo-column. But I don't understand when a user would ever want, or need, to split on the second value. Do you have insights?

Well if you only have a few possible values for the first column in the primary key then you need to start splitting on the second part. For instance, imagine the first column in the primary key of a table was a BOOL. How would I split the table 10 ways to avoid contention?

Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @andreimatei, @ericharmeling, @jseldess, @nvanbenschoten, and @rmloveland)


v19.2/split-at.md, line 106 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Well if you only have a few possible values for the first column in the primary key then you need to start splitting on the second part. For instance, imagine the first column in the primary key of a table was a BOOL. How would I split the table 10 ways to avoid contention?

Ah, I see. Thanks for the explanation.

@ericharmling, I think this is fine.

@ericharmeling
Copy link
Contributor Author

Thanks @nvanbenschoten !

You're asking why we have manual splits at all? Isn't that discussed in the "Why manually split a range?" section? Manually splitting is certainly much less important these days than it used to be, now that we have load based splitting to automatically split hotspots based on load, but it's still an option we provide. Maybe I'm missing your question though.

I meant "in the context of partitioning", in the section "Split a table with a composite primary key" - totally left that out of my question though. To resolve that, I'm just going to remove "(e.g., when working with partitions)" from that section.

- WITH EXPIRATION added to SPLIT AT (new diagrams, new parameter, new example)
@ericharmeling ericharmeling merged commit 1fd88db into master Sep 26, 2019
@ericharmeling ericharmeling deleted the unsplit-at branch September 26, 2019 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants