Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clean up various non-unique indexes #3530

Merged
merged 4 commits into from
Jul 10, 2023
Merged

clean up various non-unique indexes #3530

merged 4 commits into from
Jul 10, 2023

Conversation

davepacheco
Copy link
Collaborator

@davepacheco davepacheco commented Jul 7, 2023

This addresses part (but not all) of #1497 and #2124.

This could use some careful review.

A common pattern here is that we had a non-unique index on some table T property P, like table instance property active_sled_id. I think people usually added these because they wanted to be able to find the rows having a certain value for that property (e.g., Instances on a given Sled). I believe this only works if you don't plan to paginate that query. If you do, you need a pagination key, and that needs to be in the index, too, or else Cockroach can't use the index to serve the paginated query. That's why the index on instance is now on (active_sled_id, id). With the id there, this can be a UNIQUE index.

Comment on lines +110 to +113
CREATE UNIQUE INDEX ON omicron.public.sled (
rack_id,
id
) WHERE
time_deleted IS NULL;
) WHERE time_deleted IS NULL;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the cases mentioned in the PR description. If we want to paginate the sleds in a rack, we need the sled id in this index, and then it can also be unique.

rack_id
) WHERE time_deleted IS NULL;

CREATE INDEX ON omicron.public.sled (
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This index is superfluous because CockroachDB always has an index on the primary key. (Note that having an index on "rack_id" and one on "id" is not the same as having one on (rack_id, id).)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This index also includes a filter predicate time_deleted IS NULL which makes it a partial index no? Perhaps not super relevant for this particular table but in the limit there could be more old/deleted records that a partial index could speed up access for only live records.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out. I had missed that. Are you saying that even if CockroachDB already indexes on "id", it might be beneficial to have an index on "id" limited only to live rows because querying it could be faster? Yeah, tThat's definitely possible. I think we probably want to wait and see if that's true, though, since it's somewhat speculative and has downsides (i.e., the extra disk space and memory usage and cost to update it on writes).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have a doc about it that says as much, but as you say, the benefit is an empirical question. They frame the advantages in contrast with a full index, as opposed to no index/one you're stuck with regardless.

@@ -152,8 +148,9 @@ CREATE TABLE omicron.public.sled_resource (
);

-- Allow looking up all resources which reside on a sled
CREATE INDEX ON omicron.public.sled_resource (
sled_id
CREATE UNIQUE INDEX ON omicron.public.sled_resource (
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the cases mentioned in the PR description. If we want tp aginate the sled_resources on a sled, we need the "id" here, too.

) WHERE time_deleted IS NULL;

CREATE INDEX ON omicron.public.switch (
CREATE UNIQUE INDEX ON omicron.public.switch (
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another of the cases mentioned in the PR description. This one's arguably unnecessary since we don't expect to have many switches in a rack.

rack_id
) WHERE time_deleted IS NULL;

CREATE INDEX ON omicron.public.switch (
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another superfluous index because the primary key always has an index.

Comment on lines +674 to +676
CREATE UNIQUE INDEX ON omicron.public.identity_provider (
silo_id,
name
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the few functional changes in this PR. This index was not unique and I think that was a bug. See this comment.

I also flipped the order of fields in order to use this index to paginate the IdPs in a Silo by name.

Comment on lines +708 to +712
CREATE UNIQUE INDEX ON omicron.public.saml_identity_provider (
silo_id,
id
) WHERE
time_deleted IS NULL;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above -- needed to flip the order of these fields.

Comment on lines +714 to +716
CREATE UNIQUE INDEX ON omicron.public.saml_identity_provider (
silo_id,
name
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to ensure that SamlIdentityProvider names are unique within a Silo.

@@ -861,8 +865,9 @@ CREATE UNIQUE INDEX ON omicron.public.instance (

-- Allow looking up instances by server. This is particularly
-- useful for resource accounting within a sled.
CREATE INDEX ON omicron.public.instance (
active_sled_id
CREATE UNIQUE INDEX ON omicron.public.instance (
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the cases mentioned in the PR description. To paginate the instances on a sled, the id needs to be here, and then this can be a UNIQUE index.

@@ -976,8 +981,9 @@ CREATE UNIQUE INDEX ON omicron.public.disk (
) WHERE
time_deleted IS NULL;

CREATE INDEX ON omicron.public.disk (
attach_instance_id
CREATE UNIQUE INDEX ON omicron.public.disk (
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the cases mentioned in the PR description. To paginate the disks attached to an instance, the id needs to be here, and then this can be a UNIQUE index.

@davepacheco davepacheco marked this pull request as ready for review July 7, 2023 22:46
@davepacheco davepacheco requested a review from smklein July 7, 2023 22:46
Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the improvements!

@davepacheco davepacheco merged commit 8035a1e into main Jul 10, 2023
@davepacheco davepacheco deleted the dap/index-fixes branch July 10, 2023 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants