Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: permit foreign key relationships without indexes on the source side #36859

Closed
jordanlewis opened this issue Apr 16, 2019 · 6 comments
Closed
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@jordanlewis
Copy link
Member

Currently, creating a foreign key between two tables requires that both tables have a "relevant" index that can be used to look values up precisely on both sides.

For example, consider a foreign key from a customers table to a zipcodes table, where a zipcode column in customers references the id table in the zipcodes table. Currently, there needs to be two indexes:

  1. on the id column of the zipcodes table, to permit fast checks for zipcode existence when adding a new customer
  2. on the zipcode column of the customers table, to permit fast checks for any customers that reference a particular zipcode when deleting a row from the zipcodes table.

The first requirement is reasonable, since most schemas with foreign keys commonly need to support inserts into the referencing table.

The second requirement is somewhat unreasonable, because it's very common for a schema to have a reference table that never is deleted from at all. In this case, like the zipcodes table in our example, requiring that "reverse" index on the customers table is a waste of storage and compute resources, since every insert into customers now also requires updating that other index.

This issue tracks permitting foreign key relationships that don't require an index on the source side. I expect that this work will be mostly driven by the movement of foreign key checks into the optimizer. Once the optimizer is the entity deciding which index to use for reverse lookups on deletes, then this issue can be closed, since the optimizer should be able to decide that a precise lookup is unavailable and therefore require falling back to a full table scan, which is the desirable outcome in this scenario.

@RaduBerinde
Copy link
Member

Note that (1) is also required because the foreign key needs foremost to be a key, which can only be guaranteed with a unique index.

There is an extra difficulty here that @knz brought up. We currently use the IndexDescriptors of these two indexes to store the FK relationships themselves (`.ForeignKey/.ReferencedBy); there would need to be non-trivial work to change that representation (along with suitable migration). This falls outside of the work to plan FK stuff in the optimizer. Maybe you want to file a separate issue for that.

@knz
Copy link
Contributor

knz commented Apr 16, 2019

The way that FK relationships are currently encoded in table descriptors is a dumpster fire misdesign and needs to be addressed, independently from the issue at hand.

There are two other problems caused by this dumpster fire misdesign:

  • it's impossible to use multiple indexes to support a single FK relation (useful for geo-partitioning)
  • it's impossible to use an index on only a prefix of the FK tuple

(For context, when I discussed the original design with relevant parties, we found out that it was the fruit of

a) not checking what standard sql had to say
b) not caring to check+understand what pg was doing
c) NIH bias
d) incorrect/incomplete understanding of what FKs represent

— so even when accounting for (or tolerating) human mistakes, which make point d totally a-OK, casual effort on points an and b would have averted the current situation)

@jordanlewis
Copy link
Member Author

@knz since it sounds like you have an opinion on a better design, would you care to share it?

To me, it seems like the correct representation would be simply to store the table and referenced columns of the foreign key. That way, the optimizer would have full freedom as to how to plan the lookup. Does that make sense?

@knz
Copy link
Contributor

knz commented Apr 16, 2019

So for the narrow view of "the semantic information defined by a FOREIGN KEY constraint definition in CREATE", yes there should be some object that merely stores a reference to the referenced table and its columns (not an index). It can also store the various other semantic constraint properties such as match type and cascading actions.

That object presumably would be stored inside TableDescriptor, not IndexDescriptor as done currently.

Also the columns should be referenced by column ID, not by column name. That's simple because (mostly due to a fortunate accident of design) column IDs are stable in cockroachdb.

The table should be referenced by table ID presumably, but then care should be taken because TRUNCATE made table IDs non-stable (btw I hate this "feature" of TRUNCATE, but that's out of scope here).

That would be about it for this "forward" FK constraint definition.


However: beyond that narrow view, casual inspection of structured.proto will reveal the existence of a protobuf type ForeignKeyReference that is (ab)used for several other things besides the FOREIGN KEY constraint definitions.

  • "referenced by" (the reverse relationship from a constraint).

    • this is really an optimization but an important one. It's used both when removing tables (unless CASCADE is given a drop is prevented if there's a reference) and also in cascading actions when a referenced table is mutated.
    • "referenced by" points to ... an index. This is nonsense. It should simply point to a table (ID?) instead. No need for index, column or constraint information in that field.
    • "referenced by" should also be at the level of table descriptors, not index descriptors.
  • "interleaved by" to indicate interleave children.

    • This field can remain in index descriptor (interleaving is per index), however...
    • ... its use of ForeignKeyReference is nonsensical. It only needs to point to a table+index, this has nothing to do with FKs.

@jordanlewis
Copy link
Member Author

I'm closing this as a duplicate of #37255, which has a proposal for how to upgrade the descriptors to avoid the problems we've discussed here already.

@nvanbenschoten
Copy link
Member

Now that #36854 is merged, each of the 4 indexes in TPC-C that are required because of this issue are clearly marked with the suffix "_fk_idx".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

5 participants