sql: permit foreign key relationships without indexes on the source side #36859

jordanlewis · 2019-04-16T02:52:10Z

Currently, creating a foreign key between two tables requires that both tables have a "relevant" index that can be used to look values up precisely on both sides.

For example, consider a foreign key from a customers table to a zipcodes table, where a zipcode column in customers references the id table in the zipcodes table. Currently, there needs to be two indexes:

on the id column of the zipcodes table, to permit fast checks for zipcode existence when adding a new customer
on the zipcode column of the customers table, to permit fast checks for any customers that reference a particular zipcode when deleting a row from the zipcodes table.

The first requirement is reasonable, since most schemas with foreign keys commonly need to support inserts into the referencing table.

The second requirement is somewhat unreasonable, because it's very common for a schema to have a reference table that never is deleted from at all. In this case, like the zipcodes table in our example, requiring that "reverse" index on the customers table is a waste of storage and compute resources, since every insert into customers now also requires updating that other index.

This issue tracks permitting foreign key relationships that don't require an index on the source side. I expect that this work will be mostly driven by the movement of foreign key checks into the optimizer. Once the optimizer is the entity deciding which index to use for reverse lookups on deletes, then this issue can be closed, since the optimizer should be able to decide that a precise lookup is unavailable and therefore require falling back to a full table scan, which is the desirable outcome in this scenario.

The text was updated successfully, but these errors were encountered:

RaduBerinde · 2019-04-16T13:50:52Z

Note that (1) is also required because the foreign key needs foremost to be a key, which can only be guaranteed with a unique index.

There is an extra difficulty here that @knz brought up. We currently use the IndexDescriptors of these two indexes to store the FK relationships themselves (`.ForeignKey/.ReferencedBy); there would need to be non-trivial work to change that representation (along with suitable migration). This falls outside of the work to plan FK stuff in the optimizer. Maybe you want to file a separate issue for that.

knz · 2019-04-16T14:01:01Z

The way that FK relationships are currently encoded in table descriptors is a ~~dumpster fire~~ misdesign and needs to be addressed, independently from the issue at hand.

There are two other problems caused by this ~~dumpster fire~~ misdesign:

it's impossible to use multiple indexes to support a single FK relation (useful for geo-partitioning)
it's impossible to use an index on only a prefix of the FK tuple

(For context, when I discussed the original design with relevant parties, we found out that it was the fruit of

a) not checking what standard sql had to say
b) not caring to check+understand what pg was doing
c) NIH bias
d) incorrect/incomplete understanding of what FKs represent

— so even when accounting for (or tolerating) human mistakes, which make point d totally a-OK, casual effort on points an and b would have averted the current situation)

jordanlewis · 2019-04-16T17:00:56Z

@knz since it sounds like you have an opinion on a better design, would you care to share it?

To me, it seems like the correct representation would be simply to store the table and referenced columns of the foreign key. That way, the optimizer would have full freedom as to how to plan the lookup. Does that make sense?

knz · 2019-04-16T18:54:20Z

So for the narrow view of "the semantic information defined by a FOREIGN KEY constraint definition in CREATE", yes there should be some object that merely stores a reference to the referenced table and its columns (not an index). It can also store the various other semantic constraint properties such as match type and cascading actions.

That object presumably would be stored inside TableDescriptor, not IndexDescriptor as done currently.

Also the columns should be referenced by column ID, not by column name. That's simple because (mostly due to a fortunate accident of design) column IDs are stable in cockroachdb.

The table should be referenced by table ID presumably, but then care should be taken because TRUNCATE made table IDs non-stable (btw I hate this "feature" of TRUNCATE, but that's out of scope here).

That would be about it for this "forward" FK constraint definition.

However: beyond that narrow view, casual inspection of structured.proto will reveal the existence of a protobuf type ForeignKeyReference that is (ab)used for several other things besides the FOREIGN KEY constraint definitions.

"referenced by" (the reverse relationship from a constraint).
- this is really an optimization but an important one. It's used both when removing tables (unless CASCADE is given a drop is prevented if there's a reference) and also in cascading actions when a referenced table is mutated.
- "referenced by" points to ... an index. This is nonsense. It should simply point to a table (ID?) instead. No need for index, column or constraint information in that field.
- "referenced by" should also be at the level of table descriptors, not index descriptors.
"interleaved by" to indicate interleave children.
- This field can remain in index descriptor (interleaving is per index), however...
- ... its use of ForeignKeyReference is nonsensical. It only needs to point to a table+index, this has nothing to do with FKs.

jordanlewis · 2019-05-01T23:46:55Z

I'm closing this as a duplicate of #37255, which has a proposal for how to upgrade the descriptors to avoid the problems we've discussed here already.

nvanbenschoten · 2019-05-20T21:11:21Z

Now that #36854 is merged, each of the 4 indexes in TPC-C that are required because of this issue are clearly marked with the suffix "_fk_idx".

jordanlewis assigned RaduBerinde Apr 16, 2019

rmloveland mentioned this issue Apr 17, 2019

Note that some ORMs like SQLAlchemy rely on foreign keys to do their join magic cockroachdb/docs#4669

Closed

nstewart mentioned this issue Apr 17, 2019

Remove FKs from reference tables cockroachdb/movr#83

Merged

awoods187 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 22, 2019

jordanlewis mentioned this issue May 1, 2019

sql: foreign key relationships should encode columns, not indexes #37255

Closed

jordanlewis closed this as completed May 1, 2019

jseldess mentioned this issue Jan 5, 2020

Improve FK Performance and Compatibility cockroachdb/docs#5967

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: permit foreign key relationships without indexes on the source side #36859

sql: permit foreign key relationships without indexes on the source side #36859

jordanlewis commented Apr 16, 2019

RaduBerinde commented Apr 16, 2019

knz commented Apr 16, 2019 •

edited

Loading

jordanlewis commented Apr 16, 2019

knz commented Apr 16, 2019

jordanlewis commented May 1, 2019

nvanbenschoten commented May 20, 2019

sql: permit foreign key relationships without indexes on the source side #36859

sql: permit foreign key relationships without indexes on the source side #36859

Comments

jordanlewis commented Apr 16, 2019

RaduBerinde commented Apr 16, 2019

knz commented Apr 16, 2019 • edited Loading

jordanlewis commented Apr 16, 2019

knz commented Apr 16, 2019

jordanlewis commented May 1, 2019

nvanbenschoten commented May 20, 2019

knz commented Apr 16, 2019 •

edited

Loading