Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: column names are not updated in the foreign keys when renaming a column #42208

Closed
yuzefovich opened this issue Nov 6, 2019 · 18 comments
Closed
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. A-sql-pgcompat Semantic compatibility with PostgreSQL C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@yuzefovich
Copy link
Member

Our contractor working on PeeWee ORM ran into a bug. It seems to me that when we're renaming a column, not all places are updated correctly - in particular, foreign keys seem to be forgotten. Here is a repro:

CREATE TABLE "users" ("id" SERIAL NOT NULL PRIMARY KEY, "username" TEXT NOT NULL);
-- It seems somehow having two tables FK to users leads to the error.
CREATE TABLE "note" ("id" SERIAL NOT NULL PRIMARY KEY, "user_id" INTEGER NOT NULL, 
                     FOREIGN KEY ("user_id") REFERENCES "users" ("id"));
CREATE INDEX "note_user_id" ON "note" ("user_id");

-- We'll rename the user_id column here.
CREATE TABLE IF NOT EXISTS "tweet" ("id" SERIAL NOT NULL PRIMARY KEY, "user_id" INTEGER, "content" TEXT NOT NULL, 
                                    FOREIGN KEY ("user_id") REFERENCES "users" ("id"));
CREATE INDEX "tweet_user_id" ON "tweet" ("user_id");

ALTER TABLE "tweet" RENAME COLUMN "user_id" TO "new_user_id";

-- Now the constraints table is listing 2x for table tweet.
SELECT DISTINCT kcu.column_name, ccu.table_name, ccu.column_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON (tc.constraint_name = kcu.constraint_name AND
      tc.constraint_schema = kcu.constraint_schema)
JOIN information_schema.constraint_column_usage AS ccu
  ON (ccu.constraint_name = tc.constraint_name AND
      ccu.constraint_schema = tc.constraint_schema)
WHERE tc.constraint_type = 'FOREIGN KEY' AND
      tc.table_name = 'tweet' AND
      tc.table_schema = 'public';

Currently on master we return:

  column_name | table_name | column_name  
+-------------+------------+-------------+
  user_id     | users      | id           
  new_user_id | users      | id           
(2 rows)

whereas Postgres returns:

 column_name | table_name | column_name 
-------------+------------+-------------
 new_user_id | users      | id
(1 row)
@yuzefovich yuzefovich added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-schema-descriptors Relating to SQL table/db descriptor handling. labels Nov 6, 2019
@yuzefovich yuzefovich changed the title sql: column names are not updated in the foreign when during RENAME sql: column names are not updated in the foreign keys when renaming a column Nov 6, 2019
@jordanlewis
Copy link
Member

Lucy would you be able to take a first look at diagnosing this?

@thoszhang
Copy link
Contributor

On first glance, the problem seems likely to be with information_schema and not with the table descriptor, since we don't store column names for foreign keys in the table descriptor. I can reproduce the above results on master, but SHOW CONSTRAINTS (which only uses the table descriptor) displays the correct results:

root@localhost:26257/defaultdb> show constraints from tweet;
  table_name |   constraint_name    | constraint_type |                    details                     | validated
+------------+----------------------+-----------------+------------------------------------------------+-----------+
  tweet      | fk_user_id_ref_users | FOREIGN KEY     | FOREIGN KEY (new_user_id) REFERENCES users(id) |   true
  tweet      | primary              | PRIMARY KEY     | PRIMARY KEY (id ASC)                           |   true
(2 rows)

@jordanlewis does anything seem suspicious to you with information_schema? I'll keep looking into it.

@yuzefovich
Copy link
Member Author

I tried debugging it briefly, looked into how information_schema.key_column_usage was populated, and it seemed to me that the information coming from conInfo, err := table.GetConstraintInfoWithLookup(tableLookup.getTableByID) was incorrect.

@thoszhang
Copy link
Contributor

Incorrect in the sense that it returned two constraints where there should only have been one?

@yuzefovich
Copy link
Member Author

Incorrect in a sense that it contained old column name (but, to be honest, I didn't look into that deep enough, so it may be better to ignore my comments).

@jordanlewis jordanlewis added the A-sql-pgcompat Semantic compatibility with PostgreSQL label Nov 7, 2019
@otan
Copy link
Contributor

otan commented Dec 20, 2019

GetConstraintInfoWithLookup calls AllActiveAndInactiveForeignKeys (and inactive for AllActiveAndInactiveChecks as well) -- so ones being dropped/moved may still linger around from what I can see. is this desired behaviour?

edit: that wasn't it

@otan
Copy link
Contributor

otan commented Dec 20, 2019

okay, the reason is because postgres names their FKs constraints differently to how we name it.

Postgres names: note_user_id_fkey on note table, tweet_user_id_fkey on tweet table
Cockroach names: fk_user_id_ref_users on note table, fk_user_id_ref_users on tweet table.

We don't (and psql doesn't afaict) rename constraints on columns renames.

Since they both collide, this check will reference the note.user_id column, hence why we return two results for that query

I think the correct fix for this is to change the query (but we can also change names as well; they just won't backport well):

SELECT DISTINCT kcu.column_name, ccu.table_name, ccu.column_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON (tc.constraint_name = kcu.constraint_name AND
      tc.constraint_schema = kcu.constraint_schema)
JOIN information_schema.constraint_column_usage AS ccu
  ON (ccu.constraint_name = tc.constraint_name AND
      ccu.constraint_schema = tc.constraint_schema)
WHERE tc.constraint_type = 'FOREIGN KEY' AND
      tc.table_name = 'tweet' AND
      tc.table_schema = 'public'
-- just add this, or fix a join or something
AND kcu.table_name = 'tweet' AND kcu.table_schema = 'public';

@thoszhang
Copy link
Contributor

If I understand correctly, the problem is that the names of FKs referencing the same table are not unique, right? (This problem could come up even with user-provided constraint names.)

In that case, I think that fix should work. Thanks for looking into this.

@otan
Copy link
Contributor

otan commented Jan 6, 2020

Yep! Not sure if we should close :) but it seems like a case of the query being able to be more exact instead

@thoszhang
Copy link
Contributor

I originally thought that query was an internal one when I first looked at this, but now I'm realizing it's presumably in the ORM. Is updating the query actually viable? Even if we do that for this particular ORM, I'm worried we'll run into more compatibility problems if other tools are assuming that FK names are unique on the referencing side.

@otan
Copy link
Contributor

otan commented Jan 9, 2020

Maybe it is worth looking at changing constraint names in general to be more exact.

I think any ORM trying to discern FKs by name instead of referencing columns may run into issues though -- given that people can name constraints themselves as well. Not sure how many ORMs can assume this / whether we can tell them to use alternative methods though...

@thoszhang thoszhang removed their assignment Feb 26, 2021
@ajwerner
Copy link
Contributor

ajwerner commented Mar 2, 2021

@rafiss can you take a look at this and then work with PeeWee on seeing whether this is a problem and if so how we can fix it?

@rafiss
Copy link
Collaborator

rafiss commented Mar 2, 2021

@coleifer Hi! Would you be able to let us know if PeeWeeORM still is encountering an issue here? To summarize the discussion above, if there is still a problem, I will be able to help you come up with an alternate query to find foreign keys that will work better with CockroachDB.

@coleifer
Copy link

coleifer commented Mar 3, 2021

Hi @rafiss - I can look into this and see if the additions described in @otan's comment and seeing if that resolves the issue:

-- just add this, or fix a join or something
AND kcu.table_name = 'tweet' AND kcu.table_schema = 'public';

@coleifer
Copy link

coleifer commented Mar 3, 2021

@otan / @lucy-zhang - do you have a suggestion for a more correct way to introspect the foreign-key constraints for a given table, besides the additional join predicates?

@rafiss - the suggested change seems to be working well and the tests that were previously failing now pass.

@rafiss
Copy link
Collaborator

rafiss commented Mar 3, 2021

@coleifer You have a couple options for making this query.

  1. Do what you did in the test:
SELECT DISTINCT kcu.column_name, ccu.table_name, ccu.column_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON (tc.constraint_name = kcu.constraint_name AND
      tc.constraint_schema = kcu.constraint_schema)
JOIN information_schema.constraint_column_usage AS ccu
  ON (ccu.constraint_name = tc.constraint_name AND
      ccu.constraint_schema = tc.constraint_schema)
WHERE tc.constraint_type = 'FOREIGN KEY' AND
      tc.table_name = <TABLE NAME> AND
      tc.table_schema = <SCHEMA NAME> AND
      kcu.table_name = <TABLE NAME> AND
      kcu.table_schema = <SCHEMA NAME>;
  1. Add the table name to the first join condition:
SELECT DISTINCT kcu.column_name, ccu.table_name, ccu.column_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON (tc.constraint_name = kcu.constraint_name AND
      tc.constraint_schema = kcu.constraint_schema AND
      tc.table_name = kcu.table_name AND
      tc.table_schema = kcu.table_schema)
JOIN information_schema.constraint_column_usage AS ccu
  ON (ccu.constraint_name = tc.constraint_name AND
      ccu.constraint_schema = tc.constraint_schema)
WHERE tc.constraint_type = 'FOREIGN KEY' AND
      tc.table_name = <TABLE NAME> AND
      tc.table_schema = <SCHEMA NAME>;

As far as performance is concerned, it doesn't really matter which option you pick.

@coleifer
Copy link

coleifer commented Mar 3, 2021

Yes, I implemented the latter, moving the additional predicates into the join - but what I meant was: is there another way altogether of introspecting this that the cockroachdb devs would recommend?

Related commit for peewee: coleifer/peewee@d9a5839

@rafiss
Copy link
Collaborator

rafiss commented Mar 3, 2021

This way seems fine. There are several other ways to get the foreign keys of a table, but I don't see any reason to change what you have. You could take a look at what other ORMs do if you're curious. To be clear though, this issue is not related to renaming the column -- it's caused by the different choice that CockroachDB makes for the default constraint name as compared to PostgreSQL.

Because of that, you may want similar logic for PostgreSQL too, since it lets you rename or customize the names of constraints. For example, if you have this in this PostgreSQL, the original query in this ticket would return incorrect results.

rafiss@127:postgres> CREATE TABLE "users" ("id" SERIAL NOT NULL PRIMARY KEY, "username" TEXT NOT NULL);
CREATE TABLE

rafiss@127:postgres> CREATE TABLE "note" ("id" SERIAL NOT NULL PRIMARY KEY, "user_id" INTEGER NOT NULL);
CREATE TABLE

rafiss@127:postgres> ALTER TABLE "note" ADD CONSTRAINT "my_fk" FOREIGN KEY ("user_id") REFERENCES "users" ("id");
ALTER TABLE

rafiss@127:postgres> CREATE TABLE IF NOT EXISTS "tweet" ("id" SERIAL NOT NULL PRIMARY KEY, "user_id" INTEGER, "content" TEXT NOT NULL);
CREATE TABLE

rafiss@127:postgres> ALTER TABLE "tweet" ADD CONSTRAINT "my_fk" FOREIGN KEY ("user_id") REFERENCES "users" ("id");
ALTER TABLE

rafiss@127:postgres> ALTER TABLE "tweet" RENAME COLUMN "user_id" TO "new_user_id";
ALTER TABLE

rafiss@127:postgres> SELECT DISTINCT kcu.column_name, ccu.table_name, ccu.column_name
 FROM information_schema.table_constraints AS tc
 JOIN information_schema.key_column_usage AS kcu
   ON (tc.constraint_name = kcu.constraint_name AND
       tc.constraint_schema = kcu.constraint_schema)
 JOIN information_schema.constraint_column_usage AS ccu
   ON (ccu.constraint_name = tc.constraint_name AND
       ccu.constraint_schema = tc.constraint_schema)
 WHERE tc.constraint_type = 'FOREIGN KEY' AND
       tc.table_name = 'tweet' AND
       tc.table_schema = 'public';
+---------------+--------------+---------------+
| column_name   | table_name   | column_name   |
|---------------+--------------+---------------|
| new_user_id   | users        | id            |
| user_id       | users        | id            |
+---------------+--------------+---------------+

Thanks for making that patch! I'll go ahead and close this issue, but feel free to keep discussing if you have further questions.

@rafiss rafiss closed this as completed Mar 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. A-sql-pgcompat Semantic compatibility with PostgreSQL C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

No branches or pull requests

7 participants