-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[8.x] Fix issues with dumping PostgreSQL databases that contain multiple schemata #36046
Conversation
The PostgreSQL search_path logic that this commit removes was added with the intention of enabling support for a schema named anything other than "public". While the theory was sound, the implementation didn't take into account the behavior in databases in which *multiple* schemas exist. In multi-schema databases, the list of tables for which data should not be dumped was incorrect, leading to unexpected behavior. This revised approach takes advantage of PostgreSQL's support for pattern-based object references when specifying the list of tables for which data should not be dumped, and eliminates the need to perform complex search_path parsing altogether. The attendant Pull Request documentation explains how this technique works in detail.
@cbj4074 fyi: this PR won't be reviewed until it's out of draft. |
Regarding the other half of the problem (restoring from the dumped file), please see: #36054 |
laravel#36054 was fixed in a previous commit, but appears to have been lost during subsequent edits in laravel@7be50a5 . This commit restores the changes made in laravel@502e75b . Fixes laravel#36054
While not required in a psql interactive terminal, this pattern requires outer double-quotes to function as intended when passed as a CLI argument. While simple in this specific instance, pattern quoting can grow complicated (depending on the pattern), but is well explained in the PostgreSQL manual: https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-PATTERNS
Now that the exceptions noted in #36054 are resolved, I would add only the following important points of note on this subject:
select concat(schemaname, '.', tablename) as tablename from pg_catalog.pg_tables where schemaname in ('homestead','public'); It's entirely possible that this would have no unwanted side-effects while solving many such problems of ambiguity; we'd just have to look at it in detail. For @taylorotwell or anyone else who may be interested in playing with the various scenarios, the simplest way to test this "manually" is to log into the PostgreSQL instance via CLI, e.g.,
This will give you a sense of how the The last two commands, with Lastly, I suspect that we may still have some work to do when PostgreSQL extensions, such as P.S. @GrahamCampbell Thanks! I never knew that "schemata" is perhaps a more scholarly, though not less accepted nor correct (especially in this context), variant of "schemas". :) |
To preface this post, it is intentionally long-winded (even though the punchline is succinct), because it articulates the complexities and nuances of this and similar situations concerning PostgreSQL's search_path.
Note also that this PR arises out of a continuing discussion regarding how the search_path is handled in various contexts throughout Laravel: #35463 (comment) .
The
schema:dump
Artisan command is designed to dump structures only — not table data. To achieve this, a--exclude-table-data
argument is appended to thepg_dump
command for every table that$connection->getSchemaBuilder()->getAllTables()
returns, excepting only the$migrationTable
, as defined in theSchemaState
class and configured viaconfig('database.migrations')
.In the existing implementation, the underlying logic does not take into account the numerous different formats that the search_path can take (as defined in the connection's configuration properties). And while noble efforts were made to add support for Laravel installations using a non-default schema name (i.e., anything other than
public
), the table data dumping logic doesn't account for the possibility that the database has more than one schema, which causes tables to be dumped with data when they shouldn't be.In a multi-schema environment, and when using a search_path that contains more than one schema, the aforementioned
getAllTables()
method returns a query like this, wherehomestead
andpublic
are schemas listed in thesearch_path
that is set on the connection:Of particular note is that the
products
table is in thehomestead
schema, and the other tables are inpublic
. I mention this because it demonstrates why we cannot use only a single value from thesearch_path
to prefix these references, as is done currently. Given this specific example, doing so would cause all but one of the tables to have the wrong schema qualifier, which would cause their data not to be excluded from the dump.Because the
schemaname
column is not selected in the above query, we can't qualify these references fully when pairing them with--exclude-table-data
switches. However, this is not a problem becausepg_dump
's--exclude-table-data
switch accepts a pattern (see: https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-PATTERNS ), and so is not limited to a static table reference, and more importantly, PostgreSQL will traverse the search_path while building reference qualifiers in this context.What's this all mean? It means that the schema qualifier should actually be omitted entirely from the reference. So, the appended switches might look something like this:
Now, PostgreSQL will do the "heavy lifting" and traverse the
search_path
, checking each schema therein, in the order specified, if and until it finds the specified table.To demonstrate, PostgreSQL will, ultimately, build the following qualified references, which is exactly what we want:
But, what happens when we have tables with the same name in two different schemas, e.g.,
homestead.foo
andpublic.foo
? We'd end-up with two identical switches, like--exclude-table-data=foo --exclude-table-data=foo
, and table data will be excluded only for the table that PostgreSQL finds first when traversing thesearch_path
.PostgreSQL's pattern capabilities to the rescue! The simple solution to this problem is to use a wildcard schema for each table, e.g.,
--exclude-table-data="*.foo"
. This works because the only goal here is to keep the migration table data, but discard the data for every other table, and this accomplishes that objective.In other words, we can eliminate all the logic around the
search_path
, and offload the complexity to PostgreSQL using a wildcard.IMPORTANT: This PR fixes only the dumping behavior in a multi-schema database, and not the reloading behavior, which has never worked correctly in multi-schema databases, and, in my opinion, should be addressed in a separate PR.Actually, it fixes both issues, now that it includes a previous fix that appears to have been reverted accidentally.