Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostgreSQL schema:dump includes data #45782

Closed
robclancy opened this issue Jan 24, 2023 · 11 comments
Closed

PostgreSQL schema:dump includes data #45782

robclancy opened this issue Jan 24, 2023 · 11 comments

Comments

@robclancy
Copy link
Contributor

robclancy commented Jan 24, 2023

  • Laravel Version: 8.83.27 (but was looking on 9.x branch and see the issue there)
  • PHP Version: 8.0
  • Database Driver & Version: postgresql 12, pg_dump 12.13

Description:

The artisan schema:dump command for postgresql is dumping the data from other postgres schemas.

In the original pull request it had --schema-only but after refactors it is now using --exclude-table-data for each table on the selected connection. https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Schema/PostgresSchemaState.php#L24

If you have other schemas on the database the data for it won't be ignored. The original schema:dump works how I would expect it to work, where it does all database schemas without data (and my use case).
But I see there is a --database option so this fix shouldn't be to exclude all the extra tables but to restrict the pg_dump to only dump the schema for the current connection.

Steps To Reproduce:

Can add later if needed.

@driesvints
Copy link
Member

That indeed doesn't seem correct to me. Would be happy to get some help here with investigating at how we can prevent other connections from being dumped. I'm actually stumped it dumps the other schemas.

@github-actions
Copy link

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

@robclancy
Copy link
Contributor Author

robclancy commented Jan 24, 2023

It's because it uses pg_dump and specifies a database which will by default dump all schemas (a postgres "schema" being a separate things makes things confusing to talk about).

pg_dump has the following option...
-n, --schema=PATTERN dump the specified schema(s) only

So really the fix is just getting the schema from the config and then using that option. I don't have time to make a PR until a bit later if no one else does.

@eduance
Copy link
Contributor

eduance commented Jan 26, 2023

That indeed doesn't seem correct to me. Would be happy to get some help here with investigating at how we can prevent other connections from being dumped. I'm actually stumped it dumps the other schemas.

This is due to PostgreSQL and MySQL having different definitions of what schemas are.

For MySQL a schema is defined as:

Conceptually, a schema is a set of interrelated database objects, such as tables, table columns, data types, indexes and so on. In MySQL a schema is synonymous with a database: you can swap out the words schema and database.

For PostgreSQL a schema is defined as:

A schema is a namespace for SQL objects, which all reside in the same database. Each SQL object must reside in exactly one schema.

More generically, the term schema is used to mean all data descriptions.

A schema can be used to organize database objects, or even allow having one database identical tables for different schemas. If you have a database named 'laravel', with schema 'symfony' and schema 'laravel', this database can have two tables named users in a PostgreSQL database.

In a MySQL database, you would simply have a database named laravel and symfony.

PostgreSQL has a default public schema, but for consistency sake I would recommend exporting all schemas for one database, so the artisan dump command simply dumps out all tables.

What we would like to achieve is that we want to get all schemas, without any data, excluding the migrations table.

This would look like something like:

pg_dump [connection_options] -s -T 'migrations' [db_name]

The reason why -s rather than using @robclancy suggestion, is because -n would make us be concerned about choosing which schema we want to use: which in PostgreSQL is a namespace within a database, but our command suggests that we print out the entire database.

Another, more complicated way, would be that Laravel by default always uses the 'public' namespace, and we only print out whatever is in public, but this seems controversial.

@driesvints If you would like to test this, I found a pretty cool way of having some fun with it:

(1) Create a docker container for PostgreSQL https://hub.docker.com/_/postgres
(2) Use https://uibakery.io/sql-playground to create a postgres database and fill it with test data, I used their site
so I could also test use cases for when using an external host. I setup a booking sample website which had about 500 records, all defined
(3) pg_dump -s -T 'appartments' -h host.postgres.database.azure.com -U user@dbname > laravel.sql

Most tables were namespaced under public, but it allowed me to eliminate the appartment table and still get all object definitions within all schemes.

I'll make a PR referring to this issue.

@robclancy
Copy link
Contributor Author

robclancy commented Jan 26, 2023

schema is a database option, so you wouldn't have any issues with specifying which schema to use. When I originally wrote out the issue I suggested that it just have -s added but then I reviewed the code more and changed to the -n suggestion because of how it currently works.

The original PR did exactly as you describe and for my use case that would work better. But if being more strict to the laravel connections I could also dump all my schemas by just doing multiple dumps with each connection (it's an option to choose the connection in the command).

I don't even know why it was refactored like it was in the first place.

EDIT: I looked back through the changes and it was refactored because of a change to using the custom dump format instead of plain sql. Originally they just put queries for the migration data at the end of the sql.

For this issue there is only one quick fix, and that's to select a specific schema. To make it work for all schemas instead it would need to go back to sql to append the migration data. There is probably another way to be able to keep using the custom dump format.

@eduance
Copy link
Contributor

eduance commented Jan 26, 2023

@robclancy I replied to your comment in the PR:
#45805 (comment)

@eduance
Copy link
Contributor

eduance commented Jan 26, 2023

@robclancy What did you by the way mean with schema being a database option? I wasn't able to find any references to a collection of schemes in the code for PostgreSQL. (A database could have multiple schemes, which we shouldn't confuse with the more general definition of a scheme; Postgres and I think Oracle aswell have separate definitions of a scheme opposed to MySQL).

@robclancy
Copy link
Contributor Author

robclancy commented Jan 26, 2023

It's not really documented or in the skeleton app (it probably should be), but it works like you would expect.

if (isset($config['search_path']) || isset($config['schema'])) {

And I assume when not specified it just defaults to public.

I use multiple connections to setup multiple schemas.

@robclancy
Copy link
Contributor Author

robclancy commented Jan 26, 2023

Oh they do have it, but with search_path, which they probably added because schema is confusing.
https://github.com/laravel/laravel/blob/9.x/config/database.php#L77

EDIT: it changed here laravel/laravel#5726. I never knew it supported multiple schemas per connection.

@eduance
Copy link
Contributor

eduance commented Jan 26, 2023

PR is ready for this issue. @driesvints / @taylorotwell could you have a look? This is something that you guys worked on back in the day.

#45805

@driesvints
Copy link
Member

Will be fixed in the next release. Thanks all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants