
import: no job or progress during setup when importing pgdump/mysqldump #48598

Closed
dt opened this issue May 8, 2020 · 1 comment · Fixed by #55511
dt commented May 8, 2020

When importing "bundle" formats like pg_dump or mysqldump, which include schema definitions and row data in a single file, we currently download and parse the file to extract the schema definitions during import planning, because we want to resolve or create all the tables we will import into before we create the IMPORT job.

However, when presented with something like a 300GB pg_dump file, this means the IMPORT statement spends a long time in planning before creating a job, and during those minutes (or hours?) the user has no indication of what is going on -- there is no job to inspect or on which to report progress, even though we're clearly doing bulk-y work.

At the very least, it'd be more user-friendly to move the fetching and parsing of schemas into a prepare step of the actual import job execution, rather than doing it in the planning phase. Ideally, though, we'd avoid the separate step, and its double download and parse of the file, entirely, and instead simply parse schema definitions as we go, in the same pass that processes the row data. Unfortunately, pg_dump emits index definitions after the row data, while one of the big advantages of IMPORT is that we can generate all the KVs for a row, including index KVs, in one pass, so it isn't clear what the "right" way to do this is. We could optimistically assume there are no indexes and import in one pass, and then, if/when we see indexes, either queue normal index creation and/or a second pass that just generates index KVs for those indexes?
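To make the last idea concrete, here is a minimal, hypothetical sketch (not CockroachDB's actual import code; all names are invented) of a single-pass processor that optimistically assumes no indexes, emits primary-index KVs as row data arrives, and queues any `CREATE INDEX` statements it encounters afterwards for a follow-up pass or a normal index-creation job:

```go
package main

import (
	"fmt"
	"strings"
)

// onePassImporter sketches the proposed approach: process dump statements
// strictly in file order, write primary KVs for row data immediately, and
// defer index definitions (which pg_dump emits after the row data) instead
// of stalling the whole import on them.
type onePassImporter struct {
	rowBatches      int      // COPY data batches whose primary KVs were emitted
	deferredIndexes []string // index definitions queued for a later pass
}

func (imp *onePassImporter) process(stmt string) {
	switch {
	case strings.HasPrefix(stmt, "CREATE INDEX"):
		// Too late to fold this into the row pass; queue it for either
		// normal index creation or a second KV-generation pass.
		imp.deferredIndexes = append(imp.deferredIndexes, stmt)
	case strings.HasPrefix(stmt, "COPY"):
		// Optimistic path: no indexes seen yet, so only primary-index
		// KVs are generated for these rows.
		imp.rowBatches++
	default:
		// CREATE TABLE etc.: resolve or create the table as we go,
		// inside the running job rather than during planning.
	}
}

func main() {
	// Statement order mirrors a pg_dump file: schema, data, then indexes.
	dump := []string{
		"CREATE TABLE t (k INT PRIMARY KEY, v INT)",
		"COPY t (k, v) FROM stdin",
		"CREATE INDEX t_v_idx ON t (v)",
	}
	imp := &onePassImporter{}
	for _, stmt := range dump {
		imp.process(stmt)
	}
	fmt.Printf("rowBatches=%d deferredIndexes=%d\n",
		imp.rowBatches, len(imp.deferredIndexes))
}
```

Because the processor never blocks on index definitions, all of this work can run inside the IMPORT job itself, where it can report progress, rather than in planning.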


blathers-crl bot commented May 8, 2020

Hi @dt, please add a C-ategory label to your issue. Check out the label system docs.

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.
