importccl: prototype support for IMPORT INTO #37451
Conversation
What's here looks good, but I'm a bit confused by how partial it is. I don't recall the importStmt.Into field existing. Is that new in this work, or did it get added a while ago and I forgot? Should there be tests here? If you are going to do this in small pieces where it won't actually work until the end, then LGTM and merge away.
importing.Version++
if err := p.ExecCfg().DB.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
	return errors.Wrap(
		txn.CPut(ctx, sqlbase.MakeDescMetadataKey(found.TableDescriptor.ID),
how does this interact with other schema changes running on the table?
The CPut of the whole descriptor, from its before-value to its after-value, should avoid any concurrent writes, but beyond that we're not yet doing anything to actively lock out other schema changes. We'll need to.
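The compare-and-swap pattern described here can be sketched with a toy in-memory store. The types and names below (`Desc`, `Store`, `takeOffline`) are hypothetical stand-ins for the real `sqlbase` and `client.Txn` APIs, just to illustrate how passing the "before" descriptor as the expected value makes a concurrent write fail loudly instead of being clobbered:

```go
package main

import (
	"errors"
	"fmt"
)

// Toy stand-ins for the real descriptor types; names are hypothetical.
type TableState int

const (
	Public TableState = iota
	Importing
)

type Desc struct {
	Version int
	State   TableState
}

// Store is a toy in-memory KV map with CPut (compare-and-put) semantics,
// standing in for client.Txn.CPut: the write succeeds only if the current
// value still equals the expected ("before") value.
type Store struct{ m map[string]Desc }

var errCPutFailed = errors.New("cput: existing value does not match expected")

func (s *Store) CPut(key string, newVal, expVal Desc) error {
	if s.m[key] != expVal {
		return errCPutFailed
	}
	s.m[key] = newVal
	return nil
}

// takeOffline bumps the version and flips the state to Importing, passing
// the descriptor it read as the expected value, so any concurrent descriptor
// write causes a CPut failure instead of being silently overwritten.
func takeOffline(s *Store, key string) error {
	before := s.m[key]
	after := before
	after.Version++
	after.State = Importing
	return s.CPut(key, after, before)
}

func main() {
	s := &Store{m: map[string]Desc{"t": {Version: 1, State: Public}}}
	stale := s.m["t"] // a second writer reads the descriptor...
	if err := takeOffline(s, "t"); err != nil {
		panic(err)
	}
	// ...and its CPut against the now-stale expected value fails.
	fmt.Println(s.CPut("t", Desc{Version: 2, State: Public}, stale))
}
```

As the comment notes, this only protects the single descriptor write itself; it does not lock out other schema changes between writes.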
pkg/ccl/importccl/import_stmt.go
Outdated
if err != nil {
// Take the table offline for import.
importing := found.TableDescriptor
importing.State = sqlbase.TableDescriptor_IMPORTING
What does this do to the table? Users cannot read or write it during this time?
Yeah, similar to the ADD state: SQL code can't lease it, so no reads or writes.
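The gate described here, that only public descriptors can be leased, can be sketched as follows. This is a minimal hypothetical simplification; the real leasing logic lives in the SQL layer and the names below are not the actual API:

```go
package main

import "fmt"

// Hypothetical simplification of the descriptor states mentioned above.
type TableState int

const (
	Public TableState = iota
	Add
	Importing
)

type Desc struct {
	Name  string
	State TableState
}

// acquireLease sketches the check: SQL can only lease PUBLIC descriptors,
// so a table in ADD or IMPORTING state serves neither reads nor writes
// until it is moved back to public.
func acquireLease(d Desc) error {
	if d.State != Public {
		return fmt.Errorf("table %q is not public and cannot be leased", d.Name)
	}
	return nil
}

func main() {
	fmt.Println(acquireLease(Desc{Name: "t", State: Public}))
	fmt.Println(acquireLease(Desc{Name: "t", State: Importing}))
}
```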
6cb1ca6 to 8e0eef7
@mjibson certainly partial: for now, we're trying to get a functioning skeleton in, and then we'll fill in the gaps in standalone changes, but we want to get a prototype into the hands of product or design partners in time to collect feedback early in development.

This adds a prototype of incremental IMPORT, allowing importing CSV data into an existing table, as opposed to only into a new table with the current IMPORT.

Unlike traditional IMPORT, which takes a specification of the table to create, this takes a reference to an existing table into which it will import data. Initially only CSV data, importing into a single table, is supported (the SQL dump files are typically dumps of an entire table, so it seems less likely that we need to support them here for now).

Since the actual bulk ingestion is done via non-transactional AddSSTable commands, the table must be taken offline during ingestion. The IMPORT job begins by schema-changing the table to an offline 'IMPORTING' state that should prevent leasing it, and moves it back to public when it finishes (on success or failure, unlike a newly created table, which is usually rolled back via a drop on failure).

Release note: none.
bors r+
37451: importccl: prototype support for IMPORT INTO r=dt a=dt

This adds a prototype of incremental IMPORT, allowing importing CSV data into an existing table, as opposed to only into a new table with the current IMPORT.

Unlike traditional IMPORT, which takes a specification of the table to create, this takes a reference to an existing table into which it will import data. Initially only CSV data, importing into a single table, is supported (the SQL dump files are typically dumps of an entire table, so it seems less likely that we need to support them here for now).

Since the actual bulk ingestion is done via non-transactional AddSSTable commands, the table must be taken offline during ingestion. The IMPORT job begins by schema-changing the table to an offline 'IMPORTING' state that should prevent leasing it, and moves it back to public when it finishes (on success or failure, unlike a newly created table, which is usually rolled back via a drop on failure).

This is a prototype, and as such has many unfinished pieces (some of which are captured as TODOs). Given the number of unresolved UX questions around this feature, we wanted to start playing with something concrete to help guide the feature specification and further development.

A few of the more significant areas to be resolved are: how the hook uses transactions for table creation, resolution, and alteration versus the job creation; resolving, checking, and plumbing specified subsets of the columns to IMPORT into; handling rollbacks of partial ingestion; updating job status messages; and a careful audit of everywhere that table descriptors are acquired, to ensure the IMPORTING state is handled correctly. That said, these are mostly separable issues that make better standalone follow-up changes (and can be done in parallel).

Release note: none.

Co-authored-by: David Taylor <[email protected]>
Build succeeded