alter_table: Support adding columns to tables #30470
Conversation
I very much appreciate all the tests!

What happens if you call ALTER TABLE ADD COLUMN on a continual task, or on a table from a source/mv/...? What if you try to add a column with the name of the table? Can you keep adding columns, or do things get slower with the number of columns? Could add some tests checking for correct errors in the SLT.
Adapter code LGTM
```rust
.map(|id| self.get_entry_by_global_id(id))
.filter_map(|entry| entry.index().map(|index| index.on));
```
I think I might be missing something, what was the change here?
This was just some Rust lifetime shenanigans. `.index()` returns an `Option<&Index>`, but `.get_entry_by_global_id(...)` returns an owned type that only lives for the duration of the `.map(...)` call.
```rust
storage_collection_metadata: TableTransaction::new_with_uniqueness_fn(
    storage_collection_metadata,
    |a: &StorageCollectionMetadataValue, b| a.shard == b.shard,
)?,
```
Just confirming, now we can have multiple global IDs for the same object that all point to the same shard?
Exactly
```rust
let item_id = self
    .entry_by_global_id
    .get(id)
    .unwrap_or_else(|| panic!("catalog out of sync, missing id {id:?}"));
self.get_entry(item_id)
```

```rust
let entry = self.get_entry(item_id).clone();
```
It feels a little bad to clone the entry in this function. This used to be pretty cheap but now involves cloning potentially large expressions and create sql statements.
I totally agree, it's a bit tricky with Rust lifetimes and the `CatalogItem` trait, but I'll circle back and see if I can improve this. There might be a `Cow<...>`-like thing we can do.
```rust
impl From<RelationVersion> for SchemaId {
    fn from(value: RelationVersion) -> Self {
        SchemaId(usize::cast_from(value.0))
    }
}
```
I don't think I understand this, what's the correlation between a relation version and a schema version?
Right now `RelationVersion`s are 1:1 with `SchemaId`s. At some point we can break this relationship and store the mapping somewhere in the Catalog, but it's not necessary at the moment.
```rust
fn latest_version(&self) -> Option<RelationVersion> {
    self.entry.latest_version()
}
```
I'm surprised that this returns an `Option`, when would an entry ever not have a version?
An entry only has a version if it's version-able, i.e. only Tables will return `Some` here.
```rust
let is_versioned = c
    .options
    .iter()
    .any(|o| matches!(o.option, ColumnOption::Versioned { .. }));
!is_versioned
```
Why are we filtering here?
Added a comment, but it's because of how the `names` collection is used; I took a note for myself to refactor this entire block.
```rust
// Construct the handle in a separate block to ensure all error paths
// are diverging
let since_handle = {
    // If the collection we're opening is versioned, be sure to use a
    // different CriticalId so the SinceHandles don't conflict.
    let reader_id = match version {
```
It's still not clear to me that we're managing the lifecycle of this handle properly... for example, I think the finalization task will only force-downgrade the handle of the controller-global handle, not these per-version handles. (And at a first pass it's not clear to me how all N critical handles get updated when the write frontier advances...)
Still iterating a bit, but the most recent commit removes the need for multiple `CriticalSinceHandle`s. At a high level the implementation is there, but I pushed up the commits to run against CI.
* changes `Table` to use a `VersionedRelationDesc`
* adds `CatalogCollectionEntry` and moves the `desc(...)` method to it
* renamed `CatalogEntry::desc` to `CatalogEntry::desc_latest`
* implement sequencing in the adapter
* updates to the storage controller
* change `CriticalSinceHandle`s so we can have multiple per-shard
* add a good number of test cases to alter-table.slt
* delete duplicate table_alter.slt
* add a platform-check for Alter Table
* add a (disabled) parallel-workload case for Alter Table
* add a legacy upgrade test for Alter Table
* refactors how alter_table_desc is implemented so the storage-controller tracks a dependency between different versions of the tables
* removes the API for creating a CriticalId from a 'seed'
* update bootstrapping storage collections to properly order Tables
* fix upgrade tests by removing reference to old persist dyncfg
* change AstDisplay for ResolvedItemName to not print the version
* remove commented-out impl of previous sorting
* update legacy upgrade test
* a few of the conditions we were testing fail when --auto-index-selects was enabled
* when creating collections for bootstrap, re-order them like we do in storage-collections
* in the storage-controller report the correct dependencies for tables
* in the Coordinator register ReadPolicies
* in the storage collections install read holds with the existing collection's read frontier, not the implied capability
* in storage collections set the write frontier of the new collection to the write frontier of the existing collection
This PR implements the SQL feature `ALTER TABLE ... ADD COLUMN ...`.

Note: There are a lot of lines changed but the majority are new tests!

Specifically it:

* adds a `VersionedRelationDesc` on `Table`s to track new columns
* introduces `CatalogCollectionEntry`, which adds some typing around getting the current `RelationDesc` for an entry
* updates the `storage-controller` to create new Persist `WriteHandle`s and pass them to the `TxnsTableWorker`
* changes `CriticalSinceHandle` to open one per version of a table. This proved necessary to get the proper read handles for Mat Views on top of tables.

Otherwise it also adds several tests:

* `alter-table.slt`, which exercises a number of different scenarios
* a `Check` for the `platform-check`s test framework
* an `Action` for the `parallel-workload` test framework

Motivation
Fixes https://github.com/MaterializeInc/database-issues/issues/8233
Tips for reviewer
I split the PR up into separate commits to ideally make it easier to review; most of the changes here are new tests!
Checklist

* If this PR evolves an existing `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.