-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add source versioning design doc #645
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all makes sense to me! Only took a quick skim though. Deferring the full review to the experts (@rjobanp and @morsapaes).
1. Deprecate the `table` field in existing source resources. | ||
2. Introduce the new `materialize_table_from_source` resource. | ||
3. Allow both old and new configurations to coexist during a transition period. | ||
|
||
This approach allows users to migrate their configurations gradually: | ||
|
||
- Existing sources can still be created and managed. | ||
- New tables (formerly subsources) will be created as separate `materialize_table_from_source` resources. | ||
- Users can migrate their configurations at their own pace by replacing `table` blocks with `materialize_table_from_source` resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really smart! Will you be able to mix-n-match table
and materialize_source_table
resources for the same source? Or will you always have to migrate an entire source at once?
My hunch is that we should require that you only use one or the other on a per source basis. That seems likely to make the code much simpler and easier to reason about. And it still affords the user quite a bit of flexibility to do the migration incrementally by migrating one source at a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you be able to mix-n-match table and materialize_source_table resources for the same source? Or will you always have to migrate an entire source at once?
Based on what I've tested so far using main, it looks like that you could have both running simultaneously. So if we were to take the following example:
- Existing standard Postgres source:
resource "materialize_source_postgres" "pg_source" {
name = "pg_source"
cluster_name = "quickstart"
expose_progress {
name = "expose_postgres_progress"
}
postgres_connection {
name = materialize_connection_postgres.postgres_connection.name
}
publication = "mz_source"
table {
upstream_name = "table1"
upstream_schema_name = "public"
}
table {
upstream_name = "table2"
upstream_schema_name = "public"
}
...
}
- Then define a new
materialize_source_table
:
resource "materialize_source_table" "source_table3" {
name = "source_table3"
schema_name = "public"
database_name = "materialize"
source {
name = materialize_source_postgres.pg_source.name
schema_name = "public"
database_name = "materialize"
}
upstream_name = "table2"
upstream_schema_name = "public"
text_columns = [
"updated_at"
]
}
So the user should still be able to slowly do the migration if needed as the table
blocks and the materialize_source_table
are independent, at least until we do the actual migration on the Materialize side, then in that case, users would just need to define all of their tables as materialize_source_table
and as long as we have all of the required information in the catalog tables, we should be able to update the state accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like that you could have both running simultaneously
Yes at the moment we allow both if you have the new enable_create_table_from_source
feature-flag enabled, however we could artificially enforce that you are using either the old model or the new model in the terraform provider on a per-source basis to make things simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, if it's going to be supported to have both simultaneously on the database side, then maybe it's actually simpler to exactly mirror that in Terraform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it would be simpler if we could have it as it is at the moment. Allowing us to have both simultaneously on the database side makes things easier for the Terraform implementation as well.
1. Deprecate the `table` field in existing source resources. | ||
2. Introduce the new `materialize_source_table` resource. | ||
3. Allow both old and new configurations to coexist during a transition period. | ||
|
||
This approach allows users to migrate their configurations gradually: | ||
|
||
- Existing sources can still be created and managed. | ||
- New tables (formerly subsources) will be created as separate `materialize_source_table` resources. | ||
- Users can migrate their configurations at their own pace by replacing `table` blocks with `materialize_source_table` resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes sense to me to for allowing users to opt-in to the migration on their own accord!
We've also discussed implementing a catalog migration on the MZ side to automatically migrate all existing sources/subsources over to the new model, and in that case we would alter the CREATE SOURCE
statement to remove the deprecated table
field options. If the user reruns terraform after that migration it would attempt to re-introduce the table
field unless the user was proactive in updating their terraform config in coordination with that migration. Do we think there'd be a way to detect this situation? Perhaps if the table
field is already empty and the enable_create_table_from_source
feature-flag is enabled we just ignore the terraform config for that option and emit a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in general these migrations are so hairy when they involve stuff happening inside Materialize and stuff happening with Terraform.
I don't know that there will be much value in teaching the Terraform configuration to ignore the table
field once the migration occurs. There's no guarantee that users will upgrade to a version of the Terraform provider that has the special handling for table
before we do the catalog migration. And the catalog migration will happen behind Terraform's back, which is pretty sketchy—someone who thought they had everything properly tracked in Terraform will suddenly find that is not the case.
I think our best bet here may be to support the two syntaxes indefinitely, but disable the old syntax for new sources as soon as we can pull it off. That will prevent the problem from getting any worse. Then, after a couple months, hopefully there are few enough sources remaining that use the old syntax that we can go customer by customer and source by source and figure out if they're using Terraform to manage the source or not. If they're using Terraform, I think we're better off having field eng work with them to get through the upgrade manually than trying to do anything automatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our best bet here may be to support the two syntaxes indefinitely, but disable the old syntax for new sources as soon as we can pull off.
This sounds very good to me!
I think we're better off having field eng work with them to get through the upgrade manually than trying to do anything automatic.
I will try to prepare a step by step migrations guide on how to switch from the old to the new syntax on the Terraform side.
I have a working PoC of this here: #647 |
Description: "Tables to be ingested from the source. This field is deprecated and will be removed in a future version.", | ||
Type: schema.TypeSet, | ||
Optional: true, | ||
Deprecated: "Use the new `materialize_source_table` resource instead.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should link to the migration guide in these deprecated notices, once it exists!
As per the Source Versioning / Tables from Sources changes: MaterializeInc/materialize#27907