-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support dropping columns as a metadata-only operation #1076
Conversation
GitOrigin-RevId: 5bac741beced1ec847ebcc07d050234f82ece124
private def columnMappingAdviceMessage: String = { | ||
s""" | ||
|Please upgrade your Delta table to reader version 2 and writer version 5 | ||
| and change the column mapping mode to name mapping. You can use the following command: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name in quotes?
|
||
def columnRenameNotSupported: Throwable = { | ||
val adviceMsg = columnMappingAdviceMessage | ||
new AnalysisException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we planning to convert the exception to DeltaAnlysisException
?
} | ||
def foundInvalidCharsInColumnNames(cause: Throwable): Throwable = | ||
new DeltaAnalysisException( | ||
errorClass = "UNSUPPORTED_INVALID_CHARACTERS_IN_COLUMN_NAME", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to include the column name here to make it easy for the user to find the column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The column name shows up in the child exception, so i think that would be ok. it's a bit involved to surface it to the top level.
override def run(sparkSession: SparkSession): Seq[Row] = { | ||
if (!sparkSession.sessionState.conf.getConf( | ||
DeltaSQLConf.DELTA_ALTER_TABLE_DROP_COLUMN_ENABLED)) { | ||
val errorMessage = "Drop Column is not enabled in Delta." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop column is not enabled in current Spark session
? May be mention the SQL config to enable it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if we want to mention this flag right now. this flag is mainly for merging code without the support of physical drop.
val droppedColumnSet = columnsToDrop.map(UnresolvedAttribute(_).name).toSet | ||
val droppingPartitionCols = metadata.partitionColumns.exists(droppedColumnSet.contains(_)) | ||
if (droppingPartitionCols) { | ||
throw DeltaErrors.dropPartitionColumnNotSupported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include the partition column name in the error message?
if (sparkSession.sessionState.conf.getConf( | ||
DeltaSQLConf.DELTA_ALTER_TABLE_CHANGE_COLUMN_CHECK_EXPRESSIONS)) { | ||
columnsToDrop.foreach { columnParts => | ||
// need to check if the column to rename is referenced by check constraints | ||
val dependentConstraints = | ||
Constraints.findDependentConstraints(sparkSession, columnParts, newMetadata) | ||
if (dependentConstraints.nonEmpty) { | ||
throw DeltaErrors.foundViolatingConstraintsForColumnChange( | ||
"drop", UnresolvedAttribute(columnParts).name, dependentConstraints) | ||
} | ||
|
||
// need to check if the renaming the column would affect any generated columns | ||
val dependentGenCols = SchemaUtils.findDependentGeneratedColumns( | ||
sparkSession, columnParts, txn.protocol, newMetadata.schema) | ||
if (dependentGenCols.nonEmpty) { | ||
throw DeltaErrors.foundViolatingGeneratedColumnsForColumnChange( | ||
"drop", UnresolvedAttribute(columnParts).name, dependentGenCols) | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same code is repeated in RENAME with minimal difference. Convert it to a utility method, so that we don't miss updating one if we add a new check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call.
spark.sql(s"alter table t1 rename column map to `${colName("map")}`") | ||
|
||
// try drop map after rename to arbitrary chars now | ||
spark.sql(s"alter table t1 drop columns `${colName("map")}`") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verify it actually removed the column? Same below.
|
||
simpleNestedData.write.format("delta").mode("append").saveAsTable("t1") | ||
|
||
assertException("Cannot drop column a") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check for: Cannot drop column a due to dependency (details on why it can't be dropped)
?
class DeltaDropColumnSuite extends QueryTest with DeltaArbitraryColumnNameSuiteBase { | ||
|
||
override protected val sparkConf: SparkConf = | ||
super.sparkConf.set(DeltaSQLConf.DELTA_ALTER_TABLE_DROP_COLUMN_ENABLED.key, "true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test with DELTA_ALTER_TABLE_DROP_COLUMN_ENABLED
set to false and try dropping?
override protected val sparkConf: SparkConf = | ||
super.sparkConf.set(DeltaSQLConf.DELTA_ALTER_TABLE_DROP_COLUMN_ENABLED.key, "true") | ||
|
||
test("drop column") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many test cases within a single test. Is possible to split them into smaller and easy to understand tests?
GitOrigin-RevId: a40fd48ca89428254a1f728703417b9a6359a5fd
GitOrigin-RevId: 9829b0fc55bebbbc970faa0ebec5635057951de9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Support ALTER TABLE DROP COLUMN as a metadata only operation. This feature is not ready for release yet, so it's currently behind a feature flag. Closes delta-io#1076 Unit tests GitOrigin-RevId: 181d6dc38f5c074eb16710191795118e78fdf972
Support ALTER TABLE DROP COLUMN as a metadata only operation. This feature is not ready for release yet, so it's currently behind a feature flag. Closes delta-io#1076 Unit tests GitOrigin-RevId: 181d6dc38f5c074eb16710191795118e78fdf972
Description
This PR proposes to support ALTER TABLE DROP COLUMN as a metadata-only operation in Delta. No changes needed for the underlying Parquet files. This is only supported for Delta tables that enable column mapping. With column mapping, if a new column with the same name is added later, it won't be confused with the dropped column because the new column will have a different physical name than the one being dropped.
Note that there is no way to physically delete the dropped column from Parquet right now. This will be addressed as future work.
This PR partially addresses issue #1064.
How was this patch tested?
unit tests
Does this PR introduce any user-facing changes?
Add a new SQL command