Add Scala API for optimize #961
Changes from all commits: d318e0f, f79331b, d1e0b8e, 5d4836f, ff2c5e9
@@ -0,0 +1,71 @@
/*
 * Copyright (2021) The Delta Lake Project Authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package io.delta.tables

import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.delta.commands.OptimizeTableCommand
import org.apache.spark.sql.delta.util.AnalysisHelper
/**
 * Builder class for constructing an OPTIMIZE command and executing it.
 *
 * @param sparkSession SparkSession to use for execution
 * @param tableIdentifier Id of the table on which to execute the optimize
 * @since 1.3.0
 */
class DeltaOptimizeBuilder(
    sparkSession: SparkSession,
    tableIdentifier: String) extends AnalysisHelper {
  private var partitionFilter: Option[String] = None
  /**
   * Apply a partition filter on this optimize command builder to limit
   * the operation to selected partitions.
   * @param partitionFilter The partition filter to apply
   * @return [[DeltaOptimizeBuilder]] with partition filter applied
   */
  def partitionFilter(partitionFilter: String): DeltaOptimizeBuilder = {
    this.partitionFilter = Some(partitionFilter)
    this
  }

Review thread on this method:

- Hi @Kimahriman, had an offline conversation with @tdas. He mentioned one very good point about keeping the names the same as SQL. In SQL we have [...]. We follow this pattern in other APIs as well. For example, in Merge: SQL has [...]. Let me know if there are any concerns with the rename. I can make the change locally and put it into the merge queue.
- I have no problem with that, just let me know if you want me to change anything or if you'll handle it.
- Thanks @Kimahriman, it's OK, I can make the change.
  /**
   * Z-Order the data in the table using the given columns.
   * @param columns Zero or more columns to order the data
   *                using Z-Order curves
   * @return DataFrame containing the OPTIMIZE execution metrics
   */
  def executeZOrderBy(columns: String*): DataFrame = {
    throw new UnsupportedOperationException("Z ordering is not yet supported")
  }

Review thread on this method:

- Hey @Kimahriman, given that this API is not yet supported, I think it is better to remove it. No need to make any changes; I will remove this before I put it into the merge queue.
- Yeah, that's fine, I wasn't sure whether to include it or not.
  /**
   * Compact the small files in selected partitions.
   * @return DataFrame containing the OPTIMIZE execution metrics
   */
  def executeCompaction(): DataFrame = {
    val tableId: TableIdentifier = sparkSession
      .sessionState
      .sqlParser
      .parseTableIdentifier(tableIdentifier)
    val optimize = OptimizeTableCommand(None, Some(tableId), partitionFilter)
    toDataset(sparkSession, optimize)
  }
}
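For context, a minimal sketch of how this builder might be used from Scala. The `DeltaTable.forName(...).optimize()` entry point is assumed here for illustration; it is not part of this diff, and only `partitionFilter` and `executeCompaction` are shown in the code above.

```scala
// Hypothetical usage sketch; assumes an entry point such as
// DeltaTable.forName(spark, ...).optimize() that returns a
// DeltaOptimizeBuilder, which is not shown in this diff.
import io.delta.tables.DeltaTable

val metrics = DeltaTable
  .forName(spark, "events")                 // existing Delta table
  .optimize()                               // assumed entry point
  .partitionFilter("date = '2021-11-18'")   // limit to selected partitions
  .executeCompaction()                      // returns a DataFrame of metrics

metrics.show()
```

This mirrors the SQL form `OPTIMIZE events WHERE date = '2021-11-18'`, which is the parallel the review thread above discusses when weighing method names against SQL keywords.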