Scala API for restoring delta table #863

Contributor

@Maks-D Maks-D commented Dec 11, 2021

Adds the ability to restore a Delta table to an earlier version or timestamp.
Examples:

io.delta.tables.DeltaTable.forPath("/some_delta_path").restoreToVersion(1)
io.delta.tables.DeltaTable.forPath("/some_delta_path").restoreToTimestamp("2021-01-01 00:00:00.000")
io.delta.tables.DeltaTable.forPath("/some_delta_path").restoreToTimestamp("2021-01-01")

Fixes #632

Signed-off-by: Maksym Dovhal [email protected]

Tested locally using spark-shell

sbt package
spark-shell --jars ./core/target/scala-2.12/delta-core_2.12-1.1.0-SNAPSHOT.jar --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
spark.range(2).write.format("delta").mode("overwrite").save("/tmp/delta_restore_test")
spark.range(2,3).withColumnRenamed("id", "id_new").write.option("mergeSchema", "true").format("delta").mode("overwrite").save("/tmp/delta_restore_test")
io.delta.tables.DeltaTable.forPath("/tmp/delta_restore_test").restoreToVersion(0)
io.delta.tables.DeltaTable.forPath("/tmp/delta_restore_test").restoreToTimestamp("2021-12-18 16:40:14.54")
// The next day
io.delta.tables.DeltaTable.forPath("/tmp/delta_restore_test").restoreToVersion(0)
io.delta.tables.DeltaTable.forPath("/tmp/delta_restore_test").restoreToTimestamp("2021-12-19")
io.delta.tables.DeltaTable.forPath("/tmp/delta_restore_test").history().show(false)

Output:

+-------+-----------------------+------+--------+---------+------------------------------------------------------+----+--------+---------+-----------+--------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------------------------------------------+
|version|timestamp              |userId|userName|operation|operationParameters                                   |job |notebook|clusterId|readVersion|isolationLevel|isBlindAppend|operationMetrics                                                                                                                                             |userMetadata|engineInfo                                  |
+-------+-----------------------+------+--------+---------+------------------------------------------------------+----+--------+---------+-----------+--------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------------------------------------------+
|5      |2021-12-19 09:43:41.604|null  |null    |RESTORE  |{version -> null, timestamp -> 2021-12-19}            |null|null    |null     |4          |Serializable  |false        |{numRestoredFiles -> 2, removedFilesSize -> 1252, numRemovedFiles -> 3, restoredFilesSize -> 794, numOfFilesAfterRestore -> 2, tableSizeAfterRestore -> 794} |null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
|4      |2021-12-19 09:43:17.415|null  |null    |RESTORE  |{version -> 0, timestamp -> null}                     |null|null    |null     |3          |Serializable  |false        |{numRestoredFiles -> 3, removedFilesSize -> 794, numRemovedFiles -> 2, restoredFilesSize -> 1252, numOfFilesAfterRestore -> 3, tableSizeAfterRestore -> 1252}|null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
|3      |2021-12-18 16:42:14.083|null  |null    |RESTORE  |{version -> null, timestamp -> 2021-12-18 16:40:14.54}|null|null    |null     |2          |Serializable  |false        |{numRestoredFiles -> 2, removedFilesSize -> 1252, numRemovedFiles -> 3, restoredFilesSize -> 794, numOfFilesAfterRestore -> 2, tableSizeAfterRestore -> 794} |null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
|2      |2021-12-18 16:40:53.861|null  |null    |RESTORE  |{version -> 0, timestamp -> null}                     |null|null    |null     |1          |Serializable  |false        |{numRestoredFiles -> 3, removedFilesSize -> 794, numRemovedFiles -> 2, restoredFilesSize -> 1252, numOfFilesAfterRestore -> 3, tableSizeAfterRestore -> 1252}|null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
|1      |2021-12-18 16:40:14.54 |null  |null    |WRITE    |{mode -> Overwrite, partitionBy -> []}                |null|null    |null     |0          |Serializable  |false        |{numFiles -> 2, numOutputRows -> 1, numOutputBytes -> 794}                                                                                                   |null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
|0      |2021-12-18 16:40:08.045|null  |null    |WRITE    |{mode -> Overwrite, partitionBy -> []}                |null|null    |null     |null       |Serializable  |false        |{numFiles -> 3, numOutputRows -> 2, numOutputBytes -> 1252}                                                                                                  |null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
+-------+-----------------------+------+--------+---------+------------------------------------------------------+----+--------+---------+-----------+--------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------------------------------------------+
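The two timestamp-based restores in this history are consistent with resolving a timestamp to the latest version committed at or before it: "2021-12-18 16:40:14.54" lands on version 1, and "2021-12-19" (midnight) lands on version 3, whose file set matches version 1. Below is a minimal plain-Scala sketch of that lookup, assuming this resolution rule. `TimestampResolutionSketch`, `parseTimestamp`, and `resolveVersion` are hypothetical names and not part of the Delta API, and the real command may apply extra validation that is omitted here.

```scala
import java.time.{LocalDate, LocalDateTime}

object TimestampResolutionSketch {
  // (version, commit timestamp) pairs copied from the history output above,
  // limited to the versions that existed before the final restore (0 to 4).
  val history: Seq[(Long, LocalDateTime)] = Seq(
    0L -> LocalDateTime.parse("2021-12-18T16:40:08.045"),
    1L -> LocalDateTime.parse("2021-12-18T16:40:14.540"),
    2L -> LocalDateTime.parse("2021-12-18T16:40:53.861"),
    3L -> LocalDateTime.parse("2021-12-18T16:42:14.083"),
    4L -> LocalDateTime.parse("2021-12-19T09:43:17.415")
  )

  // Hypothetical helper: accepts "yyyy-MM-dd" or "yyyy-MM-dd HH:mm:ss[.fraction]".
  def parseTimestamp(s: String): LocalDateTime = {
    val trimmed = s.trim
    if (trimmed.length == 10) LocalDate.parse(trimmed).atStartOfDay()
    else LocalDateTime.parse(trimmed.replace(' ', 'T'))
  }

  // Assumed rule: pick the latest version committed at or before the timestamp.
  def resolveVersion(ts: String): Option[Long] = {
    val target = parseTimestamp(ts)
    history
      .filter { case (_, committed) => !committed.isAfter(target) }
      .map { case (version, _) => version }
      .reduceOption(_ max _)
  }
}
```

Under this rule, a timestamp earlier than the first commit resolves to no version at all, which the sketch models as `None`.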

Examples of transactions:
/tmp/delta_restore_test/_delta_log/00000000000000000000.json

{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","partitionValues":{},"size":296,"modificationTime":1639838406641,"dataChange":true}}
{"add":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"add":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"commitInfo":{"timestamp":1639838407868,"operation":"WRITE","operationParameters":{"mode":"Overwrite","partitionBy":"[]"},"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numFiles":"3","numOutputRows":"2","numOutputBytes":"1252"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}

/tmp/delta_restore_test/_delta_log/00000000000000000001.json

{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"id_new\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet","partitionValues":{},"size":304,"modificationTime":1639838414041,"dataChange":true}}
{"add":{"path":"part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet","partitionValues":{},"size":490,"modificationTime":1639838414041,"dataChange":true}}
{"remove":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","deletionTimestamp":1639838414511,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":296}}
{"remove":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","deletionTimestamp":1639838414511,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"remove":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","deletionTimestamp":1639838414511,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"commitInfo":{"timestamp":1639838414511,"operation":"WRITE","operationParameters":{"mode":"Overwrite","partitionBy":"[]"},"readVersion":0,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numFiles":"2","numOutputRows":"1","numOutputBytes":"794"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}

/tmp/delta_restore_test/_delta_log/00000000000000000002.json

{"commitInfo":{"timestamp":1639838436332,"operation":"RESTORE","operationParameters":{"version":0,"timestamp":null},"readVersion":1,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRestoredFiles":"3","removedFilesSize":"794","numRemovedFiles":"2","restoredFilesSize":"1252","numOfFilesAfterRestore":"3","tableSizeAfterRestore":"1252"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"add":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"add":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","partitionValues":{},"size":296,"modificationTime":1639838406641,"dataChange":true}}
{"remove":{"path":"part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet","deletionTimestamp":1639838435578,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":490}}
{"remove":{"path":"part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet","deletionTimestamp":1639838435586,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":304}}

/tmp/delta_restore_test/_delta_log/00000000000000000003.json

{"commitInfo":{"timestamp":1639838517073,"operation":"RESTORE","operationParameters":{"version":null,"timestamp":"2021-12-18 16:40:14.54"},"readVersion":2,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRestoredFiles":"2","removedFilesSize":"1252","numRemovedFiles":"3","restoredFilesSize":"794","numOfFilesAfterRestore":"2","tableSizeAfterRestore":"794"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"id_new\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet","partitionValues":{},"size":490,"modificationTime":1639838414041,"dataChange":true}}
{"add":{"path":"part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet","partitionValues":{},"size":304,"modificationTime":1639838414041,"dataChange":true}}
{"remove":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","deletionTimestamp":1639838516199,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"remove":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","deletionTimestamp":1639838516199,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"remove":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","deletionTimestamp":1639838516202,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":296}}

/tmp/delta_restore_test/_delta_log/00000000000000000004.json

{"commitInfo":{"timestamp":1639899780668,"operation":"RESTORE","operationParameters":{"version":0,"timestamp":null},"readVersion":3,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRestoredFiles":"3","removedFilesSize":"794","numRemovedFiles":"2","restoredFilesSize":"1252","numOfFilesAfterRestore":"3","tableSizeAfterRestore":"1252"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"add":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"add":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","partitionValues":{},"size":296,"modificationTime":1639838406641,"dataChange":true}}
{"remove":{"path":"part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet","deletionTimestamp":1639899779981,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":490}}
{"remove":{"path":"part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet","deletionTimestamp":1639899779981,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":304}}

/tmp/delta_restore_test/_delta_log/00000000000000000005.json

{"commitInfo":{"timestamp":1639899805962,"operation":"RESTORE","operationParameters":{"version":null,"timestamp":"2021-12-19"},"readVersion":4,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRestoredFiles":"2","removedFilesSize":"1252","numRemovedFiles":"3","restoredFilesSize":"794","numOfFilesAfterRestore":"2","tableSizeAfterRestore":"794"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"id_new\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet","partitionValues":{},"size":490,"modificationTime":1639838414041,"dataChange":true}}
{"add":{"path":"part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet","partitionValues":{},"size":304,"modificationTime":1639838414041,"dataChange":true}}
{"remove":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","deletionTimestamp":1639899805448,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"remove":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","deletionTimestamp":1639899805445,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"remove":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","deletionTimestamp":1639899805444,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":296}}
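The RESTORE commits above add back the files of the target version that are missing from the current version and remove the current files that the target version does not contain. The following plain-Scala sketch models that as two anti-joins on file path, using the paths and sizes from the logs for versions 0 and 1. `RestoreDiffSketch`, `DataFile`, `RestorePlan`, and `plan` are hypothetical names for illustration only; this is not the code of `RestoreTableCommand`.

```scala
object RestoreDiffSketch {
  // Simplified stand-in for an "add" action in the log: path and size only.
  final case class DataFile(path: String, sizeBytes: Long)

  // Live files at version 0 and version 1, copied from the logs above.
  val version0: Set[DataFile] = Set(
    DataFile("part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet", 296L),
    DataFile("part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet", 478L),
    DataFile("part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet", 478L)
  )
  val version1: Set[DataFile] = Set(
    DataFile("part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet", 304L),
    DataFile("part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet", 490L)
  )

  final case class RestorePlan(filesToAdd: Set[DataFile], filesToRemove: Set[DataFile]) {
    // Sum over a Seq, not a Set, so that equal sizes are not collapsed.
    def metrics: Map[String, Long] = Map(
      "numRestoredFiles"  -> filesToAdd.size.toLong,
      "restoredFilesSize" -> filesToAdd.toSeq.map(_.sizeBytes).sum,
      "numRemovedFiles"   -> filesToRemove.size.toLong,
      "removedFilesSize"  -> filesToRemove.toSeq.map(_.sizeBytes).sum
    )
  }

  // Anti-join on path in both directions.
  def plan(current: Set[DataFile], target: Set[DataFile]): RestorePlan = {
    val currentPaths = current.map(_.path)
    val targetPaths  = target.map(_.path)
    RestorePlan(
      filesToAdd    = target.filterNot(f => currentPaths.contains(f.path)),
      filesToRemove = current.filterNot(f => targetPaths.contains(f.path))
    )
  }
}
```

Calling `RestoreDiffSketch.plan(RestoreDiffSketch.version1, RestoreDiffSketch.version0).metrics` reproduces the four file-count and size metrics recorded for the restore to version 0 (3 files and 1252 bytes restored, 2 files and 794 bytes removed).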

@Maks-D Maks-D force-pushed the Maks-D/scala_api_of_delta_restore_operation branch from c1b08d4 to 2cfd1ee Compare December 11, 2021 13:54
@Maks-D Maks-D marked this pull request as draft December 11, 2021 20:55
@Maks-D Maks-D force-pushed the Maks-D/scala_api_of_delta_restore_operation branch from 2cfd1ee to fca9b7c Compare December 12, 2021 07:06
@Maks-D Maks-D marked this pull request as ready for review December 12, 2021 08:30
@scottsand-db scottsand-db added the acknowledged This issue has been read and acknowledged by Delta admins label Dec 14, 2021
@scottsand-db
Collaborator

Hi @Maks-D, thanks for this PR! We will take a look and get back to you.

@scottsand-db scottsand-db requested a review from tdas December 14, 2021 15:14
@tdas
Contributor

tdas commented Dec 14, 2021

This is fantastic! Thank you for building this! I am going to take a look at this soon.

@Maks-D
Contributor Author

Maks-D commented Dec 16, 2021

@tdas Thank you for the review! I will address the requested changes soon.

@tdas
Contributor

tdas commented Dec 16, 2021

@Maks-D thank you ... looking forward to them.

Maksym Dovhal added 2 commits December 19, 2021 09:54
@Maks-D Maks-D force-pushed the Maks-D/scala_api_of_delta_restore_operation branch from fca9b7c to afbf020 Compare December 19, 2021 07:57
@Maks-D
Contributor Author

Maks-D commented Dec 19, 2021

@tdas I've made the requested changes.
In addition, I refactored RestoreTableCommand.run so that it now uses txn.snapshot as the latest available snapshot.
The PR description has been updated to match the new API and the latest local tests.

"left_anti")
.as[AddFile]
.map(_.copy(dataChange = true))
.cache() // To avoid Dataset recompute for each partition of toLocalIterator()
Contributor

Why would toLocalIterator cause recomputation? It computes partitions one by one, but that should not cause any reprocessing of previously processed partitions.

Contributor Author

I agree that the comment was really confusing. I've updated it to make it more accurate and clear.

@tdas
Contributor

tdas commented Dec 20, 2021

Other than the cache part, this looks good.

@tdas
Contributor

tdas commented Dec 21, 2021

@Maks-D I still don't understand the need for caching filesToRemove. When partitions are computed one by one as toLocalIterator is consumed, the data produced by each partition will get cached but never reused, because each partition is read only once.

@Maks-D
Contributor Author

Maks-D commented Dec 22, 2021

@tdas
I can't say whether this cache makes sense in practice, because I don't have access to Delta tables with millions of files.
Moreover, to compute filesToRemove and filesToAdd we use already cached snapshots, so maybe we can skip caching altogether.

I want to explain this decision (which may be wrong or pointless):
Initially I found this note in toLocalIterator:

Note: this results in multiple Spark jobs, and if the input Dataset is the result of a wide transformation (e.g. join with different partitioners), to avoid recomputing the input Dataset should be cached first.

I decided to look deeper into the code and found that toLocalIterator calls org.apache.spark.rdd.RDD#toLocalIterator, and this RDD method triggers sc.runJob (which, as I understand it, is an action) for each partition. With a shuffle join (the worst case), we would need to redo the join to get the data for a specific partition, because each partition gets a separate action on a lazy RDD.
For a broadcast join (which I think will be the case for most restore operations) the cache makes no sense, because the data in each partition depends only on the incoming data, and we can get it without any additional computation.

Maybe my understanding is wrong and I am not taking some Spark optimisation into account :(.
If in real life Spark doesn't need to recompute the RDD for each partition, we definitely don't need the cache there and I will be happy to remove it. Also, in that case I can call filesToAdd.cache only if ignoreMissing is false, which would be one more optimisation.

@Maks-D
Contributor Author

Maks-D commented Jan 9, 2022

@tdas Could you please review my comment about caching, #863 (comment)? If my understanding of toLocalIterator is wrong and the cache is unnecessary, I can remove it.
I just want to finish this PR and merge the new functionality into OSS Delta.
Thank you!

@tdas
Contributor

tdas commented Jan 18, 2022

@Maks-D my apologies for the massive delay. Here are the updates from my side.

  • I have started the process of merging this PR. It should get merged by tomorrow. I don't want to block the PR over the cache() discussion; we can always update it.
  • Regarding the cache, here is what I think will happen. Each runJob started by toLocalIterator should reuse the shuffle output. That is, the first call is going to trigger the map side of the shuffle, which will produce all the map outputs on the local disks of the executors. The reduce side of the first runJob will, however, read only the map output parts needed to fulfill the partition(s) being computed (e.g., map output only for reducer 1). Subsequent runJobs will reuse the map output from the first runJob, but read different parts of it (e.g., map output for reducer 2). So the map side of the shuffle should not be re-triggered for every runJob, and every runJob should reuse the precomputed map output. So I don't think cache makes a huge performance improvement here. Rather, it carries a much larger risk of perf degradation due to memory spilling, memory cache invalidation, etc.

If you are good with this, I suggest you make a follow-up PR (after this one is merged) to:

  1. remove the cache
  2. add a Python API (which would complete the programmatic APIs)

What do you think?
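The reuse argument above can be illustrated with a toy model in plain Scala. This is not Spark code and makes no claim about Spark internals beyond what the comment describes: `ShuffleReuseSketch` and `ToyShuffle` are hypothetical names, the "map side" is modelled as a lazily computed grouping, and each "job" reads only the slice for one reducer partition.

```scala
object ShuffleReuseSketch {
  // Toy "shuffle": records are bucketed by value modulo the number of reducers.
  final class ToyShuffle(records: Seq[Int], numReducers: Int) {
    // Counts how many times the expensive map side actually runs.
    var mapSideRuns: Int = 0

    // lazy val: computed by the first job that needs it, then reused.
    private lazy val mapOutput: Map[Int, Seq[Int]] = {
      mapSideRuns += 1
      records.groupBy(r => r % numReducers)
    }

    // One "job" per reducer partition; it reads only its own slice.
    def runJob(partition: Int): Seq[Int] =
      mapOutput.getOrElse(partition, Seq.empty)

    // Partitions are fetched one at a time, as the iterator is consumed.
    def toLocalIterator: Iterator[Int] =
      (0 until numReducers).iterator.flatMap(p => runJob(p))
  }
}
```

In this model, consuming the whole iterator runs one job per partition but triggers the map side only once, which is the behaviour the comment expects from the real shuffle and the reason an extra cache() adds little.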

@Maks-D
Contributor Author

Maks-D commented Jan 18, 2022

@tdas Thank you for the detailed explanation!
I agree with your proposals 👍
I've already created #890 for the Python API, and I will remove the cache.

Maks-D pushed a commit to Maks-D/delta that referenced this pull request Jan 26, 2022
 * RestoreTableCommand moved to org.apache.spark.sql.delta.commands package
 * cache() of filesToRemove DataFrame removed (according to delta-io#863 (comment))
 * cache() of filesToAdd will be applied only if spark.sql.files.ignoreMissingFiles = false (default value)

Signed-off-by: Maksym Dovhal <[email protected]>
allisonport-db pushed a commit that referenced this pull request Feb 4, 2022
 * RestoreTableCommand moved to org.apache.spark.sql.delta.commands package
 * cache() of filesToRemove DataFrame removed (according to #863 (comment))
 * cache() of filesToAdd will be applied only if spark.sql.files.ignoreMissingFiles = false (default value)

Signed-off-by: Maksym Dovhal <[email protected]>

Closes #912

Signed-off-by: Venki Korukanti <[email protected]>
GitOrigin-RevId: b10707c96766f74423874f01898587f97c69c6b5
jbguerraz pushed a commit to jbguerraz/delta that referenced this pull request Jul 6, 2022
Adds the ability to restore a Delta table to an earlier version or timestamp.

Examples:
```scala
io.delta.tables.DeltaTable.forPath("/some_delta_path").restoreToVersion(1)
io.delta.tables.DeltaTable.forPath("/some_delta_path").restoreToTimestamp("2021-01-01 00:00:00.000")
io.delta.tables.DeltaTable.forPath("/some_delta_path").restoreToTimestamp("2021-01-01")
```
Fixes delta-io#632

Signed-off-by: Maksym Dovhal <[email protected]>

Tested locally using spark-shell
```bash
sbt package
spark-shell --jars ./core/target/scala-2.12/delta-core_2.12-1.1.0-SNAPSHOT.jar --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
```

```scala
spark.range(2).write.format("delta").mode("overwrite").save("/tmp/delta_restore_test")
spark.range(2,3).withColumnRenamed("id", "id_new").write.option("mergeSchema", "true").format("delta").mode("overwrite").save("/tmp/delta_restore_test")
io.delta.tables.DeltaTable.forPath("/tmp/delta_restore_test").restoreToVersion(0)
io.delta.tables.DeltaTable.forPath("/tmp/delta_restore_test").restoreToTimestamp("2021-12-18 16:40:14.54")
// The next day
io.delta.tables.DeltaTable.forPath("/tmp/delta_restore_test").restoreToVersion(0)
io.delta.tables.DeltaTable.forPath("/tmp/delta_restore_test").restoreToTimestamp("2021-12-19")
io.delta.tables.DeltaTable.forPath("/tmp/delta_restore_test").history().show(false)
```
Output:
```
+-------+-----------------------+------+--------+---------+------------------------------------------------------+----+--------+---------+-----------+--------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------------------------------------------+
|version|timestamp              |userId|userName|operation|operationParameters                                   |job |notebook|clusterId|readVersion|isolationLevel|isBlindAppend|operationMetrics                                                                                                                                             |userMetadata|engineInfo                                  |
+-------+-----------------------+------+--------+---------+------------------------------------------------------+----+--------+---------+-----------+--------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------------------------------------------+
|5      |2021-12-19 09:43:41.604|null  |null    |RESTORE  |{version -> null, timestamp -> 2021-12-19}            |null|null    |null     |4          |Serializable  |false        |{numRestoredFiles -> 2, removedFilesSize -> 1252, numRemovedFiles -> 3, restoredFilesSize -> 794, numOfFilesAfterRestore -> 2, tableSizeAfterRestore -> 794} |null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
|4      |2021-12-19 09:43:17.415|null  |null    |RESTORE  |{version -> 0, timestamp -> null}                     |null|null    |null     |3          |Serializable  |false        |{numRestoredFiles -> 3, removedFilesSize -> 794, numRemovedFiles -> 2, restoredFilesSize -> 1252, numOfFilesAfterRestore -> 3, tableSizeAfterRestore -> 1252}|null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
|3      |2021-12-18 16:42:14.083|null  |null    |RESTORE  |{version -> null, timestamp -> 2021-12-18 16:40:14.54}|null|null    |null     |2          |Serializable  |false        |{numRestoredFiles -> 2, removedFilesSize -> 1252, numRemovedFiles -> 3, restoredFilesSize -> 794, numOfFilesAfterRestore -> 2, tableSizeAfterRestore -> 794} |null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
|2      |2021-12-18 16:40:53.861|null  |null    |RESTORE  |{version -> 0, timestamp -> null}                     |null|null    |null     |1          |Serializable  |false        |{numRestoredFiles -> 3, removedFilesSize -> 794, numRemovedFiles -> 2, restoredFilesSize -> 1252, numOfFilesAfterRestore -> 3, tableSizeAfterRestore -> 1252}|null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
|1      |2021-12-18 16:40:14.54 |null  |null    |WRITE    |{mode -> Overwrite, partitionBy -> []}                |null|null    |null     |0          |Serializable  |false        |{numFiles -> 2, numOutputRows -> 1, numOutputBytes -> 794}                                                                                                   |null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
|0      |2021-12-18 16:40:08.045|null  |null    |WRITE    |{mode -> Overwrite, partitionBy -> []}                |null|null    |null     |null       |Serializable  |false        |{numFiles -> 3, numOutputRows -> 2, numOutputBytes -> 1252}                                                                                                  |null        |Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT|
+-------+-----------------------+------+--------+---------+------------------------------------------------------+----+--------+---------+-----------+--------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------------------------------------------+
```
Examples of transactions:
/tmp/delta_restore_test/_delta_log/00000000000000000000.json
```json
{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","partitionValues":{},"size":296,"modificationTime":1639838406641,"dataChange":true}}
{"add":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"add":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"commitInfo":{"timestamp":1639838407868,"operation":"WRITE","operationParameters":{"mode":"Overwrite","partitionBy":"[]"},"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numFiles":"3","numOutputRows":"2","numOutputBytes":"1252"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}
```
/tmp/delta_restore_test/_delta_log/00000000000000000001.json
```json
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"id_new\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet","partitionValues":{},"size":304,"modificationTime":1639838414041,"dataChange":true}}
{"add":{"path":"part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet","partitionValues":{},"size":490,"modificationTime":1639838414041,"dataChange":true}}
{"remove":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","deletionTimestamp":1639838414511,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":296}}
{"remove":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","deletionTimestamp":1639838414511,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"remove":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","deletionTimestamp":1639838414511,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"commitInfo":{"timestamp":1639838414511,"operation":"WRITE","operationParameters":{"mode":"Overwrite","partitionBy":"[]"},"readVersion":0,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numFiles":"2","numOutputRows":"1","numOutputBytes":"794"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}
```
/tmp/delta_restore_test/_delta_log/00000000000000000002.json
```json
{"commitInfo":{"timestamp":1639838436332,"operation":"RESTORE","operationParameters":{"version":0,"timestamp":null},"readVersion":1,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRestoredFiles":"3","removedFilesSize":"794","numRemovedFiles":"2","restoredFilesSize":"1252","numOfFilesAfterRestore":"3","tableSizeAfterRestore":"1252"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"add":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"add":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","partitionValues":{},"size":296,"modificationTime":1639838406641,"dataChange":true}}
{"remove":{"path":"part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet","deletionTimestamp":1639838435578,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":490}}
{"remove":{"path":"part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet","deletionTimestamp":1639838435586,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":304}}
```
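The version 2 commit above re-adds the three files written in version 0 and removes the two files written in version 1. A minimal sketch of that relationship follows, assuming the actions are the two set differences between the current and the target snapshot, keyed by path. `DataFile`, `RestorePlan` and `planRestore` are hypothetical names used for illustration only, not Delta Lake APIs; the paths and sizes are copied from the logs above.

```scala
// Sketch only: a plain-Scala model of the add/remove actions in a RESTORE commit.
// DataFile, RestorePlan and planRestore are hypothetical names, not Delta Lake APIs.
object RestoreSketch {
  final case class DataFile(path: String, size: Long)

  final case class RestorePlan(toAdd: Set[DataFile], toRemove: Set[DataFile], after: Set[DataFile]) {
    // Sum over an iterator so that equal sizes are not collapsed by Set semantics.
    private def bytes(files: Set[DataFile]): Long = files.iterator.map(_.size).sum

    def numRestoredFiles: Int = toAdd.size
    def numRemovedFiles: Int = toRemove.size
    def restoredFilesSize: Long = bytes(toAdd)
    def removedFilesSize: Long = bytes(toRemove)
    def numOfFilesAfterRestore: Int = after.size
    def tableSizeAfterRestore: Long = bytes(after)
  }

  // Assumption: files are compared by path. Files present only in the target
  // snapshot are re-added; files present only in the current snapshot are removed.
  def planRestore(current: Set[DataFile], target: Set[DataFile]): RestorePlan = {
    val currentPaths = current.map(_.path)
    val targetPaths = target.map(_.path)
    RestorePlan(
      toAdd = target.filterNot(f => currentPaths(f.path)),
      toRemove = current.filterNot(f => targetPaths(f.path)),
      after = target)
  }

  // Files of version 0, taken from 00000000000000000000.json.
  val version0: Set[DataFile] = Set(
    DataFile("part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet", 296),
    DataFile("part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet", 478),
    DataFile("part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet", 478))

  // Files of version 1, taken from 00000000000000000001.json.
  val version1: Set[DataFile] = Set(
    DataFile("part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet", 304),
    DataFile("part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet", 490))
}
```

Under these assumptions, `RestoreSketch.planRestore(RestoreSketch.version1, RestoreSketch.version0)` yields the same counts and sizes as the `operationMetrics` of the version 2 commit (3 restored files, 1252 bytes; 2 removed files, 794 bytes).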
/tmp/delta_restore_test/_delta_log/00000000000000000003.json
```json
{"commitInfo":{"timestamp":1639838517073,"operation":"RESTORE","operationParameters":{"version":null,"timestamp":"2021-12-18 16:40:14.54"},"readVersion":2,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRestoredFiles":"2","removedFilesSize":"1252","numRemovedFiles":"3","restoredFilesSize":"794","numOfFilesAfterRestore":"2","tableSizeAfterRestore":"794"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"id_new\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet","partitionValues":{},"size":490,"modificationTime":1639838414041,"dataChange":true}}
{"add":{"path":"part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet","partitionValues":{},"size":304,"modificationTime":1639838414041,"dataChange":true}}
{"remove":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","deletionTimestamp":1639838516199,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"remove":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","deletionTimestamp":1639838516199,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"remove":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","deletionTimestamp":1639838516202,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":296}}
```
/tmp/delta_restore_test/_delta_log/00000000000000000004.json
```json
{"commitInfo":{"timestamp":1639899780668,"operation":"RESTORE","operationParameters":{"version":0,"timestamp":null},"readVersion":3,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRestoredFiles":"3","removedFilesSize":"794","numRemovedFiles":"2","restoredFilesSize":"1252","numOfFilesAfterRestore":"3","tableSizeAfterRestore":"1252"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"add":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","partitionValues":{},"size":478,"modificationTime":1639838406642,"dataChange":true}}
{"add":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","partitionValues":{},"size":296,"modificationTime":1639838406641,"dataChange":true}}
{"remove":{"path":"part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet","deletionTimestamp":1639899779981,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":490}}
{"remove":{"path":"part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet","deletionTimestamp":1639899779981,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":304}}
```
/tmp/delta_restore_test/_delta_log/00000000000000000005.json
```json
{"commitInfo":{"timestamp":1639899805962,"operation":"RESTORE","operationParameters":{"version":null,"timestamp":"2021-12-19"},"readVersion":4,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRestoredFiles":"2","removedFilesSize":"1252","numRemovedFiles":"3","restoredFilesSize":"794","numOfFilesAfterRestore":"2","tableSizeAfterRestore":"794"},"engineInfo":"Apache-Spark/3.2.0 Delta-Lake/1.1.0-SNAPSHOT"}}
{"metaData":{"id":"b090f082-f927-4372-9537-9623ae280ad8","format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"id_new\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"configuration":{},"createdTime":1639838404635}}
{"add":{"path":"part-00011-7a5341e6-4876-467a-b33d-56d8dc0bf243-c000.snappy.parquet","partitionValues":{},"size":490,"modificationTime":1639838414041,"dataChange":true}}
{"add":{"path":"part-00000-b71c3566-429b-4038-8127-6e480656038c-c000.snappy.parquet","partitionValues":{},"size":304,"modificationTime":1639838414041,"dataChange":true}}
{"remove":{"path":"part-00005-beac50f7-dbe7-40b7-9ce2-5e0e6a1607ad-c000.snappy.parquet","deletionTimestamp":1639899805448,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"remove":{"path":"part-00011-7e35258f-a724-43f3-8622-c7efa51f01a6-c000.snappy.parquet","deletionTimestamp":1639899805445,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":478}}
{"remove":{"path":"part-00000-cb74dd35-ae80-4b3a-b97c-ea492e11ddc3-c000.snappy.parquet","deletionTimestamp":1639899805444,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":296}}
```
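The two timestamp restores above (versions 3 and 5) both bring back the two files written in version 1. That is consistent with a timestamp resolving to the latest commit at or before it. The sketch below illustrates that rule in plain Scala, under that assumption. `TimestampResolutionSketch`, `parse` and `versionAt` are hypothetical names, the date-only handling is a simplification rather than Spark's actual timestamp parsing, and the (version, timestamp) pairs are copied from the `history()` output shown earlier in this PR.

```scala
import java.sql.Timestamp

// Sketch only: resolves a timestamp string to a table version in plain Scala.
// All names here are hypothetical; this is not the Delta Lake implementation.
object TimestampResolutionSketch {
  // (version, commit timestamp) pairs from the history() output, versions 0 to 4.
  val history: Seq[(Long, Timestamp)] = Seq(
    0L -> Timestamp.valueOf("2021-12-18 16:40:08.045"),
    1L -> Timestamp.valueOf("2021-12-18 16:40:14.54"),
    2L -> Timestamp.valueOf("2021-12-18 16:40:53.861"),
    3L -> Timestamp.valueOf("2021-12-18 16:42:14.083"),
    4L -> Timestamp.valueOf("2021-12-19 09:43:17.415"))

  // Simplification: a 10-character string such as "2021-12-19" is read as midnight of that day.
  def parse(s: String): Timestamp = {
    val t = s.trim
    if (t.length == 10) Timestamp.valueOf(t + " 00:00:00") else Timestamp.valueOf(t)
  }

  // Assumed rule: pick the latest version whose commit timestamp is at or before the target.
  def versionAt(commits: Seq[(Long, Timestamp)], ts: String): Option[Long] = {
    val target = parse(ts)
    commits
      .collect { case (version, committedAt) if !committedAt.after(target) => version }
      .reduceOption(_ max _)
  }
}
```

With only versions 0 to 2 present, `versionAt(history.take(3), "2021-12-18 16:40:14.54")` resolves to version 1. With versions 0 to 4 present, `versionAt(history, "2021-12-19")` resolves to version 3, which is itself a restore to the state of version 1.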

Closes delta-io#863

Signed-off-by: Scott Sandre <[email protected]>

GitOrigin-RevId: 3f1c0e77b403f49f9460baff13174dd4f88da47d
jbguerraz pushed a commit to jbguerraz/delta that referenced this pull request Jul 6, 2022
 * RestoreTableCommand moved to the org.apache.spark.sql.delta.commands package
 * cache() of the filesToRemove DataFrame removed (according to delta-io#863 (comment))
 * cache() of filesToAdd is applied only if spark.sql.files.ignoreMissingFiles = false (the default value)

Signed-off-by: Maksym Dovhal <[email protected]>

Closes delta-io#912

Signed-off-by: Venki Korukanti <[email protected]>
GitOrigin-RevId: b10707c96766f74423874f01898587f97c69c6b5
Successfully merging this pull request may close these issues.

Support for restore operations on DeltaTable