Found commits after time :20230220161017756, please rollback greater commits first #8025

Open
koochiswathiTR opened this issue Feb 23, 2023 · 7 comments


@koochiswathiTR

  • Have you gone through our FAQs?

  • Join the mailing list to engage in conversations and get faster support at [email protected].

  • If you have triaged this as a bug, then file an issue directly.

Hi,
Our streaming job is failing with the error "Found commits after time :20230220161017756, please rollback greater commits first".

We tried to roll back the commits with the Hudi CLI command commit rollback --commit commit_num, but we hit the exception below:
Caused by: java.lang.IllegalArgumentException: Cannot use marker based rollback strategy on completed instant:[20230221130513323__deltacommit__COMPLETED]
at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.<init>(BaseRollbackActionExecutor.java:93)
at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.<init>(BaseRollbackActionExecutor.java:73)
at org.apache.hudi.table.action.rollback.MergeOnReadRollbackActionExecutor.<init>(MergeOnReadRollbackActionExecutor.java:48)
at org.apache.hudi.table.HoodieSparkMergeOnReadTable.rollback(HoodieSparkMergeOnReadTable.java:170)
at org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:766)
... 15 more
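
For readers following along, the two instants mentioned so far can be decoded with a small sketch. It assumes the instant time is laid out as yyyyMMddHHmmss plus three millisecond digits, and that the bracketed name in the exception is timestamp, action, and state joined by double underscores. The class and method names are hypothetical and nothing here calls Hudi itself.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class InstantNames {
    // Assumed layout: yyyyMMddHHmmss followed by three millisecond digits.
    private static final DateTimeFormatter SECONDS =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Splits "[ts__action__STATE]" into {ts, action, STATE}. */
    static String[] split(String printed) {
        String body = printed.replace("[", "").replace("]", "");
        return body.split("__");
    }

    /** Converts a 17-digit instant time to a LocalDateTime (no time zone applied). */
    static LocalDateTime toDateTime(String instantTime) {
        LocalDateTime seconds = LocalDateTime.parse(instantTime.substring(0, 14), SECONDS);
        long millis = Long.parseLong(instantTime.substring(14));
        return seconds.plusNanos(millis * 1_000_000L);
    }

    public static void main(String[] args) {
        // Instant named in the rollback exception.
        String[] parts = split("[20230221130513323__deltacommit__COMPLETED]");
        System.out.println(parts[1] + " " + parts[2] + " at " + toDateTime(parts[0]));
        // Instant named in the streaming job's error.
        System.out.println("writer error refers to " + toDateTime("20230220161017756"));
    }
}
```

Read this way, the instant being rolled back (21 Feb 2023) is a completed deltacommit that is newer than the instant named in the streaming error (20 Feb 2023).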

Our Hudi configs are

DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL,
DataSourceWriteOptions.RECORDKEY_FIELD.key() -> "guid",
DataSourceWriteOptions.PARTITIONPATH_FIELD.key() -> "collectionName",
DataSourceWriteOptions.PRECOMBINE_FIELD.key() -> "operationTime",
HoodieCompactionConfig.INLINE_COMPACT_TRIGGER_STRATEGY.key() -> CompactionTriggerStrategy.TIME_ELAPSED.name,
HoodieCompactionConfig.INLINE_COMPACT_TIME_DELTA_SECONDS.key() -> String.valueOf(60 * 60),
HoodieCompactionConfig.CLEANER_POLICY.key() -> HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name(),
HoodieCompactionConfig.CLEANER_COMMITS_RETAINED.key() -> "624", 
HoodieCompactionConfig.MIN_COMMITS_TO_KEEP.key() -> "625",  
HoodieCompactionConfig.MAX_COMMITS_TO_KEEP.key() -> "648", 
HoodieCompactionConfig.ASYNC_CLEAN.key() -> "false", 
HoodieCompactionConfig.INLINE_COMPACT.key() -> "true",
HoodieMetricsConfig.TURN_METRICS_ON.key() -> "true",
HoodieMetricsConfig.METRICS_REPORTER_TYPE_VALUE.key() -> MetricsReporterType.DATADOG.name(),
HoodieMetricsDatadogConfig.API_SITE_VALUE.key() -> "US",
HoodieMetricsDatadogConfig.METRIC_PREFIX_VALUE.key() -> "tacticalnovusingest.hudi",
HoodieMetricsDatadogConfig.API_KEY_SUPPLIER.key() -> "com.tr.indigo.tacticalnovusingest.utils.DatadogKeySupplier",
HoodieMetadataConfig.ENABLE.key() -> "false",
HoodieWriteConfig.ROLLBACK_USING_MARKERS_ENABLE.key() -> "false",
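
As a side check on the retention settings above: as far as I recall, Hudi expects the cleaner's retained commits to be lower than the archival minimum, which in turn must be lower than the archival maximum. That ordering is an assumption here, not something confirmed in this thread. A minimal sketch with a hypothetical helper, in plain Java with no Hudi classes:

```java
final class RetentionCheck {
    /** True when cleanerRetained < archiveMin < archiveMax (assumed ordering rule). */
    static boolean consistent(int cleanerRetained, int archiveMin, int archiveMax) {
        return cleanerRetained < archiveMin && archiveMin < archiveMax;
    }

    public static void main(String[] args) {
        // CLEANER_COMMITS_RETAINED, MIN_COMMITS_TO_KEEP, MAX_COMMITS_TO_KEEP from the configs above.
        System.out.println(consistent(624, 625, 648));
    }
}
```

Under that assumption the posted values (624, 625, 648) are consistent with each other, so they are unlikely to be the cause by themselves.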

Please help us understand what caused this issue and how to resolve it.

Environment Description AWS

  • Hudi version : 0.11.1

  • Spark version : 3.1.2

  • Hive version : NA

  • Hadoop version :3

  • Storage (HDFS/S3/GCS..) :S3

  • Running on Docker? (yes/no) :no


@danny0405
Contributor

Did you enable the lazy cleaning for multi writers?

@koochiswathiTR
Author

@danny0405 We are not using multi-writer; it is a single writer only.
What is lazy cleaning? Can you briefly explain it?

@danny0405
Contributor

Which version of Hudi did you use? This seems to be an unknown bug.

@nsivabalan
Contributor

Can you post the contents of ".hoodie" with the last-modified times intact (ls -ltr)?
Also, when you triggered the rollback via the CLI, what was the entire command you passed?

I see we have an option, --rollbackUsingMarkers. Did you set it or not?
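
To illustrate why the marker setting matters, here is a sketch of the condition that the exception message in the original report describes. It is a reading of that message, not a copy of Hudi's code, and the class and method names are hypothetical.

```java
final class RollbackGuard {
    /** Rejects a marker-based rollback when the target instant is already completed. */
    static void check(boolean useMarkers, boolean instantCompleted, String instant) {
        if (useMarkers && instantCompleted) {
            throw new IllegalArgumentException(
                    "Cannot use marker based rollback strategy on completed instant:" + instant);
        }
    }

    public static void main(String[] args) {
        String instant = "[20230221130513323__deltacommit__COMPLETED]";
        check(false, true, instant); // passes: markers are not used
        check(true, true, instant);  // throws IllegalArgumentException
    }
}
```

If the rollback run used markers, a completed deltacommit would be rejected in this way no matter what the streaming writer's own configuration says, which is why the exact CLI command is worth checking.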

@nsivabalan
Contributor

We also made a fix for rolling back a completed instant in #6313. Can you try 0.12.1?

@github-project-automation github-project-automation bot moved this to ⏳ Awaiting Triage in Hudi Issue Support Mar 8, 2023
@codope codope moved this from ⏳ Awaiting Triage to 👤 User Action in Hudi Issue Support May 3, 2023
@chestnutqiang
Contributor

chestnutqiang commented Jun 26, 2023

Did you enable the lazy cleaning for multi writers?

If multiple parallel writers write to different partitions of the same table, does Hudi have any plans to optimize this? For example, a Spark SQL job that is split into multiple applications for writing.

@danny0405
Contributor

I guess not, because until committing, Hudi has no idea whether the two (or more) commits conflict, so the rollback plan should execute from the latest instant to the oldest to ensure the validity of the data set.
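
The latest-to-oldest ordering can be sketched as follows. This assumes instant times are fixed-width digit strings, so plain string comparison matches chronological order. The helper name and the earliest timestamp in the example are hypothetical; the other two timestamps come from the original report.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class RollbackOrder {
    /**
     * Returns the instants to roll back in order to undo targetInstant:
     * every instant at or after it, newest first, with targetInstant last.
     */
    static List<String> plan(List<String> timeline, String targetInstant) {
        return timeline.stream()
                .filter(t -> t.compareTo(targetInstant) >= 0)
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> timeline = Arrays.asList(
                "20230220150000000",  // hypothetical earlier instant
                "20230220161017756",  // instant named in the streaming error
                "20230221130513323"); // instant named in the rollback exception
        System.out.println(plan(timeline, "20230220161017756"));
    }
}
```

In this toy timeline, undoing 20230220161017756 means rolling back 20230221130513323 first, which matches the "please rollback greater commits first" wording of the error.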
