Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate Events Processor to Use New Version Tracker Impl #560

Merged
merged 5 commits into from
Oct 23, 2024

Conversation

dermanyang
Copy link
Contributor

@dermanyang dermanyang commented Oct 22, 2024

Purpose

Copied changes

Apply recent changes improving restart behavior from aptos-indexer-processor-example repo. Namely:

  1. Refactor the events_processor to use VersionTracker step instead of the LatestVersionProcessedTracker object (original PR)
  2. Create a BackfillProcessorStatus table and write to it when backfill_alias is set in the config (original PR)
  3. Add new column, BackfillStatus, and improve starting_version selection on restart (original PR)
    See original PRs for more details and discussion. Design [here] (https://www.notion.so/aptoslabs/Improve-Processor-Restart-and-Gap-Detection-Behavior-9e884917f4df4f5690d09f8a59581aab).

Other changes

  • Grouped the new migration/schema with the other migrations/schemas under rust/processor rather than rust/sdk-processor. But added the model under rust/sdk-processor/db/common/models.
  • Updated certain execute...() functions to return Result<usize, ProcessorError> instead of QueryResult<usize>. A change that has been in the example repo for a while.
  • Small changes to keep the new integration tests happy given the change to the SDK processor config.

Remaining Work

  • Migrate FA processor to use this too + delete LatestVersionProcessedTracker
  • Merge this infra PR to leave the pods in Complete when the indexer reaches the end_version instead of crash looping lol

Testing

New Cargo Tests for Starting Version

image

Regular Indexer Start

Starts from last check pointed version:
image
Starts from max(last check pointed, starting_version):
image

Backfill Indexer

image

Non-SDK Default Processor Sanity Check

image

@dermanyang dermanyang changed the title Migrate events processor to use new version tracker impl Migrate Events Processor to Use New Version Tracker Impl Oct 22, 2024
@dermanyang dermanyang marked this pull request as ready for review October 23, 2024 00:26
@dermanyang dermanyang requested review from yuunlimm and rtso October 23, 2024 00:26
inserted_at -> Timestamp,
is_token_v2 -> Nullable<Bool>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needed to be changed around otherwise whenever the migrations are re-ran, the ordering of the produced struct's constructor changes causing some issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to add a bit of context: this is reverting the change I made manually. this order should be always the same unless we change in sql file.

impl ToSql<Text, Pg> for BackfillStatus {
fn to_sql<'b>(&'b self, out: &mut Output<'b, '_, Pg>) -> serialize::Result {
match *self {
BackfillStatus::InProgress => out.write_all(b"in_progress")?,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we have the bytes in a constant var?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 106 to 109
let order_step = OrderByVersionStep::new(
starting_version,
Duration::from_secs(UPDATE_PROCESSOR_STATUS_SECS),
Duration::from_secs(DEFAULT_UPDATE_PROCESSOR_STATUS_SECS),
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we actually remove the OrderByVersionStep for the events and fa processor? They're not needed since there's no parallel processing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed.

Comment on lines 120 to 122
backfill_end_version: backfill_end_version
.unwrap_or(last_success_batch.metadata.end_version)
as i64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If backfill_end_version isn't set, should the table just store it as null to indicate it's unset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call out. This way, it'll be clear that the BFP will run forever rather than incorrectly indicating that the last_processed_version is equal to the backfill_end_version and status being in_progress.

// Otherwise, return the checkpointed version + 1.
return Ok(
backfill_status.and_then(|status| match status.backfill_status {
BackfillStatus::InProgress => Some(status.last_success_version as u64 + 1),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's move the + 1 's out of this function and into get_starting_version since this function is solely responsible for get_latest_processed_version_from_db

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to leave in the +1s unfortunately since you can't factor it out from the different logical branches. I renamed the function to get_starting_version_from_db to correct the misnomer.

Comment on lines +32 to +37
Ok(latest_processed_version.unwrap_or(
indexer_processor_config
.transaction_stream_config
.starting_version
.unwrap_or(0),
))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the existing priority of the flags for regular processors which can make it hard if you're developing locally and want to start from a specific version. Could we keep the order for regular processor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean the use case where you want starting_version to have higher precedence than last_processed_version?

That's a trade-off we have to make to have better restart behavior. Otherwise, the regular processor will always restart at the starting_version if that's ever set

Copy link
Contributor Author

@dermanyang dermanyang Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rtso I could add the old starting_version_override config option lol. But rename it to starting_version_dev_override to make it clear its for development purposes so its easier for local development. Wdyt?

This way we can prioritize production behavior while having a secondary, but still convenient, way to have the local development behavior you described

Copy link
Collaborator

@rtso rtso Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we keep the priority the same for regular processors to match with existing devex, and establish a different priority only for backfill processors?

Once we have separate backfill jobs, head processors in prod should never need to use starting_version in the config. I'd imagine it'd only be used for development purposes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

head processors in prod should never need to use starting_version in the config

how come? if we made a logic change, wouldn't the steps be something like

  1. create a backfill processor from 0 to HEAD
  2. create a regular processor from HEAD onwards

to do #2, wouldn't we use starting_version=HEAD where HEAD is just some recent version?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I was thinking about the current processors already running in prod, but if you're a new processor and you set starting_version=HEAD then going forward, it should only need to refer to the latest version from DB.

And to my point of developing locally, they can also use the backfill_processor config to start from arbitrary version!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome. thanks for the discussion! yeah, or even run diesel reset to wipe the state

@dermanyang dermanyang merged commit a41f0e0 into main Oct 23, 2024
7 checks passed
@dermanyang dermanyang deleted the sdy/restart_behavior_changes branch October 23, 2024 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants