Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive node retries #7707

Merged
merged 5 commits into from
Jan 29, 2021
Merged

archive node retries #7707

merged 5 commits into from
Jan 29, 2021

Conversation

deepthiskumar
Copy link
Member

Added retry logic for database writes to support concurrent database transactions (described here). This also requires setting transaction isolation level to 'serializable' and can be done in two ways:

  1. set it as the default isolation level in postgresql.conf:
    default_transaction_isolation = 'serializable'
  2. set it for a specific database using the following query
    ALTER DATABASE <DATABASE NAME> SET DEFAULT_TRANSACTION_ISOLATION TO SERIALIZABLE ;

They need to be done as a separate step before archive node starts writing (and before bringing up the database instance in the case of option 1). I think setting it at the db level makes it explicit and therefore is a better option, but i'm not quite sure where in the codebase this configuration needs to live

@deepthiskumar deepthiskumar requested a review from a team as a code owner January 28, 2021 00:18
@deepthiskumar deepthiskumar added the ci-build-me Add this label to trigger a circle+buildkite build for this branch label Jan 28, 2021
@psteckler
Copy link
Member

Can you confirm that make archive_blocks still builds (there are additional arguments to the make_block_*_aux functions)?

match res with
| Error _ ->
Conn.rollback () >>| ignore
| Error _ as e ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be logs for the commit and rollback?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, added it

@deepthiskumar
Copy link
Member Author

Can you confirm that make archive_blocks still builds (there are additional arguments to the make_block_*_aux functions)?

Can you confirm that make archive_blocks still builds (there are additional arguments to the make_block_*_aux functions)?

There was a type error, fixed it

let rec go retry_count =
match%bind f () with
| Error e ->
if retry_count <= 0 then return (Error e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe accumulate errors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to logging it as warnings (the else case)? The error in this line is logged at the call site

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably good enough

else (
[%log warn] "Error in %s : $error. Retrying..." error_str
~metadata:[("error", `String (Caqti_error.show e))] ;
go (retry_count - 1) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to immediately retry without any backoff timeout? Maybe it's worth waiting a few milliseconds? Though I'm not sure what else that will affect. Up to you...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see a problem adding a backoff timeout of a few milliseconds. It should help if there are multiple commit attempts

@O1ahmad
Copy link
Contributor

O1ahmad commented Jan 29, 2021

2. set it for a specific database using the following query
   `ALTER DATABASE <DATABASE NAME> SET DEFAULT_TRANSACTION_ISOLATION TO SERIALIZABLE ;`

... I think setting it at the db level makes it explicit and therefore is a better option, but i'm not quite sure where in the codebase this configuration needs to live

Agreed and something like below added to archive node's db-bootstrap jobs should work for this:

      - name: update-transaction-isolation-level
        image: bitnami/postgresql
        command: ["bash", "-c"]
        args: ["PGPASSWORD={{ .Values.postgresql.postgresqlPassword }} psql --username {{ .Values.postgresql.postgresqlUsername }} --host {{ tpl .Values.archive.postgresHost . }} --port {{ .Values.archive.postgresPort }} --dbname {{ .Values.archive.postgresDB}} -c ALTER DATABASE {{ .Values.archive.postgresDB }} SET DEFAULT_TRANSACTION_ISOLATION TO SERIALIZABLE ;"]

@O1ahmad
Copy link
Contributor

O1ahmad commented Jan 29, 2021

1. set it as the default isolation level in postgresql.conf:
   `default_transaction_isolation = 'serializable'

Alternatively, we can update the postgres settings set in terraform with extended configs:

postgresqlExtendedConf = {
      default_transaction_isolation ="serializable"
}

@aneesharaines
Copy link
Contributor

!approved-for-mainnet

@mrmr1993 mrmr1993 merged commit 548ad68 into compatible Jan 29, 2021
@mrmr1993 mrmr1993 deleted the fix/archive-retries branch January 29, 2021 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-build-me Add this label to trigger a circle+buildkite build for this branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants