Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persons table schema migration #6667

Merged
merged 9 commits into from
Nov 18, 2021
Merged

Persons table schema migration #6667

merged 9 commits into from
Nov 18, 2021

Conversation

yakkomajuri
Copy link
Contributor

@yakkomajuri yakkomajuri commented Oct 26, 2021

Changes

Still WIP as I need to get feedback on this.

This is part of the work to ensure CH and Postgres are always in sync.

The approach here is to always increment a version column in Postgres when we make an update (see #6628). This version is then passed to ClickHouse and we'll collapse rows based on this value.

The interesting part comes with is_deleted. Initially, I added is_deleted to the sort key (ORDER BY clause). This would ensure that we couldn't possibly collapse all rows with is_deleted (as has happened before), leaving us with a person that is deleted in Postgres but exists in ClickHouse.

The problem here is storage. We'd have to keep around 2x rows for each deleted person. As we'd only ever collapse down to 2 rows: one with is_deleted=0 and one with is_deleted=1.

The solution here was inspired by what we did with distinct IDs. When reading from Kafka, we can simply assign version to the max possible UInt64 value, ensuring we always keep the is_deleted row around without having to add it to the sort key. The way this was done also ensures backwards compatibility.

Other points/questions

  1. I opted for not changing the queries. They currently use argMax with _timestamp but this should be fine. The case where this could be incorrect is if we have equal timestamps before the collapsing happens. The alternative is to change the queries to use version, but we'd run into problems when we first migrate. Happy to get some thoughts here.
  2. I also added _partition to the table. We currently have _offset, which is somewhat helpful, but for it to be truly helpful we need the partition number too.

How did you test this code?

Added tests for direct insertion and "kafka insertion" into the new table (removed due to flakiness). Confirmed collapsing works as expected.

Also tested manually

@yakkomajuri yakkomajuri marked this pull request as draft October 26, 2021 16:26
@timgl timgl temporarily deployed to posthog-pr-6667 October 26, 2021 16:27 Inactive
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6667 October 26, 2021 16:37 Inactive
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6667 October 26, 2021 16:57 Inactive
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6667 October 27, 2021 11:51 Inactive
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6667 October 27, 2021 11:51 Inactive
@yakkomajuri yakkomajuri marked this pull request as ready for review October 29, 2021 13:24
@yakkomajuri yakkomajuri changed the title WIP persons table schema migration Persons table schema migration Oct 29, 2021
@yakkomajuri
Copy link
Contributor Author

yakkomajuri commented Nov 1, 2021

Note to self: consider https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/versionedcollapsingmergetree/

EDIT: Looked into it, not what we need. I'd have loved cancel rows to take priority over the version, but that's not the case.

@yakkomajuri
Copy link
Contributor Author

@macobo would appreciate a second pass after our chat.

@fuziontech can I get your 👁️ 👁️ in here too?

@yakkomajuri
Copy link
Contributor Author

Getting a paths test failure. Looked at it and it seems unrelated. Any context @neilkakkar ?

@yakkomajuri
Copy link
Contributor Author

also friendly ping @fuziontech

@yakkomajuri yakkomajuri marked this pull request as draft November 5, 2021 10:38
@yakkomajuri
Copy link
Contributor Author

Converted to draft as a safety measure given we first need to merge the changes that send us this version value via Kafka

Copy link
Member

@fuziontech fuziontech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work, but will probably need to be tested for helm chart deploys. This might time out the migration task. This will also require a custom migration for cloud. These select * into migrations are super slow on clickhouse. Migration will probably take 30-60 minutes on cloud.

@tiina303
Copy link
Contributor

tiina303 commented Nov 8, 2021

re chart

For the migration task timeout we can just increase it in the command line via --timeout 60m or to whatever as needed (if it's longer than 20m we should potentially make it a major upgrade with a note about it just being a big migration and needing more time).
However is that going to be the case for all new installs, n00b q: do we run all migrations from the beginning of time always?

@yakkomajuri yakkomajuri marked this pull request as ready for review November 15, 2021 10:34
@yakkomajuri yakkomajuri added the P0 Critical, breaking issue (page crash, missing functionality) label Nov 16, 2021
@fuziontech
Copy link
Member

I've tested this pretty extensively now and am comfortable to land it. Landing now and monitoring 👀

@fuziontech fuziontech merged commit 54f2349 into master Nov 18, 2021
@fuziontech fuziontech deleted the persons-table-schema branch November 18, 2021 05:19
fuziontech added a commit that referenced this pull request Nov 18, 2021
fuziontech added a commit that referenced this pull request Nov 18, 2021
fuziontech added a commit that referenced this pull request Nov 18, 2021
fuziontech added a commit that referenced this pull request Nov 18, 2021
* Revert "Revert "Persons table schema migration (#6667)" (#7198)"

This reverts commit 311452b.

* CREATE TABLE AS

* ON CLUSTER

* prefix database
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 Critical, breaking issue (page crash, missing functionality)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants