-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upload static cohort csv #2932
Upload static cohort csv #2932
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backend comments first! QA'ing frontend now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frontend lgtm so just the point about repeats and updating cohorts above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic overall checks out. The only bug that would happen in production is that you are producing Proto encoded data to a table that is expecting JSON. You can checkout what encoding to a json table looks like here https://github.com/PostHog/posthog/blob/master/ee/clickhouse/models/person.py#L74
This also raises a really good point that we should probably disable inserting to clickhouse for tests and use the kafka path since that is the only active path in any environment.
ee/clickhouse/models/cohort.py
Outdated
for person_uuid in person_uuids: | ||
pb_event = person_static_cohort_pb2.PersonStaticCohort() | ||
pb_event.id = str(uuid.uuid4()) | ||
pb_event.person_id = str(person_uuid) | ||
pb_event.cohort_id = cohort_id | ||
pb_event.team_id = team.pk | ||
|
||
p = ClickhouseProducer() | ||
p.produce_proto(sql=INSERT_PERSON_STATIC_COHORT, topic=KAFKA_PERSON_STATIC_COHORT, data=pb_event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is perfect for a proto encoded table, but the table you have defined is json encoded.
Check out producing for persons to see what producing to a json table looks like
https://github.com/PostHog/posthog/blob/master/ee/clickhouse/models/person.py#L74
This is ready to go. Table created on cluster in prod. I'll let you land it tomorrow so you can test it out. |
… into upload-static-cohort-csv
Changes
Allows users to upload CSV with distinct ids or email addresses to create a cohort
@fuziontech Would love a check of what I've done with the Kafka topic/protobuf etc.
Checklist