Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Session recordings ingester #10799

Closed
wants to merge 27 commits into from
Closed

Conversation

hazzadous
Copy link
Contributor

This is a WIP branch for the work on Session Recordings ingestion revamp.

Harry Waye and others added 18 commits July 8, 2022 17:06
* move recordings to their own queue

* refactor: use headers for metadata, avoid needing to pass JSON

I've moved the sessionId etc to KafkaMessage headers rather than having
in the JSON data. This way we can treat the actual message data as
opaque, which should have processing benefits as well as memory (if we
can get rid of those toStrings that is.).

* ci(session-recordings-ingester): add GitHub Workflow

This will handle building and testing the docker image. Will follow up
with the production deploy separately.

* fix working dir?

* set docker context

* wait for  server

* send data via kafka headers

* chore: fix prod start

* add .swcrc for docker image

* chore: get ci tests into a better state, fix signal handling in docker

* add metadata writing

* tweak test script

* fix typo in tests for eventSource

* add stub for producing to ClickHouse

* skip many events test, get CI green

Co-authored-by: Harry Waye <[email protected]>
expect(sessionRecording.events[0]).toMatchObject(JSON.parse(fullEventString))
})

// TODO: Handle this case and re-enable the test
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcmarron what do you think the cases here are? I know we get some guarantees on this from Kafka, but I guess there might be other parts of the system that we need to consider?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For details on guarantees perhaps https://kafka.apache.org/documentation/#semantics is useful

rcmarron and others added 9 commits July 20, 2022 11:25
* it works

* update username + password in docker

* fix some tests

* add distinctIds to tests

* update tests to use new minio creds

* add healthcheck

Co-authored-by: Harry Waye <[email protected]>
* thread pool for metadata read

* read snapshots from object storage

* typing fixes
…nts in the ingester (#11022)

fix(recording-ingester): handle concurrency in the ingester
@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@lancedouglas1
Copy link

Would adding a release-type allow this to be merged? Is there anything I can help with? e.g. testing

@hazzadous
Copy link
Contributor Author

Would adding a release-type allow this to be merged? Is there anything I can help with? e.g. testing

@lancedouglas1 this has been put on hold for the time being as alternative solutions are discussed. Help with testing would be amazing though as well as any other input you can give. I gather you are on self hosted.

May I ask what you use case/interest in this is for?

@lancedouglas1
Copy link

Would adding a release-type allow this to be merged? Is there anything I can help with? e.g. testing

@lancedouglas1 this has been put on hold for the time being as alternative solutions are discussed. Help with testing would be amazing though as well as any other input you can give. I gather you are on self hosted.

May I ask what you use case/interest in this is for?

I'm currently on cloud hosted as it works well, except the retention of recordings.

My usecase has to do with retention lengths of recordings due to compliance. Basically, it is very rarely that we have to go back and review a session recording to confirm user activity with 60 days of the actual activity. It is almost always 90+ days later that other processes flag a session as requiring investigation.
Our alternative is to self host and just save indefinitely but that would require a much larger investment in posthog that is not yet approved. Other things we're doing is putting more anomaly detection in our systems to close the window of time between events and investigations.

To be clear, we're not using recordings for security purposes, but rather we manage reservations for government owned properties, which means that we have many stakeholders with various interests in different types of citizenry-involved events that have happened in the past.

@benjackwhite benjackwhite self-assigned this Oct 5, 2022
@posthog-bot posthog-bot removed the stale label Nov 28, 2022
@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@posthog-bot
Copy link
Contributor

This PR was closed due to lack of activity. Feel free to reopen if it's still relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants