-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: session recording to s3 - posthog only #9901
Conversation
return snapshot_data | ||
} | ||
|
||
if (team_id !== 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gather this needs to be replaced with ENABLED_SESSION_RECORDINGS_OBJECT_STORAGE_TEAM_IDS
or something less long!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ENABLED_SESSION_RECORDINGS_OBJECT_STORAGE_TEAM_IDS_ALLOW_LIST_FOR_TO_ALLOW_MORE_SAFELY_THE_TESTING_OF_THIS_FEATURE
return snapshot_data | ||
} | ||
|
||
if (team_id !== 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gather this needs to be replaced with ENABLED_SESSION_RECORDINGS_OBJECT_STORAGE_TEAM_IDS
env var or something less long!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me aside from the team_id hardcoding
b5a4c09
to
b46c460
Compare
@@ -766,6 +779,57 @@ export class EventsProcessor { | |||
} | |||
} | |||
|
|||
private tryStoreSessionRecordingToObjectStorage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this named try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's what I was thinking when I made the method... It can just as easily be storeSessionRecording
:)
@@ -766,6 +779,57 @@ export class EventsProcessor { | |||
} | |||
} | |||
|
|||
private tryStoreSessionRecordingToObjectStorage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this class is doing way too much. Please put this elsewhere, e.g. extract class/method/etc - we don't want this class to be also responsible for business logic of object storage.
|
||
const storageWriteTimer = new Date() | ||
|
||
this.objectStorage.putObject(params, (err: any, resp: any) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plugin server works off of promises, not callbacks. Please create a promise-based API to be consistent.
|
||
this.objectStorage.putObject(params, (err: any, resp: any) => { | ||
if (err) { | ||
console.error(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug logging.
} | ||
}) | ||
|
||
this.pluginsServer.statsd?.timing('session_data.storage_upload.timing', storageWriteTimer, tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This timer will always be zero since we call it just after queueing a callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plugin-server/src/types.ts
Outdated
@@ -150,6 +150,7 @@ export interface PluginsServerConfig extends Record<string, any> { | |||
OBJECT_STORAGE_SECRET_ACCESS_KEY: string | |||
OBJECT_STORAGE_SESSION_RECORDING_FOLDER: string | |||
OBJECT_STORAGE_BUCKET: string | |||
OBJECT_STORAGE_TEAM_ALLOW_LIST: 'all' | number[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see code that's parsing this
plugin-server/src/types.ts
Outdated
@@ -150,6 +150,7 @@ export interface PluginsServerConfig extends Record<string, any> { | |||
OBJECT_STORAGE_SECRET_ACCESS_KEY: string | |||
OBJECT_STORAGE_SESSION_RECORDING_FOLDER: string | |||
OBJECT_STORAGE_BUCKET: string | |||
OBJECT_STORAGE_TEAM_ALLOW_LIST: 'all' | number[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Let's continue the same pattern as with other env vars rather than introducing a new one. So remove 'all' and make this a string same as CONVERSION_BUFFER_ENABLED_TEAMS
. If we want to turn this on for everyone, let's yeet the code or add a separate var.
ee/clickhouse/queries/session_recordings/clickhouse_session_recording.py
Outdated
Show resolved
Hide resolved
@@ -26,7 +26,7 @@ def _query_recording_snapshots(self) -> List[SessionRecordingEvent]: | |||
window_id=window_id, | |||
distinct_id=distinct_id, | |||
timestamp=timestamp, | |||
snapshot_data=json.loads(snapshot_data), | |||
snapshot_data=try_read_from_object_storage(session_id, snapshot_data), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why try
- as a code reader my immediate question is what happens if it fails?
ee/clickhouse/queries/session_recordings/clickhouse_session_recording.py
Outdated
Show resolved
Hide resolved
9e989d1
to
8d969b2
Compare
This reverts commit 3affcb5.
plugin-server/src/utils/db/hub.ts
Outdated
try { | ||
connectObjectStorage(serverConfig) | ||
objectStorage = connectObjectStorage(serverConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This try-catch is dead code - let's streamline all of this and remove the status.infos/try-catches etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have stripped it right back
@@ -122,20 +124,44 @@ describe('e2e', () => { | |||
}) | |||
|
|||
test('snapshot captured, processed, ingested', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is doing too much. Can you add a separate test where object storage is explicitly enabled and check the behavior of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are more focussed tests that prove that object_storage_path is created correctly... stripped this back to just checking that it is set. There's definitely follow-up work coming on this so will refactor/improve tests in that
expect(event.distinct_id).toEqual('some-id') | ||
expect(event.snapshot_data).toEqual({ timestamp: 123 }) | ||
|
||
const expectedFolderDate = castTimestampOrNow(now, TimestampFormat.DateOnly) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a separate/new test rather than tacking new behavior onto a existing one?
Especially since the name of the test is now wrong/misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split the test (after first trying to move them into their own file) and they started not generating events... I'll leave this to a follow-up so I can concentrate on why that happened with less surrounding noise
} | ||
}, | ||
} | ||
} catch (e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Why would this ever throw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would... the only place that might is initialising the s3 client and that's lazy so shouldn't throw
However, I wanted to make sure that initialising object storage couldn't stop the plugin server starting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's any value of being so defensive here given there's no awaits/anything unusual. This style IMO makes the code harder to follow, covers no known usecases and potentially hides important bugs we want to be aware of/bail early on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a look, left some thoughts.
My biggest concern is with how does this handle cloud-scale? There's no way things will be fast or reliable, if for one recording, we need to first upload a thousand tiny JSON files to S3, and then download and combine all of them. I'm worried about the huge latencies.
- For each click event, just by moving my mouse I often generate between a dozen to a hundred session recording events. Even though our biggest volume customers don't use session recordings, we have as many session recording events/snapshots coming in each minute on cloud, as non-session recording events. Is
await uploadToS3
the best things we can do? - What if S3 is down?
- Can we or should we batch these instead, or upload async? I'd imagine a
recordings
table in postgres, with one row representing one.json
file for everything between two full snapshots in a recording. So about 200 snapshots now will be combined into one downloadable URL.
Edit: don't take my questions as blockers for trying it out... but know that we have just entered the forest for the trek towards Mt Doom. We haven't even seen the Orcs yet.
|
||
const aws = require('aws-sdk') | ||
|
||
let S3: typeof aws.S3 | null = null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm usually against globals. Why not make ObjectStorage
into an actual class
, and store this as an instance variable? The pojo/yolo-object returned from connectObjectStorage
is stored on the hub
anyway, so might as well be a real class.
public async fetchSessionRecordingEvents( | ||
sessionId?: string | undefined | ||
): Promise<PostgresSessionRecordingEvent[] | SessionRecordingEvent[]> { | ||
const predicate = !!sessionId ? ` WHERE session_id = '${sessionId}'` : '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure inlining the ID in the query without any sanitization is a good idea?
const altered_data = { ...snapshot_data } | ||
// don't delete the snapshot data **yet**, or if we have to roll back we lose data | ||
// this makes the timings less accurate because we're storing and loading the data twice | ||
// delete altered_data.data | ||
altered_data['object_storage_path'] = object_storage_path | ||
|
||
return JSON.stringify(altered_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const altered_data = { ...snapshot_data } | |
// don't delete the snapshot data **yet**, or if we have to roll back we lose data | |
// this makes the timings less accurate because we're storing and loading the data twice | |
// delete altered_data.data | |
altered_data['object_storage_path'] = object_storage_path | |
return JSON.stringify(altered_data) | |
// don't delete the snapshot data **yet**, or if we have to roll back we lose data | |
// this makes the timings less accurate because we're storing and loading the data twice | |
return JSON.stringify({ ...snapshot_data, object_storage_path }) |
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
Problem
see #9294
Changes
Adds ability to write session recordings to object storage for configured teams
Still stores recording to clickhouse so that recordings can continue to be displayed if the feature is turned off
How did you test this code?
added tests and ran it locally
If storage is enabled but not available there is a delay buffering recordings to view as the read is attempted before using the unaltered original ClickHouse data