read metadata from object storage #10810

rcmarron · 2022-07-15T01:23:27Z

This adds the object storage read path for metadata. When running this branch, the front end uses object storage for the metadata calculations and then uses the old path to fetch the actual data. It's still a WIP, but it works in my local dev environment. A couple of notes:

Reading multiple files from S3 seems to be very slow, and I'm not sure what's going on. I added threading to try to speed it up, but it doesn't seem to help much. I'm wondering if the limiting factor here is my local MinIO?
I added a session_recordings/... prefix to the start of all paths in object storage, so any old data there will be broken.
I updated the MinIO secret keys etc. to match the PostHog Docker files.

rcmarron · 2022-07-15T01:26:11Z

posthog/storage/object_storage.py

+    def _read_for_threading(self, result: List, index: int, bucket: str, key: str):
+        result[index] = self.read(bucket, key)
+
+    def read_all(self, bucket: str, keys: List[str], max_concurrent_requests: int) -> Optional[List[Tuple[str, str]]]:


In theory this should be fast, but in practice it is slow. I'm not sure why.
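For comparison, here is a minimal sketch of a bounded concurrent fan-out built on the standard library's `concurrent.futures.ThreadPoolExecutor`. It is not the code in this PR: `read_all_sketch`, `read_one`, `FAKE_BUCKET`, and `fake_read` are hypothetical names, with `read_one` standing in for `self.read` and an in-memory dict standing in for the bucket.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple


def read_all_sketch(
    read_one: Callable[[str, str], Optional[str]],
    bucket: str,
    keys: List[str],
    max_concurrent_requests: int,
) -> List[Tuple[str, Optional[str]]]:
    """Read every key with at most max_concurrent_requests reads in flight.

    Results are returned in the same order as `keys`.
    """
    if not keys:
        return []
    # Never start more threads than there are keys, and always at least one.
    workers = max(1, min(max_concurrent_requests, len(keys)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so no index bookkeeping
        # (as in a shared result list) is needed.
        contents = list(pool.map(lambda key: read_one(bucket, key), keys))
    return list(zip(keys, contents))


# Hypothetical in-memory stand-in for the object store, for illustration only.
FAKE_BUCKET: Dict[str, str] = {"a": "1", "b": "2"}


def fake_read(bucket: str, key: str) -> Optional[str]:
    # Assumed behaviour: a missing key yields None rather than raising.
    return FAKE_BUCKET.get(key)
```

Calling `read_all_sketch(fake_read, "some-bucket", ["a", "b"], 2)` pairs each key with its content. Whether this is any faster than the approach in the diff depends on where the time is actually spent, which is still an open question here.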

I guess for the metadata we don't do any progressive loading, so we can't make the time to first byte really fast and start playing immediately?

I'm not sure I'll be able to look at this properly today as I'm flying. It seems worth digging in first to find where the bottleneck is.
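One way to start looking for the bottleneck, sketched under assumptions: time each read individually and compare the sum of those durations with the wall-clock time of the whole batch. `timed_read`, `profile_reads`, and `read_one` are hypothetical names, with `read_one` again standing in for the real read method.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def timed_read(read_one: Callable[[str, str], Optional[str]], bucket: str, key: str) -> float:
    """Return how long a single read took, in seconds."""
    start = time.perf_counter()
    read_one(bucket, key)
    return time.perf_counter() - start


def profile_reads(
    read_one: Callable[[str, str], Optional[str]],
    bucket: str,
    keys: List[str],
    max_workers: int,
) -> Dict[str, float]:
    """Run all reads on a thread pool and report batch vs. per-request timings."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max(1, max_workers)) as pool:
        durations = list(pool.map(lambda key: timed_read(read_one, bucket, key), keys))
    wall_seconds = time.perf_counter() - wall_start
    return {
        "wall_seconds": wall_seconds,
        "sum_of_request_seconds": sum(durations),
        "slowest_request_seconds": max(durations, default=0.0),
    }
```

If `wall_seconds` is close to `sum_of_request_seconds`, the reads are effectively running one after another and the threads are not buying anything. If it is close to `slowest_request_seconds`, the reads overlap and the cost is per-request latency, which would point at the storage backend (for example a local MinIO) rather than at the threading.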

If all else fails, we may be able to take advantage of the fact that there is only one consumer processing messages and compact the metadata down to one file 🤔

it works

8ac1294

rcmarron marked this pull request as draft July 15, 2022 01:23


rcmarron and others added 5 commits July 14, 2022 18:32

update username + password in docker

dd3c1e2

fix some tests

30ffaa3

add distinctIds to tests

432ce3d

update tests to use new minio creds

fe33f05

add healthcheck

19d6cc3

hazzadous marked this pull request as ready for review July 20, 2022 10:25

hazzadous merged commit 6ff6e15 into session-recordings-ingester Jul 20, 2022

hazzadous deleted the read-metadata-from-object-storage branch July 20, 2022 10:25
