Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read metadata from object storage #10810

Merged

Conversation

rcmarron
Copy link
Contributor

@rcmarron rcmarron commented Jul 15, 2022

This adds the object storage read path for meta data. So, when running this branch, the front-end uses object storage for the metadata calculations, and then it does the old path for fetching the actual data. It's still a WIP, but it works in my local dev env. A couple notes:

  • The reading multiple files from s3 seems to be very slow. I'm not positive what's going on. I added threading to try to speed it up, but it doesn't seem to help much. I'm wondering if the limiting factor here is my local minio?
  • I added a session_recordings/... to the start of all paths in object storage, so any old data there will be broken
  • I updated the minio secret keys etc to match the posthog docker files.

@rcmarron rcmarron marked this pull request as draft July 15, 2022 01:23
def _read_for_threading(self, result: List, index: int, bucket: str, key: str):
result[index] = self.read(bucket, key)

def read_all(self, bucket: str, keys: List[str], max_concurrent_requests: int) -> Optional[List[Tuple[str, str]]]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, this should be fast, but in practice it is the opposite. I'm not sure why

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for the metadata we don't do any progressive loading so we can't make the time to first byte really fast and therefore start playing immediately?

I'm not sure I'll be able to properly look at this today as am flying. Seems worth digging in first to where the bottleneck here is.

If all else fails we may be able to take advantage of that there is only one consumer processing messages that we can compact metadata down to one file 🤔

@hazzadous hazzadous marked this pull request as ready for review July 20, 2022 10:25
@hazzadous hazzadous merged commit 6ff6e15 into session-recordings-ingester Jul 20, 2022
@hazzadous hazzadous deleted the read-metadata-from-object-storage branch July 20, 2022 10:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants