Measure session recording attempts #5478

Closed
paolodamico opened this issue Aug 5, 2021 · 11 comments

@paolodamico
Contributor

Is your feature request related to a problem?

We know session recording is somewhat unreliable today, particularly in circumstances we don't yet understand (e.g. browser versions, network connectivity, types of devices). To improve this, we need to measure it well.

Describe the solution you'd like

I'd like to be able to effectively measure failed session recordings (and ideally have some context on why they failed). One approach I was discussing with @macobo was to fire a regular PostHog event ("session recording started") whenever we activate session recording in the client. We can later match this event to a successfully completed session recording to get the ratio of success.
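
A minimal sketch of the matching step described above, assuming each "session recording started" event carries a session ID and that we can list the session IDs that ended up with a stored recording. The function name and both inputs are hypothetical; nothing here is an existing posthog-js or PostHog API.

```python
from typing import Iterable


def recording_success_ratio(
    started_session_ids: Iterable[str],
    recorded_session_ids: Iterable[str],
) -> float:
    """Share of attempted recordings that produced a stored recording.

    Both inputs are assumptions: session IDs taken from the proposed
    "session recording started" event, and session IDs that have a
    recording on the server.
    """
    started = set(started_session_ids)
    if not started:
        return 0.0  # no attempts, so there is nothing to measure
    succeeded = started & set(recorded_session_ids)
    return len(succeeded) / len(started)
```

Recordings that exist without a matching start event are ignored here, since the ratio is defined over attempts.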

Describe alternatives you've considered

Using regular sessions as a proxy. Measure the ratio of sessions, in projects with session recording enabled, that have no recording (< 1).
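
As a rough illustration of this proxy, assuming we can export (team ID, session ID) pairs, the set of teams with recording enabled, and the set of session IDs that have a recording. All names and input shapes are hypothetical.

```python
from typing import Iterable, Set, Tuple


def sessions_without_recording_ratio(
    sessions: Iterable[Tuple[int, str]],  # (team_id, session_id) pairs
    teams_with_recording_enabled: Set[int],
    recorded_session_ids: Set[str],
) -> float:
    """Share of sessions in recording-enabled teams that have no recording."""
    eligible = {
        session_id
        for team_id, session_id in sessions
        if team_id in teams_with_recording_enabled
    }
    if not eligible:
        return 0.0  # no sessions in enabled teams
    missing = eligible - recorded_session_ids
    return len(missing) / len(eligible)
```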

Additional context

Work towards diagnosing causes.

Thank you for your feature request – we love each and every one!

paolodamico added the enhancement (New feature or request) label Aug 5, 2021
@paolodamico
Contributor Author

@yakkomajuri not sure if this is something you'd want to pick up or whether we should figure it out in Core Experience?

@macobo
Contributor

macobo commented Aug 6, 2021

Note - I would personally suggest the "alternative" approach since it reflects the user's experience better. E.g. you can realistically end up in a situation where the "posthog-js" metric shows 99% while the sessions/recordings ratio is only e.g. 60%. As for why: see #4884

@paolodamico
Contributor Author

Well definitely one benefit of going with the approach you suggest is that we can measure already instead of waiting for data to come in. I would really appreciate it if someone with more context could help me build this query in Metabase. Maybe @macobo or @EDsCODE you could help?

@paolodamico
Contributor Author

@rcmarron @alexkim205 I think we should revive this and make sure we can measure % of failed recording attempts reliably.

@rcmarron
Contributor

100% agree @paolodamico

I think the original approach may be more appropriate now that the UX has changed (although both would be valuable). I just did a bit of digging, and it looks like we add a $phjs-rrweb-record property to the Capture Metrics event when we start a recording (https://github.com/PostHog/posthog-js/blob/4b82c682981542c0115d14f730f846a9278059e9/src/extensions/sessionrecording.js#L111).

I'm not very familiar with the Capture Metrics event, but @macobo would it be appropriate to base a metric off of that? Basically, I would check how many Capture Metrics events are fired with the $phjs-rrweb-record property but have no related session recording. I'm having trouble finding where the Capture Metrics event is actually fired...

The query would probably be too large if we're checking across teams, so I might try to sample it down somehow (maybe just take a subset of teams with recordings enabled).
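
One possible way to sample, sketched under the assumption that we have a list of team IDs with recordings enabled: hash each ID into a bucket so the same teams are picked on every run. The function name is hypothetical and this is not how any existing query does it.

```python
import hashlib
from typing import Iterable, List


def sample_team_ids(team_ids: Iterable[int], percent: int) -> List[int]:
    """Keep roughly `percent`% of teams, chosen by a stable hash of the ID."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    sampled = []
    for team_id in team_ids:
        digest = hashlib.sha256(str(team_id).encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
        if bucket < percent:
            sampled.append(team_id)
    return sampled
```

Because the bucket depends only on the team ID, the sample stays the same between runs, which keeps day-to-day numbers comparable.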

Thoughts?

@rcmarron
Contributor

rcmarron commented Oct 18, 2021

I have some revised thoughts here. It's helpful to think about two separate categories of missing recordings:

  1. The server receives data, but it's incomplete
  2. The server never receives data about the recording

After learning more about how session recording works, I think the vast majority of our 'missing recording' cases fall into the first category (see #2927, #6482). I'm sure there are cases of the 2nd category, but my guess is that the causes are out of our control (e.g. user with no network, analytics blockers etc.).

My proposal is that we focus on the first category. Fix those issues and then evaluate if we think there is more work to be done from there.

With that in mind, here is a query that measures the first case: https://metabase.posthog.net/question/167-recordings-w-o-a-full-snapshot-in-past-24hrs

@paolodamico Does that work for you?

@paolodamico
Contributor Author

paolodamico commented Oct 19, 2021

That context is helpful @rcmarron. I would argue that we're assuming (2) is small and/or out of our control, but unless we measure it we have no way of knowing (what if we have a bug in posthog-js that is dropping a lot more sessions?). I would challenge us to measure this so we can then decide whether further action is warranted.

Even for cases like network issues there might be things to try: can we minimize payloads? Maybe not capture images if the network is slow?

As a side note, it's great that we have that query, we now have a baseline and a solid way of measuring whether we achieved our goal for the sprint.

@macobo
Contributor

macobo commented Oct 20, 2021

Suggestion: Don't target all session recordings. Sessions which bounce immediately or are very short are much lower-value than long sessions.

Suggested metrics:

  • % of recordings where no full snapshots are missing (>30s)
  • % of recordings where no intermediary events are missing (>30s)
    • (e.g. done by sending an autoincrementing value together with every snapshot event, storing this info along with compression info)
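
A minimal sketch of the autoincrementing-value idea, assuming the client attaches a counter that grows by 1 with every snapshot event and the server keeps the values it received for one recording. The function name is hypothetical.

```python
from typing import Iterable, List


def missing_sequence_numbers(received: Iterable[int]) -> List[int]:
    """Return the sequence numbers absent between the lowest and highest seen.

    Events dropped after the last received one cannot be detected this way,
    because the server never learns how far the counter actually went.
    """
    seen = set(received)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

Arrival order does not matter, so out-of-order delivery is not reported as a gap.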

@paolodamico
Contributor Author

Really love those conceptual metrics! We have a sync conversation scheduled for tomorrow to discuss and figure out more specifics. For instance, I'd want us to make sure we're holistically tracking any session >30s.

@rcmarron
Contributor

Closing the loop here. After some discussions, we decided to measure session recordings via 2 metrics:

  1. Number of session recordings that do not have a full snapshot. (https://metabase.posthog.net/question/167-recordings-w-o-a-full-snapshot-in-past-24hrs)
    • Note: It's on purpose that this isn't filtering out recordings less than 30 seconds. Even short recordings should have full snapshots, and if not it might be a sign of another issue like ph-js crashing when sending the snapshot.
  2. Recording playbacks where the rrweb player fired warnings. It warns when it can't make sense of the sequence of events (e.g. events are missing or the sequence doesn't make sense). This metric needs to be implemented (Report when the rr-web player has warnings. #6704)
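
For illustration only, the first metric could be computed over raw snapshot events roughly like this. It assumes each event is a dict with a `session_id` and an rrweb `type`, where `2` is rrweb's `EventType.FullSnapshot`; the dict shape and function name are hypothetical and this is not the Metabase query itself.

```python
from typing import Dict, Iterable, List, Set

FULL_SNAPSHOT = 2  # rrweb's EventType.FullSnapshot


def recordings_without_full_snapshot(events: Iterable[Dict]) -> List[str]:
    """Return session IDs that have snapshot events but no full snapshot."""
    all_sessions: Set[str] = set()
    with_full_snapshot: Set[str] = set()
    for event in events:
        session_id = event["session_id"]
        all_sessions.add(session_id)
        if event["type"] == FULL_SNAPSHOT:
            with_full_snapshot.add(session_id)
    return sorted(all_sessions - with_full_snapshot)
```

No duration filter is applied, matching the note that even short recordings should have a full snapshot.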

@rcmarron
Contributor

Closing this as part 1 is done and part 2 is captured here: #6704


3 participants