Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant performance issues with funnels on Postgres #5519

Closed
1 of 2 tasks
paolodamico opened this issue Aug 9, 2021 · 3 comments · Fixed by #5997
Closed
1 of 2 tasks

Significant performance issues with funnels on Postgres #5519

paolodamico opened this issue Aug 9, 2021 · 3 comments · Fixed by #5997
Labels
performance Has to do with performance. For PRs, runs the clickhouse query performance suite

Comments

@paolodamico
Copy link
Contributor

In what situation are you experiencing subpar performance?

An incredibly helpful user today, shared this experience on funnels taking way too much time and resources (exact numbers TBC) to execute with me after upgrading to 1.27 (Postgres). User reported event volume is pretty small (exact numbers coming soon).

  • Even at a 7-day time range leads to quite a few timeouts.
  • After the funnel effectively times out (client seems to stop polling), we show the empty state message ("Enter your funnel details and click 'calculate' to create a funnel visualization"). Unsure if this is an error we fixed recently for Postgres too.
  • Postgres memory usage starts hovering at around 20%
    image
  • Deep diving into the funnel queries reveals that query takes 12 minutes to complete.
  • Seems like the bottleneck is CPU usage on the instance.
  • Had similar issues since 1.24 but 1.27 made it more frequent and more evident.

How to reproduce

See above.

Environment

  • PostHog Cloud
  • self-hosted PostHog, version/commit: 1.27 on Postgres

Additional context

This seems like a huge blocker on Postgres, even if we start considering deprecating Postgres usage, this might be one of those things to actually address. Thoughts? @macobo @neilkakkar @EDsCODE

Thank you for your performance issue report – we want PostHog to go supersonic!

@paolodamico paolodamico added the performance Has to do with performance. For PRs, runs the clickhouse query performance suite label Aug 9, 2021
@macobo
Copy link
Contributor

macobo commented Aug 10, 2021

@paolodamico mind putting me in touch with the user in users slack? Would love to take a look and debug this, but this might require a bigger dataset or access I don't have :)

@paolodamico
Copy link
Contributor Author

Done!

@paolodamico
Copy link
Contributor Author

Problem reported by another user, funnels not loading in the insights page.

hazzadous pushed a commit that referenced this issue Sep 15, 2021
We're using this to join onto person distinct ids in lots of places to this
really should be indexed. There is almost certainly a better composite index
here however.

Relates to #5519 although I have yet to
look into if this is a resolution to the issue specifically.

The specific query for #5519 looks like this, from looking `pytest posthog/api/test/test_insight.py -k test_insight_funnels_basic_post`:

```
SELECT "posthog_person"."uuid",
    "posthog_person"."created_at",
    "posthog_person"."team_id",
    "posthog_person"."properties",
    "posthog_person"."is_user_id",
    MIN("step_0"."step_ts") as "step_0",
    MIN("step_1"."step_ts") as "step_1"
FROM (
        SELECT "pdi"."person_id",
            MIN("posthog_event"."timestamp") AS "step_ts"
        FROM posthog_event
            JOIN posthog_persondistinctid pdi ON pdi.distinct_id = posthog_event.distinct_id
        WHERE (
                "posthog_event"."timestamp" >= '2020-10-08T00:00:00+00:00'::timestamptz
                AND "posthog_event"."timestamp" <= '2020-11-15T13:10:40.739763+00:00'::timestamptz
                AND "posthog_event"."event" = 'user signed up'
                AND "posthog_event"."team_id" = 2
            )
        GROUP BY "pdi"."person_id"
    ) step_0
    LEFT JOIN LATERAL (
        SELECT "pdi"."person_id",
            MIN("posthog_event"."timestamp") AS "step_ts"
        FROM posthog_event
            JOIN posthog_persondistinctid pdi ON pdi.distinct_id = posthog_event.distinct_id
        WHERE (
                "posthog_event"."timestamp" >= '2020-10-08T00:00:00+00:00'::timestamptz
                AND "posthog_event"."timestamp" <= '2020-11-15T13:10:40.739763+00:00'::timestamptz
                AND "pdi"."person_id" = "step_0"."person_id"
                AND "posthog_event"."event" = 'user did things'
                AND "posthog_event"."team_id" = 2
                AND "posthog_event"."timestamp" >= "step_0"."step_ts"
            )
        GROUP BY "pdi"."person_id"
    ) step_1 ON TRUE
    JOIN posthog_person ON posthog_person.id = "step_0".person_id
WHERE "step_0".person_id IS NOT NULL
GROUP BY "posthog_person"."uuid",
    "posthog_person"."created_at",
    "posthog_person"."team_id",
    "posthog_person"."properties",
    "posthog_person"."is_user_id"
```
hazzadous pushed a commit that referenced this issue Sep 16, 2021
Have tested on prod with team_id=2. Seems to work at making the query
complete in time. Quite quick there but not much data.

There's probably a lot more we could do here but I don't want to burn
too much time on this.

Future improvements could be:

 1. Remove the first subquery and just do a join
 2. Investigate if there's benefit in removing the group by and using
    distinct on. I don't know enough about this tbh.
 3. Remove all the string manipulations with re. Just write up the query
    by hand.
 4. Remove the implicit coupling between query_bodies gen and the `for step,
    qb` look. It's probably best to do this in one pass.

Closes #5519
hazzadous pushed a commit that referenced this issue Sep 20, 2021
* docs(funnel): add comments around how the funnel query is build for pg

* chore(funnel): explicitly couple `_gen_lateral_bodies` to `_build_query`

`_gen_lateral_bodies` result seems to always be passed to `_build_query`
so it doesn't make too much sense to have them required to be called in
conjunction.

There are probably further changes that could be made. There is some
implicit coupling around the return value and the joining of the bodies
into a LATERAL JOIN, which should be made explicit.

* chore(funnels): reduce the query generation complexity

This is mainly just to make it a little more clear how the query fits
together.

I have done something naughty here in that I have changed from a using a
GROUP BY to using DISTINCT ON for the top level. I think you'd get
uniqueness of funnel path anyway, as each lateral join should only
produce only one result.

* perf(funnels): add filter on pdi.team_id to speed up query

Have tested on prod with team_id=2. Seems to work at making the query
complete in time. Quite quick there but not much data.

There's probably a lot more we could do here but I don't want to burn
too much time on this.

Future improvements could be:

 1. Remove the first subquery and just do a join
 2. Investigate if there's benefit in removing the group by and using
    distinct on. I don't know enough about this tbh.
 3. Remove all the string manipulations with re. Just write up the query
    by hand.
 4. Remove the implicit coupling between query_bodies gen and the `for step,
    qb` look. It's probably best to do this in one pass.

Closes #5519
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Has to do with performance. For PRs, runs the clickhouse query performance suite
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants