Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(funnels): add filter on pdi.team_id to speed up query #5997

Merged
merged 4 commits into from
Sep 20, 2021

Conversation

hazzadous
Copy link
Contributor

@hazzadous hazzadous commented Sep 16, 2021

Have tested on prod with team_id=2. Seems to work at making the query
complete in time. Quite quick there but not much data.

There's probably a lot more we could do here but I don't want to burn
too much time on this.

Future improvements could be:

  1. Remove the first subquery and just do a join
  2. Investigate if there's benefit in removing the group by and using
    distinct on. I don't know enough about this tbh.
  3. Remove all the string manipulations with re. Just write up the query
    by hand.
  4. Remove the implicit coupling between query_bodies gen and the for step, qb look. It's probably best to do this in one pass.

Closes #5519

@timgl timgl temporarily deployed to posthog-pr-5997 September 16, 2021 15:07 Inactive
@hazzadous hazzadous force-pushed the fix/gh-5888-funnels-query-perf-part-2 branch from be15b62 to a3f0e2d Compare September 16, 2021 15:09
@timgl timgl temporarily deployed to posthog-pr-5997 September 16, 2021 15:09 Inactive
@hazzadous hazzadous force-pushed the fix/gh-5888-funnels-query-perf-part-2 branch from a3f0e2d to 443f9eb Compare September 16, 2021 15:19
@timgl timgl temporarily deployed to posthog-pr-5997 September 16, 2021 15:19 Inactive
Harry Waye added 3 commits September 16, 2021 16:28
`_gen_lateral_bodies` result seems to always be passed to `_build_query`
so it doesn't make too much sense to have them required to be called in
conjunction.

There are probably further changes that could be made. There is some
implicit coupling around the return value and the joining of the bodies
into a LATERAL JOIN, which should be made explicit.
This is mainly just to make it a little more clear how the query fits
together.

I have done something naughty here in that I have changed from a using a
GROUP BY to using DISTINCT ON for the top level. I think you'd get
uniqueness of funnel path anyway, as each lateral join should only
produce only one result.
@hazzadous hazzadous force-pushed the fix/gh-5888-funnels-query-perf-part-2 branch from 443f9eb to 601b350 Compare September 16, 2021 15:28
@timgl timgl temporarily deployed to posthog-pr-5997 September 16, 2021 15:28 Inactive
@hazzadous hazzadous force-pushed the fix/gh-5888-funnels-query-perf-part-2 branch from 601b350 to 70df475 Compare September 16, 2021 15:38
@timgl timgl temporarily deployed to posthog-pr-5997 September 16, 2021 15:38 Inactive
Have tested on prod with team_id=2. Seems to work at making the query
complete in time. Quite quick there but not much data.

There's probably a lot more we could do here but I don't want to burn
too much time on this.

Future improvements could be:

 1. Remove the first subquery and just do a join
 2. Investigate if there's benefit in removing the group by and using
    distinct on. I don't know enough about this tbh.
 3. Remove all the string manipulations with re. Just write up the query
    by hand.
 4. Remove the implicit coupling between query_bodies gen and the `for step,
    qb` look. It's probably best to do this in one pass.

Closes #5519
@hazzadous hazzadous force-pushed the fix/gh-5888-funnels-query-perf-part-2 branch from 70df475 to 84f2610 Compare September 16, 2021 15:58
@timgl timgl temporarily deployed to posthog-pr-5997 September 16, 2021 15:59 Inactive
@hazzadous hazzadous merged commit 2fb7cf8 into master Sep 20, 2021
@hazzadous hazzadous deleted the fix/gh-5888-funnels-query-perf-part-2 branch September 20, 2021 07:50
hazzadous pushed a commit that referenced this pull request Oct 19, 2021
(#5997)"

This merged caused an issue with the funnels endpoint when display was
set to `ActionsLineGraph`

I'm reverting so we can add in a snapshot test that will fail when this
revert is reverted.

This reverts commit 2fb7cf8.
hazzadous pushed a commit that referenced this pull request Oct 19, 2021
This test just checks the Funnel with `display="ActionsLineGraph"`
against what it previously did before
#5997 was merged. This code path
was previously untested, so the issue wasn't picked up and resulted in
#6530
hazzadous pushed a commit that referenced this pull request Oct 19, 2021
This resolves an issue what was introduced by
#5997 where we would fail to
calculate a funnel result if display="ActionsLineGraph" was specified.
The fix was simply to wrap a string in `sql.SQL`.

Resolves #6530
hazzadous pushed a commit that referenced this pull request Oct 19, 2021
)

* Revert "perf(funnels): add filter on pdi.team_id to speed up query
(#5997)"

This merged caused an issue with the funnels endpoint when display was
set to `ActionsLineGraph`

I'm reverting so we can add in a snapshot test that will fail when this
revert is reverted.

This reverts commit 2fb7cf8.

* test(funnel): add snapshot test for funnel trend query for postgres

This test just checks the Funnel with `display="ActionsLineGraph"`
against what it previously did before
#5997 was merged. This code path
was previously untested, so the issue wasn't picked up and resulted in
#6530

* Revert "Revert "perf(funnels): add filter on pdi.team_id to speed up
query"

Now that we have a test in place for
#6530 I'm reverting this revert
so we can implement a fix.

This reverts commit 49761a3.

* only test on postgres

* fix(funnel): `TypeError` for funnel with display=ActionsLineGraph

This resolves an issue what was introduced by
#5997 where we would fail to
calculate a funnel result if display="ActionsLineGraph" was specified.
The fix was simply to wrap a string in `sql.SQL`.

Resolves #6530

* fix import order

* move test outside of test factory

* remove imports

* move test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Significant performance issues with funnels on Postgres
3 participants