Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically catch missing team_id filters #3085

Closed
macobo opened this issue Jan 26, 2021 · 2 comments
Closed

Automatically catch missing team_id filters #3085

macobo opened this issue Jan 26, 2021 · 2 comments
Labels

Comments

@macobo
Copy link
Contributor

macobo commented Jan 26, 2021

Is your feature request related to a problem?

Missing team_id filters can cause havoc, causing performance (and correctness problems) in queries.

Reviews can't catch all of them since sometimes unexpected tables have team_id filters or we might forget to push the filters into subqueries.

It would be great if we could avoid them.

Describe the solution you'd like

Let's add heuristics to avoid these errors.

A simple heuristic would be to count number of tables selected from/joined (X) with a team_id column and compare it against number of team_id filters (Y).

After each test it could confirm for all queries X <= Y

Describe alternatives you've considered

Make reviewers perfect.

Additional context

Related PRs: #3084 #2299 #2276 https://github.com/PostHog/posthog/pull/1967/files etc

Hacky proof-of-concept for Django ORM queries:

class TeamIdCheckTestMixin:
    def setUp(self):
        from django.conf import settings
        settings.DEBUG = True
    def tearDown(self):
        super().tearDown()  # type: ignore
        import re
        from django.conf import settings
        from django.db import connection, reset_queries
        from django.apps import apps as django_apps
        TABLE_REGEX = r'(?:FROM|JOIN) "?(\w+)"? '
        tablenames = set([model._meta.db_table for model in django_apps.get_models() if hasattr(model, 'team_id')]) - set(['posthog_team'])
        def match(query):
            tables = re.findall(TABLE_REGEX, query)
            relevant_tables = [t for t in tables if t in tablenames]
            team_id_count = len(re.findall(r'(team_id"? = )', query))
            return tables, relevant_tables, team_id_count
        for query in [q['sql'] for q in connection.queries]:
            if 'UPDATE' in query or 'posthog_personalapikey' in query: continue
            tables, relevant_tables, team_id_count = match(query)
            if team_id_count > 0 or len(relevant_tables) > 0:
                print(query, tables, relevant_tables, team_id_count)
            if len(relevant_tables) > team_id_count:
                print('FAIL')
                raise Exception()
        reset_queries()

Thank you for your feature request – we love each and every one!

@macobo macobo added the tests label Jan 26, 2021
macobo added a commit that referenced this issue Feb 18, 2021
Tested out a solution for #3085 - result is super noisy but helped me
find some bad queries.
macobo added a commit that referenced this issue Feb 19, 2021
Tested out a solution for #3085 - result is super noisy but helped me
find some bad queries.
@paolodamico paolodamico removed the tests label Feb 22, 2022
@posthog-bot
Copy link
Contributor

This issue hasn't seen activity in two years! If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in two weeks.

@posthog-bot
Copy link
Contributor

This issue was closed due to lack of activity. Feel free to reopen if it's still relevant.

@posthog-bot posthog-bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants