Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate inner queries instead of join+distinct #605

Closed

Conversation

fsateler
Copy link
Contributor

@fsateler fsateler commented Nov 19, 2019

When applying filters with related entities, CanCan used to generate a LEFT JOIN plus
a DISTINCT clause to avoid repeated records. This is problematic for a number of reasons:

  1. Postgres is picky about setting ORDER BY with a column not on the DISTINCT clause, because it applies the ORDER after the DISTINCT so it cannot 'see' the referenced column.
  2. Groups and Counts become more difficult to do because foo.distinct.count is not the same as foo.count.

The way to avoid this problem is to use inner queries that don't multiply the number of returned records. We do this by running the query we were previously using inside a WHERE IN block.

This is a different approach than #600.

cc @ghiculescu

Fixes #596

When applying filters with related entities, CanCan used to generate a LEFT JOIN plus
a DISTINCT clause to avoid repeated records. This is problematic for a number of reasons:

1. Postgres is picky about setting ORDER BY with a column not on the DISTINCT clause, because
   it applies the ORDER after the DISTINCT so it cannot 'see' the referenced column.
2. Groups and Counts become more difficult to do because `foo.distinct.count` is not the same
   as `foo.count`.

The way to avoid this problem is to use inner queries that don't
multiply the number of returned records. We do this by running the
query we were previously using inside a WHERE IN block.
@fsateler fsateler force-pushed the feature/dont-distinct branch from 9166512 to 1edb451 Compare November 19, 2019 21:44
@fsateler
Copy link
Contributor Author

I can't reproduce the CI failures locally. AFAICS, the only difference is with the postgres version (I'm using 11, CI appears to use 9.6).

Without an order by clause, order is undefined. Therefore if more than one result is returned,
we don't know which one is first. Ensure we have a single record to make sure we have the one we
want
@fsateler
Copy link
Contributor Author

I have reproduced this with postgres 9.6. The problem is that selecting custom columns and preloading of associatons in accessible_by_{integration,has_many_through} are non-deterministic since the result set contains two records and the order is not ensured. I fixed those tests by adding a condition ensuring a single result.

Otherwise, if the query already has a joins, we might add it twice.

This also makes it unnecessary to remove the order and joins
@fsateler fsateler force-pushed the feature/dont-distinct branch from a2767d2 to 67640e7 Compare November 20, 2019 19:42
@oleksii-leonov
Copy link
Contributor

@fsateler Great work!

When applying filters with related entities, CanCan used to generate a LEFT JOIN plus
a DISTINCT clause to avoid repeated records. This is problematic for a number of reasons:

One more reason to avoid DISTINCT — it would fail when used on tables with json columns (PostgreSQL have no equality operator for type json).
ERROR: could not identify an equality operator for type json LINE 1: SELECT DISTINCT "users".* FROM ...

If the user changed it, then it wouldn't work
@coorasse
Copy link
Member

I really like this PR! 👍 I'd like to run a couple of tests locally before merging it. Thanks for the nice work!

@derosm2
Copy link

derosm2 commented Feb 21, 2020

Any update on this? Excited to see this get merged in.

@coorasse
Copy link
Member

coorasse commented Mar 6, 2020

I am running the last tests and I included also #608 . Please see #619 . Will be released soon. Very nice work community! ❤️ you.

@coorasse
Copy link
Member

coorasse commented Mar 6, 2020

closed in #619

@coorasse coorasse closed this Mar 6, 2020
@fsateler fsateler deleted the feature/dont-distinct branch March 6, 2020 19:10
@@ -76,7 +76,7 @@ class Editor < ActiveRecord::Base

describe 'preloading of associatons' do
it 'preloads associations correctly' do
posts = Post.accessible_by(ability).includes(likes: :user)
posts = Post.accessible_by(ability).where(published: true).includes(likes: :user)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think these changes had any affect on tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did. See the explanation #605 (comment)

The query changed just enough to get postgres to return a different record (because the query contained 2 but we applied limit 1).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh yep right you are. I was only testing locally in sqlite when I wrote that comment. It seemed to not make a difference there.

ghiculescu added a commit to ghiculescu/cancancan that referenced this pull request Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

accessible_by with hash of conditions causes Not unique table/alias
5 participants