Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translate SkipWhile and TakeWhile #14104

Open
corstian opened this issue Dec 6, 2018 · 17 comments
Open

Translate SkipWhile and TakeWhile #14104

corstian opened this issue Dec 6, 2018 · 17 comments

Comments

@corstian
Copy link

corstian commented Dec 6, 2018

Feature Request

In order to facilitate my use case of implementing cursor based pagination I would find the LINQ .SkipWhile() method useful.

I would imagine this expression to result in something like the following SQL statement:

(DbSet<T>)
    .OrderBy(q => q.DepartureTime)
    .ThenBy(q => q.ArrivalTime)
    .SkipWhile(q => q.Id != "84CFC944-B64E-4C7D-80CD-053C771DC478")
    .Take(50);
SELECT * FROM [Flights] ORDER BY [DepartureTime], [ArrivalTime]
OFFSET (
    SELECT TOP 1 rn
    FROM (
        SELECT Id, ROW_NUMBER() OVER (ORDER BY [DepartureTime], [ArrivalTime]) as rn FROM [Flights]
    ) T
    WHERE T.Id = '84CFC944-B64E-4C7D-80CD-053C771DC478'
) rows
FETCH NEXT 50 ROWS ONLY

Until now I have been unable to find a reusable and composable way to achieve this behaviour by using the FromSql() extension method for multiple models.

Related Issues

#12046

@smitpatel smitpatel self-assigned this Dec 6, 2018
@ajcvickers ajcvickers added this to the Backlog milestone Dec 12, 2018
@smitpatel smitpatel removed their assignment Jan 16, 2019
@benmccallum
Copy link
Contributor

After almost creating another issue I've ended up here and thought I'd share my situation and get some more weight behind this proposal.

I'm building a GraphQL service and trying to support cursor-based pagination.

What I'm finding is that those articles make it sound really easy to implement, as do a lot of blog posts, but in reality that's rarely going to work. The examples I see always assume: "You sort by CreatedAt" or "You sort by BookingID" but that fails to address two potential issues:

  • The client wanting to order by another field
  • Order by another field that can have duplicates (see below)

In my situation right now, I'm facing both issues. I'm working on a set of Bookings. A booking has:

  • a booking id,
  • a booking date (date and time of arrival), and
  • a shop (where the booking is completed).

The client would like to be able to sort by booking date (issue 1), so this would therefore be the logical cursor to use, but when the client is an admin user, the result set (across shops) will have duplicate booking dates (issue 2) as it is essentially an discrete list of arrival times for a date. So, that means booking date alone can't be used as the cursor as getting a page after a given cursor would effectively skip all the remaining pages of bookings that had that same booking date.

So, the search for a cursor continues. I first though it could be a combination of booking date and booking id, but although that makes sense for the ORDER BY clause (ORDER BY BookingDate, BookingID), booking ids don't progress in line with booking dates, so the WHERE clause (WHERE BookingDate > Cursor.BookingDate AND BookingID > Cursor.BookingID) would effectively skip results if a booking with a later BookingDate has a lower BookingID than a previous page.

Finally, it would seem the best cursor to use would be a ROWNUMBER generated by SQL Server. That way I can do my ORDER BY clause across multiple fields and then in the WHERE clause filter by ROWNUMBER/cursor.

Ideal solution
Essentially what I'd like to be able to do take my existing IQueryable and just tell it to do a WHERE ROWNUMBER > cursor, and the OVER expression just dupes the ORDER BY of my query, happy days. Is that possible, or would it be easy to add support for that?

OFFSET solution
@corstian , issue poster here wrote his blog post with the same solution he's proposed above. Essentially he's calculating the OFFSET based on a subquery for the ROWNUMBER.

Is it possible now to re-use the IQueryable I've already built up with various filters and other logic earlier, and tack on this OFFSET clause with a subquery? Is there an elegant way to do that now in EF Core pending his SkipWhile suggestion above?

PK lookup first, then query results solution
Failing that, I think I'd be able to live with doing an upfront for the page of BookingIDs I'd need to do my full query on.

  1. SELECT BookingID, ROWNUMBER() OVER(BookingDate, BookingID) AS cursor FROM Booking ORDER BY BookingDate, BookingID) WHERE cursor > myCursor
    2.`SELECT * FROM Booking WHERE BookingID IN (...list of booking ids grabbed in above)

But similarly to above, I want to re-use my IQueryable and tack on the ROWNUMBER into my SELECT, how can I do that elegantly?

Painful solution
Use @corstian 's SqlKata solution to maybe do the PKs lookup first, therefore implementing all my filtering in that ORM and the queries to get totalCount, hasNextPage, hasPreviousPage, etc., just to get a list of PKs to do my real, object-returning EF query. Could be worse, but it's another dependency to bring on board probably with it's own quirks and not as much LINQ support as EF Core team can provide one would imagine.

@corstian
Copy link
Author

corstian commented Feb 7, 2019

The biggest issue is that I did not succeed in materializing the IQueryable in a way which is usable for me. My current solution is a way of building queries with SqlKata, comparable with EF LINQ, but weakly typed. It is this query which I am using as a subquery for pagination purposes. This way I could tackle all the problems you have described above (and yes, I'd also rather do that with EF Core, although SqlKata is an amazing library).

I'll write up another blogpost soon showing the details about my current solution, and maybe put it in a handy NuGet package. I'm currently running around on a exhibition so it might take a few days though.

@benmccallum
Copy link
Contributor

benmccallum commented Feb 7, 2019

Thanks for the response. It definitely should be easier than it is, hopefully the EF Core team can prioritise it in the backlog. Would love to see your solution. I may need to do the same. I found a .ToSql() extension method that gets the raw SQL but like you said it's a bit of a nightmare to be dealing with something so raw. Theoretically I could just pop on the OFFSET with subquery in there and see what happens, but it's pretty loose haha.

Update: scratch that. Few people have said it's super slow to generate the SQL, which makes sense given it's using reflection. I wish I could just generate the WHERE clause somehow though, to pop it into a raw query for the PKs and ROWNUMBER()...

cc: @RahenSaeed , this will be useful once it's in place as far as EF Core goes for implementing cursor-based pagination effectively in real-world scenarios.

@corstian
Copy link
Author

corstian commented Feb 7, 2019

@benmccallum I'll elaborate it more expansively later, but this is the gist of it:

/// <summary>
/// A ggeneric method for applying cursor based pagination
/// </summary>
/// <param name="query">The base query to apply pagination to</param>
/// <param name="queryFactory">A queryfactory instance</param>
/// <param name="column">The column to use for cursor based pagination</param>
/// <param name="cursor">The cursor value</param>
/// <param name="count">The number of items to retrieve after the cursor</param>
/// <returns></returns>
public static Query CursoredOffset(
    this Query query,
    QueryFactory queryFactory,
    string column,
    string cursor,
    int count)
{
    var compiler = new SqlServerCompiler();
    var ctx = new SqlResult
    {
        Query = query
    };
            
    var order = compiler.CompileOrders(ctx);

    query.Clauses.RemoveAll(q => q.Component == "order");

    if (string.IsNullOrWhiteSpace(order))
    {
        throw new Exception($"{nameof(query)} does not have an order by clause");
    }

    var internalQuery = queryFactory.Query()
        .With("d", query)
        .From("d")
        .CombineRaw(order)
        .CombineRaw($@"OFFSET (
            SELECT TOP 1 rn
            FROM(
                SELECT [d].{column}, ROW_NUMBER() OVER({ order }) AS rn FROM [d]
            ) T
            WHERE T.{column.Split('.').LastOrDefault() ?? column} = '{cursor}'
        ) rows
        FETCH NEXT {count} ROWS ONLY");

    return internalQuery;
}

So few things to note here; it uses the query factory and makes life easier for me. The SqlKata query is required to have at least one order specified. If you want to retrieve the resultcount you can grab the count from the query you provided the method above.

@benmccallum
Copy link
Contributor

benmccallum commented Feb 8, 2019

Thx for sharing. I've taken a slightly different approach for now as an example to our team and we're just deciding whether we want to roll that out around the place or try wait for EF Core to support something and avoid another dependency. I almost wonder how hard it would be (without any EF Core repo experience) for us to have a crack at this ourselves. My fear is that the combination of standard OrderBy, ThenBy and Take could makes things difficult unless we can detect the SkipWhile and branch off to generate the OFFSET clause.

For my POC to my team I build the filtering query in SqlKata, then hand off to a function (that I could probably make generic if I parameterised it a bit more) that fires off the necessary queries to get the page of records' PKs with the SqlKata query and then feeding them into the EF Core query. It's super roundabout but at least I can support the whole Connection (hasNextPage, hasPreviousPage, totalCount, edges) in a way that I could make generic with a few more parameters into the query.

@benmccallum
Copy link
Contributor

Having a dig into the EF Core code, I'm thinking we could take inspiration from how they do .UseRowNumberPaging() internally. See the implementation of the SELECT generation here.

@corstian, how would you suggest supporting reverse direction paging with your proposal. Would it just be a matter of reversing the ORDER BY expression and taking LIMIT? I guess so.

I'd be keen to hear whether @smitpatel or @ajcvickers think we can support your suggested syntax or whether we'd use a specific cursor-paging extension methods, like: GetNextPageByCursor(cursor, limit) and GetPreviousPageByCursor(cursor, limit), and then it's just a matter or extracting the ORDER BY in the query (if any), generating the OFFSET subquery and LIMIT and tacking it on.

I might give it a crack if we have some guidance from the core team first as to how they'd like it to look.

@corstian
Copy link
Author

According to the docs it should be possible to use FETCH PRIOR instead of FETCH NEXT. Haven't tried that yet, though.

I think @benmccallum makes a valid point when it comes to specific cursor paging extension methods. While technically it would be totally possible to support backwards pagination using the SkipWhile method there is no way this will make sense semantically. In a more LINQ way I'd be thinking about something like:

dbSet
    .OrderBy(q => q.Timestamp)
    .From(q => q.Id == Guid.NewGuid())
    .Next(100);
// or `.Previous(100);`

@benmccallum
Copy link
Contributor

Ah, yea should've realised there was a FETCH PRIOR, didn't look at the time.

I like your LINQ method names more than mine! I wonder if From could get confused with SQL FROM clause though. Perhaps FromMatch or FromWhere (that's a little weird), but you get what I mean.

@benmccallum
Copy link
Contributor

benmccallum commented Feb 13, 2019

Been digging into the EF Core source code and it's a little overwhelming haha. Came unstuck on the Getting And Building the Code guide running build /t:Compile.
image

If I can get past this then I'll be able to navigate the codebase a bit better. I want to try and find some tests that would be a good starting point to copy, paste, manipulate for what we want and then TDD this thing.

Update:
Strangely enough, running a straight up build without specifying the Compile target worked, so maybe the docs are out of date and like the error msg says, that target no longer exists...

Update 2:
Installed VS 2019 Preview and all the projects fail to open with the following. Calling it for tonight. C:\GitHub\EntityFrameworkCore\src\dotnet-ef\dotnet-ef.csproj : error : Project file is incomplete. Expected imports are missing.

I think the issue is that I can't run build /t:Compile and maybe that is why I don't have C:\Users\Developer/.dotnet/x64 which should be added as part of running that cmd.

@roji
Copy link
Member

roji commented Feb 14, 2019

@benmccallum the docs are indeed out of date with regards to /t:Compile, as you've found out a simple build should run fine.

The latest version of VS 2019 is a requirement to build the source. After build completes successfully, try starting VS by running startvs EFCore.Runtime.sln (that's a reduced solution which doesn't contain some projects which shouldn't be necessary for what you're trying to do). You can achieve the same by dragging and dropping the solution file over startvs.cmd in Windows Explorer.

If you're still having issues please post back.

@benmccallum
Copy link
Contributor

Thanks @roji , I haven't tried this yet as in the end I realised it might be better for me to try implement this in 2.2 and port forward to 3.0. Will see how I go!

@corstian
Copy link
Author

corstian commented Mar 6, 2019

I finally got to finish that post on cursor based pagination. And I found ways to paginate both forwards and backwards, with reasonable speed.

TL;DR: Put the SQL statement in a CTE, select ROW_COUNT within the CTE, and use this for your final selection.

benmccallum added a commit to benmccallum/EntityFrameworkCore that referenced this issue Mar 10, 2019
@benmccallum
Copy link
Contributor

benmccallum commented Mar 10, 2019

So I've been trying to introduce two new extension methods: TakeFromMatch and TakeBeforeMatch, both accepting matchPredicate and count. Naming would be subject to whatever everyone agrees is best of course. For the meantime, I've been trying to reverse-engineer how EF works, via unit tests and as I expected it's a beast.

PR draft linked above for where I've got to atm.

So far I've been able to:

  1. Add the extension methods in EntityFrameworkQueryableExtensions.
  2. With this in place the test failed and led me to find the MethodInfoBasedNodeTypeRegistryFactory which registers the MethodInfo setup above as a supported method in a MethodCallExpressionNodeBase implementation.
  3. Taking inspiration from the existing ones in EFCore\Query\ResultOperators\Internal, I've setup a TakeBesideMatchExpressionNode and TakeBesideMatchResultOperator, but this I'm a bit unsure at the moment.

Questions:

  1. Should my TakeBesideMatchExpressionNode be inheriting ResultOperatorExpressionNodeBase or one of the other ones here
  2. Should my TakeBesideMatchResultOperator be inheriting SequenceTypePreservingResultOperatorBase or one of the other ones here

I think next time I have a crack I need to take reference from how the Take/Skip or OrderBy implementations work, or maybe even the Where one. It seems like Re-linq Remotion is all about building a QueryModel, so I guess my goal is to figure out how to represent the OFFSET FETCH NEXT or previous in that query model and make that manipulation.

@corstian, great blog post btw!

@smitpatel
Copy link
Contributor

@benmccallum - With Relinq going away, all that processing would be unnecessary. We need to add additional methods to translate newly added methods. Currently the PR I have is much in flux. I will let you know once it starts stabilizing but it would be certainly easier to implement this than with Relinq.

@d00ML0rDz
Copy link

Just wondering if there is any update to this or any idea when we may see this implemented in EF Core?

@roji
Copy link
Member

roji commented Dec 9, 2021

I've recently done some research into pagination, and I don't think SkipWhile is the right building block for an efficient "cursor-oriented" pagination mechanism. In a nutshell, offset or rownumber-based approaches do not use indexes, making them inherently inefficient. In addition, in many cases these approaches also lack robustness in the face of concurrent updates, as rows are skipped or duplicated (although the approach detailed above doesn't suffer from this, since the ID is used as the cursor, rather than a raw offset).

For efficient cursor pagination, simple keyset pagination seems to be the best fit. In other words, assuming we're ording by an Id, pagination would simply use a WHERE clause to get rows after the last Id fetched from the previous page; compared to the above approach of an OFFSET subquery with ROW_NUMBER, this is simple and uses indexes. If more than one pagination key is being used, row value comparison can be used WHERE (Date, Id) > (@date, @id) (see #26822), or the expanded version of it (WHERE Date > @date OR Date = @date AND Id > @id).

Assuming we've excluded the use of SkipWhile for pagination purposes, there's still the question of non-pagination uses. There are some characteristics of relational databases which limit the usefulness of SkipWhile:

  1. Rows in relational database have no ordering. Unless ORDER BY is specified, ordering is non-deterministic.
  2. Doing SkipWhile over a non-deterministic set doesn't seem to have any value. Doing Take over unordered results can be useful to retrieve any matching X rows, but it's not clear why someone would want to do SkipWhile over a non-deterministic set.
  3. For SkipWhile to be deterministic, it must be done after ordering by all columns specified in it (e.g. .OrderBy(b => b.CreationDate).SkipWhile(b => b.CreationDate < @date)). The effect of this is very similar to Where, with the difference being that Where would eliminate all rows not matching the filter, whereas SkipWhile eliminates only the rows at the beginning.
  4. However, since the rows are already sorted, in the above example Where is guaranteed to return the same results as SkipWhile. It's possible to come up with scenarios where SkipWhile has different behavior from Where, but these usually seem contrived.

So I'm leaving this issue open in the backlog to track translating SkipWhile and TakeWhile for non-pagination purposes, given non-contrived real-world scenarios which would benefit from it.

@roji roji changed the title Implement SkipWhile() Translate SkipWhile and TakeWhile Dec 9, 2021
@roji roji added this to the Backlog milestone Dec 9, 2021
@slava-se
Copy link

@roji

So I'm leaving this issue open in the backlog to track translating SkipWhile and TakeWhile for non-pagination purposes, given non-contrived real-world scenarios which would benefit from it.

I'm having a scenario where I think TakeWhile() would be useful which is:

I'm trying to count unsuccessful user logins since last successful user login for some protection mechanism and don't see how I could achieve it without making 2 queries and applying some logic to it.

Possible way would be

await _dbContext.UserLogins
.Where(l => l.User.Email == "[email protected]")
.OrderByDescending(l => l.DateCreated)
.TakeWhile(l => l.IsSuccess)
.CountAsync();

Without TakeWhile I would need to retrieve the date (or Id) of last successful login and then count the unsuccessful attempts in the second query. Am I missing some way to make it in one query?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants