Skip to content
This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Builder::chunk may consider previously set offset and limit #103

Closed
vlakoff opened this issue May 28, 2016 · 19 comments
Closed

Builder::chunk may consider previously set offset and limit #103

vlakoff opened this issue May 28, 2016 · 19 comments

Comments

@vlakoff
Copy link

vlakoff commented May 28, 2016

Currently, if one does this:

DB::table('records')->offset(1000)->limit(500)->chunk(50, $callback);

He might expect to retrieve only the records from 1000 to 1500, but that's not the case: the whole table is retrieved. This is particularly surprising because other clauses such as where would be taken into consideration.

I have come with a first draft, not yet tested, and it still feels clumsy…

public function chunk($count, callable $callback)
{
    $thisOffset = $this->getOffset(); // method to be implemented
    $thisLimit = $this->getLimit(); // same

    $page = 1;
    $rowsfetched = 0;

    do {
        $skip = $thisOffset + ($page - 1) * $count;

        if ($thisLimit) {
            $remaining = $thisLimit - $rowsfetched;
            if ($remaining == 0) {
                return true;
            }
            $take = min($count, $remaining);
        } else {
            $take = $count;
        }

        $results = $this->skip($skip)->take($take)->get();
        $countResults = $results->count();

        if ($countResults == 0) {
            break;
        }

        // On each chunk result set, we will pass them to the callback and then let the
        // developer take care of everything within the callback, which allows us to
        // keep the memory low for spinning through large result sets for working.
        if (call_user_func($callback, $results) === false) {
            return false;
        }

        $page++;
        $rowsfetched += $countResults;
    }
    while ($countResults == $take);

    return true;
}

So,

  • Do you agree on the idea?
  • Suggestions for improving the code?
@JosephSilber
Copy link
Member

JosephSilber commented May 29, 2016 via email

@vlakoff
Copy link
Author

vlakoff commented May 29, 2016

Yep, I've found it back: laravel/framework#9681. It's not that recent… Funny how my code came close to arrilot's one.

@arrilot
Copy link

arrilot commented May 29, 2016

@vlakoff there are not many other ways :) I (and probably many others) still disagree with @taylorotwell in this case but he is the boss in the end

@taylorotwell
Copy link
Member

I will look into it again.

On May 29, 2016, at 7:23 AM, Nekrasov Ilya [email protected] wrote:

@vlakoff https://github.com/vlakoff there are not many other way :) I (and probably many others) still disagree with @taylorotwell https://github.com/taylorotwell in this case but he is the boss in the end


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub #103 (comment), or mute the thread https://github.com/notifications/unsubscribe/AAcRfrVy5XEzMq59Hm4fHif3br0S4ta6ks5qGYUugaJpZM4IpJnt.

@vlakoff
Copy link
Author

vlakoff commented May 31, 2016

Thanks :-)

Just a few reminders:

  • Better check "until count($results) < $chunk/take", rather than "while count($result) > 0". This way, if the last chunk is partial, you know you have reached the end and don't do another query which returns no results. This could be applied to current code too.
  • A case I had to keep in mind when working on the logic: if there is a $limit set and the last chunk is entire (not partial). Make sure you don't continue processing in this case.

@drmmr763
Copy link

drmmr763 commented Aug 9, 2016

Just want to offer a friendly +1 for changing this. I literally just wrote the same lines expecting the chunk method to iterate over my given limit and offset values. It works perfectly with "whereBetween".

thanks

@vlakoff
Copy link
Author

vlakoff commented Nov 1, 2016

I've worked on a revised version which seems to be quite good. Already posting it here, if you want to have a glance at it:

public function chunk($count, callable $callback)
{
    $offset = $this->getOffset() ?: 0; // method to be implemented
    $remaining = $this->getLimit(); // same

    do {
        if (! is_null($remaining)) {
            $limit = min($count, $remaining);
        } else {
            $limit = $count;
        }

        $results = $this->skip($offset)->take($limit)->get();

        $countResults = $results->count();

        if ($countResults == 0) {
            break;
        }

        // On each chunk result set, we will pass them to the callback and then let the
        // developer take care of everything within the callback, which allows us to
        // keep the memory low for spinning through large result sets for working.
        if (call_user_func($callback, $results) === false) {
            return false;
        }

        $offset += $countResults;

        if (! is_null($remaining)) {
            $remaining -= $countResults;
            if ($remaining == 0) {
                break;
            }
        }
    } while ($countResults == $limit);

    return true;
}

@vlakoff
Copy link
Author

vlakoff commented Nov 17, 2016

Calling chunk() "pollutes" the offset and limit properties of the query builder instance. This may lead to side-effects if reusing the instance.

Would it be worth it to backup/restore these properties, or is it acceptable to consider the instance is "dirty" once the queries have been run?

edit: just submitted laravel/framework#16479 about this.

@ISaidHey
Copy link

Just chiming in with the need for this. Found this thread after writing limit()->chunk() logic and finding it not working.

@Kyle-J
Copy link

Kyle-J commented Feb 1, 2017

+1

Same issue here. Just came across a scenario expecting limit to work and am now trying to implement my own tick limit. It's isn't easy to break out because it's a closure passed to an internal loop, so break an continue don't work. If I use exit I can't output anything later so the only option is return and let it run it's course.

Anyone got a better way to break out of it?

if($tick_count >= $tick_limit) return;

@alberto-bottarini
Copy link

alberto-bottarini commented Feb 7, 2017

I've implemented a different approach:

        Builder::macro("chunkWithLimit", function ($count, $from, $to, callable $callback) {
            $page = 1;
            do {
                $offset = ($page - 1) * $count + $from;
                $results      = $this->offset($offset)->limit($count)->get();
                $countResults = count($results);
                if ($countResults == 0) {
                    break;
                }
                if ($callback($results) === false) {
                    return false;
                }
                $page++;
            } while ($countResults == $count && $offset < $to - $count);
            return true;
        });

Maybe this will be useful

@vlakoff
Copy link
Author

vlakoff commented Feb 7, 2017

This approach is interesting, as using extra parameters instead of the offset/limit properties allows to simplify a bit the implementation.

(note your implementation has a flaw, it doesn't support partial pages. For example chunkWithLimit(100, 0, 50) would process 100 items instead of 50, chunkWithLimit(100, 0, 150) would process 200 items instead of 150)


So, a macro could be a solution for now, but the ideal solution would still be to adapt the chunk methods.

My snippet above should work fine, but the boring part is updating the tests (mock skip/take instead of forPage) and adding new tests. If someone wants to take care of this, be my guest. I would be very grateful for this.

@DiederikvandenB
Copy link

Regardless of whether the function should be adapted or not, I do believe the docs need to be clearer about this. Similar to many others, the expected behaviour would be for the function not to ignore any offset/limits. Just one extra sentence in the docs could've saved me some debugging.

@drbyte
Copy link

drbyte commented Jun 16, 2017

@DiederikvandenB I'm sure a PR to the docs with that one extra sentence will be happily considered if it clarifies something that's lacking sufficient detail.

https://github.com/laravel/docs

@decadence
Copy link

And it will not necessary be merged but you should give it a try

@hhsadiq
Copy link

hhsadiq commented Jan 2, 2018

Facing same issue

@taylorotwell
Copy link
Member

If someone wants to PR a clean solution for this I would be open to considering it.

@andrewmclagan
Copy link

Suffering this issue after writing an Elasticsearch queue indexer. Very very hard to debug.

@vlakoff
Copy link
Author

vlakoff commented Mar 19, 2018

The new code I posted above (wow, 2016) does work, but as I said in a later comment (specifically, "mock skip/take instead of forPage"), the tedious part is the tests. I had given it a try, but it was getting incredibly verbose... and boring. Anyone willing to support?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests