[8.x] Add Builder@lazy() and Builder@lazyById() methods #36699

Merged
merged 1 commit into from
Mar 23, 2021

JosephSilber
Contributor

@JosephSilber JosephSilber commented Mar 22, 2021

Background

For querying large datasets, the Builder currently has the cursor() method, which returns a LazyCollection. This uses less memory than a regular Collection (returned from the get() method), since it never keeps more than a single Eloquent model in memory.

However, the cursor() method still has several drawbacks:

  1. It's not truly lazy, since PHP still caches all query results in its buffer.

    Turning off buffering introduces its own set of challenges (namely not being able to execute other queries simultaneously).

  2. It cannot eager load relationships, since it only ever deals with a single record at a time.

  3. Depending on the DB, opening a cursor to a huge dataset may have a slight delay vs. running a query with a LIMIT.

We also have the chunk() method, which is kinda lazy, but with a clunky API — as is evident by the fact that we needed to introduce separate each() and chunkMap() methods (with chunkMap() having to build up the whole result set in memory 🙈).

Introducing lazy()

The new lazy() method introduced in this PR will chunk results behind the scenes, and return a single LazyCollection of results:

$lazyCollection = User::lazy();
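
To illustrate what "chunking behind the scenes" can look like, here is a minimal, framework-free sketch using a plain PHP generator. It is not the code from this PR: `lazyRows()` and the `$fetchPage` closure are hypothetical names, and the in-memory `$table` stands in for a real query that uses LIMIT and OFFSET.

```php
<?php

declare(strict_types=1);

/**
 * Yields rows one at a time while fetching them page by page.
 * $fetchPage is a hypothetical callable: fn (int $page, int $size): array.
 */
function lazyRows(callable $fetchPage, int $chunkSize): Generator
{
    $page = 1;

    while (true) {
        $rows = $fetchPage($page++, $chunkSize);

        foreach ($rows as $row) {
            yield $row;
        }

        // A short (or empty) page means there is nothing left to fetch.
        if (count($rows) < $chunkSize) {
            return;
        }
    }
}

// Assumption: an in-memory array plays the role of the database table.
$table = range(1, 10);
$queries = 0;

// Stand-in for "SELECT ... LIMIT $size OFFSET ($page - 1) * $size".
$fetchPage = function (int $page, int $size) use ($table, &$queries): array {
    $queries++;

    return array_slice($table, ($page - 1) * $size, $size);
};

$seen = [];
foreach (lazyRows($fetchPage, 4) as $row) {
    $seen[] = $row;
}
```

The caller iterates over a single stream of rows, while only one page of at most `$chunkSize` rows is held by the generator at any time.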

Since it's a lazy collection, you have the full power of collections at your fingertips:

You can call each() directly on it:

User::lazy()->each->greet();

You can call map() on it:

$results = User::lazy()->map->calculateOutstandingBalance();

Or even chunk() it... The possibilities are endless, and we'll no longer have to create all of these separate one-off methods to query and manipulate results lazily.
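
The reason these operations compose cheaply is that each step only pulls an item when the next step asks for one. A rough sketch of that idea with plain generators follows; `lazyMap()` and `lazyTake()` are hypothetical helpers written for this illustration, not methods of Laravel's LazyCollection.

```php
<?php

declare(strict_types=1);

/** Lazily applies $fn to each item as it is requested. */
function lazyMap(iterable $items, callable $fn): Generator
{
    foreach ($items as $item) {
        yield $fn($item);
    }
}

/** Lazily yields at most $limit items, then stops pulling from the source. */
function lazyTake(iterable $items, int $limit): Generator
{
    if ($limit <= 0) {
        return;
    }

    foreach ($items as $item) {
        yield $item;

        if (--$limit === 0) {
            return;
        }
    }
}

// Assumption: this generator stands in for a lazily fetched result set.
$pulled = 0;
$source = (function () use (&$pulled): Generator {
    for ($n = 1; $n <= 1000; $n++) {
        $pulled++; // Counts how many rows were actually produced.
        yield $n;
    }
})();

$pipeline = lazyTake(lazyMap($source, fn (int $n): int => $n * 10), 3);

// Building the pipeline does no work; rows are produced only on iteration.
$pulledBefore = $pulled;

$firstThree = iterator_to_array($pipeline, false);
```

Although the source could produce 1000 rows, only the three rows that the pipeline actually consumes are ever generated.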

@JosephSilber JosephSilber changed the title ]8.x] Add Builder@lazy() and Builder@lazyById() methods [8.x] Add Builder@lazy() and Builder@lazyById() methods Mar 22, 2021
@ejunker
Contributor

ejunker commented Mar 22, 2021

This looks awesome! Such a powerful feature. I currently have some Builder macros similar to this and they have been critical when working with large data processing tasks. I think it would be great to have them in core so everyone can use them.

@taylorotwell taylorotwell merged commit bb6e6f2 into laravel:8.x Mar 23, 2021
@taylorotwell
Member

Thanks! Would you mind sending some documentation on this to laravel/docs?

@JosephSilber
Contributor Author

Sure! I'll try to whip something up later today or tomorrow.

@JosephSilber JosephSilber deleted the builder-lazy branch March 23, 2021 15:08
@tpetry
Contributor

tpetry commented Mar 23, 2021

Is the cursor() method buffering all results in memory for every database driver? Until now I expected the method to use real database cursors. I am asking because I got the impression that the PostgreSQL driver does not buffer everything in memory: I should already have hit out-of-memory errors in my Laravel Postgres applications with large datasets, but that has not happened. And I really can't find any information on whether this is a limitation of MySQL only or of every PDO database.

@stephenjude

@JosephSilber much love from here. Thanks for this PR

@JosephSilber
Contributor Author

@tpetry The PHP docs only explicitly mention MySQL, so maybe this doesn't apply to other drivers.

This would require testing the memory consumption on different databases, in order to know with any level of certainty.

@decadence
Contributor

decadence commented Sep 15, 2023

@JosephSilber can you look at this bug with lazyById please?

Looks like the fix will be this check:

$lastId = $results->last()->{$alias};

if ($lastId === null) {
    throw new RuntimeException("The lazyById operation was aborted because the [{$alias}] column is not present in the query result.");
}
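
To show why such a guard matters, here is a small, framework-free sketch of ID-based (keyset) chunking. It is an illustration under stated assumptions, not the code from the linked fix: this `lazyById()` function, the `$makeFetcher` helper, and the in-memory `$table` are all hypothetical. Without the check, a result set that lacks the ID column would leave `$lastId` as null, so the next page could not advance past the rows already seen.

```php
<?php

declare(strict_types=1);

/**
 * Yields rows lazily, paging by "id greater than the last seen id".
 * $fetchAfter is a hypothetical callable: fn (?int $lastId, int $size): array.
 */
function lazyById(callable $fetchAfter, int $chunkSize, string $alias = 'id'): Generator
{
    $lastId = null;

    while (true) {
        $rows = $fetchAfter($lastId, $chunkSize);

        foreach ($rows as $row) {
            yield $row;
        }

        // A short (or empty) page means there is nothing left to fetch.
        if (count($rows) < $chunkSize) {
            return;
        }

        $lastId = end($rows)[$alias] ?? null;

        // Guard: without the id column the next page cannot be computed.
        if ($lastId === null) {
            throw new RuntimeException(
                "The lazyById operation was aborted because the [{$alias}] column is not present in the query result."
            );
        }
    }
}

// Assumption: an in-memory table, already ordered by id.
$table = [
    ['id' => 1, 'name' => 'a'],
    ['id' => 2, 'name' => 'b'],
    ['id' => 3, 'name' => 'c'],
    ['id' => 4, 'name' => 'd'],
    ['id' => 5, 'name' => 'e'],
];

// Builds a stand-in for "SELECT $columns ... WHERE id > $lastId LIMIT $size".
$makeFetcher = function (array $columns) use ($table): callable {
    return function (?int $lastId, int $size) use ($table, $columns): array {
        $matching = array_filter(
            $table,
            fn (array $row): bool => $lastId === null || $row['id'] > $lastId
        );
        $page = array_slice(array_values($matching), 0, $size);

        // Keep only the selected columns, as a SELECT list would.
        return array_map(
            fn (array $row): array => array_intersect_key($row, array_flip($columns)),
            $page
        );
    };
};

// Selecting the id column: iteration completes normally.
$names = [];
foreach (lazyById($makeFetcher(['id', 'name']), 2) as $row) {
    $names[] = $row['name'];
}

// Omitting the id column: the guard aborts instead of paging incorrectly.
$error = null;
try {
    foreach (lazyById($makeFetcher(['name']), 2) as $row) {
        // Rows from the first page are still yielded before the guard fires.
    }
} catch (RuntimeException $e) {
    $error = $e->getMessage();
}
```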

@decadence
Contributor

Fixed here #48436


6 participants