Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch Processing using toIterable() #8410

Closed
stlrnz opened this issue Jan 5, 2021 · 2 comments · Fixed by #8467
Closed

Batch Processing using toIterable() #8410

stlrnz opened this issue Jan 5, 2021 · 2 comments · Fixed by #8467
Labels
Milestone

Comments

@stlrnz
Copy link

stlrnz commented Jan 5, 2021

Hi all!

I've updated my application to Doctrine ORM 2.8 some days ago.

Before that, my Repository code to iterate through a large result set looked like the following and worked perfectly even on >500.000 rows without leaking any memory.

    public function iterateAll(): Generator
    {
        $iterator = $this->createQueryBuilder('e')->getQuery()->iterate(null, Query::HYDRATE_SIMPLEOBJECT);

        foreach ($iterator as $equipment) {
            // index 0 is always the object
            yield $equipment[0];
        }
    }

Since Query::iterate() is deprecated now, I tried to use Query::toIterable() as suggested.

    public function iterateAll(): Generator
    {
        $iterator = $this->createQueryBuilder('e')->getQuery()->toIterable([], Query::HYDRATE_SIMPLEOBJECT);

        foreach ($iterator as $equipment) {
            yield $equipment;
        }
    }

With this implementation I stumbled over a massive memory leak in my application. I debugged this and found out that the AbstractHydrator is never releasing the Objects in AbstractHydrator::toIterable(). The called method AbstractHydrator::hydrateRowData() (or in my case SimpleObjectHydrator::hydrateRowData) just adds new entries to $result.

$this->hydrateRowData($row, $result);

Is that the intended behaviour? Wouldn't it be better to clear $result in every loop cycle to free memory?

If that's intended, is there a better way to loop through large result sets?

@beberlei
Copy link
Member

beberlei commented Jan 6, 2021

This is a bug

@beberlei beberlei added this to the 2.8.2 milestone Jan 6, 2021
@beberlei beberlei added the Bug label Jan 6, 2021
@simPod
Copy link
Contributor

simPod commented Jan 6, 2021

@beberlei this looks like in order to fix #3238 we need to keep hydrated objects in memory so ObjectHydrator can look them up. Any idea? Seems like we can't have both to me 🤔

beberlei added a commit to beberlei/doctrine2 that referenced this issue Feb 8, 2021
The new AbstractQuery::toIterable() had a memory leak that
AbstractQuery::iterable() did not have. This leak is now fixed.

After fixing the leak, one test failed where the identity map in
ObjectHydrator triggered and lead to a notice. Introduced a new
AbstractHydrator::cleanupAfterRowIteration() that the ObjectHydrator
uses to cleanup the state.
beberlei added a commit to beberlei/doctrine2 that referenced this issue Feb 16, 2021
The new AbstractQuery::toIterable() had a memory leak that
AbstractQuery::iterable() did not have. This leak is now fixed.

After fixing the leak, one test failed where the identity map in
ObjectHydrator triggered and lead to a notice. Introduced a new
AbstractHydrator::cleanupAfterRowIteration() that the ObjectHydrator
uses to cleanup the state.
beberlei added a commit to beberlei/doctrine2 that referenced this issue Feb 16, 2021
beberlei added a commit that referenced this issue Feb 16, 2021
* [GH-8410] Fix memory leak in new toIterable and state bug.

The new AbstractQuery::toIterable() had a memory leak that
AbstractQuery::iterable() did not have. This leak is now fixed.

After fixing the leak, one test failed where the identity map in
ObjectHydrator triggered and lead to a notice. Introduced a new
AbstractHydrator::cleanupAfterRowIteration() that the ObjectHydrator
uses to cleanup the state.

* [GH-8413] Bugfix: Iterating with multiple, mixed results

When multiple entity results are part of a row, the result handling
must be different. In addition mixed results with scalars are broken
and now throw an exception as illegal operation.

* Housekeeping: phpcs

* [GH-8413] Add assertions for entity alias iteration.

* [GH-8387] Missing @deprecated on Query::iterate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants