-
Notifications
You must be signed in to change notification settings - Fork 823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Resolve problems with eagerloading performance #10855
FIX Resolve problems with eagerloading performance #10855
Conversation
b4f0636
to
1570b3c
Compare
It is hard to work with the current structure - having each relation type in its own method makes it way easier to see where my concerns start and end with a given relation. This also reduces the chance of variable bleed-over, where the value in a variable in the first loop (in the same or other relation type) could bleed into the next iteration of the loop.
c29d7cf
to
24948c8
Compare
$this->eagerLoadRelations = array_merge($this->eagerLoadRelations, $arr); | ||
$this->eagerLoadRelations = array_unique(array_merge($this->eagerLoadRelations, $arr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this change, we'd end up processing the same set of relations twice.
For example, you added
SomeClass::get()->eagerLoad(
'MyManyMany.MyHasMany',
'MyManyMany.MySecondHasMany',
);
you would have fetched MyManyMany
for both relation chains.
for ($i = 0; $i < $count; $i++) { | ||
if ($i === 0) { | ||
$arr[] = $parts[$i]; | ||
} else { | ||
$arr[] = $arr[count($arr) - 1] . '.' . $parts[$i]; | ||
} | ||
// Add each relation in the chain as its own entry to be eagerloaded | ||
// e.g. for "Players.Teams.Coaches" you'll have three entries: | ||
// "Players", "Players.Teams", and "Players.Teams.Coaches | ||
$usedParts = []; | ||
foreach ($parts as $part) { | ||
$usedParts[] = $part; | ||
$arr[] = implode('.', $usedParts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't at all clear what was happening here - this change doesn't improve performance but it does improve code readability.
src/ORM/DataList.php
Outdated
$joinTableQuery = DB::query('SELECT * FROM "' . $join . '" WHERE "' . $parentField . '" IN (' . implode(',', $ids) . ')'); | ||
$relationItemIDs = []; | ||
$rows = []; | ||
while ($row = $joinTableQuery->record()) { | ||
$rows[] = [ | ||
$parentField => $row[$parentField], | ||
$childField => $row[$childField] | ||
]; | ||
$relationItemIDs[] = $row[$childField]; | ||
$joinTableQuery = DB::query('SELECT * FROM "' . $join . '" WHERE "' . $parentIDField . '" IN (' . implode(',', $parentIDs) . ')'); | ||
$relationItemIDs = $joinTableQuery->column($childIDField); | ||
$joinRows = $joinTableQuery; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the query result's own iterator rather than re-inventing the wheel.
src/ORM/DataList.php
Outdated
// Get ALL of the items for this relation up front, for ALL of the parents | ||
// Fetched as a map so we can get the ID for all records up front (instead of in another nested loop) | ||
// Fetched after that as an array because for some reason that performs better in the loop | ||
$relationArray = DataObject::get($relationDataClass)->byIDs($relationItemIDs)->map('ID', 'Me')->toArray(); | ||
|
||
// Build a map of which children belong to which parent | ||
$map = []; | ||
foreach ($joinRows as $row) { | ||
$parentID = $row[$parentIDField]; | ||
$childID = $row[$childIDField]; | ||
if (!isset($map[$parentID])) { | ||
$map[$parentID] = []; | ||
} | ||
$map[$parentID][] = $relationArray[$childID]; | ||
} | ||
$relationArray = DataObject::get($relationDataClass)->filter(['ID' => $relationItemIDs])->toArray(); | ||
foreach ($rows as $row) { | ||
$eagerLoadID = $row[$parentField]; | ||
if (!isset($this->eagerLoadedData[$eagerLoadRelation][$eagerLoadID][$relation])) { | ||
$arrayList = ArrayList::create(); | ||
$arrayList->setDataClass($manyManyLastComponent['childClass']); | ||
$this->eagerLoadedData[$eagerLoadRelation][$eagerLoadID][$relation] = $arrayList; | ||
|
||
// Store the children in an ArrayList against the correct parent | ||
foreach ($map as $parentID => $children) { | ||
// There shouldn't be a list for this relation already - if there is, something went wrong. | ||
if (isset($this->eagerLoadedData[$eagerLoadRelation][$parentID][$relationName])) { | ||
throw new LogicException("Relation list for $relationName on $parentDataClass cannot be fetched twice"); | ||
} | ||
$relationItem = array_values(array_filter($relationArray, function ($relationItem) use ($row, $childField) { | ||
return $relationItem->ID === $row[$childField]; | ||
}))[0]; | ||
$this->eagerLoadedData[$eagerLoadRelation][$eagerLoadID][$relation]->push($relationItem); | ||
|
||
$arrayList = ArrayList::create($children); | ||
$this->eagerLoadedData[$eagerLoadRelation][$parentID][$relationName] = $arrayList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole section of changes is the main performance booster and fixes the root problem in the issue.
It basically boils down to this:
- Previously we were looping through all of the join records, and for every join record we'd then loop through all of the relation records, and call
->ID
on it. - Now we're getting the IDs up front, and the nested loop is removed entirely.
src/ORM/DataList.php
Outdated
$joinThroughObjs = $join::get()->filter([$parentField => $ids]); | ||
$joinThroughObjs = DataObject::get($join)->filter([$parentIDField => $parentIDs]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to be more consistent with the other DataObject::get()
calls in the eager loading code
src/ORM/DataList.php
Outdated
if ($dataClass === $dataClasses[0]) { | ||
while ($row = $query->record()) { | ||
|
||
// It's a has_one directly on the records in THIS list | ||
if ($parentDataClass === $this->dataClass()) { | ||
foreach ($query as $itemData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the query's own iterator rather than reinventing the wheel.
Just noting in my own testing with this PR that eager loading actually improved performance on many-many than has-many, counter to the testing in the description on this PR where it hurt it. I guess it's a case of "your mileage may vary" and "you should validate whether or not eager loading works for your particular website" I've raised a new issue to look at optimising the performance of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally, seems to work great.
24948c8
to
7af0fe2
Compare
Multiple commits
There are two commits here. They are both important.
Commit 1: Refactors the existing code so that each relation type is fetched in its own method. There's more information about why I needed this in the commit message.
Commit 2: This is the commit that holds the actual change of logic. This commit also includes changes to some variable names, to make them more explicit or self explanatory.
It'll almost certainly be easier to review this PR by looking at the changes in each commit, so you can more easily see the actual change to logic. But for convenience's sake, I've added a github comment to every change in the second commit that isn't just a change of variable name or linting fix or added comment.
Performance before and after
The following results are for effectively the same setup as in the linked issue - except the
Team
class also has a separatehas_many
relation for thePlayer
class, which holds another (separate) 10 players per team. This means there are a total of 20,000 player records in the database.The following numbers are for the full request of loading a page, and come from the lekoala debugbar, so they're not 100% accurate - but they give a clear indication of the improvement this PR makes.
has_many
This PR doesn't affect the performance of eager loading has_many relations - but it's nice to see that eagerloading does improve the speed of loading has_many relations in general.
many_many
Without this PR, eager loading many_many relations is effectively impossible.
Even with this PR, we can see that eager loading many_many relations is slower than without it, so there's still room for further optimisations (it's worth noting the database query itself (
WHERE ("Player"."ID" IN (<IDs of teams>)
) is taking ~1 second of that time so the PHP side of it is about as good as it's likely to get).Issue