Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix various eagerloading bugs #10862

Merged

Conversation

GuySartorelli
Copy link
Member

@GuySartorelli GuySartorelli commented Jul 6, 2023

Multiple commits

I'm doing this all in one PR because it's going to all be affecting the same two files, and I don't want to have to keep rebasing each time a PR is merged for this. It's all related to each other anyway.

Anything which needed a fix in the code has an appropriately prefixed commit. There's one commit that's just a bunch of new tests that didn't need any changes to the code.

Feel free to squash if you want to when merging, though - if you do so, use the PR name as the commit message.

Commit 1 - minor refactor

Refactors the unit tests related to eagerloading into their own class, and moves their associated DataObjects into a new eagerloading namespace.
This is just to make it easier to find, when someone needs to check or add a test for eagerloading it's immediately clear where that goes.

Commit 2 - many_many_extraFields and join records for many_many through

Add tests for many_many_extraFields and join records for many_many through.
Fix functionality for both. Previously, both of these new tests would fail.

I'm using the actual ManyManyList and ManyManyThroughList classes here because they already know how to correctly handle the extraFields and join records. This does end up being ~50ms slower than $relationArray = DataObject::get($relationDataClass)->where('ID in ('. implode(',', $relationItemIDs) . ')')->map('ID', 'Me')->toArray(); (which is the faster version of byIDs, see #10860) - but it has the following benefits:

  • Ensures we have all of the correct extraFields data, without including fields that are in the db table but not in the extrafields spec (e.g. old extrafields that were later removed)
  • Doesn't mark extrafields as "changed", since it's part of the data passed in to DataList::createDataObject() and therefore part of the record's hydrated data
  • Doesn't require refactoring ManyManyList/ManyManyThroughList, nor copy/pasting functionality already handled in those classes

Commit 3 - iterating over the same list multiple times

Fix bug where iterating over the same DataList more than once would duplicate the records in eager-loaded relation lists.
The test is in the following commit, since this bug was found while I was working on filtering and had to be fixed for the filtering test to work.

Also fixes some related issues, such as toArray() not using the normal iterator logic (which meant it would separately exhibit this bug), and no longer carrying over eager-loaded data to a clone of the list.

Commit 4 - Add tests for filter, sort, limit, and other things

Tests that the underlying DataList can be filtered, sorted, limited, iterated over multiple times. also checks that eagerLoad() can be called at any stage in a query chain and a couple of other things (see the commit message).

There's also a slight refactor of the test class here - instead of using a static set of expected results, we're relying on the lazy-loaded data which we know will be correct if all other framework ORM tests pass. Other small changes were also made as necessary to accomodate the new tests

Commit 5 - DataObject protections against bad relations

DataObject had some protections when fetching has_many or many_many relations using the components methods - we were bypassing those protections but now we're not. Also make sure we clear eager-loaded relation data when flushing a record, which is very useful for memory management.

Issue

@GuySartorelli GuySartorelli marked this pull request as draft July 6, 2023 05:31
Comment on lines +1057 to +1064
$manyManyComponent['extraFields'] = $schema->manyManyExtraFieldsForComponent($parentDataClass, $relationName) ?: [];
if (is_a($manyManyComponent['relationClass'], ManyManyThroughList::class, true)) {
$manyManyComponent['joinClass'] = $manyManyComponent['join'];
$manyManyComponent['join'] = $schema->baseDataTable($manyManyComponent['joinClass']);
} else {
$manyManyComponent['joinClass'] = null;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make join always be the join table, and add the join class separately.
Also fetch the extra fields schema up front.

Comment on lines -1242 to +1288
// $join will either be:
// - the join table name for many-many
// - the join data class for many-many-through
$join = $manyManyLastComponent['join'];
$joinTable = $manyManyLastComponent['join'];
$extraFields = $manyManyLastComponent['extraFields'];
$joinClass = $manyManyLastComponent['joinClass'];

// Get the join records so we can correctly identify which children belong to which parents
$joinRows = DB::query('SELECT * FROM "' . $joinTable . '" WHERE "' . $parentIDField . '" IN (' . implode(',', $parentIDs) . ')');

// many_many_through
if (is_a($manyManyLastComponent['relationClass'], ManyManyThroughList::class, true)) {
$joinThroughObjs = DataObject::get($join)->filter([$parentIDField => $parentIDs]);
$relationItemIDs = [];
$joinRows = [];
foreach ($joinThroughObjs as $joinThroughObj) {
$joinRows[] = [
$parentIDField => $joinThroughObj->$parentIDField,
$childIDField => $joinThroughObj->$childIDField
];
$relationItemIDs[] = $joinThroughObj->$childIDField;
}
if ($joinClass !== null) {
$relationList = ManyManyThroughList::create(
$relationDataClass,
$joinClass,
$childIDField,
$parentIDField,
$extraFields,
$relationDataClass,
$parentDataClass
)->forForeignID($parentIDs);
// many_many + belongs_many_many
} else {
$joinTableQuery = DB::query('SELECT * FROM "' . $join . '" WHERE "' . $parentIDField . '" IN (' . implode(',', $parentIDs) . ')');
$relationItemIDs = $joinTableQuery->column($childIDField);
$joinRows = $joinTableQuery;
$relationList = ManyManyList::create(
$relationDataClass,
$joinTable,
$childIDField,
$parentIDField,
$extraFields
)->forForeignID($parentIDs);
}

// Get ALL of the items for this relation up front, for ALL of the parents
// Use a real RelationList here so that the extraFields and join record are correctly set for all relations
// Fetched as a map so we can get the ID for all records up front (instead of in another nested loop)
// Fetched after that as an array because for some reason that performs better in the loop
// Note that "Me" is a method on ViewableData that returns $this - i.e. that is the actual DataObject record
$relationArray = DataObject::get($relationDataClass)->byIDs($relationItemIDs)->map('ID', 'Me')->toArray();
$relationArray = $relationList->map('ID', 'Me')->toArray();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of this is just using the ManyManyList and ManyManyThroughList directly to get the records, instead of crafting our own queries. This means we get the extraFields data and join records set up correctly "for free" (read: without having to add any extra code, but with a ~50ms performance hit)

@GuySartorelli
Copy link
Member Author

CI failures are unrelated and will be fixed in #10866

@GuySartorelli GuySartorelli marked this pull request as ready for review July 11, 2023 05:10
Copy link
Member

@emteknetnz emteknetnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code seems fine, got a merge conflict

Includes making sure toArray() uses the correct iteration logic,
and cloning resets the eagerloaded data for the new clone.

Previously, a second iteration would add every relation item to the
relation list a second time - so with each iteration your relation list
count doubled (though it was the same records time and again).
Add tests for:
- filtered, limited, and sorted DataLists with eagerloaded relations
- iterating through the list multiple times
- eagerLoad method can be called anywhere in a query chain
- eager loaded relation lists can be correctly empty without throwing
  exceptions
- repeating relations to be eagerloaded in various permutations doesn't
  cause issues
- eagerloading doesn't negatively impact chunkedFetch functionality

squash
Don't fetch eager-loaded has_many or many_many unless we've validated
that the relation IS actually a has_many or many_many relation.
Also clear eager-loaded relations when flushing the record.
@GuySartorelli GuySartorelli force-pushed the pulls/5/eagerloading-new-tests branch from 75e50a0 to ed4c34b Compare July 17, 2023 00:03
@GuySartorelli
Copy link
Member Author

Conflict resolved - it was just the addition of use SilverStripe\ORM\Connect\DatabaseException; in DataListTest.

Copy link
Member

@emteknetnz emteknetnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@emteknetnz emteknetnz merged commit 10fba6e into silverstripe:5 Jul 17, 2023
@emteknetnz emteknetnz deleted the pulls/5/eagerloading-new-tests branch July 17, 2023 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants