-
Notifications
You must be signed in to change notification settings - Fork 11.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[5.3] Fix to prevent recursive where clause addition for chunkById #16498
Conversation
4e56efc
to
471a2f0
Compare
Can you show me some code I can use to recreate the issue? |
I can reproduce this issue, good catch. |
Alternatively, maybe the |
Can someone just share the code I need to reproduce this? |
Just use this on any table you have available: \DB::enableQueryLog();
$count = 0;
\DB::table('users')->chunkById(1, function () use (&$count) {
if (++$count > 2) return false;
});
var_dump( \DB::getQueryLog() ); |
Thanks vlakoff, yes that's the issue. Especially bad with low chunk sizes. Sure, there are many ways to solve this so up to you guys how you prefer to tackle it. I figured I'd just offer one of the simpler solutions. I hesitate on running the where clauses through a pre-filter though--eliminating basic redundancies wouldn't be too bad, but I could see that quickly snow balling into some crazy finite state machine to optimize wheres. The SQL engine query optimizer already does that sort of stuff anyway, and it seemed a little overkill for this situation. Also the chunkById issue is a bug, but I can see other cases where the ability to tag where clauses would be a useful feature and not a matter of optimizing away redundancy. For example, it might be nice to cycle through a set of enumerated values in a where clause without creating an entirely new query each loop. Another option would also be to change the existing |
I don't understand why we're having to fix these problems now. Did chunk always have this problem. Was the problem just introduced? If so, how and why and can we revert that change instead of introducing some new key into the wheres clause? Who knows what other consequences that will have. |
I think some people just gained interest in this previously rarely used method. My PRs were about usability (#16180, #16283) and adding tests (#16263, #16359), the main target being laravel/ideas#103. But here we have a real bug. It's probably present since the first implementation of |
Just in case, ping @mzur, as you have authored #14499 and #14711 maybe you have some insight. Also ping @nunomazer as you have authored #13604 (didn't you encounter the same issue with the "where" clause?) |
So this big under discussion in this PR was not introduced with any recent PR, but has existed since chunk was introduced? |
It seems so. Not |
Yes, this where clause exists since v5.2.27 when |
* | ||
* @param string $column | ||
* @param int $value | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing return annotation
@@ -548,6 +548,24 @@ public function where($column, $operator = null, $value = null, $boolean = 'and' | |||
} | |||
|
|||
/** | |||
* Adds an id limit to assist with Eloquent chunkById. | |||
* Subsequent calls override existing limit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing newline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this one means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Graham meant there should be an empty line between the short description and additional comments, like I think too it's the case here.
@@ -1498,9 +1516,9 @@ public function forPageAfterId($perPage = 15, $lastId = 0, $column = 'id') | |||
return $order['column'] === $column; | |||
})->values()->all(); | |||
|
|||
return $this->where($column, '>', $lastId) | |||
->orderBy($column, 'asc') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please restore this whitespace.
This
The result set for each chunk will be the same after the fix when only the last clause in the above iterations is applied.
Its severity is mild as the fix only affects query performance once 100+ chunks (with 100+ |
@derekmd no argument with the mild statement. This will indeed not affect the majority of users and it is idempotent (aside from when you exceed buffer limits and start to fatal). However I wouldn't want laravel to become known for "doesn't scale". These types of issues will contribute towards that reputation. |
Don't forget about a different fix I have suggested above. I bet it would be a better solution:
|
@vlakoff it seems to me that a filter at that lower level should be general purpose. And I would be worried about trying to write a general purpose optimizer to accurately recognize and eliminate redundant where clauses. Taking that to the extreme you'd essentially be writing a SQL query optimizer. It will also certainly be worse performance-wise. Possible I misunderstand what you mean though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all be good now
I disagree when you say it's about optimizing the query, no, what we have here is a bug. About my fix, I think it integrates better in the codebase. Your fix is introducing named where clauses. Though it would be great to get some feedback about the possible fixes. |
@vlakoff if you have a better implementation of a fix then submit it. |
@vlakoff but how would you implement a general-purpose order-by filter at the lower level? Are you talking about inspecting each order by clause to see if it is subsumed by another one? That feels an awful lot like an NP-complete FSM waiting to happen if you do it in a way that isn't just a very naive approach targeting this specific case. Maybe I misunderstand, as @taylorotwell said maybe send a PR? |
@soundsgoodsofar possible to write a test for this? |
Fixed: 53f97a0 Would appreciate if y'all can confirm. |
I will make same fix to Eloquent query builder. |
I noticed this has been fixed, will this be released into |
Noticed during a recent meetup presentation that when running chunkById, the id where clause gets appended over and over again. (Ie you get queries like `SELECT * FROM my_table WHERE id > 50 AND id > 100 AND id > 150 AND ...).
For smaller datasets this isn't a big deal, but as the whole point of chunkById is dealing with increasingly larger datasets, this seems problematic. Eventually you'll run into query length limits somewhere. Even if you don't, transmitting so much useless data hurts my sense of performance.
This change is intended to "tag" the chunking where clause so it gets replaced each iteration instead of appended. I couldn't find anywhere that relies on the Builder where member variable to be numerically indexed, so I don't believe this will be an issue. Still, I'm not intimately familiar with this framework.
Also, this prevents the usage of more exotic id columns with chunkById. Frankly I think that's desirable--I would restrict it further to only allowing use of indexed columns if I could. If there's disagreement with this, another option would be to make the
where
function a wrapper around an internal function that also accepts a tag parameter.