Some joins never complete with dynamic filtering enabled #9917
Failing query's plan (dynamic filtering on):
|
Same query with dynamic filtering turned off:
|
And this is the query where the predicate matches some rows:
|
I might have time to look into this, for now just parking the information here for later. |
@raunaqmorarka FYI (seen your name in many of the dynamic filtering PRs.) |
Many thanks for reporting this issue! IIUC, the problematic query:
SELECT COUNT(*)
FROM lineitem l1, lineitem l2
WHERE l1.orderkey = l2.orderkey AND l1.partkey = l2.partkey AND l1.orderkey < 1;
results in a broadcast join:
so the engine should use trino/core/trino-main/src/main/java/io/trino/sql/planner/LocalDynamicFilterConsumer.java Line 91 in 9b40ead
Following #3414 and #4685, we also allow the page source to block until the relevant dynamic filters are ready. It is possible that we have a bug somewhere in the above implementation, causing the query to get stuck :(
|
Thanks for the response @rzeyde-varada. Here are the stages while this is running.
|
Could you please try to see if disabling |
Would it be possible to share the It should be possible to retrieve it via the Web UI (using the "JSON" tab, in the upper-right corner): |
enable_coordinator_dynamic_filters_distribution does not make a difference. |
From the JSON above, it seems that DF collection is finished:
"dynamicFiltersStats": {
"dynamicFilterDomainStats": [
{
"dynamicFilterId": "df_408",
"simplifiedDomain": "NONE",
"collectionDuration": "13.14s"
},
{
"dynamicFilterId": "df_409",
"simplifiedDomain": "NONE",
"collectionDuration": "13.14s"
}
],
"lazyDynamicFilters": 2,
"replicatedDynamicFilters": 2,
"totalDynamicFilters": 2,
"dynamicFiltersCompleted": 2
},
However, it's a bit strange that the join-containing stage seems to be stuck in the scheduling state:
"subStages": [
{
"stageId": "20211114_161209_00011_bwhrg.1",
"state": "SCHEDULING",
"plan": {
"id": "1",
"root": {
"@type": "aggregation",
"id": "479",
"source": {
"@type": "join",
"id": "339",
"type": "INNER",
"left": {
"@type": "project",
"id": "522",
"source": {
"@type": "filter",
"id": "410",
"source": {
"@type": "tablescan",
"id": "0",
From the stack above, it seems that the workers are "waiting" for splits from the coordinator:
So IIUC, the coordinator is no longer producing splits / notifying the workers that there are no more splits... BTW, are there any errors/warnings in the |
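The two observations above (all dynamic filters collected, yet a stage still in SCHEDULING) can be checked mechanically against a saved copy of the query JSON. The following is only a rough sketch: the field names `dynamicFiltersStats`, `totalDynamicFilters`, `dynamicFiltersCompleted`, `stageId` and `state` are taken from the fragments quoted in this thread, while the helper names and the nesting of the trimmed sample are assumptions made for illustration, so the search walks the whole document rather than relying on a fixed path.

```python
import json


def find_key(node, key):
    """Yield every value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def find_stages(node):
    """Yield (stageId, state) for every object that carries both fields."""
    if isinstance(node, dict):
        if "stageId" in node and "state" in node:
            yield node["stageId"], node["state"]
        for v in node.values():
            yield from find_stages(v)
    elif isinstance(node, list):
        for item in node:
            yield from find_stages(item)


def summarize(query_info):
    """Report whether DF collection finished and which stages are still scheduling."""
    stats = next(find_key(query_info, "dynamicFiltersStats"), None)
    df_done = (
        stats is not None
        and stats["dynamicFiltersCompleted"] == stats["totalDynamicFilters"]
    )
    stuck = [sid for sid, state in find_stages(query_info) if state == "SCHEDULING"]
    return {"dynamic_filters_complete": df_done, "scheduling_stages": stuck}


# Trimmed sample built from the values quoted above; the outer nesting is assumed.
sample = json.loads("""
{
  "dynamicFiltersStats": {
    "lazyDynamicFilters": 2,
    "replicatedDynamicFilters": 2,
    "totalDynamicFilters": 2,
    "dynamicFiltersCompleted": 2
  },
  "subStages": [
    {"stageId": "20211114_161209_00011_bwhrg.1", "state": "SCHEDULING"}
  ]
}
""")

print(summarize(sample))
```

For the real file, replace `sample` with `json.load(open(path))` on the JSON downloaded from the Web UI. A result with completed dynamic filters and a non-empty list of scheduling stages matches the symptom described here.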
Also, does this issue reproduce on previous Trino releases? |
This should be fixed by #9952. This bug exists only in the unreleased version.
I have noticed that a join query sometimes does not finish when the pushed predicate on one side matches no rows, but only when dynamic filtering is enabled.
This is hard to reproduce as it seems to be related to data size, among other things.
The best I have achieved is loading the sf200.lineitem (1.2bn rows) table into Hive (via Trino) in a single node setup and then issuing this contrived query:
SELECT COUNT(*) FROM lineitem l1, lineitem l2 WHERE l1.orderkey = l2.orderkey AND l1.partkey = l2.partkey AND l1.orderkey < 1;
(Note that l1.orderkey < 1 matches no rows.)
The query will start and then just stop, with Trino consuming no noticeable CPU.
But the query does finish when dynamic filtering is turned off through the session.
Also, curiously this query does finish:
SELECT COUNT(*) FROM lineitem l1, lineitem l2 WHERE l1.orderkey = l2.orderkey AND l1.partkey = l2.partkey AND l1.orderkey < 2;
(and returns 0 rows)
Update: This second query runs in about half the time with dynamic filtering off (7s, versus 14s with dynamic filtering on).
So it has something to do with the predicate on one side of the join not returning any rows.
And this also did not happen with smaller data sets :(
(I'll follow up with the query plans.)
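To make the special case concrete, here is a toy model of dynamic filtering on the two queries above. It is a sketch for illustration only, not Trino's implementation: the function names are made up, the choice of which side is the build side is an assumption, and the three-row table stands in for lineitem. It shows that when the filtered side produces no rows, the collected filter is empty (reported as the `NONE` domain), so the other scan has nothing to read at all.

```python
def collect_dynamic_filter(build_rows, key):
    """Collect the set of join keys seen on the build side."""
    return {key(row) for row in build_rows}


def probe_scan(probe_rows, key, dynamic_filter):
    """Scan the probe side, keeping only rows whose key is in the filter."""
    if not dynamic_filter:
        # Empty filter: no probe row can match, so the scan can finish immediately.
        return []
    return [row for row in probe_rows if key(row) in dynamic_filter]


def join_count(rows, build_predicate):
    """Toy COUNT(*) over a self-join on (orderkey, partkey)."""
    key = lambda row: (row["orderkey"], row["partkey"])
    build = [row for row in rows if build_predicate(row)]
    dynamic_filter = collect_dynamic_filter(build, key)
    probe = probe_scan(rows, key, dynamic_filter)
    # Count build rows per key, then sum the matches for each probe row.
    build_counts = {}
    for row in build:
        build_counts[key(row)] = build_counts.get(key(row), 0) + 1
    return sum(build_counts.get(key(row), 0) for row in probe)


# Hypothetical miniature table; the smallest orderkey is 1.
rows = [
    {"orderkey": 1, "partkey": 10},
    {"orderkey": 1, "partkey": 11},
    {"orderkey": 2, "partkey": 10},
]

empty_case = join_count(rows, lambda row: row["orderkey"] < 1)
non_empty_case = join_count(rows, lambda row: row["orderkey"] < 2)
print(empty_case, non_empty_case)
```

In this model the `orderkey < 1` case takes the early-exit path in `probe_scan`, which is the path that only the hanging query exercises.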