[Optimize] Avoid repeated sending of common components in Fragments #4904

Merged
merged 16 commits into from
Nov 22, 2020

Conversation

morningman
Contributor

Proposed changes

This CL makes two main changes:

  1. Avoid repeated sending of common components in Fragments

    In the previous implementation, a query could generate multiple Fragments
    that share some common information, such as the DescriptorTable.
    Fragments are sent to the BE in a certain order, so this common information was sent repeatedly
    and rebuilt repeatedly on the BE side.

    For some complex SQL, this common information can be very large,
    which increases the execution time of each Fragment.

    With this change, for multiple Fragments sent to the same BE, only the first Fragment carries
    the common information. The BE caches it, and subsequent Fragments
    no longer need to carry it.

    In a local test, the execution time of some complex SQL dropped from 3 seconds to 1 second.

  2. Add the time spent in FE logic to the Profile

    This includes SQL analysis, planning, and Fragment scheduling and sending on the FE side, as well as the time to fetch data.
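
To make the idea in item 1 concrete, here is a minimal sketch in Python (not the actual FE/BE code, which is Java and C++). All names here (`FragmentRequest`, `Backend`, `send_fragments`, the `common` field) are hypothetical and only illustrate the scheme described above: the sender attaches the shared data to the first Fragment for each BE, and the BE caches it per query. The sketch assumes requests reach a BE in the order they are sent, and it leaves out cache cleanup when the query finishes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FragmentRequest:
    """Hypothetical stand-in for the per-Fragment request sent from FE to BE."""
    query_id: str
    fragment_id: int
    # Shared query-level data (e.g. a descriptor table); None when omitted.
    common: Optional[dict] = None


class Backend:
    """Hypothetical BE that caches the shared data per query."""

    def __init__(self) -> None:
        self._common_by_query: Dict[str, dict] = {}
        self.executed: List[Tuple[int, dict]] = []

    def exec_fragment(self, req: FragmentRequest) -> None:
        if req.common is not None:
            # The request carries the shared data: cache it for later Fragments.
            self._common_by_query[req.query_id] = req.common
        common = self._common_by_query.get(req.query_id)
        if common is None:
            raise RuntimeError(
                f"no cached common data for query {req.query_id}"
            )
        # A real BE would build and run the Fragment here; we only record it.
        self.executed.append((req.fragment_id, common))


def send_fragments(
    query_id: str,
    common: dict,
    placement: List[Tuple[int, str]],
    backends: Dict[str, Backend],
) -> int:
    """Send each Fragment to its BE, attaching `common` once per BE.

    `placement` is an ordered list of (fragment_id, backend_name) pairs.
    Returns how many requests carried the shared data.
    """
    seen_backends = set()
    carried = 0
    for fragment_id, be_name in placement:
        first_for_backend = be_name not in seen_backends
        seen_backends.add(be_name)
        req = FragmentRequest(
            query_id, fragment_id, common if first_for_backend else None
        )
        if first_for_backend:
            carried += 1
        backends[be_name].exec_fragment(req)
    return carried
```

In this sketch, a request that always carries the shared data is still handled correctly, because the BE simply uses and re-caches whatever arrives. Whether the real implementation handles mixed versions this way is an assumption.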

Types of changes

  • Code refactor (Modify the code structure, format the code, etc...)

Checklist

Further comments

The query planner is still time-consuming when processing complex SQL. It will be optimized later.

@morningman added the kind/improvement and area/sql/execution labels on Nov 15, 2020
@morningman
Contributor Author

This change is forward-compatible: an old-version FE works well with a new-version BE.

Contributor

@EmmyMiao87 EmmyMiao87 left a comment


LGTM

@morningman morningman merged commit f1b57c4 into apache:master Nov 22, 2020
@yangzhg yangzhg mentioned this pull request Feb 9, 2021
yiguolei pushed a commit that referenced this pull request Sep 9, 2022
…12495)


1. For a query with 1656 unions, the plan thrift size is reduced from 400MB+ to 2MB.
This optimization was introduced in #4904 but lost after #9720.

2. Disable ExprSubstitutionMap.verify when debug is disabled,
so that the plan time of a query with 1656 unions is reduced from 20s to 2s.
Henry2SS pushed a commit to Henry2SS/incubator-doris that referenced this pull request Sep 14, 2022
…pache#12495)

Development

Successfully merging this pull request may close these issues.

[feature wanted] Doris's UNION ALL currently seems to be computed on a single machine, which makes it inefficient
4 participants