Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement](statistics)Add config for the threshold of column count for auto analyze. #27713

Merged
merged 1 commit into from
Nov 29, 2023

Conversation

Jibing-Li
Copy link
Contributor

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

@Jibing-Li Jibing-Li marked this pull request as ready for review November 28, 2023 12:11
Copy link
Contributor

PR approved by anyone and no changes requested.

@xiaokang
Copy link
Contributor

run buildall

@Jibing-Li
Copy link
Contributor Author

run buildall

@Jibing-Li
Copy link
Contributor Author

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 28, 2023
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.17 seconds
stream load tsv: 580 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.5 seconds inserted 10000000 Rows, about 350K ops/s
storage size: 17099616602 Bytes

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 21a080049b45dc0033583e8bd1d355d1654650f4, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4916	4651	4648	4648
q2	375	149	159	149
q3	1519	1355	1272	1272
q4	1154	986	953	953
q5	3226	3242	3266	3242
q6	258	133	134	133
q7	1047	509	538	509
q8	2265	2226	2213	2213
q9	6976	6931	6930	6930
q10	3295	3372	3359	3359
q11	334	216	199	199
q12	357	222	225	222
q13	4667	3888	3917	3888
q14	250	217	215	215
q15	578	542	521	521
q16	420	377	398	377
q17	1027	672	644	644
q18	8402	7521	7453	7453
q19	1591	1539	1557	1539
q20	590	333	331	331
q21	3362	2967	2967	2967
q22	369	304	308	304
Total cold run time: 46978 ms
Total hot run time: 42068 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4594	4605	4579	4579
q2	326	224	224	224
q3	3746	3754	3741	3741
q4	2560	2538	2536	2536
q5	6171	6192	6193	6192
q6	245	127	127	127
q7	2671	2021	1977	1977
q8	3762	3739	3646	3646
q9	9382	9404	9363	9363
q10	4073	4160	4171	4160
q11	624	497	523	497
q12	798	635	636	635
q13	4391	3653	3628	3628
q14	275	260	246	246
q15	586	523	530	523
q16	527	455	462	455
q17	2114	2083	2078	2078
q18	9461	8808	8852	8808
q19	1854	1774	1765	1765
q20	2328	1972	1982	1972
q21	7362	6868	6944	6868
q22	714	602	555	555
Total cold run time: 68564 ms
Total hot run time: 64575 ms

xiaokang pushed a commit that referenced this pull request Nov 28, 2023
@morningman morningman merged commit b3111e4 into apache:master Nov 29, 2023
@Jibing-Li Jibing-Li deleted the 70 branch November 29, 2023 01:49
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 3, 2023
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 3, 2023
eldenmoon added a commit that referenced this pull request Dec 4, 2023
* [chore](case) Use correct insert stmt for cold heat separation case #27546 (#27585)

Co-authored-by: AlexYue <[email protected]>

* [enhance](S3) Print the error detail for every s3 operation (#27572) (#27615)

* [nereids] fix stats error when using dateTime type filter #27571 (#27577)

* [fix](planner)sort node should materialized required slots for itself #27605 (#27620)

* [fix](Nereids) non-deterministic expression should not be constant (#27606) (#27631)

* [enhancement](stats) Add process for aggstate type #27640 (#27642)

* [Fix](statistics)Fix bug and improve auto analyze. (#27626) (#27657)

1. Implement needReAnalyzeTable for ExternalTable. For now, external table will not be reanalyzed in 10 days.
2. For HiveMetastoreCache.loadPartitions, handle the empty iterator case to avoid Index out of boundary exception.
3. Wrap handle show analyze loop with try catch, so that when one table failed (for example, catalog dropped so the table couldn't be found anymore), we can still show the other tables.
4. For now, only OlapTable and Hive HMSExternalTable support sample analyze, throw exception for other types of table.
5. In StatisticsCollector, call constructJob after createTableLevelTaskForExternalTable to avoid NPE.

* [profile](bugfix) should not cache profile content because the profile may not be a full profile (#27635)

---------

Co-authored-by: yiguolei <[email protected]>

* [Enhance](fe) Support setting initial root password when FE firstly launch (#27438) (#27603)

* [opt](plan) only lock olap table when query plan #27639 (#27656)

bp #27639

* select coordinator node from user's tag when exec streaming load (#27106) (#27677)

* [fix](statistics)Need to recalculate health value when table row count become 0  #27673 (#27674)

backport #27673

* [fix](statistics)Fix sample min max npe bug  #27702 (#27707)

backport #27702

* [Bug](join) try fix wrong _has_null_in_build_side setted (#27684) (#27710)

* [Fix](show-load)Show load npe(userinfo is null) (#27698) (#27719)

* [pick](nereids)temporary partition is always pruned #27636 (#27722)

* [enhancement](stats) limit bq cap size for analyze task #27685 (#27687)

* [improvement](statistics) Add config for the threshold of column count for auto analyze #27713 (#27723)

* [doc](fix) k8s operator docs fix to 2.0 (#27476)

* [Improvement](planner)support select tablets with nereids optimize #23164 #23365 (#27740)

#23164
#23365

* [FIX](complextype)fix complex type hash equals (#27743)

* [fix](statistics) Fix show auto analyze missing jobs bug (#27761)

* [bugfix](topn) fix coredump in copy_column_data_to_block when nullable mismatch

return RuntimeError if copy_column_data_to_block nullable mismatch to avoid coredump in input_col_ptr->filter_by_selector(sel_rowid_idx, select_size, raw_res_ptr) .

The problem is reported by a doris user but I can not reproduce it, so there is no testcase added currently.

* [opt](stats) Use escape rather than base64 for min/max value #27746 (#27748)

* [refactor](http) disable snapshot and get_log_file api (#27724) (#27770)

* [branch-2.0](pick 27738) Warning log to trace send fragment #27738 (#27760)

* [branch-2.0](pick #27771) Add more detail msg for waitRPC exception (#27773)

* [Bug](pipeline) prevent PipelineFragmentContext destruct early (#27790)

* [deps](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 1. Add libdeflate lib.  (#27542) (#27711)

Backport from #27542.

* [FIX](case)fix case truncate table first #27792

* [doc](stats) add auto_analyze_table_width_threshold description. (#27818) (#27832)

* [fix](bdbje) Fix bdbje logging level not work (#27597) (#27788)

* `EnvironmentConfig.FILE_LOGGING_LEVEL` only set FileHandlerLevel, we should
   set logger level firstly, otherwise it will not take effect.

* [Opt](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate lib. (#27669) (#27801)

Backport from #27669.

* [branch-2.0](fix) Fix broken exception message #27836

* [Bug](func) coredump in equal for null in function (#27843)

* [minor](stats) Update olap table row count after analyze (#27858)

pick from master #27814

* [fix](stats)min and max return NaN when table is empty (#27863)

fix analyze empty table and min/max null value bug:
1. Skip empty analyze task for sample analyze task. (Full analyze task already skipped).
2. Check sample rows is not 0 before calculate the scale factor.
3. Remove ' in sql template after remove base64 encoding for min/max value.

backport #27862

* [minor](stats) Throw error when sync analyze failed (#27846)

pick from master #27845

* [fix](stats) Don't save colToPartitions anymore to save mem (#27880)

pick from master #27879

* [fix](nereids) set operation's result type is wrong if decimal overflows (#27872)

pick from master #27870

* [Config] Modify the default value of tablet_schema_cache_recycle_interval (#27877)

* [fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode(#27842) (#27851)

* [fix](fe) Fix show frontends npt in some situations (#27295) (#27789)

```
java.lang.NullPointerException: null
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getMasterSocket(ReplicationGroupAdmin.java:191)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.doMessageExchange(ReplicationGroupAdmin.java:607)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getGroup(ReplicationGroupAdmin.java:406)
    at org.apache.doris.ha.BDBHA.getElectableNodes(BDBHA.java:132)
    at org.apache.doris.common.proc.FrontendsProcNode.getFrontendsInfo(FrontendsProcNode.java:84)
    at org.apache.doris.qe.ShowExecutor.handleShowFrontends(ShowExecutor.java:1923)
    at org.apache.doris.qe.ShowExecutor.execute(ShowExecutor.java:355)
    at org.apache.doris.qe.StmtExecutor.handleShow(StmtExecutor.java:2113)
    ...
```

* [branch-2.0](fix) Fix extremely high CPU usage caused by rf merge #27894 (#27895)

* [fix](stacktrace) ignore stacktrace for error code INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED (#27898)

* ignore stacktrace for error INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED

* AndBlockColumnPredicate::evaluate

* [opt](nereids) Branch-2.0: remove partition & histogram from col stats to reduce memory usage #27885 (#27896)

* [pick](Nereids) temporary partition is selected only if user manually specified: Branch-2.0 #27893 (#27905)

* [fix](multi-catalog)support the max compute partition prune (#27154) (#27902)

backport #27154

* [fix](Nereids) should not push down project to the nullable side of outer join #27912 (#27913)

* fix compile

---------

Co-authored-by: Dongyang Li <[email protected]>
Co-authored-by: AlexYue <[email protected]>
Co-authored-by: xzj7019 <[email protected]>
Co-authored-by: starocean999 <[email protected]>
Co-authored-by: morrySnow <[email protected]>
Co-authored-by: AKIRA <[email protected]>
Co-authored-by: Jibing-Li <[email protected]>
Co-authored-by: yiguolei <[email protected]>
Co-authored-by: yiguolei <[email protected]>
Co-authored-by: DuRipeng <[email protected]>
Co-authored-by: Mingyu Chen <[email protected]>
Co-authored-by: wangbo <[email protected]>
Co-authored-by: Pxl <[email protected]>
Co-authored-by: Calvin Kirs <[email protected]>
Co-authored-by: minghong <[email protected]>
Co-authored-by: catpineapple <[email protected]>
Co-authored-by: amory <[email protected]>
Co-authored-by: Kang <[email protected]>
Co-authored-by: zhiqiang <[email protected]>
Co-authored-by: Qi Chen <[email protected]>
Co-authored-by: Lei Zhang <[email protected]>
Co-authored-by: HappenLee <[email protected]>
Co-authored-by: Lightman <[email protected]>
Co-authored-by: Jerry Hu <[email protected]>
Co-authored-by: slothever <[email protected]>
gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
@xiaokang xiaokang mentioned this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.3-merged p0_b reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants