[minor](stats) Update olap table row count after analyze #27814

Merged
morrySnow merged 1 commit into apache:master on Dec 1, 2023

Conversation

Kikyou1997

Proposed changes

Issue Number: close #xxx


PR approved by anyone and no changes requested.

@Kikyou1997

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.88 seconds
stream load tsv: 561 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.5 seconds inserted 10000000 Rows, about 350K ops/s
storage size: 17167077049 Bytes
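
The rounded rates in the report above can be reproduced from the raw figures. This is a minimal sketch, assuming the bot divides the total by the elapsed seconds, treats 1 MB as 1024 * 1024 bytes, and truncates to a whole number. Those assumptions are inferred from the numbers themselves, not from the bot's source, and both function names are hypothetical.

```python
def load_rate_mb_per_s(total_bytes: int, seconds: float) -> int:
    """Throughput in MB/s, assuming 1 MB = 1024 * 1024 bytes, truncated."""
    return int(total_bytes / seconds / (1024 * 1024))


def insert_rate_k_ops(rows: int, seconds: float) -> int:
    """Insert rate in thousands of rows per second, truncated."""
    return int(rows / seconds / 1000)


# (bytes loaded, elapsed seconds) copied from the report above
stream_loads = {
    "tsv": (74807831229, 561),
    "json": (2358488459, 18),
    "orc": (1101869774, 65),
    "parquet": (861443392, 32),
}

for fmt, (total_bytes, seconds) in stream_loads.items():
    print(f"stream load {fmt}: about {load_rate_mb_per_s(total_bytes, seconds)} MB/s")

print(f"insert into select: about {insert_rate_k_ops(10000000, 28.5)}K ops/s")
```

With decimal megabytes (10^6 bytes) the tsv rate would come out as 133 MB/s rather than the reported 127 MB/s, which is why the binary unit is assumed here.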

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 395ffa67da3f3b0bb0adb0e8e490a6803ebf324d, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4888	4655	4678	4655
q2	373	142	160	142
q3	1520	1351	1321	1321
q4	1156	997	949	949
q5	3256	3261	3235	3235
q6	257	132	133	132
q7	1021	524	573	524
q8	2245	2250	2235	2235
q9	6970	6978	6931	6931
q10	3308	3373	3388	3373
q11	348	212	215	212
q12	352	217	217	217
q13	4678	5373	3888	3888
q14	250	223	219	219
q15	590	531	530	530
q16	434	406	390	390
q17	1045	687	593	593
q18	7878	7939	7821	7821
q19	1578	1554	1557	1554
q20	612	1362	318	318
q21	3410	2945	2964	2945
q22	376	299	307	299
Total cold run time: 46545 ms
Total hot run time: 42483 ms
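
The result columns are not labeled in the bot output. The arithmetic is consistent with the first number being the cold run, the next two being hot runs, and the last being the faster of the two hot runs, with the totals summing the first and last columns. The sketch below checks that reading against the table above. The column names and the `summarize` helper are assumptions made for illustration, not something the bot documents.

```python
RESULTS = """
q1 4888 4655 4678 4655
q2 373 142 160 142
q3 1520 1351 1321 1321
q4 1156 997 949 949
q5 3256 3261 3235 3235
q6 257 132 133 132
q7 1021 524 573 524
q8 2245 2250 2235 2235
q9 6970 6978 6931 6931
q10 3308 3373 3388 3373
q11 348 212 215 212
q12 352 217 217 217
q13 4678 5373 3888 3888
q14 250 223 219 219
q15 590 531 530 530
q16 434 406 390 390
q17 1045 687 593 593
q18 7878 7939 7821 7821
q19 1578 1554 1557 1554
q20 612 1362 318 318
q21 3410 2945 2964 2945
q22 376 299 307 299
"""


def summarize(text: str) -> tuple[int, int]:
    """Return (total cold ms, total best-hot ms) for a block of result rows."""
    cold_total = 0
    hot_total = 0
    for line in text.strip().splitlines():
        query, *fields = line.split()
        cold, hot1, hot2, best = (int(f) for f in fields)
        # Assumed meaning of the last column: the faster of the two hot runs.
        if best != min(hot1, hot2):
            raise ValueError(f"{query}: last column is not min of the hot runs")
        cold_total += cold
        hot_total += best
    return cold_total, hot_total


print(summarize(RESULTS))  # (46545, 42483)
```

The two sums match the reported "Total cold run time" and "Total hot run time" for this run.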

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4602	4600	4580	4580
q2	316	208	261	208
q3	3752	3749	3724	3724
q4	2522	2511	2522	2511
q5	6200	6199	6197	6197
q6	247	125	128	125
q7	2592	2003	1961	1961
q8	3742	3698	3706	3698
q9	9429	9372	9393	9372
q10	4065	4149	4159	4149
q11	652	527	536	527
q12	790	616	641	616
q13	4372	3611	3694	3611
q14	279	236	247	236
q15	589	526	523	523
q16	522	501	513	501
q17	2120	2062	2112	2062
q18	9605	9345	9446	9345
q19	1844	1793	1782	1782
q20	2322	1992	1991	1991
q21	7273	6790	6934	6790
q22	663	574	563	563
Total cold run time: 68498 ms
Total hot run time: 65072 ms

morrySnow previously approved these changes Dec 1, 2023

github-actions bot commented Dec 1, 2023

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 1, 2023
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Dec 1, 2023
@Kikyou1997

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.78 seconds
stream load tsv: 560 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17167443650 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 6dded4a3fbfd56a5f1dd9b0060b3ab56d3f71df4, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4898	4685	4666	4666
q2	349	147	160	147
q3	1508	1300	1278	1278
q4	1160	1006	930	930
q5	3228	3217	3195	3195
q6	255	127	133	127
q7	1007	512	571	512
q8	2233	2249	2219	2219
q9	6963	7290	6950	6950
q10	3285	3360	3339	3339
q11	342	221	205	205
q12	357	218	222	218
q13	4626	3915	3870	3870
q14	248	225	220	220
q15	596	544	539	539
q16	448	367	380	367
q17	1025	635	567	567
q18	7996	8003	7242	7242
q19	1568	1533	1549	1533
q20	591	313	339	313
q21	3402	2924	2928	2924
q22	376	305	306	305
Total cold run time: 46461 ms
Total hot run time: 41666 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4589	4584	4588	4584
q2	306	205	242	205
q3	3701	3700	3696	3696
q4	2518	2499	2492	2492
q5	6096	6076	6099	6076
q6	243	127	126	126
q7	2566	1964	1991	1964
q8	3707	3791	3735	3735
q9	9419	9396	9367	9367
q10	4004	4119	4109	4109
q11	663	517	506	506
q12	803	629	633	629
q13	4370	3672	3630	3630
q14	279	249	253	249
q15	596	529	523	523
q16	540	490	472	472
q17	2100	2069	2082	2069
q18	9299	8761	8821	8761
q19	1777	1791	1775	1775
q20	2304	1992	1994	1992
q21	7281	6810	6911	6810
q22	698	573	554	554
Total cold run time: 67859 ms
Total hot run time: 64324 ms

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 1, 2023

github-actions bot commented Dec 1, 2023

PR approved by at least one committer and no changes requested.

@morrySnow morrySnow merged commit 3969226 into apache:master Dec 1, 2023
morrySnow pushed a commit that referenced this pull request Dec 1, 2023
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 3, 2023
eldenmoon added a commit that referenced this pull request Dec 4, 2023
* [chore](case) Use correct insert stmt for cold heat separation case #27546 (#27585)

Co-authored-by: AlexYue <[email protected]>

* [enhance](S3) Print the error detail for every s3 operation (#27572) (#27615)

* [nereids] fix stats error when using dateTime type filter #27571 (#27577)

* [fix](planner)sort node should materialized required slots for itself #27605 (#27620)

* [fix](Nereids) non-deterministic expression should not be constant (#27606) (#27631)

* [enhancement](stats) Add process for aggstate type #27640 (#27642)

* [Fix](statistics)Fix bug and improve auto analyze. (#27626) (#27657)

1. Implement needReAnalyzeTable for ExternalTable. For now, external tables will not be reanalyzed within 10 days.
2. For HiveMetastoreCache.loadPartitions, handle the empty iterator case to avoid an index-out-of-bounds exception.
3. Wrap the show analyze handling loop in a try/catch, so that when one table fails (for example, the catalog was dropped and the table can no longer be found), the other tables are still shown.
4. For now, only OlapTable and Hive HMSExternalTable support sample analyze; an exception is thrown for other table types.
5. In StatisticsCollector, call constructJob after createTableLevelTaskForExternalTable to avoid an NPE.
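
The per-table error isolation described in item 3 can be sketched as follows. This is a Python illustration of the pattern only, not the Doris FE code (which is Java); `show_analyze`, `describe`, and the table names are hypothetical.

```python
import logging

logger = logging.getLogger("show_analyze_sketch")


def show_analyze(table_ids, describe):
    """Build one result row per table, skipping tables whose lookup fails."""
    rows = []
    for table_id in table_ids:
        try:
            rows.append(describe(table_id))
        except Exception as exc:
            # One broken table (e.g. its catalog was dropped) must not
            # hide the rows of every other table.
            logger.warning("skipping table %s: %s", table_id, exc)
    return rows


def describe(table_id):
    """Hypothetical lookup that fails for a table in a dropped catalog."""
    if table_id == "dropped_catalog.t2":
        raise LookupError("table not found")
    return (table_id, "FINISHED")


print(show_analyze(["c.t1", "dropped_catalog.t2", "c.t3"], describe))
```

Placing the try/except inside the loop body, rather than around the whole loop, is what keeps the remaining tables visible.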

* [profile](bugfix) should not cache profile content because the profile may not be a full profile (#27635)

---------

Co-authored-by: yiguolei <[email protected]>

* [Enhance](fe) Support setting initial root password when FE firstly launch (#27438) (#27603)

* [opt](plan) only lock olap table when query plan #27639 (#27656)

bp #27639

* select coordinator node from user's tag when exec streaming load (#27106) (#27677)

* [fix](statistics)Need to recalculate health value when table row count become 0  #27673 (#27674)

backport #27673

* [fix](statistics)Fix sample min max npe bug  #27702 (#27707)

backport #27702

* [Bug](join) try fix wrong _has_null_in_build_side setted (#27684) (#27710)

* [Fix](show-load)Show load npe(userinfo is null) (#27698) (#27719)

* [pick](nereids)temporary partition is always pruned #27636 (#27722)

* [enhancement](stats) limit bq cap size for analyze task #27685 (#27687)

* [improvement](statistics) Add config for the threshold of column count for auto analyze #27713 (#27723)

* [doc](fix) k8s operator docs fix to 2.0 (#27476)

* [Improvement](planner)support select tablets with nereids optimize #23164 #23365 (#27740)

#23164
#23365

* [FIX](complextype)fix complex type hash equals (#27743)

* [fix](statistics) Fix show auto analyze missing jobs bug (#27761)

* [bugfix](topn) fix coredump in copy_column_data_to_block when nullable mismatch

Return RuntimeError when nullability mismatches in copy_column_data_to_block, to avoid a coredump in input_col_ptr->filter_by_selector(sel_rowid_idx, select_size, raw_res_ptr).

The problem was reported by a Doris user, but I cannot reproduce it, so no test case is added for now.

* [opt](stats) Use escape rather than base64 for min/max value #27746 (#27748)

* [refactor](http) disable snapshot and get_log_file api (#27724) (#27770)

* [branch-2.0](pick 27738) Warning log to trace send fragment #27738 (#27760)

* [branch-2.0](pick #27771) Add more detail msg for waitRPC exception (#27773)

* [Bug](pipeline) prevent PipelineFragmentContext destruct early (#27790)

* [deps](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 1. Add libdeflate lib.  (#27542) (#27711)

Backport from #27542.

* [FIX](case)fix case truncate table first #27792

* [doc](stats) add auto_analyze_table_width_threshold description. (#27818) (#27832)

* [fix](bdbje) Fix bdbje logging level not work (#27597) (#27788)

* `EnvironmentConfig.FILE_LOGGING_LEVEL` only sets the FileHandler level; the logger level must be set first, otherwise the setting does not take effect.
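
The same two-stage filtering (logger level first, then handler level) exists in Python's standard `logging` module, which makes the effect easy to demonstrate. The sketch below is an analogy in Python, not the bdbje or `java.util.logging` code touched by the fix; the `emit_debug` helper and logger names are hypothetical.

```python
import io
import logging


def emit_debug(set_logger_level: bool) -> str:
    """Log one DEBUG record and return what the handler actually wrote."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.DEBUG)  # the handler alone accepts DEBUG

    logger = logging.getLogger(f"level_sketch.{set_logger_level}")
    logger.propagate = False
    logger.addHandler(handler)
    # The logger filters records before any handler sees them.
    logger.setLevel(logging.DEBUG if set_logger_level else logging.WARNING)

    logger.debug("hello")
    logger.removeHandler(handler)
    return stream.getvalue()


print(repr(emit_debug(set_logger_level=False)))  # handler level alone: nothing written
print(repr(emit_debug(set_logger_level=True)))   # logger level lowered too: record written
```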

* [Opt](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate lib. (#27669) (#27801)

Backport from #27669.

* [branch-2.0](fix) Fix broken exception message #27836

* [Bug](func) coredump in equal for null in function (#27843)

* [minor](stats) Update olap table row count after analyze (#27858)

pick from master #27814

* [fix](stats)min and max return NaN when table is empty (#27863)

Fix bugs when analyzing empty tables and handling null min/max values:
1. Skip empty tables in sample analyze tasks (full analyze tasks already skip them).
2. Check that the number of sample rows is not 0 before calculating the scale factor.
3. Remove the ' from the SQL template now that base64 encoding of min/max values has been removed.
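
The guard in item 2 can be sketched as below. It assumes the scale factor is the ratio of total rows to sampled rows; the commit message does not give the actual formula, and the `scale_factor` helper is hypothetical.

```python
from typing import Optional


def scale_factor(total_rows: int, sampled_rows: int) -> Optional[float]:
    """Return total/sampled, or None when there is nothing to scale.

    Assumption: the factor extrapolates sampled statistics to the full table.
    """
    if total_rows == 0 or sampled_rows == 0:
        # Empty table or empty sample: skip instead of dividing by zero.
        return None
    return total_rows / sampled_rows


print(scale_factor(1_000_000, 10_000))  # 100.0
print(scale_factor(0, 0))               # None
```

In Java, a floating-point division of zero by zero yields NaN rather than raising an exception, which is consistent with the NaN min/max values named in the commit title.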

backport #27862

* [minor](stats) Throw error when sync analyze failed (#27846)

pick from master #27845

* [fix](stats) Don't save colToPartitions anymore to save mem (#27880)

pick from master #27879

* [fix](nereids) set operation's result type is wrong if decimal overflows (#27872)

pick from master #27870

* [Config] Modify the default value of tablet_schema_cache_recycle_interval (#27877)

* [fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode(#27842) (#27851)

* [fix](fe) Fix show frontends npt in some situations (#27295) (#27789)

```
java.lang.NullPointerException: null
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getMasterSocket(ReplicationGroupAdmin.java:191)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.doMessageExchange(ReplicationGroupAdmin.java:607)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getGroup(ReplicationGroupAdmin.java:406)
    at org.apache.doris.ha.BDBHA.getElectableNodes(BDBHA.java:132)
    at org.apache.doris.common.proc.FrontendsProcNode.getFrontendsInfo(FrontendsProcNode.java:84)
    at org.apache.doris.qe.ShowExecutor.handleShowFrontends(ShowExecutor.java:1923)
    at org.apache.doris.qe.ShowExecutor.execute(ShowExecutor.java:355)
    at org.apache.doris.qe.StmtExecutor.handleShow(StmtExecutor.java:2113)
    ...
```

* [branch-2.0](fix) Fix extremely high CPU usage caused by rf merge #27894 (#27895)

* [fix](stacktrace) ignore stacktrace for error code INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED (#27898)

* ignore stacktrace for error INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED

* AndBlockColumnPredicate::evaluate

* [opt](nereids) Branch-2.0: remove partition & histogram from col stats to reduce memory usage #27885 (#27896)

* [pick](Nereids) temporary partition is selected only if user manually specified: Branch-2.0 #27893 (#27905)

* [fix](multi-catalog)support the max compute partition prune (#27154) (#27902)

backport #27154

* [fix](Nereids) should not push down project to the nullable side of outer join #27912 (#27913)

* fix compile

---------

Co-authored-by: Dongyang Li <[email protected]>
Co-authored-by: AlexYue <[email protected]>
Co-authored-by: xzj7019 <[email protected]>
Co-authored-by: starocean999 <[email protected]>
Co-authored-by: morrySnow <[email protected]>
Co-authored-by: AKIRA <[email protected]>
Co-authored-by: Jibing-Li <[email protected]>
Co-authored-by: yiguolei <[email protected]>
Co-authored-by: yiguolei <[email protected]>
Co-authored-by: DuRipeng <[email protected]>
Co-authored-by: Mingyu Chen <[email protected]>
Co-authored-by: wangbo <[email protected]>
Co-authored-by: Pxl <[email protected]>
Co-authored-by: Calvin Kirs <[email protected]>
Co-authored-by: minghong <[email protected]>
Co-authored-by: catpineapple <[email protected]>
Co-authored-by: amory <[email protected]>
Co-authored-by: Kang <[email protected]>
Co-authored-by: zhiqiang <[email protected]>
Co-authored-by: Qi Chen <[email protected]>
Co-authored-by: Lei Zhang <[email protected]>
Co-authored-by: HappenLee <[email protected]>
Co-authored-by: Lightman <[email protected]>
Co-authored-by: Jerry Hu <[email protected]>
Co-authored-by: slothever <[email protected]>
gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
Labels: approved, dev/2.0.3-merged, reviewed