[update] Update query docs of 3.0/dev, fix typo and issues (#1271)
# Versions 

- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0

# Languages

- [x] Chinese
- [x] English

---------

Signed-off-by: Yongqiang YANG <[email protected]>
Co-authored-by: Yongqiang YANG <[email protected]>
Co-authored-by: Yongqiang YANG <[email protected]>
Co-authored-by: yagagagaga <[email protected]>
Co-authored-by: zhannngchen <[email protected]>
Co-authored-by: wangtianyi2004 <[email protected]>
Co-authored-by: kkop <[email protected]>
Co-authored-by: Jake-00 <[email protected]>
Co-authored-by: smiletan <[email protected]>
Co-authored-by: hui lai <[email protected]>
Co-authored-by: wudi <[email protected]>
11 people authored Nov 7, 2024
1 parent bd5abae commit 702688d
Showing 382 changed files with 37,618 additions and 37,485 deletions.
2 changes: 1 addition & 1 deletion common_docs_zh/ecosystem/hive-hll-udf.md
@@ -26,7 +26,7 @@ under the License.

# Hive HLL UDF

- Hive HLL UDF provides UDFs such as HLL-generation operations for Hive tables. The HLL in Hive is fully consistent with Doris HLL, and HLL in Hive can be loaded into Doris via Spark HLL Load. For more about HLL, see: [Approximate Deduplication Using HLL](../query/duplicate/using-hll.md)
+ Hive HLL UDF provides UDFs such as HLL-generation operations for Hive tables. The HLL in Hive is fully consistent with Doris HLL, and HLL in Hive can be loaded into Doris via Spark HLL Load. For more about HLL, see: [Approximate Deduplication Using HLL](https://doris.apache.org/zh-CN/docs/query-acceleration/distinct-counts/using-hll/)

Function overview:
1. UDAF
@@ -28,7 +28,7 @@ under the License.

Beginning with version 0.9.0, Doris introduced an optimized replica management strategy and a richer set of replica status viewing tools. This document covers Doris data replica balancing, repair scheduling strategies, and replica management and maintenance methods, helping users more easily understand and manage the replica status in the cluster.

- > For repairing and balancing replicas of tables with the Colocation attribute, refer to [HERE](../../query/join-optimization/colocation-join.md)
+ > For repairing and balancing replicas of tables with the Colocation attribute, refer to [HERE](../../query-data/join#colocate-join)
## Terminology

2 changes: 1 addition & 1 deletion docs/compute-storage-decoupled/file-cache.md
@@ -127,7 +127,7 @@ Cache-related metrics in the SQL profile are found under SegmentIterator, including:
| RemoteIOUseTimer | Time taken to read from remote storage |
| WriteCacheIOUseTimer | Time taken to write to the File Cache |

- You can analyze query performance through [Query Performance Analysis](../query/query-analysis/query-analytics).
+ You can analyze query performance through [Query Performance Analysis](../query-acceleration/tuning/query-profile).
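
As a hedged illustration of how these metrics typically get inspected (this assumes the session variable `enable_profile` and the FE web UI's QueryProfile page; the table name is made up):

```
-- Enable profile collection for the current session, then run the query
-- whose File Cache behavior you want to inspect.
SET enable_profile = true;
SELECT count(*) FROM lineitem;  -- illustrative table
-- Then open the query's profile (e.g. the QueryProfile page of the FE web UI)
-- and look under SegmentIterator for the cache metrics listed above.
```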

## Usage

2 changes: 1 addition & 1 deletion docs/install/cluster-deployment/standard-deployment.md
@@ -290,7 +290,7 @@ This is a CIDR representation that specifies the IP used by the FE. In environme
JAVA_OPTS="-Xmx16384m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xloggc:$DORIS_HOME/log/fe.gc.log.$DATE"
```

- 6. Modify the case-sensitivity parameter `lower_case_table_names`. By default, Doris is case-sensitive for table names. If you require case-insensitive table names, you must set this during cluster initialization; once initialization is complete, the setting cannot be changed. Refer to the [variable](../../query/query-variables/variables) documentation for more details on the `lower_case_table_names` setting.
+ 6. Modify the case-sensitivity parameter `lower_case_table_names`. By default, Doris is case-sensitive for table names. If you require case-insensitive table names, you must set this during cluster initialization; once initialization is complete, the setting cannot be changed. Refer to the variable documentation for more details on the `lower_case_table_names` setting.
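
A minimal sketch of checking and setting this variable over the MySQL protocol (the value semantics below follow MySQL conventions and are an assumption to verify against the Doris variable docs):

```
-- Check the current setting:
SHOW VARIABLES LIKE 'lower_case_table_names';
-- Assumed values, following MySQL conventions:
--   0 = table names are case-sensitive (Doris default)
--   1 = stored in lowercase, compared case-insensitively
--   2 = stored as specified, compared case-insensitively
-- Set once during cluster initialization, before any tables are created;
-- it cannot be changed afterwards:
SET GLOBAL lower_case_table_names = 1;
```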

**Start FE process**

@@ -101,15 +101,6 @@ Parameters:

The first parameter is the bitmap column, the second parameter is the dimension column used for filtering, and the third is a variable-length parameter list of values of the filter dimension column.

```
mysql> select orthogonal_bitmap_intersect(members, tag_group, 1150000, 1150001, 390006) from tag_map where tag_group in ( 1150000, 1150001, 390006);
+-------------------------------------------------------------------------------+
| orthogonal_bitmap_intersect(`members`, `tag_group`, 1150000, 1150001, 390006) |
+-------------------------------------------------------------------------------+
| NULL                                                                          |
+-------------------------------------------------------------------------------+
```
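
For context, the examples and explanations here assume an aggregate table whose bitmaps are orthogonally bucketed. A minimal illustrative sketch (the real schema is defined earlier in the source document, outside this hunk; names and properties are assumptions):

```
-- The `hid` bucketing column keeps user-ID ranges disjoint across buckets,
-- which is what makes the two-level "orthogonal" aggregation correct.
CREATE TABLE tag_map (
    tag_group BIGINT COMMENT "user tag",
    hid       SMALLINT COMMENT "bucket id, derived from the user-id range",
    members   BITMAP BITMAP_UNION COMMENT "user id set"
)
AGGREGATE KEY (tag_group, hid)
DISTRIBUTED BY HASH (hid) BUCKETS 3
PROPERTIES ("replication_num" = "1");
```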

Explain:

Based on this table schema, this function performs two levels of aggregation in query planning. In the first level, the BE nodes (update and serialize steps) first hash-aggregate the keys by filter_values, then intersect the bitmaps of all keys. The intersection results are serialized and sent to the second-level BE nodes (merge and finalize steps), which iteratively merge all the bitmap values received from the first-level nodes.
@@ -133,15 +124,6 @@

The first parameter is the bitmap column, the second parameter is the dimension column used for filtering, and the third is a variable-length parameter list of values of the filter dimension column.

```
mysql> select orthogonal_bitmap_intersect_count(members, tag_group, 1150000, 1150001, 390006) from tag_map where tag_group in ( 1150000, 1150001, 390006);
+-------------------------------------------------------------------------------------+
| orthogonal_bitmap_intersect_count(`members`, `tag_group`, 1150000, 1150001, 390006) |
+-------------------------------------------------------------------------------------+
| 0                                                                                   |
+-------------------------------------------------------------------------------------+
```

Explain:

Based on this table schema, the aggregation in the query plan is divided into two layers. In the first layer, the BE nodes (update and serialize steps) first hash-aggregate the keys by filter_values, then intersect the bitmaps of all keys and count the result. The count values are serialized and sent to the second-layer BE nodes (merge and finalize steps), which iteratively sum all the count values received from the first-layer nodes.
@@ -155,15 +137,6 @@

orthogonal_bitmap_union_count(bitmap_column)

```
mysql> select orthogonal_bitmap_union_count(members) from tag_map where tag_group in ( 1150000, 1150001, 390006);
+------------------------------------------+
| orthogonal_bitmap_union_count(`members`) |
+------------------------------------------+
| 286957811                                |
+------------------------------------------+
```

Explain:

Based on this table schema, this function works in two layers. In the first layer, the BE nodes (update and serialize steps) merge all the bitmaps and then count the resulting bitmap. The count values are serialized and sent to the second-layer BE nodes (merge and finalize steps), which sum all the count values received from the first-layer nodes.
@@ -182,16 +155,6 @@

the calculators supported by the expression: & represents intersection calculation, | represents union calculation, - represents difference calculation, ^ represents XOR calculation, and \ represents escape characters

```
select orthogonal_bitmap_expr_calculate(user_id, tag, '(833736|999777)&(1308083|231207)&(1000|20000-30000)') from user_tag_bitmap where tag in (833736,999777,130808,231207,1000,20000,30000);
Note: 1000, 20000, and 30000 are integer tags representing different user labels
```

```
select orthogonal_bitmap_expr_calculate(user_id, tag, '(A:a/b|B:2\\-4)&(C:1-D:12)&E:23') from user_str_tag_bitmap where tag in ('A:a/b', 'B:2-4', 'C:1', 'D:12', 'E:23');
Note: 'A:a/b', 'B:2-4', etc. are string-typed tags representing different user labels; 'B:2-4' must be escaped as 'B:2\\-4' because '-' is an operator
```

Explain:

The aggregation in the query plan is divided into two layers. The first layer of BE aggregation nodes performs the init, update, and serialize steps; the second layer performs the merge and finalize steps. In the first layer, the init phase parses the input string, converts it into a postfix (reverse Polish) expression, parses out the keys to be computed, and initializes them in a map<key, bitmap> structure. In the update phase, the storage engine scans the dimension column (filter_column) and calls back the update function, aggregating bitmaps into the map from the previous step, keyed by the computed key. In the serialize phase, the bitmaps of the key columns are evaluated against the postfix expression, computing bitmap intersections, unions, and differences using the last-in-first-out behavior of a stack. The final bitmap is then serialized and sent to the second-layer aggregation BE node, which unions all the bitmap values from the first-layer nodes and returns the final bitmap result.
@@ -204,16 +167,6 @@

orthogonal_bitmap_expr_calculate_count(bitmap_column, filter_column, input_string)

```
select orthogonal_bitmap_expr_calculate_count(user_id, tag, '(833736|999777)&(1308083|231207)&(1000|20000-30000)') from user_tag_bitmap where tag in (833736,999777,130808,231207,1000,20000,30000);
Note: 1000, 20000, and 30000 are integer tags representing different user labels
```

```
select orthogonal_bitmap_expr_calculate_count(user_id, tag, '(A:a/b|B:2\\-4)&(C:1-D:12)&E:23') from user_str_tag_bitmap where tag in ('A:a/b', 'B:2-4', 'C:1', 'D:12', 'E:23');
Note: 'A:a/b', 'B:2-4', etc. are string-typed tags representing different user labels; 'B:2-4' must be escaped as 'B:2\\-4' because '-' is an operator
```

Explain:

The aggregation in the query plan is divided into two layers. The first layer of BE aggregation nodes performs the init, update, and serialize steps; the second layer performs the merge and finalize steps. In the first layer, the init phase parses the input string, converts it into a postfix (reverse Polish) expression, parses out the keys to be computed, and initializes them in a map<key, bitmap> structure. In the update phase, the storage engine scans the dimension column (filter_column) and calls back the update function, aggregating bitmaps into the map from the previous step, keyed by the computed key. In the serialize phase, the bitmaps of the key columns are evaluated against the postfix expression, computing bitmap intersections, unions, and differences using the last-in-first-out behavior of a stack; the count of the final bitmap is then serialized and sent to the second-layer aggregation BE node, which sums all the count values from the first-layer nodes and returns the final count result.
File renamed without changes.