Release Note 2.0.0-alpha1 #19231

xiaokang · 2023-04-30T14:52:06Z

NOTICE

Doris 2.0.0-alpha1 version is an ALPHA release that is aimed to be used for evaluating the new features of Doris 2.0.
It's recommended to deploy 2.0.0-alpha in a new test cluster.
2.0.0-alpha1 version should NOT be deployed in production clusters.

Highlight Features

1. Semi structured data storage and fast analysis

inverted index: supports both fulltext search and normal equal, range query.
- supports fulltext search query
  - supports Chinese, English and Unicode standard tokenizer.
  - supports both STRING and ARRAY types.
- supports normal equal, range query
  - supports normal equal, range query on STRING, NUMERIC, DATE, DATETIME types.
- supports logical combination of multiple conditions, not only AND but also OR and NOT.
- much more efficient compared to ElasticSearch in esrally http logs benchmark: 4x speed up for data load, 80% storage space reduced, 2x speed up for 11 queries.
- see more: https://doris.apache.org/docs/dev/data-table/index/inverted-index
dynamic schema table (experimental)
- automatically infer field names and types of semi structured JSON data
- dynamically extend table schema correspondingly.
- see more: https://doris.apache.org/docs/dev/data-table/dynamic-schema-table
complex datatypes
- JSONB data type is more efficient with fast simdjson first time data parsing
- ARRAY data type is more mature, adding dozens of array functions
- MAP data type is added for key-value pairs data, such as extensible user behavior properties
- STRUCT data type is add for traditional struct

2. High concurrent and low latency point query

implement row storage and row cache to speed up fetch whole rows.
implement short circuit query plan and execution for primary key query like SELECT * FROM tablex WHERE id = xxx
cache compiled query plan using PreparedStatement.
High QPS (30K+) on a single cloud host with 16 cpu core and 64g memory.
See more: https://doris.apache.org/docs/dev/query-acceleration/hight-concurrent-point-query

3. Vertical compaction enable by default

vertical compaction divides the schema into column groups, and then merges data by column, which can effectively reduce the memory overhead of compaction and improve the execution speed of compaction.
In the actual test, the memory used by vertical compaction is only 1/10 of the original compaction algorithm, and the compaction rate is increased by 15%.
See more: https://doris.apache.org/docs/dev/advanced/best-practice/compaction/#vertical-compaction

4. Separation of hot and cold data

Users can set the hot and cold data strategy through SQL, so as to move historical data to cheap storage such as object storage to reduce storage costs.
Cold data can still be accessed, and Doris provides a local cache to speed up the access efficiency of cold data.
See more: https://doris.apache.org/docs/dev/advanced/cold_hot_separation

5. Pipeline execution engine adapted to the architecture of modern multi-core CPUs (disable by default)

Asynchronous blocking operators: blocking operators will no longer occupy thread resources, and will no longer generate thread switching overhead.
Adaptive load: adopts Multi-Level Feedback Queue to schedule query priorities. In mixed load scenarios, each query can be fairly allocated to a fixed thread scheduling time slice, thus ensuring that Doris can perform different tasks under different loads with more stable performance.
Controllable number of threads: The default number of execution threads of the pipeLine execution engine is the number of CPUs and cores, and Doris starts the corresponding execution thread pool to manage the execution threads.
See more: https://doris.apache.org/docs/dev/query-acceleration/pipeline-execution-engine

6. Nereids - The Brand New Planner (disable by default)

Smarter: The new optimizer presents the optimization points of each RBO and CBO in the form of rules. For each rule, the new optimizer provides a set of patterns used to describe the shape of the query plan, which can exactly match the query plan that can be optimized. Based on this, the new optimizer can better support more complex query statements such as multi-level subquery nesting. At the same time, the CBO of the new optimizer is based on the advanced cascades framework, uses richer data statistics, and applies a cost model with more scientific dimensions. This makes the new optimizer more handy when faced with multi-table join queries.
More robust: All optimization rules of the new optimizer are completed on the logical execution plan tree. After the query syntax and semantic analysis is completed, it will be transformed into a tree structure. Compared with the internal data structure of the old optimizer, it is more reasonable and unified. Taking subquery processing as an example, the new optimizer is based on a new data structure, which avoids separate processing of subqueries by many rules in the old optimizer. In turn, the possibility of logic errors in optimization rules is reduced.
More flexible: The architectural design of the new optimizer is more reasonable and modern. Optimization rules and processing stages can be easily extended. Optimizer developers can respond to user needs more easier and quickly.
See more: https://doris.apache.org/docs/dev/query-acceleration/nereids

Behavior Changed

Enable light weight schema change by default
By default, datev2, datetimev2, and decimalv3 are used to create tables, and datav1, datetimev1, and decimalv2 are not supported for creating tables. [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077
In the JDBC and Iceberg catalogs, decimalv3 is used by default. [Enhancement](multi-catalogs) Use decimal V3 type in JDBC and Iceberg tables. #18926
Added max_openfiles and swap checks in the startup script of be, so if the system configuration is not reasonable, be may fail to start. [Improve](be)check max open file #18888
It is forbidden to log in without a password when accessing FE on localhost. [Enchancement](auth) Forbid to login doris from 127.0.0.1 without password #18816
When there is a multi catalog in the system, when querying the data of the information schema, only the data of the internal catalog will be displayed by default. [fix](info_db) avoid infodb query timeout when external catalog info is too large or is not reachable #18662
Renamed the process name of Doris to DorisFE and DorisBE. [Enhencement](FE) rename PaloFe to DorisFE #18167
The non-vectorized code has been removed from the system, so the enable_vectorized_engine parameter no longer works. refactor Delete the environment variable enable_vectorized_engine by Mryange · Pull Requests
Limit the depth of the expression tree, the default is 200. [fix](expr) avoid crashing caused by big depth of expression tree #17314
In order to be compatible with BI tools, datev2 and datetimev2 are displayed as date and datetime when show create table. [Enchancement](compatible) show dateV2/datetimeV2 to date/datetime #18358

The text was updated successfully, but these errors were encountered:

morningman added the release notes label May 1, 2023

xiaokang changed the title ~~Release Note 2.0.0-alpha~~ Release Note 2.0.0-alpha1 May 2, 2023

xiaokang mentioned this issue May 5, 2023

[community](release) add download scripts for 2.0.0-alpha1 release #19289

Merged

5 tasks

luzhijing pinned this issue May 6, 2023

xiaokang mentioned this issue Jun 21, 2023

Release Note 2.0-beta #21059

Open

luzhijing unpinned this issue Jul 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release Note 2.0.0-alpha1 #19231

Release Note 2.0.0-alpha1 #19231

xiaokang commented Apr 30, 2023 •

edited by morningman

Loading

Release Note 2.0.0-alpha1 #19231

Release Note 2.0.0-alpha1 #19231

Comments

xiaokang commented Apr 30, 2023 • edited by morningman Loading

NOTICE

Highlight Features

1. Semi structured data storage and fast analysis

2. High concurrent and low latency point query

3. Vertical compaction enable by default

4. Separation of hot and cold data

5. Pipeline execution engine adapted to the architecture of modern multi-core CPUs (disable by default)

6. Nereids - The Brand New Planner (disable by default)

Behavior Changed

xiaokang commented Apr 30, 2023 •

edited by morningman

Loading