Release plan 3.3 #40907

Dshadowzh · 2024-02-06T10:09:58Z

ETA: April 2024

Shared-Data Enhancements

Achieve Consistency Between Shared-Data and Shared-Nothing Architectures: fast schema evolution, manual compaction, and synchronous materialized views (sync MV)
Advanced Cache Management: Unify the cache mechanism for DLA and Shared-Data, improve cache priorities, and add the ability to maintain cache blacklists
Elevated Performance for Data Ingestion and Cold Queries
Support for Persisting Primary Key Indexes to S3 in the Shared-Data Architecture
Optimization of Garbage Collection Mechanism
Materialized Views (MV) for Shared-Data Move to General Availability (GA)
ClusterSync: Synchronizing Shared-Nothing and Shared-Data Clusters

DLA (Data Lake Analytics)

Parquet Reader Improvement: Refactor and enhance the Parquet reader for better memory efficiency and performance.
Iceberg Table Format Enhancements: Optimize Iceberg table partitions, metadata, and statistics collection, introduce query support for equality delete tables, and enable update/delete/schema change operations within Iceberg.
Table Sink Enhancements: Double the performance of the Sink operator compared to Trino and add support for Avro and ORC file formats.
Memory Consumption Management: Implement measures to limit memory consumption during high concurrency operations.

Performance Improvement

Fulltext Inverted Index Support: Introduce support for Fulltext Inverted Indexes for efficient text searches.
Bitmap Functions: Significantly enhance the performance of bitmap functions and add bitmap export functions.
Expression Code Generation: Implement code generation for expressions to accelerate query execution.
Global Dictionary Optimization: Optimize the global dictionary for improved performance.
Storage Efficiency: Enhance storage efficiency with better codecs to reduce file size.
DateTime Microsecond Support: Extend DateTime column precision to support microseconds, available from version 3.2 onwards.

Management Enhancements

Partition Column Flexibility: Introduce support for string and integer data types in partition columns for more versatile data organization.
FE Memory Usage Insights: Provide detailed memory usage metrics by module within the Frontend (FE) for better resource management.
Tag-Based Data Distribution: Introduce tag-based data distribution to improve control over data placement for disaster recovery.
Table-Level Locking in FE: Implement table-level locking in the Frontend to enhance data consistency and concurrency control.
Column Renaming: Support the ability to rename columns, offering greater flexibility in schema management.
Enhanced Table Creation: Enable the creation of tables with an ORDER BY clause to specify sort keys.

Batch Processing

Spill Feature General Availability (GA): The spill feature is now generally available and can be enabled globally, enhancing query stability for large datasets.
Remote Storage Spilling: Extend spilling capabilities to remote storage, offering more flexibility and efficiency in handling large-scale data processing.
Support for Temporary Tables: Introduce the use of temporary tables for efficient intermediate data processing and management within batch operations.

Materialized view

View-based Rewrite: Support rewriting queries on a view with an MV built on the same view.
Text-based Rewrite: Introduce text-based rewrite capabilities to rewrite queries and sub-queries with similar text.
Iceberg MV Updates: Iceberg materialized views now support update-triggered refreshes and enhanced partition mapping.
Enhanced MV Observability: Improve monitoring and management of materialized views for better system insight.
New Properties for MV Rewrite Control: Add the enable_query_rewrite property to disable query rewrite and reduce overall overhead.
Large-Scale MV Refresh Performance: Boost the efficiency of materialized view refreshes on a large scale.
Multi-Fact Table Partition Refresh: Enable partition refresh across multiple fact tables for increased data management flexibility.

alberttwong · 2024-02-06T19:53:02Z

Table Sink Enhancements: Double the performance of the Sink operator compared to Trino and add support for Avro and ORC file formats.

Which table sink?

Dshadowzh · 2024-02-07T09:15:56Z

Table Sink Enhancements: Double the performance of the Sink operator compared to Trino and add support for Avro and ORC file formats.

Which table sink?

We are starting with Hive tables. Support for these file formats can later be extended to other table formats, such as Iceberg and Hudi sinks. We also want to implement Iceberg update and delete (position delete) in this version.

itweixiang · 2024-02-23T10:43:08Z

Really looking forward to the full inverted index feature.

pingchangxin1 · 2024-03-04T03:49:07Z

I want to know whether the new FE structure will appear in this version.

zyclove · 2024-04-22T07:04:09Z

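Is there a schedule for read/write separation? Data writes have a large impact on queries. Is there a good read/write separation solution? Thanks.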

xuqin1019 · 2024-04-29T02:58:04Z

Multi-Fact Table Partition Refresh: Enable partition refresh across multiple fact tables for increased data management flexibility.

@Dshadowzh Thanks! I am really interested in this feature.
In our scenarios, we define MVs on multiple base tables, and we want to avoid a full table refresh whenever any base table changes. Could you tell me how to use this feature (maybe MV supports list partitioning?) and when version 3.3 will be released?

github-actions · 2024-10-28T11:01:16Z

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!

Dshadowzh added the type/feature-request label Feb 6, 2024

Dshadowzh mentioned this issue Feb 6, 2024

StarRocks Roadmap 2024 #39686

Open

61 tasks

github-actions bot added the no-issue-activity label Oct 28, 2024

Dshadowzh closed this as completed Nov 1, 2024
