Release plan 3.3 #40907

Closed
Dshadowzh opened this issue Feb 6, 2024 · 7 comments

Comments

@Dshadowzh
Contributor

Dshadowzh commented Feb 6, 2024

ETA: April 2024

Shared-Data Enhancements

  • Achieve Consistency Between Shared-Data and Shared-Nothing Architectures: fast schema evolution, manual compaction, and synchronous materialized views (sync MV)
  • Advanced Cache Management: Unify the cache mechanism for DLA and Shared-Data, improve cache priorities, and add the ability to maintain cache blacklists
  • Elevated Performance for Data Ingestion and Cold Queries
  • Support for persisting primary key indexes to S3 in the Shared-Data architecture
  • Optimization of Garbage Collection Mechanism
  • Materialized Views (MV) for Shared-Data Move to General Availability (GA)
  • ClusterSync: Synchronizing Shared-Nothing and Shared-Data Clusters

DLA (Data Lake Analytics)

  • Parquet Reader Improvement: Refactor and enhance the Parquet reader for better memory efficiency and performance.
  • Iceberg Table Format Enhancements: Optimize Iceberg table partitions, metadata, and statistics collection, introduce query support for equality delete tables, and enable update/delete/schema change operations within Iceberg.
  • Table Sink Enhancements: Double the performance of the Sink operator compared to Trino and add support for Avro and ORC file formats.
  • Memory Consumption Management: Implement measures to limit memory consumption during high concurrency operations.

Performance Improvement

  • Fulltext Inverted Index Support: Introduce support for Fulltext Inverted Indexes for efficient text searches.
  • Bitmap Functions: Significantly enhance the performance of bitmap functions and add bitmap export functions.
  • Expression Code Generation: Implement code generation for expressions to accelerate query execution.
  • Global Dictionary Optimization: Optimize the global dictionary for improved performance.
  • Storage Efficiency: Enhance storage efficiency with better codecs to reduce file size.
  • DateTime Microsecond Support: Extend DateTime column precision to support microseconds, available from version 3.2 onwards.

Management Enhancements

  • Partition Column Flexibility: Introduce support for string and integer data types in partition columns for more versatile data organization.
  • FE Memory Usage Insights: Provide detailed memory usage metrics by module within the Frontend (FE) for better resource management.
  • Tag-Based Data Distribution: Introduce tag-based data distribution to improve control over data placement for disaster recovery.
  • Table-Level Locking in FE: Implement table-level locking in the Frontend to enhance data consistency and concurrency control.
  • Column Renaming: Support the ability to rename columns, offering greater flexibility in schema management.
  • Enhanced Table Creation: Enable the creation of tables with an ORDER BY clause to specify sort keys.

Batch Processing

  • Spill Feature General Availability (GA): The spill feature is now generally available and can be enabled globally, enhancing query stability for large datasets.
  • Remote Storage Spilling: Extend spilling capabilities to remote storage, offering more flexibility and efficiency in handling large-scale data processing.
  • Support for Temporary Tables: Introduce the use of temporary tables for efficient intermediate data processing and management within batch operations.

Materialized View

  • View-based Rewrite: Support rewriting a query on a view to use an MV built on the same view.
  • Text-based Rewrite: Introduce text-based rewrite, which rewrites a query or sub-query whose text is similar to an MV's definition.
  • Iceberg MV Updates: Iceberg materialized views now support update-triggered refreshes and enhanced partition mapping.
  • Enhanced MV Observability: Improve monitoring and management of materialized views for better system insight.
  • New Properties for MV Rewrite Control: Add the enable_query_rewrite property, which can disable query rewrite for an MV and reduce overall overhead.
  • Large-Scale MV Refresh Performance: Boost the efficiency of materialized view refreshes on a large scale.
  • Multi-Fact Table Partition Refresh: Enable partition refresh across multiple fact tables for increased data management flexibility.
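
To make the text-based rewrite item above more concrete, here is a minimal Python sketch of the general idea: normalize the query text and look it up against the normalized text of each MV's defining query. This is only an illustration under stated assumptions and not the StarRocks implementation; `normalize_sql`, `TextRewriter`, and the MV name `mv_sales` are hypothetical names introduced for this example.

```python
import re


def normalize_sql(sql: str) -> str:
    """Collapse whitespace, drop a trailing semicolon, and lowercase.

    Simplification: lowercasing also changes string literals, which a
    real engine would have to preserve.
    """
    text = sql.strip().rstrip(";").strip()
    return re.sub(r"\s+", " ", text).lower()


class TextRewriter:
    """Hypothetical registry mapping normalized MV definitions to MV names."""

    def __init__(self) -> None:
        self._mv_by_text: dict[str, str] = {}

    def register_mv(self, mv_name: str, defining_query: str) -> None:
        self._mv_by_text[normalize_sql(defining_query)] = mv_name

    def rewrite(self, query: str) -> str:
        normalized = normalize_sql(query)

        # Case 1: the whole query matches an MV definition.
        if normalized in self._mv_by_text:
            return f"select * from {self._mv_by_text[normalized]}"

        # Case 2: a parenthesized sub-query matches an MV definition.
        rewritten = normalized
        for text, mv_name in self._mv_by_text.items():
            rewritten = rewritten.replace(f"({text})", f"(select * from {mv_name})")

        # Leave the query untouched when nothing matched.
        return rewritten if rewritten != normalized else query


rw = TextRewriter()
rw.register_mv("mv_sales", "SELECT k, SUM(v) FROM t GROUP BY k")
print(rw.rewrite("select k,  sum(v)\nfrom t group by k;"))
print(rw.rewrite("SELECT k FROM (SELECT k, SUM(v) FROM t GROUP BY k) x"))
```

The sketch only matches text that is identical after normalization. Matching "similar" text, as the roadmap item describes, would need a more tolerant comparison, for example on a parsed form of the query.
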
@alberttwong
Contributor

Table Sink Enhancements: Double the performance of the Sink operator compared to Trino and add support for Avro and ORC file formats.

Which table sink?

@Dshadowzh
Contributor Author

Dshadowzh commented Feb 7, 2024

Table Sink Enhancements: Double the performance of the Sink operator compared to Trino and add support for Avro and ORC file formats.

Which table sink?

We are starting with Hive tables. Support for these file formats can later be extended to other table sinks, such as Iceberg and Hudi. We also want to implement Iceberg update and delete (position delete) in this version.

@itweixiang
Contributor

Really looking forward to the full inverted index feature.

@pingchangxin1

I would like to know whether the new FE structure will appear in this version.

@zyclove

zyclove commented Apr 22, 2024

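Is there a schedule for read-write separation? Data ingestion has a large impact on queries. Is there a good solution for separating reads and writes? Thanks.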

@xuqin1019

xuqin1019 commented Apr 29, 2024

Multi-Fact Table Partition Refresh: Enable partition refresh across multiple fact tables for increased data management flexibility.

@Dshadowzh Thanks! I am really interested in this feature.
In our scenarios, we define MVs on multiple base tables, and we want to avoid a full table refresh whenever any base table changes. Could you tell me how to use this feature (perhaps MVs will support list partitioning?) and when version 3.3 will be released?


We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!
