Support Generic Aggregate Function State #49000

Closed
3 tasks done
LiShuMing opened this issue Jul 26, 2024 · 2 comments · Fixed by #48939, #50000, #50069, #50425 or #50600

Comments

@LiShuMing
Contributor

LiShuMing commented Jul 26, 2024

Feature request

Is your feature request related to a problem? Please describe.

When the detail data is very large, the aggregation model is very useful for querying data that has been summarized by dimension. It is also currently the required mechanism for incremental computation of aggregate functions in StarRocks.

However, the aggregation model currently supports only a limited set of aggregate functions. It is also hard to extend: supporting intermediate states for functions such as HllSketch or Avg requires complicated development logic and a lot of repetitive work.

Describe the solution you'd like

Every aggregate function already defines its intermediate state type and its serialization/deserialization methods. Any aggregate function defined in the Query Engine can therefore be used as a column in the aggregation model, rather than only the functions the aggregation model supports today.

Support Generic Aggregate Function in Aggregate Model

ColName aggregateFunctionName(paramType1, paramType2, ...)
  • ColName: The name under which the column is stored.
  • aggregateFunctionName: The aggregate function bound to this column, used for compaction and re-aggregation.
  • The type of this column is the state type of the aggregate function, which is inferred automatically.
    Resolving an aggregate function requires its input parameter types, its return type, and whether it is nullable, so the input parameter types must be given here to identify the aggregate function uniquely.
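
The column syntax above can be sketched as a small lookup: parse `ColName aggregateFunctionName(paramTypes)`, then resolve the state type from the function name plus its argument types. This is a minimal illustration, not StarRocks code. The `AggSignature` class, the `STATE_TYPES` registry, the `parse_agg_column` helper, and the state type strings are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AggSignature:
    """Function name plus argument types; together they identify one aggregate function."""
    name: str
    arg_types: Tuple[str, ...]


# Hypothetical registry mapping a signature to its intermediate state type.
# The state type strings are illustrative only, not StarRocks's real types.
STATE_TYPES: Dict[AggSignature, str] = {
    AggSignature("sum", ("bigint",)): "bigint",
    AggSignature("avg", ("bigint",)): "avg_state<bigint>",
    AggSignature("avg", ("double",)): "avg_state<double>",
}

_COLUMN_RE = re.compile(r"\s*(\w+)\s+(\w+)\s*\(([^)]*)\)\s*")


def parse_agg_column(definition: str) -> Tuple[str, AggSignature, str]:
    """Parse 'ColName func(type1, type2)' and infer the column's state type."""
    match = _COLUMN_RE.fullmatch(definition)
    if match is None:
        raise ValueError(f"not an aggregate column definition: {definition!r}")
    col_name, func_name, raw_args = match.groups()
    arg_types = tuple(a.strip().lower() for a in raw_args.split(",") if a.strip())
    signature = AggSignature(func_name.lower(), arg_types)
    if signature not in STATE_TYPES:
        raise KeyError(f"no aggregate function matches {signature}")
    # The column type is never written by the user; it comes from the function.
    return col_name, signature, STATE_TYPES[signature]
```

In this sketch `avg(bigint)` and `avg(double)` resolve to different entries, which is why the argument types have to appear in the column definition.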

Support Aggregate Function Combinator

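  • _state combinator

/**
 * State combinator for aggregate function to convert the input arguments into the intermediate state of aggregate function.
 * DESC: immediate_type {agg_func}_state(arg_types)
 *  input type  : aggregate function's argument types
 *  return type : aggregate function's immediate_type
 */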
  • _union combinator

/**
 * Union combinator for aggregate function to union the agg states and return the intermediate result of aggregate function.
 * DESC: immediate_type {agg_func}_union(immediate_type)
 *  input type          : aggregate function's immediate_type
 *  intermediate type   : aggregate function's immediate_type
 *  return type         : aggregate function's immediate_type
 */
  • _merge combinator

/**
 * Merge combinator for aggregate function to merge the agg states and return the final result of aggregate function.
 * DESC: return_type {agg_func}_merge(immediate_type)
 *  input type          : aggregate function's immediate_type
 *  intermediate type   : aggregate function's immediate_type
 *  return type         : aggregate function's return type
 */
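
To make the three signatures concrete, here is a minimal sketch that models them for `avg`, assuming the intermediate state (`immediate_type` in the comments above) is a (sum, count) pair. The `AvgState` class and the Python function names are hypothetical stand-ins for illustration, not the engine's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AvgState:
    """Assumed intermediate state of avg: running sum and row count."""
    total: float
    count: int


def avg_state(value: float) -> AvgState:
    """{agg_func}_state: argument value -> intermediate state."""
    return AvgState(total=float(value), count=1)


def avg_union(states: Iterable[AvgState]) -> AvgState:
    """{agg_func}_union: intermediate states -> one intermediate state."""
    total, count = 0.0, 0
    for state in states:
        total += state.total
        count += state.count
    return AvgState(total=total, count=count)


def avg_merge(states: Iterable[AvgState]) -> Optional[float]:
    """{agg_func}_merge: intermediate states -> final result."""
    merged = avg_union(states)
    # An empty input has no average; None stands in for SQL NULL here.
    return merged.total / merged.count if merged.count else None
```

Because `avg_union` returns a state rather than a number, partial results can be combined again later. `avg_merge` is the step that ends the chain and produces the final value.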

Support Using Generic Aggregate Function State in Synchronous Materialized Views

  • Support creating synchronous materialized views with all common aggregate functions, and rewriting queries against them

Support Using Generic Aggregate Function State in Asynchronous Materialized Views

  • Support creating asynchronous materialized views with all common aggregate functions, and rewriting queries against them
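
The idea behind rewriting a query against a materialized view that stores aggregate states can be sketched as follows. A view keyed by `(k1, k2)` keeps one `avg` state per key, and a query that groups only by `k1` is answered by merging those states instead of scanning the base rows. This is a simplified model under the assumption that the `avg` state is a (sum, count) pair. The data and function names are hypothetical, and the sketch does not describe how the StarRocks optimizer performs the rewrite.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, float]      # (k1, k2, value)
State = Tuple[float, int]         # assumed avg state: (sum, count)

rows: List[Row] = [
    ("a", "x", 1.0),
    ("a", "x", 3.0),
    ("a", "y", 8.0),
    ("b", "x", 5.0),
]


def build_mv(base: List[Row]) -> Dict[Tuple[str, str], State]:
    """Materialize one unioned avg state per (k1, k2) key."""
    mv: Dict[Tuple[str, str], State] = defaultdict(lambda: (0.0, 0))
    for k1, k2, value in base:
        total, count = mv[(k1, k2)]
        mv[(k1, k2)] = (total + value, count + 1)
    return dict(mv)


def avg_from_base(base: List[Row]) -> Dict[str, float]:
    """Original query: avg(value) grouped by k1 over the base rows."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for k1, _k2, value in base:
        groups[k1].append(value)
    return {k1: sum(vals) / len(vals) for k1, vals in groups.items()}


def avg_from_mv(mv: Dict[Tuple[str, str], State]) -> Dict[str, float]:
    """Rewritten query: merge the stored states grouped by k1."""
    merged: Dict[str, State] = defaultdict(lambda: (0.0, 0))
    for (k1, _k2), (total, count) in mv.items():
        acc_total, acc_count = merged[k1]
        merged[k1] = (acc_total + total, acc_count + count)
    return {k1: total / count for k1, (total, count) in merged.items()}
```

Averaging the per-key averages would give a wrong answer for `a` (the mean of 2.0 and 8.0 is 5.0, not 4.0). Storing the state rather than the final value is what keeps the rolled-up result correct.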

Describe alternatives you've considered

Additional context

@djiangc

djiangc commented Sep 22, 2024

just to confirm, would this feature be available to Java UDAF too?

@LiShuMing
Contributor Author

just to confirm, would this feature be available to Java UDAF too?

Not supported yet. Only built-in aggregate functions are supported.
