Add "baseline" metrics to all built in operators #866
Comments
If possible, can we also have the number of output streams/partitions per operator and their corresponding output rows? I am not sure whether they are captured in repartition or not. IIRC, the repartitioning only happens with certain setups.
One thing we could perhaps do is capture the statistics for each output partition and then add some way to aggregate them together. I think @andygrove suggested something like this on #679 (comment), though in the context of aggregating for distributed queries.
This looks really cool. It would be nice to have the option to show the individual partitions stacked as smaller lines below each operator. It would make sense to collect stats per partition and then aggregate them for reporting.
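The "collect per partition, then aggregate for reporting" idea from the comments above could look roughly like the following. This is only a sketch: `PartitionStats`, `OperatorSummary`, and `aggregate` are hypothetical names made up for illustration and are not part of DataFusion or its `SQLMetric` infrastructure.

```rust
use std::time::Duration;

/// Hypothetical per-partition counters (not a DataFusion type).
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct PartitionStats {
    pub output_rows: usize,
    pub elapsed: Duration,
}

/// Hypothetical operator-level summary derived from its partitions.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct OperatorSummary {
    pub partitions: usize,
    pub total_output_rows: usize,
    /// Sum of elapsed time over all partitions.
    pub total_elapsed: Duration,
    /// Elapsed time of the slowest single partition.
    pub max_elapsed: Duration,
}

/// Fold the per-partition stats of one operator into a single summary.
pub fn aggregate(stats: &[PartitionStats]) -> OperatorSummary {
    stats.iter().fold(OperatorSummary::default(), |mut acc, s| {
        acc.partitions += 1;
        acc.total_output_rows += s.output_rows;
        acc.total_elapsed += s.elapsed;
        acc.max_elapsed = acc.max_elapsed.max(s.elapsed);
        acc
    })
}
```

Keeping the raw per-partition values around (rather than only the summary) would leave the door open for the stacked per-partition display suggested above; reporting both a sum and a max is one possible choice, since partitions may run in parallel.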
I think there is also the case where user defined operators are the parents and we need to generate metrics recursively for their children too, for end to end tracing. The list of metrics looks like stuff that can be automatically generated across all operators if we can force all operator implementations to follow a certain convention. For example, instead of calling
Thanks @houqp -- using a wrapper is a good (great!) idea -- I will think about what this might look like and give it a shot, likely after #679
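For illustration, one shape the wrapper idea could take is sketched below. It is not the design that was adopted: `BaselineCounters` and `Instrumented` are hypothetical names, a plain `Iterator` of `Vec<T>` stands in for an operator's async stream of record batches, and the choice of counters (batches, rows, time spent producing them) is an assumption.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::Instant;

/// Hypothetical shared counters for one operator partition.
#[derive(Debug, Default)]
pub struct BaselineCounters {
    pub output_batches: AtomicUsize,
    pub output_rows: AtomicUsize,
    pub elapsed_nanos: AtomicU64,
}

/// Wraps any iterator of batches and records counters as batches flow through.
/// `Vec<T>` is a stand-in for a record batch in this sketch.
pub struct Instrumented<I> {
    inner: I,
    counters: Arc<BaselineCounters>,
}

impl<I> Instrumented<I> {
    pub fn new(inner: I, counters: Arc<BaselineCounters>) -> Self {
        Self { inner, counters }
    }
}

impl<I, T> Iterator for Instrumented<I>
where
    I: Iterator<Item = Vec<T>>,
{
    type Item = Vec<T>;

    fn next(&mut self) -> Option<Self::Item> {
        // Time spent inside the wrapped operator's `next`, which also
        // includes any time it spends pulling from its own inputs.
        let start = Instant::now();
        let batch = self.inner.next();
        let nanos = start.elapsed().as_nanos() as u64;
        self.counters.elapsed_nanos.fetch_add(nanos, Ordering::Relaxed);

        if let Some(b) = &batch {
            self.counters.output_batches.fetch_add(1, Ordering::Relaxed);
            self.counters.output_rows.fetch_add(b.len(), Ordering::Relaxed);
        }
        batch
    }
}
```

Because the wrapper only relies on the output type, the same code could be applied to built-in and user defined operators alike, which is the appeal of the convention-based approach described above.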
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I would like to be able to get an overall understanding of where time is being spent during query execution via EXPLAIN ANALYZE (see #858) so that I know where to focus additional performance optimization activities.
Additionally, I would like to be able to graph a stacked flamechart such as the following (see more details on https://github.com/influxdata/influxdb_iox/issues/2273) that shows when the different operators ran in relation to each other.
Describe the solution you'd like
I would like to instrument all operators (impl ExecutionPlan) included in DataFusion so that they produce at least the following metrics:
execute was run
I plan to use the SQLMetric infrastructure for doing so, probably after #679
Describe alternatives you've considered
Open questions:
Additional context
Related work: