-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new visualization for custom X-Y axes #4185
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@mistercrunch @williaster any thoughts? |
superset/viz.py
Outdated
viz_type = 'line_xy' | ||
verbose_name = _('XY Line Chart') | ||
sort_series = False | ||
is_timeseries = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be False
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is primarily used as a small hack to implement limiting on the groupBy
groupings, using the limit as the timeseries_limit
in the query objects. The limit is applied on the subquery in the case of sqla
, and the first phase in the case of the druid
query.
The reason why limiting should be applied is so that, in some cases, the visualization doesn't come out as a large number of lines, which would crash the browser.
I realize it probably makes more sense to limit the result set after running the query, limiting on the sliceBy
field, but I'm open to suggestions here. We can also forego implementing limiting for now, and encourage users to apply proper filters on their queries. Otherwise I'll leave it the way it is now and include a comment in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note - I decided to go with the latter solution and forego implementing limiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have a conversation as to whether supporting "NOT GROUPED BY" for this specific visualization is necessary. I'd rather keep things simple here and only allow "group by" on the X axis
description: t('Columns or metrics to display'), | ||
mapStateToProps: (state) => { | ||
let choices = []; | ||
if (state.controls && state.datasource) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the exact same function as in columns_and_metrics_x
, let's make this DRY
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of being able to plot non grouped-by measures. In some cases I feel that this is a missing option (e.g., box plot).
The non-grouped xy-plot would be even more useful if there were an option to just plot markers without any lines. In other words, a plain scatter plot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I must have mussed that the scatter plot is implemented here, already.
So, I would highly appreciate if the non-grouped code could just stay...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand your NOT GROUPED BY use case, is it for optimization purposes in cases where the data is already aggregated? Note that you can still make columns that are used in metrics "groupbable" and group by them. Also note that the cost of grouping something that's already grouped should be fairly cheap.
I think having the two interfaces in one makes things confusing. For the case of the Table viz it was very necessary to allow NOT GROUPED BY, and I'm trying to understand what use case requires that in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We, for example, need it to show raw values of our sensor products. So each data point represents one serial number.
You might say that grouping by SN should just do the job. However, grouping by, e.g., the job number, is then more useful.
In other cases there exists real xy measurement data for a product, like e.g. a transfer characteristic, and this again is not useful to be aggregated but has to be investigated for each SN separately.
Limiting the amount of data may of course be challenging then, but this is the user's responsibility in my eyes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mistercrunch The 'NOT GROUPED BY' use case is the same one why it is available on the table visualization. Fundamentally, this general X-Y chart lets you select any column of a table-visualization as the X axis or the Y axis. In short, if you have two numeric columns in a table you should be able to display the data as a line chart regardless of what the 'query' looked like.
The reason for the 'NOT GROUPED BY' section to be included in the table viz is to be able to visualize (list) non-aggregated data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 not grouped by
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about as an alternative to the GROUPED BY
/ NOT GROUPED BY
, if we allowed for a mix of metrics and dimensions in the dropdowns? If no metrics are selected, then we don't aggregate. Perhaps that's the way that the table view should work as well.
We're planning a fair amount of work to have a more column-centric approach in the explore view and supporting backwards compatibility on this new chart type may make that harder.
], | ||
}, | ||
{ | ||
label: t('NOT GROUPED BY'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling NOT GROUPED BY
here adds a lot of complexity to this visualization, why is it necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note - I included NOT GROUPED BY
to address the feature @rumbin mentioned above.
54df860
to
1f54180
Compare
Is there anyway to make months (Jan-Dec) as x axis. |
@raghavmishra1202 Correct me if I'm wrong, but it sounds like what you are trying to visualize is best done with a time series chart with month granularity. If instead, you are trying to plot specific month-integer values (1-12, and not in a timestamp-like field) on the X-Axis, and want them displayed from a mapping of numbers to month-string, that seems like a display-specific change that can be implemented in in the If we want to include similar functionality in the XY Line Chart (granularities for things other than dates), we would have to support bucketing values into ranges, which is not included in this change set. |
@jeffreythewang thanks for your reply. Unfortunately that doesn't work in my case. the data i have is organized by basically months in numeric form in one column and data in the other. If i do the time series chart i get some errors. If i do a normal plot it shows the months as 1-12 on the x axis with corresponding data. I basically group the string version of month with numeric but now the organization of the axis gets messed up. it's weird cause the csv shows the correct order any suggestions? like i mentioned above either it arranges x axis as |
Also is there a way to download all the dashboard data into a single csv/zip folder? |
@raghavmishra1202 The way that the XY Line Chart works under the hood is that it orders the result set of the query based on whichever column/metric is selected to be plotted on the X-axis. This way, the line(s) that is/are plotted will have continuity in the series. In other words, the dots are connected in the order that you expect to be connected. For the best visualization, quantitative columns/metrics should be plotted on the X or Y axis, and qualitative fields should go in the For your specific use case, in my opinion, there are a few ways to go about it:
Another solution is to allow users to specify their own ordering for the result set, but I'm not sure how useful or intuitive that would be. I'm open to suggestions here. Also re: downloading dashboard data into a single csv/zip - As far as I know, it does not seem like it, and you'll have to export individual charts as CSV. It can be filed as a feature request though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, there is an issue with rendering the XY Line Chart when a temporal column is not present, so please do not merge right now. I will update the PR when I push the fix.
UPDATE: This issue is now fixed.
curious why this needs to be a new visualization type vs just exposing more customization options (eg not grouped by, no line) in the existing time series line chart, thus making it more general and not a time series chart. if this is a superset of functionality, why not just deprecate/migrate the line series instead of cramming more stuff in? this is more consistent with other charting frameworks where you just choose x and y and there's no assumption that x is a specifc type (eg time) |
@williaster there are a few reasons.
|
fc2aa08
to
1f2e61c
Compare
That makes sense in general, but doesn't seem like it makes the case for a new chart type to me. It seems like matching
We know which columns are time columns right? It seems like we could just leverage that information to toggle on / off advanced analytics + annotations when the x-axis is time? there is precedent with showing/hiding controls in your annotations work and in the future annotations could be updated to support non-time xy-charts. while this might be considered more work, it seems like less work than a future refactor of consolidating visualizations even further if we want to move to a model where we have a generic xy chart? (which again, is extremely prevalent in other plotting frameworks) perhaps I'm wrong but I really just think there's a lot of overlap here with the existing line chart and instead of bloating the # of visualizations it seems like this would just be a good generalization to make. |
@williaster I was thinking that it should be different chart types as there are many controls that should become inactive when out of the time series mode. I'd say that one of the guidelines for creating new chart type as opposed to overloading an existing one is around the % of shared/relevant controls. We currently don't have support for conditional form controls, meaning that selecting certain option in one control would affect the existence of other controls or control-sections. Until we have this I think it should be 2 different chart types. I'm more troubled by the |
@mistercrunch I agree with all that, esp about avoiding long-term having selectors be aware of aggregation sounds gr8, and we can consolidate vis's then. |
Hey guys. What's the progress on this? Any chance it will make the 0.23 release? What love to work with this visualization! 😃 |
Won't make 0.23.0 for sure as the RC has been cut. Only bug fixes will make 0.23.0 from this point. |
Rename vars Small fixes for XY charts on table sources XY plots work on druid datasources, not refactored Don't touch the query models, redo the UI spread syntax apply linter
1f2e61c
to
77587ea
Compare
Hey guys. What's the progress on this? What love to work with this visualization for me too! 😃 |
No need |
@jeffreythewang hello , This branch 'Add new visualization for custom X-Y axes ' , Has it merged with the main branch ? The x-axis need not necessarily be a time/date, it could just be any number. Would be great to make that possible! Can I use it? |
Hello everybody, what's going on? this visualization is what I want too 😃 |
Any update on this? this seems to be one of the main pain points for many folks. |
@mistercrunch if we're committed to potentially merging this in, I can rebase |
as we refactor all charts to use a consistent API and move them to @jeffreythewang if we help with documentation for how to create such a plugin / do this migration now, would you be open to making this a chart plugin that lives outside the superset repository? |
@williaster Yeah sounds good to me |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Curious if this made it into superset as a chart plugin @jeffreythewang |
Summary
Description
I added a new visualization for plotting data points on a line chart which allows the user to specify any column or metric of the query as the X-Y axis. It is based entirely on the query fields of
TableViz
. This way, you can easily switch between the Table View and the XY Line Chart, since they have similar form data fields. I made no changes to the query objects, and primarily made changes to the resultant pandas DataFrame. Some features include:Examples
Group-By Query (with multiple aggregations plotted on the Y-Axis)
Generic Column Selection (with slicing)
Scatter Plot
Related Issues
#3769
#3184