Linear regression? #168
Comments
Prototype: #105, https://observablehq.com/d/9fc6187b5f9d2293. The conclusion was that it is dangerous to offer this only as a mark, because we want to be able to inspect the result and understand the significance of the correlation. So we may want to provide this as a transform, not just a mark, and also think about how the mark could visualize the significance by default. We also want to support other types of regression, as d3-regression does…
Upgraded prototype at https://observablehq.com/@fil/plot-regression, with a nice case of Simpson's paradox.
I'd like to compute the standard error and confidence intervals; I probably have to start by locating the R source code. (I don't think we want this baked into Plot, but once we know how to compute the 95% band, it will be neat to add it to the chart.)
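For reference, the band for a simple linear regression can be sketched without any library. The following is a minimal sketch, not Plot's or R's implementation: `confidenceBand` is a hypothetical helper, the data are assumed to be `[x, y]` pairs, and the multiplier defaults to the normal approximation 1.96. An exact 95% band would instead use the Student's t quantile with n − 2 degrees of freedom, so this version understates the band for small samples.

```js
// Hypothetical helper: 95% confidence band for the mean response of y = a + b·x.
// ASSUMPTION: z = 1.96 is a normal approximation; use a t quantile for small n.
function confidenceBand(data, xs, z = 1.96) {
  const n = data.length;
  const meanX = data.reduce((s, [x]) => s + x, 0) / n;
  const meanY = data.reduce((s, [, y]) => s + y, 0) / n;

  // Sums of squares and cross products around the means.
  let sxx = 0, sxy = 0;
  for (const [x, y] of data) {
    sxx += (x - meanX) ** 2;
    sxy += (x - meanX) * (y - meanY);
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;

  // Residual standard error, with n - 2 degrees of freedom.
  let sse = 0;
  for (const [x, y] of data) sse += (y - (slope * x + intercept)) ** 2;
  const s = Math.sqrt(sse / (n - 2));

  // Standard error of the fitted mean at each requested x.
  return xs.map((x) => {
    const fitted = slope * x + intercept;
    const se = s * Math.sqrt(1 / n + (x - meanX) ** 2 / sxx);
    return {x, y: fitted, lower: fitted - z * se, upper: fitted + z * se};
  });
}

const band = confidenceBand([[1, 2], [2, 4.1], [3, 5.9], [4, 8.2]], [1, 2.5, 4]);
console.log(band);
```

The resulting `lower` and `upper` values could then be drawn as an area (for example with `y1` and `y2` channels) underneath the regression line. The band is narrowest at the mean of x and widens toward the extremes.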
Bumping this… I want the equivalent of ggplot2’s geom_smooth, probably with LOESS. Perhaps we can use the loess package.
Here's a LOESS transform plugin: https://observablehq.com/@observablehq/plot-loess-168
I’ve updated the linear regression prototype here: https://observablehq.com/d/9fc6187b5f9d2293

Fil, your LOESS prototype is promising. One question I have is whether we should sample the LOESS prediction at every input datum, or whether some other common sampling method would produce a smoother or more efficient output. It looks like there is a model.grid() function in LOESS for getting uniform samples.
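To make the sampling question concrete, here is a minimal sketch of uniform sampling that is independent of any particular LOESS implementation. `sampleUniform` is a hypothetical helper, and `predict` stands in for whatever prediction function the fitted model exposes; neither name comes from the loess package.

```js
// Hypothetical helper: evaluate a fitted model at n evenly spaced x values
// across [x0, x1], instead of once per input datum. Requires n >= 2.
function sampleUniform(predict, [x0, x1], n = 100) {
  const points = [];
  for (let i = 0; i < n; ++i) {
    const x = x0 + (i / (n - 1)) * (x1 - x0);
    points.push({x, y: predict(x)});
  }
  return points;
}

// Stand-in for a smoother's prediction function (an assumption for illustration).
const curve = sampleUniform((x) => x * x, [0, 2], 5);
console.log(curve);
```

With this approach the number of output points depends only on `n`, so a dense scatterplot does not produce an unnecessarily long path, and sparse regions of the data still get enough samples to look smooth.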
There remains the question of how we might show the confidence interval (or the significance of the correlation) for a linear regression. There is also the question of whether it would be better to compute the linear regression in screen space, say as an initializer (#801), so that it incorporates any scale’s mathematical transform (such as a log or sqrt scale), and of how we would plug in other regression implementations. My prototype is limited to linear, but if we want to support nonlinear regressions we’d need to generate a line rather than a link, as Fil does in the LOESS prototype.
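The effect of fitting in transformed space can be sketched by applying the scale's transform by hand. This is a rough illustration under stated assumptions, not how an initializer would be implemented: `fitTransformed` is a hypothetical helper, the data are `[x, y]` pairs, and only x is transformed (here with a log, as for a log x-scale). Because the fit is straight in log space but curved in data space, the output is a sampled line rather than a two-point link.

```js
// Hypothetical helper: least-squares fit of y against transform(x), returned as
// a polyline sampled uniformly in transformed space and mapped back with invert.
function fitTransformed(data, transform = Math.log, invert = Math.exp, n = 50) {
  const pts = data.map(([x, y]) => [transform(x), y]);
  const m = pts.length;
  const meanT = pts.reduce((s, [t]) => s + t, 0) / m;
  const meanY = pts.reduce((s, [, y]) => s + y, 0) / m;

  let stt = 0, sty = 0;
  for (const [t, y] of pts) {
    stt += (t - meanT) ** 2;
    sty += (t - meanT) * (y - meanY);
  }
  const slope = sty / stt;
  const intercept = meanY - slope * meanT;

  // Sample between the smallest and largest transformed x (requires n >= 2).
  const ts = pts.map(([t]) => t);
  const t0 = Math.min(...ts);
  const t1 = Math.max(...ts);
  return Array.from({length: n}, (_, i) => {
    const t = t0 + (i / (n - 1)) * (t1 - t0);
    return {x: invert(t), y: slope * t + intercept};
  });
}

const line = fitTransformed([[1, 1.2], [10, 3.1], [100, 4.9], [1000, 7.1]]);
console.log(line.length, line[0], line[line.length - 1]);
```

Passing the identity for both `transform` and `invert` reduces this to an ordinary linear fit, which suggests the same shape of API could cover linear, log, and sqrt scales.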
Also, would we want some easy way to pull out R^2 and show summary statistics alongside the regression?
I think it's important to have the R^2, the equation, and other statistics if possible. Here's what I ended up doing for one of my projects. I needed a facet wrap with a linear regression on each chart. I used d3-regression to get the predict function, the R^2, and the equation. With the predict function, I created the data for the line mark. The R^2 and the equation go in a paragraph above the chart. The facet wrap is just multiple plots inside a div with display: flex. The code is ugly and not very flexible, but you get the idea of the final result I was trying to achieve. It was important to have the R^2 to identify the city in which the relationship was strongest; the trend line alone wouldn't be enough.
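The quantities mentioned above (predict function, R^2, equation) can be computed in a few lines. This is a minimal dependency-free sketch rather than the d3-regression code used in the project: `linearFit` is a hypothetical helper and the sample data are made up for illustration.

```js
// Hypothetical helper: ordinary least squares fit of y = slope * x + intercept.
function linearFit(data, x = (d) => d[0], y = (d) => d[1]) {
  const n = data.length;
  let sumX = 0, sumY = 0;
  for (const d of data) {
    sumX += x(d);
    sumY += y(d);
  }
  const meanX = sumX / n;
  const meanY = sumY / n;

  let sxx = 0, sxy = 0, syy = 0;
  for (const d of data) {
    const dx = x(d) - meanX;
    const dy = y(d) - meanY;
    sxx += dx * dx;
    sxy += dx * dy;
    syy += dy * dy;
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  // For simple linear regression, R² equals the squared Pearson correlation.
  const rSquared = (sxy * sxy) / (sxx * syy);
  return {slope, intercept, rSquared, predict: (v) => slope * v + intercept};
}

// Illustrative data only (an assumption, not from the project described above).
const fit = linearFit([[1, 2], [2, 4.1], [3, 5.9], [4, 8.2]]);
const sign = fit.intercept < 0 ? "-" : "+";
const equation = `y = ${fit.slope.toFixed(2)}x ${sign} ${Math.abs(fit.intercept).toFixed(2)}`;
const label = `${equation} (R² = ${fit.rSquared.toFixed(3)})`;
console.log(label);
```

Calling `linearFit` once per facet gives one label per chart, and `fit.predict` evaluated at the two ends of the x extent gives the endpoints of the trend line.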
Another example, by @mukhtyar
E.g.,