Linear regression? #168

Closed
mbostock opened this issue Feb 24, 2021 · 10 comments · Fixed by #945
Labels
enhancement New feature or request

Comments

@mbostock
Member

mbostock commented Feb 24, 2021

E.g.,

untitled (6)

@mbostock mbostock added the enhancement New feature or request label Feb 24, 2021
@mbostock
Member Author

mbostock commented Feb 24, 2021

Prototype #105 https://observablehq.com/d/9fc6187b5f9d2293

The conclusion was that doing this as a mark is dangerous, because we want to be able to inspect the result and understand the significance of the correlation. So we may want to provide this as a transform, not just a mark, and also think about how the mark could visualize the significance by default.

Also we want to support other types of regressions as d3-regression does…

@Fil
Contributor

Fil commented Jun 18, 2021

Upgraded prototype at https://observablehq.com/@fil/plot-regression, with a nice case of Simpson's paradox.

untitled - 2021-06-18T154504 796

@Fil
Contributor

Fil commented Jun 18, 2021

I'd like to compute the standard error / confidence intervals; I'll probably have to start by locating the R source code.

(I don't think we want this baked into Plot, but once we know how to compute the 95% band, it will be neat to add it to the chart.)
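For reference, the band for a simple linear regression can be sketched without any library. The snippet below is a minimal illustration, not Plot's implementation: `linearFit` and `confidenceBand` are hypothetical names, and the default multiplier of 1.96 is the normal approximation to the 95% level. An exact band would use the Student t quantile with n - 2 degrees of freedom, which matters for small samples.

```javascript
// Ordinary least squares fit of y on x (hypothetical helper, not a Plot API).
// Requires at least 3 points so that the residual standard error is defined.
function linearFit(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((s, v) => s + v, 0) / n;
  const my = ys.reduce((s, v) => s + v, 0) / n;
  let sxx = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    sxx += (xs[i] - mx) ** 2;
    sxy += (xs[i] - mx) * (ys[i] - my);
  }
  const slope = sxy / sxx;
  const intercept = my - slope * mx;
  const predict = (x) => intercept + slope * x;
  // Residual standard error, with n - 2 degrees of freedom.
  let sse = 0;
  for (let i = 0; i < n; i++) sse += (ys[i] - predict(xs[i])) ** 2;
  const sigma = Math.sqrt(sse / (n - 2));
  // Standard error of the fitted mean at x; smallest at the mean of xs.
  const standardError = (x) => sigma * Math.sqrt(1 / n + (x - mx) ** 2 / sxx);
  return {slope, intercept, sigma, predict, standardError};
}

// Lower and upper bounds of the band at x.
// t = 1.96 is an assumption (normal approximation to the 95% level).
function confidenceBand(fit, x, t = 1.96) {
  const y = fit.predict(x);
  const h = t * fit.standardError(x);
  return [y - h, y + h];
}
```

Evaluating `confidenceBand` at a handful of x values gives the lower and upper series that an area mark could draw behind the regression line.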

@mbostock
Member Author

Bumping this… I want the equivalent of ggplot2’s geom_smooth, probably with LOESS. Perhaps we can use the loess package.

@Fil
Contributor

Fil commented Mar 1, 2022

Here's a loess transform plugin https://observablehq.com/@observablehq/plot-loess-168
(As usual it depends on #411 for Plot.column and Plot.transform)

Capture d’écran 2022-03-01 à 22 13 14

@mbostock
Member Author

mbostock commented Apr 14, 2022

I’ve updated the linear regression prototype here:

https://observablehq.com/d/9fc6187b5f9d2293

Fil, your LOESS prototype is promising. One question I have is whether we should be sampling the LOESS prediction for every input datum, or if there is some other common method of sampling that will produce a smoother or more efficient output? It looks like there is a model.grid() function in LOESS for getting uniform samples.
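To make the sampling question concrete, here is a minimal sketch of the uniform alternative. It assumes only that the smoother exposes some `predict(x)` function; `sampleUniform` is a hypothetical helper, and it does not call the loess package's `model.grid()`, whose exact signature is not shown in this thread.

```javascript
// Evaluate a fitted model at k evenly spaced x values across [x0, x1],
// instead of once per input datum. Returns an array of [x, y] pairs.
function sampleUniform(predict, [x0, x1], k = 50) {
  if (k < 2) throw new RangeError("k must be at least 2");
  const step = (x1 - x0) / (k - 1);
  return Array.from({length: k}, (_, i) => {
    // Pin the last sample to x1 to avoid floating-point drift.
    const x = i === k - 1 ? x1 : x0 + i * step;
    return [x, predict(x)];
  });
}
```

With this approach the number of output points depends on `k` rather than on the size of the dataset, so a dense scatterplot does not produce an equally dense path.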

@mbostock
Member Author

There remains the question of how we might show the confidence interval (or the significance of the correlation) for a linear regression:

The first remark is that a linear regression is primarily a statistical analysis and modelling technique, and it's unfortunate if we can't get the model back from Plot. In particular, we would like to get the significance of the correlation back, not only the trend. Currently, if the correlation is too weak to be meaningful, Plot will happily display a line, which might be the wrong thing to do. We'd also want to get the "predict" function back, in most cases.

And whether it would be better to compute the linear regression in screen space, say as an initializer #801, so that it incorporates any scale’s mathematical transform (such as a log or sqrt scale).

And how we would plug in other regression implementations. My prototype is limited to linear, but if we want to support nonlinear regressions we’d need to generate a line rather than a link, as Fil does in the LOESS prototype.
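The screen-space idea can be illustrated outside of Plot. The sketch below is an assumption-laden example rather than the initializer from #801: it applies a scale-like transform (log on both axes by default) to the data, fits a straight line in the transformed space, and maps predictions back. `fitLine` and `fitInTransformedSpace` are hypothetical names.

```javascript
// Least-squares slope and intercept for v as a function of u.
function fitLine(us, vs) {
  const n = us.length;
  const mu = us.reduce((s, v) => s + v, 0) / n;
  const mv = vs.reduce((s, v) => s + v, 0) / n;
  let suu = 0, suv = 0;
  for (let i = 0; i < n; i++) {
    suu += (us[i] - mu) ** 2;
    suv += (us[i] - mu) * (vs[i] - mv);
  }
  const slope = suv / suu;
  return {slope, intercept: mv - slope * mu};
}

// Fit a line after applying the transforms fx and fy, and return a
// predict function expressed in the original data space.
// The log/exp defaults stand in for a log scale on both axes.
function fitInTransformedSpace(xs, ys, {fx = Math.log, fy = Math.log, fyInverse = Math.exp} = {}) {
  const {slope, intercept} = fitLine(xs.map(fx), ys.map(fy));
  return (x) => fyInverse(intercept + slope * fx(x));
}
```

The returned function is a straight line only in the transformed space. In data space it is generally a curve, which is one reason a sampled line is needed rather than a two-point link.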

@mbostock
Member Author

Also would we want some easy way to pull out R^2 and show summary statistics alongside the regression?
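As a point of reference, R^2 only needs the observed values and the model's predictions. The sketch below is a minimal illustration with a hypothetical `rSquared` helper. It takes any `predict` function, so it is not tied to a particular regression implementation.

```javascript
// Coefficient of determination: 1 - SSE / SST, where SSE is the sum of
// squared residuals and SST is the total sum of squares around the mean of ys.
function rSquared(xs, ys, predict) {
  const n = ys.length;
  const my = ys.reduce((s, v) => s + v, 0) / n;
  let sse = 0, sst = 0;
  for (let i = 0; i < n; i++) {
    sse += (ys[i] - predict(xs[i])) ** 2;
    sst += (ys[i] - my) ** 2;
  }
  return 1 - sse / sst;
}
```

For an ordinary least squares line this equals the squared Pearson correlation between x and y. The value is undefined (NaN) when all y values are identical, since SST is then zero.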

@nshiab

nshiab commented Apr 15, 2022

I think it's important to have the R^2, the equation and other statistics if possible. Here's what I ended up doing for one of my projects.

Capture d’écran 2022-04-15 à 13 45 53

I needed a facet wrap with a linear regression on each chart. I used d3-regression to get the predict function, the R^2 and the equation. With the predict function, I created the data for the line mark. The R^2 and the equation are in a paragraph above the chart. For the facet wrap, it's just multiple plots inside a div with display: flex.

The code is ugly, and it's not very flexible... But you get the idea of the final result I tried to achieve. It was important to have the R^2 to identify in which city the relationship was the strongest. The trend line alone wouldn't be enough.
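The per-city workflow described above can be sketched in plain JavaScript. This is not the code from that project and it does not use d3-regression: `summarizeByGroup` is a hypothetical helper, and the field names in the usage example are made up for illustration. For each group it returns the slope, the intercept, R^2, a formatted equation, and two endpoints that could feed a line mark.

```javascript
// Fit one least-squares line per group and collect summary statistics.
// rows: array of objects; key, x, y: property names (all caller-supplied).
function summarizeByGroup(rows, {key, x, y}) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row);
  }
  return Array.from(groups, ([name, group]) => {
    const xs = group.map((d) => d[x]);
    const ys = group.map((d) => d[y]);
    const n = xs.length;
    const mx = xs.reduce((s, v) => s + v, 0) / n;
    const my = ys.reduce((s, v) => s + v, 0) / n;
    let sxx = 0, sxy = 0, syy = 0;
    for (let i = 0; i < n; i++) {
      sxx += (xs[i] - mx) ** 2;
      sxy += (xs[i] - mx) * (ys[i] - my);
      syy += (ys[i] - my) ** 2;
    }
    const slope = sxy / sxx;
    const intercept = my - slope * mx;
    // For a simple linear fit, R^2 is the squared Pearson correlation.
    const r2 = (sxy * sxy) / (sxx * syy);
    const sign = intercept < 0 ? "-" : "+";
    const equation = `y = ${slope.toFixed(2)}x ${sign} ${Math.abs(intercept).toFixed(2)}`;
    // Two endpoints spanning the group's x extent, usable as line data.
    const line = [Math.min(...xs), Math.max(...xs)].map((v) => ({
      [x]: v,
      [y]: intercept + slope * v
    }));
    return {name, slope, intercept, r2, equation, line};
  });
}

// Usage with made-up data: one summary object per city.
const summaries = summarizeByGroup(
  [
    {city: "A", temp: 1, sales: 3},
    {city: "A", temp: 2, sales: 5},
    {city: "A", temp: 3, sales: 7},
    {city: "B", temp: 1, sales: 5},
    {city: "B", temp: 2, sales: 3},
    {city: "B", temp: 3, sales: 4}
  ],
  {key: "city", x: "temp", y: "sales"}
);
```

Sorting the summaries by `r2` shows in which group the relationship is strongest, which the trend lines alone would not reveal.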

@Fil
Contributor

Fil commented Apr 26, 2022
