Separate bandwidth in x and y directions? + NRD formula #27
Comments
I've now implemented this in various test notebooks that are not yet fully ready (coming soon). I'm enthusiastic about the idea of selecting the bandwidth relative to the variance of each dimension, but there are already a few observations I can share:

First, there is an obvious case where it doesn't work: when the data has no variance (maybe it's a single point, or points that show very small variation in y, for some reason). In those cases we wouldn't want the density contours to be flattened to a line. So there's going to be a sort of minimum, which can be set at the (arbitrary) default value of 20 pixels that exists already.

Second, the way it looks is a bit underwhelming. The current strategy creates "circles" around the data, whereas separate x/y bandwidths create "ellipses" (on purpose). Certainly nicer for statistics, but not as nice on the eye. So I would not want the default bandwidth generator to produce a different aspect ratio.

Third, the nrd formula returns values that don't coincide with the way we use the given bandwidth. (Currently bandwidth represents, let's say, the radius of 1 iteration of blurring on a 4x grid, whereas in the literature it's something like the std dev of the gaussian.) In my experiments, the scale factor between these values is about 5. As a consequence, either we change the meaning of bandwidth, and users will have to rescale their hand-tuned bandwidths (my experience is that it's always hand-tuned to give a "nice" graph), or we keep the same "bandwidth" and scale nrd to match what it's supposed to deliver, but then its statistical properties are incorrect. Maybe a solution could be to deprecate bandwidth() and replace it with a new name like blur() or something.
Here's an implementation that seems to work, based on the new d3.blur proposal. The remarks above still stand.
I figure that as a first step we should ship a version that accepts x/y bandwidths as inputs and allows experimentation (this depends in turn on d3.blur, d3/d3-array#151). For the nrd stuff, I'd wait for serious statisticians to test and validate the approach.
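To make the per-dimension idea in the comments above concrete, here is a minimal sketch, not the implementation linked in the thread. It assumes the usual normal reference distribution rule, `1.06 * min(sd, IQR / 1.34) * n^(-1/5)`, applied separately to x and y, with a floor so that data with no variance does not collapse to a line. The names `nrd`, `quantileSorted`, `bandwidthXY` and the `minimum` parameter are hypothetical, and the sketch does not address the factor of about 5 between the nrd value and the current meaning of bandwidth.

```javascript
// Hypothetical helpers for illustration; none of this is d3-contour API.

// Linear-interpolated quantile of an already sorted numeric array.
function quantileSorted(sorted, p) {
  const i = (sorted.length - 1) * p;
  const lo = Math.floor(i);
  const hi = Math.ceil(i);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (i - lo);
}

// Normal reference distribution rule for one dimension (assumed form).
function nrd(values) {
  const n = values.length;
  if (n < 2) return 0; // a single point has no spread
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  const sorted = Float64Array.from(values).sort(); // typed arrays sort numerically
  const iqr = quantileSorted(sorted, 0.75) - quantileSorted(sorted, 0.25);
  return 1.06 * Math.min(sd, iqr / 1.34) * Math.pow(n, -0.2);
}

// One bandwidth per dimension, never below `minimum` (20 mirrors the
// existing default mentioned in the thread; points are assumed to be
// [x, y] pairs in pixel coordinates).
function bandwidthXY(points, minimum = 20) {
  const bx = nrd(points.map(d => d[0]));
  const by = nrd(points.map(d => d[1]));
  return [Math.max(minimum, bx), Math.max(minimum, by)];
}
```

With points that all share the same y value, the y bandwidth falls back to the floor while the x bandwidth follows the spread of the data.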
As discussed over at the Observable forum, it might be nice for the bandwidth to accept a 2-entry list or an object with `x` and `y` attributes or the like, especially since internally the implementation is already blurring separately in `x` and `y` directions.
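As a sketch of what such an input could look like, the hypothetical helper below normalizes a single number, a 2-entry list, or an object with `x` and `y` into one `{x, y}` pair. The name `normalizeBandwidth`, the returned shape, and the error message are assumptions for illustration, not existing d3-contour behavior.

```javascript
// Hypothetical helper: accept a number, [bx, by], or {x, y}.
function normalizeBandwidth(bandwidth) {
  let x, y;
  if (typeof bandwidth === "number") {
    x = y = bandwidth; // same bandwidth in both directions
  } else if (Array.isArray(bandwidth) && bandwidth.length === 2) {
    [x, y] = bandwidth;
  } else if (bandwidth !== null && typeof bandwidth === "object") {
    ({x, y} = bandwidth);
  } else {
    throw new Error("invalid bandwidth");
  }
  x = +x;
  y = +y;
  // Reject NaN and negative values in either direction.
  if (!(x >= 0) || !(y >= 0)) throw new Error("invalid bandwidth");
  return {x, y};
}
```

A single number keeps the current behavior, so existing hand-tuned values would not need to change under this assumption.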