Draw contours for scatterplots #955

Closed
miakramer opened this issue Feb 4, 2017 · 11 comments · Fixed by #959

Comments

@miakramer

I'm using Geom.contour to draw contour lines on top of a scatter plot. Right now I'm creating my own 2D bins and passing those to it, but this has the disadvantage of not drawing a contour line around the outer portions of the points. Built-in support for this would be a really nice feature to have, though I don't know how hard it might be to implement efficiently.

@tlnagy
Member

tlnagy commented Feb 4, 2017

Could you give a minimal example (with a plot preferably) of what you would like to have?

@miakramer
Author

miakramer commented Feb 4, 2017

This image shows pretty well what would be useful. This is about as close as I could get. I used the following code to bin the data:

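# Bin the points of TABLE1 (a global DataFrame) into an xres-by-yres grid of counts.
function contourify(x::Symbol, y::Symbol, xmin::Int, xmax::Int, ymin::Int, ymax::Int, xres::Int, yres::Int)
    A = zeros(Int, (xres, yres))
    xdata  = TABLE1[x] .- xmin   # shift so the grid starts at 0
    ydata  = TABLE1[y] .- ymin
    xdelta = (xmax - xmin) / xres
    ydelta = (ymax - ymin) / yres
    for i in 1:length(xdata)
        # Clamp so a point exactly on the lower edge lands in bin 1 instead of
        # indexing bin 0; points outside the range are counted in the edge bins.
        xi = clamp(ceil(Int, xdata[i] / xdelta), 1, xres)
        yi = clamp(ceil(Int, ydata[i] / ydelta), 1, yres)
        A[xi, yi] += 1
    end
    # Return the counts plus the lower edge of each bin along x and y.
    A, [xmin + xdelta * i for i in 0:xres-1], [ymin + ydelta * i for i in 0:yres-1]
end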

I'm sure there are better ways, but I just wanted a quick and dirty solution as I'm not using a huge dataset. There's probably an even better way, something like finding the mean of all points, then drawing a curve around the closest n points to the centre, then the closest n+10 points, and so on.

I know that what's causing the problem with mine (tiny contour lines centred around the middle) is a more exponential falloff, but there should be some good way of dealing with that, I assume.
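
A rough sketch of the ranking step in the distance-from-the-mean idea above, written for current Julia. `nested_point_sets` and its `n` and `step` keywords are hypothetical names chosen for illustration, not part of Gadfly. It only selects the growing sets of points; turning each set into a closed curve is left out.

```julia
# Hypothetical helper: rank points by distance from their mean and return
# the indices of the closest n, n+step, n+2*step, ... points.
function nested_point_sets(xs::AbstractVector{<:Real}, ys::AbstractVector{<:Real};
                           n::Int=10, step::Int=10)
    length(xs) == length(ys) || throw(ArgumentError("xs and ys must have equal length"))
    cx = sum(xs) / length(xs)
    cy = sum(ys) / length(ys)
    # Squared distance is enough for ranking, so skip the square root.
    d2 = [(x - cx)^2 + (y - cy)^2 for (x, y) in zip(xs, ys)]
    order = sortperm(d2)
    [order[1:k] for k in n:step:length(order)]
end

# Five points near the origin and one far outlier.
xs = [0.0, 1.0, -1.0, 0.2, 0.1, 5.0]
ys = [0.0, 0.3, 0.0, 1.0, -1.0, 5.0]
sets = nested_point_sets(xs, ys; n=1, step=2)
```

Each returned set contains the previous one, so curves drawn around them would nest the way contour lines do.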

@bjarthur
Member

bjarthur commented Feb 5, 2017

i've got some code to do this. will post tomorrow.

@bjarthur
Member

bjarthur commented Feb 5, 2017

the "better way" is to use KernelDensity.jl instead of binning manually.

image

julia> using Gadfly, Distributions, Contour, KernelDensity

julia> function plot_contour_lines(xdata, ydata;
             xrange=extrema(xdata), yrange=extrema(ydata), zsteps=0.5.^collect(1:2:8))
         xstep=linspace(xrange...,100)
         ystep=linspace(yrange...,100)
         k=kde((xdata,ydata),(xstep,ystep))
         layers=[]
         for h in maximum(k.density).*zsteps
           c=Contour.contour(xstep,ystep,k.density,h)
           for l in lines(c)
             xvals, yvals = coordinates(l)
             push!(layers, layer(x=xvals, y=yvals, Geom.line(preserve_order=true),
                   Theme(default_color=colorant"red")))
           end
         end
         layers
       end
plot_contour_lines (generic function with 1 method)

julia> xdata = rand(Rayleigh(2),1000);

julia> ydata = rand(Rayleigh(2),1000);

julia> contours = plot_contour_lines(xdata, ydata);

julia> plot(contours..., layer(x=xdata, y=ydata, Geom.point))

would be great to see this built-in to Gadfly. let me know how i can help with that.
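
For readers wondering what a kernel density estimate computes compared with manual binning, here is a minimal sketch with no packages, written for current Julia. `naive_kde2d` and the fixed bandwidth `bw` are assumptions made for illustration; this is not how KernelDensity.jl is implemented, and in practice `kde` from that package is the better choice.

```julia
# Naive Gaussian kernel density estimate on a regular grid.
# Cost is (number of points) * (number of grid cells); illustrative only.
function naive_kde2d(xs, ys, gridx, gridy; bw::Real=1.0)
    n = length(xs)
    dens = zeros(length(gridx), length(gridy))
    scale = 1 / (2π * bw^2 * n)   # normalises each Gaussian and averages over points
    for (i, gx) in enumerate(gridx), (j, gy) in enumerate(gridy)
        s = 0.0
        for k in 1:n
            s += exp(-((gx - xs[k])^2 + (gy - ys[k])^2) / (2 * bw^2))
        end
        dens[i, j] = scale * s
    end
    dens
end

# A single point at the origin gives one Gaussian bump centred on the grid.
gridx = -5:0.5:5
dens = naive_kde2d([0.0], [0.0], gridx, gridx; bw=1.0)
```

Unlike bin counts, the resulting surface is smooth and falls off gradually past the outermost points, which is why contour lines can be drawn around them.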

@miakramer
Author

Awesome, thank you!

Yes, I agree this would make a good addition. Options for having linear, logarithmic, exponential, etc. density falloffs would be nice as well.
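
As a sketch of what such falloff options could look like, the two hypothetical helpers below produce contour heights as fractions of the peak density. The names `linear_levels` and `geometric_levels` and the `ratio` keyword are assumptions for illustration, not an existing Gadfly API. The `zsteps` default in the snippet earlier in the thread, `0.5.^collect(1:2:8)`, is a geometric spacing of this kind.

```julia
# Hypothetical helpers: n contour heights below a given peak density.

# Evenly spaced heights strictly between 0 and the peak.
linear_levels(peak::Real, n::Int) = [peak * i / (n + 1) for i in 1:n]

# Each height is `ratio` times the previous one, starting at ratio * peak.
geometric_levels(peak::Real, n::Int; ratio::Real=0.5) = [peak * ratio^i for i in 1:n]

lin = linear_levels(1.0, 3)
geo = geometric_levels(1.0, 3)
```

Geometric spacing puts more lines in the low-density outskirts, which helps when the density drops off quickly away from the centre.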

@tlnagy
Member

tlnagy commented Feb 7, 2017

Great work @bjarthur! Do you have any thoughts on what kind of interface might be best? It seems a bit specific right now.

@bjarthur
Member

bjarthur commented Feb 7, 2017

perhaps the aesthetics of Geom.contour could be refactored to input an array of scatter points, in addition to the already existing function or matrix. there could then be an optional flag which specified whether to plot them or not, in addition to the mandatory contour lines.

if that makes sense, i can draft a PR.

@tlnagy
Member

tlnagy commented Feb 7, 2017

What about doing something like what ggplot does?

ggplot(faithful, aes(waiting, eruptions)) +
  geom_density_2d()

from http://docs.ggplot2.org/current/geom_contour.html

So then you can plot(x=xs, y=ys, Geom.point, Geom.density2d). This is a bit more in line with Gadfly's composable interface.

@bjarthur
Member

bjarthur commented Feb 9, 2017

sounds good. will try to get to this next week.

@bjarthur
Member

@mxkramer see #959 and let me know what you think.

@tlnagy
Member

tlnagy commented Feb 23, 2017
