Plotting methods #38

Closed
joshday opened this issue Sep 24, 2015 · 11 comments

Comments

@joshday
Owner

joshday commented Sep 24, 2015

Continued from JuliaPlots/Plots.jl#30

Since Plots.jl exists, we can now implement plotting methods with less worry about a user's preferred plotting package.

In the josh branch, I currently have a new traceplot!(o, b, data...; f) function which plots the point (x = nobs(o), y = f(o)) after every b observations. f defaults to v -> state(v)[1], but any function that takes the OnlineStat and returns a Real scalar or Vector should work.

Edit for example:

x = randn(10_000, 10)
beta = collect(1:10)
y = x*beta + randn(10_000)

o = SGModel(10)
traceplot!(o, 100, x, y)
@tbreloff
Collaborator

👍 Cool... this is certainly the type of thing I was thinking of.

It would also be great to add some "plotting recipes" that work with various stats. For example, it would be really cool to be able to do a pca/svd on a scatter plot and add overlays like the image from wikipedia:

[Image: PCA overlay on a scatter plot, from Wikipedia]

@tbreloff
Collaborator

I was just peeking through your plotmethods code... What do you think about adding a keyword argument ontrace::Function to the tracefit! method, and changing it to call that function instead of adding to and returning an array of values:

function tracefit!(o::OnlineStat, b::Integer, data...; batch::Bool = false, ontrace::Function = nop)
    b = @compat Int(b)
    n = nrows(data[1])
    i = 1
    s = state(o)
    #result = [copy(o)]
    while i <= n
        rng = i:min(i + b - 1, n)
        batch_data = map(x -> rows(x, rng), data)
        batch ? updatebatch!(o, batch_data...) : update!(o, batch_data...)
        #push!(result, copy(o))
        ontrace(o)
        i += b
    end
    #result
    return
end

# then implement the current functionality like (untested code):
o = Mean()
result = Mean[]
tracefit!(o, 1, rand(10); ontrace = o->push!(result, copy(o)))

The advantage here is that you can use the same tracefit! method to add to a plot or do something else other than return a bunch of copies of an object.

Although, I think this interface could be cleaned up further, so maybe add a new method with this functionality for now?
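
The callback idea above can be sketched without OnlineStats at all. This is a minimal illustration in current Julia syntax, not the package's implementation: RunningMean is a hypothetical stand-in for an OnlineStat, and the update! and tracefit! defined here are simplified versions that only handle a single data vector.

```julia
# Hypothetical stand-in for an OnlineStat (not the OnlineStats.jl type).
mutable struct RunningMean
    n::Int
    mu::Float64
end
RunningMean() = RunningMean(0, 0.0)

# Update with a single observation using the incremental mean formula.
function update!(o::RunningMean, x::Real)
    o.n += 1
    o.mu += (x - o.mu) / o.n
    return o
end

# Feed `data` in chunks of `b` observations and call `ontrace(o)` after each chunk.
function tracefit!(o::RunningMean, b::Integer, data::AbstractVector; ontrace = _ -> nothing)
    n = length(data)
    i = 1
    while i <= n
        rng = i:min(i + b - 1, n)
        for x in view(data, rng)
            update!(o, x)
        end
        ontrace(o)
        i += b
    end
    return o
end

# The caller decides what a "trace" is: here, (nobs, mean) pairs for a plot.
o = RunningMean()
xs = Int[]
ys = Float64[]
tracefit!(o, 3, collect(1.0:10.0); ontrace = s -> (push!(xs, s.n); push!(ys, s.mu)))
```

After the call, xs holds the observation counts at which the callback fired (3, 6, 9, 10) and ys holds the running mean at those points. Swapping the callback for one that appends to a plot would need no change to tracefit! itself.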

@joshday
Owner Author

joshday commented Oct 12, 2015

Since my end goal was making plots, I like this idea.

b should be changed to a keyword argument. It just looks so interrupting. In general tracefit! needs a rewrite.

@joshday
Owner Author

joshday commented Oct 12, 2015

What about something like this? It updates an OnlineStat (with all of data by default, but you can pass a different batch size), and calls a function after each update.

I think it could replace onlinefit!, tracefit!, and traceplot!. EDIT: Maybe not replace traceplot!, but traceplot! would call this function.

function update_do!(o::OnlineStat, data...;
        b::Integer = size(data[1], 1),
        dothis::Function = x -> nothing,
        batch::Bool = false
    )
    b = @compat Int(b)
    n = size(data[1], 1)
    i = 1
    while i <= n
        rng = i:min(i + b - 1, n)
        batch_data = map(x -> rows(x, rng), data)
        batch ? updatebatch!(o, batch_data...) : update!(o, batch_data...)
        i += b
        dothis(o)
    end
end

julia> o = OnlineStats.Mean();

julia> OnlineStats.update_do!(o, randn(100), b = 50, dothis = o -> println(nobs(o)))
50
100

@tbreloff
Collaborator

I certainly like it more as part of the core update function, but would argue that having both update! and update_do! doesn't change anything. Also while we're at it, how do you feel about a name change to fit!?

Here's a possible abstraction for callbacks, designed as functors:

julia> abstract OnlineCallback

julia> immutable DoEvery <: OnlineCallback
           b::Int
           f::Function
       end

julia> cb = DoEvery(5, o->println("callback for $o."))
DoEvery(5,(anonymous function))

julia> Base.call(cb::DoEvery, i::Integer, o) = mod1(i,cb.b)==cb.b ? cb.f(o) : nothing
call (generic function with 1501 methods)

julia> for i in 1:20
           cb(i,i^2)
       end
callback for 25.
callback for 100.
callback for 225.
callback for 400.

Then a rough draft for altering the method, assuming we can add a parameter which defines whether a stat should update batch or not:

immutable NoCallback <: OnlineCallback end
Base.call(cb::NoCallback, args...) = nothing

abstract BatchMode
immutable Batch <: BatchMode end
immutable Online <: BatchMode end

# onupdate is left untyped: NoCallback() is an OnlineCallback, not a Function
function fit!(o::OnlineStat{Online}, x, y; onupdate = NoCallback())
    for i in 1:size(x,1)
        fit!(o, row(x,i), row(y,i))
        onupdate(i, o)
    end
end

# onupdate is left untyped: NoCallback() is an OnlineCallback, not a Function
function fit!(o::OnlineStat{Batch}, x, y;
        onupdate = NoCallback(),
        b::Integer = default_batch_size(o))
    n = size(x, 1)
    i = 0
    while i < n
        rng = i+1:min(i + b, n)
        batchfit!(o,  rows(x,rng), rows(y,rng))
        i += b
        onupdate(i, o)
    end
end
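
For anyone reading this on a current Julia version: abstract, immutable, and Base.call overloads were replaced by abstract type, struct, and call syntax on the type itself. A sketch of the same functor callbacks in that syntax, with the type names taken from the draft above and nothing else assumed about OnlineStats:

```julia
abstract type OnlineCallback end

# Calls `f(o)` on every b-th update.
struct DoEvery{F} <: OnlineCallback
    b::Int
    f::F
end

# Making instances callable replaces the old `Base.call` overload.
(cb::DoEvery)(i::Integer, o) = mod1(i, cb.b) == cb.b ? cb.f(o) : nothing

# A callback that does nothing, usable as a default keyword value.
struct NoCallback <: OnlineCallback end
(cb::NoCallback)(args...) = nothing

# Same loop as the REPL session above, but collecting instead of printing.
seen = Int[]
cb = DoEvery(5, o -> push!(seen, o))
for i in 1:20
    cb(i, i^2)
end
```

Here seen ends up as [25, 100, 225, 400], matching the four lines printed in the REPL session. Because neither type is a subtype of Function, a keyword that accepts them cannot be annotated ::Function.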

@joshday
Owner Author

joshday commented Oct 12, 2015

It would be nice to extend fit!. I think update! is a better verb, (fit a variance doesn't sound right), but maybe it's too general and prone to name conflicts. I could go either way.

I'm worried that OnlineCallback/BatchMode is too elegant. With a few small changes to update_do! (including a better name), it seems like all that functionality could be included.

Do we just need a more general update! method that incorporates callbacks and batch updates?

@tbreloff
Collaborator

It's always bugged me that there are different verbs for batch update vs singleton updates. Singleton updates are just a batch size of 1, no?

I mostly agree with you about fit vs update. I'm just trying to reconcile the verbs here with the ones that may come out of JuliaML/LossFunctions.jl#3. I really want to be able to chain these things together into a pipeline, and having different verbs (which mean essentially the same thing) will ruin that. Maybe the primary verb there should be update as well (we could even consider leaving off the !, since the verb makes it clear that it mutates), and just define:

module LearnBase
  update(o, args...) = @not_implemented
  transform(o, args...) = @not_implemented

  const fit = update
  const solve = update
  const predict = transform
end

And anyone can choose the one that feels natural to them. Although maybe we should actually reference any verbs from StatsBase instead.
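
To make the aliasing concrete: const fit = update binds a second name to the same generic function, so a method added under one name is reachable through the other. A small sketch in current Julia syntax, where LearnBaseSketch and Doubler are hypothetical names used only for illustration and @not_implemented is replaced by empty generic functions:

```julia
module LearnBaseSketch   # hypothetical module name
    function update end      # generic function with no methods yet
    function transform end

    const fit = update
    const predict = transform
end

# A downstream package extends only the primary verb...
struct Doubler end
LearnBaseSketch.update(::Doubler, x) = 2x

# ...and callers may use whichever alias they prefer.
result = LearnBaseSketch.fit(Doubler(), 3)
```

Since fit and update are the same object, no forwarding methods are needed, but a reader of downstream code still has to know that the two names are synonyms.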

@Evizero

Evizero commented Oct 12, 2015

Personally I dislike fit as well. It doesn't sound right for many situations. I think it's just an artefact for historic reasons that originated from JuliaStats. To me the most natural would be train.

But from a community perspective I think that it's probably best if we just adopt the language standard of Julia. Using synonyms seems like an ugly solution and just makes the code less readable for outsiders. Let's not make it more complicated than it needs to be and just use fit if something learns from data.

@joshday
Owner Author

joshday commented Oct 12, 2015

Thanks for the link! A unified framework is enough to sell me on fit!.

@Evizero

Evizero commented Oct 12, 2015

Great to have you on board. There is a lot to learn from your package and I think we are at a good point in time to attempt such a design coordination. I think combined we cover enough of a scope so that others would likely follow (provided we succeed in this endeavour, which I am confident about).

@joshday
Owner Author

joshday commented Nov 5, 2015

#42 is relevant. All plotting methods should go in the plots.jl file which gets included with Requires.

@joshday joshday closed this as completed Nov 5, 2015