Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DiscreteNonParametric and Categorical Construction Issue #1832

Open
btmit opened this issue Jan 30, 2024 · 4 comments · May be fixed by #1908
Open

DiscreteNonParametric and Categorical Construction Issue #1832

btmit opened this issue Jan 30, 2024 · 4 comments · May be fixed by #1908

Comments

@btmit
Copy link

btmit commented Jan 30, 2024

Construction of a Categorical distribution seems to make a copy of the p vector. I see this through profiling, @btime and the fact that I can't see changes in the original vector after I create the Categorical. There are three issues I see:

  1. Categorical docstring includes the following: "Note: The input vector p is directly used as a field of the constructed distribution, without being copied." which seems incorrect.
  2. Performance issues in critical sections of code where this allocation can really add up
  3. Bugs such as the following:
using Distributions
x = rand(3,5)
x = x ./ sum(x, dims=1)  # each column is a valid probability vector
c = Categorical.(eachcol(x))

julia> c = Categorical.(eachcol(x))
ERROR: MethodError: Cannot convert an object of type Vector{Float64} to an object of type SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}

I believe the underlying issue is that the DiscreteNonParametric inner constructor tries to sort and reorder everything, which creates a copy and then the constructor doesn't update the type.

@devmotion devmotion linked a pull request Oct 2, 2024 that will close this issue
@JockLawrie
Copy link

JockLawrie commented Jan 8, 2025

Just an anecdote, I'm hitting the performance implication of this issue. I'm running discrete event simulations that construct Categorical distributions from several statistical models 100s of billions of times. The ability to reuse p would help greatly here. Alternatively, supplying a tuple instead of a vector would work too.

@devmotion
Copy link
Member

Did you compare it with #1908?

@JockLawrie
Copy link

Yes PR 1908 looks like it fixes this issue, thanks. Is there anything blocking it being merged?

@devmotion
Copy link
Member

It hasn't been approved yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants