Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange eltype behavior #1402

Closed
cscherrer opened this issue Oct 6, 2021 · 24 comments
Closed

Strange eltype behavior #1402

cscherrer opened this issue Oct 6, 2021 · 24 comments

Comments

@cscherrer
Copy link

This seems problematic:

julia> using Distributions

julia> eltype(Normal())
Float64

julia> eltype(MvNormal(ones(3)))
Float64

I found this very surprising, as I'd guess many others have/will. In general, it seems reasonable to expect a law that for any d::Distribution,

typeof(rand(d)) <: eltype(d)

My biggest concern here is that I'd like to implement eltype in MeasureTheory.jl, and I don't know if people might expect it to behave as it does in Distributions.

Could there be a path to changing this?

@devmotion
Copy link
Member

This has been the topic of multiple issues and PRs. The short version is: you can't expect typeof(rand(d)) <: eltype(d) as eltype only refers to the type of the parameters of d but not the type of rand(d) currently. And for multi-, matrix-, and highervariate distributions it always refers to the unpacked type.

I'll close this issue since it is a duplicate of eg #1071.

@cscherrer
Copy link
Author

Thanks for the link! Looks like this has been quite a saga. This design decision for eltype to refer to the parameters is very confusing, and seems inconsistent with the rest of Julia. I think it may be best for MeasureTheory to ignore this quirk and just focus of having its eltype consistent with Base.

@oschulz
Copy link
Contributor

oschulz commented Oct 11, 2021

It would be nice to have a way to get the default variate type for a distribution, though (which should then match the type that comes out of rand).

@devmotion
Copy link
Member

which should then match the type that comes out of rand.

The main issue here is that it is not always possible to compute the type of rand(d) in a nice way, e.g., if sampling is based on a complicated push forward operation or you want to use different "elementary" random samples, e.g. rand(), rand(Float32), rand(Float16), or rand(BigFloat) for different precision.

@oschulz
Copy link
Contributor

oschulz commented Oct 11, 2021

You mean it would be difficult to "synchronize" the behavior of rand with a "variate type spec" attached to the distribution?

@mschauer mschauer changed the title Strange eltype bahavior Strange eltype behavior Oct 11, 2021
@devmotion
Copy link
Member

Yes. I assume, if you want to include the type of rand(d) as a type information (which I think can be limiting in more complicated settings) then the cleanest way would be an additional type parameter (currently type parameters just indicate the type of the parameters but not the type of rand(d)).

@mschauer
Copy link
Member

Yeah, you know there is a parallel infrastructure where the type of rand(d) is derived from ValueSupport and VariateForm in most cases https://github.com/JuliaStats/Distributions.jl/blob/625b72237a342c8d3bf60ec05541f8cb4a78faff/src/common.jl

@devmotion
Copy link
Member

Sure but unfortunately it is only a heuristic and requires e.g. float in

rand!(rng, s, Array{float(eltype(s))}(undef, dims))
.

@oschulz
Copy link
Contributor

oschulz commented Oct 11, 2021

I actually often wished for something that gives me the type of the variate/rand-result when writing generic code that deals with distributions. I don't think a type-parameter would work for that, since not all distributions (e.g. truncated) would have it, at least not all in a predictable position in the list of type parameters.

But couldn't we have a function vartype or variate_type? It wouldn't return an element type, but the full type of a single variate, so Float64, or Int (for univariate dists), or Vector{Float32}, etc. (for multivariate dists), and so on. It should be straightforward to implement for the "primitive/elementary" distributions, and I think "complex" distributions (truncated, product, etc.) could calculate/forward it correctly.

It would of course go hand-in-hand with VariateForm, but add the default numeric type, resp. precision.

@devmotion
Copy link
Member

It wouldn't return an element type, but the full type of a single variate, so Float64, or Int, or Vector{Float32}, etc.

I think the only reliable way that always works is to call typeof(rand(d)) 🤷 I guess for many simple cases a heuristic such as float(eltype(d)) for continuous univariate distributions will work but it's not guaranteed to be correct.

@oschulz
Copy link
Contributor

oschulz commented Oct 11, 2021

I think the only reliable way that always works ... I guess for many simple cases a heuristic

Well, however implements rand should be able to predict the outcome, right? :-) It didn't mean it as a heuristic, but as something that (at least for many distributions) would be implemented explicitly.

@devmotion
Copy link
Member

Well, however implements rand should be able to predict the outcome, right?

I disagree in general here 🙂 Even if you implement rand by hand it can be difficult to write down a function that computes the return type correctly since you have to make sure that any shortcut works for all possible parameter types and RNGs (and desired types such as Float32 etc. if you let users specify them) [OT: there were many type instabilities in pdf/logpdf implementations because of such shortcuts instead of just evaluating pdf/logpdf and returning oftype(result, -Inf) for values outside of the support]. And it becomes even trickier if you work eg. with Turing models and only implement the model, possibly depending on a large number of parameters, and want to compute the return type when sampling from the model.

@mschauer
Copy link
Member

We know how to deal with that, we have exactly the same behaviour, the same issues for iterators and we introduced traits Base.EltypeUnknown accordingly but I agree that this is just rehashing the same discussion in a new place.

@cscherrer
Copy link
Author

On Zulip, @ExpandingMan pointed out that the law satisfied seems to be

eltype(d::Distribution) == eltype(rand(d))

It's reasonable to have a way to compute that, it's just not the thing I'm usually interested in. For MeasureTheory, I think it was @phipsgabler who suggested a sampletype function. We did implement this, though we don't lean on it too heavily so it mostly scaffolding to this point.

Thinking some more about this lately, I'm thinking the default implementation could be

sampletype(m::M) where {M<:AbstractMeasure}= Core.Compiler.return_type(rand, Tuple{M})

In some cases, this might break or give us something too wide, which we can narrow with added methods if we need to. I'd think Distributions could have something similar.

For the issue of RNG inputs, @devmotion I think your suggestion of another argument to rand seems to work well.

@devmotion
Copy link
Member

eltype(d::Distribution) == eltype(rand(d))

As said before, this is also not a general property enforced or a design currently, usually eltype really only depends on the parameters. E.g.

julia> using Distributions

julia> eltype(Normal{Int}(0, 1))
Int64

julia> eltype(rand(Normal{Int}(0, 1)))
Float64

julia> eltype(Dirichlet(5, 1))
Int64

julia> eltype(rand(Dirichlet(5, 1)))
Float64

I really think it is not a good idea to use Core.Compiler.return_type in any higher-level package or user-facing code, even if it is "only" a default implementation.

@cscherrer
Copy link
Author

As said before, this is also not a general property enforced or a design currently, usually eltype really only depends on the parameters.

Is there discussion somewhere explaining how this is a good thing?

I really think it is not a good idea to use Core.Compiler.return_type in any higher-level package or user-facing code, even if it is "only" a default implementation.

Why not? Seems like a great use of the abstract interpretation in the compiler (I assume that's how it works). Hopefully the abstract interpretation will itself be user-facing at some point, but until then this I'd think this is a reasonable workaround.

@devmotion
Copy link
Member

Is there discussion somewhere explaining how this is a good thing?

Yes, as mentioned multiple times this discussion here is completely redundant and just a duplicate of many older issues and PRs 😄 Here eltype(::Type{Normal{T}}) = T and eltype(::Type{<:Dirichlet{T}}) = T causes the Int but of course the samples can't be Int in both cases. One could bake in the float(eltype(d)) heuristic in the definition of eltype already (i.e., make sure it does not return integer types) but this was deemed to be confusing and non-standard. And I have to admit, since it is a heuristic for the initialization of arrays for rand!, it seems more appropriate to put this heuristic in rand! and don't try to be too clever in eltype - for other non-rand use cases it might actually be relevant and interesting if it is a Normal{Int} or Normal{Float64} (it allows also to avoid static type parameters and just use eltype(d) in the function body). So with the current use of eltype as the element type of the parameters (which as explained in this discussion again is much easier to reason about and to define correctly) these examples are unavoidable. As I mentioned above, if you want to reason about typeof(rand(d)) (and possibly fix it a priori) it would be cleaner to add it as an additional type parameter (could be initialized based on something like sampletype or just the current default Float64 for continuous univariate distributions) but not mess with the reported element types of the parameters.

Why not?

Because it will break in all kinds of ways (e.g. JuliaLang/julia#41442 and JuliaLang/julia#35910) and is always allowed to return Any, and hence IMO it is not a safe default implementation. It seems much simpler to just define sampletype(d) = typeof(rand(d)) as fallback and provide optimizations in more restrictive scenarios whenever it is allowed.

@cscherrer
Copy link
Author

Yes, as mentioned multiple times

Ok, I saw discussion that it was that way, but didn't see anything about why it was that way.

The details you give are helpful for this, and it looks like we just expect entirely different use cases for the function. I think of eltype as answering "what type of values does this container hold?" (distributions are a kind of container). In some cases, Any might be all we know.

It seems you expect it to be more like, "What primitive type should be used to instantiate arrays constructed using this?". That a fine question, just very different than I expected. From the many issues and PRs, it seems pretty common for people to be surprised by this usage.

@oschulz
Copy link
Contributor

oschulz commented Oct 11, 2021

distributions are a kind of container

I'm not sure if that's a good way to view distributions in all use cases, at least not as a container of variates. In retrospect, maybe defining size and eltype on distributions was a bit misleading - depending on what you do, you may focus on the parameters of the distributions or the variates. Now we have size returning the size of the variates and eltype returning the type of the dist params. :-)

That's why I thought we should maybe have a vartype or variate_type for the type of the variates, if technically feasible.

Apart from that, though - what should eltype return if a distribution has both Integer and Real parameters?

@rfourquet
Copy link

As I alluded to in #882 (comment), there is Random.gentype for the purpose described here. There was an issue about removing it (JuliaLang/julia#31968), but I would like to close it because of a couple of use cases I have, and the situation in Distributions.jl suggests also to not remove it.

@mschauer
Copy link
Member

Now we have size returning the size of the variates and eltype returning the type of the dist params. :-)

One more of our original sins.

@oschulz
Copy link
Contributor

oschulz commented Nov 22, 2021

and the situation in Distributions.jl suggests also to not remove it.

So we'd define gentype for distributions to return a/the (default) variate type?

@cscherrer
Copy link
Author

This looks great! And for non-distributional things it seems to act like eltype:

julia> Random.gentype([randn(3) for j in 1:4])
Vector{Float64} (alias for Array{Float64, 1})

So maybe we drop sampletype and use this? We should find a place to discuss what laws it's expected to follow, so we can avoid ending up in the eltype situation again.

@oschulz
Copy link
Contributor

oschulz commented Nov 22, 2021

I'd love to have some official way to provide sample type information. I'm currently expanding the concept of NamedTuple-distributions and similar in ValueShapes.jl, and were samples aren't just scalars and arrays anymore this capability would be very useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants