Strange `eltype` behavior #1402

cscherrer · 2021-10-06T14:17:01Z

This seems problematic:

julia> using Distributions

julia> eltype(Normal())
Float64

julia> eltype(MvNormal(ones(3)))
Float64

I found this very surprising, as I'd guess many others have/will. In general, it seems reasonable to expect a law that for any d::Distribution,

typeof(rand(d)) <: eltype(d)

My biggest concern here is that I'd like to implement eltype in MeasureTheory.jl, and I don't know if people might expect it to behave as it does in Distributions.

Could there be a path to changing this?


devmotion · 2021-10-06T14:45:15Z

This has been the topic of multiple issues and PRs. The short version is: you can't expect typeof(rand(d)) <: eltype(d), as eltype currently refers only to the type of the parameters of d, not to the type of rand(d). And for multi-, matrix-, and higher-variate distributions it always refers to the unpacked type.

I'll close this issue since it is a duplicate of e.g. #1071.

cscherrer · 2021-10-06T15:12:07Z

Thanks for the link! Looks like this has been quite a saga. This design decision for eltype to refer to the parameters is very confusing, and seems inconsistent with the rest of Julia. I think it may be best for MeasureTheory to ignore this quirk and just focus on having its eltype consistent with Base.

oschulz · 2021-10-11T11:23:04Z

It would be nice to have a way to get the default variate type for a distribution, though (which should then match the type that comes out of rand).

devmotion · 2021-10-11T11:28:52Z

which should then match the type that comes out of rand.

The main issue here is that it is not always possible to compute the type of rand(d) in a nice way, e.g., if sampling is based on a complicated push forward operation or you want to use different "elementary" random samples, e.g. rand(), rand(Float32), rand(Float16), or rand(BigFloat) for different precision.

oschulz · 2021-10-11T11:32:29Z

You mean it would be difficult to "synchronize" the behavior of rand with a "variate type spec" attached to the distribution?

devmotion · 2021-10-11T11:42:52Z

Yes. I assume, if you want to include the type of rand(d) as a type information (which I think can be limiting in more complicated settings) then the cleanest way would be an additional type parameter (currently type parameters just indicate the type of the parameters but not the type of rand(d)).

mschauer · 2021-10-11T11:48:02Z

Yeah, you know there is a parallel infrastructure where the type of rand(d) is derived from ValueSupport and VariateForm in most cases https://github.com/JuliaStats/Distributions.jl/blob/625b72237a342c8d3bf60ec05541f8cb4a78faff/src/common.jl

devmotion · 2021-10-11T11:50:25Z

Sure, but unfortunately it is only a heuristic and requires e.g. float in Distributions.jl/src/univariates.jl (line 140 in 27fb31c):

rand!(rng, s, Array{float(eltype(s))}(undef, dims))

oschulz · 2021-10-11T11:50:39Z

I actually often wished for something that gives me the type of the variate/rand-result when writing generic code that deals with distributions. I don't think a type-parameter would work for that, since not all distributions (e.g. truncated) would have it, at least not all in a predictable position in the list of type parameters.

But couldn't we have a function vartype or variate_type? It wouldn't return an element type, but the full type of a single variate, so Float64, or Int (for univariate dists), or Vector{Float32}, etc. (for multivariate dists), and so on. It should be straightforward to implement for the "primitive/elementary" distributions, and I think "complex" distributions (truncated, product, etc.) could calculate/forward it correctly.

It would of course go hand-in-hand with VariateForm, but add the default numeric type, resp. precision.

devmotion · 2021-10-11T11:54:08Z

It wouldn't return an element type, but the full type of a single variate, so Float64, or Int, or Vector{Float32}, etc.

I think the only reliable way that always works is to call typeof(rand(d)) 🤷 I guess for many simple cases a heuristic such as float(eltype(d)) for continuous univariate distributions will work but it's not guaranteed to be correct.
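To make the caveat about the heuristic concrete, here is a minimal sketch that uses only the Random standard library. The `ToyScaled` type and the `paramtype` and `heuristic_type` helpers are hypothetical names invented for this illustration; they are not part of Distributions.jl. The sketch assumes a sampler that always draws its base noise with `randn(rng)`, which yields a `Float64`.

```julia
using Random

# Hypothetical toy distribution, invented for this illustration only.
struct ToyScaled{T<:Real}
    σ::T
end

# Stand-in for "eltype as the type of the parameters" discussed in the thread.
paramtype(::ToyScaled{T}) where {T} = T

# The base noise comes from randn(rng), which is Float64 regardless of T.
Base.rand(rng::AbstractRNG, d::ToyScaled) = d.σ * randn(rng)

# The heuristic from the thread: guess the sample type from the parameter type.
heuristic_type(d) = float(paramtype(d))

rng = MersenneTwister(1)

for d in (ToyScaled(2), ToyScaled(2f0), ToyScaled(big(2)))
    guess  = heuristic_type(d)
    actual = typeof(rand(rng, d))
    println(typeof(d), ": guess = ", guess, ", actual = ", actual)
end
```

With this particular sampler the guess agrees with the draw for `Int` and `BigInt` parameters, but not for `Float32` parameters, where the guess is `Float32` while the draw is promoted to `Float64`.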

oschulz · 2021-10-11T11:55:52Z

I think the only reliable way that always works ... I guess for many simple cases a heuristic

Well, whoever implements rand should be able to predict the outcome, right? :-) I didn't mean it as a heuristic, but as something that (at least for many distributions) would be implemented explicitly.

devmotion · 2021-10-11T12:18:36Z

Well, whoever implements rand should be able to predict the outcome, right?

I disagree in general here 🙂 Even if you implement rand by hand, it can be difficult to write down a function that computes the return type correctly, since you have to make sure that any shortcut works for all possible parameter types and RNGs (and desired types such as Float32 etc. if you let users specify them). [OT: there were many type instabilities in pdf/logpdf implementations because of such shortcuts, instead of just evaluating pdf/logpdf and returning oftype(result, -Inf) for values outside of the support.] And it becomes even trickier if you work e.g. with Turing models and only implement the model, possibly depending on a large number of parameters, and want to compute the return type when sampling from the model.

mschauer · 2021-10-11T12:35:51Z

We know how to deal with that: we have exactly the same behaviour and the same issues for iterators, and we introduced traits such as Base.EltypeUnknown accordingly. But I agree that this is just rehashing the same discussion in a new place.
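For readers unfamiliar with the iterator trait mentioned here, the following is a small sketch of how an iterator can declare that its element type is not known up front. `MixedStream` is a hypothetical type made up for this example; `Base.IteratorEltype`, `Base.EltypeUnknown`, `Base.IteratorSize`, and `Base.HasLength` are part of Julia's iteration interface.

```julia
# Hypothetical iterator whose values alternate between Int and Float64,
# so no single concrete element type can be promised in advance.
struct MixedStream
    n::Int
end

# Tell generic code not to rely on eltype for this iterator.
Base.IteratorEltype(::Type{MixedStream}) = Base.EltypeUnknown()
Base.IteratorSize(::Type{MixedStream}) = Base.HasLength()
Base.length(s::MixedStream) = s.n

function Base.iterate(s::MixedStream, i::Int = 1)
    i > s.n && return nothing
    value = isodd(i) ? i : float(i)  # Int for odd i, Float64 for even i
    return value, i + 1
end

# collect works out a suitable container element type from the values it sees.
xs = collect(MixedStream(4))
println(xs)
```

The analogy to the thread would be a distribution that declares "sample type unknown" instead of reporting a type that may be wrong; whether that fits Distributions.jl is exactly what is being debated here.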

cscherrer · 2021-10-11T12:55:53Z

On Zulip, @ExpandingMan pointed out that the law satisfied seems to be

eltype(d::Distribution) == eltype(rand(d))

It's reasonable to have a way to compute that, it's just not the thing I'm usually interested in. For MeasureTheory, I think it was @phipsgabler who suggested a sampletype function. We did implement this, though we don't lean on it too heavily, so it is mostly scaffolding at this point.

Thinking some more about this lately, I'm thinking the default implementation could be

sampletype(m::M) where {M<:AbstractMeasure} = Core.Compiler.return_type(rand, Tuple{M})

In some cases, this might break or give us something too wide, which we can narrow with added methods if we need to. I'd think Distributions could have something similar.

For the issue of RNG inputs, @devmotion I think your suggestion of another argument to rand seems to work well.

devmotion · 2021-10-11T13:04:47Z

eltype(d::Distribution) == eltype(rand(d))

As said before, this is also not a general property that is currently enforced or designed for; usually eltype really only depends on the parameters. E.g.

julia> using Distributions

julia> eltype(Normal{Int}(0, 1))
Int64

julia> eltype(rand(Normal{Int}(0, 1)))
Float64

julia> eltype(Dirichlet(5, 1))
Int64

julia> eltype(rand(Dirichlet(5, 1)))
Float64

I really think it is not a good idea to use Core.Compiler.return_type in any higher-level package or user-facing code, even if it is "only" a default implementation.

cscherrer · 2021-10-11T13:13:31Z

As said before, this is also not a general property that is currently enforced or designed for; usually eltype really only depends on the parameters.

Is there discussion somewhere explaining how this is a good thing?

I really think it is not a good idea to use Core.Compiler.return_type in any higher-level package or user-facing code, even if it is "only" a default implementation.

Why not? Seems like a great use of the abstract interpretation in the compiler (I assume that's how it works). Hopefully the abstract interpretation will itself be user-facing at some point, but until then I'd think this is a reasonable workaround.

devmotion · 2021-10-11T13:42:10Z

Is there discussion somewhere explaining how this is a good thing?

Yes, as mentioned multiple times, this discussion is completely redundant and just a duplicate of many older issues and PRs 😄 Here eltype(::Type{Normal{T}}) = T and eltype(::Type{<:Dirichlet{T}}) = T cause the Int, but of course the samples can't be Int in either case. One could bake the float(eltype(d)) heuristic into the definition of eltype already (i.e., make sure it does not return integer types), but this was deemed confusing and non-standard. And I have to admit, since it is a heuristic for the initialization of arrays for rand!, it seems more appropriate to put this heuristic in rand! and not try to be too clever in eltype. For other non-rand use cases it might actually be relevant and interesting whether it is a Normal{Int} or a Normal{Float64} (it also allows one to avoid static type parameters and just use eltype(d) in the function body). So with the current use of eltype as the element type of the parameters (which, as explained again in this discussion, is much easier to reason about and to define correctly), these examples are unavoidable. As I mentioned above, if you want to reason about typeof(rand(d)) (and possibly fix it a priori), it would be cleaner to add it as an additional type parameter (which could be initialized based on something like sampletype or just the current default Float64 for continuous univariate distributions) rather than mess with the reported element types of the parameters.

Why not?

Because it will break in all kinds of ways (e.g. JuliaLang/julia#41442 and JuliaLang/julia#35910) and is always allowed to return Any, and hence IMO it is not a safe default implementation. It seems much simpler to just define sampletype(d) = typeof(rand(d)) as fallback and provide optimizations in more restrictive scenarios whenever it is allowed.
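The two fallbacks under discussion can be compared side by side in a small sketch. `ToyNormal` is a hypothetical stand-in type and this `sampletype` is a local illustration, not the MeasureTheory.jl function; the explicit `rng` argument is also an assumption of the sketch. `Core.Compiler.return_type` is a compiler internal, so the sketch only relies on it returning some supertype of the actual result (possibly `Any`).

```julia
using Random

# Hypothetical toy type for illustration; not part of Distributions.jl
# or MeasureTheory.jl.
struct ToyNormal{T<:Real}
    μ::T
    σ::T
end

# Draws go through randn(rng), which yields Float64, so integer parameters
# still produce floating-point samples.
Base.rand(rng::AbstractRNG, d::ToyNormal) = d.μ + d.σ * randn(rng)

# Fallback in the spirit of the suggestion above: draw once and look at the result.
sampletype(rng::AbstractRNG, d) = typeof(rand(rng, d))

# Optional shortcut for a case where the answer is known without sampling.
sampletype(::AbstractRNG, ::ToyNormal{Int}) = Float64

rng = MersenneTwister(42)

println(sampletype(rng, ToyNormal(0, 1)))                # uses the shortcut
println(sampletype(rng, ToyNormal(0f0, 1f0)))            # uses the fallback
println(sampletype(rng, ToyNormal(big(0.0), big(1.0))))  # uses the fallback

# Inference-based alternative (compiler internal). Its result may be wider
# than the true sample type, up to and including Any.
inferred = Core.Compiler.return_type(rand, Tuple{MersenneTwister, ToyNormal{Float32}})
println(inferred)
```

The sampling fallback costs one draw and advances the RNG state, which is the price paid for always getting the concrete type.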

cscherrer · 2021-10-11T16:46:52Z

Yes, as mentioned multiple times

Ok, I saw discussion that it was that way, but didn't see anything about why it was that way.

The details you give are helpful for this, and it looks like we just expect entirely different use cases for the function. I think of eltype as answering "what type of values does this container hold?" (distributions are a kind of container). In some cases, Any might be all we know.

It seems you expect it to be more like, "What primitive type should be used to instantiate arrays constructed using this?" That's a fine question, just very different from what I expected. From the many issues and PRs, it seems pretty common for people to be surprised by this usage.

oschulz · 2021-10-11T19:51:02Z

distributions are a kind of container

I'm not sure if that's a good way to view distributions in all use cases, at least not as a container of variates. In retrospect, maybe defining size and eltype on distributions was a bit misleading - depending on what you do, you may focus on the parameters of the distributions or the variates. Now we have size returning the size of the variates and eltype returning the type of the dist params. :-)

That's why I thought we should maybe have a vartype or variate_type for the type of the variates, if technically feasible.

Apart from that, though - what should eltype return if a distribution has both Integer and Real parameters?

rfourquet · 2021-11-22T06:22:36Z

As I alluded to in #882 (comment), there is Random.gentype for the purpose described here. There was an issue about removing it (JuliaLang/julia#31968), but I would like to close it because of a couple of use cases I have, and the situation in Distributions.jl suggests also to not remove it.

mschauer · 2021-11-22T09:26:06Z

Now we have size returning the size of the variates and eltype returning the type of the dist params. :-)

One more of our original sins.

oschulz · 2021-11-22T12:46:23Z

and the situation in Distributions.jl suggests also to not remove it.

So we'd define gentype for distributions to return a/the (default) variate type?
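As a rough sketch of what that could look like, the example below hooks a hypothetical `ToyShiftedNormal` type into the Random standard library and reports its variate type through `Random.gentype`. The type and its sampling rule are invented for this illustration; whether Distributions.jl should define `gentype` this way is the open question in the thread, and `Random.gentype` is not an exported name.

```julia
using Random

# Hypothetical multivariate toy distribution with integer parameters.
struct ToyShiftedNormal
    μ::Vector{Int}
end

# Report the type of a single variate (not the parameter type).
Random.gentype(::Type{ToyShiftedNormal}) = Vector{Float64}

# Hook into the Random API: sampling is defined on the trivial sampler wrapper.
function Random.rand(rng::AbstractRNG, sp::Random.SamplerTrivial{ToyShiftedNormal})
    d = sp[]                              # unwrap the distribution object
    return d.μ .+ randn(rng, length(d.μ))
end

d = ToyShiftedNormal([0, 10, 20])

x  = rand(d)      # one variate
xs = rand(d, 4)   # array of variates, allocated by Random using gentype

println(Random.gentype(d))
println(typeof(x))
println(typeof(xs))
```

In this sketch the array method `rand(d, 4)` comes for free because Random allocates its output from `gentype`, which is one argument for keeping the two in sync.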

cscherrer · 2021-11-22T13:46:45Z

This looks great! And for non-distributional things it seems to act like eltype:

julia> Random.gentype([randn(3) for j in 1:4])
Vector{Float64} (alias for Array{Float64, 1})

So maybe we drop sampletype and use this? We should find a place to discuss what laws it's expected to follow, so we can avoid ending up in the eltype situation again.

oschulz · 2021-11-22T15:23:31Z

I'd love to have some official way to provide sample type information. I'm currently expanding the concept of NamedTuple-distributions and similar in ValueShapes.jl, and where samples aren't just scalars and arrays anymore this capability would be very useful.

devmotion closed this as completed Oct 6, 2021

mschauer changed the title ~~Strange eltype bahavior~~ Strange eltype behavior Oct 11, 2021

devmotion mentioned this issue Sep 26, 2023

Base.eltype for uniform continuous distributions #1766

Closed

devmotion mentioned this issue Nov 23, 2023

Add NamedTupleVariate and ProductNamedTupleDistribution #1803

Merged
