the future of Random.gentype #31968

Open
2 tasks
tpapp opened this issue May 8, 2019 · 21 comments · May be fixed by #32008
Labels
randomness Random number generation and the Random stdlib

Comments

@tpapp
Contributor

tpapp commented May 8, 2019

Random.gentype was introduced in #27756, so that custom random samplers would not have to use Base.eltype for returning element types, but the former falls back to the latter.

In the discussion of the recent docs cleanup PR #31787 it was unclear whether to make Random.gentype part of the API.

It would be great to decide

  • whether Random.gentype(sampler) or Base.eltype(sampler) should be the API for querying typeof(rand(sampler)), especially if sampler is not a collection.
  • whether Random.gentype is needed at all, or should be removed from the internal code, too.

@rfourquet, please comment.

@rfourquet
Member

Feedback in #27756 was very helpful in steering away from the initially proposed gentype API. As a result, not much substance is left in gentype, which behaves essentially like eltype. The takeaway is that:

  1. rand(T), when T is a type, is hardcoded to return an object of type T.
  2. gentype(x) == gentype(typeof(x))
  3. gentype(x) defaults to eltype(x)

Even if 1. looks like a restriction, it makes it simpler to reason about what type you get, and it doesn't prevent accomplishing anything.
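
The three rules above can be checked directly. This is only a sketch: Random.gentype is an unexported internal function, so the calls below assume a Julia version in which it still exists.

```julia
using Random

x = 1:3

# Rule 1: rand(T) for a type T returns a value of type T.
@assert rand(Float64) isa Float64
@assert rand(Bool) isa Bool

# Rule 2: gentype of a value agrees with gentype of its type.
@assert Random.gentype(x) == Random.gentype(typeof(x))

# Rule 3: with no specialized method, gentype falls back to eltype.
@assert Random.gentype(x) == eltype(x) == Int
```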

The main question is: for collections, do we want to allow gentype(x) != eltype(x) ?
For the same reason as 1., I now believe that we shouldn't: it's simpler to have the restrictive rule "rand on a collection returns an element from this collection". If we do this, then only one of gentype or eltype should be defined, which is difficult to enforce. The simplest option would then be to remove gentype and to extend the definition of eltype so that it applies to non-collection objects on which rand can be called.
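
A minimal sketch of what "eltype on a non-collection object" could look like in practice. The Die type is a hypothetical example; it follows the pattern of defining rand on Random.SamplerTrivial, with eltype supplying the element type when an array of values is requested.

```julia
using Random

# Hypothetical non-collection sampler: a die with a given number of sides.
struct Die
    nsides::Int
end

# Declare the type of the generated values through eltype.
Base.eltype(::Type{Die}) = Int

# A single draw: a uniform integer between 1 and nsides.
Random.rand(rng::AbstractRNG, d::Random.SamplerTrivial{Die}) = rand(rng, 1:d[].nsides)

one_roll = rand(Die(6))       # a single Int
many_rolls = rand(Die(6), 4)  # a Vector{Int}, thanks to the eltype method
```

Without the eltype method a single draw still works, but the array form has no way of knowing which element type to allocate.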

@rfourquet rfourquet added the randomness Random number generation and the Random stdlib label May 10, 2019
@tpapp
Contributor Author

tpapp commented May 11, 2019

Now that I have studied the code in detail, I think that removing Random.gentype and just using Base.eltype for this purpose would be innocuous and simplify the code.

An objection could be that this is punning on eltype to a certain extent.

I am fine with either solution though, but a decision should be made so that it can be documented as part of the Random API.

@rfourquet
Member

Now that I have studied the code in detail

Yay, I feel less pressure to not disappear under a bus now 😜 (not that I want to)

@tpapp
Contributor Author

tpapp commented May 12, 2019

I wonder whether I should redo #31988 once #31787 is merged, to demonstrate the simplifications in the code, or wait for a decision here.

My understanding is that Random.gentype was not part of the existing API, so no one should have relied on it and we can remove it without technically breaking anything. Of course, https://github.com/rfourquet/RandomExtensions.jl is an exception but I trust that fixing that won't be a problem.

However, I think that custom samplers in Random are a very nice interface now and more packages should be using them (in particular, Distributions.jl), which is why I want to help with this. But I recognize that clarifying this may not end up in 1.2 and we may have to wait for 1.3 for that.

@rfourquet
Member

rfourquet commented May 12, 2019

I wonder whether I should redo #31988 once #31787 is merged, to demonstrate the simplifications in the code, or wait for a decision here.

I will merge the doc change in #31990 when CI finishes, and you can then update #31988, or wait for a decision. But I don't think that code simplifications are important here: they are quite small, and negligible compared to the conceptual simplification for the user.

My understanding is that Random.gentype was not part of the existing API, so no one should have relied on it and we can remove it without technically breaking anything

You are correct, that's why I insisted that you don't introduce gentype in your doc changes. I believe I'm the only one having a package using this function, so this is indeed not a problem.

and more packages should be using it (in particular, Distributions.jl)

🙏 I have been meaning to tackle this, it's great to have you on board!

clarifying this may not end up in 1.2 and we could have to wait for 1.3 for that

Waiting for 1.3 is not a big problem. As of now, the official interface uses eltype, so we can just start with this. If gentype becomes official, it will be a matter of changing eltype to gentype, which is not urgent. I agree though that it would be great to sort out this question fast.

@tpapp tpapp linked a pull request May 12, 2019 that will close this issue
@tpapp
Contributor Author

tpapp commented May 14, 2019

@rfourquet: I am wondering what would be the best way to move forward with this minor issue. If you can think of core devs who would want to comment on this but may have missed it, please mention them.

I have the feeling that relatively few people follow the Random API closely, and also that the support for #27756 was lukewarm at best and no one would object to reverting it, i.e. just removing Random.gentype as in #32008.

@rfourquet
Member

I believe that we don't really need more support to remove gentype per se, but that we do to "upgrade" the eltype meaning/documentation (which I find important if we remove gentype). So for me the best way forward is to update the eltype docstring in your PR and to ask for feedback on this specific point.

@rfourquet
Member

I realised that there is a PR (#28704) wanting to add rand(::Pair), i.e. rand(1=>2) would be equivalent to rand((1, 2)). But there is opposition to this change: Pair is more than a 2-tuple semantically, so we may reserve this special case for a possible future use-case. I just added a comment there with such a possible use-case.
As a consequence, we may want rand(p::Pair) to return something of a different type than eltype(p)!
I.e. eltype(p) != gentype(p).
This is what I did in JuliaRandom/RandomExtensions.jl#4 (I do even "worse" there, by allowing gentype(p) to be a subtype of gentype(typeof(p)), to allow more tightly typed collections).

The take-away for me is that, if the API for rand(::Pair) is reserved, we should keep gentype internally. Of course, we could delete gentype for now and re-introduce it later if needed, but I don't see deleting gentype as urgent enough to warrant this code churn.

For now, my preferred solution is to keep gentype entirely internal, and to disallow having gentype(x) != eltype(x) except for Pair!

@tpapp
Contributor Author

tpapp commented Jun 4, 2019

Sorry, but I don't see the advantage of the syntax introduced in that PR, and for me it is not worth special-casing the Random API for this.

@rfourquet
Member

I'm not totally sold on this alternative API (from this PR), but just wanted to illustrate a possible meaning to rand(::Pair), which requires a distinct gentype function from eltype.
(One advantage is brevity and readability, which becomes apparent when you nest calls, but it's not the place to discuss it).

@tpapp
Contributor Author

tpapp commented Jun 5, 2019

FWIW, I think one should not become overenthusiastic about defining samplers for types. It should only be done where the choices are pretty standard and unambiguous, so that they can then be used as building blocks. I imagine most (all?) of these are already in Base or Random.

For anything beyond generating uniform random values of some T <: Real, I think that the right choice is to encapsulate the distribution information in a value, and define a sampler on that.
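
To illustrate "encapsulate the distribution information in a value", here is a hedged sketch. UniformBetween is a hypothetical type invented for this example; its type parameter carries the type of the generated values, and only Base.eltype is used to declare it.

```julia
using Random

# Hypothetical value that encodes a uniform distribution on the interval [lo, hi).
struct UniformBetween{T<:AbstractFloat}
    lo::T
    hi::T
end

# The generated type follows from the type parameter.
Base.eltype(::Type{UniformBetween{T}}) where {T} = T

# A single draw: rescale a uniform draw from [0, 1) of the same float type.
function Random.rand(rng::AbstractRNG, s::Random.SamplerTrivial{UniformBetween{T}}) where {T}
    u = s[]
    return u.lo + (u.hi - u.lo) * rand(rng, T)
end

u32 = UniformBetween(2.0f0, 3.0f0)
single = rand(u32)     # a Float32
batch = rand(u32, 5)   # a Vector{Float32}
```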

From this perspective, Random.gentype should not be needed.

@rfourquet
Member

I mostly agree with what you said, but don't understand the conclusion:

From this perspective, Random.gentype should not be needed.

Whether it's called gentype or eltype, this function must be defined for these values which encapsulate the distribution information... Did I misunderstand?

@tpapp
Contributor Author

tpapp commented Jun 5, 2019

Sorry, I was not clear. I meant that a separate function is not needed and we can just use Base.eltype for this purpose.

@rfourquet
Member

Ok. But gentype was introduced only for values anyway, as the generated values from a type are supposed to always be values of that type. So the question remains.
I think I still slightly favor removing gentype for simplicity, but I don't see the urgency. We may wait longer to see if the extra flexibility coming from gentype is useful enough (I saw that there are open issues in the Distributions package related to eltype; I wonder what its devs think about this issue).

@tpapp
Contributor Author

tpapp commented Jun 5, 2019

I agree about the lack of urgency, since the docs now talk about eltype, so the user does not need to have anything to do with Random.gentype.

I wonder if there is a label you could apply to the issue that suggests revisiting it at some point. Maybe the 1.3 milestone, or is that too early?

@rfourquet
Member

rfourquet commented Jun 5, 2019

The mere fact that it's an open issue should in theory be enough to suggest revisiting at some point ;-)
Of course, I know that it's not always the case in practice. But I guess this issue will be reminded to us when a question arises somewhere else on this topic.

@zsunberg
Contributor

I'm not sure if gentype needs to be part of the interface (it may be better to rely on compiler type inference than making people implement it), but I'd like to weigh in strongly in favor of gentype NOT being replaced by eltype. Distributions.jl has treated them differently, e.g.

julia> eltype(MvNormal(zeros(2)))
Float64

In this case, if one interprets MvNormal(zeros(2)) to be a random variable distributed according to a zero mean multivariate normal distribution, then it makes total sense to define eltype to be the type of the elements of this random variable. In my opinion gentype represents a distinct concept and eltype should not be stretched far enough to consume it.
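
The distinction can be sketched without Distributions.jl. ToyIsoNormal below is a hypothetical stand-in, not the MvNormal type, and overriding the internal Random.gentype is an assumption about how a separate "type of a draw" could be declared.

```julia
using Random

# Hypothetical stand-in for a multivariate distribution (not Distributions.MvNormal).
struct ToyIsoNormal
    dim::Int
end

# Mirror the convention quoted above: eltype is the scalar type of the components.
Base.eltype(::Type{ToyIsoNormal}) = Float64

# Declare the type of a whole draw separately, through the internal gentype.
Random.gentype(::Type{ToyIsoNormal}) = Vector{Float64}

# One draw is a vector of independent standard normal values.
Random.rand(rng::AbstractRNG, s::Random.SamplerTrivial{ToyIsoNormal}) = randn(rng, s[].dim)

d = ToyIsoNormal(2)
x = rand(d)
```

Here eltype(d) is Float64 while typeof(x) is Vector{Float64}, so the two queries answer different questions about the same object.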

@tpapp
Contributor Author

tpapp commented Sep 10, 2019

then it makes total sense to define eltype to be the type of the elements of this random variable

I am not sure about this: it would be undefined for non-array samplers.

Ideally, we would have a way to describe array shapes & eltypes without being concrete, not unlike the information consumed by Base.similar (but without the type). This description could be returned and then queried for the type when applicable.

Whether this should be eltype or gentype is an open question.

@zsunberg
Contributor

it would be undefined for non-array samplers.

julia> eltype(1.0)
Float64

julia> using Distributions; eltype(Normal())
Float64

This makes sense to me as a definition for a non-array distribution since eltype of a scalar is that scalar value's type.

@hyrodium
Contributor

hyrodium commented Jan 2, 2024

I propose the following regarding Random.gentype:

  • Retain Random.gentype.
  • Update the Random.gentype behavior with Random.gentype(T::DataType) = T.

This would allow for code such as:

function f(X, n)
    v = Vector{Random.gentype(X)}(undef, n)
    for i in 1:n
        v[i] = rand(X)
    end
    return v
end

This function operates effectively whether X is a collection or a type, as demonstrated below:

julia> using Random, StaticArrays

julia> function f(X, n)
           v = Vector{Random.gentype(X)}(undef, n)
           for i in 1:n
               v[i] = rand(X)
           end
           return v
       end
f (generic function with 1 method)

julia> f(4:6, 5)  # Works fine with collection `X`
5-element Vector{Int64}:
 4
 5
 6
 6
 5

julia> f(Bool, 5)  # Works fine with `X === Bool` because `Random.gentype(Bool) == Bool`
5-element Vector{Bool}:
 1
 0
 1
 0
 1

julia> f(SVector{2,Float64}, 5)  # Throws an error because `Random.gentype(SVector{2,Float64}) == Float64`
ERROR: MethodError: Cannot `convert` an object of type SVector{2, Float64} to an object of type Float64

Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number
   @ Base number.jl:6
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:84
  convert(::Type{T}, ::Number) where T<:Number
   @ Base number.jl:7
  ...

Stacktrace:
[...]

julia> Random.gentype(T::DataType) = T

julia> f(SVector{2,Float64}, 5)  # Correct!
5-element Vector{SVector{2, Float64}}:
 [0.10326006658523756, 0.14320469574471628]
 [0.8780611866560915, 0.6063184375760816]
 [0.6257441894822869, 0.4733629411318736]
 [0.07619040358834162, 0.2326119259240491]
 [0.15946024991595642, 0.2648635245489229]
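
For comparison, a variant of the function f above that relies on the generated values rather than on gentype, along the lines of the type-inference suggestion made earlier in the thread. The name g is hypothetical; this is a sketch, not a proposed API.

```julia
using Random

# Let the comprehension pick the element type from the generated values,
# so neither gentype nor eltype has to be queried.
g(X, n) = [rand(X) for _ in 1:n]

from_range = g(4:6, 5)     # Vector{Int}
from_bool = g(Bool, 5)     # Vector{Bool}
from_float = g(Float64, 5) # Vector{Float64}
```

The trade-off is that for n == 0 no value is generated, so the element type of the empty result depends on what the compiler can infer.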

@hyrodium
Contributor

hyrodium commented Jan 2, 2024

Ah, there was a similar discussion: #27756 (comment)

I think gentype(x) != gentype(typeof(x)) is not that problematic, and documenting gentype as "gentype(x) returns the type of rand(x). Note that gentype(x) may not be equal to gentype(typeof(x))." would be sufficient.

4 participants