Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uniform for Generic #70

Merged
merged 7 commits into from
Aug 20, 2020
Merged

Uniform for Generic #70

merged 7 commits into from
Aug 20, 2020

Conversation

Bodigrim
Copy link
Contributor

Cf. #21 and partially #26

> :set -XDeriveGeneric -XDeriveAnyClass
> import GHC.Generics (Generic)
> data Action = Code Bool | Eat (Maybe Bool) | Sleep deriving (Show, Generic, Uniform)
> gen <- newIOGenM (mkStdGen 42)
> uniformListM 10 gen :: IO [Action]
[Code False,Code False,Eat Nothing,Code True,Eat (Just False),Eat (Just True),Eat Nothing,Eat (Just False),Sleep,Code True]

@Shimuuar
Copy link
Contributor

There's interesting question about efficiency for generation of pure product types. For large products (say Foo Word64 Word64 Word64 it's obviously more efficient to generate each Word64 separately. Multiplication and division for long arithmetic is not terribly fast and superlinear in bit size. However for cases like Bar D3 D3 D3 it could be more efficient to generate uniform number on range [0,27) and then do two divisions.

@Bodigrim
Copy link
Contributor Author

I'm not particularly concerned about performance here - Generic instances are expected to be less efficient than hand-written ones. It would be nice to pickup low-hanging fruits if any, but I do not want to complicate the design a lot.

@Bodigrim
Copy link
Contributor Author

Since cardinality is hidden from users, we can potentially make it not Integer, but Either Int Integer, where Left 32 stands for 2^32 elements. And GFinite (a :*: b) could use this information, switching to shifts instead of divisions.

@Shimuuar
Copy link
Contributor

I disagree about performance. I certainly expect that performance of generic instances would be similar to naively written ones it also should avoid pathological performance cases if it's possible. Otherwise we get "this is standard way to derive instances except don't use it this and that case" (it will be used anyway). And generic deriving is place where it's reasonable to spend optimization effort even if makes things more complicated. Cost is paid once but benefit is spread over all users.

@Bodigrim
Copy link
Contributor Author

Any ideas why doctests pass in all builds except GHC 8.10? https://travis-ci.org/github/haskell/random/jobs/702872773 CC @lehins

@lehins
Copy link
Contributor

lehins commented Jun 28, 2020

Any ideas why doctests pass in all builds except GHC 8.10?

@Bodigrim I am not sure, I've been seeing this with my own packages too, but I haven't investigated it much. Feels like something has changed with ghc-8.10 and Cabal combination which prevents doctests to function properly. I usually only care that doctests are executed with some ghc version, because this way there is an already pretty good chance that examples in haddock are still good.

Really seems like an upstream bug and I try to stay away from digging into build tools too much and let someone else figure it out :)

We could for now disable doctests with CPP for ghc-8.10, they are already disabled for < ghc-8.2

If you feel like investigating it further though, please let me know what you find.

@Bodigrim
Copy link
Contributor Author

Bodigrim commented Jun 28, 2020

@lehins Yeah, I've seen this kind of arbitrary doctest failures with GHC 8.10 before, in my other projects. Disabling for now.

@Shimuuar Well, one can certainly get more mileage generating made-to-measure instances via TemplateHaskell, but I'm against this approach in random. However, I implemented the trick described above to replace long divisions/multiplications with shifts and sprinkled {-# INLINE #-} generously.

@Shimuuar
Copy link
Contributor

I thought a little more about it and come to conclusion that there's much better approach for product types. Current one defines uniform in terms of Finite which requires user to define Finite instance for user data types which isn't that terrible but it becomes impossible once you start working with real-valued data. Let take for example

newtype Angle = Angle Double

instance Uniform Angle where
  uniformM g = Angle . ((2*pi)*) <$> uniformM g

It's perfectly valid uniform instance yet there's no sane Finite instance. Make it point on sphere and there's no instance at all.

I think better approach is to define generic Uniform for product in terms Uniform and fall back on Finite shenanigans for sum types. Something along the lines:

class GUniform f where
  guniformM :: StatefulGen g m => g -> m (f p)

instance GUniform f => GUniform (M1 i c f) where
  guniformM = fmap M1 . guniformM

instance Uniform a => GUniform (K1 i a) where
  guniformM = fmap K1 . uniformM

instance GUniform U1 where
  guniformM _ = return U1

instance (GUniform f, GUniform g) => GUniform (f :*: g) where
  guniformM g = (:*:) <$> guniformM g <*> guniformM g

instance (...) => GUniform (f :+: g) where
  guniformM g = {- uses Finite -}

I think it's better since it avoids introducing Finite in common case of product types. So it appears only when dealing with sum types.

@Bodigrim
Copy link
Contributor Author

Bodigrim commented Jun 28, 2020

I'm not particularly opposed to your idea, but I have some reservations.

  1. Angle is not quite a geometrical angle, because it includes both 0 and 2π.
  2. There are plenty of instances of Uniform for tuples, so if your data is a plain product you can whip up an instance of Tuple Angle Angle in no time.
  3. Under your proposal instances for small finitely-inhabited products become worse: Quadruple Bool Bool Bool Bool would require 4 random numbers instead of 1 as currently.

@Bodigrim Bodigrim marked this pull request as ready for review June 28, 2020 20:18
@Shimuuar
Copy link
Contributor

  1. It's depends on interpretation of data type. Sometime one want to consider 0 and 2π a same thing.BTW 2*pi<2π. This is just a simplest example. It's very easy to invent more: just take any bounded set in R^n.

  2. I don't understand how tuples could help to define instance for entirely contrived data Angle2 = Angle Angle Angle. One could write uniformM g = AngleA <$> uniformM g <*> uniformM g but whole point of generics is to avoid such boilerplate.

  3. True. For types with small number of inhabitants it's more efficient to generate single random number and then divide. Working DerivingVia would've resolved this tension. However for single default implementation I think it's better to err on side of being more generic. We could provide uniformViaFiniteM hwen it could be used as optimzation

@Bodigrim
Copy link
Contributor Author

I actually thought a bit more and decided that it is better to follow GUniform avenue, because it allows users to inject their own instances of Uniform. Thanks, @Shimuuar!

@Bodigrim Bodigrim changed the title WIP Uniform for Generic Uniform for Generic Jun 28, 2020
@Shimuuar
Copy link
Contributor

because it allows users to inject their own instances of Uniform

I had this idea in mind and couldn't explain it properly. Thanks for articulating it

Copy link
Contributor

@Shimuuar Shimuuar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked generated core and GHC is able to specialize generated constructor. However earlier I found that generics were specialized away even at -O1. Now I they disappear even at -O1. Was that my mistake? We certainly need better tools for gazing into abyss^W GHC core.

Other that Enum instance LGTM

src/System/Random/GFinite.hs Show resolved Hide resolved
src/System/Random/Internal.hs Show resolved Hide resolved
@Bodigrim
Copy link
Contributor Author

Bodigrim commented Jul 21, 2020

@Shimuuar @lehins @idontgetoutmuch @curiousleo any more review suggestions?
(I'm happy to remove UniformRange instance until we sort out UniformRange for tuples)

@lehins
Copy link
Contributor

lehins commented Jul 22, 2020

I'm happy to remove UniformRange instance until we sort out UniformRange for tuples)

@Bodigrim Yes, please. The instance that gets derived for tuples in this PR is not something we want in this library. I don't care if it follows all the laws in the World, but Ord instance on tuples should not get anywhere near UniformRange class.

@Bodigrim
Copy link
Contributor Author

@lehins done.

Copy link
Contributor

@lehins lehins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll admit that am far from a Generics guru, but overall it seems legit and of course the functionality it provides is incredibly useful. 👍 from me. @Shimuuar you had a look at this PR before, is there anything you'd like to add before we merge it?

test/Spec.hs Outdated Show resolved Hide resolved
@Shimuuar
Copy link
Contributor

No everything is good

@lehins lehins merged commit 20c05aa into haskell:master Aug 20, 2020
@idontgetoutmuch
Copy link
Member

Great work!

@Bodigrim Bodigrim deleted the generic branch January 22, 2021 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants