FEA Add support for accepting a Numpy RandomState #6150

betatim · 2024-11-28T16:45:06Z

In addition to accepting integers, you can now also pass a `RandomState` object. It is used to derive an integer to use as a seed.

~~Add support for CuPy random state objects~~ (left for a new PR)

Closes #4753


betatim · 2024-12-03T13:14:26Z

The failures seem to be caused by a Dask timeout, which I think is unrelated.

Let's see what happens with the latest commit.

For my education, why does it say I requested a code review from people? I don't remember clicking any buttons :-/

dantegd · 2024-12-02T22:03:02Z

python/cuml/cuml/tests/test_common.py

```diff
+        cuml.UMAP,
+    ],
+)
+def test_random_state_argument(Estimator):
```


Should we add a quick test here that the results are the same with the seed, or is that tested in the individual algo tests?

I don't think the results will be the same, because passing `RandomState(42)` does not lead to 42 being passed as the seed to the internal functions that cuml calls.

We can't pass any form of "RNG state" to the internal functions; we can only pass an integer. So I think the best we can do when a `RandomState` is passed in is to use it to generate a uint64 and use that as the seed for the internal functions. I think this is better than trying to extract the (original) seed from the `RandomState`, because this way you get a different value if the random state has already been used.
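A minimal sketch of the conversion described above, for illustration only. The helper name `derive_seed` is hypothetical (the commits mention a `check_random_seed` function, whose actual code may differ). Integers pass through unchanged, `None` is replaced by a freshly seeded `RandomState` (mirroring the commit that makes `check_random_seed` return a random int for `None`), and a `RandomState` is asked for one uint64, which advances its state.

```python
import numbers

import numpy as np


def derive_seed(random_state):
    """Turn an int, None, or np.random.RandomState into an integer seed.

    Hypothetical helper for illustration; not cuML's actual implementation.
    """
    if random_state is None:
        # No seed given: draw one from a generator seeded with OS entropy.
        random_state = np.random.RandomState()
    if isinstance(random_state, numbers.Integral):
        # Integers are used as-is, so repeated calls give the same seed.
        return int(random_state)
    if isinstance(random_state, np.random.RandomState):
        # Draw one uint64. This advances the generator's internal state,
        # so a RandomState that was used before yields a different seed.
        return int(random_state.randint(np.iinfo(np.uint64).max, dtype=np.uint64))
    raise ValueError(f"{random_state!r} cannot be used to derive a seed")
```

Two fresh `RandomState(42)` instances produce the same derived seed, so results stay reproducible, but that seed is generally not 42, which is why `random_state=42` and `random_state=RandomState(42)` lead to different results.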

For example, in this (contrived) snippet I think the two RFs should not both use 42 as the seed internally, as they are two separate instances:

```python
from numpy.random import RandomState

import cuml

rs = RandomState(42)
rf1 = cuml.RandomForestClassifier(random_state=rs)
rf2 = cuml.RandomForestClassifier(random_state=rs)
```
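The two-estimator case can be imitated without cuML. This sketch assumes each estimator draws one uint64 from the `RandomState` it is given; `seed_from` is a hypothetical stand-in for whatever conversion cuML performs internally.

```python
import numpy as np

UINT64_MAX = np.iinfo(np.uint64).max


def seed_from(rs):
    # Hypothetical stand-in for the per-estimator RandomState -> int conversion.
    return int(rs.randint(UINT64_MAX, dtype=np.uint64))


# One shared instance: the second consumer sees an advanced state.
shared = np.random.RandomState(42)
seed_rf1 = seed_from(shared)
seed_rf2 = seed_from(shared)

# Two separate instances built from the same integer.
seed_a = seed_from(np.random.RandomState(42))
seed_b = seed_from(np.random.RandomState(42))
```

The first estimator gets the same seed a fresh `RandomState(42)` would give; the second sees the advanced state and gets a different one, while separately constructed `RandomState(42)` objects always agree.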

viclafargue

LGTM

betatim · 2024-12-04T13:19:48Z

python/cuml/cuml/cluster/kmeans.pyx

```diff
@@ -302,6 +306,11 @@ class KMeans(UniversalBase,
                                                   else None),
                                 check_dtype=check_dtype)

+        # XXX Should deriving a seed from a random state be idempotent? Should repeated
+        # XXX calls of `fit` create new seeds or not?
```


What do people think about this? Should we re-derive a seed each time fit is called?

That is an excellent question... what does sklearn do here?

If you pass an int, every call to `fit` behaves the same, but if you pass a `RandomState`, the same instance keeps getting forwarded (and its state advanced), so each fit is different. (I think it is at least somewhat unclear what should happen; within scikit-learn, at least, we have not really been able to converge on an answer :-/)
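To make the scikit-learn behaviour concrete, here is a plain-NumPy sketch that mirrors the documented contract of `sklearn.utils.check_random_state`: an int builds a new `RandomState`, and an existing instance is returned unchanged. It simplifies the `None` case (scikit-learn returns NumPy's global `RandomState`; this sketch uses a fresh one), and `toy_fit` is a hypothetical stand-in for a `fit` that draws random initial centers.

```python
import numbers

import numpy as np


def as_random_state(seed):
    """Simplified version of the contract of sklearn.utils.check_random_state."""
    if seed is None:
        # scikit-learn returns NumPy's global RandomState here; a fresh one
        # keeps this sketch self-contained.
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        # A brand-new generator on every call, so every fit starts from the same state.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # The caller's own object: its state carries over from one fit to the next.
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a numpy.random.RandomState")


def toy_fit(X, n_clusters, random_state):
    """Hypothetical fit step: draw random initial centers around the data mean."""
    rng = as_random_state(random_state)
    return X.mean(axis=0) + rng.standard_normal((n_clusters, X.shape[1]))


X = np.arange(100, dtype=float).reshape(50, 2)

same_a = toy_fit(X, 3, random_state=0)
same_b = toy_fit(X, 3, random_state=0)    # identical to same_a

rs = np.random.RandomState(0)
first = toy_fit(X, 3, random_state=rs)
second = toy_fit(X, 3, random_state=rs)   # differs from first
```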

Here I'd vote for deriving a new seed each time. That way we match scikit-learn, with no need to somehow special-case this for the accelerator, even if I can't justify why a new seed each time is "the right thing to do".
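The two options discussed here can be sketched with toy estimators. `DeriveOnceEstimator`, `DeriveEachFitEstimator` and `derive_seed` are hypothetical names for illustration, not cuML classes. With an int both behave identically; they only differ when a `RandomState` is passed. The second option is the one voted for above and adopted by the later commit "Always derive a new seed on fit".

```python
import numbers

import numpy as np


def derive_seed(random_state):
    # Hypothetical helper: ints pass through, a RandomState yields one uint64.
    if random_state is None:
        random_state = np.random.RandomState()
    if isinstance(random_state, numbers.Integral):
        return int(random_state)
    return int(random_state.randint(np.iinfo(np.uint64).max, dtype=np.uint64))


class DeriveOnceEstimator:
    """Idempotent option: the first fit derives a seed, later fits reuse it."""

    def __init__(self, random_state=None):
        self.random_state = random_state
        self._seed = None

    def fit(self, X=None):
        if self._seed is None:
            self._seed = derive_seed(self.random_state)
        self.seed_used_ = self._seed
        return self


class DeriveEachFitEstimator:
    """Option voted for above: every call to fit derives a fresh seed."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X=None):
        self.seed_used_ = derive_seed(self.random_state)
        return self
```

With the second option, an estimator given a `RandomState` behaves like the scikit-learn pattern described in this thread: every `fit` consumes fresh state from the generator.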

…ndom-state-everywhere

Add support for accepting a Numpy RandomState

61768cb


github-actions bot added the Cython / Python label Nov 28, 2024

betatim added 4 commits December 2, 2024 10:40

Tweak multi-gpu case

77aa6e3

Change check_random_seed to return a random int for None

ea1dfe5

Use a default value for seeds

8b973ce

Use a fixed random state

814c62b

betatim marked this pull request as ready for review December 3, 2024 12:25

betatim requested a review from a team as a code owner December 3, 2024 12:25

betatim requested review from dantegd and divyegala December 3, 2024 12:25

Merge branch 'branch-24.12' into random-state-everywhere

c411530

dantegd added the non-breaking and improvement labels Dec 3, 2024

dantegd reviewed Dec 3, 2024


viclafargue approved these changes Dec 4, 2024


betatim commented Dec 4, 2024


betatim added 2 commits December 6, 2024 16:50

Always derive a new seed on fit

7c4aae3

Merge remote-tracking branch 'origin/random-state-everywhere' into ra…

1daab08

…ndom-state-everywhere

dantegd changed the base branch from branch-24.12 to branch-25.02 December 11, 2024 23:44

betatim added 2 commits December 19, 2024 00:05

Ping

b0f7841

Merge branch 'branch-25.02' into random-state-everywhere

7092d59


betatim commented Dec 3, 2024

dantegd Dec 2, 2024

betatim Dec 4, 2024

viclafargue left a comment

betatim Dec 4, 2024

dantegd Dec 5, 2024

betatim Dec 6, 2024
