
Use local RandomState instead of seeding the global RNG #12259

Merged · 3 commits · Feb 24, 2019

Conversation

YuriyGuts
Contributor

Summary

According to NumPy NEP 19 — Random Number Generator Policy, instantiating local RandomState objects is preferred over seeding the global np.random generator.

Seeding the global generator may introduce unexpected behavior in applications that use Keras and also rely on the global RNG's state.

Proposed change: use a local RandomState object in dataset loaders and the Orthogonal initializer.
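The difference can be sketched with a minimal, standalone example (illustrative only, not part of this PR's diff):

```python
import numpy as np

# Global seeding (discouraged by NEP 19): mutates shared state that
# other code in the application may rely on.
np.random.seed(113)
a = np.random.permutation(5)

# Local RandomState (preferred): an isolated, reproducible stream.
rng = np.random.RandomState(113)
b = rng.permutation(5)

# Both draw the same sequence for the same seed...
assert (a == b).all()

# ...but the local generator leaves np.random's global state alone.
state_before = np.random.get_state()[1].copy()
rng.permutation(5)  # only advances the local stream
state_after = np.random.get_state()[1]
assert (state_before == state_after).all()
```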

Related Issues

#12258

PR Overview

  • This PR requires new unit tests [y/n] (make sure tests are included)
  • This PR requires updating the documentation [y/n] (make sure the docs are up-to-date)
  • This PR is backwards compatible [y/n]
  • This PR changes the current API [y/n] (all API changes need to be approved by fchollet)

@fchollet (Collaborator) left a comment


LGTM, thanks for the PR. We'll merge after the other PR has been merged, since it was created first.

@vikua
Contributor

vikua commented Feb 21, 2019

I don't know how we ended up in a situation with two similar PRs. What I usually do before contributing is search for similar PRs that haven't been merged yet, and if there are any, I either suggest changes to those or create a new PR with non-overlapping changes.
Otherwise it's a funny strategy: find some PR in an OSS repo, create a new one with some improvements, profit ¯\_(ツ)_/¯
But considering that this PR is a superset of #12232, I can close mine to get things done.

@zachmayer
Contributor

Looks like #12232 was closed in favor of this one.

@fchollet fchollet merged commit 91ccb28 into keras-team:master Feb 24, 2019
@ConcurrencyPractitioner
Contributor

ConcurrencyPractitioner commented Feb 25, 2019

Oh hi. I just want to point out a possible optimization. In keras/engine/training_arrays.py, a method, batch_shuffle, is called repeatedly in a loop (batch_shuffle then calls np.random.shuffle). My suggestion is to add an extra seed parameter to the fit_loop method, which calls batch_shuffle, so that we create a single local RandomState and use it for shuffling instead of np.random.shuffle.

@ConcurrencyPractitioner
Contributor

ConcurrencyPractitioner commented Feb 25, 2019

diff --git a/keras/engine/training_arrays.py b/keras/engine/training_arrays.py
index 466dd6bf..c7029682 100644
--- a/keras/engine/training_arrays.py
+++ b/keras/engine/training_arrays.py
@@ -32,7 +32,8 @@ def fit_loop(model, fit_function, fit_inputs,
              initial_epoch=0,
              steps_per_epoch=None,
              validation_steps=None,
-             validation_freq=1):
+             validation_freq=1,
+             seed=113):
     """Abstract fit function for `fit_function(fit_inputs)`.

     Assumes that fit_function returns a list, labeled by out_labels.
@@ -143,6 +144,8 @@ def fit_loop(model, fit_function, fit_inputs,
             model._feed_targets +
             model._feed_sample_weights)
     indices_for_conversion_to_dense = []
+    rng = np.random.RandomState(seed)
+
     for i in range(len(feed)):
         if issparse(fit_inputs[i]) and not K.is_sparse(feed[i]):
             indices_for_conversion_to_dense.append(i)
@@ -180,7 +183,7 @@ def fit_loop(model, fit_function, fit_inputs,
             if shuffle == 'batch':
                 index_array = batch_shuffle(index_array, batch_size)
             elif shuffle:
-                np.random.shuffle(index_array)
+                rng.shuffle(index_array)

             batches = make_batches(num_train_samples, batch_size)
             for batch_index, (batch_start, batch_end) in enumerate(batches):

Just the diff for my suggestion.
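The behavior the diff aims for can be checked in isolation. The sketch below is illustrative: `shuffled_indices` is a hypothetical helper, and the default `seed=113` mirrors the one in the diff. (The actual change constructs the RandomState once per fit_loop call, so successive epochs still shuffle differently.)

```python
import numpy as np

def shuffled_indices(num_samples, seed=113):
    # Mirrors the suggested fit_loop change: a local RandomState
    # drives the shuffle instead of the global np.random generator.
    index_array = np.arange(num_samples)
    rng = np.random.RandomState(seed)
    rng.shuffle(index_array)
    return index_array

# Same seed -> same ordering, regardless of global RNG state.
np.random.seed(0)    # unrelated global seeding elsewhere...
first = shuffled_indices(10)
np.random.seed(999)  # ...does not affect the result.
second = shuffled_indices(10)
assert (first == second).all()
```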

@farizrahman4u
Contributor

@ConcurrencyPractitioner can you make that a PR please?

kiku-jw pushed a commit to kiku-jw/keras that referenced this pull request Mar 4, 2019
…2259)

* Use local RandomState instead of seeding the global RNG

* Create a unit test module for datasets and move tests there

* Move initializer test to the proper file