
Use local RandomState instead of seeding the global RNG #12259

Merged · 3 commits · Feb 24, 2019

Conversation

YuriyGuts
Contributor

Summary

According to NumPy NEP 19 — Random Number Generator Policy, instantiating local RandomState objects is preferred over seeding the global np.random generator.

Seeding the global generator may introduce unexpected behavior in applications that use Keras and also rely on the global RNG's state.

Proposed change: use a local RandomState object in dataset loaders and the Orthogonal initializer.
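The difference can be sketched with a minimal, standalone example (illustrative only, not part of this PR's diff):

```python
import numpy as np

# Global seeding (discouraged by NEP 19): mutates shared state that
# other code in the application may rely on.
np.random.seed(113)
a = np.random.permutation(5)

# Local RandomState (preferred): an isolated, reproducible stream.
rng = np.random.RandomState(113)
b = rng.permutation(5)

# Both draw the same sequence for the same seed...
assert (a == b).all()

# ...but the local generator leaves np.random's global state alone.
state_before = np.random.get_state()[1].copy()
rng.permutation(5)  # only advances the local stream
state_after = np.random.get_state()[1]
assert (state_before == state_after).all()
```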

Related Issues

#12258

PR Overview

  • This PR requires new unit tests [y/n] (make sure tests are included)
  • This PR requires updating the documentation [y/n] (make sure the docs are up-to-date)
  • This PR is backwards compatible [y/n]
  • This PR changes the current API [y/n] (all API changes need to be approved by fchollet)

@fchollet (Collaborator) left a comment


LGTM, thanks for the PR. We'll merge after the other PR has been merged, since it was created first.

@vikua
Contributor

vikua commented Feb 21, 2019

I don't know how we ended up in a situation with two similar PRs. What I usually do before contributing is search for similar PRs that haven't been merged yet, and if there are any, I either suggest changes to those or create a new PR with non-overlapping changes.
Otherwise it's a funny strategy: find some PR in an OSS repo, create a new one with some improvements, profit ¯\_(ツ)_/¯
But considering that this PR is a superset of #12232, I can close mine to get things done.

@zachmayer
Contributor

Looks like #12232 was closed in favor of this one.

@fchollet fchollet merged commit 91ccb28 into keras-team:master Feb 24, 2019
@ConcurrencyPractitioner
Contributor

ConcurrencyPractitioner commented Feb 25, 2019

Oh hi. I just want to point out a possible optimization. In keras/engine/training_arrays.py, a method, batch_shuffle, is called repeatedly in a loop (batch_shuffle then calls np.random.shuffle). My suggestion is to add an extra seed parameter to the fit_loop method, which calls batch_shuffle, so that we create a single local RandomState and use it for shuffling instead of np.random.shuffle.

@ConcurrencyPractitioner
Contributor

ConcurrencyPractitioner commented Feb 25, 2019

diff --git a/keras/engine/training_arrays.py b/keras/engine/training_arrays.py
index 466dd6bf..c7029682 100644
--- a/keras/engine/training_arrays.py
+++ b/keras/engine/training_arrays.py
@@ -32,7 +32,8 @@ def fit_loop(model, fit_function, fit_inputs,
              initial_epoch=0,
              steps_per_epoch=None,
              validation_steps=None,
-             validation_freq=1):
+             validation_freq=1,
+             seed=113):
     """Abstract fit function for `fit_function(fit_inputs)`.

     Assumes that fit_function returns a list, labeled by out_labels.
@@ -143,6 +144,8 @@ def fit_loop(model, fit_function, fit_inputs,
             model._feed_targets +
             model._feed_sample_weights)
     indices_for_conversion_to_dense = []
+    rng = np.random.RandomState(seed)
+
     for i in range(len(feed)):
         if issparse(fit_inputs[i]) and not K.is_sparse(feed[i]):
             indices_for_conversion_to_dense.append(i)
@@ -180,7 +183,7 @@ def fit_loop(model, fit_function, fit_inputs,
             if shuffle == 'batch':
                 index_array = batch_shuffle(index_array, batch_size)
             elif shuffle:
-                np.random.shuffle(index_array)
+                rng.shuffle(index_array)

             batches = make_batches(num_train_samples, batch_size)
             for batch_index, (batch_start, batch_end) in enumerate(batches):

Just the diff for my suggestion.
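The behavior the diff aims for can be checked in isolation. The sketch below is illustrative: `shuffled_indices` is a hypothetical helper, and the default `seed=113` mirrors the one in the diff. (The actual change constructs the RandomState once per fit_loop call, so successive epochs still shuffle differently.)

```python
import numpy as np

def shuffled_indices(num_samples, seed=113):
    # Mirrors the suggested fit_loop change: a local RandomState
    # drives the shuffle instead of the global np.random generator.
    index_array = np.arange(num_samples)
    rng = np.random.RandomState(seed)
    rng.shuffle(index_array)
    return index_array

# Same seed -> same ordering, regardless of global RNG state.
np.random.seed(0)    # unrelated global seeding elsewhere...
first = shuffled_indices(10)
np.random.seed(999)  # ...does not affect the result.
second = shuffled_indices(10)
assert (first == second).all()
```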

@farizrahman4u
Contributor

@ConcurrencyPractitioner can you make that a PR please?

kiku-jw pushed a commit to kiku-jw/keras that referenced this pull request Mar 4, 2019
…2259)

* Use local RandomState instead of seeding the global RNG

* Create a unit test module for datasets and move tests there

* Move initializer test to the proper file