
Remove inconsistent usage of np.random and standard library #430

Closed
wants to merge 1 commit

Conversation

jrdurrant

Currently, both the numpy.random module and the random module from the standard library are used for sampling transformations. This means that specifying a random seed for only one of them will not affect the other, so subsequent runs may or may not reproduce the same behaviour (for a given seed).

This can be subtle to pick up on, since some transforms will stay the same and some will not. Using a single source of randomness ensures that consistent samples can be generated.
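
For anyone hitting this, a minimal sketch of the behaviour described above (not taken from the torchvision code, just an illustration): seeding only `numpy.random` leaves the standard library's `random` module unaffected, so any transform that draws from `random` still varies between runs.

```python
import random
import numpy as np

np.random.seed(42)         # only numpy's generator is seeded

print(np.random.random())  # identical on every run
print(random.random())     # still differs between runs: the random module is unseeded
```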

jrdurrant closed this on Feb 23, 2018
fmassa (Member) commented on Feb 25, 2018

You raise a good point.
Any particular reason why you decided to close the PR?

We might eventually want to add something like a seed parameter to the transforms, to enforce reproducibility over different runs.
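
In the meantime, one workaround sketch (assuming the usual generators involved; `set_all_seeds` is a hypothetical helper, not an existing API) is to seed every source of randomness the transforms might touch at the start of a run:

```python
import random
import numpy as np
import torch

def set_all_seeds(seed):
    # Hypothetical helper: until the transforms draw from a single source of
    # randomness, reproducibility requires seeding each generator they may use.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_all_seeds(1234)
```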

fmassa (Member) commented on Feb 26, 2018

Oh, I see, this is actually a duplicate of #354

jrdurrant (Author)

Yes, exactly! Sorry, I should have been clearer on that.

rajveerb pushed a commit to rajveerb/vision that referenced this pull request Nov 30, 2023