Hello @relativeflux, thanks for reviving SampleRNN in TensorFlow.
I have a question about audio generation using a model trained on a single 8-second audio file, with just one audio file for validation, purely as a quick inference test.
Training command flags:
--data_dir ./chunks --num_epochs 100 --batch_size 1 --max_checkpoints 1 --checkpoint_every 10 --output_file_dur 10 --sample_rate 11025
Audio sample rate: 11025 Hz
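(For context, ./chunks holds the 8-second file split into short pieces. A rough sketch of that kind of preprocessing, not the repo's own chunking script, could look like the following; the 1-second chunk length is only an assumption.)

# Minimal sketch: split one WAV file into fixed-length chunks for training.
# Not the repo's own chunking script; chunk length is an assumed value.
import os
import soundfile as sf

def chunk_wav(in_path, out_dir, chunk_secs=1.0):
    audio, sr = sf.read(in_path)              # e.g. 8 s mono at 11025 Hz
    os.makedirs(out_dir, exist_ok=True)
    chunk_len = int(chunk_secs * sr)
    for i in range(0, len(audio) - chunk_len + 1, chunk_len):
        out_path = os.path.join(out_dir, "chunk_%04d.wav" % (i // chunk_len))
        sf.write(out_path, audio[i:i + chunk_len], sr)

chunk_wav("train_8sec.wav", "./chunks")        # file name is a placeholder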
I trained the model for around 40 epochs. During training, training accuracy reaches 100.000 while validation accuracy is only 4.132, as expected for a single-file dataset.
For reference:
Epoch: 40/100, Step: 82/86, Loss: 0.000, Accuracy: 100.000, (0.440 sec/step)
Epoch: 40/100, Step: 83/86, Loss: 0.000, Accuracy: 100.000, (0.449 sec/step)
Epoch: 40/100, Step: 84/86, Loss: 0.000, Accuracy: 100.000, (0.438 sec/step)
Epoch: 40/100, Step: 85/86, Loss: 0.000, Accuracy: 100.000, (0.434 sec/step)
Epoch: 40/100, Step: 86/86, Loss: 0.000, Accuracy: 100.000, (0.437 sec/step)
Epoch: 40/100, Total Steps: 86, Loss: 0.000, Accuracy: 100.000, Val Loss: 13.038, Val Accuracy: 4.132 (1 min 0.427 sec)
But when I listen to the audio generated from this checkpoint, I hear only a short stretch of signal, mostly corrupted by noise, and nothing else (the generated audio is 10 seconds long). Yet if the model is simply overfitting, the generated audio should reproduce the training data exactly, or at least something very similar.
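For what it's worth, one rough way to check how close the generated output is to the training clip could be something like the sketch below (file names are placeholders, and it assumes librosa and numpy are installed):

# Rough similarity check between generated audio and the training clip.
# A sketch only; file names are placeholders.
import numpy as np
import librosa

SR = 11025  # matches --sample_rate

train, _ = librosa.load("train_8sec.wav", sr=SR)   # placeholder file name
gen, _ = librosa.load("generated.wav", sr=SR)      # placeholder file name

# Compare log-mel spectrograms over the overlapping duration.
n = min(len(train), len(gen))

def logmel(y):
    m = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=64)
    return librosa.power_to_db(m, ref=np.max)

d = np.mean(np.abs(logmel(train[:n]) - logmel(gen[:n])))
print("mean log-mel difference: %.2f dB (near 0 dB means near-identical audio)" % d)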
I just wanted to ask: am I doing something wrong, or is this the expected result?
@mukul74 Hi, and thanks for getting in touch about this. I'm actually not sure about this one; I've done quite a bit of experimentation with small datasets, but not much with reducing the batch size. I'd be interested to see how this compares to the same experiment carried out with, say, WaveNet. I'll run some tests.
But after Epoch: 60/100, Step: 86/86, Loss: 0.000, Accuracy: 100.000, the generated file was very similar to the training data, as it should be.
The only thing that still seems a little odd to me is why similar results are not generated at epoch 40.
Anyway, thanks for the working TensorFlow implementation. I need to implement the PyTorch flavor of this repo for my project and then extend that model for my research.