Noise segments #17

Open
danieleghisi opened this issue Apr 8, 2017 · 12 comments

@danieleghisi

Hi, and thanks for sharing this wonderful model.
I must say, it is probably the best model I've tried so far for sample-by-sample generation.

I do have a question, though: I constantly end up with noise bursts or noise segments in my generated audio; sometimes the noise covers the whole generation, sometimes it only appears at a certain point.

This does not seem to improve with more epochs (after a couple of days of training I still get the same noise bursts), and it happens across different datasets (mostly classical music). Oddly enough, the problem is less pronounced on noisy datasets such as rain or water sounds :)

Does anyone have an idea of how to avoid or limit this issue?
Is there a parameter I should fine-tune for this?

Thanks again,
Daniele

@richardassar (Contributor) commented Apr 9, 2017

I also have this problem with the piano dataset. My guess is that noisy datasets help protect the network from overly confident output distributions, essentially approximating the confidence penalty of https://arxiv.org/abs/1701.06548.
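
For reference, the penalty in that paper just subtracts a scaled entropy term from the training loss; a minimal NumPy sketch (function and variable names are mine, not from the paper):

```python
import numpy as np

def confidence_penalized_nll(probs, target, beta=0.1):
    # Negative log-likelihood minus beta times the entropy of the
    # output distribution; penalizing low entropy discourages
    # overconfident predictions (arXiv:1701.06548).
    nll = -np.log(probs[target])
    entropy = -np.sum(probs * np.log(probs))
    return nll - beta * entropy
```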

It might be worth trying dithering, i.e. randomly flipping the least significant bit of each quantized sample, for example (a sketch, assuming the audio has already been linearly quantized to integers):
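
```python
import numpy as np

def dither_lsb(samples, p=0.5):
    # Flip the least significant bit of each sample with probability p;
    # assumes an integer array of linearly quantized audio (untested
    # against the actual data pipeline).
    flips = (np.random.rand(*samples.shape) < p).astype(samples.dtype)
    return samples ^ flips  # XOR with 1 toggles the LSB
```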

I found that multitrack music and other instruments (violin) suffer from this problem far less. I also found the problem went away with absolute-max normalization (unlike the current scheme, which essentially adds a DC offset); however, this seems to greatly reduce the quality of the output.

It only takes a small number of poor samples to throw the model off. I've suggested scheduled sampling (https://arxiv.org/abs/1506.03099), sketched below, but it might be impractical.
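
The idea is just to occasionally feed the model its own previous output during training instead of the ground truth; a toy sketch with a linear schedule (names mine):

```python
import random

def next_input(ground_truth, model_output, step, total_steps):
    # Scheduled sampling (arXiv:1506.03099): with a probability that
    # decays over training, feed the ground-truth sample; otherwise
    # feed the model's own previous output. Linear decay assumed here.
    eps = max(0.0, 1.0 - step / float(total_steps))
    return ground_truth if random.random() < eps else model_output
```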

If I find anything concrete I'll let you know.

@danieleghisi (Author)

Hi, and first of all thanks for answering.
Is it hard to perform the absolute-max normalization you mention? If you could drop a couple of words on how I might do that within the current code, that'd be great...

Thanks again,
Daniele

@richardassar (Contributor) commented Apr 9, 2017

It isn't hard at all: instead of (pseudo-code) (x - x:min()) / (x:max() - x:min()), you want ((x / max(abs(x:min()), abs(x:max()))) + 1) / 2.

https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/datasets/dataset.py#L53

The current method almost acts like a form of "local contrast normalization" over the dataset, but it also shifts the mean of the signal, which might actually help. Whenever I tried abs-max normalization, the samples tended towards silence or very quiet tones for the majority of the generated audio. I didn't assess this extensively, though.
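
In NumPy terms the two schemes compare like this (a sketch of the change, assuming x is a float array of raw samples):

```python
import numpy as np

def minmax_normalize(x):
    # Current scheme (dataset.py): maps [min, max] -> [0, 1], which
    # shifts the mean unless the signal is symmetric about zero.
    return (x - x.min()) / (x.max() - x.min())

def absmax_normalize(x):
    # Abs-max scheme: scale by the largest magnitude, then map
    # [-1, 1] -> [0, 1]; zero stays at 0.5, so no DC offset is added.
    scaled = x / max(abs(x.min()), abs(x.max()))
    return (scaled + 1.0) / 2.0
```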

@danieleghisi (Author)

Oh yep, I understand. I'll give it a try, then!
Thanks again,
Daniele

@richardassar (Contributor) commented Apr 11, 2017

It might also be interesting to see whether the LSTM works better here; if you do try this, please let me know whether you see any improvement.

@danieleghisi (Author)

I have tried abs-max normalization: it does reduce the noise, but the samples fall into near-silence very quickly, so the quality of the generation is quite poor, exactly as you said.
I will try the LSTM as well, with abs-max normalization.

@richardassar (Contributor) commented Apr 13, 2017 via email

@pipoket commented May 23, 2017

Is there any news on this issue?

I'm trying to train a model on my own piano music data (approx. 3 hours long) with the default parameters, but the problem the OP mentioned persists.

Based on the comments from both of you, I suspect the noise is due to my music data having a very wide dynamic range, from near-silent notes to very loud ones.

I would really appreciate it if you could share your results.

@richardassar (Contributor) commented May 23, 2017 via email

@pipoket commented May 24, 2017

@richardassar, thanks for the heads-up.

I tried to find the difference by looking at your code, and I suppose the logic for reducing the sampling temperature resides in SampleRNN_torch/scripts/generate_dataset.lua, but I can't say I have fully understood it.

I think it would be better for me to train with your Torch implementation. AFAIK, if I regenerate the training data using your Lua script (rather than using the files generated by the Python script), there is a chance my problem will be solved. Is my understanding correct?

Anyway, I'm going to try to train my model with your implementation and see whether there is a difference :)

====UPDATE====

Silly me. I found the -sampling_temperature option in the train.lua and fast_sample.lua scripts.
After training for a while, I'll try different values for that parameter to see the difference.
Thanks!

@richardassar (Contributor) commented May 24, 2017 via email

@danieleghisi (Author) commented May 24, 2017

@richardassar Very interesting, thanks for sharing the implementation. I will probably test it next week, as soon as the GPUs I have access to are free.

Would it be possible for you to point out what should be changed in the Theano implementation to add the sampling temperature? I suppose it must be something added in the softmax_and_sample function (ops.py, line 266)... Should I just add
flattened_logits = flattened_logits / temperature
after the logits.reshape?
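
In plain NumPy, I imagine the standard recipe is something like this (just a sketch, not the actual Theano code):

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    # Standard temperature-scaled categorical sampling: dividing the
    # logits by T > 1 flattens the distribution, T < 1 sharpens it.
    scaled = logits / temperature
    scaled = scaled - scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.sum(np.exp(scaled))
    return np.random.choice(len(probs), p=probs)
```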

Thanks again,
Daniele
