Noise segments #17

Open
danieleghisi opened this issue Apr 8, 2017 · 12 comments

@danieleghisi

Hi, and thanks for sharing this wonderful model.
I must say, it is probably the best model I've tried so far for sample-by-sample generation.

I do have a question, though: I constantly end up with noise bursts or noise segments in my generated audio; sometimes the noise covers the whole generation, sometimes it only appears at a certain point.

This does not seem to improve with more epochs (after a couple of days of training I still get the same noise bursts), and it happens across different datasets (mostly classical music). Oddly enough, the problem is less pronounced on noisy datasets such as rain or water sounds :)

Does anyone have an idea of how to avoid or limit this issue?
Is there a parameter I should fine-tune for this?

Thanks again,
Daniele

@richardassar (Contributor) commented Apr 9, 2017

I also have this problem with the piano dataset. My guess is that noisy datasets help protect the network from overly confident output distributions, essentially approximating the confidence penalty of https://arxiv.org/abs/1701.06548.
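
For reference, the penalty in that paper just subtracts a scaled entropy term from the training loss; a minimal NumPy sketch (function and variable names are mine, not from the paper):

```python
import numpy as np

def confidence_penalized_nll(probs, target, beta=0.1):
    # Negative log-likelihood minus beta times the entropy of the
    # output distribution; penalizing low entropy discourages
    # overconfident predictions (arXiv:1701.06548).
    nll = -np.log(probs[target])
    entropy = -np.sum(probs * np.log(probs))
    return nll - beta * entropy
```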

It might be worth trying dithering, i.e. randomly flipping the least significant bit of each quantized sample, for example (a sketch, assuming the audio has already been linearly quantized to integers):
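
```python
import numpy as np

def dither_lsb(samples, p=0.5):
    # Flip the least significant bit of each sample with probability p;
    # assumes an integer array of linearly quantized audio (untested
    # against the actual data pipeline).
    flips = (np.random.rand(*samples.shape) < p).astype(samples.dtype)
    return samples ^ flips  # XOR with 1 toggles the LSB
```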

I found that multitrack music and other instruments (violin) suffer from this problem far less. I also found the problem went away with absolute-max normalization (unlike the current scheme, which essentially adds a DC offset); however, this seems to greatly reduce the quality of the output.

It only takes a small number of poor samples to throw the model off. I've suggested scheduled sampling (https://arxiv.org/abs/1506.03099), sketched below, but it might be impractical.
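
The idea is just to occasionally feed the model its own previous output during training instead of the ground truth; a toy sketch with a linear schedule (names mine):

```python
import random

def next_input(ground_truth, model_output, step, total_steps):
    # Scheduled sampling (arXiv:1506.03099): with a probability that
    # decays over training, feed the ground-truth sample; otherwise
    # feed the model's own previous output. Linear decay assumed here.
    eps = max(0.0, 1.0 - step / float(total_steps))
    return ground_truth if random.random() < eps else model_output
```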

If I find anything concrete I'll let you know.

@danieleghisi (Author)

Hi, and first of all thanks for answering.
Is it hard to perform the absolute-max normalization you mention? If you could drop a couple of words on how I might do that within the current code, that'd be great...

Thanks again,
Daniele

@richardassar (Contributor) commented Apr 9, 2017

It isn't hard at all: instead of (pseudo-code) (x - x:min()) / (x:max() - x:min()), you want ((x / max(abs(x:min()), abs(x:max()))) + 1) / 2.

https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/datasets/dataset.py#L53

The current method almost acts like a form of "local contrast normalization" over the dataset, but it also shifts the mean of the signal, which might actually help. Whenever I tried abs-max normalization, the samples tended towards silence or very quiet tones for the majority of the generated audio. I didn't assess this extensively, though.
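
In NumPy terms the two schemes compare like this (a sketch of the change, assuming x is a float array of raw samples):

```python
import numpy as np

def minmax_normalize(x):
    # Current scheme (dataset.py): maps [min, max] -> [0, 1], which
    # shifts the mean unless the signal is symmetric about zero.
    return (x - x.min()) / (x.max() - x.min())

def absmax_normalize(x):
    # Abs-max scheme: scale by the largest magnitude, then map
    # [-1, 1] -> [0, 1]; zero stays at 0.5, so no DC offset is added.
    scaled = x / max(abs(x.min()), abs(x.max()))
    return (scaled + 1.0) / 2.0
```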

@danieleghisi (Author)

Oh yep, I understand. I'll give it a try, then!
Thanks again,
Daniele

@richardassar (Contributor) commented Apr 11, 2017

It might also be interesting to see whether the LSTM works better here; if you do try this, please let me know whether you see any improvement.

@danieleghisi (Author)

I have tried abs-max normalization: it does reduce the noise, but the samples fall into near-silence very quickly, so the quality of the generation is quite poor, exactly as you said.
I will try the LSTM as well, with abs-max normalization.

@richardassar (Contributor) commented Apr 13, 2017 via email

@pipoket commented May 23, 2017

Is there any news on this issue?

I'm trying to train a model on my own piano music data (approx. 3 hours long) with the default parameters, but the problem the OP mentioned persists.

Based on the comments from both of you, I suspect the noise is due to my music data having a very wide dynamic range, from near-silent notes to very loud ones.

I would really appreciate it if you could share your results.

@richardassar (Contributor) commented May 23, 2017 via email

@pipoket commented May 24, 2017

@richardassar, thanks for the heads-up.

I tried to find the difference by looking at your code, and I suppose the logic for reducing the sampling temperature resides in SampleRNN_torch/scripts/generate_dataset.lua, but I can't say I have fully understood it.

I think it would be better for me to train with your Torch implementation. AFAIK, if I regenerate the training data using your Lua script (rather than using the files generated by the Python script), there is a chance my problem will be solved. Is my understanding correct?

Anyway, I'm going to try to train my model with your implementation and see whether there is a difference :)

====UPDATE====

Silly me. I found the -sampling_temperature option in the train.lua and fast_sample.lua scripts.
After training for a while, I'll try different values for that parameter to see the difference.
Thanks!

@richardassar (Contributor) commented May 24, 2017 via email

@danieleghisi (Author) commented May 24, 2017

@richardassar Very interesting, thanks for sharing the implementation. I will probably test it next week, as soon as the GPUs I have access to are free.

Would it be possible for you to point out what should be changed in the Theano implementation to add the sampling temperature? I suppose it must be something added in the softmax_and_sample function (ops.py, line 266)... Should I just add
flattened_logits = flattened_logits / temperature
after the logits.reshape?
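
In plain NumPy, I imagine the standard recipe is something like this (just a sketch, not the actual Theano code):

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    # Standard temperature-scaled categorical sampling: dividing the
    # logits by T > 1 flattens the distribution, T < 1 sharpens it.
    scaled = logits / temperature
    scaled = scaled - scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.sum(np.exp(scaled))
    return np.random.choice(len(probs), p=probs)
```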

Thanks again,
Daniele
