Noise segments #17
Comments
I also have this problem with the piano dataset. My guess is that noisy datasets help protect the network from overly confident output distributions, essentially approximating https://arxiv.org/abs/1701.06548. It might be worth trying dithering, i.e. randomly flipping the least significant bit.
I found that multitrack music and other instruments (violin) suffer from this problem far less. I also found the problem went away with absolute-max normalization (not the current scheme, which essentially adds a DC offset); however, this seems to greatly reduce the quality of the output. It only takes a small number of poor samples to throw the model off.
I've suggested scheduled sampling (https://arxiv.org/abs/1506.03099) but it might be impractical. If I find anything concrete I'll let you know.
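As a rough illustration of the dithering idea mentioned above (a sketch only, not code from this repo; the function name and the integer-quantized sample representation are my assumptions):

```python
import numpy as np

def dither_lsb(samples, p=0.5):
    """Randomly flip the least significant bit of integer-quantized
    audio samples with probability p, as a crude form of dithering."""
    samples = np.asarray(samples)
    # 1 where we flip, 0 where we keep; XOR with 1 flips the LSB
    flip = (np.random.rand(samples.shape[0]) < p).astype(samples.dtype)
    return samples ^ flip
```

Applied to 8-bit quantized samples in [0, 255], this stays in range, since flipping the LSB of 0 gives 1 and of 255 gives 254.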
Hi, and first of all thanks for answering.
Thanks again,
It isn't hard at all, instead of (pseudo-code) https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/datasets/dataset.py#L53
The current method almost acts like a form of "local contrast normalization" over the dataset, but it also ends up shifting the mean of the signal, which might actually help. Whenever I tried abs-max normalization, the samples tended towards silence or very quiet tones for the majority of the generated audio. I didn't assess this extensively, though.
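For concreteness, here is a sketch of the two normalization schemes being contrasted (function names are mine, and the min/max version is only an approximation of what dataset.py does, not a copy of it):

```python
import numpy as np

def minmax_normalize(x):
    """Rescale to [0, 1] using min and max. Per-example scaling like
    this shifts the mean of the signal, i.e. adds a DC offset."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.min()) / (x.max() - x.min())

def absmax_normalize(x):
    """Divide by the absolute maximum: values land in [-1, 1] and the
    zero level is preserved, so no DC offset is introduced."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.abs(x).max()
```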
Oh yep, I understand. I'll give it a try, then!
It might also be interesting to see if the LSTM works better here. If you do try this, please let me know whether you see any improvement.
I have tried abs-max normalization, which does reduce noise, but samples always collapse to near-silence very quickly, and the quality of the generation is hence quite poor, exactly as you said.
Yes, I experience exactly this. With abs-max, my thought is that the issue may be due to the difference in dynamic range. Trying some of the non-linear quantization schemes might help: mu-law, A-law, etc.
On 13 April 2017, danieleghisi wrote:
I will try LSTM as well, with abs-max normalization.
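A minimal sketch of mu-law companding, one of the non-linear quantization schemes suggested above (this is the standard textbook formula with mu = 255 as in 8-bit telephony; it is not taken from this repo):

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Compand a signal in [-1, 1]: resolution is concentrated near
    zero, which helps with wide dynamic range (quiet vs. loud notes)."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_decode(y, mu=255):
    """Invert the companding back to a linear signal in [-1, 1]."""
    y = np.asarray(y, dtype=np.float64)
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu
```

The encoded signal would then be uniformly quantized as before; quiet passages get many more quantization levels than under linear quantization.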
Do you have anything new about this issue? I'm trying to train my model using my own piano music data (approx. 3 hours long) with default parameters, but the problem the OP mentioned seems to persist.
Based on the comments from both of you, what I suspect is that the noise is due to the fact that my music data has a very wide dynamic range, from almost-silent notes to very loud notes.
I would really appreciate it if you could share your results.
I've had some success in reducing the sampling temperature slightly.
See my torch implementation for details.
https://github.com/richardassar/SampleRNN_torch
Let me know if it helps.
@richardassar, thanks for the heads up. I tried to find the difference by looking at your code, and I suppose that the logic for reducing the sampling temperature resides in the SampleRNN_torch/scripts/generate_dataset.lua code, but I don't think I could say that I have fully understood the difference.
I think it would be better for me to train with your Torch implementation. AFAIK, if I manually generate the training data again using your lua script (not using the data generated with the Python script), there is a chance my problem will be solved. Is my understanding correct?
Anyway, I'm going to try to train my model with your implementation and see whether there is a difference :)
====UPDATE====
Silly me. I found the option
It is trivial to add sampling temperature to the Theano implementation. Feel free to use my implementation, however; the generate_dataset.lua script is used by the create_dataset.sh scripts and is very easy to use.
You will achieve effectively similar results with my implementation. The only difference is the lack of learn_h0, but this makes *very* little to no difference to training or sampling, as confirmed by Soroush Mehri in our correspondence. I do intend to add it, and I have developed the supporting code, but I want to test it before integrating, at which point my implementation will be 100% identical to the Theano implementation.
Let me know how it goes. Thanks.
@richardassar Very interesting, thanks for sharing the implementation. I will test it, probably next week, as soon as the GPUs I have access to are free.
Would it be possible for you to point out what should be changed in the Theano implementation in order to add the sampling temperature? I suppose it must be something to be added in the softmax_and_sample function (ops.py, line 266)... Should I just add
Thanks again,
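For what it's worth, a common way to apply a sampling temperature is to divide the logits by the temperature before the softmax. The sketch below is plain numpy, under that assumption; it is not the actual Theano code from ops.py, and the function name is mine:

```python
import numpy as np

def softmax_and_sample_with_temperature(logits, temperature=0.95):
    """Sample one class index from softmax(logits / temperature).
    Temperatures below 1 sharpen the distribution, making noisy
    low-probability samples less likely; temperature=1 is unchanged."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return np.random.choice(len(probs), p=probs)
```

In Theano this would amount to scaling the logits tensor by 1/temperature just before the existing softmax-and-sample step.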
Hi, and thanks for sharing this wonderful model.
I must say, it is probably the best model I've tried so far for sample-by-sample generation.
I do have a question, though: I constantly end up with noise bursts or noise segments in my generated audio; sometimes the noise covers the whole generation, sometimes it only appears at a certain point.
This does not seem to improve with epochs (after a couple of days of training, I still get the same noise bursts), and it happens across different datasets (mostly classical music). Oddly enough, the results seem to be better on noisy datasets, such as rain or water sounds :)
Does anyone have an idea of how to avoid or limit this issue?
Is there a parameter I should fine tune for this?
Thanks again,
Daniele