Add Pop2Piano #21785
Conversation
Hi @ArthurZucker, the implementation is almost ready (tests too), but I feel that the way I implemented this model is not at a decent level, which is why I want you to take a look at it. Please don't mind the docs; I will change them afterwards. EDIT: Please ignore this.
(Here is the author of pop2piano)
@sweetcocoa Thanks for your comments, the HF team has helped me a lot in this integration.
For solving the import issues, you have to create a …
Hi @ArthurZucker, thanks for your comment!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @ArthurZucker, sorry for the delay, but the tests are green now! Please review.
Nice work already! Left a few comments, mostly let's remove as many dependencies as we can + remove one-liners. Congrats on getting matching logits btw 🔥
Hi @ArthurZucker, sorry for the huge delay. I have made most of the changes that you asked for; the ones I didn't make are listed below -
The licensing page of … Also, please forgive me if I missed something; I will change it in a future commit.
Hey! Will try to have a look asap
looks fine
Pinging @sanchit-gandhi for a review too!
Already very clean! Thanks for all your efforts 🔥 will let @sanchit-gandhi do a review too!
Hi @ArthurZucker, I have pushed the changes you requested except this one (I have asked for some help about that on the respective thread). BTW, the failed …
Looks a lot better thanks!
Hi @ArthurZucker, I have added the … Please let me know whether any more changes are needed to the tokenizer.
A lot better! Thanks for the simplification
cc @sanchit-gandhi for a final review!
Thanks for the huge effort on iterating here @susnato. In particular, the tokenizer code is pretty tricky, where you have to manage different array types and logic for post-processing. But overall it's looking very clean, so nice job!
Overall, I think the high-level API looks very clean and in keeping with the other transformers audio models. The complexity is buried mainly in the feature extractor and tokenizer, which IMO is fine given how difficult the pre- and post-processing is for this model.
Just a few minor suggestions regarding the feature extractor - otherwise I very much agree with the recent round of changes suggested by @ArthurZucker for the tokenizer!
```python
        return mel_specs

    def log_mel_spectrogram(self, sequence: np.ndarray):
```
Actually, I feel like this could have been a single function that does the mel spectrogram and takes the logarithm - I personally think thin wrappers around functions hurt the readability of code, and it's somewhat against the transformers design where we avoid writing multiple small functions in place of a single larger one. But happy to keep it split into two if that's what @amyeroberts prefers
I don't believe that's part of the design philosophy of transformers! It's true that in some cases, particularly forward passes, we like the logic to be more verbose and contained within the method such that the model logic is clear. Generally though, functions should be small and only do one thing. As the comment highlights, readability and clarity are unfortunately subjective :)
The reason I asked was because in code we should write `fn do_a` and `fn do_b` and then compose with `b(a(x))` instead of `fn do_a_and_b`. As @sanchit-gandhi mentions, `log_mel_spectrogram` is effectively a small wrapper, and if I were to implement it, I would probably just have `mel_spectrogram` and do `np.log(x)` in the `__call__` method. Or have `mel_spectrogram` as a module-level function that the `log_mel_spectrogram` method calls.
I don't think it matters too much, and we can even revert back to a single `log_mel_spectrogram` and split things up in the future if we needed to. @susnato - I'm happy to go with whatever you think is best here.
I like both of your design choices, so either way works for me. Personally I would like to keep it as it is and change it in the future (if needed).
OK, in that case if you don't have a preference, let's just do the log transform in the `__call__` method: it's easier to add methods than to take them away in this library because of backwards-compatibility considerations
Hi @sanchit-gandhi, I have pushed the changes.
Hey @susnato! Thanks for pushing the latest round of changes - would you mind resolving any comment threads that you've addressed, so that the reviewer knows what's been implemented and what's still pending? Thanks!
Hi @sanchit-gandhi, done.
Amazing work!
Thanks for all the rounds of iterating and for being so patient and collaborative whilst we found a design for this model which works in the library. Now that the tokenizer design is settled, the only thing we need to add is a few more tests to make sure it works as expected and doesn't break in the future. Once that's done I think we're good to merge!
```diff
-# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
+# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
```
```python
processor_outputs = processor(audio=input_speech, sampling_rate=sampling_rate, return_tensors="np")

for key in feature_extractor_outputs.keys():
    self.assertAlmostEqual(feature_extractor_outputs[key].sum(), processor_outputs[key].sum(), delta=1e-2)
```
This is quite a big difference considering they should be doing exactly the same thing 🤔
Sorry, I have changed it to `1e-4` in the current version.
```python
# This is the test for a real music from K-Pop genre.
@slow
def test_real_music(self):
```
This is really more of an integration test for the whole pipeline than for the tokenizer itself: many things can change upstream which would affect the output. It's a great test though! Could we move it to `test_modeling_pop2piano.py`?
done
```python
@require_torch
@require_pretty_midi
class Pop2PianoTokenizerTest(unittest.TestCase):
```
It makes sense that we can't use `TokenizerTesterMixin` here. We should add any equivalent tests from that mixin that would apply here though, e.g. `test_get_vocab`, `test_save_and_load_tokenizer`, `test_internal_consistency`.
There should also be tests for this tokenizer's specific logic, e.g. mapping to / from the `token_type` and values.
> There should also be tests for this tokenizer's specific logic e.g. mapping to / from the token_type and values

If I am not wrong, you are talking about the conversion between `token_ids` and tokens (notes here), right? We already have that in the tests - `test_call` and `test_batch_decode`.
In addition to that, I have added the following tests - `test_get_vocab`, `test_save_and_load_tokenizer`, `test_pickle_tokenizer`, `test_padding_side_in_kwargs`, `test_truncation_side_in_kwargs`, `test_right_and_left_padding`, `test_right_and_left_truncation`, `test_padding_to_multiple_of` and `test_padding_with_attention_mask`.
Btw, `test_internal_consistency` is not valid for this tokenizer because the outputs of the feature extractor are used in the tokenizer's `batch_decode` method in order to influence the conversion of `token_ids` to notes, whereas the `__call__` method converts notes to `token_ids` without using/generating similar outputs like the feature extractor.
> If I am not wrong you are talking about conversion between token_ids and tokens (notes here) right? We already have that in the tests - test_call and test_batch_decode.

Yes, that's what I'm talking about. AFAICT, these tests only test the type of the returned objects, not their values. It would be good to have at least one test that makes sure the correct token type is mapped to notes in the `relative_tokens_ids_to_notes` and `notes_to_midi` methods.
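The distinction between type-level and value-level checks can be illustrated with a minimal, hypothetical tokenizer stub. None of these names come from the actual Pop2Piano code; the mapping and function are invented purely for illustration:

```python
import unittest

# Hypothetical id -> note-event mapping, for illustration only.
TOKEN_TO_NOTE = {0: "TIME_SHIFT", 1: "VELOCITY", 2: "NOTE_ON", 3: "NOTE_OFF"}


def ids_to_notes(token_ids):
    """Map a list of token ids to their note-event names."""
    return [TOKEN_TO_NOTE[i] for i in token_ids]


class ToyTokenizerTest(unittest.TestCase):
    def test_ids_to_notes_type_only(self):
        # Type-level check: would still pass if the mapping were wrong.
        self.assertIsInstance(ids_to_notes([2, 3]), list)

    def test_ids_to_notes_values(self):
        # Value-level check: pins down the actual id -> note mapping.
        self.assertEqual(ids_to_notes([2, 3]), ["NOTE_ON", "NOTE_OFF"])


if __name__ == "__main__":
    unittest.main()
```

A value-level test like the second one is what catches a silently shuffled vocabulary, which a type-only assertion would let through.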
Hi @amyeroberts, in the recent push I have extended `test_call` to test for the accuracy of the outputs of the `__call__` method, and created another test (`test_batch_decode_outputs`) to check the accuracy of the outputs of the `batch_decode` method.
Please let me know if any more tests are needed.
Hi @amyeroberts, I have pushed the changes you asked for and added more tokenizer tests.
Amazing work!
Because of the structure of the model, it was a lot more complex than the average model PR to integrate this into the library. Thanks for all your work iterating with us and adding this to the library 🤗❤️
```python
def check_resize_embeddings_Pop2Piano_v1_1(
```
nit: PEP 8 - lowercase capitalization for function names
```diff
-def check_resize_embeddings_Pop2Piano_v1_1(
+def check_resize_embeddings_pop2piano_v1_1(
```
Hi @amyeroberts, I have pushed the change.
The checks are green!
Yes, it was relatively hard, but I learned a lot about this amazing library from this integration!
* init commit * config updated also some modeling * Processor and Model config combined * extraction pipeline(upto before spectogram & mel_conditioner) added but not properly tested * model loading successful! * feature extractor done! * FE can now be called from HF * postprocessing added in fe file * same as prev commit * Pop2PianoConfig doc done * cfg docs slightly changed * fe docs done * batched * batched working! * temp * v1 * checking * trying to go with generate * with generate and model tests passed * before rebasing * . * tests done docs done remaining others & nits * nits * LogMelSpectogram shifted to FeatureExtractor * is_tf rmeoved from pop2piano/init * import solved * tokenization tests added * minor fixed regarding modeling_pop2piano * tokenizer changed to only return midi_object and other changes * Updated paper abstract(Camera-ready version) (#2) * more comments and nits * ruff changes * code quality fix * sg comments * t5 change added and rebased * comments except batching * batching done * comments * small doc fix * example removed from modeling * ckpt * forward it compatible with fe and generation done * comments * comments * code-quality fix(maybe) * ckpts changed * doc file changed from mdx to md * test fixes * tokenizer test fix * changes * nits done main changes remaining * code modified * Pop2PianoProcessor added with tests * other comments * added Pop2PianoProcessor to dummy_objects * added require_onnx to modeling file * changes * update .md file * remove extra line in index.md * back to the main index * added pop2piano to index * Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too * changes * added return types to 2 tokenizer methods * the PR build test might work now * added backends * PR build fix * vocab added * comments * refactored vocab into 1 file * added conversion script * comments * essentia version changed in .md * comments * more tokenizer tests added * minor fix * tests extended for outputs acc check * small fix --------- Co-authored-by: Jongho Choi <[email protected]>
What does this PR do?
Adds Pop2Piano model to HuggingFace.
Fixes #20126
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker