v3.0.4 PerTok tokenizer and Attribute Controls

@Natooz Natooz released this 15 Sep 10:42
· 11 commits to main since this release
7ea77d4

This release introduces the PerTok tokenizer by Lemonaide AI, attribute control tokens, and minor fixes.

Highlights

PerTok: Performance Tokenizer

(associated paper to be released)

PerTok was developed by Julian Lenz (@JLenzy) at Lemonaide AI to capture expressive timing in symbolic scores while keeping sequence lengths competitively low. It achieves this by dividing time differences into Macro and Micro categories and introducing a new MicroTime token type. Subtle deviations from the quantized beat are represented with these Timeshift tokens.
Furthermore, PerTok lets you encode an unlimited number of note subdivisions by allowing multiple, overlapping values within the 'beat_res' parameter of the TokenizerConfig.

The micro timing tokens will be extended to all tokenizers in a future update.
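
To make the Macro/Micro idea more concrete, here is a minimal sketch in plain Python. It is not PerTok's actual implementation: the function names, the rounding rule, and the "pick the grid with the smallest residual" strategy are all assumptions made for illustration. It only shows how a time difference in ticks could be split into a coarse grid position plus a small signed deviation, and how several overlapping subdivisions (for example sixteenths and triplets) could coexist.

```python
# Illustrative sketch only: names and rounding strategy are hypothetical,
# not taken from the PerTok source code.

def split_macro_micro(delta_ticks: int, ticks_per_beat: int, steps_per_beat: int) -> tuple[int, int]:
    """Split a non-negative time difference into (macro grid steps, micro residual in ticks)."""
    # Assumes ticks_per_beat is divisible by steps_per_beat.
    ticks_per_step = ticks_per_beat // steps_per_beat
    # Round to the nearest grid step (the "Macro" part).
    macro_steps = (delta_ticks + ticks_per_step // 2) // ticks_per_step
    # The signed deviation from that grid step is the "Micro" part.
    micro_ticks = delta_ticks - macro_steps * ticks_per_step
    return macro_steps, micro_ticks


def best_grid(delta_ticks: int, ticks_per_beat: int, resolutions: list[int]) -> tuple[int, int, int]:
    """Among several overlapping subdivisions, pick the one with the smallest deviation.

    Returns (steps_per_beat, macro_steps, micro_ticks).
    """
    candidates = [
        (res, *split_macro_micro(delta_ticks, ticks_per_beat, res))
        for res in resolutions
    ]
    return min(candidates, key=lambda c: abs(c[2]))


if __name__ == "__main__":
    # 480 ticks per beat, sixteenth-note grid (4 steps per beat).
    print(split_macro_micro(250, 480, 4))
    # Sixteenths and triplets overlapping: the triplet grid fits 330 ticks better.
    print(best_grid(330, 480, [4, 3]))
```

With a grid like this, most notes only need a Macro token, and a Micro token carries the small expressive offset when there is one, which is how the sequence length can stay low.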

Attribute Control tokens

Attribute controls are additional tokens that let you train a model so that it can be controlled during inference, by conditioning it to predict music with specific features.
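
As a rough illustration of the principle, the sketch below prepends a control token describing a simple feature (average notes per bar) to a token sequence. Everything here is hypothetical: the `NoteDensity_k` token name, the feature itself, and the helper function are assumptions for the example and do not reflect the attribute controls actually shipped in this release. A model trained on sequences prefixed this way can then be steered at inference by choosing the prefix token.

```python
# Illustrative sketch only: the token name and feature are hypothetical,
# not the library's actual attribute controls.

def add_density_control(
    tokens: list[str], num_notes: int, num_bars: int, max_density: int = 8
) -> list[str]:
    """Prepend a control token encoding the average number of notes per bar."""
    # Integer average, guarded against empty pieces and capped to a fixed vocabulary.
    density = min(num_notes // max(num_bars, 1), max_density)
    return [f"NoteDensity_{density}", *tokens]


if __name__ == "__main__":
    sequence = ["Pitch_60", "Pitch_64"]
    # Training time: the control token is computed from the music itself.
    print(add_density_control(sequence, num_notes=12, num_bars=4))
    # Inference time: one would instead choose the control token and let the model continue.
```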

What's Changed

  • updates to Example_HuggingFace_Mistral_Transformer.ipynb by @briane412 in #164
  • _model_name is now a protected property by @Natooz in #165
  • Fixing docs for tokenizer training by @Natooz in #167
  • Default continuing_subword_prefix when splitting token sequences by @Natooz in #168
  • small bug fix in MIDI pretokenization by @shenranwang in #170
  • adding no_preprocess_score argument when tokenizing by @Natooz in #172
  • TokSequence summable, concatenate_track_sequences arg for MMM by @Natooz in #173
  • Docs update by @Natooz in #175
  • Fixing split methods for empty files (no tracks and/or no notes) by @Natooz in #177
  • Logo now with white outer stroke by @Natooz in #180
  • Attribute controls feature by @helloWorld199 in #181
  • better distinction between one_token_stream and config.one_token_stream_for_programs by @Natooz in #182
  • making sure MMM token sequences are not concatenated when splitting them per bar/beat in tokenizer_training_iterator.py by @Natooz in #183
  • rST Documentation fixes by @scottclowe in #184
  • Bump actions/stale from 5.1.1 to 9.0.0 by @dependabot in #185
  • Bump actions/download-artifact from 3 to 4 by @dependabot in #186
  • Bump codecov/codecov-action from 3.1.0 to 4.5.0 by @dependabot in #187
  • Bump actions/upload-artifact from 3 to 4 by @dependabot in #188
  • Fixing bugs caused by changes from symusic v0.5.0 by @Natooz in #192
  • use_velocities and use_duration configuration parameters by @Natooz in #193
  • collator now handles decoder input ids (seq2seq models) by @Natooz in #194
  • PerTok Tokenizer by @JLenzy in #191

Full Changelog: v3.0.3...v3.0.4