Org #86

apmcleod · 2019-10-31T05:52:42Z

Fixes a number of issues:

- Fixes #76 (Change Error Identification -> Error Location throughout) by renaming task 3 to Location.
- Fixes #42 (Decide what datasets to include in ACME v1.0, and what parameters to use for the degs) with default values for all degradation params. I also use the default min/max pitches throughout the code: when creating the pianoroll and command datasets (make sure this seems ok for command), and as defaults for arg parsing.
- Fixes #66 (If command/pianoroll datasets only partially made, they are not re-made in full) by adding a --reformat arg to train_task.py, forcing csv re-creation.
- Fixes #80 (Support evaluation for people not using our trainers) by adding evaluators for all tasks in mdtk.eval.py. Check how I've named the task 4 evaluator (as ErrorCorrection, a la the others, and also as helpfulness = ErrorCorrection).
- Fixes #52 (Fix the absolute diatribe of warnings and logs spewed out for make_dataset.py) by adding a --verbose (-v) option to make_dataset, and suppressing warnings from the degradations within the script (because we handle them with code). This allows other (important) warnings to print. Also adds a verbose option to the downloaders, and defaults all verbose args to False (there and in the filesystem_utils). Also places the unimportant (I'd say) warnings in filesystem_utils behind the --verbose flag. Shortened some of the tqdm texts (they were just long filepaths).
- Fixes #87 (Probably make --clear default in make_dataset.py) by making clearing the output and input directories the default. Using --stale-data causes --input-dir not to be cleared. There is no way to avoid clearing --output-dir; I just cannot think of a use case for keeping it.
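
For reviewers who want to see the flag behaviour from the last two items in one place, here is a minimal argparse sketch. Only the flag names (--verbose/-v, --stale-data, --input-dir, --output-dir) come from this description; the parser layout, the placeholder default paths, and the `dirs_to_clear` helper are assumptions for illustration, not the actual make_dataset.py code.

```python
import argparse


def build_parser():
    """Hypothetical parser: only the flag names are taken from the PR text."""
    parser = argparse.ArgumentParser(description="Sketch of the make_dataset flags")
    parser.add_argument("--input-dir", default="input")    # placeholder default
    parser.add_argument("--output-dir", default="output")  # placeholder default
    # Verbose output is off unless explicitly requested.
    parser.add_argument("-v", "--verbose", action="store_true")
    # Opt out of clearing --input-dir. There is no opt-out for --output-dir.
    parser.add_argument("--stale-data", action="store_true")
    return parser


def dirs_to_clear(args):
    """Return the directories that would be cleared before a run (hypothetical helper)."""
    dirs = [args.output_dir]
    if not args.stale_data:
        dirs.append(args.input_dir)
    return dirs


default_args = build_parser().parse_args([])
stale_args = build_parser().parse_args(["--stale-data", "-v"])
print(dirs_to_clear(default_args))
print(dirs_to_clear(stale_args))
```

With no flags, both directories are listed for clearing; with --stale-data, only the output directory is.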

Additionally, this does some work on #37.

Please also check #37, #27, and #22. For these, I am happy to not fix them. If you agree, remove the milestone marker from them (and optionally also close the issues).

…'s output size to the correct length. The trainers don't need this info (or can just read the length of the output in the case of creating the conf_mat). This will throw some errors if a Dataset returns a data point larger than the values read from degreadation_name (which should never happen). Fixes #81

apmcleod · 2019-10-31T08:30:42Z

mdtk/degradations.py

@@ -91,9 +99,11 @@ def pre_process(df, sort=False):
    df : pd.DataFrame
        The postprocessed dataframe.
    """
+    df = df.loc[:, NOTE_DF_SORT_ORDER]


(RE: #37)

There is an argument for removing this line:

Nothing would err with too many columns. Nor would any existing columns be removed. Rather, added notes would just have NaNs in those columns, which is fine.

Some degradations (remove_note, shift_XXX, join_notes) would still work fine with too many columns.

Dfs with too few columns would then work for remove_note (though they would still fail on the others).

However, this would cause the resulting dfs to have floats instead of ints if a new note is added (since NaN can't be an int).

Why did we put it there in the first place? Is there somewhere that insists on a column order? (I presume so!)

This line also removes additional columns, which would break some degradations. The reordering is just a happy accident.
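
The trade-off discussed in this thread can be reproduced in a few lines of pandas. This is only a sketch, not the mdtk code: `NOTE_COLUMNS` is an assumed stand-in for `NOTE_DF_SORT_ORDER` (whose actual contents are not shown here), and the note values are made up.

```python
import pandas as pd

# Assumed stand-in for NOTE_DF_SORT_ORDER; the real list is not shown in this thread.
NOTE_COLUMNS = ["onset", "pitch", "dur"]

# A user-made df with an extra column and a non-standard column order.
notes = pd.DataFrame(
    {"pitch": [60, 62], "extra": [5, 5], "onset": [0, 100], "dur": [50, 50]}
)

# With the line: extra columns are dropped and the remaining ones are reordered.
selected = notes.loc[:, NOTE_COLUMNS]
print(list(selected.columns))

# Without the line: a newly added note has no value for 'extra', so pandas
# fills in NaN and that column is upcast from int to float.
new_note = pd.DataFrame({"onset": [200], "pitch": [64], "dur": [50]})
combined = pd.concat([notes, new_note], ignore_index=True)
print(combined.dtypes)
```

Columns present in both frames keep their integer dtype; only the column missing from the new note picks up a NaN and becomes float.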

Ah yes, I see. I think in another module I wanted the dataframes to have a specific set of columns and in a specific order when they were first created, pre-degradation. I don't see a reason for this restriction in your pre_process function for degradations here. I think you're good to remove and see if it breaks any tests!

It will break this test I added: #86 (comment)

But no others. Although that's only because we never test with incorrect columns.

The point of having this line is to handle people making their own dfs, but I think it's fine to assume they have a reasonable set, and that weird things will happen if not (e.g. NaNs and floats appear).

I can't change this easily now from Switzerland

apmcleod · 2019-10-31T08:32:43Z

mdtk/tests/test_degradations.py

+        'onset': [0.5, 100.5, 200.5, 200.5],
+        'pitch': [10, 20, 30.5, 40],
+        'dur': [100, 100, 100.5, 100],
+        'extra': [5, 5, 'apple', None]


Also remove this line if you remove the previous line.

apmcleod · 2019-11-01T06:38:20Z

baselines/eval_task.py

-    os.environ["PYTHONWARNINGS"] = "ignore" # Also affect subprocesses
+warnings.showwarning = print_warn_msg_only
+# TODO: This should ideally be 'once', but it doesn't work for some reason
+warnings.filterwarnings('ignore', message='.* exceeds given seq_len')
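
For context, a self-contained sketch of what these three lines do. The body of `print_warn_msg_only` is not shown in this diff, so the version below is a hypothetical stand-in that prints only the warning text, and the two warning messages are invented. The sketch wraps everything in `catch_warnings()` so it does not leak global state, whereas the diff sets the hook at module level.

```python
import warnings

shown = []  # text of every warning that actually gets displayed


def print_warn_msg_only(message, category, filename, lineno, file=None, line=None):
    """Hypothetical stand-in: show only the message, without file or line info."""
    shown.append(str(message))
    print(message)


with warnings.catch_warnings():
    # Start from a known state so the demo does not depend on outer filters.
    warnings.simplefilter("always")
    warnings.showwarning = print_warn_msg_only
    # 'message' is a regex matched against the start of the warning text.
    warnings.filterwarnings("ignore", message=".* exceeds given seq_len")

    warnings.warn("Excerpt of length 300 exceeds given seq_len")  # suppressed
    warnings.warn("metadata.csv is missing a column")             # displayed
```

Only the second warning reaches the custom hook; the first is dropped by the message filter before display.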


apmcleod · 2019-11-01T06:39:31Z

baselines/train_task.py

-
+warnings.showwarning = print_warn_msg_only
+# TODO: This should ideally be 'once', but it doesn't work for some reason
+warnings.filterwarnings('ignore', message='.* exceeds given seq_len')


apmcleod · 2019-11-01T06:39:34Z

baselines/train_task.py

-
+warnings.showwarning = print_warn_msg_only
+# TODO: This should ideally be 'once', but it doesn't work for some reason
+warnings.filterwarnings('ignore', message='.* exceeds given seq_len')


JamesOwers

~~Bon, I think I'd done this in another branch, but not pushed to main. Silly me!~~ Ugh, was trying to comment on specific commits... why is this so difficult!

JamesOwers · 2019-11-12T15:59:17Z

I think I just managed to cancel all my comments on files...great! Not my day! Anyway, I'm going to approve all this and test as I'm making ACME v1.0.

I'll reply inline to your initial comment:

> Fixes a number of issues:
>
> Fixes #76 (Change Error Identification -> Error Location throughout) by renaming task 3 to Location.

LGTM

> Fixes #42 (Decide what datasets to include in ACME v1.0, and what parameters to use for the degs) with default values for all degradation params. I also use the default min/max pitches throughout the code: when creating the pianoroll and command datasets (make sure this seems ok for command), and as defaults for arg parsing.

Great! Now I can't get these wrong!

> Fixes #66 (If command/pianoroll datasets only partially made, they are not re-made in full) by adding a --reformat arg to train_task.py, forcing csv re-creation.

I like

> Fixes #80 (Support evaluation for people not using our trainers) by adding evaluators for all tasks in mdtk.eval.py. Check how I've named the task 4 evaluator (as ErrorCorrection, a la the others, and also as helpfulness = ErrorCorrection).

Only thing I don't like is the breaking of the naming convention that only classes begin with an upper case letter. But bugger it. If I care enough, I'll either rename them, or make them classes with a call in future.

> Fixes #52 (Fix the absolute diatribe of warnings and logs spewed out for make_dataset.py) by adding a --verbose (-v) option to make_dataset, and suppressing warnings from the degradations within the script (because we handle them with code). This allows other (important) warnings to print. Also adds verbose option to the downloaders, and defaults all verbose args to False (there and in the filesystem_utils). Also, places the unimportant (I'd say) warnings in filesystem_utils behind the --verbose flag. Shortened some of the tqdm texts (it's just long filepaths).

This is great. As I say in the issue - minimal printing is good

> Fixes #87 (Probably make --clear default in make_dataset.py) by making clearing the output and input directory default. Using --stale-data causes --input-dir not to be cleared. There is no way not to clear --output-dir. I just cannot think of a use case for this.

Much prefer this default

> Additionally, this does some work on #37.
>
> Please also check #37, #27, and #22. For these, I am happy to not fix them. If you agree, remove the milestone marker from them (and optionally also close the issues).

Have commented on all of these and will move about when I do a full sweep of all issues.

apmcleod added 7 commits October 31, 2019 10:50

- 9968218: Changed Identification to Location with sed. Fixes #76
- c504519: Added reasonable default values for all degs, and use them throughout codebase. INCLUDING CommandDataset! Fixes to #42
- d5c9eea: Added --reformat arg to train_task.py to force re-creation of format csvs. Fixes #66
- 6afee70: All degs round df and convert to int. Related to #37.
- cf3935b: All degs now handle extra and out-of-order columns (by removing extras and re-ordering existing). Related to #37.
- 36d35bf: Added evaluators for all tasks independent of our trainers/models. Fixes #80.

apmcleod requested a review from JamesOwers October 31, 2019 05:52

apmcleod commented Oct 31, 2019

apmcleod added 8 commits November 1, 2019 11:34

- 0d0e270: Added --verbose (-v) option to make_dataset, and suppress printing unless that is used. Fixes #52
- 4aaba70: Always clear --output-dir, default to clearing --input-dir unless --stale-data is used. Fixes #87
- bbada68: Added progress bar for local midi and csv
- 10cd46a: Fix for errors in midi files
- e1e6160: Fixed bug garbling metadata.csv
- a1b7ab5: Fixed the last min/max pitch
- 166e219: Changed warning filters on train/eval to simpler. Should fix this.
- e9d324a: Added note to trainer about df conversion

apmcleod commented Nov 1, 2019

JamesOwers reviewed Nov 12, 2019

JamesOwers merged commit f2a2712 into master Nov 12, 2019

apmcleod deleted the org branch November 16, 2019 05:19
