Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Org #86

Merged
merged 15 commits into from
Nov 12, 2019
Merged

Org #86

merged 15 commits into from
Nov 12, 2019

Conversation

apmcleod
Copy link
Collaborator

@apmcleod apmcleod commented Oct 31, 2019

Fixes a number of issues:

Additionally, this does some work on #37.

Please also check #37, #27, and #22. For these, I am happy to not fix them. If you agree, remove the milestone marker from them (and optionally also close the issues).

… codebase. INCLUDING CommandDataset! Fixes to #42
…'s output size to the correct length. The trainers don't need this info (or can just read the length of the output in the case of creating the conf_mat). This will throw some errors if a Dataset returns a data point larger than the values read from degreadation_name (which should never happen). Fixes #81
@apmcleod apmcleod requested a review from JamesOwers October 31, 2019 05:52
@@ -91,9 +99,11 @@ def pre_process(df, sort=False):
df : pd.DataFrame
The postprocessed dataframe.
"""
df = df.loc[:, NOTE_DF_SORT_ORDER]
Copy link
Collaborator Author

@apmcleod apmcleod Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(RE: #37)

There is an argument for removing this line:

  • Nothing would err with too many columns. Nor would any existing columns be removed. Rather, added notes would just have NaNs in those columns, which is fine.
  • Some degradations (remove_note, shift_XXX, join_notes) would still work fine with too many columns.
  • This would cause dfs with too few columns to work for remove_note (still fail on the others).
  • However, this would cause the resulting dfs to have floats instead of ints if a new note is added (since NaN can't be an int).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we put it there in the first place? Is there somewhere that insists on a column order (presume so!).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line also removes additional columns, which would break some degradations. The reordering is just a happy accident.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I see. I think in another module I wanted the dataframes to have a specific set of columns and in a specific order when they were first created, pre-degradation. I don't see a reason for this restriction in your pre_process function for degradations here. I think you're good to remove and see if it breaks any tests!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will break this test I added: #86 (comment)

But no others. Although that's only because we never test with incorrect columns.

The point with having this line is for if people make their own dfs, but I think it's fine to assume they have a reasonable set, and weird things will happen if not (eg. Nan's and floats appear)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't change this easily now from Switzerland

'onset': [0.5, 100.5, 200.5, 200.5],
'pitch': [10, 20, 30.5, 40],
'dur': [100, 100, 100.5, 100],
'extra': [5, 5, 'apple', None]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove this line if you remove the previous line.

os.environ["PYTHONWARNINGS"] = "ignore" # Also affect subprocesses
warnings.showwarning = print_warn_msg_only
# TODO: This should ideally be 'once', but it doesn't work for some reason
warnings.filterwarnings('ignore', message='.* exceeds given seq_len')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #88


warnings.showwarning = print_warn_msg_only
# TODO: This should ideally be 'once', but it doesn't work for some reason
warnings.filterwarnings('ignore', message='.* exceeds given seq_len')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #88


warnings.showwarning = print_warn_msg_only
# TODO: This should ideally be 'once', but it doesn't work for some reason
warnings.filterwarnings('ignore', message='.* exceeds given seq_len')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #88

Copy link
Owner

@JamesOwers JamesOwers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bon, I think i'd done this in another branch, but not pushed to main. Silly me! ugh, was trying to comment on specific commits...why this so difficult!

@JamesOwers
Copy link
Owner

JamesOwers commented Nov 12, 2019

I think I just managed to cancel all my comments on files...great! Not my day! Anyway, I'm going to approve all this and test as I'm making ACME v1.0.

I'll reply inline to your initial comment:

Fixes a number of issues:

LGTM

throughout the code: when creating the pianoroll and command datasets (make sure this seems ok for command), and as defaults for arg parsing.
Great! Now I can't get these wrong!

I like

Only thing I don't like is the breaking of the naming convention that only classes begin with an upper case letter. But bugger it. If I care enough, I'll either rename them, or make them classes with a call in future.

  • Fixes Fix the absolute diatribe of warnings and logs spewed out for make_dataset.py #52 by adding a --verbose (-v) option to make_dataset, and suppressing warnings from the degradations within the script (because we handle them with code). This allows other (important) warnings to print. Also adds verbose option to the downloaders, and defaults all verbose args to False (there and in the filesystem_utils). Also, places the unimportant (I'd say) warnings in filesystem_utils behind the --verbose flag. Shortened some of the tqdm texts (it's just long filepaths).

This is great. As I say in the issue - minimal printing is good

Much prefer this default

Additionally, this does some work on #37.

Please also check #37, #27, and #22. For these, I am happy to not fix them. If you agree, remove the milestone marker from them (and optionally also close the issues).

Have commented on all of these and will move about when I do a full sweep of all issues.

@JamesOwers JamesOwers merged commit f2a2712 into master Nov 12, 2019
@apmcleod apmcleod deleted the org branch November 16, 2019 05:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment