Fixup all checkpointing examples #323

muellerzr · 2022-04-21T16:14:36Z

Fix logic in all checkpointing examples

What does this add?

This PR fixes a number of bugs currently present in the save/load examples

Who is it for?

Closes #322

Why is it needed?

While working on #322 and writing tests, I noticed that some of the examples did not behave the way I expected.

It also didn't make logical sense to me that if we resume at epoch 1, the numbering starts at epoch 0 again (and thus, our checkpoint saves do as well!). So, that behavior had to change slightly.
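To make the intended numbering concrete, here is a minimal sketch of resume logic, not the code from the example scripts. The helper names `starting_epoch_from_checkpoint` and `run_epochs` are hypothetical, and the sketch assumes a checkpoint folder named `epoch_N` was written at the end of epoch N, so training resumes at N + 1.

```python
import re


def starting_epoch_from_checkpoint(path):
    """Return the epoch to resume at, given a folder name such as 'out/epoch_0'.

    Assumption: 'epoch_N' was saved at the end of epoch N, so we resume at N + 1.
    """
    match = re.search(r"epoch_(\d+)$", path.rstrip("/"))
    if match is None:
        # Not an epoch checkpoint: start from the beginning.
        return 0
    return int(match.group(1)) + 1


def run_epochs(num_epochs, resume_from=None):
    """Return the checkpoint names a run would write (stand-in for a training loop)."""
    start = starting_epoch_from_checkpoint(resume_from) if resume_from else 0
    saved = []
    for epoch in range(start, num_epochs):
        # A real script would train here, then save its state under this name.
        saved.append(f"epoch_{epoch}")
    return saved
```

With this numbering a resumed run never writes `epoch_0` again, so earlier checkpoints are not overwritten and the logged epoch labels continue from where the previous run stopped.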

HuggingFaceDocBuilderDev · 2022-04-21T16:23:22Z

The documentation is not available anymore as the PR was closed or merged.

muellerzr · 2022-04-21T17:45:44Z

@sgugger 50/50 on whether to consider these "slow" tests or not. They add ~1.5 min to the CI

sgugger

LGTM! Just one question on the tests added (nice addition btw :-) )

Thanks a lot for fixing those!

sgugger · 2022-04-21T17:51:53Z

tests/test_examples.py

            with mock.patch.object(sys, "argv", testargs):
                checkpointing.main()
-                self.assertTrue(os.path.exists(os.path.join(tmpdir, "epoch_0")))
+            with self.assertRaises(AssertionError):
+                mocked_print.assert_any_call("epoch 0:", {"accuracy": 0.5, "f1": 0.0})


How are we sure we will get those values exactly?

This stems from the scheduler we use in the examples: it makes it impossible for the model to train quickly, so we always get an accuracy of 0.5 and an F1 of 0 for all of our epochs. That is why none of these example tests check whether we get "good" accuracy; they only test behavior that is independent of the results.

But I dug deeper and found mock.ANY. In this case we only care about matching the epoch * text, not the results. So instead we have something like this:

    dummy_results = {"accuracy": mock.ANY, "f1": mock.ANY}
    with self.assertRaises(AssertionError):
        mocked_print.assert_any_call("epoch 0:", dummy_results)

Which helps me sleep much better at night

You never know if your model might have learned a tiny something, so yes, much better :-)
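For readers unfamiliar with `mock.ANY`, the following standalone sketch shows the matching behavior discussed above. It is not the test from this PR: the `mocked_print` calls are made by hand to stand in for an example script that resumed from a checkpoint, and the metric values are invented for illustration.

```python
from unittest import mock

mocked_print = mock.Mock()

# Stand-in for a resumed run that only logs epoch 1 onward (values are made up).
mocked_print("epoch 1:", {"accuracy": 0.5, "f1": 0.0})
mocked_print("epoch 2:", {"accuracy": 0.68, "f1": 0.81})

# mock.ANY compares equal to anything, so only the keys and the label must match.
dummy_results = {"accuracy": mock.ANY, "f1": mock.ANY}

# These pass regardless of the metric values that were logged.
mocked_print.assert_any_call("epoch 1:", dummy_results)
mocked_print.assert_any_call("epoch 2:", dummy_results)

# "epoch 0:" was never logged, so this assertion fails as the test expects.
epoch_0_logged = True
try:
    mocked_print.assert_any_call("epoch 0:", dummy_results)
except AssertionError:
    epoch_0_logged = False
```

Because the expected dictionary carries no concrete numbers, the check keeps working even if the model learns something and the metrics change.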

Fix all issues

3a4af7e

muellerzr added the bug Something isn't working label Apr 21, 2022

muellerzr requested a review from sgugger April 21, 2022 16:14

Rm extra print

b844c38

muellerzr marked this pull request as draft April 21, 2022 16:27

muellerzr removed the request for review from sgugger April 21, 2022 16:27

muellerzr added 5 commits April 21, 2022 12:40

Diff catch

986f120

Try seperating out the tests?

04f3977

Try with different setup

1036afc

Reduce time hopefully

454d36f

Revert

fb8dda3

muellerzr marked this pull request as ready for review April 21, 2022 17:44

muellerzr requested a review from sgugger April 21, 2022 17:44

sgugger approved these changes Apr 21, 2022

Use mocking on the results

4f9ed92

muellerzr merged commit 3e14dd1 into main Apr 21, 2022

muellerzr deleted the pass-logic branch April 21, 2022 18:25
