Set stored kcal to healthy kcal for testing #46941

actual-nh · 2021-01-22T15:02:14Z

Summary

Infrastructure "Set stored kcal to healthy kcal for testing"

Purpose of change

Randomizing test orders (#46473) reveals that one type of problematic interference is due to variable stored calories from prior activity.

Describe the solution

In clear_character() (in tests/player_helpers.cpp), set stored kcal to healthy kcal. (Note: Setting to any less causes errors; setting to any more puts BMI above 25.)

Testing

See #46473. (From that, some additional setting of vitamins may also be needed, but this is a start.)

Additional context

~~See #46934 for another case where it appears likely this will help.~~ Never mind on that one; a more frequent symptom is the "test subject" starving to death in the middle of a test...

Randomizing test orders reveals that one type of problematic interference is due to variable stored calories from prior activity. (cherry picked from commit 306bfb4cbd817d4db60e92807025549050a576ff)

BrettDong · 2021-01-28T01:58:38Z

Probably you need to update the numbers in unit tests:

../tests/weary_test.cpp:170: FAILED:
1371   CHECK( info.transition_minutes( 0, 1, 130_minutes ) == Approx( 130 ).margin( 5 ) )
1372 with expansion:
1373   120 == Approx( 130.0 )
1374 with messages:
1375   Weariness: 3 Max Full Exert: EXTRA_EXERCISE Mult: 1
1376   BMR: 1741 Intake: 932 Tracker: 7 Thresh: 846 At: 0
1377   Cal: 55393/55000 Fatigue: 289 Morale: 0 Wgt: 76825 (BMI 25.1)
1378   Transition: Weariness lvl from 0 to 1 at 120 min (W 1024 Th 1020)

actual-nh · 2021-01-28T02:22:28Z

No longer applicable: ~~I am not sure why the Github build tests are still showing up as "in progress" - the g++-7 one had test failures, and the others were accordingly cancelled.~~

Probably you need to update the numbers in unit tests:

Looks like it. The debug_weary_info output at the start indicates the prior test had formerly left the calories at over 55000, influencing the test; now, it isn't (at least via calories). That the results differed between g++-7 and the Travis+Appveyor is interesting, and may imply it would be preferable to set those differing by 10 between actual and expected to the mid-point.

../tests/weary_test.cpp:170: FAILED:
1371   CHECK( info.transition_minutes( 0, 1, 130_minutes ) == Approx( 130 ).margin( 5 ) )
1372 with expansion:
1373   120 == Approx( 130.0 )

1377 Cal: 55393/55000 Fatigue: 289 Morale: 0 Wgt: 76825 (BMI 25.1)
1378 Transition: Weariness lvl from 0 to 1 at 120 min (W 1024 Th 1020)

…anch 'master' of https://github.com/CleverRaven/Cataclysm-DDA into set_test_kcal

Previous weary testing was distored by variable incoming kcal stored, with the current default test order giving more kcal than the expected 55,000.

actual-nh · 2021-01-30T16:44:13Z

Interesting. The earlier build had given differing results between the General (g++-7) and Appveyor (and Travis - don't know yet on this one for Travis; seems to be delayed) for the 24-hour weary digging task - different enough that I couldn't set the expected test results to cover both of them:

Previous Travis:

  Transition: Weariness lvl from 0 to 1 at 120 min (W 1024 Th 1020)
  Transition: Weariness lvl from 1 to 2 at 250 min (W 2020 Th 1000)
  Transition: Weariness lvl from 2 to 3 at 360 min (W 2735 Th 983)
  Transition: Weariness lvl from 3 to 4 at 470 min (W 3228 Th 968)
  Transition: Weariness lvl from 4 to 5 at 590 min (W 3558 Th 951)
  Transition: Weariness lvl from 5 to 6 at 715 min (W 3805 Th 932)
  Transition: Weariness lvl from 6 to 7 at 810 min (W 3945 Th 920)
  Transition: Weariness lvl from 7 to 8 at 885 min (W 4059 Th 910)

Previous Appveyor:

  Transition: Weariness lvl from 0 to 1 at 120 min (W 1027 Th 1020)
  Transition: Weariness lvl from 1 to 2 at 250 min (W 2023 Th 1000)
  Transition: Weariness lvl from 2 to 3 at 360 min (W 2739 Th 983)
  Transition: Weariness lvl from 3 to 4 at 465 min (W 3207 Th 968)
  Transition: Weariness lvl from 4 to 5 at 590 min (W 3549 Th 951)
  Transition: Weariness lvl from 5 to 6 at 715 min (W 3795 Th 933)
  Transition: Weariness lvl from 6 to 7 at 815 min (W 3941 Th 919)
  Transition: Weariness lvl from 7 to 8 at 890 min (W 4056 Th 909)

Previous General (g++7):

  Transition: Weariness lvl from 0 to 1 at 125 min (W 1078 Th 1019)
  Transition: Weariness lvl from 1 to 2 at 250 min (W 2023 Th 1000)
  Transition: Weariness lvl from 2 to 3 at 360 min (W 2737 Th 983)
  Transition: Weariness lvl from 3 to 4 at 470 min (W 3208 Th 968)
  Transition: Weariness lvl from 4 to 5 at 600 min (W 3574 Th 949)
  Transition: Weariness lvl from 5 to 6 at 735 min (W 3765 Th 930)
  Transition: Weariness lvl from 6 to 7 at 840 min (W 3928 Th 916)
  Transition: Weariness lvl from 7 to 8 at 925 min (W 4037 Th 905)

Now, Appveyor (and General g++-7) match(es) the General (g++-7) one above. I think I'll try using sufficiently close to the General/g++-7 numbers above.

BrettDong · 2021-01-30T16:50:12Z

What is the reason do you think that the results differ? Is it because some unit test elsewhere corrupted internal state, or is it related to compiler / platform?

At least currently, two different sources (Github General build g++-7 and Appveyor) are giving the same numbers for the 24-hour digging weary test. This updates the target test numbers to match these, with a bit of wiggle room in the direction of the earlier results (partially since Travis hasn't given any results yet).

actual-nh · 2021-01-30T16:58:41Z

What is the reason do you think that the results differ? Is it because some unit test elsewhere corrupted internal state, or is it related to compiler / platform?

Appveyor and the others differ in whether tests are run in parallel, so it's difficult to tell there. The Github and Travis runs should not be differing in test order, so there is at least an intermittent compiler/platform issue.

BrettDong · 2021-01-30T17:02:38Z

Does it give the exact same result if run unit tests with the same RNG seed and on the exact same environment, say locally, multiple times?

actual-nh · 2021-01-30T17:06:14Z

Does it give the exact same result if run unit tests with the same RNG seed and on the exact same environment, say locally, multiple times?

Actually, running just the weary tests 3 times locally, with different orderings (not much reordering for 3 tests, admittedly...) and different RNG seeds, gives consistent results - different from any of the above. I'll try doing a run of the full set of tests locally, to see if those match the earlier local results or something else, but it will take a while.

EDIT: The one g++-7 is complaining about gives the same test results on 2 runs locally; I had not run it the first couple of times I was checking just the 24-hour tests, which I turn out to have actually done 4 times with the same results (scrolled back far enough to see that!); will try another couple of times running just the weary tests as well as a full test run. That the Github g++-7 one is now giving different results, in contrast to those run locally with just the weary tests, says that it's at least partially something from another test - one that uses the differing RNG seed - corrupting internal state.

actual-nh · 2021-01-30T17:18:48Z

And now the Github g++-7 one is giving differing results on one of the two weary_recovery tasks... and I didn't touch the targets for those in the most recent commit! (See edited comment above for some thoughts on that.)

Note that one limitation of running all tests is that there doesn't seem a way to do --success for a subset of tests in Catch - so I'm either only going to see where tests fail, or will need to collect lots and lots of output... I suppose it won't be enough for my disk capacity to have problems.

Oh. It would help to just do the subset of tests that's currently in the same process in build.sh; will try that.

actual-nh · 2021-01-30T20:16:22Z

This one has all non-slow tests:
all_test1_log.txt.gz
Extract:
all_test1_weary_log_extract.txt

This one has just the weary tests, run with randomized order 4 times (2 before any changing of test targets, 2 after most recent update of test targets):
all_weary_tests_log.txt

I'm going to try doing just the weary tests 3 times, in the default order (or the ordering in my test list file - need to check which), to see if the RNG seed is what's causing different results in the second 2 of the above.

actual-nh · 2021-01-30T22:48:45Z

Hrm. I'm going to need to put in a clear_avatar() before the initial debug_weary_info. I had meant this to help tell what was happening just before, but it's getting affected by differing heights (and probably ages). This will be temporary, since it is redundant with do_activity having clear_avatar (unless do_activity has a parameter added for whether to clear the avatar; not sure how acceptable this would be).

Variable initial heights and weights is making having debug_weary_info before do_activity not useful. This adds a clear_avatar before them; unless do_activity is modified to take a flag as to whether to clear the avatar itself, this is likely temporary.

actual-nh · 2021-01-30T23:32:46Z

OK, with that done, there are no differences from different seeds in the default order (which is not dictated by the filter order in a file, BTW):

No differences between seeds, so long as they're in the same order, is also confirmed by various of the below with randomized order, with one slight exception so far.

Let's see if changing the order, but still only running the weary tests, will do things... yes; the problem with my earlier thinking that it wasn't was inadequate sample size, at least for the ones other than the 24 hour tests, looks like:

Different order 1 (assorted/24h/recovery): weary1_orderR_log.txt
Default order (assorted/recovery/24h): weary2_orderR_log.txt; duplicates weary2_log.txt
Different order 2 (24h/recovery/assorted): weary3_orderR_log.txt
Different order 1, new RNG seed: weary4_orderR_log.txt
Different order 3 (recovery/assorted/24h), new RNG seed: weary5_orderR_log.txt
Different order 2, new RNG seed: weary6_orderR_log.txt; duplicates weary3_orderR_log.txt
(Skipped uploading weary7 - duplicate of weary1_orderR_log.txt; it would be very nice if there was some way of specifying the order...)
Different order 4 (24h/assorted/recovery), new RNG seed: weary8_orderR_log.txt
(Skipped uploading weary9 - duplicate of weary3_orderR_log.txt)
(Skipped uploading wearyA - duplicate of weary1_log.txt)
(Skipped uploading wearyB - duplicate of weary3_orderR_log.txt)
(Skipped uploading wearyC - duplicate of weary5_orderR_log.txt)
(Skipped uploading wearyD - duplicate of weary8_orderR_log.txt)
Different order 5 (recovery/24h/assorted): wearyE_orderR_log.txt

My current thinking with regard to test targets, BTW, is that the ones with that set of weary tests first is probably the most reliable for what the test should be doing - provided that there isn't other evidence such as weariness level fluctuations indicating otherwise, which is unfortunately true (at least in my local runs) for both the 24 hour digging task and the assorted 8 and 12 hour digging tasks (and, for that matter, with digging 8 hours then waiting/resting 8 hours). The vehicle work one, with combinations of working and resting, is hard to tell, but until I see a run with fewer fluctuations, 995 instead of 980 seems the way to go.

Weary1_orderR_log.txt and weary4_orderR_log.txt, while not differing in order, do differ slightly - the weary thresholds fluctuate a bit in the vehicle work one:

<   Transition: Weariness lvl from 1 to 0 at 980 min (W 938 Th 940)
<   Transition: Weariness lvl from 0 to 1 at 985 min (W 944 Th 941)
<   Transition: Weariness lvl from 1 to 0 at 995 min (W 888 Th 942)
---
>   Transition: Weariness lvl from 1 to 0 at 980 min (W 938 Th 941)
>   Transition: Weariness lvl from 0 to 1 at 985 min (W 944 Th 942)
>   Transition: Weariness lvl from 1 to 0 at 995 min (W 888 Th 943)

OTOH, this is not true for any of the other orderings.

See comments in CleverRaven#46941 - most specifically issuecomment-770297143.

actual-nh · 2021-01-31T03:02:23Z

The Mac build (in the General build matrix) failed when the crafting_skill_gain test process aborted:
Assertion failed: (test_world != nullptr), function init_global_game_state, file ../tests/test_main.cpp, line 130.
I somehow doubt this is due to my modifications, particularly since they work fine locally. @BrettDong, if you or someone else on a 10.15+ machine could check out the saved cata_test "artifact"? (BTW, I have to wonder whether there is some unexpected interaction - file access/writing most likely - between the different test processes.)

BrettDong · 2021-01-31T03:18:59Z

Your guess is correct. I managed to reproduce this error if I run tests in parallel like build-scripts/build.sh does. I'm looking into this.

BrettDong · 2021-01-31T03:43:08Z

Should be fixed in #47134.

…aster' of https://github.com/CleverRaven/Cataclysm-DDA into set_test_kcal

actual-nh · 2021-01-31T18:21:03Z

I am now somewhat debating whether to remove the added clear_avatar() calls (and the debug_weary_info() calls they make useful); they are otherwise redundant with the same call done in do_activity. But they can give useful information, and there are still problems with weariness according to various of the weary tests (both interaction with other tests and the problematic fluctuation in weary levels during various tests - I may actually add tests to check for those). Any thoughts? @BrettDong? @anothersimulacrum?

anothersimulacrum · 2021-02-01T03:29:02Z

If it gives more information that may help diagnose a test failure, I don't see reason to strip it out.

actual-nh · 2021-02-01T05:00:51Z

If it gives more information that may help diagnose a test failure, I don't see reason to strip it out.

Well, the Travis tests finally completed with no errors; I think this is ready.

actual-nh · 2021-02-04T00:09:08Z

tests/player_helpers.cpp

    // This sets HP to max, clears addictions and morale,
    // and sets hunger, thirst, fatigue and such to zero
    dummy.environmental_revert_effect();


Anyone have any idea whether this originally intended to reset calories along with hunger "and such"?

Start on adding tests for unrealistic fluctuations in weary level; see CleverRaven#46384 (and some cases in CleverRaven#46941) for example problems. The initial tests look for problems with the weary_recovery task of digging for 8 hours then waiting for 8 hours; weary level should not go down in the first 8 hours, and should not go up in the second 8 hours.

* Set stored kcal to healthy kcal for testing Randomizing test orders reveals that one type of problematic interference is due to variable stored calories from prior activity. (cherry picked from commit 306bfb4cbd817d4db60e92807025549050a576ff) * Match (some) weary testing to kcal resetting Previous weary testing was distored by variable incoming kcal stored, with the current default test order giving more kcal than the expected 55,000. * Update numbers to match consensus test results At least currently, two different sources (Github General build g++-7 and Appveyor) are giving the same numbers for the 24-hour digging weary test. This updates the target test numbers to match these, with a bit of wiggle room in the direction of the earlier results (partially since Travis hasn't given any results yet). * Clear_avatar before initial debug_weary_info Variable initial heights and weights is making having debug_weary_info before do_activity not useful. This adds a clear_avatar before them; unless do_activity is modified to take a flag as to whether to clear the avatar itself, this is likely temporary. * Change the last target for the vehicle weary test See comments in CleverRaven#46941 - most specifically issuecomment-770297143. Co-authored-by: actual-nh <[email protected]>

Start on adding tests for unrealistic fluctuations in weary level; see CleverRaven#46384 (and some cases in CleverRaven#46941) for example problems. The initial tests look for problems with the weary_recovery task of digging for 8 hours then waiting for 8 hours; weary level should not go down in the first 8 hours, and should not go up in the second 8 hours. (cherry picked from commit bdd942b)

Set stored kcal to healthy kcal for testing

68e7b79

Randomizing test orders reveals that one type of problematic interference is due to variable stored calories from prior activity. (cherry picked from commit 306bfb4cbd817d4db60e92807025549050a576ff)

BrettDong added the Code: Tests label Jan 22, 2021

actual-nh added 2 commits January 27, 2021 22:19

Make sure all updates are present for checking weary tests - Merge br…

71d1e9d

…anch 'master' of https://github.com/CleverRaven/Cataclysm-DDA into set_test_kcal

Match (some) weary testing to kcal resetting

16cb808

Previous weary testing was distored by variable incoming kcal stored, with the current default test order giving more kcal than the expected 55,000.

Change the last target for the vehicle weary test

732315b

See comments in CleverRaven#46941 - most specifically issuecomment-770297143.

Import fixes for test errors, make sure tries again - Merge branch 'm…

fcddcf2

…aster' of https://github.com/CleverRaven/Cataclysm-DDA into set_test_kcal

actual-nh commented Feb 4, 2021

View reviewed changes

kevingranade merged commit 2f8f32f into CleverRaven:master Feb 6, 2021

actual-nh deleted the set_test_kcal branch February 6, 2021 02:30

actual-nh mentioned this pull request Feb 6, 2021

[CR] Catch weariness fluctuations #47273

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set stored kcal to healthy kcal for testing #46941

Set stored kcal to healthy kcal for testing #46941

actual-nh commented Jan 22, 2021 •

edited

Loading

BrettDong commented Jan 28, 2021 •

edited

Loading

actual-nh commented Jan 28, 2021 •

edited

Loading

actual-nh commented Jan 30, 2021

BrettDong commented Jan 30, 2021

actual-nh commented Jan 30, 2021 •

edited

Loading

BrettDong commented Jan 30, 2021

actual-nh commented Jan 30, 2021 •

edited

Loading

actual-nh commented Jan 30, 2021 •

edited

Loading

actual-nh commented Jan 30, 2021

actual-nh commented Jan 30, 2021

actual-nh commented Jan 30, 2021 •

edited

Loading

actual-nh commented Jan 31, 2021 •

edited

Loading

BrettDong commented Jan 31, 2021

BrettDong commented Jan 31, 2021

actual-nh commented Jan 31, 2021 •

edited

Loading

anothersimulacrum commented Feb 1, 2021

actual-nh commented Feb 1, 2021

actual-nh Feb 4, 2021

Set stored kcal to healthy kcal for testing #46941

Set stored kcal to healthy kcal for testing #46941

Conversation

actual-nh commented Jan 22, 2021 • edited Loading

Summary

Purpose of change

Describe the solution

Testing

Additional context

BrettDong commented Jan 28, 2021 • edited Loading

actual-nh commented Jan 28, 2021 • edited Loading

actual-nh commented Jan 30, 2021

BrettDong commented Jan 30, 2021

actual-nh commented Jan 30, 2021 • edited Loading

BrettDong commented Jan 30, 2021

actual-nh commented Jan 30, 2021 • edited Loading

actual-nh commented Jan 30, 2021 • edited Loading

actual-nh commented Jan 30, 2021

actual-nh commented Jan 30, 2021

actual-nh commented Jan 30, 2021 • edited Loading

actual-nh commented Jan 31, 2021 • edited Loading

BrettDong commented Jan 31, 2021

BrettDong commented Jan 31, 2021

actual-nh commented Jan 31, 2021 • edited Loading

anothersimulacrum commented Feb 1, 2021

actual-nh commented Feb 1, 2021

actual-nh Feb 4, 2021

Choose a reason for hiding this comment

actual-nh commented Jan 22, 2021 •

edited

Loading

BrettDong commented Jan 28, 2021 •

edited

Loading

actual-nh commented Jan 28, 2021 •

edited

Loading

actual-nh commented Jan 30, 2021 •

edited

Loading

actual-nh commented Jan 30, 2021 •

edited

Loading

actual-nh commented Jan 30, 2021 •

edited

Loading

actual-nh commented Jan 30, 2021 •

edited

Loading

actual-nh commented Jan 31, 2021 •

edited

Loading

actual-nh commented Jan 31, 2021 •

edited

Loading