[CR] Randomize test order: catch interaction, etc bugs #46473

actual-nh · 2021-01-02T02:03:37Z

Summary

SUMMARY: Infrastructure "Randomize test order to catch interactions, other bugs"

Purpose of change

As seen in #46439 and #46404, some test faults are triggered by differing test orders. Once this and other existing faults are fixed using data from testing this PR, it can be merged to help with future testing.

Describe the solution

Add --order rand when invoking cata_test.
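To illustrate why running in a different order exposes this kind of fault, here is a minimal sketch, not code from the repository. `season_length`, `leaky_test`, `assuming_test`, and `run_in_order` are hypothetical names; the point is only that a test which leaves shared state modified can break a later test that assumes the default.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical shared state, standing in for a global game setting.
static int season_length = 91;

struct test_case {
    std::string name;
    std::function<bool()> run; // returns true when the test passes
};

// Changes the shared setting and never puts it back.
bool leaky_test()
{
    season_length = 14;
    return season_length == 14;
}

// Silently assumes the setting still has its default value.
bool assuming_test()
{
    return season_length == 91;
}

// Runs the tests in the given order from a fresh state and
// returns the names of the ones that failed.
std::vector<std::string> run_in_order( const std::vector<test_case> &tests )
{
    season_length = 91;
    std::vector<std::string> failed;
    for( const test_case &t : tests ) {
        assert( t.run ); // every case needs a body
        if( !t.run() ) {
            failed.push_back( t.name );
        }
    }
    return failed;
}
```

With `assuming_test` first, both tests pass; with `leaky_test` first, `assuming_test` fails even though its own code is unchanged. A fixed declaration order can hide that dependency indefinitely.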

Describe alternatives you've considered

I can't see any practical ones.

Testing

See #46439.

Additional context

I plan on using the results from this PR to fix the tests, test them on this PR, then put in other PRs for the individual fixes. (Unless someone has a better idea?)

Randomize ordering of tests to check for problems such as seen in CleverRaven#46439.

Need to work on getting build.sh to use make check.

Do all of these _have_ to be using separate test-initialization methods?

actual-nh · 2021-01-02T03:26:48Z

I may do a PR to note in TESTING.md exactly where one needs to change file lines to globally modify what cata_test (or various alternate names) does...

actual-nh · 2021-01-02T03:59:20Z

I have asked in Discussions (#46476) if someone with a Linux box can analyze the GitHub "artifacts".

Thanks to @anothersimulacrum - converts from crash to failure, allowing more analysis.

environmental_revert_effect removes hunger - it does not do anything (currently?) regarding stored kcal. (It is possible vitamins also need to be reset - will have to see.)

actual-nh · 2021-01-02T20:25:18Z

Current status:

Patch to add caloric replenishment to clear_avatar() has succeeded in eliminating starvation of the test subject. However, it may be altering weary test results a bit - will take a further look at thresholds, etc to see what's happening and what to do. (There is one of them that's technically looking for 585 - iirc - minutes, but then checking vs 590... and currently the result is 580 - which would be within 5 of 585.) The thresholds are also still fluctuating like crazy! One vitamin test result was also off (on travis only).
The manhack release test is failing because there isn't any room to do so. I will check if this is lacking a clear_map() before it.... yep, it's missing. Currently checking on personal test; works; trying patch.
The vehicle thing did happen in the general build, but not my first personal test (this time around); will check re travis - yes, it did.
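The 585/590 mismatch mentioned above comes down to which target the margin is applied to. A minimal sketch, assuming a margin of 5 minutes (inferred from "within 5 of 585") and a hypothetical helper `within_margin`; the real test's comparison may be written differently:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: true when value is within margin of target.
bool within_margin( int value, int target, int margin )
{
    assert( margin >= 0 );
    return std::abs( value - target ) <= margin;
}
```

Under that assumption a result of 580 is acceptable against a target of 585 but not against 590, so the test outcome depends on which of the two numbers the check actually uses.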

Originally gave an error of unable to find an adjacent square to put the manhack in.

This is to avoid errors of not being able to place an NPC.

actual-nh · 2021-01-03T03:01:15Z

Currently:

Several involving time (and light levels). Possibly something is leaving a modified value for season length?
The biometrics test may be making assumptions as to how much activity the avatar has done earlier; not sure, though.
Vehicle level test keeps erroring (no surprise; vehicle_level_test - null dmon_p pointer (and running into the player) #46441).
Weary checks are still erroring (original being used for comparison may have had the level going up and down way too much, however), with the weary threshold still rather unstable (see Weariness fluctuating unexpectedly/inconsistently in weary tests #46384 for a bit more on this - may add more output to debug_weary_info to start investigating this more).

actual-nh · 2021-01-03T19:08:33Z

May want to alter the various cata_test calls to have --durations yes while doing --order rand, to make sure we know what ran before and after what.

Make sure have correct starting season length (and that "set_eternal_season" doesn't change it when turn eternal season on/off).

Fixing, and making sure did fix.

monster_test.cpp was altering `trigdist` and not restoring it. (Most notably visible in `vision_single_tile_skylight`.)

actual-nh · 2021-01-03T21:20:40Z

moon_test.cpp was altering season length and not restoring it.
calendar_test.cpp now sets the correct season length and checks to make sure set_eternal_season on then off did not disrupt it.
monster_test.cpp was altering trigdist and not putting it back to the same value. (A REQUIRE check in vision_single_tile_skylight - and/or the shadowcasting checks, which have an rl_dist in them - for trigdist == true may be indicated also.)

tests/monster_test.cpp

Just put `trigdist` back - the option itself should be reset when `override_option` goes out of (variable) scope and its destructor gets called. (A thank-you to @anothersimulacrum for pointing this out to me.)
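The scope-exit behaviour described here can be sketched with a small RAII guard. This is an illustration only and not the project's `override_option` class; `scoped_override` and the global below are hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical global, standing in for a cached setting such as trigdist.
bool trigdist = true;

// Minimal RAII guard: remembers the old value and restores it on scope exit.
template<typename T>
class scoped_override
{
    public:
        scoped_override( T &target, T temporary )
            : target_( target ), saved_( target ) {
            target_ = temporary;
        }
        ~scoped_override() {
            target_ = saved_;
        }
        scoped_override( const scoped_override & ) = delete;
        scoped_override &operator=( const scoped_override & ) = delete;
    private:
        T &target_;
        T saved_;
};

// Returns true when the global is back to its starting value afterwards.
bool override_is_scoped()
{
    const bool before = trigdist;
    {
        scoped_override<bool> guard( trigdist, !before );
        assert( trigdist != before ); // override is active inside the scope
    }
    return trigdist == before;
}
```

The restore happens in the destructor, so it also runs on an early return or an exception. A value that is cached separately from the guarded one (as `trigdist` was here) still has to be put back by hand.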

actual-nh · 2021-01-04T00:07:01Z

So, errors still seen:

Weary_recovery and/or weary_24h_tasks - virtually always at least one; usually both; at least 50% have weird fluctuations in weariness levels and/or weariness threshold (Weariness fluctuating unexpectedly/inconsistently in weary tests #46384)
Vehicle_level_test (vehicle_level_test - null dmon_p pointer (and running into the player) #46441) - usually seen
All_nutrition_starve_test (for vitamin C) - sometimes seen
Intermittently a few others like item name tests and a field test (Test/CI builds (intermittently) failing on field, possibly weary tests (unrelated to pull requests) #46256 shows other examples of the latter)

Expand debug_weary_info (in character.cpp) to give enough information to reconstruct reasons for weary_threshold variations.

Intermittent errors in weary tests (see CleverRaven#46256) are hard to debug without more information. While this information has been expanded in CleverRaven#46473, this does not give enough examples, nor is it able to tell what in "normal" builds is causing intermittent weary test problems.

Give more detailed information on weariness (tracker and intake), to try to figure out why it keeps going up and down during tests. Since the "healthy" stored calories are at BMI 25, set calories to healthy minus debug_nutrition, to prevent going over.

anothersimulacrum · 2021-01-19T03:01:47Z

tests/player_helpers.cpp

@@ -72,7 +72,9 @@ void clear_character( player &dummy )
    // This sets HP to max, clears addictions and morale,
    // and sets hunger, thirst, fatigue and such to zero
    dummy.environmental_revert_effect();
-    dummy.set_stored_kcal( dummy.get_healthy_kcal() ); // But not stored kcal
+    // However, the above does not set stored kcal;
+    // 2170 is calories of debug_nutrition


Stick a REQUIRE or something here guaranteeing that?

Point - but it's erroring in stomach contents bmr requirements, so I may need to revert that anyway. (I was trying to keep bmi in the 18.5-25 range, even after stomach contents were absorbed.)

actual-nh · 2021-01-19T16:21:44Z

The Appveyor results from the above are the most informative regarding weary; Travis also has one of immediate interest, although approximately duplicated in Appveyor; general is from the mixed-work/food (and other non-weary), which is harder to interpret. I'm going to try removing the -2170 (and also do a slight bit of reformatting so messages don't wrap) next, somewhat to confirm these results (from which it appears the intake is what's fluctuating weariness up and down...).

The caloric subtraction (minus calories for debug_nutrition) is causing errors in other tests, and it is also desirable to make sure it isn't doing anything to the weariness tests themselves (weary intake). With the new information (weary tracker and intake), the summarize transition output is linewrapping; trying to prevent.

actual-nh · 2021-01-19T19:27:35Z

The general, Travis, and Appveyor test failures indicate that the 24-hour-dig task's weariness level fluctuating from 1 to 0 to 1 is due to intake going up (as calories are absorbed from the guts). Unfortunately, they do not give information for the 8-hours-dig, 8-hours-wait task. A local test using --success to get full test output yields about the same for the 24-hour-dig task, but the 8-hours/8-hours task seems to be fluctuating due to more interactions between tracker and intake:

Digging Pits 8 hours, then waiting 8:
  Weariness: 0 Max Full Exert: EXTRA_EXERCISE Mult: 1
  BMR: 1738 Intake: 0 Tracker: 0 Thresh: 1038 At: 0
  Cal: 55000/55000 Fatigue: 0 Morale: 0 Wgt: 76562 (BMI 25.0)
  Transition: Weary lvl 0 to 1 at 120 min (W 1028 Th 1020 Tr 1450 In 843)
  Transition: Weary lvl 1 to 2 at 250 min (W 2025 Th 1000 Tr 2702 In 1353)
  Transition: Weary lvl 2 to 3 at 360 min (W 2740 Th 983 Tr 3494 In 1507)
  Transition: Weary lvl 3 to 4 at 465 min (W 3222 Th 968 Tr 3997 In 1549)
  Transition: Weary lvl 4 to 3 at 520 min (W 3128 Th 960 Tr 3889 In 1521)
  Transition: Weary lvl 3 to 4 at 540 min (W 3181 Th 957 Tr 3913 In 1463)
  Transition: Weary lvl 4 to 3 at 550 min (W 2997 Th 956 Tr 3729 In 1463)
  Transition: Weary lvl 3 to 2 at 640 min (W 2581 Th 944 Tr 3296 In 1429)
  Transition: Weary lvl 2 to 3 at 650 min (W 2593 Th 943 Tr 3308 In 1429)
  Transition: Weary lvl 3 to 2 at 670 min (W 2483 Th 940 Tr 3166 In 1366)
  Transition: Weary lvl 2 to 1 at 875 min (W 1815 Th 911 Tr 2418 In 1205)
  Transition: Weary lvl 1 to 2 at 885 min (W 1827 Th 910 Tr 2430 In 1205)
  Transition: Weary lvl 2 to 1 at 905 min (W 1757 Th 907 Tr 2332 In 1149)
  Weariness: 1733 Max Full Exert: ACTIVE_EXERCISE Mult: 0.8
  BMR: 1722 Intake: 1093 Tracker: 2280 Thresh: 900 At: 1
  Cal: 52525/55000 Fatigue: 193 Morale: 0 Wgt: 74908 (BMI 24.5)
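The numbers in the transition lines above appear consistent with W = Tr - In/2, truncated to an integer. That relation is inferred only from the printed values, not taken from the game source, and the names below are hypothetical. The sketch checks it against every logged row:

```cpp
#include <cassert>
#include <vector>

// One "Transition" line from the log: W, Tr and In as printed.
struct weary_row {
    int minute;
    int weariness;
    int tracker;
    int intake;
};

// Assumed relation, inferred only from the printed numbers:
// weariness = tracker - intake / 2, truncated toward zero.
int predicted_weariness( int tracker, int intake )
{
    assert( intake >= 0 );
    return static_cast<int>( tracker - intake * 0.5 );
}

// Returns how many rows disagree with the assumed relation.
int count_mismatches( const std::vector<weary_row> &rows )
{
    int mismatches = 0;
    for( const weary_row &r : rows ) {
        if( predicted_weariness( r.tracker, r.intake ) != r.weariness ) {
            ++mismatches;
        }
    }
    return mismatches;
}

// Rows copied from the transition lines above.
const std::vector<weary_row> logged_rows = {
    { 120, 1028, 1450, 843 },
    { 250, 2025, 2702, 1353 },
    { 360, 2740, 3494, 1507 },
    { 465, 3222, 3997, 1549 },
    { 520, 3128, 3889, 1521 },
    { 540, 3181, 3913, 1463 },
    { 550, 2997, 3729, 1463 },
    { 640, 2581, 3296, 1429 },
    { 650, 2593, 3308, 1429 },
    { 670, 2483, 3166, 1366 },
    { 875, 1815, 2418, 1205 },
    { 885, 1827, 2430, 1205 },
    { 905, 1757, 2332, 1149 },
};
```

Under that reading, between 520 and 540 min the tracker rises by 24 while intake falls by 58, so weariness rises by 53 and the level goes from 3 back to 4. Between 640 and 650 min intake is unchanged and the tracker alone adds 12.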

I suspect the above is due to intake being subtracted every 12 times try_reduce_weariness is called, while tracker can be subtracted up to 4 times that frequently (if sleeping; up to 2 times that frequently if NO_EXERCISE, which should be the case after 480 minutes here). Some of this may be relieved by #45316, but not the 24-hour-dig test problem. This may also be helped by tracking caloric intake over a longer time-frame (say, 24 hours; note #36976). Note that the use of "weariness levels" accentuates any fluctuations; perhaps a system with some hysteresis would help?
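As a rough illustration of the hysteresis idea, here is a minimal sketch with hypothetical names. It uses equal-width level bands of size `step`, a simplification that does not match the game's thresholds (which vary over time in the log). A level is entered at its boundary but only left once the value drops a further `margin` below it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical level tracker. Level n is entered when value >= n * step,
// and left only once value < n * step - margin.
class hysteresis_level
{
    public:
        hysteresis_level( int step, int margin ) : step_( step ), margin_( margin ) {
            assert( step > 0 && margin >= 0 && margin < step );
        }
        // Feeds one sample and returns the level after it.
        int update( int value ) {
            while( value >= ( level_ + 1 ) * step_ ) {
                ++level_;
            }
            while( level_ > 0 && value < level_ * step_ - margin_ ) {
                --level_;
            }
            return level_;
        }
    private:
        int step_;
        int margin_;
        int level_ = 0;
};

// Counts how often the level changes over a series of samples.
int count_transitions( const std::vector<int> &samples, int step, int margin )
{
    hysteresis_level tracker( step, margin );
    int previous = 0;
    int transitions = 0;
    for( int value : samples ) {
        const int current = tracker.update( value );
        if( current != previous ) {
            ++transitions;
        }
        previous = current;
    }
    return transitions;
}
```

For samples wobbling around a boundary at 1000 (990, 1005, 995, 1010, 998, 1003), a margin of 0 gives five level changes, while a margin of 50 gives one. Whether such a margin would suit the weariness design is a separate question.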

However... @I-am-Erk, @anothersimulacrum: I am unclear on the exact purpose of factoring caloric intake into weariness. (Currently, one would almost say the character has hypoglycemia...)

jbytheway · 2021-01-27T02:41:18Z

Really glad to see someone working on getting randomly ordered tests working. It's something that was on my todo list but I probably never would have got around to.

I just want to make sure you're aware of tools/reduce_tests.sh. It's really helpful for figuring out the source of inter-test dependencies (though, if you're on Windows I'm not sure how much help it would be. You might still be able to use it via one of the various Windows shells).

If there are specific things you'd like to see tested on Linux, let me know, and I'll try to get to them.

actual-nh · 2021-01-27T02:57:07Z

> Really glad to see someone working on getting randomly ordered tests working. It's something that was on my todo list but I probably never would have got around to.

Understand! Things are about to get busier for me at work (teaching Anatomy & Physiology), but I'll do as much as I can. (Once the parallel tests stabilize a bit, I will see about adapting it for randomized test orders - should not be difficult; sticking --order rand into EXTRA_TEST_OPTS seems the best to adapt to this and any other changes.)

> I just want to make sure you're aware of tools/reduce_tests.sh. It's really helpful for figuring out the source of inter-test dependencies (though, if you're on Windows I'm not sure how much help it would be. You might still be able to use it via one of the various Windows shells).

Thanks! I am on a Mac, so that works; will take a look now.

> If there are specific things you'd like to see tested on Linux, let me know, and I'll try to get to them.

Thanks!

actual-nh · 2021-02-23T05:50:53Z

@anothersimulacrum: Feel free to put a Feature Freeze on this one.

…into patch-3; not running cases (android on Travis; appveyor) without testing, nor single-test jobs

Night-Pryanik · 2022-08-05T13:26:11Z

I suppose this PR is abandoned, since last commit was a year and a half ago? I'm not sure if it should stay in PR tracker, especially with this many unresolved conflicts.

Night-Pryanik · 2022-09-29T12:58:21Z

Closing as abandoned.

Randomize test order: catch interaction, etc bugs

3257605

Randomize ordering of tests to check for problems such as seen in CleverRaven#46439.

BrettDong added the Code: Tests label Jan 2, 2021

actual-nh added 3 commits January 1, 2021 22:14

Sigh... forgot about build.sh.

91f70f6

Need to work on getting build.sh to use make check.

And .appveyor.yml

e88968a

Do all of these _have_ to be using separate test-initialization methods?

Yet another place...

1513390

actual-nh mentioned this pull request Jan 2, 2021

vehicle_level_test - null dmon_p pointer (and running into the player) #46441

Open

actual-nh added 3 commits January 2, 2021 00:39

Add dmon_p requirement

1d08eab

Thanks to @anothersimulacrum - converts from crash to failure, allowing more analysis.

Revert stored kcal to healthy in clear_avatar

8e8464f

environmental_revert_effect removes hunger - it does not do anything (currently?) regarding stored kcal. (It is possible vitamins also need to be reset - will have to see.)

Merge branch 'master' into patch-3

1d91865

actual-nh mentioned this pull request Jan 2, 2021

Close #46439 (segmentation error) #46498

Merged

actual-nh added 2 commits January 2, 2021 18:26

Add clear_map to manhack test

0046e20

Originally gave an error of unable to find an adjacent square to put the manhack in.

Add clear_map to prep_test

19d0847

This is to avoid errors of not being able to place an NPC.

actual-nh added 3 commits January 3, 2021 14:19

calendar_test.cpp season length

08e4efa

Make sure have correct starting season length (and that "set_eternal_season" doesn't change it when turn eternal season on/off).

moon_test.cpp is leaving season_length changed

9303d2d

Fixing, and making sure did fix.

Restore trigdist after monster_test.cpp changes

4284b62

monster_test.cpp was altering `trigdist` and not restoring it. (Most notably visible in `vision_single_tile_skylight`.)

actual-nh added 2 commits January 3, 2021 16:23

Satisfy (I hope) astyle

890469c

Aspell - again!

41830f7

anothersimulacrum reviewed Jan 3, 2021

tests/monster_test.cpp (outdated)

Just put trigdist back

ad758ae

Just put `trigdist` back - the option itself should be reset when `override_option` goes out of (variable) scope and its destructor gets called. (A thank-you to @anothersimulacrum for pointing this out to me.)

actual-nh changed the title ~~Randomize test order: catch interaction, etc bugs~~ [CR] Randomize test order: catch interaction, etc bugs Jan 3, 2021

actual-nh marked this pull request as draft January 3, 2021 23:50

Expand debug_weary_info

e559d90

Expand debug_weary_info (in character.cpp) to give enough information to reconstruct reasons for weary_threshold variations.

actual-nh mentioned this pull request Jan 4, 2021

Weariness fluctuating unexpectedly/inconsistently in weary tests #46384

Closed

Making easier to compare - Merge branch 'master' of https://github.co…

1aecd2e

…m/CleverRaven/Cataclysm-DDA into patch-3

This was referenced Jan 12, 2021

Restore old trigdist setting after tests #46686

Merged

Prevent fatigue below -20 affecting weariness threshold #46688

Merged

Updating patch-3 to current master for easier comparison (Merge branc…

d60e11d

…h 'master' of https://github.com/CleverRaven/Cataclysm-DDA into patch-3)

actual-nh mentioned this pull request Jan 15, 2021

Expand weary (debugging) info #46757

Merged

actual-nh added 2 commits January 18, 2021 20:19

Merge branch 'master' of https://github.com/CleverRaven/Cataclysm-DDA …

b26c431

…into patch-3

anothersimulacrum reviewed Jan 19, 2021

anothersimulacrum mentioned this pull request Jan 20, 2021

Remove randomness from metabolism #46906

Merged

This was referenced Jan 22, 2021

Batch test executions by default #46934

Merged

Set stored kcal to healthy kcal for testing #46941

Merged

Merge branch 'master' of https://github.com/CleverRaven/Cataclysm-DDA …

b4efe39

…into patch-3

This was referenced Jan 31, 2021

Use GNU Parallel to run tests concurrently #47135

Merged

[CR] Catch weariness fluctuations #47273

Closed

actual-nh mentioned this pull request Feb 21, 2021

Reduce weariness instability (includes tests); clarified #47653

Merged

anothersimulacrum added the 0.F Feature Freeze label Feb 23, 2021

Merge branch 'master' of https://github.com/CleverRaven/Cataclysm-DDA …

19b2a7c

…into patch-3; not running cases (android on Travis; appveyor) without testing, nor single-test jobs

actual-nh mentioned this pull request Feb 28, 2021

[CR] Test order randomization using a focused set of tests #47791

Closed

Night-Pryanik removed the 0.F Feature Freeze label Aug 8, 2021

actual-nh added the (P5 - Long-term) label Aug 8, 2021

Night-Pryanik closed this Sep 29, 2022
