Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CR] Randomize test order: catch interaction, etc bugs #46473

Closed
wants to merge 39 commits into from

Conversation

actual-nh
Copy link
Contributor

@actual-nh actual-nh commented Jan 2, 2021

Summary

SUMMARY: Infrastructure "Randomize test order to catch interactions, other bugs"

Purpose of change

As seen in #46439 and #46404, some test faults are triggered by differing test orders. Once this and other existing faults are fixed using data from testing this PR, it can be merged to help with future testing.

Describe the solution

Add --order rand when invoking cata_test.

Describe alternatives you've considered

I can't see any practical ones.

Testing

See #46439.

Additional context

I plan on using the results from this PR to fix the tests, test them on this PR, then put in other PRs for the individual fixes. (Unless someone has a better idea?)

Randomize ordering of tests to check for problems such as seen in CleverRaven#46439.
@BrettDong BrettDong added the Code: Tests Measurement, self-control, statistics, balancing. label Jan 2, 2021
Need to work on getting build.sh to use make check.
Do all of these _have_ to be using separate test-initialization methods?
@actual-nh
Copy link
Contributor Author

actual-nh commented Jan 2, 2021

I may do a PR to note in TESTING.md exactly where one needs to change file lines to globally modify what cata_test (or various alternate names) does...

@actual-nh
Copy link
Contributor Author

I have asked in Discussions (#46476) if someone with a Linux box can analyze the Github "artifacts".

Thanks to @anothersimulacrum - converts from crash to failure, allowing more analysis.
environmental_revert_effect removes hunger - it does not do anything (currently?) regarding stored kcal. (It is possible vitamins also need to be reset - will have to see.)
@actual-nh
Copy link
Contributor Author

actual-nh commented Jan 2, 2021

Current status:

  • Patch to add caloric replenishment to clear_avatar() has succeeded in eliminating starvation of the test subject. However, it may be altering weary test results a bit - will take a further look at thresholds, etc to see what's happening and what to do. (There is one of them that's technically looking for 585 - iirc - minutes, but then checking vs 590... and currently the result is 580 - which would be within 5 of 585.) The thresholds are also still fluctuating like crazy! One vitamin test result was also off (on travis only).
  • The manhack release test is failing because there isn't any room to do so. I will check if this is lacking a clear_map() before it.... yep, it's missing. Currently checking on personal test; works; trying patch.
  • The vehicle thing did happen in the general build, but not my first personal test (this time around); will check re travis - yes, it did.

Originally gave an error of unable to find an adjacent square to put the manhack in.
This is to avoid errors of not being able to place an NPC.
@actual-nh
Copy link
Contributor Author

actual-nh commented Jan 3, 2021

Currently:

@actual-nh
Copy link
Contributor Author

May want to alter the various cata_test calls to have --durations yes while doing --order rand, to make sure know what's before and after what.

Make sure have correct starting season length (and that "set_eternal_season" doesn't change it when turn eternal season on/off).
Fixing, and making sure did fix.
monster_test.cpp was altering `trigdist` and not restoring it. (Most notably visible in `vision_single_tile_skylight`.)
@actual-nh
Copy link
Contributor Author

actual-nh commented Jan 3, 2021

  • moon_test.cpp was altering season length and not restoring it.
  • calendar_test.cpp now sets the correct season length and checks to make sure set_eternal_season on then off did not disrupt it.
  • monster_test.cpp was altering trigdist and not putting it back to the same value. (A REQUIRE check in vision_single_tile_skylight - and/or the shadowcasting checks, which have an rl_dist in them - for trigdist == true may be indicated also.)

Just put `trigdist` back - the option itself should be reset when `override_option` goes out of (variable) scope and its destructor gets called. (A thank-you to @anothersimulacrum for pointing this out to me.)
@actual-nh actual-nh changed the title Randomize test order: catch interaction, etc bugs [CR] Randomize test order: catch interaction, etc bugs Jan 3, 2021
@actual-nh actual-nh marked this pull request as draft January 3, 2021 23:50
@actual-nh
Copy link
Contributor Author

actual-nh commented Jan 4, 2021

So, errors still seen:

Expand debug_weary_info (in character.cpp) to give enough information to reconstruct reasons for weary_threshold variations.
actual-nh added a commit to actual-nh/Cataclysm-DDA that referenced this pull request Jan 15, 2021
Intermittent errors in weary tests (see CleverRaven#46256) are hard to debug
without more information. While this information has been expanded
in CleverRaven#46473, this does not give enough examples, nor is it able to
tell what in "normal" builds is causing intermittent weary test
problems.
Give more detailed information on weariness (tracker and intake),
to try to figure out why keeps going up and down during tests.

Since the "healthy" stored calories are at bmi 25, put calories
to healthy minus debug_nutrition, to prevent going over.
@@ -72,7 +72,9 @@ void clear_character( player &dummy )
// This sets HP to max, clears addictions and morale,
// and sets hunger, thirst, fatigue and such to zero
dummy.environmental_revert_effect();
dummy.set_stored_kcal( dummy.get_healthy_kcal() ); // But not stored kcal
// However, the above does not set stored kcal;
// 2170 is calories of debug_nutrition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stick a REQUIRE or something here guaranteeing that?

Copy link
Contributor Author

@actual-nh actual-nh Jan 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point - but it's erroring in stomach contents bmr requirements, so I may need to revert that anyway. (I was trying to keep bmi in the 18.5-25 range, even after stomach contents were absorbed.)

@actual-nh
Copy link
Contributor Author

The Appveyor results from the above are the most informative regarding weary; Travis also has one of immediate interest, although approximately duplicated in Appveyor; general is from the mixed-work/food (and other non-weary), which is harder to interpret. I'm going to try removing the -2170 (and also do a slight bit of reformatting so messages don't wrap) next, somewhat to confirm these results (from which it appears the intake is what's fluctuating weariness up and down...).

The caloric subtraction (minus calories for debug_nutrition) is causing
errors in other tests, and it is also desirable to make sure it isn't
doing anything to the weariness tests themselves (weary intake).

With the new information (weary tracker and intake), the summarize
transition output is linewrapping; trying to prevent.
@actual-nh
Copy link
Contributor Author

actual-nh commented Jan 19, 2021

The general, travis, and appveyor test failures are indicating that the 24-hour-dig task's fluctuating from 1 to 0 to 1 in weariness level is due to intake going up (as calories are absorbed from the guts). They unfortunately do not give information for the 8-hours-dig, 8-hours-wait A local test using --success to get full test output is yielding about the same for the 24-hour-dig task, but the 8-hours/8-hours task seems to be fluctuating due to more interactions between tracker and intake:

Digging Pits 8 hours, then waiting 8:
  Weariness: 0 Max Full Exert: EXTRA_EXERCISE Mult: 1
  BMR: 1738 Intake: 0 Tracker: 0 Thresh: 1038 At: 0
  Cal: 55000/55000 Fatigue: 0 Morale: 0 Wgt: 76562 (BMI 25.0)
  Transition: Weary lvl 0 to 1 at 120 min (W 1028 Th 1020 Tr 1450 In 843)
  Transition: Weary lvl 1 to 2 at 250 min (W 2025 Th 1000 Tr 2702 In 1353)
  Transition: Weary lvl 2 to 3 at 360 min (W 2740 Th 983 Tr 3494 In 1507)
  Transition: Weary lvl 3 to 4 at 465 min (W 3222 Th 968 Tr 3997 In 1549)
  Transition: Weary lvl 4 to 3 at 520 min (W 3128 Th 960 Tr 3889 In 1521)
  Transition: Weary lvl 3 to 4 at 540 min (W 3181 Th 957 Tr 3913 In 1463)
  Transition: Weary lvl 4 to 3 at 550 min (W 2997 Th 956 Tr 3729 In 1463)
  Transition: Weary lvl 3 to 2 at 640 min (W 2581 Th 944 Tr 3296 In 1429)
  Transition: Weary lvl 2 to 3 at 650 min (W 2593 Th 943 Tr 3308 In 1429)
  Transition: Weary lvl 3 to 2 at 670 min (W 2483 Th 940 Tr 3166 In 1366)
  Transition: Weary lvl 2 to 1 at 875 min (W 1815 Th 911 Tr 2418 In 1205)
  Transition: Weary lvl 1 to 2 at 885 min (W 1827 Th 910 Tr 2430 In 1205)
  Transition: Weary lvl 2 to 1 at 905 min (W 1757 Th 907 Tr 2332 In 1149)
  Weariness: 1733 Max Full Exert: ACTIVE_EXERCISE Mult: 0.8
  BMR: 1722 Intake: 1093 Tracker: 2280 Thresh: 900 At: 1
  Cal: 52525/55000 Fatigue: 193 Morale: 0 Wgt: 74908 (BMI 24.5)

I suspect the above is due to intake being subtracted every 12 times try_reduce_weariness is called, while tracker can be subtracted up to 4 times that frequently (if sleeping; up to 2 times that frequently if NO_EXERCISE, which should be the case after 480 minutes here). Some of this may be relieved by #45316, but not the 24-hour-dig test problem. This may also be helped by tracking caloric intake over a longer time-frame (say, 24 hours; note #36976). Note that the use of "weariness levels" accentuates any fluctuations; perhaps a system with some hysteresis would help?

However... @I-am-Erk, @anothersimulacrum: I am unclear on the exact purpose of factoring caloric intake into weariness. (Currently, one would almost say the character has hypoglycemia...)

@jbytheway
Copy link
Contributor

Really glad to see someone working on getting randomly ordered tests working. It's something that was on my todo list but I probably never would have got around to.

I just want to make sure you're aware of tools/reduce_tests.sh. It's really helpful for figuring out the source of inter-test dependencies (though, if you're on Windows I'm not sure how much help it would be. You might still be able to use it via one of the various Windows shells).

If there are specific things you'd like to see tested on Linux, let me know, and I'll try to get to them.

@actual-nh
Copy link
Contributor Author

Really glad to see someone working on getting randomly ordered tests working. It's something that was on my todo list but I probably never would have got around to.

Understand! Things are about to get busier for me at work (teaching Anatomy & Physiology), but I'll do as much as I can. (Once the parallel tests stabilize a bit, I will see about adapting it for randomized test orders - should not be difficult; sticking --order rand into EXTRA_TEST_OPTS seems the best to adapt to this and any other changes.)

I just want to make sure you're aware of tools/reduce_tests.sh. It's really helpful for figuring out the source of inter-test dependencies (though, if you're on Windows I'm not sure how much help it would be. You might still be able to use it via one of the various Windows shells).

Thanks! I am on a Mac, so that works; will take a look at now.

If there are specific things you'd like to see tested on Linux, let me know, and I'll try to get to them.

Thanks!

@actual-nh
Copy link
Contributor Author

@anothersimulacrum: Feel free to put a Feature Freeze on this one.

…into patch-3; not running cases (android on Travis; appveyor) without testing, nor single-test jobs
@actual-nh actual-nh added the (P5 - Long-term) Long-term WIP, may stay on the list for a while. label Aug 8, 2021
@Night-Pryanik
Copy link
Contributor

I suppose this PR is abandoned, since last commit was a year and a half ago? I'm not sure if it should stay in PR tracker, especially with this many unresolved conflicts.

@Night-Pryanik
Copy link
Contributor

Closing as abandoned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code: Tests Measurement, self-control, statistics, balancing. (P5 - Long-term) Long-term WIP, may stay on the list for a while.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants