[CR] Test order randomization using a focused set of tests #47791

actual-nh · 2021-02-28T05:04:16Z

Summary

None

Purpose of change

This is a more focused version of #46473. Specifically, it examines the ordering of tests involving weariness and nutrition, concentrating mainly on the former. It includes modifications from #47653 and #47273.

Describe the solution

One noticeable set of tests whose results change depending on ordering is those in weary_test.cpp. This PR runs only those tests, and also those plus some possibly related nutrition tests. It rearranges which architectures/compilers some testing is run on, so that test failures still yield useful information, while trying to limit its use of PR-testing resources. It does runs both with and without rearrangement, using the same seed.
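
To illustrate what "with and without rearrangement, using the same seed" can look like, here is a minimal Python sketch. It is not the project's actual test runner; the `ordered_runs` helper and the test names are hypothetical. The only point is that a fixed seed makes the rearranged order reproducible from one run or machine to the next.

```python
import random


def ordered_runs(test_names, seed):
    """Return (declared_order, shuffled_order) for one seed.

    Hypothetical helper: the shuffle uses its own Random instance, so the
    same seed always yields the same rearrangement.
    """
    declared = list(test_names)
    shuffled = list(test_names)
    random.Random(seed).shuffle(shuffled)
    return declared, shuffled


if __name__ == "__main__":
    # Placeholder names, not the real test case names in weary_test.cpp.
    names = ["weary_recovery", "weary_24h", "weary_assorted", "nutrition_a"]
    declared, shuffled = ordered_runs(names, seed=12345)
    print("declared:", declared)
    print("shuffled:", shuffled)
```

Running both orders from one seed means any difference in failures can be attributed to ordering rather than to a different random stream.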

Describe alternatives you've considered

My local computer unfortunately takes several minutes (during which its responsiveness to other commands is reduced) to even start up a world for weariness testing, making repeated test runs here painful - and, even then, uninformative on architecture/compiler differences.

Additional context

Not meant for merging.

Give more detailed information on weariness (tracker and intake), to try to figure out why it keeps going up and down during tests. Since the "healthy" stored calories are at BMI 25, set calories to healthy minus debug_nutrition, to prevent going over. (cherry picked from commit b897540)

The caloric subtraction (minus calories for debug_nutrition) is causing errors in other tests, and it is also desirable to make sure it isn't doing anything to the weariness tests themselves (weary intake). With the new information (weary tracker and intake), the summarize transition output is line-wrapping; try to prevent that. (cherry picked from commit e967767)

Start on adding tests for unrealistic fluctuations in weary level; see CleverRaven#46384 (and some cases in CleverRaven#46941) for example problems. The initial tests look for problems with the weary_recovery task of digging for 8 hours then waiting for 8 hours; weary level should not go down in the first 8 hours, and should not go up in the second 8 hours.
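
The check described here can be sketched as follows, in Python rather than the project's C++ and with made-up level sequences. `first_decrease` and `first_increase` are hypothetical helpers, not functions from weary_test.cpp; each returns the index of the first offending sample, or None if there is none.

```python
def first_decrease(levels):
    """Index of the first sample lower than its predecessor, else None."""
    for i in range(1, len(levels)):
        if levels[i] < levels[i - 1]:
            return i
    return None


def first_increase(levels):
    """Index of the first sample higher than its predecessor, else None."""
    for i in range(1, len(levels)):
        if levels[i] > levels[i - 1]:
            return i
    return None


if __name__ == "__main__":
    # Invented samples: weary level logged at intervals during each phase.
    digging = [0, 1, 1, 2, 3, 2, 3, 4]  # dips once while still digging
    waiting = [4, 3, 3, 2, 1, 0]        # never rises while waiting
    print("first decrease while digging:", first_decrease(digging))
    print("first increase while waiting:", first_increase(waiting))
```

Reporting the index rather than a bare pass/fail makes it easier to see when in the 8-hour window the fluctuation happened.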

This is an extension of a previous modification to try to smooth out the weary.intake reduction (smaller changes but more frequent); it both increases that smoothing and does the same for weary.tracker. The weary.intake changes are leading up to - probably after 0.F - an exponential moving average being used instead, so that characters are not, essentially, hypoglycemic.
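
For reference, an exponential moving average replaces occasional large jumps with a small update on every sample. This is a minimal Python sketch, assuming a smoothing factor `alpha` and intake samples that are purely illustrative; it is not the game's weary.intake code.

```python
def exponential_moving_average(samples, alpha):
    """Return the running exponential moving average of samples.

    alpha must be in (0, 1]. Each new average is
    alpha * sample + (1 - alpha) * previous_average,
    and the first sample seeds the average.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    averages = []
    current = None
    for sample in samples:
        if current is None:
            current = float(sample)
        else:
            current = alpha * sample + (1 - alpha) * current
        averages.append(current)
    return averages


if __name__ == "__main__":
    # Illustrative calorie intake: one large meal, then nothing.
    intake = [2000, 0, 0, 0, 0, 0]
    for value in exponential_moving_average(intake, alpha=0.5):
        print(value)
```

With this kind of smoothing the tracked intake decays gradually after a meal instead of dropping in one step, which is the behavior the commit message is aiming for.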

As far as I can tell, the cause of the weary.tracker going up during resting periods is that rest does still expend calories (bmr), and it happens every 5 minutes - while weary.tracker was only reduced every 30 (current) or 15 (this branch) minutes. This commit makes weary.tracker reduction occur every 5 minutes - every time try_reduce_weariness() is called.
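
The effect can be reproduced with a toy model. In this Python sketch every number is invented for illustration (the per-tick resting cost, the recovery rate and the starting value are not taken from the game), and `simulate_tracker` and `rises` are hypothetical helpers. Only the scheduling mirrors the description: cost is added on every 5-minute tick, while recovery is applied either as a lump every N ticks or on every tick.

```python
def simulate_tracker(ticks, cost_per_tick, recovery_per_tick, reduce_every, start):
    """Return the tracker value after each 5-minute tick.

    Recovery accrues at recovery_per_tick but is only applied every
    reduce_every ticks, as one lump of reduce_every * recovery_per_tick.
    """
    tracker = start
    history = []
    for tick in range(1, ticks + 1):
        tracker += cost_per_tick  # resting still expends calories
        if tick % reduce_every == 0:
            tracker -= reduce_every * recovery_per_tick
            tracker = max(tracker, 0)
        history.append(tracker)
    return history


def rises(history, start):
    """Count ticks on which the tracker ended higher than the tick before."""
    previous = start
    count = 0
    for value in history:
        if value > previous:
            count += 1
        previous = value
    return count


if __name__ == "__main__":
    # Invented figures; 12 ticks correspond to one hour of rest.
    lumped = simulate_tracker(12, 5, 8, reduce_every=6, start=100)
    per_tick = simulate_tracker(12, 5, 8, reduce_every=1, start=100)
    print("every 30 min:", lumped, "rises:", rises(lumped, 100))
    print("every 5 min: ", per_tick, "rises:", rises(per_tick, 100))
```

Both schedules end the hour at the same value in this model; the difference is that the lumped schedule climbs between reductions, which is the kind of increase-while-resting that can push a level threshold the wrong way.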

Some of the test times were being altered by the every-30-minute (awake) weary.tracker reductions. Alter them to match the new ones, also taking into account local testing (including in scrambled order). While I am seeing inconsistencies between 8-hour and 12-hour digging with some scrambled tests, 8-hour digging without fluctuations indicated that 3->4 should be not 470 minutes but no more than 465 - which it already was for 12-hour, weirdly enough (oops by me earlier?).

Start on adding tests for unrealistic fluctuations in weary level; see CleverRaven#46384 (and some cases in CleverRaven#46941) for example problems. The initial tests look for problems with the weary_recovery task of digging for 8 hours then waiting for 8 hours; weary level should not go down in the first 8 hours, and should not go up in the second 8 hours. (cherry picked from commit bdd942b)

This is an extension of a previous modification to try to smooth out the weary.intake reduction (smaller changes but more frequent); it both increases that smoothing and does the same for weary.tracker. The weary.intake changes are leading up to - probably after 0.F - an exponential moving average being used instead, so that characters are not, essentially, hypoglycemic. (cherry picked from commit ab4ba67)

As far as I can tell, the cause of the weary.tracker going up during resting periods is that rest does still expend calories (bmr), and it happens every 5 minutes - while weary.tracker was only reduced every 30 (current) or 15 (this branch) minutes. This commit makes weary.tracker reduction occur every 5 minutes - every time try_reduce_weariness() is called. (cherry picked from commit 5d9dcb2)

Some of the test times were being altered by the every-30-minute (awake) weary.tracker reductions. Alter them to match the new ones, also taking into account local testing (including in scrambled order). While I am seeing inconsistencies between 8-hour and 12-hour digging with some scrambled tests, 8-hour digging without fluctuations indicated that 3->4 should be not 470 minutes but no more than 465 - which it already was for 12-hour, weirdly enough (oops by me earlier?). (cherry picked from commit c68507b)

It is going to require more work - most likely adjusting weary.intake to an exponential moving average - to get rid of these two failed tests. (Note that these were added tests in the first place; the failure - weariness fluctuation - in question was happening before but simply wasn't detected.)

(This does not include any build-scripts/build.sh alterations.) In order to get more of an idea of what's happening with some tests known to have different failure patterns depending on the test order, namely those related to weariness, set up for eventually doing only a few tests, but in different orders and on different architectures.

actual-nh · 2021-03-09T03:05:05Z

Gods-dammit, I'm an idiot. I'm sorry. I failed to realize that Github would be doing the matrix completely in parallel across multiple machines, INCLUDING finding out the date on different machines.

EDIT: I didn't spot it earlier because they turn out to be rather well synchronized...

~~Do you happen to know whether clara::detail::convertInto( seed, config.rngSeed ); would accept something like the sha1 commit hash, if I stuck a 0x on the front? If not, I'll check further into it.~~

~~EDIT: Looks like all I have to do is try using it and it should indicate whether it's a proper seed. Good. Not sure if I'm up to putting this into build.sh tonight, but I may try.~~

jbytheway · 2021-03-09T11:36:16Z

I'd expect that it would expect a hex-formatted number. But if it doesn't you can convert to decimal using bash

$ echo $((0x100))
256

actual-nh · 2021-03-09T17:04:08Z

> I'd expect that it would expect a hex-formatted number. But if it doesn't you can convert to decimal using bash
> $ echo $((0x100))
> 256

It probably would accept a small hex-formatted number, but it never printed a seed when I tried one the length of a SHA hash. I wound up doing $(( 0x3f1fabb706e42a609724c782298e0dfef85bb154 % 1000000000 )) - one that length should work; the 1000000000 is derived from the shuf invocation used as a seed in another script.
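
One caveat worth checking: bash evaluates `$(( ))` in fixed-width integers with no check for overflow, so a 40-hex-digit value is not reduced there as a true 160-bit number (the result is still deterministic for a given hash). For comparison, here is a small Python sketch that performs the reduction with arbitrary-precision integers. `seed_from_sha` is a hypothetical helper, and the modulus 1000000000 is the one mentioned above.

```python
def seed_from_sha(sha_hex, modulus=1_000_000_000):
    """Reduce a hex commit hash to an integer seed in [0, modulus)."""
    return int(sha_hex, 16) % modulus


if __name__ == "__main__":
    sha = "3f1fabb706e42a609724c782298e0dfef85bb154"
    print(seed_from_sha(sha))
    # Sanity check matching the bash example: 0x100 is 256.
    print(seed_from_sha("100"))
```

Any two CI services that see the same commit hash would derive the same seed this way, provided they really are handed the same hash.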

actual-nh · 2021-03-09T19:55:52Z

GITHUB_SHA does not seem to work properly; I have put in an issue at the best place I could find, namely github/docs#4422.

actual-nh · 2021-03-16T18:48:59Z

Github docs was pretty useless (although hopefully the documentation will be changed to reflect that GITHUB_SHA is the prior commit - making it rather useless for coordinating... sigh).

actual-nh · 2021-04-01T22:12:19Z

These are designed to get as much info as possible regarding the weary tests with the turn-by-turn activity measurement changes (so far rather few!) combined with #47653's weary changes.

stale · 2021-07-09T11:58:50Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Please do not 'bump' or comment on this issue unless you are actively working on it. Stale issues, and stale issues that are closed are still considered.

Night-Pryanik · 2022-08-05T13:26:37Z

I suppose this PR is abandoned, since the last commit was a year and a half ago? I'm not sure if it should stay in the PR tracker, especially with this many unresolved conflicts.

Night-Pryanik · 2022-09-29T12:58:46Z

Closing as abandoned.

actual-nh added 30 commits February 5, 2021 23:20

Add further weariness fluctuation testing

49fc687

In some conditions, namely continuous exercise at the same level, a decrease in weariness level is unrealistic. Check for this.

Adjust weary test output for readability

7b82317

Heavy tasks, while a logical section, do have the problem of repeating the earlier task's information, making it harder to tell which task triggered the message. Also make debug_weary_info() more informative using additional clear_avatar().

Make earlier commit actually work.

f37d217

Forgot to copy over identifier declaration.

Smooth out weary intake/tracker a bit

db03ee5

Weary levels keep fluctuating unrealistically, probably because weary.intake (and weary.tracker?) are changing in large jumps at times.

Adjust for smoothing weary intake/tracker

028b0f0

This adjusts expected test values to the (quite consistent) ones for the altered weary intake/tracker. It also does the weary.tracker adjustment in a less-perfectionistic (but more-functional) way.

Temporarily allow expected failures on 2 tests.

45b1139

Two of the tests are doing a consistent failure due to fluctuations. Ultimately, these fluctuations need eliminating, but it would be nice to get some information on if anything else is going on.

Check on weary ticks.

c01a661

This monitors, with weary level transitions, the low_activity_ticks and tick_counter, to see if this can help figure out why weary.tracker is increasing while resting. (cherry picked from commit 4c4b9cd)

Add further weariness fluctuation testing

d12b85d

In some conditions, namely continuous exercise at the same level, a decrease in weariness level is unrealistic. Check for this. (cherry picked from commit 49fc687)

Smooth out weary intake/tracker a bit

5790b15

Weary levels keep fluctuating unrealistically, probably because weary.intake (and weary.tracker?) are changing in large jumps at times. (cherry picked from commit db03ee5)

Adjust for smoothing weary intake/tracker

098fa14

This adjusts expected test values to the (quite consistent) ones for the altered weary intake/tracker. It also does the weary.tracker adjustment in a less-perfectionistic (but more-functional) way. (cherry picked from commit 028b0f0)

Add another clear_avatar() to weary_24h_tests

e6e294e

This clears the avatar prior to the (potential) debug output so what is actually the case inside do_activity can be seen.

Test with recent fixes.

b2c8072

Merge branch 'master' of https://github.com/CleverRaven/Cataclysm-DDA into weariness_fluctuations_1

Remove have_weary_increase()

0ded893

This (near-duplicate of have_weary_decrease()) is not in use, so removing to be in accord with CDDA guidelines.

Get info, weary.tracker/.intake, test alterations. Merge branch 'catc…

a8565b9

…h_weariness_fluctuations' into weariness_order

For comparisons - Merge branch 'master' of https://github.com/CleverR…

06d3bc6

…aven/Cataclysm-DDA into weariness_fluctuations_1

Not allow one of the weary test groups to fail

850c2cf

Given rearrangements in what tests will not happen if others fail, this is no longer needed to get more information. Some changes may be needed in said rearrangements, however.

Limit tests, compilations run (latter to compensate for fast-fail: fa…

81cecb4

…lse)

Set max-parallel to 2 in matrix.yml

f145b92

Try to ensure not "hogging" github jobs, despite fail-fast: false.

Use commit SHA modulo 1000000000 as seed

3f1fabb

Correct my error from earlier re date as seed. Also should make Github and Travis use the same seed.

See if TRAVIS_COMMIT is the right SHA hash

998fa1c

TRAVIS_PULL_REQUEST_SHA is not the correct SHA hash (and indeed may not change until the base branch is altered); see if TRAVIS_COMMIT will correspond with GITHUB_SHA.

actual-nh and others added 11 commits March 16, 2021 15:54

Make sure it's floating-point division

e11fcc4

Co-authored-by: anothersimulacrum <[email protected]>

Make sure it's floating-point division

a0e00b7

Co-authored-by: anothersimulacrum <[email protected]>

Make sure it's floating-point division

0b693c2

Co-authored-by: anothersimulacrum <[email protected]> (cherry picked from commit a0e00b7)

Make sure it's floating-point division

4aa85ad

Co-authored-by: anothersimulacrum <[email protected]> (cherry picked from commit e11fcc4)

Update with newest changes (prevent mismatch): Merge branch 'master' of

d052801

https://github.com/CleverRaven/Cataclysm-DDA into weariness_order

Make sure branch is properly based for testing - Merge branch 'master…

cfe3aca

…' of https://github.com/CleverRaven/Cataclysm-DDA into weariness_fluctuations_1

Solve merge conflicts - Merge branch 'master' of https://github.com/C…

b5641ec

…leverRaven/Cataclysm-DDA into activity_per_turn_weariness_fluctuations

Remember to include <cmath> for C++ math funcs

139b4c5

I forgot to include <cmath> for std::ceil, std::floor, and std::pow when I moved things over to activity_tracker.cpp (character.cpp had it already... as well as lots of others, of course!).

Merge branch 'weariness_fluctuations_1' into weariness_order

a1aeb0d

Didn't notice one conflict.

5a229a8

Fail-fast on; mayfail for weary; per-run seeds

d0f88b1

actual-nh mentioned this pull request Apr 1, 2021

Fix saving of player data #48308

Merged

actual-nh added 2 commits April 1, 2021 17:50

Remove 2 of 5 github checks - speed up

c26ab47

Given ccache, I think it was a bad idea to modify the runs, plus data so far indicate no changes across machine/compiler for weary. Admittedly, this gives a bit less data from different seeds.

Fix not noticing need to change %d to %s...

f275826

actual-nh mentioned this pull request Apr 7, 2021

Better way to run parallel test processes in different directories #48399

Closed

stale bot added the "stale" label (Closed for lack of activity, but still valid.) on Jul 9, 2021

actual-nh added the "(P5 - Long-term)" label (Long-term WIP, may stay on the list for a while.) and removed the "stale" label (Closed for lack of activity, but still valid.) on Sep 2, 2021

Night-Pryanik closed this Sep 29, 2022
