Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CR] Test order randomization using a focused set of tests #47791

Closed
wants to merge 66 commits into from

Conversation

actual-nh
Copy link
Contributor

@actual-nh actual-nh commented Feb 28, 2021

Summary

None

Purpose of change

This is a more-focused version of #46473. More specifically, it focuses on ordering of tests involved in weariness and nutrition, with more focus on the former. It includes modifications from #47653 and #47273.

Describe the solution

One noticeable set of tests that alter depending on ordering is those in weary_test.cpp. This PR does testing of only those, and of those plus some possibly-related nutrition tests. It rearranges what architectures/compilers some testing is run on, to allow for test failures and still get useful input, but also try to limit its occupation of PR-testing resources. It does runs both with and without rearrangement using the same seed.

Describe alternatives you've considered

My local computer unfortunately takes several minutes (during which its responsiveness to other commands is reduced) to even start up a world for weariness testing, making repeated test runs here painful - and, even then, uninformative on architecture/compiler differences.

Additional context

Not meant for merging.

Give more detailed information on weariness (tracker and intake),
to try to figure out why keeps going up and down during tests.

Since the "healthy" stored calories are at bmi 25, put calories
to healthy minus debug_nutrition, to prevent going over.

(cherry picked from commit b897540)
The caloric subtraction (minus calories for debug_nutrition) is causing
errors in other tests, and it is also desirable to make sure it isn't
doing anything to the weariness tests themselves (weary intake).

With the new information (weary tracker and intake), the summarize
transition output is linewrapping; trying to prevent.

(cherry picked from commit e967767)
Start on adding tests for unrealistic fluctuations in weary level;
see CleverRaven#46384 (and some cases in CleverRaven#46941) for example problems. The initial
tests look for problems with the weary_recovery task of digging for 8
hours then waiting for 8 hours; weary level should not go down in the
first 8 hours, and should not go up in the second 8 hours.
In some conditions, namely continuous exercise at the same level, a
decrease in weariness level is unrealistic. Check for this.
Heavy tasks, while a logical section, do have the problem of repeating
the earlier task's information, making it harder to tell which task
triggered the message. Also make debug_weary_info() more informative
using additional clear_avatar().
Forgot to copy over identifier declaration.
Weary levels keep fluctuating unrealistically, probably because
weary.intake (and weary.tracker?) are changing in large jumps at times.
This adjusts expected test values to the (quite consistent) ones for
the altered weary intake/tracker. It also does the weary.tracker
adjustment in a less-perfectionistic (but more-functional) way.
Two of the tests are doing a consistent failure due to fluctuations.
Ultimately, these fluctuations need eliminating, but it would be nice
to get some information on if anything else is going on.
This is an extension of a previous modification to try to smooth out
the weary.intake reduction (smaller changes but more frequent), and
both increases that and does similar for weary.tracker. The
weary.intake changes are leading up to - probably after 0.F - an
exponential moving average being used instead, so that characters
are not, essentially hypoglycemic.
This monitors, with weary level transitions, the low_activity_ticks
and tick_counter, to see if this can help figure out why weary.tracker
is increasing while resting.

(cherry picked from commit 4c4b9cd)
As far as I can tell, the cause of the weary.tracker going up during
resting periods is that rest does still expend calories (bmr),
and it happens every 5 minutes - while weary.tracker was only reduced
every 30 (current) or 15 (this branch) minutes. This commit makes
weary.tracker reduction occur every 5 minutes - every time
try_reduce_weariness() is called.
Some of the test times were being altered by the every-30-minute
(awake) weary.tracker reductions. Alter to match new ones, also taking
into account local testing (including in scrambled order). While with
some scrambled tests am seeing inconsistencies between 8-hour and
12-hour digging, 8-hour without fluctuations indicated 3->4 should not
be 470 minutes, but no more than 465 - which it already was for
12-hour, weirdly enough (oops by me earlier?).
Start on adding tests for unrealistic fluctuations in weary level;
see CleverRaven#46384 (and some cases in CleverRaven#46941) for example problems. The initial
tests look for problems with the weary_recovery task of digging for 8
hours then waiting for 8 hours; weary level should not go down in the
first 8 hours, and should not go up in the second 8 hours.

(cherry picked from commit bdd942b)
In some conditions, namely continuous exercise at the same level, a
decrease in weariness level is unrealistic. Check for this.

(cherry picked from commit 49fc687)
Weary levels keep fluctuating unrealistically, probably because
weary.intake (and weary.tracker?) are changing in large jumps at times.

(cherry picked from commit db03ee5)
This adjusts expected test values to the (quite consistent) ones for
the altered weary intake/tracker. It also does the weary.tracker
adjustment in a less-perfectionistic (but more-functional) way.

(cherry picked from commit 028b0f0)
This is an extension of a previous modification to try to smooth out
the weary.intake reduction (smaller changes but more frequent), and
both increases that and does similar for weary.tracker. The
weary.intake changes are leading up to - probably after 0.F - an
exponential moving average being used instead, so that characters
are not, essentially hypoglycemic.

(cherry picked from commit ab4ba67)
As far as I can tell, the cause of the weary.tracker going up during
resting periods is that rest does still expend calories (bmr),
and it happens every 5 minutes - while weary.tracker was only reduced
every 30 (current) or 15 (this branch) minutes. This commit makes
weary.tracker reduction occur every 5 minutes - every time
try_reduce_weariness() is called.

(cherry picked from commit 5d9dcb2)
Some of the test times were being altered by the every-30-minute
(awake) weary.tracker reductions. Alter to match new ones, also taking
into account local testing (including in scrambled order). While with
some scrambled tests am seeing inconsistencies between 8-hour and
12-hour digging, 8-hour without fluctuations indicated 3->4 should not
be 470 minutes, but no more than 465 - which it already was for
12-hour, weirdly enough (oops by me earlier?).

(cherry picked from commit c68507b)
It is going to require more work - most likely adjusting weary.intake
to an exponential moving average - to get rid of these two failed
tests. (Note that these were added tests in the first place; the
failure - weariness fluctuation - in question was happening before
but simply wasn't detected.)
This clears the avatar prior to the (potential) debug output so what
is actually the case inside do_activity can be seen.
Merge branch 'master' of https://github.com/CleverRaven/Cataclysm-DDA into weariness_fluctuations_1
This (near-duplicate of have_weary_decrease()) is not in use, so
removing to be in accord with CDDA guidelines.
(This does not include any build-scripts/build.sh alterations.)
In order to get more of an idea of what's happening with some tests
known to have different failure patterns depending on the test order,
namely those related to weariness, set up for eventually doing
only a few tests, but in different orders and on different
architectures.
…h_weariness_fluctuations' into weariness_order
Given rearrangements in what tests will not happen if others fail,
this is no longer needed to get more information. Some changes may be
needed in said rearrangements, however.
Try to ensure not "hogging" github jobs, despite fail-fast: false.
@actual-nh
Copy link
Contributor Author

actual-nh commented Mar 9, 2021

Gods-dammit, I'm an idiot. I'm sorry. I failed to realize that Github would be doing the matrix completely in parallel across multiple machines, INCLUDING finding out the date on different machines.

EDIT: I didn't spot it earlier because they turn out to be rather well synchronized...

Do you happen to know whether clara::detail::convertInto( seed, config.rngSeed ); would accept something like the sha1 commit hash, if I stuck a 0x on the front? If not, I'll check further into it.

EDIT: Looks like all I have to do is try using it and it should indicate whether it's a proper seed. Good. Not sure if I'm up to putting this into build.sh tonight, but I may try.

@jbytheway
Copy link
Contributor

I'd expect that it would expect a hex-formatted number. But if it doesn't you can convert to decimal using bash

$ echo $((0x100))
256

Correct my error from earlier re date as seed. Also should make
Github and Travis use the same seed.
@actual-nh
Copy link
Contributor Author

I'd expect that it would expect a hex-formatted number. But if it doesn't you can convert to decimal using bash

$ echo $((0x100))
256

It probably would accept a small hex-formatted number, but it never printed a seed when I tried one the length of a SHA hash. I wound up doing $(( 0x3f1fabb706e42a609724c782298e0dfef85bb154 % 1000000000 )) - one that length should work; the 1000000000 is derived from the shuf invocation used as a seed in another script.

TRAVIS_PULL_REQUEST_SHA is not the correct SHA hash (and indeed may not
change until the base branch is altered); see if TRAVIS_COMMIT will
correspond with GITHUB_SHA.
@actual-nh
Copy link
Contributor Author

GITHUB_SHA does not seem to work properly; I have put in an issue at the best place I could find, namely github/docs#4422.

@actual-nh
Copy link
Contributor Author

actual-nh commented Mar 16, 2021

Github docs was pretty useless (although hopefully the documentation will be changed to reflect that GITHUB_SHA is the prior commit - making it rather useless for coordinating... sigh).

Given ccache, I think it was a bad idea to modify the runs, plus
data so far indicate no changes across machine/compiler for weary.
Admittedly, this gives a bit less data from different seeds.
@actual-nh
Copy link
Contributor Author

These are designed to get as much info as possible regarding the weary tests with the turn-by-turn activity measurement changes (so far rather few!) combined with #47653's weary changes.

@stale
Copy link

stale bot commented Jul 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Please do not 'bump' or comment on this issue unless you are actively working on it. Stale issues, and stale issues that are closed are still considered.

@stale stale bot added the stale Closed for lack of activity, but still valid. label Jul 9, 2021
@actual-nh actual-nh added (P5 - Long-term) Long-term WIP, may stay on the list for a while. and removed stale Closed for lack of activity, but still valid. labels Sep 2, 2021
@Night-Pryanik
Copy link
Contributor

I suppose this PR is abandoned, since last commit was a year and a half ago? I'm not sure if it should stay in PR tracker, especially with this many unresolved conflicts.

@Night-Pryanik
Copy link
Contributor

Closing as abandoned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code: Tests Measurement, self-control, statistics, balancing. (P5 - Long-term) Long-term WIP, may stay on the list for a while.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants