-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CR] Test order randomization using a focused set of tests #47791
Conversation
Give more detailed information on weariness (tracker and intake), to try to figure out why keeps going up and down during tests. Since the "healthy" stored calories are at bmi 25, put calories to healthy minus debug_nutrition, to prevent going over. (cherry picked from commit b897540)
The caloric subtraction (minus calories for debug_nutrition) is causing errors in other tests, and it is also desirable to make sure it isn't doing anything to the weariness tests themselves (weary intake). With the new information (weary tracker and intake), the summarize transition output is linewrapping; trying to prevent. (cherry picked from commit e967767)
Start on adding tests for unrealistic fluctuations in weary level; see CleverRaven#46384 (and some cases in CleverRaven#46941) for example problems. The initial tests look for problems with the weary_recovery task of digging for 8 hours then waiting for 8 hours; weary level should not go down in the first 8 hours, and should not go up in the second 8 hours.
In some conditions, namely continuous exercise at the same level, a decrease in weariness level is unrealistic. Check for this.
Heavy tasks, while a logical section, do have the problem of repeating the earlier task's information, making it harder to tell which task triggered the message. Also make debug_weary_info() more informative using additional clear_avatar().
Forgot to copy over identifier declaration.
Weary levels keep fluctuating unrealistically, probably because weary.intake (and weary.tracker?) are changing in large jumps at times.
This adjusts expected test values to the (quite consistent) ones for the altered weary intake/tracker. It also does the weary.tracker adjustment in a less-perfectionistic (but more-functional) way.
Two of the tests are doing a consistent failure due to fluctuations. Ultimately, these fluctuations need eliminating, but it would be nice to get some information on if anything else is going on.
This is an extension of a previous modification to try to smooth out the weary.intake reduction (smaller changes but more frequent), and both increases that and does similar for weary.tracker. The weary.intake changes are leading up to - probably after 0.F - an exponential moving average being used instead, so that characters are not, essentially hypoglycemic.
This monitors, with weary level transitions, the low_activity_ticks and tick_counter, to see if this can help figure out why weary.tracker is increasing while resting. (cherry picked from commit 4c4b9cd)
As far as I can tell, the cause of the weary.tracker going up during resting periods is that rest does still expend calories (bmr), and it happens every 5 minutes - while weary.tracker was only reduced every 30 (current) or 15 (this branch) minutes. This commit makes weary.tracker reduction occur every 5 minutes - every time try_reduce_weariness() is called.
Some of the test times were being altered by the every-30-minute (awake) weary.tracker reductions. Alter to match new ones, also taking into account local testing (including in scrambled order). While with some scrambled tests am seeing inconsistencies between 8-hour and 12-hour digging, 8-hour without fluctuations indicated 3->4 should not be 470 minutes, but no more than 465 - which it already was for 12-hour, weirdly enough (oops by me earlier?).
Start on adding tests for unrealistic fluctuations in weary level; see CleverRaven#46384 (and some cases in CleverRaven#46941) for example problems. The initial tests look for problems with the weary_recovery task of digging for 8 hours then waiting for 8 hours; weary level should not go down in the first 8 hours, and should not go up in the second 8 hours. (cherry picked from commit bdd942b)
In some conditions, namely continuous exercise at the same level, a decrease in weariness level is unrealistic. Check for this. (cherry picked from commit 49fc687)
Weary levels keep fluctuating unrealistically, probably because weary.intake (and weary.tracker?) are changing in large jumps at times. (cherry picked from commit db03ee5)
This adjusts expected test values to the (quite consistent) ones for the altered weary intake/tracker. It also does the weary.tracker adjustment in a less-perfectionistic (but more-functional) way. (cherry picked from commit 028b0f0)
This is an extension of a previous modification to try to smooth out the weary.intake reduction (smaller changes but more frequent), and both increases that and does similar for weary.tracker. The weary.intake changes are leading up to - probably after 0.F - an exponential moving average being used instead, so that characters are not, essentially hypoglycemic. (cherry picked from commit ab4ba67)
As far as I can tell, the cause of the weary.tracker going up during resting periods is that rest does still expend calories (bmr), and it happens every 5 minutes - while weary.tracker was only reduced every 30 (current) or 15 (this branch) minutes. This commit makes weary.tracker reduction occur every 5 minutes - every time try_reduce_weariness() is called. (cherry picked from commit 5d9dcb2)
Some of the test times were being altered by the every-30-minute (awake) weary.tracker reductions. Alter to match new ones, also taking into account local testing (including in scrambled order). While with some scrambled tests am seeing inconsistencies between 8-hour and 12-hour digging, 8-hour without fluctuations indicated 3->4 should not be 470 minutes, but no more than 465 - which it already was for 12-hour, weirdly enough (oops by me earlier?). (cherry picked from commit c68507b)
It is going to require more work - most likely adjusting weary.intake to an exponential moving average - to get rid of these two failed tests. (Note that these were added tests in the first place; the failure - weariness fluctuation - in question was happening before but simply wasn't detected.)
This clears the avatar prior to the (potential) debug output so what is actually the case inside do_activity can be seen.
Merge branch 'master' of https://github.com/CleverRaven/Cataclysm-DDA into weariness_fluctuations_1
This (near-duplicate of have_weary_decrease()) is not in use, so removing to be in accord with CDDA guidelines.
(This does not include any build-scripts/build.sh alterations.) In order to get more of an idea of what's happening with some tests known to have different failure patterns depending on the test order, namely those related to weariness, set up for eventually doing only a few tests, but in different orders and on different architectures.
…h_weariness_fluctuations' into weariness_order
…aven/Cataclysm-DDA into weariness_fluctuations_1
Given rearrangements in what tests will not happen if others fail, this is no longer needed to get more information. Some changes may be needed in said rearrangements, however.
Try to ensure not "hogging" github jobs, despite fail-fast: false.
Gods-dammit, I'm an idiot. I'm sorry. I failed to realize that Github would be doing the matrix completely in parallel across multiple machines, INCLUDING finding out the date on different machines. EDIT: I didn't spot it earlier because they turn out to be rather well synchronized...
|
I'd expect that it would expect a hex-formatted number. But if it doesn't you can convert to decimal using bash
|
Correct my error from earlier re date as seed. Also should make Github and Travis use the same seed.
It probably would accept a small hex-formatted number, but it never printed a seed when I tried one the length of a SHA hash. I wound up doing |
TRAVIS_PULL_REQUEST_SHA is not the correct SHA hash (and indeed may not change until the base branch is altered); see if TRAVIS_COMMIT will correspond with GITHUB_SHA.
GITHUB_SHA does not seem to work properly; I have put in an issue at the best place I could find, namely github/docs#4422. |
Github docs was pretty useless (although hopefully the documentation will be changed to reflect that GITHUB_SHA is the prior commit - making it rather useless for coordinating... sigh). |
Co-authored-by: anothersimulacrum <[email protected]>
Co-authored-by: anothersimulacrum <[email protected]>
Co-authored-by: anothersimulacrum <[email protected]> (cherry picked from commit a0e00b7)
Co-authored-by: anothersimulacrum <[email protected]> (cherry picked from commit e11fcc4)
…' of https://github.com/CleverRaven/Cataclysm-DDA into weariness_fluctuations_1
…leverRaven/Cataclysm-DDA into activity_per_turn_weariness_fluctuations
I forgot to include <cmath> for std::ceil, std::floor, and std::pow when I moved things over to activity_tracker.cpp (character.cpp had it already... as well as lots of others, of course!).
Given ccache, I think it was a bad idea to modify the runs, plus data so far indicate no changes across machine/compiler for weary. Admittedly, this gives a bit less data from different seeds.
These are designed to get as much info as possible regarding the weary tests with the turn-by-turn activity measurement changes (so far rather few!) combined with #47653's weary changes. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Please do not 'bump' or comment on this issue unless you are actively working on it. Stale issues, and stale issues that are closed are still considered. |
I suppose this PR is abandoned, since last commit was a year and a half ago? I'm not sure if it should stay in PR tracker, especially with this many unresolved conflicts. |
Closing as abandoned. |
Summary
None
Purpose of change
This is a more-focused version of #46473. More specifically, it focuses on ordering of tests involved in weariness and nutrition, with more focus on the former. It includes modifications from #47653 and #47273.
Describe the solution
One noticeable set of tests that alter depending on ordering is those in weary_test.cpp. This PR does testing of only those, and of those plus some possibly-related nutrition tests. It rearranges what architectures/compilers some testing is run on, to allow for test failures and still get useful input, but also try to limit its occupation of PR-testing resources. It does runs both with and without rearrangement using the same seed.
Describe alternatives you've considered
My local computer unfortunately takes several minutes (during which its responsiveness to other commands is reduced) to even start up a world for weariness testing, making repeated test runs here painful - and, even then, uninformative on architecture/compiler differences.
Additional context
Not meant for merging.