Test/CI builds (intermittently) failing on field, possibly weary tests (unrelated to pull requests) #46256

actual-nh · 2020-12-22T17:09:15Z

Describe the bug

Travis and/or general build-matrix builds for unrelated pull requests are intermittently (?) failing on the weary and/or fields test, with variations on under which compiler/etc combination is involved:

../tests/weary_test.cpp:122: FAILED:
  CHECK( info.transition_minutes( 4, 3, 550_minutes ) == Approx( 550 ).margin( 0 ) )
with expansion:
  520 (0x208) == Approx( 550.0 )
with messages:
  
Digging Pits 8 hours, then waiting 8:
  Transition: Weariness from 0 to 1 at 120 minutes
  Transition: Weariness from 1 to 2 at 250 minutes
  Transition: Weariness from 2 to 3 at 360 minutes
  Transition: Weariness from 3 to 4 at 465 minutes
  Transition: Weariness from 4 to 3 at 520 minutes
  Transition: Weariness from 3 to 2 at 640 minutes
  Transition: Weariness from 2 to 3 at 660 minutes
  Transition: Weariness from 3 to 2 at 670 minutes
  Transition: Weariness from 2 to 1 at 875 minutes
  Transition: Weariness from 1 to 2 at 890 minutes
  Transition: Weariness from 2 to 1 at 905 minutes
  Weariness: 1712 Max Exertion: ACTIVE_EXERCISE Mult: 0.8
  BMR: 1708 Intake: 1093 Tracker: 2259 Thresh: 893 At: 1
  Calories: 50444
Log messages during failed test:
12:07:12AM: You eat your holy SPAM of debugging (fresh).
12:07:12AM: You start walking.
0.686 s: Heavy tasks
0.686 s: weary_recovery
1.480 s: 1 day vehicle work
1.481 s: weary_recovery
1.387 s: Waiting 24 hours
1.387 s: weary_24h_tasks
0.834 s: Digging 24 hours
0.834 s: weary_24h_tasks
2.467 s: weather realism

Note: 520 instead of 550 is quite consistent.

And another seen multiple times:

../tests/field_test.cpp:44: FAILED:
  CHECK( count_fields( field_type ) == Approx( 8712 ).margin( 200 ) )
with expansion:
  8493 (0x212d) == Approx( 8712.0 )
Log messages during failed test:
12:00:00PM: You're covered in acid!
12:00:00PM: You're covered in acid!
12:00:00PM: The acid burns your legs and feet!
1.359 s: acid_field_expiry_on_map

Steps To Reproduce

(I have commented out old Weary cases since some may be fixed by #46906.) Already reproduced multiple times (commit tags given to find the right one in multi-commit PRs; generated links are not consistently related):

Correct pluralization of log message for boomer explosion #46996 (02c43dd - Fields, Travis)
fix magic casting xp #46949 (2638e9e - Fields, general)
Batch test executions by default #46934 (7db2c80 - Fields, Travis)
added check for parent in insertion menu (no more bagception) #46913 (e6afa81 - Fields, Travis)
Skillgain from reading books fixed #46908 (a9f28bc - Fields, Travis)
Add RLE compression for json ser/deser of colony<item> #46837 (baaed94 - Fields, general)
Add Find Isherwood Farm mission #46821 (c1ca5a7 - Fields, on Appveyor)

Revert "Reverses subtraction in character.cpp to correct skill rust calculation" #46831 (72da9df - Fields, Travis)

Prone move mode #46750 (e693355 - Fields, Travis)
New island prison layout #46726 (3ddf94a - Fields, Travis)

NPC's are friends with their factions monster faction #46554 (ebedbe1 - Fields, general)
Introducing patchwork nomex clothing parts to several items recipes #46549 (85501ed - Fields, Travis)
Display power unit for bionic power and batteries #46514 (a32d490 - Fields, general)
Remove duplicated tool display when crafting #46426 (c12aaa8 - Fields, Travis)
Interface "Ultica 2020-12-30" #46405 (b512c18 - Fields, Travis)

Show source location in error messages of deferred json #46350 (4c9eb7b - Fields, general build matrix)
Inventory view update - Show color for food in containers, show worst rot status for containers with multiple items #46329 (8152d40 - Fields, general build matrix)

XL clothing adjustments #46299 (493b571, 13dfee2 - Fields, Travis)

Allow completing monster killing mission by killing the fusing monster when the mission target is fused #46295 (04a5d26 - Fields, Travis)
Misc item category fixes #46275 (4b60302 - Fields, Travis)

Fix auto mining cancelation #46266 (6155b32 - Fields, Travis)

Additional fixes to Appveyor build #46220 (25b9fd7 - Fields, Travis)

Comprehensive grammar sweep #46181 (8381e8f - Fields, general build matrix)
Adjust monster immunities to venoms, being downed #46156 (eb1a0c8 - Fields, Travis; not most recent)

Reduce too high encumbrance on clothing with pockets covering multiple body parts #45760 (ec7edff - Fields, Travis)

Expected behavior

Consistent test results when no change was made to C++ code or JSON files drawn upon.

Additional context

Ping: @kevingranade (fields test)

The text was updated successfully, but these errors were encountered:

Salty-Panda · 2020-12-22T17:18:43Z

Also #45760

actual-nh · 2020-12-22T17:27:09Z

Also #45760

Thanks! I've put it in the list.

int-ua · 2020-12-23T17:08:23Z

This may be alleviated a bit by #46272
Especially if someone will help me limit the general build matrix further. If it's possible.

actual-nh · 2020-12-23T17:20:08Z

That could help with the general build matrix; I suspect the only ones needed to check for this sort of error are ones that could alter arithmetic optimization (e.g., roundoff). The tests in question do need to be run (AFAIU) when there are JSON changes, since a lot of what they're checking for uses values from JSON files. ~~OTOH, the statistical tests are going to have some risk of a false positive, and the more you run them, the more likely you are to get a false positive.~~ Oops, only the archery test mentioned previously (now commented out - am thinking is false positive) is using a statistical test.

actual-nh · 2020-12-27T19:12:04Z

Especially if someone will help me limit the general build matrix further. If it's possible.

@int-ua: Is further work needed on that?

int-ua · 2020-12-27T20:02:39Z

Not sure. Can it ignore more than

    - 'doc/**'
    - 'lgtm/**'
    - 'tools/**'

?

actual-nh · 2020-12-27T20:38:31Z

I suspect utilities/** also, unless they're running tests of creating/redoing tilesets. Do any of them build the doxygen documentation? If not, doxygen_doc can be ignored. Ditto for object_creator and msvc-object_creator. If android is only tested via the appveyor builds, then that directory could be ignored. They don't seem to currently be building on OSX, so build-data/osx should be ignorable.

The matrix could be split between non-cmake and cmake, and between non-tiles and tiles; not sure if it would be worth the extra complexity to be able to ignore CMakeModules and CMakeLists.txt for the former, or gfx (plus not fetching libsdl2-dev libsdl2-ttf-dev libsdl2-image-dev libsdl2-mixer-dev) for the latter.

int-ua · 2020-12-27T20:54:30Z

You have more insight into that matrix, will you implement the suggested changes?

actual-nh · 2020-12-27T20:56:52Z

I can do the first paragraph, and mention the second paragraph in the PR.

actual-nh · 2020-12-28T17:22:21Z

I've just added to the initial part the observation that the "weary" test failure is very consistent in terms of numbers - 520 instead of the expected 550. Anyone have any idea why? @anothersimulacrum? @I-am-Erk?

I-am-Erk · 2020-12-28T19:30:33Z

I am not sure why the weariness recovery is quicker in that test but it might be reasonable to add an acceptable deviation to those tests, like ±10% or something

actual-nh · 2020-12-29T01:49:32Z

It is actually possible that 520 is the correct value, not 550; see #46384. (More specifically, the 550 value appears to involve (more) fluctuations in the weary level that really shouldn't be there...)

int-ua · 2020-12-30T11:11:25Z

#46405

acid_field_expiry_on_map
-------------------------------------------------------------------------------
../tests/field_test.cpp:27
...............................................................................
../tests/field_test.cpp:44: FAILED:
  CHECK( count_fields( field_type ) == Approx( 8712 ).margin( 200 ) )
with expansion:
  8496 (0x2130) == Approx( 8712.0 )

actual-nh · 2020-12-30T15:54:14Z

#46405

Thanks! Have put in.

Intermittent errors in weary tests (see CleverRaven#46256) are hard to debug without more information. While this information has been expanded in CleverRaven#46473, this does not give enough examples, nor is it able to tell what in "normal" builds is causing intermittent weary test problems.

Aivean · 2021-01-26T20:38:32Z

Regarding the flaky fields test.

It's filling the whole z-level in acid and waiting "half-life" time (only 2 minutes for acid) to ensure that approximately half of the acid fields had decayed.

The problem is that the Approx.margin = 200 in the check of the number of alive fields was probably chosen arbitrary, and apparently it's not large enough to account for the variance in the half-life decay.

Our only options here are to try and reduce the variance:

run test multiple times and average the results (tests performance will suffer)
use field with longer half-life time (also performance hit)
use larger margin

Another option would be to seed the rng for the decay process, but I don't think it's a good solution.

actual-nh · 2021-01-28T02:55:57Z

I am going to close this issue, I think; hopefully the two patches (by @anothersimulacrum and @Aivean; thanks very much!) have reduced these down to a tolerable level (or possibly even eliminated them, in the case of the weary tests). Anything further on weary tests will likely be on #46384 (or others as appropriate). Thanks everyone!

actual-nh changed the title ~~Travis builds (intermittently) failing on weary test (unrelated to pull requests)~~ Travis+General builds (intermittently) failing on weary test (unrelated to pull requests) Dec 22, 2020

wapcaplet added <Bug> This needs to be fixed Code: Tests Measurement, self-control, statistics, balancing. labels Dec 22, 2020

actual-nh changed the title ~~Travis+General builds (intermittently) failing on weary test (unrelated to pull requests)~~ Travis+General builds (intermittently) failing on weary, other tests (unrelated to pull requests) Dec 22, 2020

actual-nh changed the title ~~Travis+General builds (intermittently) failing on weary, other tests (unrelated to pull requests)~~ Travis+General builds (intermittently) failing on weary, field tests (unrelated to pull requests) Dec 24, 2020

actual-nh mentioned this issue Dec 27, 2020

[CR] Ignore more paths for general build matrix #46367

Merged

actual-nh mentioned this issue Dec 29, 2020

Weariness fluctuating unexpectedly/inconsistently in weary tests #46384

Closed

actual-nh mentioned this issue Jan 4, 2021

[CR] Randomize test order: catch interaction, etc bugs #46473

Closed

This was referenced Jan 15, 2021

Expand weary (debugging) info #46757

Merged

Name and Recipe Change (Steel Knuckles to Scrap Knuckles) #46331

Merged

anothersimulacrum mentioned this issue Jan 20, 2021

Remove randomness from metabolism #46906

Merged

actual-nh changed the title ~~Travis+General builds (intermittently) failing on weary, field tests (unrelated to pull requests)~~ Test/CI builds (intermittently) failing on weary, field tests (unrelated to pull requests) Jan 23, 2021

actual-nh changed the title ~~Test/CI builds (intermittently) failing on weary, field tests (unrelated to pull requests)~~ Test/CI builds (intermittently) failing on field, possibly weary tests (unrelated to pull requests) Jan 24, 2021

actual-nh mentioned this issue Jan 26, 2021

Fields performance #47032

Closed

Aivean mentioned this issue Jan 26, 2021

Increase error margin for the acid decay test to reduce flakiness #47054

Merged

actual-nh closed this as completed Jan 28, 2021

actual-nh mentioned this issue Feb 16, 2021

manual labour doesn't make you sleepy #47512

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test/CI builds (intermittently) failing on field, possibly weary tests (unrelated to pull requests) #46256

Test/CI builds (intermittently) failing on field, possibly weary tests (unrelated to pull requests) #46256

actual-nh commented Dec 22, 2020 •

edited

Loading

Salty-Panda commented Dec 22, 2020

actual-nh commented Dec 22, 2020

int-ua commented Dec 23, 2020

actual-nh commented Dec 23, 2020 •

edited

Loading

actual-nh commented Dec 27, 2020

int-ua commented Dec 27, 2020 •

edited

Loading

actual-nh commented Dec 27, 2020 •

edited

Loading

int-ua commented Dec 27, 2020

actual-nh commented Dec 27, 2020

actual-nh commented Dec 28, 2020

I-am-Erk commented Dec 28, 2020 •

edited

Loading

actual-nh commented Dec 29, 2020 •

edited

Loading

int-ua commented Dec 30, 2020 •

edited

Loading

actual-nh commented Dec 30, 2020

Aivean commented Jan 26, 2021

actual-nh commented Jan 28, 2021 •

edited

Loading

Test/CI builds (intermittently) failing on field, possibly weary tests (unrelated to pull requests) #46256

Test/CI builds (intermittently) failing on field, possibly weary tests (unrelated to pull requests) #46256

Comments

actual-nh commented Dec 22, 2020 • edited Loading

Describe the bug

Steps To Reproduce

Expected behavior

Additional context

Salty-Panda commented Dec 22, 2020

actual-nh commented Dec 22, 2020

int-ua commented Dec 23, 2020

actual-nh commented Dec 23, 2020 • edited Loading

actual-nh commented Dec 27, 2020

int-ua commented Dec 27, 2020 • edited Loading

actual-nh commented Dec 27, 2020 • edited Loading

int-ua commented Dec 27, 2020

actual-nh commented Dec 27, 2020

actual-nh commented Dec 28, 2020

I-am-Erk commented Dec 28, 2020 • edited Loading

actual-nh commented Dec 29, 2020 • edited Loading

int-ua commented Dec 30, 2020 • edited Loading

actual-nh commented Dec 30, 2020

Aivean commented Jan 26, 2021

actual-nh commented Jan 28, 2021 • edited Loading

actual-nh commented Dec 22, 2020 •

edited

Loading

actual-nh commented Dec 23, 2020 •

edited

Loading

int-ua commented Dec 27, 2020 •

edited

Loading

actual-nh commented Dec 27, 2020 •

edited

Loading

I-am-Erk commented Dec 28, 2020 •

edited

Loading

actual-nh commented Dec 29, 2020 •

edited

Loading

int-ua commented Dec 30, 2020 •

edited

Loading

actual-nh commented Jan 28, 2021 •

edited

Loading