Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test/CI builds (intermittently) failing on field, possibly weary tests (unrelated to pull requests) #46256

Closed
actual-nh opened this issue Dec 22, 2020 · 16 comments
Labels
<Bug> This needs to be fixed Code: Tests Measurement, self-control, statistics, balancing.

Comments

@actual-nh
Copy link
Contributor

actual-nh commented Dec 22, 2020

Describe the bug

Travis and/or general build-matrix builds for unrelated pull requests are intermittently (?) failing on the weary and/or fields test, with variations on under which compiler/etc combination is involved:

../tests/weary_test.cpp:122: FAILED:
  CHECK( info.transition_minutes( 4, 3, 550_minutes ) == Approx( 550 ).margin( 0 ) )
with expansion:
  520 (0x208) == Approx( 550.0 )
with messages:
  
Digging Pits 8 hours, then waiting 8:
  Transition: Weariness from 0 to 1 at 120 minutes
  Transition: Weariness from 1 to 2 at 250 minutes
  Transition: Weariness from 2 to 3 at 360 minutes
  Transition: Weariness from 3 to 4 at 465 minutes
  Transition: Weariness from 4 to 3 at 520 minutes
  Transition: Weariness from 3 to 2 at 640 minutes
  Transition: Weariness from 2 to 3 at 660 minutes
  Transition: Weariness from 3 to 2 at 670 minutes
  Transition: Weariness from 2 to 1 at 875 minutes
  Transition: Weariness from 1 to 2 at 890 minutes
  Transition: Weariness from 2 to 1 at 905 minutes
  Weariness: 1712 Max Exertion: ACTIVE_EXERCISE Mult: 0.8
  BMR: 1708 Intake: 1093 Tracker: 2259 Thresh: 893 At: 1
  Calories: 50444
Log messages during failed test:
12:07:12AM: You eat your holy SPAM of debugging (fresh).
12:07:12AM: You start walking.
0.686 s: Heavy tasks
0.686 s: weary_recovery
1.480 s: 1 day vehicle work
1.481 s: weary_recovery
1.387 s: Waiting 24 hours
1.387 s: weary_24h_tasks
0.834 s: Digging 24 hours
0.834 s: weary_24h_tasks
2.467 s: weather realism

Note: 520 instead of 550 is quite consistent.

And another seen multiple times:

../tests/field_test.cpp:44: FAILED:
  CHECK( count_fields( field_type ) == Approx( 8712 ).margin( 200 ) )
with expansion:
  8493 (0x212d) == Approx( 8712.0 )
Log messages during failed test:
12:00:00PM: You're covered in acid!
12:00:00PM: You're covered in acid!
12:00:00PM: The acid burns your legs and feet!
1.359 s: acid_field_expiry_on_map

Steps To Reproduce

(I have commented out old Weary cases since some may be fixed by #46906.) Already reproduced multiple times (commit tags given to find the right one in multi-commit PRs; generated links are not consistently related):

Expected behavior

Consistent test results when no change was made to C++ code or JSON files drawn upon.

Additional context

Ping: @kevingranade (fields test)

@Salty-Panda
Copy link
Contributor

Also #45760

@actual-nh actual-nh changed the title Travis builds (intermittently) failing on weary test (unrelated to pull requests) Travis+General builds (intermittently) failing on weary test (unrelated to pull requests) Dec 22, 2020
@actual-nh
Copy link
Contributor Author

Also #45760

Thanks! I've put it in the list.

@wapcaplet wapcaplet added <Bug> This needs to be fixed Code: Tests Measurement, self-control, statistics, balancing. labels Dec 22, 2020
@actual-nh actual-nh changed the title Travis+General builds (intermittently) failing on weary test (unrelated to pull requests) Travis+General builds (intermittently) failing on weary, other tests (unrelated to pull requests) Dec 22, 2020
@int-ua
Copy link
Contributor

int-ua commented Dec 23, 2020

This may be alleviated a bit by #46272
Especially if someone will help me limit the general build matrix further. If it's possible.

@actual-nh
Copy link
Contributor Author

actual-nh commented Dec 23, 2020

That could help with the general build matrix; I suspect the only ones needed to check for this sort of error are ones that could alter arithmetic optimization (e.g., roundoff). The tests in question do need to be run (AFAIU) when there are JSON changes, since a lot of what they're checking for uses values from JSON files. OTOH, the statistical tests are going to have some risk of a false positive, and the more you run them, the more likely you are to get a false positive. Oops, only the archery test mentioned previously (now commented out - am thinking is false positive) is using a statistical test.

@actual-nh actual-nh changed the title Travis+General builds (intermittently) failing on weary, other tests (unrelated to pull requests) Travis+General builds (intermittently) failing on weary, field tests (unrelated to pull requests) Dec 24, 2020
@actual-nh
Copy link
Contributor Author

Especially if someone will help me limit the general build matrix further. If it's possible.

@int-ua: Is further work needed on that?

@int-ua
Copy link
Contributor

int-ua commented Dec 27, 2020

Not sure. Can it ignore more than

    - 'doc/**'
    - 'lgtm/**'
    - 'tools/**'

?

@actual-nh
Copy link
Contributor Author

actual-nh commented Dec 27, 2020

I suspect utilities/** also, unless they're running tests of creating/redoing tilesets. Do any of them build the doxygen documentation? If not, doxygen_doc can be ignored. Ditto for object_creator and msvc-object_creator. If android is only tested via the appveyor builds, then that directory could be ignored. They don't seem to currently be building on OSX, so build-data/osx should be ignorable.

The matrix could be split between non-cmake and cmake, and between non-tiles and tiles; not sure if it would be worth the extra complexity to be able to ignore CMakeModules and CMakeLists.txt for the former, or gfx (plus not fetching libsdl2-dev libsdl2-ttf-dev libsdl2-image-dev libsdl2-mixer-dev) for the latter.

@int-ua
Copy link
Contributor

int-ua commented Dec 27, 2020

You have more insight into that matrix, will you implement the suggested changes?

@actual-nh
Copy link
Contributor Author

I can do the first paragraph, and mention the second paragraph in the PR.

@actual-nh
Copy link
Contributor Author

I've just added to the initial part the observation that the "weary" test failure is very consistent in terms of numbers - 520 instead of the expected 550. Anyone have any idea why? @anothersimulacrum? @I-am-Erk?

@I-am-Erk
Copy link
Member

I-am-Erk commented Dec 28, 2020

I am not sure why the weariness recovery is quicker in that test but it might be reasonable to add an acceptable deviation to those tests, like ±10% or something

@actual-nh
Copy link
Contributor Author

actual-nh commented Dec 29, 2020

It is actually possible that 520 is the correct value, not 550; see #46384. (More specifically, the 550 value appears to involve (more) fluctuations in the weary level that really shouldn't be there...)

@int-ua
Copy link
Contributor

int-ua commented Dec 30, 2020

#46405

acid_field_expiry_on_map
-------------------------------------------------------------------------------
../tests/field_test.cpp:27
...............................................................................
../tests/field_test.cpp:44: FAILED:
  CHECK( count_fields( field_type ) == Approx( 8712 ).margin( 200 ) )
with expansion:
  8496 (0x2130) == Approx( 8712.0 )

@actual-nh
Copy link
Contributor Author

#46405

Thanks! Have put in.

actual-nh added a commit to actual-nh/Cataclysm-DDA that referenced this issue Jan 15, 2021
Intermittent errors in weary tests (see CleverRaven#46256) are hard to debug
without more information. While this information has been expanded
in CleverRaven#46473, this does not give enough examples, nor is it able to
tell what in "normal" builds is causing intermittent weary test
problems.
@actual-nh actual-nh changed the title Travis+General builds (intermittently) failing on weary, field tests (unrelated to pull requests) Test/CI builds (intermittently) failing on weary, field tests (unrelated to pull requests) Jan 23, 2021
@actual-nh actual-nh changed the title Test/CI builds (intermittently) failing on weary, field tests (unrelated to pull requests) Test/CI builds (intermittently) failing on field, possibly weary tests (unrelated to pull requests) Jan 24, 2021
@Aivean
Copy link
Contributor

Aivean commented Jan 26, 2021

Regarding the flaky fields test.

It's filling the whole z-level in acid and waiting "half-life" time (only 2 minutes for acid) to ensure that approximately half of the acid fields had decayed.

The problem is that the Approx.margin = 200 in the check of the number of alive fields was probably chosen arbitrary, and apparently it's not large enough to account for the variance in the half-life decay.

Our only options here are to try and reduce the variance:

  1. run test multiple times and average the results (tests performance will suffer)
  2. use field with longer half-life time (also performance hit)
  3. use larger margin

Another option would be to seed the rng for the decay process, but I don't think it's a good solution.

@actual-nh
Copy link
Contributor Author

actual-nh commented Jan 28, 2021

I am going to close this issue, I think; hopefully the two patches (by @anothersimulacrum and @Aivean; thanks very much!) have reduced these down to a tolerable level (or possibly even eliminated them, in the case of the weary tests). Anything further on weary tests will likely be on #46384 (or others as appropriate). Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
<Bug> This needs to be fixed Code: Tests Measurement, self-control, statistics, balancing.
Projects
None yet
Development

No branches or pull requests

6 participants