Fix tests/integration_tests/cli/test_integration_cli.py::test_failing_job_cli_error_message #6863

xjules · 2023-12-29T11:06:38Z

Describe the bug
The following test fails, but only when run as a GitHub Actions workflow:

def test_failing_job_cli_error_message():
        # modify poly_eval.py
        with open("poly_eval.py", mode="a", encoding="utf-8") as poly_script:
            poly_script.writelines(["    raise RuntimeError('Argh')"])
    
        args = Mock()
        args.config = "poly_high_min_reals.ert"
        parser = ArgumentParser(prog="test_main")
    
        parser = ArgumentParser(prog="test_main")
        parsed = ert_parser(
            parser,
            [TEST_RUN_MODE, "poly.ert"],
        )
        expected_substrings = [
            "Realization: 0 failed after reaching max submit (2)",
            "job poly_eval failed",
            "Process exited with status code 1",
            "Traceback",
            "raise RuntimeError('Argh')",
            "RuntimeError: Argh",
        ]
        try:
            run_cli(parsed)
        except ErtCliError as error:
            for substring in expected_substrings:
>               assert substring in f"{error}"
E               AssertionError: assert 'Realization: 0 failed after reaching max submit (2)' in 'Experiment failed!\n'
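
A side note on the test structure above: with a bare try/except, the substring assertions are skipped entirely if `run_cli` raises nothing. A minimal sketch of a stricter check follows. It assumes hypothetical stand-ins (`CliError` for `ErtCliError`, `failing_run` for `run_cli(parsed)`) and is not the actual ERT test code:

```python
class CliError(Exception):
    """Hypothetical stand-in for ErtCliError."""


def missing_substrings(func, expected_substrings):
    """Call func, require that it raises CliError, and return the
    expected substrings that are absent from the error message."""
    try:
        func()
    except CliError as error:
        message = str(error)
        return [s for s in expected_substrings if s not in message]
    # Reaching this point means func() returned without raising.
    raise AssertionError("expected CliError, but no exception was raised")


def failing_run():
    # Hypothetical stand-in for run_cli(parsed); mimics the truncated
    # message seen in the failing workflow.
    raise CliError("Experiment failed!\n")


expected = ["Realization: 0 failed", "RuntimeError: Argh"]
missing = missing_substrings(failing_run, expected)
print(missing)
```

Reporting every missing substring at once, rather than stopping at the first failed assertion, may also make it easier to see how much of the message was lost.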

To reproduce
It only fails in GitHub Actions workflows.
More info here, for example: https://github.com/equinor/komodo-releases/actions/runs/7352338393/job/20017761397

Expected behaviour
The test should pass

Environment

OS: [e.g. RHEL7]
ERT/Komodo release: bleeding
Python version
Remote/HPC execution involved: no

Additional context
This test fails when running with job_queue while scheduler execution is skipped. Apparently the error message is not composed correctly.

jonathan-eq · 2024-01-02T08:35:24Z

As of today, my bisect tests have stopped working completely. The test does not complete, and fails after two hours with ert.services._base_service.ServerBootFail. Marking this as blocked until this is no longer occurring.

jonathan-eq · 2024-01-04T09:14:33Z

It is still working on nightly builds, so it is my GitHub Actions workflow setup that is not working correctly.

jonathan-eq · 2024-01-04T11:47:50Z

Depends on #6888 for better error logging

jonathan-eq · 2024-01-04T13:58:13Z

~~NB: This error only occurs for job queue, not scheduler~~
It occurs in both modes: consistently with job queue, and intermittently with scheduler.

berland · 2024-01-04T14:46:24Z

Possible lead: The XML part of this was changed in the legacy code while implementing support for it in scheduler.

xjules · 2024-01-05T08:55:06Z

> NB: This error only occurs for job queue, not scheduler

For the last 3 days it has failed on both.

berland · 2024-01-10T08:11:11Z

The error is reproducible by logging in to a linappnode, su-ing to f_scout_ci, and replicating the commands in run_tests_one_project.yml.

berland · 2024-01-10T08:40:42Z

A hypothesis is that the runpath poly_example is being reused:

[f_scout_ci@st-linapp1192 iter-0]$ pwd
/private/f_scout_ci/ert/pytest_tmp_dir/pytest-of-f_scout_ci/pytest-3/popen-gw3/poly_example0/test_data/poly_out/realization-1/iter-0
[f_scout_ci@st-linapp1192 iter-0]$ ls -l
total 344
-rw-rw-r-- 1 f_scout_ci f_scout_ci  272 Jan 10 09:25 JOB_LOG
-rw-rw-r-- 1 f_scout_ci f_scout_ci 1691 Jan 10 09:25 jobs.json
-rw-rw-r-- 1 f_scout_ci f_scout_ci 1690 Jan 10 09:24 jobs.json_backup_2024-01-10_09-25-17Z
drwxrwxr-x 2 f_scout_ci f_scout_ci  104 Jan 10 09:25 logs
-rw-rw-r-- 1 f_scout_ci f_scout_ci   28 Jan 10 09:25 OK
-rw-rw-r-- 1 f_scout_ci f_scout_ci   97 Jan 10 09:25 parameters.json
-rw-rw-r-- 1 f_scout_ci f_scout_ci   97 Jan 10 09:24 parameters.json_backup_2024-01-10_09-25-17Z
-rw-rw-r-- 1 f_scout_ci f_scout_ci   52 Jan 10 09:25 parameters.txt
-rw-rw-r-- 1 f_scout_ci f_scout_ci   52 Jan 10 09:24 parameters.txt_backup_2024-01-10_09-25-17Z
-rw-rw-r-- 1 f_scout_ci f_scout_ci    0 Jan 10 09:25 poly_eval.stderr.0
-rw-rw-r-- 1 f_scout_ci f_scout_ci    0 Jan 10 09:25 poly_eval.stdout.0
-rw-rw-r-- 1 f_scout_ci f_scout_ci  184 Jan 10 09:25 poly.out
-rw-rw-r-- 1 f_scout_ci f_scout_ci  128 Jan 10 09:25 STATUS
-rw-rw-r-- 1 f_scout_ci f_scout_ci  501 Jan 10 09:25 status.json
[f_scout_ci@st-linapp1192 iter-0]$ cat JOB_LOG
09:24:48  Calling: /private/f_scout_ci/ert/pytest_tmp_dir/pytest-of-f_scout_ci/pytest-3/popen-gw3/poly_example0/test_data/poly_eval.py
09:25:19  Calling: /private/f_scout_ci/ert/pytest_tmp_dir/pytest-of-f_scout_ci/pytest-3/popen-gw3/poly_example0/test_data/poly_eval.py

berland · 2024-01-10T11:52:19Z

> A hypothesis is that the runpath poly_example is being reused:

Proven to be a wrong lead.

berland · 2024-01-10T11:55:46Z

A new lead is that something is wrong with the logger setup.

This command will give a failing test within a few minutes:

$ python -m pytest -n 4 --benchmark-disable --eclipse-simulator --durations=0 -v --dist load -k "integration_cli" tests

while this will pass:

$ python -m pytest -n 4 --benchmark-disable --eclipse-simulator --durations=0 -v --dist load -k "integration_cli" tests --log-cli-level=DEBUG
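
To illustrate why a log level alone could change the outcome, here is a minimal standard-library sketch. It does not reproduce ERT's actual logging setup: the logger names and messages are hypothetical, and the only point is that a consumer attached to a logger sees fewer records when the effective level is higher.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log messages in memory."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def collect_messages(level):
    """Return the messages a handler receives when the logger runs at `level`."""
    logger = logging.getLogger(f"demo.{level}")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False  # keep the demo isolated from the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        logger.info("job poly_eval failed")  # dropped unless level <= INFO
        logger.error("Experiment failed!")
    finally:
        logger.removeHandler(handler)
    return handler.messages


at_warning = collect_messages(logging.WARNING)
at_debug = collect_messages(logging.DEBUG)
print(at_warning)  # only the ERROR record
print(at_debug)    # both records
```

If any part of an error message is assembled from records like these, running the tests with and without `--log-cli-level=DEBUG` could plausibly yield different messages, which would be consistent with the two commands above.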

berland · 2024-01-10T12:59:26Z

Bisecting the komodo nightly build logs, the last good version is from December 18. After that, the good/bad state of the nightly builds is masked by pydantic issues until December 29, where we see the first failure.

xjules added the bug label Dec 29, 2023

github-project-automation bot added this to SCOUT Dec 29, 2023

jonathan-eq self-assigned this Dec 29, 2023

jonathan-eq moved this to In Progress in SCOUT Dec 29, 2023

jonathan-eq added the blocked label Jan 2, 2024

jonathan-eq removed the blocked label Jan 4, 2024

jonathan-eq moved this from In Progress to Todo in SCOUT Jan 4, 2024

jonathan-eq moved this from Todo to In Progress in SCOUT Jan 4, 2024

xjules mentioned this issue Jan 5, 2024

test_failing_job_cli_error_messsage fails on komodo #6905

Closed

berland self-assigned this Jan 9, 2024

berland mentioned this issue Jan 10, 2024

Ensure consistent log levels in integration tests #6922

Merged

9 tasks

berland closed this as completed in #6922 Jan 10, 2024

github-project-automation bot moved this from In Progress to Done in SCOUT Jan 10, 2024
