nrunner seems to not cleanup properly on timeout #5407

kraxel · 2022-06-21T09:37:08Z

Describe the bug
nrunner seems to not call tearDown() when a testcase times out

Steps to reproduce
I have testcases which need some background processes.
They are started by the testcase like this:

self.dnsmasq = avocado.utils.process.SubProcess(cmdline, ...)

And are stopped this way:

    def tearDown(self):
        self.log.info('### stop processes')
        if self.dnsmasq is not None:
            self.dnsmasq.stop()

Expected behavior
When the test is done tearDown() is called for cleanup.

Current behavior
When the test case times out (result 'INTERRUPTED') tearDown() is apparently not called,
so I sometimes have leftover processes hanging around after the test suite is done.

System information (please complete the following information):

Fedora 36
Avocado 97.0
Installed via pip3 --user

Additional information
Used to work fine with the old runner, so I switched back to that,
hoping nrunner gets fixed before the old runner is removed.

Which worked for a number of other nrunner issues, but not this one ...

test cases repo: https://gitlab.com/kraxel/edk2-tests

The text was updated successfully, but these errors were encountered:

clebergnu · 2022-06-21T15:44:04Z

@kraxel thanks for reporting this issue.

Interrupting the test and then moving the execution to the same (test) instance tearDown() method would be very hard to implement. I can see two ways around this:

Implement specific resource trackers in Avocado. Those can include processes and files in known (or configurable) locations that Avocado itself will clean up post test execution (including interruption).
Evolve the other better isolated spawners such as podman. The way the podman spawner behave is that the container will be destroyed upon test interruption, so all children processes would be automatically removed too.

Bear in mind that option two currently has a number of shortcomings, but we intend to make it no different from a local test execution.

Would you, as a developer/user, fancy one over the other?

kraxel · 2022-06-22T08:19:35Z

option two shortcoming #1 -- seems to not be documented very well. Is there something more useful the object reference (which doesn't tell me how I can use it) ?
https://avocado-framework.readthedocs.io/en/latest/api/plugins/avocado.plugins.spawners.html#module-avocado.plugins.spawners.podman

Without that it's hard to judge. From a design point of view running inside a container and leave the cleanup to the container runtime makes sense to me.

What are the known shortcomings?

How does the podman spawner handle networking? I have some tests which create an isolated network namespace, then hook up dhcp, dns, tftp, http and qemu to that namespace for network boot testing. Can that be done with the podman spawner? Or maybe I don't have to create a network namespace myself then because podman creates one for the container anyway?

clebergnu · 2022-06-24T21:14:02Z

option two shortcoming #1 -- seems to not be documented very well. Is there something more useful the object reference (which doesn't tell me how I can use it) ? https://avocado-framework.readthedocs.io/en/latest/api/plugins/avocado.plugins.spawners.html#module-avocado.plugins.spawners.podman

True. We do expect to make this more visible as it grows in capabilities. Expect an update to the documentation shortly.

Anyway, right now the podman spawner is only visible by doing:

$ avocado plugins
...
Plugins that spawn tasks and know about their status (spawner): 
podman  Podman (container) based spawner
process Process based spawner
...

And the usage message is available when running avocado run --help:

...
  --nrunner-spawner SPAWNER
                        Spawn tasks in a specific spawner. Available spawners:
                        'process' and 'podman'
...
podman spawner specific options:
  --spawner-podman-bin PODMAN_BIN
                        Path to the podman binary
  --spawner-podman-image CONTAINER_IMAGE
                        Image name to use when creating the container. The
                        first default choice is a container image matching the
                        current OS. If unable to detect, default becomes the
                        latest Fedora release. Default on this system:
                        fedora:36
  --spawner-podman-avocado-egg AVOCADO_EGG
                        Avocado egg path to be used during initial bootstrap
                        of avocado inside the isolated environment. By
                        default, Avocado will try to download (or get from
                        cache) an egg from its repository.

Without that it's hard to judge. From a design point of view running inside a container and leave the cleanup to the container runtime makes sense to me.

What are the known shortcomings?

There may be many shortcomings, some inherent from the container technology itself, and some from Avocado's use of the technology.
But, there are not many known at this point due to its lack of widespread use. I can think of the fact that the container environment needs to have a functional Python stack at this moment (but nothing else).

Its strengths when paired with Avocado, on the other hand, include:

The automatic provisioning (in a non-intrusive way) of the Avocado runtime in the container (for all currently supported Python versions)
An automatic read-only volume mounting of the base location of your test files into the container (so that test files and associated data file are found)

How does the podman spawner handle networking? I have some tests which create an isolated network namespace, then hook up dhcp, dns, tftp, http and qemu to that namespace for network boot testing. Can that be done with the podman spawner? Or maybe I don't have to create a network namespace myself then because podman creates one for the container anyway?

The communication between Avocado running the job (and spawning tests in a container) and the actual "avocado-runner-*" running the test inside the container are not dependent on networking proper (that's the short answer). So it should be possible to use any of the podman networking capabilities, including its own isolated network namespace indeed. Avocado currently doesn't allow control over the networking options for the podman container it creates, but that's an easy fix.

In short, Avocado is betting that container based workloads make a lot of sense for functional/integration testing, so a lot of new features are being developed with this mindset. One example of a very cool feature under development is this.

Long story short, all the dependencies in a test will be attempted to be fulfilled transparently in the environment they are running (including podman containers), and those environments will be cached for future use. If you need to install packages, download files (and more to come) for your test to run successfully, you just need to describe them in the test. Example:

from avocado import Test
from avocado.utils import process

class TFTP(Test):
    """
    avocado: dependency={"type": "package", "name": "tftp-server"}
    """
    def test(self):
        self.log.info(process.run("in.tftpd --version"))

On first run, Avocado will prepare the container images. On subsequent runs, Avocado will just reuse them.

Let me know if that sounds valuable and an extra incentive to investigate the use option two with the edk2 tests.

clebergnu · 2023-01-17T16:08:06Z

In order to move this issue forward, I'll attempt to run the e2dk tests (and examples given in the issue description) with the podman spawner.

kraxel · 2023-01-18T09:17:48Z

You might want revert 9c79f0c5ddacc172bca8b1d86c384614672a5423 then.
That wraps qemu into a 'timeout' call so in case it hangs for whatever reason it gets killed with SIGALARM and the cleanup functions get a chance to run.

It introduces SIGTERM handling in avocado-instrumented tests to run a tearDown method when the test has been interrupted. It uses functionality from Test class, which already handles interruptions. It only adds a monitoring loop to send all messages created by tearDown to avocado core. Reference: avocado-framework#5707 avocado-framework#5407 Signed-off-by: Jan Richter <[email protected]>

kraxel added bug triage labels Jun 21, 2022

clebergnu self-assigned this Oct 31, 2022

richtja removed the triage label Nov 14, 2022

richtja added this to Default project Dec 15, 2022

richtja moved this to Long Term (Next Q) Backlog in Default project Dec 15, 2022

clebergnu moved this from Long Term (Next Q) Backlog to Short Term (Current Q) Backlog in Default project Jan 17, 2023

clebergnu added this to the #101 (Codename TBD) milestone Jan 17, 2023

richtja moved this from Short Term (Current Q) Backlog to Triage in Default project Jan 17, 2023

richtja moved this from Triage to Short Term (Current Q) Backlog in Default project Jan 17, 2023

clebergnu modified the milestones: #101 (One Hundred and One Dalmatians), #102 (Codename TBD) Mar 10, 2023

richtja modified the milestones: #102 (102 Dalmatians), #103 (LTS release - Codename TBD) Jun 26, 2023

clebergnu assigned richtja and unassigned clebergnu Jul 13, 2023

richtja moved this from Short Term (Current Q) Backlog to In progress in Default project Jul 24, 2023

richtja mentioned this issue Oct 11, 2023

Documentation review before LTS release #5757

Closed

7 tasks

richtja mentioned this issue Nov 9, 2023

avocado-instrumented tearDown method during timeout #5795

Merged

richtja linked a pull request Nov 9, 2023 that will close this issue

avocado-instrumented tearDown method during timeout #5795

Merged

clebergnu closed this as completed in #5795 Nov 10, 2023

github-project-automation bot moved this from In progress to Done 103 in Default project Nov 10, 2023

github-project-automation bot added this to Avocado Kanban Aug 28, 2024

github-project-automation bot moved this to Done #100(the 100) in Avocado Kanban Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nrunner seems to not cleanup properly on timeout #5407

nrunner seems to not cleanup properly on timeout #5407

kraxel commented Jun 21, 2022

clebergnu commented Jun 21, 2022

kraxel commented Jun 22, 2022

clebergnu commented Jun 24, 2022

clebergnu commented Jan 17, 2023

kraxel commented Jan 18, 2023

nrunner seems to not cleanup properly on timeout #5407

nrunner seems to not cleanup properly on timeout #5407

Comments

kraxel commented Jun 21, 2022

clebergnu commented Jun 21, 2022

kraxel commented Jun 22, 2022

clebergnu commented Jun 24, 2022

clebergnu commented Jan 17, 2023

kraxel commented Jan 18, 2023