Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nrunner seems to not cleanup properly on timeout #5407

Closed
kraxel opened this issue Jun 21, 2022 · 5 comments · Fixed by #5795
Closed

nrunner seems to not cleanup properly on timeout #5407

kraxel opened this issue Jun 21, 2022 · 5 comments · Fixed by #5795
Assignees
Labels

Comments

@kraxel
Copy link
Contributor

kraxel commented Jun 21, 2022

Describe the bug
nrunner seems to not call tearDown() when a testcase times out

Steps to reproduce
I have testcases which need some background processes.
They are started by the testcase like this:

self.dnsmasq = avocado.utils.process.SubProcess(cmdline, ...)

And are stopped this way:

    def tearDown(self):
        self.log.info('### stop processes')
        if self.dnsmasq is not None:
            self.dnsmasq.stop()

Expected behavior
When the test is done tearDown() is called for cleanup.

Current behavior
When the test case times out (result 'INTERRUPTED') tearDown() is apparently not called,
so I sometimes have leftover processes hanging around after the test suite is done.

System information (please complete the following information):

  • Fedora 36
  • Avocado 97.0
  • Installed via pip3 --user

Additional information
Used to work fine with the old runner, so I switched back to that,
hoping nrunner gets fixed before the old runner is removed.

Which worked for a number of other nrunner issues, but not this one ...

test cases repo: https://gitlab.com/kraxel/edk2-tests

@clebergnu
Copy link
Contributor

@kraxel thanks for reporting this issue.

Interrupting the test and then moving the execution to the same (test) instance tearDown() method would be very hard to implement. I can see two ways around this:

  1. Implement specific resource trackers in Avocado. Those can include processes and files in known (or configurable) locations that Avocado itself will clean up post test execution (including interruption).
  2. Evolve the other better isolated spawners such as podman. The way the podman spawner behave is that the container will be destroyed upon test interruption, so all children processes would be automatically removed too.

Bear in mind that option two currently has a number of shortcomings, but we intend to make it no different from a local test execution.

Would you, as a developer/user, fancy one over the other?

@kraxel
Copy link
Contributor Author

kraxel commented Jun 22, 2022

option two shortcoming #1 -- seems to not be documented very well. Is there something more useful the object reference (which doesn't tell me how I can use it) ?
https://avocado-framework.readthedocs.io/en/latest/api/plugins/avocado.plugins.spawners.html#module-avocado.plugins.spawners.podman

Without that it's hard to judge. From a design point of view running inside a container and leave the cleanup to the container runtime makes sense to me.

What are the known shortcomings?

How does the podman spawner handle networking? I have some tests which create an isolated network namespace, then hook up dhcp, dns, tftp, http and qemu to that namespace for network boot testing. Can that be done with the podman spawner? Or maybe I don't have to create a network namespace myself then because podman creates one for the container anyway?

@clebergnu
Copy link
Contributor

option two shortcoming #1 -- seems to not be documented very well. Is there something more useful the object reference (which doesn't tell me how I can use it) ? https://avocado-framework.readthedocs.io/en/latest/api/plugins/avocado.plugins.spawners.html#module-avocado.plugins.spawners.podman

True. We do expect to make this more visible as it grows in capabilities. Expect an update to the documentation shortly.

Anyway, right now the podman spawner is only visible by doing:

$ avocado plugins
...
Plugins that spawn tasks and know about their status (spawner): 
podman  Podman (container) based spawner
process Process based spawner
...

And the usage message is available when running avocado run --help:

...
  --nrunner-spawner SPAWNER
                        Spawn tasks in a specific spawner. Available spawners:
                        'process' and 'podman'
...
podman spawner specific options:
  --spawner-podman-bin PODMAN_BIN
                        Path to the podman binary
  --spawner-podman-image CONTAINER_IMAGE
                        Image name to use when creating the container. The
                        first default choice is a container image matching the
                        current OS. If unable to detect, default becomes the
                        latest Fedora release. Default on this system:
                        fedora:36
  --spawner-podman-avocado-egg AVOCADO_EGG
                        Avocado egg path to be used during initial bootstrap
                        of avocado inside the isolated environment. By
                        default, Avocado will try to download (or get from
                        cache) an egg from its repository.

Without that it's hard to judge. From a design point of view running inside a container and leave the cleanup to the container runtime makes sense to me.

What are the known shortcomings?

There may be many shortcomings, some inherent from the container technology itself, and some from Avocado's use of the technology.
But, there are not many known at this point due to its lack of widespread use. I can think of the fact that the container environment needs to have a functional Python stack at this moment (but nothing else).

Its strengths when paired with Avocado, on the other hand, include:

  • The automatic provisioning (in a non-intrusive way) of the Avocado runtime in the container (for all currently supported Python versions)
  • An automatic read-only volume mounting of the base location of your test files into the container (so that test files and associated data file are found)

How does the podman spawner handle networking? I have some tests which create an isolated network namespace, then hook up dhcp, dns, tftp, http and qemu to that namespace for network boot testing. Can that be done with the podman spawner? Or maybe I don't have to create a network namespace myself then because podman creates one for the container anyway?

The communication between Avocado running the job (and spawning tests in a container) and the actual "avocado-runner-*" running the test inside the container are not dependent on networking proper (that's the short answer). So it should be possible to use any of the podman networking capabilities, including its own isolated network namespace indeed. Avocado currently doesn't allow control over the networking options for the podman container it creates, but that's an easy fix.

In short, Avocado is betting that container based workloads make a lot of sense for functional/integration testing, so a lot of new features are being developed with this mindset. One example of a very cool feature under development is this.

Long story short, all the dependencies in a test will be attempted to be fulfilled transparently in the environment they are running (including podman containers), and those environments will be cached for future use. If you need to install packages, download files (and more to come) for your test to run successfully, you just need to describe them in the test. Example:

from avocado import Test
from avocado.utils import process

class TFTP(Test):
    """
    avocado: dependency={"type": "package", "name": "tftp-server"}
    """
    def test(self):
        self.log.info(process.run("in.tftpd --version"))

On first run, Avocado will prepare the container images. On subsequent runs, Avocado will just reuse them.

Let me know if that sounds valuable and an extra incentive to investigate the use option two with the edk2 tests.

@clebergnu clebergnu self-assigned this Oct 31, 2022
@richtja richtja removed the triage label Nov 14, 2022
@richtja richtja moved this to Long Term (Next Q) Backlog in Default project Dec 15, 2022
@clebergnu clebergnu moved this from Long Term (Next Q) Backlog to Short Term (Current Q) Backlog in Default project Jan 17, 2023
@clebergnu
Copy link
Contributor

In order to move this issue forward, I'll attempt to run the e2dk tests (and examples given in the issue description) with the podman spawner.

@clebergnu clebergnu added this to the #101 (Codename TBD) milestone Jan 17, 2023
@richtja richtja moved this from Short Term (Current Q) Backlog to Triage in Default project Jan 17, 2023
@richtja richtja moved this from Triage to Short Term (Current Q) Backlog in Default project Jan 17, 2023
@kraxel
Copy link
Contributor Author

kraxel commented Jan 18, 2023

You might want revert 9c79f0c5ddacc172bca8b1d86c384614672a5423 then.
That wraps qemu into a 'timeout' call so in case it hangs for whatever reason it gets killed with SIGALARM and the cleanup functions get a chance to run.

@clebergnu clebergnu assigned richtja and unassigned clebergnu Jul 13, 2023
@richtja richtja moved this from Short Term (Current Q) Backlog to In progress in Default project Jul 24, 2023
richtja added a commit to richtja/avocado that referenced this issue Nov 8, 2023
It introduces SIGTERM handling in avocado-instrumented tests to run a
tearDown method when the test has been interrupted. It uses
functionality from Test class, which already handles interruptions. It
only adds a monitoring loop to send all messages created by tearDown to
avocado core.

Reference: avocado-framework#5707 avocado-framework#5407
Signed-off-by: Jan Richter <[email protected]>
richtja added a commit to richtja/avocado that referenced this issue Nov 9, 2023
It introduces SIGTERM handling in avocado-instrumented tests to run a
tearDown method when the test has been interrupted. It uses
functionality from Test class, which already handles interruptions. It
only adds a monitoring loop to send all messages created by tearDown to
avocado core.

Reference: avocado-framework#5707 avocado-framework#5407
Signed-off-by: Jan Richter <[email protected]>
richtja added a commit to richtja/avocado that referenced this issue Nov 9, 2023
It introduces SIGTERM handling in avocado-instrumented tests to run a
tearDown method when the test has been interrupted. It uses
functionality from Test class, which already handles interruptions. It
only adds a monitoring loop to send all messages created by tearDown to
avocado core.

Reference: avocado-framework#5707 avocado-framework#5407
Signed-off-by: Jan Richter <[email protected]>
@richtja richtja linked a pull request Nov 9, 2023 that will close this issue
@github-project-automation github-project-automation bot moved this from In progress to Done 103 in Default project Nov 10, 2023
@github-project-automation github-project-automation bot moved this to Done #100(the 100) in Avocado Kanban Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done #100(the 100)
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants