nrunner: introduce status server auto mode #4879
Conversation
Force-pushed from dd01e5b to c470b8c
This introduces a mode under which the status server will set up its own "listen" and "uri" values. The "listen" value is, of course, related to where the status server will be listening for status updates from clients (tasks), which are given the "uri" value. Instead of a TCP based socket, the auto mode defaults to using a UNIX domain socket created within the job's own directory, which better avoids clashes (no two jobs should be using the same job directory anyway).

One way to see the difference in action is to run:

$ nc -l 127.0.0.1 8888

And on another session:

$ avocado run --test-runner=nrunner --nrunner-status-server-auto /bin/true

And compare it to:

$ avocado run --test-runner=nrunner /bin/true

By not using a resource available system wide, but one inside the job's directory, the possibility of clashes is virtually non-existent.

Signed-off-by: Cleber Rosa <[email protected]>
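As a side note for reviewers, the "listen"/"uri" split and the per-job UNIX domain socket can be pictured with a minimal asyncio sketch. This is not Avocado's actual status server implementation; the `handle()` callback and the `job_dir` path are made up for illustration:

```python
# Illustrative sketch only -- not Avocado's actual status server code.
import asyncio
import os


async def handle(reader, writer):
    # A real status server would parse and record task status messages here.
    print("received:", await reader.read(4096))
    writer.close()


async def serve_tcp():
    # "listen" is a fixed, system-wide resource: any other job (or any other
    # process) using 127.0.0.1:8888 will clash with this one.
    server = await asyncio.start_server(handle, host="127.0.0.1", port=8888)
    async with server:
        await server.serve_forever()


async def serve_unix(job_dir="/tmp/example-job-results"):  # hypothetical path
    # "listen" and "uri" both point to a socket file inside the job's own
    # directory, so no two jobs can ever fight over it.
    os.makedirs(job_dir, exist_ok=True)
    path = os.path.join(job_dir, ".status_server.sock")
    server = await asyncio.start_unix_server(handle, path=path)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(serve_unix())
```

The point is that the TCP variant binds a system-wide resource (host and port), while the UNIX socket variant only ever touches a file inside its own job directory.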
Users often complain about jobs with no test results. There are of course many possible causes, but a common one is a clash and consequent failure of communication with the status server. Because it's currently dependent on a fixed TCP port, clashes are quite easy to hit, especially when running multiple jobs at once (or nested). Also, a good number of tests have to go through the trouble of setting up a custom status server to avoid clashes.

This enables the automatic status server, based on a much more private UNIX domain socket, so that clashes become virtually impossible. It also removes those custom status servers and simply relies on the now automatic one.

Signed-off-by: Cleber Rosa <[email protected]>
Force-pushed from c470b8c to 6a2c45e
FYI, I was testing it and got:
These are the status server messages:
You got those when not using the auto mode, right? If so, this is to the best of my knowledge not a regression, but just the result of tampering with the socket that would be used by the status server.
Right.
Okay, so what would be the correct way to run the following command without crashing Avocado?
I don't think I got the idea of the option here.
Good! I was starting to think I had things upside down here :)
The correct way is to just use the auto mode, the default after the 2nd commit, meaning:
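Presumably the commands meant here are the ones from the commit messages, with auto mode either implied (after the second commit) or requested explicitly:

```
$ avocado run --test-runner=nrunner /bin/true
$ avocado run --test-runner=nrunner --nrunner-status-server-auto /bin/true
```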
With that, even if you have something using the status server port, you'd be fine.
Another example that shows clash-free behavior is:
Resulting in:
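One such demonstration (the exact commands here are an assumption, reusing the nc trick from the first commit message) is to occupy the default TCP port and then run a job in auto mode:

```
$ nc -l 127.0.0.1 8888 &
$ avocado run --test-runner=nrunner --nrunner-status-server-auto /bin/true
```

The job still gets its results, because the status server is on the per-job UNIX domain socket rather than the occupied TCP port.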
I got the idea of the auto mode. What I'm wondering is why we have the command-line option
Okay, forget about it. I ran the command multiple times again, with and without the command-line option, and it is not failing anymore. Don't know what happened.
The purpose of Now, it's not
Thanks for the explanation. I was testing it with
LGTM, thanks
Hi @clebergnu, besides my squash comments, I do have one question: how is this going to work with --nrunner-spawner='podman'? Aren't UNIX sockets a local thing?
I'm getting this result:

```
$ avocado run --test-runner=nrunner --nrunner-spawner=podman /bin/true
JOB ID     : fc75cc05d89ffadf3871f61cc18ec20482d419b7
JOB LOG    : /home/local/avocado/job-results/job-2021-08-20T10.04-fc75cc0/job.log
RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 1 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML   : /home/local/avocado/job-results/job-2021-08-20T10.04-fc75cc0/results.html
JOB TIME   : 45.74 s
```
```diff
                              'status server.')
     settings.register_option(section=section,
                              key='status_server_auto',
-                             default=False,
+                             default=True,
```
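If it helps, a user who wants the old behavior back after this default flip could presumably turn the setting off in the configuration file; the key name comes from the diff above, while the `[nrunner]` section name is an assumption based on where these options are registered:

```ini
[nrunner]
status_server_auto = False
```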
I don't mind if those changes are squashed with the previous commit.
I kept the two apart because I felt like the switch to auto mode would be more controversial than the introduction of the feature itself.
But it seems I was wrong. I'll squash them on v2.
```diff
-                             long_arg='--nrunner-status-server-auto',
-                             action='store_true')
+                             long_arg='--nrunner-status-server-disable-auto',
+                             action='store_false')
```
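Just so the flipped option is easy to picture, disabling auto mode from the command line would then presumably look like this, falling back to the previous fixed TCP port behavior:

```
$ avocado run --test-runner=nrunner --nrunner-status-server-disable-auto /bin/true
```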
Same here?
@beraldoleal good point about the podman spawner. In the short term, we can look into:
How does that sound?
^ Not a good idea, tried and failed. According to this, it requires a privileged container, so this would work:

```diff
diff --git a/avocado/plugins/spawners/podman.py b/avocado/plugins/spawners/podman.py
index 14b4b34df..9b0745a83 100644
--- a/avocado/plugins/spawners/podman.py
+++ b/avocado/plugins/spawners/podman.py
@@ -87,6 +87,15 @@ class PodmanSpawner(Spawner, SpawnerMixin):
         return out.startswith(b'Up ')
 
     async def spawn_task(self, runtime_task):
+
+        mount_status_server_socket = False
+        mounted_status_server_socket = '/tmp/.status_server.sock'
+        status_server_uri = runtime_task.task.status_services[0].uri
+        if ':' not in status_server_uri:
+            # a unix domain socket is being used
+            mount_status_server_socket = True
+            runtime_task.task.status_services[0].uri = mounted_status_server_socket
+
         task = runtime_task.task
         entry_point_cmd = '/tmp/avocado-runner'
         entry_point_args = task.get_command_args()
@@ -109,6 +118,8 @@ class PodmanSpawner(Spawner, SpawnerMixin):
         proc = await asyncio.create_subprocess_exec(
             podman_bin, "create",
             "--net=host",
+            "--privileged",
+            "-v", "%s:%s" % (status_server_uri, '/tmp/.status_server.sock'),
             entry_point_arg,
             image,
             stdout=asyncio.subprocess.PIPE,
```

I'm not sure how I feel about it though.
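For reference, the container-side effect of that diff is roughly this manual invocation (the host socket path and image name are placeholders, and whether --privileged is strictly required may depend on the podman setup):

```
$ podman create --net=host --privileged \
      -v <host job dir>/.status_server.sock:/tmp/.status_server.sock \
      <image>
```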
Looks like a quick fix/workaround until we come up with the implementation for the second option, which I think is a valid use case for other spawners too.
This seems to be a podman-only solution, and maybe we could think a little bit further about a more generic solution (I'm thinking about possible future spawners).
Hum... not sure if I understood you here. But in one "ideal" scenario, this would still be done via TCP, and to avoid collisions maybe add some kind of token/identifier for the client, so that messages don't get mixed up. Again, this could also be solved with a PUB/SUB mechanism and different channels.
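Just to make the token/identifier idea concrete, here is a tiny sketch of what such tagged messages could look like; none of these names exist in Avocado, they only illustrate the suggestion:

```python
# Sketch of the "token/identifier" idea: every status message carries the
# identity of its sender, so a single shared TCP status server can tell
# concurrent jobs apart instead of mixing their results.
import json


def make_status_message(job_id, task_id, status):
    # Client side: tag the message with a job/task token.
    return json.dumps({"job": job_id, "task": task_id, "status": status})


def dispatch(message, per_job_results):
    # Server side: route the message into the right job's bucket.
    data = json.loads(message)
    per_job_results.setdefault(data["job"], []).append(data)


results = {}
dispatch(make_status_message("job-2021-08-20T10.04-fc75cc0", "1-/bin/true", "pass"),
         results)
print(results)
```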
Sure, yes, let's document it, move forward, and try to investigate a more robust solution for all spawners. Now, thinking about the default behavior: since this is not working with the podman spawner, maybe we should discard the second commit's change that makes this the default. Makes sense?
And TBH we can trade (drop) the
Ok, just found the workaround solution exporting to the spawner. So yes, I'm ok with using it, but IMO we still need a more generic solution here.