
Requirements database #5412

Closed

Conversation

richtja
Contributor

@richtja richtja commented Jun 23, 2022

This PR adds the ability for spawners to store requirements in a cache.
This makes running multiple tests with the same requirement, or rerunning
the job itself, more efficient.

It also brings the ability to use requirements with the podman spawner,
so it is now possible to run something like this:

class SuccessTest(Test):

    def check_hello(self):
        result = process.run("hello", ignore_status=True)
        self.assertEqual(result.exit_status, 0)
        self.assertIn('Hello, world!', result.stdout_text,)

    def test_a(self):
        """
        :avocado: dependency={"type": "package", "name": "hello"}
        """
        self.check_hello()

Warning: Right now, the requirement cache has no controls for clearing
or updating the cache, and it doesn't check the current environment. So
if the user makes changes to the environment outside the Avocado runtime
(uninstalling a package, running podman prune, removing an asset, etc.),
tests with requirements will break.

Signed-off-by: Jan Richter [email protected]

@richtja richtja added this to the #98 milestone Jun 23, 2022
@richtja richtja requested review from beraldoleal and clebergnu June 23, 2022 22:10
@richtja richtja self-assigned this Jun 23, 2022
richtja added 5 commits June 28, 2022 11:08
It changes the order of runtime tasks in the dependency graph so that
pre-test tasks depend on each other, which ensures a serial run of
pre-test tasks in the state machine. This solves problems with conflicts
between different pre-test task runs.

Signed-off-by: Jan Richter <[email protected]>
The runtime task has information about all its dependencies. Let's add a
method which lists all of those that have already finished.

Signed-off-by: Jan Richter <[email protected]>
The RuntimeTask has all the information about its dependencies needed to
decide whether the dependencies have been resolved and whether it is
possible to run a task. So let's move this responsibility to the
RuntimeTask.

Signed-off-by: Jan Richter <[email protected]>
This adds a few methods for manipulating the requirements cache. These
will be useful for spawners which use the requirements cache.

Signed-off-by: Jan Richter <[email protected]>
This commit adds the ability for spawners to store requirements in a
cache. This makes running multiple tests with the same requirement, or
rerunning the job itself, more efficient.

It also brings the ability to use requirements with the podman spawner,
so it is now possible to run something like this:

class SuccessTest(Test):

    def check_hello(self):
        result = process.run("hello", ignore_status=True)
        self.assertEqual(result.exit_status, 0)
        self.assertIn('Hello, world!', result.stdout_text,)

    def test_a(self):
        """
        :avocado: dependency={"type": "package", "name": "hello"}
        """
        self.check_hello()

Warning: Right now, the requirement cache has no controls for clearing
or updating the cache, and it doesn't check the current environment. So
if the user makes changes to the environment outside the Avocado runtime
(uninstalling a package, running podman prune, removing an asset, etc.),
tests with requirements will break.

Reference:
Signed-off-by: Jan Richter <[email protected]>
@richtja richtja force-pushed the requirements_database branch from d2c0dda to c129e87 on June 28, 2022 09:27
Contributor

@clebergnu clebergnu left a comment

Hi @richtja,

This is some GREAT work! I've spotted some things that need to be updated, and also some discussion points.

BTW, I tried quite hard to break this first version, and couldn't. Just about everything worked the way I expected.

For a new version, we need to improve the documentation and make this very visible to users (maybe even a section on the README).

@@ -201,29 +198,23 @@ def __init__(self, tests, test_suite_name, status_server_uri, job_id):
test_suite_name,
status_server_uri,
job_id)
self.graph[runtime_test] = runtime_test
Contributor

There's a bit of "diff noise" here, with one empty line being removed, and another one added. Seems like it was unintentional.

@@ -62,6 +62,14 @@ def are_dependencies_finished(self):
return False
return True

def get_finished_dependencies(self):
"""Returns all dependencies which already finished."""
finished = []
Contributor

It's certainly a matter of taste, but I find list comprehensions more readable in situations like this. Suggestion:

    def get_finished_dependencies(self):
        """Returns all dependencies which already finished."""
        return [dep for dep in self.dependencies if
                dep.status and "FINISHED" in dep.status]

But as it is a matter of style, feel free to ignore this.

Contributor Author

Yes, I can change that.

@@ -70,6 +69,15 @@ def get_finished_dependencies(self):
finished.append(dependency)
return finished

def is_possible_to_run(self):
dependency_finished = self.are_dependencies_finished()
Contributor

Because the dependency_finished variable is never used again, this could become:

if self.are_dependencies_finished():
   for dependency in self.dependencies:
...

Member

Or:

if not self.are_dependencies_finished():
   return False

for dependency in self.dependencies:
...
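Putting the reviewers' suggestions together, a minimal sketch of the early-return style could look like the following. This is a hypothetical, simplified RuntimeTask (the real Avocado class carries much more state, and the failure check here is an assumption for illustration):

```python
class RuntimeTask:
    """Simplified sketch of a task with dependencies (hypothetical, not the real class)."""

    def __init__(self, dependencies=None):
        self.status = None
        self.dependencies = dependencies or []

    def are_dependencies_finished(self):
        """True only when every dependency reached a FINISHED status."""
        for dependency in self.dependencies:
            if not dependency.status or "FINISHED" not in dependency.status:
                return False
        return True

    def get_finished_dependencies(self):
        """Returns all dependencies which already finished (list comprehension style)."""
        return [dep for dep in self.dependencies
                if dep.status and "FINISHED" in dep.status]

    def is_possible_to_run(self):
        """Early-return style: bail out as soon as a blocker is found."""
        if not self.are_dependencies_finished():
            return False
        for dependency in self.dependencies:
            # Assumed failure marker; the real status values may differ.
            if "FAILED" in dependency.status:
                return False
        return True
```

With the early return, the `dependency_finished` temporary variable from the original diff disappears entirely.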

result_stats = set(key.upper() for key in
self._state_machine._status_repo.result_stats.keys())
if self._failfast and not result_stats.isdisjoint(STATUSES_NOT_OK):
await self._state_machine.abort("FAILFAST is enabled")
raise TestFailFast("Interrupting job (failfast).")

await self._state_machine.finish_task(runtime_task)
await self._state_machine.finish_task(runtime_task, "FINISHED")
Contributor

I'm not sure about this, because without a status_reason, the log message produced will be:

Task "foo-bar" finished

And with this change, it will be:

Task "foo-bar" finished: FINISHED

I am not sure it adds value... To me, it adds a bit of confusion.

Contributor Author

@richtja richtja Jun 28, 2022

Maybe we can change the log message here, because the important part of this change is that finish_task will set the runtime_task status to FINISHED, and that status is used for detecting all finished tasks here.

Member

I think it adds some value if we have all the states well defined. This is the reason, and we could definitely be explicit about it. What are the possible reasons for a finished task?

Contributor

Maybe we can change the log message here, because the important part of this change is that finish_task will set the runtime_task status to FINISHED, and that status is used for detecting all finished tasks here.

I'm fine with that, and I overlooked the function of the status. We can have the message something like:

Task "foo-bar" finished with status: FINISHED

That seems fine to me.
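The wording being agreed on could be sketched as a small formatting helper. This is purely illustrative (the hypothetical `finish_message` name is not the real state-machine API):

```python
def finish_message(task_id, status_reason=None):
    """Builds the log line for a finished task, appending the status only when given."""
    message = f'Task "{task_id}" finished'
    if status_reason is not None:
        message += f" with status: {status_reason}"
    return message
```

Without a `status_reason` the line stays `Task "foo-bar" finished`; with one it becomes `Task "foo-bar" finished with status: FINISHED`.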

requirement_type, requirement):
"""Checks if requirement is in cache.

:rtype: True if requirement is in cache
Contributor

These docstring lines are misaligned.

return False


def update_enviroment(environment_type, old_environment, new_environment):
Contributor

Typo: s/enviroment/environment


if is_task_in_cache:
await self._state_machine.finish_task(
runtime_task, "FINISHED: Task in cache")
Contributor

I think FINISHED is not needed, given that finish_task will already put that information in the log.

Contributor Author

@richtja richtja Jun 28, 2022

FINISHED is important here because it will be part of the runtime_task status, and the status is used for detecting all finished tasks here. It is the same problem as I mentioned for the monitor method.

from avocado.core.teststatus import STATUSES_NOT_OK

ENVIRONMENT_TYPE = 'local'
ENVIRONMENT = 'localhost.localdomain'
Contributor

I am not sure if this is the best choice for the environment name. Maybe use the actual hostname?

Contributor Author

Sure, I will change that.

result = process.run(command, ignore_status=True)
self.assertEqual(result.exit_status, exit_codes.AVOCADO_ALL_OK)
self.assertIn('PASS 1', result.stdout_text,)
self.assertIn('SKIP 2', result.stdout_text,)
self.assertNotIn('-foo-bar-', result.stdout_text,)

@unittest.skipUnless(os.getenv('CI'), skip_package_manager_message)
Contributor

These could later be converted into Avocado tests and use variants (instead of repeating the test with the podman=True parameter).

Another point to be discussed and considered is whether we could/should rely on the requirements mechanism itself to determine if podman tests should run (if the podman package is available?), instead of using skip*.

Contributor Author

This is a great idea, I will do that. I was so focused on the problem itself that I didn't use Avocado features to make my testing easier 😅

@richtja
Contributor Author

richtja commented Jun 28, 2022

Hi @richtja,

This is some GREAT work! I've spotted some things that need to be updated, and also some discussion points.

Thanks, @clebergnu, for the review. I will use your suggestions. We just need to discuss the usage of runtime_task.status for detecting finished tasks.

BTW, I tried quite hard to break this first version, and couldn't. Just about everything worked the way I expected.

That is great to hear, thanks for the testing.

For a new version, we need to improve the documentation and make this very visible to users (maybe even a section on the README).

Sure, I will add it to the new version.

Member

@beraldoleal beraldoleal left a comment

Hi @richtja this LGTM, nice work! Just left a few comments for you.

return hash(self.task.identifier)
return hash((str(self.task.runnable), self.task.job_id,
self.task.category))
return hash(self.task.identifier)
Member

Nice!

@@ -70,6 +69,15 @@ def get_finished_dependencies(self):
finished.append(dependency)
return finished

def is_possible_to_run(self):
dependency_finished = self.are_dependencies_finished()
Member

Or:

if not self.are_dependencies_finished():
   return False

for dependency in self.dependencies:
...

@@ -214,7 +183,7 @@ async def check_finished_dependencies(runtime_task):

# dependencies finished, let's check if they finished
# successfully, so we can move on with the parent task
dependencies_ok = await check_finished_dependencies(runtime_task)
dependencies_ok = runtime_task.is_possible_to_run()
Member

Just a matter of taste, so feel free to ignore, but to make this method name shorter, I would go with "can_run()".

self._state_machine._status_repo.get_latest_task_data(
str(runtime_task.task.identifier)) or {}
# maybe, the results are not available yet
while latest_task_data.get("result", None) is None:
Member

get() returns None by default, so I would remove the None argument from the get() call.
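To illustrate the point, `dict.get()` already falls back to `None` for a missing key, so the explicit second argument is redundant:

```python
latest_task_data = {}  # e.g. no results available yet

# Both calls are equivalent; get() returns None for a missing key by default.
assert latest_task_data.get("result", None) is None
assert latest_task_data.get("result") is None
```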

result_stats = set(key.upper() for key in
self._state_machine._status_repo.result_stats.keys())
if self._failfast and not result_stats.isdisjoint(STATUSES_NOT_OK):
await self._state_machine.abort("FAILFAST is enabled")
raise TestFailFast("Interrupting job (failfast).")

await self._state_machine.finish_task(runtime_task)
await self._state_machine.finish_task(runtime_task, "FINISHED")
Member

I think adds some value, if we have all the states well defined. this is the reason and we could definitely be explicit about the reason. What are possible reasons for a finished task?

return cache.is_requirement_in_cache(ENVIRONMENT_TYPE,
ENVIRONMENT,
kind,
name)
Member

The fact that you are passing ENVIRONMENT_TYPE, ENVIRONMENT, kind and name to most of those cache methods makes me wonder whether it is worth creating an object for them.
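A sketch of what such an object could look like, using a frozen dataclass so entries are hashable and can be used as cache keys directly. The `CacheEntry` name and the example values are hypothetical, not the actual Avocado cache API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    """Groups the four values passed to most cache methods (hypothetical name)."""
    environment_type: str
    environment: str
    kind: str
    name: str


def is_requirement_in_cache(cache, entry):
    """Membership test becomes a single-argument call instead of four parameters."""
    return entry in cache


# Example usage with made-up environment values:
cache = {CacheEntry("podman", "fedora:36", "package", "hello")}
```

Because the dataclass is frozen, equal field values produce equal, hashable entries, so set or dict lookups just work.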

@richtja
Contributor Author

richtja commented Jun 29, 2022

Thanks @clebergnu and @beraldoleal for your reviews, I created new version #5418 with fixes.

@richtja richtja closed this Jun 29, 2022
@richtja richtja deleted the requirements_database branch June 30, 2022 11:38
3 participants