"Optimal" tests reordering - synthesis & strategy #5460

Open
smarie opened this issue Jun 18, 2019 · 1 comment
Labels: topic: collection (related to the collection phase), topic: fixtures (anything involving fixtures directly or indirectly), type: enhancement (new feature or API change, should be merged into features branch)

Comments

@smarie (Contributor) commented on Jun 18, 2019

Introduction

To quote #5054 (comment)

Fixture order due to parametrization is a "hot topic" in the sense that there has been some work done on it, but it is a non-trivial issue because fixture setup order might greatly affect test suite performance.

Currently, as you all know, pytest post-processes the order of collected test items in the pytest_collection_modifyitems hook. The purpose is to reach some kind of "optimality". There are currently many open tickets in pytest-dev about these ordering issues. My personal feeling is that we will not solve each of these problems separately, and that we need a single place to discuss what "optimal" means and what direction pytest will take on this topic.
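For reference, here is a tiny inspection sketch (not part of pytest itself, purely illustrative) that prints the final item order and the fixtures each item requires, after pytest's built-in reordering and any plugin reordering have run:

```python
# conftest.py -- inspection sketch only; run pytest with -s to see the output
import pytest

@pytest.hookimpl(hookwrapper=True)
def pytest_collection_modifyitems(session, config, items):
    yield  # let pytest's own fixture-based reordering (and other plugins) run first
    for item in items:
        fixtures = sorted(getattr(item, "fixturenames", []))
        print(f"{item.nodeid} -> {fixtures}")
```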

Current Status

1- What is "optimal"?

a) Current implementation

Even if PR #3551 makes its way towards solving the above issues (thanks @ceridwen!), a few other needs go beyond the current definition of "optimal":

b) Additional need 1: "priorities"

The first issue with the current approach is that, inside a given scope, the resulting order may be counter-intuitive, especially when there are multiple "best" orders. Some comments in related tickets (#2846 (comment), #2846 (comment)) disagree with the order that is currently produced.

A new ticket, #3393, was opened requesting an updated definition of "optimal": adding a "priority" argument. @Sup3rGeo proposed a plugin to handle this: pytest-param-priority.

My personal feeling is that "priority" is a very technical term that most users will not interpret correctly, whereas a notion of "setup/teardown cost", which users could express in seconds or in any other unit of their choice, would be easier to document and understand.
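To make the idea concrete, here is a minimal sketch of what declaring such costs could look like. Pytest has no notion of setup cost today; the FIXTURE_SETUP_COST mapping and its consumption by a reordering plugin are entirely hypothetical.

```python
import time
import pytest

# Hypothetical annotation: a reordering plugin could read this mapping and
# give the most expensive fixtures the fewest setup/teardown cycles.
FIXTURE_SETUP_COST = {"database": 30.0, "tmp_config": 0.01}  # in seconds

@pytest.fixture(scope="module", params=["sqlite", "postgres"])
def database(request):
    time.sleep(0.1)       # stands in for an expensive setup
    yield request.param   # teardown happens after the last test using this param
```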

c) Additional need 2: "constraints"

#4892 raises the question of "shared resources" between fixtures. Part of the OP's need can be addressed by giving the fixtures with the highest cost a "high priority", but the notion of a "shared resource" is still an additional need: two fixtures may require an "interlock" between their setup/teardown (one cannot be set up while the other is set up).
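For illustration only, a runtime workaround for such an interlock could look roughly like the sketch below; the ExclusiveSlot helper and the fixture names are hypothetical, and a proper solution would rather express this as a scheduling constraint known to the reordering algorithm.

```python
import pytest

class ExclusiveSlot:
    """At most one of the registered resources exists at any time."""
    def __init__(self):
        self._teardown = None

    def acquire(self, setup, teardown):
        if self._teardown is not None:
            self._teardown()      # interlock: release the other resource first
        resource = setup()
        self._teardown = teardown
        return resource

@pytest.fixture(scope="session")
def gpu_slot():
    return ExclusiveSlot()

@pytest.fixture(scope="module")
def model_a(gpu_slot):
    return gpu_slot.acquire(lambda: "model_a loaded", lambda: None)

@pytest.fixture(scope="module")
def model_b(gpu_slot):
    return gpu_slot.acquire(lambda: "model_b loaded", lambda: None)
```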

2- Other desirable features

a) Explicit ordering

pytest_reorder proposes an additional command-line option to reorder tests based on their node ids, or based on a custom regex matching order. This allows users to customize the order pretty much as they wish.

pytest-ordering proposes to reorder tests based on marks. I am not sure whether this also applies to fixtures.
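As an illustration of the general idea only (this is not how either plugin is actually implemented), node-id-based reordering can be sketched with the standard hook alone; the NODE_ID_ORDER patterns below are made up:

```python
# conftest.py -- illustrative sketch of explicit, pattern-based ordering
import re

NODE_ID_ORDER = [r"unit", r"integration", r"slow"]  # hypothetical patterns

def pytest_collection_modifyitems(session, config, items):
    def rank(item):
        for i, pattern in enumerate(NODE_ID_ORDER):
            if re.search(pattern, item.nodeid):
                return i
        return len(NODE_ID_ORDER)
    items.sort(key=rank)  # stable sort: original order is preserved within a rank
```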

b) Disabling order optimization

As this topic grows, it seems more and more appropriate to be able to disable any kind of order optimization, if only to understand where a given order comes from. I suggested in #5054, and implemented in pytest-cases, a command-line switch to skip all reordering done by pytest and plugins.
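For illustration (this is not the pytest-cases implementation, and the flag name is made up), such an opt-out switch could be sketched as follows: snapshot the collection order before anyone reorders it, then restore it afterwards.

```python
# conftest.py -- illustrative sketch of a "skip all reordering" switch
import pytest

def pytest_addoption(parser):
    parser.addoption("--keep-collection-order", action="store_true",
                     help="hypothetical flag: keep the raw collection order")

@pytest.hookimpl(hookwrapper=True)
def pytest_collection_modifyitems(session, config, items):
    original = list(items)   # snapshot before pytest and plugins reorder
    yield                    # let all (non-wrapper) reordering hooks run
    if config.getoption("keep_collection_order"):
        items[:] = original  # discard every reordering decision
```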

c) Readability / maintainability

To quote #3161 (comment)

"After spending some time staring at the reorder_items_atscope function, I still don't understand what the order it's supposed to produce is. I would assume that the correct order is to group all the fixtures of a given scope together and otherwise preserve the order in which they're processed. Are there more constraints than that?"

This raises the point about readability/maintainability of the chosen algorithm, whatever it is.
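For concreteness, here is a small example of the grouping behaviour the current heuristic aims to produce, as I understand it, for a module-scoped parametrized fixture (the orders in the comment reflect that understanding):

```python
import pytest

@pytest.fixture(scope="module", params=["a", "b"])
def server(request):
    yield request.param   # one setup/teardown per parameter per module, if grouping works

def test_one(server): pass
def test_two(server): pass

# Collected order:  test_one[a], test_one[b], test_two[a], test_two[b]  (4 setups if run as collected)
# Reordered:        test_one[a], test_two[a], test_one[b], test_two[b]  (2 setups)
```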

d) Support for parallelism

pytest-xdist allows users to parallelize tests. I expect that the "optimal" scheduling will therefore have to be completely rethought in the presence of parallelism.

Now what?

From here, the debate is open:

  • Is this entire topic in scope for pytest, or only part of it? If not, where is the best place to work on this topic?
  • At which point should we switch from "relying on heuristics", where the algorithm needs to be modified every time a new issue is discovered, to "relying on an optimization solver", where the problem formulation is the only thing that needs to be maintained? To me, this is probably the best way to properly handle several sets of constraints and to be able to add more later. It seems relatively easy to formulate the (MILP) optimization problem in the case where there is no parallelism; there are plenty of resources about it, for example this chapter 4. Then, once the mathematical problem is formulated, many MILP solvers are available in Python, as presented here. A minimal formulation sketch is given right after this list.
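To show what such a formulation could look like, here is a minimal sketch (not pytest code) using the PuLP library and a toy model with made-up data: each test requires a set of fixtures, a fixture is torn down as soon as the next test does not use it, and the objective is to minimize the total setup cost. Scopes, parallelism and interlocks are deliberately left out.

```python
import pulp

tests = {                                   # test name -> fixtures it requires (toy data)
    "test_a": {"db"},
    "test_b": {"db", "cache"},
    "test_c": {"cache"},
    "test_d": {"db"},
}
cost = {"db": 30.0, "cache": 5.0}           # setup cost per fixture (arbitrary units)
positions = range(len(tests))
fixtures = sorted(cost)

prob = pulp.LpProblem("test_reordering", pulp.LpMinimize)

# x[t, p] == 1  <=>  test t runs at position p
x = pulp.LpVariable.dicts("x", [(t, p) for t in tests for p in positions], cat="Binary")
# setup[f, p] is forced to 1 whenever fixture f must be (re)set up at position p
setup = pulp.LpVariable.dicts("setup", [(f, p) for f in fixtures for p in positions],
                              lowBound=0, upBound=1)

# each test gets exactly one position, and each position exactly one test
for t in tests:
    prob += pulp.lpSum(x[t, p] for p in positions) == 1
for p in positions:
    prob += pulp.lpSum(x[t, p] for t in tests) == 1

def active(f, p):
    # 1 if the test placed at position p uses fixture f (an expression, not a variable)
    return pulp.lpSum(x[t, p] for t in tests if f in tests[t])

# a setup is needed whenever a fixture becomes active at p without being active at p-1
for f in fixtures:
    prob += setup[f, 0] >= active(f, 0)
    for p in positions[1:]:
        prob += setup[f, p] >= active(f, p) - active(f, p - 1)

# objective: total setup cost over the whole run
prob += pulp.lpSum(cost[f] * setup[f, p] for f in fixtures for p in positions)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
order = [t for p in positions for t in tests if x[t, p].value() > 0.5]
print(order)  # one optimal order, e.g. ['test_a', 'test_d', 'test_b', 'test_c']
```

Additional needs such as priorities, interlocks between fixtures, or scope nesting would then become extra constraints on the same variables, rather than changes to a hand-written heuristic.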

Your ideas?

@RonnyPfannschmidt (Member) commented:

I am definitely on board with making this a constraint solver.
I believe that for a sound implementation we need to destroy the test protocol hooks as currently designed (yay).

While we are at it, there should be a layer woven in to support communication about setup/teardown dependencies for xdist, as its current scheduling mechanisms leave much to be desired.

@Zac-HD added the topic: collection, topic: fixtures and type: enhancement labels on Jul 9, 2019.