Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate file collection for non-pattern matching file #2679

Open
elliterate opened this issue Aug 12, 2017 · 5 comments
Open

Duplicate file collection for non-pattern matching file #2679

elliterate opened this issue Aug 12, 2017 · 5 comments
Labels
topic: collection related to the collection phase type: bug problem that needs to be addressed

Comments

@elliterate
Copy link
Contributor

elliterate commented Aug 12, 2017

When py.test is executed with file_or_dir arguments for:

  1. a directory and
  2. a file that
    1. is in the directory but
    2. does not match the configured Python test module pattern (e.g., test_*.py),

the file will be collected twice.

I believe this is because the file in question is considered for collection twice (once as a member of an explicitly-requested directory and again as an explicitly-requested file) and matches Session#isinitpath both times, thus being collected twice.

This is similar to #1187, except that the file would not ordinarily be collected as a member of the directory, since it does not match the Python test module pattern. As such, this is not a case of explicitly requesting collection of the same file_or_dir twice.

Example

Setup

$ python --version
Python 2.7.13
$ pip list
pip (9.0.1)
py (1.4.34)
pytest (3.2.1)
setuptools (28.8.0)
$ tree
.
└── tests
    ├── _test_three.py
    ├── _test_two.py
    └── test_one.py

Scenario A: Default test collection

$ py.test --collect-only
===================================== test session starts =====================================
platform darwin -- Python 2.7.13, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: $HOME/src/pytest-duplicate-collection, inifile:
collected 1 item
<Module 'tests/test_one.py'>
  <Function 'test_one'>

================================ no tests ran in 0.01 seconds =================================

Scenario B: Explicit directory test collection

$ py.test --collect-only tests
===================================== test session starts =====================================
platform darwin -- Python 2.7.13, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: $HOME/src/pytest-duplicate-collection, inifile:
collected 1 item
<Module 'tests/test_one.py'>
  <Function 'test_one'>

================================ no tests ran in 0.01 seconds =================================

Scenario C: Explicit non-matching file test collection

$ py.test --collect-only tests/_test_two.py
===================================== test session starts =====================================
platform darwin -- Python 2.7.13, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: $HOME/src/pytest-duplicate-collection, inifile:
collected 1 item
<Module 'tests/_test_two.py'>
  <Function 'test_two'>

================================ no tests ran in 0.00 seconds =================================

Scenario D: Explicit directory and non-matching file test collection

$ py.test --collect-only tests tests/_test_two.py
===================================== test session starts =====================================
platform darwin -- Python 2.7.13, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: $HOME/src/pytest-duplicate-collection, inifile:
collected 3 items
<Module 'tests/_test_two.py'>
  <Function 'test_two'>
<Module 'tests/test_one.py'>
  <Function 'test_one'>
<Module 'tests/_test_two.py'>
  <Function 'test_two'>

================================ no tests ran in 0.01 seconds =================================
@nicoddemus nicoddemus added topic: collection related to the collection phase type: bug problem that needs to be addressed labels Aug 15, 2017
@nicoddemus
Copy link
Member

@elliterate thanks for the report and the well detailed reproducer, we appreciate it. 👍

@ghost
Copy link

ghost commented Oct 6, 2017

This seems to be 'work as designed'.
Digging into the code, I figure out that there's a notion of initial paths: the directories and/or files that are specified explicitly in the command-line when running pytest. In the case of @elliterate, the initial paths include the tests directory and the tests/_test_two.py file.

The code in the module _pytest/python.py (this line) checks if pytest should collect tests in path:

if not parent.session.isinitpath(path):
# do this check ONLY if path NOT IN initial paths
    for pat in parent.config.getini('python_files'):
        if path.fnmatch(pat):
            break
        else:
            return
# the path is considered collected if the code reaches this point

So what happened in the example of @elliterate is the file _test_two.py was collected twice, and twice it was checked if it should be collected and passed the check because it was listed explicitly by the user. Renaming _test_two.py to whatever.py doesn't change anything.

So I guess there're some options:

  1. make this feature clear to users in the documentation
  2. do pattern matching regardless of a path being in the initial paths or not, which means the doc also needs to be updated and users who have been using this feature this way would be pissed off
  3. not collect a file more than once, which is an old story (Giving file_or_dir twice collects tests twice #1187)

I'm in for option 1.

PS: @nicoddemus sorry for deleting my previous comment. I didn't understand the issue well enough.

@elliterate
Copy link
Contributor Author

This seems to be 'work as designed'.

I don't understand how you reached that conclusion. Yes, this is what the code does, but that doesn't mean its behavior is by design. (If that were true, then there would be no such thing as a bug in software; everything would by definition be a feature, since that is how the code works.)

In #1187, the argument was made that if py.test A causes test set {X} to be executed, it would be reasonable to infer that py.test A A would cause test set {X} + {X} to be executed. That is, if you request the collection of a path twice, it will collect all of the matching tests twice. I don't necessarily agree that it's a desirable behavior, but I can buy the argument that people might infer it.

In this case, however, while py.test A causes test set {X} to be executed and py.test B causes test set {Y} to be executed, py.test A B causes test set {X + Y} + {Y} to be executed. In other words, the request to collect path B is altering the collection of path A. I see no reason for anyone to infer or desire this behavior and I believe that makes this a completely difference scenario from #1187.

@ghost
Copy link

ghost commented Oct 6, 2017

py.test A B causes test set {X + Y} + {Y} to be executed.

This is why I think it should be clarified in the doc. I think you understand pytest A B as if A and B should be disjoint, which is not always the case. X could be broken down to X' and Z while Y could be Y' and Z so in the end the example was asking for pytest to run test set {X'} + {Y'} + {Z} + {Z}.

When a user specifies a file, my understanding is it's not only to run just a file among many files but also to run a file whose name may not match the default patterns. So naming it _test_two.py is no different from my_tests.py or whatever a user prefers.

We certainly could make pytest ignore a file that doesn't match the pattern despite being specified explicitly by a user. I just think that some users may have use the tool like this and changing this would resulting in another user reporting another issue.

It's not on me to decide on which way the development of pytest should follow. I was looking at this issue and wanted to understand what happened and gain and better understanding of the codebase. I'm happy with whatever options the core developers want for this issue.

@ghost
Copy link

ghost commented Oct 6, 2017

I don't necessarily agree that it's a desirable behavior, but I can buy the argument that people might infer it.

Btw, I'm with you on this. I'm not a fan of having a file collected twice, but it was settled in another argument.

And in case you haven't noticed, I have worked as designed in quote. For me this issue seems to be a combination of having a file collected twice and having an explicitly specified test file skipped the file name patterns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: collection related to the collection phase type: bug problem that needs to be addressed
Projects
None yet
Development

No branches or pull requests

2 participants