Fix occasional test failures when running multiple jobs #3816

grossag · 2020-10-13T19:26:24Z

The way runtest.py passes the list of fixture directories is racy because it sets it in os.environ['FIXTURE_DIRS'] and then spawns the subprocess, counting on Python to start the subprocess before that list is overwritten when spawning the next directory. At least on Windows, the environment is not copied in subprocess.run so runtest.py may overwrite the list of fixture directories with the list for test #2 while the subprocess module is still kicking off test #1. I was able to easily reproduce this by running the
command: python runtest.py -j 2 test\MSVC\VSWHERE.py test\AS\ASPPFLAGS.py a few times in a row. However, with this fix, that command repeatedly succeeds.

To validate ths fix, I also ran that command with "--xml a.xml" and "--xml a.xml --nopipefiles" to validate that those other executors worked correctly.

Remove this paragraph

Please have a look at our developer documentation before submitting your Pull Request.

https://scons.org/guidelines.html

Contributor Checklist:

I have created a new test or updated the unit tests to cover the new/changed functionality.
I have updated CHANGES.txt (and read the README.rst)
I have updated the appropriate documentation

The way runtest.py passes the list of fixture directories is racy because it sets it in os.environ['FIXTURE_DIRS'] and then spawns the subprocess, counting on Python to start the subprocess before that list is overwritten when spawning the next directory. At least on Windows, the environment is not copied in subprocess.run so runtest.py may overwrite the list of fixture directories with the list for test #2 while the subprocess module is still kicking off test #1. I was able to easily reproduce this by running the command: `python runtest.py -j 2 test\MSVC\VSWHERE.py test\AS\ASPPFLAGS.py` a few times in a row. However, with this fix, that command repeatedly succeeds. To validate ths fix, I also ran that command with "--xml a.xml" and "--xml a.xml --nopipefiles" to validate that those other executors worked correctly.

mwichmann · 2020-10-13T19:33:23Z

Looks okay to me.

bdbaddog · 2020-10-13T19:51:48Z

Curious this is necessary python docs say:
If env is not None, it must be a mapping that defines the environment variables for the new process; these are used instead of the default behavior of inheriting the current process’ environment. It is passed directly to Popen.

and env=None is the default.

https://docs.python.org/3/library/subprocess.html

mwichmann · 2020-10-13T20:01:08Z

I think that was the point - as is, it's sharing os.environ, and that proved racy when different tests try to write different fixture dirs to the same variable?

grossag · 2020-10-13T20:02:48Z

Curious this is necessary python docs say:
If env is not None, it must be a mapping that defines the environment variables for the new process; these are used instead of the default behavior of inheriting the current process’ environment. It is passed directly to Popen.

and env=None is the default.

https://docs.python.org/3/library/subprocess.html

I dug into the Python subprocess code and found that that code does no asynchronous logic. My current theory is that it is the call to _winapi.CreateProcess that is not protected against changes to the environment because the Windows CreateProcess API is slightly asynchronous in its calls to other layers of Windows. I don't have a great theory as to why though. The only thing that my experiments can show for certain is that subprocess.run is not atomic in its usage of the calling process's environment on Windows.

bdbaddog merged commit b70f085 into SCons:master Oct 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix occasional test failures when running multiple jobs #3816

Fix occasional test failures when running multiple jobs #3816

grossag commented Oct 13, 2020 •

edited

Loading

mwichmann commented Oct 13, 2020

bdbaddog commented Oct 13, 2020 •

edited

Loading

mwichmann commented Oct 13, 2020

grossag commented Oct 13, 2020

Fix occasional test failures when running multiple jobs #3816

Fix occasional test failures when running multiple jobs #3816

Conversation

grossag commented Oct 13, 2020 • edited Loading

Remove this paragraph

Contributor Checklist:

mwichmann commented Oct 13, 2020

bdbaddog commented Oct 13, 2020 • edited Loading

mwichmann commented Oct 13, 2020

grossag commented Oct 13, 2020

grossag commented Oct 13, 2020 •

edited

Loading

bdbaddog commented Oct 13, 2020 •

edited

Loading