Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix occasional test failures when running multiple jobs #3816

Merged
merged 1 commit into from
Oct 13, 2020

Conversation

grossag
Copy link
Contributor

@grossag grossag commented Oct 13, 2020

The way runtest.py passes the list of fixture directories is racy because it sets it in os.environ['FIXTURE_DIRS'] and then spawns the subprocess, counting on Python to start the subprocess before that list is overwritten when spawning the next directory. At least on Windows, the environment is not copied in subprocess.run so runtest.py may overwrite the list of fixture directories with the list for test #2 while the subprocess module is still kicking off test #1. I was able to easily reproduce this by running the
command: python runtest.py -j 2 test\MSVC\VSWHERE.py test\AS\ASPPFLAGS.py a few times in a row. However, with this fix, that command repeatedly succeeds.

To validate ths fix, I also ran that command with "--xml a.xml" and "--xml a.xml --nopipefiles" to validate that those other executors worked correctly.

Remove this paragraph

Please have a look at our developer documentation before submitting your Pull Request.

https://scons.org/guidelines.html

Contributor Checklist:

  • I have created a new test or updated the unit tests to cover the new/changed functionality.
  • I have updated CHANGES.txt (and read the README.rst)
  • I have updated the appropriate documentation

The way runtest.py passes the list of fixture directories is racy because it
sets it in os.environ['FIXTURE_DIRS'] and then spawns the subprocess, counting
on Python to start the subprocess before that list is overwritten when spawning
the next directory. At least on Windows, the environment is not copied in
subprocess.run so runtest.py may overwrite the list of fixture directories
with the list for test #2 while the subprocess module is still kicking off
test #1. I was able to easily reproduce this by running the command:
`python runtest.py -j 2 test\MSVC\VSWHERE.py test\AS\ASPPFLAGS.py`
a few times in a row. However, with this fix, that command repeatedly succeeds.

To validate ths fix, I also ran that command with "--xml a.xml" and
"--xml a.xml --nopipefiles" to validate that those other executors worked
correctly.
@mwichmann
Copy link
Collaborator

Looks okay to me.

@bdbaddog
Copy link
Contributor

bdbaddog commented Oct 13, 2020

Curious this is necessary python docs say:
If env is not None, it must be a mapping that defines the environment variables for the new process; these are used instead of the default behavior of inheriting the current process’ environment. It is passed directly to Popen.

and env=None is the default.

https://docs.python.org/3/library/subprocess.html

@mwichmann
Copy link
Collaborator

I think that was the point - as is, it's sharing os.environ, and that proved racy when different tests try to write different fixture dirs to the same variable?

@grossag
Copy link
Contributor Author

grossag commented Oct 13, 2020

Curious this is necessary python docs say:
If env is not None, it must be a mapping that defines the environment variables for the new process; these are used instead of the default behavior of inheriting the current process’ environment. It is passed directly to Popen.

and env=None is the default.

https://docs.python.org/3/library/subprocess.html

I dug into the Python subprocess code and found that that code does no asynchronous logic. My current theory is that it is the call to _winapi.CreateProcess that is not protected against changes to the environment because the Windows CreateProcess API is slightly asynchronous in its calls to other layers of Windows. I don't have a great theory as to why though. The only thing that my experiments can show for certain is that subprocess.run is not atomic in its usage of the calling process's environment on Windows.

@bdbaddog bdbaddog merged commit b70f085 into SCons:master Oct 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants