Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catching AttributeError which may occur when executing multi-threaded #4424

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bwuerkner
Copy link

@bwuerkner bwuerkner commented Sep 29, 2023

I don't have a testcase for this because I don't really understand why or how this is happening but I can 100% reproduce it locally with a rather complicated setup.

The setup consists amongst other things of custom Builder and Action implementations.

The error I observe only occurs when running with -j2 or higher, it never occurs single threaded.
It also goes away if I just call SCons again after it failed and then proceeds as expected.
Running a --clean provokes the error again on next invocation.

The error is:

AttributeError : module 'SCons.Tool.gcc' has no attribute 'generate'
Traceback (most recent call last):
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 1286, in get_CacheDir
    path = self._CacheDir_path
           ^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 2540, in __getattr__
    attr = getattr(self.__dict__['__subject'], name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SConsEnvironment' object has no attribute '_CacheDir_path'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Taskmaster/__init__.py", line 231, in execute
    if not t.retrieve_from_cache():
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Node/FS.py", line 3034, in retrieve_from_cache
    return self.get_build_env().get_CacheDir().retrieve(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 1288, in get_CacheDir
    path = SCons.Defaults.DefaultEnvironment()._CacheDir_path
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Defaults.py", line 85, in DefaultEnvironment
    _default_env = SCons.Environment.Environment(*args, **kw)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 1248, in __init__
    apply_tools(self, tools, toolpath)
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 117, in apply_tools
    _ = env.Tool(tool)
        ^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Environment.py", line 2033, in Tool
    tool(self)
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Tool/__init__.py", line 265, in __call__
    self.generate(env, *args, **kw)
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Tool/default.py", line 40, in generate
    for t in SCons.Tool.tool_list(env['PLATFORM'], env):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Tool/__init__.py", line 769, in tool_list
    c_compiler = FindTool(c_compilers, env) or c_compilers[0]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Tool/__init__.py", line 671, in FindTool
    t = Tool(tool)
        ^^^^^^^^^^
  File "/data/ENV/PY3/.venv/lib/python3.11/site-packages/SCons/Tool/__init__.py", line 119, in __init__
    self.generate = module.generate
                    ^^^^^^^^^^^^^^^
AttributeError: module 'SCons.Tool.gcc' has no attribute 'generate'

The AttributeError module is not consistent either, I've seen at least:
AttributeError: module 'SCons.Tool.default' has no attribute 'generate'
AttributeError: module 'SCons.Tool.gcc' has no attribute 'generate'

Also I'm not using Tool.gcc at all in this Project

Contributor Checklist:

  • I have created a new test or updated the unit tests to cover the new/changed functionality.
  • I have updated CHANGES.txt (and read the README.rst)
  • I have updated the appropriate documentation

@bwuerkner bwuerkner changed the title Catching Attribute which may occur when executing multi-threaded Catching AttributeError which may occur when executing multi-threaded Sep 29, 2023
@mwichmann
Copy link
Collaborator

That's kind of scary - it means there's a way for the taskmaster to try retrieving the build environment of a task that doesn't really have one. All the rest is a consequence, we shouldn't really be going down the route of constructing a default environment with the initialization that goes with it. I'm suspicious this means something has a NullEnvironment where it shouldn't.

@mwichmann
Copy link
Collaborator

mwichmann commented Sep 29, 2023

Hmmm... the comment in retrieve_from_cache says:

    def retrieve_from_cache(self) -> bool:                                     
        """Try to retrieve the node's content from a cache                     
                                                                               
        This method is called from multiple threads in a parallel build,       
        so only do thread safe stuff here. 
        ... """

Following the thread of execution down to the instantiation of DefaultEnvironment gets into a place that is far from "atomic", with all the tool initialization... in fact, DefaultEnvironment() itself may be racy as it checks a global flag before going off and doing expensive stuff and then rewrites itself - but a second entry before that finishes is likely to go somewhere we don't want.

@bdbaddog
Copy link
Contributor

Can you describe or share your custom Builder/Action?
Version of Python?
Version of SCons?

@bwuerkner
Copy link
Author

bwuerkner commented Sep 29, 2023

@bdbaddog

Python 3.11.4
Linux (Ubuntu Focal)
SCons 4.5.2 (pip package) AND also tried with SCons on master branch

I'll try to summarize them and give some code snippets.
Generally, what I'm doing, is create a lot of targets in a loop based on the input from a JSON file. This has been working for many months now actually and is still working with all existing JSON files that were previously working.
What's not working is a new JSON file that doesn't seem to have any relevant diff to the working ones.
All of this to say that: This is working in certain circumstances and not in others which makes it hard to narrow down.

Now for the setup:

custom_tool.py

def exists(env):
    return True

def generate(env):
    some_action = Action(internal_function_to_execute_some_command, strfunction=internal_function_for_name)
    bld = Builder(action=some_action)
    env.Append(BUILDERS={"custom_tool": bld})

The internal_function_to_execute_some_command does all the heavy lifting, but even if I just immediately return 0 the error is triggered

Aside from that I have a SConstruct file that contains the loading logic, some error handling and parameter setting and the following:

SConstruct

env = Environment(
    variables = cmdline_params,
    tools=['custom_tool'],
    toolpath=['path/to/custom/tool'], 
    ENV=os.environ,
    overall_start_time = time.time(),
    db = None,
    event=None,
    execution_log={}
)

and

    for entry in data:
        target = env.custom_tool(
                source=entry.source,
                target=entry.dest
            )
        targets.append(target)

@bwuerkner
Copy link
Author

bwuerkner commented Sep 29, 2023

Hmmm... the comment in retrieve_from_cache says:

    def retrieve_from_cache(self) -> bool:                                     
        """Try to retrieve the node's content from a cache                     
                                                                               
        This method is called from multiple threads in a parallel build,       
        so only do thread safe stuff here.  ..."""

Following the thread of execution down to the instantiation of DefaultEnvironment gets into a place that is far from "atomic", with all the tool initialization, is far from that... in fact, DefaultEnvironment() itself may be racy as it checks a global flag before going off and doing expensive stuff and then rewrites itself - but a second entry before that finishes is likely to go somewhere we don't want.

I did also find it weird that a function related to the cache is called when there's no cachedir defined.
What you're describing sounds like it could be the root cause. As outlined in my other comment the race condition only appears with certain input and doesn't for other.

@mwichmann
Copy link
Collaborator

just for grins, if you're not actually using the Default Environment (that is, your calls are all of the form env.Mybuilder, env.Append, etc.), could you add:

DefaultEnvironment(tools=[])

This isn't a fix, it's just an experiment which might help because of two factors:

  1. forces the initialization of DefaultEnvironment to happen early
  2. makes it much quicker

There still something quite interesting going on that it's not yet clear how to drill down on.

@bdbaddog
Copy link
Contributor

Can you expand on what internal_function_to_execute_some_command does?
Even a sanitized code snippet could provide some insight..
Are you calling any builders in that action?

@bwuerkner
Copy link
Author

bwuerkner commented Sep 30, 2023

DefaultEnvironment(tools=[])

That does fix the issue so I guess Experiment succeeded.

Can you expand on what internal_function_to_execute_some_command does? Even a sanitized code snippet could provide some insight.. Are you calling any builders in that action?

No I'm not calling any builders or other SCons related content in that function. It is rather simple and as outlined above the content doesn't actually seem to matter.
Currently it looks like this:

def internal_function_to_execute_some_command(target, source, env):
    return 0

and with that I can trigger the issue without a problem. I'm somewhat assuming it has to do with the parallel initialization of the Builders after their initial creation.

I didn't clearly state it in my original message but before the error occurs I get

scons: done reading SConscript files.
scons: Building targets ...

I'm also starting to think that this PR is not gonna be necessary since the manual initialization prevents the problem in a way more stable way.

@bdbaddog
Copy link
Contributor

DefaultEnvironment(tools=[])

That does fix the issue so I guess Experiment succeeded.

Can you expand on what internal_function_to_execute_some_command does? Even a sanitized code snippet could provide some insight.. Are you calling any builders in that action?

No I'm not calling any builders or other SCons related content in that function. It is rather simple and as outlined above the content doesn't actually seem to matter. Currently it looks like this:

def internal_function_to_execute_some_command(target, source, env):
    return 0

and with that I can trigger the issue without a problem. I'm somewhat assuming it has to do with the parallel initialization of the Builders after their initial creation.

I didn't clearly state it in my original message but before the error occurs I get

scons: done reading SConscript files.
scons: Building targets ...

I'm also starting to think that this PR is not gonna be necessary since the manual initialization prevents the problem in a way more stable way.

It's very odd we've never seen this failure mode before, if nothing special about your usage is triggering it.
Does your builder define an emitter or a scanner?

@mwichmann
Copy link
Collaborator

mwichmann commented Oct 1, 2023

I'm also starting to think that this PR is not gonna be necessary since the manual initialization prevents the problem in a way more stable way.

It's very odd we've never seen this failure mode before, if nothing special about your usage is triggering it.

I agree, that's odd. I think the code path looks vulnerable, but if it actually is (rather than just in theory), why hasn't it been seen over the last 20 years? It may be time to take a deeper look at revamping the way the Default Environment is initialized. There's a form of deferred initialization now, but as we've talked about in other threads (and as mentioned in #1069 - apparently some work was once started on this) what we probably really want is deferring the tool setup in the default environment. The above suggestion defers tool setup by not doing it at all on that environment, but it seems rather a hack - why do you need to do something special if you know you're not going to call on the environment yourself (and then what if you change your mind?). That's probably food for a separate thread - maybe in a GitHub Discussion, but thought I'd mention it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants