Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaskVine vs threadpoolctl import error reporting #3608

Open
benclifford opened this issue Sep 6, 2024 · 5 comments
Open

TaskVine vs threadpoolctl import error reporting #3608

benclifford opened this issue Sep 6, 2024 · 5 comments

Comments

@benclifford
Copy link
Collaborator

Describe the bug

TaskVine installed from source starts needing threadpoolctl at commit 0d3c115b082f3f6e365385d7602c923b676a9a3f

pytest of taskvine with this config:

from parsl.config import Config
from parsl.data_provider.file_noop import NoOpFileStaging
from parsl.data_provider.ftp import FTPInTaskStaging
from parsl.data_provider.http import HTTPInTaskStaging
from parsl.executors.taskvine import TaskVineExecutor, TaskVineManagerConfig


def fresh_config():
    return Config(executors=[TaskVineExecutor(manager_config=TaskVineManagerConfig(port=9000, shared_fs=True),
                                              worker_launch_method='factory', function_exec_mode='serverless')])

stops working at that commit and instead hangs.

It doesn't log anything that I could find about needing threadpoolctl.

This shouldn't be a problem when installing from proper packaging - but more generally I think it is a problem that some internal piece is breaking and hanging, and that same situation may arise for other more relevant errors.

To Reproduce
Use TaskVine at that commit above with the supplied config file, and make sure threadpoolctl is not installed with pip

Expected behavior
proper shutdown with error message leading the user towards the problem

Environment
my laptop

@tphung3
Copy link
Contributor

tphung3 commented Nov 8, 2024

Some observations on my end:

  • I tried running a test with CCTools 1 commit before threadpoolctl: 4a9b3084f170284d4851404d122d1a6f13675b73 and my test still hangs. I wonder if threadpoolctl has anything to do with the hanging.
  • Here's my test for reference. It creates 3 types of executors (threadpool, htex, taskvine) and runs a simple function on all.
  • Putting the dfk in a with context resolves the hanging. Without this context, taskvine hangs. Htex doesn't hang and instead exits but the log doesn't show the dfk calls its cleanup method (to test this you must make htex the last executor before Python exits).

A question to follow up:

  • Should using with context be the recommended way for parsl to clean up? If so I think it warrants changing the test code to reflect this. Wdyt @benclifford ?
#!/usr/bin/env python3

import parsl
from parsl import python_app
from parsl import bash_app
from parsl.config import Config
from parsl.executors.taskvine import TaskVineExecutor
from parsl.executors.taskvine import TaskVineManagerConfig
from parsl.executors.taskvine import TaskVineFactoryConfig
from parsl.providers import LocalProvider
from parsl.executors.threads import ThreadPoolExecutor
from parsl.channels import LocalChannel
from parsl.executors import HighThroughputExecutor

local_thread_config = Config(
        executors=[ThreadPoolExecutor(
            max_threads=4,
            label='local-threads')],
        strategy=None,)

htex_config = Config(
        executors=[HighThroughputExecutor(
            label='local-htex',
            cores_per_worker=4,
            provider=LocalProvider(channel=LocalChannel(),
            init_blocks=1,
            max_blocks=1,))],
        strategy=None,)

vine_config = Config(
        executors=[TaskVineExecutor(
            label='local-vine',
            worker_launch_method='factory',
            function_exec_mode='regular',
            manager_config=TaskVineManagerConfig(
                project_name='parsl-vine'),
            factory_config=TaskVineFactoryConfig(
                min_workers=1,
                max_workers=1,
                workers_per_cycle=1,
                cores=4,
                batch_type='local'))],
            strategy=None,)

all_configs = [local_thread_config, htex_config, vine_config]

def f(x):
    return x + 1
f_app = python_app(f)

def main():
    active_configs = [local_thread_config, htex_config, vine_config]
    #active_configs = [local_thread_config, htex_config]
    for config in active_configs:
        #with parsl.load(config):
        parsl.load(config)
        num_tasks = 1
        arg = 5
        results = []
        for _ in range(num_tasks):
            results.append(f_app(arg))

        total_sum = 0
        for i in range(num_tasks):
            total_sum += results[i].result()

        expected = f(arg) * num_tasks
        assert total_sum == expected
        
        parsl.clear()
        print(f'OK: {config.executors[0].label}')

if __name__ == '__main__':
    main()

@tphung3
Copy link
Contributor

tphung3 commented Nov 8, 2024

Ah I see what you mean. Under the hood TaskVine sends a library and then sends a function to run inside the library. The library now requires threadpoolctl so it will fail if it can't import threadpoolctl. This triggers infinite (I think) retries from the manager making the code looks like it hanged. A look into taskvine logs should give some evidence for this (or against this).

@tphung3
Copy link
Contributor

tphung3 commented Nov 8, 2024

Deeper I think the question would be how hard do we try before giving up and telling the user so...

@benclifford
Copy link
Collaborator Author

Most of the rest of Parsl has a "abandon as soon as there is an error, and let Parsl-level retry policy deal with it". Which usually people have as none, or a small number of tries. That leads, ideally, to whatever the error was being propagated to the user's workflow code as an exception in the relevant task/app Future object. So...

Deeper I think the question would be how hard do we try before giving up and telling the user so...

... it would align with that philosophy to not try very hard at all and let someone else (the user, and Parsl retries) deal with it.

@tphung3
Copy link
Contributor

tphung3 commented Nov 10, 2024

That makes sense to me. I added this to a list of improvements I'm planning for TaskVineExecutor. Thanks for the clarification!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants