Decreased performance after replacing multiprocessing.dummy with concurrent.futures #476

Open
ogenstad opened this issue Jan 26, 2020 · 5 comments


@ogenstad
Collaborator

I did a quick test with my code from https://github.com/networklore/networklore-demos/tree/master/python/nornir/ansible-nornir-speed

pip install nornir==2.3.0
time python nornir-run.py 5000
(ran three times)
python nornir-run.py 5000  10.49s user 3.46s system 116% cpu 12.006 total
python nornir-run.py 5000  10.65s user 3.27s system 119% cpu 11.602 total
python nornir-run.py 5000  10.91s user 3.26s system 117% cpu 12.076 total
pip install https://github.com/nornir-automation/nornir/archive/develop.zip
(plus pip install pydantic==1.3, because pydantic had been upgraded to 1.4)
time python nornir-run.py 5000
(ran three times)
python nornir-run.py 5000  21.00s user 3.45s system 109% cpu 22.282 total
python nornir-run.py 5000  20.47s user 3.12s system 109% cpu 21.575 total
python nornir-run.py 5000  19.82s user 2.88s system 107% cpu 21.108 total

Basically, PR #467 made this twice as slow. I'm guessing the change was validated against a task that mostly waits on the network, whereas my example is dominated by local processing. Since concurrent.futures isn't any faster for the network-bound case either, I would vote to revert #467. Nornir's description is "Pluggable multi-threaded framework with inventory management to help operate collections of devices", which should already clear up any misconception about multiprocessing vs. threading. If we want to be extra helpful, we could insert a comment above the from multiprocessing.dummy import Pool line, something like the sketch below, though I'm not sure I see it as a big problem.
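
Rough sketch of the kind of comment I have in mind (the wording is just a suggestion, nothing authoritative):

    # multiprocessing.dummy exposes the same Pool API as multiprocessing,
    # but it is backed by threads rather than processes, so Nornir runs
    # tasks in threads here.
    from multiprocessing.dummy import Pool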

/cc @dmfigol

@dmfigol
Collaborator

dmfigol commented Jan 27, 2020

Hmmm. Thanks for these tests, Patrick!
This is very weird. When I was doing my research for the PR, I couldn't find a single thread on the internet where anyone claimed, with proof, that multiprocessing.dummy is faster than concurrent.futures.
I tested this with a time.sleep task and with real networking tasks, and there was no difference. In your tests it is twice as slow 😮
I am at CLEUR this week, but I will test this next weekend. (Thanks for the repro code, btw!)
One guess is that in my implementation a future is created for every task. Maybe the overhead of creating so many Future objects adds up?
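
If I want to check that guess in isolation, something like this rough micro-benchmark should do (completely detached from Nornir; the no-op task, the task count, and the worker count are arbitrary choices on my side):

    # Push 10000 no-op tasks through ThreadPoolExecutor.submit and through
    # multiprocessing.dummy's apply_async, then compare wall-clock time.
    import time
    from concurrent.futures import ThreadPoolExecutor
    from multiprocessing.dummy import Pool


    def noop(i):
        return i


    def with_futures(n, workers=20):
        start = time.perf_counter()
        with ThreadPoolExecutor(workers) as pool:
            futures = [pool.submit(noop, i) for i in range(n)]
        results = [f.result() for f in futures]
        assert len(results) == n
        return time.perf_counter() - start


    def with_dummy_pool(n, workers=20):
        start = time.perf_counter()
        pool = Pool(processes=workers)
        async_results = [pool.apply_async(noop, args=(i,)) for i in range(n)]
        pool.close()
        pool.join()
        results = [r.get() for r in async_results]
        assert len(results) == n
        return time.perf_counter() - start


    if __name__ == "__main__":
        print("concurrent.futures submit:", with_futures(10000))
        print("multiprocessing.dummy:    ", with_dummy_pool(10000))

If Future creation alone were the culprit, the first number should come out noticeably worse.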

@dbarrosop
Contributor

Thanks for this. I will hold the 2.4.0 release until we have clarified this (unless people start having lots of issues with pydantic, in which case I will revert the PR that introduced this change so we can re-evaluate for 2.5.0).

@dmfigol
Collaborator

dmfigol commented Feb 3, 2020

@ogenstad Patrick, I am sorry, but I don't see the same problem.
Here is my testing methodology:

  • using macOS
  • cloned your repo
  • installed nornir@develop and pydantic 1.3 in a venv:
    pip install git+https://github.com/nornir-automation/nornir.git@develop
    pip install pydantic==1.3
  • modified nornir/core/__init__.py to contain all 3 versions (concurrent.futures submit, multiprocessing.dummy, concurrent.futures map) and commented them out depending on the test
  • ran find output/nornir/ -name "*.cfg" -delete ; time python nornir-run.py 10000 ; ls -l output/nornir | wc -l

Here is the code for _run_parallel:

    def _run_parallel(
        self,
        task: Task,
        hosts: List["Host"],
        num_workers: int,
        **kwargs: Dict[str, Any],
    ) -> AggregatedResult:
        # variant 1: concurrent.futures with submit (the active version)
        agg_result = AggregatedResult(kwargs.get("name") or task.name)
        futures = []
        with ThreadPoolExecutor(num_workers) as pool:
            for host in hosts:
                future = pool.submit(task.copy().start, host, self)
                futures.append(future)

        for future in futures:
            worker_result = future.result()
            agg_result[worker_result.host.name] = worker_result
        return agg_result

        # variant 2: multiprocessing.dummy (the pre-#467 approach)
        # result = AggregatedResult(kwargs.get("name") or task.name)
        # 
        # pool = Pool(processes=num_workers)
        # result_pool = [
        #     pool.apply_async(task.copy().start, args=(h, self)) for h in hosts
        # ]
        # pool.close()
        # pool.join()
        # 
        # for rp in result_pool:
        #     r = rp.get()
        #     result[r.host.name] = r
        # return result

        # variant 3: concurrent.futures with map
        # agg_result = AggregatedResult(kwargs.get("name") or task.name)
        # worker = functools.partial(task.copy().start, nornir=self)
        # with ThreadPoolExecutor(num_workers) as pool:
        #     results = pool.map(worker, hosts)
        #
        # for worker_result in results:
        #     agg_result[worker_result.host.name] = worker_result
        # return agg_result

Here are results:

  1. concurrent.futures submit
python nornir-run.py 10000  20.60s user 5.90s system 118% cpu 22.363 total
python nornir-run.py 10000  21.65s user 5.89s system 118% cpu 23.317 total
python nornir-run.py 10000  22.68s user 6.16s system 116% cpu 24.671 total
python nornir-run.py 10000  24.14s user 6.19s system 116% cpu 25.985 total
python nornir-run.py 10000  22.82s user 6.04s system 117% cpu 24.590 total
  2. multiprocessing.dummy
python nornir-run.py 10000  23.01s user 6.04s system 116% cpu 24.908 total
python nornir-run.py 10000  23.17s user 6.08s system 116% cpu 25.029 total
python nornir-run.py 10000  25.09s user 6.38s system 116% cpu 26.989 total
python nornir-run.py 10000  23.21s user 6.03s system 116% cpu 25.064 total
python nornir-run.py 10000  23.70s user 6.08s system 116% cpu 25.574 total

The difference is small: concurrent.futures with submit is slightly faster, but the gap is negligible (which matches what I reported in the PR).

One weird thing I noticed was with concurrent.futures map. While the number of results was 10000, only ~4800 resulting files were created in the output directory (that's why I added ls -l | wc -l). I haven't seen this with concurrent.futures submit or multiprocessing.dummy. I don't understand where this issue is coming from at all.
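
The one structural difference I can point at between the submit and map variants above is that functools.partial(task.copy().start, nornir=self) evaluates task.copy() only once, so every host runs through a single shared Task object, while the submit loop makes a fresh copy per host. Whether that has anything to do with the missing files is pure speculation on my part, but here is a standalone sketch of the two patterns (nothing Nornir-specific; the Job class and its attributes are made up for illustration):

    import copy
    import functools
    from concurrent.futures import ThreadPoolExecutor


    class Job:
        """Stand-in for a task object that mutates its own state when run."""

        def __init__(self):
            self.host = None

        def copy(self):
            return copy.deepcopy(self)

        def start(self, host):
            self.host = host  # with a shared instance, threads overwrite this
            return self.host


    hosts = [f"host{i}" for i in range(10)]
    job = Job()

    # submit pattern: copy() runs once per host, each thread gets its own Job
    with ThreadPoolExecutor(4) as pool:
        futures = [pool.submit(job.copy().start, h) for h in hosts]
    submit_results = [f.result() for f in futures]

    # map pattern: copy() runs once, every call shares the same Job instance
    worker = functools.partial(job.copy().start)
    with ThreadPoolExecutor(4) as pool:
        map_results = list(pool.map(worker, hosts))

    print(submit_results)
    print(map_results)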

Do you mind re-testing the same way I did? What is your operating system by the way?
Thanks!

@dbarrosop
Contributor

I think at this point we have two options:

  1. Revert the PR and hold it until we understand what's going on here a bit better.
  2. Hold the release for a week or two for further testing.

There is no real benefit to migrating away from multiprocessing.dummy while there seems to be some associated risk we don't fully understand (like some configurations being significantly slower).

@dmfigol
Collaborator

dmfigol commented Feb 4, 2020

I am biased since I submitted the PR, but I'd prefer not to revert it unless we have enough evidence that it is causing issues.
Also, I just realized that we are in no rush for the release. At first I thought that everyone doing pip install nornir would hit the pydantic issue, but the current PyPI version of nornir has pydantic pinned to 0.18.2, so problems with pydantic only affect people using the develop branch.
