Performance Degradation on Independent Task benchmark #93

Open
wlruys opened this issue Apr 17, 2023 · 3 comments

@wlruys
Contributor

wlruys commented Apr 17, 2023

I've been trying to reproduce our scaling plots from February to show the advantages of the C++ runtime for TST.

I've found that our scaling is badly degraded for 1000 independent 1 ms tasks.
On an RTX node, we only get a 4.5x speedup on 8 threads.
On the SKX node, we get a 5.9x speedup (down from 7.4x in Feb.).

Disabling contexts gets us back to a 6.57x speedup.
Using a single C++ call for cleanup instead of 2 (PARLA_ENABLE_PYTHON_RUNAHEAD=false) gets us to a 6.9x speedup.

I'm not yet sure where the remaining missing time is going. If it's on the Python side, possibly in creating device requirements?
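
To put the speedups above in per-task terms, here is a rough back-of-the-envelope sketch. It assumes a simple model that is not taken from the runtime itself: every task pays a fixed overhead `o` that is fully serialized, so the parallel time is `n*t/p + n*o`. It also assumes all of the reported figures were measured on 8 threads (the issue only states this for the RTX node). The function names are hypothetical.

```python
# Assumed model (not measured): T(p) = n*t/p + n*o, with o fully serialized.
# Then speedup = t / (t/p + o), which gives o = t/speedup - t/p.

def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear scaling achieved."""
    return speedup / threads


def implied_serial_overhead_us(speedup, threads, task_ms=1.0):
    """Per-task serialized overhead (in microseconds) implied by a speedup."""
    overhead_ms = task_ms / speedup - task_ms / threads
    return overhead_ms * 1000.0


# Speedups quoted in this issue; the thread count of 8 is an assumption
# for every entry except the RTX one.
reported = {
    "RTX": 4.5,
    "SKX (now)": 5.9,
    "SKX (Feb.)": 7.4,
    "contexts disabled": 6.57,
    "single cleanup call": 6.9,
}

for label, speedup in reported.items():
    eff = parallel_efficiency(speedup, 8)
    overhead = implied_serial_overhead_us(speedup, 8)
    print(f"{label:20s} efficiency={eff:.2f} overhead~{overhead:.1f} us/task")
```

Under this model the 4.5x result corresponds to roughly 97 us of serialized overhead per task, versus roughly 10 us for the 7.4x result from February. If the overhead is partly parallel, the real per-task cost would be higher than these estimates.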

@wlruys wlruys self-assigned this Apr 17, 2023
@wlruys
Contributor Author

wlruys commented Apr 17, 2023

We need to make our benchmarks easy to reproduce and automated again (e.g. fix the Google Benchmark scripts), and to check them before any feature merges.
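
One possible shape for such a pre-merge check, as a minimal sketch only: compare measured speedups against stored baselines and flag anything that drops by more than a tolerance. The benchmark names, the 5% tolerance, and the function `find_regressions` are all hypothetical and not part of the existing scripts.

```python
def find_regressions(results, baselines, tolerance=0.05):
    """Return (name, baseline, measured) for each benchmark that regressed.

    A benchmark counts as regressed if it is missing from `results` or its
    measured speedup is more than `tolerance` (a fraction) below baseline.
    """
    regressions = []
    for name, baseline in baselines.items():
        measured = results.get(name)
        if measured is None or measured < baseline * (1.0 - tolerance):
            regressions.append((name, baseline, measured))
    return regressions


# Hypothetical baseline taken from the February number quoted in this issue.
baselines = {"independent_1ms_skx_8threads": 7.4}
current = {"independent_1ms_skx_8threads": 5.9}

for name, baseline, measured in find_regressions(current, baselines):
    print(f"REGRESSION {name}: baseline {baseline}x, measured {measured}x")
```

A CI job could run the benchmark, feed the numbers into a check like this, and fail the merge when the returned list is non-empty.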

@nicelhc13
Contributor

Questions:

  1. Is it a single-device task or a multi-device task?
  2. Does it use any data operands?
  3. It should be CPU, right? (if you tested on the SKX node)

I agree about Google Benchmark. The issue with Google Benchmark is that it breaks binlog.

@wlruys
Contributor Author

wlruys commented Apr 18, 2023

  1. single
  2. no data
  3. yep, this was CPU-only throughput testing

I'll handle this since it's really mostly my code that hurt it. I need to move parts of contexts to C++ and reduce the number of new dictionary allocations on the Python side.
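
As an illustration of the second point, one generic way to avoid allocating a new dictionary per task is to build each distinct requirement mapping once and reuse it. This is a sketch under assumptions, not the actual fix: `cached_requirements`, `fresh_requirements`, and the `"device"`/`"cores"` keys are hypothetical and do not reflect the runtime's real data structures.

```python
from functools import lru_cache
from types import MappingProxyType


def fresh_requirements(device_kind, cores):
    """Baseline: allocates a new dict on every call (once per task)."""
    return {"device": device_kind, "cores": cores}


@lru_cache(maxsize=None)
def cached_requirements(device_kind, cores):
    """Build the mapping once per distinct (device_kind, cores) pair.

    The result is wrapped in a read-only view because the same object is
    shared by every task that asks for the same requirements.
    """
    return MappingProxyType({"device": device_kind, "cores": cores})


# 1000 identical CPU tasks: one allocation instead of 1000.
requirements = [cached_requirements("cpu", 1) for _ in range(1000)]
print(len({id(r) for r in requirements}))
```

The trade-off is that shared mappings must never be mutated per task, which is why the sketch returns a read-only proxy.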
