
Break large functions into chunked sub-functions #19

Closed
chrispcampbell opened this issue Jul 10, 2020 · 2 comments · Fixed by #21

@chrispcampbell (Contributor)

When investigating performance/crashing issues with En-ROADS on iPad, in addition to the findings from #18, I discovered another source of memory pressure: the large number of instructions in the generated functions of the En-ROADS model leads to a large stack frame size.

The functions in question are:

  • initConstants (~1500 lines)
  • initLevels (~3900 lines)
  • evalAux (~4000 lines)
  • evalLevels (~350 lines)

Each of these functions operates on what are essentially global variables, so as long as we execute the operations in the same order, there's no harm in breaking these large functions up into smaller sub-functions that each execute a chunk of the instructions.

For example, instead of:

void evalAux() {
  _var1 = ...;
  _var2 = ...;
  ...
  _var3000 = ...;
}

We can have:

void evalAux0() {
  _var1 = ...;
  _var2 = ...;
  ...
  _var20 = ...;
}

void evalAux1() {
  _var21 = ...;
  _var22 = ...;
  ...
  _var40 = ...;
}

...

void evalAux() {
  evalAux0();
  evalAux1();
  ...
  evalAux150();
}

I experimented a bit with the "chunk" size and it looks like 30 operations per sub-function is a sweet spot.
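
As a rough illustration of how this could look on the code-generation side, here is a minimal sketch (in TypeScript, not the actual SDEverywhere generator code; the names chunkedFunction, name, statements, and chunkSize are hypothetical), assuming each generated statement is available as a string:

// Emit numbered sub-functions of at most `chunkSize` statements each, plus a
// top-level function that calls them in order, preserving the original
// execution order of the statements.
function chunkedFunction(name: string, statements: string[], chunkSize = 30): string {
  const pieces: string[] = []
  const calls: string[] = []
  for (let i = 0; i * chunkSize < statements.length; i++) {
    const body = statements.slice(i * chunkSize, (i + 1) * chunkSize)
    pieces.push(`void ${name}${i}() {\n  ${body.join('\n  ')}\n}`)
    calls.push(`  ${name}${i}();`)
  }
  pieces.push(`void ${name}() {\n${calls.join('\n')}\n}`)
  return pieces.join('\n\n')
}

For example, chunkedFunction('evalAux', statements) would emit evalAux0() through evalAuxN() followed by an evalAux() that calls each one in sequence.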

Here are some preliminary numbers that show the performance benefits of this approach (implemented on top of the #18 changes), compared to both the original baseline (i.e., last week's code) and #18 alone.

Performance

MacBook Pro (2019) | 2.4 GHz 8-core i9, 32 GB RAM, macOS 10.15

| Issue | C run (ms) | Wasm run (ms) | Wasm init (ms) | JS mem (MB) | Page mem (MB) |
| --- | --- | --- | --- | --- | --- |
| baseline | 45.8 | 87.5 | 38.0 | 94 | 685 |
| SDE 18 | 46.0 | 85.6 | 18.0 | 39 | 672 |
| SDE 19 | 42.8 | 49.4 | 15.0 | 38 | 25 |

iPhone 8 | A11, iOS 13

| Issue | C run (ms) | Wasm run (ms) | Wasm init (ms) | JS mem (MB) | Page mem (MB) |
| --- | --- | --- | --- | --- | --- |
| baseline | 39.9 | 187.0 | 165.0 | 39 | 645 |
| SDE 18 | 40.3 | 219.0 | 86.0 | 38 | 724 |
| SDE 19 | 40.1 | 81.6 | 83.0 | 38 | 41 |

iPad Air (2013) | A7, iOS 12

| Issue | C run (ms) | Wasm run (ms) | Wasm init (ms) | JS mem (MB) | Page mem (MB) |
| --- | --- | --- | --- | --- | --- |
| baseline | 151.0 | 1372.2 | 30146.0 | 77 | 331 |
| SDE 18 | 166.0 | 1408.0 | 4416.0 | 42 | 395 |
| SDE 19 | 151.0 | 837.6 | 1291.0 | 45 | 41 |

Size

| Issue | Wasm size (bytes) |
| --- | --- |
| baseline | 1,084,036 |
| SDE 18 | 773,968 |
| SDE 19 | 776,851 |

@chrispcampbell (Contributor, Author)

Merged to develop in e155754.

@chrispcampbell chrispcampbell added this to the 0.5.3 milestone Jul 10, 2020
@chrispcampbell chrispcampbell linked a pull request Jul 10, 2020 that will close this issue
@travisfranck (Collaborator)

This is really cool. Some things I note from reviewing the tables:

  1. Memory size is much improved. WASM code overall is much better. Very nice.
  2. In my head, we always had a goal to have the model run in under 10ms. I think Todd said that was what the human brain perceives as "instant". Seems like we're already well above that now, and maybe closer to 100ms on laptops older than 2019.
  3. For the iOS data, C code is significantly faster than JS code. Good to keep in mind when we consider our iOS deployment strategy.
