Project Meeting 2018.11.16


Ben Stabler edited this page Nov 16, 2018 · 14 revisions

Multiprocessing

Work to date
- Lots of improvements for debugging/tracing/logging/etc. while we continue to optimize/understand runtime performance
- Can now run in separate instances (on separate machines if desired) and coalesce into one pipeline file
- New stride run option to run partitions of the households - for example, slice households into 5 samples and run only the first
- A single 1/5th stride run with multiprocessing takes 52 minutes, and uses 1/5th of the CPUs and 1/5th of the RAM
- This doesn't scale on one machine, since running two strides at once takes 72 minutes
- But we could run 5 strides simultaneously on separate machines for a complete run in 52 minutes (this would require writing a distributed management setup and runner, which is not scoped)
- Good article about pandas memory usage issues
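The stride partitioning described above can be sketched as follows. This is only an illustrative sketch, not ActivitySim's implementation: the function name `stride_slice` and its parameters `num_strides` and `stride_index` are hypothetical, and household ids stand in for the full households table.

```python
def stride_slice(household_ids, num_strides, stride_index):
    """Return every num_strides-th household, starting at stride_index.

    Hypothetical helper for illustration; not an ActivitySim function.
    """
    if num_strides < 1:
        raise ValueError("num_strides must be at least 1")
    if not 0 <= stride_index < num_strides:
        raise ValueError("stride_index must be in [0, num_strides)")
    # Extended slicing picks positions stride_index, stride_index + num_strides, ...
    return household_ids[stride_index::num_strides]


# Made-up household ids, purely for demonstration.
household_ids = list(range(1, 11))

# Slice the households into 5 samples and take the first one.
first_stride = stride_slice(household_ids, num_strides=5, stride_index=0)

# Together, the 5 strides cover every household exactly once.
all_strides = [stride_slice(household_ids, 5, i) for i in range(5)]
recombined = sorted(hh for stride in all_strides for hh in stride)
```

The same slice expression works positionally on a pandas DataFrame via `households.iloc[stride_index::num_strides]`. Because the strides are disjoint and together cover all households, the results of separate stride runs can in principle be combined into one complete run.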
Next steps
- Maybe try a different low-level shared C library, such as OpenBLAS instead of MKL
- NumPy appears to do some memory management outside of Python's garbage collection, which is suspicious
- Working on running a few big cloud-based runs using our Azure DevOps account
- Next, try two strides at once on Linux, since it may behave differently
- Maybe try a single server running separate virtual machines
Wrapping up
- Current best 100% sample run is 130 minutes with 20 processors on a single server
- Existing TM1 runs at MTC are ~4 to 5 hours for a 50% sample
- Working toward a deployment recommendations memo based on our findings
- We're really targeting two setups - a single server at an agency and/or cloud-based
- Plan to wrap up the task by the end of next week
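The timings reported in these notes can be compared with a little arithmetic. This is a rough sketch under loud assumptions: each stride takes the same time, two strides at once always take 72 minutes, and the time needed to coalesce results into one pipeline file is ignored. The variable names are illustrative only.

```python
# Timings taken from the notes, in minutes.
STRIDE_MINUTES = 52        # one 1/5th stride run on its own
TWO_STRIDES_MINUTES = 72   # two strides at once on the same machine
FULL_RUN_MINUTES = 130     # best 100% sample run, 20 processors, one server
NUM_STRIDES = 5

# Assumption: strides run one after another on a single machine.
sequential_minutes = NUM_STRIDES * STRIDE_MINUTES

# Assumption: strides run two at a time on a single machine
# (two pairs, then the one remaining stride alone).
paired_minutes = 2 * TWO_STRIDES_MINUTES + STRIDE_MINUTES

# Assumption: five separate machines, one stride each, no coordination cost.
distributed_minutes = STRIDE_MINUTES

print(f"sequential strides:  {sequential_minutes} min")
print(f"paired strides:      {paired_minutes} min")
print(f"distributed strides: {distributed_minutes} min")
print(f"single full run:     {FULL_RUN_MINUTES} min")
```

Under these assumptions, strides on a single machine (260 or 196 minutes) are slower than the 130-minute full run, so strides only pay off when they are spread across separate machines.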

ActivitySim
