mp_tasks - activitysim multiprocessing overview
+Activitysim runs a list of models sequentially, performing various computational operations
+on tables. Model steps can modify values in existing tables, add columns, or create additional
+tables. Activitysim provides the facility, via expression files, to specify vectorized operations
+on data tables. The ability to vectorize operations depends upon the independence of the
+computations performed on the vectorized elements.
+Python is agonizingly slow performing scalar operations sequentially on large datasets, so
+vectorization (using pandas and/or numpy) is essential for good performance.
+Fortunately, most activity-based model simulation steps are row independent at the household,
+person, tour, or trip level. The decisions for one household are independent of the choices
+made by other households. Thus it is (generally speaking) possible to run an entire simulation
+on a household sample with only one household, and get the same result for that household as
+you would running the simulation on a thousand households. (See the shared data section below
+for an exception to this highly convenient situation.)
+The random number generator supports this goal by providing streams of random numbers for each
+household and person that are mutually independent and repeatable across model runs and
+processes.
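+To illustrate the idea (a minimal sketch, not ActivitySim's actual RNG implementation),
+independent and repeatable streams can be keyed off a stable row id:
+
+    import numpy as np
+
+    BASE_SEED = 42  # global seed shared by every process
+
+    def household_rng(household_id):
+        # the same (seed, household_id) pair always yields the same stream, no matter
+        # which process draws from it or how many other households are simulated
+        return np.random.default_rng(np.random.SeedSequence((BASE_SEED, household_id)))
+
+    assert household_rng(8).uniform() == household_rng(8).uniform()  # repeatable
+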
+To the extent that simulation model steps are row independent, we can implement most simulations
+as a series of vectorized operations on pandas DataFrames and numpy arrays. These vectorized
+operations are much faster than sequential python because they are implemented by native code
+(compiled C) and are to some extent multi-threaded. But the benefits of this native
+multi-threading are limited because they only apply to atomic numpy or pandas calls, and as soon
+as control returns to python it is single-threaded and slow.
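+For instance (an illustrative sketch), a vectorized comparison runs as a single native-code call,
+while the equivalent python loop pays interpreter overhead on every row:
+
+    import numpy as np
+    import pandas as pd
+
+    persons = pd.DataFrame({'age': np.random.default_rng(0).integers(0, 90, 100_000)})
+
+    # slow: one python interpreter iteration per row
+    is_adult_loop = [age >= 18 for age in persons['age']]
+
+    # fast: one vectorized call executed in compiled C
+    is_adult_vec = persons['age'] >= 18
+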
+Multi-threading is not an attractive strategy to get around the python performance problem
+because of the limitations imposed by python's global interpreter lock (GIL). Rather than
+struggling with python multi-threading, this module uses the python multiprocessing module to
+parallelize certain models.
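+The basic mechanism (a minimal sketch of the standard library API, not this module's actual
+launch code) is to start independent interpreter processes, each with its own GIL:
+
+    import multiprocessing as mp
+
+    def run_simulation(pipeline_file_name):
+        # each sub-process would run its model steps against its own apportioned pipeline
+        print('running', pipeline_file_name)
+
+    if __name__ == '__main__':
+        procs = [mp.Process(target=run_simulation, args=('pipeline_%s.h5' % i,))
+                 for i in range(2)]
+        for p in procs:
+            p.start()
+        for p in procs:
+            p.join()
+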
+Because of activitysim's modular and extensible architecture, we don't hardwire the
+multiprocessing architecture. The specification of which models should be run in parallel, how
+many processes should be used, and the segmentation of the data between processes are all
+specified in the settings config file. For conceptual simplicity, the single-process model is
+treated as dominant: even though in practice multiprocessing may be the norm for production runs,
+the single-process model will be used in development and debugging, and keeping it dominant tends
+to concentrate the multiprocessing-specific code in one place and prevents multiprocessing
+considerations from permeating the code base and obscuring the model-specific logic.
+The primary function of the multiprocessing settings is to identify distinct stages of
+computation, to specify how many simultaneous processes should be used to perform them, and to
+specify how the data to be treated should be apportioned between those processes. We assume that
+the data can be apportioned between subprocesses according to the index of a single primary table
+(e.g. households), or else via derivative or dependent tables that reference that table's index
+(primary key) with a ref_col (foreign key) sharing the name of the primary table's index.
+Generally speaking, we assume that any new tables that are created are directly dependent on the
+previously existing tables, and that all rows in new tables are either attributable to previously
+existing rows in the pipeline tables, or belong to global utility tables that are identical
+across sub-processes.
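+Concretely (a simplified sketch of the slicing rule, with hypothetical column values), a
+dependent table is apportioned by selecting the rows whose ref_col values fall within the slice
+of the primary table's index:
+
+    import pandas as pd
+
+    households = pd.DataFrame({'income': [20, 55, 80, 30]},
+                              index=pd.Index([1, 2, 3, 4], name='household_id'))
+    persons = pd.DataFrame({'household_id': [1, 1, 2, 3, 4, 4],
+                            'age': [35, 6, 50, 29, 71, 68]})
+
+    hh_slice = households.iloc[0::2]  # this sub-process's share of the primary table
+    persons_slice = persons[persons['household_id'].isin(hh_slice.index)]  # dependents follow
+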
+Note: There are a few exceptions to 'row independence', such as the school and workplace location
+choice models, where the model behavior is externally constrained or adjusted. For instance, we
+want school location choice to match known aggregate school enrollments by zone. Similarly, a
+parking model (not yet implemented) might be constrained by availability. These situations
+require special handling.
+For example, the models and multiprocess_steps settings might look like this:
+models:
+ ### mp_initialize step
+ - initialize_landuse
+ - compute_accessibility
+ - initialize_households
+ ### mp_households step
+ - school_location
+ - workplace_location
+ - auto_ownership_simulate
+ - free_parking
+ ### mp_summarize step
+ - write_tables
+
+multiprocess_steps:
+ - name: mp_initialize
+ begin: initialize_landuse
+ - name: mp_households
+ begin: school_location
+ num_processes: 2
+ slice:
+ tables:
+ - households
+ - persons
+ - name: mp_summarize
+ begin: write_tables
+
+
+The multiprocess_steps setting above annotates the models list to indicate that the simulation
+should be broken into three steps.
+The first multiprocess_step (mp_initialize) begins with the initialize_landuse step and is
+implicitly single-process because there is no 'slice' key indicating how to apportion the tables.
+This first step includes all models listed in the 'models' setting up to (but not including) the
+begin model of the next multiprocess_step.
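+In other words (an illustrative sketch of the partitioning logic, not the module's actual code),
+the begin keys split the models list into contiguous runs:
+
+    models = ['initialize_landuse', 'compute_accessibility', 'initialize_households',
+              'school_location', 'workplace_location', 'auto_ownership_simulate',
+              'free_parking', 'write_tables']
+    begins = ['initialize_landuse', 'school_location', 'write_tables']
+
+    bounds = [models.index(b) for b in begins] + [len(models)]
+    steps = [models[lo:hi] for lo, hi in zip(bounds, bounds[1:])]
+    # steps[0] -> mp_initialize models, steps[1] -> mp_households, steps[2] -> mp_summarize
+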
+The second multiprocess_step (mp_households) starts with the school location model and continues
+through free_parking. The 'slice' info indicates that the tables should be sliced by households,
+and that persons is a dependent table, so any person with a ref_col (foreign key column with the
+same name as the households table index) referencing a household record should be taken to
+'belong' to that household. Similarly, any other table that either shares an index (i.e. has the
+same index name) with either the households or persons table, or has a ref_col to either of their
+indexes, should also be considered a dependent table.
+The num_processes setting of 2 indicates that the pipeline should be split in two, with half of
+the households apportioned into each subprocess pipeline and all dependent tables apportioned
+accordingly. All other tables (e.g. land_use) that neither share an index (name) nor have a
+ref_col should be considered mirrored and be included in their entirety.
+The primary table is sliced in strides of size num_processes (e.g. for num_processes == 2, the
+sub-processes get every second record, starting at offsets 0 and 1 respectively). All other
+dependent table slices are based (directly or indirectly) on this stride segmentation of the
+primary table's index.
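+A minimal sketch of the stride apportionment (illustrative, not the module's actual code):
+
+    import pandas as pd
+
+    households = pd.DataFrame({'income': range(6)},
+                              index=pd.Index(range(100, 106), name='household_id'))
+    num_processes = 2
+
+    # every num_processes-th row, starting at offsets 0 .. num_processes - 1
+    slices = [households.iloc[offset::num_processes] for offset in range(num_processes)]
+    # slices[0].index -> [100, 102, 104]; slices[1].index -> [101, 103, 105]
+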
+Two separate sub-processes are launched (num_processes == 2) and each is passed the name of its
+apportioned pipeline file. They execute independently, and if they terminate successfully, their
+contents are then coalesced into a single pipeline file whose tables should be essentially the
+same as if they had been generated by a single process.
+We assume that any new tables created by the sub-processes are directly dependent on the
+previously existing primary tables or are mirrored. Thus we can coalesce the sub-process
+pipelines by concatenating the primary and dependent tables and simply retaining one copy of
+each mirrored table (since they should all be identical).
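+Under those assumptions, the coalesce step reduces to something like this sketch (illustrative,
+not the module's actual code):
+
+    import pandas as pd
+
+    def coalesce(sub_pipelines, sliced_table_names):
+        # sub_pipelines: one {table_name: DataFrame} dict per sub-process
+        coalesced = {}
+        for name in sub_pipelines[0]:
+            if name in sliced_table_names:
+                # primary and dependent tables: concatenate the apportioned slices
+                coalesced[name] = pd.concat([p[name] for p in sub_pipelines]).sort_index()
+            else:
+                # mirrored tables: identical in every sub-process, so keep one copy
+                coalesced[name] = sub_pipelines[0][name]
+        return coalesced
+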
+The third multiprocess_step (mp_summarize) is then handled in single-process mode and runs the
+write_tables model, writing the results but also leaving the tables in the pipeline, with
+essentially the same tables and results as if the whole simulation had been run as a single
+process.
+