Release v1.3.0 · ActivitySim/activitysim

ActivitySim version 1.3 brings some significant new features to the platform. Users may need to
make some small changes to configuration files to take full advantage of the version.

New Canonical Examples

Beginning with version 1.3, ActivitySim provides two supported "canonical" example
implementations:

the SANDAG Model is a two-zone model based on the SANDAG ABM3 model, and
the MTC Model is a one-zone model based on the MTC's Travel Model One.

Each example implementation includes a complete set of model components, input data,
and configuration files, and is intended to serve as a reference for users to build
their own models. They are provided as stand-alone repositories, to highlight the
fact that model implementations are separate from the ActivitySim core codebase,
and to make it easier for users to fork and modify the examples for their own use
without needing to modify the ActivitySim core codebase. The examples are maintained
by the ActivitySim Consortium and are kept up-to-date with the latest version of
ActivitySim.

The two example models are not identical to the original agency models from which
they were created. They are generally similar to those models, and have been calibrated
and validated to reproduce reasonable results. They are intended to demonstrate the
capabilities of ActivitySim and to provide a starting point for users to build their own
models. However, they are not intended to be used as-is for policy analysis or forecasting.

Logging

The reading of YAML configuration files has been modified to use the "safe" reader,
which prohibits the use of arbitrary Python code in configuration files. This is a
security enhancement, but it requires some changes to the way logging is configured.

In previous versions, the logging configuration file could contain Python code to
place log files in various subdirectories of the output directory, which might
vary for different subprocesses of the model, like this:

logging:
  handlers:
    logfile:
      class: logging.FileHandler
      filename: !!python/object/apply:activitysim.core.config.log_file_path ['activitysim.log']
      mode: w
      formatter: fileFormatter
      level: NOTSET

In the new version, the use of !!python/object/apply is prohibited. Instead of using
this directive, the log_file_path function can be invoked in the configuration file
by using the get_log_file_path key, like this:

logging:
  handlers:
    logfile:
      class: logging.FileHandler
      filename:
        get_log_file_path: activitysim.log
      mode: w
      formatter: fileFormatter
      level: NOTSET

Similarly, previous use of the if_sub_task directive in the logging level
configuration like this:

logging:
  handlers:
    console:
      class: logging.StreamHandler
      stream: ext://sys.stdout
      level: !!python/object/apply:activitysim.core.mp_tasks.if_sub_task [WARNING, NOTSET]
      formatter: elapsedFormatter

can be replaced with the if_sub_task and if_not_sub_task keys, like this:

logging:
  handlers:
    console:
      class: logging.StreamHandler
      stream: ext://sys.stdout
      level:
        if_sub_task: WARNING
        if_not_sub_task: NOTSET
      formatter: elapsedFormatter

For more details, see the logging documentation.
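As a rough illustration of how plain mapping keys can stand in for the old Python tags, the sketch below resolves the get_log_file_path and if_sub_task / if_not_sub_task keys in a configuration dictionary that has already been parsed. The function name resolve_logging_keys, its is_sub_task flag, and the output_dir argument are hypothetical and chosen for this example only; ActivitySim's own resolution logic, including where it places log files, may differ.

```python
from pathlib import Path


def resolve_logging_keys(node, *, is_sub_task, output_dir):
    """Replace the special mapping keys with concrete values.

    Hypothetical helper for illustration; not ActivitySim's implementation.
    """
    if isinstance(node, dict):
        keys = set(node)
        if keys == {"get_log_file_path"}:
            # Assumption: log files live directly under output_dir.
            return str(Path(output_dir) / node["get_log_file_path"])
        if keys == {"if_sub_task", "if_not_sub_task"}:
            return node["if_sub_task"] if is_sub_task else node["if_not_sub_task"]
        return {
            key: resolve_logging_keys(
                value, is_sub_task=is_sub_task, output_dir=output_dir
            )
            for key, value in node.items()
        }
    return node  # plain values (strings, numbers) pass through unchanged


# The two handlers from the examples above, as a safe YAML reader would parse them.
handlers = {
    "logfile": {
        "class": "logging.FileHandler",
        "filename": {"get_log_file_path": "activitysim.log"},
        "level": "NOTSET",
    },
    "console": {
        "class": "logging.StreamHandler",
        "level": {"if_sub_task": "WARNING", "if_not_sub_task": "NOTSET"},
    },
}

main = resolve_logging_keys(handlers, is_sub_task=False, output_dir="output")
worker = resolve_logging_keys(handlers, is_sub_task=True, output_dir="output")
print(main["console"]["level"], worker["console"]["level"])  # NOTSET WARNING
```

Because the configuration now contains only ordinary mappings and strings, it can be loaded with a safe reader and interpreted afterwards, which is the point of the change.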

Chunking

Version 1.3 introduces a new "explicit" chunking mechanism.

Explicit chunking is simpler to use and understand than dynamic chunking, and in
practice has been found to be more robust and reliable. It requires no "training"
and is activated in the top level model configuration file (typically settings.yaml):

chunk_training_mode: explicit

Then, for model components that may stress the memory limits of the machine,
the user can specify the number of choosers in each chunk explicitly, either as an integer
number of choosers per chunk, or as a fraction of the overall number of choosers.
This is done by setting the explicit_chunk configuration setting in the model
component's settings. For this setting, integer values greater than or equal to 1
correspond to the number of chooser rows in each explicit chunk. Fractional values
less than 1 correspond to the fraction of the total number of choosers.
If the explicit_chunk value is 0 or missing, then no chunking
is applied for that component. The explicit_chunk values in each component's
settings are ignored if the chunk_training_mode is not set to explicit.
Refer to each model component's configuration documentation for details.
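The rules for interpreting explicit_chunk can be sketched as follows. This is a minimal illustration, not ActivitySim's code: the helper names explicit_chunk_rows and chunk_bounds are hypothetical, and rounding a fractional chunk up to a whole number of rows is an assumption made for the example.

```python
import math


def explicit_chunk_rows(explicit_chunk, n_choosers):
    """Rows per chunk implied by an explicit_chunk value (hypothetical helper)."""
    if not explicit_chunk:
        # 0 or missing (None): no chunking, so everything is one chunk.
        return n_choosers
    if explicit_chunk < 0:
        raise ValueError("explicit_chunk must not be negative")
    if explicit_chunk >= 1:
        # Values of 1 or more are a number of chooser rows per chunk.
        return min(int(explicit_chunk), n_choosers)
    # Values below 1 are a fraction of all choosers (rounding up is an assumption).
    return max(1, math.ceil(explicit_chunk * n_choosers))


def chunk_bounds(explicit_chunk, n_choosers):
    """Half-open (start, stop) row ranges covering all choosers."""
    if n_choosers <= 0:
        return []
    rows = explicit_chunk_rows(explicit_chunk, n_choosers)
    return [
        (start, min(start + rows, n_choosers))
        for start in range(0, n_choosers, rows)
    ]


print(chunk_bounds(4, 10))     # [(0, 4), (4, 8), (8, 10)]
print(chunk_bounds(0.25, 10))  # [(0, 3), (3, 6), (6, 9), (9, 10)]
print(chunk_bounds(0, 10))     # [(0, 10)]
```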

Refer to the code updates that implement explicit chunking for accessibility in PR #759,
for vehicle type choice, non-mandatory tour frequency, school escorting, and
joint tour frequency in PR #804, and for all remaining interaction-simulate
components in PR #870.

Automatic dropping of unused columns

Variables that are not used in a model component are now automatically dropped
from the chooser table before the component is run. Whether a variable is deemed
as "used" is determined by a text search of the model component code and specification
files for the variable name. Dropping unused columns can be disabled by setting
drop_unused_columns
to False in the compute_settings
for any model component, but by default this setting is True, as it can result in a
significant reduction in memory usage for large models.

Dropping columns may also cause problems if the model is not correctly configured.
If this feature is wanted but some required columns are being dropped incorrectly,
the user can list them in the protect_columns setting under compute_settings.
Columns listed there are never dropped, even if they are not apparently used in
the model component. For example:

compute_settings:
  protect_columns:
  - origin_destination

Code updates to drop unused columns are in
PR #833 and to protect
columns in PR #871.
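To make the text-search idea concrete, here is a minimal sketch of how "used" columns might be identified and how protect_columns rescues a column the search misses. The function columns_to_keep and the whole-word matching rule are assumptions for this example; the actual search in ActivitySim may work differently.

```python
import re


def columns_to_keep(columns, spec_texts, protect_columns=()):
    """Columns named somewhere in the spec text, plus any protected columns.

    Hypothetical helper; whole-word matching is an assumption.
    """
    kept = []
    for col in columns:
        pattern = re.compile(rf"\b{re.escape(col)}\b")
        used = any(pattern.search(text) for text in spec_texts)
        if used or col in protect_columns:
            kept.append(col)
    return kept


columns = ["income", "age", "origin_destination", "hh_size"]
spec_texts = [
    "income > 50000",
    "@df.age.clip(upper=65)",
    # The column name is assembled at run time, so a text search cannot see it.
    "@df['origin_' + 'destination'] > 0",
]

print(columns_to_keep(columns, spec_texts))
# ['income', 'age']
print(columns_to_keep(columns, spec_texts, protect_columns=["origin_destination"]))
# ['income', 'age', 'origin_destination']
```

The third expression shows one way a required column can be dropped incorrectly: its name never appears literally in the specification, so it has to be protected explicitly.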

Automatic conversion of string data to categorical

Version 1.3 introduces a new feature that automatically converts string data
to categorical data. This reduces memory usage and speeds up processing for
large models. The conversion is done automatically for string columns
in most chooser tables.

To further reduce memory usage, there is also an optional downcasting of numeric
data available. For example, this allows storing integers that always lie between -128 and 127
as int8 instead of int64. This feature is controlled by the downcast_int
and downcast_float settings in the top level model configuration file (typically
settings.yaml). The default value for these settings is False, meaning that
downcasting is not applied. It is recommended to leave these settings at their
default values unless memory availability is severely constrained, as downcasting
can cause numerical instability in some cases. First, changing the precision of
numeric data could cause results to change slightly and impact a previously calibrated
model result. Second, downcasting to lower byte data types, e.g., int8, can cause
numeric overflow in downstream components if the numeric variable is used in
mathematical calculations that would result in values beyond the lower bit width
limit (e.g. squaring the value). If downcasting is desired, it is strongly recommended
to review all model specifications for compatibility, and to review model results
to verify if the changes are acceptable.

See code updates in PR #782
and PR #863.
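The overflow risk is easy to underestimate, so the following sketch mimics it using only the Python standard library. It uses ctypes.c_int8 to imitate fixed-width storage, which wraps silently when a value does not fit; this is a stand-in for illustration and is not how ActivitySim stores its data. The helper name as_int8 is hypothetical.

```python
import ctypes


def as_int8(value):
    """Wrap an integer into the signed 8-bit range (-128 through 127)."""
    return ctypes.c_int8(value).value


distance = 100
print(as_int8(distance))             # 100: the value itself fits in int8
print(as_int8(distance * distance))  # 16: the true answer, 10000, does not fit
print(as_int8(200))                  # -56: values above 127 already wrap
```

A variable that is safe to store in a narrow type can therefore still produce wrong results once an expression squares it, multiplies it, or sums it, which is why specifications need to be reviewed before enabling downcasting.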

Alternatives preprocessors for trip destination

An alternatives preprocessor for trip destination was added in
PR #865,
and was then split into separate preprocessors for the sample step (at the TAZ level)
and the simulate step (at the MAZ level for 2 zone systems) in
PR #869.

Per-component sharrow controls

This version adds a uniform interface for controlling sharrow optimizations
at the component level. This allows users to disable sharrow entirely,
or to disable the "fastmath" optimization for individual components.
Controls for sharrow are set in each component's settings under compute_settings.
For example, to disable sharrow entirely for a component, use:

compute_settings:
  sharrow_skip: true

This overrides the global sharrow setting, and is useful if you want to skip
sharrow for particular components, either because their specifications are
not compatible with sharrow or because sharrow performance is known to be
poor on those components.

When a component has multiple subcomponents, the sharrow_skip setting can be
a dictionary that maps the names of the subcomponents to boolean values.
For example, in the school escorting component, to skip sharrow for the
OUTBOUND and OUTBOUND_COND subcomponents but not the INBOUND subcomponent,
use the following settings:

compute_settings:
  sharrow_skip:
    OUTBOUND: true
    INBOUND: false
    OUTBOUND_COND: true

The compute_settings can also be used to disable the "fastmath" optimization.
This is useful if the component is known to have numerical stability issues
with the fastmath optimization enabled, usually when the component potentially
works with data that includes NaN or Inf values. To disable fastmath for
a component, use:

compute_settings:
  fastmath: false

Code updates that apply these settings are in
PR #824.
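Since sharrow_skip can be either a single boolean or a per-subcomponent mapping, code that consumes it has to handle both shapes. The sketch below shows one way to do that; should_skip_sharrow is a hypothetical helper, and treating an unlisted subcomponent as "do not skip" is an assumption rather than documented behavior.

```python
def should_skip_sharrow(compute_settings, subcomponent=None):
    """Decide whether to skip sharrow for a component or subcomponent.

    Hypothetical helper for illustration; not ActivitySim's implementation.
    """
    setting = compute_settings.get("sharrow_skip", False)
    if isinstance(setting, dict):
        # Assumption: subcomponents missing from the mapping are not skipped.
        return bool(setting.get(subcomponent, False))
    return bool(setting)


school_escorting = {
    "sharrow_skip": {"OUTBOUND": True, "INBOUND": False, "OUTBOUND_COND": True}
}

print(should_skip_sharrow(school_escorting, "OUTBOUND"))   # True
print(should_skip_sharrow(school_escorting, "INBOUND"))    # False
print(should_skip_sharrow({"sharrow_skip": True}))         # True
print(should_skip_sharrow({}))                             # False
```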

Configuration validation

Version 1.3 adds a configuration validation system using the Pydantic library.
Previously, the YAML-based configuration files were allowed to contain arbitrary
keys and values, which could lead to errors if the configuration was not correctly
specified. The new validation system checks the configuration files for correctness,
and provides useful error messages if the configuration is invalid. Invalid
conditions include missing required keys, incorrect data types, and the presence
of unexpected keys. Existing models may need to be cleaned up (i.e. extraneous settings
in config files removed) to conform to the new validation system.

See PR #758 for code updates.

Input checker

Version 1.3 adds an input checker that verifies that the input data is consistent
with expectations. This tool can help identify problems with the input data before
the model is run, and can be used to ensure that the input data is correctly
formatted and complete.

See PR #753 for code updates.

Removal of orca dependency

This new version of ActivitySim does not use orca as a dependency, and thus does
not rely on orca’s global state to manage data. Instead, a new State
class is introduced, which encapsulates the current state of a simulation including
all data tables. This is a significant change “under the hood”, which may be
particularly consequential for models that use “extensions” to the ActivitySim framework.
See PR #654 for code updates.

Pull Requests included in this release

Black 2022 style by @jpn-- in #642
add model_settings to estimator.write_coefficients by @bwentl in #651
BayDAG Contributions by @dhensle in #657
docs: apply a minor correction to user guides by @asiripanich in #659
Trip scheduling logic by @jpn-- in #660
Pin Dependencies by @jpn-- in #665
Updated SEMCOG Example by @dhensle in #603
syncronize by @jpn-- in #680
Replace ORCA with non-global state by @jpn-- in #654
Fix memory usage by @jpn-- in #751
orca residual cleanup by @jpn-- in #694
Disable unstable estimation mode test by @jpn-- in #765
Overflow protection by @jpn-- in #764
Fixes windows error on large MAZ systems by @jpn-- in #760
update repo test pointers by @jpn-- in #783
Input Checker by @dhensle in #753
Merge missing changes from main back to develop by @jpn-- in #788
Config Settings and Documentation by @jpn-- in #758
Explicit chunking by @jpn-- in #759
Selecting choices from joint tour participant ID column explicitly; by @bricegnichols in #653
Option to write output tables as parquet files by @stefancoe in #763
rollback mistaken merge by @jpn-- in #793
Moved 'tot_tours' from nm tour frequency script to alternatives file by @JoeJimFlood in #661
Add options to handle larger dataset for location models by @bwentl in #687
Pydantic 2 by @jpn-- in #796
added stricter joining of annotated fields by @nick-fournier-rsg in #672
Vehicle Type Alts Filtering Bug Fix by @dhensle in #790
Data Type Optimization by @i-am-sijia in #782
Alt col name bug fix for option to handle larger dataset for location models by @bwentl in #798
Performance monitoring fixes by @jpn-- in #797
(pre) release for 1.3 by @jpn-- in #800
cleanup_failed_trips requires workflow state by @jpn-- in #801
Vehicle Type Optimization by @dhensle in #806
remove code duplication by @jpn-- in #803
Fix at-work frequency by @jpn-- in #808
generate stable vehicle type dtype categories by @jpn-- in #807
explicit chunking for interaction-simulate components by @jpn-- in #804
use default versioning scheme, not simplified by @jpn-- in #809
No categories for escort participants by @jpn-- in #813
School Escorting Optimization by @dhensle in #810
Fix inconsistent alts object name by @i-am-sijia in #817
Time period to categorical in legacy mode by @i-am-sijia in #819
Infrastructure Updates by @jpn-- in #812
Automatically ignore output directories from workflow runs by @jpn-- in #829
Vehicle type Categorical by @jpn-- in #826
BayDAG Contribution #1: Auto Ownership Processing by @dhensle in #767
BayDAG Contribution #6: Joint Tour Participation Infinite Loop by @dhensle in #772
BayDAG Contribution #4: Joint Tour Destination Performance by @dhensle in #770
BayDAG Contribution #8: Landuse and Reindex available in location choice by @dhensle in #774
BayDAG Contribution #9: Mode Choice Logsum Extraction by @dhensle in #775
BayDAG Contribution #2: Enhanced Disaggregate Accessibility Merging by @dhensle in #768
BayDAG Contribution #15: Larch Interface for Added Models by @dhensle in #781
BayDAG Contribution #5: Joint Tour Frequency and Composition Estimation by @dhensle in #771
BayDAG Contribution #7: Sampling in EDB for Location Choice by @dhensle in #773
Changed Environment YML by @lachlan-git in #832
BayDAG Contribution #3: CDAP estimation with joint component by @dhensle in #769
BayDAG Contribution #10: NMTF Person Available Periods by @dhensle in #776
BayDAG Contribution #13: Estimation Mode Usability by @dhensle in #779
BayDAG Contribution #14: Increasing Larch Loading for NMTF by @dhensle in #780
pin pandera by @jpn-- in #843
fix bug in interaction_simulate by @jpn-- in #839
BayDAG Contribution #11: School Escorting Estimation Updates by @dhensle in #777
BayDAG Contribution #16: Parking Locations in Trip Matrices by @dhensle in #840
Common settings for fastmath, sharrow_skip, and other compute controls across components by @jpn-- in #824
BayDAG All Contributions by @dhensle in #834
Automatically drop unneeded columns in choosers table by @i-am-sijia in #833
add nd_skims for sharrow flows on trip dest by @jpn-- in #852
Add tolerance control on progressive testing by @jpn-- in #855
allow input checker aux files to be parquet by @jpn-- in #851
Various minor fixes by @jpn-- in #856
School Escorting Runtime Optimization by @dhensle in #828
Allow skim access in trip mode choice annotate by @dhensle in #857
Consecutive time window overlap bug fix by @dhensle in #854
state.run.all should allow memory profiling by @jpn-- in #859
ensure tour_type is categorical by @jpn-- in #863
Trip Destination Alternatives Preprocessor by @dhensle in #865
Trip destination alts preprocessor for both sample and simulate steps by @dhensle in #869
protect_columns for all simulations (choosers, alts, simple simulate, interaction simulate, etc) by @i-am-sijia in #871
joint_tour_frequency_composition column to categorical by @i-am-sijia in #866
Explicit chunking on all interaction simulate models by @dhensle in #870
require numpy<2.x by @jpn-- in #875
Changes to support sharrow on 2-zone model by @jpn-- in #867
Integer Encoding Vehicle Allocation Spec by @dhensle in #877
Performance docs by @jpn-- in #878
Release notes v1.3 by @jpn-- in #882
Bugfix for trip scheduling choice by @jpn-- in #884
Fix docs, add performance report by @jpn-- in #885

New Contributors

@bwentl made their first contribution in #651
@asiripanich made their first contribution in #659
@bricegnichols made their first contribution in #653
@stefancoe made their first contribution in #763
@lachlan-git made their first contribution in #832

Full Changelog: v1.2.2...v1.3.0
