
Visualization Pull Request #532

Merged Apr 12, 2022 (102 commits)

Changes from all commits

Commits
97ca9ac
First summarize commit.
Nov 11, 2021
8323213
Quick concept test for summarize
Nov 11, 2021
44be451
generalized output path
chesterharvey Nov 12, 2021
981803d
Added functionality for summarize to access pipeline if available or …
chesterharvey Nov 16, 2021
ca3b205
Initialize test via pipeline
Nov 18, 2021
a24e754
update testing location
Nov 18, 2021
85abe6a
organize mock pipeline.
Nov 18, 2021
f32525d
update for network_los as an injectable.
Nov 30, 2021
d1c8f42
added logic to add output directory if not already available before i…
chesterharvey Nov 18, 2021
e5e8e14
Added logging for summaries
chesterharvey Nov 18, 2021
828d973
generalized summarize model so that all tables are available as local…
chesterharvey Nov 30, 2021
7e187f4
added annotate_trips to summary function--needs work
chesterharvey Nov 30, 2021
6e9c804
built summaries for demo dashboard
chesterharvey Nov 30, 2021
de289e9
updates for network_los
Nov 30, 2021
2fbdfed
added skim summary columns to trips in summary function
chesterharvey Dec 1, 2021
9910dfd
added simwrapper yamls and summarize step on model run
chesterharvey Dec 2, 2021
0167503
added simwrapper templates to example_mtc output
chesterharvey Dec 2, 2021
d286b50
updates to summary expressions and simwrapper yamls
chesterharvey Dec 2, 2021
3d291f1
Add slicers and aggregators. Intermediate check-in. Need to re-attach…
Dec 7, 2021
38ae31b
Reimplement initial skims based calculations. more work needed.
Dec 7, 2021
b8a7128
Add preprocessor to summarize.
Dec 8, 2021
3e7c1d2
changed key name for bin breaks
chesterharvey Dec 8, 2021
09bae50
Add preprocessor
Dec 8, 2021
eacdfa9
summarize preprocessor expressions
chesterharvey Dec 10, 2021
a6e461f
Update summarize_preprocessor.csv
chesterharvey Dec 10, 2021
2d564d0
Update summarize_preprocessor.csv
chesterharvey Dec 10, 2021
de4e62a
updated summarize preprocessor expressions
chesterharvey Dec 10, 2021
0f811ab
summarize preprocessor expressions
chesterharvey Dec 10, 2021
0a49cc2
summary expression updates
chesterharvey Dec 10, 2021
6c84299
added summarize preprocessors
chesterharvey Dec 11, 2021
20268da
updated summarize preprocessor expressions
chesterharvey Dec 11, 2021
2e5157d
updated summarize configs in example_mtc model
chesterharvey Dec 14, 2021
3e877bd
Added binning functions for preprocessing and summary expressions
chesterharvey Dec 15, 2021
15c3a6a
added example summaries incorporating slicers
chesterharvey Dec 16, 2021
243c9ed
remove old csv file....
Dec 22, 2021
a51a201
remove extraneous print statement
Dec 22, 2021
dfbae13
black + isort
Dec 22, 2021
92a20df
Type hints
Dec 22, 2021
aa4c528
Add type hints
Dec 22, 2021
211e1d8
Change debug level
Dec 22, 2021
3cd363d
test data
Dec 22, 2021
04f3ed2
Initial docstrings
Dec 22, 2021
8e6bcf0
code cleanup
Dec 22, 2021
186ae09
Re-orient tests from root directory
Jan 4, 2022
a5be56f
Reorient tests
Jan 4, 2022
ce85d26
Update Sphinx docs
Jan 4, 2022
6b6055e
fix pycodestyle
Jan 4, 2022
80310a8
Fix binning overflow.
Jan 4, 2022
5356a19
Constrain Numpy version #533
Jan 4, 2022
7c704ed
Remove write_summaries from marin example
Jan 4, 2022
0ec3c47
Docstrings for summarize.py methods
chesterharvey Jan 5, 2022
89013ce
aggregation tests added for summarize model
chesterharvey Jan 5, 2022
0c025b1
pycodestyle cleanup
chesterharvey Jan 5, 2022
a2023b4
fixed typo in example_MTC summarize.yaml
chesterharvey Jan 6, 2022
b22e978
initialized testing structure for auto ownership mode
chesterharvey Feb 7, 2022
68a65a2
made all standard pipeline tables available for summary expressions
chesterharvey Feb 10, 2022
d9277a5
syntax big fix
chesterharvey Feb 10, 2022
eb36fa5
updated functionality for exporting available pipeline tables for exp…
chesterharvey Feb 10, 2022
60ca801
configured export of pipeline tables as a yaml flag
chesterharvey Feb 10, 2022
f551dd1
setup testing framework for auto_ownership model
chesterharvey Feb 14, 2022
a66e265
fixed code style errors
chesterharvey Feb 23, 2022
60fd31c
allow annotate_preprocessors to annotate a table without a skim wrapper
chesterharvey Feb 23, 2022
f812657
defined separate preprocessors for different pipeline tables
chesterharvey Feb 23, 2022
0458d45
enabled preprocessing for multiple tables
chesterharvey Feb 23, 2022
83a9522
added ability to create temporary variables in summary expressions file
chesterharvey Feb 24, 2022
0deec59
added the simwrapper python package as a dependency
chesterharvey Feb 24, 2022
78dd56a
fixed python style
chesterharvey Feb 24, 2022
831d58c
updated summary expressions
chesterharvey Feb 24, 2022
2f2dbe3
add Dockerfile
i-am-sijia Feb 24, 2022
e53c99c
added summarize model to example_mtc
chesterharvey Feb 24, 2022
bac92a8
updated visualization docs
chesterharvey Feb 24, 2022
c711bdf
updated docs
chesterharvey Feb 24, 2022
cc9823a
reverting to earlier commit to address travis testing failures
chesterharvey Feb 24, 2022
1dc1aa0
updates to make all pipeline tables available as locals and allow tem…
chesterharvey Feb 25, 2022
38be435
updates to viz documentation
chesterharvey Feb 25, 2022
0b68119
pycodestyle fix
chesterharvey Feb 25, 2022
85c9f04
viz documentation updates
chesterharvey Feb 25, 2022
8198d5b
remove unnecessary test pipeline tables
chesterharvey Feb 25, 2022
9e69428
Add expressions for tours and trips counts
chesterharvey Feb 25, 2022
a30f160
Allow yamls to maintained in example outputs
chesterharvey Feb 25, 2022
aec98f1
Update dashboard-1-summary.yaml
chesterharvey Feb 25, 2022
b17943b
Add summarize model to example_mtc settings
chesterharvey Feb 25, 2022
bb9ad1f
Add simwrapper as a dependency
chesterharvey Feb 25, 2022
9bb224c
added summarize config files to mtc example
chesterharvey Feb 25, 2022
a898568
Updates to docs
chesterharvey Feb 28, 2022
51728e2
rename 'slicers' as 'bins'
chesterharvey Feb 28, 2022
279f060
Allow export of pipeline tables
chesterharvey Mar 1, 2022
139aa1b
Update SLICERS to BIN
chesterharvey Mar 1, 2022
227908f
pycodestyle updates
chesterharvey Mar 1, 2022
79d88f5
removed unnecessary geojson from sample data
chesterharvey Mar 2, 2022
1b1a4c6
Added simwrapper to environment yamls temporarily with pip
chesterharvey Mar 2, 2022
3974e39
Enabled pipeline table export for expression development
chesterharvey Mar 2, 2022
15235e8
add docker-compose
i-am-sijia Mar 3, 2022
d96c740
Merge branch 'mtc_tm2' into ft_vis_1
i-am-sijia Mar 3, 2022
b9ecce4
Revert "Merge branch 'mtc_tm2' into ft_vis_1"
i-am-sijia Mar 3, 2022
0791b4a
Update summarize.csv
chesterharvey Mar 3, 2022
e6f7077
Merge branch 'develop' into ft_vis_1
chesterharvey Mar 3, 2022
afbfe4b
update badge link to travis-ci.com
chesterharvey Mar 3, 2022
d76a94f
fixed bug in travis badge url
chesterharvey Mar 3, 2022
cd8fbad
simwrapper is on conda-forge now, so remove pip install rules & docs
billyc Mar 6, 2022
812186f
Merge pull request #3 from billyc/ft_vis_1
i-am-sijia Mar 7, 2022
8110573
add back pip install -e
i-am-sijia Mar 7, 2022
1 change: 1 addition & 0 deletions .gitignore
@@ -72,3 +72,4 @@ _test_est
*_local/
*_local.*

**/output/
4 changes: 2 additions & 2 deletions README.md
@@ -1,7 +1,7 @@
ActivitySim
===========

[![Build Status](https://travis-ci.org/ActivitySim/activitysim.svg?branch=master)](https://travis-ci.org/ActivitySim/activitysim)[![Coverage Status](https://coveralls.io/repos/github/ActivitySim/activitysim/badge.svg?branch=master)](https://coveralls.io/github/ActivitySim/activitysim?branch=master)
[![Build Status](https://travis-ci.com/ActivitySim/activitysim.svg?branch=master)](https://travis-ci.org/github/ActivitySim/activitysim)[![Coverage Status](https://coveralls.io/repos/github/ActivitySim/activitysim/badge.svg?branch=master)](https://coveralls.io/github/ActivitySim/activitysim?branch=master)

The mission of the ActivitySim project is to create and maintain advanced, open-source,
activity-based travel behavior modeling software based on best software development
@@ -15,4 +15,4 @@ and benefit from contributions of other agency partners.

## Documentation

https://activitysim.github.io/activitysim
https://activitysim.github.io/activitysim
359 changes: 335 additions & 24 deletions activitysim/abm/models/summarize.py
@@ -1,43 +1,354 @@
# ActivitySim
# See full license in LICENSE.txt.
import logging
import os

import numpy as np
import pandas as pd
from activitysim.abm.models.trip_matrices import annotate_trips
from activitysim.core import config, expressions, inject, pipeline

logger = logging.getLogger(__name__)

def wrap_skims(
network_los: pipeline.Pipeline,
trips_merged: pd.DataFrame
) -> dict[str, object]:
"""
Retrieve skim wrappers for merged trips.

For each record in `trips_merged`, retrieve skim wrappers for the appropriate time of day.

Returns a dictionary of skim wrappers available for use in expressions defined
in `summarize_preprocessor.csv`.
"""
skim_dict = network_los.get_default_skim_dict()

trips_merged['start_tour_period'] = network_los.skim_time_period_label(
trips_merged['start']
)
trips_merged['end_tour_period'] = network_los.skim_time_period_label(
trips_merged['end']
)
trips_merged['trip_period'] = network_los.skim_time_period_label(
trips_merged['depart']
)

tour_odt_skim_stack_wrapper = skim_dict.wrap_3d(
orig_key='origin_tour',
dest_key='destination_tour',
dim3_key='start_tour_period',
)
tour_dot_skim_stack_wrapper = skim_dict.wrap_3d(
orig_key='destination_tour', dest_key='origin_tour', dim3_key='end_tour_period'
)
trip_odt_skim_stack_wrapper = skim_dict.wrap_3d(
orig_key='origin_trip', dest_key='destination_trip', dim3_key='trip_period'
)

tour_od_skim_stack_wrapper = skim_dict.wrap('origin_tour', 'destination_tour')
trip_od_skim_stack_wrapper = skim_dict.wrap('origin_trip', 'destination_trip')

return {
"tour_odt_skims": tour_odt_skim_stack_wrapper,
"tour_dot_skims": tour_dot_skim_stack_wrapper,
"trip_odt_skims": trip_odt_skim_stack_wrapper,
"tour_od_skims": tour_od_skim_stack_wrapper,
"trip_od_skims": trip_od_skim_stack_wrapper,
}


DEFAULT_BIN_LABEL_FORMAT = "{left:,.2f} - {right:,.2f}"


def construct_bin_labels(bins: pd.Series, label_format: str) -> pd.Series:
"""
Construct bin label strings based on intervals (pd.Interval) in `bins`

`label_format` is a format-string template (used with `str.format`) that can reference the following variables:
- 'left': Bin minimum
- 'right': Bin maximum
- 'mid': Bin center
- 'rank': Bin rank (lowest to highest)

For example: '{left:,.2f} - {right:,.2f}' might yield '0.00 - 1.00'
"""
left = bins.apply(lambda x: x.left)
mid = bins.apply(lambda x: x.mid)
right = bins.apply(lambda x: x.right)
# Get integer ranks of bins (e.g., 1st, 2nd ... nth quantile)
rank = mid.map(
{
x: sorted(mid.unique().tolist()).index(x) + 1 if pd.notnull(x) else np.nan
for x in mid.unique()
},
na_action='ignore',
)

def construct_label(label_format, bounds_dict):
bounds_dict = {
x: bound for x, bound in bounds_dict.items() if x in label_format
}
return label_format.format(**bounds_dict)

labels = pd.Series(
[
construct_label(label_format, {'left': lt, 'mid': md, 'right': rt, 'rank': rk})
for lt, md, rt, rk in zip(left, mid, right, rank)
],
index=bins.index,
)
# Convert to numeric if possible
labels = pd.to_numeric(labels, errors='ignore')
return labels
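As a minimal sketch of this labeling step (with illustrative data and the default template), `pd.cut` yields `pd.Interval` bins whose bounds can fill the format string:

```python
import pandas as pd

# Illustrative data; pd.cut produces pd.Interval bins whose left/right
# bounds can fill the '{left:,.2f} - {right:,.2f}' template, as above.
data = pd.Series([1.0, 2.5, 4.0, 5.5, 7.0])
bins = pd.cut(data, 2)  # two equal-width intervals

template = "{left:,.2f} - {right:,.2f}"
labels = bins.apply(lambda iv: template.format(left=iv.left, right=iv.right))
print(labels.unique().tolist())  # ['0.99 - 4.00', '4.00 - 7.00']
```

(`pd.cut` nudges the lowest edge slightly below the minimum, hence the '0.99'.)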


def quantiles(
data: pd.Series,
bins: int,
label_format: str = DEFAULT_BIN_LABEL_FORMAT
) -> pd.Series:
"""
Construct quantiles from a Series given a number of bins.

For example: set bins = 5 to construct quintiles.

data: Input Series
bins: Number of bins
label_format: F-string format for bin labels
Bins are labeled with 'min - max' ranges by default.

Returns a Series indexed by labels
"""
vals = data.sort_values()
# qcut a ranking instead of raw values to deal with high frequencies of the same value
# (e.g., many 0 values) that may span multiple bins
ranks = vals.rank(method='first')
bins = pd.qcut(ranks, bins, duplicates='drop')
bins = construct_bin_labels(bins, label_format)
return bins
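The rank-before-qcut step matters when one value dominates; a sketch with illustrative tie-heavy data:

```python
import pandas as pd

# Six tied zeros would straddle quantile boundaries if qcut saw raw values;
# ranking first (ties get distinct, order-based ranks) keeps the four
# quartile bins equally sized, mirroring the function above.
vals = pd.Series([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6]).sort_values()
ranks = vals.rank(method='first')
quartiles = pd.qcut(ranks, 4, duplicates='drop')
print(quartiles.value_counts().tolist())  # [3, 3, 3, 3]
```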


def spaced_intervals(
data: pd.Series,
lower_bound: float,
interval: float,
label_format: str = DEFAULT_BIN_LABEL_FORMAT,
) -> pd.Series:
"""
Construct evenly-spaced intervals from a Series given a starting value and bin size.

data: Input Series
lower_bound: Minimum value of lowest bin
interval: Bin spacing above the `lower_bound`
label_format: F-string format for bin labels
Bins are labeled with 'min - max' ranges by default.

Returns a Series indexed by labels
"""
if lower_bound == 'min':
lower_bound = data.min()
breaks = np.arange(lower_bound, data.max() + interval, interval)
bins = pd.cut(data, breaks, include_lowest=True)
bins = construct_bin_labels(bins, label_format)
return bins
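A sketch of this fixed-width binning, using illustrative values and a 2.0-unit interval:

```python
import numpy as np
import pandas as pd

# Breaks run from the lower bound up past the data maximum in fixed steps,
# mirroring the np.arange/pd.cut pattern above; the data are illustrative.
data = pd.Series([0.5, 1.5, 3.2, 4.9, 7.7])
breaks = np.arange(0.0, data.max() + 2.0, 2.0)  # [0., 2., 4., 6., 8.]
bins = pd.cut(data, breaks, include_lowest=True)
print(bins.value_counts().sort_index().tolist())  # [2, 1, 1, 1]
```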


def equal_intervals(
data: pd.Series,
bins: int,
label_format: str = DEFAULT_BIN_LABEL_FORMAT
) -> pd.Series:
"""
Construct equally-spaced intervals across the entire range of a Series.

data: Input Series
bins: Number of bins
label_format: F-string format for bin labels
Bins are labeled with 'min - max' ranges by default.

Returns a Series indexed by labels
"""
bins = pd.cut(data, bins, include_lowest=True)
bins = construct_bin_labels(bins, label_format)
return bins


def manual_breaks(
data: pd.Series,
bin_breaks: list,
labels: list = None,
label_format: str = DEFAULT_BIN_LABEL_FORMAT
) -> pd.Series:
"""
Classify numeric data in a Pandas Series into manually-defined bins.

data: Input Series
bin_breaks: Break points between bins
labels: Manually-defined labels for each bin (`len(labels)` == `len(bin_breaks) - 1`)
label_format: F-string format for bin labels if not defined by `labels`
Bins are labeled with 'min - max' ranges by default.

Returns a Series indexed by labels
"""
if isinstance(labels, list):
return pd.cut(data, bin_breaks, labels=labels, include_lowest=True)
else:
bins = pd.cut(data, bin_breaks, include_lowest=True)
bins = construct_bin_labels(bins, label_format)
return bins
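For example (illustrative breaks and labels), the explicit-labels branch reduces to a single `pd.cut` call; `pd.cut` expects one label per bin, i.e. one fewer than the number of break points:

```python
import pandas as pd

# Illustrative trip distances classified with explicit break points and
# labels; len(labels) is len(breaks) - 1 since breaks include both ends.
distances = pd.Series([0.3, 1.2, 4.8, 9.5, 15.0])
breaks = [0, 1, 5, 20]
labels = ['short', 'medium', 'long']
binned = pd.cut(distances, breaks, labels=labels, include_lowest=True)
print(binned.tolist())  # ['short', 'medium', 'medium', 'long', 'long']
```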


@inject.step()
def summarize(
network_los: pipeline.Pipeline,
persons: pd.DataFrame,
persons_merged: pd.DataFrame,
households: pd.DataFrame,
households_merged: pd.DataFrame,
trips: pd.DataFrame,
tours: pd.DataFrame,
tours_merged: pd.DataFrame,
land_use: pd.DataFrame,
):
"""
A standard model that uses expression files to summarize pipeline tables for visualization.

Summaries are configured in `summarize.yaml`, including specification of the
expression file (`summarize.csv` by default).

Columns in pipeline tables can also be sliced and aggregated prior to summarization.
This preprocessing is configured in `summarize.yaml`.

Outputs a separate CSV summary file for each expression;
outputs starting with '_' are saved as temporary local variables.
"""
trace_label = 'summarize'
model_settings_file_name = 'summarize.yaml'
model_settings = config.read_model_settings(model_settings_file_name)

output_location = (
model_settings['OUTPUT'] if 'OUTPUT' in model_settings else 'summaries'
)
os.makedirs(config.output_file_path(output_location), exist_ok=True)

spec = pd.read_csv(
config.config_file_path(model_settings['SPECIFICATION']), comment='#'
)

# Load dataframes from pipeline
persons = persons.to_frame()
persons_merged = persons_merged.to_frame()
households = households.to_frame()
households_merged = households_merged.to_frame()
trips = trips.to_frame()
tours = tours.to_frame()
tours_merged = tours_merged.to_frame()
land_use = land_use.to_frame()

# - trips_merged - merge trips and tours_merged
trips_merged = pd.merge(
trips,
tours_merged.drop(columns=['person_id', 'household_id']),
left_on='tour_id',
right_index=True,
suffixes=['_trip', '_tour'],
how="left",
)

# Add dataframes as local variables
locals_d = {
'persons': persons,
'persons_merged': persons_merged,
'households': households,
'households_merged': households_merged,
'trips': trips,
'trips_merged': trips_merged,
'tours': tours,
'tours_merged': tours_merged,
'land_use': land_use,
}

skims = wrap_skims(network_los, trips_merged)

# Annotate trips_merged
expressions.annotate_preprocessors(
trips_merged, locals_d, skims, model_settings, 'summarize'
)

for table_name, df in locals_d.items():
if table_name in model_settings:

meta = model_settings[table_name]
df = eval(table_name)

if 'AGGREGATE' in meta and meta['AGGREGATE']:
for agg in meta['AGGREGATE']:
assert set(('column', 'label', 'map')) <= agg.keys()
df[agg['label']] = (
df[agg['column']].map(agg['map']).fillna(df[agg['column']])
)

if 'BIN' in meta and meta['BIN']:
for slicer in meta['BIN']:
if slicer['type'] == 'manual_breaks':
df[slicer['label']] = manual_breaks(
df[slicer['column']], slicer['bin_breaks'], slicer['bin_labels']
)

elif slicer['type'] == 'quantiles':
df[slicer['label']] = quantiles(
df[slicer['column']], slicer['bins'], slicer['label_format']
)

elif slicer['type'] == 'spaced_intervals':
df[slicer['label']] = spaced_intervals(
df[slicer['column']],
slicer['lower_bound'],
slicer['interval'],
slicer['label_format'],
)

elif slicer['type'] == 'equal_intervals':
df[slicer['label']] = equal_intervals(
df[slicer['column']], slicer['bins'], slicer['label_format']
)

# Output pipeline tables for expression development
if model_settings.get('EXPORT_PIPELINE_TABLES') is True:
pipeline_table_dir = os.path.join(output_location, 'pipeline_tables')
os.makedirs(config.output_file_path(pipeline_table_dir), exist_ok=True)
for name, df in locals_d.items():
df.to_csv(config.output_file_path(os.path.join(pipeline_table_dir, f'{name}.csv')))

# Add classification functions to locals
locals_d.update(
{
'quantiles': quantiles,
'spaced_intervals': spaced_intervals,
'equal_intervals': equal_intervals,
'manual_breaks': manual_breaks,
}
)

for i, row in spec.iterrows():

out_file = row['Output']
expr = row['Expression']

# Save temporary variables starting with underscores in locals_d
if out_file.startswith('_'):

logger.debug(f'Temp Variable: {expr} -> {out_file}')

locals_d[out_file] = eval(expr, globals(), locals_d)
continue

logger.debug(f'Summary: {expr} -> {out_file}.csv')

resultset = eval(expr, globals(), locals_d)
resultset.to_csv(
config.output_file_path(os.path.join(output_location, f'{out_file}.csv')),
index=False,
)
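The expression loop above can be sketched in isolation: each spec row pairs an output name with a pandas expression evaluated against the table locals, and outputs prefixed with '_' become temporary variables available to later rows. The toy table, column names, and expressions below are illustrative, not from the model:

```python
import pandas as pd

# Toy stand-ins for the spec and pipeline tables; names are illustrative.
spec = pd.DataFrame({
    'Output': ['_long_trips', 'long_trips_by_mode'],
    'Expression': [
        "trips[trips.distance > 5]",
        "_long_trips.groupby('mode').size().reset_index(name='count')",
    ],
})
locals_d = {'trips': pd.DataFrame({'mode': ['walk', 'car', 'car'],
                                   'distance': [2.0, 8.0, 12.0]})}

for _, row in spec.iterrows():
    out_file, expr = row['Output'], row['Expression']
    result = eval(expr, globals(), locals_d)
    if out_file.startswith('_'):
        locals_d[out_file] = result  # temporary variable, not written out
        continue
    print(result)  # in the model: result.to_csv(f'{out_file}.csv')
```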