Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

joint_tour_frequency_composition column to categorical #866

Merged
merged 2 commits into from
May 21, 2024

Conversation

i-am-sijia
Copy link
Contributor

joint_tour_frequency_composition column currently is a string, this PR converts it to pandas categorical.

@i-am-sijia i-am-sijia requested a review from jpn-- May 6, 2024 14:03
@@ -192,6 +192,9 @@ def joint_tour_frequency_composition(
households_merged["joint_tour_frequency_composition"] = (
choices.reindex(households_merged.index).fillna(no_tours_alt).astype(str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the "alt" column of the joint_tour_frequency_composition_alternatives.csv input file (at least for the SANDAG demo model) is integers. The .reindex here is potentially leaving null values, which promotes the choice column to float (at least for pandas 1.x), then we .fillna which converts the NaNs to 0. I'm not sure why the last step here is converting the string when a convert to integer would be fine. I don't think we want this to end up as categorical with the category labels as integers, that's confusing. Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true. We could just keep this column as int. I don't recall why this is converted to str in the last step. I just checked the ABM3 configs, it's not using this column in downstream UECs, so we should be able to actually change this to int without any impact downstream.

However, if we just do .astype(int) here, by default it will use int64, which uses more memory than if we convert it to pandas categorical. I also feel grudge against something like a hardcoded .astype(int8).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like

households_merged["joint_tour_frequency_composition"] = pd.to_numeric(
    choices.reindex(households_merged.index).fillna(no_tours_alt),
    downcast="integer"
)

@jpn-- jpn-- merged commit d8c5829 into ActivitySim:main May 21, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants