Convert notebooks to scripts #23

Merged
merged 16 commits into from
May 3, 2024
Collaborator

@sgreenbury sgreenbury commented Apr 29, 2024

This PR closes #21 by converting the preprocessing notebooks to scripts, so that the downstream data is constructed for notebooks 3 onwards.

Outstanding areas to look at refining:

  • Make deterministic: set seed/random_state as part of input
  • Write metrics that capture matching performance/quality to file for validation (relates to Validation framework for model #17)
  • Flexible input to allow different sized samples upon running the script
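
A minimal sketch of how the first two items could look, using only the standard library. The option names (`--seed`, `--metrics-path`), the helper names, and the `match_rate` metric are hypothetical and not taken from the scripts in this PR; the real matching code would pass the same seed to whatever sampler or `random_state` argument it uses.

```python
import argparse
import json
import random
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line options (option names are illustrative assumptions)."""
    parser = argparse.ArgumentParser(description="Preprocessing script (sketch).")
    parser.add_argument("--seed", type=int, default=0,
                        help="Seed used for every random step in the script.")
    parser.add_argument("--metrics-path", type=Path,
                        default=Path("matching_metrics.json"),
                        help="Where to write the matching metrics.")
    return parser.parse_args(argv)


def sample_ids(ids, k, seed):
    """Draw k ids reproducibly: the same seed always gives the same sample."""
    rng = random.Random(seed)  # local generator, does not touch global state
    return rng.sample(list(ids), k)


def write_metrics(metrics, path):
    """Write a dict of metrics to a JSON file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))
```

Recording the seed alongside the metrics (for example `write_metrics({"seed": args.seed, "match_rate": rate}, args.metrics_path)`) would make it possible to tie a validation result back to the run that produced it.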


# temporary reduction of the dataset for quick analysis
# TODO: check if this should be present?
spc = spc.head(15000)
Collaborator Author

Perhaps can be an input parameter for the script to determine if run on the whole population?
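
One way this could look, as a sketch only: an optional `--nrows` argument that defaults to the whole population. The option name and the `limit_rows` helper are hypothetical; `spc` is assumed to be a pandas DataFrame, as in the snippet under review.

```python
import argparse


def build_parser():
    """Build a parser with an optional row limit (option name is an assumption)."""
    parser = argparse.ArgumentParser(description="Prepare the synthetic population (sketch).")
    parser.add_argument("--nrows", type=int, default=None,
                        help="Use only the first N rows; omit to run on the whole population.")
    return parser


def limit_rows(spc, nrows=None):
    """Return the first nrows rows of spc, or spc unchanged when nrows is None."""
    if nrows is None:
        return spc
    if nrows <= 0:
        raise ValueError("nrows must be a positive integer")
    return spc.head(nrows)  # relies on the DataFrame-style .head(n) method


# Usage inside the script (replacing the hard-coded spc.head(15000)):
#   args = build_parser().parse_args()
#   spc = limit_rows(spc, args.nrows)
```

With this shape, the quick-analysis run becomes an explicit opt-in (`--nrows 15000`) rather than a default that silently truncates the data.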

Collaborator

@Hussein-Mahfouz Hussein-Mahfouz left a comment

Thanks for the readme! Everything else looks good to me, except that I can't run 1_prep_synthpop.py. I think this is an issue with my setup, so we can revisit it in the weekly meeting. Happy to merge this if @BZ-BowenZhang is also OK with it.

Collaborator

The columns oa11cd and msoa11cd are not found in region_people_hh.parquet.

Collaborator

I checked the output file from scripts/1_prep_synthpop.py. The column names there are oa and msoa, which I think is the reason for this error.
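
A small sketch of how a downstream script could detect this mismatch early. The helper names and the alias mapping (`oa` to `oa11cd`, `msoa` to `msoa11cd`) are assumptions based only on the column names mentioned in this thread; whether to rename downstream or change the upstream output is a separate decision.

```python
# Assumed aliases: names written by the upstream script -> names expected downstream.
COLUMN_ALIASES = {"oa": "oa11cd", "msoa": "msoa11cd"}


def missing_columns(columns, required=("oa11cd", "msoa11cd")):
    """Return the required column names that are absent from `columns`."""
    return [name for name in required if name not in columns]


def rename_map(columns, aliases=COLUMN_ALIASES):
    """Return {old: new} for aliases present in `columns` whose target is absent."""
    return {old: new for old, new in aliases.items()
            if old in columns and new not in columns}


# Usage with a pandas DataFrame `df` read from the parquet file:
#   df = df.rename(columns=rename_map(df.columns))
#   absent = missing_columns(df.columns)
#   if absent:
#       raise KeyError(f"Missing expected columns: {absent}")
```

Failing with an explicit list of missing columns makes this kind of naming drift between scripts easier to spot than a bare KeyError deep inside the matching code.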

Collaborator

The acbm package installed automatically by poetry does not include the processing sub-module. I had to manually move processing.py into the package folder to fix this issue.

Collaborator

This error seems to be fixed after I ran poetry update again.

scripts/2.1_sandbox-match_households.py: outdated review thread (resolved)
@BZ-BowenZhang BZ-BowenZhang merged commit fa5227b into main May 3, 2024
4 checks passed
@Hussein-Mahfouz Hussein-Mahfouz deleted the 21-convert-to-scripts branch September 19, 2024 11:28
Successfully merging this pull request may close these issues.

Convert preprocessing notebooks to scripts
3 participants