-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
478 update main #115
478 update main #115
Conversation
b408579
to
18fdf94
Compare
d24a5cc
to
e6d86dc
Compare
…when performing joins
…y in contributors
- Keep columns is now a list not dict - Created master dictionary with all column names to avoid overlap in dicts
mbs_results/staging/dfs_from_spp.py
Outdated
from rdsa_utils.cdp.helpers.s3_utils import load_json | ||
|
||
|
||
def dfs_from_spp( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function name should have a verb, read_dfs_from_spp
perhaps? or something like this
return mapper | ||
|
||
|
||
def read_and_combine_colon_sep_files(folder_path, column_names, config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing docs
|
||
Returns | ||
------- | ||
_type_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing type
output_path = config["output_path"] | ||
file_version_mbs = metadata.metadata("monthly-business-survey-results")["version"] | ||
snapshot_name = config["mbs_file_name"].split(".")[0] | ||
imputation_filename = f"imputation_output_v{file_version_mbs}_{snapshot_name}.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loved that you have snapshot name
warnings.warn("A placeholder function for validating dataframe post staging") | ||
|
||
|
||
def validate_imputation(df: pd.DataFrame, config: dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing docs in all validation
mbs_results/estimation/estimate.py
Outdated
Parameters | ||
---------- | ||
df : pd.DataFrame | ||
_description_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing description and types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work some very minor changes in the comments, also might be worth adding a unit testing the run main function, this will guarantee that everything's integrated if the pipeline is extended or updated. We test the methods thoroughly so just a basic integration testing should be fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looking good :)
Summary
Add your summary here - keep it brief, to the point, and in plain English.
Checklists
This pull request meets the following requirements:
If you feel some of these conditions do not apply for this pull request, please
add a comment to explain why.