fixed issue with run live or frozen #119

Merged
merged 4 commits into from
Oct 30, 2024
Changes from 3 commits
15 changes: 9 additions & 6 deletions mbs_results/staging/data_cleaning.py
@@ -1,6 +1,5 @@
 from typing import List

-import numpy as np
 import pandas as pd

 from mbs_results.utilities.utils import convert_column_to_datetime
@@ -308,6 +307,8 @@ def run_live_or_frozen(

     """

+    df = df.copy()
+
     if state not in ["frozen", "live"]:
         raise ValueError(
             """{} is not an accepted state status, use either frozen or live """.format(
@@ -316,8 +317,10 @@ def run_live_or_frozen(
         )

     if state == "frozen":
-
-        df.loc[df[error_marker].isin(error_values), target] = np.nan
+        df["frozen_error"] = df.apply(
Collaborator

Nice use of lambda function to simplify copying over the value from the target column :)

+            lambda x: x[target] if x[error_marker] in (error_values) else "", axis=1
+        )
+        df = df.fillna("")

     return df
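
The frozen branch in the hunk above can be tried in isolation with a minimal sketch. The helper names `add_frozen_error` and `add_frozen_error_vectorised` are hypothetical, and the `error_values` list is an assumption: the real list is not visible in this diff, so `["E", "W"]` is inferred from the test CSV. The second helper replaces the row-wise lambda with `Series.where`, which is a different technique from the one in this PR and is shown only for comparison.

```python
import pandas as pd

# Assumption: the real error_values list is not shown in this diff;
# "E" and "W" are inferred from the rows of the test CSV.
ERROR_VALUES = ["E", "W"]


def add_frozen_error(df, target, error_marker, error_values=ERROR_VALUES):
    """Hypothetical helper mirroring the frozen branch shown in the diff."""
    out = df.copy()  # leave the caller's frame untouched
    out["frozen_error"] = out.apply(
        lambda row: row[target] if row[error_marker] in error_values else "",
        axis=1,
    )
    return out.fillna("")


def add_frozen_error_vectorised(df, target, error_marker, error_values=ERROR_VALUES):
    """Same idea without a row-wise apply, using Series.where."""
    out = df.copy()
    mask = out[error_marker].isin(error_values)
    # Keep the target value where the marker is an error value, else "".
    out["frozen_error"] = out[target].where(mask, "")
    return out.fillna("")


sample = pd.DataFrame({"target": [2, 7, 1], "error": ["C", "E", "O"]})
result = add_frozen_error(sample, "target", "error")
result_vectorised = add_frozen_error_vectorised(sample, "target", "error")
```

Because both helpers work on a copy, `sample` keeps its original two columns after either call.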

@@ -439,8 +442,8 @@ def correct_values(
     # Update value only if columns exist
     if set(check_columns).issubset(df.columns):

-        df_temp.loc[
-            df[condition_column].isin(condition_values), columns_to_correct
-        ] = replace_with
+        df_temp.loc[df[condition_column].isin(condition_values), columns_to_correct] = (
+            replace_with
+        )

     return df_temp
16 changes: 8 additions & 8 deletions tests/data/staging/data_cleaning/test_run_live_or_frozen.csv
@@ -1,8 +1,8 @@
-target,error,live,frozen
-1,C,1,1
-2,E,2,
-3,O,3,3
-4,W,4,
-5,C,5,5
-6,E,6,
-7,W,7,
+target,error,live,frozen,frozen_error
+2,C,2,2,
Collaborator

Why does frozen_error have a numerical type? Does this not relate to the other error in col2?

Collaborator Author

Frozen_error contains the deleted adjusted values.

+7,E,7,,7
+1,O,1,1,
+6,W,6,,6
+3,C,3,3,
+5,E,5,,5
+4,W,4,,4
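
One way to check what the author's reply means is to load the new fixture rows and compare the columns. This is only a sketch: the rows are copied from the diff above, while the checks are illustrative and not part of the PR. It also suggests why `frozen_error` looks numeric: a column holding only numbers and blanks is read by `pd.read_csv` as `float64`.

```python
import io

import pandas as pd

# New fixture rows, copied from the diff above.
FIXTURE = """target,error,live,frozen,frozen_error
2,C,2,2,
7,E,7,,7
1,O,1,1,
6,W,6,,6
3,C,3,3,
5,E,5,,5
4,W,4,,4
"""

df = pd.read_csv(io.StringIO(FIXTURE))

# Each row carries its value in exactly one of the two columns.
exactly_one = (df["frozen"].isna() ^ df["frozen_error"].isna()).all()

# Wherever frozen_error is populated, it equals the original target value.
moved = df["frozen_error"].notna()
matches_target = (df.loc[moved, "frozen_error"] == df.loc[moved, "target"]).all()

# Error markers of the rows whose value was moved into frozen_error.
moved_markers = set(df.loc[moved, "error"])

print(exactly_one, matches_target, sorted(moved_markers), df["frozen_error"].dtype)
```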
9 changes: 6 additions & 3 deletions tests/staging/test_data_cleaning.py
@@ -104,13 +104,16 @@ def test_run_live_or_frozen(filepath):

     df = pd.read_csv(filepath / "test_run_live_or_frozen.csv")

-    df_in = df.drop(columns=["frozen"])
+    df_in = df.drop(columns=["frozen", "frozen_error"])
Collaborator

Guessing you are using a single data file to hold both the input and the expected output, which is why these columns are dropped early in the test?


     live_ouput = run_live_or_frozen(df_in, "target", "error", "live")

     frozen_output = run_live_or_frozen(df_in, "target", "error", "frozen")

-    expected_output_frozen = df_in.copy()
-    expected_output_frozen["target"] = df["frozen"]
+    expected_output_frozen = df.copy()
+
+    expected_output_frozen.drop(columns=["frozen"], inplace=True)
+    expected_output_frozen = expected_output_frozen.fillna("")

     assert_frame_equal(frozen_output, expected_output_frozen)
     assert_frame_equal(live_ouput, df_in)
Collaborator

Never noticed this before, but what happens if the first assert fails but the second passes, or vice versa? Just want to make sure the unit test fails if either one or both fail.

Collaborator Author

Checked together and it fails if either one fails.

Collaborator

Sorted this; agreed that if one fails, the unit test fails. It only appears as one unit test, though.
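
The behaviour discussed in this thread can be reproduced with a small sketch. `check_both` is a hypothetical stand-in for the body of `test_run_live_or_frozen`, and `outcome` only imitates how a test runner would classify the result; neither name comes from the PR. The frames are toy data.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def check_both(frozen_output, expected_frozen, live_output, expected_live):
    """Hypothetical stand-in for the two asserts at the end of the test."""
    assert_frame_equal(frozen_output, expected_frozen)  # raises on mismatch
    assert_frame_equal(live_output, expected_live)  # reached only if the first passes


def outcome(*frames):
    """Classify a run the way a test runner would: any AssertionError fails it."""
    try:
        check_both(*frames)
    except AssertionError:
        return "failed"
    return "passed"


good = pd.DataFrame({"target": [1, 2]})
bad = pd.DataFrame({"target": [1, 3]})

results = {
    "both match": outcome(good, good, good, good),
    "first mismatch": outcome(bad, good, good, good),
    "second mismatch": outcome(good, good, bad, good),
    "both mismatch": outcome(bad, good, bad, good),
}
print(results)
```

Only the "both match" case comes back as passed. Since the first failing assert raises immediately, a run where both comparisons are wrong reports only the first one. If separate reporting for the live and frozen cases is wanted, one option is to split them into two test functions or to parametrize over the state.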
