Make TSDataset.to_flatten faster for big datasets #848
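For context, `to_flatten` turns a wide frame (one row per timestamp, columns keyed by segment and feature) into a long frame with `timestamp` and `segment` columns. The sketch below is plain pandas/NumPy and only illustrates that conversion for a single `target` feature. The helper names `make_wide_df` and `flatten_single_feature` are hypothetical, and this is not the implementation from this PR, just one possible way to do the reshaping and time it.

```python
import time

import numpy as np
import pandas as pd


def make_wide_df(num_segments, num_periods=100, random_state=0):
    """Build a wide frame: one row per timestamp, columns keyed by (segment, feature)."""
    rng = np.random.default_rng(random_state)
    timestamps = pd.date_range("2020-01-01", periods=num_periods, freq="D", name="timestamp")
    segments = [f"segment_{i}" for i in range(num_segments)]
    columns = pd.MultiIndex.from_product([segments, ["target"]], names=["segment", "feature"])
    values = rng.normal(size=(num_periods, num_segments))
    return pd.DataFrame(values, index=timestamps, columns=columns)


def flatten_single_feature(wide):
    """Convert the wide frame to long format with columns timestamp, segment, target.

    Assumes exactly one feature ("target") per segment.
    """
    segments = wide.columns.get_level_values("segment").to_numpy()
    num_timestamps = len(wide.index)
    return pd.DataFrame(
        {
            # repeat the whole timestamp range once per segment
            "timestamp": np.tile(wide.index.to_numpy(), len(segments)),
            # repeat each segment name once per timestamp
            "segment": np.repeat(segments, num_timestamps),
            # column-major order: all values of the first segment, then the second, ...
            "target": wide.to_numpy().ravel(order="F"),
        }
    )


wide = make_wide_df(num_segments=1000)
start = time.perf_counter()
flat = flatten_single_feature(wide)
elapsed = time.perf_counter() - start
print(f"flattened {wide.shape} -> {flat.shape} in {elapsed:.4f} s")
```

Timing numbers from such a sketch depend on the machine and pandas version, so they are not comparable to the benchmark results reported in this PR.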
🚀 Deployed on https://deploy-preview-848--etna-docs.netlify.app
Code for generating the benchmark datasets:

    import numpy as np
    import pandas as pd
    from etna.datasets import generate_ar_df


    def load_dataset(
        num_segments: int,
        num_periods: int = 100,
        num_add_blocks: int = 0,
        add_object_category: bool = False,
        make_encoded_category: bool = True,
        random_state: int = 0,
    ) -> pd.DataFrame:
        rng = np.random.default_rng(random_state)
        df = generate_ar_df(
            periods=num_periods, start_time="2020-01-01", n_segments=num_segments
        )
        # each block adds one int, one float and one categorical column
        for i in range(num_add_blocks):
            # add int column
            df[f"new_int_{i}"] = rng.integers(low=-100, high=100, size=df.shape[0])
            # add float column
            df[f"new_float_{i}"] = rng.uniform(low=-100, high=100, size=df.shape[0])
            # add category column
            num_categories = num_segments // 10
            categories = list(range(num_categories))
            column_values = rng.choice(categories, size=df.shape[0])
            # in this case the encoded categories get the category dtype
            if make_encoded_category:
                df[f"new_cat_{i}_cat"] = column_values
                df[f"new_cat_{i}_cat"] = df[f"new_cat_{i}_cat"].astype("category")
            # in this case we keep them int (it can be beneficial for some methods)
            else:
                df[f"new_cat_{i}_encoded"] = column_values
        # optionally add one categorical column with string labels
        if add_object_category:
            num_categories = num_segments // 10
            categories = [str(cat) for cat in range(num_categories)]
            df["new_obj_cat"] = rng.choice(categories, size=df.shape[0])
            df["new_obj_cat"] = df["new_obj_cat"].astype("category")
        return df

Results of benchmark

There is a small mistake in the pictures below. The second plot in each image represents the situation when ...
Codecov Report
@@ Coverage Diff @@
## master #848 +/- ##
===========================================
- Coverage 84.65% 49.36% -35.30%
===========================================
Files 130 130
Lines 7411 7414 +3
===========================================
- Hits 6274 3660 -2614
- Misses 1137 3754 +2617
martins0n approved these changes on Aug 15, 2022
👍
Before submitting (must do checklist)
Proposed Changes
See #781.
Closing issues
Closes #781.
Closes #777.