Skip to content

Feature selection #875

Merged
merged 36 commits into from
Jul 11, 2023
Merged

Feature selection #875

merged 36 commits into from
Jul 11, 2023

Conversation

scanhex12
Copy link
Collaborator

@scanhex12 scanhex12 commented Aug 22, 2022

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Proposed Changes

Closing issues

#712

@github-actions
Copy link

github-actions bot commented Aug 22, 2022

🚀 Deployed on https://deploy-preview-875--etna-docs.netlify.app

@github-actions github-actions bot temporarily deployed to pull request August 22, 2022 19:57 Inactive
@codecov-commenter
Copy link

codecov-commenter commented Aug 22, 2022

Codecov Report

Merging #875 (3b37d92) into master (752e35b) will decrease coverage by 37.56%.
The diff coverage is n/a.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@             Coverage Diff             @@
##           master     #875       +/-   ##
===========================================
- Coverage   88.50%   50.94%   -37.56%     
===========================================
  Files         192      192               
  Lines       12151    12146        -5     
===========================================
- Hits        10754     6188     -4566     
- Misses       1397     5958     +4561     
Impacted Files Coverage Δ
etna/analysis/feature_relevance/relevance_table.py 30.76% <ø> (-69.24%) ⬇️
etna/transforms/feature_selection/gale_shapley.py 42.17% <ø> (-57.83%) ⬇️

... and 132 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@scanhex12 scanhex12 self-assigned this Aug 23, 2022
@github-actions github-actions bot temporarily deployed to pull request August 24, 2022 09:55 Inactive
@alex-hse-repository alex-hse-repository self-requested a review August 25, 2022 03:37
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@Mr-Geekman Mr-Geekman marked this pull request as ready for review July 5, 2023 08:30
@github-actions github-actions bot temporarily deployed to pull request July 5, 2023 08:36 Inactive
@github-actions github-actions bot temporarily deployed to pull request July 5, 2023 08:47 Inactive
@review-notebook-app
Copy link

review-notebook-app bot commented Jul 5, 2023

View / edit / reply to this conversation on ReviewNB

alex-hse-repository commented on 2023-07-05T13:18:13Z
----------------------------------------------------------------

  1. Only if we need them for inverse transformation
  2. I guess so, there should be some kind of rounding [see](https://github.com/tinkoff-ai/etna/blob/4fc17d0942811fa619dc51a9bfc209bb25162e5c/etna/transforms/feature_selection/gale_shapley.py#L296)

@review-notebook-app
Copy link

review-notebook-app bot commented Jul 5, 2023

View / edit / reply to this conversation on ReviewNB

alex-hse-repository commented on 2023-07-05T13:18:14Z
----------------------------------------------------------------

The original horizon is actually 365


alex-hse-repository commented on 2023-07-05T13:18:59Z
----------------------------------------------------------------

But we can leave this version as well

@review-notebook-app
Copy link

review-notebook-app bot commented Jul 5, 2023

View / edit / reply to this conversation on ReviewNB

alex-hse-repository commented on 2023-07-05T13:18:15Z
----------------------------------------------------------------

May be

len(temp_ts.column.get_level_value("feature").unique())

@review-notebook-app
Copy link

review-notebook-app bot commented Jul 5, 2023

View / edit / reply to this conversation on ReviewNB

alex-hse-repository commented on 2023-07-05T13:18:16Z
----------------------------------------------------------------

Here we need to emphasis that return_features is only needed if they are nessesary for inverse_transform


@review-notebook-app
Copy link

review-notebook-app bot commented Jul 5, 2023

View / edit / reply to this conversation on ReviewNB

alex-hse-repository commented on 2023-07-05T13:18:16Z
----------------------------------------------------------------

It is actually more then 10% better


@review-notebook-app
Copy link

review-notebook-app bot commented Jul 5, 2023

View / edit / reply to this conversation on ReviewNB

alex-hse-repository commented on 2023-07-05T13:18:17Z
----------------------------------------------------------------

Lets limit the number of features


@review-notebook-app
Copy link

review-notebook-app bot commented Jul 5, 2023

View / edit / reply to this conversation on ReviewNB

alex-hse-repository commented on 2023-07-05T13:18:19Z
----------------------------------------------------------------

Too complicated)

Try to find the medium article about the mrmr and copy the picture with formula from there


@review-notebook-app
Copy link

review-notebook-app bot commented Jul 5, 2023

View / edit / reply to this conversation on ReviewNB

alex-hse-repository commented on 2023-07-05T13:18:20Z
----------------------------------------------------------------

Try to use ModelRelevanceTable, may be results will be slightly better(at least it works faster). Looks like the New Year grounds the metrics.


alex-hse-repository commented on 2023-07-10T08:12:44Z
----------------------------------------------------------------

May you check the results with fast_redundancy=False

alex-hse-repository commented on 2023-07-10T08:15:55Z
----------------------------------------------------------------

If ModelRelevanceTable works worse than StatisticsRelevanceTable we can actually leave statistics, here we want to show that feature selection improves quality or improves speed without significance decrease in quality

Mr-Geekman commented on 2023-07-10T08:26:53Z
----------------------------------------------------------------

For MRMR both ModelRelevanceTable and StatisticsRelevanceTable works worse than a baseline. I'll write all the results in the next message to compare.

Mr-Geekman commented on 2023-07-10T16:23:47Z
----------------------------------------------------------------

These are results on my personal laptop.

SMAPE baseline: 8.094
Time baseline: 79.353

ModelRelevanceTable + fast_redundancy=True

- SMAPE: 7.416

- Time: 43.722

StatisticsRelevanceTable + fast_redundancy=True

- SMAPE: 13.262

- Time: 42.955

ModelRelevanceTable + fast_redundancy=False

- SMAPE: 7.197

- Time: 143.238

StatisticsRelevanceTable + fast_redundancy=False

- SMAPE: 13.262

- Time: 144.947

It seems like bug fix also changed the results for MRMR.

alex-hse-repository commented on 2023-07-11T10:59:08Z
----------------------------------------------------------------

I guess we can take ModelRelevanceTable + fast_redundancy=True

Copy link
Collaborator

But we can leave this version as well


View entire conversation on ReviewNB

Copy link
Collaborator

May you check the results with fast_redundancy=False


View entire conversation on ReviewNB

Copy link
Collaborator

If ModelRelevanceTable works worse than StatisticsRelevanceTable we can actually leave statistics, here we want to show that feature selection improves quality or improves speed without significance decrease in quality


View entire conversation on ReviewNB

Copy link
Contributor

For MRMR both ModelRelevanceTable and StatisticsRelevanceTable works worse than a baseline. I'll write all the results in the next message to compare.


View entire conversation on ReviewNB

Copy link
Contributor

These are results on my personal laptop.

SMAPE baseline: 8.094
Time baseline: 79.353

ModelRelevanceTable + fast_redundancy=True

- SMAPE: 7.416

- Time: 43.722

StatisticsRelevanceTable + fast_redundancy=True

- SMAPE: 13.262

- Time: 42.955

ModelRelevanceTable + fast_redundancy=False

- SMAPE: 7.197

- Time: 143.238

StatisticsRelevanceTable + fast_redundancy=False

- SMAPE: 13.262

- Time: 144.947

It seems like bug fix also changed the results for MRMR.


View entire conversation on ReviewNB

Copy link
Collaborator

I guess we can take ModelRelevanceTable + fast_redundancy=True


View entire conversation on ReviewNB

@github-actions github-actions bot temporarily deployed to pull request July 11, 2023 13:33 Inactive
@Mr-Geekman Mr-Geekman merged commit 766d10b into master Jul 11, 2023
@Mr-Geekman Mr-Geekman deleted the feature_selection branch July 11, 2023 16:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants