get_rejected_variables missing after release of v2.4.0 #315

Closed
neomatrix369 opened this issue Jan 11, 2020 · 4 comments
Labels
bug 🐛 Something isn't working feature request 💬 Requests for new features


neomatrix369 commented Jan 11, 2020

Describe the bug

My old code, which worked on versions prior to v2.4.0, no longer runs because it uses parameters and methods that were removed or deprecated in this version.

One of these is access to the list of rejected variables, which is invaluable for data processing and other reasoning tasks.

Neither the release notes nor the docs mention that these were removed, or whether there are alternative ways to accomplish the same thing.

How would we do this in the new version? Process the report as a JSON file and then access the rejected variables from there? Any other methods?

The docs are good, but it would be great to see a format like https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html, which lists all parameters, ideally followed by examples.

To Reproduce

Version being used: v2.4.0 (worked in the previous version)

Try running the following with the new version of pandas-profiling:

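    import pandas_profiling  # noqa: F401 (registers DataFrame.profile_report)

    # `dataset` is a pandas DataFrame and `title` a string, both defined earlier
    profile = dataset.profile_report(
        title=title,
        # If 'bayesian_blocks_bins': True fails to provide a valid number
        # of bins, use this config instead:
        #   {'bins': 8, 'bayesian_blocks_bins': False}
        plot={'histogram': {'bayesian_blocks_bins': True}},
        style={'full_width': True},
        minify_html=True,
    )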

Then also try this:

    import os

    # `profile` comes from the call above; `output_file` is the report's output path
    filename, fileext = os.path.splitext(output_file)
    rejected_variables_filename = f'{filename}-rejected-variables.txt'
    rejected_variables = profile.get_rejected_variables(threshold=0.9)
    print("Rejected variables: ", rejected_variables)

Both fail at runtime with errors!

neomatrix369 added the bug 🐛 label Jan 11, 2020
sbrugman added the feature request 💬 label Jan 11, 2020
Collaborator

sbrugman commented Jan 11, 2020

Hi Mani,

There are three different observations in this one issue: rejected variables are missing, the documentation could be more extensive, and there is a bug in Bayesian blocks. It would be good to split them into separate issues (a feature request for the documentation and a bug report for the Bayesian blocks; please make sure to include a dataset). We can then dedicate this issue to the rejected variables.

You make a good point that the removal of get_rejected_variables was not mentioned. There were a few reasons to remove it (temporarily):

The v2.4.0 release fixes these issues. The same functionality can now be obtained using:

    profile.get_description()['messages']

In v2.4.1 I'll add an improved mapping from the description set to the rejected variables (i.e. get_rejected_variables).
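Until that mapping lands, the rejected variables could be pulled out of the messages by hand. A minimal sketch, assuming each entry in `profile.get_description()['messages']` exposes a `message_type` enum member and a `column_name` attribute, and that rejected variables are flagged with a `REJECTED` type. The `Message` and `MessageType` classes below are simplified, hypothetical stand-ins (not the library's definitions) so the sketch runs without pandas-profiling installed.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, List


class MessageType(Enum):
    """Hypothetical stand-in for the library's message-type enum."""

    CONST = auto()
    HIGH_CARDINALITY = auto()
    REJECTED = auto()


@dataclass
class Message:
    """Hypothetical stand-in: only the two attributes this sketch relies on."""

    message_type: MessageType
    column_name: str


def rejected_variables(messages: Iterable[Message]) -> List[str]:
    """Return the names of columns flagged REJECTED, in first-seen order, without duplicates."""
    names: List[str] = []
    for message in messages:
        if message.message_type is MessageType.REJECTED and message.column_name not in names:
            names.append(message.column_name)
    return names


if __name__ == "__main__":
    demo = [
        Message(MessageType.CONST, "country"),
        Message(MessageType.REJECTED, "country"),
        Message(MessageType.HIGH_CARDINALITY, "user_id"),
        Message(MessageType.REJECTED, "price_eur"),
    ]
    print(rejected_variables(demo))  # ['country', 'price_eur']
```

With a real report this would be called as `rejected_variables(profile.get_description()['messages'])`, provided the message objects really carry the assumed attributes.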

sbrugman changed the title from "Potential regression after release of v2.4.0" to "get_rejected_variables missing after release of v2.4.0" Jan 11, 2020
Author

neomatrix369 commented Jan 13, 2020

Hi Simon, I'll try to split these up, but I'll also quickly answer some of the points here.

  • I think the Bayesian blocks bug was already reported earlier. The note:

        # if 'bayesian_blocks_bins': True fails
        # to provide a valid number of bins use the below config:
        #   {'bins': 8, 'bayesian_blocks_bins': False}

comes either from your config or from earlier issues/discussions (see #222, #227, #245, #293). So there may be nothing new to report; I just added that note for myself so that I use the right config.
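The workaround in that note can be sketched as a retry: build the report with Bayesian-blocks histogram bins first, and fall back to 8 fixed bins if that raises. This is a hedged sketch: `build_report` is a hypothetical callable standing in for `dataset.profile_report`, and which exception signals "no valid number of bins" is an assumption, so only `ValueError` is caught by default.

```python
from typing import Any, Callable, Dict, Tuple, Type

# The two histogram configurations from the note above.
BAYESIAN_PLOT: Dict[str, Any] = {"histogram": {"bayesian_blocks_bins": True}}
FIXED_BINS_PLOT: Dict[str, Any] = {"histogram": {"bins": 8, "bayesian_blocks_bins": False}}


def profile_with_fallback(
    build_report: Callable[..., Any],
    errors: Tuple[Type[BaseException], ...] = (ValueError,),
    **kwargs: Any,
) -> Any:
    """Build a report with Bayesian-blocks bins, retrying with 8 fixed bins on failure.

    `build_report` is a hypothetical stand-in for `dataset.profile_report`;
    the exception types raised on bad binning are an assumption, so callers
    can widen `errors` if needed.
    """
    try:
        return build_report(plot=BAYESIAN_PLOT, **kwargs)
    except errors:
        return build_report(plot=FIXED_BINS_PLOT, **kwargs)
```

Usage would look like `profile = profile_with_fallback(dataset.profile_report, title=title)`. Keeping the exception list explicit avoids silently masking unrelated errors.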

Author

neomatrix369 commented Jan 13, 2020

I'll request docs for the get_rejected_variables aspect.

And also for the removal of these parameters, which worked in the previous version:

    style={'full_width': True}, minify_html=True

Author

neomatrix369 commented Jan 13, 2020

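Regarding reinstating get_rejected_variables: it's better to keep the same name and make the call you suggested internally, so +1 there:

def get_rejected_variables(self, threshold=0.5):
    # sketch of a ProfileReport method; `threshold` is accepted for backwards
    # compatibility but unused, and every message is returned, not only rejections
    return self.get_description()['messages']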

(a contrived example)
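In the original reproduction, `threshold=0.9` presumably referred to a correlation cutoff (older releases rejected variables that were highly correlated with another variable). Assuming that meaning, a pure-Python sketch of a threshold-based rejection: walk the columns in order and reject any column whose absolute Pearson correlation with an already-kept column exceeds `threshold`. The function names and the greedy keep-first rule are illustrative choices, not the library's algorithm.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treat it as 0
    return cov / math.sqrt(var_x * var_y)


def rejected_by_correlation(
    columns: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[str]:
    """Keep columns in order; reject any whose |r| with a kept column exceeds threshold."""
    kept: List[str] = []
    rejected: List[str] = []
    for name, values in columns.items():
        if any(abs(pearson(values, columns[k])) > threshold for k in kept):
            rejected.append(name)
        else:
            kept.append(name)
    return rejected


if __name__ == "__main__":
    data = {
        "price_usd": [1.0, 2.0, 3.0, 4.0],
        "price_eur": [0.9, 1.8, 2.7, 3.6],  # perfectly correlated with price_usd
        "rating": [5.0, 3.0, 4.0, 1.0],     # |r| with price_usd is about 0.83
    }
    print(rejected_by_correlation(data, threshold=0.9))  # ['price_eur']
```

Because the scan is greedy, which member of a correlated pair gets rejected depends on column order; the first one seen is kept.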

sbrugman added a commit that referenced this issue Jan 13, 2020
…ibility and temporary solution for #319)

Include __repr__ on message class (#315).
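A `__repr__` on the message class makes the list returned by `profile.get_description()['messages']` readable when printed. The class below is a simplified, hypothetical message type (its attributes and output format are assumptions), illustrating the idea rather than reproducing the commit's code.

```python
from enum import Enum


class MessageType(Enum):
    """Hypothetical subset of message types."""

    CONST = 1
    REJECTED = 2


class Message:
    """Simplified, hypothetical message: one warning about one column."""

    def __init__(self, message_type: MessageType, column_name: str) -> None:
        self.message_type = message_type
        self.column_name = column_name

    def __repr__(self) -> str:
        # Without __repr__, printing a list of messages shows only default
        # object representations such as <__main__.Message object at 0x...>.
        return f"{type(self).__name__}({self.message_type.name}, column_name={self.column_name!r})"


if __name__ == "__main__":
    print([Message(MessageType.REJECTED, "price_eur")])
    # [Message(REJECTED, column_name='price_eur')]
```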
sbrugman added a commit that referenced this issue Jan 14, 2020
sbrugman mentioned this issue Jan 14, 2020
chanedwin pushed a commit to chanedwin/pandas-profiling that referenced this issue Oct 11, 2020
…ibility and temporary solution for ydataai#319)

Include __repr__ on message class (ydataai#315).
chanedwin pushed a commit to chanedwin/pandas-profiling that referenced this issue Oct 11, 2020

2 participants