a bug of ValueError happened when using TableOne(data) #62

Closed
Yuyoo opened this issue May 23, 2018 · 8 comments

Comments

@Yuyoo

Yuyoo commented May 23, 2018

Hi Tom and Alistair. Long time no see since the Datathon in Beijing in 2017. How have you been?
I found a bug in tableone.py in the latest version, 0.5.6.
Because truth-value testing behaves differently in py2 and py3, there is a bug at line 96 of tableone.py. It causes an error when calling TableOne(data).
At line 96, "data[columns].columns.get_duplicates()" returns "Index([], dtype='object')". In py3, Index([], dtype='object') cannot be evaluated as False, so a ValueError is raised: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I have tested it in py2, and it works well there.
I suggest fixing it by changing "data[columns].columns.get_duplicates()" to "data[columns].columns.get_duplicates().values.size", or you could solve it another way.
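
For reference, a minimal sketch of the behaviour described above, assuming a pandas version in which truth-testing an Index raises ValueError. The variable names are illustrative and are not taken from tableone.py.

```python
import pandas as pd

# An empty Index, like the value reported for the no-duplicates case
dups = pd.Index([], dtype="object")

try:
    if dups:  # truth-testing an Index is ambiguous, so pandas raises
        print("duplicates found")
    outcome = "no error"
except ValueError as err:
    outcome = "ValueError"
    print(err)

# .empty gives an unambiguous boolean instead
no_duplicates = dups.empty
```

Checking `.empty` (or the length) avoids relying on how the object behaves in a boolean context.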

@tompollard
Owner

hi @Yuyoo, thanks for highlighting this issue. Please could you provide code to reproduce the problem? In Python 3, the following code returns the expected "duplicate columns" error for me:

import pandas as pd
from tableone import TableOne

# load sample data into a pandas dataframe
url = "https://raw.githubusercontent.com/tompollard/tableone/master/data/pn2012_demo.csv"
data = pd.read_csv(url)

# create duplicate columns by renaming several columns to the same name
data = data.rename(index=str, columns={"MechVent": "Height", "Weight": "Height",
                                       "SysABP": "Age", "ICU": "Age"})

# create table (expected to fail because of the duplicate columns)
overall_table = TableOne(data)

raises the expected error:

---------------------------------------------------------------------------
InputError                                Traceback (most recent call last)
<ipython-input-8-332ab7cb68f8> in <module>()
      1 # create an instance of TableOne with the input arguments
      2 # firstly, with no grouping variable
----> 3 overall_table = TableOne(data)

~/projects/tableone/tableone.py in __init__(self, data, columns, categorical, groupby, nonnormal, pval, pval_adjust, isnull, ddof, labels, sort, limit, remarks)
     96         dups = data[columns].columns.get_duplicates()
     97         if dups:
---> 98             raise InputError('Input contains duplicate columns: {}'.format(dups))
     99 
    100         # if categorical not specified, try to identify categorical

InputError: Input contains duplicate columns: ['Age', 'Height']

Your suggested fix returns an error:

columns = data.columns.get_values()

data[columns].columns.get_duplicates().values.size

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-adf1753aef69> in <module>()
----> 1 data[columns].columns.get_duplicates().values.size

AttributeError: 'list' object has no attribute 'values'

@Yuyoo
Author

Yuyoo commented May 23, 2018

The bug happened as:
TableOne(data)
C:\Users\Yuyoo\Anaconda3\lib\site-packages\tableone.py:96: FutureWarning: 'get_duplicates' is deprecated and will be removed in a future release. You can use idx[idx.duplicated()].unique() instead
  dups = data[columns].columns.get_duplicates()
Traceback (most recent call last):
  File "D:/Tianchi/meinian2/code/table1_test.py", line 8, in <module>
    print(TableOne(data))
  File "C:\Users\Yuyoo\Anaconda3\lib\site-packages\tableone.py", line 97, in __init__
    if dups:
  File "C:\Users\Yuyoo\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2002, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Sorry, I didn't test my suggestion in py2. In py2, data[columns].columns.get_duplicates() returns a list, and a 'list' object has no attribute 'values'. I think you could change it to "len(data[columns].columns.get_duplicates())" instead, which works in both py2 and py3.
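
A small sketch of why a length check is the more portable option: `len()` is defined for both a plain list and a pandas Index, so the check does not depend on which type comes back. The helper name `has_entries` is hypothetical and only used for illustration.

```python
import pandas as pd


def has_entries(dups):
    """Return True if dups (a list or a pandas Index) contains any items."""
    return len(dups) > 0


# Same check, regardless of the container type
list_result = has_entries([])
index_result = has_entries(pd.Index([], dtype="object"))
nonempty_result = has_entries(pd.Index(["Age", "Height"]))
```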

@tompollard
Owner

Okay, got it, thanks @Yuyoo. I now get this error after upgrading to pandas '0.23.0' (from '0.22.0').

@Yuyoo
Author

Yuyoo commented May 23, 2018

Yeah, pandas is upgraded by default when you run pip install --upgrade tableone. I didn't get the error with the old version of tableone.

@tompollard
Owner

Yeah, bad timing because we just published a paper about the package! We'll get the issues fixed as soon as possible. This particular bug is fixed with:

        # check for duplicate columns
        dups = data[columns].columns[data[columns].columns.duplicated()].unique()
        if not dups.empty:
            raise InputError('Input contains duplicate columns: {}'.format(dups))

We'll work on the other issues shortly. Thanks again for raising this :)
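
The same duplicate check can be tried outside the package. This is a standalone sketch, not the code in tableone.py: the function name `find_duplicate_columns` and the sample frame are assumptions made for illustration.

```python
import pandas as pd


def find_duplicate_columns(df):
    """Return the column names that appear more than once, each listed once."""
    cols = df.columns
    # duplicated() marks the second and later occurrences of each name
    return cols[cols.duplicated()].unique().tolist()


# Illustrative frame with "Age" used twice as a column name
df = pd.DataFrame([[1, 2, 3, 4]], columns=["Age", "Height", "Age", "Weight"])
dups = find_duplicate_columns(df)

# A frame with unique column names yields an empty list
clean = find_duplicate_columns(pd.DataFrame({"a": [1], "b": [2]}))
```

Returning a plain list keeps the result safe to use in an `if` test, since an empty list is simply falsy.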

@tompollard tompollard reopened this May 23, 2018
@tompollard tompollard mentioned this issue May 23, 2018
@Yuyoo
Author

Yuyoo commented May 23, 2018

Haha, it's no problem, everything will be OK. You have done a good job; the package makes it convenient for us to do research. Best wishes to you!

@tompollard
Owner

tompollard commented May 23, 2018

The following line also raises an error in Pandas 0.23:

grouped_data = pd.crosstab(data[self._groupby],data[v])

ValueError: Duplicated level name: "death", 
assigned to level 1, is already used for level 0.

The error is raised when the _groupby column matches v (in the case above, groupby='death' and v='death')

Odd, because it looks like this was fixed as a bug in Pandas at some point in the past:
pandas-dev/pandas#13279
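
One possible way to sidestep the name collision is to give the row and column levels distinct names through the `rownames` and `colnames` arguments of `pd.crosstab`. This is only a sketch: the labels `death_group` and `death_value` are made up, the data is a toy example, and it has not been checked against Pandas 0.23.0 specifically, nor is it necessarily the fix used in the package.

```python
import pandas as pd

# Toy data where the grouping column and the variable are the same column
df = pd.DataFrame({"death": [0, 1, 1, 0, 1]})

# Distinct level names avoid reusing "death" for both the index and the columns
table = pd.crosstab(
    df["death"],
    df["death"],
    rownames=["death_group"],   # hypothetical label for the row level
    colnames=["death_value"],   # hypothetical label for the column level
)
```

Because the variable is cross-tabulated against itself, all counts fall on the diagonal of the resulting table.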

@tompollard
Owner

Fixed in version 0.5.7. Thanks again :)
