bool category not properly parsed #27

Open
sjanssen2 opened this issue Jan 9, 2023 · 4 comments

Comments

@sjanssen2

Assume I have a metadata category like infection with values TRUE or FALSE. If I load the data as in your example, metadata = pd.read_table("data/metadata.tsv", sep="\t", index_col=0), the column has dtype object and holds proper boolean values, i.e. True and False. If I add dtype=str, the values still have dtype object but are strings, namely 'TRUE' and 'FALSE'.
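
A minimal sketch to reproduce the two parsing behaviours described above, assuming a small in-memory table in place of data/metadata.tsv (the sample IDs and values are made up for illustration). The exact dtype pandas reports can depend on the pandas version and on whether the column contains missing values, so the sketch only prints it.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/metadata.tsv (illustrative values only).
tsv = "sample\tinfection\ns1\tTRUE\ns2\tFALSE\ns3\tTRUE\n"

# Default parsing: pandas converts TRUE/FALSE into Python booleans.
default = pd.read_table(io.StringIO(tsv), sep="\t", index_col=0)

# With dtype=str the raw text is kept, so the values stay 'TRUE'/'FALSE'.
as_str = pd.read_table(io.StringIO(tsv), sep="\t", index_col=0, dtype=str)

print(default["infection"].dtype, default["infection"].tolist())
print(as_str["infection"].dtype, as_str["infection"].tolist())
```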

Only the dtype=str way works for me. Otherwise evident throws the error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3628             try:
-> 3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:

~/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

~/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'infection'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_1855806/1849832882.py in <module>
      1 for cat in ["birth_timestamp","cage","genotype","infection"]:
----> 2     print(adh.calculate_effect_size(column=cat))

~/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/evident/data_handler.py in calculate_effect_size(self, column, difference)
    112         :rtype: evident.results.EffectSizeResult
    113         """
--> 114         if self.metadata[column].dtype != np.dtype("object"):
    115             raise exc.NonCategoricalColumnError(self.metadata[column])
    116 

~/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3503             if self.columns.nlevels > 1:
   3504                 return self._getitem_multilevel(key)
-> 3505             indexer = self.columns.get_loc(key)
   3506             if is_integer(indexer):
   3507                 indexer = [indexer]

~/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:
-> 3631                 raise KeyError(key) from err
   3632             except TypeError:
   3633                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'infection'

You might want to return a more explicit error message in those cases.
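
One way such a message could look is sketched below. This is not evident's code: get_column_or_explain is a hypothetical helper, and the wording of the message is only a suggestion. It assumes the column is missing because it was filtered out earlier, which is what the bare KeyError hides.

```python
import pandas as pd


def get_column_or_explain(metadata: pd.DataFrame, column: str) -> pd.Series:
    """Hypothetical helper: look up a column, failing with a descriptive message."""
    if column not in metadata.columns:
        available = ", ".join(map(str, metadata.columns))
        raise KeyError(
            f"Column '{column}' is not available. It may have been dropped "
            f"because of its dtype or its number of levels. "
            f"Available columns: {available}"
        )
    return metadata[column]


# Usage with an illustrative table that lacks the 'infection' column.
metadata = pd.DataFrame({"cage": ["a", "b"], "genotype": ["wt", "ko"]})
try:
    get_column_or_explain(metadata, "infection")
except KeyError as err:
    print(err)
```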

@sjanssen2
Author

The same seems to be the case for dates.

@sjanssen2
Author

The same issue might also affect the Bokeh server:

(qiime2-2022.8) t490s x86_64 /media/jlu/vol/jlab/MicrobiomeAnalyses/Projects/Pandyra_LCMV>bokeh serve --show app
2023-01-09 17:21:02,339 Starting Bokeh server version 2.4.3 (running on Tornado 6.2)
2023-01-09 17:21:02,342 User authentication hooks NOT provided (default user enabled)
2023-01-09 17:21:02,349 Bokeh app running at: http://localhost:5006/app
2023-01-09 17:21:02,350 Starting Bokeh server with process id: 25929
/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/evident/data_handler.py:72: UserWarning: Some categories have been dropped because they had either only one level or too many. Use the max_levels_per_category argument to modify this threshold.
Dropped columns: ['birth_timestamp', 'host_age', 'infection', 'mouse_number']
  warn(
2023-01-09 17:21:04,054 Error running application handler <bokeh.application.handlers.directory.DirectoryHandler object at 0x7fdca1155dc0>: 'infection'
File 'base.py', line 3631, in get_loc:
raise KeyError(key) from err Traceback (most recent call last):
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3629, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'infection'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/bokeh/application/handlers/code_runner.py", line 231, in run
    exec(self._code, module.__dict__)
  File "/media/jlu/vol/jlab/MicrobiomeAnalyses/Projects/Pandyra_LCMV/app/main.py", line 48, in <module>
    effect_size_by_category(dh, binary_cols)
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/evident/effect_size.py", line 49, in effect_size_by_category
    results = Parallel(n_jobs=n_jobs, **parallel_args)(
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/parallel.py", line 1046, in __call__
    while self.dispatch_one_batch(iterator):
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/parallel.py", line 779, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/evident/data_handler.py", line 114, in calculate_effect_size
    if self.metadata[column].dtype != np.dtype("object"):
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/sjanssen/miniconda3/envs/qiime2-2022.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3631, in get_loc
    raise KeyError(key) from err
KeyError: 'infection'
 
2023-01-09 17:21:04,487 WebSocket connection opened
2023-01-09 17:21:04,487 ServerConnection created
^C
Interrupted, shutting down

@gibsramen
Collaborator

Thanks for bringing this up. I'm not sure how to handle dates, but for booleans I think we can just allow bool dtype columns.
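
A rough sketch of what allowing bool dtype columns could mean for a dtype check like the one in the traceback. This is an assumption about the shape of the fix, not the actual change in the fix-bool branch; is_categorical_like is a hypothetical name.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_object_dtype


def is_categorical_like(series: pd.Series) -> bool:
    """Hypothetical check: accept object columns and, additionally, bool columns."""
    return is_object_dtype(series) or is_bool_dtype(series)


# Illustrative columns: booleans and strings pass, floats do not.
infection = pd.Series([True, False, True])
genotype = pd.Series(["wt", "ko", "wt"], dtype=object)
host_age = pd.Series([1.5, 2.0, 3.25])

for name, col in [("infection", infection), ("genotype", genotype), ("host_age", host_age)]:
    print(name, is_categorical_like(col))
```

Dates are left out of this sketch on purpose, since the thread does not settle how they should be handled.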

gibsramen added a commit to gibsramen/evident that referenced this issue Jan 10, 2023
@gibsramen
Collaborator

@sjanssen2 Can you try out this change and see if it resolves your boolean issue?

https://github.com/gibsramen/evident/tree/fix-bool
