
Column names should not be like count, percent, or sum, as ydata-profiling might use these names. If such names are used, ambiguity is created. #1476

Open
frelion opened this issue Oct 11, 2023 · 3 comments
Labels
information requested ❔ Cannot reproduce, waiting for minimum reproduction details.

Comments

frelion commented Oct 11, 2023

Column names should not be like count, percent, or sum, as ydata-profiling might use these names. If such names are used, ambiguity is created.

My issue was resolved when I renamed the column from 'count' to 'patient_count'.

Originally posted by @waghts95 in #1402 (comment)

Is renaming the column the only way to solve this problem?

The error occurs in Spark mode, while the pandas version runs normally.
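Since renaming the column is what resolved the quoted case, a pre-flight check can flag such names before profiling starts. This is a minimal sketch, assuming the risky names are exactly the three quoted above (`count`, `percent`, `sum`); the real set of names ydata-profiling uses internally is not confirmed in this thread, and `find_risky_columns` is a hypothetical helper, not part of the library.

```python
from typing import FrozenSet, Iterable, List

# Names quoted in this thread as clashing with ydata-profiling's Spark backend.
# Assumption: taken from the thread, not a list published by the library.
RISKY_NAMES: FrozenSet[str] = frozenset({"count", "percent", "sum"})


def find_risky_columns(
    columns: Iterable[str], risky: FrozenSet[str] = RISKY_NAMES
) -> List[str]:
    """Return the column names that match a risky name.

    The comparison is case-insensitive because Spark resolves column
    names case-insensitively by default (spark.sql.caseSensitive=false).
    """
    return [name for name in columns if name.lower() in risky]
```

For a Spark DataFrame with columns `["catg", "count"]`, passing `df.columns` to this check would flag `count` as the column to rename before calling `ProfileReport`.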

fabclmnt (Contributor) commented Dec 4, 2023

Hi @frelion,

I'm not sure I understand the question. Can you please provide more details?

fabclmnt added the "information requested ❔" label (Cannot reproduce, waiting for minimum reproduction details.) and removed the needs-triage label on Dec 4, 2023
frelion (Author) commented Dec 9, 2023

You can try running the code below:

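```python
from pyspark import SparkConf
from pyspark.sql import SparkSession
from ydata_profiling import ProfileReport

# Local Spark session with two worker threads
SPARK_CONF = (
    SparkConf()
    .setMaster("local[2]")
    .setAppName("test_ydata_profiling")
)
spark = SparkSession.builder.config(conf=SPARK_CONF).getOrCreate()

# A Spark DataFrame with a column literally named "count"
df = spark.createDataFrame([("A", 1), ("B", 2), ("C", 3)], ["catg", "count"])

# In a notebook, displaying the report (via _repr_html_) triggers profiling,
# which raises the AnalysisException shown below.
# Outside a notebook, call e.g. .to_file("report.html") to trigger it.
ProfileReport(df)
```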

Error output:

```
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
File ~/anaconda3/envs/python3.8/lib/python3.8/site-packages/IPython/core/formatters.py:344, in BaseFormatter.__call__(self, obj)
    342     method = get_real_method(obj, self.print_method)
    343     if method is not None:
--> 344         return method()
    345     return None
    346 else:

File ~/anaconda3/envs/python3.8/lib/python3.8/site-packages/typeguard/__init__.py:1033, in typechecked.<locals>.wrapper(*args, **kwargs)
   1031 memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032 check_argument_types(memo)
-> 1033 retval = func(*args, **kwargs)
   1034 try:
   1035     check_return_type(retval, memo)

File ~/anaconda3/envs/python3.8/lib/python3.8/site-packages/ydata_profiling/profile_report.py:511, in ProfileReport._repr_html_(self)
    509 def _repr_html_(self) -> None:
    510     """The ipython notebook widgets user interface gets called by the jupyter notebook."""
--> 511     self.to_notebook_iframe()

File ~/anaconda3/envs/python3.8/lib/python3.8/site-packages/typeguard/__init__.py:1033, in typechecked.<locals>.wrapper(*args, **kwargs)
   1031 memo = _CallMemo(python_func, _localns, args=args, kwargs=kwargs)
   1032 check_argument_types(memo)
-> 1033 retval = func(*args, **kwargs)
...
--> 175     raise converted from None
    176 else:
    177     raise

AnalysisException: [AMBIGUOUS_REFERENCE] Reference `count` is ambiguous, could be: [`count`, `count`].
```

frelion (Author) commented Dec 9, 2023

@fabclmnt
My workaround is to rename all columns before the analysis and rename them back once it has completed. What do you think?
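The rename-and-restore workaround described above can be expressed as a pair of mappings, one applied before profiling and its inverse applied afterwards. This is a minimal sketch, assuming every column gets a fixed prefix; `build_rename_maps` and the `col_` prefix are hypothetical and not part of ydata-profiling, and it is assumed (not confirmed) that prefixed names never match the names the library uses internally.

```python
from typing import Dict, List, Tuple

# Hypothetical prefix: assumed not to produce names that ydata-profiling
# uses internally (e.g. "col_count" can never equal "count").
DEFAULT_PREFIX = "col_"


def build_rename_maps(
    columns: List[str], prefix: str = DEFAULT_PREFIX
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (forward, backward) maps that add and remove a prefix on every column."""
    if len(set(columns)) != len(columns):
        # Duplicate names cannot be mapped back unambiguously.
        raise ValueError("column names must be unique to rename them reversibly")
    forward = {name: prefix + name for name in columns}
    backward = {new: old for old, new in forward.items()}
    return forward, backward
```

With PySpark, the forward map could be applied with `df.toDF(*[forward[c] for c in df.columns])` before building the report, and the backward map the same way on any DataFrame that still carries the prefixed names afterwards. Renaming every column, rather than only the risky ones, avoids having to know the library's internal names in advance.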

Development

No branches or pull requests

3 participants