Thank you for reporting the issue. Regarding ydata-profiling for Spark, we have so far released only an initial version that covers a limited set of functionality and has some known issues.
We are looking for contributors willing to keep evolving the Spark integration, as it was started by the community. If you're open to it, feel free to check the issues labelled with the spark tag.
Current Behaviour
Spark DataFrame structure:
Code:
Here is the distribution the report produced for playtime_sec_total:
Now I converted this DataFrame to a pandas DataFrame, and here is what the data actually looks like:
So, my conclusion is that the profiling is completely broken for this type of field, and I don't trust it any more.
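For reference, a minimal sketch of the kind of pandas-side comparison described above (the values and exact steps here are assumptions; only the column name playtime_sec_total comes from the report):

```python
# Minimal sketch of the pandas-side check (illustrative values only).
from decimal import Decimal

import pandas as pd
from ydata_profiling import ProfileReport

# toPandas() on a Spark DecimalType column yields Python Decimal objects
# (object dtype), so cast to float before profiling with the pandas backend.
pdf = pd.DataFrame(
    {"playtime_sec_total": [Decimal("120.50"), Decimal("3600.00"), Decimal("45.25")]}
)
pdf["playtime_sec_total"] = pdf["playtime_sec_total"].astype(float)

# The pandas-backed report shows the expected distribution for these values.
ProfileReport(pdf, title="pandas cross-check").to_file("pandas_profile.html")
```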
Expected Behaviour
The handling of decimal fields needs to be fixed so that the reported distribution matches the actual data.
Data Description
see above
Code that reproduces the bug
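The original snippet is only available as a screenshot, so below is a hypothetical minimal reproduction along the same lines (the schema, values, and any column names other than playtime_sec_total are assumptions):

```python
# Hypothetical minimal reproduction (the real schema/data are in the screenshots above).
from decimal import Decimal

from pyspark.sql import SparkSession
from pyspark.sql.types import DecimalType, StringType, StructField, StructType
from ydata_profiling import ProfileReport

spark = SparkSession.builder.master("local[*]").appName("decimal-repro").getOrCreate()

schema = StructType([
    StructField("user_id", StringType(), True),                   # assumed identifier column
    StructField("playtime_sec_total", DecimalType(18, 2), True),  # decimal column from the report
])

df = spark.createDataFrame(
    [("a", Decimal("120.50")), ("b", Decimal("3600.00")), ("c", Decimal("45.25"))],
    schema,
)

# Profiling the Spark DataFrame directly: per the issue, the distribution reported
# for the decimal column does not match the underlying values.
ProfileReport(df, title="Spark decimal repro").to_file("spark_decimal_profile.html")
```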
pandas-profiling version
ydata-profiling==4.8.3
Dependencies
OS
Ubuntu 22.04
Checklist