dataframe count and size groupby aggregations #2831
Pandas Example

    import arkouda as ak
    import numpy as np

    ivalues = ak.array([4, 1, 3, 2, 2, 2, 5, 5, 2, 3])
    ak_df = ak.DataFrame({"nums": ivalues})
    pd_df = ak_df.to_pandas()

    ak_count = ak_df.groupby("nums").count()
    pd_count = pd_df.groupby(["nums"]).count()
    pd_size = pd_df.groupby(["nums"]).size()

Output

    In [2]: ivalues = ak.array([4, 1, 3, 2, 2, 2, 5, 5, 2, 3])
    In [3]:
    In [4]:
    In [5]: type(ak_count)
    In [6]: pd_count = pd_df.groupby(["nums"]).count()
    In [7]: type(pd_count)
    In [8]:
    In [9]: type(pd_size)
Pyspark Example

    import numpy as np

    # Define the schema
    schema = StructType([
    ivalues = np.array([[4, 1, 3, 2, 2, 2, 5, 5, 2, 3]]).T

    # Create the PySpark DataFrame
    pyspark_df = spark.createDataFrame(ivalues.tolist(), schema=schema)
    pyspark_count = pyspark_df.groupby("nums").count()
    pyspark_count.show()
    type(pyspark_count)
Attaching the examples as a file as well:
…oupby().sum() to pandas (#2892)

* Closes ticket #2831 to align dataframe.groupby().size() to pandas
* clean up formatting
* remove usage of | union for dictionaries from dataframe.py because it is unsupported in Python 3.8
* fix formatting in dataframe.py
* update dataframe.GroupBy.size() and .count() to default as_series = None, and return series when as_index=True and as_series=None
* change default value to as_index=True in dataframe.GroupBy to match pandas
* fix a type in PROTO_tests/tests/series_test.py and other minor code efficiencies

Co-authored-by: Amanda Potts
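The defaults described in that commit message can be read as a small decision rule. The sketch below is only an illustration of that rule, not the arkouda source: `returns_series` is a hypothetical helper name, and two of its branches are assumptions the commit message does not state (what happens with `as_index=False` and `as_series=None`, and whether an explicit `as_series` overrides `as_index`).

```python
from typing import Optional


def returns_series(as_index: bool = True, as_series: Optional[bool] = None) -> bool:
    """Hypothetical helper: True means a Series result, False means a DataFrame."""
    if as_series is None:
        # Stated in the commit message: as_index=True with as_series=None
        # returns a Series.
        # Assumed here: as_index=False with as_series=None returns a DataFrame.
        return as_index
    # Assumed here: an explicit as_series value takes precedence over as_index.
    return as_series


default_result = returns_series()
no_index_result = returns_series(as_index=False)
forced_series_result = returns_series(as_index=False, as_series=True)
```

Under these assumptions the default call mirrors pandas, where `size()` on a grouped frame yields a Series.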
While working with @tgstevensonRedRocket we found discrepancies between the return type of `pd_df.groupby.count()` and `ak_df.groupby.count()`. And I don't think we have a size aggregation at all.
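To make the two aggregations concrete, here is a minimal plain-Python sketch of the pandas semantics being asked for. It is a stand-in for illustration only, not the pandas or arkouda implementation. `group_size` and `group_count` are hypothetical helper names, and the `vals` column is invented data added so that missing values have a visible effect. The `nums` values are the ones from the example in this issue.

```python
from collections import defaultdict


def group_size(rows, key):
    """Rows per group, missing values included (one number per group)."""
    sizes = defaultdict(int)
    for row in rows:
        sizes[row[key]] += 1
    return dict(sorted(sizes.items()))


def group_count(rows, key):
    """Non-missing values per group, reported for each non-key column."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = counts[row[key]]  # register the group even with no other columns
        for col, value in row.items():
            if col != key:
                group[col] += 0 if value is None else 1
    return {k: dict(cols) for k, cols in sorted(counts.items())}


nums = [4, 1, 3, 2, 2, 2, 5, 5, 2, 3]
# Hypothetical second column; None marks a missing value.
vals = [1.0, None, 3.0, None, 5.0, 6.0, None, 8.0, 9.0, 10.0]
rows = [{"nums": n, "vals": v} for n, v in zip(nums, vals)]

sizes = group_size(rows, "nums")
counts = group_count(rows, "nums")
key_only_counts = group_count([{"nums": n} for n in nums], "nums")
```

The shape difference is the point: `sizes` holds a single value per group (a Series in pandas), while `counts` holds one entry per remaining column (a DataFrame in pandas). With only the `nums` column, as in the example above, `key_only_counts` has a group index but no columns.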