[FEA] Create collection of dataframes for testing #2530

Closed
mrocklin opened this issue Aug 11, 2019 · 2 comments
Labels
feature request (New feature or request), Python (Affects Python cuDF API.), tests (Unit testing for project)

Comments

@mrocklin
Collaborator

When testing cudf with new systems, it is often useful to have a reasonably comprehensive collection of the dataframes that cudf might create. This would be useful for testing things like serialization, writing to various formats, and so on. Ideally, I would test my system with cudf by doing something like the following:

# cudf/python/cudf/utils.py
import cudf
import numpy as np

# Each value is a zero-argument callable, so nothing is allocated on import.
dataframes = {
    "series": lambda: cudf.Series([1, 2, 3]),
    "series-strings": lambda: cudf.Series(["a", "b", "c"]),
    "dataframe": lambda: cudf.DataFrame({"x": [1, 2, 3], "y": [1., 2., 3.]}),
    "empty-series": lambda: cudf.Series([], name="foo"),
    # ...
    "large-series": lambda: cudf.Series(np.arange(100000000)),
    # ...
}

# my-external-code.py
from cudf.utils import dataframes

for name, df_func in dataframes.items():
    df = df_func()
    try:
        myfunction(df)  # placeholder for the system under test
    except Exception as e:
        print(name, "failed with exception", e)
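
A slightly more structured variant of the loop above, as a rough sketch: it collects failures instead of printing them, so a test can assert on the result. `run_collection` and the stand-in factories are hypothetical names, not part of cudf, and plain Python lists are used in place of cudf objects so that the snippet runs without a GPU.

```python
def run_collection(factories, func):
    """Call func on every object built from factories; return {name: exception}."""
    failures = {}
    for name, factory in factories.items():
        try:
            # Building the object happens inside the try block as well,
            # so a factory that fails is reported like any other failure.
            func(factory())
        except Exception as exc:
            failures[name] = exc
    return failures


# Stand-in factories (hypothetical); real use would pass the cudf collection.
stand_ins = {
    "ints": lambda: [1, 2, 3],
    "strings": lambda: ["a", "b", "c"],
    "empty": lambda: [],
}

# sum() works for the ints and the empty list but raises TypeError for strings.
failures = run_collection(stand_ins, sum)
print(sorted(failures))
```

Returning a dict rather than printing keeps the helper usable from a test runner, where an empty result can be asserted directly.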

This collection might include things like ...

  1. dataframes of different types
  2. series
  3. empty dataframes/series
  4. strings
  5. categoricals
  6. dataframes with a non-trivial index
  7. big dataframes/series
  8. datetimes

and so on.
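
To make the list above concrete, here is one possible sketch with an entry per category. It is written against a module argument so it can be tried with pandas on a machine without a GPU. The function name `make_collection`, the key names, and the reduced size of the big series are all assumptions for illustration, and it assumes cudf accepts the same constructor arguments as pandas for these cases.

```python
import numpy as np
import pandas as pd  # stand-in only; pass cudf instead where a GPU is available


def make_collection(xdf):
    """Return {name: zero-argument factory} built with the given dataframe module."""
    return {
        "dataframe-mixed-dtypes": lambda: xdf.DataFrame(
            {"x": [1, 2, 3], "y": [1.0, 2.0, 3.0]}
        ),
        "series": lambda: xdf.Series([1, 2, 3]),
        "empty-dataframe": lambda: xdf.DataFrame({"x": []}),
        "strings": lambda: xdf.Series(["a", "b", "c"]),
        "categorical": lambda: xdf.Series(["a", "b", "a"], dtype="category"),
        "non-trivial-index": lambda: xdf.DataFrame(
            {"x": [1, 2, 3]}, index=["a", "b", "c"]
        ),
        # Smaller than the 100000000 used earlier, to keep the sketch cheap.
        "big-series": lambda: xdf.Series(np.arange(1_000_000)),
        "datetimes": lambda: xdf.Series(
            np.array(["2019-08-11", "2019-08-12"], dtype="datetime64[ns]")
        ),
    }


collection = make_collection(pd)
```

Taking the module as a parameter also means the same definitions could be built with both libraries and compared, though that is beyond what this issue asks for.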

I think that @quasiben could use this for UCX testing (we keep running into cases that our serialization can't handle correctly), and I suspect that it would also be helpful for I/O work (cc @mjsamoht).

I made the values callables above so that nothing is allocated at startup; each dataframe is only built when its callable is invoked.
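
A small sketch of that laziness, using plain Python lists as a stand-in for cudf allocations. The `tracked` wrapper and the `built` list are hypothetical helpers that exist only to show which factories have run.

```python
built = []  # records which factories have actually run


def tracked(name, make):
    """Wrap a factory so each call is recorded in `built`."""
    def factory():
        built.append(name)
        return make()
    return factory


# list(range(...)) stands in for an expensive allocation such as a large cudf.Series.
lazy = {
    "small": tracked("small", lambda: list(range(3))),
    "large": tracked("large", lambda: list(range(1_000_000))),
}

assert built == []          # defining the dict allocated nothing
small = lazy["small"]()     # only now is "small" built
assert built == ["small"]   # "large" has still not been allocated
```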

@mrocklin added the "feature request" (New feature or request) and "Needs Triage" (Need team to review and classify) labels Aug 11, 2019
@kkraus14 added the "code quality", "Python" (Affects Python cuDF API.) and "tests" (Unit testing for project) labels and removed the "Needs Triage" (Need team to review and classify) label Aug 15, 2019
@GregoryKimball
Contributor

The Python benchmarks introduced by #11125 provide an excellent collection of dataframes and operations for testing.

@vyasr
Contributor

vyasr commented May 10, 2024

I think I accidentally reopened this.

@vyasr vyasr closed this as completed May 10, 2024