Implementing the full functionality of the 'sample' function #1887

Open
chi2liu opened this issue Nov 4, 2020 · 2 comments · May be fixed by #1893

Comments

@chi2liu

chi2liu commented Nov 4, 2020

The current implementation of the sample function is based on PySpark's sample function: the parameter n is not supported, and frac must always be specified.

According to the pandas source code (https://github.com/pandas-dev/pandas/blob/master/pandas/core/generic.py#L5076), the implementation of the sample function relies on methods such as iloc, take, and reindex, all of which koalas currently supports. The koalas sample function could therefore follow the same logic as pandas.

I will try to implement the sample function for both DataFrame and Series along these lines.
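
For illustration, here is a minimal sketch of the idea in plain pandas and NumPy: draw row positions at random, then select them with take. It is not the koalas implementation and not the code proposed in #1893; the helper name sample_rows is hypothetical, and the handling of n and frac only approximates what pandas does.

```python
import numpy as np
import pandas as pd


def sample_rows(obj, n=None, frac=None, random_state=None):
    """Hypothetical helper: sample rows of a DataFrame or Series by position."""
    if n is not None and frac is not None:
        raise ValueError("Specify either n or frac, not both.")
    if n is None and frac is None:
        n = 1  # pandas returns a single row when neither is given
    elif frac is not None:
        n = int(round(frac * len(obj)))
    rng = np.random.RandomState(random_state)
    # Draw n distinct row positions without replacement.
    positions = rng.choice(len(obj), size=n, replace=False)
    # take() selects rows by integer position, much like iloc.
    return obj.take(positions)


df = pd.DataFrame({"a": range(10), "b": list("abcdefghij")})
by_count = sample_rows(df, n=3, random_state=0)        # 3 rows
by_fraction = sample_rows(df, frac=0.5, random_state=0)  # 5 of 10 rows
```

The same helper works on a Series, since Series also provides take. Note that this sketch needs to know the total number of rows before it can pick positions, which is cheap in pandas but may not be on a distributed DataFrame.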

@amueller

amueller commented Nov 24, 2020

I also noticed that Series.sample doesn't support frac right now (koalas 1.4.0). Is that expected? And do you have a timeline for #1893 being merged?

@ueshin
Collaborator

ueshin commented Nov 24, 2020

Hi @amueller,

It seems that Series.sample supports the frac parameter now.

As for #1893, it is currently blocked by a performance concern (#1893 (comment)).
Could you kindly advise us if you have a good idea?

Thanks!
