Equivalent of dask.dataframe.from_map? #6492
@zmbc I think it's quite similar to what is done in modin/modin/distributed/dataframe/pandas/partitions.py (lines 129 to 136 at commit 29d9da0).
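Example:
import pandas as pd
import ray
from modin.distributed.dataframe.pandas import from_partitions

ray.init(num_cpus=4)
# Put one single-value pandas DataFrame per input into the Ray object store.
partitions = [ray.put(pd.DataFrame([number])) for number in [0, 1, 2]]
# Assemble the object references into a Modin DataFrame as row partitions.
modin_df = from_partitions(partitions, axis=0)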
Thanks! That's helpful, but it does require me to interact directly with the underlying engine (Ray in your example). That's kind of a pain when I want to write code that will work with any Modin engine.
Indeed, that is too many low-level details for an end user. This functionality looks useful, and we can try to implement it. Thanks for the idea @zmbc! cc @modin-project/modin-core in case you have any more thoughts about that.
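To make the engine-independence point concrete, here is a minimal sketch of what a `from_map`-style helper could look like from the caller's side. Everything in it is an assumption for illustration: `from_map_sketch` and `make_partition` are hypothetical names, not Modin APIs, and a standard-library `ThreadPoolExecutor` stands in for whichever engine (Ray, Dask, ...) would actually run the work. The result is a plain pandas DataFrame rather than a Modin one.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, Iterable, Optional

import pandas as pd


def from_map_sketch(
    func: Callable[[Any], pd.DataFrame],
    iterable: Iterable[Any],
    executor: Optional[Executor] = None,
) -> pd.DataFrame:
    """Hypothetical helper (not a Modin API): build one partition per input.

    The caller supplies only `func` and its inputs; where the work runs is
    decided by `executor`, which stands in for the underlying engine.
    """
    inputs = list(iterable)
    if executor is None:
        # No engine supplied: build the partitions serially.
        parts = [func(item) for item in inputs]
    else:
        # Executor.map returns results in the order of the inputs.
        parts = list(executor.map(func, inputs))
    # Stack the partitions row-wise into a single frame.
    return pd.concat(parts, ignore_index=True)


def make_partition(number: int) -> pd.DataFrame:
    """Hypothetical loader: one single-value frame per input."""
    return pd.DataFrame([number])


with ThreadPoolExecutor(max_workers=2) as pool:
    df = from_map_sketch(make_partition, [0, 1, 2], executor=pool)
```

The user code (`make_partition` and the call site) never touches an engine-specific function such as `ray.put`; swapping the executor would not change it.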
Signed-off-by: Igoshev, Iaroslav <[email protected]>
dask.dataframe has an experimental function called `from_map`. This allows the user to define a custom function for loading/creating partitions of a dataframe. Conceptually, I think this is similar to Registering Custom Functions, except that it allows a function to create a DataFrame, rather than operate on an existing DataFrame.

This seems like it would be very useful to have in Modin; I don't see any straightforward way to achieve this currently. Am I missing something? I know that it would need to have some additional complexity that Dask doesn't have to worry about, in order to be general across storage formats and allow 2D partitioning, but these don't seem like insurmountable issues.
Any reactions to the utility and/or feasibility of this feature?