Using multi-lines defined in separate columns #283

Closed
StevenCHowell opened this issue Feb 17, 2017 · 4 comments

Comments

@StevenCHowell

It would be useful to be able to use Datashader to plot data directly from NumPy arrays, particularly for plotting lines. This Stack Overflow question and response demonstrate the workaround I used to put the data into a DataFrame. I am not sure if there is a simple way to accept both types of input. Since the NumPy array is a fundamental data structure for scientific data, I think it is worth considering adding support for it.

@StevenCHowell
Author

Something like this could be incorporated to reformat NumPy arrays of lines into a DataFrame that will work well with datashader (following the info in #286):

import numpy as np
import pandas as pd


def np_to_ds_format(x_array, y_arrays):
    """
    Reformat data for plotting in Datashader.

    Parameters
    ----------
    x_array: array of x-values (should be a one-dimensional np.array of
             length N)
    y_arrays: M different arrays of y-values (should be a np.array with
              dimensions NxM)

    Returns
    -------
    ds_df: pandas DataFrame with 'x' and 'y' columns and a NaN row after
           each line
    """

    n_points, n_arrays = y_arrays.shape
    assert n_points == len(x_array), 'mismatched x/y-data'

    dfs = []
    split = pd.DataFrame({'x': [np.nan]})
    for i in range(n_arrays):
        x = x_array
        y = y_arrays[:, i]
        df = pd.DataFrame({'x': x, 'y': y})
        dfs.append(df)
        dfs.append(split)

    ds_df = pd.concat(dfs, ignore_index=True)
    return ds_df
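
For reference, the same NaN-separated layout can also be built without the Python loop. This is only a sketch under the same assumptions as the function above (one shared x array of length N and an NxM array of y-values); the name `np_to_ds_format_vectorized` is made up for illustration and is not part of Datashader.

```python
import numpy as np
import pandas as pd


def np_to_ds_format_vectorized(x_array, y_arrays):
    """Hypothetical loop-free variant of np_to_ds_format (illustration only)."""
    x_array = np.asarray(x_array, dtype=float)
    y_arrays = np.asarray(y_arrays, dtype=float)
    n_points, n_arrays = y_arrays.shape
    if n_points != len(x_array):
        raise ValueError('mismatched x/y-data')

    # Append one NaN to x, then repeat the padded x once per line.
    x = np.tile(np.append(x_array, np.nan), n_arrays)

    # Append a row of NaN to y, then flatten column by column so that
    # each line is followed by its own NaN separator.
    nan_row = np.full((1, n_arrays), np.nan)
    y = np.vstack([y_arrays, nan_row]).ravel(order='F')

    return pd.DataFrame({'x': x, 'y': y})


# Two lines sampled at the same three x positions.
x_demo = np.array([0.0, 1.0, 2.0])
y_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
demo_df = np_to_ds_format_vectorized(x_demo, y_demo)
# Rows: three points of line 0, a NaN row, three points of line 1, a NaN row.
```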

@jbednar changed the title from "Directly plotting data from NumPy arrays" to "Using multi-lines defined in separate columns" on Feb 23, 2017
@jbednar
Member

jbednar commented Feb 23, 2017

Thanks for the example. I originally thought you were asking for something like what I've just written up in #289, so I've changed the title of this issue to reflect that you're really asking for a convenient way to work with lines defined in independent arrays/columns, as opposed to a single unified dataframe separated with NaNs.

The code is helpful, but because it accepts only a single x array, it will only work for a fairly specialized scenario where all lines have the same number of samples and all samples are stored in a single array. Datashader will happily work with any number of different lines of different sizes, which in NumPy could be represented variously as pairs of x and y arrays matching in size, or as single arrays with x and y columns. It's hard to see why we would provide a function for the above special case if we didn't also cover these other cases; I don't think any one of them is particularly more common than the others. And generalizing the above example to cover all those cases could get tricky.

So the above code is definitely useful as an example, but I'm not sure it makes sense to be part of the library as it is.
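
To make the more general case concrete, here is a minimal sketch of a helper for lines of different lengths given as pairs of matching x and y arrays. The name `lines_to_ds_format` and its signature are assumptions made for illustration; nothing like it is claimed to exist in Datashader.

```python
import numpy as np
import pandas as pd


def lines_to_ds_format(lines):
    """Hypothetical helper: build a NaN-separated DataFrame from (x, y) pairs.

    lines: iterable of (x, y) pairs. Within a pair, x and y must be 1-D and
           the same length; different pairs may have different lengths.
    """
    xs, ys = [], []
    for x, y in lines:
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        if x.ndim != 1 or x.shape != y.shape:
            raise ValueError('each line needs 1-D x and y arrays of equal length')
        # Follow every line with a NaN so the lines are not joined together.
        xs.extend([x, [np.nan]])
        ys.extend([y, [np.nan]])

    if not xs:
        empty = np.array([], dtype=float)
        return pd.DataFrame({'x': empty, 'y': empty})

    return pd.DataFrame({'x': np.concatenate(xs), 'y': np.concatenate(ys)})


# Two lines with different numbers of samples.
demo_lines = [
    ([0, 1, 2], [5, 6, 7]),
    ([10, 11], [1, 2]),
]
demo_df = lines_to_ds_format(demo_lines)
```

Lines stored as single arrays with x and y columns could be passed through the same sketch by splitting each one first, for example `[(a[:, 0], a[:, 1]) for a in arrays]`.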

@jbednar
Member

jbednar commented Feb 23, 2017

Also, similar issues apply to Pandas dataframes that have dozens or hundreds of columns, where someone may reasonably want to be able to specify a list of those names and then plot them directly, rather than by constructing another dataframe concatenating them.
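
As an illustration of the dataframe case, the sketch below takes a wide DataFrame, the name of a shared x column, and a list of y column names, and stacks them into the NaN-separated form. The function name `columns_to_ds_format` and its parameters are hypothetical; this shows the manual workaround, not a Datashader API.

```python
import numpy as np
import pandas as pd


def columns_to_ds_format(df, x, ys):
    """Hypothetical helper: stack selected y columns into NaN-separated lines.

    df: wide DataFrame with one x column and many y columns
    x:  name of the shared x column
    ys: list of y column names to plot as separate lines
    """
    if len(ys) == 0:
        raise ValueError('need at least one y column')

    # Pad x and each selected y column with a trailing NaN separator.
    x_vals = np.append(df[x].to_numpy(dtype=float), np.nan)
    y_vals = [np.append(df[name].to_numpy(dtype=float), np.nan) for name in ys]

    return pd.DataFrame({
        'x': np.tile(x_vals, len(ys)),
        'y': np.concatenate(y_vals),
    })


# Pick two of the three y columns from a small wide table.
wide = pd.DataFrame({'t': [0, 1, 2], 'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
demo_df = columns_to_ds_format(wide, 't', ['a', 'c'])
```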

@jbednar
Member

jbednar commented Dec 19, 2018

I think this issue is addressed by the separately developed ds.utils.dataframe_from_multiple_sequences utility.
