Table to matrix #58
Comments
That sounds like a useful feature. Though I think only one argument is needed: A few remarks:
I don't think this is needed. Tables always have observations as rows and variables as columns. What can change is how the storage is actually done, but the Tables.jl interface ensures we don't have to care about it.
Yes, this can be an inspiration. But of course this should use
In general, iterating over columns will likely be the most efficient approach for most tables (even row-oriented ones) when @quinnj Is that right?
The current README says specifically "for example, if MyTable is a row-oriented format, I might define my "sink" function like:", so I assumed this could exist 🤷‍♂️
Yes, but this refers to the storage, not to the meaning of rows AFAIK. That's exactly like the difference between
Isn't this the point of StatsModels.jl?
StatsModels is much more complex: it parses formulas and processes them using contrasts. Here we just need to copy values to a matrix layout.
I dunno, tables can (in principle) have different eltypes for different columns. It might make sense to have a fallback
Yes, but here we aren't even talking about transforming categorical variables into dummies. We just want to use promotion to find the best common type for all columns and copy the data to a matrix of that type. Anything more complex should indeed go through StatsModels.
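To make the "promote, then copy" idea concrete, here is a minimal sketch in plain Julia. It assumes the table is a `NamedTuple` of equal-length vectors, and `columns_to_matrix` is a hypothetical name used only for illustration; this is not the Tables.jl implementation.

```julia
# Hypothetical helper (not part of Tables.jl): build a matrix from a
# NamedTuple of equal-length vectors, one vector per variable.
function columns_to_matrix(cols::NamedTuple)
    vectors = values(cols)
    # Find one common element type for all columns, e.g. Int and Float64 promote to Float64.
    T = promote_type(map(eltype, vectors)...)
    nrows = length(first(vectors))
    mat = Matrix{T}(undef, nrows, length(vectors))
    # Fill column by column, which matches Julia's column-major memory layout.
    for (j, col) in enumerate(vectors)
        length(col) == nrows ||
            throw(DimensionMismatch("column $j has length $(length(col)), expected $nrows"))
        mat[:, j] = col  # values are converted to T on assignment
    end
    return mat
end

tbl = (a = [1, 2, 3], b = [0.5, 1.5, 2.5])
m = columns_to_matrix(tbl)
```

Here rows are observations and columns are variables. Columns whose types have no narrower common type (say, `Int` and `String`) would promote to `Any`, which is where something like StatsModels would be the better tool.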
The new
I think we still need to add the
Why? If you materialize a matrix, you can just do
The point is precisely that the packages which would like to use
…lowing the user to specify whether input columns should be materialized as matrix columns or rows
Ok, @nalimilan @tlienart, see the PR here: #66
Alright, with #66 merged, I think we're good here.
(apologies if it's a dumb question)
In line with this discourse thread it seems that a good way forward is to have ML algorithms (such as, say, `kmeans`) accept `Tables` instead of "just" `Matrix` or `AbstractMatrix`.
In some cases, the algorithm may work directly using `rows` or `columns` of the Table, but in some cases it may be preferable to work with the matrix of values of the table directly (maybe an example would be PCA).
Would it therefore be possible/relevant for `Tables.jl` to implement a function like `Tables.matrix(table)` and possibly have that function take an argument depending on whether the user wants a matrix with rows-as-observations or columns-as-observations?
As a suggested possible API I could see something like
Edit: here are what I think may be useful building blocks? (with guidance/hints I'm happy to try to give this a shot by the way, though I'm not very familiar with Julia's table universe...)
`isroworiented` / `iscolumnoriented`
`vardim=2` and the promoted type is `<:Number`, I believe `copy(transpose(matrix(..., vardim=1)))` will be faster than a row iteration, if it's not a `<:Number` and therefore transpose might not work, `permutedims` should.
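A rough sketch of the `transpose` versus `permutedims` point, in plain Julia. The helper name `orient` and its `vardim` keyword are assumptions made for illustration only (with `vardim=2` meaning variables along columns); they are not the Tables.jl API.

```julia
# Hypothetical helper (name and keyword are assumptions, not Tables.jl API):
# take a matrix with variables along columns and return it with variables
# along the requested dimension.
function orient(mat::AbstractMatrix; vardim::Integer = 2)
    vardim == 2 && return mat  # already variables-as-columns
    vardim == 1 || throw(ArgumentError("vardim must be 1 or 2"))
    # `transpose` is recursive (it is also applied to each element), so it only
    # suits numeric element types; `permutedims` just swaps the two dimensions.
    return eltype(mat) <: Number ? copy(transpose(mat)) : permutedims(mat)
end

nums = [1 2; 3 4; 5 6]
strs = ["a" "b"; "c" "d"]

nums_t = orient(nums; vardim = 1)
strs_t = orient(strs; vardim = 1)
```

Both branches materialize a new `Matrix`, so the caller never holds a lazy wrapper around the original data.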