The feature
It would be really nice to have a way of associating chromosome info with the dataframe containing the ranges. I propose using pd.DataFrame.attrs to store metadata such as chromosome info and column names.
Why
GRanges objects from Bioconductor have a @seqinfo attribute that contains sequence info about the assembly being used. For example:
It would be nice if we could also attach this kind of information to our range dataframe for use with bioframe. This could be done by putting something equivalent to @seqinfo into the pd.DataFrame.attrs attribute. Something similar could also be done for different range column names.
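To make the idea concrete, here is a minimal sketch of what attaching seqinfo-like metadata could look like. The key names "seqinfo" and "range_cols", the helper out_of_bounds, and the chromosome lengths are all assumptions made up for illustration. None of them are part of pandas or bioframe.

```python
import pandas as pd

# Hypothetical key names; neither pandas nor bioframe defines them.
SEQINFO_KEY = "seqinfo"
COLS_KEY = "range_cols"

df = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr2"],
        "start": [100, 500, 50],
        "end": [200, 900, 400],
    }
)

# Made-up chromosome lengths for illustration, not a real assembly.
df.attrs[SEQINFO_KEY] = {
    "genome": "toy_assembly",
    "seqlengths": {"chr1": 1000, "chr2": 800},
}
df.attrs[COLS_KEY] = ("chrom", "start", "end")


def out_of_bounds(frame):
    """Return rows whose end exceeds the recorded chromosome length."""
    chrom, _start, end = frame.attrs[COLS_KEY]
    lengths = frame[chrom].map(frame.attrs[SEQINFO_KEY]["seqlengths"])
    return frame[frame[end] > lengths]


# All intervals fit inside their chromosomes, so this prints an empty frame.
print(out_of_bounds(df))
```

A function written this way needs neither the column names nor the chromosome sizes as arguments, because both travel with the dataframe.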
Current use of global configuration
With cols, this library already provides ways of setting different values without passing them on every call (docs): either a global config, or a context manager that temporarily modifies that config.
I think both of these are less ergonomic:
They require explicit code for something that could be explicit in the data but implicit in the code.
They're global, so they don't allow working with different configurations at the same time.
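As a rough sketch of the alternative, column names could be resolved per dataframe instead of from global state. The helper resolve_cols, the "range_cols" key, and the fallback order (explicit argument, then attrs, then a default) are assumptions for illustration, not existing bioframe behaviour.

```python
import pandas as pd

# Assumed fallback when nothing else is specified.
DEFAULT_COLS = ("chrom", "start", "end")


def resolve_cols(df, cols=None):
    """Explicit argument wins, then df.attrs, then the default."""
    if cols is not None:
        return tuple(cols)
    return tuple(df.attrs.get("range_cols", DEFAULT_COLS))


a = pd.DataFrame({"chrom": ["chr1"], "start": [0], "end": [10]})
b = pd.DataFrame({"seqname": ["chr1"], "lo": [5], "hi": [15]})
b.attrs["range_cols"] = ("seqname", "lo", "hi")

# Two dataframes with different conventions coexist without global state.
print(resolve_cols(a))  # ('chrom', 'start', 'end')
print(resolve_cols(b))  # ('seqname', 'lo', 'hi')
```

Because the configuration lives on each dataframe, frames with different column conventions can be used in the same expression without entering a context manager.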
Downsides
pd.DataFrame.attrs
The main downside is pd.DataFrame.attrs itself:
It's still marked as experimental and may change.
It doesn't show up in the repr, so it's not obvious whether anything has been added.
I would hope that usage here could influence further development of the feature.
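The repr point is easy to check. The sketch below also includes a hypothetical describe helper, which is not part of pandas or bioframe, as one way a library might surface the attached metadata.

```python
import pandas as pd

df = pd.DataFrame({"chrom": ["chr1"], "start": [0], "end": [10]})
df.attrs["genome"] = "toy_assembly"  # made-up value for illustration

# The attached metadata is absent from the default repr.
print("toy_assembly" in repr(df))  # False


def describe(frame):
    """Hypothetical helper: the usual repr plus any attached attrs."""
    lines = [repr(frame)]
    if frame.attrs:
        lines.append(f"attrs: {frame.attrs}")
    return "\n".join(lines)


print(describe(df))
```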
May not work with other backends
It's not immediately obvious whether alternative dataframe backends would support an equivalent of attrs.
Alternatives
bioframe design