Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read10xVisium describes the wrong directory structure #76

Closed
LTLA opened this issue Sep 14, 2021 · 13 comments
Closed

read10xVisium describes the wrong directory structure #76

LTLA opened this issue Sep 14, 2021 · 13 comments
Assignees

Comments

@LTLA
Copy link
Contributor

LTLA commented Sep 14, 2021

Currently, ?read10xVisium says that it expects the following file structure:

     sample
     |—outs
     · · |—raw/filtered_feature_bc_matrix.h5
     · · |—raw/filtered_feature_bc_matrix
     · · · · |—barcodes.tsv
     · · · · |—features.tsv
     · · · · |—matrix.mtx
     · · |—spatial
     · · · · |—scalefactors_json.json
     · · · · |—tissue_lowres_image.png
     · · · · |—tissue_positions_list.csv

The raw/ doesn't look right. I assume that only filtered_feature_bc_matrix is intended here, with possibly raw_feature_bc_matrix if data="raw".

It may also be worth noting which files and directories are actually needed for different options, e.g., the HDF5 file is only required if type="HDF5". This is relevant when people are trying to manually reassemble some Spaceranger-like organization from another source, e.g., GEO.

@lmweber
Copy link
Collaborator

lmweber commented Sep 14, 2021

Thanks @LTLA for raising this and the other 3 issues a few days ago.

We are going to discuss in our Slack channel and then push some updates.

@lmweber
Copy link
Collaborator

lmweber commented Sep 14, 2021

Ah in this one the raw/filtered_ is shorthand for saying it can either be raw or filtered. We can make this more clear.

While testing it out now I also noticed the read10xVisium() function is missing outs/ in the paths, so we will also need to include this in the update.

@lmweber
Copy link
Collaborator

lmweber commented Sep 14, 2021

This will also require moving the files in file.path("extdata", "10xVisium") into a subdirectory outs so that the example still works. Just noting this here so I don't forget.

@LTLA
Copy link
Contributor Author

LTLA commented Sep 15, 2021

The outs thing seems like an unnecessary breaking change. People using the current read10xVisium will now have to change the supplied paths. If you are going to change it, we should see the usual deprecation process.

@lmweber
Copy link
Collaborator

lmweber commented Sep 15, 2021

True - maybe that will cause more problems than it is worth

@lmweber
Copy link
Collaborator

lmweber commented Sep 15, 2021

Although I think it might actually be a bug since when I try simple code as follows, I get an error due to the missing outs/. Maybe we previously only tested it on our small example that doesn't include outs/.

spe <- read10xVisium(
  samples = here(df_samples$path, df_samples$sample_id), 
  sample_id = df_samples$sample_id, 
  type = "sparse", 
  data = "raw", 
  images = c("hires", "lowres"), 
  load = TRUE
)

I'll test it out some more and figure out if I am doing something wrong in my current example or if it is a bug.

@lmweber
Copy link
Collaborator

lmweber commented Sep 15, 2021

I think it's just a bug. We are missing outs/ in fns and dir in this block of code, and we even left a comment to check it later.

I'll check it to make sure I am understanding correctly, and open a pull request so we can discuss in our next meeting @drighelli @HelenaLC .

    # setup file paths
    fns <- paste0(
        data, "_feature_bc_matrix", 
        switch(type, HDF5=".h5", ""))
    counts <- file.path(samples, fns)
    
    # TODO: check that these files exist & are of valid format
    # otherwise things will fail & give unhelpful error messages
    
    dir <- file.path(samples, "spatial")
    xyz <- file.path(dir, "tissue_positions_list.csv")
    sfs <- file.path(dir, "scalefactors_json.json")
    names(xyz) <- names(sfs) <- sids

@LTLA
Copy link
Contributor Author

LTLA commented Sep 15, 2021

I think it's just a bug. We are missing outs/ in fns and dir in this block of code, and we even left a comment to check it later.

Perhaps, but people (a.k.a. me) are already adding outs/ to the paths ourselves, so if read10xVisium automatically adds on outs to the supplied paths, it's going to break current code. If you MUST do this (and I disagree with that, see below), then you should spend the next release cycle checking whether outs/ has already been added and only auto-adding if (i) there is no outs/ on the supplied path and (ii) the current path points to a directory with an outs/ subdirectory.

Putting aside the backwards-incompatibility, what's wrong with requiring people to supply a path directly to outs/? That is, very literally, the Spaceranger output directory, so there's nothing wrong with being explicit about it. Indeed, DropletUtils::read10xCounts goes further and requires users to specify the exact *_feature_bc_matrix or whatever folder actually contains the MTX/TSV files. We don't try to be cute and auto-add outs/filtered_feature_bc_matrix to the path.

Furthermore, I would argue that this auto-addition actively gets in the way of power users. People might be running Spaceranger, moving outs/ to another directory, and deleting the remaining cruft; or they might be downloading files and images from GEO and assembling it into the expected structure; or whatever. If you auto-add outs/, everyone needs to nest their content in a redundant outs/ subdirectory just to make the function happy.

@lmweber
Copy link
Collaborator

lmweber commented Sep 15, 2021

Hmm, ok thanks, will think about it some more.

My view was that adding outs/ manually makes things unnecessarily complicated for new users, who do not really know or care about this outs/ directory, and simply want to give it a list of sample names to load.

But if this would now create additional unintended problems for more advanced users, then we do not want this either. We will discuss some more before deciding, and I'll also have a look at how DropletUtils does it. Thanks for the comments.

@lmweber
Copy link
Collaborator

lmweber commented Sep 15, 2021

This is a good idea if we do proceed with this. I could easily add an if check for this.

then you should spend the next release cycle checking whether outs/ has already been added and only auto-adding if (i) there is no outs/ on the supplied path and (ii) the current path points to a directory with an outs/ subdirectory.

@lmweber
Copy link
Collaborator

lmweber commented Sep 15, 2021

Maybe an if check plus a warning noting that this has been changed.

@LTLA
Copy link
Contributor Author

LTLA commented Sep 15, 2021

If you want to please both sets of users, you can have both samples= and dirs=file.path(samples, "outs") arguments. Then advanced users can just set dirs, in which case samples is not used at all; and regular users can set samples= and let dirs= automatically resolve itself. This assumes that the only use of samples is in defining dirs.

@lmweber
Copy link
Collaborator

lmweber commented Nov 24, 2021

Done in #87

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants