Metagenomics Sample Sheet generation assigns project blank wells to a separate folder. #483

Closed
RodolfoSalido opened this issue Apr 24, 2019 · 15 comments

@RodolfoSalido

Sample Sheet assigns a value of 'Controls' in the 'Sample_project' column to Blank wells. As a result, reads from Blank wells belonging to distinct projects get demultiplexed into a single common 'Controls' folder per sequencing run, rather than into their respective project folders.

[Screenshot attached: Screen Shot 2019-04-24 at 2:12:43 PM]

@AmandaBirmingham
Collaborator

@RodolfoSalido Thank you very much for pointing this out! What value do you want to go into this column of the sample sheet for blanks?

@AmandaBirmingham
Collaborator

AmandaBirmingham commented Apr 24, 2019

It is coming back to me a little: this Sample_project column was the source of a huge amount of discussion and requirements gathering (see #204). Specifically, @tanaes provided this guidance:

#204 (comment)

  1. Freaking blanks

This is a problem that keeps rearing its head. The informal idiom we've used is that -- typically -- only a single study is included on an extraction plate. Those blanks then get inherited by that study (study-level association), with a rather inconsistent naming convention which may or may not specify well and plate number within the study.

Really I think this is best solved by having the study-level modality to the sample plating interface that @ElDeveloper and I were discussing as a way to avoid having to display the long study identifier in the window. This would allow any extraction-level controls to be unambiguously associated with a particular project, which I think should be the preferred way to do it. Anything downstream of that (e.g. leftover wells in library prep plates) I think would be ok having a 'None'-equivalent study identified and project shortname.

An alternative would be to have a 'Controls' study that combined all of these types of samples. This is maybe not such a terrible idea, as appropriate controls for a given plate or process could in principle be queried from the database.

This was then followed by #204 (comment):

OK, after chatting about this with some folks, it seems like the best option vis-a-vis the study sheet is to have any of the controls on a sequencing run end up in a 'Controls' demultiplex folder after BCL2Fastq. We don't necessarily need to make this an actual Qiita study, but that would enable a uniform place to access control samples downstream.

Has this conclusion changed?

@RodolfoSalido
Author

RodolfoSalido commented Apr 24, 2019 via email

@charles-cowart charles-cowart added this to the Full Launch milestone Apr 24, 2019
@RodolfoSalido
Author

RodolfoSalido commented Apr 25, 2019 via email

@AmandaBirmingham
Collaborator

This issue and #431 are not the same, but they are definitely kissing cousins. Please keep both in mind when attempting a fix.

@AmandaBirmingham
Collaborator

@RodolfoSalido wrote:

Controls cannot be an output folder for metagenomics...

OK, so this means the specifications for this code have changed :(, and the stakeholders on this need to have the conversation about assigning the sample project name for blanks and controls again. The above comment adds:

the blanks are associated with a specific extraction and project [emphasis mine]

There's the rub. In LabControl, blanks are certainly associated with a specific extraction (because we track the provenance of every well back through its extraction, etc). However, they are NOT necessarily associated with a specific project (i.e., study). This is because LabControl supports putting samples from multiple studies on a single sample plate.

In the case where a plate contains samples from multiple studies, what study do you want the controls/blanks on that plate to be assigned to? In this case, LabControl could assign the controls/blanks to NEITHER (current approach) or BOTH, but I am not aware of any reasonable way for LabControl to guess which study a given control/blank belongs to ... maybe someone with more domain knowledge is?

I think stakeholders who need to weigh in on this and approve any solution are @RodolfoSalido @ghsmu414 @jdereus @ackermag . If I am missing anyone, please pull them in--the team already spent literally months discussing this last year (#204) and I really want to avoid a repeat of a long discussion that still produces the incorrect specifications.

@ghsmu414

Hi Amanda,

I think that for multiple projects on the same plate, the controls should be added to all associated projects.

@charles-cowart
Collaborator

k, working on 431 - will take a look after the CMI meeting.

@AmandaBirmingham
Collaborator

@ghsmu414 That makes perfect logical sense to me :) ... How would that be correctly represented in the sample sheet?

@ghsmu414

@AmandaBirmingham I think they can be assigned to the largest project on the plate (the project with the most samples on that plate). If that is too difficult, they can be assigned to one of the projects, and we will be able to track them back for analysis.
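To make the "largest project on the plate" rule concrete, here is a minimal sketch. The well representation (dicts with hypothetical `kind` and `project` keys) is an assumption for illustration, not LabControl's actual data model, and breaking ties alphabetically is my own choice so that the result is deterministic.

```python
from collections import Counter


def assign_blanks_to_largest_project(wells):
    """Give every blank on one plate the project with the most samples.

    Hypothetical data model: each well is a dict with a 'kind' key
    ('sample' or 'blank') and a 'project' key (None for blanks).
    Returns new dicts; the input wells are not modified.
    """
    counts = Counter(w["project"] for w in wells if w["kind"] == "sample")
    if not counts:
        raise ValueError("plate has no samples; cannot infer a project")
    # Most samples wins; ties are broken alphabetically (an assumption).
    largest = min(counts, key=lambda p: (-counts[p], p))
    return [
        dict(w, project=largest) if w["kind"] == "blank" else dict(w)
        for w in wells
    ]
```

Whatever tie-break is chosen, it should be stable, so that regenerating a sample sheet for the same plate always puts the blanks in the same project folder.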

@wasade
Member

wasade commented Apr 25, 2019

To resolve the one-to-many case, where controls are to be added to all studies on the plate, would it be easier to sort this out after bcl2fastq? If so, then I believe we could retain the present functionality and work with @jdereus to replicate the per-sample control sequence files to the respective studies.
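A rough sketch of the post-bcl2fastq idea: leave the sample sheet as it is, then copy each control's per-sample FASTQ files from the shared folder into every study folder that should receive them. It assumes a flat `<output>/<Sample_Project>/` layout and bcl2fastq-style `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz` file names. The function name and the `control_to_projects` mapping are hypothetical, and building that mapping (e.g. from plate membership) is left out.

```python
import shutil
from pathlib import Path


def replicate_controls(demux_dir, controls_folder, control_to_projects):
    """Copy each control's FASTQ files into every project folder it belongs to.

    Assumptions (hypothetical): files are named
    '<sample>_S<n>_L<lane>_R<read>_001.fastq.gz' and sit directly in
    <demux_dir>/<controls_folder>/; control_to_projects maps a control
    sample name to a list of project folder names.
    Returns the list of destination paths that were written.
    """
    demux_dir = Path(demux_dir)
    src_dir = demux_dir / controls_folder
    copied = []
    for control, projects in control_to_projects.items():
        # The trailing "_S" keeps e.g. "BLANK1" from also matching "BLANK10".
        files = sorted(src_dir.glob(f"{control}_S*.fastq.gz"))
        for project in projects:
            dest = demux_dir / project
            dest.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, dest / f.name)  # originals stay in place
                copied.append(dest / f.name)
    return copied
```

Copying (rather than moving) keeps the shared controls folder intact, so the same control can be delivered to several studies.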

@charles-cowart
Collaborator

@AmandaBirmingham: Specifically, the 'Project_name' column is empty for the control file. Gail said we could pull the value from the data for the samples file, but as you know, with the way that loop works, it would require us to first see an iteration for samples, store the value for project_name, and hope that it's suitable for all subsequent entries in control.

Another option, which I implemented but do not enjoy, is assuming the structure of the sample plate values and munging the project name out of them. It's a real hack; I'm more than happy to table it. What I'd like is for project_name to be a non-NULL result for control values in the query. If we can make larger changes to support that, that would be ideal.

@AmandaBirmingham
Collaborator

Per discussion on 20190501 with Greg, Charlie, Jeff, Gail, Daniel, and Amanda: for the 0.1.0 milestone, modify the code to error out when asked to create a sample sheet/prep sheet for a plate with >1 study on it, so that it will not create misleading data (recall that all data are still stored in LabControl, so we can go back and create whatever files are needed later, once a decision is made and the code updated). With that check in place, put the study name of the (single) study on a plate into the sample_proj_name field of the shotgun sample sheet for any blanks/controls on that plate.
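For illustration, here is a minimal sketch of the agreed 0.1.0 behavior: refuse plates that hold samples from more than one study, otherwise copy the plate's single study name onto its blanks/controls. The row format (`plate`, `is_control`, `sample_proj_name` keys) and the function name are hypothetical, and this is not the code that landed in #504. Collecting the studies in a separate pass over the sample rows also sidesteps the loop-order problem described above, because a control no longer depends on a sample having been seen first.

```python
def fill_control_projects(rows):
    """Refuse multi-study plates; otherwise give each control its plate's study.

    Hypothetical row format: dicts with 'plate', 'is_control' and
    'sample_proj_name' keys; controls arrive with sample_proj_name=None.
    Returns new dicts; the input rows are not modified.
    """
    # Pass 1: collect the studies of the real samples on each plate.
    studies_by_plate = {}
    for r in rows:
        if not r["is_control"]:
            studies_by_plate.setdefault(r["plate"], set()).add(r["sample_proj_name"])

    # Error out rather than write a misleading sample sheet.
    for plate, studies in studies_by_plate.items():
        if len(studies) > 1:
            raise ValueError(
                f"plate {plate!r} has samples from {len(studies)} studies "
                f"({', '.join(sorted(map(str, studies)))}); refusing to write a sample sheet"
            )

    # Pass 2: fill in the single study for every control row.
    out = []
    for r in rows:
        row = dict(r)  # copy so the caller's rows are left untouched
        if row["is_control"]:
            studies = studies_by_plate.get(row["plate"])
            if not studies:
                raise ValueError(f"plate {row['plate']!r} has controls but no samples")
            (row["sample_proj_name"],) = studies  # exactly one study by now
        out.append(row)
    return out
```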

@AmandaBirmingham AmandaBirmingham self-assigned this May 7, 2019
AmandaBirmingham added a commit that referenced this issue May 8, 2019
@charles-cowart
Collaborator

@AmandaBirmingham I think I can close this issue, right?

@AmandaBirmingham
Collaborator

Yep, it was fixed by #504. Thanks for catching that :)
