Metagenomics Sample Sheet generation assigns project blank wells to a separate folder. #483
Comments
@RodolfoSalido Thank you very much for pointing this out! What value do you want to go into this column of the sample sheet for blanks? |
It is coming back to me a little: this Sample_project column was the source of huge amounts of discussion and requirements gathering (see #204). Specifically, @tanaes provided this guidance:
This was then followed by #204 (comment) :
Has this conclusion changed? |
I had a hunch that there had to be a discussion about this because it appeared to be designed.
I raised the issue because Greg thought it could be problematic to have all blanks pooled into one folder per sequencing run. I’ve cc’ed him so he can elaborate.
-Rodolfo
On Apr 24, 2019, at 4:12 PM, Amanda Birmingham ***@***.***> wrote:
It is coming back to me a little: this Sample_project column was the source of huge amounts of discussion and requirements gathering (see #204) . Specifically, #204 (comment) from @tanaes said
OK, after chatting about this with some folks, it seems like the best option vis-a-vis the study sheet is to have any of the controls on a sequencing run end up in a 'Controls' demultiplex folder after BCL2Fastq. We don't necessarily need to make this an actual Qiita study, but that would enable a uniform place to access control samples downstream.
Has this conclusion changed?
|
Controls cannot be an output folder for metagenomics. The blanks are associated with a specific extraction and project, and these controls are project-based, so sometimes they are fecal and sometimes they are skin. We know there is well-to-well contamination, so the controls for each project are different.
|
This issue and #431 are not the same, but they are definitely kissing cousins. Please keep both in mind when attempting a fix. |
@RodolfoSalido wrote:
Ok, so this means the specifications for this code have changed :(, and the stakeholders on this need to re-have the conversation about assigning sample project name for blanks and controls. The above comment adds:
There's the rub. In LabControl, blanks are certainly associated with a specific extraction (because we track the provenance of every well back through its extraction, etc). However, they are NOT necessarily associated with a specific project (i.e., study). This is because LabControl supports putting samples from multiple studies on a single sample plate. In the case where a plate contains samples from multiple studies, what study do you want the controls/blanks on that plate to be assigned to? In this case, LabControl could assign the controls/blanks to NEITHER (current approach) or BOTH, but I am not aware of any reasonable way for LabControl to guess which study a given control/blank belongs to ... maybe someone with more domain knowledge is? I think stakeholders who need to weigh in on this and approve any solution are @RodolfoSalido @ghsmu414 @jdereus @ackermag . If I am missing anyone, please pull them in--the team already spent literally months discussing this last year (#204) and I really want to avoid a repeat of a long discussion that still produces the incorrect specifications. |
Hi Amanda, I think for multiple projects on the same plate, the controls should be added to all projects associated. |
k, working on 431 - will take a look after the CMI meeting. |
@ghsmu414 That makes perfect logical sense to me :) ... How would that be correctly represented in the sample sheet? |
@AmandaBirmingham I think they can be assigned to the largest project on the plate (the project with the most samples on that plate) If that is too difficult it can be assigned to one of the projects and we will be able to track them back for analysis. |
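The "largest project on the plate" rule proposed above could be sketched roughly as follows. This is only an illustration, not LabControl code: the function name `largest_study_on_plate`, the flat list of study names as input, and the alphabetical tie-break are all assumptions made for the example.

```python
from collections import Counter


def largest_study_on_plate(sample_studies):
    """Pick the study with the most (non-control) samples on one plate.

    `sample_studies` is a hypothetical flat list holding one study name
    per non-control well, e.g. ["fecal", "fecal", "skin"].
    """
    counts = Counter(sample_studies)
    if not counts:
        raise ValueError("Plate has no non-control samples")

    top = max(counts.values())
    # Assumption: ties are broken alphabetically so the result is
    # deterministic; the thread does not say how ties should be handled.
    return min(study for study, n in counts.items() if n == top)
```

Blanks and controls on the plate would then receive the returned name in their sample project field.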
To resolve the one to many case, where controls are to be added to all studies on the plate, would it be easier to sort this out after bcl2fastq? If so, then I believe we could retain the present functionality, and work with @jdereus to replicate the control per sample sequence files to the respective studies. |
@AmandaBirmingham: Specifically, the 'Project_name' column is empty for the control file. Gail said we could pull the value from the data for the samples file, but as you know, with the way that loop works, it would require us to first see an iteration for samples, store the value for project_name, and hope that it's suitable for all subsequent entries in control. Another option that I implemented, but do not enjoy, is assuming the structure of the sample plate values and munging the project name out of them. It's a real hack; I'm more than happy to table it. What I'd like is for project_name to be a non-NULL result for control values in the query. If we can make larger changes to support that, that would be ideal. |
Per discussion 20190501 with Greg, Charlie, Jeff, Gail, Daniel, Amanda: For the 0.1.0 milestone, modify the code to error out on any attempt to create a sample sheet/prep sheet for plates with >1 study on them, so it will not create misleading data (but recall all data is still stored in labcontrol, so we can go back and create whatever files are needed from that later, once a decision is made and the code updated). With that check in place, put the study name of the (single) study on a plate into the sample_proj_name field of the shotgun sample sheet for any blanks/controls on that plate. |
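The behavior agreed for 0.1.0 can be sketched as below. This is a minimal illustration under stated assumptions, not the actual LabControl implementation: the function name `assign_control_projects`, the dict keys `study` and `is_control`, and the use of `ValueError` are all hypothetical.

```python
def assign_control_projects(wells):
    """Return the sample project name for every well on a single plate.

    `wells` is a list of dicts with hypothetical keys:
      "study"      -> study name, or None for blanks/controls
      "is_control" -> True for blanks/controls
    """
    # Collect the distinct studies of the non-control samples on the plate.
    studies = {w["study"] for w in wells if not w["is_control"]}

    # Refuse to build a sheet for a multi-study plate rather than write
    # a misleading project name for its blanks/controls.
    if len(studies) > 1:
        raise ValueError(
            "Plate contains samples from more than one study: "
            + ", ".join(sorted(map(str, studies)))
        )
    # Assumption: a plate with only controls is also treated as an error.
    if not studies:
        raise ValueError("Plate contains no non-control samples")

    (study,) = studies
    # Controls inherit the single study's name; samples keep their own.
    return [study if w["is_control"] else w["study"] for w in wells]
```

Raising before any file is written keeps the stored plate data as the only source of truth until the multi-study case is decided.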
@AmandaBirmingham I think I can close this issue, right? |
Yep, was fixed by #504. Thanks for catching that :)
The Sample Sheet assigns a value of 'Controls' in the 'Sample_project' column to Blank wells. As a result, reads from Blank wells belonging to distinct projects are demultiplexed into one common 'Controls' folder per sequencing run, separate from their respective project folders.
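To make the effect concrete, the sketch below groups sample sheet rows by their 'Sample_project' value, which is how demultiplexed reads end up sorted into per-project folders. The function name `folders_by_project`, the row dicts, and the sample and project names are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict


def folders_by_project(rows, project_key="Sample_project"):
    """Group sample IDs by the project value that names their output folder."""
    folders = defaultdict(list)
    for row in rows:
        folders[row[project_key]].append(row["Sample_ID"])
    return dict(folders)


# Hypothetical rows: blanks from two different projects share 'Controls'.
rows = [
    {"Sample_ID": "fecal_1", "Sample_project": "FecalStudy"},
    {"Sample_ID": "skin_1", "Sample_project": "SkinStudy"},
    {"Sample_ID": "blank_fecal", "Sample_project": "Controls"},
    {"Sample_ID": "blank_skin", "Sample_project": "Controls"},
]
grouped = folders_by_project(rows)
```

Here `grouped["Controls"]` holds both blanks, even though each was extracted alongside a different project.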