prepare_detergent failing when using all samples #13

Open
pblaney opened this issue Feb 15, 2023 · 2 comments

pblaney commented Feb 15, 2023

Hello,

After collecting a test set of fragCounter coverage profiles for 4 normal samples, I attempted to run the dryclean workflow.
I encountered the following error while running the first step, creating the PoN with prepare_detergent:

pon_detergent <- prepare_detergent(normal.table.path = "/drycleanRun/test_ton.rds",
                                   use.all = TRUE,
                                   num.cores = 2,
                                   build = "hg38",
                                   path.to.save = "drycleanRun/",
                                   nochr = TRUE,
                                   save.pon = TRUE)

### OUTPUT ###
Starting the preparation of Panel of Normal samples a.k.a detergent
4 samples available
Using all samples
PAR file not provided, using hg38 default. If this is not the correct build, please provide a GRange object delineating for corresponding build
PAR read
Checking for existence of files
4 files present
  |=====================================================================================================================| 100%, Elapsed 07:21
Error in setattr(ans, "names", c(keep.names, paste0("V", seq_len(length(ans) -  : 
  'names' attribute [1] must be the same length as the vector [0]

While troubleshooting, I found that others have encountered the same error at a different stage of the workflow (#2).
Based on the output, the error seems to occur within the pbmclapply call at line 259, although I am not sure exactly where.

I then tested prepare_detergent with the other sample-selection options instead of use.all.
Interestingly, both alternatives, choose.randomly = TRUE and choose.by.clustering = TRUE, ran without error.

Here is the run using choose.randomly = TRUE, selecting 2 of the 4 samples:

pon_detergent <- prepare_detergent(normal.table.path = "/drycleanRun/test_ton.rds",
                                   use.all = FALSE,
                                   choose.randomly = TRUE,
                                   number.of.samples = 2,
                                   choose.by.clustering = FALSE,
                                   num.cores = 2,
                                   build = "hg38",
                                   path.to.save = "drycleanRun/",
                                   nochr = TRUE,
                                   save.pon = TRUE)

### OUTPUT ###
Starting the preparation of Panel of Normal samples a.k.a detergent
4 samples available
Selecting 2 normal samples randomly
PAR file not provided, using hg38 default. If this is not the correct build, please provide a GRange object delineating for corresponding build
PAR read
Checking for existence of files
2 files present
  |============================================================================================================| 100%, Elapsed 03:28
Starting decomposition
This is version 2
Warning: Item 1 has 3031053 rows but longest item has 15155223; recycled with remainder.
Finished making the PON or detergent and saving it to the path provided

And here is the run using choose.by.clustering = TRUE:

pon_detergent <- prepare_detergent(normal.table.path = "/drycleanRun/test_ton.rds",
                                   use.all = FALSE,
                                   choose.randomly = FALSE,
                                   number.of.samples = 2,
                                   choose.by.clustering = TRUE,
                                   num.cores = 2,
                                   build = "hg38",
                                   path.to.save = "drycleanRun/",
                                   nochr = TRUE,
                                   save.pon = TRUE)

### OUTPUT ###
Starting the preparation of Panel of Normal samples a.k.a detergent
4 samples available
Starting the clustering
Starting decomposition on a small section of genome
This is version 2
Starting clustering
PAR file not provided, using hg38 default. If this is not the correct build, please provide a GRange object delineating for corresponding build
PAR read
Checking for existence of files
2 files present
  |============================================================================================================| 100%, Elapsed 01:52
Starting decomposition
This is version 2
Warning: Item 1 has 3031053 rows but longest item has 15155223; recycled with remainder.
Finished making the PON or detergent and saving it to the path provided

The output detergent.rds works: I was able to run start_wash_cycle on it without any problems.
I will likely use the clustering method for further analysis, but I wanted to point out this issue for anyone else who encounters it.

Best,
Patrick

zining01 (Collaborator) commented

Hi Patrick,

Thanks for letting us know about the error. I have not encountered this before on our samples. What happens if you set number.of.samples to the total number of available samples when choosing randomly?

Zi-Ning

zining01 reopened this Feb 28, 2023

pblaney commented Mar 31, 2023

Hello Zi-Ning,

I finally had some time to test your suggestion. Unfortunately, using choose.randomly with number.of.samples set to the total number of samples produces the same error as use.all.

Furthermore, choose.randomly works with 2 of the 4 samples but fails with 3 of 4.
The same occurs with choose.by.clustering.

I'll keep testing to see whether I can identify a pattern or provide more debugging information in case others hit the same issue. I also plan to greatly increase the input sample size, which may help resolve this.
