[BUG] Character List columns cause an error in join_overlap_left() #91

Open
smped opened this issue Jul 2, 2021 · 3 comments

smped commented Jul 2, 2021

Hi Stuart,

Hope things are going well & I'm still finding this to be such a useful package.

I've come across a problem with join_overlap_left() when the right-hand ranges contain a CharacterList column, as can be returned by reduce_ranges() depending on the function being used. If there is a CharacterList column, the function simply fails with the error:

Error: subscript contains NAs

As a minimal reproducible example:

library(plyranges)
x <- GRanges(c("chr1:1-10", "chr1:21-30")) 
y <- GRanges("chr1:25-30") %>% mutate(letter = CharacterList("a"))
join_overlap_left(x, y)
Error: subscript contains NAs

This produces the above error. However, the same error doesn't occur when using a generic S3 list column:

y$letter <- as(y$letter, "list") 
join_overlap_left(x, y)

GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand | letter
         <Rle> <IRanges>  <Rle> | <list>
  [1]     chr1      1-10      * |       
  [2]     chr1     21-30      * |      a
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
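
The conversion above can be generalised as a stopgap until the join itself handles these columns. The following is a minimal sketch, not part of plyranges: `atomic_lists_to_list()` is a hypothetical helper name, and it assumes that every affected column inherits from `AtomicList` (as `CharacterList` does) and that `as(col, "list")` behaves for those columns as it does in the example above.

```r
library(plyranges)

# Hypothetical helper (not a plyranges function): replace every AtomicList
# metadata column (e.g. CharacterList) with a plain base R list column.
atomic_lists_to_list <- function(gr) {
  for (nm in colnames(mcols(gr))) {
    col <- mcols(gr)[[nm]]
    if (is(col, "AtomicList")) {
      mcols(gr)[[nm]] <- as(col, "list")
    }
  }
  gr
}

x <- GRanges(c("chr1:1-10", "chr1:21-30"))
y <- GRanges("chr1:25-30") %>% mutate(letter = CharacterList("a"))

# Convert the right-hand ranges before joining.
join_overlap_left(x, atomic_lists_to_list(y))
```

This only sidesteps the error for list-like atomic columns; other S4 column types would need their own conversion.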

I also noticed that I couldn't find a way to change the original CharacterList column into an S3 list using mutate(), but that might be a side issue:

y <- GRanges("chr1:25-30") %>% mutate(letter = CharacterList("a"))
mutate(y, letter = as(letter, "list"))

GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |      letter
         <Rle> <IRanges>  <Rle> | <character>
  [1]     chr1     25-30      * |           a
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

This shouldn't (to my mind) produce a character column, but should return a list column. If the object is more complicated than my toy example, it can cause the data to fall apart pretty badly.

y <- GRanges(c("chr1:25-30", "chr1:101")) %>% 
  mutate(letter = CharacterList(list("a", c("b", "c")))) 
y %>%
  mutate(letter = as(letter, "list"))

GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |      letter
         <Rle> <IRanges>  <Rle> | <character>
  [1]     chr1     25-30      * |           a
  [2]     chr1       101      * |           a
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
Warning message:
In recycleSingleBracketReplacementValue(value, x, nsbs) :
  number of values supplied is not a sub-multiple of the number of values to be replaced

Hopefully that's not too much information.

Cheers,

Steve

R session information

Session info ──────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.1.0 (2021-05-18)
 os       Ubuntu 20.04.2 LTS          
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  C.UTF-8                     
 ctype    C.UTF-8                     
 tz       Australia/Adelaide          
 date     2021-07-02

Packages ──────────────────────────────────────────────────────────────────
 package              * version  date       lib source        
 assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.1.0)
 Biobase                2.52.0   2021-05-19 [1] Bioconductor  
 BiocGenerics         * 0.38.0   2021-05-19 [1] Bioconductor  
 BiocIO                 1.2.0    2021-05-19 [1] Bioconductor  
 BiocManager            1.30.16  2021-06-15 [1] CRAN (R 4.1.0)
 BiocParallel           1.26.0   2021-05-19 [1] Bioconductor  
 Biostrings             2.60.1   2021-06-06 [1] Bioconductor  
 bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.1.0)
 cli                    3.0.0    2021-06-30 [1] CRAN (R 4.1.0)
 crayon                 1.4.1    2021-02-08 [1] CRAN (R 4.1.0)
 DBI                    1.1.1    2021-01-15 [1] CRAN (R 4.1.0)
 DelayedArray           0.18.0   2021-05-19 [1] Bioconductor  
 digest                 0.6.27   2020-10-24 [1] CRAN (R 4.1.0)
 dplyr                  1.0.7    2021-06-18 [1] CRAN (R 4.1.0)
 ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.1.0)
 evaluate               0.14     2019-05-28 [1] CRAN (R 4.1.0)
 fansi                  0.5.0    2021-05-25 [1] CRAN (R 4.1.0)
 fs                     1.5.0    2020-07-31 [1] CRAN (R 4.1.0)
 generics               0.1.0    2020-10-31 [1] CRAN (R 4.1.0)
 GenomeInfoDb         * 1.28.0   2021-05-19 [1] Bioconductor  
 GenomeInfoDbData       1.2.6    2021-06-28 [1] Bioconductor  
 GenomicAlignments      1.28.0   2021-05-19 [1] Bioconductor  
 GenomicRanges        * 1.44.0   2021-05-19 [1] Bioconductor  
 glue                   1.4.2    2020-08-27 [1] CRAN (R 4.1.0)
 htmltools              0.5.1.1  2021-01-22 [1] CRAN (R 4.1.0)
 httpuv                 1.6.1    2021-05-07 [1] CRAN (R 4.1.0)
 IRanges              * 2.26.0   2021-05-19 [1] Bioconductor  
 knitr                  1.33     2021-04-24 [1] CRAN (R 4.1.0)
 later                  1.2.0    2021-04-23 [1] CRAN (R 4.1.0)
 lattice                0.20-44  2021-05-02 [4] CRAN (R 4.1.0)
 lifecycle              1.0.0    2021-02-15 [1] CRAN (R 4.1.0)
 magrittr               2.0.1    2020-11-17 [1] CRAN (R 4.1.0)
 Matrix                 1.3-4    2021-06-01 [4] CRAN (R 4.1.0)
 MatrixGenerics         1.4.0    2021-05-19 [1] Bioconductor  
 matrixStats            0.59.0   2021-06-01 [1] CRAN (R 4.1.0)
 pillar                 1.6.1    2021-05-16 [1] CRAN (R 4.1.0)
 pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.1.0)
 plyranges            * 1.12.1   2021-06-29 [1] Bioconductor  
 promises               1.2.0.1  2021-02-11 [1] CRAN (R 4.1.0)
 purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.1.0)
 R6                     2.5.0    2020-10-28 [1] CRAN (R 4.1.0)
 Rcpp                   1.0.6    2021-01-15 [1] CRAN (R 4.1.0)
 RCurl                  1.98-1.3 2021-03-16 [1] CRAN (R 4.1.0)
 restfulr               0.0.13   2017-08-06 [1] CRAN (R 4.1.0)
 rjson                  0.2.20   2018-06-08 [1] CRAN (R 4.1.0)
 rlang                  0.4.11   2021-04-30 [1] CRAN (R 4.1.0)
 rmarkdown              2.9      2021-06-15 [1] CRAN (R 4.1.0)
 Rsamtools              2.8.0    2021-05-19 [1] Bioconductor  
 rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.1.0)
 rtracklayer            1.52.0   2021-05-19 [1] Bioconductor  
 S4Vectors            * 0.30.0   2021-05-19 [1] Bioconductor  
 sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.1.0)
 SummarizedExperiment   1.22.0   2021-05-19 [1] Bioconductor  
 tibble                 3.1.2    2021-05-16 [1] CRAN (R 4.1.0)
 tidyselect             1.1.1    2021-04-30 [1] CRAN (R 4.1.0)
 utf8                   1.2.1    2021-03-12 [1] CRAN (R 4.1.0)
 vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.1.0)
 withr                  2.4.2    2021-04-18 [1] CRAN (R 4.1.0)
 workflowr            * 1.6.2    2020-04-30 [1] CRAN (R 4.1.0)
 xfun                   0.24     2021-06-15 [1] CRAN (R 4.1.0)
 XML                    3.99-0.6 2021-03-16 [1] CRAN (R 4.1.0)
 XVector                0.32.0   2021-05-19 [1] Bioconductor  
 yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.1.0)
 zlibbioc               1.38.0   2021-05-19 [1] Bioconductor  

smped commented Jul 2, 2021

I should also add that I tracked it down to the following line in .join_overlap_left():

mcols_outer <- na_dframe(mcols(right), sum(only_left))

Might save you a few minutes while debugging
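
For context, the error message can be probed without plyranges. The sketch below assumes (this is not confirmed in the thread) that `na_dframe()` ends up indexing the metadata columns with `NA` subscripts to build the rows for unmatched ranges. It only contrasts how a base list and a `CharacterList` respond to such a subscript.

```r
library(IRanges)

# A base R list tolerates an NA subscript and yields a NULL element.
list("a")[NA_integer_]

# A CharacterList is expected to reject the same subscript; the condition
# message is captured here rather than assumed.
tryCatch(
  CharacterList("a")[NA_integer_],
  error = function(e) conditionMessage(e)
)
```

If the second call does report the same message as the join, that would be consistent with the S3 list column working while the CharacterList column fails.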

Collaborator

sa-lee commented Jul 7, 2021

Thanks for the report Steve, I'll try to get to this one on the weekend :)

hw538 commented Jul 7, 2022

The same bug occurs when the metadata column is a DNAStringSet. Would you mind adding support for this? Thank you.
