bed_cluster grouping different contigs together #388

Closed
kcamnairb opened this issue Aug 4, 2022 · 2 comments

@kcamnairb

Hi, I've noticed a strange result with bed_cluster: it assigns two different contigs with similar names to the same cluster.

library(tidyverse)
library(valr)
tibble::tribble(
  ~chrom, ~start,   ~end,
  "contig_22", 177581, 177582,
  "contig_223",   5111,   5112,
) %>%
  bed_cluster(max_dist=10000)
## A tibble: 2 × 4
#  chrom       start    end   .id
#  <chr>       <dbl>  <dbl> <int>
#1 contig_22  177581 177582     1
#2 contig_223   5111   5112     1

But if the contig names are changed slightly, bed_cluster correctly assigns them to different clusters:

tibble::tribble(
  ~chrom, ~start,   ~end,
  "contig_22", 177581, 177582,
  "contig_123",   5111,   5112,
) %>%
  bed_cluster(max_dist=10000)
## A tibble: 2 × 4
#  chrom       start    end   .id
#  <chr>       <dbl>  <dbl> <int>
#1 contig_123   5111   5112     1
#2 contig_22  177581 177582     2

@kriemo
Member

kriemo commented Aug 4, 2022

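Thanks for the bug report. This looks like an issue with how we handle the max_dist argument: in the reprex below, the two contigs stay in separate clusters for max_dist up to 5110 but are merged at max_dist = 5111. I'll dig into this a bit more and get back to you when we have a fix.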

library(tidyverse)
library(valr)
a <- tibble::tribble(
  ~chrom, ~start,   ~end,
  "contig_22", 177581, 177582,
  "contig_223",   5111,   5112,
) 
bed_cluster(a, max_dist=0)
#> # A tibble: 2 × 4
#>   chrom       start    end   .id
#>   <chr>       <dbl>  <dbl> <int>
#> 1 contig_22  177581 177582     1
#> 2 contig_223   5111   5112     2
bed_cluster(a, max_dist=5110)
#> # A tibble: 2 × 4
#>   chrom       start    end   .id
#>   <chr>       <dbl>  <dbl> <int>
#> 1 contig_22  177581 177582     1
#> 2 contig_223   5111   5112     2
bed_cluster(a, max_dist=5111)
#> # A tibble: 2 × 4
#>   chrom       start    end   .id
#>   <chr>       <dbl>  <dbl> <int>
#> 1 contig_22  177581 177582     1
#> 2 contig_223   5111   5112     1

Created on 2022-08-04 by the reprex package (v2.0.1)
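
For comparison, here is a rough base-R sketch of the behaviour one would expect: intervals on different contigs never share an `.id`, whatever `max_dist` is. `cluster_by_chrom` is a hypothetical reference function, not valr's implementation, and the boundary convention (a gap exactly equal to `max_dist` still joins the cluster) is an assumption made for illustration.

```r
# Hypothetical reference function (not valr's code): intervals share a
# cluster id only if they are on the same chrom and the gap to the
# running end of the current cluster is at most max_dist.
cluster_by_chrom <- function(x, max_dist = 0) {
  x <- x[order(x$chrom, x$start), ]
  id <- integer(nrow(x))
  cur_id <- 0L
  cur_chrom <- NA_character_
  cur_end <- -Inf
  for (i in seq_len(nrow(x))) {
    new_chrom <- !identical(x$chrom[i], cur_chrom)
    if (new_chrom || x$start[i] - cur_end > max_dist) {
      # Start a new cluster on a chrom change or a gap wider than max_dist
      cur_id <- cur_id + 1L
      cur_end <- x$end[i]
    } else {
      # Extend the current cluster
      cur_end <- max(cur_end, x$end[i])
    }
    cur_chrom <- x$chrom[i]
    id[i] <- cur_id
  }
  x$.id <- id
  x
}

# Same intervals as in the reprex above
a <- data.frame(
  chrom = c("contig_22", "contig_223"),
  start = c(177581, 5111),
  end   = c(177582, 5112)
)

cluster_by_chrom(a, max_dist = 5111)$.id  # 1 2
```

Because the chrom comparison uses the full name, `contig_22` and `contig_223` stay apart even at the `max_dist` value where the reprex above merges them.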

@kriemo
Member

kriemo commented Aug 4, 2022

This issue should now be fixed in the main branch. To install it, use:

# install.packages("devtools")
devtools::install_github('rnabioco/valr')

Feel free to reopen if you find the issue is not fixed. Thanks again for the bug report and your interest in valr.
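
One way to check the result after reinstalling is to confirm that no `.id` is shared by more than one `chrom`. A minimal base-R sketch follows; `ids_spanning_chroms` is a hypothetical helper and not part of valr, and the `buggy` data frame simply retypes the output from the original report so the example runs without valr installed.

```r
# Hypothetical helper (not part of valr): return the cluster ids that
# are attached to more than one chrom/contig name.
ids_spanning_chroms <- function(clustered) {
  # Number of distinct chrom values per cluster id
  n_chroms <- tapply(clustered$chrom, clustered$.id,
                     function(x) length(unique(x)))
  as.integer(names(n_chroms)[n_chroms > 1])
}

# Output copied from the original report (max_dist = 10000)
buggy <- data.frame(
  chrom = c("contig_22", "contig_223"),
  start = c(177581, 5111),
  end   = c(177582, 5112),
  .id   = c(1L, 1L)
)

# The same rows with the contigs in separate clusters
fixed <- transform(buggy, .id = c(1L, 2L))

ids_spanning_chroms(buggy)  # 1
ids_spanning_chroms(fixed)  # integer(0)
```

In practice the argument would be the tibble returned by `bed_cluster()`; an empty result means no cluster spans multiple contigs.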

kriemo closed this as completed Aug 4, 2022