How to write the function called findoverlaps with minoverlap>=1? #48

Closed
zhangchunyong999 opened this issue Nov 20, 2022 · 1 comment

zhangchunyong999 commented Nov 20, 2022

Hello, bioinformaticians.
I want to write a function called findoverlaps, like the one in R's Bioconductor. Here is what I have tried so far.

using GenomicFeatures
using DataFrames
function number_interval(tp::Tuple)
	# Unpack Tuple.
	(i, interval) = tp

	# Setup numbered metadata.
	new_metadata = (
		i = i,
		original = GenomicFeatures.metadata(interval)
	)

	# Create new interval with numbered metadata.
	return Interval(
		seqname(interval),
		leftposition(interval),
		rightposition(interval),
		strand(interval),
		new_metadata
	)
end
function findoverlaps(query, subject)
    # Tag every interval with its position in its own collection.
    query_numbered = query |> enumerate .|> number_interval
    subject_numbered = subject |> enumerate .|> number_interval

    # Collect one (query index, subject index) pair per overlapping pair.
    hits = Vector{Tuple{Int64, Int64}}()
    for (q, r) in eachoverlap(query_numbered, subject_numbered)
        result = (
            GenomicFeatures.metadata(q).i,
            GenomicFeatures.metadata(r).i
        )
        push!(hits, result)
    end

    return rename!(DataFrame(hits), [:queryHits, :subjectHits])
end

col = [
	Interval("chr1", 10628, 10683, '?', "abc")
	Interval("chr1", 10643, 10779, '?', "abc")
	Interval("chr1", 10645, 10748, '?', "abc")
	Interval("chr1", 10648, 10786, '?', "abc")
] |> IntervalCollection

hhh = [
	Interval("chr1", 10631, 10638)
	Interval("chr1", 10633, 10635)
	Interval("chr1", 10636, 10650)
	Interval("chr1", 10638, 10649)
	Interval("chr1", 10641, 10651)
] |> IntervalCollection

I ran findoverlaps, and it returned the following.

julia> overlap=findoverlaps(col,hhh)
14×2 DataFrame
 Row │ queryHits  subjectHits
     │ Int64      Int64
─────┼────────────────────────
   1 │         1            1
   2 │         1            2
   3 │         1            3
   4 │         1            4
   5 │         1            5
   6 │         2            3
   7 │         2            4
   8 │         2            5
   9 │         3            3
  10 │         3            4
  11 │         3            5
  12 │         4            3
  13 │         4            4
  14 │         4            5

But I cannot work out how to support minoverlap=5. For example, the first interval in hhh overlaps the first interval in col by more than 5 positions, so the indices of both intervals should be output. The second interval in hhh overlaps by fewer than 5 positions, so it should not appear in the final DataFrame. The function above seems to handle only minoverlap=1. What should I do to solve this problem? Thanks to everyone for helping!
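
One possible way to approach this, as a minimal sketch rather than a definitive implementation: count how many positions two intervals share and keep a pair only when that count reaches minoverlap. The sketch assumes that minoverlap means the minimum number of shared positions and that intervals are closed and 1-based. To stay independent of any package, it uses plain integer ranges in place of Interval objects. The names overlap_width and findoverlaps_ranges are hypothetical helpers, not part of GenomicFeatures, and the all-pairs loop is a brute-force stand-in for eachoverlap.

```julia
# Number of positions shared by two closed integer ranges.
# An empty intersection has length 0, so non-overlapping ranges give 0.
overlap_width(a::UnitRange{Int}, b::UnitRange{Int}) = length(intersect(a, b))

# Hypothetical helper: compare every query range with every subject range and
# keep the index pairs that share at least `minoverlap` positions.
function findoverlaps_ranges(query, subject; minoverlap::Int = 1)
    hits = Tuple{Int, Int}[]
    for (i, q) in enumerate(query), (j, s) in enumerate(subject)
        if overlap_width(q, s) >= minoverlap
            push!(hits, (i, j))
        end
    end
    return hits
end

# The same coordinates as `col` and `hhh` above, written as plain ranges.
col_ranges = [10628:10683, 10643:10779, 10645:10748, 10648:10786]
hhh_ranges = [10631:10638, 10633:10635, 10636:10650, 10638:10649, 10641:10651]

hits = findoverlaps_ranges(col_ranges, hhh_ranges; minoverlap = 5)
```

In the findoverlaps function above, the same check could probably be applied inside the eachoverlap loop by building the two ranges as leftposition(q):rightposition(q) and leftposition(r):rightposition(r), then calling push! only when their shared width is at least minoverlap.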

@CiaranOMara
Member
