Performance issue with open(f, file) #7901

Closed
jiahao opened this issue Aug 7, 2014 · 3 comments
Labels: performance (Must go faster)

Comments

@jiahao
Member

jiahao commented Aug 7, 2014

To set up this computation:

```julia
download("ftp://ftp.ensembl.org/pub/release-67/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa.gz", "test.fa.gz")
run(`gunzip test.fa.gz`)
```

Version 1:

```julia
function parse(filename)
    f=open(filename) ###
    stuff=readall(f)      ###
    gc = at = 0
    for c in stuff.data
        gc += ifelse((c=='G')||(c=='C'), 1, 0)
        at += ifelse((c=='A')||(c=='T'), 1, 0)
    end
    gc_frac = gc / (gc + at)
    println("GC count: ", gc_frac)
end

parse("test.fa")
```

```
$ time julia v1.jl
GC count: 0.3762174467726095

real    0m1.389s
user    0m1.911s
sys     0m1.352s
```

Version 2:

```julia
function parse(filename)
    stuff = open(readall, filename) ###
    gc = at = 0
    for c in stuff.data
        gc += ifelse((c=='G')||(c=='C'), 1, 0)
        at += ifelse((c=='A')||(c=='T'), 1, 0)
    end
    gc_frac = gc / (gc + at)
    println("GC count: ", gc_frac)
end

parse("test.fa")
```

```
$ time julia v2.jl
GC count: 0.3762174467726095

real    0m35.539s
user    0m35.286s
sys     0m2.159s
```

```
julia> versioninfo()
Julia Version 0.3.0-rc2+37
Commit fc56f24 (2014-08-07 19:33 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E7- 8850  @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
```

Context from this gist

@simonster
Member

The type of `stuff` can't be inferred because `readall` is passed to `open` as a function-valued argument. As a result, everything else in the function that depends on `stuff` (including the whole loop) falls back to dynamic dispatch, which is slow. To make this faster, we'd need type inference for calls with function-valued arguments, or #5654.

@vtjnash
Member

vtjnash commented Aug 8, 2014

Alternatively, break your function up just before large loops, moving each loop into its own function (I think this is mentioned in the manual?).
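A minimal sketch of that suggestion (often called a function barrier). It is written for current Julia 1.x rather than the 0.3 release in this report, because `readall` and the `String` `.data` field no longer exist; it uses `read(io, String)` and `codeunits` instead, and compares bytes against `UInt8('G')` etc. since a `UInt8` no longer compares equal to a `Char`. The names `gc_fraction` and `parse_gc` are hypothetical. The loop lives in a kernel function, so even if the caller could not infer the type of `stuff`, dynamic dispatch would happen once, at the call into the kernel, and the kernel is compiled for the concrete byte-vector type it actually receives.

```julia
# Kernel (hypothetical name): by the time this runs, the concrete type of
# `bytes` is known, so the loop is compiled as specialized code.
function gc_fraction(bytes::AbstractVector{UInt8})
    gc = at = 0
    for c in bytes
        # Compare raw bytes against byte values of the nucleotide letters.
        gc += ifelse((c == UInt8('G')) | (c == UInt8('C')), 1, 0)
        at += ifelse((c == UInt8('A')) | (c == UInt8('T')), 1, 0)
    end
    return gc / (gc + at)
end

# Outer function (hypothetical name): reads the file via the
# function-valued form of `open`, as Version 2 does.
function parse_gc(filename)
    stuff = open(io -> read(io, String), filename)
    # Even if `stuff` were poorly inferred here (as `open(readall, filename)`
    # was in Julia 0.3), only this single call would need dynamic dispatch.
    gc_frac = gc_fraction(codeunits(stuff))
    println("GC fraction: ", gc_frac)
    return gc_frac
end
```

Calling `parse_gc("test.fa")` computes the same quantity as Versions 1 and 2; how the timing compares on a particular Julia release is not something this sketch measures.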

@JeffBezanson
Member

Closing as a duplicate of #1864 and #210.


4 participants