Julia 3x slower than C on this I/O benchmark #2050

Closed
mlubin opened this issue Jan 15, 2013 · 20 comments
Labels
docs This change adds or pertains to documentation io Involving the I/O subsystem: libuv, read, write, etc. performance Must go faster

@mlubin
Member

mlubin commented Jan 15, 2013

The following Julia code takes about 65 seconds on my machine:

function bench()
    N = 10000
    rows = ["row$i" for i in 1:N]
    cols = ["col$i" for i in 1:N]
    f = open("juliadump","w")
    for i in 1:N
        for j in 1:N
            write(f,"$(rows[i]) $(cols[j])\n")
        end
    end
    close(f)
end

The following C code takes about 15 seconds:

#include <stdio.h>
#include <stdlib.h>

int main() {

    const int N = 10000;
    int i,j; 

    char **rows = malloc(N*sizeof(char*));
    char **cols = malloc(N*sizeof(char*));
    for (i = 0; i < N; i++) {
        rows[i] = malloc(10);
        sprintf(rows[i],"row%d",i);
        cols[i] = malloc(10);
        sprintf(cols[i],"col%d",i);
    }

    FILE *f = fopen("cdump","w");

    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            fprintf(f,"%s %s\n",rows[i],cols[j]);
        }
    }

    fclose(f);

    for (i = 0; i < N; i++) {
        free(rows[i]);
        free(cols[i]);
    }

    free(rows);
    free(cols);
    return 0;
}

Seems like a good test of both memory thrashing and I/O. How can Julia's performance be improved?

@mlubin
Member Author

mlubin commented Jan 15, 2013

Note that unrolling the write loop in Julia does improve performance somewhat. The following code takes about 55 seconds:

function bench2()
    N = 10000
    rows = ["row$i" for i in 1:N]
    cols = ["col$i" for i in 1:N]
    f = open("juliadump","w")
    for i in 1:N
        for j in 1:2:N
            write(f,"$(rows[i]) $(cols[j])\n$(rows[i]) $(cols[j+1])\n")
        end
    end
    close(f)
end

@carlobaldassi
Member

Changing this line

write(f,"$(rows[i]) $(cols[j])\n")

to

@printf(f, "%s %s\n", rows[i], cols[j])

brings the timings down to the same values as C (gcc with -O3) on my machine.

@carlobaldassi
Member

As an alternative, this is even faster:

write(f, rows[i])
write(f, " ")
write(f, cols[j])
write(f, "\n")

On the other hand, using strcat(rows[i], " ", cols[j], "\n") is more than 2x slower (though still better than string interpolation). This probably calls for a method like:

import Base.write
write(io::IO, x1, x...) = (write(io, x1); write(io, x...))

which is just as fast as the "manually unrolled" version above. However, it currently raises method ambiguities with the write methods that are defined with Any as their first argument (I think those should be fixed, BTW).
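For comparison, a rough C analogue of the proposed multi-argument write is a variadic function that sends each piece to the stream in turn instead of concatenating the pieces first. This is a minimal sketch, not code from the thread: the name write_all and the NULL-terminated argument convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical C analogue of write(io, x1, x...): write each string
   argument to the stream in order, never building a combined string.
   The argument list must be terminated by (char *)NULL.
   Returns 0 on success, EOF on the first failed write. */
int write_all(FILE *f, const char *first, ...) {
    assert(f != NULL);
    va_list ap;
    va_start(ap, first);
    int rc = 0;
    for (const char *s = first; s != NULL; s = va_arg(ap, char *)) {
        if (fputs(s, f) == EOF) {
            rc = EOF;
            break;
        }
    }
    va_end(ap);
    return rc;
}
```

In the benchmark's inner loop this would be called as write_all(f, rows[i], " ", cols[j], "\n", (char *)NULL). The sentinel is cast to char * because a bare NULL is not guaranteed to have pointer type when passed through a variadic argument list.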

@StefanKarpinski
Member

I'm glad we can get to parity with C somehow, but we clearly need to bring all of these different ways of expressing this up to the same performance.

@mlubin
Member Author

mlubin commented Jan 15, 2013

Huge speedup, nice @carlobaldassi. I agree that this shouldn't be a gotcha though.

@carlobaldassi
Member

Sorry, disregard my previous comment: the fastest option is @printf. Also, when writing to /dev/null to remove hard-drive effects (which add a lot of variability), I measured Julia's performance to be within a factor of 1.1 of C.
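A minimal C harness for this kind of measurement might look like the sketch below. It is an assumption for illustration, not the harness used in this thread: the function name time_fprintf_lines is hypothetical, it formats the integers directly instead of reusing prebuilt row/col strings as the benchmark above does, and it uses C11 timespec_get, which reads wall-clock time rather than a monotonic clock.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Write n*n lines of the form "row<i> col<j>\n" to `path` and return the
   elapsed wall-clock time in seconds, or -1.0 if the file cannot be
   opened. Passing "/dev/null" takes the disk out of the measurement. */
double time_fprintf_lines(const char *path, int n) {
    assert(path != NULL && n >= 0);
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        return -1.0;
    }
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            fprintf(f, "row%d col%d\n", i, j);
        }
    }
    fclose(f); /* the final flush is included in the timing */
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_fprintf_lines("/dev/null", 10000) writes the same 100 million lines as the benchmark, but none of them reach a disk, so run-to-run variability should mostly reflect formatting and buffering costs.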

@diegozea
Contributor

1.1 times C, great!!! :)

@IainNZ
Member

IainNZ commented Jan 15, 2013

That is fantastic, thanks @carlobaldassi, this really helps Miles and me out!

@mlubin
Member Author

mlubin commented Jan 15, 2013

It might be too much to expect this magic to happen without using a macro, but it should at least be documented.

@StefanKarpinski
Member

The non-macro versions could get a lot closer in performance if they did not copy string data repeatedly along the way (avoiding those copies is what the printf macro accomplishes).
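To make the copying point concrete, here is a C sketch contrasting the two strategies. It is an illustrative assumption, not a description of how Julia implements interpolation or @printf internally: write_line_concat loosely mimics building a fresh string per line and then writing it, while write_line_direct formats straight into the stream's own buffer. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Interpolation-style (loosely like write(f, "$(a) $(b)\n")): allocate a
   fresh buffer, copy every piece into it, then copy the whole buffer again
   into the stream. Returns 0 on success, -1 on error. */
int write_line_concat(FILE *f, const char *a, const char *b) {
    assert(f != NULL && a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    char *buf = malloc(la + lb + 3); /* room for ' ', '\n' and NUL */
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, a, la);
    buf[la] = ' ';
    memcpy(buf + la + 1, b, lb);
    buf[la + 1 + lb] = '\n';
    buf[la + 2 + lb] = '\0';
    int rc = (fputs(buf, f) == EOF) ? -1 : 0;
    free(buf); /* one malloc/free pair per line */
    return rc;
}

/* Direct (loosely like @printf or writing each piece): the pieces go
   straight into the stream's buffer with no per-line heap allocation. */
int write_line_direct(FILE *f, const char *a, const char *b) {
    assert(f != NULL && a != NULL && b != NULL);
    return (fprintf(f, "%s %s\n", a, b) < 0) ? -1 : 0;
}
```

Both functions emit identical bytes; the concat version additionally pays for a heap allocation and an extra copy of every character on each line, which adds up over the benchmark's 100 million lines.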

@GlenHertz
Contributor

This seems like good info to go in the "Performance Tips" of the manual.

@ViralBShah
Member

Is this just a doc issue now?

@StefanKarpinski
Member

I still think we need to address the performance of the other ways of printing data. It's not really ok that there are some ways of printing things that are a performance trap while others are fast.

JeffBezanson added a commit that referenced this issue Jul 2, 2013
ViralBShah added a commit that referenced this issue Jul 2, 2013
JeffBezanson added a commit that referenced this issue Jul 2, 2013
@ViralBShah
Member

Bumping to check if there is anything in here that should be captured in documentation or in performance tips.

@mlubin
Member Author

mlubin commented Aug 1, 2013

The original code now runs in 47 seconds for me, so we're now only 3x slower than C; I've updated the title :)

@vtjnash
Member

vtjnash commented Aug 7, 2013

Probably the only way to make this faster is to use RopeString, but that still incurs an allocation, so it may be the same speed or slower. It's probably not possible to beat the multi-argument call to write (or, equivalently, @printf). I think we can close this by adding a statement to the Performance Tips section recommending that string interpolation be avoided where possible.
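Why a rope still allocates can be seen in a minimal rope sketch in C. This is a hypothetical illustration of the general data structure, not Julia's RopeString implementation: concatenation stores pointers to the two halves instead of copying characters, but every join still heap-allocates a node, and writing the result means walking the tree.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A node is either a leaf that borrows existing string data, or an inner
   node joining two sub-ropes. Freeing is omitted for brevity. */
typedef struct Rope {
    const struct Rope *left, *right; /* both NULL for a leaf */
    const char *leaf;                /* borrowed, not copied */
    size_t len;                      /* total characters in this rope */
} Rope;

Rope *rope_leaf(const char *s) {
    assert(s != NULL);
    Rope *r = malloc(sizeof *r);
    if (r == NULL) {
        return NULL;
    }
    r->left = r->right = NULL;
    r->leaf = s;
    r->len = strlen(s);
    return r;
}

/* Joining copies no characters, but still costs one allocation. */
Rope *rope_concat(const Rope *a, const Rope *b) {
    assert(a != NULL && b != NULL);
    Rope *r = malloc(sizeof *r);
    if (r == NULL) {
        return NULL;
    }
    r->left = a;
    r->right = b;
    r->leaf = NULL;
    r->len = a->len + b->len;
    return r;
}

/* Write the rope's characters in order by walking the tree. */
int rope_write(FILE *f, const Rope *r) {
    assert(f != NULL && r != NULL);
    if (r->leaf != NULL) {
        return (fputs(r->leaf, f) == EOF) ? EOF : 0;
    }
    if (rope_write(f, r->left) == EOF) {
        return EOF;
    }
    return rope_write(f, r->right);
}
```

For short pieces such as "row1" and "col2", the node allocations and the tree walk can easily cost more than simply copying the few characters involved, which is consistent with the point that ropes only pay off for very large strings.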

@JeffBezanson
Member

Yes, RopeStrings are a disaster unless the strings you're concatenating are enormous.

@JeffBezanson
Member

I now see a factor of 2 difference with string interpolation, and the same speed as C without it. I'll add a performance note and close this.

@vtjnash
Member

vtjnash commented Feb 16, 2014

Where'd the extra factor go?

Having improved this by a factor of two (to within a factor of two of C) is quite nice.

@JeffBezanson
Member

I got that number by writing to /dev/null. Not 100% sure why it is faster, but this is also a different machine than in the original report. There have been various improvements here and there.


9 participants