Julia 3x slower than C on this I/O benchmark #2050

mlubin · 2013-01-15T00:52:07Z

The following Julia code takes about 65 seconds on my machine:

function bench()
    N = 10000
    rows = ["row$i" for i in 1:N]
    cols = ["col$i" for i in 1:N]
    f = open("juliadump","w")
    for i in 1:N
        for j in 1:N
            write(f,"$(rows[i]) $(cols[j])\n")
        end
    end
    close(f)
end

The following C code takes about 15 seconds:

#include <stdio.h>
#include <stdlib.h>

int main() {

    const int N = 10000;
    int i,j; 

    char **rows = malloc(N*sizeof(char*));
    char **cols = malloc(N*sizeof(char*));
    for (i = 0; i < N; i++) {
        rows[i] = malloc(10);
        sprintf(rows[i],"row%d",i);
        cols[i] = malloc(10);
        sprintf(cols[i],"col%d",i);
    }

    FILE *f = fopen("cdump","w");

    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            fprintf(f,"%s %s\n",rows[i],cols[j]);
        }
    }

    fclose(f);

    for (i = 0; i < N; i++) {
        free(rows[i]);
        free(cols[i]);
    }

    free(rows);
    free(cols);
    return 0;
}

Seems like a good test of both memory thrashing and I/O. How can Julia's performance be improved?

mlubin · 2013-01-15T00:56:19Z

Note that unrolling the write loop in Julia does improve performance somewhat. The following code takes about 55 seconds:

function bench2()
    N = 10000
    rows = ["row$i" for i in 1:N]
    cols = ["col$i" for i in 1:N]
    f = open("juliadump","w")
    for i in 1:N
        for j in 1:2:N
            write(f,"$(rows[i]) $(cols[j])\n$(rows[i]) $(cols[j+1])\n")
        end
    end
    close(f)
end

carlobaldassi · 2013-01-15T01:42:19Z

Changing this line

write(f,"$(rows[i]) $(cols[j])\n")

to

@printf(f, "%s %s\n", rows[i], cols[j])

brings the timings down to the same values as C (gcc with -O3) on my machine.

carlobaldassi · 2013-01-15T01:55:29Z

As an alternative, this is even faster:

write(f, rows[i])
write(f, " ")
write(f, cols[j])
write(f, "\n")

On the other hand, using strcat(rows[i], " ", cols[j]) is more than 2x slower (though still better than string interpolation). This probably calls for a method like:

import Base.write
write(io::IO, x1, x...) = (write(io, x1); write(io, x...))

which is just as fast as the "manually unrolled" version above. It currently raises ambiguities, though, with the write methods that are defined with Any as their first argument; I think those should be fixed, BTW.
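
To make the comparison concrete, here is a minimal sketch in current Julia syntax of the three ways of emitting one line discussed so far. The helper names (line_interp, line_pieces, line_printf, emitted) are made up for illustration, and it assumes a Julia version where Base already ships a multi-argument write and @printf lives in the Printf standard library. It only checks that all three produce the same bytes; it says nothing about speed.

```julia
using Printf  # assumption: recent Julia, where @printf lives in the Printf stdlib

# Hypothetical helpers; each emits the same "row col\n" line to `io`.
line_interp(io, r, c) = write(io, "$r $c\n")          # builds a temporary string, then writes it
line_pieces(io, r, c) = write(io, r, " ", c, "\n")    # writes each piece in turn, no temporary
line_printf(io, r, c) = @printf(io, "%s %s\n", r, c)  # format string is parsed when the macro expands

# Run one helper against an in-memory buffer and return what it wrote.
function emitted(f, r, c)
    buf = IOBuffer()
    f(buf, r, c)
    return String(take!(buf))
end

outputs = [emitted(f, "row1", "col2") for f in (line_interp, line_pieces, line_printf)]
@assert all(o -> o == "row1 col2\n", outputs)
```

Swapping the IOBuffer for a file handle gives the inner loop body of the benchmark.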

StefanKarpinski · 2013-01-15T02:03:50Z

I'm glad we can get to parity with C somehow, but we clearly need to get all of these various ways of expressing this up to the same performance.

mlubin · 2013-01-15T02:45:01Z

Huge speedup, nice @carlobaldassi. I agree that this shouldn't be a gotcha though.

carlobaldassi · 2013-01-15T02:46:03Z

Sorry, disregard my previous comment. The fastest option is @printf. Also, when writing to /dev/null to remove hard drive effects, which bring in a lot of variability, I measured Julia's performance to be within a factor of 1.1 of C.

diegozea · 2013-01-15T03:00:13Z

1.1 times C, great!!! :)

IainNZ · 2013-01-15T03:45:57Z

That is fantastic, thanks @carlobaldassi, it really helps Miles and me out!

mlubin · 2013-01-15T04:29:26Z

It might be too much to expect this magic to happen without using a macro, but it should at least be documented.

StefanKarpinski · 2013-01-15T04:41:43Z

The interpolation version could be made a lot closer in performance by not copying string data repeatedly while writing (avoiding those copies is what the printf macro accomplishes).
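
A small sketch of the copying in question, assuming a recent Julia where devnull is exported: string interpolation is shorthand for a call to string(), so each interpolated line is assembled into a fresh String before write ever sees it. The helper names with_temp and without_temp are hypothetical, and the allocation figures depend on the Julia version and machine, so none are quoted here.

```julia
a, b = "row1", "col2"

# Interpolation produces exactly what string() produces: a newly allocated String.
@assert "$a $b\n" == string(a, " ", b, "\n")

# Hypothetical helpers: same output, with and without the temporary String.
function with_temp(io, a, b, n)
    for _ in 1:n
        write(io, "$a $b\n")        # allocate the whole line, then copy it to io
    end
end

function without_temp(io, a, b, n)
    for _ in 1:n
        print(io, a, " ", b, "\n")  # hand each piece to io directly
    end
end

# Call each once first so compilation is not counted, then compare allocations.
with_temp(devnull, a, b, 1)
without_temp(devnull, a, b, 1)
temp_bytes   = @allocated with_temp(devnull, a, b, 10_000)
direct_bytes = @allocated without_temp(devnull, a, b, 10_000)
println("interpolation: $temp_bytes bytes, piecewise: $direct_bytes bytes")
```

Writing to devnull keeps disk effects out of the picture, in the same spirit as the /dev/null measurement mentioned earlier in the thread.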

GlenHertz · 2013-02-07T15:11:54Z

This seems like good info to go in the "Performance Tips" of the manual.

ViralBShah · 2013-06-30T09:18:29Z

Is this just a doc issue now?

StefanKarpinski · 2013-06-30T12:55:36Z

I still think we need to address the performance of the other ways of printing data. It's not really ok that there are some ways of printing things that are a performance trap while others are fast.

ViralBShah · 2013-08-01T05:41:20Z

Bumping to check if there is anything in here that should be captured in documentation or in performance tips.

mlubin · 2013-08-01T16:49:47Z

The original code now runs in 47 seconds for me, so now we're only 3x slower than C, updated title :)

vtjnash · 2013-08-07T18:46:18Z

Probably the only way to make this faster is to use RopeString, but that still incurs an allocation so it may be the same speed, or slower. It's probably not possible to beat the multi-argument call to write (or equivalently @printf). I think we can just close this by adding a statement in the Performance section recommending avoiding string interpolation where possible.
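
As a rough illustration of that recommendation, here is the original bench() reworked to use the multi-argument write instead of interpolation. This is a sketch, not a measured result: the name bench_pieces and the path and N parameters are additions for illustration, so that it can be tried on a small case before running the full N = 10000.

```julia
# Hypothetical variant of bench() from the original report: N and the output
# path are parameters so it can be tried on a small case first.
function bench_pieces(path, N)
    rows = ["row$i" for i in 1:N]
    cols = ["col$i" for i in 1:N]
    open(path, "w") do f              # the do-block form closes f on exit
        for i in 1:N, j in 1:N        # i is the outer loop, j the inner one
            write(f, rows[i], " ", cols[j], "\n")  # no intermediate string per line
        end
    end
end

# Small sanity check on a temporary file.
path = tempname()
bench_pieces(path, 3)
@assert countlines(path) == 9
@assert readline(path) == "row1 col1"
rm(path)
```

The interpolation used to build rows and cols is left alone on purpose: it runs only 2N times, while the write in the inner loop runs N^2 times.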

JeffBezanson · 2013-08-07T18:49:14Z

Yes, RopeStrings are a disaster unless the strings you're concatenating are enormous.

JeffBezanson · 2014-02-16T20:17:08Z

I now see a factor of 2 difference with string interpolation, and the same speed as C without it. I'll add a performance note and close this.

vtjnash · 2014-02-16T20:33:43Z

where'd the extra factor go?

having improved this by a factor of two (to a factor of two of C) is quite nice

JeffBezanson · 2014-02-16T21:31:44Z

I got that number by writing to /dev/null. Not 100% sure why it is faster, but this is also a different machine than in the original report. There have been various improvements here and there.

JeffBezanson added a commit that referenced this issue Jul 2, 2013

faster string() for ASCIIString. helps #2050

dc86b8d

add write(io, xs...)

ViralBShah added a commit that referenced this issue Jul 2, 2013

Revert "faster string() for ASCIIString. helps #2050"

57ad0dd

This reverts commit dc86b8d.

JeffBezanson added a commit that referenced this issue Jul 2, 2013

Revert "Revert "faster string() for ASCIIString. helps #2050""

4eb9f83

This reverts commit 57ad0dd.

JeffBezanson added a commit that referenced this issue Jul 2, 2013

compiler: remove temp vars that block vararg tuple elimination from w…

7fe5847

…orking helps #2050, part of #3440

JeffBezanson closed this as completed in 321e7fe Feb 16, 2014

JeffBezanson mentioned this issue Mar 27, 2015

performance regression in println #10650

Closed

yuyichao mentioned this issue May 27, 2015

Fix Unicode bugs with UTF-16/UTF-32 conversions (#10959) #11004

Closed
