tcpsocket supports a send buffer #6876

Closed: amitmurthy wants to merge 3 commits

Conversation

@amitmurthy (Contributor)

In the same spirit as #6768, this PR avoids a copy while transmitting large arrays between workers. It does this by

  1. moving the buffering functionality of worker connections into tcpsocket itself, and
  2. writing large arrays directly to the socket instead of via an intermediate buffer (sketched below).
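
A minimal, self-contained sketch of that write policy (in modern Julia syntax, which postdates this PR; the BufferedSocket type and helper names here are illustrative, not the PR's actual code, and SENDBUF_SZ is the 1 MB threshold discussed later in this thread):

const SENDBUF_SZ = 1024 * 1024  # 1 MB threshold, per the discussion below

mutable struct BufferedSocket{T<:IO}
    sock::T            # the underlying socket (any IO for this sketch)
    sendbuf::IOBuffer  # coalesces small pending writes
end
BufferedSocket(sock::IO) = BufferedSocket(sock, IOBuffer())

# Drain buffered bytes to the socket before a direct write,
# so byte ordering on the wire is preserved.
function flushbuf(s::BufferedSocket)
    data = take!(s.sendbuf)
    isempty(data) || write(s.sock, data)
    nothing
end

function Base.write(s::BufferedSocket, a::Vector{UInt8})
    if sizeof(a) >= SENDBUF_SZ
        flushbuf(s)
        write(s.sock, a)      # large array: straight to the socket, no intermediate copy
    else
        write(s.sendbuf, a)   # small write: into the send buffer
    end
end

Flushing before the direct write keeps bytes ordered on the wire; a real implementation would also need to drain the buffer on flush/close.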

@hisq commented May 17, 2014

Probably also fixes #4508.

@JeffBezanson (Member)

Awesome!! Looks pretty good but I will read it in more detail later.

@vtjnash (Member) commented May 19, 2014

would it make sense to make this a feature of all our streams, add generic lock/unlock functions, and utilize them whenever entering/exiting IO-related functions (e.g. print/show)?

@Keno (Member) commented May 19, 2014

I agree with Jameson. This looks fine to me and I agree that we shouldn't by default buffer just a single write call, since that would be confusing.

@amitmurthy (Contributor, Author)

@vtjnash did you mean all streams (including IOStream) or only AsyncStreams?

@vtjnash (Member) commented May 19, 2014

perhaps only AsyncStreams, but all streams would be even nicer. it doesn't entirely make sense for IOBuffer to be pre-buffered, but it does make sense for it to have a write lock.

This would make IOStream double-buffered, but we could then replace IOStream with uv_file in general and get non-blocking IO for everything (which could potentially make the Pkg manager faster, since it appears to be I/O limited doing independent operations on metadata for a lot of files).

@JeffBezanson (Member)

Yes, with a buffer on the libuv File we can get rid of IOStream entirely (assuming it will perform well).

@amitmurthy (Contributor, Author) commented on the diff, at the line

make_lockable(s::AsyncStream) = (s.write_lock = RemoteRef())

Currently using a RemoteRef for locking. Wondering if I should change to use libuv mutexes. Local remote refs in workers suffer from the fact that they cannot be initialized till the worker id is known. Also libuv mutexes should be faster and more lightweight.
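
For context, a RemoteRef can act as a lock because it holds at most one value: put! blocks while the slot is full, and take! empties it. A hypothetical sketch (lock_w/unlock_w are the names used later in this thread; this may not match the PR's exact code):

# Julia 0.3-era sketch: the RemoteRef slot is the lock.
lock_w(s::AsyncStream)   = put!(s.write_lock, true)   # acquire; blocks if already held
unlock_w(s::AsyncStream) = take!(s.write_lock)        # release; empties the slot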

@amitmurthy (Contributor, Author)

Just realized that libuv mutexes are recursive, and all Julia tasks run on the same thread, so a second task trying to acquire the lock would succeed immediately instead of blocking. They may not be usable.

@amitmurthy (Contributor, Author)

Have moved the implementation to support all AsyncStreams (except UDPSocket).

@amitmurthy (Contributor, Author)

No tidy way to solve #3787 without locks though. With locking (if we decide to export it), user code could look like:

Base.make_lockable(STDOUT)
@sync begin
    for i in 1:3
        @async begin
            Base.lock_w(STDOUT)
            println("Getting garbled")
            Base.unlock_w(STDOUT)
        end
    end
end

@amitmurthy (Contributor, Author)

Bump. As stated it now supports all AsyncStreams. Feedback?

Merging of File and IOStream is a bit more involved and I would like to do that separately.

@Keno (Member) commented Jun 2, 2014

What's the need for having the lock inside the IOStream rather than as a separate thing?

@amitmurthy (Contributor, Author)

It is more intuitive to lock a stream and then write to it multiple times, rather than have user code manage a lock separately. Sort of like lockf/fcntl on an open file fd.

Anyway, the lock in the IOStream object is not initialized till make_lockable is called on the stream. This is required since a local RemoteRef cannot be instantiated properly on a worker till its Julia pid is known, and the pid is known only when the worker receives the first initialization message from the master process.

@amitmurthy (Contributor, Author)

Bump. It would be great if this could make it into 0.3.

@andreasnoack (Member)

What is the status here? I get a nice speedup from this on large arrays. See the yellow graph below: it shows the time it takes to copy an array of Float64s to another process on my laptop, as a function of the array size. The hump seems to be genuine, as I have rerun the tests a couple of times and the reported times are minima of ten runs. Maybe it is related to the way the buffer grows?

There also seems to be quite a bit of overhead in @spawn compared to MPI Send/Recv. Is this tracked in an issue, or just considered unavoidable because of the way @spawn works?

[plot: time to copy a Float64 array to another process vs. array size]

@amitmurthy (Contributor, Author)

It is waiting for a review by @JeffBezanson.

The hump comes about because this optimization does not kick in for arrays smaller than 1 MB in byte size, which is the value of const SENDBUF_SZ.

I think quite a bit of the overhead in @spawn is due to serialization itself. For example:

julia> iob = IOBuffer()

julia> a = ones(10^8);

julia> @time remotecall_fetch(2, x->x, a);
elapsed time: 0.925497984 seconds (1769649824 bytes allocated, 5.42% gc time)

julia> @time remotecall_fetch(2, x->x, a);
elapsed time: 0.914441734 seconds (1767317752 bytes allocated, 3.63% gc time)

julia> @time serialize(iob, a);
elapsed time: 0.239022428 seconds (967168640 bytes allocated, 5.84% gc time)

julia> seekstart(iob)

julia> @time serialize(iob, a);
elapsed time: 0.131713773 seconds (112 bytes allocated)

So, a local serialization into a pre-allocated buffer takes 0.13 seconds; if not pre-allocated, it takes 0.23 seconds. Deserialization and serialization on the remote end will show similar figures.

@amitmurthy (Contributor, Author)

Closed in favor of #10073

@amitmurthy closed this Feb 4, 2015