
upload file - slow performance #382

Closed
SofianJabal opened this issue Apr 2, 2020 · 5 comments
Comments

@SofianJabal

Hi, I found that copying a ~200 MB file with the rpyc module started taking a lot longer when I moved from 4.1.2 to 4.1.4:
it used to take less than 2 minutes, but now it takes ~20 minutes.
I'm using the upload_file function.
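
For reference, the call in question looks roughly like this (a sketch only: the host name and paths are placeholders, and it assumes the classic-mode upload_file helper from rpyc.utils.classic):

import rpyc
from rpyc.utils.classic import upload_file

# classic-mode connection to an rpyc_classic server on the remote host
conn = rpyc.classic.connect("remote-host")

# copy a local ~200 MB file to the remote machine over the connection
upload_file(conn, "/tmp/big_file.bin", "/tmp/big_file.bin")

conn.close()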

After debugging the issue, I found the problematic commit
It is 020f9fd ...

I suggest reverting it and returning the old code, then thinking about a better way to handle it.

Thanks,
Sofian.

@comrumino self-assigned this Apr 3, 2020
@comrumino added the "To Start" label (Description reviewed and a maintainer needs "to start" triage) Apr 3, 2020
@comrumino
Collaborator

comrumino commented Apr 3, 2020

@SofianJabal I think I may have already fixed this in 7a216864789df6f3d9de61223c52a30c947554a1, which I mentioned in a comment on an IPython issue; that commit isn't included in 4.1.4. If you have the time, could you confirm that the issue is resolved using master? If it is resolved on master, I can distribute the fix for you.

Thanks :)

@comrumino added the "Triage" label (Investigation by a maintainer has started) and removed the "To Start" label Apr 3, 2020
@SofianJabal
Author

I used your code from 7a21686 and the issue still reproduces. You should change self.stream.write(...) to write once rather than three times as you implemented; see channel.py lines 82, 83 and 84.

Or you can restore the original code of the send function in channel.py so it looks like this (this is what I did on my side :) ):

def send(self, data):
    """Sends the given string of data as a packet over the underlying
    stream. Blocks until the packet has been sent.

    :param data: the byte string to send as a packet
    """
    if self.compress and len(data) > self.COMPRESSION_THRESHOLD:
        compressed = 1
        data = zlib.compress(data, self.COMPRESSION_LEVEL)
    else:
        compressed = 0
    data_size = len(data)
    header = self.FRAME_HEADER.pack(data_size, compressed)
    buf = header + data + self.FLUSHER
    self.stream.write(buf)

@llchan

llchan commented Apr 22, 2020

+1. The fix in master helps a lot, but it's still degraded relative to 4.1.2. In our application it was 1.53x slower (vs 40x in 4.1.4), but that includes a lot of other non-RPyC stuff so the RPyC-specific degradation is likely higher.
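
The RPyC-specific cost can be isolated by timing just the upload call under each installed version, roughly like this (a sketch; host and path are placeholders):

import time

import rpyc
from rpyc.utils.classic import upload_file

conn = rpyc.classic.connect("remote-host")

start = time.perf_counter()
upload_file(conn, "/tmp/big_file.bin", "/tmp/big_file.bin")
elapsed = time.perf_counter() - start

# run once per rpyc version (e.g. 4.1.2, 4.1.4, master) and compare
print("rpyc %s: upload took %.1f s" % (rpyc.__version__, elapsed))

conn.close()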

@comrumino
Collaborator

comrumino commented Apr 25, 2020

Thanks for the response!

I released 4.1.5, which should perform about 1.5x slower. Of course, slower performance for your use case isn't ideal. At the same time, there is an issue with the copy operation when the object being transmitted is, say, 2 GB: the overhead of that copy is detrimental to some users. This makes the changes between 4.1.2 and 4.1.5 a bit tricky to perfect.

The current implementation considers the small and the very large. To optimize for objects in between, you could increase MAX_IO_CHUNK to get better throughput at the cost of responsiveness (see the sketch after the diff below). The open question is how to give users a self-documenting and usable way to tune the core channel's performance. Here is the diff for reference.

diff --git a/rpyc/core/channel.py b/rpyc/core/channel.py
index 446e89d..7eaa7fd 100644
--- a/rpyc/core/channel.py
+++ b/rpyc/core/channel.py
@@ -70,6 +70,15 @@ class Channel(object):
             data = zlib.compress(data, self.COMPRESSION_LEVEL)
         else:
             compressed = 0
-        header = self.FRAME_HEADER.pack(len(data), compressed)
-        buf = header + data + self.FLUSHER
-        self.stream.write(buf)
+        data_size = len(data)
+        header = self.FRAME_HEADER.pack(data_size, compressed)
+        flush_size = len(self.FLUSHER)
+        if self.FRAME_HEADER.size + data_size + flush_size <= self.stream.MAX_IO_CHUNK:
+            # avoid overhead from socket writes requiring GIL to be held
+            self.stream.write(header + data + self.FLUSHER)
+        else:
+            # Data larger than 64KB, the extra writes are negligible
+            part1 = self.stream.MAX_IO_CHUNK - self.FRAME_HEADER.size
+            self.stream.write(header + data[:part1])
+            self.stream.write(data[part1:])
+            self.stream.write(self.FLUSHER)
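
As mentioned above, raising MAX_IO_CHUNK trades responsiveness for throughput. A rough sketch of that tuning (the value is only a placeholder, and the attribute's default and exact location may differ between rpyc versions):

import rpyc
from rpyc.core.stream import SocketStream

# MAX_IO_CHUNK is a class attribute, so raising it here affects socket-based
# streams that don't override it: fewer writes per large frame (faster bulk
# transfers), but the channel is less responsive while a frame is in flight.
SocketStream.MAX_IO_CHUNK = 10 * 64 * 1024  # placeholder value

conn = rpyc.classic.connect("remote-host")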

@comrumino
Collaborator

Since the fix helps significantly, and other code changes could also have affected performance, I referenced this issue in #326; further improvement is out of this issue's scope.

Community involvement is always welcomed 💯
