Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode-only strings in IronPython make rpyc.utils.classic.deliver() fail if the server is Python 2 #251

Closed
pilcru opened this issue Jan 12, 2018 · 6 comments

Comments

@pilcru
Copy link

pilcru commented Jan 12, 2018

When connecting an IronPython (2.7) client to a server running Python 2, rpyc.utils.classic.deliver() fails:

On the server:

&> python -V && python rpyc_classic.py
Python 2.7.13 :: Continuum Analytics, Inc.
INFO:SLAVE/18812:server started on [127.0.0.1]:18812
INFO:SLAVE/18812:accepted ('127.0.0.1', 52319) with fd 868
INFO:SLAVE/18812:welcome ('127.0.0.1', 52319)
DEBUG:SLAVE/18812:Exception caught
Traceback (most recent call last):
  File "c:\tools\Anaconda3\envs\rhinoremote\lib\site-packages\rpyc\core\protocol.py", line 347, in _dispatch_request
    res = self._HANDLERS[handler](self, *args)
  File "c:\tools\Anaconda3\envs\rhinoremote\lib\site-packages\rpyc\core\protocol.py", line 624, in _handle_call
    return self._local_objects[oid](*args, **dict(kwargs))
TypeError: loads() argument 1 must be string, not unicode

On the client:

IronPython 2.7.5 (2.7.5.0) on .NET 4.0.30319.42000 (64-bit)
Type "help", "copyright", "credits" or "license" for more information.
>>> import rpyc
>>> rpyc.__version__
(3, 4, 4)
>>> conn = rpyc.classic.connect('localhost')
>>> rpyc.utils.classic.deliver(conn, 'a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files (x86)\IronPython 2.7\Lib\rpyc\utils\classic.py", line 289, in deliver
  File "C:\Program Files (x86)\IronPython 2.7\Lib\rpyc\core\netref.py", line 199, in __call__
  File "C:\Program Files (x86)\IronPython 2.7\Lib\rpyc\core\netref.py", line 72, in syncreq
  File "C:\Program Files (x86)\IronPython 2.7\Lib\rpyc\core\protocol.py", line 523, in sync_request
TypeError: loads() argument 1 must be string, not unicode

========= Remote Traceback (1) =========
...

I have identified the root of the problem to be in the _dump_registry in brine.py. Because IronPython does not distinguish between regular strings and unicode strings (ie. unicode is str is True), the str entry in the registry points to _dump_unicode, so thats how all strings are dumped. In turns, this makes the server decode as unicode all strings sent by value through brine, which in Python 2 cannot be passed to pickle.loads().

One way to solve this is to force plain ascii strings to be sent as regular strings by brine by changing _dump_unicode to:

    import sys
    if sys.platform == 'cli':
        @register(_dump_registry, unicode)
        def _dump_unicode(obj, stream):
            try:
                _dump_str(obj.decode('ascii'), stream)
            except UnicodeDecodeError:
                stream.append(TAG_UNICODE)
                _dump_str(obj.encode("utf8"), stream)
    else:
        @register(_dump_registry, unicode)
        def _dump_unicode(obj, stream):
            stream.append(TAG_UNICODE)
            _dump_str(obj.encode("utf8"), stream)

But this might impact performance, and it might be better to send a flag to the server for when the string should not be considered unicode. However, the question is then to define what should be unicode or not... Another way would be to force the server to convert to str before unpickling, but I haven't found a way of doing specifically that.

I know that IronPython 2 is an almost-dead project that you are not officially supporting, so I'm fine if you do not want to include this in the main branch; I just thought I'd let you know. This might have also been what was causing part of #10.

@coldfix
Copy link
Contributor

coldfix commented Jan 12, 2018

Hi, thanks for reporting.

Because IronPython does not distinguish between regular strings and unicode strings (ie. unicode is str is True)

For clarification, so in Ironpython we have unicode = str as on py3, but it does still distunguish bytes from unicode so unicode = str != bytes, right?

    if sys.platform == 'cli':

IronPython shows as cli?

        def _dump_unicode(obj, stream):
            try:
                _dump_str(obj.decode('ascii'), stream)
            except UnicodeDecodeError:
                stream.append(TAG_UNICODE)
                _dump_str(obj.encode("utf8"), stream)

I think you mean obj.encode in both cases, right?

The problem with this is that it opens the door for a lot more unicode/str related problems as were common on py2 whenever mixing unicode with bytes objects (i.e. a certain variable in a certain function can be either unicode or bytes) and that manifest as exceptions only once the program is used with true unicode data but may go unnoticed a long time during testing.

To be honest, given that IronPython handles str = unicode, I don't think a reliable patch to address communication between IronPython/CPython is possible.

(Otherwise, I would be willing to merge a working solution, if it exists and is simple enough)

coldfix added a commit that referenced this issue Jan 16, 2018
Apparently, on IronPython `str = unicode != bytes`. This should fix
problems with intra-ironpython communication.

Related to #251.
@pilcru
Copy link
Author

pilcru commented Jan 23, 2018

Hi, and sorry for not replying sooner.

I agree with all your comments here, and the solution in b02e7e7. I had actually implemented something very similar before I saw it, starting from your comment that unicode != bytes on IronPython.

There is just one thing to add to solve the original problem: rpyc.utils.classic.deliver needs to cast to bytes before sending through brine (IronPython's pickle.dumps returns str, not bytes). Changing it to

deliver(conn, localobj):
    return conn.modules["rpyc.lib.compat"].pickle.loads(bytes(pickle.dumps(localobj)))

works for my case (IronPython to Python 2). It should also be fine in regular Python since pickle.dumps already returns bytes (Python 3) or str == bytes (Python 2), but I haven't tested it.

coldfix added a commit that referenced this issue Jan 23, 2018
@coldfix
Copy link
Contributor

coldfix commented Jan 23, 2018

No problem. I have added your suggestion. Does this solve your issue?

Best, Thomas

@pilcru
Copy link
Author

pilcru commented Jan 23, 2018

Yes, thank you!

@pilcru
Copy link
Author

pilcru commented Feb 16, 2018

Sorry to reopen this, it seems I forgot one of the problematic pickle.dumps. This one in protocol.py also needs to be cast to bytes before returning, or trying to load a copy of an IronPython netref into CPython will fail.

@pilcru pilcru reopened this Feb 16, 2018
pilcru added a commit to pilcru/rpyc that referenced this issue Feb 16, 2018
@coldfix
Copy link
Contributor

coldfix commented Feb 19, 2018

Hi, thanks for the info. I hope this does what you had in mind.

pilcru added a commit to pilcru/rpyc that referenced this issue Feb 19, 2018
coldfix added a commit that referenced this issue Jun 11, 2018
This release brings a few minor backward incompatibilities, so be sure to read
on before upgrading. However, fear not: the ones that are most likely relevant
to you have a relatively simple migration path.

Backward Incompatibilities
^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``classic.teleport_function`` now executes the function in the connection's
  namespace by default. To get the old behaviour, use
  ``teleport_function(conn, func, conn.modules[func.__module__].__dict__)``
  instead.

* Changed signature of ``Service.on_connect`` and ``on_disconnect``, adding
  the connection as argument.

* Changed signature of ``Service.__init__``, removing the connection argument

* no longer store connection as ``self._conn``. (allows services that serve
  multiple clients using the same service object, see `#198`_).

* ``SlaveService`` is now split into two asymetric classes: ``SlaveService``
  and ``MasterService``. The slave exposes functionality to the master but can
  not anymore access remote objects on the master (`#232`_, `#248`_).
  If you were previously using ``SlaveService``, you may experience problems
  when feeding the slave with netrefs to objects on the master. In this case, do
  any of the following:

  * use ``ClassicService`` (acts exactly like the old ``SlaveService``)
  * use ``SlaveService`` with a ``config`` that allows attribute access etc
  * use ``rpyc.utils.deliver`` to feed copies rather than netrefs to
    the slave

* ``RegistryServer.on_service_removed`` is once again called whenever a service
  instance is removed, making it symmetric to ``on_service_added`` (`#238`_)
  This reverts PR `#173`_ on issue `#172`_.

* Removed module ``rpyc.experimental.splitbrain``. It's too confusing and
  undocumented for me and I won't be developing it, so better remove it
  altogether. (It's still available in the ``splitbrain`` branch)

* Removed module ``rpyc.experimental.retunnel``. Seemingly unused anywhere, no
  documentation, no clue what this is about.

* ``bin/rpyc_classic.py`` will bind to ``127.0.0.1`` instead of ``0.0.0.0`` by
  default

* ``SlaveService`` no longer serves exposed attributes (i.e., it now uses
  ``allow_exposed_attrs=False``)

* Exposed attributes no longer hide plain attributes if one otherwise has the
  required permissions to access the plain attribute. (`#165`_)

.. _#165: #165
.. _#172: #172
.. _#173: #173
.. _#198: #198
.. _#232: #232
.. _#238: #238
.. _#248: #248

What else is new
^^^^^^^^^^^^^^^^

* teleported functions will now be defined by default in the globals dict

* Can now explicitly specify globals for teleported functions

* Can now use streams as context manager

* keep a hard reference to connection in netrefs, may fix some ``EOFError``
  issues, in particular on Jython related (`#237`_)

* handle synchronous and asynchronous requests uniformly

* fix deadlock with connections talking to each other multithreadedly (`#270`_)

* handle timeouts cumulatively

* fix possible performance bug in ``Win32PipeStream.poll`` (oversleeping)

* use readthedocs theme for documentation (`#269`_)

* actually time out sync requests (`#264`_)

* clarify documentation concerning exceptions in ``Connection.ping`` (`#265`_)

* fix ``__hash__`` for netrefs (`#267`_, `#268`_)

* rename ``async`` module to ``async_`` for py37 compatibility (`#253`_)

* fix ``deliver()`` from IronPython to CPython2 (`#251`_)

* fix brine string handling in py2 IronPython (`#251`_)

* add gevent_ Server. For now, this requires using ``gevent.monkey.patch_all()``
  before importing for rpyc. Client connections can already be made without
  further changes to rpyc, just using gevent's monkey patching. (`#146`_)

* add function ``rpyc.lib.spawn`` to spawn daemon threads

* fix several bugs in ``bin/rpycd.py`` that crashed this script on startup
  (`#231`_)

* fix problem with MongoDB, or more generally any remote objects that have a
  *catch-all* ``__getattr__`` (`#165`_)

* fix bug when copying remote numpy arrays (`#236`_)

* added ``rpyc.utils.helpers.classpartial`` to bind arguments to services (`#244`_)

* can now pass services optionally as instance or class (could only pass as
  class, `#244`_)

* The service is now charged with setting up the connection, doing so in
  ``Service._connect``. This allows using custom protocols by e.g. subclassing
  ``Connection``.  More discussions and related features in `#239`_-`#247`_.

* service can now easily override protocol handlers, by updating
  ``conn._HANDLERS`` in ``_connect`` or ``on_connect``. For example:
  ``conn._HANDLERS[HANDLE_GETATTR] = self._handle_getattr``.

* most protocol handlers (``Connection._handle_XXX``) now directly get the
  object rather than its ID as first argument. This makes overriding
  individual handlers feel much more high-level. And by the way it turns out
  that this fixes two long-standing issues (`#137`_, `#153`_)

* fix bug with proxying context managers (`#228`_)

* expose server classes from ``rpyc`` top level module

* fix logger issue on jython

.. _#137: #137
.. _#146: #146
.. _#153: #153
.. _#165: #165
.. _#228: #228
.. _#231: #231
.. _#236: #236
.. _#237: #237
.. _#239: #239
.. _#244: #244
.. _#247: #247
.. _#251: #251
.. _#253: #253
.. _#264: #264
.. _#265: #265
.. _#267: #267
.. _#268: #268
.. _#269: #269
.. _#270: #270

.. _gevent: http://www.gevent.org/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants