
Unpack for binary data #105

Closed
copyhold opened this issue Jul 29, 2018 · 13 comments · Fixed by #211
Labels: bug (Something isn't working)

Comments

@copyhold

Hi,

My data in Tarantool is stored as binary strings; they are not UTF-8 in any way, just sequences of bytes.
When I try to select this data with this module, the unpacker fails with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tarantool/space.py", line 75, in select
    return self.connection.select(self.space_no, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tarantool/connection.py", line 775, in select
    response = self._send_request(request)
  File "/usr/local/lib/python3.6/dist-packages/tarantool/connection.py", line 357, in _send_request
    return self._send_request_wo_reconnect(request)
  File "/usr/local/lib/python3.6/dist-packages/tarantool/connection.py", line 264, in _send_request_wo_reconnect
    response = Response(self, self._read_response())
  File "/usr/local/lib/python3.6/dist-packages/tarantool/response.py", line 62, in __init__
    self._body = unpacker.unpack()
  File "msgpack/_unpacker.pyx", line 519, in msgpack._unpacker.Unpacker.unpack
  File "msgpack/_unpacker.pyx", line 499, in msgpack._unpacker.Unpacker._unpack
msgpack.exceptions.UnpackValueError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Can this be resolved somehow?
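The failure can be reproduced without a server. The sketch below is a hypothetical, stripped-down reader for just two MessagePack types (fixstr and bin 8, as laid out in the MessagePack specification). `read_short_string` is an illustrative name and is not part of tarantool-python or msgpack. It shows that an mp_str holding the byte 0xff raises the same "invalid start byte" error when decoded as UTF-8, while keeping the payload as bytes (encoding=None) or receiving it as mp_bin does not.

```python
def read_short_string(buf, encoding):
    """Decode one fixstr (0xa0-0xbf) or bin 8 (0xc4) value from buf.

    Returns (value, bytes_consumed). Other MessagePack types are not handled.
    """
    head = buf[0]
    if 0xA0 <= head <= 0xBF:              # fixstr: length in the low 5 bits
        length = head & 0x1F
        payload = bytes(buf[1:1 + length])
        # mp_str: decoded only when the connection has an encoding.
        value = payload if encoding is None else payload.decode(encoding)
        return value, 1 + length
    if head == 0xC4:                      # bin 8: one length byte, then data
        length = buf[1]
        return bytes(buf[2:2 + length]), 2 + length
    raise ValueError("unsupported type byte 0x%02x" % head)


raw = b"\xa1\xff"                                    # mp_str whose only byte is 0xff
print(read_short_string(raw, None))                  # (b'\xff', 2)
print(read_short_string(b"\xc4\x01\xff", "utf-8"))   # (b'\xff', 3)
try:
    read_short_string(raw, "utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)                                # invalid start byte
```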

@31hkim

31hkim commented May 19, 2020

Any success fixing that? I still have this problem with Tarantool 2.2 and connector 0.6.6, even when I don't select binary data from a space that contains it.

@Totktonada
Member

It seems we cannot just set encoding to None when creating a connection, because schema reload, at least, does not support it.

However, we can temporarily set encoding to None:

diff --git a/unit/suites/test_dml.py b/unit/suites/test_dml.py
index 31e821e..1264180 100644
--- a/unit/suites/test_dml.py
+++ b/unit/suites/test_dml.py
@@ -70,6 +70,12 @@ class TestSuite_Request(unittest.TestCase):
                     self.con.insert('space_1', [i, i%5, 'tuple_'+str(i)])[0],
                     [i, i%5, 'tuple_'+str(i)]
             )
+        saved_encoding = self.con.encoding
+        self.con.encoding = None
+        t = [1001, 1, b'\xff']
+        self.assertEqual(self.con.insert('space_1', t)[0], t)
+        self.con.encoding = saved_encoding
+
     def test_00_03_answer_repr(self):
         repr_str = """- [1, 1, 'tuple_1']"""
         self.assertEqual(repr(self.con.select('space_1', 1)), repr_str)
@@ -122,6 +128,12 @@ class TestSuite_Request(unittest.TestCase):
             [[200, 0, 'tuple_200'], [205, 0, 'tuple_205']]
         )
 
+        saved_encoding = self.con.encoding
+        self.con.encoding = None
+        res = self.con.select('space_1', 1001)
+        self.assertSequenceEqual(res, [[1001, 1, b'\xff']])
+        self.con.encoding = saved_encoding
+
     def test_03_delete(self):
         # Check that delete works fine
         self.assertSequenceEqual(self.con.delete('space_1', 20), [[20, 0, 'tuple_20']])

NB: File another issue for Connection(host, port, encoding=None) support.
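The save/restore pattern in the test diff leaves the connection in encoding=None mode if an assertion fails before `self.con.encoding = saved_encoding` runs. A minimal sketch of the same workaround as a context manager, assuming only that the connection has a writable `encoding` attribute as in the diff. `temporary_encoding` is a hypothetical helper, not part of the connector, and `SimpleNamespace` stands in for a real connection.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_encoding(con, encoding):
    """Set con.encoding for the duration of a with-block, then restore it."""
    saved = con.encoding
    con.encoding = encoding
    try:
        yield con
    finally:
        # Runs even if the body raises (e.g. a failed assertion).
        con.encoding = saved


demo = SimpleNamespace(encoding="utf-8")   # stand-in for a real connection
with temporary_encoding(demo, None):
    print(demo.encoding)                   # None
print(demo.encoding)                       # utf-8
```

With a real connection, the body of the `with` block would be a call such as `con.select('space_1', 1001)`.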

@31hkim

31hkim commented May 20, 2020

Now I'm getting this:

>> print(result)
- None
- b'Type mismatch: can not convert 'REDACTED BINARY STRING' to varbinary'

Honestly, I don't understand what encodings have to do with raw binary data. Shouldn't it just be converted directly to the Python bytes type?

Totktonada added a commit that referenced this issue May 20, 2020
This allows writing mp_bin, which is required for the 'varbinary' field
type: just write a value of 'bytes' type and it will be encoded as
mp_bin instead of mp_str.

mp_bin is already decoded into bytes, which is consistent with the new
encode behaviour.

XXX: Change the behaviour only under an option? If we go this way, we
can also split mp_str / mp_bin across unicode / str for Python 2, but
I'm not sure it is convenient.

Fixes #105
@Totktonada
Member

Okay, two different problems are discussed here:

  1. A user stores binary (non-UTF-8) data in a 'string' field and doesn't want the data to be encoded or decoded when working with this field. This may be worked around as shown above.
  2. A user wants to send / receive mp_bin to work with a 'varbinary' field. Everything looks okay on the decoding side (mp_bin is decoded to bytes), but a bytes value is encoded as mp_str. Maybe we can change this on Python 3 (RFC patch).
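Problem 2 comes down to which MessagePack header is written for a bytes value. A minimal pure-Python sketch for payloads shorter than 256 bytes, following the MessagePack specification and assuming msgpack's `Packer` handles `use_bin_type` the way the specification's compatibility notes describe: with the option off, bytes get the str (formerly raw) headers and str 8 is skipped. `pack_short` is hypothetical; real code should use msgpack itself.

```python
def pack_short(value, use_bin_type):
    """Pack a str or bytes value whose payload is shorter than 256 bytes."""
    if isinstance(value, str):
        payload, as_bin = value.encode("utf-8"), False
    elif isinstance(value, bytes):
        # Only use_bin_type=True gives bytes the mp_bin family.
        payload, as_bin = value, use_bin_type
    else:
        raise TypeError("expected str or bytes")
    n = len(payload)
    if n >= 256:
        raise ValueError("this sketch only handles payloads < 256 bytes")
    if as_bin:
        return bytes([0xC4, n]) + payload                  # bin 8
    if n < 32:
        return bytes([0xA0 | n]) + payload                 # fixstr
    if use_bin_type:
        return bytes([0xD9, n]) + payload                  # str 8 (newer spec)
    return bytes([0xDA]) + n.to_bytes(2, "big") + payload  # str 16 / raw 16


print(pack_short(b"\xff", use_bin_type=False))  # b'\xa1\xff'      (mp_str)
print(pack_short(b"\xff", use_bin_type=True))   # b'\xc4\x01\xff'  (mp_bin)
```

The same bytes value lands in a 'string' field or a 'varbinary' field depending only on this flag, which is why the flag, rather than the encoding, decides whether varbinary can be written.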

@31hkim
Copy link

31hkim commented May 21, 2020

Ah, sorry for the confusion, and thank you. I'll wait for a complete fix.

@31hkim

31hkim commented May 28, 2020

Any success?

@Totktonada
Member

We're planning Q3 tasks, and I hope this one will be included. I'll specifically mark it as important.

ligurio self-assigned this Jul 20, 2020
Totktonada added a commit that referenced this issue Aug 26, 2020
Several different problems are fixed here, but all have the same root.
When a connection encoding is None (it is default on Python 2 and may be
set explicitly on Python 3), all mp_str values are decoded into bytes,
not Unicode strings (note that bytes is alias for str in Python 2). But
the database schema parsing code assumes that _vspace / _vindex
values are Unicode strings.

The resolved problems are the following:

1. The default encoding of the bytes#decode() method is 'ascii', however
   names in Tarantool can contain characters outside the ASCII table.
   Use 'utf-8' to decode names.
2. Convert all binary values into Unicode strings before parsing or
   storing them. This allows correct further access to the local schema
   representation.
3. Convert binary parameters like space, index or field names into
   Unicode strings when the schema is accessed, so as not to trigger
   redundant schema refetching.

Those problems are briefly mentioned in [1].

Tested manually with Python 2 and Python 3: my testing tarantool
instance has a space with name '©' and after the changes I'm able to
connect to it when the connection encoding is set to None. Also I
verified that the schema is not fetched each time I call
<connection>.select('©') in Python 2 (where such a string literal is str /
bytes, not a Unicode string).

[1]: #105 (comment)
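A minimal sketch of points 1 to 3, assuming the local schema is cached in a dict keyed by Unicode names. `to_unicode`, `schema` and the space id 512 are hypothetical and are not the connector's code. It shows the ASCII decode failing on '©' and a bytes key missing a str-keyed cache (which would trigger a refetch) until it is normalized.

```python
def to_unicode(value, encoding="utf-8"):
    """Decode bytes to str with UTF-8 (not ASCII); leave other values as-is."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw_name = "©".encode("utf-8")   # b'\xc2\xa9', as received with encoding=None

try:
    raw_name.decode("ascii")     # what an ASCII default would do
except UnicodeDecodeError:
    print("ascii decoding fails")

# Hypothetical local schema cache keyed by Unicode space names.
schema = {"©": 512}
print(raw_name in schema)               # False: a bytes key misses, forcing a refetch
print(to_unicode(raw_name) in schema)   # True: the normalized key hits
```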
@Totktonada
Copy link
Member

NB: Propose to msgpack-python that it provide helpers like msgpack.as_bin() and msgpack.as_str() to encode a particular member of a structure (dict, list, tuple) as mp_bin or mp_str regardless of the use_bin_type option. A msgpack-based protocol may require a string in one place and a binary value in another (exactly our case).

Usage example:

import msgpack
msgpack.dumps(dict(foo='string', bar=msgpack.as_bin('binary')))

Totktonada added a commit that referenced this issue Aug 28, 2020
Several different problems are fixed here, but all have the same root.
When a connection encoding is None (it is default on Python 2 and may be
set explicitly on Python 3), all mp_str values are decoded into bytes,
not Unicode strings (note that bytes is alias for str in Python 2). But
the database schema parsing code assumes that _vspace / _vindex
values are Unicode strings.

The resolved problems are the following:

1. The default encoding of the bytes#decode() method is 'ascii', however
   names in Tarantool can contain characters outside the ASCII table.
   Use 'utf-8' to decode names.
2. Convert all binary values into Unicode strings before parsing or
   storing them. This allows correct further access to the local schema
   representation.
3. Convert binary parameters like space, index or field names into
   Unicode strings when the schema is accessed, so as not to trigger
   redundant schema refetching.

Those problems are briefly mentioned in [1].

Tested manually with Python 2 and Python 3: my testing tarantool
instance has a space with name '©' and after the changes I'm able to
connect to it when the connection encoding is set to None. Also I
verified that the schema is not fetched each time I call
<connection>.select('©') in Python 2 (where such a string literal is str /
bytes, not a Unicode string).

Relevant test cases are added in next commits.

[1]: #105 (comment)
Totktonada added a commit that referenced this issue Aug 28, 2020
Totktonada added a commit that referenced this issue Aug 28, 2020
ligurio assigned Totktonada and unassigned ligurio Nov 18, 2020
@Totktonada
Member

I've thought about this once again, and it seems that encoding bytes as mp_bin would be a good idea in the context of Python 3. We should not break compatibility, though: changing defaults is a nightmare. Let's instead give a meaningful error message that directs the user to the appropriate option.

NB: Consider a patch of this kind:

diff --git a/tarantool/request.py b/tarantool/request.py
index d1a5a82..c54a47b 100644
--- a/tarantool/request.py
+++ b/tarantool/request.py
@@ -79,26 +79,11 @@ class Request(object):
 
         packer_kwargs = dict()
 
-        # use_bin_type=True is default since msgpack-1.0.0.
-        #
-        # The option controls whether to pack binary (non-unicode)
-        # string values as mp_bin or as mp_str.
-        #
-        # The default behaviour of the connector is to pack both
-        # bytes and Unicode strings as mp_str.
-        #
-        # msgpack-0.5.0 (and only this version) warns when the
-        # option is unset:
-        #
-        #  | FutureWarning: use_bin_type option is not specified.
-        #  | Default value of the option will be changed in future
-        #  | version.
-        #
-        # The option is supported since msgpack-0.4.0, so we can
-        # just always set it for all msgpack versions to get rid
-        # of the warning on msgpack-0.5.0 and to keep our
-        # behaviour on msgpack-1.0.0.
-        packer_kwargs['use_bin_type'] = False
+        # XXX: Write an explanation here.
+        # XXX: Pass it from a connection constructor args, but keep
+        # use_bin_type=False behaviour for Python 2 (because string
+        # literals are bytes).
+        packer_kwargs['use_bin_type'] = True
 
         self.packer = msgpack.Packer(**packer_kwargs)
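The XXX notes in the diff ask for the option to come from the connection arguments while keeping use_bin_type=False on Python 2. A hypothetical sketch of that decision, keyed on the connection encoding in the way the later commits in this thread describe (Python 2 unchanged; encoding=None keeps bytes as mp_str). `choose_use_bin_type` is an illustrative name, not the connector's API.

```python
import sys


def choose_use_bin_type(encoding, python_major=sys.version_info[0]):
    """Decide msgpack's use_bin_type for a connection (hypothetical sketch)."""
    if python_major == 2:
        # String literals are bytes on Python 2: keep packing them as mp_str.
        return False
    if encoding is None:
        # encoding=None: bytes stand for (possibly non-UTF-8) 'string' data.
        return False
    # Python 3 with a text encoding: bytes -> mp_bin, str -> mp_str.
    return True


packer_kwargs = {"use_bin_type": choose_use_bin_type("utf-8")}
print(packer_kwargs)  # {'use_bin_type': True} on Python 3
# The dict would then be passed on as msgpack.Packer(**packer_kwargs).
```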

@ObjatieGroba
Copy link

There is no need to pass extra arguments (such as raw=True) to msgpack.

import msgpack
# With msgpack >= 1.0.0 defaults, str and bytes both round-trip unchanged.
data = ['a', b'b']
assert msgpack.unpackb(msgpack.packb(data)) == data

The major difference in version 1.0.0 is that the encoding option was removed; UTF-8 is always used.

That is why it is better not to pass any arguments to the unpacker when the encoding is 'utf-8' and the msgpack version is 1.0.0 or newer.

serge-name added a commit to serge-name/tarantool-python that referenced this issue Jul 16, 2021
@serge-name
Copy link

serge-name commented Jul 16, 2021

Hello, any progress on this issue?

I would suggest bumping msgpack to 1.0.0 and removing the use_bin_type=False line. Otherwise there is no way to store binary data.

Current behavior for tarantool-python 0.7.1:

>>> c.insert("some", ("foo", pickle.dumps(42)))
Traceback (most recent call last):
…
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
>>> c.delete("some", "foo")
Traceback (most recent call last):
…
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
>>> c.delete("some", "foo")      # already deleted

>>>

With the fix, everything works:

>>> c.insert("some", ("foo", pickle.dumps(42)))
- ['foo', b'\x80\x04K*.']
>>> c.delete("some", "foo")
- ['foo', b'\x80\x04K*.']
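Why the pickled value fails: pickle protocols 2 and later start with the PROTO opcode 0x80, which can never begin a UTF-8 sequence, so a value packed as mp_str and decoded as UTF-8 when the response comes back raises exactly this error. A stdlib-only check; `is_valid_utf8` is a hypothetical helper.

```python
import pickle


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8, i.e. a UTF-8 str decoder would accept it."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


blob = pickle.dumps(42)
print(blob[0] == 0x80)       # True: protocol 2+ pickles start with the PROTO opcode
print(is_valid_utf8(blob))   # False: 0x80 is a UTF-8 continuation byte
print(pickle.loads(blob))    # 42
```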

@Totktonada
Copy link
Member

Sorry, no progress yet. I need to dive into the topic again.

@serge-name

After posting that PR, I realised that it is incomplete: there are several checks for different msgpack versions in the tarantool-python code. I suggest two solutions:

  1. Add another version check that sets use_bin_type=False only if msgpack is older than 1.0.0; this is a single conditional line.
  2. Switch to msgpack>=1.0.0 and remove all obsolete version checks.

I'm ready to send either PR, just tell me what you prefer :)

Yesterday I tried to use Tarantool in my company's projects and right now I have to use a private fork of python-tarantool with my fix applied. Our projects need to store pickled python objects in Tarantool.

I hope that solution will emerge very soon.

Totktonada added bug Something isn't working and removed defect labels Dec 14, 2021
DifferentialOrange added a commit that referenced this issue Mar 16, 2022
Before this patch, both bytes and Unicode strings were encoded as mp_str.
It was possible to work with utf and non-utf strings, but not with
varbinary [1] (mp_bin). This patch adds varbinary support for Python 3
by default. Python 2 connector behavior remains the same.

For encoding="utf-8" (default), the following behavior is expected now:
(Python 3 -> Tarantool          -> Python 3)
 bytes    -> mp_bin (varbinary) -> bytes
 str      -> mp_str (string)    -> str

For encoding=None, the following behavior is expected now:
(Python 3 -> Tarantool          -> Python 3)
 bytes    -> mp_str (string)    -> bytes

This patch changes the current behavior for Python 3: bytes objects are
now encoded as varbinary by default. bytes objects are also supported as
keys.

This patch does not add new restrictions (like "do not permit to use
str in encoding=None mode because result may be confusing") to preserve
current behavior (for example, using space name as str in schema
get_space).

1. tarantool/tarantool#4201

Closes #105
DifferentialOrange added a commit that referenced this issue Mar 16, 2022
Before this patch, both bytes and str were encoded as mp_str. It was
possible to work with utf and non-utf strings, but not with
varbinary [1] (mp_bin). This patch adds varbinary support for Python 3
by default. Python 2 connector behavior remains the same.

For encoding="utf-8" (default), the following behavior is expected now:
(Python 3 -> Tarantool          -> Python 3)
 bytes    -> mp_bin (varbinary) -> bytes
 str      -> mp_str (string)    -> str

For encoding=None, the following behavior is expected now:
(Python 3 -> Tarantool          -> Python 3)
 bytes    -> mp_str (string)    -> bytes

This patch changes the current behavior for Python 3: bytes objects
are now encoded as varbinary by default. bytes objects are also
supported as keys.

This patch does not add new restrictions (like "do not permit to use
str in encoding=None mode because result may be confusing") to preserve
current behavior (for example, using space name as str in schema
get_space).

1. tarantool/tarantool#4201

Closes #105
DifferentialOrange added a commit that referenced this issue Mar 16, 2022
DifferentialOrange added a commit that referenced this issue Mar 23, 2022
DifferentialOrange added a commit that referenced this issue Mar 23, 2022
DifferentialOrange added a commit that referenced this issue Mar 23, 2022
DifferentialOrange added a commit that referenced this issue Mar 23, 2022
DifferentialOrange added a commit that referenced this issue Mar 31, 2022
DifferentialOrange added a commit that referenced this issue Mar 31, 2022
DifferentialOrange added a commit that referenced this issue Mar 31, 2022
Before this patch, both bytes and str were encoded as mp_str. It was
possible to work with utf and non-utf strings, but not with
varbinary [1] (mp_bin). This patch adds varbinary support for Python 3
by default. Python 2 connector behavior remains the same.

For encoding="utf-8" (default), the following behavior is expected now:
(Python 3 -> Tarantool          -> Python 3)
 bytes    -> mp_bin (varbinary) -> bytes
 str      -> mp_str (string)    -> str

For encoding=None, the following behavior is expected now:
(Python 3 -> Tarantool          -> Python 3)
 bytes    -> mp_str (string)    -> bytes
 str      -> mp_str (string)    -> bytes
             mp_bin (varbinary) -> bytes

This patch changes the current behavior for Python 3: bytes objects
are now encoded as varbinary by default. bytes objects are also
supported as keys.

This patch does not add new restrictions (like "do not permit to use
str in encoding=None mode because result may be confusing") to preserve
current behavior (for example, using space name as str in schema
get_space).

1. tarantool/tarantool#4201

Closes #105
DifferentialOrange added a commit that referenced this issue Apr 1, 2022
This is a breaking change.

Before this patch, both bytes and str were encoded as mp_str. It was
possible to work with utf and non-utf strings, but not with
varbinary (mp_bin) [1]. This patch adds varbinary support for Python 3
by default. Python 2 connector behavior remains the same.

Before this patch:

* encoding="utf-8" (default)

    Python 3 -> Tarantool          -> Python 3
    str      -> mp_str (string)    -> str
    bytes    -> mp_str (string)    -> str
                mp_bin (varbinary) -> bytes

* encoding=None

    Python 3 -> Tarantool          -> Python 3
    bytes    -> mp_str (string)    -> bytes
    str      -> mp_str (string)    -> bytes
                mp_bin (varbinary) -> bytes

Using bytes as a key was not supported by several methods (delete,
update, select).

After this patch:

* encoding="utf-8" (default)

    Python 3 -> Tarantool          -> Python 3
    str      -> mp_str (string)    -> str
    bytes    -> mp_bin (varbinary) -> bytes

* encoding=None

    Python 3 -> Tarantool          -> Python 3
    bytes    -> mp_str (string)    -> bytes
    str      -> mp_str (string)    -> bytes
                mp_bin (varbinary) -> bytes

Using bytes as a key is now supported by all methods.

Thus, an encoding="utf-8" connection may be used to work with
utf-8 strings and varbinary, and an encoding=None connection
may be used to work with non-utf-8 strings.

This patch does not add new restrictions (like "do not permit to use
str in encoding=None mode because result may be confusing") to preserve
current behavior (for example, using space name as str in schema
get_space).

1. tarantool/tarantool#4201

Closes #105
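The "After this patch" tables above can be read as a small model. The sketch below is hypothetical and only mirrors those tables for Python 3; it assumes str is always packed as UTF-8 and does not call msgpack or the connector. `to_wire`, `from_wire` and `round_trip` are illustrative names.

```python
def to_wire(value, encoding):
    """Return (msgpack_family, payload) per the 'after this patch' table."""
    if isinstance(value, str):
        return "mp_str", value.encode("utf-8")
    if isinstance(value, bytes):
        # Only encoding=None keeps the old bytes -> mp_str behaviour.
        return ("mp_str" if encoding is None else "mp_bin"), value
    raise TypeError("unsupported type: %s" % type(value).__name__)


def from_wire(family, payload, encoding):
    """Decode a payload back to Python per the same table."""
    if family == "mp_bin" or encoding is None:
        return payload
    return payload.decode(encoding)


def round_trip(value, encoding):
    return from_wire(*to_wire(value, encoding), encoding)


print(round_trip(b"\xff", "utf-8"))   # b'\xff' (via mp_bin)
print(round_trip("abc", None))        # b'abc' (str comes back as bytes)
```

The model makes the trade-off visible: encoding="utf-8" keeps str and bytes apart on the wire, while encoding=None collapses both into mp_str and returns bytes.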
DifferentialOrange added a commit that referenced this issue Apr 2, 2022
DifferentialOrange added a commit that referenced this issue Apr 4, 2022
This is a breaking change.

Before this patch, both bytes and str were encoded as mp_str. It was
possible to work with utf and non-utf strings, but not with
varbinary (mp_bin) [1]. This patch adds varbinary support for Python 3
by default. With Python 2 the behavior of the connector remains
the same.

Before this patch:

* encoding="utf-8" (default)

    Python 3 -> Tarantool          -> Python 3
    str      -> mp_str (string)    -> str
    bytes    -> mp_str (string)    -> str
                mp_bin (varbinary) -> bytes

* encoding=None

    Python 3 -> Tarantool          -> Python 3
    bytes    -> mp_str (string)    -> bytes
    str      -> mp_str (string)    -> bytes
                mp_bin (varbinary) -> bytes

Using bytes as a key was not supported by several methods (delete,
update, select).

After this patch:

* encoding="utf-8" (default)

    Python 3 -> Tarantool          -> Python 3
    str      -> mp_str (string)    -> str
    bytes    -> mp_bin (varbinary) -> bytes

* encoding=None

    Python 3 -> Tarantool          -> Python 3
    bytes    -> mp_str (string)    -> bytes
    str      -> mp_str (string)    -> bytes
                mp_bin (varbinary) -> bytes

Using bytes as a key is now supported by all methods.

Thus, an encoding="utf-8" connection may be used to work with
utf-8 strings and varbinary, and an encoding=None connection
may be used to work with non-utf-8 strings.

This patch does not add new restrictions (like "do not permit to use
str in encoding=None mode because result may be confusing") to preserve
current behavior (for example, using space name as str in schema
get_space).

1. tarantool/tarantool#4201

Closes #105
DifferentialOrange added a commit that referenced this issue Apr 4, 2022
DifferentialOrange added a commit to tarantool/doc that referenced this issue May 11, 2022
Since the release of tarantool-python 0.8.0 [1], several things have
changed.

* Issue tarantool/tarantool-python#105 has been fixed [2].
* CI has been migrated to GitHub Actions [3].
* New connection pool (ConnectionPool) with master discovery
  was introduced [4].
* The old connection pool (MeshConnection) with round-robin failover was
  deprecated [4].

This patch introduces these changes together with updated GitHub star
counts.

1. https://github.com/tarantool/tarantool-python/releases/tag/0.8.0
2. tarantool/tarantool-python#211
3. tarantool/tarantool-python#213
4. tarantool/tarantool-python#207
patiencedaur pushed a commit to tarantool/doc that referenced this issue May 16, 2022
patiencedaur added a commit to tarantool/doc that referenced this issue May 16, 2022
* Update python connector comparison table

Since the release of tarantool-python 0.8.0 [1], several things have
changed.

* Issue tarantool/tarantool-python#105 has been fixed [2].
* CI has been migrated to GitHub Actions [3].
* New connection pool (ConnectionPool) with master discovery
  was introduced [4].
* The old connection pool (MeshConnection) with round-robin failover was
  deprecated [4].

This patch introduces these changes together with updated GitHub star
counts.

1. https://github.com/tarantool/tarantool-python/releases/tag/0.8.0
2. tarantool/tarantool-python#211
3. tarantool/tarantool-python#213
4. tarantool/tarantool-python#207

* Update translation

Co-authored-by: Patience Daur <[email protected]>
8 participants