Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with set_type_codec and array of json/jsonb #623

Closed
CaselIT opened this issue Sep 22, 2020 · 8 comments
Closed

Problem with set_type_codec and array of json/jsonb #623

CaselIT opened this issue Sep 22, 2020 · 8 comments

Comments

@CaselIT
Copy link

CaselIT commented Sep 22, 2020

  • asyncpg version: 0.21
  • PostgreSQL version: PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit (docker image)
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?
    : no
  • Python version: Python 3.8.5
  • Platform: windows 10. Same happen on linux
  • Do you use pgbouncer?: no
  • Did you install asyncpg with pip?: yes
  • If you built asyncpg locally, which version of Cython did you use?: no
  • Can the issue be reproduced under both asyncio and
    uvloop?
    : not tried

I think there is an issue in how asyncpg handles the json input data when an user sets a type codec for json and/or jsonb and the column type is JSON[] or JSONB[].
For the type json[] there is actually a working configuration, by setting the format='binary' when setting the type codec, but the same configuration fails on jsonb
This came up while trying to normalize the behavior of all pg dbapi for sqlalchemy sqlalchemy/sqlalchemy#5584

Below is a reproduction case that uses only asyncpg, without sqlalchemy.

import asyncio
import asyncpg
import json
async def connect():
    return await asyncpg.connect(user="scott", password="tiger", database="test", host="127.0.0.1")
async def setup():
    conn = await connect()
    await conn.execute("DROP TABLE IF EXISTS json_table")
    await conn.execute("DROP TABLE IF EXISTS jsonb_table")
    await conn.execute("CREATE TABLE json_table (id SERIAL, js_col JSON[])")
    await conn.execute("CREATE TABLE jsonb_table (id SERIAL, js_col JSONB[])")
    await conn.close()
js_insert = "INSERT INTO json_table(js_col) VALUES (cast($1 as json[]))"
jsb_insert = "INSERT INTO jsonb_table(js_col) VALUES (cast($1 as jsonb[]))"
js_select = "select js_col from json_table order by id"
jsb_select = "select js_col from jsonb_table order by id"
values = [(['"foo"'],), (["22"],), (["null"],), (None,), (["[2]"],)]
async def go(conn, jsonb, text):
    i, s = (jsb_insert, jsb_select) if jsonb else (js_insert, js_select)
    try:
        await conn.executemany(i, values)
        v = list(await conn.fetch(s))
        # print(v)
        print(text, jsonb, "ok")
    except Exception as e:
        print(text, jsonb, "fail", e)
    finally:
        await conn.close()

async def without_codec(jsonb):
    conn = await connect()
    await go(conn, jsonb, "without_codec")

async def with_codec(jsonb):
    conn = await connect()
    for type in ("json", "jsonb"):
        await conn.set_type_codec(type, encoder=lambda v: v, decoder=json.loads, schema="pg_catalog")
    await go(conn, jsonb, "with_codec")

async def with_codec_binary(jsonb):
    conn = await connect()
    for type in ("json", "jsonb"):
        await conn.set_type_codec(type, encoder=str.encode, decoder=json.loads, schema="pg_catalog", format="binary")
    await go(conn, jsonb, "with_codec_binary")

asyncio.run(setup())
for fn in (without_codec, with_codec, with_codec_binary):
    print(fn)
    asyncio.run(fn(False))
    asyncio.run(fn(True))

The output I get is:

~~ without_codec
without_codec json ok
without_codec jsonb ok
~~ with_codec
with_codec json fail. Error: invalid input syntax for type json
DETAIL:  Token "NULL" is invalid.
with_codec jsonb fail. Error: invalid input syntax for type json
DETAIL:  Token "NULL" is invalid.
~~ with_codec_binary
with_codec_binary json ok
with_codec_binary jsonb fail. Error: unsupported jsonb version number 34

When the test works the data that is inserted is correct.
The expected result is that all cases work, or that at least a set_codec configuration for both json and jsonb is functioning.

cc @fantix

@elprans
Copy link
Member

elprans commented Sep 22, 2020

This is because the binary I/O format for jsonb is not the same as json: you must preface the JSON string with a "version" byte, currently always \x01.

@elprans elprans closed this as completed Sep 22, 2020
@CaselIT
Copy link
Author

CaselIT commented Sep 22, 2020

Some documentation regarding this would maybe be helpful...

Also the non binary case does not work for both json and jsonb versions...

In any case it's really strange from the user prospective that setting the decoder changed the behavior of the library.

@CaselIT
Copy link
Author

CaselIT commented Sep 22, 2020

Since the problem is clearly the encoder overriding, is is possible to only override the decoder?

Could this be considered as a request?

@elprans
Copy link
Member

elprans commented Sep 22, 2020

Some documentation regarding this would maybe be helpful...

Documenting wire formats for PostgreSQL data types is arguably not in the scope of user documentation, considering that even PostgreSQL doesn't document them.

Also the non binary case does not work for both json and jsonb versions...

It doesn't work because you need to account for NULL values correctly and return None from the encoder instead of "NULL", which is invalid JSON. I see this should be better documented.

In any case it's really strange from the user prospective that setting the decoder changed the behavior of the library.

I'm not sure what you mean. Isn't the purpose of set_type_codec exactly to change how data is encoded and decoded?

Could this be considered as a request?

I guess we can add some magic constant like asyncpg.DEFAULT_DECODER and asyncpg.DEFAULT_ENCODER to indicate that builtin implementation should be used for that direction. Feel free to send a patch.

@CaselIT
Copy link
Author

CaselIT commented Sep 22, 2020

Documenting wire formats for PostgreSQL data types is arguably not in the scope of user documentation, considering that even PostgreSQL doesn't document them.

The documentation could just mention that asyncpg sends the received value directly to PostgreSQL, so the wire format for pg should be returned by the function when format=binary

It doesn't work because you need to account for NULL values correctly and return None from the encoder instead of "NULL", which is invalid JSON. I see this should be better documented.

the string 'null' is very much valid json and it's handled correctly by the builtin type codec. It's the result of json.dumps(None) that is different from passing None: one sets a value in the column of json-null, the other sets the column to sql null.

I'm not sure what you mean. Isn't the purpose of set_type_codec exactly to change how data is encoded and decoded?

Yes, but an user could only want to change one of the two and not necessary both at the same time

@CaselIT
Copy link
Author

CaselIT commented Sep 23, 2020

It doesn't work because you need to account for NULL values correctly and return None from the encoder instead of "NULL", which is invalid JSON. I see this should be better documented.

I've tried this and returning None when the input is 'null', other than preventing inserting json null, fails with the error
Error: invalid input for query argument $1: ['null'] (invalid array element: expected str, got NoneType) or for a json column (ie not an array of json) the error is Error: invalid input for query argument $1: 'null' (expected str, got NoneType).

The encoder I tried is encoder=lambda v: v if v =='null' else None

Maybe I did not understand what you meant. Could you provide an example?

Thanks

@fantix
Copy link
Member

fantix commented Sep 25, 2020

It doesn't work because you need to account for NULL values correctly and return None from the encoder instead of "NULL", which is invalid JSON. I see this should be better documented.

The encoder I tried is encoder=lambda v: v if v =='null' else None

Maybe I did not understand what you meant. Could you provide an example?

I tried exactly this and ran into the same error. Guess I'm confused in the same way.

Following deeper into the array encoder, I found this for #82:

if not apg_strcasecmp_char(elem_str, b'NULL'):
array_data.write_bytes(b'"NULL"')

And the following patch fixed @CaselIT 's text codec use cases:

diff --git a/asyncpg/protocol/codecs/array.pyx b/asyncpg/protocol/codecs/array.pyx
index e975b9f..25ae254 100644
--- a/asyncpg/protocol/codecs/array.pyx
+++ b/asyncpg/protocol/codecs/array.pyx
@@ -209,7 +209,9 @@ cdef _write_textarray_data(ConnectionSettings settings, object obj,

                 try:
                     if not apg_strcasecmp_char(elem_str, b'NULL'):
-                        array_data.write_bytes(b'"NULL"')
+                        array_data.write_bytes(b'"')
+                        array_data.write_cstr(elem_str, 4)
+                        array_data.write_bytes(b'"')
                     else:
                         quoted_elem_len = elem_len
                         need_quoting = False

I'm only not certain about if this is appropriate for all scenarios. cc @elprans

@elprans
Copy link
Member

elprans commented Sep 25, 2020

Yeah, good catch @fantix, please submit a PR.

fantix added a commit to fantix/asyncpg that referenced this issue Sep 25, 2020
When given a textual json codec that yields 'null', the array
encoder should generate b'["null"]' instead of b'["NULL"]'
for a JSON[] type.

Refs: MagicStack#623
elprans pushed a commit that referenced this issue Sep 25, 2020
When given a textual json codec that yields 'null', the array
encoder should generate b'["null"]' instead of b'["NULL"]'
for a JSON[] type.

Refs: #623
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants