COPY ... FROM STDIN writes incorrect data for BYTES/BYTEA columns #100299
Labels
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-community: Originated from the community
T-sql-queries: SQL Queries Team
X-blathers-triaged: blathers was able to find an owner
Describe the problem
COPY ... FROM STDIN of BYTES/BYTEA columns displays different behaviour from postgres, and does not roundtrip correctly. The data written is a literal hex escaped form of the underlying bytes. For example, b'aaaa' is roundtripped as b'\\x61616161'.
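To make the mismatch concrete, here is a minimal Python sketch of the two byte strings involved. It assumes the value travels in Postgres's bytea hex text format (a backslash, an x, then two hex digits per byte). The helper names are hypothetical and are not part of repro.py or psycopg.

```python
import binascii


def encode_bytea_hex(raw: bytes) -> bytes:
    # Hypothetical helper: render raw bytes as bytea hex text,
    # i.e. a backslash and an "x" followed by two hex digits per byte.
    return b"\\x" + binascii.hexlify(raw)


def decode_bytea_hex(text: bytes) -> bytes:
    # Hypothetical helper: the inverse step a server is expected to
    # perform before storing the value.
    if not text.startswith(b"\\x"):
        raise ValueError("not in bytea hex format")
    return binascii.unhexlify(text[2:])


original = b"aaaa"
wire = encode_bytea_hex(original)   # the escaped text form of the value
expected = decode_bytea_hex(wire)   # what should come back: the original bytes
observed = wire                     # what comes back in this report: the escaped form

print(original, expected, observed)
```

The value read back is 10 bytes long (the two-character prefix plus eight hex digits) instead of the original 4, which is consistent with the hex text being stored without being decoded.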
This sample is reproduced with the Python psycopg3 client driver, but I have confirmed that the same behaviour is seen with manual SQL usage.
To Reproduce
repro.py, which shows the behaviour of trying to write data with COPY vs INSERT
Postgres
Run database
Show expected behaviour - bytes roundtrip correctly.
Cockroach
Run database
Show incorrect behaviour - bytes are escaped and do not roundtrip correctly.
Expected behavior
Identical behaviour between cockroach/postgres backends.
Bytes data should be roundtripped, and not returned in escaped format.
Environment:
Cockroach running in docker under Ubuntu 20.04.
Python 3.8.10 with the following dependencies:
Additional context
Impact: cannot load data using COPY ... FROM STDIN as documented with the BYTEA column type. This makes copying bulk data fail, which makes bulk inserts as documented here incorrect: https://www.cockroachlabs.com/docs/v22.2/copy-from
I do not believe this use case is unsupported (it does not appear in the unsupported syntax here: https://www.cockroachlabs.com/docs/v22.2/copy-from#unsupported-syntax)
Jira issue: CRDB-26346