Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Special characters not quoted in BYTEA columns #1

Open
iustin opened this issue Jan 7, 2011 · 8 comments
Open

Special characters not quoted in BYTEA columns #1

iustin opened this issue Jan 7, 2011 · 8 comments

Comments

@iustin
Copy link

iustin commented Jan 7, 2011

Hi,

It seems that the BYTEA/ByteString column type doesn't quote "special" characters properly, which results in silent data truncation at the first NULL character.

Sample table:

create table test (name bytea);

Sample program:

import Database.HDBC.PostgreSQL
import Database.HDBC
import Data.ByteString

main = do
  db <- connectPostgreSQL "dbname=debug"
  stmt <- prepare db "INSERT INTO test (name) VALUES($1)"
  execute stmt [toSql $ pack [0]]
  execute stmt [toSql $ pack [65, 0, 66]]
  commit db

The statements as received by the postgres server are:

2011-01-07 16:56:04 CET LOG:  execute <unnamed>: INSERT INTO test (name) VALUES($1)
2011-01-07 16:56:04 CET DETAIL:  parameters: $1 = ''
2011-01-07 16:56:04 CET LOG:  execute <unnamed>: INSERT INTO test (name) VALUES($1)
2011-01-07 16:56:04 CET DETAIL:  parameters: $1 = 'A'

As you can see, the bindings do not correctly escape the NULL bytes and this results in silent truncation. The proper escaping rules for BYTEA columns are documented here: http://www.postgresql.org/docs/8.4/static/datatype-binary.html

thanks,
iustin

@jgoerzen
Copy link
Member

jgoerzen commented Jan 7, 2011

Is it valid to escape all of these even when not going into a BYTEA column? I may need to go back and look at the code, but I believe that we do not always know whether a BYTEA value is expected.

@iustin
Copy link
Author

iustin commented Jan 7, 2011

The escaping seems to be accepted:

debug=# SELECT E'\\000'::bytea;
 bytea 
-------
 \000
(1 row)

debug=# SELECT E'\\000'::varchar;
 varchar 
---------
 \000
(1 row)

So I don't think we'll have more breakage by this. On the other hand, I'm suprised varchar supports null chars :)

@iustin
Copy link
Author

iustin commented Jan 10, 2011

An alternative to "quoting everywhere" would be to add such quoting to ByteString fields (on the haskell side), and disallow NULLs in plain Strings.

@jgoerzen
Copy link
Member

I have made commit 6cc0578 that I believe will fix this for you. Please test and let me know. It is rather unclear what level of binary escaping is needed using the C API vs. using psql. Also, it is unclear whether unesaping will have to be done on data come back the other way around. Let me know.

Thanks,

-- John

@lpsmith
Copy link
Contributor

lpsmith commented Jan 17, 2011

Incidentally, I know that it is possible to efficiently transfer large binary blobs to and from Postgres using libpq without any quoting. I'd have to dig through the docs to figure out how though.

@jgoerzen
Copy link
Member

I would be happy to know what you find out. I had thought I already was doing this, but apparently not.

@lpsmith
Copy link
Contributor

lpsmith commented Jan 18, 2011

Ok, I've uploaded an example of what needs to happen here.

@soenkehahn
Copy link
Contributor

Reading a ByteString from a bytea field yields the binary data in "bytea Hex Format" as described here: http://www.postgresql.org/docs/9.1/static/datatype-binary.html. This is probably not what one wants. Our fork (https://github.com/soenkehahn/hdbc-postgresql) fixes this by converting bytestrings to "Hex Format" before writing them to the DB and decoding the format when reading.
The downside to our current implementation is that you cannot use ByteStrings to write to any other fields than bytea fields (e.g. VARCHAR). This will probably break a lot of code. We think however that this restriction might be a good thing: ByteString is a type for binary data. For textual data SqlString should be used.

And comments on that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants