Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pgtests test_emoticons_as_literal fails #1004

Closed
bkline opened this issue Jan 4, 2022 · 16 comments · Fixed by #1005
Closed

pgtests test_emoticons_as_literal fails #1004

bkline opened this issue Jan 4, 2022 · 16 comments · Fixed by #1005

Comments

@bkline
Copy link
Contributor

bkline commented Jan 4, 2022

Environment

  • Python: 3.10.1
  • pyodbc: built from current HEAD of master branch
  • OS: macOS 11.6.2 (for client) Debian 10.2.1-6 (for server)
  • DB: PostgreSQL 14.1
  • driver: psqlodbcw.so 13.02.0000 (supports ODBC version 03.51)
  • unixODBC 2.3.9

Issue

The test_emoticons_as_literal() test fails for the pgtests.py suite.

======================================================================
FAIL: test_emoticons_as_literal (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 625, in test_emoticons_as_literal
    self.assertEqual(result, v)
AssertionError: 'x � z' != 'x 🌜 z'
- x � z
?   ^
+ x 🌜 z
?   ^

This is similar to the failure reported for another test suite by #536. Interestingly, if I run the tests with the --ansi flag set, this test passes, but the test_output_conversion test fails, because apparently setting ansi=True in the pyodbc.connect() call results for some reason in the return value from the local convert() function not being used (or perhaps that local function isn't invoked at all)

======================================================================
FAIL: test_output_conversion (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 671, in test_output_conversion
    self.assertEqual(value, 'X123.45X')
AssertionError: '123.45' != 'X123.45X'
- 123.45
+ X123.45X
? +      +

However, if I comment out the setencoding() call:

diff --git a/tests3/pgtests.py b/tests3/pgtests.py
index dcb03a9..5dfacc9 100755
--- a/tests3/pgtests.py
+++ b/tests3/pgtests.py
@@ -70,7 +70,7 @@ class PGTestCase(unittest.TestCase):
 
         # I've set my test database to use UTF-8 which seems most popular.
         self.cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
-        self.cnxn.setencoding(encoding='utf-8')
+        # self.cnxn.setencoding(encoding='utf-8')
 
         # As of psql 9.5.04 SQLGetTypeInfo returns absurdly small sizes leading
         # to slow writes.  Override them:

... and omit the --ansi flag in the test suite invocation, then all the tests pass (except for the error reported by #1003).

Library: /Users/bkline/repos/pyodbc/build/lib.macosx-11-x86_64-3.10/pyodbc.cpython-310-darwin.so
/Users/bkline/repos/pyodbc/tests3 --> /Users/bkline/repos/pyodbc
/Users/bkline/repos/pyodbc/tests3/pgtests.py:519: DeprecationWarning: PyUnicode_FromUnicode(NULL, size) is deprecated; use PyUnicode_New() instead
  result = str(row)
======================================================================
ERROR: test_columns (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 575, in test_columns
    assert row.precision == 3, row.precision
AttributeError: 'pyodbc.Row' object has no attribute 'precision'

----------------------------------------------------------------------
Ran 79 tests in 4.181s

FAILED (errors=1)
@gordthompson
Copy link
Collaborator

Try running the tests with the -v switch, e.g,

$ python3 tests3/pgtests.py -v

That will show details about the test environment, specifically the driver name and version

driver:  psqlodbcw.so 12.01.0000

What does it display for you?

@bkline
Copy link
Contributor Author

bkline commented Jan 4, 2022

Appreciate your following up on these tickets so promptly. Here the verbose output.

% python3 tests3/pgtests.py -t test_emoticons_as_literal -vv
Library: /Users/bkline/repos/pyodbc/build/lib.macosx-11-x86_64-3.10/pyodbc.cpython-310-darwin.so
/Users/bkline/repos/pyodbc/tests3 --> /Users/bkline/repos/pyodbc
python:  3.10.1 (main, Dec  6 2021, 23:20:29) [Clang 13.0.0 (clang-1300.0.29.3)]
pyodbc:  4.0.33b6 /Users/bkline/repos/pyodbc/build/lib.macosx-11-x86_64-3.10/pyodbc.cpython-310-darwin.so
odbc:    03.52
driver:  psqlodbcw.so 13.02.0000
         supports ODBC version 03.51
os:      Darwin
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
test_emoticons_as_literal (__main__.PGTestCase) ... FAIL

======================================================================
FAIL: test_emoticons_as_literal (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bkline/repos/pyodbc/tests3/pgtests.py", line 625, in test_emoticons_as_literal
    self.assertEqual(result, v)
AssertionError: 'x � z' != 'x 🌜 z'
- x � z
?   ^
+ x 🌜 z
?   ^


----------------------------------------------------------------------
Ran 1 test in 0.066s

FAILED (failures=1)

@gordthompson
Copy link
Collaborator

Hmmm. As in the case of #1003 this one works for me with Python 3.10 as well.

$ python3.10 tests3/pgtests.py -t test_emoticons_as_literal -vv
Library: /home/gord/git/pyodbc/build/lib.linux-x86_64-3.10/pyodbc.cpython-310-x86_64-linux-gnu.so
/home/gord/git/pyodbc/tests3 --> /home/gord/git/pyodbc
python:  3.10.1 (main, Dec 21 2021, 17:46:38) [GCC 9.3.0]
pyodbc:  4.0.33b7+test_with_dsn /home/gord/git/pyodbc/build/lib.linux-x86_64-3.10/pyodbc.cpython-310-x86_64-linux-gnu.so
odbc:    03.52
driver:  psqlodbcw.so 12.01.0000
         supports ODBC version 03.51
os:      Linux
unicode: Py_Unicode=4 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
test_emoticons_as_literal (__main__.PGTestCase) ... ok

----------------------------------------------------------------------
Ran 1 test in 1.813s

OK

@bkline
Copy link
Contributor Author

bkline commented Jan 4, 2022

On Windows, too, apparently (given that my recent PR passed all the pipeline tests). So the tests are just broken on macOS (at least with the versions of the drivers and unixODBC that I'm running).

@bkline
Copy link
Contributor Author

bkline commented Jan 4, 2022

Given the recent comments in #1003 it doesn't seem unreasonable to wonder if the fact that we're running different versions of the psqlodbc driver is a more likely explanation of the discrepancy between our results for this test, than the different operating systems on which we're running. I'll hop onto a Linux system and run some tests to see if I can confirm (or debunk) this theory.

@bkline
Copy link
Contributor Author

bkline commented Jan 5, 2022

Rolled back psqlodbc temporarily to 12.01.0000 on my macOS system, but the test still fails unless I comment out the line self.cnxn.setencoding(encoding='utf-8') or set the --ansi flag. I did find this bug report which @keitherskine filed with pgsql-bugs, but as far as I can tell his report was ignored.

Testing on a Linux system, also running psqlodbcw.so 12.01.0000, the test passes (matching your results). Also, all of the tests pass on that system with the setencoding() call commented out, and that's true of the macOS system running with the same driver version. I don't have access to a Windows test box, but it might be useful to know how the tests come out on windows without the call to setencoding(). If they pass, it might be tempting to fix the failing test by just letting the tests run with the pyodbc encoding defaults.

It would be even nicer if we understood what the underlying cause of the difference in behavior is when the encoding default is overridden. The only difference I can see (after pointing both systems to the same PostgreSQL server) is that the macOS client is using unixODBC 2.3.9 and the Linux system is running unixODBC 2.3.7. Unfortunately, attempts to build a version of unixODBC which matched the version installed on the other client ran into too much of a dependency swamp to be feasible on either system.

@gordthompson
Copy link
Collaborator

On Windows with 12.01.0000:

PS C:\Users\Gord\git\pyodbc> mypython3-64 tests3/pgtests.py -t test_emoticons_as_literal -vv
Library: C:\Users\Gord\git\pyodbc\build\lib.win-amd64-3.9\pyodbc.cp39-win_amd64.pyd
C:\Users\Gord\git\pyodbc\tests3 --> C:\Users\Gord\git\pyodbc
python:  3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
pyodbc:  4.0.33b6 C:\Users\Gord\git\pyodbc\build\lib.win-amd64-3.9\pyodbc.cp39-win_amd64.pyd
odbc:    03.80.0000
driver:  PSQLODBC35W.DLL 12.01.0000
         supports ODBC version 03.51
os:      Windows
unicode: Py_Unicode=2 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
         8.1 6.3.9600 SP0 Multiprocessor Free
test_emoticons_as_literal (__main__.PGTestCase) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.311s

OK

@gordthompson
Copy link
Collaborator

On Windows with 12.01.0000 and setencoding() disabled in pgtests.py:

PS C:\Users\Gord\git\pyodbc> mypython3-64 tests3/pgtests.py -t test_emoticons_as_literal -vv
Library: C:\Users\Gord\git\pyodbc\build\lib.win-amd64-3.9\pyodbc.cp39-win_amd64.pyd
C:\Users\Gord\git\pyodbc\tests3 --> C:\Users\Gord\git\pyodbc
python:  3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
pyodbc:  4.0.33b6 C:\Users\Gord\git\pyodbc\build\lib.win-amd64-3.9\pyodbc.cp39-win_amd64.pyd
odbc:    03.80.0000
driver:  PSQLODBC35W.DLL 12.01.0000
         supports ODBC version 03.51
os:      Windows
unicode: Py_Unicode=2 SQLWCHAR=2
Max VARCHAR = 255
Max WVARCHAR = 255
Max BINARY = (not supported)
         8.1 6.3.9600 SP0 Multiprocessor Free
test_emoticons_as_literal (__main__.PGTestCase) ... FAIL

======================================================================
FAIL: test_emoticons_as_literal (__main__.PGTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\Gord\git\pyodbc\tests3\pgtests.py", line 625, in test_emoticons_as_literal
    self.assertEqual(result, v)
AssertionError: 'x ?? z' != 'x 🌜 z'
- x ?? z
?   ^^
+ x 🌜 z
?   ^


----------------------------------------------------------------------
Ran 1 test in 0.149s

FAILED (failures=1)

@bkline
Copy link
Contributor Author

bkline commented Jan 5, 2022

Well, that's a bummer. So with some stacks it fails if you leave the encoding at the default, and with others it fails if you override the default encoding. Makes for a fragile test suite. 😒

@bkline
Copy link
Contributor Author

bkline commented Jan 5, 2022

I suppose one possibility would be to make the overriding of the default encoding dynamically based on the OS.

@bkline
Copy link
Contributor Author

bkline commented Jan 6, 2022

Here's a repro of the test (pulled out into a separate script) failing on Linux with Python 3.9.7, pyodbc 4.0.32, and version 13.00.0000 of psqlodbcw.so.

pyodbc-issue1004.tar.gz

tar -xf pyodbc-issue1004.tar.gz
cd issue1004
docker compose build
docker compose up -d
docker logs issue1004-repro-1
docker compose down

Output:

============================================================
with default encoding
============================================================
python:  3.9.7 (default, Sep 10 2021, 14:59:43) [GCC 11.2.0]
pyodbc:  4.0.32
odbc:    03.52
driver:  psqlodbcw.so 13.00.0000
         supports ODBC version 03.51
os:      Linux
unicode: Py_Unicode=4 SQLWCHAR=2
------------------------------------------------------------
'x 🌜 z' matches the expected value 'x 🌜 z'

============================================================
with encoding overridden
============================================================
python:  3.9.7 (default, Sep 10 2021, 14:59:43) [GCC 11.2.0]
pyodbc:  4.0.32
odbc:    03.52
driver:  psqlodbcw.so 13.00.0000
         supports ODBC version 03.51
os:      Linux
unicode: Py_Unicode=4 SQLWCHAR=2
------------------------------------------------------------
'x ð\x9f\x8c\x9c z' does not match the expected value 'x 🌜 z'

@bkline
Copy link
Contributor Author

bkline commented Jan 6, 2022

I found a Windows machine to test on.

============================================================
with default encoding
============================================================
python:  3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)]
pyodbc:  4.0.32
odbc:    03.80.0000
driver:  PSQLODBC35W.DLL 13.02.0000
         supports ODBC version 03.51
os:      Windows 10 10.0.19041 SP0 Multiprocessor Free
unicode: Py_Unicode=2 SQLWCHAR=2
------------------------------------------------------------
'x 🌜 z' matches the expected value 'x 🌜 z'

============================================================
with encoding overridden
============================================================
python:  3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)]
pyodbc:  4.0.32
odbc:    03.80.0000
driver:  PSQLODBC35W.DLL 13.02.0000
         supports ODBC version 03.51
os:      Windows 10 10.0.19041 SP0 Multiprocessor Free
unicode: Py_Unicode=2 SQLWCHAR=2
------------------------------------------------------------
'x 🌜 z' does not match the expected value 'x 🌜 z'

@gordthompson
Copy link
Collaborator

gordthompson commented Jan 6, 2022

When you try to do the insert from your Mac with .setencoding() and .setdecoding(), e.g.,

cnxn.setdecoding(pyodbc.SQL_CHAR, encoding="utf-8")
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding="utf-8")
cnxn.setencoding(encoding="utf-8")

crsr = cnxn.cursor()

v = "x \U0001F31C z"

crsr.execute("CREATE TABLE t1 (s varchar(100))")
crsr.execute(f"insert into t1 values ('{v}')")

and then check the table from a separate tool like pgAdmin_4 does the inserted value look correct?

Edit:

Both pgAdmin_4 and DBeaver appear to render the column contents as x 🌜 z, which are the correct bytes for UTF-8 encoding but not the Unicode character itself. 🙍‍♂️

@bkline
Copy link
Contributor Author

bkline commented Jan 6, 2022

The inserted character is not correct, so the problem happens going into the DB, not coming out. Sort of what I expected, given the semantics of "encode" and "decode" (commenting out just the setencoding() line was sufficient to get the value stored correctly).

#!/usr/bin/env python3

"""Store values with high Unicode characters in PostgreSQL database.
"""

import argparse
import pyodbc

VALUE = "x \U0001F31C z"

parser = argparse.ArgumentParser()
parser.add_argument("--connection-string", "-c", required=True)
opts = parser.parse_args()
conn = pyodbc.connect(opts.connection_string)
conn.setdecoding(pyodbc.SQL_CHAR, encoding="utf-8")
conn.setdecoding(pyodbc.SQL_WCHAR, encoding="utf-8")
conn.setencoding(encoding="utf-8")
cursor = conn.cursor()
for table_name in ("t1", "t2"):
    cursor.execute(f"CREATE TABLE {table_name} (s VARCHAR(100))")
cursor.execute("INSERT INTO t1 VALUES (?)", (VALUE,))
cursor.execute("INSERT INTO t2 VALUES ('{VALUE}')")
conn.commit()
print("done")

image

@gordthompson
Copy link
Collaborator

Okay, for me the test passes on both Windows and Linux if I comment out both the setencoding() and setdecoding() in tests3/pgtests.py:

pyodbc/tests3/pgtests.py

Lines 72 to 73 in b1cfb75

self.cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
self.cnxn.setencoding(encoding='utf-8')

So if that also passes on Mac then we can just remove those lines.

@bkline
Copy link
Contributor Author

bkline commented Jan 6, 2022

The only failing test with that modification of the script on macOS is test_columns() (addressed in #1003). It's sort of a nice bonus that the correct behavior across the platforms is achieved using the package's defaults. That feels right. 👍

(I'm learning all sorts of new tricks with these recent tickets. TIL about extending a permalink in GitHub to a range. 😂 )

gordthompson added a commit to gordthompson/pyodbc that referenced this issue Jan 6, 2022
v-makouz pushed a commit to v-makouz/pyodbc that referenced this issue Sep 9, 2022
* Add support for Python 3.10, drop EOL 3.5 (mkleehammer#952)

* Remove duplicate entry in pyi stub (mkleehammer#979)

* Replace deprecated SafeConfigParser with ConfigParser (mkleehammer#953)

* Designate connection string as optional (mkleehammer#987)

* Fix spelling typos (mkleehammer#985)

Co-authored-by: Gord Thompson <[email protected]>

* Fix for DSN Names with non-ASCII chars (mkleehammer#951)

* Fix for DSN Names with non-ASCII chars

Fixes: mkleehammer#948

Co-authored-by: bamboo <[email protected]>
Co-authored-by: Gord Thompson <[email protected]>

* Added InterfaceError to pyodbc.pyi. (mkleehammer#1013)

Co-authored-by: Benjamin Holder <[email protected]>

* Upgrade deprecated unicode encoding calls (mkleehammer#792)

* Do not include .pyc artifacts in source tarball mkleehammer#742

* Build wheels with cibuildwheels on GitHub Actions

Fixes mkleehammer#175
Ref mkleehammer#688
Closes mkleehammer#668
Closes mkleehammer#685
Fixes mkleehammer#441 and pretty much most issues that mention
` sql.h: No such file or directory`

This also need to setup some PyPI keys for automated uploads.

* Install unixodbc-dev for Linux wheels

* Enable GitHub Actions for pull requests

* Use Debian based `manylinux_2_24` image

* `apt-get` update before installing in wheel build

* Use PEP 440 version name required for wheels

* Skip building 32-bit wheels

* 4.0.dev0 for default version, because test_version() wants 3 parts here

Checked this won't shadow released minor version (credit goes to @hugovk)

    >>> from packaging.version import Version
    >>> Version("4.0.dev0") > Version("4.0.24")
    False

* Had to use Debian image for PyPy too

* Disable PyPy wheels

https://cibuildwheel.readthedocs.io/en/stable/options/#build-selection

PyPy is missing some C functions that `pyodbc` needs.

* Update README.md

* Avoid error when testing with DSN= connection

Fixes: mkleehammer#1000

* Disable setencoding/setdecoding in tests3/pgtests.py

Fixes: mkleehammer#1004

* Adjust test_columns() in tests3/pgtests.py for newer driver versions

Fixes: mkleehammer#1003

* Move driver version check out of function

* Add comment to _get_column_size()

* Fix memory leak with decimal parameters

Fixes: mkleehammer#1026

* Create codeql-analysis.yml

* Bugfix/sql param data memory leak (mkleehammer#703)

* Updated .gitignore

* * Created a test file for the specific scenario

* * Updated doc of test file for the specific SQLParamData scenario

* * Fixed the test file for the specific SQLParamData scenario by Py_XDECREF the PyObject with 1 reference.

* * Improved the test to close the cursor and set it to None, then forcing the gc

* * Changed the fix of the memory leak and updated the test.

* * Removed redundant empty line

* * Converted tabs to spaces

* * Moved variable out of conn's scope

* Update gitignore, remove duplicated

* Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (mkleehammer#998)

Current versions of Python write a deprecation warning message to
stderr, which breaks CGI scripts running under web servers which
fold stderr into stdout. Likely breaks other software. This change
replaces the deprecated calls with PyUnicode_New(size, maxchar).
The accompanying code to populate the new objects has also been
rewritten to use the new PyUnicode APIs.

* Making pyodbc compatible with PostgreSQL infinity dates, returning MINYEAR and MAXYEAR to python, instead of values out of python's limits

* Removing autoformat from code

* Removing autoformat from code

* Add odbc_config support on mac and m1 homebrew dir

* Note EOL of 2.7 support in README (mkleehammer#945)

* Fix version of CI generated wheels

The CI system is checking out exact tags like "git checkout 4.0.33", which results in a
detached HEAD.  The version calculation was adding the commit hash.

* Fix for mkleehammer#1082 libraries in Linux wheels (mkleehammer#1084)

* use argparse instead of optparse (mkleehammer#1089)

Co-authored-by: Hugo van Kemenade <[email protected]>
Co-authored-by: Alex Nelson <[email protected]>
Co-authored-by: Kian Meng, Ang <[email protected]>
Co-authored-by: Gord Thompson <[email protected]>
Co-authored-by: bamboo <[email protected]>
Co-authored-by: Gord Thompson <[email protected]>
Co-authored-by: bdholder <[email protected]>
Co-authored-by: Benjamin Holder <[email protected]>
Co-authored-by: Inada Naoki <[email protected]>
Co-authored-by: Michael Fladischer <[email protected]>
Co-authored-by: Anatoli Babenia <[email protected]>
Co-authored-by: Francisco Morales <[email protected]>
Co-authored-by: Gord Thompson <[email protected]>
Co-authored-by: Michael Kleehammer <[email protected]>
Co-authored-by: Gilad Leifman <[email protected]>
Co-authored-by: Bob Kline <[email protected]>
Co-authored-by: Leandro Scott <[email protected]>
Co-authored-by: Jordan Mendelson <[email protected]>
Co-authored-by: Keith Erskine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants