Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write Null values to mssql table #8088

Closed
klonuo opened this issue Aug 21, 2014 · 8 comments
Closed

Write Null values to mssql table #8088

klonuo opened this issue Aug 21, 2014 · 8 comments

Comments

@klonuo
Copy link
Contributor

klonuo commented Aug 21, 2014

This issue is covered at several places, but it doesn't seem there is a solution for mssql.
Referencing #4199

Example:

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

pg_engine = create_engine('postgresql://postgres:*pass*@localhost:5432/test')
ms_engine = create_engine('mssql+pyodbc://localhost\\sqlexpress/test')

df = pd.DataFrame(np.random.rand(5,3))
df = df.where(df < .5, None)

df.to_sql('x', con=pg_engine)
df.to_sql('x', con=ms_engine)

This script created successfully postgres table but fails at mssql:

DataError: (DataError) (
    '22018', '[22018] [Microsoft][ODBC SQL Server Driver][SQL Server]
    Operand type clash: float is incompatible with text (206) (SQLExecDirectW); 
    [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]
    Statement(s) could not be prepared. (8180)'
) 
u'INSERT INTO x ([index], [0], [1], [2]) VALUES (?, ?, ?, ?)' (
    (0L, 0.30160831834724133, None, 0.3381303856312915), 
    (1L, 0.15675955874181347, None, None), 
    (2L, None, None, 0.4913946391341001), 
    (3L, 0.47313398459868405, 0.41395730103112083, 0.024404123661447397), 
    (4L, 0.02599871607241755, 0.0038944154716251678, None)
)

Full trace: https://gist.github.com/51dd8360a938dece9f28

pandas: 0.14.1
sqlalchemy: 0.9.7

@klonuo
Copy link
Contributor Author

klonuo commented Aug 21, 2014

With sqlalchemy I'm able to write Null values to table if I use sqlalchemy.sql.null() instead None.

@klonuo klonuo changed the title Write nan values to mssql table Write Null values to mssql table Aug 21, 2014
@jorisvandenbossche
Copy link
Member

The problem with the df.where(pd.isnull(df), None) approach (having NaNs as None) is that the columns have then 'object' dtype, and because of that 'TEXT' columns are created in the database (and gives you the error of 'float is incompatible with text').

Possible hack for now: first create the database with part of the data without NaNs (eg first line if there aren't NaNs), and then append the rest.

But I hope the NaN issues will be fixed for real for 0.15 (as replacing NaNs with None is also hack that does not work very well, as you illustrated here). If you are interested in looking into this a bit, certainly welcome!

@klonuo
Copy link
Contributor Author

klonuo commented Aug 21, 2014

The problem with the df.where(pd.isnull(df), None) approach (having NaNs as None) is that the columns have then 'object' dtype, and because of that 'TEXT' columns are created in the database.

You are right. Postgre table columns are of type text, which makes the referenced hack unacceptable.

Possible hack for now: first create the database with part of the data without NaNs (eg first line if there aren't NaNs), and then append the rest.

Tested and working fine for me.
I had to change np.nans to None, and then appending creates Nulls and column type is preserved.

Thanks :)

But I hope the NaN issues will be fixed for real for 0.15. If you are interested in looking into this a bit, certainly welcome!

I can perhaps look a bit, but I'm afraid I don't have skills to make the difference.

@jorisvandenbossche
Copy link
Member

Yes, this is the 'known hack' for the moment, changing the NaNs to None (and first creating the database and then append).
The main issue about this is here: #2754

What error do you get if you just write to mssql with NaNs (so df = df.where(df < .5, np.nan) instead of None)? Because eg for postgres it does work, but not for mysql (see overview #2754 (comment)). Always interesting to know how the status is with other flavours.

@klonuo
Copy link
Contributor Author

klonuo commented Aug 21, 2014

The main issue about this is here: #2754

OK, I'll look closer

What error do you get if you just write to mssql with NaNs (so df = df.where(df < .5, np.nan) instead of None)?

I get this:

ProgrammingError: (ProgrammingError) (
    '42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]
    The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. 
    Parameter 6 (""): The supplied value is not a valid instance of data type float. 
    Check the source data for invalid values. 
    An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)'
) 
u'INSERT INTO y ([index], [0], [1], [2]) VALUES (?, ?, ?, ?)' (
    (0L, 0.3546747680133002, nan, 0.15303927955650265), 
    (1L, nan, nan, nan), 
    (2L, nan, nan, nan), 
    (3L, nan, nan, 0.027304714280517728), 
    (4L, nan, 0.08309139967584267, 0.2815579060440928)
)

Full trace: https://gist.github.com/339130e1346b22e91030

@jorisvandenbossche
Copy link
Member

Thanks for testing!
I am going to close this issue then, but mention it in #2754 in the overview of what is/not yet working (as for the moment, it is a general issue for all flavors).

@klonuo
Copy link
Contributor Author

klonuo commented Aug 21, 2014

OK, @jorisvandenbossche
Thanks for your support

@mfieraci
Copy link

mfieraci commented Feb 5, 2022

Hi,
My DataFrame has a few columns that currently are empty (set to np.NaN).
Is there an alternative approach to uploading data to sql server that does not require a row to have no NaN values at all?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants