ERROR : "('HY000', 'The driver did not supply an error!')" or "The incoming request has too many parameters" when trying to push a df to a MS SQL database using the to_sql() method with argument method='multi' #38541

Closed
AntoineWeber opened this issue Dec 17, 2020 · 2 comments

AntoineWeber commented Dec 17, 2020

  • I have checked that this issue has not already been reported.

  • [Yes, it was already opened here; however, that issue was closed even though the problem persists.]

  • I have confirmed this bug exists on the latest version of pandas.

  • [Yes, currently version 1.1.5]

  • (optional) I have confirmed this bug exists on the master branch of pandas.

  • [No, did not check this.]


Problem description

I am currently trying to push a dataframe to a MS SQL database using pandas' to_sql() method (via sqlalchemy and pyodbc). My dataframe is rather large, with a 14k x 72 shape and various dtypes (strings, floats, integers).

When pushing the dataframe without any arguments to to_sql() (except the table name and DB connection), the upload is extremely slow (> 30 min for this ~5 MB dataframe).
To try to speed this up, I came across the method= argument, which I set to 'multi'. Without specifying a chunksize, I get the error
('HY000', 'The driver did not supply an error!')

So I then tried to set a chunksize, but whenever it is larger than ~20, I get the error The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request.
Setting the chunksize to 10 works, but the upload takes ~3 min, which is still too slow.
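
(As a rough back-of-the-envelope sketch, assuming method='multi' emits one bind parameter per cell of each chunk, SQL Server's 2100-parameter cap would explain why only small chunk sizes work for a 72-column frame:)

# Rough sketch (assumption: one bind parameter per column per row in a
# multi-row INSERT; SQL Server caps a single statement at 2100 parameters).
n_columns = 72
max_chunksize = 2100 // n_columns
print(max_chunksize)  # -> 29, so chunksizes above ~29 exceed the limit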

From my understanding, such an upload should normally be doable in a few seconds.

Am I the only one having this issue?

Sample code

import pandas as pd
import sqlalchemy

# connection to the db ('&' separates the query parameters; spaces in the
# driver name are written as '+' in the URL)
engine = sqlalchemy.create_engine(
    "mssql+pyodbc://{},{}/{}?driver=SQL+Server&Trusted_Connection=yes"
    .format(server_ip, port_number, db_name))
connection = engine.connect()

# read the df
df_raw = pd.read_csv(files_path, index_col=0, low_memory=False)

# push to database
with connection.begin():
    df_raw.to_sql('RawTestData', con=connection, if_exists='replace',
                  index=False, chunksize=chunk_size, method='multi')

Expected Output

Using the method='multi' argument should speed up the upload of the data; I would guess it should complete in a few seconds at most.

Output of pd.show_versions()


INSTALLED VERSIONS

commit : b5958ee
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : French_Switzerland.1252

pandas : 1.1.5
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.1.post20201107
Cython : 0.29.21
pytest : 6.1.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1

AntoineWeber added the Bug and Needs Triage labels on Dec 17, 2020
Liam3851 (Contributor) commented Dec 21, 2020

If you want fast uploading to SQL Server, you need to use a more modern ODBC driver, like ODBC Driver 17 for SQL Server, rather than "SQL Server" which I see in your connection string (which was the driver for SQL Server 2008R2 and earlier). You will also want to set fast_executemany=True in your connection string. See https://docs.sqlalchemy.org/en/13/dialects/mssql.html#fast-executemany-mode.

(Edit: Do note these settings apply to any use of SQLAlchemy with SQL Server, and not pandas to_sql specifically).
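
(A minimal sketch of that suggestion, reusing the server_ip/port_number/db_name placeholders from the snippet above; fast_executemany is passed to create_engine as described in the linked SQLAlchemy docs:)

import sqlalchemy

# Sketch: modern ODBC driver name in the URL and fast_executemany enabled
# on the pyodbc dialect (SQLAlchemy 1.3+).
engine = sqlalchemy.create_engine(
    "mssql+pyodbc://{},{}/{}?driver=ODBC+Driver+17+for+SQL+Server"
    "&Trusted_Connection=yes".format(server_ip, port_number, db_name),
    fast_executemany=True,
)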

AntoineWeber (Author) commented

Hey, thanks for your answer. I actually installed exactly that driver this morning (ODBC Driver 17 for SQL Server) to enable easier dockerized use.
To make it faster, I indeed also added fast_executemany=True, which at first did not speed anything up. To end up with a substantial speed boost, I also had to remove method='multi' from the to_sql() call. Now I can set a much bigger chunksize (currently 1000), and a dataframe of 14k rows and 72 columns is pushed in under 9 seconds.
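
(A minimal sketch of that working setup, reusing connection and df_raw from the original snippet and assuming the fast_executemany engine shown above:)

# Sketch: default to_sql insert method (no method='multi') with a larger
# chunksize; the speed-up comes from fast_executemany on the engine.
with connection.begin():
    df_raw.to_sql('RawTestData', con=connection, if_exists='replace',
                  index=False, chunksize=1000)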

Thank you for the insight. I can close the issue.
