I was looking for ways to speed up pushing a dataframe to SQL Server and stumbled upon an approach here. This approach blew me away in terms of speed: plain `to_sql` took almost 2 hours, while this script pushed a 100k row x 100 column dataframe in 12.54 seconds.

So after testing the code below with a sample df, I attempted to use a df that had many different datatypes (ints, strings, floats, Booleans). However, I was sad to see a MemoryError. So I started reducing the size of my df to see what the limitations were. I noticed that if my df had any strings, then I wasn't able to load it to SQL Server. I am having trouble isolating the issue further. The script below is taken from the question in the link, except that I added a tiny df with strings. Any suggestions on how to rectify this issue would be great!

import pandas as pd
import numpy as np
import time
from sqlalchemy import create_engine, event
from urllib.parse import quote_plus
import pyodbc

conn = "DRIVER={SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"
quoted = quote_plus(conn)
new_con = 'mssql+pyodbc:///?odbc_connect={}'.format(quoted)
engine = create_engine(new_con)


# Enable pyodbc's fast_executemany for the bulk INSERTs that to_sql issues
@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    print("FUNC call")
    if executemany:
        cursor.fast_executemany = True


# Tiny DataFrame containing only strings; this is what triggers the MemoryError
table_name = 'fast_executemany_test'
df1 = pd.DataFrame({'col1': ['tyrefdg', 'ertyreg', 'efdgfdg'],
                    'col2': ['tydfggfdgrefdg', 'erdfgfdgfdgfdgtyreg', 'edfgfdgdfgdffdgfdg']
                    })

s = time.time()
df1.to_sql(table_name, engine, if_exists='replace', chunksize=None)
print(time.time() - s)
Martin Bobak

3 Answers

I was able to reproduce your issue using pyodbc 4.0.23. The MemoryError was related to your use of the ancient

DRIVER={SQL Server}

Further testing using

DRIVER=ODBC Driver 11 for SQL Server

also failed, with

Function sequence error (0) (SQLParamData)

which was related to an existing pyodbc issue on GitHub. I posted my findings here.

That issue is still under investigation. In the meantime you might be able to proceed by

  • using a newer ODBC driver like DRIVER=ODBC Driver 13 for SQL Server, and
  • running `pip install pyodbc==4.0.22` to use an earlier version of pyodbc (see the sketch below).
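
For reference, a minimal sketch combining both suggestions, assuming the same placeholder connection details as in the question and that ODBC Driver 13 is actually installed on the machine:

# After pip install pyodbc==4.0.22, confirm which version is active
import pyodbc
print(pyodbc.version)

# Same engine setup as in the question, but with a newer ODBC driver
from urllib.parse import quote_plus
from sqlalchemy import create_engine

conn = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quote_plus(conn)))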
Gord Thompson
  • Thanks! pyodbc 4.0.22 didn't work for me but pyodbc 4.0.19 worked fine. Also, I noticed that the script throws an error when adding datatypes into the `to_sql` statement, so I just load the data to SQL Server and change the datatypes there (sketched below), in case anyone is getting that same error. – Martin Bobak Apr 23 '18 at 18:35
  • Thanks @MartinBobak! It's such a strange error. I had to fiddle around with the metadata and it worked, but it's very annoying. I also found the performance boost amazing. – Umar.H Jun 17 '19 at 13:45
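
A rough sketch of the workaround described in the first comment above (load first, then fix the column types in SQL Server), reusing df1, engine, and table_name from the question; the NVARCHAR(50) lengths are only illustrative:

# Load the data without passing dtype= to to_sql ...
df1.to_sql(table_name, engine, if_exists='replace', index=False)

# ... then change the column types directly in SQL Server
from sqlalchemy import text

with engine.begin() as connection:
    connection.execute(text("ALTER TABLE fast_executemany_test ALTER COLUMN col1 NVARCHAR(50)"))
    connection.execute(text("ALTER TABLE fast_executemany_test ALTER COLUMN col2 NVARCHAR(50)"))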

I was facing this issue too.

Solved by changing:

conn =  "DRIVER={SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"

To this:

conn =  "DRIVER={ODBC Driver 17 for SQL Server};SERVER=SERVER_IP;DATABASE=DB_NAME;UID=USER_ID;PWD=PWD"
Hugo Vares

I experienced this issue with a 32-bit Python interpreter; switching to a 64-bit interpreter resolved my memory error. Beyond that solution, I would recommend chunking the amount of data you process: establish a threshold, write each chunk of data once the threshold is met, and iterate until you have processed all of the data (see the sketch below).
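
A minimal sketch of that chunking idea, reusing df1, engine, and table_name from the question; the 10,000-row threshold is an arbitrary example, not a tuned value:

# Option 1: let to_sql batch the INSERT statements itself
df1.to_sql(table_name, engine, if_exists='replace', index=False, chunksize=10000)

# Option 2: slice the DataFrame manually and write one chunk at a time
chunk_rows = 10000
for start in range(0, len(df1), chunk_rows):
    chunk = df1.iloc[start:start + chunk_rows]
    chunk.to_sql(table_name, engine,
                 if_exists='replace' if start == 0 else 'append',
                 index=False)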