PyODBC + Pandas + Parameterization

Question

I'm using PyODBC to connect to Oracle with a connection called cnxn.

I have a list of unique identifiers: list1 = [1234, 2345, 3456, 4567]

I also have a query:

query1 = """
select * from tablename where unique_id = ?
""'

What I'd like to do is use this list of identifiers to create a pandas dataframe.

As a test, I did this:

testid = "1234" (since Oracle wants a string as that id not an integer)

However, when I do this:

pd.read_sql(query1, cnxn, params = testid)

I get "the sql contains 1 parameter marker, yet 4 were supplied."

Eventually, I want to be able to do something like this:

for i in list1:
    newdataframe.append(pd.read_sql(query1, cnxn, params = i))

and have it spit out a dataframe.

I've read the docs on PyODBC and it looks like it specifies ? as the parameter. I've also looked at this question and it's similar, but I need to be able to feed the results into a Pandas dataframe for further manipulation.

I think if I can get the testid working, I'll be on the right track.

Thanks

just updated to add the single quotes after the query is parsed. — PyNoob, May 16 '16 at 18:36

score 10 · Accepted Answer · answered May 17 '16 at 09:09

From the pandas read_sql docs: "params: List of parameters to pass to execute method".

The params needs to be a list (or tuple), and hence the string is interpreted as a list of 4 single characters. But this can be solved easily:

pd.read_sql(query1, cnxn, params=[testid])

PyNoob · Answer 2 · 2016-05-16T18:36:02.823

The below is a full example with connection details but is SQL Server specific.

Because you're using ORACLE, you can steal the df_query part. The point I'm trying to illustrate here is that you can use string formatting for parameter values instead of using params in your connection string.

import os
import sqlalchemy as sa
import urllib
import pandas as pd


#Specify the databases and servers used for reading and writing data.
read_server = 'Server'
read_database = 'Database'

read_params = urllib.quote_plus("DRIVER={Server};SERVER={read_server};DATABASE={read_database};TRUSTED_CONNECTION=Yes".format(Server = 'SQL Server',read_server = read_server,read_database=read_database))
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)

unique_id= 'id'
single_quote = "'"

df_query = """ 
                SELECT * 
                FROM TABLE 
                WHERE UNIQUE_ID = {single_quote}{unique_id}{single_quote}
""".format(single_quote = single_quote,unique_id=unique_id)

DF = pd.read_sql_query(df_query,con=read_engine,index_col=None)

This isn't parametrized and thus vulnerable to [sql-injection](https://en.wikipedia.org/wiki/SQL_injection#Parameterized_statements) — Dan, Apr 20 '18 at 16:22

PyODBC + Pandas + Parameterization

2 Answers2