Redshift where clause using multiple columns

Question

Suppose I have the following table in redshift:

table
|   a |   b |
|----:|----:|
|   3 |   1 |
|   1 |   8 |
|   7 |   6 |
|   4 |   0 |
|   5 |   6 |
|   5 |   2 |
|   5 |   9 |
|   4 |   3 |
|   7 |   9 |
|   9 |   8 |

And in python, I have the following list of tuples:

x = [(3,1), (4,2), (10, 1), (7,9), (5,2), (6,1)]

I want to extract all rows from the table where the tuple (a,b) is in x using pd.read_sql_query`.

If I only had one column it would be a simple SQL WHERE clause, something like:

query = f'''
SELECT *
FROM table
WHERE a IN {x_sql} 
'''

pd.read_sql_query(query, engine)

My final result would be:

|   a |   b |
|----:|----:|
|   3 |   1 |
|   5 |   2 |
|   7 |   9 |

I wanted to create a query like:

#doesn't work
SELECT *
FROM table 
WHERE a,b IN ((3,1), (4,2), (10, 1), (7,9), (5,2), (6,1))

score 1 · Answer 1 · answered Aug 17 '20 at 16:46

1

IIUC, we can use .stack with isin and .loc to filter along the index.

x = [(3,1), (4,2), (10, 1), (7,9), (5,2), (6,1)]

df.loc[df.stack().groupby(level=0).agg(tuple).isin(x)]


       a      b 
1      3      1
6      5      2
9      7      9

answered Aug 17 '20 at 16:46

Umar.H

20,495
6
30
59

My table is in redshift and I didn't want to load it entirely using pandas – Bruno Mello Aug 17 '20 at 16:50
@BrunoMello right missed that, you could create a calculated column I think [this](https://stackoverflow.com/questions/8006901/using-tuples-in-sql-in-clause) question has the syntax you need. you could also create a calculated column and match as strings. Not sure how that is for performance . – Umar.H Aug 17 '20 at 16:53
I think the syntax of the accepted answer might work, thanks! – Bruno Mello Aug 17 '20 at 16:56

Redshift where clause using multiple columns

1 Answers1