Suppose I have following table as a pandas.DataFrame with some important IDs:
| important_ids |
|---|
| 1 |
| 5 |
| 22 |
| 40 |
So basically a table which represents the filtering condition which has to be used (merge/join (?)) on the second table:
| all_ids | value |
|---|---|
| 1 | 5.0 |
| 1 | 4.0 |
| 1 | 5.0 |
| 7 | 4.5 |
| 9 | 2.5 |
| 22 | 3.0 |
| 22 | 0.5 |
My major problem here is that there are duplicate IDs in the second table which I do not want to be merged into one single ID (always happens to me), so that I retrieve a table like:
| all_important_ids | value |
|---|---|
| 1 | 5.0 |
| 1 | 4.0 |
| 1 | 5.0 |
| 22 | 3.0 |
| 22 | 0.5 |
I thought, it should be possible to do this using pandas joins or merges. Can this be done this way?
Code Examples:
table1 = pd.DataFrame(data=table1_series)
table1x2 = table1.join(table2, lsuffix = 'all_ids', rsuffix ='value', how='outer')
Is an outer join the correct approach in this case?
Or even something like this (Which just gives me the id column)?
table1 = pd.DataFrame(data=table1_series)
table1x2 = table2.filter(items = table1)
table1x2 = table2.filter('all_ids' == table1)