7

I created a loop to read sqlite database into a pandas dataframe, and I am trying to merge them together based on "Code"

...
df = pandas.Dataframe()  # Creating an empty dataframe for merging at the end

items = ["tb1", "tb2", "tb3"]

for each_item in items:
    my_value = pandas.read_sql_query('select "Code", "Name", "Value" from {tb_name} where "Value" is not null'
                                     .format(tbl_name='"%s"' % each_item), con=engine)

    print(my_value)

    # This below code is my attempt to merge the dataframes that was obtained through the for loop
    merge_value = pandas.merge(my_value, df, on='Code', how='outer')

my_value results:

# tb1 results
     Code          Name      Value
0     C01         Name1   0.010000
1     C02         Name2   0.001200
2     C03         Name3   0.000300
3     C04         Name4   0.001700

# tb2 results
     Code          Name      Value
0     C03         Name3   0.010000
1     C04         Name4   0.001200
2     C05         Name5   0.000300
3     C06         Name6   0.001700

# tb3 results
     Code          Name      Value
0     C01         Name1   0.010000
1     C02         Name2   0.001200
2     C05         Name5   0.000300
3     C06         Name6   0.001700

I am trying to merge them into one table like below:

# desired results
     Code          Name    Value_x   Value_y    Value_Z
0     C01         Name1   0.010000      NULL   0.010000      
1     C02         Name2   0.001200      NULL   0.001200      
2     C03         Name3   0.000300  0.010000       NULL
3     C04         Name4   0.001700  0.001200       NULL      
4     C05         Name5       NULL  0.000300   0.000300  
5     C06         Name6       NULL  0.001700   0.001700  

How do I merge it? I tried the below, but it produces key error: Code

merge_value = pandas.merge(my_value, df, on='Code', how='outer')
jake wong
  • 4,374
  • 10
  • 37
  • 75
  • Here is a link with a solution to the loop problem (distinct from the merge problem): https://simplernerd.com/python-pandas-merge-dataframes-loop/ Basically, if you have a list of dataframes dfs, then use this code: from functools import reduce merged_df = reduce(lambda l, r: pd.merge(l, r, on='date', how='inner'), dfs) – GuyStalks Mar 30 '22 at 13:10

1 Answers1

10

You can use DataFrame.merge instead of pandas.merge

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html


Create an empty DataFrame with the columns to prevent the "key error: Code"

df = pd.DataFrame(columns=['Code']) 

then in the loop, you

df.merge(my_value, on='Code', how='outer') 

after my_value is created

Jeffrey Ram
  • 380
  • 2
  • 8
  • Hello, the issue is that `tb1, tb2, tb3` is obtained through the `for` loop. So i cannot explicitly type the solution you proposed. – jake wong Sep 02 '17 at 09:32
  • 1
    Try creating an empty DataFrame with the columns df = pd.DataFrame(columns=['Code']) then in the loop, you df.merge(my_value, on='Code', how='outer') after my_value is created – Jeffrey Ram Sep 02 '17 at 09:38