
I have a dictionary whose keys are two-part tuples: the first element is the index (row) label and the second is the column label. I would like to use this dictionary to populate a pandas DataFrame, placing each value at those coordinates.

For example my dictionary looks like this:

final = {('BUV395', 'BUV496'): 0, ('BUV395', 'BUV563'): 0, ('BUV395', 'BUV615'): 0, ('BUV395', 'BUV661'): 0, etc...

The input to my function is the pandas DataFrame with the original data - just to give context to the code below:

def matrix_all_pairs(df):
  dataframe = pd.DataFrame(index=range(0,len(df.index.values)),columns=range(0,len(df.index.values)))
  dataframe.columns = df.index.values
  idx = list(df.index.values)
  list_fluor = list(combinations(df.index.values, 2))
  final = {}
  for fluor in list_fluor:
    if (r2_score(df.xs(fluor[0]), df.xs(fluor[1]))) < 0:
      final[fluor] = 0
    else:
      final[fluor] = (r2_score(df.xs(fluor[0]), df.xs(fluor[1])))
  for fluor, value in list_fluor:
    x = value
    dataframe.loc(idx.index(fluor[0]), fluor[1]) = x
  dataframe.index = df.index.values
  return(dataframe)

When I try to run this, it gives me "SyntaxError: can't assign to function call" for the line:

    dataframe.loc(idx.index(fluor[0]), fluor[1]) = x

Is there a better way of doing this? I've seen multiple people say that populating an empty DataFrame using a loop is messy but I'm not sure how else I could do this?

I'm not sure how to post my data for people to work with - I'm new to this site.

Kraigolas
ben
  • It is really unclear what you are trying to do. You say you have a dictionary whose key/value pairs look something like (str, str): int, and you want to create a df from it. You then show a function with a variable named df, which typically denotes a dataframe, from which you seem to create another dataframe. I am lost, can you clarify? – itprorh66 Jan 13 '22 at 19:14
  • "I'm not sure how to post my data for people to work with" because you are trying to create a dataframe, it's fine to just give the example that you've given `final = {...}`. I would then add how you want this small dictionary to look when it becomes a dataframe. – Kraigolas Jan 13 '22 at 19:14
  • You're getting `SyntaxError: can't assign to function call` because df.loc should use `[]` instead of `()` – scrollout Jan 13 '22 at 19:15
  • for the first comment: the input df has the raw data. I'm doing pairwise linear regressions comparing each row to every other one and outputting a dictionary with the pair being compared as the key and the r^2 of the linear regression as the value. I then want to turn that dictionary into a new dataframe with the r^2 at the intersection of the comparison. Does that help clear things up? – ben Jan 13 '22 at 20:26
  • for the third comment: Thank you! that worked. Now I'm running into trouble with being able to call the first and second parts of each dictionary key. The way I have it gives me the first and second letter instead of the first and second word (I think string is the "code" way to say this?) Do you have a solution for this? – ben Jan 13 '22 at 20:32
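
Regarding the "first and second letter" problem in the last comment: in the question's function, `for fluor, value in list_fluor` unpacks each 2-tuple from `list_fluor` into two strings, so `fluor[0]` is the first character of the first name. A minimal sketch of the loop, assuming the values should be read from `final.items()` and that the frame can be labelled with the names directly (the dictionary values and the names `labels` and `matrix` are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the real dictionary of pairwise scores.
final = {
    ('BUV395', 'BUV496'): 0.2,
    ('BUV395', 'BUV563'): 0.5,
    ('BUV496', 'BUV563'): 0.9,
}

# Every label that appears in any key, in first-seen order.
labels = list(dict.fromkeys(label for pair in final for label in pair))

# Empty square frame labelled with the names themselves,
# so no positional lookup such as idx.index(...) is needed.
matrix = pd.DataFrame(index=labels, columns=labels, dtype=float)

# final.items() yields ((row, col), value); unpack the key tuple explicitly.
for (row, col), value in final.items():
    matrix.loc[row, col] = value  # square brackets, not parentheses
```

Cells that have no entry in `final` stay `NaN` with this approach.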

1 Answer


Is this what you're asking? First item in each tuple is the "row/index" value and second item is the "column" header. Essentially, you have a multiindex series that you want to unstack into a single index dataframe.

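import pandas as pd

# One row per dictionary key; the values land in column 0
df = pd.DataFrame.from_dict(final, orient='index')
# Split each tuple key into its row label and column label
df[['index','column']] = df.index.values.tolist()
df = df.set_index(['index','column'])[0].unstack()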

Your example final dictionary has only one unique value among the first tuple elements, so the result has a single row:

column  BUV496  BUV563  BUV615  BUV661
index                                 
BUV395       0       0       0       0

An alternate example that produces a more obviously two-dimensional DataFrame:

final = {('BUV395', 'BUV496'): 0, ('BUV395', 'BUV563'): 0, ('BUV496', 'BUV395'): 0, ('BUV496', 'BUV563'): 0, ('BUV563', 'BUV395'): 0, ('BUV563', 'BUV496'): 0}

df = pd.DataFrame.from_dict(final, orient='index')
df[['index','column']] = df.index.values.tolist()
df = df.set_index(['index','column'])[0].unstack().rename_axis(None).rename_axis(None, axis=1)

        BUV395  BUV496  BUV563
BUV395     NaN     0.0     0.0
BUV496     0.0     NaN     0.0
BUV563     0.0     0.0     NaN
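
Since the answer describes the data as a multiindex series to be unstacked, a shorter route may be to build that series directly. This is only a sketch of the same idea, relying on `pd.Series` turning tuple keys into a MultiIndex; the names `series` and `matrix` are illustrative:

```python
import pandas as pd

final = {('BUV395', 'BUV496'): 0, ('BUV395', 'BUV563'): 0,
         ('BUV496', 'BUV395'): 0, ('BUV496', 'BUV563'): 0,
         ('BUV563', 'BUV395'): 0, ('BUV563', 'BUV496'): 0}

# Tuple keys become a two-level MultiIndex: (row label, column label).
series = pd.Series(final)

# Move the second level into the columns; missing pairs become NaN.
matrix = series.unstack()
print(matrix)
```

As in the example above, the diagonal is `NaN` because no key pairs a label with itself.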
StevenS
  • Will this work to create a dataframe with more than one index? My actual data should create a dataframe that is 78x78 (indices, columns). – ben Jan 13 '22 at 20:59
  • Yes, of course. The example only includes one index because your example `final` dictionary only had one unique first element of the 4 tuple keys. – StevenS Jan 13 '22 at 21:21
  • When I tried this I got back an empty dataframe (all NaN). It was also 77 rows instead of 78 like I expected. What's going on? – ben Jan 14 '22 at 02:03
  • It's difficult to say without knowing more about your actual data, but I would recommend looking at your dataframe after each step to see where you're starting to see the `NaN` values show up. Is the initial dataframe constructed from the dictionary correct? – StevenS Jan 14 '22 at 07:25
  • Also, maybe double check that you actually are expecting 78 rows by doing something like `len(set([x[0] for x in final.keys()]))`. The result will be the actual number of unique first elements in your dictionary tuple keys (i.e., number of expected rows in the final dataframe). – StevenS Jan 14 '22 at 07:57
  • I got it to work but there are a few more problems now. It's not copying over the last index into the final dataframe and it's also not fully populating it - there's still a lot of spaces with NaN – ben Jan 14 '22 at 17:38
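
A likely explanation for the missing row and the remaining `NaN` cells: the question builds its keys with `combinations(df.index.values, 2)`, which yields each unordered pair once. The last label therefore never appears as a first element (77 rows instead of 78), the first label never appears as a second element, and only one triangle of the matrix is filled. The sketch below uses four made-up labels and made-up scores; mirroring the values across the diagonal is an assumption, since `r2_score(a, b)` is not generally equal to `r2_score(b, a)`:

```python
from itertools import combinations

import pandas as pd

labels = ['A', 'B', 'C', 'D']  # hypothetical row labels
# Made-up scores, one per unordered pair, mimicking the question's dictionary.
final = {pair: i / 10
         for i, pair in enumerate(combinations(labels, 2), start=1)}

upper = pd.Series(final).unstack()
# upper is 3x3: 'D' is never a first element, 'A' is never a second element.

# Restore the full label set on both axes; the new cells are NaN.
full = upper.reindex(index=labels, columns=labels)

# Assumption: treat each score as symmetric and copy it across the diagonal.
full = full.where(full.notna(), full.T)
```

The diagonal is left as `NaN` here because no label is ever paired with itself; whether to fill it, and with what, depends on the analysis.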