This seems simple, but it's throwing me for a loop. Coding pun intended.

I have a dataframe with the following format:

    import pandas as pd

    df = pd.DataFrame({"chrom": [12, 12],
                       "Pos": [112233, 112234],
                       "ref_base": ["A", "G"],
                       "alt_base": ["T", "C"],
                       "A": [12, 22],
                       "T": [3, 34],
                       "G": [12, 23],
                       "C": [22, 21]},
                      index=[0, 1])


    chrom     Pos  ref_base  alt_base   A   T   G   C
       12  112233         A         T  12   3  12  22
       12  112234         G         C  22  34  23  21

I need to create a new column that contains, for each row, the value from whichever of the A, T, G, or C columns matches the value in the ref_base column.

    chrom     Pos  ref_base  alt_base   A   T   G   C  ref_val
       12  112233         A         T  12   3  12  22       12
       12  112234         G         C  22  34  23  21       23

What I'm ultimately trying to do is create a column containing a tuple of (ref_val, alt_base_val), so if there's a better way to do that than creating the individual columns first and joining them, I'd be grateful to learn what it is.

    chrom     Pos  ref_base  alt_base   A   T   G   C        AD
       12  112233         A         T  12   3  12  22   (12, 3)
       12  112234         G         C  22  34  23  21  (23, 21)
SummerEla
  • `df.lookup(df.index, df.ref_base)` This is a dupe. – cs95 Jan 30 '19 at 00:29
  • I saw your post on that, coldspeed, but wasn't sure how to apply it to my situation. Not a dupe. – SummerEla Jan 30 '19 at 00:39
  • "Wasn't sure how to apply it" is not a reason for it not being a dupe. `lookup` is exactly what you would've used. It's even in the answer below. – cs95 Jan 30 '19 at 00:41
  • Showing implementation is important. I tried this method and was unsure how to implement it for the problem at hand. Nevertheless, I have an answer. – SummerEla Jan 30 '19 at 05:16

1 Answer

Using `lookup`, once for ref_base and once for alt_base, then zipping the two results:

    df['New'] = tuple(zip(df.lookup(df.index, df.ref_base), df.lookup(df.index, df.alt_base)))
    df

        A   C   G     Pos   T alt_base  chrom ref_base       New
    0  12  22  12  112233   3        T     12        A   (12, 3)
    1  22  21  23  112234  34        C     12        G  (23, 21)
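
A note for anyone reading this on a newer pandas: `DataFrame.lookup` was deprecated in pandas 1.2 and removed in 2.0, so the line above raises `AttributeError` there. The following is a minimal sketch of one possible replacement, not the method used in this answer. It swaps in NumPy integer indexing, and the names `bases`, `counts`, `ref_cols`, `alt_cols`, `ref_val`, `alt_val`, and `AD` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "chrom": [12, 12],
    "Pos": [112233, 112234],
    "ref_base": ["A", "G"],
    "alt_base": ["T", "C"],
    "A": [12, 22],
    "T": [3, 34],
    "G": [12, 23],
    "C": [22, 21],
})

# Assumption: these four columns hold the per-base counts.
bases = ["A", "T", "G", "C"]
counts = df[bases].to_numpy()
rows = np.arange(len(df))

# Translate each row's base letter into a column position within `bases`.
base_index = pd.Index(bases)
ref_cols = base_index.get_indexer(df["ref_base"])
alt_cols = base_index.get_indexer(df["alt_base"])

# get_indexer returns -1 for unknown labels, which NumPy would silently
# treat as "last column", so fail loudly instead.
if (ref_cols < 0).any() or (alt_cols < 0).any():
    raise ValueError("ref_base/alt_base contains a value outside A, T, G, C")

# Pick one value per row: counts[i, ref_cols[i]] and counts[i, alt_cols[i]].
df["ref_val"] = counts[rows, ref_cols]
df["alt_val"] = counts[rows, alt_cols]

# Combine the two per-row values into a single tuple column.
df["AD"] = list(zip(df["ref_val"], df["alt_val"]))

print(df[["ref_base", "alt_base", "ref_val", "alt_val", "AD"]])
```

If only the tuple column is needed, the intermediate `ref_val` and `alt_val` columns can be skipped by zipping `counts[rows, ref_cols]` and `counts[rows, alt_cols]` directly.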
BENY