
Hi there, my dataset looks as follows:

Patient ID  Medicine   Death
1           A,B,C,D,E  1
2           B,D        0
3           A,D,E      1

So my dependent variable is death, and my independent variable is medicine. I am trying to predict death from the medication a patient received, using machine learning.

There are five distinct medicines A, B, C, D, and E. Each patient can be given a combination of these medicines. My question is how do I process the medicine feature vector?

I was thinking of creating a binary dummy variable for each medicine to indicate whether it was administered. But that seems quite cumbersome, especially if I have many more than 5 medicines, say 100. I would appreciate your input on how this feature can be processed. I am sure there is a solution for this kind of situation that I am not aware of. Thanks.

  • The typical approach is what you mentioned: create a binary variable for each medication, indicating whether or not it was administered. – user20160 Dec 15 '20 at 23:47
  • I see, thank you for confirming this. I was not sure about it since I am fairly new to the area. – Aditya Lahiri Dec 16 '20 at 00:19
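
For reference, the binary-indicator encoding confirmed in the comment above does not have to be built by hand, even with 100 medicines. Here is a minimal sketch in plain Python, assuming the medicine column is stored as comma-separated strings as in the table. The helper name `multi_hot` and the variable names are hypothetical, chosen only for illustration.

```python
# Toy data copied from the question: (patient_id, medicines, death).
rows = [
    (1, "A,B,C,D,E", 1),
    (2, "B,D", 0),
    (3, "A,D,E", 1),
]


def multi_hot(medicine_strings, sep=","):
    """Hypothetical helper: build one 0/1 column per distinct medicine.

    Returns (vocabulary, matrix), where vocabulary is the sorted list of
    distinct medicines and matrix[i][j] is 1 if patient i received
    vocabulary[j], else 0.
    """
    # Split each string into the set of medicines that patient received.
    received = [
        {m.strip() for m in s.split(sep) if m.strip()}
        for s in medicine_strings
    ]
    # The column order is the sorted union of all medicines seen in the data.
    vocabulary = sorted(set().union(*received))
    matrix = [
        [1 if med in given else 0 for med in vocabulary]
        for given in received
    ]
    return vocabulary, matrix


vocabulary, X = multi_hot([medicines for _, medicines, _ in rows])
y = [death for _, _, death in rows]

print(vocabulary)
for features, label in zip(X, y):
    print(features, label)
```

The columns are discovered from the data, so the same code handles 5 or 100 medicines without listing them manually. If pandas or scikit-learn is available, `Series.str.get_dummies(sep=",")` or `MultiLabelBinarizer` should produce the same kind of indicator matrix.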
