Beginner ML question here. I have a dataframe with a categorical column in which many of the values are slightly different but essentially mean the same thing. Here's an example of such values:
Wireless Connection
Wireless Connection Wi-Fi
Wireless Connection Issue 3g
Wireless Connection Issue Wi-Fi
Wireless/Connectivity
What would be the best way to condense/encode these values? My intuition is to find all instances of these values and convert them to a single value, such as Wireless Connection. For what it's worth, I'll be feeding the encoded values into a classifier, such as a Decision Tree or Logistic Regression, to predict whether a customer's question in a call to customer service was answered.
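To make the idea concrete, here is a rough sketch of what I have in mind, using pandas. The column names `issue` and `issue_clean`, the helper `normalize_issue`, and the "starts with wireless" rule are all just placeholders I made up for this example; my real data has more categories and would need more rules.

```python
import pandas as pd

# Toy dataframe holding the example values; "issue" is a hypothetical column name.
df = pd.DataFrame({"issue": [
    "Wireless Connection",
    "Wireless Connection Wi-Fi",
    "Wireless Connection Issue 3g",
    "Wireless Connection Issue Wi-Fi",
    "Wireless/Connectivity",
]})

def normalize_issue(value: str) -> str:
    """Collapse near-duplicate labels into one canonical label.

    Assumed rule (for illustration only): anything that starts with
    "wireless", ignoring case and surrounding whitespace, becomes
    "Wireless Connection". Other values pass through unchanged.
    """
    if value.strip().lower().startswith("wireless"):
        return "Wireless Connection"
    return value

# Apply the rule to every row, then one-hot encode the cleaned column.
df["issue_clean"] = df["issue"].map(normalize_issue)
encoded = pd.get_dummies(df["issue_clean"], prefix="issue")
```

The `encoded` frame is what I would then pass to the classifier. I'm not sure whether a hand-written rule like this is reasonable or whether there is a more standard approach.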