Context: I'm allowing a user to add specific methods for a cleaning process pipeline (appended to a main list with all the methods chosen). Each element from this list is the name of a function.

My question is:

Why does this work:

dataframe[cleanedCol] = dataframe[colToClean].apply(replace_contractions).apply(remove_links).apply(remove_emails)

But something like this doesn't?

pipeline = ['replace_contractions','remove_links','remove_emails']
for method in pipeline:
     dataframe[cleanedColumn] = dataframe[columnToClean].apply(method)

How could I iteratively apply each one of the methods from the list (by the order they are in the list) to the dataframe column?

Thank you in advance!

ROO
  • Can you tell us what the difference is between `replace_contractions` and `'replace_contractions'`? What are the types of both those expressions? If you're just doing one `.apply()`, would `dataframe[colToClean].apply('replace_contractions')` work? Why not? – Pranav Hosangadi May 13 '22 at 15:50
  • What you have in the second example is a list of strings, not a list of function objects. That's why it raises an error when each string is passed to the `apply` method. – Kevin Choon Liang Yew May 13 '22 at 15:50

1 Answer

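You would either have to convert those strings to actual function objects or, even better, store the function objects themselves instead of their names as strings. Note that each step should also read from the cleaned column, so that the output of one method is fed into the next: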

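pipeline = [replace_contractions, remove_links, remove_emails]
# Start from the raw column, then feed each step's output into the next one
dataframe[cleanedColumn] = dataframe[columnToClean]
for method in pipeline:
    dataframe[cleanedColumn] = dataframe[cleanedColumn].apply(method)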
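
If the pipeline really has to arrive as a list of names (for example, from user input), one option is to look each name up in a dictionary of function objects and compose them into a single callable. The following is only a minimal sketch: the three cleaning functions are simplified stand-ins for the ones in the question, and `CLEANERS` and `build_cleaner` are hypothetical names introduced here for illustration.

```python
import re
from functools import reduce


# Simplified stand-ins for the question's cleaning functions (assumptions).
def replace_contractions(text):
    # Only handles a single contraction, for illustration.
    return text.replace("don't", "do not")


def remove_links(text):
    return re.sub(r"https?://\S+", "", text)


def remove_emails(text):
    return re.sub(r"\S+@\S+", "", text)


# Hypothetical registry: maps the user-facing name to the function object.
CLEANERS = {
    "replace_contractions": replace_contractions,
    "remove_links": remove_links,
    "remove_emails": remove_emails,
}


def build_cleaner(names):
    """Return one function that applies the named cleaners in list order."""
    funcs = [CLEANERS[name] for name in names]  # raises KeyError on an unknown name

    def clean(text):
        # Feed the output of each function into the next one.
        return reduce(lambda value, func: func(value), funcs, text)

    return clean


pipeline = ["replace_contractions", "remove_links", "remove_emails"]
clean = build_cleaner(pipeline)
cleaned = clean("don't mail me@x.com see http://a.b")
```

With the question's variable names, the composed function could then be applied in a single pass: `dataframe[cleanedColumn] = dataframe[columnToClean].apply(build_cleaner(pipeline))`.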
Cory Kramer
  • That was my bad, completely forgot about that! I was using a dictionary as a way to map keys as the "function name" and values as the "options chosen" `{function: 1}, etc` and I forgot that I should be using the function objects instead of strings! Thanks! – ROO May 13 '22 at 16:00