
I am working with some Twitter data and the rtweet package. I want to clean the text strings by creating a new column text2 based on the column text, using the following code:

BART <- get_timeline("SFBARTalert", n= 200, token = twitter_token)
BART$text2 <- gsub("@\\w+", "", BART$text) 
BART$text2 <- gsub("[[:punct:]]", " ", BART$text2)  
BART$text2 <- gsub("http\\w+", "", BART$text2)
BART$text2 <- gsub("[ |\t]{2,}", " ", BART$text2) 
BART$text2 <- gsub("^ ", "", BART$text2) 
BART$text2 <- gsub(" $", "", BART$text2) 

Since I will do this for multiple Twitter accounts, I want to create a function that does exactly the same thing as the code above:

clean_tweets = function(data, old_col = text, new_col = text2){
data$new_col <- gsub("@\\w+", "", data$old_col) 
data$new_col <- gsub("[[:punct:]]", " ", data$new_col) # Remove punctuation (replace with " ") 
data$new_col <- gsub("http\\w+", "", data$new_col) # Remove strings beginning with http (replace with "")
data$new_col <- gsub("[ |\t]{2,}", " ", data$new_col) # Remove tabs (replace with " ")
data$new_col <- gsub("^ ", "", data$new_col) # Remove blank spaces at the beginning
data$new_col <- gsub(" $", "", data$new_col) # Remove blank spaces at the end
data$new_col <- tolower(data$new_col)
}

When I try to run clean_tweets(BART), I get an error:

Error: Assigned data `gsub("@\\\\w+", "", data$old_col)` must be compatible with existing data.

I am not sure why this function is not working. Can anybody help me out? Thanks a lot!

z_11122
The `$` operator only works with literal values. You cannot use it with variables that you pass to your function. Instead make your column name values strings and use `[[ ]]` to read and assign values to `data`. – MrFlick Nov 09 '21 at 03:28
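Following the comment's suggestion, one way the function might be rewritten is sketched below. It passes the column names as strings, reads and writes them with `[[ ]]`, and returns the modified data frame so the result can be assigned back. The small `tweets` data frame is a made-up stand-in for the rtweet output, used only so the example runs without a Twitter token. The regular expressions and their order are copied unchanged from the question.

```r
# Sketch only: column names are strings, accessed with [[ ]] instead of $.
clean_tweets <- function(data, old_col = "text", new_col = "text2") {
  x <- data[[old_col]]
  x <- gsub("@\\w+", "", x)          # remove @mentions
  x <- gsub("[[:punct:]]", " ", x)   # replace punctuation with a space
  x <- gsub("http\\w+", "", x)       # remove strings beginning with http
  x <- gsub("[ |\t]{2,}", " ", x)    # collapse runs of spaces/tabs into one space
  x <- gsub("^ ", "", x)             # remove a leading space
  x <- gsub(" $", "", x)             # remove a trailing space
  data[[new_col]] <- tolower(x)
  data                               # return the whole data frame, not just the column
}

# Hypothetical example data standing in for get_timeline() output
tweets <- data.frame(
  text = c("@rider Delay at Embarcadero!", "Service  restored."),
  stringsAsFactors = FALSE
)

tweets <- clean_tweets(tweets)
print(tweets$text2)
```

With real data the call would presumably be `BART <- clean_tweets(BART)`; the assignment matters because R functions work on a copy of their argument, so the original data frame is left unchanged unless the result is stored.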
