I am working with some Twitter data using the rtweet package. I want to clean some of the text strings with the following code, which creates a new column `text2` based on the column `text`:
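```r
library(rtweet)

BART <- get_timeline("SFBARTalert", n = 200, token = twitter_token)
BART$text2 <- gsub("@\\w+", "", BART$text)          # Remove @mentions
BART$text2 <- gsub("[[:punct:]]", " ", BART$text2)  # Replace punctuation with " "
BART$text2 <- gsub("http\\w+", "", BART$text2)      # Remove words beginning with http
BART$text2 <- gsub("[ |\t]{2,}", " ", BART$text2)   # Collapse runs of 2+ spaces/tabs/pipes into one space
BART$text2 <- gsub("^ ", "", BART$text2)            # Remove a leading space
BART$text2 <- gsub(" $", "", BART$text2)            # Remove a trailing space
```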
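To see what each step produces, the same sequence of `gsub()` calls can be run on a single string. The tweet below is made up for illustration; it is not real @SFBARTalert data.

```r
# Hypothetical example tweet (invented, not taken from the timeline)
x <- "@SFBART Delays at Embarcadero https://t.co/abc123 #BART"

x <- gsub("@\\w+", "", x)          # drop @mentions
x <- gsub("[[:punct:]]", " ", x)   # punctuation -> space (this also splits the URL apart)
x <- gsub("http\\w+", "", x)       # drop words starting with "http" (here only "https")
x <- gsub("[ |\t]{2,}", " ", x)    # collapse runs of 2+ spaces/tabs/pipes
x <- gsub("^ ", "", x)             # trim one leading space
x <- gsub(" $", "", x)             # trim one trailing space
print(x)
# [1] "Delays at Embarcadero t co abc123 BART"
```

The leftover `t co abc123` shows that replacing punctuation before the URL step leaves link fragments behind. Running the URL removal first, for example with `gsub("http\\S+", "", x)`, would drop the whole link in one go.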
Since I will do this for multiple Twitter accounts, I want to wrap the same steps in a function (which additionally converts the result to lower case):
```r
clean_tweets = function(data, old_col = text, new_col = text2){
  data$new_col <- gsub("@\\w+", "", data$old_col)         # Remove @mentions
  data$new_col <- gsub("[[:punct:]]", " ", data$new_col)  # Remove punctuation (replace with " ")
  data$new_col <- gsub("http\\w+", "", data$new_col)      # Remove strings beginning with http (replace with "")
  data$new_col <- gsub("[ |\t]{2,}", " ", data$new_col)   # Collapse runs of 2+ spaces/tabs/pipes into one space
  data$new_col <- gsub("^ ", "", data$new_col)            # Remove a blank space at the beginning
  data$new_col <- gsub(" $", "", data$new_col)            # Remove a blank space at the end
  data$new_col <- tolower(data$new_col)                   # Convert to lower case
}
```
When I try to run `clean_tweets(BART)`, I get an error:

```
Error: Assigned data `gsub("@\\\\w+", "", data$old_col)` must be compatible with existing data.
```
I am not sure why this function is not working. Can anybody help me out? Thanks a lot!
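A likely explanation: `$` does not evaluate what follows it, so inside the function `data$old_col` looks for a column literally named `old_col`. That column does not exist, so the lookup returns `NULL`; `gsub()` on `NULL` returns `character(0)`, and a zero-length vector cannot be assigned as a column of a 200-row data frame, which is what the error reports. Separately, the function never returns `data`, and because R modifies a local copy, `BART` itself would stay unchanged even without the error. Below is a minimal sketch of one possible fix, assuming the column names are passed as strings (the original signature used bare names) and looked up with `[[`. The small `tweets` data frame is a made-up stand-in for the rtweet result.

```r
# Sketch: old_col/new_col are strings (an assumption), and [[ ]] looks up the
# column whose name is stored in the variable, not a column called "old_col".
clean_tweets <- function(data, old_col = "text", new_col = "text2") {
  x <- data[[old_col]]
  x <- gsub("@\\w+", "", x)          # drop @mentions
  x <- gsub("[[:punct:]]", " ", x)   # punctuation -> space
  x <- gsub("http\\w+", "", x)       # drop words starting with "http"
  x <- gsub("[ |\t]{2,}", " ", x)    # collapse runs of spaces/tabs/pipes
  x <- gsub("^ ", "", x)             # trim one leading space
  x <- gsub(" $", "", x)             # trim one trailing space
  data[[new_col]] <- tolower(x)
  data                               # return the modified copy
}

# Hypothetical stand-in for BART (invented rows, not real tweets)
tweets <- data.frame(
  text = c("@SFBART Train delayed!", "All clear at 24th St."),
  stringsAsFactors = FALSE
)

# Why the original fails: `$` takes "old_col" literally, so this is NULL,
# and gsub() on NULL yields a zero-length result.
is.null(tweets$old_col)                    # TRUE
length(gsub("@\\w+", "", tweets$old_col))  # 0

tweets <- clean_tweets(tweets)  # reassign: the function does not modify in place
print(tweets$text2)
# [1] "train delayed"        "all clear at 24th st"
```

With this version the call becomes `BART <- clean_tweets(BART)`. If bare column names are preferred, as in the original signature, rlang's embracing operator `{{ }}` together with `dplyr::mutate()` is another route.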