
I’m a beginner in R and am working with a CSV file that has multiple columns. I want to focus on one column (labelled text in the CSV file), create a corpus from it, and then clean the text in that column so that it is all lower case, has punctuation removed, etc.

The code below is what I have so far:

# Import text data

ALL_tweets_df <- read.csv("All_tweets.csv", stringsAsFactors = FALSE)

library(tm)

# View the structure of tweets

str(ALL_tweets_df)

# Print out the number of rows in tweets

nrow(ALL_tweets_df)

# Isolate text from tweets: All_tweets

ALL_tweets_df <- ALL_tweets_df$text

# Convert the text column into a corpus

mycorpus <- Corpus(VectorSource(ALL_tweets_df$text))

# change to lower case, remove stop words, remove punctuation

mycorpus2 <- tm_map(mycorpus, tolower)

mycorpus3 <- tm_map(mycorpus2, removeWords, stopwords("english"))

mycorpus4 <- tm_map(mycorpus3, removePunctuation)

The step that goes wrong is converting the text column to a corpus: R reports mycorpus as a list of 0, which can’t be right because there are thousands of tweets in the text column of the CSV file. Does anyone know how I could amend this so that it works?

Any help would really be appreciated.
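
One likely cause, assuming the column really is named text: the "Isolate text" step overwrites ALL_tweets_df with the character vector of tweets, so the later ALL_tweets_df$text no longer refers to a data frame column. The sketch below is only an illustration under that assumption. It uses a small made-up data frame in place of All_tweets.csv, stores the text under a new, hypothetical name (tweet_text), and swaps in VCorpus with content_transformer(tolower) rather than the Corpus and tolower calls above; stripWhitespace is an extra step not in the original code.

```r
library(tm)

# Made-up stand-in for read.csv("All_tweets.csv", stringsAsFactors = FALSE)
ALL_tweets_df <- data.frame(
  id   = 1:3,
  text = c("Hello, World!", "This is a TEST tweet.", "R is fun; isn't it?"),
  stringsAsFactors = FALSE
)

# Keep the data frame intact; store the text column under a new name
# (tweet_text is a hypothetical name chosen for this sketch)
tweet_text <- ALL_tweets_df$text

# Build the corpus from the character vector, not from tweet_text$text
mycorpus <- VCorpus(VectorSource(tweet_text))

# content_transformer() wraps a base function such as tolower so that each
# entry remains a text document after the transformation
mycorpus <- tm_map(mycorpus, content_transformer(tolower))
mycorpus <- tm_map(mycorpus, removeWords, stopwords("english"))
mycorpus <- tm_map(mycorpus, removePunctuation)
mycorpus <- tm_map(mycorpus, stripWhitespace)

# One document per tweet
length(mycorpus)

# Inspect the cleaned text of the first document
as.character(mycorpus[[1]])
```

If the corpus is still empty with the real file, checking names(ALL_tweets_df) right after read.csv would confirm whether the column is actually called text.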

Adam
anna.c