How to convert relevant parts of a file to a corpus using R

Question

I’m a beginner with R and am working on a CSV file with multiple columns. I want to take one column (labelled text in the CSV file), create a corpus from it, and then clean the text in that column: convert it to lower case, remove punctuation, and so on.

The code below is what I have so far:

# Import text data
ALL_tweets_df <- read.csv("All_tweets.csv", stringsAsFactors = FALSE)

library(tm)

# View the structure of tweets
str(ALL_tweets_df)

# Print out the number of rows in tweets
nrow(ALL_tweets_df)

# Isolate text from tweets: All_tweets
ALL_tweets_df <- ALL_tweets_df$text

# Convert the relevant part of the file into a corpus
mycorpus <- Corpus(VectorSource(ALL_tweets_df$text))

# Change to lower case, remove stop words, remove punctuation
mycorpus2 = tm_map(mycorpus, tolower)
mycorpus3 = tm_map(mycorpus2, removeWords, stopwords("english"))
mycorpus4 = tm_map(mycorpus3, removePunctuation)

The step that goes wrong is converting the text column to a corpus: R shows mycorpus as a list of 0, which can’t be right because there are thousands of tweets under the text column in the CSV file. Would anyone know how I could amend the code so that it works?

Any help would really be appreciated.
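
One likely cause, judging only from the code above: the "Isolate text" step overwrites ALL_tweets_df with the text column, so by the time VectorSource(ALL_tweets_df$text) runs, ALL_tweets_df is a plain character vector with no text column left to extract. Below is a minimal sketch of one way around that, not a confirmed fix. The small inline data frame is a hypothetical stand-in for All_tweets.csv, tweet_text is a name introduced here for illustration, and swapping in VCorpus and content_transformer(tolower) is my own choice so that the documents stay tm text documents after lower-casing.

```r
library(tm)

# Hypothetical stand-in for:
#   ALL_tweets_df <- read.csv("All_tweets.csv", stringsAsFactors = FALSE)
ALL_tweets_df <- data.frame(
  id   = 1:2,
  text = c("Hello, World! This is R.", "Another TWEET, with punctuation..."),
  stringsAsFactors = FALSE
)

# Keep the data frame intact; store the text column under a new name
# (tweet_text is an illustrative name, not from the original code)
tweet_text <- ALL_tweets_df$text

# Build the corpus from the character vector itself
mycorpus <- VCorpus(VectorSource(tweet_text))

# Wrap base functions such as tolower in content_transformer()
mycorpus2 <- tm_map(mycorpus, content_transformer(tolower))
mycorpus3 <- tm_map(mycorpus2, removeWords, stopwords("english"))
mycorpus4 <- tm_map(mycorpus3, removePunctuation)

# Should match the number of rows in the data frame
length(mycorpus4)
```

With the real file, length(mycorpus4) should equal nrow(ALL_tweets_df); if it does not, the column name in the CSV is worth checking with str(ALL_tweets_df).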

Thanks, that's fixed it. I'm trying to convert it back into a dataframe now with this code:mycorpus5 — anna.c, Apr 13 '17 at 07:27
This previous answer might help here... http://stackoverflow.com/questions/24703920/r-tm-package-vcorpus-error-in-converting-corpus-to-data-frame — Andrew Gustar, Apr 13 '17 at 08:34
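
For the follow-up about getting the corpus back into a data frame, here is a minimal sketch of one common approach, assuming the corpus is a VCorpus of plain text documents. The names example_corpus and tweets_clean_df are hypothetical and are not the code from the comment above, which is cut off.

```r
library(tm)

# Hypothetical cleaned corpus standing in for the one built in the question
example_corpus <- VCorpus(VectorSource(c("first cleaned tweet",
                                         "second cleaned tweet")))

# Pull the text out of each document, then wrap it in a data frame
tweets_clean_df <- data.frame(
  text = unname(sapply(example_corpus, as.character)),
  stringsAsFactors = FALSE
)

# One row per document
nrow(tweets_clean_df)
```

If the original data frame is still available, the cleaned text could instead be added to it as a new column, provided the row order has not changed.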
