I am new to R programming...

I have multiple text files with a word in each line:

[image: sample file]

I want to import all the text files and create a data frame, something like this:

library(RTextTools)
data(USCongress)
View(USCongress)

[image: data frame]

I want the words from each file joined into a single line, and then a data.frame with a variable 'text', just like in the reference data(USCongress). Please help.

Session Info:

R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] magrittr_1.5     RTextTools_1.4.2 SparseM_1.05    

loaded via a namespace (and not attached):
 [1] bitops_1.0-6        BradleyTerry2_1.0-5 brglm_0.5-9         car_2.0-21          caret_6.0-37        caTools_1.17.1     
 [7] class_7.3-9         codetools_0.2-8     coin_1.0-24         colorspace_1.2-4    digest_0.6.4        e1071_1.6-4        
[13] foreach_1.4.2       ggplot2_1.0.0       glmnet_1.9-8        grid_3.0.3          gtable_0.1.2        gtools_3.4.1       
[19] ipred_0.9-3         iterators_1.0.7     kernlab_0.9-19      lattice_0.20-27     lava_1.2.6          lme4_1.1-7         
[25] MASS_7.3-35         Matrix_1.1-2        maxent_1.3.3.1      minqa_1.2.4         modeltools_0.2-21   munsell_0.4.2      
[31] mvtnorm_1.0-1       nlme_3.1-113        nloptr_1.0.4        nnet_7.3-8          parallel_3.0.3      party_1.0-18       
[37] plyr_1.8.1          prodlim_1.4.5       proto_0.3-10        randomForest_4.6-10 Rcpp_0.11.3         reshape2_1.4       
[43] rpart_4.1-5         sandwich_2.3-2      scales_0.2.4        slam_0.1-32         splines_3.0.3       stats4_3.0.3       
[49] stringr_0.6.2       strucchange_1.5-0   survival_2.37-7     tau_0.0-18          tm_0.5-10           tools_3.0.3        
[55] tree_1.0-35         zoo_1.7-11

I tried this:

Data <- paste0("h:/desktop/datasci/new",list.files("~/new/")) %>%
    sapply(.,read.table) %>%
    do.call(rbind,.) %>%
    apply(.,1,paste0,collapse=" ") %>%
    data.frame(text=.,row.names=NULL)

but this gives me an error:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'h:/desktop/datasci/new': Permission denied

Thanks

Learner27
  • In what way are you trying to make your data resemble the `USCongress` data above? E.g. each text file should become a sentence / string of words, like a single row in the column `text`? Or words with the same relative position from different files should be combined, i.e. the first word in each file is combined into a single sentence, etc...? – nrussell Dec 06 '14 at 20:53
  • I want the words to be in a single line and then create a data.frame with variable 'text' just like in the reference data(USCongress) – Learner27 Dec 06 '14 at 20:57
  • Have you made any coding attempts? What exactly is the challenge you are facing? Images of data aren't all that helpful. Try to make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with specific sample input and desired output. The USCongress data seems to have a bunch of other columns that aren't relevant to your data. – MrFlick Dec 06 '14 at 21:20
  • @MrFlick .... I just need the 'text' column of the USCongress data.. – Learner27 Dec 06 '14 at 21:22
  • Do you want a data.frame with a single column? Have you tried `read.table()` at all on your data? How does that not return what you want? – MrFlick Dec 06 '14 at 21:23
  • @MrFlick I tried that but I have over 100 text files.. – Learner27 Dec 06 '14 at 21:25

1 Answer

Here's one approach, where the .txt files are in the directory "~/tempfiles/":

library(magrittr)
##
Df <- paste0("~/tempfiles/",list.files("~/tempfiles/")) %>%
  sapply(.,read.table) %>%
  do.call(rbind,.) %>%
  apply(.,1,paste0,collapse=" ") %>%
  data.frame(text=.,row.names=NULL)
R> Df
                                                         text
1 file1_line1 file1_line2 file1_line3 file1_line4 file1_line5
2 file2_line1 file2_line2 file2_line3 file2_line4 file2_line5
3 file3_line1 file3_line2 file3_line3 file3_line4 file3_line5

You can certainly do this without using magrittr, but it's much cleaner than looking at a bunch of nested function calls.
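For comparison, here is a minimal sketch of the same steps written with plain intermediate variables instead of the pipe. The directory name `tempfiles_demo`, the generated file contents, and the variable names are made up for illustration; `stringsAsFactors = FALSE` is passed explicitly so that the words stay character strings on older R versions, where `read.table` defaulted to factors.

```r
# Hypothetical throw-away directory with three small files (illustration only).
dir_path <- file.path(tempdir(), "tempfiles_demo")
dir.create(dir_path, showWarnings = FALSE)
for (i in 1:3) {
  writeLines(paste0("file", i, "_line", 1:5),
             file.path(dir_path, paste0("file", i, ".txt")))
}

# Step 1: full paths of the .txt files in the directory.
paths <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)

# Step 2: read each file and keep its single column as a character vector.
columns <- lapply(paths, function(p) {
  read.table(p, stringsAsFactors = FALSE)[[1]]
})

# Step 3: stack the vectors so that each file becomes one row.
mat <- do.call(rbind, columns)

# Step 4: collapse each row into one space-separated string.
Df <- data.frame(text = apply(mat, 1, paste, collapse = " "),
                 stringsAsFactors = FALSE)
print(Df)
```

Each step maps onto one stage of the piped version, so either form can be used depending on taste.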

For the example above I just made three nonsense .txt files in a throw-away directory ~/tempfiles/:

R> list.files("~/tempfiles/")
[1] "file1.txt" "file2.txt" "file3.txt"

which look like this:

R> read.table("~/tempfiles/file1.txt")
           V1
1 file1_line1
2 file1_line2
3 file1_line3
4 file1_line4
5 file1_line5
R> read.table("~/tempfiles/file2.txt")
           V1
1 file2_line1
2 file2_line2
3 file2_line3
4 file2_line4
5 file2_line5

etc... sapply is used to iterate over all of the files in the target directory, and the results are combined into a [3 x 5] array using do.call(rbind, ...). This is piped into apply, where we collapse the 5 entries of each row into a single string (giving a character vector of length 3), which is finally made into a data.frame.
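Note that the rbind step relies on every file having the same number of lines. If that cannot be guaranteed, one possible alternative (a sketch, not the approach shown above) is to read each file with `readLines` and collapse it on its own. The directory `unequal_demo`, the file contents, and the helper `collapse_file` are hypothetical names used only for this example.

```r
# Hypothetical demo directory with two files of different lengths.
dir_path <- file.path(tempdir(), "unequal_demo")
dir.create(dir_path, showWarnings = FALSE)
writeLines(c("alpha", "beta", "gamma"), file.path(dir_path, "a.txt"))
writeLines(c("delta", "epsilon"), file.path(dir_path, "b.txt"))

paths <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)

# Read one file and join its lines into a single space-separated string.
collapse_file <- function(path) {
  paste(readLines(path), collapse = " ")
}

# One row per file; vapply guarantees exactly one string per file.
Df2 <- data.frame(
  text = vapply(paths, collapse_file, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(Df2)
```

Because each file is collapsed independently, no matrix is built and the files do not need matching line counts.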

nrussell
  • I am getting this error: Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file 'h:/desktop/datasci/new': Permission denied – Learner27 Dec 06 '14 at 21:36
  • Add the output of calling `list.files` on your file directory (presumably `h:/desktop/datasci/`) to your question please. This looks like a typical pain-in-the-ass issue that only happens on Windows. While you're at it, can you run `sessionInfo()` and include that as well? – nrussell Dec 06 '14 at 21:43
  • I'm not sure if you can use tilde expansion on Windows; I'm using Linux, where `~/` is just short for `/home/nathan/`. Try using `paste0("h:/desktop/datasci/new",list.files("h:/desktop/datasci/new"))`, where you are passing the *same file directory* to `paste0` *and* `list.files`. – nrussell Dec 06 '14 at 22:05
  • Sorry, it should be `paste0("h:/desktop/datasci/new/",list.files("h:/desktop/datasci/new/"))`; the `/` after `new` in the character string you pass to `paste0` is necessary, or your file paths will not be correct. – nrussell Dec 06 '14 at 22:15
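The path problem discussed in these comments can be illustrated with a small sketch. It assumes a hypothetical demo directory `path_demo` under `tempdir()` rather than the asker's `h:/desktop/datasci/new/`, and shows that pasting a directory and a file name without a separator yields a path that does not exist, while `file.path()` or `list.files(..., full.names = TRUE)` builds a usable one.

```r
# Hypothetical demo directory containing a single file.
dir_path <- file.path(tempdir(), "path_demo")
dir.create(dir_path, showWarnings = FALSE)
writeLines("word", file.path(dir_path, "file1.txt"))

# Without a separator, directory and file name run together,
# e.g. ".../path_demofile1.txt", which is not a real file.
bad <- paste0(dir_path, list.files(dir_path))

# file.path() inserts the separator between directory and file name.
good <- file.path(dir_path, list.files(dir_path))

# list.files() can also return full paths directly, so the same
# directory is used for both listing and reading.
full <- list.files(dir_path, full.names = TRUE)

print(file.exists(bad))
print(file.exists(good))
print(file.exists(full))
```

Using `full.names = TRUE` also removes the risk of listing one directory and prefixing the file names with a different one.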