I have a very big .csv file, around a few GB in size.
I want to read the first few thousand lines of it.
Is there any method to do this efficiently?
– user2806363
- http://stackoverflow.com/questions/3094866/trimming-a-huge-3-5-gb-csv-file-to-read-into-r?rq=1 – Francisco Corrales Morales Jan 19 '14 at 20:18
- I came to this question repeatedly when looking for how to solve the same issue. I'd like to see solutions for readr, read.csv, etc. Given the number of hits, upvotes and favourites, I think it would be useful to reopen the question. – pluke Sep 15 '17 at 08:50
- This is a pretty valid question. I don't really understand why it is "too broad". Do we really need a reprex that writes a big csv just to have something to deal with? The nature of the problem of reading just part of the file is broad, not the question. – jangorecki May 09 '18 at 10:37
- Check out argument `nrows` in `help("read.csv")`. – Rui Barradas May 09 '18 at 17:50
2 Answers
Use the nrows argument in read.csv(...)
df <- read.csv(file = "my.large.file.csv", nrows = 2000)
There is also a skip= parameter that tells read.csv(...) how many lines to skip before it starts reading.
If your file is that large, you might be better off using fread(...) from the data.table package. It takes the same arguments.
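For example, a minimal data.table sketch (my.large.file.csv is the placeholder file name from above):
library(data.table)
dt <- fread("my.large.file.csv", nrows = 2000)  # nrows caps how many data rows are read; the header is still parsed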
– jlhoward
- `skip` isn't very helpful if the first line is the row of column names. – Matthew Lundberg Jan 19 '14 at 20:29
- @MatthewLundberg In that case you can `scan()` the first line with n=1, then use `read.csv` with `skip=` and add the colnames after that (see the first sketch after these comments). – Ari B. Friedman Nov 19 '14 at 11:25
- For `read_csv` (R 3.4.4, Win 7), the option is `n_max`; see the docs (p. 6): https://cran.r-project.org/web/packages/readr/readr.pdf (second sketch below). – Peter Feb 19 '19 at 13:29
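A rough sketch of the scan()-then-skip idea from the comments above; the skip of 5000 data rows, the comma separator, and the file name are placeholder assumptions:
hdr <- scan("my.large.file.csv", what = character(), nlines = 1, sep = ",", quiet = TRUE)  # read only the header row
df <- read.csv("my.large.file.csv", skip = 5001, nrows = 2000, header = FALSE)  # skip the header plus 5000 data rows
colnames(df) <- hdr  # re-attach the column names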
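And a minimal readr sketch of the n_max option mentioned above (the file name is again a placeholder):
library(readr)
df <- read_csv("my.large.file.csv", n_max = 2000)  # n_max caps the number of data rows read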
If you're on Unix or OS X, you can use the command line:
head -n 1000 myfile.csv > myfile.head.csv
Then just read it in R like normal.
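If you'd rather not write an intermediate file, a sketch that streams the head output straight into R (assumes a Unix-like shell is available to pipe()):
df <- read.csv(pipe("head -n 1000 myfile.csv"))  # note: the 1000 lines include the header row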
– Ari B. Friedman