A follow-up to this question: how can I download and uncompress a gzipped file using R? For example, I have a file of insurance data from the UCI Machine Learning Repository. How can I download it using R?

Here is the data url: http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz.

Zach

3 Answers


I like Ramnath's approach, but I would use temp files like so:

tmpdir <- tempdir()

url <- 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
file <- basename(url)
download.file(url, file)

untar(file, compressed = 'gzip', exdir = tmpdir )
list.files(tmpdir)

The list.files() should produce something like this:

[1] "TicDataDescr.txt" "dictionary.txt"   "ticdata2000.txt"  "ticeval2000.txt"  "tictgts2000.txt" 

which you could parse if you needed to automate this process for a lot of files.
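
Wrapping the download and extraction steps in one function might look like the sketch below. The name `download_and_untar` and its arguments are hypothetical, not taken from any package. Unlike the snippet above, it also stores the downloaded archive in a temporary file, and it assumes `untar()` can detect the compression on its own.

```r
# Hypothetical helper (not part of base R or of the answer above):
# download a .tar.gz archive to a temporary file, extract it into a
# fresh temporary directory, and return the paths of the extracted files.
download_and_untar <- function(url, exdir = tempfile("untar_")) {
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  tarfile <- tempfile(fileext = ".tar.gz")
  on.exit(unlink(tarfile), add = TRUE)  # delete the downloaded archive on exit
  # mode = "wb" keeps the binary archive intact on Windows
  download.file(url, tarfile, mode = "wb")
  untar(tarfile, exdir = exdir)
  list.files(exdir, full.names = TRUE, recursive = TRUE)
}

# Usage (needs network access, so it is left commented out):
# files <- download_and_untar("http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz")
# basename(files)
```

Returning full paths rather than bare file names means the result can be passed directly to a reader such as `read.table()` when looping over many archives.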

JD Long
  • +1 nice approach to automate the process. maybe download + unzip should be a function in its own right as it is a very common operation. – Ramnath Aug 12 '11 at 19:37
  • Yes, that's more or less what was in my answer to the question Zach already linked to: http://stackoverflow.com/questions/3053833/using-r-to-download-zipped-data-file-extract-and-import-data – Dirk Eddelbuettel Aug 12 '11 at 19:41
  • I thought the use of basename() and list.files() was worth illustrating. – JD Long Aug 12 '11 at 20:11

Here is a quick way to do it.

# create download directory and set it
.exdir = '~/Desktop/tmp'
dir.create(.exdir)
.file = file.path(.exdir, 'tic.tar.gz')

# download file
url = 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
download.file(url, .file)

# untar it
untar(.file, compressed = 'gzip', exdir = path.expand(.exdir))
Ramnath
  • As I said, that is virtually identical to what I wrote [in this SO question](http://stackoverflow.com/questions/3053833/using-r-to-download-zipped-data-file-extract-and-import-data) -- modulo the tar vs zip file content issue and the fact that you do not use a proper temp directory. I think the whole question could be closed as duplicate. – Dirk Eddelbuettel Aug 13 '11 at 15:44
  • Dirk, i still fail to understand how it is a duplicate. `unz` only works with zip files that contain a single file. so the difference between `untar` and `unz` is reasonably significant in my mind to merit a different question. am i missing something completely here? – Ramnath Aug 13 '11 at 15:53
  • So now for the third time: downloading a remote file, expanding it in a temp location and working on the content is all the same between both answers. The only minor difference is what operation you use to extract the content, depending on whether it is a zip or tarfile. Is that really that difficult to grasp? – Dirk Eddelbuettel Aug 13 '11 at 15:57
  • I understand that quite well Dirk. But by that count several questions on SO would have to be closed as duplicate if all that mattered was the underlying concept behind the answers. In my humble opinion, a reader wanting to extract a downloaded archive would not be able to achieve his purpose based on the other question. I don't want to prolong this discussion, but if there are several others who see this as a simple extension and a duplicate, please feel free to shut this question down. – Ramnath Aug 13 '11 at 16:07
  • +1. Quick question: Is `path.expand` necessary for the code to work or is it merely best practice to use the full path instead of relying on **R** doing the tilde expansion? – Steve S Mar 12 '15 at 04:13
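
On the `path.expand` question: the sketch below only shows what the function does. It does not settle whether `untar()` needs it on every platform, which may depend on whether R's internal implementation or an external `tar` program performs the extraction.

```r
# path.expand() replaces a leading "~" with the user's home directory;
# a path without a leading tilde is returned unchanged.
p <- "~/Desktop/tmp"
expanded <- path.expand(p)
print(expanded)                       # result depends on the machine

unchanged <- path.expand("/tmp/data")
print(unchanged == "/tmp/data")

# normalizePath() also expands "~" and returns an absolute path;
# mustWork = FALSE avoids a warning if the directory does not exist.
print(normalizePath(p, mustWork = FALSE))
```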

Please see the content of help(download.file) for that. If the file in question is merely a gzipped but otherwise readable file, you can feed the complete URL to read.table() et al too.
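
As a minimal sketch of the plain-gzip case (which does not apply to tic.tar.gz, since that is a tar archive): the first part writes a small gzipped table locally and reads it back, so it runs without network access. The second part defines `read_gz_url`, a hypothetical helper that follows the idiom shown in `?gzcon` and decompresses a remote file explicitly, in case `read.table()` does not do so transparently for a URL in a given R version.

```r
# Local demonstration: write a small gzip-compressed table, then read it back.
# read.table() opens the path in text mode, and R detects gzip compression
# for local files automatically.
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, "wt")
write.table(data.frame(x = 1:3, y = c("a", "b", "c")), con, row.names = FALSE)
close(con)
local_dat <- read.table(gz_path, header = TRUE)
print(local_dat)

# Hypothetical helper for a remote gzipped text file: decompress with
# gzcon(), read the lines, then parse them with read.table().
read_gz_url <- function(u, ...) {
  z <- gzcon(url(u))
  on.exit(close(z), add = TRUE)
  lines <- readLines(z)
  read.table(text = lines, ...)
}
```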

Dirk Eddelbuettel