
I'm trying to open a large fixed-width file, but unfortunately the function read.fwf seems to be very time-consuming. As I have to work with large datasets, I would like to know whether there are other solutions that save time, or whether there is no way around this.

In my case, to open a file of about six hundred thousand rows I use the following code:

df <- read.fwf(file = "df.txt", widths = c(1,3,3,4,3,3,1,1,2,2,2,2,1,1,3,3,2,2,3,1,2,2,3,2,1,3,1,1,3,3,1,3,1,1,1,1,2,3,1,2,1,3,2,1,1,1,3,3,1,1,1,2,4,16,3,7,1,1,2,1,2,2,1))

It takes about 12 minutes.
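(For context, read.fwf splits every line into columns by character counts and then parses the pieces via read.table, which is part of why it is slow on large files. Below is a minimal, self-contained sketch of the mechanism on a toy file; the file contents are invented purely for illustration, and timings like the 12 minutes above can be measured by wrapping the call in system.time().)

# Toy illustration of how the widths vector maps characters to columns.
# The data below is made up for this example only.
tmp <- tempfile(fileext = ".txt")
writeLines(c("A123456",
             "B654321"), tmp)

toy <- read.fwf(tmp, widths = c(1, 3, 3))  # 1 char, then 3, then 3
toy
#   V1  V2  V3
# 1  A 123 456
# 2  B 654 321

unlink(tmp)  # clean up the temporary file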

As I don't need all of the columns in the dataset, I tried to skip some of them by using negative widths, like so:

df <- read.fwf(file = "df.txt", widths = c(1,3,3,4,-3,3,-1,-1,2,-2,-2,2,1,-1,-3,-3,-2,-2,-3,-1,-2,-2,3,-2,-1,-3,-1,-1,-3,-3,-1,3,-1,-1,-1,1,-2,-3,-1,2,1,-3,-2,-1,-1,1,-3,-3,-1,-1,-1,-2,-4,-16,3,-7,-1,-1,2,-1,-2,2,-1))

This saves some time (about 6 minutes), but the procedure is still too slow, because I have to work with many datasets of over 2 million rows each.
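(For reference, here is a minimal sketch of one commonly suggested faster alternative, readr::read_fwf(), which is also the direction taken in the question linked in the comment below. The shortened widths vector and the column names are placeholders, not the real layout; in readr a column is skipped with "_" in col_types rather than with a negative width.)

# Sketch only: the widths and names below are placeholders for the real layout.
library(readr)

w  <- c(1, 3, 3, 4, 3)                 # stand-in for the full 63-element widths vector
nm <- c("c1", "c2", "c3", "c4", "c5")  # hypothetical column names

df <- read_fwf("df.txt",
               col_positions = fwf_widths(w, col_names = nm),
               col_types = "cc_c_")    # "_" drops a column, like a negative width

readr parses fixed-width files with compiled code and is generally reported to be much faster than read.fwf on files of this size.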

Antonio
  • Have you looked at [this](https://stackoverflow.com/questions/24715894/faster-way-to-read-fixed-width-files)? – David Jul 13 '21 at 18:24
