2

I am trying to read a worksheet in an Excel 2010 workbook file into R using both the xlsx and XLConnect packages. Both are dropping leading zeroes on zip code columns despite formatting the cells in the worksheet as 'Text'.

wb <- loadWorkbook('c:/~/file1.xlsx')
sheetNames <- getSheets(wb)

for(i in 1:length(sheetNames)){  # i = 2
    #dat1 <- read.xlsx('c:/~/file1.xlsx', sheetNames[i], as.data.frame = T)
    dat1 <-  readWorksheetFromFile('c:/~/file1.xlsx', sheetNames[i])
}

Does anyone have suggestions for how to deal with this?

screechOwl
  • 25,740
  • 58
  • 153
  • 258

2 Answers2

1

You can also fix this afterwards with sprintf:

# some example data
x <- c(225,4867,52,15732,9514,78142)
# getting the leading zeros back
x <- sprintf("%05s", x)

> x
[1] "00225" "04867" "00052" "15732" "09514" "78142"
ulfelder
  • 5,148
  • 1
  • 20
  • 37
Jaap
  • 77,147
  • 31
  • 174
  • 185
1

You can specify the column types as an argument to readWorksheet() or readWorksheetFromFile(), by using the argument colType = ....

For example, to read all columns as character, use:

readWorkSheet(..., colType="character")
Andrie
  • 170,733
  • 42
  • 434
  • 486