1

I'm trying to rename the columns in a data frame using matching from a lookup table.

oldvars = c("mpg", "cyl" , "disp",  "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
newvars = c("Miles Per Gallon", "Cycle", "Displacement", "Horsepower", "Distance Rating", 
"Working Time", "Quick Second", "Versus", "America", "Gears", "Carbohydrates")

lookup = data.frame(oldvars, newvars)
mycars = mtcars

Using the lookup list to match oldvars and change them into newvars, so that names(mycars) would output "Miles Per Gallon", "Cycle", "Displacement", "Horsepower", "Distance Rating", "Working Time", "Quick Second", "Versus", "America", "Gears", "Carbohydrates"

I've tried using colnames to change the names but it isn't reading the variable like I was expecting. The following

for(i in 1:length(newvars)) {
  colnames(mycars)[oldvars[i]] = newvars[i]
} 

just outputs NAs

Gregor Thomas
  • 119,032
  • 17
  • 152
  • 277
jamzsabb
  • 1,025
  • 1
  • 16
  • 36
  • Subset `names(test)` by "second" and assign the value of `lookup$first`. – A5C1D2H2I1M1N2O1R2T1 Jan 11 '16 at 15:50
  • 1
    Please provide a [reproducible example](http://stackoverflow.com/a/5963610/1412059). – Roland Jan 11 '16 at 15:53
  • 2
    Instead of a new question to get more responses, this should be added to your original question http://stackoverflow.com/questions/34723607/append-data-frame-to-another-data-frame-with-different-column-names – Pierre L Jan 11 '16 at 15:53
  • @Roland I completely re-wrote my question, wrote it after spending too long staring at it this morning. It is now reproducible and shows expected output – jamzsabb Jan 11 '16 at 19:52
  • @PierreLafortune it's related but it's a separate question. I felt it was worth a separate post, I'm not just being a pain and abusing the system. – jamzsabb Jan 11 '16 at 19:57

3 Answers3

10

If you know they're in the same order (as they are in your example) then the answer is easy:

names(mycars) = newvars

However, let's focus on a more general case - not all column need renaming, and there might be some extra names in the name vector, and the order is not necessarily the same:

oldvars = c("cyl" ,"mpg",  "disp",  "foo")
newvars = c( "Cycle", "Miles Per Gallon", "Displacement", "bar")

mycars = head(mtcars, 2) # short data frame to work with

Then match is your friend for determining the correct order:

name_match = match(names(mycars), lookup$oldvars)
name_match
[1]  2  1  3 NA NA NA NA NA NA NA NA

# assign the names
names(mycars)[na.omit(name_match)] = newvars[!is.na(name_match)]

data.table has a really nice interface for this, setnames accepts a vector of old and new names

library(data.table)
setDT(mycars) # convert the data frame to data.table
setnames(mycars, old = oldvars, new = newvars) # update names by reference

The old plyr package has a rename function that accepts a named vector, where the vector values are the new names and the vector names are the old names:

plyr::rename(mycars, setNames(newvars, oldvars))
Gregor Thomas
  • 119,032
  • 17
  • 152
  • 277
1

Adding as answer only because I can't comment yet.

In @Gregor Thomas's correct answer, I suggest to reverse the arguments in the match() call, to account for cases where matching oldvars are not continuous from the first element in names(mycars). Full example below since I have the space.

mycars = head(mtcars, 2)    

oldvars = c("cyl" ,"mpg",  "wt",  "foo") #note change to variable selection in lookup
newvars = c( "Cycle", "Miles Per Gallon", "Weight", "bar")
name_match = match(names(mycars), oldvars)
name_match
[1]  2  1 NA NA NA  3 NA NA NA NA NA

#After omitting the `NA` elements, the match vector no longer properly aligns with
#the names(mycars) vector

names(mycars)[na.omit(name_match)] = newvars[!is.na(name_match)]
names(mycars) 
[1] "Miles Per Gallon" "Cycle"            NA   
[4] "hp"               "drat"             "wt"
[7] "qsec"             "vs"               "am"
[10] "gear"             "carb"

#instead, reverse the arguments in match() to find the data.frame names that appear in the lookup

name_match = match(oldvars, names(mycars))
name_match
[1]  2  1  6 NA
names(mycars)[na.omit(name_match)] = newvars[!is.na(name_match)]
names(mycars)
[1] "Miles Per Gallon" "Cycle"            "disp" 
[4] "hp"               "drat"             "Weight" 
[7] "qsec"             "vs"               "am"    
[10] "gear"             "carb"
fantanaman
  • 35
  • 7
  • It's ok to post another answer fixing something in the existing answer, so it is not needed to explain yourself at the start. But remember each answer should be comprehensive on its own. – Dharman Mar 15 '20 at 15:55
0

Solved it with a double for loop

for(i in 1:length(newvars)) {
  for(z in 1:length(newvars)) {
    if(colnames(mycars)[i] == oldvars[z]) {
      colnames(mycars)[i] = newvars[z]
    }
  }
}

Highly inefficient but it gets the job done

jamzsabb
  • 1,025
  • 1
  • 16
  • 36
  • The *double* loop certainly isn't needed. `nn = names(mycars); for(i in 1:length(oldvars)){name(mycars)[nn == oldvars[i]] = newvars[i]}` will do. – Gregor Thomas Sep 16 '19 at 16:28