
I have a survey data set that I'm creating contingency tables for. Each column in the data frame is a question and generally speaking, the questions tend to group together. So to make life easy, I've been using lapply to loop through sections and return the contingency tables with the following code:

> out <- lapply(dat[,162:170], function(x) round(prop.table(table(x,dat$seg_2),2),3)*100)
> out
$r3a_1
            
x               1    2
  Don't Know  1.9  1.4
  No         14.2  4.9
  Yes        83.9 93.7

$r3a_2
            
x               1    2
  Don't Know  2.7  1.7
  No         14.8  6.6
  Yes        82.4 91.6

etc...

As you can see, I'm looping through columns 162:170 and creating a prop table that shows the different responses between groups 1 and 2.
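For readers less familiar with this idiom, here is a small sketch of what the loop does. `lapply()` over a data-frame slice passes each column in turn as `x` and returns a list named after the columns. `prop.table(..., 2)` scales each column of the table (each group) to sum to 1. The data frame `toy` and its columns `q1`, `q2` and `grp` are hypothetical stand-ins for the survey columns and `seg_2`:

```r
# Hypothetical toy data: two Yes/No questions and a two-level grouping factor
toy <- data.frame(
  q1  = factor(c("Yes", "No", "Yes", "Yes", "No", "Yes")),
  q2  = factor(c("No", "No", "Yes", "Yes", "Yes", "Yes")),
  grp = factor(c("1", "1", "1", "2", "2", "2"))
)

# Each column of toy[, 1:2] is passed to the anonymous function as x;
# margin = 2 makes every group column of the table sum to 100%
out <- lapply(toy[, 1:2], function(x) round(prop.table(table(x, toy$grp), 2), 3) * 100)

names(out)       # "q1" "q2": the list is named after the columns
colSums(out$q1)  # each group's column sums to (about) 100
```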

However, I'd like to weight this data. So I'm using the survey package to create a simple weighted survey design object called dat_weight and using svytable() instead of table(). I can run the updated code on a single column manually:

> round(prop.table(svytable(~dat[,162] + dat$seg_2, dat_weight),2),3)*100 
            dat$seg_2
dat[, 162]      1    2
  Don't Know  2.5  2.7
  No         16.5  5.4
  Yes        80.9 91.9

However, when I try to use lapply it doesn't work:

> out <- lapply(dat[,162:170], function(x) round(prop.table(svytable(~x + dat$seg_2, dat_weight),2),3)*100)

Error in eval(expr, envir, enclos) : object 'x' not found 

Clearly the anonymous function call and svytable aren't playing nicely together. I've tried creating a for loop which doesn't work either. I'm guessing this has something to do with scoping but I'm at a loss as to how to fix it.
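One plausible mechanism for this scoping problem is sketched below. It is an assumption, not something taken from the survey documentation. If a function takes the formula apart and builds a fresh one internally, the new formula no longer carries the environment of the anonymous function, so `x` cannot be found. The hypothetical `rebuild_table()` mimics that pattern with base R's `xtabs()` and reproduces the same error message. The hypothetical `keep_env_table()`, which passes the caller's formula through unchanged, works:

```r
# Hypothetical stand-in for a function that rebuilds the formula internally.
# The rebuilt ~x is created in this function's frame, so its environment
# no longer points at the caller's frame where x lives.
rebuild_table <- function(formula, data) {
  tblcall <- bquote(xtabs(~ .(formula[[2]]), data = data))
  eval(tblcall)
}

# Hypothetical variant that passes the caller's formula through unchanged,
# so its environment (the anonymous function's frame) is kept.
keep_env_table <- function(formula, data) {
  xtabs(formula, data = data)
}

toy <- data.frame(q1 = factor(c("Yes", "No", "Yes")),
                  q2 = factor(c("No", "No", "Yes")))

# Works: ~x still remembers the frame in which x was defined
ok <- lapply(toy, function(x) keep_env_table(~x, toy))

# Fails with the same message as the svytable() call above
res <- tryCatch(lapply(toy, function(x) rebuild_table(~x, toy)),
                error = function(e) conditionMessage(e))
res
```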

Surely there has to be a way to loop through chunks of this survey and avoid having to create a unique line of code for each column. Any help would be greatly appreciated.

Edit to add some sample data:

> library("survey")
> dat <- structure(list(r3a_1 = structure(c(3L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Don't Know", 
"No", "Yes"), class = "factor"), r3a_2 = structure(c(3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L), .Label = c("Don't Know", "No", "Yes"), class = "factor"), 
    r3a_3 = structure(c(3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("Don't Know", 
    "No", "Yes"), class = "factor"), r3a_4 = structure(c(3L, 
    2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 
    3L, 3L, 3L, 1L), .Label = c("Don't Know", "No", "Yes"), class = "factor"), 
    r3a_5 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 
    2L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 1L), .Label = c("Don't Know", 
    "No", "Yes"), class = "factor"), r3a_6 = structure(c(3L, 
    3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 
    2L, 3L, 3L, 3L), .Label = c("Don't Know", "No", "Yes"), class = "factor"), 
    r3a_7 = structure(c(1L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 2L, 
    3L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("Don't Know", 
    "No", "Yes"), class = "factor"), r3a_8 = structure(c(3L, 
    2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 2L, 
    2L, 2L, 3L, 3L), .Label = c("Don't Know", "No", "Yes"), class = "factor"), 
    r3a_9 = structure(c(1L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 3L, 
    3L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L), .Label = c("Don't Know", 
    "No", "Yes"), class = "factor"), weight = c(0.34, 0.34, 0.34, 
    0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.43, 
    0.43, 0.43, 0.34, 0.34, 0.34, 0.34, 0.34), seg_2 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), .Names = c("r3a_1", 
"r3a_2", "r3a_3", "r3a_4", "r3a_5", "r3a_6", "r3a_7", "r3a_8", 
"r3a_9", "weight", "seg_2"), row.names = c(NA, 20L), class = "data.frame")

> dat_weight <- svydesign(ids = ~1, weights = ~weight, data = dat)

From there you can get the weighted and unweighted tables:

round(prop.table(table(dat[,1],dat$seg_2),2),3)*100  #unweighted

round(prop.table(svytable(~dat[,1] + dat$seg_2, dat_weight),2),3)*100   #weighted
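As a cross-check that does not need the survey package, base R's `xtabs()` with the weight column on the left-hand side of the formula sums the weights within each cell. For a design built with `svydesign(ids = ~1, weights = ~weight, ...)`, this should give the same column percentages as the `svytable()` line above. That equivalence is an assumption worth verifying on the real data. The toy data frame is hypothetical:

```r
# Hypothetical toy data with a per-respondent weight column
toy <- data.frame(
  q1     = factor(c("Yes", "No", "Yes", "Yes", "No", "Yes")),
  weight = c(0.5, 1, 1, 2, 1, 1),
  seg_2  = factor(c("1", "1", "1", "2", "2", "2"))
)

# Unweighted: respondent counts per cell, as column percentages
unw <- round(prop.table(table(toy$q1, toy$seg_2), 2), 3) * 100

# Weighted: xtabs() sums the weight column within each cell
wtd <- round(prop.table(xtabs(weight ~ q1 + seg_2, data = toy), 2), 3) * 100

unw
wtd
```

Comparing `unw` and `wtd` shows how the weights shift the Yes/No split within each segment.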

Looping with the unweighted table() works:

lapply(dat[,1:9], function(x) round(prop.table(table(x,dat$seg_2),2),3)*100)

But the weighted svytable() version doesn't:

lapply(dat[,1:9], function(x) round(prop.table(svytable(~x + dat$seg_2, dat_weight),2),3)*100)
  • It would be a lot easier to help if you could provide a sample data set to test possible solutions. When you use a formula syntax, normally you don't actually pass in a vector, normally you pass in a variable name from a data.frame. I'm guessing you can provide a `data=` parameter to `svytable` and then get the formula to work. See [how to make a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Jul 29 '14 at 03:06
  • Hi, I've just edited the post with some data that reproduces the problem. edit: The data= parameter is included in the design object dat_weight. – Texas_Crane Jul 29 '14 at 03:26
  • Perfect. That made it much easier to help, and that's why we almost always ask for a reproducible example when someone asks a question. – MrFlick Jul 29 '14 at 03:44

1 Answer


OK, it seems the svytable function is picky and will only look up data in the design object; it doesn't seem to look for x in the enclosing environment. So an alternative approach is to build the formula dynamically. Instead of passing in the columns of data themselves, we pass in the names of columns from the data.frame. We plug those names into the formula, and they are then resolved by the design object, which points to the original data.frame. Here's a bit of working code using your sample data:

lapply(names(dat)[1:9], function(x) round(prop.table(
    svytable(bquote(~.(as.name(x)) + seg_2), dat_weight),
2),3)*100)

So here we use bquote to build the formula. The .() allows us to plug in expressions; here we take the character value in x and convert it to a proper name object. Thus it goes from "r3a_9" to r3a_9.
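To see what the bquote() step produces, and an equivalent route through base R's `reformulate()`, here is a small sketch. The survey call itself is left out so the snippet runs without the package; in the answer's code, the resulting formula is what gets handed to `svytable()`. Wrapping the loop in `setNames()` is an addition, not part of the answer. It keeps the per-question names (`$r3a_1`, `$r3a_2`, ...) that the original `lapply()` output had:

```r
x <- "r3a_9"

# bquote() splices the symbol r3a_9 into the expression; the result is an
# unevaluated call to `~`, which eval() turns into a formula object
f_call <- bquote(~.(as.name(x)) + seg_2)
f1 <- eval(f_call)

# reformulate() builds the same one-sided formula from character names
f2 <- reformulate(c(x, "seg_2"))

f1
f2

# Looping over column names; setNames() keeps the list named like the
# original lapply() output over data-frame columns
cols <- c("r3a_1", "r3a_2")
forms <- setNames(lapply(cols, function(v) reformulate(c(v, "seg_2"))), cols)
```

With the sample data, the same `setNames(lapply(names(dat)[1:9], ...), names(dat)[1:9])` pattern should work around the answer's `svytable()` call.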

MrFlick