0

Hello I have a user defined function lambda , into which I am passing age (integer), gender (integer) and a file (BMI).. When I call it I get some value, which is wrong. But when I execute each statements in the function I get the right answer. Strange thing is

  1. after I executing single statements from the function and getting the expected answer, if I call the function again, I get the right answer.
  2. Then if change the values when calling the function, it gives the old answer (of step #1). And it never changes until I execute single statements from the function (as in the step #1)

Can someone please help me..

lambda <-function(ag, gnd, ibmi) {

        sel <- subset(ibmi, age==ag & gender == gnd)
        out <- boxcox(lm(sel$bmi~1))
        rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
        return (rn[1] + rn[2])/2
    }

This is how I call the function..

> lambda(14,1,bmi)
[1] -0.4040404

Executing each statements frm the fn

> sel <- subset(bmi, age==14 & gender == 1)
> out <- boxcox(lm(sel$bmi~1))
> rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
> (rn[1] + rn[2])/2
[1] -0.3636364

Again I call the function

> lambda(14,1,bmi)
[1] -0.3636364

Calling function with different value for Age

> lambda(17,1,bmi)
[1] -0.3636364

> dput(head(bmi[,c(2,3,4,5,6,7,8,9)], 50))
structure(list(age = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6), gender = c(2L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), diabetic = structure(c(3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "No", "Yes"
), class = "factor"), height = c(117, 117.5, 123, 133.5, 110.2, 
126, 119, 113.8, 125.8, 115.6, 124.1, 117.8, 128, 117, 121, 116.6, 
111.3, 116, 118, 123, 119.5, 118.3, 121.5, 116.4, 121, 110, 114, 
116, 113.5, 115.5, 111.7, 116.5, 121.6, 129.9, 115.3, 116, 116.5, 
122.2, 123.4, 118, 117, 107.2, 116.5, 126.6, 118.2, 119.6, 141, 
118.7, 150.6, 113), weight = c(21.2, 21, 19.2, 36.7, 18.1, 42.6, 
36.2, 18.6, 25.1, 20.1, 40.2, 29.7, 32.6, 20.4, 24.5, 20.9, 17.3, 
22, 21.4, 23.5, 27.5, 23.3, 33.4, 22.2, 24, 16.9, 18.8, 23, 20.1, 
20.8, 26.4, 21.5, 41.2, 47.1, 31.4, 18.9, 27.1, 33.6, 26.2, 21.3, 
20, 17.6, 22.7, 24.3, 19.9, 32, 45.4, 21.4, 20.9, 21.2), bmi = c(15.49, 
15.21, 12.69, 20.59, 14.9, 26.83, 25.56, 14.36, 15.86, 15.04, 
26.1, 21.4, 19.9, 14.9, 16.73, 15.37, 13.97, 16.35, 15.37, 15.53, 
19.26, 16.65, 22.63, 16.39, 16.39, 13.97, 14.47, 17.09, 15.6, 
15.59, 21.16, 15.84, 27.86, 27.91, 23.62, 14.05, 19.97, 22.5, 
17.21, 15.3, 14.61, 15.32, 16.73, 15.16, 14.24, 22.37, 22.84, 
15.19, 9.22, 16.6), systolic = c(108L, 104L, 93L, 103L, 99L, 
103L, 130L, 105L, 131L, 102L, 104L, 137L, 129L, 84L, 116L, 101L, 
108L, 101L, 92L, 111L, 100L, 112L, 107L, 118L, 112L, 104L, 108L, 
140L, 100L, 78L, 103L, 100L, 119L, 106L, 110L, 116L, 100L, 102L, 
96L, 90L, 122L, 104L, 110L, 96L, 106L, 106L, 146L, 120L, 102L, 
116L), diastolic = c(68L, 69L, 65L, 76L, 85L, 73L, 90L, 80L, 
70L, 80L, 78L, 113L, 109L, 64L, 97L, 76L, 63L, 55L, 64L, 86L, 
80L, 94L, 68L, 90L, 77L, 75L, 68L, 120L, 70L, 57L, 70L, 75L, 
78L, 90L, 70L, 67L, 75L, 70L, 68L, 82L, 80L, 71L, 70L, 62L, 71L, 
78L, 87L, 80L, 79L, 89L)), .Names = c("age", "gender", "diabetic", 
"height", "weight", "bmi", "systolic", "diastolic"), row.names = c(NA, 
50L), class = "data.frame")
> 
arshad
  • 315
  • 1
  • 10

3 Answers3

2

There are several (potential) issues:

  1. subset should not be used inside functions. Use [ for subsetting.

  2. Scoping issue are common if you pass a fitted lm model to functions. Avoid that and use the boxcox formula method.

  3. You had missing parens in your return statement. return is a function in R.

Thus:

lambda <-function(ag, gnd, ibmi) {
  require(MASS)
  sel <- ibmi[ibmi$age==ag & ibmi$gender == gnd,]
  out <- boxcox(bmi~1, data = sel)
  rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
  return((rn[1] + rn[2])/2)
}

lambda(6,2,bmi)
#[1] -0.4040404

sel <- subset(bmi, age==6 & gender == 2)
out <- boxcox(lm(sel$bmi~1))
rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
(rn[1] + rn[2])/2
#[1] -0.4040404
Roland
  • 122,144
  • 10
  • 182
  • 276
  • Side note: `return` is absolutely not needed as last statement in a function, it can be more readable with it but it adds a small overhead when the interpreter looks for the function before doing the usual return call at end of function. – Tensibai May 26 '16 at 13:03
1

I think the issue is when passing the lm class object to the boxplot. Just avoid using lm altogether and it works.

lambda <-function(ag, gnd, ibmi) {
  sel <- ibmi[ibmi$age==ag & ibmi$gender == gnd, ]
  out <- boxcox(bmi~1, data = sel)
  rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
  return ((rn[1] + rn[2])/2)
}

> lambda(6,2,bmi)
[1] -0.4040404

As seen, the lm is actually redundant. Although we can still use lm first, we must define the data both in the lm and boxplot function to make it work properly

lambda <-function(ag, gnd, ibmi) {
  sel <- ibmi[ibmi$age==ag & ibmi$gender == gnd, ]
  out <- boxcox(lm(bmi~1, data = sel), data = sel)
  rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
  return ((rn[1] + rn[2])/2)
}
zyurnaidi
  • 1,993
  • 12
  • 14
0

Given all the comments, and with your example, here is a working function:

library(MASS)
lambda <-function(ag, gnd, ibmi) {
  sel <- ibmi[ibmi$age==ag & ibmi$gender == gnd,]
  out <- boxcox(lm(bmi~1, sel))
  rn <- range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])
  return(mean(rn))
}

eg:

lambda(6,2,bmi)
Vincent Bonhomme
  • 6,837
  • 2
  • 23
  • 35