0

I have a data set of 1000 observations of 17 variables.

I have 3 columns of interest: "quantity" "unit cost" "tax" "total". In each column except the tax column, there is some missing data (not all).

For example, I have quantity, unit cost, and tax but not total.

What code is the best way to do this?

For example, to get the total I need to take quantity*unitcost+tax=total

Thanks

MLavoie
  • 9,277
  • 40
  • 37
  • 54
  • 1
    In order for us to help you, please provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). For example, to produce a minimal data set, you can use `head()`, `subset()`, or the indices. Then use `dput()` to give us something that can be put in R immediately. Also, please make sure you know what to do [when someone answers your question](https://stackoverflow.com/help/someone-answers). More info can be found at Stack Overflow's [help center](https://stackoverflow.com/help). Thank you! – iamericfletcher Nov 23 '20 at 19:37

3 Answers3

0

replace the NAs with the formula (df is your data.frame):

nas <- is.na(df$total)
df[nas, "total"] <- df$quantity[nas] * df$unitcost[nas] + df$tax[nas]

nas is a logical vector that indicates whether the total is missing.

EDIT

I guess you're trying to achieve this for any of the four components?

df
  quantity unit_cost tax total
1        1         1   1    NA
2        2         2   2    NA
3       NA         3   3     4
4        1        NA   4     6
5        2        NA   5     9
for(x in c("quantity", "unit_cost", "tax", "total") ){
    nas <- is.na(df[[x]])
    df[nas,x] <- with(df, switch(x,
                   quantity= (total - tax)/unit_cost ,
                   unit_cost= (total - tax)/quantity,
                   tax= total - quantity*unit_cost,
                   total= tax + quantity*unit_cost
                   ))[nas]
}
   quantity unit_cost tax total
1 1.0000000         1   1     2
2 2.0000000         2   2     6
3 0.3333333         3   3     4
4 1.0000000         2   4     6
5 2.0000000         2   5     9

mind you this solution assumes that there is only one missing value in the 4 variables at each row, otherwise the calculation wouldn't be possible.

data

df <- data.frame(
  quantity = c(1, 2, NA, 1, 2),
  unit_cost = c(1,2,3,NA, NA),
  tax = c(1,2,3,4,5),
    total = c(NA,NA,4,6,9)
)
Abdessabour Mtk
  • 3,745
  • 2
  • 11
  • 21
0

Here is a solution using dplyr. I assume that if quantity, unit_cost, or tax are missing, you would want total to show NA, right?

# Example data:

df <- data.frame(
  quantity = c(1, 2, NA, 1, NA),
  unit_cost = c(1,2,3,NA, NA),
  tax = c(1,2,3,4,5)
)

# Code:

library(dplyr)

df %>% 
  mutate(
    total = quantity * unit_cost + tax
  )

# Output: 

#>   quantity unit_cost tax total
#> 1        1         1   1     2
#> 2        2         2   2     6
#> 3       NA         3   3    NA
#> 4        1        NA   4    NA
#> 5       NA        NA   5    NA

# Original:

#>   quantity unit_cost tax
#> 1        1         1   1
#> 2        2         2   2
#> 3       NA         3   3
#> 4        1        NA   4
#> 5       NA        NA   5
iamericfletcher
  • 2,529
  • 6
  • 17
0

If we want to remove the missing values while doing the calculation, use rowSums

transform(df, total = rowSums(cbind(quantity * unit_cost, tax), na.rm = TRUE))

data

df <- data.frame(
  quantity = c(1, 2, NA, 1, NA),
  unit_cost = c(1,2,3,NA, NA),
  tax = c(1,2,3,4,5)
)
akrun
  • 789,025
  • 32
  • 460
  • 575