
I want to perform the following tasks using the `fastfood` dataset from the `openintro` package in R.

a) Create a correlation matrix for the relations between calories, total_fat, sugar, and calcium for all items at Sonic, Subway, and Taco Bell, omitting missing values with na.omit().

b) Create a regression predicting whether an item comes from McDonald's or Subway based on calories, sodium, and protein.

c) Run a regression predicting calories from saturated fat, fiber, and sugar. Based on standardized regression coefficients, identify the strongest predictor.

Here is my code:

library(tidyverse)
library(openintro)
library(lm.beta)

fastfood <- openintro::fastfood
head(fastfood)

fastfood.corr <- cor(fastfood$calories, fastfood$total_fat, fastfood$sugar, fastfood$calcium, use = "pairwise.complete.obs", method = "pearson")

But I'm getting this error: `Error in match.arg(alternative) : 'arg' must be NULL or a character vector`

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please ask one specific programming question at a time. `cor.test` only tests the correlation between two vectors; you can't pass in multiple different measurements at once. If you want a matrix, you probably want a function like `cor()` instead. You might still need to reshape your data. – MrFlick Feb 25 '22 at 07:42
  • The `fastfood` dataset is available in the `openintro` package; the dataset is my input, and my output is the correlation matrix and the regressions. – tony michael Feb 25 '22 at 07:49
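As the comment notes, `cor.test()` works on exactly two vectors. This also explains the error message in the question, which presumably comes from a `cor.test()` call rather than `cor()`: in `cor.test(x, y, ...)` the third positional argument is `alternative`, so a third numeric vector lands there and `match.arg(alternative)` rejects it. A minimal sketch, assuming `openintro` is installed, reproducing the failure and then showing the intended two-vector usage (`err` and `ct` are illustrative names):

```r
library(openintro)

fastfood <- openintro::fastfood

# Reproduce the error: the third positional argument of cor.test() is
# `alternative`, which must be a character string such as "two.sided"
err <- tryCatch(
  cor.test(fastfood$calories, fastfood$total_fat, fastfood$sugar),
  error = function(e) conditionMessage(e)
)
err

# Intended use: one pair of numeric vectors per call
ct <- cor.test(fastfood$calories, fastfood$total_fat, method = "pearson")
ct$estimate  # Pearson's r for this pair
ct$p.value   # p-value for H0: true correlation is 0
```

For several variables at once, `cor()` on a set of columns is the better fit, because it returns the whole matrix.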

1 Answer


You should probably apply `cor()` to a subset of your data frame's columns, like so:

cor(fastfood[c('calories', 'total_fat', 'sugar', 'calcium')], 
    use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.9004937 0.4377113 0.3512067
# total_fat 0.9004937 1.0000000 0.2593702 0.1688170
# sugar     0.4377113 0.2593702 1.0000000 0.3105594
# calcium   0.3512067 0.1688170 0.3105594 1.0000000

You may also subset specific rows (e.g. restaurants). The difference from the above is that subsetting without a comma, `data[j]`, selects columns, whereas with a comma, `data[i, j]`, `i` selects rows and `j` selects columns. See `?Extract`.

cor(fastfood[fastfood$restaurant %in% c("Sonic", "Subway", "Taco Bell"),
             c('calories', 'total_fat', 'sugar', 'calcium')], 
    use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.8402781 0.5150627 0.6127083
# total_fat 0.8402781 1.0000000 0.2234985 0.2415309
# sugar     0.5150627 0.2234985 1.0000000 0.6690489
# calcium   0.6127083 0.2415309 0.6690489 1.0000000
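The question asks for missing values to be dropped with `na.omit()`, whereas `use = "pairwise.complete.obs"` computes each correlation from whichever rows are complete for that particular pair. A sketch of the listwise version, assuming the same three restaurants and four columns; if these rows contain NAs, the values may differ from the pairwise matrix above (`rows`, `cols`, and `ff_sub` are illustrative names):

```r
library(openintro)

fastfood <- openintro::fastfood

rows <- fastfood$restaurant %in% c("Sonic", "Subway", "Taco Bell")
cols <- c("calories", "total_fat", "sugar", "calcium")

# Subset first, then drop rows with an NA in any of the four selected columns.
# Calling na.omit() on the full data frame would also drop rows that only
# have NAs in columns not used here, losing more data than necessary.
ff_sub <- na.omit(fastfood[rows, cols])

cor_listwise <- cor(ff_sub, method = "pearson")
cor_listwise
```

Passing `use = "complete.obs"` to `cor()` on the unfiltered subset performs the same listwise deletion.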

You could also use dplyr syntax, but it's more code.

library(dplyr)
fastfood %>%
  filter(restaurant %in% c("Sonic", "Subway", "Taco Bell")) %>%
  select(calories, total_fat, sugar, calcium) %>%
  cor(use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.8402781 0.5150627 0.6127083
# total_fat 0.8402781 1.0000000 0.2234985 0.2415309
# sugar     0.5150627 0.2234985 1.0000000 0.6690489
# calcium   0.6127083 0.2415309 0.6690489 1.0000000
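For part (b) of the question, the outcome is binary (McDonald's vs. Subway), so a logistic regression via `glm(..., family = binomial)` is the usual choice. A minimal sketch, assuming McDonald's is stored as `"Mcdonalds"` in the `restaurant` column (the code matches case-insensitively to be safe); `mcd_sub`, `is_mcdonalds`, and `fit_logit` are illustrative names:

```r
library(openintro)

fastfood <- openintro::fastfood

# Keep only the two restaurants being compared.
# Assumption: McDonald's is spelled "Mcdonalds" here, so match case-insensitively.
mcd_sub <- subset(fastfood, tolower(restaurant) %in% c("mcdonalds", "subway"))

# Binary outcome: 1 = McDonald's, 0 = Subway
mcd_sub$is_mcdonalds <- as.integer(tolower(mcd_sub$restaurant) == "mcdonalds")

# Logistic regression (binomial family, logit link). Rows with NAs in the
# model variables are dropped under the default na.action (usually na.omit).
fit_logit <- glm(is_mcdonalds ~ calories + sodium + protein,
                 family = binomial, data = mcd_sub)
summary(fit_logit)

# Odds ratios for a one-unit increase in each predictor
exp(coef(fit_logit))
```

The coefficients are on the log-odds scale; exponentiating them gives odds ratios, which are often easier to interpret.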

Data:

fastfood <- openintro::fastfood
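For part (c), standardized coefficients can be obtained by z-scoring the outcome and all predictors before fitting, so that each slope is in standard-deviation units and the largest absolute slope marks the strongest predictor. A base-R sketch (names such as `ff_c`, `fit_raw`, and `fit_std` are illustrative); `lm.beta::lm.beta()` on the unstandardized fit, which the question loads, should report the same standardized slopes:

```r
library(openintro)

fastfood <- openintro::fastfood

# Outcome plus predictors; drop rows with an NA in any of them
vars <- c("calories", "sat_fat", "fiber", "sugar")
ff_c <- na.omit(as.data.frame(fastfood[vars]))

# Unstandardized fit (coefficients in original units)
fit_raw <- lm(calories ~ sat_fat + fiber + sugar, data = ff_c)

# Standardized fit: z-score every variable, then refit.
# The slopes are the standardized (beta) coefficients.
ff_z <- as.data.frame(scale(ff_c))
fit_std <- lm(calories ~ sat_fat + fiber + sugar, data = ff_z)

betas <- coef(fit_std)[-1]  # drop the intercept (about 0 after centering)
betas

# Strongest predictor = largest absolute standardized coefficient
strongest <- names(which.max(abs(betas)))
strongest
```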
jay.sf