Finding Correlation and Regression in R

Question

I want to perform the following tasks using the `fastfood` dataset from the `openintro` package in R.

a) Create a correlation matrix for the relations between calories, total_fat, sugar, and calcium for all items at Sonic, Subway, and Taco Bell, omitting missing values with na.omit().

b) Create a regression predicting whether or not a restaurant is McDonalds or Subway based on calories, sodium, and protein.

c) Run a regression predicting calories from saturated fat, fiber, and sugar. Based on standardized regression coefficients, identify the strongest predictor.

Here is my code:

library(tidyverse)

library(openintro)

library(lm.beta)

fastfood <- openintro::fastfood

head(fastfood)

fastfood.corr <- cor(fastfood$calories, fastfood$total_fat, fastfood$sugar, fastfood$calcium, use = "pairwise.complete.obs", method = "pearson")

but I'm getting this error: `Error in match.arg(alternative) : 'arg' must be NULL or a character vector`

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please ask one specific programming question at a time. `cor.test` is just for observations from values from two groups. You can't pass in multiple different measurements at once. If you want a matrix you probably want a function like `cor()` instead. You might still need to reshape your data. — MrFlick, Feb 25 '22 at 07:42
`fastfood` dataset is available in `openintro` package, dataset is my input and my output is correlation and regression. — tony michael, Feb 25 '22 at 07:49

jay.sf · Accepted Answer · 2022-02-25T08:20:13.063

You should probably apply `cor` to a subset of your data frame's columns, like so:

cor(fastfood[c('calories', 'total_fat', 'sugar', 'calcium')], 
    use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.9004937 0.4377113 0.3512067
# total_fat 0.9004937 1.0000000 0.2593702 0.1688170
# sugar     0.4377113 0.2593702 1.0000000 0.3105594
# calcium   0.3512067 0.1688170 0.3105594 1.0000000

You may also subset for specific rows (e.g. restaurants). The difference from the above is that subsetting without a comma, `data[j]`, selects columns, whereas subsetting with a comma, `data[i, j]`, selects rows with `i` and columns with `j`. See `?Extract`.

cor(fastfood[fastfood$restaurant %in% c("Sonic", "Subway", "Taco Bell"),
             c('calories', 'total_fat', 'sugar', 'calcium')], 
    use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.8402781 0.5150627 0.6127083
# total_fat 0.8402781 1.0000000 0.2234985 0.2415309
# sugar     0.5150627 0.2234985 1.0000000 0.6690489
# calcium   0.6127083 0.2415309 0.6690489 1.0000000
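
To make the `data[j]` versus `data[i, j]` distinction concrete, here is a minimal sketch on a toy data frame. The data frame `toy` and its values are made up for illustration and are not taken from `fastfood`.

```r
# Toy stand-in for fastfood (hypothetical values, not from the real dataset)
toy <- data.frame(
  restaurant = c("Sonic", "Subway", "Arbys", "Taco Bell"),
  calories   = c(500, 300, 450, 350),
  sugar      = c(10, 5, 8, 4)
)

# No comma: the index selects columns, and the result is still a data frame
cols_only <- toy[c("calories", "sugar")]

# With a comma: rows come first, columns second
keep <- toy$restaurant %in% c("Sonic", "Subway", "Taco Bell")
rows_and_cols <- toy[keep, c("calories", "sugar")]

dim(cols_only)      # 4 rows, 2 columns
dim(rows_and_cols)  # 3 rows, 2 columns
```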

You could also use dplyr syntax, but it's more code.

library(dplyr)
fastfood %>%
  filter(restaurant %in% c("Sonic", "Subway", "Taco Bell")) %>%
  select(calories, total_fat, sugar, calcium) %>%
  cor(use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.8402781 0.5150627 0.6127083
# total_fat 0.8402781 1.0000000 0.2234985 0.2415309
# sugar     0.5150627 0.2234985 1.0000000 0.6690489
# calcium   0.6127083 0.2415309 0.6690489 1.0000000
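
Note that the task asks for `na.omit()`, while the calls above use `use="pairwise.complete.obs"`. The two can give different results: `na.omit()` drops every row that contains any `NA` (listwise deletion), whereas pairwise deletion keeps, for each pair of columns, all rows where both values are present. A small sketch of the difference, using a made-up data frame `toy` rather than the real data:

```r
# Toy data with missing values (hypothetical numbers, not the real fastfood data)
toy <- data.frame(
  calories  = c(500, 300, 450, 350, 600, 250),
  total_fat = c(25, 10, NA, 15, 35, 8),
  sugar     = c(10, 5, 8, NA, 12, 3)
)

# Listwise deletion: na.omit() drops every row that has any NA
complete_rows <- na.omit(toy)
cor_listwise  <- cor(complete_rows, method = "pearson")

# Pairwise deletion: each pair of columns uses all rows where both are present
cor_pairwise <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

nrow(complete_rows)  # 4 of the 6 rows survive
```

Applied to the real data, the same idea would be to wrap the subsetted columns in `na.omit()` before passing them to `cor()`.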

Data:

fastfood <- openintro::fastfood

edited Feb 25 '22 at 08:20

answered Feb 25 '22 at 08:04

jay.sf


Thanks. How can I do it for all items at Sonic, Subway, and Taco Bell only? – tony michael Feb 25 '22 at 08:06
@tonymichael Please see updated answer. – jay.sf Feb 25 '22 at 08:12
Thanks, it works. Let me try parts b and c. – tony michael Feb 25 '22 at 08:17
@tonymichael Good luck! – jay.sf Feb 25 '22 at 08:17
Here is my code for part b `fit_1 – tony michael Feb 25 '22 at 08:40
@tonymichael That doesn't work because `restaurant` is nominal, try to replace with `I(restaurant == "Subway")` or `I(restaurant %in% c("Subway", "Mcdonalds"))` in formula instead. – jay.sf Feb 25 '22 at 08:43
@tonymichael See documentation on `?I` or `help('I')` (section _formula_), which is what you want to do for each function you are learning. – jay.sf Feb 25 '22 at 08:51
Thanks, now I have two models, `fit_1 – tony michael Feb 25 '22 at 08:56
@tonymichael There's an `AIC` function; try `AIC(fit_1)`. – jay.sf Feb 25 '22 at 08:58
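
The `I(restaurant == "Subway")` and `AIC` suggestions from these comments could be combined roughly as follows. This is only a sketch under stated assumptions: the asker's actual models are not shown in the thread, so the data frame `toy`, the model names `fit_full` and `fit_small`, and the choice of a logistic regression via `glm(..., family = binomial)` are all illustrative. With the real data, one would first keep only the Mcdonalds and Subway rows.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for the fastfood columns (hypothetical values)
n <- 200
toy <- data.frame(
  restaurant = sample(c("Mcdonalds", "Subway"), n, replace = TRUE),
  calories   = rnorm(n, mean = 500, sd = 150),
  sodium     = rnorm(n, mean = 1200, sd = 300),
  protein    = rnorm(n, mean = 30, sd = 10)
)

# I() turns the nominal column into a TRUE/FALSE outcome inside the formula;
# family = binomial fits a logistic regression (an assumption, see above)
fit_full <- glm(I(restaurant == "Subway") ~ calories + sodium + protein,
                data = toy, family = binomial)
fit_small <- glm(I(restaurant == "Subway") ~ calories,
                 data = toy, family = binomial)

# Compare the two hypothetical models; a lower AIC is preferred
AIC(fit_full, fit_small)
```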
