Lab 10: Regression Analysis
Please note that all images were created with modifications to the default settings to make them digitally accessible. If you recreate this code in another environment, your plots may have different colors and backgrounds.
1 Getting Started
Be sure to load the packages ggformula, mosaic, and tidyverse using the library() function. Remember, you need to do this in each new Quarto document or R session. Add the package names in each of the blanks below to load the indicated packages.
library() loads packages. Supply the name of the package you want to load inside the parentheses.
library(ggformula) #for graphs
library(mosaic) #for statistics
library(tidyverse) #for data management
2 Muscle Mass
A person’s muscle mass is expected to decrease with age. To explore this relationship in women, a nutritionist randomly selected 60 women between 40 and 79 years old. The research objective is to determine whether muscle mass (in kilograms, kg) shows the expected approximately linear decrease as age (in years) increases. The data are found in the file muscle-mass.csv.
Data from Applied Linear Statistical Models, Kutner et al., 2004, 5th ed.
Let’s analyze the data using a linear model. We use the following notation for a linear model:
\[Y_{muscle~mass} = \beta_0 + \beta_1 \cdot X_{age} + \epsilon\]
First, read in the data and look at the variables.
Recall that in the Lab 9 Primer you already evaluated the basics of the linear model (scatterplot, correlation, slope, intercept, and \(R^2\)). Here we focus on determining whether muscle mass declines as age increases in women over the age of 40.
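Here is a minimal base-R sketch of reading a CSV file and inspecting its variables. The file muscle-mass.csv may not be in your working directory, so the sketch first writes a few hypothetical placeholder rows to a temporary file. These rows are not the study data. With the real file you would call `read_csv("muscle-mass.csv")` (tidyverse) or `read.csv("muscle-mass.csv")`. The column names `age` and `muscle_mass` are taken from the plotting code later in this lab.

```r
# Hypothetical placeholder rows (NOT the real study data), written to a temp file
# so this sketch runs even when muscle-mass.csv is not present.
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(age = c(43, 55, 68, 77),
                     muscle_mass = c(106, 92, 80, 61)),
          demo_path, row.names = FALSE)

# Read the file; with the lab data this would be read.csv("muscle-mass.csv")
muscle <- read.csv(demo_path)

str(muscle)      # variable names and types
summary(muscle)  # quick numeric summary of each variable
```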
2.1 Create the Model
Modify the code below to estimate the population regression line between muscle mass and age.
Use the lm() function to create the model.
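Below is a sketch of fitting a simple linear regression with `lm()`. It uses a simulated stand-in data set, not the real study data. The 60 ages are drawn uniformly between 40 and 79, and the true slope of -1 is assumed for illustration only. With the lab data, the corresponding call would be `mm_age <- lm(muscle_mass ~ age, data = muscle)`.

```r
set.seed(10)  # make the simulation reproducible

# Simulated stand-in for the muscle data (assumed values, illustration only)
sim <- data.frame(age = runif(60, min = 40, max = 79))
sim$muscle_mass <- 150 - sim$age + rnorm(60, mean = 0, sd = 8)

# Response goes on the left of ~, predictor on the right
sim_fit <- lm(muscle_mass ~ age, data = sim)
coef(sim_fit)  # "(Intercept)" estimates beta_0; "age" estimates beta_1
```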
2.2 Identify the symbolic hypotheses for this study.
2.2.1 Null Hypothesis
2.2.2 Alternative Hypothesis
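One common way to write these hypotheses is in terms of the slope \(\beta_1\) from the model above, using a one-sided test because the study anticipates a decrease. A two-sided version would instead use \(H_A: \beta_1 \neq 0\); follow your course's convention.

\[H_0: \beta_1 = 0 \qquad H_A: \beta_1 < 0\]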
Be sure you can also write the hypotheses verbally in context. Bring your answers to office hours or the CLC for review.
2.3 Evaluate the Conditions of the Test
Conduct the appropriate evaluations for the conditions of regression analysis. Modify the code below to create the additional plots needed.
2.3.1 Linearity
# Scatterplot of muscle mass vs. age to check the linearity condition
gf_point(muscle_mass ~ age, data = muscle,
         ylab = "Muscle Mass (kg)",
         xlab = "Age (in years) of Female Participants")
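Overlaying the least-squares line on the scatterplot can make the linearity check easier to judge. This sketch chains `gf_lm()` onto `gf_point()` with the native pipe (R 4.1 or later). It uses simulated stand-in data with assumed values, not the study data. With the lab data, you would pipe the `gf_point()` call above into `gf_lm()` the same way.

```r
library(ggformula)

set.seed(10)
# Simulated stand-in for the muscle data (assumed values, illustration only)
sim <- data.frame(age = runif(60, min = 40, max = 79))
sim$muscle_mass <- 150 - sim$age + rnorm(60, mean = 0, sd = 8)

# Scatterplot plus least-squares line; points scattered evenly around a
# straight line (no curved pattern) support the linearity condition
p <- gf_point(muscle_mass ~ age, data = sim,
              ylab = "Muscle Mass (kg)",
              xlab = "Age (in years)") |>
  gf_lm()
p
```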
2.3.2 QQ Plot of the Residuals
# Normal Q-Q plot of the standardized residuals (checks the normality condition)
plot(mm_age, 2)
2.3.3 Residual vs. Fitted Plot
# Residuals vs. fitted values with a smoother (checks linearity and equal variance)
plot(mm_age, 1, add.smooth = TRUE)
2.3.4 Cook’s Distance Plot
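# Residuals vs. leverage with Cook's distance contours (checks for influential points)
# plot(mm_age, 4) instead shows a bar plot of Cook's distance for each observation
plot(mm_age, 5, add.smooth = TRUE)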
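The plot above draws Cook's distance as contour lines. You can extract the values themselves with `cooks.distance()`. This sketch uses simulated stand-in data with assumed values, not the study data. It flags observations with two common rules of thumb, Cook's D > 1 and Cook's D > 4/n. These cutoffs are conventions, not part of this lab, so use whichever your course prefers.

```r
set.seed(10)
# Simulated stand-in for the muscle data (assumed values, illustration only)
sim <- data.frame(age = runif(60, min = 40, max = 79))
sim$muscle_mass <- 150 - sim$age + rnorm(60, mean = 0, sd = 8)
sim_fit <- lm(muscle_mass ~ age, data = sim)

cd <- cooks.distance(sim_fit)  # one value per observation
max(cd)                        # largest Cook's distance in the data
which(cd > 1)                  # rule of thumb: highly influential points
which(cd > 4 / nrow(sim))      # looser screening cutoff (flags more points)
```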
2.4 Which of the conditions appear to be reasonably met?
Write justifications for each condition. Bring your answers to office hours or the CLC for review.
2.5 Create a Summary Table
Modify the code below to create a summary table to assess whether there is a statistically meaningful relationship between the two variables.
summary(mm_age)  # estimates, standard errors, t statistics, p-values, and R-squared
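`summary()` reports a two-sided p-value for the slope. For a one-sided alternative (\(H_A: \beta_1 < 0\)), one common approach computes the lower-tail probability of the slope's t statistic on n - 2 degrees of freedom. When t is negative, this equals half the two-sided p-value. This sketch uses simulated stand-in data with assumed values, not the study data.

```r
set.seed(10)
# Simulated stand-in for the muscle data (assumed values, illustration only)
sim <- data.frame(age = runif(60, min = 40, max = 79))
sim$muscle_mass <- 150 - sim$age + rnorm(60, mean = 0, sd = 8)
sim_fit <- lm(muscle_mass ~ age, data = sim)

coefs <- coef(summary(sim_fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
t_slope <- coefs["age", "t value"]
df_resid <- df.residual(sim_fit)  # n - 2 for simple linear regression

p_two_sided <- coefs["age", "Pr(>|t|)"]
p_lower <- pt(t_slope, df = df_resid)  # P(T <= t), for H_A: beta_1 < 0
c(two_sided = p_two_sided, lower_tail = p_lower)
```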
2.6 Estimate the True Slope
Modify the code below to determine the 95% confidence interval for the slope of the regression line.
# 95% confidence interval for the slope (parm = 2 selects the second coefficient, age)
confint(mm_age, parm = 2, level = 0.95)
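The interval from `confint()` is the slope estimate ± t* × SE. Here t* is the 97.5th percentile of a t distribution with n - 2 degrees of freedom. This sketch rebuilds the interval by hand and compares it with `confint()`. It uses simulated stand-in data with assumed values, not the study data. Note that `parm = "age"` selects the same coefficient as `parm = 2`.

```r
set.seed(10)
# Simulated stand-in for the muscle data (assumed values, illustration only)
sim <- data.frame(age = runif(60, min = 40, max = 79))
sim$muscle_mass <- 150 - sim$age + rnorm(60, mean = 0, sd = 8)
sim_fit <- lm(muscle_mass ~ age, data = sim)

est <- coef(sim_fit)[["age"]]                        # slope estimate b_1
se <- coef(summary(sim_fit))["age", "Std. Error"]    # standard error of b_1
t_star <- qt(0.975, df = df.residual(sim_fit))       # critical value for 95%

manual_ci <- c(lower = est - t_star * se, upper = est + t_star * se)
manual_ci
confint(sim_fit, parm = 2, level = 0.95)  # should match manual_ci
```

If the whole interval lies below 0, that is consistent with a decline in the response as the predictor increases.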
Be sure you can interpret and evaluate the confidence interval. Bring your answers to office hours or the CLC for review.