Lab 10: Regression Analysis

Please note that all images were created with modifications to the defaults to make them digitally accessible. If you recreate this code in another environment, your plots may have different colors and backgrounds.

1 Getting Started

Be sure to load the packages ggformula, mosaic, and tidyverse using the library() function. Remember, you need to do this in each new Quarto document or R session. Add the package names in the blanks below to load the indicated packages.

library() loads a package. Supply the name of the package you want to load inside the parentheses.

library(ggformula) #for graphs
library(mosaic) #for statistics
library(tidyverse) #for data management

2 Muscle Mass

A person’s muscle mass is expected to decrease with age. To explore this relationship in women, a nutritionist randomly selected 60 women between 40 and 79 years old. The research objective is to determine whether muscle mass (in kilograms; kg) decreases approximately linearly with age (in years), as expected. The data are found in the file muscle-mass.csv.

Data from Applied Linear Statistical Models, Kutner et al 2004, 5th ed.

Let’s analyze the data using a linear model. We use the following notation for a linear model:

\[Y_{muscle~mass} = \beta_0 + \beta_1 \cdot X_{age} + \epsilon\]

First, read in the data and look at the variables.
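A minimal sketch of this step, assuming the CSV has columns named muscle_mass and age (the names used in the plotting code later in this lab) and that the data frame is called muscle. So that the sketch runs anywhere, it first writes a few made-up rows to a temporary file; these rows are placeholders, not the real data. In the lab, you would point read.csv() at muscle-mass.csv instead.

```r
# Made-up rows standing in for muscle-mass.csv (NOT the real data)
demo_file <- tempfile(fileext = ".csv")
writeLines(c("muscle_mass,age",
             "40.1,43",
             "35.2,58",
             "28.7,71"), demo_file)

# In the lab, this would be: muscle <- read.csv("muscle-mass.csv")
muscle <- read.csv(demo_file)

str(muscle)    # variable names and types
head(muscle)   # first few rows
```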

Recall that in the Lab 9 Primer you already evaluated the basics of the linear model (scatterplot, correlation, slope, intercept, and \(R^2\)). Here we focus on determining whether muscle mass declines with increasing age in women over the age of 40.

2.1 Create the Model

Modify the code below to estimate the population regression line between muscle mass and age.

Use the lm() function to create the model.
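As a hedged illustration of the lm() formula syntax, the sketch below fits the model on simulated data. The simulated values (including the intercept 45, slope -0.2, and noise level) are arbitrary assumptions chosen only to make the code runnable; the object name mm_age and the variable names match those used in the rest of this lab.

```r
# Simulated stand-in for the muscle data (NOT the real values)
set.seed(10)
muscle <- data.frame(age = sample(40:79, 60, replace = TRUE))
muscle$muscle_mass <- 45 - 0.2 * muscle$age + rnorm(60, sd = 3)

# Response on the left of ~, explanatory variable on the right
mm_age <- lm(muscle_mass ~ age, data = muscle)

coef(mm_age)   # estimated intercept (b0) and slope (b1)
```

With the real data, only the line that creates mm_age is needed, since muscle is already loaded.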

2.2 Identify the symbolic hypotheses for this study.

2.2.1 Null Hypothesis







2.2.2 Alternative Hypothesis







Be sure you can also write the hypotheses verbally in context. Bring your answers to office hours or the CLC for review.
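
One way the symbolic hypotheses could be written, assuming the test targets the slope \(\beta_1\) from the model above and that a one-sided alternative is intended because a decrease is expected. Check this against the conventions used in your course before relying on it.

\[H_0: \beta_1 = 0 \qquad H_A: \beta_1 < 0\]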

2.3 Evaluate the Conditions of the Test

Conduct the appropriate evaluations for the conditions of regression analysis. Modify the code below to create the additional plots needed.

2.3.1 Linearity

gf_point(muscle_mass ~ age, data = muscle,
           ylab = "Muscle Mass (kg)",
           xlab = "Age (in years) of Female Participants")

2.3.2 QQ Plot of the Residuals

plot(mm_age, 2)

2.3.3 Residual vs. Fitted Plot

plot(mm_age, 1, add.smooth = TRUE)

2.3.4 Cook’s Distance Plot

plot(mm_age, 5, add.smooth = TRUE)

2.4 Which of the conditions appear to be reasonably met?






Write justifications for each condition. Bring your answers to office hours or the CLC for review.

2.5 Create a Summary Table

Modify the code below to create a summary table to assess whether there is a statistically meaningful relationship between the two variables.

summary(mm_age)
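
The slope row of the summary table can also be pulled out directly, which makes the link between the estimate, its standard error, the t statistic, and the p-value explicit. This is a sketch on simulated data (arbitrary values, not the real muscle data). The lower-tail p-value at the end assumes a one-sided alternative of a negative slope, which is an assumption about the intended test.

```r
# Simulated stand-in for the muscle data (NOT the real values)
set.seed(10)
muscle <- data.frame(age = sample(40:79, 60, replace = TRUE))
muscle$muscle_mass <- 45 - 0.2 * muscle$age + rnorm(60, sd = 3)
mm_age <- lm(muscle_mass ~ age, data = muscle)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(mm_age))
slope_row  <- coef_table["age", ]

# t statistic = estimate / standard error
t_stat <- slope_row["Estimate"] / slope_row["Std. Error"]

# summary() reports a two-sided p-value; a lower-tail p-value
# corresponds to the one-sided alternative "slope < 0"
p_two_sided <- 2 * pt(-abs(t_stat), df = df.residual(mm_age))
p_lower     <- pt(t_stat, df = df.residual(mm_age))
```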

2.6 Estimate the True Slope

Modify the code below to determine the 95% confidence interval for the slope of the regression line.

confint(mm_age, parm = 2, level = 0.95)

Be sure you can interpret and evaluate the confidence interval. Bring your answers to office hours or the CLC for review.
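
To see where the interval comes from, it can be rebuilt by hand as \(b_1 \pm t^* \cdot SE_{b_1}\), where \(t^*\) is the critical value from a t distribution with the model's residual degrees of freedom. The sketch below does this on simulated data (arbitrary values, not the real muscle data) and compares the result with confint().

```r
# Simulated stand-in for the muscle data (NOT the real values)
set.seed(10)
muscle <- data.frame(age = sample(40:79, 60, replace = TRUE))
muscle$muscle_mass <- 45 - 0.2 * muscle$age + rnorm(60, sd = 3)
mm_age <- lm(muscle_mass ~ age, data = muscle)

# Slope estimate and its standard error from the summary table
est <- coef(summary(mm_age))["age", "Estimate"]
se  <- coef(summary(mm_age))["age", "Std. Error"]

# 95% confidence leaves 2.5% in each tail
t_star <- qt(0.975, df = df.residual(mm_age))

manual_ci <- c(lower = est - t_star * se, upper = est + t_star * se)
manual_ci

# Built-in interval for comparison; it should agree with manual_ci
confint(mm_age, parm = "age", level = 0.95)
```

Here parm = "age" selects the slope by name; parm = 2, as used above, selects the same coefficient by position.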

2.7 Which of the following statements are true?