Lab 9: Correlation and Linear Models

Please note that all images were created with modifications to the defaults to make them digitally accessible. If you recreate this code in another environment, your plots may have different colors and backgrounds.

1 Getting Started

Be sure to load the packages ggformula, mosaic, and tidyverse using the library() function. Remember, you need to do this in each new Quarto document or R session. Add the package names in each of the blanks below to load the indicated packages.

library() loads a package. Supply the name of the package you want to load inside the parentheses.

library(ggformula) #for graphs
library(mosaic) #for statistics
library(tidyverse) #for data management

2 Muscle Mass

A person’s muscle mass is expected to decrease with age. To explore this relationship in women, a nutritionist randomly selected 60 women between 40 and 79 years old. The research objective is to determine whether muscle mass (in kilograms; kg) shows the expected, approximately linear decrease with age (in years). The data are found in the file muscle-mass.csv.

Data from Applied Linear Statistical Models, Kutner et al 2004, 5th ed.

Let’s analyze the data using a linear model. We use the following notation for a linear model:

\[Y_{muscle~mass} = \beta_0 + \beta_1 \cdot X_{age} + \epsilon\]

First, read in the data and look at the variables.
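One possible way to do this is sketched below. It assumes that muscle-mass.csv sits in your working directory and that its columns are named age and muscle_mass, as the later code in this lab suggests. The object name muscle is likewise taken from that later code.

```r
library(tidyverse)  # provides read_csv() and glimpse()

# Read the data; the file location is an assumption, adjust the path if needed
muscle <- read_csv("muscle-mass.csv")

# Look at the variable names, their types, and the first few values
glimpse(muscle)
```

glimpse() is only one option here; head(muscle) or names(muscle) would also let you check the variables.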

2.0.1 For each variable, identify whether it is the explanatory or response variable in our analysis, and the type of variable.

Age:





Muscle Mass:





2.0.2 Identify the study type of this study. Be sure you are able to provide a full justification.




3 Exploratory Data Analysis

Conduct Exploratory Data Analysis (EDA). Modify the code below to calculate any summary statistics and produce a graphic appropriate for assessing relationships between two variables.

cor(muscle_mass ~ age, data = muscle)
gf_point(muscle_mass ~ age, data = muscle,
           ylab = "Muscle Mass (kg)",
           xlab = "Age (in years) of Female Participants")

3.0.1 Based on the correlation and scatterplot, select the best descriptions of the relationship between age and muscle mass in women.








4 Constructing the Linear Model

Now let’s fit a least squares regression line to our data, using the form

\[\hat{y}_{muscle~mass} = b_0 + b_1x_{age}\]

4.1 Modify your code above to add a linear model to the plot.

gf_point(muscle_mass ~ age, data = muscle,
           ylab = "Muscle Mass (kg)",
           xlab = "Age (in years) of Female Participants") |> 
  gf_lm()

4.2 Modify the code below to estimate the population regression line using the data and find the slope and intercept.

Hint: use the functions lm() and coef()
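As a minimal sketch of how lm() and coef() fit together, the chunk below uses a small made-up data frame called toy (hypothetical values, not the lab data) so that it runs on its own. In the lab, you would use muscle in place of toy and pick your own name for the fitted model.

```r
# Made-up stand-in data (NOT the lab data), only so this chunk runs on its own
toy <- data.frame(age         = c(45, 55, 65, 75),
                  muscle_mass = c(105, 92, 83, 70))

# Fit the least squares line using the formula response ~ explanatory
toy_fit <- lm(muscle_mass ~ age, data = toy)

# coef() returns a named vector: "(Intercept)" is b0 and "age" is b1
coef(toy_fit)

# Round both estimates to 2 decimal places
round(coef(toy_fit), 2)
```

The formula has the same shape as the one used in cor() and gf_point() earlier: the response goes on the left of ~ and the explanatory variable on the right.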

4.2.1 Intercept:

Round to 2 decimal places

4.2.2 Choose the correct interpretation of the intercept from the options below.

The [intercept] would be replaced with the value calculated above.






4.2.3 Slope:

Round to 2 decimal places

4.2.4 Choose the correct interpretation of the slope from the options below.

The [slope] would be replaced with the value calculated above.






4.3 Modify the code below to create a summary table to assess whether there is a statistically meaningful relationship between the two variables. Then determine the following statistics for your linear model.

summary(mm_age)

4.3.1 \(R^2\):


If you wanted to just print out the \(R^2\) value in R, you could use the following code:
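One base R option, shown here as a sketch, is to pull the r.squared component out of the model summary. The data frame toy holds made-up values (not the lab data) so the chunk runs on its own. With your lab model, you would apply the same last step to mm_age.

```r
# Made-up stand-in data (NOT the lab data), only so this chunk runs on its own
toy <- data.frame(age         = c(45, 55, 65, 75),
                  muscle_mass = c(105, 92, 83, 70))
toy_fit <- lm(muscle_mass ~ age, data = toy)

# summary() of an lm object is a list; r.squared is one of its components
r2 <- summary(toy_fit)$r.squared
r2
```

For a model with a single explanatory variable, this value equals the square of the correlation between the two variables.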

4.3.2 Arrange the words to interpret the coefficient of determination. \(R^2\) is…

by the linear model
with age
in women aged 40-79 years old
that is explained
the proportion of variability
in muscle mass


5 Calculations using the Linear Model

Now that you have the linear model, you can use it to either:

  1. Estimate the mean muscle mass for a particular age
  2. Predict the muscle mass for an individual of a particular age

5.1 Calculate the average muscle mass of 60-year-old women.

156.35 + -1.19*60
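
Written out by hand with the rounded coefficients from the line above, the arithmetic is as follows. Because the coefficients are rounded to 2 decimal places, the result may differ slightly from one computed with the unrounded estimates.

\[\hat{y}_{muscle~mass} = 156.35 + (-1.19)(60) = 156.35 - 71.40 = 84.95 \text{ kg}\]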

To save yourself some hand calculations, you can use the predict() function in R. Supply the fitted model along with a newdata data frame, and it returns the model's fitted value for each row of newdata.
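
A minimal sketch of predict() follows. It uses a made-up data frame toy (hypothetical values, not the lab data) so that it runs on its own. In the lab, you would pass your own fitted model instead of toy_fit. The column in newdata has to carry the same name as the explanatory variable in the model formula, here age.

```r
# Made-up stand-in data (NOT the lab data), only so this chunk runs on its own
toy <- data.frame(age         = c(45, 55, 65, 75),
                  muscle_mass = c(105, 92, 83, 70))
toy_fit <- lm(muscle_mass ~ age, data = toy)

# newdata must be a data frame with a column named like the explanatory variable
new_age <- data.frame(age = 60)

# Point estimate: b0 + b1 * 60, using the unrounded coefficients
pred <- predict(toy_fit, newdata = new_age)
pred

# Interval for the MEAN muscle mass at age 60
mean_ci <- predict(toy_fit, newdata = new_age, interval = "confidence")
mean_ci

# Interval for ONE INDIVIDUAL's muscle mass at age 60 (wider)
indiv_pi <- predict(toy_fit, newdata = new_age, interval = "prediction")
indiv_pi
```

Estimating the mean and predicting for an individual give the same point estimate. They differ only in the interval around it, which is why the two interval options are shown side by side.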