Lab 8: ANOVA

Please note that all images were created with modifications to the default settings to make them digitally accessible. If you recreate this code in another environment, your plots may have different colors and backgrounds.

1 Getting Started

Be sure to load the packages ggformula, mosaic, and tidyverse using the library() function. Remember, you need to do this in each new Quarto document or R session. Add the package names in each of the blanks below to load the indicated packages.

library() loads packages. Supply the name of the package you want to load inside the parentheses.

library(ggformula) #for graphs
library(mosaic) #for statistics
library(tidyverse) #for data management

2 Exercise and Brain Size

In 2020, the Centers for Disease Control and Prevention (CDC) estimated that as many as 5.8 million Americans were living with Alzheimer’s disease. While several diseases can cause dementia, Alzheimer’s disease is the most common type. Many risk factors for dementia can be controlled, such as diet, exercise, smoking status, and alcohol consumption, while others, such as genetics and aging, cannot. Brain size typically starts to shrink in your 30s and 40s, and the rate of shrinkage increases at about age 60. Therefore, any intervention that protects against brain shrinkage could help protect the elderly against dementia and Alzheimer’s disease. Researchers in China investigated whether different kinds of exercise or activity might help prevent brain shrinkage, or perhaps even lead to an increase in brain volume (Mortimer et al., 2012).

The researchers randomly assigned elderly adult volunteers to four activity groups: tai chi, walking, social interaction, and no intervention. Except for the no-intervention group, each group met for about an hour three times a week for 40 weeks to participate in its assigned activity. Each participant had their brain imaged using magnetic resonance imaging (MRI) to determine brain volume before the study began and again at its end. The researchers then computed the percentage change in each participant’s brain volume over that period. If a person’s brain volume increased, this percentage change was positive; if brain volume decreased, it was negative.
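To make the sign convention concrete, here is a minimal sketch assuming the usual definition of percentage change relative to the baseline (before) volume. The volumes below are hypothetical numbers for three made-up participants, not data from the study.

```r
# HYPOTHETICAL MRI brain volumes (cm^3) for three made-up participants;
# illustration only, not the study's measurements
before <- c(1100, 1050, 980)
after  <- c(1105, 1040, 980)

# Percentage change relative to the baseline volume:
# positive = brain grew, negative = brain shrank, zero = no change
pct_change <- (after - before) / before * 100
round(pct_change, 3)
```

The first participant's volume grew, so their value is positive; the second shrank, so theirs is negative.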

2.1 Identify Variables and Types

For each variable, identify whether it is the explanatory or response variable in our analysis, and identify its type (e.g., categorical or quantitative).

Activity:








Change in brain volume (%)








2.2 Identify the type of study.

Be sure you are able to provide a full justification.




2.3 Exploratory Data Analysis

Conduct Exploratory Data Analysis (EDA). Modify the code below to calculate any summary statistics and produce a graphic.

df_stats(brain_change ~ treatment, data = brain)
gf_boxplot(brain_change ~ treatment, data = brain,
           ylab = "Percentage Change in Brain Volume",
           xlab = "Activity Group for 40 Weeks")
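The lab's `brain` data frame is not included in this handout, so the sketch below uses a simulated stand-in (`brain_sim`, a hypothetical name; the values are random, not the study's data, and the group sizes are assumptions) to show in base R some of the per-group summaries that `df_stats()` is being asked for: the mean, standard deviation, and size of each activity group.

```r
# HYPOTHETICAL stand-in for the lab's `brain` data frame: simulated values,
# NOT the study's measurements. Group labels and sizes are assumptions.
set.seed(1)
treatment <- rep(c("tai chi", "walking", "social", "none"), times = c(30, 28, 32, 29))
brain_sim <- data.frame(treatment = treatment,
                        brain_change = rnorm(length(treatment), mean = 0, sd = 0.5))

# Mean, standard deviation, and sample size of the response within each group
grp_mean <- tapply(brain_sim$brain_change, brain_sim$treatment, mean)
grp_sd   <- tapply(brain_sim$brain_change, brain_sim$treatment, sd)
grp_n    <- tapply(brain_sim$brain_change, brain_sim$treatment, length)
round(cbind(mean = grp_mean, sd = grp_sd, n = grp_n), 3)
```

Unequal values in the `n` column indicate an unbalanced design, which matters later when interpreting the Tukey test.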

2.4 Identify the hypotheses for this study.

Note: these options are generic and incomplete. Be sure you can state the hypotheses in context, both symbolically and verbally, in your labs.
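For reference, the generic one-way ANOVA hypotheses for \(k\) groups are written below, where \(\mu_i\) is the population mean response for group \(i\). They still need to be stated in the context of this study, which has \(k = 4\) activity groups.

```latex
H_0: \mu_1 = \mu_2 = \cdots = \mu_k
H_a: \text{at least one } \mu_i \text{ differs from the others}
```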








2.5 Modify the code below to create the ANOVA model

brain_aov <- aov(brain_change ~ factor(treatment), data = brain)

2.6 Evaluate Conditions

Check the conditions that must hold for us to trust the results of an ANOVA.

2.6.1 Normality of Residuals

plot(brain_aov, 2) # which = 2: normal Q-Q plot of the standardized residuals
Are the conditions for normality/sufficient sample size met in order to use the F distribution as a model for the null distribution of the F-ratio?
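In a one-way ANOVA, each residual is an observation minus the mean of its own group, so the Q-Q plot from `plot(brain_aov, 2)` checks whether those within-group deviations look roughly normal. A minimal sketch on simulated data (hypothetical; `g`, `y`, and `fit` are illustrative names, not objects from this lab):

```r
# HYPOTHETICAL simulated data (NOT the study's data): four groups of 25
set.seed(2)
g <- factor(rep(c("tai chi", "walking", "social", "none"), each = 25))
y <- rnorm(length(g), mean = 0, sd = 0.5)
fit <- aov(y ~ g)

# Each residual equals the observation minus its group mean
manual_resid <- y - ave(y, g)
all.equal(unname(residuals(fit)), manual_resid)

# Normal Q-Q plot of the raw residuals with a reference line
qqnorm(residuals(fit))
qqline(residuals(fit))
```

`plot(brain_aov, 2)` plots standardized residuals instead; with equal group sizes these are a constant multiple of the raw residuals, so the pattern in the plot is the same.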



2.6.2 Constant Variance (Homogeneity) of Populations

plot(brain_aov, 1, add.smooth = FALSE) # which = 1: residuals vs. fitted values, without a smoothed trend line
Are the conditions for constant variance across populations met in order to use the F distribution as a model for the null distribution of the F-ratio?
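Alongside the residual plot, some courses use a numeric rule of thumb: constant variance is considered plausible if the largest group standard deviation is less than about twice the smallest. This rule is an assumption here, not necessarily the criterion your course uses. A sketch on hypothetical simulated data:

```r
# HYPOTHETICAL simulated data (NOT the study's data): four groups of 25
set.seed(3)
g <- factor(rep(c("tai chi", "walking", "social", "none"), each = 25))
y <- rnorm(length(g), mean = 0, sd = 0.5)

# Sample standard deviation of the response within each group
grp_sd <- tapply(y, g, sd)
grp_sd

# Rule of thumb (assumed): largest SD / smallest SD below about 2
sd_ratio <- max(grp_sd) / min(grp_sd)
sd_ratio
sd_ratio < 2
```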



2.7 ANOVA Table

Complete the code below to print an ANOVA table for your analysis. Fill in the values of the appropriate statistics, rounded to 4 decimal places.

anova(brain_aov)

2.7.1 \(df_e\):

2.7.2 \(df_t\):

2.7.3 MSE:

2.7.4 MST:

2.7.5 F:

2.7.6 p-value:
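To see where each entry in the ANOVA table comes from, the sketch below computes \(df_t\), \(df_e\), MST, MSE, F, and the p-value by hand and compares them with `anova()`. It uses hypothetical simulated, unbalanced data (not the study's data); `ss_t` and `ss_e` are illustrative names for the treatment and error sums of squares.

```r
# HYPOTHETICAL simulated data (NOT the study's data); unequal group sizes
set.seed(4)
g <- factor(rep(c("tai chi", "walking", "social", "none"), times = c(30, 28, 32, 29)))
y <- rnorm(length(g), mean = 0, sd = 0.5)

k <- nlevels(g)     # number of groups
N <- length(y)      # total sample size
grand_mean <- mean(y)
grp_mean <- tapply(y, g, mean)
grp_n    <- tapply(y, g, length)

# Between-group (treatment) and within-group (error) sums of squares
ss_t <- sum(grp_n * (grp_mean - grand_mean)^2)
ss_e <- sum((y - ave(y, g))^2)

df_t <- k - 1
df_e <- N - k
MST <- ss_t / df_t
MSE <- ss_e / df_e
F_ratio <- MST / MSE
p_value <- pf(F_ratio, df_t, df_e, lower.tail = FALSE)  # upper-tail area of F

round(c(df_t = df_t, df_e = df_e, MST = MST, MSE = MSE, F = F_ratio, p = p_value), 4)

# Compare with R's ANOVA table
anova(aov(y ~ g))
```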


2.8 Evaluate the strength of your evidence from the hypothesis test, using a 0.05 significance level.







2.9 Conduct a Tukey test by modifying the code below.

TukeyHSD(brain_aov)

2.10 Identify which pairs are different (at least moderate evidence against the null).








You might be wondering why we found evidence against the null for the ANOVA, but only “some” evidence against the null for one pairwise comparison from the Tukey Test. There are several possible reasons this could happen.

  1. Conditions Are Not Met: Although this does not apply to our situation, violated conditions, especially a violation of the constant variance condition, can cause the ANOVA and post-hoc tests to disagree.
  2. Under-powered Study Design: To keep the experimentwise error rate at \(\alpha = 0.05\) across all the comparisons we are doing, each individual comparison effectively uses a smaller significance level. Since decreasing the significance level also decreases power, our group sizes may not be large enough to detect the effect size of interest in our study.
  3. Unbalanced Design: Tukey tests in particular are sensitive to unbalanced designs, i.e., designs in which the groups have different numbers of replicates. Because our study is unbalanced, this could explain why we found only weak evidence against the null for just one pairwise comparison.
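The "smaller significance level" in point 2 can be made concrete by comparing cutoffs. On the t scale, a Tukey HSD comparison of two group means uses the studentized range quantile divided by sqrt(2) as its critical value instead of the usual two-sided t cutoff. A sketch assuming k = 4 groups and a hypothetical error df of 116 (not the study's value; substitute \(df_e\) from your ANOVA table):

```r
# ASSUMED setting for illustration: 4 groups, hypothetical error df of 116
k <- 4
df_e <- 116

# Cutoff for a single, unadjusted two-sided t test at alpha = 0.05
t_crit <- qt(0.975, df = df_e)

# Tukey HSD cutoff on the same t scale: studentized range quantile / sqrt(2)
tukey_crit <- qtukey(0.95, nmeans = k, df = df_e) / sqrt(2)

c(t_crit = t_crit, tukey_crit = tukey_crit)

# Per-comparison significance level implied by the Tukey cutoff
per_comp_alpha <- 2 * pt(tukey_crit, df = df_e, lower.tail = FALSE)
per_comp_alpha
```

Because the Tukey cutoff is larger, each pairwise comparison needs a bigger difference in means to count as evidence against the null, which is the loss of power described in point 2.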

So what is a statistician to do? There are post-hoc comparisons that might be better suited for this study than the Tukey HSD Test. You can learn more about those methods in STAT 325: Experimental Design and Analysis.

2.11 Which of the following conclusions about the study are true?