Lab 6: Hypothesis Testing - Two Independent Groups

Digital Accessibility

Please note that all images were created with modifications to the defaults to make them digitally accessible. If you recreate this code in another environment, your plots may have different colors and backgrounds.

1 Getting Started

Be sure to load the packages ggformula, mosaic, and tidyverse using the library() function. Remember, you need to do this in each new Quarto document or R session. Add the package name in each of the blanks below to load the indicated packages.

library() loads packages. Supply the name of the package you want to load inside the parentheses.

library(ggformula) #for graphs
library(mosaic) #for statistics
library(tidyverse) #for data management

Revisit Lab 5 Primer

The examples used in the Lab 6 Primer are continuations from the Lab 5 Primer. We encourage you to go back and review your previous answers and code to help you with your lab.

2 Sex Bias in Professor Ratings

Sex bias stems from a perceived mismatch with an expected role or characteristics based on sex. Studies have shown that both men and women hold unconscious sex biases against women in traditionally male-dominated fields (such as the sciences) or with respect to characteristics (such as leadership qualities). These biases often cause equally qualified women to be seen as less likable or less qualified than men. (These links are to descriptions of two well-known studies, but there are plenty of other good resources).

Researchers are interested in whether this sex bias also exists in traditionally female-dominated jobs, such as teaching. Students are asked to watch a video of an animated classroom and rate the professor. Each student is randomly assigned to one of two animations; the videos are exactly the same except for the sex of the professor drawn. You have been asked to analyze the data for the researchers to determine whether the female-identifying professor is rated lower, on a 1 to 7 scale (with 7 being the best), than the male-identifying professor.

Run the following code chunk to read in the data and view the variable names and first 6 rows of the data.
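A minimal sketch of the inspection step, assuming the data have already been read into a data frame. The data frame bias_toy below is a made-up stand-in (not the study's data) so the sketch runs on its own; with the lab data you would call the same functions on bias.

```r
# Made-up illustrative data; NOT the ratings collected in the study
bias_toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 5),
  Rating = c(3, 4, 2, 5, 4, 5, 6, 4, 5, 6)
)

names(bias_toy)  # variable names
head(bias_toy)   # first 6 rows
```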

2.2 Exploratory Data Analysis

Recall that in the Lab 5 Primer, we calculated the following summary statistics and data visualizations.

2.2.1 Summary Statistics

df_stats(Rating ~ Sex, data = bias)

  response    Sex min   Q1 median   Q3 max     mean       sd  n missing
1   Rating Female   0 2.25      4 5.00   6 3.647059 1.554706 34       0
2   Rating   Male   3 4.00      5 5.75   7 4.764706 1.016793 34       0

2.2.2 Data Visualization

gf_boxplot(Rating ~ Sex, data = bias, 
        ylab = "Rating of Professor in Video (Scale 1-7)", 
        xlab = "Sex of the Professor in Video")

2.2.3 QQ Plot

gf_qq(~Rating | Sex, data = bias,
      xlab = "Theoretical Z-Scores",
      ylab = "Rating of Professor in Video") |> 
  gf_qqline()

Based on the provided information, do we meet the necessary conditions to conduct inference using the t-distribution (e.g. confidence interval, hypothesis test)?

Remember to check the conditions of sufficient sample size and normality together.
The sample size needed depends on whether our sample suggests the population is normally distributed (now evaluated using the QQ plot).
Provide a statement, based on the condition check, about whether we can or cannot use the t-distribution as a model for the null distribution or sampling distribution of our test/sample statistic.

2.3 Calculating the Test Statistic and P-Value

We will practice code for both a “by-hand” calculation and t.test() (which is what we will use in general).

For the “by-hand” calculation, we will need to split the dataset into two parts, one dataset for the male professor video ratings and one for the female professor video ratings.

We can use the function filter() to extract the rows associated with a specific variable value. Notice that we use double equal signs == to indicate equivalence with a particular value, and since our variable is categorical (character), we put the value in quotes.
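A minimal sketch of the split, assuming filter() comes from dplyr (part of tidyverse). The data frame bias_toy and the object names toy_female and toy_male are hypothetical and the values are made up; with the lab data you would filter bias instead.

```r
library(dplyr)  # provides filter(); also loaded as part of tidyverse

# Made-up illustrative data; NOT the ratings collected in the study
bias_toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 5),
  Rating = c(3, 4, 2, 5, 4, 5, 6, 4, 5, 6)
)

# == tests for equality; the value is quoted because Sex is a character variable
toy_female <- bias_toy |> filter(Sex == "Female")
toy_male   <- bias_toy |> filter(Sex == "Male")

nrow(toy_female)  # number of rows kept for the female professor video
nrow(toy_male)    # number of rows kept for the male professor video
```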

Calculate the summary statistics for each sample. Modify the code below to calculate the summary statistics for each group.

Here are the necessary summary statistics for the Female professor video Ratings. We will save them for later use. To get them to both save and print, we can add parentheses around each statement.
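A minimal sketch of the save-and-print pattern, using base R functions on a made-up vector of ratings (ratings_toy_f is hypothetical and is not the study's data). The object name n_f is an assumption chosen to match the mean_f and var_f naming used in this primer.

```r
# Made-up illustrative ratings; NOT the ratings collected in the study
ratings_toy_f <- c(3, 4, 2, 5, 4)

# Wrapping an assignment in parentheses saves the object AND prints its value
(mean_f <- mean(ratings_toy_f))    # sample mean
(var_f  <- var(ratings_toy_f))     # sample variance
(n_f    <- length(ratings_toy_f))  # sample size
```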

2.4 Calculating a t-Test Statistic and p-Value

Now that we have the necessary summary statistics saved, we can calculate both our test statistic (t) and our p-value. Recall the t-statistic for an independent two-sample test is

\[t_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

Recall that we are looking at Female Ratings - Male Ratings. Fill in the blanks below using the saved object names (e.g. mean_f, var_m) from above.

You should get \(se = 0.3185897\) and \(t = -3.508107\).
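One way to check these values is to plug the summary statistics printed by df_stats() in section 2.2.1 directly into the formula. This is a sketch: the object names sd_f, sd_m, n_f, n_m, and t_stat are assumptions, and in the lab you would compute the saved objects from the filtered data rather than typing the numbers in.

```r
# Summary statistics copied from the df_stats() output in section 2.2.1
mean_f <- 3.647059; sd_f <- 1.554706; n_f <- 34  # Female professor video
mean_m <- 4.764706; sd_m <- 1.016793; n_m <- 34  # Male professor video

# Variances are the squared standard deviations
var_f <- sd_f^2
var_m <- sd_m^2

# Standard error of the difference in sample means
(se <- sqrt(var_f / n_f + var_m / n_m))

# Test statistic for Female Ratings - Male Ratings
(t_stat <- (mean_f - mean_m) / se)
```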

Now, calculate the p-value using the pt() function. Consider the direction of the alternative hypothesis.
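A hedged sketch of the pt() call. Because the alternative is that the female professor is rated lower (Female Ratings - Male Ratings less than 0), the p-value is the area in the lower tail. The degrees of freedom here are an assumption: this sketch uses the conservative choice of the smaller sample size minus 1, while t.test() uses the Welch approximation by default and will give a slightly different p-value.

```r
t_stat <- -3.508107     # test statistic reported above
df_cons <- min(34, 34) - 1  # ASSUMPTION: conservative df = smaller n minus 1

# Lower-tail area: P(T <= t_stat) under a t-distribution with df_cons degrees of freedom
(p_value <- pt(t_stat, df = df_cons))
```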

Now, let’s calculate the test statistic and p-value using the t.test() function. Consider the direction of the alternative hypothesis. Recall you have three choices for the alternative.

"two.sided"
"greater"
"less"

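A minimal sketch of the t.test() call with a one-sided alternative. It uses the formula interface of base R's t.test(), and the same call should work with mosaic loaded. The data frame bias_toy is made up (not the study's data); with the lab data you would pass data = bias.

```r
# Made-up illustrative data; NOT the ratings collected in the study
bias_toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 5),
  Rating = c(3, 4, 2, 5, 4, 5, 6, 4, 5, 6)
)

# Groups are read alphabetically, so the difference is Female - Male.
# alternative = "less" tests whether that difference is below 0.
toy_test <- t.test(Rating ~ Sex, data = bias_toy, alternative = "less")

toy_test$statistic  # t test statistic
toy_test$p.value    # one-sided p-value
```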
2.4.1 Switching the Direction of the Difference

Ultimately, it is up to the researcher to choose the direction of the calculated difference. R reads our groups alphabetically, so the t.test() code defaults to Female Ratings - Male Ratings (since F comes before M). If we wanted to switch the difference to Male Ratings - Female Ratings, we would have to tell R to change the ordering of the variable's levels using the mutate() function.

Here is the code to reorder the variable levels so "Male" is read first:
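A minimal sketch of one way to reorder the levels, assuming mutate() from dplyr together with base R's factor(). The lab's own code may use a different helper. The data frame bias_toy and the name toy_reorder are hypothetical and the values are made up; with the lab data you would start from bias and save the result as bias_reorder.

```r
library(dplyr)  # provides mutate(); also loaded as part of tidyverse

# Made-up illustrative data; NOT the ratings collected in the study
bias_toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 5),
  Rating = c(3, 4, 2, 5, 4, 5, 6, 4, 5, 6)
)

# Convert Sex to a factor whose first level is "Male"
toy_reorder <- bias_toy |>
  mutate(Sex = factor(Sex, levels = c("Male", "Female")))

levels(toy_reorder$Sex)  # "Male" is now read first
```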

Now, rerun the t.test() code, but using bias_reorder. What do you have to change to make the test equivalent?
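The effect of the reordering can be sketched on made-up data (bias_toy and toy_reorder are hypothetical, not the study's data). Flipping the difference to Male - Female flips the sign of the test statistic, so the one-sided alternative has to flip as well for the two tests to agree.

```r
library(dplyr)  # provides mutate()

# Made-up illustrative data; NOT the ratings collected in the study
bias_toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 5),
  Rating = c(3, 4, 2, 5, 4, 5, 6, 4, 5, 6)
)
toy_reorder <- bias_toy |>
  mutate(Sex = factor(Sex, levels = c("Male", "Female")))

# Female - Male, testing whether the difference is below 0
test_less <- t.test(Rating ~ Sex, data = bias_toy, alternative = "less")

# Male - Female, testing whether the difference is above 0
test_greater <- t.test(Rating ~ Sex, data = toy_reorder, alternative = "greater")

c(test_less$statistic, test_greater$statistic)  # same size, opposite signs
c(test_less$p.value, test_greater$p.value)      # identical p-values
```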

2.6 Calculating a Confidence Interval for the Difference of Two Means

The confidence interval for the difference of two population means takes on the same structure as a confidence interval for one population mean.

\[\text{point estimate} \pm (\text{critical value}) \times (\text{standard error})\]

\[\bar{x}_1 - \bar{x}_2 \ \pm \ t^*{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

We can find our \(t^*\) critical value the same way we found it for previous confidence intervals.
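A minimal sketch of the critical value lookup with qt() for a 95% confidence level. The degrees of freedom are an assumption: this sketch uses the conservative choice of the smaller sample size minus 1 (both groups have n = 34), and the object names are hypothetical.

```r
conf_level <- 0.95
df_cons <- min(34, 34) - 1  # ASSUMPTION: conservative df = smaller n minus 1

# For a 95% interval, 2.5% is left in each tail, so we need the 0.975 quantile
(t_star <- qt(1 - (1 - conf_level) / 2, df = df_cons))
```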

Our point estimate is \(\bar{x}_1 - \bar{x}_2\).

Finally, we can calculate the lower bound (lb) and upper bound (ub) for our confidence interval.
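The by-hand interval can be sketched from the summary statistics printed in section 2.2.1 and the standard error reported earlier. The conservative degrees of freedom (smaller n minus 1) are an assumption, so the bounds will differ slightly from the t.test() output, which uses the Welch approximation. The name point_est is hypothetical.

```r
# Values taken from earlier in this primer
mean_f <- 3.647059  # Female professor video, sample mean
mean_m <- 4.764706  # Male professor video, sample mean
se <- 0.3185897     # standard error of the difference

# ASSUMPTION: conservative df = smaller n minus 1 = 33
t_star <- qt(0.975, df = 33)

# Point estimate for Female Ratings - Male Ratings
point_est <- mean_f - mean_m

# point estimate -/+ (critical value) * (standard error)
(lb <- point_est - t_star * se)
(ub <- point_est + t_star * se)
```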

Of course, the “by-hand” method is tedious when we can just use t.test() to calculate our confidence interval. Fill in the blanks below to calculate the 95% confidence interval for the difference between the two means.
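A minimal sketch of pulling the interval out of t.test(), shown on made-up data (bias_toy is hypothetical, not the study's data). Note that the two-sided interval comes from the default alternative = "two.sided"; with a one-sided alternative, t.test() reports a one-sided bound instead.

```r
# Made-up illustrative data; NOT the ratings collected in the study
bias_toy <- data.frame(
  Sex = rep(c("Female", "Male"), each = 5),
  Rating = c(3, 4, 2, 5, 4, 5, 6, 4, 5, 6)
)

# conf.level sets the confidence level; the alternative is left at its
# two-sided default so that both bounds are finite
toy_ci_test <- t.test(Rating ~ Sex, data = bias_toy, conf.level = 0.95)

# Interval for the difference in means (Female - Male)
(toy_ci <- toy_ci_test$conf.int)
```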

Lab 6: Hypothesis Testing - Two Independent Groups

1 Getting Started

2 Sex Bias in Professor Ratings

2.1 Identify the Parameters

Identify the study design of this study.

Identify the parameter(s) that would be of interest based on the study design.

Identify the null hypothesis that would be of interest based on the study design.

Identify the alternative hypothesis that would be of interest based on the study design.

2.2 Exploratory Data Analysis

2.2.1 Summary Statistics

2.2.2 Data Visualization

2.2.3 QQ Plot

Based on the provided information, do we meet the necessary conditions to conduct inference using the t-distribution (e.g. confidence interval, hypothesis test)?

2.3 Calculating the Test Statistic and P-Value

Calculate the summary statistics for each sample. Modify the code below to calculate the summary statistics for each group.

2.4 Calculating a t-Test Statistic and p-Value

2.4.1 Switching the Direction of the Difference

2.5 Interpreting and Evaluating the p-Value

Reorder to provide the appropriate interpretation of the p-value.

Evaluate the strength of evidence against the null hypothesis, using a significance level of \(\alpha = 0.05\).

Which of the following statements are true based on the p-value?

2.6 Calculating a Confidence Interval for the Difference of Two Means

Provide an interpretation of the confidence interval by reordering the phrases below.

Which of the following would be plausible estimates for the difference between the two true mean ratings?

What can we conclude about the study?