Lab 6: Hypothesis Testing - Two Independent Groups

Please note that all images were created with modifications to the defaults to make them digitally accessible. If you recreate this code in another environment, your plots have different colors and backgrounds.

1 Getting Started

Be sure to load the packages ggformula and mosaic, using the library() function. Remember, you need to do this with each new Quarto document or R Session. Add the package names in each of the blanks below to load in the indicated packages.

library() loads in packages. You need to supply the package name you need to load inside the parentheses.

library(ggformula) #for graphs library(mosaic) #for statistics library(tidyverse) #for data management
library(ggformula) #for graphs
library(mosaic) #for statistics
library(tidyverse) #for data management
Revisit Lab 5 Primer

The examples used in the Lab 6 Primer are continuations from the Lab 5 Primer. We encourage you to go back and review your previous answers and code to help you with your lab.

2 Sex Bias in Professor Ratings

Sex bias stems from a perceived mismatch from an expected role or characteristics based on sex. Studies have shown that men and women have unconscious sex biases against women in traditionally male-dominated fields (such as the sciences) or characteristics (such as leadership qualities). These biases often cause equally qualified women to be seen as less likable or less qualified than the men. (These links are to descriptions of two well-known studies, but there are plenty of other good resources).

Researchers are interested if this sex bias exists in traditionally female-dominated jobs as well, such as teaching. Students are asked to watch a video of an animated classroom and rate the professor. Each student is randomly assigned to either of two animations; the videos are exactly the same except for the sex of the professor drawn. You have been asked to analyze the data for the researchers to determine if the female-identifying professor is rated more poorly, on a 1 to 7 scale (with 7 being the best), than the male-identifying professor.

Run the following code chunk to read in the data and view the variable names and first 6 rows of the data.

2.1 Identify the Parameters

Identify the study design of this study.

Be sure you are able to provide a full justification. This is review from the Lab 5 Primer.




Identify the parameter(s) that would be of interest based on the study design.







Identify the null hypothesis that would be of interest based on the study design.

The researchers want to determine if the female-identifying professor is rated more poorly, on a 1 to 7 scale (with 7 being the best), than the male-identifying professor. (Female - Male)










Identify the alternative hypothesis that would be of interest based on the study design.

The researchers want to determine if the female-identifying professor is rated more poorly, on a 1 to 7 scale (with 7 being the best), than the male-identifying professor.










2.2 Exploratory Data Analysis

Recall from the Lab 5 Primer, we calculated the following summary statistics and data visualizations.

2.2.1 Summary Statistics

df_stats(Rating ~ Sex, data = bias) 
  response    Sex min   Q1 median   Q3 max     mean       sd  n missing
1   Rating Female   0 2.25      4 5.00   6 3.647059 1.554706 34       0
2   Rating   Male   3 4.00      5 5.75   7 4.764706 1.016793 34       0

2.2.2 Data Visualization

gf_boxplot(Rating ~ Sex, data = bias, 
        ylab = "Rating of Professor in Video (Scale 1-7)", 
        xlab = "Sex of the Professor in Video") 

2.2.3 QQ Plot

gf_qq(~Rating | Sex, data = bias,
      xlab = "Theoretical Z-Scores",
      ylab = "Rating of Professor in Video") |> 
  gf_qqline()

Based on the provided information, do we meet the necessary conditions to conduct inference using the t-distribution (e.g. confidence interval, hypothesis test)?
  • Remember to check the conditions of sufficient sample size and normality together.
  • The sufficient sample size depends on whether our sample indicates the population may or may not be normality distributed (now evaluated using the QQ Plot).
  • Provide a statement, based on the condition check, to determine if we can or cannot use the t-distribution as a model for null distribution or sampling distribution of our test/sample statistic.

2.3 Calculating the Test Statistic and P-Value

We will practice code for both a “by-hand” calculation and using t.test() (which is what we will be using from in general).

For the “by-hand” calculation, we will need to split the dataset into two parts, one dataset for the male professor video ratings and one for the female professor video ratings.

We can use the function filter() to extract out specific rows associated with a specific variable value. Notice that we use double equal signs == to indicate equivalence with a particular value, and since our variable is a categorical (character), we put the value in quotes.

Calculate the summary statistics for each sample. Modify the code below to calculate the summary statistics for each group.

Here are the necessary summary statistics for the Female professor video Ratings. We will save them for later use. To get them to both save and print, we can add parentheses around each statement

2.4 Calculating a t-Test Statistic and p-Value

Now that we have the necessary summary statistics saved, we can calculate both our test statistic (t) and our p-value. Recall the t-statistic for an independent two-sample test is

\[t_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

Recall that we are looking at Female Ratings - Male Ratings. Fill in the blanks below using the saved object names (e.g. mean_f, var_m) from above.

You should get \(se = 0.3185897\) and \(t = -3.508107\).

Now, calculate the p-value using the pt() function. Consider the direction of the alternative hypothesis.

pt(t, df = n_f - 1)
pt(t, df = n_f - 1)

Now, let’s calculate the test statistic and p-value using the t.test() function. Consider the direction of the alternative hypothesis. Recall you have three choices for the alternative.

  • "two.sided"
  • "greater"
  • "less"
t.test(Rating ~ Sex, data = bias, mu = 0, alternative = "less")
t.test(Rating ~ Sex, data = bias, mu = 0, alternative = "less")

2.4.1 Switching the Direction of the Difference

Ultimately, it is up to the researcher to choose the direction of the calculated difference. If we wanted to switch our difference and have Male Ratings - Female Ratings, we would have to tell R to change the ordering of our variable using the mutate() function, since R defaults to reading our groups alphabetically and in the t.test() code would default to Female Ratings - Male Ratings (since F comes before M).

Here is the code to reorder the variable levels so "Male" is read first:

Now, rerun the t.test() code, but using bias_reorder. What do you have to change to make the test equivalent?

t.test(Rating ~ Sex, data = bias_reorder, mu = 0, alternative = "greater")
t.test(Rating ~ Sex, data = bias_reorder, mu = 0, alternative = "greater")

2.5 Interpreting and Evaluating the p-Value

Using the calculate p-value from the t.test() function to answer the following questions.

Reorder to provide the appropriate interpretation of the p-value.
of t = -3.5081
that there is no difference between the true mean ratings for male and female professor videos
or less
0.0004447
of observing our test statistic
There is a probability of
assuming the null hypothesis is true


Evaluate the strength of evidence against the null hypothesis, using a significance level of \(\alpha = 0.05\)






Remember we have specific details to include in a full evaluation of the strength of evidence.

We have {very strong/strong/moderate/some/little} evidence against the null hypothesis (in favor of the alternative hypothesis) that {context of indicated hypothesis} (t = {xxx}, df = {xxx}, p-value = {xxx}).


Which of the following statements are true based on the p-value?






2.6 Calculating a Confidence Interval for the Difference of Two Means

In order to calculate the confidence interval for the difference of two population means, it takes on the same structure as a confidence interval for one population mean.

\[point \ estimate \pm (critical \ value)*(standard\ error)\]

or

\[\bar{x}_1 - \bar{x}_2 \ \pm \ t^*{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

We can find our \(t^*\) critical value the same way we found it for previous confidence intervals.

Our point estimate is \(\bar{x}_1 - \bar{x}_2\)

and finally we can calculate the lower bound (lb) and upper bound (ub) for our confidence interval.

Of course, the “by-hand” method is tedious when we can just use t.test() to calculate our confidence interval. Fill in the blanks below to calculate the 95% confidence interval for the difference between the two means.

t.test(Rating ~ Sex, data = bias, conf.level = 0.95)$conf.int
t.test(Rating ~ Sex, data = bias, conf.level = 0.95)$conf.int


Provide an interpretation of the confidence interval by reordering the phrases below.
LB and UB
and
is a single value
the true mean rating of female professor videos
Based on the sample
the difference between
within the interval
the true mean rating of male professor videos
we are 95% confident that


Which of the following would be plausible estimates for the difference between two true mean ratings?










What can we conclude about the study?