Lab 3: Introduction to Inference Primer

Digital Accessibility

Please note that all images were created with modifications to the defaults to make them digitally accessible. If you recreate this code in another environment, your plots may have different colors and backgrounds.

1 Getting Set Up

Reminder: Before starting, load the ggformula and mosaic packages we need for statistical and graphical functions using the library() function in the code chunk below. Replace __graphics package__ and __stats package__ with the package names.
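As a minimal sketch of the filled-in chunk, assuming both packages are already installed, the two library() calls would look like this:

```r
# Load the graphics package (gf_histogram() and other gf_ plotting functions)
library(ggformula)

# Load the stats package (df_stats() and other summary functions)
library(mosaic)
```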

2 Penny Age

Coins are stamped with the year in which they were minted and stay in circulation until they are too worn, at which point they are removed. With the rising cost of raw materials, a penny now costs more to manufacture than it is worth. Furthermore, as the US moves toward a more cashless society, the use of physical coins is declining, making the penny less necessary. On May 22, 2025, the U.S. Treasury Department announced it would end penny production starting in 2026. We are going to estimate the mean age of all circulating U.S. pennies. Rolls of pennies were collected from a bank in Marina, California in Spring 2020. The age of each penny (in years, measured relative to 2019) can be found in penny-age-2019.csv.

2.1 Import the Data

Complete the code below to (1) read in the csv file penny-age-2019.csv and store it as the object penny, and (2) print the variable names. Replace the underscored sections of the code with the data file name and the function name.

If you want to hide the column-type message, you can modify the code above by adding show_col_types = FALSE to the file-reading function.
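The import step can be sketched as follows. This is an illustration under assumptions: the source does not name the reading function, so readr's read_csv() is assumed here because it accepts show_col_types. The temporary file and its values are made up so the sketch runs without the lab's data; in the lab, the file name would be penny-age-2019.csv and the object would be penny.

```r
# Assumption: the reading function is readr::read_csv(), which the source does not name
library(readr)

# Made-up stand-in for penny-age-2019.csv so the sketch runs anywhere
demo_file <- tempfile(fileext = ".csv")
writeLines(c("age", "3", "12", "0", "27", "8"), demo_file)

# (1) Read the csv file and store it as an object, hiding the column-type message
penny_demo <- read_csv(demo_file, show_col_types = FALSE)

# (2) Print the variable names
names(penny_demo)
```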

2.2 Visualize the Data

Use gf_histogram() to create a histogram of the sample distribution, being sure to label the axes appropriately. Replace the underscored components as follows:

x-axis label: "Age in Years of US Pennies from Marina, CA Bank (Sp 2020)"
y-axis label: "Number of Pennies"
Try a bin width of 5 years.
Set the lower bound of the bars to 0 so it doesn’t look like we have negative ages.

gf_histogram(~ age, data = penny,
             xlab = "Age in Years of US Pennies from Marina, CA Bank (Sp 2020)",
             ylab = "Number of Pennies",
             binwidth = 5,
             boundary = 0,
             color = "black")

2.3 Calculate Summary Statistics

Use df_stats() to calculate the descriptive statistics. Replace the underscored components.
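A minimal sketch of a df_stats() call, using a small made-up data frame in place of penny. Which statistics the lab's code chunk requests is not shown in the source, so asking for the mean and standard deviation here is an assumption based on the values needed later for the confidence interval.

```r
library(mosaic)  # provides df_stats()

# Made-up ages standing in for the real penny data
penny_demo <- data.frame(age = c(3, 12, 0, 27, 8))

# One-sided formula: summarize the single variable age with mean and sd
stats_demo <- df_stats(~ age, data = penny_demo, mean, sd)
stats_demo

# Sample size, counted directly from the rows of the data frame
n_demo <- nrow(penny_demo)
n_demo
```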

2.4 Check Your Understanding

2.4.1 What do the following symbols represent? Arrange the descriptions in order.

\(\mu\)
\(\bar{x}\)
\(s\)
\(n\)

2.4.2 Identify the population of interest.

2.5 Construct a Confidence Interval

Let’s use a model to find a 95% confidence interval.

Go to: Confidence Interval Applet
Change the scenario to ‘one mean’
Enter your sample’s \(n\), \(\bar{x}\), and \(s\) into the fields, then click ‘Calculate.’
Click “Confidence Interval”.
Enter the correct confidence level.
Click ‘Calculate.’
The confidence interval and degrees of freedom are displayed.

Record the calculated confidence interval and degrees of freedom that are returned. Keep all 4 decimal places in your answers.
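To see what the applet is presumably doing, here is a sketch of the standard one-sample t interval, \(\bar{x} \pm t^* \cdot s/\sqrt{n}\) with \(df = n - 1\). That the applet uses exactly this formula is an assumption, and the ages below are made up, so the numbers will not match your penny results.

```r
# Made-up ages; replace with your own sample to check the applet's output
ages <- c(3, 12, 0, 27, 8)

n    <- length(ages)   # sample size
xbar <- mean(ages)     # sample mean
s    <- sd(ages)       # sample standard deviation
df   <- n - 1          # degrees of freedom for the t-distribution

# Critical value: a 95% interval leaves 2.5% in each tail
t_star <- qt(0.975, df)

# Margin of error and interval bounds
margin <- t_star * s / sqrt(n)
lower  <- xbar - margin
upper  <- xbar + margin
round(c(lower = lower, upper = upper, df = df), 4)

# Cross-check against base R's built-in one-sample t interval
builtin <- t.test(ages, conf.level = 0.95)$conf.int
```

The by-hand bounds and the t.test() bounds should agree, which is a quick way to confirm the values were entered correctly.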

2.5.1 Lower bound:

2.5.2 Upper bound:

2.5.3 df:

2.5.4 Arrange the words to interpret your confidence interval in the context of the question.

2.5.5 What needs to be true in order to generalize to the population of interest?

2.5.6 To use the model (t-distribution) to calculate the confidence interval, one or more conditions must be satisfied. Which condition(s) are they?

2.2.1 Describe the shape of the distribution of penny ages in our sample.
