Lab 3: Introduction to Inference Primer
Please note that all images were created with modifications to the defaults to make them digitally accessible. If you run this code in another environment, your plots may have different colors and backgrounds.
1 Getting Set Up
Reminder: Before starting, load the ggformula and mosaic packages we need for statistical and graphical functions using the library() function in the code chunk below. Replace __graphics package__ and __stats package__ with the package names.
library(tidyverse) #data management
library(ggformula)
library(mosaic)
2 Penny Age
Coins are stamped with the year in which they were minted and stay in circulation until they are too worn, at which point they are removed. With the rising cost of raw materials, a penny now costs more to manufacture than it is worth. Furthermore, as the US moves towards a more cashless society, the use of physical coins is declining, making the penny less necessary. On May 22, 2025, the U.S. Treasury Department announced it would end penny production starting in 2026. We are going to estimate the mean age of all circulating U.S. pennies. Rolls of pennies were collected from a bank in Marina, California in Spring 2020. The age of each penny in years, measured relative to 2019, can be found in penny-age-2019.csv.
2.1 Import the Data
Complete the code below to (1) read in the CSV file penny-age-2019.csv and store it as the object penny, and (2) print the variable names. Replace the underscored sections of the code with the data file name and the function name.
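A minimal sketch of the completed import chunk, assuming read_csv() from the readr package (loaded as part of the tidyverse) is the intended function and that penny-age-2019.csv sits in the current working directory:

```r
library(readr)  # loaded automatically by library(tidyverse); provides read_csv()

# Assumption: the file is in the working directory; adjust the path if it is elsewhere
penny <- read_csv("penny-age-2019.csv")
```

By default read_csv() prints a message describing the column types it guessed for the file.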
You can modify the code above to hide the output if you want, by adding show_col_types = FALSE to the function.
names(penny)
2.2 Visualize the Data
Use gf_histogram() to create a histogram of the sample distribution, labeling the axes appropriately. Replace the underscored components, using descriptive labels such as "Age in Years of US Pennies from Marina, CA Bank (Sp 2020)" for the x-axis and "Number of Pennies" for the y-axis. In addition:
- Try a bin width of 5 years
- Set the lower bound of the bars to 0 so it doesn’t look like we have negative ages.
gf_histogram(~ age, data = penny,
xlab = "Age in Years of US Pennies from Marina, CA Bank (Sp 2020)",
ylab = "Number of Pennies",
binwidth = 5,
boundary = 0,
color = "black")
2.2.1 Describe the shape of the distribution of penny ages in our sample.
2.3 Calculate Summary Statistics
Use df_stats() to calculate the descriptive statistics. Replace the underscored components.
df_stats(~ age, data = penny)
2.4 Check Your Understanding
2.4.1 What do the following symbols represent? Arrange the descriptions in order.
- \(\mu\)
- \(\bar{x}\)
- \(s\)
- \(n\)
2.4.2 Identify the population of interest.
2.5 Construct a Confidence Interval
Let’s use a model to find a 95% confidence interval.
- Go to: Confidence Interval Applet
- Change the scenario to ‘one mean’
- Enter your sample’s \(n\), \(\bar{x}\), and \(s\) into the fields, then click ‘Calculate.’
- Click “Confidence Interval”.
- Enter the correct confidence level.
- Click ‘Calculate.’
- The confidence interval and degrees of freedom are displayed.
Record the confidence interval and degrees of freedom that are returned. Keep all 4 decimal places in your answers.
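The applet's result can be cross-checked by hand. The sketch below assumes the applet uses the standard one-sample t-interval, \(\bar{x} \pm t^{*} \cdot s/\sqrt{n}\) with \(n - 1\) degrees of freedom. The values of n, xbar, and s are hypothetical placeholders, not the lab's data; substitute the values from your df_stats() output.

```r
# Hypothetical summary statistics, for illustration only (NOT the penny data)
n    <- 50    # sample size
xbar <- 12.0  # sample mean
s    <- 8.0   # sample standard deviation

conf_level <- 0.95

# Degrees of freedom for a one-sample t-interval
df <- n - 1

# Critical value t*: cuts off (1 - conf_level) / 2 in each tail
t_star <- qt(1 - (1 - conf_level) / 2, df = df)

# Standard error of the sample mean
se <- s / sqrt(n)

# Confidence interval: xbar plus or minus the margin of error
ci <- c(lower = xbar - t_star * se, upper = xbar + t_star * se)

round(ci, 4)  # interval endpoints to 4 decimal places
df            # degrees of freedom
```

If the interval computed this way from your own statistics differs from the applet's beyond rounding, check that the sample standard deviation, not the variance, was entered for \(s\).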