Lab 2: Introduction to Quarto Primer
Please note that all images were created with modifications to the defaults to make them digitally accessible. If you recreate this code in another environment, your plots may have different colors and backgrounds.
Introduction
This Primer will introduce you to using Markdown to create documents. The good news is: you’ve already been doing this! The .qmd files you are using are Quarto documents that are written with markdown and you’ve been editing them since week 1. We’re just going to make some of what you’ve already learned more explicit.
Part 1: Introduction to Markdown
Markdown is a way to encode formatting on text in an unobtrusive way (compared to, say, html). Quarto documents use the Markdown language to code the text in the document, and allow you to embed code chunks that execute R code. Your finished .qmd document can be ‘rendered’ to other document types, and the markdown language converts to specific formatting.
Check out the markdown and rendered text side-by-side:
When you do so, you will see this Markdown document converted to an HTML document and **this phrase is now bold**, *this phrase is italicized*, and `this phrase is written to denote R code`. To do the same in your own documents, you can use the same asterisks (*) or back ticks (`) as shown in the unrendered document.
When you do so, you will see this Markdown document converted to an HTML document and this phrase is now bold, this phrase is italicized, and this phrase is written to denote R code. To do the same in your own documents, you can use the same asterisks (*) or back ticks (`) as shown in the unrendered document.
We are going to learn some of the basic Quarto editing techniques.
1.0 Overview of Quarto
Before we continue, check out this Tutorial Hello Quarto (25 min). Remember, you will want to take notes as you move through the tutorials. You can also read more via this optional introduction to Quarto (30 min, all of Chapter 28).
The biggest thing to realize is that a Quarto document contains three types of content:
- A YAML header surrounded by `---`s at the top.
- Chunks of R code surrounded by ```{r} and ```.
- Text mixed with simple text formatting like `# heading` and `_italics_`.
For our class (STAT 250), you will generally only be editing the Chunks and the Text.
1.1. Basic Markdown
Markdown can format a document just like you format your Word documents in other classes, but we’ll just cover some of the basics here. You’ve already learned italics, bold, and `code` displays. Markdown can make lists as well, using a variety of encodings (see the online tutorial). You’ve probably also noticed that when you format your text, its display color changes to help you see the formatting more easily.
| Unrendered | Rendered |
|:-----------|:---------|
| `*italics*` or `_italics_` | italics or italics |
| `**bold**` | bold |
| `` `code` `` | code |
| `subscript~2~` | subscript₂ |
| `superscript^2^` | superscript² |
| `> Text block for answers` | Text block for answers |
| `# Header Size 1` | Header Size 1 |
| `## Header Size 2` | Header Size 2 |
| `### Header Size 3` | Header Size 3 |
| `Markdown Comments Will Not Render: <!--- this comment won't show up rendered --->` | Markdown Comments Will Not Render: |
One last thing of note: markdown works best with “space to breathe.” If something isn’t rendering correctly, it often helps to make sure there is a blank line between the text you are trying to format and the text around it.
For example:
> Although this is green unrendered, it doesn't appear with the formatting we want. Try adding an "enter" between the "For example:" and the start of this line and render again. Did it fix it?
You'll also notice that if there are no line breaks, this formatting doesn't end, so you don't need to put a ">" before each line answer.
For example: > Although this is green unrendered, it doesn’t appear with the formatting we want. Try adding an “enter” between the “For example:” and the start of this line and render again. Did it fix it? You’ll also notice that if there are no line breaks, this formatting doesn’t end, so you don’t need to put a “>” before each line answer.
Notice what happens when we add spacing:
For example:

> Although this is green unrendered, it doesn't appear with the formatting we want. Try adding an "enter" between the "For example:" and the start of this line and render again. Did it fix it?
You'll also notice that if there are no line breaks, this formatting doesn't end, so you don't need to put a ">" before each line answer.
For example:
Although this is green unrendered, it doesn’t appear with the formatting we want. Try adding an “enter” between the “For example:” and the start of this line and render again. Did it fix it? You’ll also notice that if there are no line breaks, this formatting doesn’t end, so you don’t need to put a “>” before each line answer.
1.2. Creating Tables
The easiest way to make a table in markdown is using this format:
Unrendered
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
| 12 | 12 | 12 | 12 |
| 123 | 123 | 123 | 123 |
| 1 | 1 | 1 | 1 |
Rendered
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|    12 | 12   | 12      |   12   |
|   123 | 123  | 123     |  123   |
|     1 | 1    | 1       |   1    |
The `:` denotes where the cell values align; the default is fine on labs. You might also notice that it doesn’t matter whether the `|` characters line up when we edit a table; it looks the same once we render:
Unrendered
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
| 12 | 124 | 12| 12 |
| 123 | 123 | 123 | 123 |
| 1 | 176 | 1 | 1 |
Rendered
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|    12 | 124  | 12      |   12   |
|   123 | 123  | 123     |  123   |
|     1 | 176  | 1       |   1    |
That said, it is easier to read in the .qmd file if you add spaces and dashes as needed so that they all do line up.
1.3. Mathematics
We often want to write small mathematics equations or write mathematical symbols. This is actually much easier in .qmd than in Word.
We surround an equation with `$` signs: `$e=mc^2$`, which will appear as \(e=mc^2\) in the rendered document.
Here are some examples of the type of mathematics you will use within your Quarto documents.
Greek Letters
We often want to use statistical symbols within markdown to denote parameters with Greek letters. They are called by using a backslash (`\`) followed by the full name of the Greek letter, such as `\alpha`, and are surrounded by `$` so that they are read in equation mode:
| Unrendered | Rendered |
|:-----------|:---------|
| `$\sigma$` | \(\sigma\) |
| `$\alpha$` | \(\alpha\) |
| `$\mu$` | \(\mu\) |
| `$\beta$` | \(\beta\) |
Superscripts and Subscripts
We can include sub- and superscripts to make our parameters and statistics specific. Note: once the equation is surrounded by `$`s, we don’t need the markdown syntax for subscripts and superscripts shown above. In math mode, you might see:
| Unrendered | Rendered |
|:-----------|:---------|
| `$x^2$` | \(x^2\) |
| `$H_0$` | \(H_0\) |
| `$\mu_1$` | \(\mu_1\) |
We might want to make our subscript a longer phrase in statistics, to help denote our population of interest when we have more than one. We surround whatever we want to stay in a subscript with `{}`, such as:
| Unrendered | Rendered |
|:-----------|:---------|
| `$\mu_{female}$` | \(\mu_{female}\) |
| `$\mu_{male}$` | \(\mu_{male}\) |
Symbols over Letters
We can call our special statistical symbols in a similar way, such as `\bar{x}`. In this case, the `bar` symbol will be applied over the `x` to make our statistical symbol for the sample mean: \(\bar{x}\).
| Unrendered | Rendered |
|:-----------|:---------|
| `$\bar{x}$` | \(\bar{x}\) |
| `$\hat{p}$` | \(\hat{p}\) |
| `$\hat{\beta}_1$` | \(\hat{\beta}_1\) |
Equations
Finally, we can write little equations for our outputs, like \(\bar{x} = 3.54\).
| Unrendered | Rendered |
|:-----------|:---------|
| `$\bar{x} = 3.54$` | \(\bar{x} = 3.54\) |
| `$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$` | \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\) |
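The pieces above (Greek letters, subscripts in `{}`, and `$` delimiters) can be combined in one expression. As an illustrative sketch only, a pair of hypotheses comparing two population means might be typed like this; the female/male groups are borrowed from the subscript example, and the hypotheses themselves are made up for demonstration rather than taken from a lab:

```latex
$H_0: \mu_{female} = \mu_{male}$
$H_A: \mu_{female} \neq \mu_{male}$
```

Here `\neq` is the standard LaTeX command for the "not equal" sign.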
1.4. Code Chunks
You’ve learned how to insert code chunks using the shortcut Ctrl-Alt-I (or Cmd-Alt-I) or by clicking the green Insert +C button above, but you can also just type the two sets of characters that surround code chunks: ```{r} at the start and ``` at the end. Note that those are back-ticks, found above the Tab key, not single quotation marks.
We can add labels with `#| label:` to support easy navigation as well, but be careful: DO NOT REPEAT LABELS, or this will cause an error when you render the .qmd file.
```{r}
#| label: simple-addition
1 + 1
```
[1] 2
At the beginning of each Quarto document there should be a special code chunk called the setup chunk. This is where we load the packages used in the rest of the document. We often use `#| include: false` as an option in the setup code chunk to hide the output in the rendered document.
```{r}
#| label: setup
library(tidyverse)
library(mosaic)
library(ggformula)
```
1.5 In-line code
One of the neat features of Quarto files is the ability to combine code chunks and text, and to automatically include (or not, though we always will in this class) code and outputs in the finished document. We’ve been writing code in the code chunks, but you can also write it directly in line with the text using some specific formatting.
Let’s demonstrate with our small `study` dataset. First, we need to read in the data as an `object`.
```{r}
#| label: read-study-data
study <- read_csv("study.csv", show_col_types = FALSE) # stops some extra output
```
Then we could calculate the mean:
```{r}
#| label: mean-wk1
mean(~wk1, data = study)
```
[1] 1.142857
By running the code as above, we calculated the mean but didn’t save it as an `object`; it just prints to the screen and is forgotten by R. If we save the output as an `object`, we can use it within the Quarto file. When you save an output to an `object` name, it will not print the value to the screen unless you tell R to provide the value.
```{r}
#| label: mean-wk1-assigned
mwk1 <- mean(~wk1, data = study) # calculates mean and saves it as mwk1
mwk1 # prints to screen the value saved to the object name
```
[1] 1.142857
We can use a named object to display the value stored within it in-line with our text by typing `r` followed by the object name, all inside a pair of back-ticks. We already know that paired back-ticks mark text to be printed in code font. By including the `r` at the front, we tell Quarto that what follows is actual code to be run.
For example:
The mean number of hours studied in week 1 is `r mwk1` hours.
The mean number of hours studied in week 1 is 1.1428571 hours.
Using a stored object name is useful if you are going to use the value often (to cut down on typing), but if you only need to use the value a small number of times, you can also write the code directly in line. For example:
The maximum number of hours studied in week two is `r max(~wk2, data = study)` hours.
The maximum number of hours studied in week two is 5 hours.
Notice that we never ran that code in any code chunk. It was run and displayed completely in-line.
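In-line code prints a stored value with many decimal places, as the week 1 mean shows. One possible way to tidy this up is to wrap the value in base R's `round()`, which would be written in line as `r round(mwk1, 2)`. The sketch below shows the same call inside a chunk; the value assigned to `mwk1` is a stand-in chosen to match the mean printed earlier, not the real `study` data.

```r
# Stand-in for the object saved earlier (hypothetical value; in the lab,
# mwk1 would come from mean(~wk1, data = study))
mwk1 <- 8 / 7

# round(x, digits) keeps `digits` decimal places
mwk1_rounded <- round(mwk1, 2)
mwk1_rounded
```

Two decimal places is only an example; choose whatever precision suits the context of your data.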
1.6. Checking code & Rendering your Document
You can run specific code chunks to check their output by pressing the green play button to run the specific code chunk, or the icon next to it to run all code chunks prior to that code chunk.
Viewing Your Rendered Document
If you want to check your Markdown editing, you can render the document and view it in the ‘Viewer’ Pane by pressing the ‘Render’ button at the top of the Quarto document pane. Make sure the setting (gear icon above) is set to ‘Preview in Viewer pane’.
Changing the Document Type
You can change the document type you render to by changing the output type in the YAML header from `format: html` to `format: docx` and pressing the ‘Render’ button. Your Word document will be created in the project folder! (In this class, we will always use html documents; please don’t edit the YAML header of your labs; we are showing you this simply for future reference.)
---
title: "My report"
format: html
execute:
  echo: true
  error: false
---
Errors in your Code
If there is an error in your file, it will not render. When .qmd files render, they run in a completely clean environment. So if you loaded data or created something else without code in your .qmd file (for example, by typing in the Console), the rendering process will produce an error.
If you cannot figure out the error, you can generally still get your document to render by changing `error: false` to `error: true` in the YAML code at the top.
---
title: "My report"
format: html
execute:
  echo: true
  error: true
---
This will print out any errors in your code chunks, but still render the document.
Errors in your Markdown
The most common errors in your markdown that can cause issues are deleting back ticks (`) around your code chunks or dashes (-) from around the YAML code, so always check that first.
Part 2: R Code: Exploratory Data Analysis for Single Numeric Variables
Here is a review of some of the code you will use in the rest of Lab 2.
Remember to run the setup chunk first and check that all necessary packages are loaded.
2.1 Reading in Data
Next, we must be sure to read in the data AND assign it to an `object` name so that we can call it later in other functions.
Read the output and add the argument to the function above to “quiet this message”.
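As a self-contained sketch of this read-and-assign pattern, the chunk below writes a tiny made-up CSV to a temporary file and reads it back with `read_csv()`. The file, the column names, and the object name `demo_data` are all hypothetical stand-ins for the lab's data; `show_col_types = FALSE` is the same argument used with the `study` data earlier.

```r
library(readr)

# Hypothetical stand-in for a course data file: a tiny CSV in a temporary location
csv_path <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:3, wk1 = c(0, 1, 2)), csv_path)

# Read the file AND assign it to an object name;
# show_col_types = FALSE quiets the column-specification message
demo_data <- read_csv(csv_path, show_col_types = FALSE)

# Because the data were assigned to demo_data, later functions can use them
nrow(demo_data)
```

Without the assignment (`demo_data <-`), the data would only print to the screen and could not be used in later chunks.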
2.2 Histograms
Histograms are a great way to explore the distribution (shape, center, spread, outliers) of a variable. Here is the basic structure for a histogram of body mass of the penguins. Add the following arguments:
- an x-axis label that says `"Body Mass (g) of Three Penguin Species in the Palmer Archipelago"`
- a y-axis label that says `"Number of Penguins"`
- a bin width of 200
- the argument `color = "black"` to outline the bars
gf_histogram(~ body_mass_g,
             data = penguins,
             xlab = "Body Mass (g) of Three Penguin Species in the Palmer Archipelago",
             ylab = "Number of Penguins",
             binwidth = 200,
             color = "black")
2.3 Boxplots
Boxplots are a great way to explore the distribution (shape, center, spread, outliers) of a variable, though they do not give us information about modality. Here is the basic structure for a boxplot of body mass of the penguins. Add the following argument:
- an x-axis label that says `"Body Mass (g) of Three Penguin Species in the Palmer Archipelago"`
gf_boxplot(~ body_mass_g,
           data = penguins,
           xlab = "Body Mass (g) of Three Penguin Species in the Palmer Archipelago")
If you want to make a boxplot that compares a variable across groups, you can use either of these formula structures:
numeric_variable ~ group_variable
group_variable ~ numeric_variable
Try switching the order of the two variables below and see what changes.
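A minimal sketch of the two formula orders, assuming the `penguins` data frame comes from the palmerpenguins package (your lab may load it differently) and using `species` as the grouping variable. The object names and axis labels are illustrative choices only.

```r
library(ggformula)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# numeric_variable ~ group_variable: body mass on the y-axis, species on the x-axis
box_by_species <- gf_boxplot(body_mass_g ~ species,
                             data = penguins,
                             xlab = "Species",
                             ylab = "Body Mass (g)")

# group_variable ~ numeric_variable: the same variables with the axes swapped
box_swapped <- gf_boxplot(species ~ body_mass_g,
                          data = penguins,
                          xlab = "Body Mass (g)",
                          ylab = "Species")

box_by_species
box_swapped
```

R may warn that rows with missing body mass were removed; the plots are still drawn from the remaining penguins.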
2.4 Summary Statistics
There are a lot of functions we learned for summary statistics, including:
mean()
median()
sd()
var()
IQR()
max()
min()
Remember to include `na.rm = TRUE` if you have missing data to avoid getting `NA` as your answer.
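To see why `na.rm = TRUE` matters, here is a small sketch using a made-up vector (not lab data) that contains one missing value:

```r
# Hypothetical measurements with one missing value
x <- c(2, 4, NA, 6)

mean_with_na <- mean(x)                   # NA, because one value is missing
mean_without_na <- mean(x, na.rm = TRUE)  # 4, the mean of the three observed values

mean_with_na
mean_without_na
```

The same argument works with the other summary functions listed above, such as `median()` and `sd()`.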
Calculate the mean for body mass for penguins.
mean(~ body_mass_g, data = penguins, na.rm = TRUE)
You can also calculate the mean, or other statistics, split across multiple groups, such as species. Try replacing `sd()` with other functions.
Remember to add `na.rm = TRUE` if you get `NA` as an output.
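A sketch of a grouped calculation using the mosaic formula interface, again assuming `penguins` comes from the palmerpenguins package. The formula `body_mass_g ~ species` asks for the statistic separately within each species; the object names are hypothetical.

```r
library(mosaic)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# Standard deviation of body mass within each species
sd_by_species <- sd(body_mass_g ~ species, data = penguins, na.rm = TRUE)
sd_by_species

# Replacing sd() with another function keeps the same structure
median_by_species <- median(body_mass_g ~ species, data = penguins, na.rm = TRUE)
median_by_species
```

Each result is a named vector with one value per species.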
There are also some ways to generate multiple different statistics at once. Try the following two pieces of code and note what information they provide.
What information does `df_stats` provide?
What information does `quantile()` provide?
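The two calls might look like the sketch below. It assumes `penguins` comes from the palmerpenguins package and that `df_stats()` is available after loading mosaic; the object names are illustrative, and the exact calls in your lab file may differ.

```r
library(mosaic)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# Several summary statistics for body mass, returned as a data frame
stats_table <- df_stats(~ body_mass_g, data = penguins)
stats_table

# Quantiles of body mass, with missing values removed
mass_quantiles <- quantile(~ body_mass_g, data = penguins, na.rm = TRUE)
mass_quantiles
```

Run each line and compare the column or element names in the two outputs to answer the questions above.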
Final Reminders
Remember, all the code needed for the labs can be found in your course notebook, through the chapter, in the “How to do it in R” section at the end of each chapter, or in the appendix of the notebook.
Please see Canvas for information on the Math/Stat Cafe, the CLC Supplemental Instruction and Drop-in Hours, and your Instructor’s Student Hours (Office Hours) if you need any additional help.