Lab 2: Introduction to Quarto Primer

Please note that all images were created with modifications to the defaults to make them digitally accessible. If you recreate this code in another environment, your plots may have different colors and backgrounds.

Introduction

This Primer will introduce you to using Markdown to create documents. The good news is: you’ve already been doing this! The .qmd files you are using are Quarto documents that are written with markdown and you’ve been editing them since week 1. We’re just going to make some of what you’ve already learned more explicit.

Part 1: Introduction to Markdown

Markdown is a way to encode formatting on text in an unobtrusive way (compared to, say, HTML). Quarto documents use the Markdown language to format the text in the document, and allow you to embed code chunks that execute R code. Your finished .qmd document can be ‘rendered’ to other document types, with the Markdown converted to the corresponding formatting.

Check out the markdown and rendered text side-by-side:

When you do so, you will see this Markdown document converted to an HTML document and **this phrase is now bold**, *this phrase is italicized*, and `this phrase is written to denote R code`. To do the same in your own documents, you can use the same asterisks (*) or back ticks (`) as shown in the unrendered document. 

When you do so, you will see this Markdown document converted to an HTML document and this phrase is now bold, this phrase is italicized, and this phrase is written to denote R code. To do the same in your own documents, you can use the same asterisks (*) or back ticks (`) as shown in the unrendered document.

We are going to learn some of the basic Quarto editing techniques.

1.0. Overview of Quarto

Before we continue, check out this Tutorial Hello Quarto (25 min). Remember, you will want to take notes as you move through the tutorials. You can also read more via this optional introduction to Quarto (30 min, all of Chapter 28).

The biggest thing to realize is that a Quarto document contains three types of content:

  1. A YAML header surrounded by ---s at the top.
  2. Chunks of R code surrounded by ```{r} and ```.
  3. Text mixed with simple text formatting like # heading and _italics_.

For our class (STAT 250), you will generally only be editing the Chunks and the Text.

1.1. Basic Markdown

Markdown can format a document just like you format your Word documents in other classes, but we’ll just cover some of the basics here. You’ve already learned italics, bold, and code displays. Markdown can make lists as well, using a variety of encodings (see the online tutorial). You’ve probably also noticed that when you format your text, its display color changes to help you see the formatting more easily.

Unrendered

Rendered

*italics*  or _italics_

italics or italics

**bold**

bold

`code`  

code

subscript~2~    

subscript2

superscript^2^  

superscript2

> Text block for answers

Text block for answers


# Header Size 1

Header Size 1

## Header Size 2

Header Size 2


### Header Size 3

Header Size 3

Markdown Comments Will Not Render: <!--- this comment won't show up rendered --->

Markdown Comments Will Not Render:

One last thing of note: Markdown works best with “space to breathe.” If something isn’t rendering correctly, it often helps to make sure there is a blank line between the text you are trying to format and the text before and after it.

For example:
> Although this is green unrendered, it doesn't appear with the formatting we want. Try adding an "enter" between the "For example:" and the start of this line and render again. Did it fix it?
You'll also notice that if there are no line breaks, this formatting doesn't end, so you don't need to put a ">" before each line answer. 

For example: > Although this is green unrendered, it doesn’t appear with the formatting we want. Try adding an “enter” between the “For example:” and the start of this line and render again. Did it fix it? You’ll also notice that if there are no line breaks, this formatting doesn’t end, so you don’t need to put a “>” before each line answer.

Notice what happens when we add spacing:

For example:

> Although this is green unrendered, it doesn't appear with the formatting we want. Try adding an "enter" between the "For example:" and the start of this line and render again. Did it fix it?
You'll also notice that if there are no line breaks, this formatting doesn't end, so you don't need to put a ">" before each line answer. 

For example:

Although this is green unrendered, it doesn’t appear with the formatting we want. Try adding an “enter” between the “For example:” and the start of this line and render again. Did it fix it? You’ll also notice that if there are no line breaks, this formatting doesn’t end, so you don’t need to put a “>” before each line answer.

1.2. Creating Tables

The easiest way to make a table in markdown is using this format:

Unrendered


| Right | Left | Default | Center | 
|------:|:-----|---------|:------:| 
|   12  |  12  |    12   |    12  | 
|  123  |  123 |   123   |   123  | 
|    1  |    1 |     1   |     1  | 

Rendered

Right Left Default Center
12 12 12 12
123 123 123 123
1 1 1 1

The : denotes where the cell values align; the default is fine on labs. You might also notice that, once we render, it doesn’t matter if the | characters no longer line up after we edit a table:

Unrendered


| Right | Left | Default | Center | 
|------:|:-----|---------|:------:| 
|   12  |  124 |    12|    12  | 
|  123  |     123 |   123 |   123  | 
|    1  |  176 |     1   |     1  | 

Rendered

Right Left Default Center
12 124 12 12
123 123 123 123
1 176 1 1

That said, it is easier to read in the .qmd file if you add spaces and dashes as needed so that they all do line up.
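
If the values already live in a data frame, you don't have to type the pipes by hand. As a rough sketch (not something the labs require), the kable() function from the knitr package can print a pipe table with the same alignment markers. The data frame name counts and its values are made up for illustration.

```r
# Hypothetical data frame; the column names mirror the table above
counts <- data.frame(
  Right  = c(12, 123, 1),
  Left   = c(12, 123, 1),
  Center = c(12, 123, 1)
)

# format = "pipe" asks for a Markdown pipe table;
# align takes one letter per column: "r", "l", or "c"
tbl <- knitr::kable(counts, format = "pipe", align = c("r", "l", "c"))
tbl
```

Printing tbl shows a header row, a separator row carrying the : alignment markers, and one row per observation.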

1.3. Mathematics

We often want to write short mathematical equations or mathematical symbols. This is actually much easier in .qmd than in Word.

We surround an equation with $ signs: $e=mc^2$ which will appear as \(e=mc^2\) in the rendered document.

Here are some examples of the type of mathematics you will use within your Quarto documents.

Greek Letters

We often want to use statistical symbols within markdown to denote parameters with Greek letters. We write them with a backslash (\) followed by the full name of the Greek letter, such as \alpha, and surround them with $ so that they are read in equation mode:

$\sigma$
$\alpha$
$\mu$
$\beta$

\(\sigma\)
\(\alpha\)
\(\mu\)
\(\beta\)

Superscripts and Subscripts

We can include sub- and superscripts to make our parameters and statistics specific. Note: once the equation is surrounded by $’s we don’t need the markdown edit for superscript like above. In math mode, you might see:

$x^2$  
$H_0$  
$\mu_1$  

\(x^2\)
\(H_0\)
\(\mu_1\)

We might want to make our subscript a longer phrase in statistics, to help denote our population of interest when we have more than one. We surround whatever we want to stay in a subscript with {}, such as:

$\mu_{female}$  
$\mu_{male}$ 

\(\mu_{female}\)
\(\mu_{male}\)

Symbols over Letters

We can call our special statistical symbols in a similar way, such as \bar{x}. In this case, the bar symbol will be applied over the x, to make our statistical symbol for sample mean: \(\bar{x}\).

$\bar{x}$  
$\hat{p}$  
$\hat{\beta}_1$  

\(\bar{x}\)
\(\hat{p}\)
\(\hat{\beta}_1\)

Equations

Finally, we can write little equations for our outputs, like \(\bar{x} = 3.54\).

$\bar{x} = 3.54$  
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$  

\(\bar{x} = 3.54\)
\(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\)
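
Putting these pieces together, a pair of hypotheses comparing two group means could be typed as follows. This is only an illustration: the hypotheses are made up, H_A as the label for the alternative is an assumption (use your course's notation), and \neq, the standard LaTeX command for "not equal to", is the one symbol not introduced above.

```latex
$H_0: \mu_{female} = \mu_{male}$  
$H_A: \mu_{female} \neq \mu_{male}$  
```

These render as \(H_0: \mu_{female} = \mu_{male}\) and \(H_A: \mu_{female} \neq \mu_{male}\).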

1.4. Code Chunks

You’ve learned how to insert code chunks using the shortcut Ctrl-Alt-I (or Cmd-Option-I on a Mac) or by clicking the green Insert +C button above, but you can also just type the two pieces of code that surround a code chunk: ```{r} to open it and ``` to close it. Note that those are back-ticks, found above the Tab key, not single quotation marks.

We can add labels #| label: to support easy navigation as well, but be careful, DO NOT REPEAT LABELS or this will cause an error when you render the .qmd file.

```{r}
#| label: simple-addition
1 + 1
```
[1] 2

At the beginning of each Quarto document there should be a special code chunk called the setup chunk. This is where we load the packages used in the rest of the document. We often use #| include: false as an option in the setup code chunk to hide the chunk and its output in the rendered document.

```{r}
#| label: setup
library(tidyverse)
library(mosaic)
library(ggformula)
```

1.5. In-line Code

One of the neat features of Quarto files is the ability to combine code chunks and text, and to automatically include (or not, but we always will in this class) code and outputs in the finished document. We’ve been writing code in the code chunks, but you can also write it directly in line with the text using some specific formatting.

Let’s demonstrate with our small study dataset. First, we need to read in the data as an object.

```{r}
#| label: read-study-data
study <- read_csv("study.csv", show_col_types = FALSE) #stops some extra output
```

Then we could calculate the mean:

```{r}
#| label: mean-wk1
mean(~wk1, data = study)
```
[1] 1.142857

By running the code as above, we calculated the mean, but didn’t save it as an object – it just prints to screen and is forgotten by R. If we save the output as an object, we can use it within the Quarto file. When you save an output to an object name, it will not print the value to the screen unless you tell R to provide the value.

```{r}
#| label: mean-wk1-assigned
mwk1 <- mean(~wk1, data = study) # calculates mean and saves it as mwk1
mwk1 # prints to screen the value saved to the object name
```
[1] 1.142857

We can display the value stored in a named object in-line with our text by typing r followed by the object name, all inside a pair of back-ticks. We already know that paired back-ticks mark text to be printed out in code font. By including the r at the front, we tell Quarto that what follows is actual code to be run.

For example:

The mean number of hours studied in week 1 is `r mwk1` hours.

The mean number of hours studied in week 1 is 1.1428571 hours.

Using a stored object name is useful if you are going to use the value often (to cut down on typing), but if you only need the value a small number of times, you can also write the code directly in line. For example:

The maximum number of hours studied in week two is `r max(~wk2, data = study)` hours

The maximum number of hours studied in week two is 5 hours

Notice that we never ran that code in any code chunk. It was run and displayed completely in-line.
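
The in-line mean earlier in this section printed with seven decimal places. One way to tidy that, sketched here under the assumption that two decimal places are enough, is to wrap the object in base R's round(). The vector hours below is a made-up stand-in for the wk1 column so that the snippet runs on its own.

```r
# Hypothetical values standing in for study$wk1 (chosen so the mean is 8/7)
hours <- c(0, 1, 1, 1, 2, 1, 2)

mwk1 <- mean(hours)             # unrounded mean, about 1.142857
mwk1_rounded <- round(mwk1, 2)  # keep two decimal places
mwk1_rounded
```

In your text you would then type r round(mwk1, 2) between the back-ticks instead of r mwk1.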

1.6. Checking Code & Rendering Your Document

You can run specific code chunks to check their output by pressing the green play button to run the specific code chunk, or the icon next to it to run all code chunks prior to that code chunk.

Viewing Your Rendered Document

If you want to check your Markdown editing, you can render the document and view it in the ‘Viewer’ Pane by pressing the ‘Render’ button at the top of the Quarto document pane. Make sure the setting (gear icon above) is set to ‘Preview in Viewer pane’.

Changing the Document Type

You can change the document type you render to by changing the output type in the YAML header from format: html to format: docx and pressing the ‘Render’ button. Your Word document will be created in the project folder! (In this class, we will always use html documents; please don’t edit the YAML header of your labs. We show you this simply for future reference.)

---
title: "My report"
format: html
execute:
  echo: true
  error: false
---  

Errors in your Code

If there is an error in your file, it will not render. When .qmd files render, they run in a completely clean environment. So if you loaded data or created an object some other way (for example, by typing in the Console) rather than with code in your .qmd file, the rendering process will produce an error.

If you cannot figure out the error, you can generally still get your document to render by changing error: false to error: true in the YAML code at the top.

---
title: "My report"
format: html
execute:
  echo: true
  error: true
---  

This will print out any errors in your code chunks, but still render the document.

Errors in your Markdown

The most common errors in your markdown that can cause issues are deleting back ticks (`) around your code chunks or dashes (-) from around the YAML code, so always check that first.

Part 2: R Code: Exploratory Data Analysis for Single Numeric Variables

Here is a review of some of the code you will use in the rest of Lab 2.

Remember to run the setup chunks first and check that all necessary packages are loaded.

2.1 Reading in Data

Next, we must be sure to read in the data AND assign it to an object name so that we can call it later in other functions.

Read the output and add the argument to the function above to “quiet this message”.
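
As a self-contained sketch of that pattern, the snippet below writes a tiny made-up CSV to a temporary file and reads it back. The file contents and the object name tiny are invented for illustration; in the lab you would use the data file you were given. The show_col_types = FALSE argument is the same one used in Section 1.5 to suppress the column-specification message.

```r
library(readr)

# Write a small, made-up CSV to a temporary file so the example can run anywhere
path <- tempfile(fileext = ".csv")
writeLines(c("id,wk1,wk2", "1,0,3", "2,1,5", "3,2,4"), path)

# Read the file AND assign it to an object name so later functions can use it;
# show_col_types = FALSE quiets the column-specification message
tiny <- read_csv(path, show_col_types = FALSE)
tiny
```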

2.2 Histograms

Histograms are a great way to explore the distribution (shape, center, spread, outliers) of a variable. Here is the basic structure for a histogram of body mass of the penguins. Add the following arguments.

  • an x-axis label that says "Body Mass (g) of Three Penguin Species in the Palmer Archipelago"
  • a y-axis label that says "Number of Penguins"
  • modify the bin width to 200
  • add the argument color = "black" to outline the bars

With all four arguments added, the call looks like this:
gf_histogram(~ body_mass_g, 
             data = penguins,
             xlab = "Body Mass (g) of Three Penguin Species in the Palmer Archipelago",
             ylab = "Number of Penguins",
             binwidth = 200,
             color = "black")

2.3 Boxplots

Boxplots are a great way to explore the distribution (shape, center, spread, outliers) of a variable, though they do not give us information about modality. Here is the basic structure for a boxplot of body mass of the penguins. Add the following arguments.

  • an x-axis label that says "Body Mass (g) of Three Penguin Species in the Palmer Archipelago"

With the label added, the call looks like this:
gf_boxplot(~ body_mass_g, 
           data = penguins,
           xlab = "Body Mass (g) of Three Penguin Species in the Palmer Archipelago")

If you want to make a boxplot that compares a variable across groups, you can use either formula structure:

  • numeric_variable ~ group_variable
  • group_variable ~ numeric_variable

Try switching the order of the two variables below and see what changes.
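
A minimal sketch of the two orderings, assuming the penguins data come from the palmerpenguins package (your lab may read them from a file instead) and that species is the grouping variable:

```r
library(ggformula)
penguins <- palmerpenguins::penguins  # assumed source of the penguins data

# numeric ~ group: species on the x-axis, body mass on the y-axis (vertical boxes)
vertical <- gf_boxplot(body_mass_g ~ species, data = penguins)

# group ~ numeric: body mass on the x-axis, species on the y-axis (horizontal boxes)
horizontal <- gf_boxplot(species ~ body_mass_g, data = penguins)

vertical
horizontal
```

The variable on the left of the ~ goes on the y-axis and the variable on the right goes on the x-axis, so swapping them turns the boxes on their side.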

2.4 Summary Statistics

There are a lot of functions we learned for summary statistics, including:

  • mean()
  • median()
  • sd()
  • var()
  • IQR()
  • max()
  • min()

Remember to include na.rm = TRUE if you have missing data to avoid getting NA as your answer.
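
To see why this matters, here is a small base R sketch with a made-up vector that contains one missing value:

```r
# Hypothetical measurements with one missing value
masses <- c(3750, NA, 3800)

mean(masses)                # NA, because one value is missing
mean(masses, na.rm = TRUE)  # 3775, the mean of the two observed values
```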

Calculate the mean for body mass for penguins.

mean(~ body_mass_g, data = penguins, na.rm = TRUE)

You can also calculate the mean, or other statistics, split across multiple groups, such as species. Try replacing sd() with other functions.

Remember to add na.rm = TRUE if you get NA as an output.
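
As a sketch of what a grouped calculation might look like, the snippet below uses the formula interface from the mosaic package. It assumes the penguins data come from the palmerpenguins package and that species is the grouping variable; your lab may load the data differently.

```r
library(mosaic)
penguins <- palmerpenguins::penguins  # assumed source of the penguins data

# Standard deviation of body mass within each species;
# na.rm = TRUE drops the penguins with no recorded body mass
sd_by_species <- sd(body_mass_g ~ species, data = penguins, na.rm = TRUE)
sd_by_species
```

Swapping sd for mean, median, or IQR in the same call gives the corresponding statistic for each species.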

There are also ways to generate several different statistics at once. Try the following two functions and note what information each provides.
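
A sketch of how the two calls might be written, again assuming the penguins data come from the palmerpenguins package and that body_mass_g is the variable of interest:

```r
library(mosaic)
penguins <- palmerpenguins::penguins  # assumed source of the penguins data

# df_stats() returns its results as a small data frame
body_mass_stats <- df_stats(~ body_mass_g, data = penguins)
body_mass_stats

# quantile() returns its results as a named numeric vector
body_mass_quantiles <- quantile(~ body_mass_g, data = penguins, na.rm = TRUE)
body_mass_quantiles
```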

What information does df_stats provide?

What information does quantile() provide?

Final Reminders

Remember, all the code needed for the labs can be found in your course notebook: throughout each chapter, in the “How to do it in R” section at the end of each chapter, or in the appendix of the notebook.

Please see Canvas for information on the Math/Stat Cafe, the CLC Supplemental Instruction and Drop-in Hours, and your Instructor’s Student Hours (Office Hours) if you need any additional help.