Reproducible manuscripts with Rmarkdown

Elena Quintero

13/01/2025

A typical research workflow

  1. Prepare data (spreadsheet)

  2. Analyse data (R)

  3. Write report/paper (Word)

However, this workflow can unleash a maelstrom of email attachments…

Problems of a broken workflow

  • How did you get these values? What analysis is behind this figure? Did you account for …?

  • What dataset was used? Which individuals were left out? Where is the clean dataset?

  • Oops, there is an error in the data. Can you repeat the analysis? And update figures/tables in Word!

Copy-pasting can be tedious & problematic

Transcribing numbers from stats software by hand was the largest source of errors (Eubank 2016)

Reproducibility problems


Resuming your own work can be difficult: you may struggle to reproduce your own results from a few weeks, months or years ago…


Also, revising non-reproducible manuscripts can be very messy

Doing dynamic reports in R

Artwork by @allison_horst

Rmarkdown structure

Rmd = code (R, Python, etc) + text (Markdown)

Rmarkdown documents are…

  • Fully reproducible (trace all results including tables and plots)

  • Dynamic (can be regenerated with 1 click)

  • Multiple outputs:

    • documents (HTML, Word, PDF)
    • presentations (HTML, PDF, PowerPoint)
    • books
    • websites…

Rmarkdown lets you trace results

Where does this value come from?


You can embed R results in the text with inline code, using the syntax `r expression`:


surv.diff <- 30


In Rmarkdown:

Survival in population A was `r surv.diff` % higher

Output:

Survival in population A was 30 % higher

Inline code can also evaluate expressions, such as function calls:


data <- iris
nrow(data)
[1] 150


In Rmarkdown:

We measured `r nrow(data)` individuals

Output:

We measured 150 individuals
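Inline values are easier to maintain when they are computed and rounded once in a code chunk and then referenced in the text. A minimal sketch using the built-in iris data; the object names `n_ind`, `mean_petal` and `sentence` are illustrative and do not come from the original slides:

```r
# Compute summary values once, in a regular code chunk
data <- iris
n_ind <- nrow(data)                               # number of individuals
mean_petal <- round(mean(data$Petal.Length), 2)   # mean petal length, 2 decimals

# Build the sentence the same way inline code would render it
sentence <- paste0("We measured ", n_ind,
                   " individuals with a mean petal length of ",
                   mean_petal, " cm")
print(sentence)
```

In the Rmd text, the same sentence would be written with inline code as: We measured `r n_ind` individuals with a mean petal length of `r mean_petal` cm.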

Code chunk options

The knitr package provides many chunk options for customizing nearly all components of code chunks, such as the source code, text output, plots, and the language of the chunk.

```{r echo=FALSE, eval=TRUE, fig.height=3}
plot(iris)
```
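Options that should apply to every chunk can be set once instead of being repeated in each header. A minimal sketch using `knitr::opts_chunk$set()`, usually placed in a first "setup" chunk; the particular option values here are only an example:

```r
# Set default options for all chunks that follow
knitr::opts_chunk$set(
  echo = FALSE,      # hide source code in the output
  message = FALSE,   # hide package start-up messages
  warning = FALSE,   # hide warnings
  fig.height = 3     # default figure height (inches)
)

# Inspect a current default
knitr::opts_chunk$get("echo")
```

Options written in an individual chunk header still override these defaults for that chunk.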

Chunk options can also be embedded inside the chunk with #|

```{r}
#| echo: false
#| eval: true
#| fig.cap: "My figure caption"
plot(iris)
```

Naming chunks

```{r iris-plot1}
#| echo: false
#| eval: true
#| fig.cap: "My figure caption"
plot(iris)
```

  • Chunk names help with debugging and navigating long documents
  • Figure files are named after the chunk that produces them

Chunks can execute code in different languages

Not only R, but also:

  • Python
  • Julia
  • C++
  • SQL
  • Bash
  • Rcpp
  • Stan
  • JavaScript
  • CSS

…etc

Text formatting

# Header

## Subheader

*italic*

**bold**

[a link](https://example.com)


You can use the Visual Markdown Editor

Or the R package remedy, which facilitates writing markdown in RStudio: https://thinkr-open.github.io/remedy/

Regenerate Word/PDF/HTML

With just one click (Knit):

Spotted error in the data? Want to make new changes? No problem!

Make changes in Rmarkdown document

Click Knit 🧶 in RStudio

The report will update automatically!
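The Knit button can also be reproduced from the R console or a script with `rmarkdown::render()`. The sketch below is self-contained under stated assumptions: it writes a tiny stand-in Rmd file to a temporary location (not a real manuscript) and renders it only if rmarkdown and pandoc are available:

```r
# Write a tiny Rmd file to a temporary location (stand-in for a real manuscript)
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Demo report\"",
  "---",
  "",
  "We measured `r nrow(iris)` individuals."
), rmd_file)

# Render to HTML and Word, only if rmarkdown and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  outputs <- rmarkdown::render(
    rmd_file,
    output_format = c("html_document", "word_document"),
    quiet = TRUE
  )
  print(basename(outputs))
}
```

Rendering from a script makes it possible to regenerate the report without opening RStudio, for example after the data file changes.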

Let’s try

Create, edit and share an Rmarkdown document

  • File > New File > Rmarkdown
  • Write text
  • Insert code chunks
  • Change chunk options (echo, eval, etc)
  • Try HTML/Word/PDF output

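PDF generation requires LaTeX, which can be installed from R with the tinytex package:

install.packages('tinytex')  # install the R package (once)
tinytex::install_tinytex()   # install the TinyTeX LaTeX distribution (once)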

Additional features

‘Visual Rmarkdown’

https://rstudio.github.io/visual-markdown-editing

Automatic table generation

model <- lm(Petal.Length ~ Species, data = iris)
library(xtable)
knitr::kable(xtable(model))

Output:

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|:--|--:|--:|--:|--:|
| (Intercept) | 1.462 | 0.0608585 | 24.02294 | 0 |
| Speciesversicolor | 2.798 | 0.0860669 | 32.50960 | 0 |
| Speciesvirginica | 4.090 | 0.0860669 | 47.52118 | 0 |

Alternatives: gtsummary, modelsummary, huxtable, etc
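As a variation on the table above, `knitr::kable()` also accepts the coefficient matrix returned by `summary()` directly, and its `digits` argument controls rounding. A minimal sketch without xtable; the object name `coefs` and the caption text are illustrative:

```r
# Fit the same model as above
model <- lm(Petal.Length ~ Species, data = iris)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(model))

# Render as a table, rounding all numeric columns to 3 decimals
knitr::kable(coefs, digits = 3, caption = "Linear model coefficients")
```

Because the table is built from the model object, it updates automatically whenever the data or the model change.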

equatiomatic describes model structure

We fitted a linear model:

library('equatiomatic')
model <- lm(Petal.Length ~ Species, data = iris)
extract_eq(model)

\[ \operatorname{Petal.Length} = \alpha + \beta_{1}(\operatorname{Species}_{\operatorname{versicolor}}) + \beta_{2}(\operatorname{Species}_{\operatorname{virginica}}) + \epsilon \]

Insert equations with LaTeX

Using LaTeX:

$$
y \sim N(\mu, \sigma^2)
$$

\[ y \sim N(\mu, \sigma^2) \]

Citing bibliography

https://rstudio.github.io/visual-markdown-editing/citations.html

Use a BibTeX file (.bib) with your references and a CSL file to set the citation style

Many citation styles are available:

https://www.zotero.org/styles

https://github.com/citation-style-language/styles
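References for the R packages used in an analysis can be generated rather than typed by hand. A minimal sketch using `knitr::write_bib()`; the file name `packages.bib` and the choice of packages are assumptions made for illustration:

```r
# Write BibTeX entries for selected packages to a .bib file
bib_file <- file.path(tempdir(), "packages.bib")

if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::write_bib(c("base", "knitr"), file = bib_file)

  # Show the entry headers, which contain the citation keys
  keys <- grep("^@", readLines(bib_file), value = TRUE)
  print(keys)
}
```

With `bibliography: packages.bib` in the YAML header of the Rmd file, an entry such as R-knitr could then be cited in the text as `[@R-knitr]`.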

Rmarkdown templates

There are several R packages with Rmd templates for scientific journals:

  • rticles

  • papaja

  • rrtools

  • pinp

  • rmdTemplates

  • pagedreport

  • GitHub

Accessing Rmd templates

Spelling and grammar checking

R package: gramr https://github.com/ropenscilabs/gramr

wellspell.addin: https://github.com/nevrome/wellspell.addin

Others

Synonym finder: https://github.com/gadenbuie/synamyn

Word count and readability: https://github.com/benmarwick/wordcountaddin

Write books, theses, with bookdown

https://bookdown.org/

Parameterised reports

Declare parameters:

https://bookdown.org/yihui/rmarkdown/parameterized-reports.html

Render thousands of individual reports from a single Rmd template
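Rendering one report per parameter value can be sketched as a loop over `rmarkdown::render()` with a different `params` list each time. The template below is hypothetical: it is written to a temporary file with a single parameter, `species`, and the output file names are invented for the example. Rendering runs only if rmarkdown and pandoc are available:

```r
# A hypothetical template with one parameter, `species`, declared in the YAML header
template <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Species report\"",
  "output: html_document",
  "params:",
  "  species: setosa",
  "---",
  "",
  "We measured `r sum(iris$Species == params$species)` individuals of *Iris `r params$species`*."
), template)

# One report per species
species <- levels(iris$Species)
output_names <- paste0("report_", species, ".html")

if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  for (i in seq_along(species)) {
    rmarkdown::render(
      template,
      params = list(species = species[i]),  # overrides the default in the YAML header
      output_file = output_names[i],
      output_dir = tempdir(),
      quiet = TRUE
    )
  }
}
```

Inside the template, the current value is available as `params$species`, so the same text and code produce a different report for each species.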

Collaborative writing

Google Docs - trackdown R package

Rmarkdown resources

Rmarkdown website

http://rmarkdown.rstudio.com/

Rmarkdown cheat sheet

https://raw.githubusercontent.com/rstudio/cheatsheets/main/rmarkdown.pdf

Rmarkdown reference guide

https://www.rstudio.org/links/r_markdown_reference_guide

Rmarkdown books

Quarto: the modern Rmarkdown

https://quarto.org/

Let’s try

Writing an Rmd document

  • Try visual markdown editor

  • Parameterised reports (e.g. different iris or penguin species)

  • Add bibliography

  • Try templates (rticles, rmdTemplates)