Publishing your repo

Formatting, containerising, publishing, archiving code

Alban Sagouis

iDiv, Leipzig

This is for you if:

  • You ever want to publish your research code

What will we do?

  • Reformat the code
    • Once at the end of the project?
    • Automatically! All the time!
  • Freeze your R environment with renv
    • Once at the end of the project, yay!
    • All the time, only if…
  • Write a README and a CITATION.cff
  • Publish on GitHub
    • Once at the end of the project
    • All the time!?
  • Archive on Zenodo
    • Just once, automatically

Let’s choose a project

  • Let’s copy-paste the entire folder to keep an intact backup…
  • And make it an R project if it’s not already one.

Code formatting

  • In RStudio, the keyboard shortcut Ctrl+Shift+A (Cmd+Shift+A on macOS) reformats the selection.
  • You can open your scripts one by one, select everything with Ctrl+A, and reformat with Ctrl+Shift+A.
  • Or…

Code formatting: the RStudio default

  • You can activate RStudio's styler formatter and automatically reformat on save.
  • Open Tools -> Global Options -> Code -> Formatting.
    • Select styler.
    • Check Reformat documents on save.

Code formatting: or use the Air formatter

  • Before formatting:

band_members |> select(name) |> full_join(band_instruments2, by = join_by(name == artist))

left_join <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL) {
  UseMethod("left_join")
}

1+2:3*(4/5)

  • After formatting with Air:

band_members |>
  select(name) |>
  full_join(band_instruments2, by = join_by(name == artist))

left_join <- function(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = NULL
) {
  UseMethod("left_join")
}

1 + 2:3 * (4 / 5)

Code formatting: how to install Air

  • First, you’ll need to install the Air command line tool.

  • Next, you’ll need to tell RStudio to use Air as an external formatter:

    • Open Tools -> Global Options -> Code.
    • Choose the Formatting tab at the top.
    • Change the Code formatter: option to External.
    • Change the Reformat command: to {path/to/air} format.
      • Note that you set this to a partially complete command! RStudio will append the name of the file to this partial command, but you must specify format in addition to the path to Air for it to work.
      • The easiest way to figure out {path/to/air} for yourself is to run which air from a Terminal on Unix, and where air from the Command Prompt on Windows.
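
The path lookup can also be done from the R console. This is a minimal sketch using base R's `Sys.which()`, assuming Air is installed and on the PATH of the R session; the variable name `air_path` is only illustrative.

```r
# Locate the Air executable on the PATH; the result is "" when it is not found
air_path <- Sys.which("air")

if (nzchar(air_path)) {
  # This is the value to put before `format` in the Reformat command field
  message("Air found at: ", air_path)
} else {
  message("Air was not found on the PATH; check the installation.")
}
```

If the result is empty although Air is installed, the Terminal and the R session probably do not share the same PATH.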

Code formatting: how to install Air

  • At this point, explicit calls to Reformat Selection and Reformat Document should use Air.
  • If you’d also like RStudio to invoke Air on save:
    • Open Tools -> Global Options -> Code -> Saving and check Reformat documents on save.

Figure 1: RStudio settings

Reproducibility: absolute paths

read.table("~/idiv/biotime/data/biotime.csv")
  • Works only on your current computer.

Reproducibility: relative paths

setwd("~/idiv/biotime")
read.table("data/biotime.csv")
  • Better but also works only on your current computer.
  • Setting your working directory by hand is not a very reproducible habit.

Reproducibility: relative paths

  • Using an R project? All your paths can be relative to the root of the project.
  • Your colleagues, reviewers and future self don’t need to make any change to have working paths.
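
A quick way to check that relative paths will resolve is to look at where the session starts. The sketch below uses base R only; it assumes the project was opened through its .Rproj file and reuses the data/biotime.csv path from the earlier slides as an example.

```r
# Relative paths resolve against the working directory, which RStudio sets to
# the project root when the .Rproj file is opened
project_root <- getwd()

# TRUE when an .Rproj file sits in the working directory
has_rproj <- length(list.files(project_root, pattern = "\\.Rproj$")) > 0

# TRUE when the example relative path resolves from here
data_found <- file.exists(file.path("data", "biotime.csv"))

message("Root: ", project_root, " | .Rproj: ", has_rproj, " | data: ", data_found)
```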

Reproducibility: relative paths

  • Don’t use RStudio? You can use the here package:

    • Create an empty file called .here at the root of your project.
    • And use it like this:
read.table(here::here("data", "biotime.csv"))
  • All your paths are relative to the .here file.
  • Even works on the cluster.
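
The two steps above can be sketched as follows, assuming the code is run once from the project root and that the here package is installed (the second part is skipped otherwise).

```r
# Run once from the project root to create the empty marker file
file.create(".here")

# Assumes the here package is installed: install.packages("here")
if (requireNamespace("here", quietly = TRUE)) {
  print(here::here())                       # folder detected as the project root
  print(here::here("data", "biotime.csv"))  # full path built from that root
}
```

If the printed root is not the folder you expected, `here::dr_here()` reports why that folder was chosen.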

Reproducibility: renv

  • Let’s snapshot the package versions used in this project. Easy!
install.packages("renv")
renv::snapshot()
  • Did it work?
  • Most likely problem: renv does not know where the root of the project is.

    • renv looks for a project_name.Rproj, a README, a DESCRIPTION file or an R/ folder.
    • An easy fix is to create an empty file called .here at the root of the project.
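
Another option is to tell renv explicitly which folder to snapshot. The helper below is a hypothetical sketch, not part of renv: `freeze_environment()` is an illustrative name, and it assumes renv is installed and that the folder passed as `project` is the project root.

```r
# Hypothetical helper, not part of renv: snapshot a project given its root folder
freeze_environment <- function(project = getwd()) {
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("Install renv first: install.packages(\"renv\")")
  }
  # Write the package versions in use to renv.lock without asking for confirmation
  renv::snapshot(project = project, prompt = FALSE)
  # Report any remaining difference between the lockfile and the library
  renv::status(project = project)
}
```

Calling `freeze_environment()` from the project root should leave a renv.lock file there. A collaborator can then run `renv::restore()` to install the recorded versions.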

Metadata: README

  • The README located at the root of the project will be displayed directly on GitHub and on Zenodo.
  • Here are recommendations from a Methods in Ecology and Evolution hackathon:

Metadata: code README

  • Information on the manuscript it came from.
  • Contact details of at least one author.
  • License information [note that some people provide this as a separate LICENSE file which is also good practice].
  • List of all scripts and what they do, i.e. processing, analysis, plotting etc. and what their outputs are (e.g. table 1, figure 2). [note that some of the detailed descriptions of this may be in the files themselves, especially for functions, this is also fine but the README should list the basics of what the scripts do].
  • Details of the workflow of the code if there are multiple scripts, i.e. what order do the scripts need to be run in?
  • How does the code link to the data? i.e. which data files are needed for each script?
  • The name of the software used (e.g. R), version, and names and versions of all packages required to run the analyses.
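
The software and package versions for the last item can be collected from the session used to run the analyses. This is a minimal base R sketch; the output file name session_info.txt is an assumption, and only packages attached with `library()` are listed.

```r
# Record the R version and the versions of the attached packages
r_version <- R.version.string
info <- sessionInfo()
pkgs <- names(info$otherPkgs)  # packages attached with library(); NULL if none

pkg_versions <- vapply(
  pkgs,
  function(p) as.character(packageVersion(p)),
  character(1)
)

# One line for R, then one line per package: "name version"
writeLines(
  c(r_version, paste(names(pkg_versions), pkg_versions)),
  "session_info.txt"
)
```

The content of that file can be pasted into the README. When the project uses renv, the renv.lock file carries the same information in more detail.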

Metadata: data README

  • Information on the manuscript it came from.
  • Contact details of at least one author.
  • License information.
  • Information about the data.
  • Brief summary of how data were collected.
  • Sources of data if from a literature review.
  • List of all data files.
  • Column-by-column description of the data files, along with column headers, measurement units, levels of factors (e.g. if the variable is “habitat” what categories are possible?), explanations for any abbreviations.
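
A skeleton for the column-by-column description can be generated from the data themselves and completed by hand. The sketch below uses base R; the `surveys` data frame, its columns and the output file name are made up for illustration.

```r
# Hypothetical example data; replace with your own data frame
surveys <- data.frame(
  site = c("A", "B"),
  habitat = factor(c("forest", "grassland")),
  biomass_g = c(12.5, 8.1)
)

# One row per column: name, storage type, and factor levels where relevant
dictionary <- data.frame(
  column = names(surveys),
  type = vapply(surveys, function(x) class(x)[1], character(1)),
  levels = vapply(
    surveys,
    function(x) if (is.factor(x)) paste(levels(x), collapse = "; ") else "",
    character(1)
  ),
  unit = "",         # to fill in by hand
  description = "",  # to fill in by hand
  row.names = NULL
)

write.csv(dictionary, "data_dictionary.csv", row.names = FALSE)
```

Units, abbreviations and descriptions still have to be written by someone who knows the data; the script only saves the typing of column names and factor levels.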

Metadata: CITATION.cff

  • The reference inside a CITATION.cff file is also displayed neatly by both GitHub and Zenodo.
  • It is the best way to acknowledge funding and the participation of people who did not directly contribute code to the repository but participated in the analyses.
cff-version: 1.2.0
message: "If you use these data and code, please cite this work as below."
authors:
  - family-names: Sagouis
    given-names: Alban
    orcid: https://orcid.org/0000-0002-3827-1063
  - family-names: Blowes
    given-names: Shane
    orcid: https://orcid.org/0000-0001-6310-3670
  - family-names: Chase
    given-names: Jonathan
    orcid: https://orcid.org/0000-0001-5580-4303
  - family-names: Xu
    given-names: Wubing
    orcid: https://orcid.org/0000-0002-6566-4452
title: "chase-lab/metacommunity_surveys, Metacommunity Surveys data for `Local changes dominate variation in biotic homogenization and differentiation`"
version: v2.5-Blowes_etal_Science_Advances
date-released: 2024-01-01

Metadata: CITATION.cff

  • Using unique identifiers such as ORCID and ROR is a great idea.

Publishing on GitHub: git init

  • “Existing project, GitHub last” workflow from Jenny Bryan’s book Happy Git and GitHub for the useR.
  • We activate git locally.
  • Create an empty repository on GitHub.
  • Copy-paste the three command lines GitHub gives us and we are done.
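
For reference, the three commands GitHub displays for an existing repository usually add the remote, rename the branch and push. The sketch below wraps them in a hypothetical R helper (`connect_to_github()` is an illustrative name); it assumes git is installed, that the project already has at least one commit, and that the branch is called main.

```r
# Hypothetical helper wrapping the commands GitHub displays for an existing repository
connect_to_github <- function(remote_url, branch = "main") {
  steps <- list(
    c("remote", "add", "origin", remote_url),  # register the GitHub repository
    c("branch", "-M", branch),                 # rename the current branch
    c("push", "-u", "origin", branch)          # upload the branch and track it
  )
  for (args in steps) {
    status <- system2("git", args)
    if (status != 0) stop("git ", paste(args, collapse = " "), " failed")
  }
  invisible(TRUE)
}
```

It would be called from the project root with the URL shown on the new repository's page. Pasting the same three lines into a Terminal does exactly the same thing.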

Publishing on GitHub: .gitignore

  • Get the skeleton here
  • Exclude an entire folder like this:
doc/
inst/ignore/
  • Exclude a specific file name, or all files of a given format, like this:
.DS_Store
*.html
  • Exclude all files of a given format in a specific folder like this:
vignettes/*.R
src/*.o
  • Exclude everything in a folder except one file like this:
/cache/**
!README
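
Patterns can also be added to .gitignore from R. This is a base R sketch, assuming it runs from the project root; the selection of patterns simply reuses some of the examples above.

```r
# Patterns taken from the examples above; order matters for the "!" exception
patterns <- c("doc/", "*.html", "/cache/**", "!README")

# Keep the existing file as it is and append only the patterns that are missing
existing <- if (file.exists(".gitignore")) readLines(".gitignore") else character(0)
writeLines(c(existing, setdiff(patterns, existing)), ".gitignore")
```

Running it twice does not duplicate entries, since patterns already listed are skipped.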

Publishing on GitHub: git commit and git push

  • git commit creates the snapshot
  • git push sends it to github.com
  • Can you see your README and your CITATION?

Archiving: Zenodo

Extras

  • Add badges to your README
    • Project version
    • Zenodo DOI
    • Manuscript DOI
  • You kept working on this project?
    • Create a new release
    • Zenodo automatically gives you a new DOI and keeps track of versions