Publishing your repo

Formatting, containerising, publishing, archiving code

Alban Sagouis

alban.sagouis@idiv.de

iDiv, Leipzig

This is for you if:

You ever want to publish your research code

What will we do?

Reformat the code
- Once at the end of the project?
- Automatically! All the time!

Freeze your R environment with renv
- Once at the end of the project, yay!
- All the time, only if…

Write a README and a CITATION.cff

Publish on GitHub
- Once at the end of the project
- All the time!?

Archive on Zenodo
- Just once, automatically

Let’s choose a project

Let’s copy-paste the entire folder to keep an intact backup…
And make it an R project if it’s not already one.

Code formatting

In RStudio, the keyboard shortcut Ctrl+Shift+A (Cmd+Shift+A on macOS) reformats the selection.
You can open your scripts one by one, select everything with Ctrl+A, then reformat with Ctrl+Shift+A.
Or…

Code formatting: the RStudio default

You can activate RStudio's styler formatter and have it reformat your scripts automatically on save.
Open Tools -> Global Options -> Code -> Formatting.
- Select styler.
- Check Reformat documents on save.

Code formatting: or use the Air formatter

Before:

band_members |> select(name) |> full_join(band_instruments2, by = join_by(name == artist))

left_join <- function(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = NULL) {
  UseMethod("left_join")
}

1+2:3*(4/5)

After:

band_members |>
  select(name) |>
  full_join(band_instruments2, by = join_by(name == artist))

left_join <- function(
  x,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = NULL
) {
  UseMethod("left_join")
}

1 + 2:3 * (4 / 5)

Code formatting: how to install Air

First, you’ll need to install the Air command line tool.
Next, you’ll need to tell RStudio to use Air as an external formatter:
- Open Tools -> Global Options -> Code.
- Choose the Formatting tab at the top.
- Change the Code formatter: option to External.
- Change the Reformat command: to {path/to/air} format.
  - Note that you set this to a partially complete command! RStudio will append the name of the file to this partial command, but you must specify format in addition to the path to Air for it to work.
  - The easiest way to figure out {path/to/air} for yourself is to run which air from a Terminal on Unix, and where air from the Command Prompt on Windows.
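The same lookup can also be sketched from the R console with base R's Sys.which(), which returns the full path of an executable found on the PATH, or an empty string when nothing is found. The variable name air_path is only illustrative, and on Windows a path containing spaces may need quoting before you paste it into RStudio.

```r
# Look up the Air executable on the PATH (base R, no extra package needed).
air_path <- Sys.which("air")

if (nzchar(air_path)) {
  # This is the partial command to paste into "Reformat command:".
  message("Reformat command: ", air_path, " format")
} else {
  message("Air was not found on the PATH; check that it is installed.")
}
```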

Code formatting: how to install Air

At this point, explicit calls to Reformat Selection and Reformat Document should use Air.
If you’d also like RStudio to invoke Air on save:
- Open Tools -> Global Options -> Code -> Saving and check Reformat documents on save.

Figure 1: RStudio settings

Reproducibility: absolute paths

read.table("~/idiv/biotime/data/biotime.csv")

Works only on your current computer.

Reproducibility: relative paths

setwd("~/idiv/biotime")
read.table("data/biotime.csv")

Better, but it still works only on your current computer.

Setting your working directory by hand is not a very reproducible habit.

Reproducibility: relative paths

Using an R project? All your paths can be relative to the root of the project.

Your colleagues, reviewers and future self don’t need to make any change to have working paths.

Reproducibility: relative paths

Don’t use RStudio? You can use the here package:
- Create an empty .here file at the root of your project.
- Then build your paths with here():

library(here)
read.table(here("data", "biotime.csv"))

All your paths are relative to the .here file.
Even works on the cluster.
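To make the idea of a marker file concrete, here is a simplified base R sketch of the principle: walk up the folder tree until a .here file is found. It is not the here package's actual implementation, and find_project_root() and the biotime_demo folder are hypothetical names used only for illustration.

```r
# Hypothetical helper: walk up from `start` until a folder containing
# a `.here` file is found, and return that folder.
find_project_root <- function(start = getwd(), marker = ".here") {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, marker))) {
      return(dir)
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      # We reached the top of the file system without finding the marker.
      stop("No '", marker, "' file found above ", start)
    }
    dir <- parent
  }
}

# Demonstration in a throwaway folder so nothing in your project is touched.
demo_root <- file.path(tempdir(), "biotime_demo")
dir.create(
  file.path(demo_root, "scripts", "plots"),
  recursive = TRUE,
  showWarnings = FALSE
)
file.create(file.path(demo_root, ".here"))

# Called from a nested sub-folder, the helper still finds the project root.
root <- find_project_root(file.path(demo_root, "scripts", "plots"))
data_path <- file.path(root, "data", "biotime.csv")
```

Because the path is rebuilt from the marker file, the same script gives a valid data_path whichever sub-folder it is launched from.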

Reproducibility: `renv`

Let’s snapshot the package versions used in this project. Easy!

install.packages("renv")
renv::snapshot()

Did it work?

Most likely problem: renv does not know where the root of the project is.
- renv looks for a project_name.Rproj, a README, a DESCRIPTION file or an R/ folder.
- An easy fix is to create an empty file called .here at the root of the project.

Metadata: README

The README located at the root of the project will be displayed directly on GitHub and on Zenodo.

Here are recommendations from a Methods in Ecology and Evolution hackathon:

Metadata: code README

Information on the manuscript it came from.
Contact details of at least one author.
License information [note that some people provide this as a separate LICENSE file which is also good practice].
List of all scripts and what they do, i.e. processing, analysis, plotting etc. and what their outputs are (e.g. table 1, figure 2). [note that some of the detailed descriptions of this may be in the files themselves, especially for functions, this is also fine but the README should list the basics of what the scripts do].
Details of the workflow of the code if there are multiple scripts, i.e. what order do the scripts need to be run in?
How does the code link to the data? i.e. which data files are needed for each script?
The name of the software used (e.g. R), version, and names and versions of all packages required to run the analyses.
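The last item of this list can be drafted from R itself. The sketch below uses only base R; package_versions() is a hypothetical helper, and the packages stats and utils merely stand in for the packages your project really uses.

```r
# Hypothetical helper: one string per package, "name version".
package_versions <- function(pkgs) {
  vapply(
    pkgs,
    function(p) paste(p, as.character(utils::packageVersion(p))),
    character(1)
  )
}

# Base packages stand in for your project's packages here.
pkgs <- c("stats", "utils")

readme_lines <- c(
  paste("Software:", R.version.string),
  "Packages:",
  paste("-", package_versions(pkgs))
)

# Print the lines so they can be pasted into the README.
writeLines(readme_lines)
```

If the project uses renv, the lockfile already records this information, so the README can simply point to it.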

Metadata: data README

Information on the manuscript it came from.
Contact details of at least one author.
License information.
Information about the data.
Brief summary of how data were collected.
Sources of data if from a literature review.
List of all data files.
Column-by-column description of the data files, along with column headers, measurement units, levels of factors (e.g. if the variable is “habitat” what categories are possible?), explanations for any abbreviations.

Metadata: CITATION.cff

The reference inside a CITATION.cff file is also displayed neatly by both GitHub and Zenodo.

It is the best way to acknowledge funding and the people who did not contribute code to the repository directly but took part in the analyses.

cff-version: 1.2.0
message: "If you use these data and code, please cite this work as below."
authors:
  - family-names: Sagouis
    given-names: Alban
    orcid: https://orcid.org/0000-0002-3827-1063
  - family-names: Blowes
    given-names: Shane
    orcid: https://orcid.org/0000-0001-6310-3670
  - family-names: Chase
    given-names: Jonathan
    orcid: https://orcid.org/0000-0001-5580-4303
  - family-names: Xu
    given-names: Wubing
    orcid: https://orcid.org/0000-0002-6566-4452
title: "chase-lab/metacommunity_surveys, Metacommunity Surveys data for `Local changes dominate variation in biotic homogenization and differentiation`"
version: v2.5-Blowes_etal_Science_Advances
date-released: 2024-01-01

Metadata: CITATION.cff

Using unique identifiers such as ORCID and ROR is a great idea.

Here is a guide on how to structure the file, because the format is finicky: https://citation-file-format.github.io/.
TL;DR: you can just copy and paste the code from the previous slide, or build the file with an online tool.
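Before pushing, a quick sanity check can catch a missing field. Version 1.2.0 of the format requires the keys cff-version, message, title and authors. The base R sketch below only looks for these keys at the start of a line; it does not parse or validate the YAML. missing_cff_keys() is a hypothetical helper, and the author in the demonstration file is a placeholder.

```r
# Hypothetical helper: report which required top-level keys are missing.
# It only looks for "key:" at the start of a line; it does not parse YAML.
missing_cff_keys <- function(path) {
  required <- c("cff-version", "message", "title", "authors")
  lines <- readLines(path, warn = FALSE)
  present <- vapply(
    required,
    function(key) any(startsWith(lines, paste0(key, ":"))),
    logical(1)
  )
  required[!present]
}

# Demonstration on a throwaway file that deliberately lacks a title.
demo_cff <- tempfile(fileext = ".cff")
writeLines(
  c(
    "cff-version: 1.2.0",
    "message: \"Please cite this work as below.\"",
    "authors:",
    "  - family-names: Doe",
    "    given-names: Jane"
  ),
  demo_cff
)

missing_keys <- missing_cff_keys(demo_cff)
print(missing_keys)
```

On your own project, you would call missing_cff_keys("CITATION.cff") from the project root and expect an empty result.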

Publishing on GitHub: `git init`

“Existing project, GitHub last” workflow from Jenny Bryan’s book Happy Git and GitHub for the useR.
We activate git locally.
We create an empty repository on GitHub.
We copy and paste the 3 command lines GitHub gives us, and we are done.

Publishing on GitHub: `.gitignore`

Get the skeleton here

Exclude an entire folder like this

doc/
inst/ignore/

Exclude a specific file name, or all files of a specific format, like this:

.DS_Store
*.html

Exclude all files of a specific format in a specific folder like this:

vignettes/*.R
src/*.o

Exclude all files but one like this:

/cache/**
!README

Publishing on GitHub: `git commit` and `git push`

git commit creates the snapshot
git push sends it to github.com

Can you see your README and your CITATION?

Archiving: Zenodo

Issuing a persistent identifier for your repository with Zenodo
Log in with GitHub
Activate the repo you want to publish
Create a release on GitHub
Enjoy

Extras

Add badges to your README
- Project version
- Zenodo DOI
- Manuscript DOI

You kept working on this project?
- Create a new release
- Zenodo automatically gives you a new DOI and keeps track of versions
