Integral for Validated R Package Repositories • Certara.IntegralR

Overview

Among its many capabilities, Integral can function as a centralized repository for storing, distributing, and version-controlling validated R packages, which is critical for ensuring reproducibility in analyses, especially in light of potential changes to package versions over time. By leveraging the renv and miniCRAN packages, users can create a “snapshot” of the R packages used in an analysis and seamlessly upload them to Integral from R. Importantly, when the analysis needs to be reproduced at a later time, the package repository can be downloaded from Integral, allowing other users to easily install the required packages and recreate the exact R environment used by the original analyst. Hosting these package snapshots within Integral provides a safeguard against the challenges posed by updates or removals of packages from CRAN, enabling consistent and reproducible workflows across teams and projects.

In this vignette, we demonstrate a minimal reproducible example of how to create a portable package repository using miniCRAN and save it to Integral. This approach simplifies sharing and reproducing R environments across users.

Setup

To create a snapshot of the environment, the renv and miniCRAN packages will need to be installed locally.

install.packages(c("renv", "miniCRAN"))

While not required, the renv package works best when used in an R project environment. An R project creates a self-contained working directory for your analysis, which includes a .Rproj file to store project-specific settings and metadata. If you are not performing your analysis within a designated R project, make sure that your current working directory is not cluttered with irrelevant files. renv scans the files in the defined project or working directory to discover package dependencies, so it is important that only scripts relevant to the analysis exist within this directory.

For the purpose of this vignette, we will set our working directory to be a folder that contains a single file called dependencies.R, which will simulate an analysis script.

setwd("~/original_analysis")

Figure: A folder directory containing a single file called dependencies.R.

This file consists of two lines of code that are calling packages, similar to how one would document any packages needed at the top of their analysis script.

Figure: A simple R script that calls two packages via the library() function.
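For readers who cannot see the image, the following is a minimal sketch of how such a folder could be recreated. It assumes the two packages are tictoc and ggplot2 (the ones loaded at the end of this vignette) and writes to a temporary directory rather than ~/original_analysis, so it does not touch any real project.

```r
# Hypothetical recreation of the example folder in a temporary location.
# The two packages are the ones this vignette loads later on.
analysis_dir <- file.path(tempdir(), "original_analysis")
dir.create(analysis_dir, showWarnings = FALSE)

# Write a two-line script that simulates the top of an analysis script
writeLines(
  c("library(tictoc)", "library(ggplot2)"),
  con = file.path(analysis_dir, "dependencies.R")
)

# The folder should contain only this one script
list.files(analysis_dir)
```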

Snapshotting Current Packages

Now that we are working in a folder designed specifically for our analysis, we are ready to take a snapshot of the packages we need. Using the snapshot() function from the renv package, we can create a JSON lockfile, which contains information on all the packages that are being used in our current folder. In addition to the packages we explicitly named, it also records any of our packages’ required dependencies.

renv::snapshot()

The snapshot() function accepts a type argument, which dictates how the function decides which packages to include in the lockfile. The two values covered here are “implicit” and “explicit”; renv also supports “all” and “custom”, which are not discussed in this vignette.

Implicit Packages

The default behavior is the same as specifying snapshot(type = "implicit"), which will record all the packages returned by renv::dependencies(). It will crawl through all the R scripts in your project, looking for any packages loaded with library() as well as namespace-qualified calls (e.g., package::function()). This behavior works well for a single analysis or project, where a user wants to create a repository containing only the packages required to re-run the analysis.
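To preview what an implicit snapshot would pick up, renv::dependencies() can be called directly on a folder. The sketch below assumes renv is installed and uses a throwaway script in a temporary directory (the folder and file names are illustrative only); the packages do not need to be installed for them to be detected.

```r
# Assumption: renv is installed. The script below is a stand-in for a real analysis.
demo_dir <- file.path(tempdir(), "implicit_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(
  c("library(tictoc)", "p <- ggplot2::ggplot()"),
  con = file.path(demo_dir, "analysis.R")
)

# Collect the package names that renv discovers in the folder
found <- character(0)
if (requireNamespace("renv", quietly = TRUE)) {
  deps <- renv::dependencies(path = demo_dir, quiet = TRUE, progress = FALSE)
  found <- sort(unique(deps$Package))
}
found
```

Both the library() call and the ggplot2:: prefix should be reported, which is why unrelated scripts in the same folder can add unwanted packages to the lockfile.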

Explicit Packages

Alternatively, you can use snapshot(type = "explicit") to create the lockfile. This behavior only captures packages listed in the project's DESCRIPTION file. While the DESCRIPTION file is most often used in packages, it can also be added to an R project (usethis::use_description(check_name = FALSE)). snapshot() will read the dependency fields of the DESCRIPTION file, such as Imports, to determine what packages to capture. This method is more applicable for an admin who wants to validate and capture an entire suite of R packages to be used in a variety of use cases across multiple users.
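As an illustration of what an explicit snapshot reads, the sketch below writes a minimal DESCRIPTION file with base R and parses its Imports field back. The project name and version are hypothetical, and the file is written to a temporary directory; in a real project it would sit in the project root.

```r
# Hypothetical minimal DESCRIPTION file for an analysis project
desc_path <- file.path(tempdir(), "DESCRIPTION")
write.dcf(
  data.frame(
    Package = "project001",
    Version = "0.0.1",
    Imports = "tictoc, ggplot2",
    stringsAsFactors = FALSE
  ),
  file = desc_path
)

# Read the Imports field back and split it into package names
imports_field <- read.dcf(desc_path, fields = "Imports")[1, 1]
explicit_pkgs <- trimws(strsplit(imports_field, ",", fixed = TRUE)[[1]])
explicit_pkgs
```

Maintaining this list by hand gives an admin direct control over which packages are captured, independent of what any particular script happens to load.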

Creating a Local Package Repository

Now that we know what packages are needed, we will create a local repository of these packages using the miniCRAN package. The functions from this package require a repository or CRAN mirror (typically your regional CRAN repository) to be defined as an argument.

CRAN <- c(CRAN = "https://cloud.r-project.org")

We can now parse the lockfile using the lockfile_read() function, and then extract the names of the packages being used. As an additional safeguard, we will also check for dependencies using the pkgDep() function from miniCRAN.

# Parse lockfile
lockfile <- renv::lockfile_read()

# Parse package names
pkgs <- names(lockfile$Packages)

# Check for package dependencies, while ignoring suggested packages 
pkgList <- miniCRAN::pkgDep(pkgs, repos = CRAN, type = "source", suggests = FALSE)
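It can help to know the shape of the parsed lockfile before working with it. The sketch below uses a hand-built list as a stand-in for the object returned by renv::lockfile_read(); the version numbers are placeholders and are not taken from a real lockfile.

```r
# Hand-built stand-in for a parsed lockfile.
# All version numbers here are placeholders for illustration.
lockfile <- list(
  R = list(Version = "4.3.1"),
  Packages = list(
    ggplot2 = list(Package = "ggplot2", Version = "3.4.4", Source = "Repository"),
    tictoc  = list(Package = "tictoc",  Version = "1.2",   Source = "Repository")
  )
)

# Package names, as extracted in the code above
pkgs <- names(lockfile$Packages)

# Recorded version of each package, as a named character vector
lock_versions <- vapply(lockfile$Packages, function(p) p$Version, character(1))

data.frame(package = pkgs, version = unname(lock_versions))
```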

Since this repository will always be available on Integral, there is no need to keep it in a permanent location locally; we will create a temporary directory called miniCRAN to serve as our repository to export. With our temporary directory created, we will use the makeRepo() function from miniCRAN to download all the packages in our list to our temporary repository.

# Create temporary directory
miniCRAN_pth <- file.path(tempdir(), "miniCRAN")
dir.create(miniCRAN_pth)

# Download packages and create the repository
miniCRAN::makeRepo(pkgList, path = miniCRAN_pth, repos = CRAN, type = c("source", "win.binary"))

An important detail to note is that makeRepo() downloads the most up-to-date version of each package available from the chosen repository. As binaries for old package versions are generally not available from CRAN, adding old versions will require additional manual work, so using the most up-to-date versions of your chosen packages will allow for the easiest workflow. For ways to use previous package versions or to fine-tune your repository, see the Advanced Techniques for Repository Management vignette.
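Because the repository holds current versions while the lockfile records the versions in use at snapshot time, it may be worth checking that the two agree before uploading. The helper below is a hypothetical sketch (find_version_mismatches is not part of renv or miniCRAN) that compares two named character vectors of versions by simple string equality; the sample values are placeholders.

```r
# Hypothetical helper: compare lockfile versions against repository versions.
# Both inputs are named character vectors (names are package names).
find_version_mismatches <- function(lock_versions, repo_versions) {
  pkgs <- names(lock_versions)
  in_repo <- unname(repo_versions[pkgs])  # NA where a package is missing
  locked <- unname(lock_versions)
  data.frame(
    package = pkgs,
    lockfile = locked,
    repository = in_repo,
    match = !is.na(in_repo) & in_repo == locked,
    stringsAsFactors = FALSE
  )
}

# Placeholder versions for illustration only
lock_versions <- c(ggplot2 = "3.4.4", tictoc = "1.2")
repo_versions <- c(ggplot2 = "3.5.0", tictoc = "1.2")

find_version_mismatches(lock_versions, repo_versions)
```

In practice, repo_versions could be taken from the "Version" column of the matrix returned by available.packages() for the local repository. Rows where match is FALSE point to packages that may need the manual handling described above.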

Uploading the Package Repository to Integral

With a local folder containing all the packages needed to recreate our analysis, we can now zip our folder and upload it to Integral. This will allow all future Integral users to download the repository and use the same packages used in the original analysis. We should also upload our renv.lock file, as it contains valuable information about our packages and R environment.

library(Certara.IntegralR)

# Change to the miniCRAN directory temporarily. This makes handling paths in zip() easier. 
old_wd <- setwd(miniCRAN_pth)  

# Zip the miniCRAN directory.
# First, list all the files (recursively) in the new working directory.
files_to_zip <- list.files(".", recursive = TRUE, full.names = FALSE)

# Create the path name of the zipped repo
zipped_repo <- file.path(tempdir(), "project_001_miniCRAN.zip")

# Zip the listed files into the zip file defined above
utils::zip(
    zipfile = zipped_repo, 
    files = files_to_zip
)

# Change back to our original working directory
setwd(old_wd)

# Upload the zipped repo along with the renv.lock file
integral_upload(
    file = c(zipped_repo, "renv.lock"),
    path = "My Study/R Package Repositories/Project 001",
    reason = "Create an R package repository to recreate analysis"
)

Downloading and Using the Repository

Downstream users can now download the saved package repository and use it as their installation source for packages. This gives them the same package versions that are stored in the repository. The local repository will be stored within the project folder, and all packages will be installed from there instead of from CRAN.

# Move to a clean folder to recreate analysis
setwd("~/new_analysis")

# Download the repository and lockfile to the new folder
Certara.IntegralR::integral_download(
    file = c("My Study/R Package Repositories/Project 001/renv.lock", 
             "My Study/R Package Repositories/Project 001/project_001_miniCRAN.zip"),
    path = getwd()
)

# Unzip the zipped folder to a local sub folder by the same name
unzip(
    zipfile = "project_001_miniCRAN.zip",
    exdir   = "project_001_miniCRAN"
)
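Before installing anything, it can be useful to confirm that the unzipped folder still has the layout of a CRAN-like repository, where the source package index lives at src/contrib/PACKAGES. The helper below is a hypothetical sketch (is_source_repo is not part of any package) and is demonstrated on a mock folder in a temporary directory.

```r
# Hypothetical helper: does this folder look like a CRAN-like source repository?
is_source_repo <- function(repo_dir) {
  file.exists(file.path(repo_dir, "src", "contrib", "PACKAGES"))
}

# Mock repository layout for demonstration purposes only
mock_repo <- file.path(tempdir(), "mock_miniCRAN")
dir.create(file.path(mock_repo, "src", "contrib"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(mock_repo, "src", "contrib", "PACKAGES"))

is_source_repo(mock_repo)                # folder with the expected index file
is_source_repo(tempfile("not_a_repo"))   # path that does not exist
```

In this vignette's workflow, the check would be is_source_repo("project_001_miniCRAN"). A FALSE result often means the zip file was created with an extra top-level folder, so the repository root sits one level deeper than expected.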

With the package repository downloaded locally, we will activate and load an renv environment with the activate() and load() functions from renv. This will create a local folder structure that serves as the project library into which packages from our local repository will be installed. All subsequent calls to library() will load packages from this local library. After doing this, we can see that our library paths have been changed to our new local library, as well as an additional temporary one that holds the base R packages associated with our current R version.

# Activate a new renv
renv::activate()
renv::load()

# Checking that our .libPaths() have changed to a renv library
.libPaths()

Finally, we need to restore our environment to the one described in the lockfile. While we have these packages downloaded in a local repository, they have not actually been installed. Using the restore() function from renv, we will supply the lockfile that describes the packages needed, along with the location of our local repository from which to download the packages.

repo_url <- paste0("file:///", normalizePath("project_001_miniCRAN", winslash = "/"))
renv::restore(repos = c(miniCRAN = repo_url))

We now have our local environment set up to recreate our original analysis. Calling library() on our original two packages (tictoc and ggplot2) will succeed, while calling it on a package that was not listed in our lockfile (curl, in this example) will return an error.

library(tictoc)
library(ggplot2)
library(curl)

Once you are done working in this analysis folder, you can use renv::deactivate() to reset your .libPaths() to your normal system library. This will retain your local renv library in the folder, allowing you to easily come back and continue work on the project at a future date by simply using renv::activate() again.