
Overview

While snapshotting a repository is much easier when planned in advance, you will likely have completed projects that still need to be archived. These might depend on older package versions that can be hard to source. Alternatively, you may want to create a package repository that contains only packages your organization has approved and validated. Using tested and validated package versions is crucial for reproducible and reliable analyses, and Integral serves as a key component in creating a reliable repository for future use. Once you have gone through the effort of creating a validated set of R packages, you will want to save them to Integral to give downstream users easy access to these reliable and tested versions. In this vignette we will cover how to build a repository with specific package versions, along with strategies to modify and update an already built local repository.

A useful tool for assessing the risks of various packages while building your organization's repository is the riskmetric package. More information about risk management and riskmetric can be found in the riskmetric package documentation.

Setup

In this example, we will be creating a repository for an analysis (R script) stored in Integral that was built on version 3.3.6 of the ggplot2 package. The R code uses the size argument of geom_line(), which was deprecated in version 3.4.0. Users of newer ggplot2 versions can still run the script, but it returns warnings in the console. Beyond the goal of having your analysis run with no warnings or errors, it is likely that this argument will eventually be removed altogether, meaning the analysis will no longer run on later versions. Because aligning all the dependencies of outdated packages in a new user's working environment can be tedious, we will create a repository with the last version of ggplot2 before the size argument was deprecated, to help maintain reproducibility for all future users.

To prepare this repository snapshot with the required R packages and versions we need, we will download our analysis script from Integral.

library(Certara.IntegralR)

integral_download(
  file = "R Package Repository - Demo/R Scripts/pk_conc_time_plot.R",
  path = getwd()
)

Accessing Old Packages from CRAN

The next step is to install the required version of ggplot2 so we can build a lockfile for future users. The Comprehensive R Archive Network (CRAN) is the main repository where R packages are stored and from which they are installed. Packages can be installed in two different ways:

  1. From “binary”. This is a ‘pre-compiled’ and ‘pre-built’ .zip (Windows) or .tgz (Mac) file of the package specific to your OS and R version. The contents of the binary file will be similar to the contents of the installed package folder inside your R library. Downloading binaries is the preferred option as the binary file simply needs to be extracted and moved inside your R library folder, e.g., .libPaths(), making installation significantly faster.
  2. From “source”. These files are saved as .tar.gz compressed folders and contain the raw source code of the package. Installing from source is slower because, after the .tar.gz is downloaded, the package must then be built on the user’s system. Further, if the R package calls C, C++, or Fortran code internally or requires external libraries that need linking (e.g., BLAS or LAPACK), additional compilation is needed and Rtools may be required on Windows. CRAN does not generally distribute precompiled binaries for Linux. Instead, Linux users typically build packages from source (.tar.gz), and the required system libraries must be installed separately from the terminal, e.g., sudo apt install libblas-dev liblapack-dev.
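To see which of the two routes your own R session would take by default, the following minimal sketch inspects base R settings only. The variable names are illustrative and not part of any package.

```r
# Package type native to this platform, e.g. "win.binary" or "source"
native_type <- .Platform$pkgType

# Package type that install.packages() uses by default in this session
default_type <- getOption("pkgType")

# TRUE when the session will look for binaries before falling back to source
prefers_binaries <- !identical(default_type, "source")

cat("Native package type: ", native_type, "\n")
cat("Default install type:", default_type, "\n")
cat("Prefers binaries:    ", prefers_binaries, "\n")

# Library folders into which installed packages are placed
print(.libPaths())
```

On Linux both values are normally "source", which is why snapshot repositories that serve Linux binaries can save considerable installation time.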

The most common way to install old package versions is by using the remotes package. With the install_version() function, you can install previous versions of packages from CRAN.

remotes::install_version("ggplot2", version = "3.3.6")

While this will work, it is possible that the version you select will need compilation from source. CRAN only maintains binaries for the newest package versions that were available at the time of each minor R release (e.g., 4.3, 4.4). Additionally, CRAN does not maintain binaries for any previous major R version (e.g., 3.0). In our case, the closest options for binaries are version 3.3.5 (R 4.0) or 3.4.2 (R 4.1), neither of which is the version we want.

Cases like this mean we would need to find a different repository than the central CRAN if we want this package binary. Establishing a repository of source files would be redundant for us since CRAN already hosts them. Additionally, it could force downstream users to compile the packages locally after installation, which can be a tedious process for re-running analyses and may lead to installation discrepancies or compilation issues.

Using CRAN Snapshots

Luckily, Posit (the creator of RStudio) has been taking various snapshots of the entire CRAN repository almost daily since 2017. These snapshots contain every package (source and binary) that was on CRAN at the time of snapshot, so while downloading the entire snapshot would be far too cumbersome for individual use, we can still download the specific packages we need.

Heading to https://packagemanager.posit.co/client/#/repos/cran/setup, we can follow the prompts to gain access to the specific repository required to download our packages. By selecting the date of November 3, 2022 (the most recent date before ggplot2 was updated to version 3.4.0), we are given a URL similar to the traditional CRAN one, but this is to the repository as it stood on that specific date. We will pass this URL to all repo arguments to ensure that we are pulling packages from the snapshot when ggplot2 was at the desired version, increasing the likelihood of finding binaries.

An image of the URL generated by the Posit Package Manager website.
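The snapshot URL follows a simple pattern: the base address followed by the date in YYYY-MM-DD form. As a small sketch, the helper below (snapshot_url() is a hypothetical name, not a function from any package) builds the URL from a date so that the same date can be reused consistently across calls.

```r
# Hypothetical helper: build a Posit Package Manager CRAN snapshot URL
# from a date. The URL pattern matches the one used in this vignette.
snapshot_url <- function(date) {
  date <- as.Date(date)  # validates the input and normalizes the format
  paste0("https://packagemanager.posit.co/cran/", format(date, "%Y-%m-%d"))
}

snapshot_2022 <- snapshot_url("2022-11-03")
print(snapshot_2022)
```

Passing a string that is not in a standard unambiguous date format makes as.Date() raise an error, which catches typos before any download is attempted.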

We will download any packages needed for our analysis from this new URL, to ensure compatibility and avoid introducing newer dependencies that could disrupt the consistency of our environment. The exception is renv, which we install from the traditional CRAN URL because newer versions include additional helper functions and the package itself has no dependencies outside of base R packages. Because we are using IntegralR in our workflow, we will also include the Certara repository URL when checking for dependencies.

CRAN_url <- "https://packagemanager.posit.co/cran/2022-11-03"

# Install necessary packages
install.packages("renv")
install.packages("ggplot2",  repos = CRAN_url)
install.packages("miniCRAN", repos = CRAN_url)
install.packages("Certara.IntegralR",
                 repos = c(Certara = "https://certara.jfrog.io/artifactory/certara-cran-release-public/", 
                           CRAN    = CRAN_url), 
                 method = "libcurl")

Create a Lockfile and Repo

With our CRAN repository URL, we can now start the process of creating a temporary repository and its associated lockfile, as demonstrated in the overview vignette. We will specify the repos we are using in the snapshot() function so they can be recorded in the lockfile.

# Create lockfile
renv::snapshot(repos = c(CRAN = CRAN_url,
                         Certara = "https://certara.jfrog.io/artifactory/certara-cran-release-public/"))

# Parse lockfile
lockfile <- renv::lockfile_read()

# Parse package names
pkgs <- names(lockfile$Packages)

# Get all recursive dependencies of the lockfile packages as of 2022-11-03
pkgList <- miniCRAN::pkgDep(pkgs, 
                            repos = c(CRAN = CRAN_url, 
                                      Certara = "https://certara.jfrog.io/artifactory/certara-cran-release-public/"), 
                            type = "source", suggests = FALSE)

# List of packages to be added to repository
pkgList
 [1] "Certara.IntegralR" "MASS"              "Matrix"            "R6"                "RColorBrewer"      
 [6] "askpass"           "cli"               "colorspace"        "curl"              "digest"            
[11] "fansi"             "farver"            "ggplot2"           "glue"              "gtable"            
[16] "httr2"             "isoband"           "jsonlite"          "labeling"          "lattice"           
[21] "lifecycle"         "magrittr"          "mgcv"              "munsell"           "nlme"              
[26] "openssl"           "pillar"            "pkgconfig"         "rappdirs"          "renv"              
[31] "rlang"             "scales"            "sys"               "tibble"            "utf8"             
[36] "uuid"              "vctrs"             "viridisLite"       "withr"

# Create temporary directory
repo_path <- file.path(tempdir(), "temp_repo")
dir.create(repo_path)

# Make a repository
miniCRAN::makeRepo(pkgList, 
                   path = repo_path, 
                   repos = c(CRAN = CRAN_url, 
                             Certara = "https://certara.jfrog.io/artifactory/certara-cran-release-public/"), 
                   type = c("source", "win.binary"))
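makeRepo() arranges the downloaded files in the standard CRAN directory layout, with one subfolder per package type. As a sketch of where each type is expected to land, the base function utils::contrib.url() returns the contrib location for a given repository root. The root used here is a placeholder, and the Windows binary folder depends on the R version of the running session.

```r
# Placeholder repository root; substitute the file:// URL of your own repo
repo_root <- "file:///path/to/temp_repo"

# Location of source packages (.tar.gz) within a CRAN-like repository
src_dir <- utils::contrib.url(repo_root, type = "source")

# Location of Windows binaries (.zip) for the running R minor version
win_dir <- utils::contrib.url(repo_root, type = "win.binary")

print(src_dir)
print(win_dir)
```

Because the binary folder is tied to an R minor version, downstream users on a different R version will not find the Windows binaries and will fall back to the source packages.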

Modifying Our Repo

Sometimes after creating a repository, you may realize that you need to add more packages or update one that exists in the repository. miniCRAN provides a few helper functions to assist with this.

Updating a Package Version

If we decide that one of the packages needs updating, we can use oldPackages() to return a list of the packages that have a newer version available. Once we have decided that we want one of those updated versions, we can use updatePackages(). This will update those packages to the newer version, replacing the previous one in our repo. The default behavior will interactively ask if you want to update each package individually, but adding the ask = FALSE argument can silence this and forcefully update all of them. These functions can only update one type at a time, so be sure to update all types equally to ensure your repository does not have different versions within it. For this case, we need to update the curl package to >= 5.0.1 to satisfy the requirements for IntegralR. We will use the first date available after version 5.0.1 was released.

# Check what packages have updates. Indexing [,1:3], as column 4 is just the repository URL
miniCRAN::oldPackages(path = repo_path, repos = "https://packagemanager.posit.co/cran/2023-06-08", type = "source")[,1:3]
miniCRAN::oldPackages(path = repo_path, repos = "https://packagemanager.posit.co/cran/2023-06-08", type = "win.binary")[,1:3]
            Package       LocalVer   ReposVer 
cli         "cli"         "3.4.1"    "3.6.1"  
colorspace  "colorspace"  "2.0-3"    "2.1-0"  
curl        "curl"        "4.3.3"    "5.0.1"  
digest      "digest"      "0.6.30"   "0.6.31" 
fansi       "fansi"       "1.0.3"    "1.0.4"  
ggplot2     "ggplot2"     "3.3.6"    "3.4.2"  
gtable      "gtable"      "0.3.1"    "0.3.3"  
httr2       "httr2"       "0.2.2"    "0.2.3"  
isoband     "isoband"     "0.2.6"    "0.2.7"  
jsonlite    "jsonlite"    "1.8.3"    "1.8.5"  
lattice     "lattice"     "0.20-45"  "0.21-8" 
MASS        "MASS"        "7.3-58.1" "7.3-60" 
Matrix      "Matrix"      "1.5-1"    "1.5-4.1"
mgcv        "mgcv"        "1.8-41"   "1.8-42" 
nlme        "nlme"        "3.1-160"  "3.1-162"
openssl     "openssl"     "2.0.4"    "2.0.6"  
pillar      "pillar"      "1.8.1"    "1.9.0"  
renv        "renv"        "0.16.0"   "0.17.3" 
rlang       "rlang"       "1.0.6"    "1.1.1"  
sys         "sys"         "3.4.1"    "3.4.2"  
tibble      "tibble"      "3.1.8"    "3.2.1"  
utf8        "utf8"        "1.2.2"    "1.2.3"  
vctrs       "vctrs"       "0.5.0"    "0.6.2"  
viridisLite "viridisLite" "0.4.1"    "0.4.2"

# Go ahead and update the curl package
miniCRAN::updatePackages(path = repo_path, oldPkgs = "curl", repos = "https://packagemanager.posit.co/cran/2023-06-08", type = "source", ask = FALSE)
miniCRAN::updatePackages(path = repo_path, oldPkgs = "curl", repos = "https://packagemanager.posit.co/cran/2023-06-08", type = "win.binary", ask = FALSE)
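Version strings should not be compared as plain text, since "10.0" sorts before "5.0" alphabetically. A minimal sketch of checking the curl requirement with base R's package_version(), using the version numbers from the table above:

```r
# Minimum curl version required by IntegralR, as stated in the text
required <- package_version("5.0.1")

# Versions taken from the oldPackages() output above
local_ver <- package_version("4.3.3")  # version currently in our repo
repo_ver  <- package_version("5.0.1")  # version in the 2023-06-08 snapshot

# Compare component by component rather than as character strings
local_ok <- local_ver >= required
repo_ok  <- repo_ver >= required

cat("Repo copy satisfies requirement:    ", local_ok, "\n")
cat("Snapshot copy satisfies requirement:", repo_ok, "\n")
```

The same comparison works for hyphenated versions such as "0.20-45", which package_version() treats as additional version components.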

While the repository is updated, our lockfile is not, as it is based on the versions we have in our local library. We will need to update our local version before we can update the lockfile. We will install curl from the same CRAN snapshot date that was used in making the repo.

# Update curl, using the newer CRAN snapshot date
install.packages("curl",  repos = "https://packagemanager.posit.co/cran/2023-06-08", dependencies = FALSE)

# Update lockfile
renv::snapshot(repos = c(CRAN = CRAN_url,
                         Certara = "https://certara.jfrog.io/artifactory/certara-cran-release-public/"))

Adding a New Package

We can also add entirely new packages to our repo using the addPackage() function. Here we will add the tictoc package in case downstream users want to run some benchmark tests on the analysis. The default behavior downloads all dependencies and suggested packages, so after confirming that tictoc has no dependencies we need, we will specify the deps = FALSE argument to prevent any unnecessary extra packages.

# Check for dependencies
miniCRAN::pkgDep("tictoc", repos = CRAN_url, type = "source", suggests = FALSE)

# Download and ignore dependencies/suggests
miniCRAN::addPackage("tictoc", 
                     path  = repo_path, 
                     repos = CRAN_url, 
                     deps  = FALSE)

Because we did not install this package locally and update our lockfile, downstream users will not automatically have the tictoc package installed when they restore their environment. However, it will be available in their local repository, allowing them to install the version we saved for them if they wish.

Using the Repository

At this stage, we would zip the repository and upload it to Integral as demonstrated in the overview vignette. The repository is now set up for future users to easily replicate the analysis. A downstream user would first want to move to a clean analysis folder and download the necessary lockfile, repository, and analysis script from Integral, making sure to unzip the repository into their working directory.

library(Certara.IntegralR)

# Download the repository and lockfile to the new folder
Certara.IntegralR::integral_download(
    file = c("R Package Repository - Demo/R Scripts/pk_conc_time_plot.R",
             "R Package Repository - Demo/Repository/renv.lock", 
             "R Package Repository - Demo/Repository/repo.zip"),
    path = getwd()
)

They will then want to activate and load their renv project before restoring it with the lockfile. Make sure the repos argument points to the newly unzipped repository.

# Activate and load renv
renv::activate()
renv::load()

# Restore renv from lockfile
renv::restore(repos = "repo")
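Base R tools such as install.packages() and available.packages() expect a local repository to be given as a file:// URL rather than a bare folder name, and an explicit URL may also be more robust with renv. The sketch below builds such a URL from a folder path. local_repo_url() is a hypothetical helper, and the temporary folder stands in for the unzipped repository.

```r
# Hypothetical helper: convert a local folder path into a file:// URL
local_repo_url <- function(path) {
  abs_path <- normalizePath(path, winslash = "/", mustWork = TRUE)
  # Unix paths already begin with "/"; Windows paths begin with a drive letter
  prefix <- if (startsWith(abs_path, "/")) "file://" else "file:///"
  paste0(prefix, abs_path)
}

# Stand-in for the unzipped repository folder
demo_repo <- file.path(tempdir(), "repo")
dir.create(demo_repo, showWarnings = FALSE)

repo_url <- local_repo_url(demo_repo)
print(repo_url)

# Assumed usage, not run here:
# renv::restore(repos = c(Local = repo_url))
```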

With everything set up, they should now be able to execute the script and successfully recreate the figure using the deprecated arguments that only work in ggplot2 version 3.3.6 and earlier.

library(Certara.IntegralR)
library(ggplot2)

pkData <- Certara.IntegralR::integral_read(FUN = read.csv,
                                           file = "R Package Repository - Demo/Data/pkData.csv")

# Create plot of concentration vs. time
pkData$Subject <- as.factor(pkData$Subject)

pkData |>
  ggplot(aes(x = Act_Time, y = Conc, group = Subject, color = Subject)) +
  scale_y_log10() +
  geom_line(size = 0.5) +   # The size arg results in a warning in ggplot2 >= 3.4.0
  geom_point() +
  ylab("Drug Concentration \n at the central compartment")

A plot of actual time vs drug concentration.

Troubleshooting

Mismatched Repo and Lockfile

If restoring from your lockfile results in errors that the specified versions cannot be found, it is likely that your local package library did not contain the package versions you intended when you created the lockfile. renv::snapshot() checks for required packages that are called in R scripts, but the versions it stores are the ones that were present in your local library. It is common for local versions to be newer than those obtained using a CRAN snapshot URL. The easiest way to solve this is to erase the entire local library and start from scratch, installing only the packages required for the scripts to run, and sourcing them all from the SAME repository that you are downloading from in your makeRepo() call. Additionally, make sure you are not accidentally downloading any packages outside of the required dependencies. Recursive dependencies can become complex, so you only want to download the bare minimum to reproduce the analysis.
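A quick way to locate such mismatches is to compare the versions recorded in the lockfile with the versions held in the repository. The sketch below uses small mock vectors so that it runs on its own. The comments show how the real values could be obtained, on the assumption that each lockfile entry carries a Version field and that the repository is reachable through a file:// URL.

```r
# Mock data standing in for the real values, which could be obtained with:
#   lock_versions <- vapply(renv::lockfile_read()$Packages,
#                           function(p) p$Version, character(1))
#   repo_versions <- available.packages(repos = "<file:// URL of repo>")[, "Version"]
lock_versions <- c(ggplot2 = "3.4.2", curl = "5.0.1", rlang = "1.0.6", withr = "2.5.0")
repo_versions <- c(ggplot2 = "3.3.6", curl = "5.0.1", rlang = "1.0.6")

# Packages in the lockfile that the repository does not contain at all
missing_pkgs <- setdiff(names(lock_versions), names(repo_versions))

# Packages present in both, but with differing versions
common     <- intersect(names(lock_versions), names(repo_versions))
mismatched <- common[lock_versions[common] != repo_versions[common]]

cat("Missing from repo:  ", missing_pkgs, "\n")
cat("Version mismatches: ", mismatched, "\n")
```

Any package reported here would cause renv::restore() to fail against the local repository, so it is worth running a check like this before uploading the repository and lockfile.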

Duplicate Packages in Repo

The default behavior of addPackage() will add duplicate versions of dependency packages to your repo, leaving it in an inconsistent state. The easiest way to avoid this is to add the deps = FALSE argument to the addPackage() function, and download each needed package on its own. This can become very complicated when the list of recursive dependencies is large, and is the reason why setting up the repo correctly in a single call to makeRepo(), ideally with the newest version of packages, is the cleanest and simplest way to build your repository. If you realize you need a new package and are encountering issues with adding it to the miniCRAN repo, it may be worth restarting and making sure you include it in the first call to makeRepo().

If you encounter this issue and choose to move forward, you will need to manually check the repository for duplicates and remove them before updating the PACKAGES index.

Below is an example of how this may happen, and how to resolve it.

# List packages
pkgs <- miniCRAN::pkgDep("askpass", repos = CRAN_url, type = "source", suggests = FALSE)

# Make a repository
my_dir <- tempdir()
miniCRAN::makeRepo(pkgs, 
                   path  = my_dir, 
                   repos = c(CRAN = CRAN_url), 
                   type  = c("source", "win.binary"))

# Add a new package from the current CRAN
# Not specifying deps = FALSE will add some newer versions of dependencies
miniCRAN::addPackage("openssl", 
                     path  = my_dir, 
                     repos = "https://cloud.r-project.org")

# List packages, and return warning if there are duplicates
src_pkgs <- miniCRAN::checkVersions(pkgList, path = my_dir, type = "source")
bin_pkgs <- miniCRAN::checkVersions(pkgList, path = my_dir, type = "win.binary")

We get the following warning returned when checking our source packages, but not the binaries.

Warning message:
Duplicate package(s): askpass, sys 

We then need to manually index the source packages and remove the duplicates.

# After inspecting package versions, remove old versions
basename(src_pkgs[[1]])
 [1] "R6_2.5.1.tar.gz"        "askpass_1.1.tar.gz"     "askpass_1.2.1.tar.gz"   "cli_3.6.3.tar.gz"       "curl_6.0.1.tar.gz"
 [6] "digest_0.6.37.tar.gz"   "glue_1.8.0.tar.gz"      "jsonlite_1.8.9.tar.gz"  "lifecycle_1.0.4.tar.gz" "magrittr_2.0.3.tar.gz"
[11] "openssl_2.3.0.tar.gz"   "rappdirs_0.3.3.tar.gz"  "rlang_1.1.4.tar.gz"     "sys_3.4.1.tar.gz"       "sys_3.4.3.tar.gz"       
[16] "withr_3.0.2.tar.gz"

# Remove the packages by index. Alternatively, you can supply the full file name
file.remove(src_pkgs[[1]][c(2, 14)])

# Update PACKAGES index of the example repository created in my_dir
miniCRAN::updateRepoIndex(my_dir, type = c("source", "win.binary"))
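For larger repositories, picking the outdated files by eye is error-prone. A minimal base R sketch, assuming source file names follow the name_version.tar.gz pattern, that identifies every file that is not the newest version of its package. The file names are a subset of the listing above.

```r
# Subset of the source file names from the listing above
files <- c("R6_2.5.1.tar.gz", "askpass_1.1.tar.gz", "askpass_1.2.1.tar.gz",
           "openssl_2.3.0.tar.gz", "sys_3.4.1.tar.gz", "sys_3.4.3.tar.gz")

# Split "name_version.tar.gz" into package name and version
stem <- sub("\\.tar\\.gz$", "", files)
pkg  <- sub("_.*$", "", stem)        # text before the first underscore
ver  <- sub("^[^_]*_", "", stem)     # text after the first underscore

# For each duplicated package, flag every file that is not the newest version
outdated <- character(0)
for (p in unique(pkg[duplicated(pkg)])) {
  idx <- which(pkg == p)
  v   <- package_version(ver[idx])
  outdated <- c(outdated, files[idx[v != max(v)]])
}

print(outdated)
```

With real paths in place of the bare file names, the resulting vector can be passed to file.remove() before the PACKAGES index is rebuilt.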