
Overview

A savepoint folder contains one or more files that are linked to one or more dependency files in a separate folder. Typically, files in a savepoint folder are scripts or some output from an analysis that is dependent on some other file(s), such as the raw source data.

When a dependency file is updated or revised, the files in the savepoint folder are considered ‘out of date’, and users are notified that these downstream files should be updated.

Savepoint folders have revision numbers. Individual files in savepoint folders can also have unique revision numbers, independent from the savepoint folder revision number.
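
The relationship between dependency revisions, folder revisions, and file revisions can be pictured with a small toy model. This is only an illustrative sketch in plain R: the list fields (`revision`, `file_revisions`, `built_against`) and the `is_out_of_date()` helper are hypothetical and are not part of the package or of how Integral stores this information.

```r
# Toy model only: plain R lists standing in for Integral's own bookkeeping.
# Every field name below is hypothetical.
dependency <- list(path = "R-Integral/Data/mtcars.csv", revision = 1L)

savepoint <- list(
  name = "Figures",
  revision = 1L,                                   # revision of the folder itself
  file_revisions = c("mtcars_mpg_vs_wt.png" = 1L,  # per-file revisions, tracked
                     "mtcars_mpg_vs_hp.png" = 1L), # separately from the folder
  built_against = 1L                               # dependency revision the files were made from
)

# A savepoint is stale when its dependency has moved past the revision
# the savepoint files were built against.
is_out_of_date <- function(savepoint, dependency) {
  dependency$revision > savepoint$built_against
}

is_out_of_date(savepoint, dependency)  # FALSE

dependency$revision <- 2L              # the dependency file is revised
is_out_of_date(savepoint, dependency)  # TRUE
```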

Setup

Before creating a savepoint folder, confirm that the root folder contains one or more savepoint ‘container’ folders.

folders <- get_children_of_folder(path = "R-Integral")

print(folders)

In the printed output, note that the R-Integral root folder contains a savepoint container folder named Analyses.

Next, identify a source data set to use for the analysis. This example uses the mtcars data set, which is saved to a location in a remote Integral repository.

data <- integral_read(FUN = read.csv,
                      file = "R-Integral/Data/mtcars.csv")

Perform some Exploratory Data Analysis (EDA) on the data, creating a few scatter plots and saving them to the working directory.

png(file = "mtcars_mpg_vs_wt.png", width = 800, height = 600)
plot(data$wt, data$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles/(US) gallon",
     main = "Vars: MPG vs. Weight",
     pch = 19)
dev.off() 

png(file = "mtcars_mpg_vs_hp.png", width = 800, height = 600)
plot(data$hp, data$mpg,
     xlab = "Horsepower",
     ylab = "Miles/(US) gallon",
     main = "Vars: MPG vs. Horsepower",
     pch = 19)
dev.off() 

Plot of weight vs miles-per-gallon from the mtcars dataset.

Plot of horsepower vs miles-per-gallon from the mtcars dataset.
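
Before uploading, it can be useful to confirm that the plot files were actually written. The following is a minimal sketch using only base R; `find_missing_files()` is a hypothetical helper introduced here for illustration and is not part of the package.

```r
# Hypothetical helper: return the paths that are missing or empty.
find_missing_files <- function(paths) {
  sizes <- file.size(paths)         # NA for files that do not exist
  paths[is.na(sizes) | sizes == 0]
}

plot_files <- c("mtcars_mpg_vs_wt.png", "mtcars_mpg_vs_hp.png")
missing_files <- find_missing_files(plot_files)

if (length(missing_files) > 0) {
  warning("Plot files missing or empty: ",
          paste(missing_files, collapse = ", "))
}
```

The check treats zero-byte files as missing, since a graphics device that was opened but never drawn to or closed can leave an empty file behind.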

Usage

Create new savepoint

Now that a few output files are available, based on the dependency data, the create_savepoint() function can be used to create a new savepoint.

The name argument is the name of the new savepoint folder, which is created as a subfolder of path (the Integral path to an existing savepoint container folder). The file argument is a character vector of paths to the local files to upload to the folder. The dependencies argument is a character vector of paths to one or more dependency files in the remote Integral repository. The type argument defines the type of savepoint folder, which users can configure within Integral. The values argument passes metadata with the savepoint as named pairs of the form “attribute” = “value”. Some of these attributes are required: see ?get_folder_type_attributes to check which attributes are required and ?get_folder_type_values for the permissible metadata.

create_savepoint(
  name = "Figures",
  file = c("mtcars_mpg_vs_hp.png", "mtcars_mpg_vs_wt.png"),
  path = "R-Integral/Analyses",
  dependencies = "R-Integral/Data/mtcars.csv",
  type = "Analysis",
  values = c("Analysis Type" = "Client Application",
             "Tool Version" = "R 4.3.1",
             "Status" = "Draft",
             "Description" = "initial plot upload")
)
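
Because some attributes are required, the values vector can be checked locally before the call is made. The sketch below uses only base R; the `required_attributes` vector is a hypothetical stand-in, and the real set of required names should come from the package documentation referenced above.

```r
# Hypothetical set of required attribute names, for illustration only.
required_attributes <- c("Analysis Type", "Status")

values <- c("Analysis Type" = "Client Application",
            "Tool Version" = "R 4.3.1",
            "Status" = "Draft",
            "Description" = "initial plot upload")

# Names that are required but absent from the values vector.
missing_attributes <- setdiff(required_attributes, names(values))

if (length(missing_attributes) > 0) {
  stop("Missing required attributes: ",
       paste(missing_attributes, collapse = ", "))
}
```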

Update existing savepoint

Next, make some changes to the data set and re-upload it to Integral, which causes the R-Integral/Analyses/Figures savepoint folder to become out of date. The plot files in the savepoint folder can then be revised based on the new data.

Import the data and rescale the mpg column by a random factor:

data <- integral_read(FUN = read.csv,
                      file = "R-Integral/Data/mtcars.csv")
                      
data[, "mpg"] <- as.numeric(data[, "mpg"]) * runif(1)
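
Since runif(1) draws a new random factor on every run, the modified data will differ each time the example is executed. If a reproducible result is preferred, one option is to fix the random seed first. The sketch below uses the built-in mtcars data set rather than the remote file, and the seed value and the `scaled_data` and `scale_factor` names are arbitrary choices made for illustration.

```r
set.seed(42)                 # any fixed seed works; 42 is arbitrary
scale_factor <- runif(1)     # same value on every run with this seed

scaled_data <- mtcars        # built-in copy, used here instead of the remote file
scaled_data[, "mpg"] <- as.numeric(scaled_data[, "mpg"]) * scale_factor
```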

Next, save the data back to Integral as a new revision to the existing mtcars.csv file.

integral_write(data, FUN = write.csv, row.names = FALSE, path = "R-Integral/Data/mtcars.csv")

Note that the Figures savepoint folder has now turned red, indicating that the contents of the folder are out of date.

Image of an out of date Integral folder.

Now, recreate the plots with the updated data:

png(file = "mtcars_mpg_vs_wt.png", width = 800, height = 600)
plot(data$wt, data$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles/(US) gallon",
     main = "Vars: MPG vs. Weight",
     pch = 19)
dev.off() 

png(file = "mtcars_mpg_vs_hp.png", width = 800, height = 600)
plot(data$hp, data$mpg,
     xlab = "Horsepower",
     ylab = "Miles/(US) gallon",
     main = "Vars: MPG vs. Horsepower",
     pch = 19)
dev.off() 
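
The plotting code above repeats the code from the Setup section. One way to avoid the duplication is to wrap it in a small function and call it whenever the data changes. This is a sketch under stated assumptions: `save_scatter()` is a hypothetical helper, and the example writes to a temporary directory using the built-in mtcars data set.

```r
# Hypothetical helper: draw one scatter plot and save it as a PNG file.
save_scatter <- function(data, x, y, xlab, ylab, main, file,
                         width = 800, height = 600) {
  png(filename = file, width = width, height = height)
  on.exit(dev.off(), add = TRUE)  # close the device even if plot() fails
  plot(data[[x]], data[[y]],
       xlab = xlab, ylab = ylab, main = main, pch = 19)
  invisible(file)                 # return the path for later use
}

out_dir <- tempdir()

wt_file <- save_scatter(mtcars, x = "wt", y = "mpg",
                        xlab = "Weight (1000 lbs)",
                        ylab = "Miles/(US) gallon",
                        main = "Vars: MPG vs. Weight",
                        file = file.path(out_dir, "mtcars_mpg_vs_wt.png"))

hp_file <- save_scatter(mtcars, x = "hp", y = "mpg",
                        xlab = "Horsepower",
                        ylab = "Miles/(US) gallon",
                        main = "Vars: MPG vs. Horsepower",
                        file = file.path(out_dir, "mtcars_mpg_vs_hp.png"))
```

The returned paths can then be collected into the character vector passed as the file argument.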

Finally, use the function update_savepoint() to update the files in the existing savepoint folder. The arguments are similar to create_savepoint(), with the only major difference being the addition of a reason argument.

update_savepoint(
  name = "Figures",
  file = c("mtcars_mpg_vs_hp.png", "mtcars_mpg_vs_wt.png"),
  path = "R-Integral/Analyses",
  dependencies = "R-Integral/Data/mtcars.csv",
  reason = "update plots",
  values = c(
    "Analysis Type" = "Client Application",
    "Tool Version" = "R 4.3.1",
    "Status" = "Draft",
    "Description" = "test plot upload"
  )
)