library(Certara.RDarwin)

RDarwin is an R package that makes it easier to use pyDarwin with the Certara NLME pharmacometric modeling engine from the R command line. pyDarwin applies machine learning algorithms to model selection. This vignette shows how RDarwin helps you set up and run model selection projects.

Table of Contents

  • Introduction: Understand the basic concepts of machine learning model selection and the role of pyDarwin in this process.

  • Creating the Template and Tokens Files: Learn how to create the essential template.txt and tokens.json files using RDarwin, which form the foundation of your model selection project.

  • Generating pyDarwin Options: Discover how to create the options.json file, which is crucial for configuring your pyDarwin project to meet your specific requirements.

  • Running the Model Search with pyDarwin: Explore how to execute a model search with pyDarwin using the run_pyDarwin() function.

Introduction

Machine learning algorithms play a crucial role in model selection, particularly in the field of pharmacokinetics. Model selection involves identifying the most appropriate model from a range of candidate models. Machine learning itself is usually divided into two types: supervised and unsupervised learning.

  • Supervised Learning: In supervised learning, algorithms learn to associate patterns with labeled examples. For example, given a dataset of cat and dog images, an artificial neural network (ANN) can learn the patterns associated with “cat” and “dog.” It can then predict whether new images are cats or dogs.

  • Unsupervised Learning: Unsupervised learning, on the other hand, lacks labeled training data. It’s akin to exploring an unknown landscape. For instance, traditional PK/PD model selection doesn’t have a labeled training dataset. Each dataset contributes to the learning process, and the algorithm must discover relationships across different datasets.

pyDarwin applies machine learning techniques to the model selection process. It lets you define how model goodness is scored, with options for penalties and for executing custom code after a run.

RDarwin serves as a bridge between the R environment and pyDarwin, enabling you to create, manage, and execute model selection projects. Make sure you have Python and pyDarwin installed. Note that pyDarwin calls the Certara.RsNLME package, so that package must also be installed with all of its dependencies.

Data

The data set used in this example is timeVaryingCovariates.csv. Download it to your working directory before running the code below.

Creating the Template and Tokens Files

To use pyDarwin effectively, you need to create specific files that define the structure and parameters of your model search space. RDarwin assists you in generating two essential files: template.txt and tokens.json.

Template File (template.txt)

The template.txt file provides a foundational shell for NLME (Nonlinear Mixed Effects) metamodel files. This template sets the structure for your model search, defining the parameters to be considered during the selection process.

Tokens File (tokens.json)

The tokens.json file describes the dimensions of your model search space and the available options within each dimension. It provides a structured representation of your search space, including the characteristics of your models and their various attributes.
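To make the idea of dimensions and options concrete, the following base R sketch counts and enumerates the candidate models of a small token space. The token names mirror the tokens.json produced later in this vignette, but the in-memory list layout and the variable names (tokens, optionCounts, searchSpace) are assumptions made for illustration and are not part of the RDarwin API.

```r
# Hypothetical in-memory picture of a token space: each token (dimension)
# holds a list of options, and each option is a set of text fragments.
tokens <- list(
  Cl_scr = list(c("", ""), c(" + scr*dCldscr", "fixef(dCldscr= c(, 0, ))")),
  V_scr  = list(c("", ""), c(" + scr*dVdscr", "fixef(dVdscr= c(, 0, ))"))
)

# Number of options available in each dimension
optionCounts <- lengths(tokens)

# The size of the search space is the product of the option counts
nModels <- prod(optionCounts)

# Enumerate every combination of option indices, one row per candidate model
searchSpace <- expand.grid(lapply(optionCounts, seq_len))

print(optionCounts)
print(nModels)
print(searchSpace)
```

Two dimensions with two options each give 2 x 2 = 4 candidate models, which is small enough for the exhaustive search (algorithm = "EX") configured later in this vignette.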

RDarwin provides two key functions to facilitate the creation of these files:

  • get_PMLParametersSets: This function allows you to specify general PK model structures to be included in the search space. You can define model structures such as the number of compartments, absorption type, parameterization type, and more.
# Create a search space with a one-compartment model and clearance
# parameterization. The closed-form (analytical) solution is not used because
# an interpolated covariate is added below.
# Additional search space modifications are made through the ellipsis arguments:
# the residual error is changed to Additive with LLOQ set to 0.1,
# the type of the structural parameters is changed to 'LogNormal2', and
# a covariate scr is added to be searched on the structural parameters.
modelPMLCodes <- get_PMLParametersSets(
  CompartmentsNumber = 1,
  Parameterization = "Clearance",
  ClosedForm = FALSE,
  CObs = Observation(
    ObservationName = "CObs",
    SigmasChosen = list(Additive = 0.02),
    BQL = TRUE,
    BQLValue = 0.1
  ),
  V = StParm(StParmName = "V", Type = "LogNormal2"),
  Cl = StParm(StParmName = "Cl", Type = "LogNormal2"),
  scr = Covariate(
    Name = "scr",
    Direction = "Interpolate",
    State = "Searched"
  )
)

It is recommended to modify the search using modify_ (e.g., modify_StParm()) and add_ (e.g., add_Covariate()) families of tidy functions since this approach is more general and more flexible.

  • write_ModelTemplateTokens: This function generates and writes both the template.txt and tokens.json files based on your inputs. It streamlines the process and ensures the correct structure of these essential files.
TemplateFilePath <- file.path(tempdir(), "template.txt")
TokensFilePath <- file.path(tempdir(), "tokens.json")
# the function will return the text written to TemplateFilePath
# Note that `AMT` is used as a model term in DataMapping.
# AMT is resolved to the main dosepoint name in the model
# (A1 for Absorption == "Intravenous" or "Gamma", Aa for 
# Absorption == "Extravascular")
generatedOutput <-
  write_ModelTemplateTokens(
    TemplateFilePath = TemplateFilePath,
    TokensFilePath = TokensFilePath,
    Description = "searchCov",
    Author = "Certara",
    DataFilePath = "timeVaryingCovariates.csv",
    DataMapping = c(
      id = "id",
      time = "time",
      AMT = "dose",
      CObs = "dv",
      scr = "scr"
    ),
    PMLParametersSets = modelPMLCodes,
    EstArgs = specify_EngineParams() # default estimation arguments
  )
#> information stored in C:\Users\jcraig\AppData\Local\Temp\Rtmp8UKsqK/template.txt and C:\Users\jcraig\AppData\Local\Temp\Rtmp8UKsqK/tokens.json

# check tokens file
cat("tokens.json:", readLines(TokensFilePath), sep = "\n")
#> tokens.json:
#> {
#>     "Cl_scr": [
#>         [
#>             "",
#>             ""
#>         ],
#>         [
#>             " + scr*dCldscr",
#>             "fixef(dCldscr= c(, 0, ))"
#>         ]
#>     ],
#>     "V_scr": [
#>         [
#>             "",
#>             ""
#>         ],
#>         [
#>             " + scr*dVdscr",
#>             "fixef(dVdscr= c(, 0, ))"
#>         ]
#>     ]
#> }

# check template file
cat("template.txt:", readLines(TemplateFilePath), sep = "\n")
#> template.txt:
#> ##Description: searchCov
#> ##Author: Certara
#> ##DATA {data_dir}/timeVaryingCovariates.csv
#> ##MAP  scr=scr  A1 = dose CObs = dv id = id time = time
#> ##MODEL test() {
#>  deriv(A1 = -Cl * C)
#>  C = A1 / V
#>  dosepoint(A1, idosevar = A1Dose, infdosevar = A1InfDose, infratevar = A1InfRate)
#>  error(CEps = 0.02)
#>  observe(CObs = C + CEps, bql=0.1)
#>  interpolate(scr)
#>  stparm(Cl = exp( tvCl {Cl_scr[1]} + nCl ))
#>  fixef(tvCl= c(, 1, ))
#>  {Cl_scr[2]}
#>  ranef(diag(nCl) = c(1))
#>  stparm(V = exp( tvV {V_scr[1]} + nV ))
#>  fixef(tvV= c(, 1, ))
#>  {V_scr[2]}
#>  ranef(diag(nV) = c(1))
#> 
#> }
#> ##ESTARGS
#>  sort=FALSE
#> ##TABLES

Generating pyDarwin Options

The options.json file is the configuration file for your pyDarwin project. Its parameters include the choice of optimization algorithm, the optimization settings, and the directories used for model execution. RDarwin provides functions to create these options and write them to a JSON file.

  • create_pyDarwinOptions: This function generates a list of parameters that configure your pyDarwin run. The options it covers include the optimization algorithm, the parallel-process settings, and the project directories.

  • write_pyDarwinOptions: This function takes a list of pyDarwin options and writes them to the options.json file. You can generate these options using the create_pyDarwinOptions function or customize them manually.

# see the pyDarwin documentation for an explanation of aliases such as {project_dir}
workingDir <- "{project_dir}/Results"
outputDir <- "{project_dir}/Results/output"
tempDir <- "{project_dir}/Results/temp"

# Option setup
# RsNLME is expected to be installed
optionSetup <- create_pyDarwinOptions(
  algorithm = "EX",
  engine_adapter = "nlme",
  nlme_dir = Sys.getenv("INSTALLDIR"),
  working_dir = workingDir,
  output_dir = outputDir,
  temp_dir = tempDir,
  gcc_dir = Sys.getenv("NLMEGCCDir64")
)

# Generate option file
write_pyDarwinOptions(pyDarwinOptions = optionSetup,
                      file = file.path(tempdir(), "options.json"))
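The option setup above reads the NLME installation and compiler locations from the environment variables INSTALLDIR and NLMEGCCDir64. Sys.getenv() returns an empty string for a variable that is not set, so a missing value would pass silently into options.json. A minimal base R check, using a hypothetical helper named find_unset_env(), could look like this:

```r
# Hypothetical helper: return the names of environment variables that are
# unset or empty.
find_unset_env <- function(varNames) {
  values <- Sys.getenv(varNames, unset = "")
  varNames[!nzchar(values)]
}

# Variables read by the option setup in this vignette
missingVars <- find_unset_env(c("INSTALLDIR", "NLMEGCCDir64"))

if (length(missingVars) > 0) {
  message("Unset environment variables: ", paste(missingVars, collapse = ", "))
} else {
  message("All required environment variables are set.")
}
```

Running this check before create_pyDarwinOptions() makes it easier to spot a configuration problem before the search starts.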

Running the Model Search with pyDarwin

To run the pyDarwin model search, you can use the run_pyDarwin() function. It requires specifying the path to the Python interpreter executable and the paths to the template, tokens, and options files.

# Example of using run_pyDarwin
result <- run_pyDarwin(
  InterpreterPath = "~/darwin/venv/bin/python",
  DirectoryPath = tempdir(),
  TemplatePath = "template.txt",
  TokensPath = "tokens.json",
  OptionsPath = "options.json",
  Wait = TRUE
)

The run_pyDarwin() function will launch the search process and monitor its progress. Depending on the value of the Wait argument, it will either wait for the search to complete and return the results or exit immediately, providing the location of the messages.txt file where the raw output is stored.

Together, the arguments of run_pyDarwin() and the settings in options.json let you tailor the model search to the needs of your project.
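Before launching a search, it can help to confirm that the three input files are where run_pyDarwin() will look for them. The sketch below assumes, as the example call above suggests, that relative file names are resolved against DirectoryPath. The helper find_missing_inputs() is hypothetical and uses only base R.

```r
# Hypothetical helper: return the input files that are missing from the
# project directory.
find_missing_inputs <- function(directoryPath, fileNames) {
  fullPaths <- file.path(directoryPath, fileNames)
  fileNames[!file.exists(fullPaths)]
}

# Same directory and file names as in the run_pyDarwin() example
projectDir <- tempdir()
inputFiles <- c("template.txt", "tokens.json", "options.json")

missingFiles <- find_missing_inputs(projectDir, inputFiles)

if (length(missingFiles) > 0) {
  message("Missing input files: ", paste(missingFiles, collapse = ", "))
} else {
  message("All input files found in ", projectDir)
}
```

If any file is reported as missing, rerun write_ModelTemplateTokens() or write_pyDarwinOptions() with paths that point into the same directory.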

Conclusion

RDarwin simplifies the process of creating and managing model selection projects using pyDarwin. Whether you’re a pharmacometrician or a data scientist, this package helps you leverage machine learning for making informed model selection decisions.