RDarwin: An R Interface to pyDarwin
RDarwin_NLME_Overview.Rmd
RDarwin is an R package that makes it easier to use pyDarwin with the Certara NLME pharmacometric modeling engine from the R command line. pyDarwin is a tool that applies machine learning algorithms to model selection. This vignette provides an overview of how RDarwin can assist you in creating your model selection projects.
Table of Contents
- Introduction: Understand the basic concepts of machine learning model selection and the role of pyDarwin in this process.
- Creating the Template and Tokens Files: Learn how to create the essential template.txt and tokens.json files using RDarwin, which form the foundation of your model selection project.
- Generating pyDarwin Options: Discover how to create the options.json file, which is crucial for configuring your pyDarwin project to meet your specific requirements.
- Running the Model Search with pyDarwin: Explore how to execute a model search with pyDarwin using the run_pyDarwin() function.
Introduction
Machine learning algorithms play a crucial role in model selection, particularly in the field of pharmacokinetics. Model selection involves identifying the most appropriate model from a range of candidate models.
pyDarwin applies machine learning techniques to the model selection process. It lets you define how the goodness of each candidate model is scored, with options for penalties, and supports custom code execution after a run.
RDarwin serves as a bridge between the R environment and pyDarwin, enabling you to create, manage, and execute model selection projects from R. Make sure that Python and pyDarwin are installed. Note also that pyDarwin calls the Certara.RsNLME package, so that package must be installed along with all of its dependencies.
Data
RDarwin’s PK modeling functionalities expect data in a specific format. A typical dataset would include the following columns:
- id: Individual subject identifier.
- time: Time of observation.
- dose: Administered dose. Note that the AMT model term can be used to map the main dosepoint term in the model.
- dv: Observed dependent variable.
- scr: Serum creatinine (or other covariate) (optional).
WorkingDir <- tempdir()
DataFilePath <- file.path(WorkingDir, "timeVaryingCovariates.csv")
# This data file is made available in the package examples folder,
# however, we will copy it to our current working directory for ease of access
file.copy(system.file(package = "Certara.RDarwin", "examples", "timeVaryingCovariates.csv"),
          DataFilePath, overwrite = TRUE)
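Before building a search, it can be useful to confirm that a dataset contains the columns listed above. The following is a minimal sketch in base R. The helper missing_columns() and the toy data frame are illustrative assumptions and are not part of RDarwin.

```r
# Toy dataset with the columns described above (values are made up for illustration).
pk_data <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  time = c(0, 1, 4, 0, 1, 4),
  dose = c(100, 0, 0, 150, 0, 0),
  dv   = c(NA, 8.2, 3.1, NA, 11.5, 4.9),
  scr  = c(0.9, 0.9, 1.0, 1.2, 1.2, 1.1)
)

# Hypothetical helper: return the required column names that are absent from `data`.
# scr is left out of the defaults because it is optional.
missing_columns <- function(data, required = c("id", "time", "dose", "dv")) {
  setdiff(required, names(data))
}

missing_cols <- missing_columns(pk_data)
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}
```

The same check could be applied to the result of read.csv(DataFilePath) once the example file has been copied.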
Creating the Template and Tokens Files
To use pyDarwin effectively, you need to create specific files that define the structure and parameters of your model search space. RDarwin assists you in generating two essential files: template.txt and tokens.json.
Template File (template.txt)
The template.txt file provides a foundational shell for NLME (Nonlinear Mixed Effects) metamodel files. This template sets the structure for your model search, defining the parameters to be considered during the selection process.
Tokens File (tokens.json)
The tokens.json file describes the dimensions of your model search space and the available options within each dimension. It provides a structured representation of your search space, including the characteristics of your models and their various attributes.
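To make the idea of dimensions and options concrete, the sketch below counts and enumerates the candidates in a small search space using base R. The dimension names and option labels are hypothetical and chosen only for illustration; they are not read from a real tokens.json.

```r
# Hypothetical search dimensions and their options (labels only, for illustration).
search_dimensions <- list(
  absorption   = c("Intravenous", "Extravascular"),
  compartments = c(1, 2, 3),
  Cl_scr       = c("no scr effect", "scr effect on Cl")
)

# Number of options available in each dimension.
options_per_dimension <- lengths(search_dimensions)

# Every candidate picks one option per dimension, so the size of the
# search space is the product of the option counts.
search_space_size <- prod(options_per_dimension)

# Enumerate all candidate combinations, one row per candidate.
candidates <- expand.grid(search_dimensions, stringsAsFactors = FALSE)
```

Here search_space_size equals nrow(candidates). The count grows multiplicatively with every added dimension, which is why large spaces are usually searched heuristically rather than exhaustively.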
RDarwin provides two key functions to facilitate the creation of these files:
- create_ModelPK: This function allows you to specify general PK model structures to be included in the search space. You can define model structures such as the number of compartments, absorption type, parameterization type, and more.
# Create a search space with a 1-compartment model and clearance parameterization.
# The closed-form (analytical) solution is not used because an interpolated
# covariate is added later.
# Additional search space modifications are made through the ellipsis arguments:
# the residual error is changed to Additive and the LLOQ is set to 0.1,
# the type of the structural parameters is changed to 'LogNormal2', and
# a covariate scr is added to be searched on the structural parameters.
modelPMLCodes <- create_ModelPK(
  CompartmentsNumber = 1,
  Parameterization = "Clearance",
  ClosedForm = FALSE,
  CObs = Observation(
    ObservationName = "CObs",
    SigmasChosen = list(Additive = 0.02),
    BQL = TRUE,
    BQLValue = 0.1
  ),
  V = StParm(StParmName = "V", Type = "LogNormal2"),
  Cl = StParm(StParmName = "Cl", Type = "LogNormal2"),
  scr = Covariate(
    Name = "scr",
    Direction = "Interpolate",
    State = "Searched"
  )
)
It is recommended to modify the search space using the modify_ (e.g., modify_StParm()) and add_ (e.g., add_Covariate()) families of tidy functions, since this approach is more general and more flexible.
- write_ModelTemplateTokens: This function generates and writes both the template.txt and tokens.json files based on your inputs. It streamlines the process and ensures the correct structure of these essential files.
TemplateFilePath <- file.path(WorkingDir, "template.txt")
TokensFilePath <- file.path(WorkingDir, "tokens.json")
# The function returns the text written to TemplateFilePath.
# Note that `AMT` is used as a model term in DataMapping.
# AMT is resolved to the main dosepoint name in the model
# (A1 for Absorption == "Intravenous" or "Gamma", Aa for
# Absorption == "Extravascular").
generatedOutput <-
  write_ModelTemplateTokens(
    TemplateFilePath = TemplateFilePath,
    TokensFilePath = TokensFilePath,
    Description = "searchCov",
    Author = "Certara",
    DataFilePath = DataFilePath,
    DataMapping = c(
      id = "id",
      time = "time",
      AMT = "dose",
      CObs = "dv",
      scr = "scr"
    ),
    PMLParametersSets = modelPMLCodes,
    EstArgs = specify_EngineParams() # default estimation arguments
  )
# check tokens file
cat("tokens.json:", readLines(TokensFilePath), sep = "\n")
#> tokens.json:
#> {
#> "Cl_scr": [
#> [
#> "",
#> ""
#> ],
#> [
#> " + scr*dCldscr",
#> "fixef(dCldscr= c(, 0, ))"
#> ]
#> ],
#> "V_scr": [
#> [
#> "",
#> ""
#> ],
#> [
#> " + scr*dVdscr",
#> "fixef(dVdscr= c(, 0, ))"
#> ]
#> ]
#> }
# check template file
cat("template.txt:", readLines(TemplateFilePath), sep = "\n")
#> template.txt:
#> ##Description: searchCov
#> ##Author: Certara
#> ##DATA {data_dir}/timeVaryingCovariates.csv
#> ##MAP scr=scr A1 = dose CObs = dv id = id time = time
#> ##MODEL test() {
#> deriv(A1 = -Cl * C)
#> C = A1 / V
#> dosepoint(A1, idosevar = A1Dose, infdosevar = A1InfDose, infratevar = A1InfRate)
#> error(CEps = 0.02)
#> observe(CObs = C + CEps, bql=0.1)
#> interpolate(scr)
#> stparm(Cl = exp( tvCl {Cl_scr[1]} + nCl ))
#> fixef(tvCl= c(, 1, ))
#> {Cl_scr[2]}
#> ranef(diag(nCl) = c(1))
#> stparm(V = exp( tvV {V_scr[1]} + nV ))
#> fixef(tvV= c(, 1, ))
#> {V_scr[2]}
#> ranef(diag(nV) = c(1))
#>
#> }
#> ##ESTARGS
#> sort=FALSE
#> ##TABLES
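The placeholders in the template, such as {Cl_scr[1]} and {Cl_scr[2]}, refer to the text fragments stored in tokens.json: for each token group, one option is chosen and its fragments replace the matching placeholders. The sketch below mimics that substitution in base R for two lines of the template shown above. It is only a conceptual illustration, and fill_template() is a hypothetical helper, not a function from RDarwin or pyDarwin.

```r
# Token groups copied from the tokens.json printed above. Each group holds
# alternative options; each option holds one text fragment per placeholder index.
tokens <- list(
  Cl_scr = list(c("", ""), c(" + scr*dCldscr", "fixef(dCldscr= c(, 0, ))")),
  V_scr  = list(c("", ""), c(" + scr*dVdscr", "fixef(dVdscr= c(, 0, ))"))
)

# Two lines copied from the template.txt printed above.
template <- c(
  "stparm(Cl = exp( tvCl {Cl_scr[1]} + nCl ))",
  "{Cl_scr[2]}"
)

# Hypothetical helper: for each token group, take the chosen option and
# replace the placeholders {group[i]} with the i-th fragment of that option.
fill_template <- function(lines, tokens, choice) {
  for (group in names(choice)) {
    fragments <- tokens[[group]][[choice[[group]]]]
    for (i in seq_along(fragments)) {
      placeholder <- sprintf("{%s[%d]}", group, i)
      lines <- gsub(placeholder, fragments[i], lines, fixed = TRUE)
    }
  }
  lines
}

# Option 1 leaves the covariate out; option 2 adds the scr effect on Cl.
without_cov <- fill_template(template, tokens, c(Cl_scr = 1, V_scr = 1))
with_cov    <- fill_template(template, tokens, c(Cl_scr = 2, V_scr = 1))
```

With option 2 for Cl_scr, the stparm line gains the term scr*dCldscr and the second line becomes the matching fixef statement, so each combination of options yields one complete candidate model.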
Generating pyDarwin Options
The options.json file is a crucial configuration file that defines various parameters for your pyDarwin project. These parameters include the choice of optimization algorithm, optimization settings, directories for model execution, and many more. RDarwin provides functions to create and write these options to a JSON file.
- create_pyDarwinOptions: This function allows you to generate a list of parameters that configure your pyDarwin run. It offers a wide range of options, including specifying the optimization algorithm, adjusting parallel processes, setting directories, and many more.
- write_pyDarwinOptions: This function takes a list of pyDarwin options and writes them to the options.json file. You can generate these options using the create_pyDarwinOptions function or customize them manually.
# See the pyDarwin documentation for an explanation of path aliases
# such as {project_dir}
working_dir <- "{project_dir}/Results"
output_dir <- "{project_dir}/Results/output"
temp_dir <- "{project_dir}/Results/temp"
# Option setup
# RsNLME is expected to be installed
optionSetup <- create_pyDarwinOptions(
  algorithm = "EX",
  engine_adapter = "nlme",
  nlme_dir = Sys.getenv("INSTALLDIR"),
  working_dir = working_dir,
  output_dir = output_dir,
  temp_dir = temp_dir,
  gcc_dir = Sys.getenv("NLMEGCCDir64")
)
# Generate option file
write_pyDarwinOptions(pyDarwinOptions = optionSetup,
                      file = file.path(WorkingDir, "options.json"))
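The directory settings above contain the {project_dir} alias, which pyDarwin resolves when the search runs. As a rough illustration of what such an alias stands for, the sketch below expands it by plain text substitution. The helper expand_project_dir() and the project location are assumptions made for this example; pyDarwin performs its own resolution, and its documentation remains the reference for the exact rules.

```r
# Hypothetical helper: replace the {project_dir} alias with a concrete directory.
expand_project_dir <- function(path, project_dir) {
  gsub("{project_dir}", project_dir, path, fixed = TRUE)
}

# Assumed project location, used only for this illustration.
project_dir <- "/home/user/searchCov"

alias_paths <- c(
  working_dir = "{project_dir}/Results",
  output_dir  = "{project_dir}/Results/output",
  temp_dir    = "{project_dir}/Results/temp"
)

# Paths as they would read once the alias is replaced.
resolved_paths <- expand_project_dir(alias_paths, project_dir)
```

Writing the directories relative to an alias keeps options.json portable: the same file can be reused when the project folder is moved.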
Running the Model Search with pyDarwin
To run the pyDarwin model search, you can use the run_pyDarwin() function. It requires specifying the path to the Python interpreter executable and the paths to the template, tokens, and options files.
# Example of using run_pyDarwin on a Windows machine
result <- run_pyDarwin(
  InterpreterPath = "C:/temp/venv/Scripts/python.exe",
  DirectoryPath = WorkingDir,
  TemplatePath = "template.txt",
  TokensPath = "tokens.json",
  OptionsPath = "options.json",
  Wait = TRUE
)
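Because the search depends on the template, tokens, and options files being present in the project directory, a quick check before launching can save a failed start. The following is a minimal sketch in base R. missing_inputs() is a hypothetical helper, and the empty placeholder files exist only so that the example runs on its own.

```r
# Stand-in project directory with empty placeholder input files.
project_dir <- file.path(tempdir(), "preflight_demo")
dir.create(project_dir, showWarnings = FALSE)
input_files <- c("template.txt", "tokens.json", "options.json")
file.create(file.path(project_dir, input_files))

# Hypothetical helper: return the input files that are missing from `directory`.
missing_inputs <- function(directory, files) {
  files[!file.exists(file.path(directory, files))]
}

absent <- missing_inputs(project_dir, input_files)
if (length(absent) > 0) {
  stop("Missing input files: ", paste(absent, collapse = ", "))
}
```

In a real project, the same check would be pointed at the directory passed as DirectoryPath, and file.exists() could also be applied to the interpreter path.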
The run_pyDarwin() function will launch the search process and monitor its progress. Depending on the value of the Wait argument, it will either wait for the search to complete and return the results or exit immediately, providing the location of the messages.txt file where the raw output is stored.
Together, the settings in options.json and the arguments of run_pyDarwin() let you tailor your model selection project to your specific needs.
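When the search is started without waiting, progress can be followed by reading the end of the messages.txt file from time to time. The sketch below shows one way to do this in base R. tail_messages() is a hypothetical helper, and the file content written here is placeholder text rather than real pyDarwin output.

```r
# Stand-in messages file with placeholder lines (not real pyDarwin output).
messages_path <- file.path(tempdir(), "messages.txt")
writeLines(c("placeholder line 1", "placeholder line 2", "placeholder line 3"),
           messages_path)

# Hypothetical helper: return the last `n` lines of a text file,
# or an empty character vector if the file does not exist yet.
tail_messages <- function(path, n = 10) {
  if (!file.exists(path)) {
    return(character(0))
  }
  utils::tail(readLines(path, warn = FALSE), n)
}

last_lines <- tail_messages(messages_path, n = 2)
cat(last_lines, sep = "\n")
```

In practice, the path would be the messages.txt location reported by run_pyDarwin().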