RDarwin_NLME_Overview.Rmd
library(Certara.RDarwin)
RDarwin is an R package designed to facilitate the usage of pyDarwin with the Certara NLME pharmacometric modeling engine from the R command line. pyDarwin is a powerful tool for using machine learning algorithms for model selection. This vignette provides an overview of how RDarwin can assist you in creating your model selection projects.
Table of Contents
Introduction: Understand the basic concepts of machine learning model selection and the role of pyDarwin in this process.
Creating the Template and Tokens Files: Learn how to create the essential template.txt and tokens.json files using RDarwin, which form the foundation of your model selection project.
Generating pyDarwin Options: Discover how to create the options.json file, which is crucial for configuring your pyDarwin project to meet your specific requirements.
Running the Model Search with pyDarwin: Explore how to execute a model search with pyDarwin using the run_pyDarwin() function.
Machine learning algorithms play a crucial role in model selection, particularly in the field of pharmacokinetics. Model selection involves identifying the most appropriate model from a range of candidate models. Machine learning approaches are usually categorized into two types: supervised and unsupervised learning.
Supervised Learning: In supervised learning, algorithms learn to associate patterns with labeled examples. For example, given a dataset of cat and dog images, an artificial neural network (ANN) can learn the patterns associated with “cat” and “dog.” It can then predict whether new images are cats or dogs.
Unsupervised Learning: Unsupervised learning, on the other hand, lacks labeled training data. It’s akin to exploring an unknown landscape. For instance, traditional PK/PD model selection doesn’t have a labeled training dataset. Each dataset contributes to the learning process, and the algorithm must discover relationships across different datasets.
pyDarwin addresses the model selection process using machine learning techniques, providing a powerful platform for these tasks. It lets you define how the goodness of each candidate model is scored, with options for penalties and for executing custom code after each run.
RDarwin serves as a bridge between the R environment and pyDarwin, enabling you to create, manage, and execute model selection projects seamlessly. Make sure you have Python and pyDarwin installed. Note also that pyDarwin calls the Certara.RsNLME package, so it must be installed together with all of its dependencies.
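The prerequisites above can be checked from R before starting a project. The following is a minimal sketch using only base R. The helper name check_prerequisites() is hypothetical and not part of RDarwin, and it only looks for a Python executable on the PATH; it does not verify that pyDarwin is installed in that Python environment.

```r
# Hypothetical helper (not part of RDarwin): report which prerequisites
# are visible from the current R session.
check_prerequisites <- function(packages = c("Certara.RDarwin", "Certara.RsNLME"),
                                python_names = c("python", "python3")) {
  # system.file() returns "" when a package is not installed
  pkg_found <- vapply(
    packages,
    function(pkg) nzchar(system.file(package = pkg)),
    logical(1)
  )
  # Sys.which() returns "" for executables that are not on the PATH
  python_found <- any(nzchar(Sys.which(python_names)))
  c(pkg_found, python = python_found)
}

prereqs <- check_prerequisites()
print(prereqs)
```

Any FALSE entry points to a component that should be installed or added to the PATH before continuing.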
The data set used in this example is timeVaryingCovariates.csv and should be downloaded to your working directory. Click here to download.
To use pyDarwin effectively, you need to create specific files that define the structure and parameters of your model search space. RDarwin assists you in generating two essential files: template.txt and tokens.json.
The template.txt file provides a foundational shell for NLME (Nonlinear Mixed Effects) metamodel files. This template sets the structure for your model search, defining the parameters to be considered during the selection process.
The tokens.json file describes the dimensions of your model search space and the available options within each dimension. It provides a structured representation of your search space, including the characteristics of your models and their various attributes.
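To make the idea of dimensions and options concrete, the sketch below represents a small search space as a plain R list and enumerates every combination. The dimension names Cl_scr and V_scr mirror the covariate search built in this vignette, but the option labels are illustrative assumptions; RDarwin writes the real structure to tokens.json for you.

```r
# Illustrative search space: each element is one dimension (token group),
# and each value is one option within that dimension.
search_space <- list(
  Cl_scr = c("no scr effect on Cl", "scr effect on Cl"),
  V_scr  = c("no scr effect on V", "scr effect on V")
)

# Number of options per dimension
options_per_dimension <- lengths(search_space)

# An exhaustive search evaluates every combination, so the number of
# candidate models is the product of the option counts.
n_models <- prod(options_per_dimension)

# Enumerate the combinations explicitly, one row per candidate model
candidates <- expand.grid(search_space, stringsAsFactors = FALSE)

print(n_models)
print(candidates)
```

With two dimensions of two options each, the space holds four candidate models. The count grows multiplicatively as dimensions are added, which is why larger spaces are usually searched with an optimization algorithm rather than exhaustively.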
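RDarwin provides two key functions to facilitate the creation of these files: get_PMLParametersSets(), which defines the search space, and write_ModelTemplateTokens(), which writes the files.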
# Create a search space with a 1-compartment model and clearance parameterization.
# The analytical (closed-form) solution is not used because an interpolated covariate is added later.
# Additional search space modifications are made through the ellipsis arguments:
# the residual error is changed to Additive and the LLOQ is set to 0.1;
# the type of the structural parameters is changed to 'LogNormal2';
# a covariate scr is added to be searched on the structural parameters.
modelPMLCodes <- get_PMLParametersSets(
  CompartmentsNumber = 1,
  Parameterization = "Clearance",
  ClosedForm = FALSE,
  CObs = Observation(
    ObservationName = "CObs",
    SigmasChosen = list(Additive = 0.02),
    BQL = TRUE,
    BQLValue = 0.1
  ),
  V = StParm(StParmName = "V", Type = "LogNormal2"),
  Cl = StParm(StParmName = "Cl", Type = "LogNormal2"),
  scr = Covariate(
    Name = "scr",
    Direction = "Interpolate",
    State = "Searched"
  )
)
It is recommended to modify the search space using the modify_ (e.g., modify_StParm()) and add_ (e.g., add_Covariate()) families of tidy functions, since this approach is more general and more flexible.
write_ModelTemplateTokens(): This function generates the template.txt and tokens.json files based on your inputs. It streamlines the process and ensures the correct structure of these essential files.
TemplateFilePath <- file.path(tempdir(), "template.txt")
TokensFilePath <- file.path(tempdir(), "tokens.json")
# The function returns the text written to TemplateFilePath.
# Note that `AMT` is used as a model term in DataMapping.
# AMT is resolved to the main dosepoint name in the model
# (A1 for Absorption == "Intravenous" or "Gamma", Aa for
# Absorption == "Extravascular").
generatedOutput <-
  write_ModelTemplateTokens(
    TemplateFilePath = TemplateFilePath,
    TokensFilePath = TokensFilePath,
    Description = "searchCov",
    Author = "Certara",
    DataFilePath = "timeVaryingCovariates.csv",
    DataMapping = c(
      id = "id",
      time = "time",
      AMT = "dose",
      CObs = "dv",
      scr = "scr"
    ),
    PMLParametersSets = modelPMLCodes,
    EstArgs = specify_EngineParams() # default estimation arguments
  )
#> information stored in C:\Users\jcraig\AppData\Local\Temp\Rtmp8UKsqK/template.txt and C:\Users\jcraig\AppData\Local\Temp\Rtmp8UKsqK/tokens.json
# check tokens file
cat("tokens.json:", readLines(TokensFilePath), sep = "\n")
#> tokens.json:
#> {
#> "Cl_scr": [
#> [
#> "",
#> ""
#> ],
#> [
#> " + scr*dCldscr",
#> "fixef(dCldscr= c(, 0, ))"
#> ]
#> ],
#> "V_scr": [
#> [
#> "",
#> ""
#> ],
#> [
#> " + scr*dVdscr",
#> "fixef(dVdscr= c(, 0, ))"
#> ]
#> ]
#> }
# check template file
cat("template.txt:", readLines(TemplateFilePath), sep = "\n")
#> template.txt:
#> ##Description: searchCov
#> ##Author: Certara
#> ##DATA {data_dir}/timeVaryingCovariates.csv
#> ##MAP scr=scr A1 = dose CObs = dv id = id time = time
#> ##MODEL test() {
#> deriv(A1 = -Cl * C)
#> C = A1 / V
#> dosepoint(A1, idosevar = A1Dose, infdosevar = A1InfDose, infratevar = A1InfRate)
#> error(CEps = 0.02)
#> observe(CObs = C + CEps, bql=0.1)
#> interpolate(scr)
#> stparm(Cl = exp( tvCl {Cl_scr[1]} + nCl ))
#> fixef(tvCl= c(, 1, ))
#> {Cl_scr[2]}
#> ranef(diag(nCl) = c(1))
#> stparm(V = exp( tvV {V_scr[1]} + nV ))
#> fixef(tvV= c(, 1, ))
#> {V_scr[2]}
#> ranef(diag(nV) = c(1))
#>
#> }
#> ##ESTARGS
#> sort=FALSE
#> ##TABLES
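The placeholders in the template, such as {Cl_scr[1]}, name a token group and the position of a text fragment within the option chosen for that group. The sketch below shows that substitution in base R for one candidate model, reusing the fragments printed above. It is a simplified illustration of the idea, not pyDarwin's implementation, and the function name fill_template() is hypothetical.

```r
# Token groups as printed from tokens.json above: each group holds a list of
# options, and each option is a vector of text fragments.
tokens <- list(
  Cl_scr = list(
    c("", ""),
    c(" + scr*dCldscr", "fixef(dCldscr= c(, 0, ))")
  ),
  V_scr = list(
    c("", ""),
    c(" + scr*dVdscr", "fixef(dVdscr= c(, 0, ))")
  )
)

# The template lines that contain placeholders
template <- c(
  "stparm(Cl = exp( tvCl {Cl_scr[1]} + nCl ))",
  "{Cl_scr[2]}",
  "stparm(V = exp( tvV {V_scr[1]} + nV ))",
  "{V_scr[2]}"
)

# Hypothetical helper: replace every {group[i]} placeholder with fragment i
# of the option selected for that group.
fill_template <- function(template, tokens, choice) {
  for (group in names(tokens)) {
    fragments <- tokens[[group]][[choice[[group]]]]
    for (i in seq_along(fragments)) {
      placeholder <- sprintf("{%s[%d]}", group, i)
      template <- gsub(placeholder, fragments[i], template, fixed = TRUE)
    }
  }
  template
}

# Candidate model: scr effect on Cl (option 2), no scr effect on V (option 1)
model_text <- fill_template(template, tokens, choice = c(Cl_scr = 2, V_scr = 1))
cat(model_text, sep = "\n")
```

Choosing the first option of a group inserts empty strings, which leaves the base model unchanged for that parameter. Choosing the second option adds both the covariate term and its fixed effect definition.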
The options.json file is a crucial configuration file that defines various parameters for your pyDarwin project. These parameters include the choice of optimization algorithm, optimization settings, directories for model execution, and more. RDarwin provides two functions to create these options and write them to a JSON file.
create_pyDarwinOptions: This function allows you to generate a list of parameters that configure your pyDarwin run. It offers a wide range of options, including specifying the optimization algorithm, adjusting parallel processes, setting directories, and many more.
write_pyDarwinOptions: This function takes a list of pyDarwin options and writes them to the options.json file. You can generate these options using the create_pyDarwinOptions function or customize them manually.
# Refer to the pyDarwin documentation for an explanation of aliases such as {project_dir}
workingDir <- "{project_dir}/Results"
outputDir <- "{project_dir}/Results/output"
tempDir <- "{project_dir}/Results/temp"
# Option setup
# RsNLME is expected to be installed
optionSetup <- create_pyDarwinOptions(
  algorithm = "EX",
  engine_adapter = "nlme",
  nlme_dir = Sys.getenv("INSTALLDIR"),
  working_dir = workingDir,
  output_dir = outputDir,
  temp_dir = tempDir,
  gcc_dir = Sys.getenv("NLMEGCCDir64")
)
# Generate option file
write_pyDarwinOptions(pyDarwinOptions = optionSetup,
                      file = file.path(tempdir(), "options.json"))
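The options above read the NLME installation directory and the GCC directory from the environment variables INSTALLDIR and NLMEGCCDir64. If a variable is unset, Sys.getenv() returns an empty string, so the problem may only surface later in the search. The base R sketch below flags unset variables up front; the helper name missing_env_vars() is hypothetical.

```r
# Hypothetical helper: return the names of environment variables that are
# unset or empty in the current session.
missing_env_vars <- function(vars = c("INSTALLDIR", "NLMEGCCDir64")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

missing <- missing_env_vars()
if (length(missing) > 0) {
  message("Environment variables not set: ", paste(missing, collapse = ", "))
}
```

An empty result means both variables are defined. It does not confirm that the directories they point to exist or contain a valid installation.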
To run the pyDarwin model search, you can use the run_pyDarwin() function. It requires specifying the path to the Python interpreter executable and the paths to the template, tokens, and options files.
# Example of using run_pyDarwin
result <- run_pyDarwin(
  InterpreterPath = "~/darwin/venv/bin/python",
  DirectoryPath = tempdir(),
  TemplatePath = "template.txt",
  TokensPath = "tokens.json",
  OptionsPath = "options.json",
  Wait = TRUE
)
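In the example above, the template, tokens, and options files are given as file names located in DirectoryPath. A quick check that all three files exist there can catch path mistakes before the search starts. The following is a minimal base R sketch with a hypothetical helper name, assuming the files were written to tempdir() as in the earlier steps.

```r
# Hypothetical helper: check that the project files exist in a directory.
project_files_exist <- function(directory,
                                files = c("template.txt", "tokens.json", "options.json")) {
  found <- file.exists(file.path(directory, files))
  names(found) <- files
  found
}

status <- project_files_exist(tempdir())
print(status)
if (!all(status)) {
  message("Missing files: ", paste(names(status)[!status], collapse = ", "))
}
```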
The run_pyDarwin() function will launch the search process and monitor its progress. Depending on the value of the Wait argument, it will either wait for the search to complete and return the results or exit immediately, providing the location of the messages.txt file where the raw output is stored.
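When Wait = FALSE, progress can be followed by reading the end of messages.txt from time to time. The sketch below is a base R helper with a hypothetical name. The actual path to messages.txt should be taken from what run_pyDarwin() reports, so it is left as an argument here and a temporary file stands in for it in the demonstration.

```r
# Hypothetical helper: return the last n lines of a log file, or an empty
# character vector if the file has not been created yet.
tail_messages <- function(path, n = 10) {
  if (!file.exists(path)) {
    return(character(0))
  }
  utils::tail(readLines(path, warn = FALSE), n)
}

# Demonstration on a temporary file standing in for messages.txt
demo_log <- tempfile(fileext = ".txt")
writeLines(paste("line", 1:25), demo_log)
print(tail_messages(demo_log, n = 3))
```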
Together, the files and options described in this vignette let you tailor a model selection project to your specific needs.