Multi-Objective Optimization with Certara.RDarwin (MOGA / MOGA3)
RDarwin_NLME_MOGA.Rmd
library(Certara.RDarwin)
library(Certara.RsNLME)
library(Certara.DarwinReporter)
# required for for 3D hull with MOGA3
library(geometry)
Introduction
In pharmacometric modeling, the quest for the “best” model is rarely straightforward. We often need to balance multiple, competing criteria: maximizing goodness-of-fit while minimizing model complexity, ensuring predictive performance, and maintaining physiological plausibility.
Multi-Objective Genetic Algorithms (MOGAs) are powerful tools designed for this exact challenge. Instead of combining all criteria into a single score, a MOGA finds a set of optimal trade-off solutions, known as the Pareto front.
This vignette provides a step-by-step guide to performing multi-objective optimization using R-Darwin, covering the two algorithms provided by pyDarwin for multi-objective optimization (MOGA and MOGA3) and visualization of their results:
MOGA: with objectives being the objective function used by NONMEM and NLME for fitting (OFV) and number of estimated parameters (NEP).
MOGA3: For advanced optimization with three or more custom objectives, such as the objectives being OFV, NEP, and a Posterior Predictive Check (PPC). It is based on the NSGA-III algorithm (see the documentation at https://pymoo.org/algorithms/moo/nsga3.html?highlight=nsga%20ii).
Visualization: How to interpret the results using
Certara.DarwinReporter
.
1. Prerequisites and Environment Setup
Before you begin, it is crucial to set up your environment correctly.
R-Darwin Dependencies
Ensure that the necessary environment variables for the NLME Engine are set.
# Example for Windows:
Sys.setenv(INSTALLDIR = "C:/Program Files/Certara/NLME_Engine")
Sys.setenv(NLMEGCCDir64 = "C:/Program Files/Certara/NLME_Engine/gcc/bin")
# Example for Linux:
Sys.setenv(INSTALLDIR = "/opt/Certara/NLME_Engine")
Sys.setenv(NLMEGCCDir64 = "/usr/bin/") # Or another path to your GCC compiler
# this one is required for UBUNTU distros
# Sys.setenv("PML_BIN_DIR" = "UBUNTU")
# It's good practice to validate that the directories exist
stopifnot(dir.exists(Sys.getenv("INSTALLDIR")))
stopifnot(dir.exists(Sys.getenv("NLMEGCCDir64")))
2. Defining the Model Search Space
First, we must define the universe of models that
pyDarwin
will search. For this example, we will construct a
search space for a PK model, exploring:
- Structural Model: A choice between a 1, 2, or 3-compartment model.
-
Random Effects: Whether to include a random effect
(IIV) on the peripheral volume
V2
. -
Covariate Effects: Whether to include effects for
Age, Body Weight, and Sex on both Clearance (
Cl
) and Volume (V
). - Residual Error: A choice between a Proportional and an Additive-Multiplicative error model.
We define this using R-Darwin’s intuitive functions:
# 1. Start with three base structural models
models <- create_ModelPK(CompartmentsNumber = c(1, 2, 3))
# 2. Search for the presence of random effects on nV2
models <- models |>
modify_Omega(Name = "nV2", State = "Searched")
# 3. Add Age, Body Weight and Sex covariates on Cl and V as searchable
models <- models |>
add_Covariate(Name = "Age", StParmNames = c("V", "Cl"), State = "Searched") |>
add_Covariate(Name = "BW", StParmNames = c("V", "Cl"), State = "Searched", Center = 70) |>
add_Covariate(Name = "Sex", Type = "Categorical", StParmNames = c("V", "Cl"), State = "Searched", Categories = c(0, 1))
# 4. Define the search space for the residual error model
models <- models |>
modify_Observation(
ObservationName = "CObs",
SigmasChosen = Sigmas(
AdditiveMultiplicative = list(PropPart = 0.1, AddPart = 0.01),
Proportional = 0.1
)
)
3. Generating the Project Files
With the search space defined, we now create the input files required
by pyDarwin: template.txt
and tokens.json
.
We will also set up a temporary working directory and copy our dataset and a post-processing script (which we’ll use later for the MOGA3 run) into it.
# Create a temporary directory for our project files
WorkingDir <- tempdir()
# Create the dataset for the run
data(pkData, package = "Certara.RsNLME")
write.csv(pkData, file = file.path(WorkingDir, "pkData.csv"), row.names = FALSE)
# Now, generate the template and tokens files from our model definition
write_ModelTemplateTokens(
TemplateFilePath = file.path(WorkingDir, "template.txt"),
TokensFilePath = file.path(WorkingDir, "tokens.json"),
Description = "MOGA_Vignette_Search",
Author = "Certara",
DataFilePath = file.path(WorkingDir, "pkData.csv"),
DataMapping = c(id = "Subject", time = "Act_Time", AMT = "Amount", CObs = "Conc", Age = "Age", BW = "BodyWeight", Sex = "Gender(female = 0, male = 1)"),
PMLParametersSets = models,
# We need simulation tables for the PPC script we'll use later
SimArgs = specify_SimParams(sort = TRUE, numReplicates = 1, seed = 12345678),
Tables = list(Table(Name = "ObsData.csv", KeepSource = TRUE, WhenObs = c("CObs")),
Table(Name = "SimData.csv", KeepSource = TRUE, VariablesList = c("CObs"), ForSimulation = TRUE))
)
#> information stored in C:\Users\jcraig\AppData\Local\Temp\Rtmpk37iFX/template.txt and C:\Users\jcraig\AppData\Local\Temp\Rtmpk37iFX/tokens.json
message("Project files created in: ", WorkingDir)
#> Project files created in: C:\Users\jcraig\AppData\Local\Temp\Rtmpk37iFX
Our project is now ready. We have a dataset, a template file defining the model structure, and a tokens file defining the full search space.
4. MOGA: 2-Objective Optimization
The simplest multi-objective search involves two objectives. By
default, pyDarwin
’s MOGA mode will optimize for:
- Model Fitness: The primary objective function value (e.g., -2LL).
- Number of Parameters: As a measure of model complexity.
This is a powerful way to find models that provide a good fit without being unnecessarily complex. The result is not a single “best” model, but a set of optimal trade-off solutions called the Pareto front.
Configuration
Setting up a MOGA run is straightforward. We use
create_pyDarwinOptions()
and set
algorithm = "MOGA"
. The key is to balance the thoroughness
of the search with the time it takes to run.
# For this vignette, we use small values to ensure the example runs quickly.
# For a real analysis, you would use significantly larger values.
moga_options <- create_pyDarwinOptions(
algorithm = "MOGA",
# --- Key Search Scope Parameters ---
population_size = 6, # DEMO VALUE: Number of models in each generation.
num_generations = 3, # DEMO VALUE: Number of evolutionary cycles.
num_parallel = 4, # Speeds up the search by running models in parallel.
# --- Local Search & Refinement ---
downhill_period = 2, # Run a local search around best models every 2 generations.
# --- Housekeeping and Paths ---
working_dir = "MOGA_Results", # All output will be saved in this sub-folder.
keep_files = c("dmp.txt", "posthoc.csv", "ObsData.csv", "SimData.csv"),
nlme_dir = Sys.getenv("INSTALLDIR"),
gcc_dir = Sys.getenv("NLMEGCCDir64")
)
# It's good practice to save your options to a file for reproducibility.
write_pyDarwinOptions(moga_options,
file = file.path(WorkingDir, "options_moga.json")
)
Parameter Deep Dive
For a successful MOGA run, understanding these key parameters is essential:
-
Controlling the Search Scope:
-
population_size
: This is the number of candidate models (the “population”) in each generation. Think of it as the breadth of your search. A larger population can explore more diverse solutions simultaneously but increases the run time of each generation. For this demo, we use6
, but for a real analysis, values between 20 and 100 are common. -
num_generations
: This is the number of evolutionary cycles the algorithm will perform. Think of it as the depth of your search. More generations give the models more time to “evolve” and converge towards the true Pareto front. We use3
for the demo, but 10 or more generations are typical for a thorough search.
-
-
Refining the Best Solutions:
-
downhill_period
: This is a powerful feature for refining results. Settingdownhill_period = 2
tellspyDarwin
to pause the standard “breeding” process every two generations and instead perform an intensive local search around the current best models on the Pareto front. This helps find small improvements that the genetic algorithm might otherwise miss. -
final_downhill_search
: A related option (which isTRUE
by default) that performs one last, exhaustive local search at the very end of the run. It is highly recommended to leave this enabled.
-
-
Performance:
-
num_parallel
: This is one of the most important settings for practical use. It specifies how many model runs can be executed simultaneously. Set this to the number of CPU cores you want to dedicate to the search to significantly reduce the overall run time.
-
-
Housekeeping and Paths:
-
working_dir
: This defines the root directory wherepyDarwin
will save all its output, including theoutput/
,temp/
, andnon_dominated_models/
folders. Using a relative path like"MOGA_Results"
is a good practice, as it creates a new folder within your current project directory. -
keep_files
: This argument is useful for debugging and for post-processing. It tellspyDarwin
to save specific files from individual model runs that would otherwise be deleted. This is essential if you need to inspect intermediate outputs.
-
-
Fine-Tuning the Genetic Algorithm (Advanced):
- If you need even more control, you can pass a configuration list to
the
MOGA
argument itself usingpyDarwinOptionsMOGA()
. This allows you to tune core GA parameters likecrossover_rate
andmutation_rate
.
- If you need even more control, you can pass a configuration list to
the
# Example of advanced tuning
ga_tuning <- pyDarwinOptionsMOGA(crossover_rate = 0.9, mutation_rate = 0.95)
moga_options_tuned <- create_pyDarwinOptions(
algorithm = "MOGA",
MOGA = ga_tuning,
# ... other options
)
Execution
With all files prepared (template.txt
,
tokens.json
, and options_moga.json
), we can
now launch the search. The run_pyDarwin()
function handles the call to the Python executable and monitors the
process.
# Execute the MOGA run
moga_results <- run_pyDarwin(
InterpreterPath = PYTHON_PATH,
DirectoryPath = WorkingDir,
OptionsPath = "options_moga.json", # Point to our MOGA config
Wait = TRUE )
When you execute this command, a detailed log will stream to the console. Below is a condensed and anonymized example of a successful MOGA run, highlighting the key stages.
#> [12:38:08] Running pyDarwin v3.1.0
#> [12:38:08] Options file found at /tmp/RtmpcflQuG/options_moga.json
#> [12:38:08] Preparing project working folder...
#> [12:38:08] Algorithm: MOGA
#> [12:38:08] Population size: 6
#> [12:38:08] num_generations: 3
#> [12:38:08] NLME found: /opt/Certara/NLME_Engine
#> [12:38:08] GCC found: /usr/bin/
#> [12:38:08] Search start time: Thu Jul 24 12:38:08 2025
#> ...
#> [12:38:35] Iteration = 1, Model 3, Done, OFV = 1534.816, NEP = 10, message = No important warnings
#> [12:38:37] Iteration = 1, Model 2, Done, OFV = 1214.318, NEP = 13, message = No important warnings
#> [12:38:47] Iteration = 1, Model 1, Done, OFV = 1207.310, NEP = 17, message = No important warnings
#> ...
#> [12:39:01] Current Non Dominated models:
#> [12:39:01] Model NLME_1_1.mmdl, OFV = 1207.310, NEP = 17
#> [12:39:01] Model NLME_1_2.mmdl, OFV = 1214.318, NEP = 13
#> [12:39:01] Model NLME_1_3.mmdl, OFV = 1534.816, NEP = 10
#> ...
#> [12:39:51] Current Non Dominated models:
#> [12:39:51] Model NLME_1_2.mmdl, OFV = 1214.318, NEP = 13
#> [12:39:51] Model NLME_1_3.mmdl, OFV = 1534.816, NEP = 10
#> [12:39:51] Model NLME_2_1.mmdl, OFV = 1520.136, NEP = 11
#> [12:39:51] Model NLME_2_6.mmdl, OFV = 1205.560, NEP = 16
#> [12:39:52] Starting downhill generation 2
#> [12:39:52] code for niche (minimal binary) 1 = [...], model # NLME_1_2
#> ...
#> [12:39:52] Starting downhill step 1, total of 36 in 4 niches to be run.
#> ...
#> [12:44:06] Current Non Dominated models:
#> [12:44:06] Model NLME_2D01_37.mmdl, OFV = 1202.446, NEP = 17
#> ...
#> [13:03:50] MOGA best genome =
#> [[1 1 0 1 1 1 1 1 1 0]
#> [0 0 0 0 0 0 0 0 0 0]
#> ...],
#> OFV and # of parameters =
#> [[1202.346 18. ]
#> [1674.356 5. ]
#> [1535.706 7. ]
#> [1265.59 8. ]
#> [1207.23 14. ]
#> [1245.5 11. ]]
#> [13:03:50] Number of considered models: 778
#> [13:03:50] Number of models that were run during the search: 218
#> [13:03:50] Elapsed time: 25.7 minutes
#>
#> [13:03:50] Search end time: Thu Jul 24 13:03:50 2025
Interpreting the MOGA Output
The console output provides a real-time view into the evolutionary process. Here’s what to look for:
Configuration Summary: The run begins by confirming all settings, paths, and the size of the search space (in this case, 768 possible models). This is the first place to check if the run isn’t behaving as you expect.
-
Model Execution and the Pareto Front: In each generation,
pyDarwin
fits models and reports their Objective Function Value (OFV
, which is -2LL) and Number of Estimated Parameters (NEP
). TheCurrent Non Dominated models
list is the most important output; it represents the evolving Pareto front.- For example, after Generation 1, model
NLME_1_1
has a great fit (OFV=1207.3) but is complex (NEP=17), while modelNLME_1_3
is much simpler (NEP=10) but has a worse fit (OFV=1534.8). Neither is strictly “better”; they are different, optimal trade-offs.
- For example, after Generation 1, model
The Downhill Search: The log entry
Starting downhill generation 2
signifies a shift in strategy. Because we setdownhill_period = 2
, the algorithm pauses the standard evolutionary process to perform an intensive local search around the best solutions found so far (“niches”). This is a critical step for refining the Pareto front and finding subtle improvements.Final Results: At the end,
pyDarwin
reports theMOGA best genome
, which is the final set of non-dominated models that make up the Pareto front. It also provides summary statistics, showing that of the 778 possible models, it only needed to run 218 to find the optimal set, demonstrating the efficiency of the genetic search.
5. MOGA3: 3-Objective Optimization with Custom Post-Processing
While optimizing for fit and complexity is powerful, you often need to incorporate other criteria, such as a model’s predictive performance. This is where the MOGA3 algorithm excels, as it is designed for three or more objectives.
To use MOGA3, you must provide a post-processing script that calculates the value of each objective after a model is run. For this example, we will add a third objective: a Posterior Predictive Check (PPC) based on Cmax.
The Post-Processing Script
The post-processing script is the heart of a custom MOGA3 run. It is an R (or Python) script that pyDarwin executes after every successful model fit. It has a strict requirement for its return value.
The script must return two vectors:
A numeric vector of the objectives to be minimized.
A numeric vector of constraints, where each value must be <= 0 for the model to be considered “feasible.” If a model is not feasible, it is discarded.
Our script, CmaxPPC.R, will calculate three objectives and three constraints.
# Define the content of our post-processing R script as a character string
# NOTE: it is used for MOGA3 only
ppc_script_code <- r"(
# Load estimation results from the dmp.txt file
source("dmp.txt")
# Calculate the first two objectives: -2LL and the number of parameters
neg2LL <- -2 * dmp.txt$logLik
numEstimatedParams <- dmp.txt$nParm
# Load observed and simulated data, filtering for post-dose observations only
obsData <- subset(read.csv("ObsData.csv"), time > 0, select = c("id5", "CObs"))
simData <- subset(read.csv("SimData.csv"), time > 0, select = c("id5", "CObs"))
# Calculate Cmax per subject for both observed and simulated data
obsCmax <- tapply(obsData$CObs, obsData$id5, max)
simCmax <- tapply(simData$CObs, simData$id5, max)
# Calculate the third objective: a penalty based on the geometric mean of Cmax
# This penalizes models where simulated Cmax diverges from observed Cmax
obsCmax_geoMean <- exp(mean(log(obsCmax)))
simCmax_geoMean <- exp(mean(log(simCmax)))
penalty <- 4 * abs((obsCmax_geoMean - simCmax_geoMean) / obsCmax_geoMean) * 100
# --- Return Values for pyDarwin ---
# Vector 1: The objectives to be MINIMIZED.
c(neg2LL, numEstimatedParams, penalty)
# Vector 2: The constraints. ALL values must be <= 0 for the model to be considered 'feasible'.
c(neg2LL - 99999998, # Fails if the model crashed (returns a value near the crash_value)
2 - numEstimatedParams, # Fails if the model has fewer than 2 parameters
penalty - 50 # Fails if the PPC penalty is greater than 50
)
)"
# Write the R code string to a file named CmaxPPC.R in our project directory
writeLines(ppc_script_code, con = file.path(WorkingDir, "CmaxPPC.R"))
MOGA3 Configuration
Configuring a MOGA3 run is more involved than a MOGA run because you
are responsible for defining the objectives and constraints yourself.
This is done by linking pyDarwin
to a post-processing
script that performs these calculations.
The setup involves three main components: 1.
pyDarwinOptionsMOGA()
: To configure the
specifics of the NSGA-III algorithm (like the number of objectives). 2.
pyDarwinOptionsPostprocess()
: To define
the external R or Python script that will be executed. 3.
create_pyDarwinOptions()
: To assemble
everything and set algorithm = "MOGA3"
.
# --- 1. Define MOGA3 Algorithm Specifics ---
# This part tells pyDarwin how to handle the multi-objective search itself.
moga3_spec <- pyDarwinOptionsMOGA(
objectives = 3, # Must match the length of the first returned vector from our R script.
constraints = 3, # Must match the length of the second returned vector from our R script.
partitions = 3 # Key parameter for the NSGA-III algorithm to ensure diversity.
)
# --- 2. Configure the Post-Processing Script ---
# This tells pyDarwin to run our R script after each successful model fit.
postprocess_spec <- pyDarwinOptionsPostprocess(
use_r = TRUE,
post_run_r_code = file.path(WorkingDir, "CmaxPPC.R"),
r_timeout = 600 # Safety feature: timeout in seconds for the R script.
)
# --- 3. Assemble the Final Options ---
moga3_options <- create_pyDarwinOptions(
algorithm = "MOGA3",
MOGA = moga3_spec,
postprocess = postprocess_spec,
# --- Search Scope (use larger values for real analysis) ---
num_generations = 6,
population_size = 12,
# --- Housekeeping and Paths ---
data_dir = WorkingDir, # Location of the input dataset (pkData.csv).
working_dir = file.path(WorkingDir, "MOGA3_Results"), # Root for all output folders.
nlme_dir = Sys.getenv("INSTALLDIR"),
gcc_dir = Sys.getenv("NLMEGCCDir64")
)
# Save the final options to a new file
write_pyDarwinOptions(moga3_options, file = file.path(WorkingDir, "options_moga3.json"))
Parameter Deep Dive
For a MOGA3 run, the interaction between the options and your post-processing script is critical.
-
Defining the MOGA3 Search
(
pyDarwinOptionsMOGA
):-
objectives
andconstraints
: These numbers are not arbitrary; they must exactly match the length of the two numeric vectors returned by yourpost_run_r_code
script. A mismatch will cause the run to fail. -
partitions
: This is a key parameter for the NSGA-III algorithm. It helps maintain diversity across the solutions by dividing the objective space using a set of reference directions. The number of partitions directly impacts the number of points on the final Pareto front. For a 3-objective problem, a value of3
or4
is a good starting point. For more objectives, this value typically needs to be increased.
-
-
Configuring the Post-Processing Script
(
pyDarwinOptionsPostprocess
):-
post_run_r_code
: This must be the path to your R script.pyDarwin
will execute this script from within a temporary model run directory, which is why the script can find files likedmp.txt
andSimData.csv
using relative paths. -
r_timeout
: This is a crucial safety feature. If your R script encounters an error or an infinite loop, this timeout ensures that a single failing model doesn’t stall the entire multi-hour search process.
-
-
Balancing Search Scope and Time:
-
population_size
andnum_generations
: Just as with MOGA, these control the breadth and depth of your search. However, because a 3D objective space is much larger and more complex than a 2D space, a MOGA3 search often requires a larger population size and more generations than a comparable MOGA run to adequately explore the Pareto front.
-
-
Directory Management:
-
data_dir
: This tellspyDarwin
where to find the input dataset (e.g.,pkData.csv
) as specified in yourtemplate.txt
file’s##DATA
directive. -
working_dir
: This is the root directory where all outputs from the run will be stored, including theoutput/
,temp/
, andnon_dominated_models/
folders. It is important to note that the post-processing script runs inside a subdirectory of thisworking_dir
.
-
MOGA3 Execution and Visualization
The execution command is identical to the MOGA run, but we point to our new options file.
# Execute the MOGA3 run
moga3_results <- run_pyDarwin(
InterpreterPath = PYTHON_PATH,
DirectoryPath = WorkingDir,
OptionsPath = "options_moga3.json", # Point to our MOGA3 config
Wait = TRUE
)
Interpreting the MOGA3 Output
The log from a MOGA3 run provides a rich, real-time view of the search. Below is a condensed and anonymized example highlighting the key stages and outputs.
#> [14:37:27] Running pyDarwin v3.1.0
#> [14:37:27] Options file found at /tmp/RtmpcflQuG/options_moga3.json
#> [14:37:28] Algorithm: MOGA3
#> [14:37:28] Population size: 12
#> [14:37:28] num_generations: 6
#> [14:37:28] Number of objectives: 3
#> [14:37:28] NLME found: /opt/Certara/NLME_Engine
#> [14:37:28] GCC found: /usr/bin/
#> [14:37:28] RScript found at /usr/bin/Rscript
#> [14:37:28] Post Run R code found at /tmp/RtmpcflQuG/CmaxPPC.R
#> ...
#> [14:37:49] Iteration = 1, Model 3, Done, f1 = 1534.816, f2 = 10.000, f3 = 37.464, message = No important warnings
#> [14:37:53] Iteration = 1, Model 2, Done, f1 = 1214.318, f2 = 13.000, f3 = 7.843, message = No important warnings
#> [14:38:51] Iteration = 1, Model 11, Done, f1 = 1202.346, f2 = 18.000, f3 = 18.127, message = No important warnings
#> ...
#> [14:38:51] Current Non Dominated models:
#> [14:38:51] Model NLME_1_02.mmdl, f1 = 1214.318, f2 = 13.000, f3 = 7.843
#> [14:38:51] Model NLME_1_06.mmdl, f1 = 1264.534, f2 = 13.000, f3 = 3.016
#> [14:38:51] Model NLME_1_07.mmdl, f1 = 1559.188, f2 = 7.000, f3 = 48.388
#> ...
#> [14:40:19] Starting downhill generation 2
#> ...
#> [14:44:20] Current Non Dominated models:
#> [14:44:20] Model NLME_2D01_29.mmdl, f1 = 1206.092, f2 = 15.000, f3 = 17.869
#> [14:44:20] Model NLME_2D01_42.mmdl, f1 = 1265.590, f2 = 9.000, f3 = 3.397
#> [14:44:20] Model NLME_2_06.mmdl, f1 = 1227.924, f2 = 13.000, f3 = 4.039
#> ...
#> [14:53:45] MOGA best genome =
#> [[0 1 0 1 0 0 0 0 1 0]
#> [1 1 0 1 1 0 0 1 0 0]
#> [0 1 1 1 1 1 1 0 0 0]
#> [0 1 1 0 0 0 0 0 0 0]],
#> OFV and # of parameters =
#> [[1265.22 10. 2.112673]
#> [1206.092 15. 17.86854 ]
#> [1227.924 13. 4.038766]
#> [1265.59 9. 3.39664 ]]
#> [14:53:45] Number of considered models: 482
#> [14:53:45] Number of models that were run during the search: 154
#> [14:53:45] Elapsed time: 16.3 minutes
#>
#> [14:53:45] Search end time: Thu Jul 24 14:53:45 2025
The log from a MOGA3 run is fundamentally different from a standard MOGA run. Here’s what to look for:
-
Custom Objective Reporting: The output no longer
shows
OFV
andNEP
. Instead, it displaysf1
,f2
, andf3
. These correspond directly to the values returned by ourCmaxPPC.R
script, in order:-
f1
:-2 * logLik
-
f2
:numEstimatedParams
-
f3
:penalty
-
-
The 3D Pareto Front: The
Current Non Dominated models
list now shows the trade-offs across all three objectives. For example:- Model
NLME_1_06
has a good PPC score (f3
=3.0) but a relatively poor fit (f1
=1264.5). - Model
NLME_1_11
has an excellent fit (f1
=1202.3) but is very complex (f2
=18) and has a high PPC penalty (f3
=18.1). These models represent points on a 3D surface of optimal solutions. No single point is definitively “best”—the choice depends on the project goals.
- Model
-
Feasibility and Constraints: Although not
explicitly shown for every model, our constraints (
neg2LL
< 99999998,numEstimatedParams
>= 2, andpenalty
<= 50) were checked for every run. Any model that failed these checks would have been discarded and not considered for the Pareto front. In the log, you can see thatModel NLME_2_01
withf3 = 50.592
was not included in the next generation’s non-dominated list, likely because it violated ourpenalty - 50 <= 0
constraint.
Now that we have this rich, multi-dimensional result, the next step is to visualize it to better understand the trade-offs.
6. Visualizing MOGA Results with
Certara.DarwinReporter
The result of a MOGA run is not a single “best” model, but a collection of optimal, non-dominated solutions that form the Pareto front. Simply looking at a table of objective values can be overwhelming. Visualization is therefore an essential step to understand the trade-offs between objectives and to make an informed model selection decision.
The Certara.DarwinReporter
package is specifically
designed for this purpose, providing functions to load, process, and
plot the results from a pyDarwin
run.
Step 1: Loading the Results Data
The first step is to load the results from your pyDarwin
working directory into a special darwin_data
object. This
object acts as a container for all the run information.
Key Functions:
darwin_data()
: Creates the main container object. You point it to the root of your results folder (the path you specified in theworking_dir
option).import_non_dominated_models()
: Scans the output directory, finds all executed models, parses their results, and populates thedarwin_data
object.
# --- Load results from the 2-objective MOGA run ---
moga2_output_path <- file.path(WorkingDir, "MOGA_Results")
# need to rename options file to options.json
file.copy(file.path(WorkingDir, "options_moga.json"),
file.path(WorkingDir, "options.json"),
overwrite =TRUE)
moga_data_2_obj <-
darwin_data(project_dir = dirname(moga2_output_path),
working_dir = moga2_output_path) |>
import_non_dominated_models(dir = file.path(moga2_output_path, "non_dominated_models"))
summarise_overall_by_non_dominated_models(moga_data_2_obj)
# --- Load results from the 3-objective MOGA3 run ---
moga3_output_path <- file.path(WorkingDir, "MOGA3_Results")
file.copy(file.path(WorkingDir, "options_moga3.json"),
file.path(WorkingDir, "options.json"),
overwrite =TRUE)
moga_data_3_obj <-
darwin_data(project_dir = dirname(moga3_output_path),
working_dir = moga3_output_path) |>
import_non_dominated_models(dir = file.path(moga3_output_path, "non_dominated_models"))
Step 2: Understanding the Output Data
Before plotting, it’s useful to inspect the loaded data. The
summarise_overall_by_non_dominated_models()
function
provides a clean summary table of the models that made it to the final
Pareto front.
# Inspect just the final front models from the MOGA3 run
summarise_overall_by_non_dominated_models(moga_data_3_obj)
This summary includes standard metrics like -2LL
,
AIC
, and BIC
, alongside the custom objective
values from the run (f1
, f2
, f3
).
It also includes two important logical columns: global_ff
and moga_ff
.
Understanding the Final Front:
global_ff
(Global Final Front): A model is included here based on a strict, objective-based comparison. It is on the global front if no other model exists that is better or equal in all objectives. This is a purely data-driven assessment of the final results.moga_ff
(MOGA Final Front): This is the final front as determined by thepyDarwin
algorithm itself, which may consider factors from the evolutionary process like mutation and crossover in addition to the final objective values.
While both methods usually yield similar results, retaining both provides flexibility for analysis.
Step 3: Visualizing the Pareto Front
The objectives_vs_non_dominated_models()
function is the
primary tool for visualization. It automatically creates a 2D or 3D plot
based on the number of objectives found in the data.
Visualizing the 2D Front (MOGA)
For our 2-objective run, the plot shows the direct trade-off between model fit and complexity.
# Generate a static 2D plot and customize the colors
objectives_vs_non_dominated_models(
darwin_data = moga_data_2_obj,
interactive = FALSE,
model_colors = c(Dominated = "grey", `Non-dominated` = "blue", `Final Front` = "black")
)
Interpreting the 2D Plot:
Each Point is a model.
The Ideal Point is the bottom-left corner (best fit, lowest complexity).
The “Knee” or “Elbow” of the front is often the most desirable region, representing the point of diminishing returns where adding more complexity yields only marginal improvements in fit.
Visualizing the 3D Front (MOGA3)
For our 3-objective run, an interactive plot is essential for exploration.
# Generate an interactive 3D plot of the Pareto front surface
objectives_vs_non_dominated_models(
darwin_data = moga_data_3_obj,
add_hull = TRUE, # Adds a surface to help visualize the front
interactive = TRUE # Creates a Plotly object for interaction
)
Interpreting the 3D Plot:
The Axes represent our three objectives:
-2LL
,NumParams
, andPPC_Penalty
.The Pareto Surface: The points form a 3D surface of optimal solutions.
Interactive Exploration: You can rotate, zoom, and hover over points to see their exact objective values. This allows you to visually identify regions of interest, such as models with a good fit and an acceptable PPC penalty, and then investigate their complexity.
Step 4: Deep Dive with the Interactive GUI
For the most comprehensive analysis,
Certara.DarwinReporter
provides a full interactive Shiny
application via the darwinReportUI()
function.
# Launch the Shiny GUI for our MOGA3 results
darwinReportUI(moga_data_3_obj)
This GUI provides powerful features for exploring the results (see Certara.DarwinReporter MOGA Vignette for details):
Live Filtering: Use sliders to filter the models based on objective values, which updates the plot and summary table in real-time.
Detailed Model Info: Click on any point in the plot to open a pop-up window showing that model’s specific details, including its PML code.
Integrated Diagnostics: For models on the final front, you can directly link to a diagnostics page to generate goodness-of-fit plots and other tables, helping you make the final selection.
By leveraging these visualization tools, you can transform the complex output of a MOGA run into actionable insights, leading to a more robust and defensible model selection.