Multi-Objective Optimization with Certara.RDarwin (MOGA / MOGA3)

library(Certara.RDarwin)
library(Certara.RsNLME)
library(Certara.DarwinReporter)
# required for for 3D hull with MOGA3
library(geometry)

Introduction

In pharmacometric modeling, the quest for the “best” model is rarely straightforward. We often need to balance multiple, competing criteria: maximizing goodness-of-fit while minimizing model complexity, ensuring predictive performance, and maintaining physiological plausibility.

Multi-Objective Genetic Algorithms (MOGAs) are powerful tools designed for this exact challenge. Instead of combining all criteria into a single score, a MOGA finds a set of optimal trade-off solutions, known as the Pareto front.

This vignette provides a step-by-step guide to performing multi-objective optimization using R-Darwin, covering the two algorithms provided by pyDarwin for multi-objective optimization (MOGA and MOGA3) and visualization of their results:

MOGA: with objectives being the objective function used by NONMEM and NLME for fitting (OFV) and number of estimated parameters (NEP).
MOGA3: For advanced optimization with three or more custom objectives, such as the objectives being OFV, NEP, and a Posterior Predictive Check (PPC). It is based on the NSGA-III algorithm (see the documentation at https://pymoo.org/algorithms/moo/nsga3.html?highlight=nsga%20ii).
Visualization: How to interpret the results using Certara.DarwinReporter.

1. Prerequisites and Environment Setup

Before you begin, it is crucial to set up your environment correctly.

R-Darwin Dependencies

Ensure that the necessary environment variables for the NLME Engine are set.

# Example for Windows:
Sys.setenv(INSTALLDIR = "C:/Program Files/Certara/NLME_Engine")
Sys.setenv(NLMEGCCDir64 = "C:/Program Files/Certara/NLME_Engine/gcc/bin")

# Example for Linux:
Sys.setenv(INSTALLDIR = "/opt/Certara/NLME_Engine")
Sys.setenv(NLMEGCCDir64 = "/usr/bin/") # Or another path to your GCC compiler
# this one is required for UBUNTU distros
# Sys.setenv("PML_BIN_DIR" = "UBUNTU")

# It's good practice to validate that the directories exist
stopifnot(dir.exists(Sys.getenv("INSTALLDIR")))
stopifnot(dir.exists(Sys.getenv("NLMEGCCDir64")))

Python and Rscript

You will also need a path to your Python interpreter. pyDarwin requires Python version 3.10 or newer.

# Define the path to your Python executable
PYTHON_PATH <- "/home/user/venv/bin/python3" # <-- CHANGE THIS to your Python path

2. Defining the Model Search Space

First, we must define the universe of models that pyDarwin will search. For this example, we will construct a search space for a PK model, exploring:

Structural Model: A choice between a 1, 2, or 3-compartment model.
Random Effects: Whether to include a random effect (IIV) on the peripheral volume V2.
Covariate Effects: Whether to include effects for Age, Body Weight, and Sex on both Clearance (Cl) and Volume (V).
Residual Error: A choice between a Proportional and an Additive-Multiplicative error model.

We define this using R-Darwin’s intuitive functions:

# 1. Start with three base structural models
models <- create_ModelPK(CompartmentsNumber = c(1, 2, 3))

# 2. Search for the presence of random effects on nV2
models <- models |>
  modify_Omega(Name = "nV2", State = "Searched")

# 3. Add Age, Body Weight and Sex covariates on Cl and V as searchable 
models <- models |>
  add_Covariate(Name = "Age", StParmNames = c("V", "Cl"), State = "Searched") |>
  add_Covariate(Name = "BW", StParmNames = c("V", "Cl"), State = "Searched", Center = 70) |>
  add_Covariate(Name = "Sex", Type = "Categorical", StParmNames = c("V", "Cl"), State = "Searched", Categories = c(0, 1))

# 4. Define the search space for the residual error model
models <- models |>
  modify_Observation(
    ObservationName = "CObs",
    SigmasChosen = Sigmas(
      AdditiveMultiplicative = list(PropPart = 0.1, AddPart = 0.01),
      Proportional = 0.1
      )
    )

3. Generating the Project Files

With the search space defined, we now create the input files required by pyDarwin: template.txt and tokens.json.

We will also set up a temporary working directory and copy our dataset and a post-processing script (which we’ll use later for the MOGA3 run) into it.


# Create a temporary directory for our project files
WorkingDir <- tempdir()

# Create the dataset for the run
data(pkData, package = "Certara.RsNLME")
write.csv(pkData, file = file.path(WorkingDir, "pkData.csv"), row.names = FALSE)

# Now, generate the template and tokens files from our model definition
write_ModelTemplateTokens(
  TemplateFilePath = file.path(WorkingDir, "template.txt"),
  TokensFilePath = file.path(WorkingDir, "tokens.json"),
  Description = "MOGA_Vignette_Search",
  Author = "Certara",
  DataFilePath = file.path(WorkingDir, "pkData.csv"),
  DataMapping = c(id = "Subject", time = "Act_Time", AMT = "Amount", CObs = "Conc", Age = "Age", BW = "BodyWeight", Sex = "Gender(female = 0, male = 1)"),
  PMLParametersSets = models,
  # We need simulation tables for the PPC script we'll use later
  SimArgs = specify_SimParams(sort = TRUE, numReplicates = 1, seed = 12345678),
  Tables = list(Table(Name = "ObsData.csv", KeepSource = TRUE, WhenObs = c("CObs")),
                  Table(Name = "SimData.csv", KeepSource = TRUE, VariablesList = c("CObs"), ForSimulation = TRUE))
)
#> information stored in C:\Users\jcraig\AppData\Local\Temp\Rtmpk37iFX/template.txt and C:\Users\jcraig\AppData\Local\Temp\Rtmpk37iFX/tokens.json

message("Project files created in: ", WorkingDir)
#> Project files created in: C:\Users\jcraig\AppData\Local\Temp\Rtmpk37iFX

Our project is now ready. We have a dataset, a template file defining the model structure, and a tokens file defining the full search space.

4. MOGA: 2-Objective Optimization

The simplest multi-objective search involves two objectives. By default, pyDarwin’s MOGA mode will optimize for:

Model Fitness: The primary objective function value (e.g., -2LL).
Number of Parameters: As a measure of model complexity.

This is a powerful way to find models that provide a good fit without being unnecessarily complex. The result is not a single “best” model, but a set of optimal trade-off solutions called the Pareto front.

Configuration

Setting up a MOGA run is straightforward. We use create_pyDarwinOptions() and set algorithm = "MOGA". The key is to balance the thoroughness of the search with the time it takes to run.

# For this vignette, we use small values to ensure the example runs quickly.
# For a real analysis, you would use significantly larger values.
moga_options <- create_pyDarwinOptions(
  algorithm = "MOGA",
  
  # --- Key Search Scope Parameters ---
  population_size = 6,  # DEMO VALUE: Number of models in each generation.
  num_generations = 3,  # DEMO VALUE: Number of evolutionary cycles.
  num_parallel    = 4,  # Speeds up the search by running models in parallel.
  
  # --- Local Search & Refinement ---
  downhill_period = 2,  # Run a local search around best models every 2 generations.
  
  # --- Housekeeping and Paths ---
  working_dir     = "MOGA_Results", # All output will be saved in this sub-folder.
  keep_files      = c("dmp.txt", "posthoc.csv", "ObsData.csv", "SimData.csv"),
  nlme_dir        = Sys.getenv("INSTALLDIR"),
  gcc_dir         = Sys.getenv("NLMEGCCDir64")
)

# It's good practice to save your options to a file for reproducibility.
write_pyDarwinOptions(moga_options, 
                      file = file.path(WorkingDir, "options_moga.json")
                      )

Parameter Deep Dive

For a successful MOGA run, understanding these key parameters is essential:

Controlling the Search Scope:
- population_size: This is the number of candidate models (the “population”) in each generation. Think of it as the breadth of your search. A larger population can explore more diverse solutions simultaneously but increases the run time of each generation. For this demo, we use 6, but for a real analysis, values between 20 and 100 are common.
- num_generations: This is the number of evolutionary cycles the algorithm will perform. Think of it as the depth of your search. More generations give the models more time to “evolve” and converge towards the true Pareto front. We use 3 for the demo, but 10 or more generations are typical for a thorough search.
Refining the Best Solutions:
- downhill_period: This is a powerful feature for refining results. Setting downhill_period = 2 tells pyDarwin to pause the standard “breeding” process every two generations and instead perform an intensive local search around the current best models on the Pareto front. This helps find small improvements that the genetic algorithm might otherwise miss.
- final_downhill_search: A related option (which is TRUE by default) that performs one last, exhaustive local search at the very end of the run. It is highly recommended to leave this enabled.
Performance:
- num_parallel: This is one of the most important settings for practical use. It specifies how many model runs can be executed simultaneously. Set this to the number of CPU cores you want to dedicate to the search to significantly reduce the overall run time.
Housekeeping and Paths:
- working_dir: This defines the root directory where pyDarwin will save all its output, including the output/, temp/, and non_dominated_models/ folders. Using a relative path like "MOGA_Results" is a good practice, as it creates a new folder within your current project directory.
- keep_files: This argument is useful for debugging and for post-processing. It tells pyDarwin to save specific files from individual model runs that would otherwise be deleted. This is essential if you need to inspect intermediate outputs.
Fine-Tuning the Genetic Algorithm (Advanced):
- If you need even more control, you can pass a configuration list to the MOGA argument itself using pyDarwinOptionsMOGA(). This allows you to tune core GA parameters like crossover_rate and mutation_rate.

# Example of advanced tuning
ga_tuning <- pyDarwinOptionsMOGA(crossover_rate = 0.9, mutation_rate = 0.95)

moga_options_tuned <- create_pyDarwinOptions(
  algorithm = "MOGA",
  MOGA = ga_tuning,
  # ... other options
)

Execution

With all files prepared (template.txt, tokens.json, and options_moga.json), we can now launch the search. The run_pyDarwin() function handles the call to the Python executable and monitors the process.

# Execute the MOGA run 
moga_results <- run_pyDarwin(
  InterpreterPath = PYTHON_PATH,   
  DirectoryPath   = WorkingDir,   
  OptionsPath     = "options_moga.json", # Point to our MOGA config   
  Wait            = TRUE )

When you execute this command, a detailed log will stream to the console. Below is a condensed and anonymized example of a successful MOGA run, highlighting the key stages.


#> [12:38:08] Running pyDarwin v3.1.0
#> [12:38:08] Options file found at /tmp/RtmpcflQuG/options_moga.json
#> [12:38:08] Preparing project working folder...
#> [12:38:08] Algorithm: MOGA
#> [12:38:08] Population size: 6
#> [12:38:08] num_generations: 3
#> [12:38:08] NLME found: /opt/Certara/NLME_Engine
#> [12:38:08] GCC found: /usr/bin/
#> [12:38:08] Search start time: Thu Jul 24 12:38:08 2025
#> ...
#> [12:38:35] Iteration =     1, Model     3,           Done,     OFV =  1534.816, NEP = 10,  message = No important warnings
#> [12:38:37] Iteration =     1, Model     2,           Done,     OFV =  1214.318, NEP = 13,  message = No important warnings
#> [12:38:47] Iteration =     1, Model     1,           Done,     OFV =  1207.310, NEP = 17,  message = No important warnings
#> ...
#> [12:39:01] Current Non Dominated models:
#> [12:39:01] Model NLME_1_1.mmdl,  OFV =  1207.310, NEP = 17
#> [12:39:01] Model NLME_1_2.mmdl,  OFV =  1214.318, NEP = 13
#> [12:39:01] Model NLME_1_3.mmdl,  OFV =  1534.816, NEP = 10
#> ...
#> [12:39:51] Current Non Dominated models:
#> [12:39:51] Model NLME_1_2.mmdl,  OFV =  1214.318, NEP = 13
#> [12:39:51] Model NLME_1_3.mmdl,  OFV =  1534.816, NEP = 10
#> [12:39:51] Model NLME_2_1.mmdl,  OFV =  1520.136, NEP = 11
#> [12:39:51] Model NLME_2_6.mmdl,  OFV =  1205.560, NEP = 16
#> [12:39:52] Starting downhill generation 2
#> [12:39:52] code for niche (minimal binary) 1 = [...], model #  NLME_1_2
#> ...
#> [12:39:52] Starting downhill step 1, total of 36 in 4 niches to be run.
#> ...
#> [12:44:06] Current Non Dominated models:
#> [12:44:06] Model NLME_2D01_37.mmdl,  OFV =  1202.446, NEP = 17
#> ...
#> [13:03:50]  MOGA best genome =
#> [[1 1 0 1 1 1 1 1 1 0]
#>  [0 0 0 0 0 0 0 0 0 0]
#>  ...],
#>  OFV and # of parameters =
#> [[1202.346   18.   ]
#>  [1674.356    5.   ]
#>  [1535.706    7.   ]
#>  [1265.59     8.   ]
#>  [1207.23    14.   ]
#>  [1245.5     11.   ]]
#> [13:03:50] Number of considered models: 778
#> [13:03:50] Number of models that were run during the search: 218
#> [13:03:50] Elapsed time: 25.7 minutes 
#> 
#> [13:03:50] Search end time: Thu Jul 24 13:03:50 2025

Interpreting the MOGA Output

The console output provides a real-time view into the evolutionary process. Here’s what to look for:

Configuration Summary: The run begins by confirming all settings, paths, and the size of the search space (in this case, 768 possible models). This is the first place to check if the run isn’t behaving as you expect.
Model Execution and the Pareto Front: In each generation, pyDarwin fits models and reports their Objective Function Value (OFV, which is -2LL) and Number of Estimated Parameters (NEP). The Current Non Dominated models list is the most important output; it represents the evolving Pareto front.
- For example, after Generation 1, model NLME_1_1 has a great fit (OFV=1207.3) but is complex (NEP=17), while model NLME_1_3 is much simpler (NEP=10) but has a worse fit (OFV=1534.8). Neither is strictly “better”; they are different, optimal trade-offs.
The Downhill Search: The log entry Starting downhill generation 2 signifies a shift in strategy. Because we set downhill_period = 2, the algorithm pauses the standard evolutionary process to perform an intensive local search around the best solutions found so far (“niches”). This is a critical step for refining the Pareto front and finding subtle improvements.
Final Results: At the end, pyDarwin reports the MOGA best genome, which is the final set of non-dominated models that make up the Pareto front. It also provides summary statistics, showing that of the 778 possible models, it only needed to run 218 to find the optimal set, demonstrating the efficiency of the genetic search.

5. MOGA3: 3-Objective Optimization with Custom Post-Processing

While optimizing for fit and complexity is powerful, you often need to incorporate other criteria, such as a model’s predictive performance. This is where the MOGA3 algorithm excels, as it is designed for three or more objectives.

To use MOGA3, you must provide a post-processing script that calculates the value of each objective after a model is run. For this example, we will add a third objective: a Posterior Predictive Check (PPC) based on Cmax.

The Post-Processing Script

The post-processing script is the heart of a custom MOGA3 run. It is an R (or Python) script that pyDarwin executes after every successful model fit. It has a strict requirement for its return value.

The script must return two vectors:

A numeric vector of the objectives to be minimized.
A numeric vector of constraints, where each value must be <= 0 for the model to be considered “feasible.” If a model is not feasible, it is discarded.

Our script, CmaxPPC.R, will calculate three objectives and three constraints.

# Define the content of our post-processing R script as a character string
# NOTE: it is used for MOGA3 only
ppc_script_code <- r"(
# Load estimation results from the dmp.txt file
source("dmp.txt")

# Calculate the first two objectives: -2LL and the number of parameters
neg2LL <- -2 * dmp.txt$logLik
numEstimatedParams <- dmp.txt$nParm

# Load observed and simulated data, filtering for post-dose observations only
obsData <- subset(read.csv("ObsData.csv"), time > 0, select = c("id5", "CObs"))
simData <- subset(read.csv("SimData.csv"), time > 0, select = c("id5", "CObs"))

# Calculate Cmax per subject for both observed and simulated data
obsCmax <- tapply(obsData$CObs, obsData$id5, max)
simCmax <- tapply(simData$CObs, simData$id5, max)

# Calculate the third objective: a penalty based on the geometric mean of Cmax
# This penalizes models where simulated Cmax diverges from observed Cmax
obsCmax_geoMean <- exp(mean(log(obsCmax)))
simCmax_geoMean <- exp(mean(log(simCmax)))
penalty <- 4 * abs((obsCmax_geoMean - simCmax_geoMean) / obsCmax_geoMean) * 100

# --- Return Values for pyDarwin ---
# Vector 1: The objectives to be MINIMIZED.
c(neg2LL, numEstimatedParams, penalty)

# Vector 2: The constraints. ALL values must be <= 0 for the model to be considered 'feasible'.
c(neg2LL - 99999998, # Fails if the model crashed (returns a value near the crash_value)
  2 - numEstimatedParams, # Fails if the model has fewer than 2 parameters
  penalty - 50            # Fails if the PPC penalty is greater than 50
)
)"

# Write the R code string to a file named CmaxPPC.R in our project directory
writeLines(ppc_script_code, con = file.path(WorkingDir, "CmaxPPC.R"))

MOGA3 Configuration

Configuring a MOGA3 run is more involved than a MOGA run because you are responsible for defining the objectives and constraints yourself. This is done by linking pyDarwin to a post-processing script that performs these calculations.

The setup involves three main components: 1. pyDarwinOptionsMOGA(): To configure the specifics of the NSGA-III algorithm (like the number of objectives). 2. pyDarwinOptionsPostprocess(): To define the external R or Python script that will be executed. 3. create_pyDarwinOptions(): To assemble everything and set algorithm = "MOGA3".

# --- 1. Define MOGA3 Algorithm Specifics ---
# This part tells pyDarwin how to handle the multi-objective search itself.
moga3_spec <- pyDarwinOptionsMOGA(
  objectives  = 3,  # Must match the length of the first returned vector from our R script.
  constraints = 3,  # Must match the length of the second returned vector from our R script.
  partitions  = 3   # Key parameter for the NSGA-III algorithm to ensure diversity.
)

# --- 2. Configure the Post-Processing Script ---
# This tells pyDarwin to run our R script after each successful model fit.
postprocess_spec <- pyDarwinOptionsPostprocess(
  use_r             = TRUE,
  post_run_r_code   = file.path(WorkingDir, "CmaxPPC.R"),
  r_timeout         = 600 # Safety feature: timeout in seconds for the R script.
)

# --- 3. Assemble the Final Options ---
moga3_options <- create_pyDarwinOptions(
  algorithm       = "MOGA3",
  MOGA            = moga3_spec,
  postprocess     = postprocess_spec,
  
  # --- Search Scope (use larger values for real analysis) ---
  num_generations = 6,  
  population_size = 12,
  
  # --- Housekeeping and Paths ---
  data_dir        = WorkingDir, # Location of the input dataset (pkData.csv).
  working_dir     = file.path(WorkingDir, "MOGA3_Results"), # Root for all output folders.
  nlme_dir        = Sys.getenv("INSTALLDIR"),
  gcc_dir         = Sys.getenv("NLMEGCCDir64")
)

# Save the final options to a new file
write_pyDarwinOptions(moga3_options, file = file.path(WorkingDir, "options_moga3.json"))

Parameter Deep Dive

For a MOGA3 run, the interaction between the options and your post-processing script is critical.

Defining the MOGA3 Search (pyDarwinOptionsMOGA):
- objectives and constraints: These numbers are not arbitrary; they must exactly match the length of the two numeric vectors returned by your post_run_r_code script. A mismatch will cause the run to fail.
- partitions: This is a key parameter for the NSGA-III algorithm. It helps maintain diversity across the solutions by dividing the objective space using a set of reference directions. The number of partitions directly impacts the number of points on the final Pareto front. For a 3-objective problem, a value of 3 or 4 is a good starting point. For more objectives, this value typically needs to be increased.
Configuring the Post-Processing Script (pyDarwinOptionsPostprocess):
- post_run_r_code: This must be the path to your R script. pyDarwin will execute this script from within a temporary model run directory, which is why the script can find files like dmp.txt and SimData.csv using relative paths.
- r_timeout: This is a crucial safety feature. If your R script encounters an error or an infinite loop, this timeout ensures that a single failing model doesn’t stall the entire multi-hour search process.
Balancing Search Scope and Time:
- population_size and num_generations: Just as with MOGA, these control the breadth and depth of your search. However, because a 3D objective space is much larger and more complex than a 2D space, a MOGA3 search often requires a larger population size and more generations than a comparable MOGA run to adequately explore the Pareto front.
Directory Management:
- data_dir: This tells pyDarwin where to find the input dataset (e.g., pkData.csv) as specified in your template.txt file’s ##DATA directive.
- working_dir: This is the root directory where all outputs from the run will be stored, including the output/, temp/, and non_dominated_models/ folders. It is important to note that the post-processing script runs inside a subdirectory of this working_dir.

MOGA3 Execution and Visualization

The execution command is identical to the MOGA run, but we point to our new options file.

# Execute the MOGA3 run
moga3_results <- run_pyDarwin(
  InterpreterPath = PYTHON_PATH,
  DirectoryPath   = WorkingDir,
  OptionsPath     = "options_moga3.json", # Point to our MOGA3 config
  Wait            = TRUE
)

Interpreting the MOGA3 Output

The log from a MOGA3 run provides a rich, real-time view of the search. Below is a condensed and anonymized example highlighting the key stages and outputs.


#> [14:37:27] Running pyDarwin v3.1.0
#> [14:37:27] Options file found at /tmp/RtmpcflQuG/options_moga3.json
#> [14:37:28] Algorithm: MOGA3
#> [14:37:28] Population size: 12
#> [14:37:28] num_generations: 6
#> [14:37:28] Number of objectives: 3
#> [14:37:28] NLME found: /opt/Certara/NLME_Engine
#> [14:37:28] GCC found: /usr/bin/
#> [14:37:28] RScript found at /usr/bin/Rscript
#> [14:37:28] Post Run R code found at /tmp/RtmpcflQuG/CmaxPPC.R
#> ...
#> [14:37:49] Iteration =     1, Model     3,           Done,     f1 =  1534.816, f2 =    10.000, f3 =    37.464,  message = No important warnings
#> [14:37:53] Iteration =     1, Model     2,           Done,     f1 =  1214.318, f2 =    13.000, f3 =     7.843,  message = No important warnings
#> [14:38:51] Iteration =     1, Model    11,           Done,     f1 =  1202.346, f2 =    18.000, f3 =    18.127,  message = No important warnings
#> ...
#> [14:38:51] Current Non Dominated models:
#> [14:38:51] Model NLME_1_02.mmdl,  f1 =  1214.318, f2 =    13.000, f3 =     7.843
#> [14:38:51] Model NLME_1_06.mmdl,  f1 =  1264.534, f2 =    13.000, f3 =     3.016
#> [14:38:51] Model NLME_1_07.mmdl,  f1 =  1559.188, f2 =     7.000, f3 =    48.388
#> ...
#> [14:40:19] Starting downhill generation 2
#> ...
#> [14:44:20] Current Non Dominated models:
#> [14:44:20] Model NLME_2D01_29.mmdl,  f1 =  1206.092, f2 =    15.000, f3 =    17.869
#> [14:44:20] Model NLME_2D01_42.mmdl,  f1 =  1265.590, f2 =     9.000, f3 =     3.397
#> [14:44:20] Model NLME_2_06.mmdl,  f1 =  1227.924, f2 =    13.000, f3 =     4.039
#> ...
#> [14:53:45]  MOGA best genome =
#> [[0 1 0 1 0 0 0 0 1 0]
#>  [1 1 0 1 1 0 0 1 0 0]
#>  [0 1 1 1 1 1 1 0 0 0]
#>  [0 1 1 0 0 0 0 0 0 0]],
#>  OFV and # of parameters =
#> [[1265.22       10.          2.112673]
#>  [1206.092      15.         17.86854 ]
#>  [1227.924      13.          4.038766]
#>  [1265.59        9.          3.39664 ]]
#> [14:53:45] Number of considered models: 482
#> [14:53:45] Number of models that were run during the search: 154
#> [14:53:45] Elapsed time: 16.3 minutes 
#> 
#> [14:53:45] Search end time: Thu Jul 24 14:53:45 2025

The log from a MOGA3 run is fundamentally different from a standard MOGA run. Here’s what to look for:

Custom Objective Reporting: The output no longer shows OFV and NEP. Instead, it displays f1, f2, and f3. These correspond directly to the values returned by our CmaxPPC.R script, in order:
- f1: -2 * logLik
- f2: numEstimatedParams
- f3: penalty
The 3D Pareto Front: The Current Non Dominated models list now shows the trade-offs across all three objectives. For example:
- Model NLME_1_06 has a good PPC score (f3=3.0) but a relatively poor fit (f1=1264.5).
- Model NLME_1_11 has an excellent fit (f1=1202.3) but is very complex (f2=18) and has a high PPC penalty (f3=18.1). These models represent points on a 3D surface of optimal solutions. No single point is definitively “best”—the choice depends on the project goals.
Feasibility and Constraints: Although not explicitly shown for every model, our constraints (neg2LL < 99999998, numEstimatedParams >= 2, and penalty <= 50) were checked for every run. Any model that failed these checks would have been discarded and not considered for the Pareto front. In the log, you can see that Model NLME_2_01 with f3 = 50.592 was not included in the next generation’s non-dominated list, likely because it violated our penalty - 50 <= 0 constraint.

Now that we have this rich, multi-dimensional result, the next step is to visualize it to better understand the trade-offs.

6. Visualizing MOGA Results with `Certara.DarwinReporter`

The result of a MOGA run is not a single “best” model, but a collection of optimal, non-dominated solutions that form the Pareto front. Simply looking at a table of objective values can be overwhelming. Visualization is therefore an essential step to understand the trade-offs between objectives and to make an informed model selection decision.

The Certara.DarwinReporter package is specifically designed for this purpose, providing functions to load, process, and plot the results from a pyDarwin run.

Step 1: Loading the Results Data

The first step is to load the results from your pyDarwin working directory into a special darwin_data object. This object acts as a container for all the run information.

Key Functions:

darwin_data(): Creates the main container object. You point it to the root of your results folder (the path you specified in the working_dir option).
import_non_dominated_models(): Scans the output directory, finds all executed models, parses their results, and populates the darwin_data object.

# --- Load results from the 2-objective MOGA run ---
moga2_output_path <- file.path(WorkingDir, "MOGA_Results")
# need to rename options file to options.json
file.copy(file.path(WorkingDir, "options_moga.json"), 
          file.path(WorkingDir, "options.json"),
          overwrite =TRUE)
moga_data_2_obj <- 
  darwin_data(project_dir = dirname(moga2_output_path),
              working_dir = moga2_output_path) |>
  import_non_dominated_models(dir = file.path(moga2_output_path, "non_dominated_models"))
summarise_overall_by_non_dominated_models(moga_data_2_obj)

# --- Load results from the 3-objective MOGA3 run ---
moga3_output_path <- file.path(WorkingDir, "MOGA3_Results")
file.copy(file.path(WorkingDir, "options_moga3.json"), 
          file.path(WorkingDir, "options.json"),
          overwrite =TRUE)
moga_data_3_obj <- 
  darwin_data(project_dir = dirname(moga3_output_path),
              working_dir = moga3_output_path) |>
  import_non_dominated_models(dir = file.path(moga3_output_path, "non_dominated_models"))

Step 2: Understanding the Output Data

Before plotting, it’s useful to inspect the loaded data. The summarise_overall_by_non_dominated_models() function provides a clean summary table of the models that made it to the final Pareto front.

# Inspect just the final front models from the MOGA3 run
summarise_overall_by_non_dominated_models(moga_data_3_obj)

This summary includes standard metrics like -2LL, AIC, and BIC, alongside the custom objective values from the run (f1, f2, f3). It also includes two important logical columns: global_ff and moga_ff.

Understanding the Final Front:

global_ff (Global Final Front): A model is included here based on a strict, objective-based comparison. It is on the global front if no other model exists that is better or equal in all objectives. This is a purely data-driven assessment of the final results.
moga_ff (MOGA Final Front): This is the final front as determined by the pyDarwin algorithm itself, which may consider factors from the evolutionary process like mutation and crossover in addition to the final objective values.

While both methods usually yield similar results, retaining both provides flexibility for analysis.

Step 3: Visualizing the Pareto Front

The objectives_vs_non_dominated_models() function is the primary tool for visualization. It automatically creates a 2D or 3D plot based on the number of objectives found in the data.

Visualizing the 2D Front (MOGA)

For our 2-objective run, the plot shows the direct trade-off between model fit and complexity.

# Generate a static 2D plot and customize the colors
objectives_vs_non_dominated_models(
  darwin_data = moga_data_2_obj,
  interactive = FALSE,
  model_colors = c(Dominated = "grey", `Non-dominated` = "blue", `Final Front` = "black")
)

A 2D scatter plot showing the relationship between the Number of Estimated Parameters on the x-axis and the objective function value (-2LL) on the y-axis

Interpreting the 2D Plot:

Each Point is a model.
The Ideal Point is the bottom-left corner (best fit, lowest complexity).
The “Knee” or “Elbow” of the front is often the most desirable region, representing the point of diminishing returns where adding more complexity yields only marginal improvements in fit.

Visualizing the 3D Front (MOGA3)

For our 3-objective run, an interactive plot is essential for exploration.

# Generate an interactive 3D plot of the Pareto front surface
objectives_vs_non_dominated_models(
  darwin_data = moga_data_3_obj, 
  add_hull = TRUE,      # Adds a surface to help visualize the front
  interactive = TRUE    # Creates a Plotly object for interaction
)

Interpreting the 3D Plot:

The Axes represent our three objectives: -2LL, NumParams, and PPC_Penalty.
The Pareto Surface: The points form a 3D surface of optimal solutions.
Interactive Exploration: You can rotate, zoom, and hover over points to see their exact objective values. This allows you to visually identify regions of interest, such as models with a good fit and an acceptable PPC penalty, and then investigate their complexity.

Step 4: Deep Dive with the Interactive GUI

For the most comprehensive analysis, Certara.DarwinReporter provides a full interactive Shiny application via the darwinReportUI() function.

# Launch the Shiny GUI for our MOGA3 results
darwinReportUI(moga_data_3_obj)

This GUI provides powerful features for exploring the results (see Certara.DarwinReporter MOGA Vignette for details):

Live Filtering: Use sliders to filter the models based on objective values, which updates the plot and summary table in real-time.
Detailed Model Info: Click on any point in the plot to open a pop-up window showing that model’s specific details, including its PML code.
Integrated Diagnostics: For models on the final front, you can directly link to a diagnostics page to generate goodness-of-fit plots and other tables, helping you make the final selection.

By leveraging these visualization tools, you can transform the complex output of a MOGA run into actionable insights, leading to a more robust and defensible model selection.

Introduction

1. Prerequisites and Environment Setup

R-Darwin Dependencies

Python and Rscript

2. Defining the Model Search Space

3. Generating the Project Files

4. MOGA: 2-Objective Optimization

Configuration

Parameter Deep Dive

Execution

Interpreting the MOGA Output

5. MOGA3: 3-Objective Optimization with Custom Post-Processing

The Post-Processing Script

MOGA3 Configuration

Parameter Deep Dive

MOGA3 Execution and Visualization

Interpreting the MOGA3 Output

6. Visualizing MOGA Results with Certara.DarwinReporter

Step 1: Loading the Results Data

Step 2: Understanding the Output Data

Step 3: Visualizing the Pareto Front

Visualizing the 2D Front (MOGA)

Visualizing the 3D Front (MOGA3)

Step 4: Deep Dive with the Interactive GUI

6. Visualizing MOGA Results with `Certara.DarwinReporter`