RDarwin: Executing pyDarwin Jobs on a Remote Host

library(Certara.RDarwin)

Introduction to Remote Execution

The Certara.RDarwin package extends its capabilities to allow users to launch and manage pyDarwin model selection jobs on a remote Linux host. This is particularly useful for computationally intensive searches that can benefit from more powerful remote hardware or for keeping your local R session free.

This vignette guides you through setting up and running a pyDarwin job remotely with Certara.RDarwin, and then reconnecting later to monitor the job or retrieve its results. It assumes you are familiar with the basic concepts of creating template.txt, tokens.json, and options.json files as covered in the “RDarwin: An R Interface to pyDarwin” vignette.

Prerequisites for Remote Execution:

A Linux remote host (e.g., RHEL 8/9, Ubuntu 22.04/24.04).
pyDarwin must be installed on the remote host.
The Certara NLME Engine and the Certara.RsNLME R package (with its dependencies) must be installed and configured on the remote host.
SSH access to the remote host (key-based authentication is recommended).
The local machine requires the ssh R package.
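
The local part of these prerequisites can be checked from R before anything is uploaded. The sketch below is only an illustration under stated assumptions: check_remote_host() is a hypothetical helper, not part of Certara.RDarwin, and it relies on the ssh package functions ssh_connect(), ssh_exec_internal(), and ssh_disconnect(). The host, key file, and interpreter path in the commented call are placeholders.

```r
# Local packages this vignette relies on.
required_pkgs <- c("Certara.RDarwin", "ssh")
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
if (!all(installed)) {
  message("Install before continuing: ",
          paste(required_pkgs[!installed], collapse = ", "))
}

# Hypothetical helper: open an SSH session and confirm that the remote
# Python interpreter runs. It is defined here but not called.
check_remote_host <- function(host, keyfile, python = "python3") {
  session <- ssh::ssh_connect(host, keyfile = keyfile)
  on.exit(ssh::ssh_disconnect(session), add = TRUE)
  # error = FALSE returns the exit status instead of raising an R error.
  res <- ssh::ssh_exec_internal(session,
                                command = paste(python, "--version"),
                                error = FALSE)
  res$status == 0
}

# Example call (placeholders, not run):
# check_remote_host("your_remote_username@your.remote.server.com",
#                   keyfile = "~/.ssh/id_rsa_remote_server",
#                   python = "~/pydarwin_venv/bin/python3")
```

The helper only confirms that an SSH session can be opened and that the interpreter starts. It does not verify that pyDarwin, the NLME Engine, or Certara.RsNLME are installed on the remote host.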

Workflow for Remote Execution

The general workflow involves:

Local Setup: Creating your template.txt, tokens.json, and options.json files locally, just as you would for a local run.
Launching the Remote Job: Using run_pyDarwinRemote() to transfer files and start pyDarwin on the remote host.
Monitoring and Retrieving Results:
- If Wait = TRUE, run_pyDarwinRemote() will monitor the job and download results upon completion.
- If Wait = FALSE, run_pyDarwinRemote() starts the job and returns immediately. You can then use reconnect_pyDarwinJob() to check status and get results later.
Stopping a Remote Job (Optional): Using stop_pyDarwinRemote() to signal a running remote job to terminate.

We will use a similar PK model example as in the introductory RDarwin vignette.

1. Local Setup: Preparing Input Files

First, we set up a temporary working directory and copy our example data file.

WorkingDir <- tempdir()
DataFilePath <- file.path(WorkingDir, "timeVaryingCovariates.csv")
file.copy(system.file(package = "Certara.RDarwin", "examples", "timeVaryingCovariates.csv"),
          DataFilePath, overwrite = TRUE)

Create Template and Tokens Files

We define a simple PK model search space.

modelPMLCodes <- create_ModelPK(
  CompartmentsNumber = 1,
  Parameterization = "Clearance",
  ClosedForm = FALSE,
  CObs = Observation(
    ObservationName = "CObs",
    SigmasChosen = list(Additive = 0.02),
    BQL = TRUE,
    BQLValue = 0.1
  ),
  V = StParm(StParmName = "V", Type = "LogNormal2"),
  Cl = StParm(StParmName = "Cl", Type = "LogNormal2"),
  scr = Covariate(
    Name = "scr",
    State = "Searched"
  )
)

TemplateFilePath <- file.path(WorkingDir, "template.txt")
TokensFilePath <- file.path(WorkingDir, "tokens.json")

generatedOutput <-
  write_ModelTemplateTokens(
    TemplateFilePath = TemplateFilePath,
    TokensFilePath = TokensFilePath,
    Description = "remoteSearchCov",
    Author = "CertaraRemote",
    DataFilePath = DataFilePath, # Local path to data
    DataMapping = c(
      id = "id", time = "time", AMT = "dose", CObs = "dv", scr = "scr"
    ),
    PMLParametersSets = modelPMLCodes,
    EstArgs = specify_EngineParams()
  )
#> information stored in C:\Users\jcraig\AppData\Local\Temp\RtmpoHH4kK/template.txt and C:\Users\jcraig\AppData\Local\Temp\RtmpoHH4kK/tokens.json
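
Because the data file is uploaded and used on the remote host, a mistake in DataMapping only surfaces after the job has started. As a hedged sketch, the hypothetical helper below (not part of Certara.RDarwin) compares the column names in a mapping with the header of a CSV file. It is demonstrated on a small made-up file and assumes the first line of the file is a plain header row.

```r
# Hypothetical helper: report data columns named in a mapping that are
# missing from the header of a CSV file.
missing_mapped_columns <- function(data_path, mapping) {
  header <- names(utils::read.csv(data_path, nrows = 1, check.names = FALSE))
  setdiff(unname(mapping), header)
}

# Self-contained demonstration with a small made-up data file.
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1, time = 0, dose = 100, dv = 0, scr = 1),
          demo_path, row.names = FALSE)
demo_mapping <- c(id = "id", time = "time", AMT = "dose", CObs = "dv", scr = "scr")

# All mapped columns are present, so nothing is reported.
missing_mapped_columns(demo_path, demo_mapping)
# A mapping to a column that does not exist is reported by name.
missing_mapped_columns(demo_path, c(demo_mapping, wt = "weight"))
```

The same call could be made with DataFilePath and the DataMapping vector used above before launching the remote job.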

Create Options File

We now configure options.json. For remote NLME runs, nlme_dir and gcc_dir must point to valid paths on the remote host, not on your local machine. If post-processing uses R, rscript_path must also be valid on the remote host.

For this example, we’ll use placeholder paths for nlme_dir and gcc_dir. You would replace these with actual paths on your remote server. The working_dir, output_dir, etc., will be resolved relative to a project directory on the remote host.

# Define paths as they should be interpreted on the remote system,
# or rely on pyDarwin system options on the remote.
# For this example, we assume these are set in PYDARWIN_OPTIONS on remote
# or pyDarwin can find them.
# If not, you MUST provide correct remote paths here.

# Example remote paths (MODIFY THESE FOR YOUR REMOTE SETUP)
RemoteNlmeDir <- "/opt/InstallDirNLME/" # Example
RemoteGccDir <- "/usr/bin/"                     # Example (if gcc is in /usr/bin)
RemoteRscriptPath <- "/usr/bin/Rscript"    # Example

# Option setup for a remote NLME run
optionSetupRemote <- create_pyDarwinOptions(
  project_name = "MyRemotePKProject", # Good practice to set a project name
  algorithm = "EX", # Using Exhaustive for a quicker example
  engine_adapter = "nlme",
  # These paths MUST be valid on the remote host if not using system options there
  nlme_dir = RemoteNlmeDir,
  gcc_dir = RemoteGccDir,
  rscript_path = RemoteRscriptPath, # If using R postprocessing
  # working_dir, output_dir, etc., will be handled by run_pyDarwinRemote
  # based on its RemoteBaseDir and the project_name.
  # For example, remote working_dir will become something like:
  # ~/.rdarwin/MyRemotePKProject/
  # and remote output_dir: ~/.rdarwin/MyRemotePKProject/output/
  # data_dir will also be handled correctly on the remote within the project dir.
  num_parallel = 2 # Adjust based on remote server's capacity
)

OptionsFilePath <- file.path(WorkingDir, "options.json")
write_pyDarwinOptions(pyDarwinOptions = optionSetupRemote,
                      file = OptionsFilePath)
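
A common mistake is to leave local (for example, Windows) paths in options that are interpreted on the Linux host. The following sketch shows one way to catch this before uploading. looks_like_remote_path() is a hypothetical helper, not part of Certara.RDarwin, and it only inspects the form of the path. It cannot confirm that the path exists on the remote host.

```r
# Hypothetical helper: TRUE for paths that look like absolute or
# home-relative POSIX paths, FALSE for Windows-style or relative paths.
looks_like_remote_path <- function(path) {
  grepl("^(/|~)", path) & !grepl("\\\\", path)
}

# The example remote paths used in this vignette.
remote_paths <- c(
  nlme_dir     = "/opt/InstallDirNLME/",
  gcc_dir      = "/usr/bin/",
  rscript_path = "/usr/bin/Rscript"
)
ok <- setNames(looks_like_remote_path(remote_paths), names(remote_paths))

if (!all(ok)) {
  warning("These options do not look like remote paths: ",
          paste(names(ok)[!ok], collapse = ", "))
}

# A Windows-style path is rejected by the check.
looks_like_remote_path("C:\\Program Files\\Certara\\NLME_Engine")
```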

2. Launching the Remote Job with `run_pyDarwinRemote`

The run_pyDarwinRemote() function is used to initiate the job. You’ll need to provide SSH credentials (host, user, and password or key path) and paths to your locally generated files.

Important: The RemoteInterpreterPath argument should point to the Python interpreter (ideally within a virtual environment where pyDarwin is installed) on the remote host.

# --- SSH Connection Details (MODIFY THESE) ---
RemoteHost <- "your.remote.server.com"
RemoteUser <- "your_remote_username"
# For key-based authentication (recommended):
RemoteKeyPath <- "~/.ssh/id_rsa_remote_server" # Path to your private key
RemotePassword <- NULL
# Or for password-based authentication (less secure):
# RemoteKeyPath <- NULL
# RemotePassword <- "your_remote_password"

# --- Path to Python on Remote Host (MODIFY THIS) ---
# Example: Python in a venv on the remote server
PythonPathOnRemote <- "~/pydarwin_venv/bin/python3"

# Launch the job with Wait = FALSE so that it runs in the background.
# LocalDirectoryPath tells run_pyDarwinRemote() where to find the input files
# when LocalTemplatePath, LocalTokensPath, and LocalOptionsPath are relative names,
# and where to save the job info file.
RunResult_NoWait <- run_pyDarwinRemote(
  Host = RemoteHost,
  User = RemoteUser,
  Password = RemotePassword,
  KeyPath = RemoteKeyPath,
  LocalDirectoryPath = WorkingDir, # Base for finding local files and storing job info
  LocalTemplatePath = "template.txt",  # Name relative to WorkingDir
  LocalTokensPath = "tokens.json",    # Name relative to WorkingDir
  LocalOptionsPath = "options.json",     # Name relative to WorkingDir
  RemoteInterpreterPath = PythonPathOnRemote,
  Wait = FALSE, # Start job and return immediately
  verbose = TRUE # For more detailed SSH output
)

# RunResult_NoWait will contain information to reconnect, including LocalJobInfoFile path
print(RunResult_NoWait)
# Example output:
# $LocalJobInfoFile
# [1] "/tmp/Rtmpxxxxxx/MyRemotePKProject_remote_job_info.json"
# $RemoteProjectDir
# [1] "~/.rdarwin/MyRemotePKProject" # Default remote base
# $RemoteJobPID
# [1] "12345" # Example PID
# $Host
# [1] "your.remote.server.com"
# $User
# [1] "your_remote_username"
# $ProjectName
# [1] "MyRemotePKProject"

When Wait = FALSE, run_pyDarwinRemote uploads the necessary files (template, tokens, modified options, data file, and a launcher script), starts pyDarwin in the background on the remote host, and returns a list. This list includes LocalJobInfoFile, which is a JSON file saved locally containing details like RemoteHost, RemoteUser, RemoteProjectDir, and RemoteJobPID. This file is key for reconnecting.
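
Since the job info file is plain JSON, it can be inspected directly, for example to confirm which host and PID a saved job refers to. The sketch below assumes the jsonlite package is available and uses a made-up file containing only the four field names mentioned above. A real job info file may contain additional or differently structured fields.

```r
library(jsonlite)

# Made-up job info content using the field names mentioned above.
demo_info <- list(
  RemoteHost = "your.remote.server.com",
  RemoteUser = "your_remote_username",
  RemoteProjectDir = "~/.rdarwin/MyRemotePKProject",
  RemoteJobPID = "12345"
)
demo_file <- tempfile(fileext = ".json")
write_json(demo_info, demo_file, auto_unbox = TRUE, pretty = TRUE)

# Read the file back as a named list and look at individual fields.
job_info <- fromJSON(demo_file)
job_info$RemoteHost
job_info$RemoteJobPID
```

For a real job, the path in RunResult_NoWait$LocalJobInfoFile would take the place of demo_file.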

3. Reconnecting and Monitoring with `reconnect_pyDarwinJob`

If you launched a job with Wait = FALSE, or interrupted a local run_pyDarwinRemote() call that was running with Wait = TRUE, you can later use reconnect_pyDarwinJob() to check the job's status, stream messages.txt updates, and retrieve results once it has finished.

This function uses the LocalJobInfoFile (or LocalDirectoryPath and ProjectName) to find the job details.

# Assuming RunResult_NoWait from the previous step
# and that options.json is still available in WorkingDir for ProjectName derivation by the reconnect function.

# Path to the job info file saved by run_pyDarwinRemote(Wait=FALSE)
JobInfoFileToReconnect <- RunResult_NoWait$LocalJobInfoFile

# Or, if you only have LocalDirectoryPath and know the ProjectName:
# reconnect_pyDarwinJob(
#   LocalDirectoryPath = WorkingDir,
#   # ProjectName will be derived from options.json in WorkingDir
#   Password = RemotePassword, # Or KeyPath
#   KeyPath = RemoteKeyPath
# )


if (!is.null(JobInfoFileToReconnect) && file.exists(JobInfoFileToReconnect)) {
  message("Attempting to reconnect to job using: ", JobInfoFileToReconnect)

  # OriginalOptionsPath lets reconnect_pyDarwinJob() read settings such as
  # engine_adapter and derive ProjectName if it is missing from the job info file.
  # LocalDirectoryPath is also needed as a base for where results will be downloaded.
  ReconnectResults <- reconnect_pyDarwinJob(
    LocalDirectoryPath = WorkingDir, # Base directory for downloads and finding other files
    LocalJobInfoFilePath = JobInfoFileToReconnect,
    OriginalOptionsPath = OptionsFilePath, # Provide path to original options.json
    Password = RemotePassword, # Or KeyPath
    KeyPath = RemoteKeyPath,
    MonitoringInterval = 60, # Check every 60 seconds
    verbose = TRUE
  )

  # ReconnectResults will be similar to the output of run_pyDarwinRemote(Wait=TRUE)
  # It will be a list containing $results (data.frame), $FinalResultFile, $FinalControlFile
  # or the content of messages.txt if results parsing failed.
  if (is.list(ReconnectResults) && "results" %in% names(ReconnectResults)) {
    print(head(ReconnectResults$results))
    # cat(ReconnectResults$FinalResultFile, sep = "\n")
  } else if (is.character(ReconnectResults)) {
    # Likely content of messages.txt
    # cat(ReconnectResults, sep = "\n")
  }

} else {
  message("Job info file not found. Cannot reconnect.")
}

reconnect_pyDarwinJob() will:

1. Read the job details from LocalJobInfoFile.
2. Re-establish the SSH connection.
3. Monitor the remote pyDarwin process by checking its PID and streaming messages.txt.
4. Once the job is detected as finished (the PID no longer exists), download the results (output/, key_models/, non_dominated_models/, messages.txt) to a local subdirectory (e.g., WorkingDir/MyRemotePKProject_Results/).
5. Process the downloaded files and return a list similar to the one returned by run_pyDarwin().
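
The monitoring step can be pictured as a simple polling loop. The sketch below is not the package's implementation: wait_for_job() and make_fake_job() are hypothetical names, and the "is the job still running" check is passed in as a function so that the loop can be demonstrated without a remote host. For a real job, that function could wrap an SSH command that tests whether the PID still exists.

```r
# Hypothetical polling loop: `is_running` is any function that returns
# TRUE while the job is alive. Returns the number of checks that
# reported the job as still running.
wait_for_job <- function(is_running, interval = 60, max_checks = Inf) {
  checks <- 0
  while (checks < max_checks && is_running()) {
    checks <- checks + 1
    Sys.sleep(interval)
  }
  checks
}

# Fake job that reports "running" for n_alive checks, then "finished".
make_fake_job <- function(n_alive) {
  function() {
    n_alive <<- n_alive - 1
    n_alive >= 0
  }
}

# The fake job is seen running three times before the loop exits.
n_checks <- wait_for_job(make_fake_job(3), interval = 0)
n_checks
```

The interval argument plays the same role as MonitoringInterval in the call above: a longer interval means fewer SSH round trips, at the cost of noticing completion later.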

4. Stopping a Remote Job with `stop_pyDarwinRemote` (Optional)

If you need to stop a running remote job before its natural completion, you can use stop_pyDarwinRemote(). This function signals pyDarwin to terminate by creating a stop.darwin file in its remote project directory.

# Assuming RunResult_NoWait from the initial launch step is available,
# or you have the LocalJobInfoFile path.
# We also need the original options.json to help locate the job info file if not specified.

if (!is.null(RunResult_NoWait$LocalJobInfoFile) && file.exists(RunResult_NoWait$LocalJobInfoFile)) {
  StopSignalSent <- stop_pyDarwinRemote(
    LocalDirectoryPath = WorkingDir, # Base directory
    LocalJobInfoFilePath = RunResult_NoWait$LocalJobInfoFile,
    OriginalOptionsPath = OptionsFilePath, # To help derive project name if needed
    Password = RemotePassword, # Or KeyPath
    KeyPath = RemoteKeyPath,
    verbose = TRUE
  )

  if (StopSignalSent) {
    message("Stop signal sent to remote pyDarwin job.")
    # You might want to call reconnect_pyDarwinJob after a short delay
    # to confirm it stopped and retrieve any partial results.
  } else {
    message("Failed to send stop signal.")
  }
}

This function requires similar information to reconnect_pyDarwinJob to locate the remote job (primarily LocalDirectoryPath and optionally explicit paths to job info and options files).
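
The stop-file approach is a cooperative signal: the running process is not killed, but notices a marker file and winds down on its own. The sketch below illustrates that general pattern locally in R. It is not pyDarwin's implementation, run_until_stopped() is a hypothetical function, and only the file name stop.darwin is taken from the description above.

```r
# Illustration of a stop-file signal: the worker checks for a marker
# file between units of work and exits early when it appears.
run_until_stopped <- function(project_dir, n_steps, stop_at = NULL) {
  stop_file <- file.path(project_dir, "stop.darwin")
  done <- 0
  for (step in seq_len(n_steps)) {
    if (file.exists(stop_file)) break
    done <- done + 1
    # For the demo only: simulate an external stop request.
    if (!is.null(stop_at) && step == stop_at) file.create(stop_file)
  }
  done
}

demo_dir <- tempfile("stop_demo_")
dir.create(demo_dir)

# The stop file appears after step 4, so the remaining steps are skipped.
steps_done <- run_until_stopped(demo_dir, n_steps = 10, stop_at = 4)
steps_done
```

Because the signal is only checked between units of work, a stopped job may take some time to exit, which is why reconnecting after a short delay is a reasonable way to confirm that it has stopped.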

Conclusion

Certara.RDarwin provides a comprehensive suite of functions (run_pyDarwinRemote, reconnect_pyDarwinJob, stop_pyDarwinRemote) to manage pyDarwin model selection tasks on remote Linux hosts. This allows users to leverage more powerful computing resources and manage long-running jobs effectively from their familiar R environment. Remember to replace placeholder paths and credentials with your actual remote server details. Always ensure that pyDarwin and its dependencies (NLME Engine, RsNLME, Python environment) are correctly set up on the remote host.