This vignette demonstrates how to use RsNLME, the suite of R packages developed by Certara, for remote execution.

Remote execution has several prerequisites, described below under Client Prerequisites and Server Prerequisites.

Client Prerequisites

Client System Requirements

Only the following operating systems are supported:

  • Windows 8, 10

  • Windows Server 2018, 2019

  • Linux CentOS8/RHEL8 or Ubuntu 22.04

Client Installation Prerequisites

The Certara.RsNLME package should be installed on the client machine together with the NLME engine. Please see the RsNLME Installation Guide for details. The command line interface requires the Certara.RsNLME R package and its associated dependencies. Other R packages developed by Certara may be installed but are not required.

Client Licensing

A local or node license is required on both the local and remote hosts. Please see Licensing.

Note: There are no additional costs to acquire the local license if you have already licensed RsNLME for remote execution.

Server Prerequisites

Server System Requirements

Linux CentOS8/RHEL8 and Ubuntu 22.04 are supported. The server also needs the following:

  • ‘openssh-server’ software installed

  • OpenMPI library installed and configured (optional)

  • SGE or TORQUE or LSF or SLURM grid (optional).

Server Installation Prerequisites

Please see the Installation of NLME Engine and Installation of R Packages chapters of the RsNLME Installation Guide.

Note: The Certara.NLME8 package, a dependency that is automatically installed with Certara.RsNLME, is the only package that is required on the remote host. The user may simply install this package without installing the entire suite of RsNLME packages on the remote host.

# run this code on the remote host
install.packages("Certara.NLME8", 
  repos = c("https://certara.jfrog.io/artifactory/certara-cran-release-public/", 
            "https://cloud.r-project.org"))

Server Licensing

A license is required on the server side. Please see Licensing.

For licensing remote runs, use one of the following options:

  • Place the license file lservrc in the directory with the NLME Engine files ($INSTALLDIR).

  • Place the license file lservrc in a directory on the server and specify its path in the licenseFile argument of the hostParams() function (see the example below).

  • Place the license file lservrc in a directory on the server and, in the ~/.bashrc configuration file of the current user, set the PhoenixLicenseFile environment variable to the path of the license file (the environment variable must be available in non-interactive sessions).

  • Place the license file lservrc in a directory on the server and set the PhoenixLicenseFile environment variable to its path in a shell script (make sure the script has the execute (x) permission) that is then added to the hostParams() argument list (see scriptPath in the examples below).

  • Set the PhoenixLicenseServer environment variable to point to a running license server.
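To see which of these options is visible to an R session, the environment can be inspected directly. The sketch below is illustrative only: describeLicenseSetup() is a hypothetical helper, not an RsNLME function, and the order in which it checks the options is an arbitrary assumption rather than the precedence used by the NLME engine. It only relies on the variable names PhoenixLicenseFile and PhoenixLicenseServer and the file name lservrc mentioned above.

```r
# Hypothetical helper (not part of RsNLME): reports which licensing option
# appears to be configured in the current R session.
# The order of the checks is an assumption made for this illustration.
describeLicenseSetup <- function(installDir = "") {
  licenseFile   <- Sys.getenv("PhoenixLicenseFile", unset = "")
  licenseServer <- Sys.getenv("PhoenixLicenseServer", unset = "")

  if (nzchar(licenseFile)) {
    # The variable is set; check that it points to an existing file.
    if (file.exists(licenseFile)) {
      "file from PhoenixLicenseFile"
    } else {
      "PhoenixLicenseFile set, file missing"
    }
  } else if (nzchar(licenseServer)) {
    "license server from PhoenixLicenseServer"
  } else if (nzchar(installDir) && file.exists(file.path(installDir, "lservrc"))) {
    "lservrc inside the installation directory"
  } else {
    "no license configuration found"
  }
}

# The installation directory is the example path used in this vignette.
print(describeLicenseSetup(installDir = "/home/user/InstallDirNLME"))
```

To be meaningful for remote runs, such a check would have to be executed on the remote host in a non-interactive session, since that is the environment the remote jobs see.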

Host to be used for remote runs

Simple Remote Host

A simple remote host is intended for jobs where parallelization is not required (e.g., a simple fit of a model with few fixed effects/omegas). The following code should be modified before execution on the local host. It creates the host that will be used for the connection.

# example model
model <- emaxmodel(
  checkBaseline = TRUE,
  checkFractional = TRUE,
  checkInhibitory = TRUE,
  data = pkpdData,
  ID = "ID",
  C = "CObs",
  EObs = "EObs"
)

remoteSimpleHost <- hostParams(
  sharedDirectory = "/home/user/temp",                 # a directory where the model files are stored,
                                                       # should exist before run
  installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
  licenseFile = "/home/user/InstallDirNLME/lservrc",   # a license file; note that the file is inside $INSTALLDIR,
                                                       # so it is not necessary to specify it here
  hostName = "remoteSimpleHost",                       # an internal name of the host, not used in computations
  machineName = "192.168.1.1",                         # network address used for ssh session
  hostType = "Linux",                                  # remote host OS; currently Linux only supported
  numCores = 1,                                        # number of cores
  isLocal = FALSE,                                     # FALSE since the remote run is used
  rLocation = "/usr/local/bin",                        # R path to be used for execution
  userName = "user",                                   # user account on the remote host
  userPassword = "password",                           # user password on the remote host
  parallelMethod = "None"                              # no parallelization methods for the current host
)

# note that shortcuts such as the tilde (~) should be avoided in remote paths
# when ready, the host could be used for remote runs:
# fitmodelResults <- fitmodel(model = model, hostPlatform = remoteSimpleHost)

RsNLME sends all necessary files to the specified remote directory, reports the status of the current iteration, downloads the result files from the remote directory when the run is finished, and loads the resulting data frames.
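Because remote paths should not contain shortcuts such as the tilde, it can help to check them before building the host. The following is a minimal sketch under that assumption: isPlainRemotePath() is a hypothetical helper, not part of RsNLME, and it only inspects the strings locally without contacting the server.

```r
# Hypothetical check (not part of RsNLME): TRUE for absolute POSIX-style
# paths that contain no tilde shortcut.
isPlainRemotePath <- function(path) {
  ok <- startsWith(path, "/") & !grepl("~", path, fixed = TRUE)
  stats::setNames(ok, names(path))
}

remotePaths <- c(
  sharedDirectory = "/home/user/temp",
  installationDirectory = "~/InstallDirNLME"  # tilde shortcut, should be written out in full
)
print(isPlainRemotePath(remotePaths))
```

In this example the second path is flagged, so it would need to be replaced by its full form (here /home/user/InstallDirNLME) before being passed to hostParams().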

Remote Host with MPI parallelization

Next, we’ll modify the host to use the OpenMPI installation on the system. In the R script further below, the remote host definition points to a shell script that sets the MPI directory environment variable. First, look at the content of that shell script:

  # /home/user/MPIEnable.sh content
  export PhoenixMPIDir64=/lib64/openmpi/  
  export PML_BIN_DIR=RHEL8

In this case it is not necessary to specify PhoenixMPIDir64 in .bashrc. The additional environment variable PML_BIN_DIR tells the scripts which libraries to link: for Ubuntu 22.04 it should be PML_BIN_DIR=UBUNTU2204; for RHEL8 it may be left unset or set to PML_BIN_DIR=RHEL8.

A host with MPI parallelization is best suited to fitting complex models that involve many thetas/omegas/ODEs. Parallelization is performed by subject for all engines except QRPEM, where it is performed by sample. Parallelization can substantially decrease the time required for fitting.
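If the script is prepared from R, its two export lines can be generated rather than typed by hand. This is only a sketch: mpiEnableLines() is a hypothetical helper, not part of RsNLME, the MPI directory is the example value shown above and will differ between systems, and the file is written to a temporary directory here rather than to the remote host.

```r
# Hypothetical helper (not part of RsNLME): builds the lines of a shell
# script that exports the two variables discussed above.
mpiEnableLines <- function(mpiDir, binDir = c("RHEL8", "UBUNTU2204")) {
  binDir <- match.arg(binDir)  # only the two values named in the text are accepted
  c(
    paste0("export PhoenixMPIDir64=", mpiDir),
    paste0("export PML_BIN_DIR=", binDir)
  )
}

# Write the script to a temporary location and make it executable.
scriptFile <- file.path(tempdir(), "MPIEnable.sh")
writeLines(mpiEnableLines("/lib64/openmpi/", "RHEL8"), scriptFile)
Sys.chmod(scriptFile, mode = "0755")
cat(readLines(scriptFile), sep = "\n")
```

The resulting file would still have to be copied to the remote host, to the location that the scriptPath argument of hostParams() refers to.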

# example model
model <- pkemaxmodel(
  parameterization = "Clearance",
  data = pkpdData,
  Time = "Time",
  ID = "ID",
  A1 = "Dose",
  C1Obs = "CObs",
  EObs = "EObs"
)

remoteMPIHost <- hostParams(
  sharedDirectory = "/home/user/temp",                 # a directory where the model files are stored
  installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
  hostName = "remoteMPIHost",                          # an internal name of the host, not used in computations
  machineName = "192.168.1.1",                         # network address used for ssh session
  hostType = "Linux",                                  # remote host OS; currently Linux only supported
  numCores = 16,                                       # number of cores
  isLocal = FALSE,                                     # FALSE since the remote run is used
  rLocation = "/usr/local/bin",                        # R path to be used for execution
  scriptPath = "/home/user/MPIEnable.sh",              # path to the script to be sourced
  userName = "user",                                   # user account on the remote host
  userPassword = "password",                           # user password on the remote host
  parallelMethod = "MPI"                               # MPI parallelization for the current host
)

# when ready, the host can be used for remote runs:
# fitmodelResults <- fitmodel(model = model,
#                             hostPlatform = remoteMPIHost)

After execution, the results are returned as a list in the assigned variable and downloaded to the model directory.

The examples below that set privateKeyFile use a private SSH key instead of a password. Please refer to your server administrator for keyfile generation. General information can be found in the ssh package documentation.

If something goes wrong and the console output is not informative, useful information can be found in the local NlmeRemote.LOG file.
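A quick way to look at the end of that log from R is sketched below. showLogTail() is a hypothetical helper, not part of RsNLME, and the default assumes that NlmeRemote.LOG sits in the current working directory; pass the actual path if it is located elsewhere.

```r
# Hypothetical helper (not part of RsNLME): print the last n lines of a
# log file if it exists, and return them invisibly.
showLogTail <- function(logFile = "NlmeRemote.LOG", n = 20) {
  if (!file.exists(logFile)) {
    message("Log file not found: ", logFile)
    return(invisible(character(0)))
  }
  lastLines <- utils::tail(readLines(logFile, warn = FALSE), n)
  cat(lastLines, sep = "\n")
  invisible(lastLines)
}

showLogTail()  # assumes the log is in the current working directory
```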

Multicore Remote Host

For run types in which multiple jobs can be performed in parallel, the user may start multiple jobs at once. In doing so, the user delegates the distribution of jobs across all available cores to the system. The following run types can be used with a Multicore Remote Host:

  • sortfit

  • shotgunSearch

  • stepwiseSearch

  • bootstrap

input_data <- pkData

# Add gender code column
input_data$GenderCode <- 0
input_data$GenderCode[input_data$Gender == "male"] <- 1

twoCMTModel <-
  pkmodel(
    numCompartments = 2,
    data = input_data,
    ID = "Subject",
    Time = "Act_Time",
    A1 = "Amount",
    CObs = "Conc"
  )

twoCMTModel <-
  addCovariate(
    twoCMTModel,
    covariate = "Gender",
    type = "Categorical",
    effect = c("V2", "Cl2"),
    levels = c(0, 1),
    labels = c("Female", "Male")
  )

twoCMTModel <-
  addCovariate(
    twoCMTModel,
    covariate = "BodyWeight",
    direction = "Backward",
    center = "Mean",
    effect = c("V", "Cl")
  )

CovariateEffectNames <-
  listCovariateEffectNames(twoCMTModel)
combinations <-
  combn(c("", CovariateEffectNames),
        length(CovariateEffectNames),
        simplify = FALSE)

scenarioNames <-
  lapply(combinations,
         function(x) {paste(x, collapse = " ")})

scenarios <-
  lapply(scenarioNames,
         function(x, CovariateEffectNames) {
           CovariateCombinations <- unlist(strsplit(x, " ", fixed = TRUE))
           scenarioIndex <-
             paste(which(CovariateEffectNames %in% CovariateCombinations,
                         arr.ind = TRUE),
                   collapse = ", ")
           NlmeScenario(trimws(x), scenarioIndex)
         },
         CovariateEffectNames)

remoteMULTICOREHost <- hostParams(
  sharedDirectory = "/home/user/temp",                 # a directory where the model files are stored
  installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
  hostName = "remoteMULTICOREHost",                    # an internal name of the host, not used in computations
  machineName = "192.168.1.1",                         # network address used for ssh session
  hostType = "Linux",                                  # remote host OS; currently Linux only supported
  numCores = 16,                                       # number of parallel jobs to be executed
  isLocal = FALSE,                                     # FALSE since the remote run is used
  rLocation = "/usr/local/bin",                        # R path to be used for execution
  userName = "user",                                   # user account on the remote host
  privateKeyFile = "D:/private/puttykey.key",          # keyfile in OpenSSH format
  parallelMethod = "Multicore"                         # Multicore parallelization for the current host
)

# when ready, the host can be used for remote runs:
# res <-
#   sortfit(twoCMTModel,
#           hostPlatform = remoteMULTICOREHost,
#           sortColumns = SortColumns("Gender"),
#           scenarios = scenarios)
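The scenario construction above is compact, so a small stand-alone illustration may help. The sketch below repeats the same combn() step with three made-up effect names (A, B and C are placeholders, not names returned by listCovariateEffectNames()) and collects the scenario names and index strings in a data frame instead of calling NlmeScenario().

```r
# Hypothetical effect names; real ones come from listCovariateEffectNames().
effectNames <- c("A", "B", "C")

# Same construction as above: draw length(effectNames) items from the names
# plus one empty placeholder, so each draw omits at most one effect.
combinations <- combn(c("", effectNames), length(effectNames), simplify = FALSE)

scenarioTable <- do.call(rbind, lapply(combinations, function(x) {
  kept <- x[nzchar(x)]  # drop the empty placeholder
  data.frame(
    scenarioName = paste(kept, collapse = " "),
    effectIndex  = paste(which(effectNames %in% kept), collapse = ", "),
    stringsAsFactors = FALSE
  )
}))
print(scenarioTable)
```

With these three placeholder names the result has four rows: the scenarios "A B", "A C", "B C" and "A B C", with index strings "1, 2", "1, 3", "2, 3" and "1, 2, 3". In other words, this construction yields the full set of effects plus every set with exactly one effect left out, not every possible subset.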

Grid Remote Host

If one of the supported grid systems (e.g., "Torque", "SGE", "LSF", "SLURM") is installed on the remote host, you may execute the job on the grid by specifying the applicable option for the parallelMethod argument. See ?hostParams for available options.

remoteSLURMHost <- hostParams(
  sharedDirectory = "/home/user/temp",                 # a directory where the model files are stored
  installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
  hostName = "remoteSLURMHost",                        # an internal name of the host, not used in computations
  machineName = "192.168.1.2",                         # network address used for ssh session
  hostType = "Linux",                                  # remote host OS; currently Linux only supported
  numCores = 16,                                       # number of nodes to be used
  isLocal = FALSE,                                     # FALSE since the remote run is used
  rLocation = "/usr/local/bin",                        # R path to be used for execution
  scriptPath = "/home/user/SLURMEnable.sh",            # path to the script - i.e., grid initialization
  userName = "user",                                   # user account on the remote host
  privateKeyFile = "D:/private/SLURMkey.key",          # keyfile in OpenSSH format
  parallelMethod = "SLURM"                             # SLURM parallelization for the current host
)

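# when ready, the host can be used for remote runs
# (covariateModel stands for a model with covariate effects, such as the
# two-compartment model built for the Multicore Remote Host example):
# shotgunSearchResults <- shotgunSearch(model = covariateModel,
#                                       hostPlatform = remoteSLURMHost)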

remoteSGEHost <- hostParams(
  sharedDirectory = "/home/user/temp",                 # a directory where the model files are stored
  installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
  hostName = "remoteSGEHost",                          # an internal name of the host, not used in computations
  machineName = "192.168.1.3",                         # network address used for ssh session
  hostType = "Linux",                                  # remote host OS; currently Linux only supported
  numCores = 16,                                       # number of nodes to be used
  isLocal = FALSE,                                     # FALSE since the remote run is used
  rLocation = "/usr/local/bin",                        # R path to be used for execution
  scriptPath = "/home/user/SGEEnable.sh",              # path to the script - i.e., grid initialization
  userName = "user",                                   # user account on the remote host
  privateKeyFile = "D:/private/SGEkey.key",            # keyfile in OpenSSH format
  parallelMethod = "SGE"                               # SGE parallelization for the current host
)

# stepwiseSearch(model = model, 
#                hostPlatform = remoteSGEHost)

Note: Do not overload the grid by setting the numCores argument higher than the number of nodes available on the current grid; otherwise, uninformative errors may occur.

Grid Remote Host with MPI parallelization

For large datasets with many replicates, where each replicate requires extended processing time, it is useful to parallelize by job and by subject simultaneously.

# parallelization by subjects and by replicates
# could be used for bootstrap, covariate search or sortfit

remoteSLURMMPIHost <- hostParams(
  sharedDirectory = "/home/user/temp",                 # a directory where the model files are stored
  installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
  hostName = "remoteSLURMMPIHost",                     # an internal name of the host, not used in computations
  machineName = "192.168.1.3",                         # network address used for ssh session
  hostType = "Linux",                                  # remote host OS; currently Linux only supported
  numCores = 16,                                       # number of cores available
  isLocal = FALSE,                                     # FALSE since the remote run is used
  rLocation = "/usr/local/bin",                        # R path to be used for execution
  scriptPath = "/home/user/SLURM_MPIEnable.sh",        # optional path to the script - i.e., grid initialization
                                                       # and PhoenixMPIDir64 export (if not done in .bash_profile,
                                                       # see above)
  userName = "user",                                   # user account on the remote host
  userPassword = "passwd",                             # password for a grid account
  parallelMethod = "SLURM_MPI"                         # SLURM and MPI parallelization for the current host
)

# when ready, the host can be used for remote runs:
# bootstrapResults <- bootstrap(model = model,
#                               hostPlatform = remoteSLURMMPIHost,
#                               numReplicates = 1000)

RsNLME tries to find an optimal distribution of nodes and MPI cores per node, taking into account the number of subjects in each job and the number of jobs. The number of cores used for each replicate is the smaller of the following two numbers:

1) the number of cores available divided by the number of replicates, or

2) the number of unique subjects in a specific replicate divided by 3.

Example 1: There are 300 cores available, the user requests 2000 replicates, and there are 200 unique subjects. Each of the 2000 replicates would run on 1 core (300/2000 < 1, so a single core is used; 200/3 ≈ 66; 1 < 66). Total cores used = 300.

Example 2: There are 1000 cores available, the user requests 200 replicates, and there are 300 unique subjects. Each of the 200 replicates would be parallelized across 5 cores (1000/200 = 5; 300/3 = 100; 5 < 100). Total cores used = 1000.
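The rule and the two examples can be reproduced with a few lines of R. This is an illustrative sketch, not RsNLME source code: coresPerReplicate() is a hypothetical function, and both the rounding down and the lower bound of one core per replicate are assumptions inferred from the examples above.

```r
# Illustrative only (not RsNLME source): cores assigned to each replicate.
# Assumptions: quotients are rounded down, and a replicate gets at least one core.
coresPerReplicate <- function(coresAvailable, numReplicates, numSubjects) {
  byJobs     <- floor(coresAvailable / numReplicates)  # cores per replicate if shared evenly
  bySubjects <- floor(numSubjects / 3)                 # limit from the number of unique subjects
  max(1, min(byJobs, bySubjects))
}

coresPerReplicate(300, 2000, 200)   # Example 1: returns 1
coresPerReplicate(1000, 200, 300)   # Example 2: returns 5
```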