remote_execution.Rmd
This vignette demonstrates how to use RsNLME, the suite of R packages developed by Certara, for remote execution.
There are several prerequisites to perform remote execution described below: Client Prerequisites and Server Prerequisites.
Only the following operating systems are supported:
Windows 8, 10
Windows Server 2018, 2019
Linux CentOS8/RHEL8
The Certara.RsNLME package should be installed on the client machine along with the NLME Engine. Please see the RsNLME Installation Guide for details. The command line interface requires installation of the Certara.RsNLME R package and its associated dependencies. Other R packages developed by Certara may be installed but are not required.
A local or node license is required on both the local and remote hosts. Please see Licensing.
Note: There are no additional costs to acquire the local license if you have already licensed RsNLME for remote execution.
Please see the chapters Installation of NLME Engine and Installation of R Packages in the RsNLME Installation Guide.
Note: The Certara.NLME8 package, a dependency that is automatically installed with Certara.RsNLME, is the only package required on the remote host. The user may simply install this package without installing the entire suite of RsNLME packages on the remote host.
# the code should be executed on remote host
install.packages("Certara.NLME8",
repos = c("https://certara.jfrog.io/artifactory/certara-cran-release-public/",
"https://cloud.r-project.org"))
A license is required on the server side. Please see Licensing.
For licensing of remote runs, use one of the following options:
Put the license file lservrc in the directory with the NLME Engine files ($INSTALLDIR).
Put the license file lservrc in a directory on the server and specify its path in the licenseFile argument of the hostParams() function (see an example below).
Put the license file lservrc in a directory on the server and set the PhoenixLicenseFile environment variable, pointing to the license file, in the ~/.bashrc configuration file of the current user (the environment variable must be available in non-interactive sessions).
Put the license file lservrc in a directory on the server and set the PhoenixLicenseFile environment variable, pointing to the license file, in a shell script (please check that its executable (x) flag is set) to be added to the argument list.
Set the PhoenixLicenseServer environment variable to point to a running license server.
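For the options that rely on the PhoenixLicenseFile environment variable, a quick check run on the remote host can confirm that the variable is visible to R and points to an existing file. The following is only a minimal sketch: checkLicenseFile() is a hypothetical helper and not part of RsNLME, and it does not validate the license itself.

```r
# Hypothetical helper (not part of RsNLME): returns TRUE when the given
# environment variable is set and points to an existing file.
checkLicenseFile <- function(envVar = "PhoenixLicenseFile") {
  licensePath <- Sys.getenv(envVar, unset = NA)
  if (is.na(licensePath) || !nzchar(licensePath)) {
    return(FALSE)
  }
  file.exists(licensePath)
}

# Report whether the license file is reachable from this R session
checkLicenseFile()
```

Running the check through a non-interactive R session on the remote host is closer to how a remote job sees the environment than running it in an interactive shell.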
A simple remote host is designed for jobs where parallelization is not required (e.g., a simple fit of a model with few fixed effects/omegas). The following code should be modified before execution on the local host. It creates a host that will be used for the connection.
# example model
model <- emaxmodel(
checkBaseline = TRUE,
checkFractional = TRUE,
checkInhibitory = TRUE,
data = pkpdData,
ID = "ID",
C = "CObs",
EObs = "EObs"
)
remoteSimpleHost <- hostParams(
sharedDirectory = "/home/user/temp", # a directory where the model files are stored,
# should exist before run
installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
licenseFile = "/home/user/InstallDirNLME/lservrc", # a license file; note that the file is inside $INSTALLDIR,
# so it is not necessary to specify it here
hostName = "remoteSimpleHost", # an internal name of the host, not used in computations
machineName = "192.168.1.1", # network address used for ssh session
hostType = "Linux", # remote host OS; currently Linux only supported
numCores = 1, # number of cores
isLocal = FALSE, # FALSE since the remote run is used
rLocation = "/usr/local/bin", # R path to be used for execution
userName = "user", # user account on the remote host
userPassword = "password", # user password on the remote host
parallelMethod = "None" # no parallelization methods for the current host
)
# note that shortcuts such as tilde (~) should be avoided in remote paths
# when ready, the host could be used for remote runs:
# fitmodelResults <- fitmodel(model = model, hostPlatform = remoteSimpleHost)
RsNLME sends all necessary files to the specified remote directory, reports the status of the current iterations, downloads the result files from the remote directory when the run is finished, and loads the resulting data frames.
Next, we modify the host to use the OpenMPI installation on the system. In the next R script, the remote host definition points to a shell script that sets the MPI directory environment variable. First, look at the content of the shell script:
# /home/user/MPIEnable.sh content
export PhoenixMPIDir64=/lib64/openmpi/
In this case it is not necessary to specify PhoenixMPIDir64 in .bashrc.
A host with MPI parallelization is best suited to fitting complex models that involve many thetas/omegas/ODEs. Parallelization is performed by subject for all engines except QRPEM, where it is performed by samples. Parallelization can considerably reduce the time required for fitting.
# example model
model <- pkemaxmodel(
parameterization = "Clearance",
data = pkpdData,
Time = "Time",
ID = "ID",
A1 = "Dose",
C1Obs = "CObs",
EObs = "EObs"
)
remoteMPIHost <- hostParams(
sharedDirectory = "/home/user/temp", # a directory where the model files are stored
installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
hostName = "remoteMPIHost", # an internal name of the host, not used in computations
machineName = "192.168.1.1", # network address used for ssh session
hostType = "Linux", # remote host OS; currently Linux only supported
numCores = 16, # number of cores
isLocal = FALSE, # FALSE since the remote run is used
rLocation = "/usr/local/bin", # R path to be used for execution
scriptPath = "/home/user/MPIEnable.sh", # path to the script to be sourced
userName = "user", # user account on the remote host
userPassword = "password", # user password on the remote host
parallelMethod = "MPI" # MPI parallelization for the current host
)
# when ready, the host could be used for remote runs:
# fitmodelResults <- fitmodel(model = model,
#                             hostPlatform = remoteMPIHost)
After execution, the results are returned as a list in the assigned variable, and the result files are downloaded to the model directory.
The multicore example below uses a private SSH key instead of a password. Please ask your server administrator about keyfile generation. General information can be found here and in the ssh package documentation.
If something goes wrong and console output is not informative, useful information can be found in the local NlmeRemote.LOG file.
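As a minimal sketch of how the log could be inspected from R, the helper below prints the last lines of the file. showLogTail() is a hypothetical helper and not part of RsNLME; the sketch assumes that NlmeRemote.LOG sits in the current working directory, which may not match your setup, so pass the actual path if it differs.

```r
# Hypothetical helper (not part of RsNLME): prints the last n lines of a log file.
# The default path assumes the log is in the current working directory.
showLogTail <- function(logFile = "NlmeRemote.LOG", n = 20) {
  if (!file.exists(logFile)) {
    message("Log file not found: ", logFile)
    return(invisible(character(0)))
  }
  logLines <- readLines(logFile, warn = FALSE)
  lastLines <- utils::tail(logLines, n)
  cat(lastLines, sep = "\n")
  invisible(lastLines)
}

# Show the end of the log, where the most recent messages are written
showLogTail()
```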
For run types where multiple jobs can be performed in parallel, the user may start multiple jobs at once. In doing so, the user delegates the distribution of jobs across all available cores to the system. The following run types can be used with a multicore remote host:
sortfit
shotgunSearch
stepwiseSearch
bootstrap
input_data <- pkData

# Add gender code column
input_data$GenderCode = 0
input_data$GenderCode[input_data$Gender == "male"] = 1

# the name 2CMTModel starts with a digit, so it must be quoted with backticks in R
`2CMTModel` <- pkmodel(
  numCompartments = 2,
  data = input_data,
  ID = "Subject",
  Time = "Act_Time",
  A1 = "Amount",
  CObs = "Conc"
)

# add a categorical covariate
`2CMTModel` <- addCovariate(
  `2CMTModel`,
  covariate = "Gender",
  type = "Categorical",
  effect = c("V2", "Cl2"),
  levels = c(0, 1),
  labels = c("Female", "Male")
)

# add a covariate centered on its mean
`2CMTModel` <- addCovariate(
  `2CMTModel`,
  covariate = "BodyWeight",
  direction = "Backward",
  center = "Mean",
  effect = c("V", "Cl")
)

CovariateEffectNames <- listCovariateEffectNames(`2CMTModel`)

# build the combinations of covariate effects used as scenarios
combinations <- combn(c("", CovariateEffectNames),
                      length(CovariateEffectNames),
                      simplify = FALSE)

scenarioNames <- lapply(combinations,
                        function(x) {paste(x, collapse = " ")})

scenarios <- lapply(scenarioNames,
                    function(x, CovariateEffectNames) {
                      CovariateCombinations <- unlist(strsplit(x, " ", fixed = TRUE))
                      scenarioIndex <- paste(which(CovariateEffectNames %in% CovariateCombinations,
                                                   arr.ind = TRUE),
                                             collapse = ", ")
                      NlmeScenario(trimws(x), scenarioIndex)
                    },
                    CovariateEffectNames)
remoteMULTICOREHost <- hostParams(
sharedDirectory = "/home/user/temp", # a directory where the model files are stored
installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
hostName = "remoteMULTICOREHost", # an internal name of the host, not used in computations
machineName = "192.168.1.1", # network address used for ssh session
hostType = "Linux", # remote host OS; currently Linux only supported
numCores = 16, # number of parallel jobs to be executed
isLocal = FALSE, # FALSE since the remote run is used
rLocation = "/usr/local/bin", # R path to be used for execution
userName = "user", # user account on the remote host
privateKeyFile = "D:/private/puttykey.key", # keyfile in OpenSSH format
parallelMethod = "Multicore" # Multicore parallelization for the current host
)
# when ready, the host could be used for remote runs:
# res <-
# sortfit(`2CMTModel`,
#         hostPlatform = remoteMULTICOREHost,
#         sortColumns = SortColumns("Gender"),
#         scenarios = scenarios)
If one of the supported grid systems is installed on the remote host (e.g., "Torque", "SGE", "LSF"), you may execute the job on the grid by specifying the applicable option for the parallelMethod argument. See ?hostParams for available options.
remoteTORQUEHost <- hostParams(
sharedDirectory = "/home/user/temp", # a directory where the model files are stored
installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
hostName = "remoteTORQUEHost", # an internal name of the host, not used in computations
machineName = "192.168.1.2", # network address used for ssh session
hostType = "Linux", # remote host OS; currently Linux only supported
numCores = 16, # number of nodes to be used
isLocal = FALSE, # FALSE since the remote run is used
rLocation = "/usr/local/bin", # R path to be used for execution
scriptPath = "/home/user/TORQUEEnable.sh", # path to the script - i.e., grid initialization
userName = "user", # user account on the remote host
privateKeyFile = "D:/private/TORQUEkey.key", # keyfile in OpenSSH format
parallelMethod = "TORQUE" # TORQUE parallelization for the current host
)
# when ready, the host could be used for remote runs:
# shotgunSearchResults <- shotgunSearch(model = `2CMTModel`,
#                                       hostPlatform = remoteTORQUEHost)
remoteSGEHost <- hostParams(
sharedDirectory = "/home/user/temp", # a directory where the model files are stored
installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
hostName = "remoteSGEHost", # an internal name of the host, not used in computations
machineName = "192.168.1.3", # network address used for ssh session
hostType = "Linux", # remote host OS; currently Linux only supported
numCores = 16, # number of nodes to be used
isLocal = FALSE, # FALSE since the remote run is used
rLocation = "/usr/local/bin", # R path to be used for execution
scriptPath = "/home/user/SGEEnable.sh", # path to the script - i.e., grid initialization
userName = "user", # user account on the remote host
privateKeyFile = "D:/private/SGEkey.key", # keyfile in OpenSSH format
parallelMethod = "SGE" # SGE parallelization for the current host
)
# stepwiseSearch(model = model,
# hostPlatform = remoteSGEHost)
Note: Do not overload the grid with a numCores argument that exceeds the number of nodes available on the current grid; otherwise, uninformative errors may be reported.
For large datasets with many replicates, where each replicate requires extended processing time, it is useful to combine by-job and by-subject parallelization.
# parallelization by subjects and by replicates
# could be used for bootstrap, covariate search or sortfit
remoteSGEMPIHost <- hostParams(
sharedDirectory = "/home/user/temp", # a directory where the model files are stored
installationDirectory = "/home/user/InstallDirNLME", # a directory where NLME Engine is unzipped
hostName = "remoteSGEMPIHost", # an internal name of the host, not used in computations
machineName = "192.168.1.3", # network address used for ssh session
hostType = "Linux", # remote host OS; currently Linux only supported
numCores = 16, # number of cores available
isLocal = FALSE, # FALSE since the remote run is used
rLocation = "/usr/local/bin", # R path to be used for execution
scriptPath = "/home/user/SGE_MPIEnable.sh", # path to the script - i.e., grid initialization
# and PhoenixMPIDir64 export (if not done in .bash_profile)
userName = "user", # user account on the remote host
privateKeyFile = "D:/private/SGEEkey.key", # keyfile in OpenSSH format
parallelMethod = "SGE_MPI" # SGE and MPI parallelization for the current host
)
# when ready, the host could be used for remote runs:
# bootstrapResults <- bootstrap(model = model,
#                               hostPlatform = remoteSGEMPIHost,
#                               numReplicates = 1000)
RsNLME tries to find the optimal distribution of nodes and MPI cores per node, taking into account the number of subjects in the jobs and the number of jobs. The number of cores used for each replicate is calculated as the smaller of the following two numbers:
1. the number of cores available divided by the number of replicates, or
2. the number of unique subjects in a specific replicate divided by 3.
Example 1: There are 300 cores available, the user requests 2000 replicates, and there are 200 unique subjects. Each of the 2000 replicates would parallelize across 1 core (300/2000 < 1. 200/3 = 66. 1 < 66). Total cores used = 300.
Example 2: There are 1000 cores available, the user requests 200 replicates, and there are 300 unique subjects. Each of the 200 replicates would parallelize across 5 cores (1000/200 = 5. 300/3 = 100. 5 < 100). Total cores used = 1000.
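The rule and the two examples above can be reproduced with a few lines of R. This is only an illustrative sketch: coresPerReplicate() is a hypothetical helper and not an RsNLME function, and the integer division and the lower bound of one core are assumptions inferred from the worked examples rather than taken from the RsNLME source.

```r
# Hypothetical helper (not part of RsNLME): estimates how many cores each
# replicate would use according to the rule described above.
# Assumptions: quotients are rounded down, and at least 1 core is used.
coresPerReplicate <- function(numCores, numReplicates, numSubjects) {
  byReplicates <- numCores %/% numReplicates  # cores available per replicate
  bySubjects <- numSubjects %/% 3             # unique subjects divided by 3
  max(1, min(byReplicates, bySubjects))
}

# Example 1: 300 cores, 2000 replicates, 200 unique subjects
coresPerReplicate(300, 2000, 200)

# Example 2: 1000 cores, 200 replicates, 300 unique subjects
coresPerReplicate(1000, 200, 300)
```

Under these assumptions the two calls return 1 and 5, matching Example 1 and Example 2.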