Setup

Before starting, load the ggplot2 and ggcertara packages, and set the default ggplot theme to theme_certara():

suppressPackageStartupMessages({
  library(ggplot2)
  library(ggcertara)
  library(kableExtra) #used for the vignette not for ggcertara demonstration
})
theme_set(theme_certara())

Overview

The ggcertara package and theme_certara theme are meant to provide a standardized look to typical gof plots employed by pharmacometricians. The syntax is based on ggplot2. The ggcertara also utilizes the patchwork package and syntax. These will be touched upon later in this vignette.

Data:

Data Expectations

The gof() function uses standard column names in lower case. The following column names are recognized by the functions in this package:

time: Time (e.g., time since first dose)
tad: Time after dose
dv: Dependent variable
pred: Population prediction
ipred: Individual prediction
cwres: Conditional weighted residual
iwres: Individual weighted residual

Not all of these need to be present, only those that are needed for the disired plots (for instance, if no plots involving iwres are requested, then it does not need to be present). The column names must be in lower case.

Reading:

The gof_read_data() function simplifies the data loading process by:

loading the data
converting all names to lower case
checking for the presence of an mdv column and, if present, filtering the rows to keep only those with mdv = 0

For this vignette we will be using some sample data that is provided with the package. The data is contained in a file named nmtable.csv, that was actually generated by NONMEM using a $TABLE statement (i.e., $TABLE FILE=nmtable.csv FORMAT=,1PE15.8 ...). It’s contents look like this:

ID	TIME	DV	PRED	IPRED	RES	WRES	CWRES	CIWRES	IRES	IWRES	MDV	EVID	TAD	CL	V	KA	ETA1	ETA2	ETA3
1	0.0000000	0.00000	0.00000	0.00000	0.0000000	0.0000000	0.0000000	0.0000000	0.0000000	0.0000000	1	1	0.0000000	3.311365	10.26801	2.860697	-0.144312	-0.1010664	0.0747987
1	0.1107401	16.84466	14.39002	16.98482	2.4546404	0.0413047	0.0306011	-0.1043802	-0.1401575	-0.1038259	0	0	0.1107401	3.311365	10.26801	2.860697	-0.144312	-0.1010664	0.0747987
1	0.2493163	33.22216	26.62746	31.08233	6.5946926	1.2657456	1.1250718	0.8657021	2.1398227	0.8661927	0	0	0.2493163	3.311365	10.26801	2.860697	-0.144312	-0.1010664	0.0747987
1	0.4983335	43.48446	38.21232	43.90453	5.2721396	0.2099566	0.1983909	-0.1206951	-0.4200702	-0.1203824	0	0	0.4983335	3.311365	10.26801	2.860697	-0.144312	-0.1010664	0.0747987
1	0.8483932	48.27355	42.64485	48.29827	5.6286966	0.3379642	0.3135719	-0.0066225	-0.0247160	-0.0064387	0	0	0.8483932	3.311365	10.26801	2.860697	-0.144312	-0.1010664	0.0747987
1	2.1580771	32.07656	31.68700	35.66761	0.3895546	-1.0379582	-0.9256054	-1.2668550	-3.5910498	-1.2667701	0	0	2.1580771	3.311365	10.26801	2.860697	-0.144312	-0.1010664	0.0747987

In a normal project workflow, the location of the appropriate file would be clear. For the vignette, the file can be located as follows:

file <- system.file(package="ggcertara", "sample-data/nmtable.csv")

Now we load the file:

dat <- gof_read_data(file)

Examining the data, we can see that the names are lower case and the rows have been filtered such that mdv = 0:

head(dat)

#>   id      time       dv     pred    ipred       res        wres      cwres       ciwres        ires        iwres mdv evid       tad
#> 2  1 0.1107401 16.84466 14.39002 16.98482 2.4546404  0.04130471  0.0306011 -0.104380162 -0.14015753 -0.103825922   0    0 0.1107401
#> 3  1 0.2493163 33.22216 26.62746 31.08233 6.5946926  1.26574562  1.1250718  0.865702143  2.13982267  0.866192653   0    0 0.2493163
#> 4  1 0.4983335 43.48446 38.21232 43.90453 5.2721396  0.20995661  0.1983909 -0.120695079 -0.42007023 -0.120382378   0    0 0.4983335
#> 5  1 0.8483932 48.27355 42.64485 48.29827 5.6286966  0.33796419  0.3135719 -0.006622542 -0.02471603 -0.006438689   0    0 0.8483932
#> 6  1 2.1580771 32.07656 31.68700 35.66761 0.3895546 -1.03795820 -0.9256054 -1.266855000 -3.59104976 -1.266770110   0    0 2.1580771
#> 7  1 3.9157063 22.78889 17.64876 20.31905 5.1401243  2.03836621  1.8503260  1.529135290  2.46983532  1.529378070   0    0 3.9157063
#>         cl        v       ka      eta1       eta2       eta3
#> 2 3.311365 10.26801 2.860697 -0.144312 -0.1010664 0.07479871
#> 3 3.311365 10.26801 2.860697 -0.144312 -0.1010664 0.07479871
#> 4 3.311365 10.26801 2.860697 -0.144312 -0.1010664 0.07479871
#> 5 3.311365 10.26801 2.860697 -0.144312 -0.1010664 0.07479871
#> 6 3.311365 10.26801 2.860697 -0.144312 -0.1010664 0.07479871
#> 7 3.311365 10.26801 2.860697 -0.144312 -0.1010664 0.07479871

table(dat$mdv)

#> 
#>   0 
#> 831

Basic usage

The basic functionality is accessed through the gof() function:

gof(dat) #Note: output aspect ratio is set to 3:2, in this case width=9, height=6

By default, the gof() function produces a figure with 6 panels arranged in a 2-by-3 grid.

Column_1	Column_2	Column_3
DV vs. IPRED	CWRES vs. TIME	CWRES vs. TAD
DV vs. PRED	CWRES vs. PRED	\|IWRES\| vs. IPRED

Note that all axes are nicely labelled (though they lack units at this point). With this simple function call, we have produced a report ready figure!

Customizing the output

Selecting panels

The gof() function can easily produce a variety of panels which can be referenced as either numbered panels, named functions, or as strings.

Number	Function	Name	Description
1	gof_cwres_histogram()	cwres_histogram	Histogram of CWRES
2	gof_cwres_qqplot()	cwres_qqplot	QQ-plot of CWRES
3	gof_dv_vs_ipred()	dv_vs_ipred	DV vs. IPRED
4	gof_dv_vs_pred()	dv_vs_pred	DV vs. PRED
5	gof_cwres_vs_pred()	cwres_vs_pred	CWRES vs. PRED
6	gof_cwres_vs_time()	cwres_vs_time	CWRES vs. TIME
7	gof_cwres_vs_tad()	cwres_vs_tad	CWRES vs. TAD
8	gof_absiwres_vs_ipred()	absiwres_vs_ipred	\|IWRES\| vs. IPRED
9	gof_absiwres_vs_time()	absiwres_vs_time	\|IWRES\| vs. TIME
10	gof_absiwres_vs_tad()	absiwres_vs_tad	\|IWRES\| vs. TAD
11, -3	gof_dv_vs_ipred(log_xy=T)	dv_vs_ipred_log	DV vs. IPRED log scale
12, -4	gof_dv_vs_pred(log_xy=T)	dv_vs_pred_log	DV vs. PRED log scale
13, -5	gof_cwres_vs_pred(log_x=T)	cwres_vs_pred_log	CWRES vs. PRED log scale
14, -6	gof_cwres_vs_time(log_x=T)	cwres_vs_time_log	CWRES vs. TIME log scale
15, -7	gof_cwres_vs_tad(log_x=T)	cwres_vs_tad_log	CWRES vs. TAD log scale
16, -8	gof_absiwres_vs_ipred(log_x=T)	absiwres_vs_ipred_log	\|IWRES\| vs. IPRED log scale
17, -9	gof_absiwres_vs_time(log_x=T)	absiwres_vs_time_log	\|IWRES\| vs. TIME log scale
18, -10	gof_absiwres_vs_tad(log_x=T)	absiwres_vs_tad_log	\|IWRES\| vs. TAD log scale

Note that the standard 6 plots in the gof() function correspond to the numbered panels 3, 6, 7, 4, 5, 8 from this list.

Calling the `panels` option within `gof()`

Using the panels argument, select panels can be selected using either the respective panel numbers or names (as strings). For example, to prduce both a histogram and QQ-plot of CWRES, select panels 1 and 2 or callthem by name cwres_histogram and cwres_qqplot:

gof(dat, panels=1:2)

gof(dat, panels=c('cwres_histogram', 'cwres_qqplot'))

For log-scale plots (panels 11-18 above), we can call their panel number or add a - to their linear scale counterpart. For example, to obtain a plot of DV vs. IPRED on log-log scale, we can set panel=11 or panel=-3:

gof(dat, panels=c(11,-3))

We can use the layout option to specify the number of rows and columns in a multi-panel output.

gof(dat, panels=c(3, -3, 4, -4, 5, 6), layout = c(3, 2))

Named function:

We can also call specific panels using the corresponding named function. Log variants of the plot can be obtained by specifying either log_x=T or log_xy=T depending on the function.

gof_dv_vs_ipred(dat)

gof_dv_vs_ipred(dat, log_xy=T)

The `gof_list` object

Individual panels can also be with the gof_list() function. This returns a numbered list of the 6 standard panels by default, or all panels if all=T is specified:

p <- gof_list(dat, all=T)
p

#> List of gof plots:
#> 
#>   1. cwres_histogram
#>   2. cwres_qqplot
#>   3. dv_vs_ipred
#>   4. dv_vs_pred
#>   5. cwres_vs_pred
#>   6. cwres_vs_time
#>   7. cwres_vs_tad
#>   8. absiwres_vs_ipred
#>   9. absiwres_vs_time
#>   10. absiwres_vs_tad
#>   11. dv_vs_ipred_log
#>   12. dv_vs_pred_log
#>   13. cwres_vs_pred_log
#>   14. cwres_vs_time_log
#>   15. cwres_vs_tad_log
#>   16. absiwres_vs_ipred_log
#>   17. absiwres_vs_time_log
#>   18. absiwres_vs_tad_log

Individual panels can be extracted from the list:

p[[5]]

gof_list can also be used to build a new list of plots, by specifying empty=T:

p <- gof_list(empty = T)
p[[1]] = gof_dv_vs_ipred(dat)
p[[2]] = gof_dv_vs_pred(dat)

A composite plot can be obtained from a list using the gof_layout() function:

gof_layout(p)

Labels and units

There are several ways to customize the plot axis labels/units. When a recognized variable name is used in a plot (e.g. ipred, cwres, etc.), the functions will find a corresponding axis label in a list object which by default corresponds to a global option:

Option	Default
gof.label.dv	Observed Value
gof.label.pred	Population Prediction
gof.label.ipred	Individual Prediction
gof.label.cwres	Conditional Weighted Residual
gof.label.iwres	Individual Weighted Residual
gof.label.time	Time
gof.label.tad	Time After Dose
gof.units.dv	NA
gof.units.time	NA
gof.units.tad	NA

Every function has a labels argument to override the default, should it be necessary. Also note when updating labels/units, it is not required to specify all labels/units, only the labels/units that are being updated need be specified.

Global label/units update

We can update labels and units at the global level by calling the desired option within the options function. Units for concentration (dv, pred, ipred), time, and tad can also be set using global options and will be appended to the label:

options(gof.label.cwres = 'CWRES',
        gof.label.dv = 'MyDrug',
        gof.units.dv="ng/mL",
        gof.units.time="h",
        gof.units.tad="h")
gof(dat)

Update labels for a single plot

We can update a default label for a single plot by calling the label option as follows.

gof(dat, labels = gof_labels(dv='Obs'))

Finally, the usual labs() function can be used to set or modify the labels of any plot. In this case, units do not get applied and must be explicitly included in the labels.

gof(dat, 6) + labs(x="Something silly", y="Why not? (light years)")

Now let’s restore our defaults before moving on…

# Restore defaults
options(gof.label.dv    = "Observed Value",
        gof.label.pred  = "Population Prediction",
        gof.label.ipred = "Individual Prediction",
        gof.label.cwres = "Conditional Weighted Residual",
        gof.label.iwres = "Individual Weighted Residual",
        gof.label.time  = "Time",
        gof.label.tad   = "Time After Dose",
        gof.units.dv=NA,
        gof.units.time=NA,
        gof.units.tad=NA
)

Outliers

To highlight outliers using a special symbol (and color), you first define a column in the dataset which is a factor with 2 levels (be mindful of the labels, as these will appear in the legend and also note the use of unicode language for special symbols). The sample data comes from a simulation, and doesn’t have any outliers, but for demonstration purposes we will pretend that |CWRES| > 2.5 is a criterion for defining outliers:

dat$outlier <- factor(abs(dat$cwres) > 2.5, labels=c("|CWRES| \U{2264} 2.5", "|CWRES| > 2.5"))

Then, use the highlight option, like this:

gof_cwres_vs_pred(dat, highlight=outlier) +
  scale_y_continuous(breaks = c(-2.5,0,2.5))

This feature should be used carefully. Note that highlighting values also populates a legend at the bottom, left corner of the plot area. It works for multiple panels too:

gof(dat, highlight=outlier)

Faceting

Faceting can be applied with facet_wrap or facet_grid as with any ggplot object. For this example, we will augment our data with some covariate information, specifically sex and study information. In the same folder as the dataset nmtable.csv is a file pkpop.csv that contains covariate information on PK population.

file <- system.file(package="ggcertara", "sample-data/pkpop.csv")

pkpop <- read.csv(file)
head(pkpop)

#>   id sex age    wt study
#> 1  1   1  59  65.4     1
#> 2  2   0  37  64.3     1
#> 3  3   1  38  58.3     1
#> 4  4   0  31  73.3     1
#> 5  5   0  40 109.3     1
#> 6  6   1  79  56.5     1

Note that there is one row for each id in the dataset. We need to link this to our goodness-of-fit data by matching on the id column. There are many ways to do this (merge(), left_join(), …), but a simple way is to use the basic match() function:

i <- match(dat$id, pkpop$id)
dat$sex   <- pkpop$sex[i]
dat$study <- pkpop$study[i]

We have now added sex and study to our dataset, but in order to get nice, informative labels, we should convert these to factor:

dat$sex = factor(dat$sex, levels=0:1, labels=c('Female', 'Male'))
dat$study = factor(dat$study, levels=1:3, labels=c('Study A', 'Study B', 'Study C'))

Now, we can produce a plot of DV vs. IPRED faceted by sex as follows:

gof(dat, panels = 3) + facet_wrap(~sex)

Changing aesthetics (colors, etc.)

Global changes

Changes that will affect all plots can be made with the update_geom_defaults() function. For example, to change the alpha level used for points in all panels:

update_geom_defaults("point_c", list(alpha=0.05)) # update the global options
gof(dat)

update_geom_defaults("point_c", list(alpha=0.4)) # reset to default alpha

Note that a special custom geom, named geom_point_c(), is defined and used in this package, so that other ggplot code will not be affected.

Changes to individual plots

For a single panel, you can just modify the ggplot object. For example, let’s change the color and symbol of points in a DV vs. PRED plot to be different by study (which we already added to our dataset earlier):

update_geom_defaults("point_c", list(alpha=0.5, size=2.5)) # make points bigger and more opaque
gof(dat, panels=4) +
    aes(color=study, shape=study) +
    labs(color=NULL, shape=NULL) +
    scale_color_certara()

For a patchwork object (multiple panels), you can do the same thing by replacing the + operator with a &:

gof(dat, panels=3:4) &
    aes(color=study, shape=study) &
    labs(color=NULL, shape=NULL) &
    scale_color_certara()

update_geom_defaults("point_c", list(alpha=0.4, size=1.5)) # restore defaults

patchwork is smart enough to produce a single legend for both panels (note: the theme_certara() theme must be set globally, as it was at the beginning of this script, for this to display properly).

Adding geoms and annotations

As with any ggplot object, geoms can be added. Remember to use the & operator instead of + when creating multiple panels (this notation comes from the patchwork package).

For examples, we can add a dashed line with corresponding label showing the LLOQ to our DV vs. IPRED and PRED plots as follows:

 gof(dat, panels = -(3:4)) &
  geom_hline(yintercept=1.2, linetype="dashed") &
  geom_text(data=data.frame(x=Inf, y=1.2), aes(x=x, y=y),
    label="LLOQ = 1.2 ng/mL", hjust=1.1, vjust=-0.5, size=3)

As another example, we can add a blue linear regression line in addition to the red LOESS smooth to our panels:

gof(dat, panels=8:9) &
  geom_smooth(method="lm", color=certara_pal()[1], se=F, size=1)

R session information

sessionInfo()

#> R version 4.0.1 (2020-06-06)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] kableExtra_1.1.0 ggplot2_3.3.2    rmarkdown_2.2    ggcertara_0.3    devtools_2.3.0   usethis_1.6.1    nvimcom_0.9-102 
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.4.6         lattice_0.20-41      prettyunits_1.1.1    ps_1.3.3             assertthat_0.2.1     rprojroot_1.3-2     
#>  [7] digest_0.6.25        R6_2.4.1             backports_1.1.7      evaluate_0.14        httr_1.4.1           highr_0.8           
#> [13] pillar_1.4.4         rlang_0.4.6          rstudioapi_0.11      callr_3.4.3          Matrix_1.2-18        splines_4.0.1       
#> [19] desc_1.2.0           labeling_0.3         webshot_0.5.2        readr_1.3.1          stringr_1.4.0        munsell_0.5.0       
#> [25] compiler_4.0.1       xfun_0.14            pkgconfig_2.0.3      pkgbuild_1.0.8       mgcv_1.8-31          htmltools_0.5.0.9001
#> [31] tidyselect_1.1.0     tibble_3.0.1         roxygen2_7.1.1       fansi_0.4.1          viridisLite_0.3.0    crayon_1.3.4        
#> [37] dplyr_1.0.0          withr_2.2.0          grid_4.0.1           nlme_3.1-148         gtable_0.3.0         lifecycle_0.2.0     
#> [43] magrittr_1.5         scales_1.1.1         cli_2.0.2            stringi_1.4.6        farver_2.0.3         fs_1.4.1            
#> [49] remotes_2.1.1        testthat_2.3.2       xml2_1.3.2           ellipsis_0.3.1       generics_0.0.2       vctrs_0.3.0         
#> [55] tools_4.0.1          rcmdcheck_1.3.3      glue_1.4.1           purrr_0.3.4          hms_0.5.3            processx_3.4.2      
#> [61] pkgload_1.1.0        yaml_2.2.1           colorspace_1.4-1     xopen_1.0.0          sessioninfo_1.1.1    rvest_0.3.5         
#> [67] memoise_1.1.0        knitr_1.28           patchwork_1.0.1

Basic Goodness-of-fit Diagnostic Plots

Certara PMxO R Tools Workstream

17-Nov-2020

Setup

Overview

Data:

Data Expectations

Reading:

Basic usage

Customizing the output

Selecting panels

Calling the `panels` option within `gof()`

Named function:

The `gof_list` object

Labels and units

Global label/units update

Update labels for a single plot

Outliers

Faceting

Changing aesthetics (colors, etc.)

Global changes

Changes to individual plots

Adding geoms and annotations

R session information

Basic Goodness-of-fit Diagnostic Plots

Certara PMxO R Tools Workstream

17-Nov-2020

Setup

Overview

Data:

Data Expectations

Reading:

Basic usage

Customizing the output

Selecting panels

Calling the panels option within gof()

Named function:

The gof_list object

Labels and units

Global label/units update

Update labels for a single plot

Outliers

Faceting

Changing aesthetics (colors, etc.)

Global changes

Changes to individual plots

Adding geoms and annotations

R session information

Calling the `panels` option within `gof()`

The `gof_list` object