Description of the ePCR package

ePCR is an R package intended for the survival analysis of advanced prostate cancer. This document is a basic introduction to the functionality of ePCR and a general overview of the possible analysis workflows for clinical trial or hospital registry cohorts. The approach, named ePCR, builds an ensemble out of single penalized Cox regression models, and it was the top-performing approach in the DREAM 9.5 Prostate Cancer Challenge (Guinney et al., 2017).

The latest version of ePCR is available on the Comprehensive R Archive Network (CRAN). CRAN mirrors are available by default in an R installation, and the ePCR package can be installed with the R terminal command install.packages("ePCR"). This should prompt the user to select a nearby CRAN mirror, after which ePCR and its dependencies are installed automatically. After the install.packages call, the ePCR package can be loaded with the command library("ePCR").

The following notation is used in the document: R commands, package names and function names are written in typewriter font. The notation pckgName::funcName indicates that the function funcName is called from the package pckgName; it is used prominently in the underlying R code because of package namespaces. This document, as well as other useful PDFs, can be inspected using the browseVignettes function for any package in R.
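As a brief illustration of the pckgName::funcName notation, the sketch below calls a function through its namespace and through the regular search path. It only uses the stats package shipped with R, and the vector x is made up for the example.

```r
# Hypothetical toy vector, unrelated to the ePCR data
x <- c(2, 4, 4, 5, 9)

# Explicit namespace call: median() taken from the stats package
m1 <- stats::median(x)

# The same function found through the search path
m2 <- median(x)

# Both calls resolve to the same function, so the results are identical
identical(m1, m2)

# Vignettes of an installed package can be listed interactively with e.g.
# utils::browseVignettes("ePCR")
```

The explicit form is mostly useful when two attached packages export a function with the same name.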

Loading the example clinical cohorts into the R session

The ePCR package is provided with two example hospital registry datasets. These datasets represent confidential hospital registry cohorts, to which kernel density estimates were fitted. Illustrative virtual patients were then generated from the kernel estimates and are provided here in the example datasets. Please see the accompanying ePCR publication for further details on the two Turku University Hospital cohorts (Laajala et al., 2018), and the Synapse site for DREAM 9.5 PCC for accessing the original DREAM data (Guinney, Wang, Laajala et al. 2017). The example datasets can be loaded into an R session using:

library(ePCR)

## 
## Attaching package: 'ePCR'

## The following object is masked from 'package:graphics':
## 
##     plot

## The following object is masked from 'package:base':
## 
##     plot

# Kernel density simulated patients from Turku University Hospital (TYKS)
# Data consists of TEXT cohort (text-search found patients) 
# and MEDI (patients identified using medication and few keywords)
data(TYKSSIMU)
# The following data matrices x and survival responses y become available
head(xTEXTSIMU); head(yTEXTSIMU)

##                BMI HEIGHTBL WEIGHTBL      ALP      ALT      AST    CA    CREAT
## TEXTSIMU1 27.16556      172     83.0 4.852030 3.044522 3.401197 2.305 3.951244
## TEXTSIMU2 27.16556      176     83.0 4.442651 3.258097 3.401197 2.310 4.644391
## TEXTSIMU3 29.35235      168     91.2 4.304065 2.708050 3.401197 2.305 4.394449
## TEXTSIMU4 24.80000      176     83.0 4.442651 2.944439 3.218876 2.330 4.465908
## TEXTSIMU5 27.20000      176     83.0 5.129899 2.944439 3.401197 2.310 3.891820
## TEXTSIMU6 27.16556      176     83.0 4.564348 1.609438 3.401197 2.305 4.204693
##             HB      LDH      NEU PLT       PSA    TBILI      TESTO      WBC
## TEXTSIMU1 11.3 5.265247 1.128171 323 3.4657359 2.197225 -0.1743534 2.001480
## TEXTSIMU2 12.6 5.265247 1.329710 216 4.6051702 2.197225 -0.1743534 2.332144
## TEXTSIMU3 13.5 5.265247 2.187174  83 3.8712010 2.197225 -0.1743534 1.856298
## TEXTSIMU4 12.7 5.273000 2.551006 189 0.3364722 3.135494  0.3364722 2.186051
## TEXTSIMU5 12.3 5.265247 1.329710 298 6.6720329 2.197225 -0.1743534 2.041220
## TEXTSIMU6 15.4 5.265247 1.329710 237 3.6505739 2.197225 -0.1743534 1.435085
##             CREACL NA.        MG      PHOS  ALB TPRO   RBC       LYM      BUN
## TEXTSIMU1 3.549617 137 -0.210721 0.1397619 34.8   67 4.830 0.3364722 2.475973
## TEXTSIMU2 3.549617 141 -0.210721 0.1397619 34.8   67 4.830 0.3364722 2.475973
## TEXTSIMU3 3.549617 135 -0.210721 0.1397619 29.5   67 4.185 0.3364722 2.397895
## TEXTSIMU4 3.549617 140 -0.210721 0.1397619 34.8   67 3.620 0.3364722 2.475973
## TEXTSIMU5 3.549617 140 -0.210721 0.1397619 34.8   67 4.120 0.3364722 2.475973
## TEXTSIMU6 3.549617 142 -0.210721 0.1397619 34.8   67 3.780 0.3364722 2.475973
##               CCRC      GLU SYSTOLICBP DIASTOLICBP PULSE HEMAT SPEGRA LYMperLEU
## TEXTSIMU1 3.703478 1.824549        136          76    72  0.43      0        24
## TEXTSIMU2 3.703478 1.840550        142          64    72  0.45      0        22
## TEXTSIMU3 3.703478 1.856298        111          76    72  0.38      0        22
## TEXTSIMU4 3.703478 1.856298        128          76    72  0.38      0        22
## TEXTSIMU5 3.703478 1.757858        142          76    69  0.38      0        22
## TEXTSIMU6 3.703478 1.856298        151          76    72  0.38      0        22
##           MONO MONOperLEU NEUperLEU POT BASOperLEU  EOS EOSperLEU TARGET
## TEXTSIMU1 0.62          9        63 4.1          1 0.17         0      0
## TEXTSIMU2 0.62          9        63 4.1          0 0.17         1      0
## TEXTSIMU3 0.62          9        63 4.1          0 0.19         2      0
## TEXTSIMU4 0.62          9        63 4.9          0 0.17         2      0
## TEXTSIMU5 0.62          9        63 3.7          0 0.17         2      0
## TEXTSIMU6 0.62          9        63 3.7          0 0.17         2      0
##           LYMPH_NODES KIDNEYS LUNGS LIVER PLEURA OTHER PROSTATE ORCHIDECTOMY
## TEXTSIMU1           0       0     0     0      0     0        0            1
## TEXTSIMU2           0       0     0     0      0     0        0            0
## TEXTSIMU3           0       0     0     0      0     0        0            0
## TEXTSIMU4           0       0     0     0      0     1        0            0
## TEXTSIMU5           0       0     0     1      0     0        0            0
## TEXTSIMU6           1       0     0     0      0     1        0            0
##           PROSTATECTOMY LYMPHADENECTOMY BILATERAL_ORCHIDECTOMY
## TEXTSIMU1             1               0                      1
## TEXTSIMU2             0               0                      0
## TEXTSIMU3             0               0                      0
## TEXTSIMU4             0               0                      0
## TEXTSIMU5             0               0                      0
## TEXTSIMU6             0               0                      0
##           PRIOR_RADIOTHERAPY ANALGESICS ANTI_ANDROGENS GLUCOCORTICOID
## TEXTSIMU1                  1          0              0              0
## TEXTSIMU2                  1          1              0              1
## TEXTSIMU3                  1          0              0              0
## TEXTSIMU4                  0          0              0              0
## TEXTSIMU5                  0          0              0              1
## TEXTSIMU6                  1          0              0              0
##           GONADOTROPIN BISPHOSPHONATE CORTICOSTEROID IMIDAZOLE ACE_INHIBITORS
## TEXTSIMU1            0              0              0         0              0
## TEXTSIMU2            0              0              0         0              0
## TEXTSIMU3            0              0              0         0              0
## TEXTSIMU4            0              0              0         0              0
## TEXTSIMU5            0              0              0         0              0
## TEXTSIMU6            0              0              0         0              0
##           BETA_BLOCKING HMG_COA_REDUCT ESTROGENS ANTI_ESTROGENS CEREBACC CHF
## TEXTSIMU1             0              0         0              0        0   0
## TEXTSIMU2             0              0         0              0        0   0
## TEXTSIMU3             0              0         0              0        0   0
## TEXTSIMU4             0              0         0              0        0   1
## TEXTSIMU5             0              0         0              0        0   0
## TEXTSIMU6             0              0         0              0        0   0
##           DVT DIAB MI PULMEMB SPINCOMP COPD MHBLOOD MHCARD MHCONGEN MHEAR
## TEXTSIMU1   0    0  0       0        0    0       0      1        0     0
## TEXTSIMU2   0    0  0       0        0    0       0      0        0     0
## TEXTSIMU3   0    0  0       0        0    0       0      0        0     0
## TEXTSIMU4   0    1  0       0        0    0       0      0        0     0
## TEXTSIMU5   0    0  0       0        0    0       0      0        0     0
## TEXTSIMU6   0    0  0       0        0    0       0      0        0     1
##           MHENDO MHGASTRO MHHEPATO MHIMMUNE MHINFECT MHINJURY MHINVEST MHMETAB
## TEXTSIMU1      0        0        0        0        0        1        0       0
## TEXTSIMU2      0        1        0        0        0        0        0       0
## TEXTSIMU3      1        1        0        0        1        0        0       0
## TEXTSIMU4      0        1        0        0        0        0        0       0
## TEXTSIMU5      0        0        0        0        0        0        0       0
## TEXTSIMU6      0        0        0        0        0        0        0       0
##           MHPSYCH MHRENAL MHRESP MHSKIN MHVASC ECOG_C AGEGRP2 RaceAsian
## TEXTSIMU1       0       0      0      0      0      0       2         0
## TEXTSIMU2       0       0      0      0      1      0       0         0
## TEXTSIMU3       0       0      0      0      0      0       1         0
## TEXTSIMU4       0       0      0      0      0      0       1         0
## TEXTSIMU5       0       0      0      0      0      0       1         0
## TEXTSIMU6       0       0      0      0      0      0       2         0
##           RaceBlack RaceOther RaceWhite RegionAsia RegionEastEuro
## TEXTSIMU1         0         0         0          0              0
## TEXTSIMU2         0         0         0          0              0
## TEXTSIMU3         0         0         0          0              0
## TEXTSIMU4         0         0         0          0              0
## TEXTSIMU5         0         0         0          0              0
## TEXTSIMU6         0         0         0          0              0
##           RegionNorthAmer RegionSouthAmer RegionWestEuro
## TEXTSIMU1               0               0              0
## TEXTSIMU2               0               0              0
## TEXTSIMU3               0               0              0
## TEXTSIMU4               0               0              0
## TEXTSIMU5               0               0              0
## TEXTSIMU6               0               0              0

##           DEATH LKADT_P  surv
## TEXTSIMU1     1     342   342
## TEXTSIMU2     0     360  360+
## TEXTSIMU3     1     682   682
## TEXTSIMU4     0    1067 1067+
## TEXTSIMU5     1     113   113
## TEXTSIMU6     0    1246 1246+

head(xMEDISIMU); head(yMEDISIMU)

##                BMI HEIGHTBL WEIGHTBL      ALP      ALT      AST   CA    CREAT
## MEDISIMU1 28.04282      175       90 5.093750 2.708050 3.349750 1.99 4.488636
## MEDISIMU2 26.57313      176       60 5.017280 3.091042 3.258097 2.41 4.174387
## MEDISIMU3 28.39506      165       65 4.418841 3.332205 3.349750 2.41 4.077537
## MEDISIMU4 24.57787      176      107 5.003946 3.295837 3.349750 2.33 4.634729
## MEDISIMU5 30.58581      188       73 4.158883 2.484907 3.367296 2.34 4.234107
## MEDISIMU6 25.18079      174       86 4.564348 4.882802 3.349750 2.33 4.499810
##             HB      LDH       NEU PLT      PSA    TBILI       TESTO      WBC
## MEDISIMU1 10.9 5.327876 1.2149127 186 6.194405 1.386294 -0.08338161 1.609438
## MEDISIMU2 13.3 5.327876 0.7030975 156 2.163323 1.609438 -0.08338161 2.041220
## MEDISIMU3 11.8 5.327876 1.0952734 126 3.713572 1.609438 -0.99425227 1.871802
## MEDISIMU4 13.1 5.327876 0.4946962 217 3.555348 1.791759 -0.08338161 1.568616
## MEDISIMU5 15.3 5.327876 1.1939225 221 3.367296 2.079442  0.78845736 1.704748
## MEDISIMU6 12.8 5.327876 1.9892433 386 3.610918 1.791759 -1.56064775 1.824549
##           CREACL NA.         MG        PHOS   ALB TPRO  RBC        LYM      BUN
## MEDISIMU1      0 140 -0.1923903  0.09531018 36.65 68.5 3.91  0.1823216 1.722767
## MEDISIMU2      0 142 -0.1923903 -0.02020271 33.60 68.5 4.28  0.8878913 1.722767
## MEDISIMU3      0 144 -0.1923903 -0.02020271 36.65 68.5 4.62  0.4946962 1.722767
## MEDISIMU4      0 142 -0.1923903 -0.02020271 36.65 69.0 4.35 -0.2744368 1.722767
## MEDISIMU5      0 143 -0.1923903 -0.06187540 36.65 68.5 4.05  0.4946962 1.722767
## MEDISIMU6      0 137 -0.1923903 -0.02020271 36.65 68.5 4.78  0.5128236 1.722767
##               CCRC      GLU SYSTOLICBP DIASTOLICBP PULSE HEMAT SPEGRA LYMperLEU
## MEDISIMU1 3.800105 1.435085      141.5          77    68  0.37      0        29
## MEDISIMU2 3.746038 1.871802      107.0          77    58  0.34      0        29
## MEDISIMU3 3.800105 1.916923      141.5          77    71  0.43      0        29
## MEDISIMU4 3.800105 1.871802      126.0          90    71  0.40      0        29
## MEDISIMU5 3.800105 1.589235      141.5          77    71  0.35      0        28
## MEDISIMU6 3.800105 1.791759      188.0          77    88  0.38      0        29
##           MONO MONOperLEU NEUperLEU POT BASOperLEU  EOS EOSperLEU TARGET
## MEDISIMU1 0.60         11      56.5 4.4          0 0.17         3      0
## MEDISIMU2 0.60         11      56.5 4.5          1 0.17         3      0
## MEDISIMU3 0.60         11      56.5 3.7          0 0.17         7      0
## MEDISIMU4 0.88         11      56.5 4.1          0 0.17         3      0
## MEDISIMU5 0.60         11      56.5 4.6          0 0.17         3      0
## MEDISIMU6 0.60         11      56.5 4.0          0 0.17         3      0
##           LYMPH_NODES KIDNEYS LUNGS LIVER PLEURA OTHER PROSTATE ORCHIDECTOMY
## MEDISIMU1           0       0     0     0      0     1        0            0
## MEDISIMU2           0       0     0     0      0     0        0            0
## MEDISIMU3           0       0     0     0      0     0        0            0
## MEDISIMU4           1       0     0     0      0     0        0            1
## MEDISIMU5           0       0     0     0      0     1        0            0
## MEDISIMU6           0       0     0     0      0     0        0            0
##           PROSTATECTOMY LYMPHADENECTOMY BILATERAL_ORCHIDECTOMY
## MEDISIMU1             0               0                      0
## MEDISIMU2             0               0                      0
## MEDISIMU3             1               0                      0
## MEDISIMU4             0               0                      0
## MEDISIMU5             0               0                      0
## MEDISIMU6             0               0                      0
##           PRIOR_RADIOTHERAPY ANALGESICS ANTI_ANDROGENS GLUCOCORTICOID
## MEDISIMU1                  1          0              0              0
## MEDISIMU2                  1          1              0              0
## MEDISIMU3                  1          0              1              1
## MEDISIMU4                  0          1              1              1
## MEDISIMU5                  1          0              1              1
## MEDISIMU6                  1          0              1              1
##           GONADOTROPIN BISPHOSPHONATE CORTICOSTEROID IMIDAZOLE ACE_INHIBITORS
## MEDISIMU1            0              1              1         0              0
## MEDISIMU2            0              0              1         0              0
## MEDISIMU3            0              0              1         0              0
## MEDISIMU4            0              0              1         0              0
## MEDISIMU5            0              0              0         0              0
## MEDISIMU6            0              0              1         0              0
##           BETA_BLOCKING HMG_COA_REDUCT ESTROGENS ANTI_ESTROGENS CEREBACC CHF
## MEDISIMU1             0              1         0              0        0   0
## MEDISIMU2             0              0         0              0        0   0
## MEDISIMU3             1              0         0              0        0   0
## MEDISIMU4             0              0         0              0        0   0
## MEDISIMU5             0              0         0              0        0   0
## MEDISIMU6             1              0         0              0        0   0
##           DVT DIAB MI PULMEMB SPINCOMP COPD MHBLOOD MHCARD MHCONGEN MHEAR
## MEDISIMU1   0    0  0       0        0    0       0      1        0     0
## MEDISIMU2   0    0  0       0        0    0       0      1        0     1
## MEDISIMU3   1    0  0       0        0    0       0      1        0     1
## MEDISIMU4   0    1  0       0        0    0       0      0        0     0
## MEDISIMU5   0    0  0       0        0    0       0      0        0     0
## MEDISIMU6   0    1  0       0        0    0       0      0        0     0
##           MHENDO MHGASTRO MHHEPATO MHIMMUNE MHINFECT MHINJURY MHINVEST MHMETAB
## MEDISIMU1      0        1        0        0        1        0        0       0
## MEDISIMU2      0        0        0        0        0        1        0       1
## MEDISIMU3      0        0        0        0        0        0        0       0
## MEDISIMU4      0        0        0        0        0        0        0       1
## MEDISIMU5      0        0        0        0        0        0        0       0
## MEDISIMU6      0        0        0        0        0        0        0       0
##           MHPSYCH MHRENAL MHRESP MHSKIN MHVASC ECOG_C AGEGRP2 RaceAsian
## MEDISIMU1       0       0      0      0      0      0       2         0
## MEDISIMU2       0       0      0      0      0      0       1         0
## MEDISIMU3       0       0      0      0      0      0       2         0
## MEDISIMU4       0       0      0      1      0      0       2         0
## MEDISIMU5       0       1      0      0      0      0       0         0
## MEDISIMU6       0       0      0      0      0      0       2         0
##           RaceBlack RaceOther RaceWhite RegionAsia RegionEastEuro
## MEDISIMU1         0         0         0          0              0
## MEDISIMU2         0         0         0          0              0
## MEDISIMU3         0         0         0          0              0
## MEDISIMU4         0         0         0          0              0
## MEDISIMU5         0         0         0          0              0
## MEDISIMU6         0         0         0          0              0
##           RegionNorthAmer RegionSouthAmer RegionWestEuro
## MEDISIMU1               0               0              0
## MEDISIMU2               0               0              0
## MEDISIMU3               0               0              0
## MEDISIMU4               0               0              0
## MEDISIMU5               0               0              0
## MEDISIMU6               0               0              0

##           DEATH LKADT_P  surv
## MEDISIMU1     0      89   89+
## MEDISIMU2     1     754   754
## MEDISIMU3     1     783   783
## MEDISIMU4     0     159  159+
## MEDISIMU5     0    1322 1322+
## MEDISIMU6     1     200   200
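The example cohorts above are described as virtual patients generated from kernel density estimates. The following is a minimal univariate sketch of that general idea, assuming a Gaussian kernel and base R's default bandwidth rule; it is not the procedure used to create the TYKS example data, and the observed values are made up.

```r
set.seed(1)

# Hypothetical "confidential" measurements, made up for this sketch
observed <- c(11.3, 12.6, 13.5, 12.7, 12.3, 15.4, 10.9, 13.3)

# Bandwidth of a Gaussian kernel density estimate (default rule of density())
bw <- stats::bw.nrd0(observed)

# Sampling from a Gaussian KDE: pick an observed value at random,
# then add Gaussian noise whose standard deviation equals the bandwidth
n_virtual <- 5
centers <- sample(observed, size = n_virtual, replace = TRUE)
virtual <- centers + stats::rnorm(n_virtual, mean = 0, sd = bw)

virtual
```

The generated values follow the smoothed distribution of the observations without reproducing any single original record exactly.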

library(survival)
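The surv column shown in the response tables above is a Surv object from the survival package. As a small sketch, the first four rows of yTEXTSIMU can be rebuilt by hand from the DEATH and LKADT_P values printed above:

```r
library(survival)

# Follow-up times (days) and event indicators copied from the first four
# rows of yTEXTSIMU shown above (1 = death, 0 = censored)
followup <- c(342, 360, 682, 1067)
death    <- c(1, 0, 1, 0)

# Right-censored survival response; censored times print with a trailing '+'
y <- Surv(time = followup, event = death)
print(y)

# Underneath, the object is a two-column matrix
y[, "time"]
y[, "status"]
```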

It is important to distinguish between the PSP and PEP objects, which represent a single penalized Cox regression model and an ensemble of Cox regression models, respectively. PSP objects are penalized/regularized Cox regression models fitted to a particular dataset by exploring its {λ, α} parameter space. Notice that the sequence of λ depends on the chosen α ∈ [0, 1]. The regularized/penalized fitting procedure in ePCR is provided by the glmnet package (Simon et al., 2011), although custom cross-validation and other supporting functionality are provided independently.
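For orientation, the elastic net objective for Cox regression, as formulated for glmnet by Simon et al. (2011), can be written as below. The symbols are introduced here for illustration: ℓ(β) is the log partial likelihood, n the number of patients and p the number of predictors.

```latex
\hat{\beta}(\lambda, \alpha) =
  \arg\max_{\beta} \left[ \frac{2}{n}\,\ell(\beta)
  - \lambda \left( \alpha \sum_{j=1}^{p} |\beta_j|
  + \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^{2} \right) \right]
```

Since α scales the L₁ part of the penalty, the smallest λ that shrinks every coefficient to zero changes with α, which is one way to see why a separate λ sequence is needed for each α.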

After fitting suitable candidate PSP objects (Penalized Single Predictors), these are aggregated into the ensemble structure PEP (Penalized Ensemble Predictor). The key input to the PEP constructor is the list of PSPs intended for use in the ensemble. We will start off by introducing the fine-tuning and fitting of PSPs. For this purpose the generic S4-class constructor new will be called with the main parameter indicating that we wish to construct a PSP object.

PSP-objects

The key attributes provided for the PSP-constructor are the following parameters (see ?'PSP-class' in R for further documentation):

x: The input data matrix, where rows correspond to patients and columns to potential predictors.
y: The Surv-class response vector as required by Cox regression and glmnet in survival prediction.
seeds: An integer vector or a single value for setting the random seed for cross-validation. Setting this is highly recommended for reproducibility. If multiple seed integers are provided, the cross-validation is conducted separately for each. This smooths the cross-validation surface, but multiplies the computational time required to fit a model.
score: The scoring function utilized in evaluating the generalization ability of the fitted model in cross-validation; readily implemented scoring functions include score.iAUC and score.cindex, but custom scoring functions are also allowed.
alphaseq: Sequence of alpha values. The extreme ends correspond to LASSO regression (α = 1) and Ridge Regression (α = 0), while α ∈ ]0, 1[ is generally referred to as Elastic Net. Notice that LASSO and Ridge Regression have noticeably different characteristics, as they utilize only the L₁ and L₂ norms, respectively; for example, a Ridge Regression model will never have its coefficients exactly zero. Furthermore, for collinear predictors LASSO tends to pick a single one, while Ridge Regression retains several and spreads the overall effect over them. Depending on the ultimate prediction purpose, one may prefer one or the other and can tailor alphaseq accordingly. By default we suggest utilizing an evenly spaced alphaseq over [0, 1], at least for a preliminary search.
nlambda: Number of λ values tested for each α. By default glmnet suggests 100 values, which are picked from a feasible range between a model including all coefficients and a converged model where no further penalization is possible.
folds: Number of folds in the cross-validation (minimum 3; the maximum equals the number of observations, i.e. leave-one-out cross-validation).
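To make the role of alphaseq more concrete, the sketch below evaluates the elastic net penalty term, in the parameterization used by glmnet, over an evenly spaced α grid. The function enet_penalty and the coefficient vector are hypothetical helpers written for this illustration and are not part of ePCR.

```r
# Elastic net penalty: lambda * (alpha * L1 + (1 - alpha) / 2 * squared L2)
enet_penalty <- function(beta, lambda, alpha) {
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

# Hypothetical coefficient vector, made up for illustration
beta <- c(0.5, -1, 0, 2)

# Evenly spaced alpha grid over [0, 1]
alphaseq <- seq(from = 0, to = 1, length.out = 6)

# Penalty at lambda = 1 for each alpha: Ridge end first, LASSO end last
penalties <- sapply(alphaseq, function(a) enet_penalty(beta, lambda = 1, alpha = a))
data.frame(alpha = alphaseq, penalty = penalties)
```

At α = 0 only the squared coefficients contribute, while at α = 1 only their absolute values do; intermediate values blend the two.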

For the sake of the example, we will construct an ePCR model ensemble that consists of two PSP objects: one from the medication-curated cohort and the other from the text search cohort. We will leave out a small portion of the medication and text search patients as a small test set, to later evaluate the generalization ability of the ensemble. Notice, however, that this is not a proper evaluation, as the test patients are not from an independent source; it therefore gives an optimistic view of the generalization capability of the model(s).

testset <- 1:30
# Medication cohort fit
# Leaving out patients into a separate test set using negative indices
psp_medi <- new("PSP", 
    # Input data matrix x (example data loaded previously)
    x = xMEDISIMU[-testset,],
    # Response vector, 'surv'-object
    y = yMEDISIMU[-testset,"surv"],
    # Seeds for reproducibility
    seeds = c(1,2),
    # The CV binning can be run multiple times; the runs are then
    # averaged, which gives a smoother CV heatmap.
    cvrepeat = 2,
    # Using the concordance-index as prediction accuracy in CV
    score = score.cindex,
    # Alpha sequence
    alphaseq = seq(from=0, to=1, length.out=6),
    # Using glmnet's default nlambda of 100
    nlambda = 100,
    # Running the nominal 10-fold cross-validation
    folds = 10,
    # The x.expand slot is a function that would allow interaction terms.
    # For the sake of simplicity we use an identity-like function here
    x.expand = function(x) { as.matrix(x) }
)

## --- Initializing new PSP object ---
## 
## --- Cross-validation ( 10 -folds) repeat run  1  of  2  ---
## 
## [1] "alpha 0"
## [1] "alpha 0.2"

## [1] "alpha 0.4"

## [1] "alpha 0.6"

## [1] "alpha 0.8"

## [1] "alpha 1"

## --- Cross-validation ( 10 -folds) repeat run  2  of  2  ---
## 
## [1] "alpha 0"
## [1] "alpha 0.2"
## [1] "alpha 0.4"
## [1] "alpha 0.6"

## [1] "alpha 0.8"

## [1] "alpha 1"

## --- Computing AUCs for regularization curves for coefficients --- 
## 
## --- Generating feature list and dictionary --- 
## 
## --- New PSP object successfully created ---

The parameters for the second PSP are similar to those above. Notice that through the PSP members, the user can tailor multiple parameters to best suit the data.

# Text run similar to above
# Leaving out patients into a separate test set using negative indices
psp_text <- new("PSP", 
    x = xTEXTSIMU[-testset,],
    y = yTEXTSIMU[-testset,"surv"],
    seeds = c(3,4),
    cvrepeat = 2,
    score = score.cindex,
    alphaseq = seq(from=0, to=1, length.out=6),
    nlambda = 100,
    folds = 10,
    x.expand = function(x) { as.matrix(x) }
)

## --- Initializing new PSP object ---
## 
## --- Cross-validation ( 10 -folds) repeat run  1  of  2  ---
## 
## [1] "alpha 0"
## [1] "alpha 0.2"
## [1] "alpha 0.4"
## [1] "alpha 0.6"
## [1] "alpha 0.8"
## [1] "alpha 1"
## --- Cross-validation ( 10 -folds) repeat run  2  of  2  ---
## 
## [1] "alpha 0"
## [1] "alpha 0.2"
## [1] "alpha 0.4"
## [1] "alpha 0.6"
## [1] "alpha 0.8"
## [1] "alpha 1"

## --- Computing AUCs for regularization curves for coefficients --- 
## 
## --- Generating feature list and dictionary --- 
## 
## --- New PSP object successfully created ---

# Taking a look at the show-method for PSP:
psp_medi

## PSP ePCR object
## N observations:  120 
## Optimal alpha:  1 
## Optimal lambda:  0.2578574 
## Optimal lambda index:  1

# Plot the CV-surface of the fitted PSP:
plot(psp_medi, 
    # Showing only every 10th row and column name (propagated to heatcv-function)
    by.rownames=10, by.colnames=10, 
    # Adjust main title and tilt the bias of the color key legend (see ?heatcv)
    main="C-index CV for psp_medi", bias=0.2)

Notably, the cross-validation surfaces suggest different optimized penalization parameters for the two ensemble members. This most likely stems from systematic differences between the two cohorts; the ePCR methodology offers an ensemble-driven way to account for such differences between patient substrata.

plot(psp_text, 
    # Showing only every 10th row and column name (propagated to heatcv-function)
    by.rownames=10, by.colnames=10, 
    # Adjust main title and tilt the bias of the color key legend (see ?heatcv)
    main="C-index CV for psp_text", bias=0.2)

In addition to providing the CV-grid, the identified optimal parameters are available for downstream analyses:

psp_medi@optimum

##       Alpha  AlphaIndex      Lambda LambdaIndex 
##   1.0000000   6.0000000   0.2578574   1.0000000

psp_text@optimum

##       Alpha  AlphaIndex      Lambda LambdaIndex 
##   1.0000000   6.0000000   0.4396716   1.0000000
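The optimum reported above corresponds to the best cell of a cross-validation grid. The base R sketch below shows how such a cell can be located in a score matrix; the matrix is made up, and its layout (rows for λ indices, columns for α values) is an assumption for illustration rather than a description of the PSP internals.

```r
# Hypothetical CV score matrix: rows = lambda index, columns = alpha values
alphaseq <- seq(from = 0, to = 1, length.out = 6)
cvscore <- matrix(0.5, nrow = 4, ncol = length(alphaseq),
                  dimnames = list(paste0("lambda", 1:4), paste0("alpha", alphaseq)))
cvscore[2, 5] <- 0.71   # made-up best cell
cvscore[3, 2] <- 0.64   # made-up runner-up

# Locate the grid cell with the highest cross-validated score
best <- which(cvscore == max(cvscore), arr.ind = TRUE)
lambda_index <- unname(best[1, "row"])
alpha_index  <- unname(best[1, "col"])

c(Alpha = alphaseq[alpha_index], AlphaIndex = alpha_index, LambdaIndex = lambda_index)
```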

slotNames(psp_medi)

##  [1] "description" "features"    "strata"      "alphaseq"    "cvfolds"    
##  [6] "nlambda"     "cvmean"      "cvmedian"    "cvstdev"     "cvmin"      
## [11] "cvmax"       "score"       "cvrepeat"    "impute"      "optimum"    
## [16] "seed"        "x"           "x.expand"    "y"           "fit"        
## [21] "criterion"   "dictionary"  "regAUC"
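The score slot listed above holds the scoring function used in cross-validation. Below is a hedged sketch of what a custom scorer could look like. The name score.myconcordance is hypothetical, and the argument names pred and real mirror how score.cindex is called in this document, which is an assumption about what a custom scoring function needs to accept. The computation itself relies on survival::concordance.

```r
library(survival)

# Hypothetical custom scorer (assumed interface: pred = predicted risk,
# real = Surv response); not part of the ePCR package
score.myconcordance <- function(pred, real, ...) {
  # reverse = TRUE: a higher predicted risk should go with shorter survival
  survival::concordance(real ~ pred, reverse = TRUE)$concordance
}

# Toy check: made-up risk scores perfectly ordered against survival times
real <- Surv(time = c(1, 2, 3, 4), event = c(1, 1, 1, 1))
pred <- c(4, 3, 2, 1)
score.myconcordance(pred = pred, real = real)
```

If such a function were accepted by the PSP constructor, it would be passed in place of score.cindex; this has not been verified here.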

PEP-objects

Once the PSP-objects have been constructed, they are aggregated to the corresponding Penalized Ensemble Predictor (PEP). The PEP objects aggregate PSP objects from various data slices or optimization criteria, and create an ensemble predictor that averages over the provided single predictors. As such, its most important input is the list of desired PSP-objects:

pep_tyks <- new("PEP",
    # The main input is the list of PSP objects
    PSPs = list(psp_medi, psp_text)
)
# These PSPs were constructed using the example code above.
pep_tyks

## Penalized Ensemble Predictor
## Count of PSPs:  2
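The idea of averaging over single predictors can be sketched in base R as follows. The member predictions are made up, and a plain arithmetic mean is used for simplicity; the PEP implementation may transform or rank the member predictions before combining them, which is not reproduced here.

```r
# Hypothetical risk predictions from two ensemble members for five patients
member_preds <- cbind(
  psp_a = c(0.2, 1.1, -0.4, 0.9, 0.0),
  psp_b = c(0.4, 0.7, -0.2, 1.3, 0.2)
)

# Ensemble prediction as the plain average over the members
ensemble_pred <- rowMeans(member_preds)
ensemble_pred
```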

Predictions based on PEP/PSP-objects

# Conduct naive test set evaluation
xtest <- rbind(xMEDISIMU[testset,], xTEXTSIMU[testset,])
ytest <- rbind(yMEDISIMU[testset,], yTEXTSIMU[testset,])
# Perform survival prediction based on the PEP-ensemble we've created
xpred <- predict(pep_tyks, newx=as.matrix(xtest), type="ensemble")
# Construct a survival object using the Surv-class
ytrue <- Surv(time = ytest[,"surv"][,"time"], event = ytest[,"surv"][,"status"])
# Test c-index between our constructed ensemble prediction and true response
tyksscore <- score.cindex(pred = xpred, real = ytrue)
print(paste("TYKS example c-index:", round(tyksscore, 4)))

## [1] "TYKS example c-index: 0.5"
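To clarify what a c-index measures, the following base R sketch computes a pairwise concordance for right-censored data. The function cindex_pairs and the toy data are hypothetical; the sketch ignores tied survival times and is not the implementation behind score.cindex.

```r
# Pairwise concordance for right-censored data, base R only
cindex_pairs <- function(risk, time, status) {
  concordant <- 0
  usable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # A pair is usable when patient i has an observed event before time[j]
      if (status[i] == 1 && time[i] < time[j]) {
        usable <- usable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1     # higher risk died earlier
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5   # tied predictions count as half
        }
      }
    }
  }
  concordant / usable
}

# Hypothetical toy data: times in days, status 1 = death, 0 = censored
time   <- c(100, 200, 300, 400)
status <- c(1, 0, 1, 1)
c_perfect  <- cindex_pairs(risk = c(4, 3, 2, 1), time, status)
c_constant <- cindex_pairs(risk = rep(1, 4), time, status)
c(perfect = c_perfect, constant = c_constant)
```

In this sketch a perfectly ordered risk score reaches 1, whereas a constant prediction yields 0.5, the value expected from an uninformative ordering.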

Using the provided DREAM and TYKS ePCR-models

The ePCR R package comes with readily fitted ePCR ensembles from the work by Guinney, Wang, Laajala et al. (2017) as well as from hospital registry cohorts. Due to data confidentiality issues, the original data matrices and responses are not provided in the S4-objects (although normally they would be in the slots @x and @y, respectively).

In order to gain access to the original data by Guinney et al., the processed data can be accessed as raw .csv files or R workspaces at the corresponding Synapse workspace.

Accessing the Turku University Hospital registry cohort requires a research permit and users are encouraged to contact the Center for Clinical Informatics ([email protected]) for further information.

Despite not providing the original data matrices, the ensemble model fits and their coefficients as a function of {λ, α} are fully functional. They are therefore suitable for conducting predictions for future patients or for studying effect within the estimated models/ensembles. These model objects can be loaded in ePCR using:

data(ePCRmodels)
class(DREAM)

## [1] "PEP"
## attr(,"package")
## [1] ".GlobalEnv"

class(TYKS)

## [1] "PEP"
## attr(,"package")
## [1] "ePCR"

The DREAM S4-object is the top-performing mCRPC OS-predicting ensemble from Guinney et al., while the TYKS models are fitted to the original Turku University Hospital cohorts. These model objects can be used for prediction in the same way as the new S4 PEP-object created in the sections above. As an example, if we apply the DREAM model, trained on controlled clinical trials, to the TYKS hospital registry patients, the OS prediction can be conducted using:

# Create a DREAM-matching data input matrix from our xtest and the full data matrix
xtemp <- conforminput(DREAM, xtest)
# Predict survival for our hospital registry example dataset 
dreampred <- predict(DREAM, 
    # Providing full new data and average prediction over the ensemble members
    newx=xtemp, type="ensemble",
    # Defining that we don't want any further data matrix feature extraction
    # The call to conforminput above already formatted the input data
    x.expand = as.matrix
)

Notice that we utilize the helper function conforminput for feature extraction/creation, as multiple interaction variables were introduced in the original DREAM data matrix and the dimensions would not match in the regression task otherwise.
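As an illustration of what a feature expansion with interaction terms can look like, the sketch below builds all pairwise interactions with base R's model.matrix. The function expand_pairwise and the toy values are hypothetical; the actual feature set that conforminput creates for the DREAM model is not reproduced here.

```r
# Hypothetical expansion function: main effects plus all pairwise interactions
expand_pairwise <- function(x) {
  stats::model.matrix(~ .^2 - 1, data = as.data.frame(x))
}

# Toy input with three predictors and made-up values
x <- cbind(PSA = c(3.5, 4.6, 3.9), HB = c(11.3, 12.6, 13.5), LDH = c(5.3, 5.2, 5.4))
xe <- expand_pairwise(x)

colnames(xe)
dim(xe)
```

A function of this shape could in principle be supplied through the x.expand argument shown earlier, in which case the same expansion has to be applied to any new data before prediction.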

The following error message is quite commonly encountered when first applying pre-built models to new data:

Error in newx %*% nbeta : Cholmod error ‘X and/or Y have wrong dimensions’ at file ../MatrixOps/cholmod_sdmult.c, line 90

It is raised by the glmnet package’s C/Fortran implementation when the β coefficients do not conform to the dimensions of the new data matrix X. To avoid it, the new data must be processed to have the same number of columns (variables) as the model expects, for example with conforminput or with the function stored in the x.expand S4-slot of a PEP-object.
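A simple defensive check before multiplying new data with model coefficients can catch this kind of mismatch early. The helper check_conformable below is hypothetical and works on plain named vectors and matrices with made-up values; it is not part of ePCR or glmnet.

```r
# Hypothetical helper: verify that new data matches a named coefficient vector
check_conformable <- function(newx, beta) {
  missing_cols <- setdiff(names(beta), colnames(newx))
  extra_cols   <- setdiff(colnames(newx), names(beta))
  if (length(missing_cols) > 0 || length(extra_cols) > 0) {
    stop("Column mismatch. Missing: ", paste(missing_cols, collapse = ", "),
         "; unexpected: ", paste(extra_cols, collapse = ", "))
  }
  # Reorder columns to the order of the coefficients before multiplying
  newx[, names(beta), drop = FALSE]
}

# Made-up coefficients and new data with columns in a different order
beta <- c(PSA = 0.3, HB = -0.2)
newx <- cbind(HB = c(12.6, 13.5), PSA = c(4.6, 3.9))
risk <- check_conformable(newx, beta) %*% beta
risk
```

With a missing or unexpected column the helper stops with a readable message instead of a low-level dimension error.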

# Test c-index between the DREAM ensemble prediction and TYKS true response
dreamscore <- score.cindex(pred = dreampred, real = ytrue)
print(paste("DREAM example c-index:", round(dreamscore, 4)))

## [1] "DREAM example c-index: 0.389"

Session info

sessionInfo()

## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] survival_3.8-3 ePCR_0.11.0    rmarkdown_2.29
## 
## loaded via a namespace (and not attached):
##  [1] Matrix_1.7-2        glmnet_4.1-8        future.apply_1.11.3
##  [4] jsonlite_1.8.9      compiler_4.4.2      Rcpp_1.0.14        
##  [7] parallel_4.4.2      jquerylib_0.1.4     globals_0.16.3     
## [10] splines_4.4.2       yaml_2.3.10         fastmap_1.2.0      
## [13] lattice_0.22-6      prodlim_2024.06.25  impute_1.81.0      
## [16] Bolstad2_1.0-29     R6_2.6.0            shape_1.4.6.1      
## [19] knitr_1.49          iterators_1.0.14    pec_2023.04.12     
## [22] future_1.34.0       maketools_1.3.2     bslib_0.9.0        
## [25] rlang_1.1.5         cachem_1.1.0        xfun_0.50          
## [28] sass_0.4.9          sys_3.4.3           cli_3.6.3          
## [31] hamlet_0.9.7        digest_0.6.37       foreach_1.5.2      
## [34] grid_4.4.2          mvtnorm_1.3-3       lifecycle_1.0.4    
## [37] lava_1.8.1          timereg_2.0.6       timeROC_0.4        
## [40] evaluate_1.0.3      pracma_2.4.4        data.table_1.16.4  
## [43] numDeriv_2016.8-1.1 listenv_0.9.1       codetools_0.2-20   
## [46] buildtools_1.0.0    parallelly_1.42.0   tools_4.4.2        
## [49] htmltools_0.5.8.1