Title: | Hierarchical Optimal Matching and Machine Learning Toolbox |
---|---|
Description: | Various functions and algorithms are provided here for solving optimal matching tasks in the context of preclinical cancer studies. Further, various helper and plotting functions are provided for unsupervised and supervised machine learning as well as longitudinal mixed-effects modeling of tumor growth response patterns. |
Authors: | Teemu Daniel Laajala <[email protected]> |
Maintainer: | Teemu Daniel Laajala <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.9.7 |
Built: | 2025-02-25 06:02:16 UTC |
Source: | https://github.com/syksy/hamlet |
This package provides functions and algorithms for solving optimal matching tasks in the context of preclinical cancer studies. Further, various helper and plotting functions are provided for unsupervised and supervised machine learning as well as longitudinal mixed-effects modeling of tumor growth response patterns.
Teemu Daniel Laajala
Maintainer: Teemu Daniel Laajala <[email protected]>
Laajala TD, Jumppanen M, Huhtaniemi R, Fey V, Kaur A, et al. (2016) Optimized design and analysis of preclinical intervention studies in vivo. Sci Rep. 2016 Aug 2;6:30723. doi: 10.1038/srep30723.
Knuuttila M, Yatkin E, Kallio J, Savolainen S, Laajala TD, et al. (2014) Castration induces upregulation of intratumoral androgen biosynthesis and androgen receptor expression in orthotopic VCaP human prostate cancer xenograft model. Am J Pathol. 2014 Aug;184(8):2163-73. doi: 10.1016/j.ajpath.2014.04.010.
##
## Exploring the VCaP dataset provided alongside the 'hamlet' package
##

data(vcapwide)
data(vcaplong)

# VCaP castration-resistant prostate cancer (CRPC) PSA measurements (and body weight) in wide format
mixplot(vcapwide[,c("PSAWeek10", "PSAWeek14", "BWWeek10", "Group")], pch=16)
anv <- aov(PSA ~ Group, data.frame(PSA = vcapwide[,"PSAWeek14"], Group = vcapwide[,"Group"]))
summary(anv)
TukeyHSD(anv)
summary(aov(BW ~ Group, data.frame(BW = vcapwide[,"BWWeek14"], Group = vcapwide[,"Group"])))

# VCaP CRPC PSA measurements (and body weight) in long format
library(lattice)
xyplot(log2PSA ~ DrugWeek | Group, data = vcaplong, type="l", group=ID, layout=c(3,1))
xyplot(BW ~ DrugWeek | Group, data = vcaplong, type="l", group=ID, layout=c(3,1))

##
## Example multigroup (g=3) nbp-matching using the branch and bound algorithm,
## and subsequent random allocation of submatches to 3 arms
##

# Construct an example Euclidean distance matrix using 15 observations from the VCaP study
d <- as.matrix(dist(vcapwide[1:15,c("PSAWeek10", "BWWeek10")]))

# Matching using the b&b algorithm to submatches of size 3
# (which will result in 3 intervention groups)
bb3 <- match.bb(d, g=3)
str(bb3)

# Matching vector, where each element indicates to which submatch each observation belongs
solvec <- bb3$solution

# Perform an example random allocation of the above submatches;
# these will be randomly allocated to 3 arms based on the submatches
set.seed(1)
groups <- match.allocate(solvec)

# Illustrate the randomization: no baseline differences in these three artificial groups
by(vcapwide[1:15,c("PSAWeek10", "BWWeek10")], INDICES=groups, FUN=function(x) x)
summary(aov(PSAWeek10 ~ groups, data = data.frame(PSAWeek10 = vcapwide[1:15,"PSAWeek10"], groups)))
summary(aov(BWWeek10 ~ groups, data = data.frame(BWWeek10 = vcapwide[1:15,"BWWeek10"], groups)))

##
## Example mixed-effects modeling of the longitudinal PSA profiles using
## the actual experimental groups
##

exdat <- vcaplong[vcaplong[,"Group"] %in% c("Vehicle", "ARN"),]

library(lme4) # Model fitting using the lme4-package
f1 <- lmer(log2PSA ~ 1 + DrugWeek + DrugWeek:ARN + (1 + DrugWeek|ID), data = exdat)
mem.getcomp(f1)

library(lmerTest) # Model term testing using the lmerTest-package
summary(f1)
This function serves as an alternative to the R function 'extendrange' when the user wishes to preserve a point of symmetry for the range. For example, this may be desired when a plot should be symmetric around the origin x=0, but the sides need to extend beyond the actual range of the values.
extendsymrange(x, r = range(x, na.rm = T), f = 0.05, sym = 0)
x |
Vector of values to compute the range for |
r |
The range of values |
f |
The factor by which the range is extended beyond the extremes |
sym |
The defined point of symmetry |
A vector of 2 values for the lower and higher limit of the symmetric extended range
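As a minimal illustration of the returned limits (the input values here are chosen only for demonstration), both limits are equally far from the point of symmetry:

extendsymrange(c(1, 3), sym = 0)  # both limits are the same absolute distance from 0
extendsymrange(c(1, 3), sym = 2)  # both limits are the same absolute distance from 2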
Teemu Daniel Laajala <[email protected]>
set.seed(1)
ex <- rnorm(10) + 2
hist(ex, xlim=extendsymrange(ex, sym=0), breaks=100)
This function plots a heatmap figure based on the normal plot region. This is useful if the image-based function 'heatmap' is not suitable, e.g. when multiple heatmaps should be placed in a single device.
hmap(x, add = F, xlim = c(0.2, 0.8), ylim = c(0.2, 0.8), col = heat.colors(10),
  border = matrix(NA, nrow = nrow(x), ncol = ncol(x)),
  lty = matrix("solid", nrow = nrow(x), ncol = ncol(x)),
  lwd = matrix(1, nrow = nrow(x), ncol = ncol(x)),
  hclustfun = hclust, distfun = dist, reorderfun = function(d, w) reorder(d, w),
  textfun = function(xseq, yseq, labels, type = "row", ...) {
    if (type == "col") par(srt = 90)
    text(x = xseq, y = yseq, labels = labels, ...)
    if (type == "col") par(srt = 0)
  },
  symm = F, Rowv = NULL, Colv = if (symm) Rowv else NULL,
  leftlim = c(0, 0.2), toplim = c(0.8, 1), rightlim = c(0.8, 1), bottomlim = c(0, 0.2),
  type = "rect", scale = c("none", "row", "column"), na.rm = T, nbins = length(col),
  valseq = seq(from = min(x, na.rm = na.rm), to = max(x, na.rm = na.rm), length.out = nbins),
  namerows = T, namecols = T, ...)
x |
Matrix to be plotted |
add |
Should the figure be added to the plotting region of an already existing figure |
xlim |
The x limits in which the heatmap is placed horizontally in the plotting region |
ylim |
The y limits in which the heatmap is placed vertically in the plotting region |
col |
Color palette for the heatmap colors |
border |
A matrix of border color definitions (rectangles in the heatmap) |
lty |
A matrix of line type definitions (rectangles in the heatmap) |
lwd |
A matrix of line width definitions (rectangles in the heatmap) |
hclustfun |
The hierarchical clustering function similar to 'stats::heatmap' implementation. Should yield a valid 'hclust' object for a given distance/dissimilarity matrix. |
distfun |
The distance/dissimilarity function similar to 'stats::heatmap' implementation. Should yield a valid 'dist' object for a given data matrix. |
reorderfun |
The function used to reorder branches of the clustering (notice that same-level branches in a hierarchical clustering may be permuted without violating the solution). The default approach from 'stats::heatmap' is utilized here. |
textfun |
A text function that is used to plot the names of the rows and columns, if desired. The default implementation shows how the user could tailor the columns and rows differently, by rotating the column labels by 90 degrees. The parameter 'type' is used to distinguish between rows and columns. |
symm |
Should the given data matrix be treated as symmetric (has to be a square matrix if so), by default 'FALSE'. |
Rowv |
The row clustering parameter. If 'NA', the row hierarchical clustering is omitted entirely. Alternatively, if a numeric vector of ranks, an attempt is made to permute the ordering of branches according to the desired order. This can also be a pre-computed dendrogram object. |
Colv |
The column clustering parameter. If 'NA', the column hierarchical clustering is omitted entirely. Alternatively, if a numeric vector of ranks, an attempt is made to permute the ordering of branches according to the desired order. This can also be a pre-computed dendrogram object. |
leftlim |
The horizontal limits of the row hierarchical clustering. The horizontal limits of the heatmap are a=leftlim[1], b=leftlim[2], c=xlim[1], d=xlim[2] where the 'a' is where the row dendrogram begins, 'b' is where the row dendrogram ends, 'c' is where the heatmap itself begins, and 'd' is where the heatmap itself ends. The vertical limits are computed according to 'ylim' to align correctly with the heatmap rectangles. |
toplim |
The vertical limits of the column hierarchical clustering. The vertical limits of the heatmap are a=ylim[1], b=ylim[2], c=toplim[1], d=toplim[2], where 'a' is where the heatmap begins, 'b' is where the heatmap ends, 'c' is where the column dendrogram begins, and 'd' is where the column dendrogram ends. The horizontal limits are computed according to 'xlim' to align correctly with the heatmap rectangles. |
rightlim |
The horizontal limits for the row texts. |
bottomlim |
The vertical limits for the column texts. |
type |
Type of clustering visualization; while default "rect" provides a rectangular-angled tree, the alternate option "tri" provides triangular-angled tree. |
scale |
Should data be scaled according to 'row' or 'column' (or 'col') similarly to 'stats::heatmap'. |
na.rm |
Should missing values be removed, by default TRUE. |
nbins |
Number of discrete bins the data is divided into using 'seq(from=min(x), to=max(x), length.out=nbins)'. |
valseq |
The sequence of values by which the data is binned, typically corresponding to the above 'nbins' parameter. |
namerows |
If a single boolean value TRUE, then the default 'rownames(x)' are plotted to the right of the rows with the 'textfun'. If it is a vector of length 'nrow(x)', then this vector is used for plotting the row names instead. |
namecols |
If a single boolean value TRUE, then the default 'colnames(x)' are plotted below the columns with the 'textfun'. If it is a vector of length 'ncol(x)', then this vector is used for plotting the column names instead. |
... |
Additional parameters provided to the rectangle plotting function |
Teemu Daniel Laajala <[email protected]>
heatmap
hmap.key
hmap.annotate
# Generate some data
set.seed(1)
r1 <- replicate(30, rnorm(20))
lab <- sample(letters[1:2], 20, replace=TRUE)
r1[lab==lab[1],] <- r1[lab==lab[1],] + 2
r2a <- replicate(10, rnorm(10))
r2b <- replicate(10, rnorm(10))

# Set up a new plot region; notice we have a 2-fold wider x-axis
plot.new()
plot.window(xlim=c(0,2), ylim=c(0,1))

# Plot an example plot along with the color key and annotations for our 'lab' vector
h1 <- hmap(r1, add = TRUE)
hmap.key(h1, x1=0.18)
hmap.annotate(h1, rw = lab, rw.wid=c(0.82,0.90))

# Plot the rest to show how the coordinates are adjusted to place the heatmap(s) differently
h2a <- hmap(r2a, add = TRUE, xlim=c(1.2, 1.8), leftlim=c(1.0, 1.2), rightlim=c(1.8,2.0),
  ylim=c(0.6, 1.0), bottomlim=c(0.5,0.6), Colv=NA)
h2b <- hmap(r2b, add = TRUE, xlim=c(1.2, 1.8), leftlim=c(1.0, 1.2), rightlim=c(1.8,2.0),
  ylim=c(0.1, 0.5), bottomlim=c(0.0,0.1), Colv=NA)

# Show the normal plot region axes
axis(1, at=c(0.5,1.5), c("A", "B"))

## Not run:
# Heatmap used as base for the help documentation figure
set.seed(1)
hmap(matrix(rnorm(100), nrow=10), xlim=c(0.2,0.8), ylim=c(0.2,0.8), leftlim=c(0.0,0.2),
  rightlim=c(0.8,1.0), bottomlim=c(0.0,0.2), toplim=c(0.8,1.0))
axis(1); axis(2); title(xlab="x", ylab="y")
## End(Not run)
Annotation of rows or columns in a 'hmap'-plot. By default, rectangles aligned with either rows or columns are plotted to the right-side or lower-side of the heatmap respectively. User-specified customizations may be given to change these annotations in positioning or type.
hmap.annotate(h, rw, rw.n = length(unique(rw)),
  rw.col = rainbow(rw.n, start = 0.05, end = 0.5),
  rw.wid, rw.hei, rw.pch,
  rw.x = rep(min(h$rightlim), times = length(h$rowtext$xseq)),
  rw.y = h$rowtext$yseq, rw.shift = c(0.02, 0),
  cl, cl.n = length(unique(cl)),
  cl.col = rainbow(cl.n, start = 0.55, end = 1),
  cl.wid, cl.hei, cl.pch,
  cl.x = h$coltext$xseq,
  cl.y = rep(max(h$bottomlim), times = length(h$coltext$yseq)),
  cl.shift = c(0, -0.02), ...)
h |
The list of heatmap parameters returned invisibly by the original 'hmap'-call. |
rw |
Annotation vector for rows 'rw'; each unique instance is given a different color (or pch) and plotted to the right of the corresponding heatmap rows |
rw.n |
Number of unique colors (or pch) to give each annotated row |
rw.col |
A vector of color values for the unique instances in 'rw' for annotating rows |
rw.wid |
The widths for annotation boxes for each row 'r' |
rw.hei |
The heights for annotation boxes for each row 'r' |
rw.pch |
Alternatively, instead of widths and heights user may specify a symbol 'pch' to use for annotating each row |
rw.x |
The x-coordinate locations for the row annotations, by default right side of heatmap itself |
rw.y |
The y-coordinate locations for the row annotations, by default same vertical locations as for the heatmap rows |
rw.shift |
Row annotation shift: a vector of 2 values, where first indicates the amount of x-axis shift desired and the second indicates the amount of y-axis shift |
cl |
Annotation vector for columns 'cl'; each unique instance is given a different color (or pch) and plotted below the corresponding heatmap columns |
cl.n |
Number of unique colors (or pch) to give each annotated column |
cl.col |
A vector of color values for the unique instances in 'cl' for annotating columns |
cl.wid |
The widths for annotation boxes for each column 'c' |
cl.hei |
The heights for annotation boxes for each column 'c' |
cl.pch |
Alternatively, instead of widths and heights user may specify a symbol 'pch' to use for annotating each column |
cl.x |
The x-coordinate locations for the column annotations, by default same horizontal locations as for the heatmap columns |
cl.y |
The y-coordinate locations for the column annotations, by default lower side of heatmap itself |
cl.shift |
Column annotation shift: a vector of 2 values, where first indicates the amount of x-axis shift desired and the second indicates the amount of y-axis shift |
... |
Additional parameters supplied either to the 'rect' or the 'points' function, depending on whether rectangles or 'pch'-based points are used |
Teemu Daniel Laajala <[email protected]>
# Generate some data
set.seed(1)
r1 <- replicate(30, rnorm(20))
lab <- sample(letters[1:2], 20, replace=TRUE)
r1[lab==lab[1],] <- r1[lab==lab[1],] + 2
r2a <- replicate(10, rnorm(10))
r2b <- replicate(10, rnorm(10))

# Set up a new plot region; notice we have a 2-fold wider x-axis
plot.new()
plot.window(xlim=c(0,2), ylim=c(0,1))

# Plot an example plot along with the color key and annotations for our 'lab' vector
h1 <- hmap(r1, add = TRUE)
hmap.key(h1, x1=0.18)
hmap.annotate(h1, rw = lab, rw.wid=c(0.82,0.90))

# Plot the rest to show how the coordinates are adjusted to place the heatmap(s) differently
h2a <- hmap(r2a, add = TRUE, xlim=c(1.2, 1.8), leftlim=c(1.0, 1.2), rightlim=c(1.8,2.0),
  ylim=c(0.6, 1.0), bottomlim=c(0.5,0.6), Colv=NA)
h2b <- hmap(r2b, add = TRUE, xlim=c(1.2, 1.8), leftlim=c(1.0, 1.2), rightlim=c(1.8,2.0),
  ylim=c(0.1, 0.5), bottomlim=c(0.0,0.1), Colv=NA)

# Show the normal plot region axes
axis(1, at=c(0.5,1.5), c("A", "B"))
A continuous color scale key for a heatmap. By default the key is constructed according to the 'h'-object which is invisibly returned by the original 'hmap'-call. Some customization may be supplied to position the legend or to customize ticks and style.
hmap.key(h, x0 = h$leftlim[1], x1 = h$leftlim[2], y0 = h$toplim[1], y1 = h$toplim[2],
  xlim = range(h$valseq), ratio = 0.5, tick = 0.1,
  at = seq(from = min(h$valseq), to = max(h$valseq), length.out = 5),
  bty = "c", cex = 0.5, pos = 3, offset = c(0,0))
h |
The list of heatmap parameters returned invisibly by the original 'hmap'-call. |
x0 |
Coordinates for the color key; left border |
x1 |
Coordinates for the color key; right border |
y0 |
Coordinates for the color key; lower border |
y1 |
Coordinates for the color key; upper border |
xlim |
Value range for the x-axis within the key itself, by default extracted from the h-object |
ratio |
Ratio of the y-axis coordinates used to split the key box into an upper color key box and a lower area for the ticks and values |
tick |
The vertical length of the value ticks |
at |
The values in the color key at which ticks and tick labels are plotted |
bty |
Type of box to plot around the color key |
cex |
The zooming factor for plotting the text and other objects affected by the 'cex' parameter in 'par' |
pos |
The text alignment and position parameter given to the 'text' function in the key |
offset |
Allows offsetting legend text by absolute x,y coordinates in relation to ticks |
Teemu Daniel Laajala <[email protected]>
# Generate some data
set.seed(1)
r1 <- replicate(30, rnorm(20))
lab <- sample(letters[1:2], 20, replace=TRUE)
r1[lab==lab[1],] <- r1[lab==lab[1],] + 2
r2a <- replicate(10, rnorm(10))
r2b <- replicate(10, rnorm(10))

# Set up a new plot region; notice we have a 2-fold wider x-axis
plot.new()
plot.window(xlim=c(0,2), ylim=c(0,1))

# Plot an example plot along with the color key and annotations for our 'lab' vector
h1 <- hmap(r1, add = TRUE)
hmap.key(h1, x1=0.18)
hmap.annotate(h1, rw = lab, rw.wid=c(0.82,0.90))

# Plot the rest to show how the coordinates are adjusted to place the heatmap(s) differently
h2a <- hmap(r2a, add = TRUE, xlim=c(1.2, 1.8), leftlim=c(1.0, 1.2), rightlim=c(1.8,2.0),
  ylim=c(0.6, 1.0), bottomlim=c(0.5,0.6), Colv=NA)
h2b <- hmap(r2b, add = TRUE, xlim=c(1.2, 1.8), leftlim=c(1.0, 1.2), rightlim=c(1.8,2.0),
  ylim=c(0.1, 0.5), bottomlim=c(0.0,0.1), Colv=NA)

# Show the normal plot region axes
axis(1, at=c(0.5,1.5), c("A", "B"))
This function allocates units belonging to a single submatch to separate intervention arms. This ensures that the resulting intervention groups are homogeneous with respect to the variables that were used to construct the distance/dissimilarity matrix for the non-bipartite matching. The number of resulting intervention groups is equal to 'g' (i.e. the submatch size) used in the multigroup non-bipartite matching.
match.allocate(xmat)
xmat |
A binary matching matrix or a matching vector given by match.bb-function. |
A vector where each element indicates the group to which the observation was randomized. The group names are "Group_A", "Group_B", "Group_C", ..., continuing up to 'g' letters, where 'g' was the size of the submatches.
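A minimal sketch of this behavior with a hypothetical matching vector (6 observations in two submatches of size g=3); each submatch is split across the three arms:

solvec <- c(1, 1, 1, 2, 2, 2)  # hypothetical submatch labels, as returned by match.bb
set.seed(1)
match.allocate(solvec)         # each observation is assigned to "Group_A", "Group_B" or "Group_C"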
Teemu Daniel Laajala <[email protected]>
match.bb
match.mat2vec
match.vec2mat
match.dummy
data(vcapwide)

# Construct an example Euclidean distance matrix using 15 observations from the VCaP study
d <- as.matrix(dist(vcapwide[1:15,c("PSAWeek10", "BWWeek10")]))

# Matching using the b&b algorithm to submatches of size 3
# (which will result in 3 intervention groups)
bb3 <- match.bb(d, g=3)
str(bb3)

# Matching vector, where each element indicates to which submatch each observation belongs
solvec <- bb3$solution

# Perform an example random allocation of the above submatches;
# these will be randomly allocated to 3 arms based on the submatches
set.seed(1)
groups <- match.allocate(solvec)

# Illustrate the randomization: no baseline differences in these three artificial groups
by(vcapwide[1:15,c("PSAWeek10", "BWWeek10")], INDICES=groups, FUN=function(x) x)
summary(aov(PSAWeek10 ~ groups, data = data.frame(PSAWeek10 = vcapwide[1:15,"PSAWeek10"], groups)))
summary(aov(BWWeek10 ~ groups, data = data.frame(BWWeek10 = vcapwide[1:15,"BWWeek10"], groups)))
This function performs multigroup non-bipartite matching of observations based on a provided distance/dissimilarity matrix 'd'. The number of elements in each submatch is defined by the parameter 'g'.
match.bb(d, g = 2, presort = "complete", progress = 1e+05, bestknown = Inf, maxbranches = Inf, verb = 0)
d |
A distance matrix with NxN elements |
g |
Number of elements per each submatch, i.e. how many observations are always matched together |
presort |
Whether hierarchical clustering should be used for an initial guess; valid options are the 'hclust' methods ("complete", "single", "ward", "average") |
progress |
How many branching operations are done before outputting information to the user |
bestknown |
If a best known solution already exists, it can be used to bound branches of the tree before initiation. The default value Inf makes the whole search tree the potential solution space. |
maxbranches |
Maximum number of branching operations before returning current best solution, by default no cutoff is defined. |
verb |
Level of verbosity |
See further details in the reference Laajala et al.
The function returns a list of objects, where elements are
branches |
Number of branching operations during the branch and bound algorithm |
bounds |
Number of bounding operations during the branch and bound algorithm |
ends |
Number of end leaf nodes visited during the branch and bound algorithm |
matrix |
The resulting binary matching matrix where rows and columns sum to g |
solution |
The resulting matching vector where each element indicates the submatch where the observation was placed |
cost |
Final cost value of the target function in the minimization task |
Notice that the solution submatch vector in $solution is not the same as the intervention group allocation. Submatches should be randomly allocated to intervention arms using the match.allocate-function.
The package 'nbpMatching' provides a FORTRAN implementation for computation of paired non-bipartite matching case (g=2).
Computation may be heavy if the number of observations is high, or the number of within-submatch pairwise distances to consider is high (increases quadratically as a function of 'g').
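A hedged sketch of bounding the search in practice, using the 'vcapwide' data as in the example below; the cutoff values here are illustrative only:

data(vcapwide)
dz <- as.matrix(dist(scale(vcapwide[1:15, c("PSAWeek10", "BWWeek10")])))
# Cap the number of branching operations and report progress less often
bb.capped <- match.bb(dz, g = 3, maxbranches = 1e5, progress = 1e4)
bb.capped$cost  # cost of the best solution found within the branching budget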
Teemu Daniel Laajala <[email protected]>
match.allocate
match.mat2vec
match.vec2mat
match.dummy
data(vcapwide)

# Construct an example Euclidean distance matrix using 15 observations from the VCaP study
d <- as.matrix(dist(vcapwide[1:15,c("PSAWeek10", "BWWeek10")]))

bb3 <- match.bb(d, g=3)
str(bb3)

mat <- bb3$matrix      # binary matching matrix
solvec <- bb3$solution # matching vector, where each element indicates to which submatch each observation belongs

mixplot(data.frame(vcapwide[1:15,c("PSAWeek10", "BWWeek10")],
  submatch=as.factor(paste("Submatch_",solvec, sep=""))), pch=16, col=rainbow(5))
Dummy observations are allowed in order to make the number of observations divisible by the number of elements in each submatch; i.e. for pairwise matching the number of observations should be even, for triangular matching the number of observations should be divisible by 3, etc. This can be done either by adding column-averaged individuals to the original data frame (parameter 'dat'), or by adding zero-distance sinks to the distance/dissimilarity matrix (parameter 'd'). The latter approach favors dummies being matched to real extreme observations, while the former favors dummies being matched to close-to-mean real observations.
match.dummy(dat, d, g = 2)
dat |
A data.frame of the original observations, to which column averaged new dummy observations are added |
d |
N times N distance/dissimilarity matrix, to which zero distance sinks are added |
g |
The desired number of elements per each submatch, i.e. the size of the clusters. The number of added dummies is the smallest number of additions that fulfills (N+dummy)%%g == 0 |
Depending on whether the 'dat' or the 'd' parameter was provided, the function either adds new column-averaged individuals and returns the augmented data matrix ('dat'), or adds zero-distance sinks and returns the new distance/dissimilarity matrix ('d').
Adding zero distance sinks to the distance matrix or averaged individuals to the original data frame produce different results and affect the optimal matching task differently.
Teemu Daniel Laajala <[email protected]>
match.allocate
match.mat2vec
match.vec2mat
match.bb
data(vcapwide)

exdat <- vcapwide[1:10,c("PSAWeek10", "BWWeek10")]
dim(exdat)
avgdummies <- match.dummy(dat=exdat, g=3)
dim(avgdummies)

# Construct an Euclidean distance matrix after adding two dummy individuals
# (averaged individuals added to the original data matrix)
bb3 <- match.bb(as.matrix(dist(avgdummies)), g=3)
str(bb3)

# Construct an Euclidean distance matrix after adding two dummy distances (zero-distance sinks)
exd <- as.matrix(dist(vcapwide[1:10,c("PSAWeek10", "BWWeek10")]))
dim(exd)
d <- match.dummy(d=exd, g=3)
dim(d) # 10 is not divisible by 3, so 2 sinks are added to make d 12x12
bb3 <- match.bb(d, g=3)
str(bb3)
# Notice that sinks produce much smaller target function costs than averaged individuals
An implementation of the Genetic Algorithm for solving non-bipartite matching tasks with customizable evolutionary events and parameters
match.ga(d, g, pops, generations = 100, popsize = 100, nmutate = 100, ndeath = 30,
  type = "min", mutate = hamlet:::.ga.mutate, breed = hamlet:::.ga.breed,
  weight = hamlet:::.ga.weight, fitness = hamlet:::.ga.fitness, step = hamlet:::.ga.step,
  initialize = hamlet:::.ga.init, progplot = T, plot = T, verb = 0, progress = 500, ...)
d |
A distance/dissimilarity matrix 'd' |
g |
The size in submatches, as in how many observations are always connected |
pops |
If the user wants to specify starting populations, they can be provided here as a matrix. Each row corresponds to an observation, while the columns are the different solutions (the population in the GA). For example, a 10-row, 100-column pops-matrix would be 100 different matching solutions of 10 observations. Each number in the matrix indicates a different submatch. |
generations |
Number of generations to simulate in the GA. In each step, mutations, deaths, and breeding occur according to the user's specified settings, and a new generation is created from this. |
popsize |
Number of solutions (='individuals') to have in each step of the algorithm. |
nmutate |
Number of mutations to occur in each step. Individuals are sampled with replacement, and then given the corresponding number of mutations. |
ndeath |
Number of deaths to occur in each step. Each dead solution (='individual') is then replaced by breeding suitable parents (probability of being a parent weighted by fitness). |
type |
Type of optimization, can be 'min' or 'max'. |
mutate |
Mutation function; by default the hamlet internal function '.ga.mutate' is used. This function takes in solution vector 'x'. Two random positions are then swapped, which could be seen as a form of a point mutation. |
breed |
Breeding function; by default the hamlet internal function '.ga.breed' is used. This function takes in solution vectors 'x' and 'y', which will be the parents, and the distance matrix 'd'. The products x*d and y*d are computed, and row-wise differences are computed between the two matrices. The row with the highest difference indicates where one of the parents can be most improved, and this trait is inherited from the other parent. |
weight |
Weighting function; by default the hamlet internal function '.ga.weight' is used. The weight should correspond to the probability that the corresponding individual will undergo some sort of event (i.e. mutation, death) or participate in producing offspring (i.e. breeding). This probability weight is computed according to the ranks of the fitnesses. |
fitness |
Fitness function; by default the hamlet internal function '.ga.fitness' is used. This should yield the numeric fitness for a solution, indicating how viable the solution is in relation to the others. In a minimization task the lower fitness indicates better viability. |
step |
A step function; by default the hamlet internal function '.ga.step' is used. The step function which combines all operations in the GA, in order to produce the next generation of solutions given the previous one. |
initialize |
Initialization function; by default the hamlet internal function '.ga.init' is used. This function should produce a set of valid solutions to form the first generation at the beginning of the GA. |
progplot |
Should progress be plotted. If TRUE, at every generation index divisible by the parameter 'progress', the fitnesses over the generations are plotted. The plot shows the development of solution cost quantiles over time. |
plot |
Should the function plot the final quantiles over all the generations. |
verb |
Level of verbosity; -1 indicates omitting of verbal output, 0 indicates normal level, and +1 indicates debugging/additional information. |
progress |
How often should the function plot and print intermediate information on the progress. |
... |
Additional parameters for the internal GA functions. |
The Genetic Algorithm (GA) is a form of evolutionary optimization, where a population (a group of solutions to an optimization task) reproduces, dies, mutates, and lives on in a simulated environment. As the GA is a generic framework, it has many adjustable parameters, and the user may wish to explore many different options both for the population (for example population size, mutation frequencies, fitness functions, drift, etc.) and for the evolutionary mechanics (such as the breeding technique, types of mutations, and suitability for reproducing). General default options and mechanics are provided here, but it is advisable to explore different parameters for the particular optimization task at hand to find optimal solutions. If the user wishes to explore the implementation of the default mechanics, the function implementations are available internally in the hamlet-package. For example, the mutation function is accessible with the command 'hamlet:::.ga.mutate'.
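As a hedged sketch of such customization, a user-defined mutation operator could be passed in place of the default; the exact calling convention of the user-supplied operators is assumed here from the description of the default '.ga.mutate', 'my.mutate' is a hypothetical name, and 'd2' refers to the scaled distance matrix constructed in the example below:

# A point-mutation operator that swaps the submatch labels of two random observations
my.mutate <- function(x, ...) {
  i <- sample(seq_along(x), 2)
  x[i] <- x[rev(i)]
  x
}
set.seed(1)
ga.custom <- match.ga(d2, g = 3, generations = 30, mutate = my.mutate, plot = FALSE)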
The returned list comprises:
A list of solutions; a matrix 'pops' which contains the population of solutions in the final generation of the algorithm, a vector 'fitnesses' which portrait the corresponding fitnesses to the columns of 'pops', and 'weights' which were the corresponding probabilities to events in the GA.
A vector 'bestsol', for which the fitness function obtained minimum (or maximum) value during the algorithm.
A value 'best', which is the optimum solution cost value observed during the algorithm.
Notice that the end quality of the matching-based allocation depends heavily on providing a feasible matrix 'd'. One should carefully consider the choice and tuning of the similarity metric. For example, Euclidean distance without standardization is often not a good choice, as it does not normalize the variance of each variable and places emphasis on baseline variables with a large relative variance.
Note that the R-package 'GA' offers a wide range of generalized GA-related tools.
Teemu Daniel Laajala <[email protected]>
# Set up a distance matrix and add dummies, then run the GA
data(vcapwide)

# Construct an example Euclidean distance matrix using 15 observations from the VCaP study
d <- as.matrix(dist(vcapwide[1:15,c("PSAWeek10", "BWWeek10")]))
# Or rather, z-score transform all input variables first
d2 <- as.matrix(dist(scale(vcapwide[1:15,c("PSAWeek10", "BWWeek10")])))

# Notice that random simulations take place, so we fix the RNG seed for reproducibility
set.seed(1)
# The resulting genetic algorithm progression is plotted by default
ga <- match.ga(d2, g=3, generations=60)
str(ga)

# Submatches, i.e. similar individuals that ought to be allocated to separate groups
ga[[2]]
This function transforms a binary matching matrix into a matching vector. A matching vector is of length N, where each element indicates the submatch to which the observation belongs. Notice that this is not the same as the group allocation vector provided by the match.allocate-function. The binary matching matrix is of size N x N, where 0 indicates that the observations are part of different submatches, and 1 indicates that the observations are part of the same submatch. The diagonal is always 0, although an observation is trivially in the same submatch with itself.
match.mat2vec(xmat)
xmat |
A binary matching matrix 'xmat' |
A matching vector where each element indicates the submatch to which the observation belongs
Notice that the particular index numbers produced by match.mat2vec may differ from those of the branch and bound solution vector, but the grouping of observations into shared submatches is the same.
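A minimal illustration of the correspondence between the two representations, using a hypothetical 4-observation matching with two pairs:

v <- c(1, 1, 2, 2)     # observations 1+2 and 3+4 form two submatches
m <- match.vec2mat(v)  # 4 x 4 binary matrix; m[1,2] == 1 and m[1,3] == 0
match.mat2vec(m)       # recovers a matching vector (the labels themselves may differ)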
Teemu Daniel Laajala <[email protected]>
match.allocate
match.bb
match.vec2mat
match.dummy
data(vcapwide)

# Construct an example Euclidean distance matrix using 15 observations from the VCaP study
d <- as.matrix(dist(vcapwide[1:15,c("PSAWeek10", "BWWeek10")]))

bb3 <- match.bb(d, g=3)
str(bb3)

mat <- bb3$matrix # binary matching matrix
mat

solvec <- match.mat2vec(mat)
which(mat[1,] == 1) # e.g. the first, third and thirteenth observation are part of the same submatch
which(solvec == solvec[1]) # similarly
This function transforms a matching vector into a binary matching matrix. A matching vector is of length N, where each element indicates the submatch to which the observation belongs. Notice that this is not the same as the group allocation vector provided by the match.allocate-function. The binary matching matrix is of size N x N, where 0 indicates that the observations are part of different submatches, and 1 indicates that the observations are part of the same submatch. The diagonal is always 0, although an observation is trivially in the same submatch with itself.
match.vec2mat(x)
x |
A matching vector 'x' |
N times N binary matching matrix, where 0 indicates that the observations have been part of a different submatch, and 1 indicates that the observations have been part of the same submatch.
Teemu Daniel Laajala <[email protected]>
match.allocate
match.mat2vec
match.bb
match.dummy
data(vcapwide)

# Construct an example Euclidean distance matrix using 15 observations from the VCaP study
d <- as.matrix(dist(vcapwide[1:15,c("PSAWeek10", "BWWeek10")]))

bb3 <- match.bb(d, g=3)
str(bb3)

solvec <- bb3$solution # matching vector, where each element indicates to which submatch each observation belongs
solvec

mat <- match.vec2mat(solvec)
mat

which(mat[1,] == 1) # e.g. the first, third and thirteenth observation are part of the same submatch
which(solvec == solvec[1]) # similarly
Assuming a mixed-effects model of form y_fit = Xb + Zu + e, where X is the model matrix for fixed effects, Z is the model matrix for random effects, and b and u are the fixed and random effects respectively, this function returns these components per each fitted value y. These may be useful for model inference and/or diagnostic purposes.
mem.getcomp(fit)
fit |
A fitted mixed-effects model generated either through the lme4 or the nlme package. |
Notice that per-observation model error is e = Xb + Zu - y_observation. Similarly, the components Xb and Zu are extracted.
The function returns per-observation model fit components for a mixed-effects model. The return fields are
Xb |
Fixed effects component Xb |
Zu |
Random effects component Zu |
XbZu |
Full model fit by summing the above two Xb+Zu |
e |
Model error e |
y |
Original observations y |
Teemu Daniel Laajala <[email protected]>
data(vcaplong)

exdat <- vcaplong[vcaplong[,"Group"] %in% c("Vehicle", "ARN"),]

library(lme4)
f1 <- lmer(log2PSA ~ 1 + DrugWeek + DrugWeek:ARN + (1 + DrugWeek|ID), data = exdat)

mem.getcomp(f1)
This function creates histograms of the columns of the random effects extracted from a model fit. This is useful for model diagnostics, such as observing deviations from normality in the random effects.
mem.plotran(fit, breaks = 100)
fit |
A fitted mixed-effects model generated either through the lme4 or the nlme package. |
breaks |
Number of breaks in the histograms (passed to the 'hist'-function) |
Teemu Daniel Laajala <[email protected]>
data(vcaplong)

exdat <- vcaplong[vcaplong[,"Group"] %in% c("Vehicle", "ARN"),]

library(lme4)
f1 <- lmer(log2PSA ~ 1 + DrugWeek + DrugWeek:ARN + (1 + DrugWeek|ID), data = exdat)

ranef(f1) # Histograms are plotted for these columns
mem.plotran(f1)
This function plots stylized residuals of a mixed-effects model. It is possible to obtain fitted values versus errors (XbZu vs e), or original values versus errors (y vs e) in order to obtain different views to the errors in connection to the observations.
mem.plotresid(fit, linear = T, type = "XbZu", main, xlab, ylab)
fit |
A fitted mixed-effects model generated either through the lme4 or the nlme package. |
linear |
Should linear trend lines be drawn to the residual plot |
type |
Type of residual plot; should fitted values (value "XbZu") or original observations (value "y") be plotted against "e" errors |
main |
Main title |
xlab |
x-axis label |
ylab |
y-axis label |
Notice that the typical residual plot is fitted values (type="XbZu") versus errors ("e").
Teemu Daniel Laajala <[email protected]>
data(vcaplong)

exdat <- vcaplong[vcaplong[,"Group"] %in% c("Vehicle", "ARN"),]

library(lme4)
f0 <- lmer(log2PSA ~ 1 + DrugWeek + (1 + DrugWeek|ID), data = exdat)
f1 <- lmer(log2PSA ~ 1 + DrugWeek + DrugWeek:ARN + (1 + DrugWeek|ID), data = exdat)
f2 <- lmer(log2PSA ~ 1 + DrugWeek + DrugWeek:ARN + (1|ID) + (0 + DrugWeek|ID), data = exdat)
f3 <- lmer(log2PSA ~ 1 + DrugWeek + DrugWeek:ARN + (1|Submatch) + (0 + DrugWeek|ID), data = exdat)

par(mfrow=c(2,2))
mem.plotresid(f0)
mem.plotresid(f1)
mem.plotresid(f2)
mem.plotresid(f3)
Bootstrap sampling is used to investigate the statistical significance of the fixed effects terms specified for a readily fitted mixed-effects model as a function of the number of individuals participating in the study. User either specifies a suitable sampling unit, or it is automatically identified based on the random effects formulation of a readily fitted mixed-effects model. Per each count of individuals in vector N, a fixed number of bootstrapped datasets are generated and re-fitted using the model formulation on the pre-fitted model. Power is then computed as the fraction of effects identified as statistically significant out of all the bootstrapped datasets.
mem.powersimu(fit, N = 4:20, boot = 100, level = NULL, strata = NULL, default = FALSE,
  seed = NULL, plot = TRUE, plot.loess = FALSE, legendpos = "bottomright",
  return.data = FALSE, verb = 1, ...)
fit |
A fitted mixed-effects model. Should be either a model produced by the lme4-package, or then a modified lme4-fit such as provided by lmerTest or similar package that builds on lme4. |
N |
A vector of the desired numbers of individuals to be tested, i.e. sample sizes N. Notice that N may be either a total N if no strata are specified, or an N value per stratum if 'strata' is not NULL. See the parameter 'strata' below. |
boot |
Number of bootstrapped datasets to generate per each N value. The total number of generated data frames in the end will be N times boot. |
level |
An unambiguous indicator available in the model data frame that indicates each separate individual unit in the experiment. For example, this may correspond to a single patient indicator column ID, where each patient has a unique ID instance. If this parameter is given as NULL, then this function automatically attempts to identify the best possible level of individual indicators based on the random effects specified for the model. |
strata |
If any sampling strata should be balanced, this should be indicated here. For example, if one is studying the possible effects of an intervention, it is typical to have an equal number of individuals both in the control and in the intervention arm also in the sampled datasets. It should then be given as a column name available in the original model data frame. Each stratum will be sampled in equal amounts. |
default |
The default statistical significance if a model could not be re-fitted to the sampled datasets, which may occur for example due to convergence or redundancy issues. This defaults to FALSE, which means that a coefficient is considered statistically insignificant if the corresponding model re-fitting fails in lme4. |
seed |
For reproducibility, one may wish to set a numeric seed to produce the exact same results. |
plot |
If set to TRUE, the function will plot a power curve. Each fixed effects coefficient is a different curve, with color coding and a legend annotated to separate which one is which. |
plot.loess |
If plot==TRUE, setting plot.loess==TRUE adds an additional loess-smoothed approximated curve to the existing curves. This is useful when running the simulations with a low number of bootstrapped samples, as it may help approximate where the curve reaches critical points, e.g. power = 0.8. |
legendpos |
Position of the legend when plot==TRUE, defaults to "bottomright". Any legal position accepted by the function 'legend' is allowed. |
return.data |
Should the bootstrapped data be returned instead of bootstrapping and then re-fitting. This will skip the model re-fitting and instead return a list of lists with the bootstrapped data. The outer list corresponds to the values of 'N', while the inner list corresponds to the different 'boot' runs of the bootstrap. This may be useful for inspecting that the scheme samples the correct sampling units, or if bootstrapping is to be used for something other than re-fitting lme4-models. |
verb |
Numeric value indicating the level of verbosity; 0=silent, 1=normal, 2=debugging. |
... |
Additional parameters provided for the function. |
By default, this function utilizes the lmerTest-package's Satterthwaite approximation for determining the p-values of the fixed effects. If this fails, it resorts to the conventional approximation |t|>2 for significance, which is not accurate but may provide a reasonable approximation of the power levels.
If return.data==FALSE, this function will return a matrix, where the rows correspond to the different N values and the columns correspond to the fixed effects. The values [0,1] are the fraction of bootstrapped datasets where the corresponding fixed effects was detected as statistically significant.
Please note that the example runs in this document are extremely small due to run time constraints on CRAN. For real power analyses, it is recommended that the N counts would vary e.g. from 5 to 15 with steps of 1 and the amount of bootstrapped datasets would be at least 100.
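Following the note above, a more realistic run might look like the sketch below (assuming the fitted model 'fit' from the example below; slow, and therefore wrapped in the documentation's not-run convention):

## Not run:
power.full <- mem.powersimu(fit, N = 5:15, boot = 100, strata = "ARN", plot = TRUE, plot.loess = TRUE)
## End(Not run)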
Teemu D. Laajala
# Use the VCaP ARN data as an example
data(vcaplong)
arn <- vcaplong[vcaplong[,"Group"] == "Vehicle" | vcaplong[,"Group"] == "ARN",]

# lme4 is required for mixed-effects models
library(lme4)

# Fit an example mixed-effects model
fit <- lmer(PSA ~ 1 + DrugWeek + ARN:DrugWeek + (1|ID) + (0 + DrugWeek|ID), data = arn)

# For reproducibility, set a seed
set.seed(123)

# Run a brief power analysis with only a few selected N values and limited bootstrapping;
# balance strata over the ARN and non-ARN (=Vehicle) arms so that both contain an equal count of individuals
power <- mem.powersimu(fit, N=c(3, 6, 9), boot=10, strata="ARN", plot=TRUE)

# Power curves are plotted, along with returning the power matrix:
power
# Notice that each column corresponds to a specified fixed effect in the formula
# "1 + DrugWeek + ARN:DrugWeek"
This function encodes categorical variables (e.g. columns of type 'factor' or 'character') as binary columns. U new columns are created for each such column, where U is the number of unique instances of that column. The new columns are named OriginalColumnName_U1, OriginalColumnName_U2, etc.
mix.binary(x)
x |
A data.frame or a matrix where categorical columns are to be binary coded. Categorical columns are assumed to be all non-numeric fields. |
A function that codes categorical variables in a dataset into binary variables. For example, x = (red, green, blue, green) becomes x_new = (1,0,0), (0,1,0), (0,0,1), (0,1,0), where the dimensions of x_new are is_red, is_green, and is_blue.
The function returns a data.frame, where categorical variables have been replaced with 0/1-binary fields, and numeric fields have been left untouched. Notice that the order of the columns may differ from the original.
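As a minimal illustration of the encoding described above, consider a small hypothetical data.frame (the column names and values here are purely illustrative, not part of the package datasets):

# A small hypothetical example of the binary coding
x <- data.frame(color = c("red", "green", "blue", "green"), weight = c(1.2, 3.4, 5.6, 7.8))
mix.binary(x)
# The numeric column 'weight' is left untouched, while 'color' is expanded into
# 0/1 indicator columns, one per unique value, named following the scheme described above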
Teemu Daniel Laajala <[email protected]>
data(vcapwide)
ex <- mix.binary(vcapwide[,c("Group", "CastrationDate")])
apply(ex, MARGIN=1, FUN=sum)
# Notice that each row sums to 2, as two categorical variables were binary coded
# and no missing values were present
mix.binary(vcapwide[,c("PSAWeek4", "Group", "CastrationDate")])
# Binary coding is only applied to non-numeric fields
This function is intended for applying functions to the numeric fields of a mixed-type data.frame. Namely, the function ignores fields that are e.g. factors, and returns FUN applied only to the numeric fields.
mix.fun(x, FUN = scale, ...)
x |
Data.frame x with mixed type fields |
FUN |
Function to apply, for example 'scale', 'cov', or 'cor' |
... |
Additional parameters passed on to FUN |
Return values of FUN when applied to numeric columns of 'x'
Teemu Daniel Laajala <[email protected]>
data(vcapwide)
mix.fun(vcapwide[,c("Group", "PSAWeek4", "PSAWeek10", "PSAWeek14")], FUN=scale)
# Column 'Group' is ignored
mix.fun(vcapwide[,c("Group", "PSAWeek4", "PSAWeek10", "PSAWeek14")], FUN=cov, use="na.or.complete")
# ... is used to pass the 'use' parameter to the 'cov' function
This function plots a scatterplot similar to the default plot function, with the difference that factor/character fields in the input data.frame are handled as categorical variables. These categorical variables are color-coded and handled separately in the marginal distributions.
mixplot(x, main = NA, match,
  func = function(x, y, par) {
    segments(x0 = x[1], y0 = x[2], x1 = y[1], y1 = y[2], col = par)
  },
  legend = T, col = palette(), na.lines = T, origin = F,
  marginal = F, lhei, lwid, verb = 0, ...)
x |
A data.frame or a matrix of observations. Typically x should be a data.frame, where columns are of different types, e.g. some of 'numeric' and some of 'factor' class. |
main |
Main title plotted on top of the figure |
match |
A matching matrix (e.g. produced by hamlet::match.vec2mat) or a matching vector (e.g. produced by hamlet::match.mat2vec), where distinct values indicate which observations should be connected (see the sketch after the examples below). |
func |
The function to apply to each pair of observations 'x' and 'y'. By default, it draws a segment line in two dimensions (within each individual bivariate panel). The segment line color is determined by the matching vector or by the individual element of the matching matrix; thus 0-values indicate no line, while other values annotate submatches. 'par' is the index of the submatch and by default indicates the color. |
legend |
Should an automated legend be generated |
col |
Colors per observation |
na.lines |
Should lines be drawn to represent one of the variables if the other one is missing in a 2-dim scatterplot |
origin |
Should the origin x=0, y=0 be separately indicated using lines |
marginal |
Should marginal distributions be drawn in sides of each scatterplot |
lhei |
Heights for bins in the layout |
lwid |
Widths for bins in the layout |
verb |
Level of verbosity: -1<= (no verbosity), 0/FALSE (warnings) or >=1/TRUE (additional information) |
... |
Additional parameters given to the plot-function |
An invisible return of the measurements and plot layout structure (matrix, heights, and widths)
Teemu Daniel Laajala <[email protected]>
data(vcapwide)
mixplot(vcapwide[,c("Group", "PSAWeek4", "PSAWeek10", "PSAWeek14")], marginal=TRUE, pch=16,
  main="PSA at weeks 4, 10 and 14 per intervention group")
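To illustrate the 'match' and 'func' parameters referred to above, the following is a minimal sketch that connects submatched observations with line segments; coercing the 'Submatch' field of vcapwide into a numeric matching vector is an illustrative choice here, not a prescribed workflow:

data(vcapwide)
# Sketch: use the baseline submatches as a matching vector (0 would indicate "no line")
submatch <- as.integer(as.factor(vcapwide[,"Submatch"]))
mixplot(vcapwide[,c("PSAWeek10", "BWWeek10", "Group")], match=submatch, pch=16,
  main="Baseline submatches connected by line segments")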
Long-format measurements of PSA over the intervention period in the ORX study. Notice that this data.frame is in a suitable format for mixed-effects modeling, where each row corresponds to a single longitudinal measurement. The measurements are annotated using the individual indicator field 'ID' and the time fields 'Day', 'TrDay', and 'Date', while the response values are contained in raw format in 'PSA' or after log2-transformation in 'log2PSA'. Additional fields are provided for group testing and matched inference in 'Group', 'Submatch', and the binary indicators 'ORXTx', 'ORX', and 'Intact'.
data("orxlong")
A data frame with 392 observations on the following 11 variables.
ID
A unique character indicator for the different individual(s)
PSA
Raw longitudinal PSA measurement values in unit (ug/l)
log2PSA
Log2-transformed longitudinal PSA measurement values in unit (log2 ug/l)
Day
Day since the first PSA measurement. Notice that there is a single time point prior to interventions.
TrDay
Day since the interventions began, with 0 indicating the day on which surgery was performed or the drug compounds were first given.
Date
The date on which the actual measurement was performed
Group
The actual intervention groups, after blinded groups were assigned to 'ORX+Tx', 'ORX', or 'Intact'
Submatch
The submatches that were assigned based on the baseline variables.
ORXTx
A binary indicator field indicating which measurements belong to the group 'ORX+Tx'
ORX
A binary indicator field indicating which measurements belong to the group 'ORX'
Intact
A binary indicator field indicating which measurements belong to the group 'Intact'
For mixed-effects modeling, the fields 'ID', 'PSA' (or 'log2PSA'), 'TrDay', and group-specific indicators should be included.
Group testing should be performed so that 'ORX+Tx' is tested against 'ORX', in order to infer possible effects of 'Tx' on top of 'ORX', and 'ORX' should be compared to 'Intact', in order to infer whether the 'ORX' surgical procedure has beneficial effects in comparison to intact animals. For statistical modeling of the intervention effects, one should use the observations with 'TrDay' values of 0 or greater, as day 0 marks the beginning of the interventions.
Laajala TD, Jumppanen M, Huhtaniemi R, Fey V, Kaur A, et al. (2016) Optimized design and analysis of preclinical intervention studies in vivo. Sci Rep. 2016 Aug 2;6:30723. doi: 10.1038/srep30723.
data(orxlong)
# Construct data frames that can be used for testing pairwise group contrasts
orxintact <- orxlong[orxlong[,"Intact"]==1 | orxlong[,"ORX"]==1,
  c("PSA", "ID", "ORX", "TrDay", "Submatch")]
orxtx <- orxlong[orxlong[,"ORXTx"]==1 | orxlong[,"ORX"]==1,
  c("PSA", "ID", "ORXTx", "TrDay", "Submatch")]
# Include only observations occurring post-surgery
orxintact <- orxintact[orxintact[,"TrDay"]>=0,]
orxtx <- orxtx[orxtx[,"TrDay"]>=0,]
# Example fits
library(lme4)
library(lmerTest)
# Conventional model fits
fit1a <- lmer(PSA ~ 1 + TrDay + ORXTx:TrDay + (1|ID) + (0 + TrDay|ID), data = orxtx)
fit1b <- lmer(PSA ~ 1 + TrDay + ORXTx:TrDay + (1 + TrDay|ID), data = orxtx)
fit2a <- lmer(PSA ~ 1 + TrDay + ORX:TrDay + (1|ID) + (0 + TrDay|ID), data = orxintact)
fit2b <- lmer(PSA ~ 1 + TrDay + ORX:TrDay + (1 + TrDay|ID), data = orxintact)
# Collate to matched inference for pairwise observations over the submatches
matched.orx <- do.call("rbind", by(orxintact, INDICES=orxintact[,"Submatch"],
  FUN=function(z){
    z[,"MatchedPSA"] <- z[,"PSA"] - z[z[,"ORX"]==0,"PSA"]
    z <- z[z[,"ORX"]==1,]
    z
  }))
# Few examples of matched fits with different model formulations
fit.matched.1 <- lmer(MatchedPSA ~ 0 + TrDay + (1|ID) + (0 + TrDay|ID), data = matched.orx)
fit.matched.2 <- lmer(MatchedPSA ~ 1 + TrDay + (1|ID) + (0 + TrDay|ID), data = matched.orx)
fit.matched.3 <- lmer(MatchedPSA ~ 1 + TrDay + (1 + TrDay|ID), data = matched.orx)
summary(fit.matched.1)
summary(fit.matched.2)
summary(fit.matched.3)
# We notice that the intercept term is highly insignificant
# if included in the matched model, as expected by baseline balance.
# In contrast, the matched intervention growth coefficient is highly
# statistically significant in each of the models.
This data frame contains the wide-format baseline characteristics of the individuals participating in the ORX study. Some fields (Volume, PSA, High, BodyWeight, PSAChange) were used to construct the distance matrix for the original matching-based random allocation of individuals at baseline, while other variables (Group, Submatch) contain the results of that allocation.
data("orxwide")
A data frame with 109 observations on the following 8 variables.
ID
A unique character indicator for the different individual(s)
Group
After identifying suitable submatches, the individuals were distributed into blinded intervention groups. These groups were later annotated with the actual treatments or non-intervention control groups.
Submatch
Submatches identified at baseline using the methodology presented in this package
Volume
Tumor volume at baseline in cubic millimeters
PSA
Raw baseline PSA measurement values in unit (ug/l)
High
The highest dimension in the tumor in millimeters, giving insight into the shape of the tumor
BodyWeight
Body weight at baseline in unit (g)
PSAChange
A fold-change like change in PSA from the prior measurement defined as: (PSA_current - PSA_last)/(PSA_last)
Originally, a 3-fold weighting of the baseline 'Volume' and 'PSA' was used relative to 'High', 'BodyWeight' and 'PSAChange' when computing the distance matrix (a sketch of such a weighting is given after the examples below). Furthermore, some individuals were annotated for exclusion prior to matching based on outlying behaviour. The exclusion criteria were applied before any interventions were given or the matching was performed. The excluded individuals had either non-existent PSA, non-detectable tumor volume, or extremely large tumors (volume above 700 mm^3).
Notice that while the submatches would normally be distributed equally across the experiment groups, here a single submatch may occasionally hold multiple individuals from the same group. This is due to practical constraints in the experiment: animals had to be moved manually between groups in order to fill the groups and to reflect the amount of drug compounds available. Additionally, the original experiment was performed on 6 intervention groups, while only 3 of them ('ORX+Tx', 'ORX' and 'Intact') are presented here after the baseline.
Laajala TD, Jumppanen M, Huhtaniemi R, Fey V, Kaur A, et al. (2016) Optimized design and analysis of preclinical intervention studies in vivo. Sci Rep. 2016 Aug 2;6:30723. doi: 10.1038/srep30723.
data(orxwide)
# Construct an example distance matrix based on conventional
# Euclidean distance and the baseline characteristics
d.orx <- dist(orxwide[,c("Volume", "PSA", "High", "BodyWeight", "PSAChange")])
# Plot a hierarchical clustering of the individuals
plot(hclust(d=d.orx))
# This 'd.orx' may then be further processed by downstream experiment
# design functions such as match.ga, match.bb, etc.
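The 3-fold weighting of 'Volume' and 'PSA' mentioned in the details above could be approximated, for example, by standardizing the baseline variables and multiplying the emphasized columns before computing distances; the following is a minimal sketch of that idea, not a reproduction of the original analysis:

data(orxwide)
# Standardize the baseline variables so that the weighting acts on comparable scales
z.orx <- scale(orxwide[,c("Volume", "PSA", "High", "BodyWeight", "PSAChange")])
# Emphasize 'Volume' and 'PSA' 3-fold relative to the other variables
z.orx[,c("Volume", "PSA")] <- 3 * z.orx[,c("Volume", "PSA")]
d.weighted <- dist(z.orx)
# 'd.weighted' could then be passed on to e.g. match.bb or match.ga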
This function takes in a vector of measurements and computes overlapping bins of observations, and applies a jittering function within each overlapping bin.
smartjitter(x, q = seq(from = 0, to = 1, length.out = 10), type = 1, amount = 0.1,
  jitterfuncs = list(
    function(n) { (1:n)/(1/amount) },
    function(n) { (((-1)^c(0:(n - 1))) * (0:(n - 1)))/(1/amount) }
  ),
  jits = jitterfuncs[[type]])
x |
The values that should be jittered. Notice that these are used to determine which observations overlap, and should not be thought of as x-axis positions (see example). |
q |
Probability quantiles where the ends of the bins should be placed |
type |
Type of jittering, by default it is used to choose which element (1 or 2) of the list of jittering functions is chosen as the final jittering function. Customized functions may be provided to the jitterfuncs-parameter. |
amount |
Amount of jittering (here deterministic shifting) for the jittering function |
jitterfuncs |
List of possible jittering functions for n overlapping values. The jittering function at list position 'type' is chosen |
jits |
Final jittering function from the jitterfuncs-list |
The smart jittering is applied to the values of the x-parameter, and a vector of shifting amounts, one per observation, is returned. Notice that in the typical case the parameter 'x' holds the desired response values, e.g. along the y-axis, and the returned value of smartjitter gives the amount of jittering to apply along the x-axis of a plot.
The function returns a vector of values with the same length as x. The values in this vector indicate the shift to apply to each observation if the observations are to be jittered along another axis.
Teemu Daniel Laajala <[email protected]>
data(vcapwide)
plot.new()
plot.window(xlim=extendrange(c(0,1)), ylim=extendrange(vcapwide[,"PSAWeek4"]))
y1 <- vcapwide[vcapwide[,"CastrationDate"]=="100413","PSAWeek4"]
y2 <- vcapwide[vcapwide[,"CastrationDate"]=="170413","PSAWeek4"]
points(x=0+smartjitter(y1, type=2, amount=0.02), y=y1)
points(x=1+smartjitter(y2, type=2, amount=0.02), y=y2)
axis(1, at=c(0,1), labels=c("10.04.13", "17.04.13"))
axis(2); box()
title(ylab="PSA at week 4", xlab="Castration batches")
The long format of the VCaP experiment PSA measurements may be used to model longitudinal measurements during the interventions (Vehicle, ARN, or MDV). Body weights and PSA were measured weekly during the experiment. PSA concentrations were log2-transformed to make the data more normally distributed.
data(vcaplong)
A data frame with 225 observations on the following 11 variables.
PSA
Raw PSA (prostate-specific antigen) measurements with unit (ug/l)
log2PSA
Log2-transformed PSA (prostate-specific antigen) measurements with unit (log2 ug/l)
BW
Body weights (g)
Submatch
A grouping factor for indicating which measurements belong to individuals that were part of the same submatch prior to interventions
ID
A character vector indicating unique animal IDs
Week
Week of the experiment; notice that this is not the same as the week of drug administration (see below)
DrugWeek
Week since beginning administration of the drugs
Group
Grouping factor for intervention groups of the observations
Vehicle
Binary indicator for which observations belonged to the group 'Vehicle'
ARN
Binary indicator for which observations belonged to the group 'ARN-509'
MDV
Binary indicator for which observations belonged to the group 'MDV3100'
Notice that the long format is suitable for modeling longitudinal measurements. The grouping factors ID or Submatch can be used to group observations belonging to a single individual or to matched individuals (see the sketch after the examples below).
Laajala TD, Jumppanen M, Huhtaniemi R, Fey V, Kaur A, et al. (2016) Optimized design and analysis of preclinical intervention studies in vivo. Sci Rep. 2016 Aug 2;6:30723. doi: 10.1038/srep30723.
Knuuttila M, Yatkin E, Kallio J, Savolainen S, Laajala TD, et al. (2014) Castration induces upregulation of intratumoral androgen biosynthesis and androgen receptor expression in orthotopic VCaP human prostate cancer xenograft model. Am J Pathol. 2014 Aug;184(8):2163-73. doi: 10.1016/j.ajpath.2014.04.010.
data(vcaplong)
str(vcaplong)
head(vcaplong)
library(lattice)
xyplot(log2PSA ~ DrugWeek | Group, data = vcaplong, type="l", group=ID, layout=c(3,1))
xyplot(BW ~ DrugWeek | Group, data = vcaplong, type="l", group=ID, layout=c(3,1))
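As a sketch of using 'Submatch' as an additional grouping factor in a mixed-effects model, the following fits random intercepts for the matched sets on top of per-individual random effects; this is one reasonable random-effect structure among several, not a prescribed model:

data(vcaplong)
library(lme4)
# Sketch: random intercepts for matched sets in addition to per-individual random slopes
fit.sub <- lmer(log2PSA ~ 1 + DrugWeek + ARN:DrugWeek + MDV:DrugWeek
  + (1|Submatch) + (1 + DrugWeek|ID), data = vcaplong)
summary(fit.sub)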
VCaP cancer cells were injected orthotopically into the prostate of mice, and PSA (prostate-specific antigen) was followed. The animals were castrated on two subsequent weeks, after which castration-resistant tumors were allowed to emerge. Once PSA reached pre-castration levels, the animals were non-bipartite matched and allocated to separate intervention arms (at week 10). Three different interventions are presented here, with 'Vehicle' as the comparison point, and MDV3100 and ARN-509 tested for reducing PSA and the correlated tumor size.
data(vcapwide)
A data frame with 45 observations on the following 34 variables.
CastrationDate
A numeric vector indicating the week when the animal was castrated, resulting in a steep decrease in PSA and the subsequent emergence of castration-resistant tumors.
CageAtAllocation
A factor vector indicating the cage label of each animal at the intervention allocation.
Group
A character vector indicating which intervention group the animal was allocated to in the actual experiment (3 alternatives).
TreatmentInitiationWeek
A character vector indicating at which week the intervention was started.
Submatch
A character vector indicating which submatch the individual belonged to in the original non-bipartite matching task.
ID
A unique character vector indicating the animals.
PSAWeek2
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek3
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek4
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek5
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek6
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek7
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek8
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek9
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek10
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek11
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek12
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek13
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
PSAWeek14
Numeric vector(s) indicating PSA concentration (ug/l) per each week (2 to 14) of the experiment.
BWWeek0
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek1
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek2
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek3
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek4
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek5
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek6
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek7
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek8
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek9
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek10
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek11
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek12
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek13
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
BWWeek14
Numeric vector indicating body weight (g) of the animals per each week (0 to 14) of the experiment.
The wide format here presents the longitudinal PSA and body weight measurements with one column per week. For modeling the PSA growth longitudinally, e.g. using mixed-effects models, see the vcaplong dataset, where the data have been transposed into the long format.
Laajala TD, Jumppanen M, Huhtaniemi R, Fey V, Kaur A, et al. (2016) Optimized design and analysis of preclinical intervention studies in vivo. Sci Rep. 2016 Aug 2;6:30723. doi: 10.1038/srep30723.
Knuuttila M, Yatkin E, Kallio J, Savolainen S, Laajala TD, et al. (2014) Castration induces upregulation of intratumoral androgen biosynthesis and androgen receptor expression in orthotopic VCaP human prostate cancer xenograft model. Am J Pathol. 2014 Aug;184(8):2163-73. doi: 10.1016/j.ajpath.2014.04.010.
data(vcapwide)
str(vcapwide)
head(vcapwide)
mixplot(vcapwide[,c("PSAWeek10", "PSAWeek14", "BWWeek10", "Group")], pch=16)
anv <- aov(PSA ~ Group, data.frame(PSA = vcapwide[,"PSAWeek14"], Group = vcapwide[,"Group"]))
summary(anv)
TukeyHSD(anv)
summary(aov(BW ~ Group, data.frame(BW = vcapwide[,"BWWeek14"], Group = vcapwide[,"Group"])))