For this vignette we will need the tensorflow package for constructing the TensorFlow graph, as well as dplyr for data manipulation and ggplot2 and gridExtra for plotting.
library(dplyr)
library(tensorflow)
library(ggplot2)
library(gridExtra)
First, we will simulate some data from a process that resides on a lattice of size 64 \(\times\) 64 (4096 pixels). The process is a realisation of a Gaussian process with mean zero and exponential covariance function that has variance 1 and length scale 0.1.
## Simulate process
set.seed(1)
Nside <- 64L
sigma2 <- 1
tau <- 0.1
covfun <- function(sigma2, tau, D) {
sigma2 * exp(-D / tau)
}
xy <- expand.grid(s1 = seq(0, 1, length.out = Nside),
s2 = seq(0, 1, length.out = Nside))
Dproc <- as.matrix(dist(xy))
L <- t(chol(covfun(sigma2, tau, Dproc)))
xy$Y <- L %*% rnorm(Nside^2)
We now assume that 4000 of these pixels are observed, and that the measurements are corrupted with noise that is Gaussian with mean zero and variance 0.2.
Nobs <- 4000L
sigma2e <- 0.2
zdf <- sample_n(xy, Nobs)
zdf$Z <- zdf$Y + sqrt(sigma2e)*rnorm(Nobs)
Below we plot the simulated process and data:
g1 <- ggplot(xy) + geom_tile(aes(s1, s2, fill = Y)) +
scale_fill_distiller(palette="Spectral") + theme_bw() + coord_fixed()
g2 <- ggplot(zdf) + geom_tile(aes(s1,s2, fill = Z)) +
scale_fill_distiller(palette="Spectral") + theme_bw() + coord_fixed()
grid.arrange(g1, g2, nrow = 1)
The Gaussian process has mean zero and covariance function \[ C(h) = \sigma^2\exp(-|h|/\tau), \] where \(\sigma^2\) and \(\tau\) are parameters that need to be estimated. The measurement error is Gaussian with mean zero and variance \(\sigma^2_\epsilon\), which also needs to be estimated.
Now, given some data \(\mathbf{Z} \equiv (Z_1,\dots,Z_{m})\), where \(m = 4000\), the log-likelihood function of \(\boldsymbol{\theta} \equiv (\sigma^2_\epsilon, \sigma^2, \tau)'\) is
\[ l(\boldsymbol{\theta}) = -\frac{m}{2}\log{2\pi} - \frac{1}{2}\log|\boldsymbol{\Sigma}| -\frac{1}{2}\mathbf{Z}'\boldsymbol{\Sigma}^{-1}\mathbf{Z}, \] where \[\Sigma_{ij} = \sigma^2\exp(-|h_{ij}|/\tau) + \sigma^2_\epsilon I(i = j),\] \(h_{ij}\) is the displacement between the \(i\)-th and \(j\)-th location, and \(I(\cdot)\) is the indicator function.
Below we write out the negative log-likelihood (since optim
minimises by default) up to an additive constant in R
; note that we pass the log of the parameters to be optimised rather than the parameters themselves in order to ensure positivity of these parameters.
## Negative log-likelihood function where
## logtheta = (log(sigma2e), log(sigma2), log(tau))
negloglik <- function(logtheta, D, Z) {
n <- nrow(D)
sigma2e <- exp(logtheta[1])
sigma2 <- exp(logtheta[2])
tau <- exp(logtheta[3])
SIGMA <- covfun(sigma2, tau, D) + sigma2e * diag(n)
L <- t(chol(SIGMA))
Zt_SIGMAinv_Z <- as.numeric(crossprod(solve(L, Z)))
sum(log(diag(L))) + 0.5*Zt_SIGMAinv_Z
}
Next, we call the function optim
; we restrict the maximum number of iterations in this example to 20 in order to minimise computation time.
## Maximum likelihood estimation
Dobs <- as.matrix(dist(zdf[,c("s1", "s2")]))
t1 <- Sys.time()
theta <- optim(par = rep(log(0.5), 3),
fn = negloglik,
D = Dobs,
Z = zdf$Z,
control = list(maxit = 20))
t2 <- Sys.time()
## Print results
data.frame(True = c(sigma2e, sigma2, tau),
Est. = exp(theta$par),
row.names = c("sigma2e", "sigma2", "tau"))
print(t2 - t1)
## Time difference of 23 secs
In order to do maximum-likelihood estimation with TensorFlow we first create a placeholder to contain the parameters (specifically the log of the parameters to ensure positivity) we want to estimate. We do not use the conventional TensorFlow variable, since in this example we will optimise using the R
function optim
. When optimising using TensorFlow itself, you will need to establish mutable variables using tf$Variable()
.
logtheta_tf <- tf$placeholder(dtype = "float32", shape = list(3L))
Next, we define the immutable objects. In our example, these are the distance matrix (encoding the distances between the observations) and the observations themselves. We define these using the function tf$constant
.
Dobs_tf <- tf$constant(Dobs, dtype = "float32")
Z_tf <- tf$constant(zdf$Z, dtype = "float32")
The covariance function is defined in a very similar way, making sure we use tf
functions and not the standard R
functions (e.g., tf$exp
rather than just exp
).
covfun_tf <- function(sigma2, tau, D)
tf$multiply(sigma2, tf$exp(-D / tau))
Now we construct the bulk of the graph that builds from the placeholder inputs to the negative log-likelihood.
n <- nrow(Dobs_tf)
sigma2e_tf <- tf$exp(logtheta_tf[1])
sigma2_tf <- tf$exp(logtheta_tf[2])
tau_tf <- tf$exp(logtheta_tf[3])
SIGMA_tf <- covfun_tf(sigma2_tf, tau_tf, Dobs_tf) + sigma2e_tf * tf$eye(n)
L_tf <- tf$cholesky(SIGMA_tf)
LinvZ_tf <- tf$matrix_solve(L_tf, Z_tf)
Zt_SIGMAinv_Z_tf <- tf$matmul(tf$transpose(LinvZ_tf), LinvZ_tf)
NLL <- tf$reduce_sum(tf$log(tf$diag_part(L_tf))) + 0.5*Zt_SIGMAinv_Z_tf
At the top of the graph we have the variable NLL
which contains our negative log likelihood. To execute this graph for particular parameter values we use the function run
associated with our session, and pass these parameter values through a dictionary. For example,
fd = dict(logtheta_tf = c(0.1, 0.2, 0.3))
run <- tf$Session()$run
run(NLL, feed_dict = fd)
## [,1]
## [1,] 1409
returns the negative log-likelihood for \(\log(\boldsymbol\theta) = (0.1, 0.2, 0.3)'\). Therefore, all we need to do is supply optim
with a funcion that takes the parameters, and executes NLL
on the GPU. The following function is all that is needed.
run_negloglik_tf_optim <- function(logtheta) {
fd <- dict(logtheta_tf = logtheta)
run(NLL, feed_dict = fd)
}
That’s it! Now optim
will be calling the GPU at each step. To compare apples with apples, we set the maximum number of iterations to 20 as we did earlier,
t3 <- Sys.time()
theta <- optim(par = rep(log(0.5), 3), fn = run_negloglik_tf_optim,
control = list(maxit = 20))
t4 <- Sys.time()
for which we have the following results (confirming they are the same as those we obtained using the CPU).
## Print results
data.frame(True = c(sigma2e, sigma2, tau),
Est. = exp(theta$par),
row.names = c("sigma2e", "sigma2", "tau"))
print(t4 - t3)
## Time difference of 1.3 secs
Using TensorFlow with the GPU required 1.29 seconds, which is much faster than the CPU implementation (which in my case uses OpenBLAS).
By changing the number of observed locations (the variable Nobs
above), one can get a feel of the achievable gains using a GPU. On my system (Core i9-7900X CPU with OpenBLAS, 1080Ti GPU), the performance was comparable for Nobs = 300
. However, the GPU started to result in large computational gains when Nobs
was increased. For the chosen Nobs =
4000 we have the following computational times when running optim
for 20 iterations.
data.frame(CPU_OpenBLAS = t2 - t1,
GPU_TensorFlow = t4 - t3,
row.names = "Time")