Simulation Setup

For this vignette we will need the tensorflow package for constructing the TensorFlow graph, as well as dplyr for data manipulation and ggplot2 and gridExtra for plotting.

library(dplyr)
library(tensorflow)
library(ggplot2)
library(gridExtra)

First, we will simulate some data from a process that resides on a lattice of size 64 \(\times\) 64 (4096 pixels). The process is a realisation of a Gaussian process with mean zero and exponential covariance function that has variance 1 and length scale 0.1.

## Simulate process
set.seed(1)        
Nside <- 64L
sigma2 <- 1
tau <- 0.1
covfun <- function(sigma2, tau, D) {
  sigma2 * exp(-D / tau)
}
xy <- expand.grid(s1 = seq(0, 1, length.out = Nside),
                  s2 = seq(0, 1, length.out = Nside))
Dproc <- as.matrix(dist(xy))
L <- t(chol(covfun(sigma2, tau, Dproc)))
xy$Y <- L %*% rnorm(Nside^2)

We now assume that 4000 of these pixels are observed, and that the measurements are corrupted with noise that is Gaussian with mean zero and variance 0.2.

Nobs <- 4000L
sigma2e <- 0.2
zdf <- sample_n(xy, Nobs)
zdf$Z <- zdf$Y + sqrt(sigma2e)*rnorm(Nobs)

Below we plot the simulated process and data:

g1 <- ggplot(xy) + geom_tile(aes(s1, s2, fill = Y)) +
  scale_fill_distiller(palette="Spectral") + theme_bw() + coord_fixed()
g2 <- ggplot(zdf) + geom_tile(aes(s1,s2, fill = Z)) +
 scale_fill_distiller(palette="Spectral") + theme_bw() + coord_fixed()
grid.arrange(g1, g2, nrow = 1)

Maximum Likelihood Estimation (without TensorFlow)

The Gaussian process has mean zero and covariance function \[ C(h) = \sigma^2\exp(-|h|/\tau), \] where \(\sigma^2\) and \(\tau\) are parameters that need to be estimated. The measurement error is Gaussian with mean zero and variance \(\sigma^2_\epsilon\), which also needs to be estimated.

Now, given some data \(\mathbf{Z} \equiv (Z_1,\dots,Z_{m})\), where \(m = 4000\), the log-likelihood function of \(\boldsymbol{\theta} \equiv (\sigma^2_\epsilon, \sigma^2, \tau)'\) is

\[ l(\boldsymbol{\theta}) = -\frac{m}{2}\log{2\pi} - \frac{1}{2}\log|\boldsymbol{\Sigma}| -\frac{1}{2}\mathbf{Z}'\boldsymbol{\Sigma}^{-1}\mathbf{Z}, \] where \[\Sigma_{ij} = \sigma^2\exp(-|h_{ij}|/\tau) + \sigma^2_\epsilon I(i = j),\] \(h_{ij}\) is the displacement between the \(i\)-th and \(j\)-th location, and \(I(\cdot)\) is the indicator function.

Below we write out the negative log-likelihood (since optim minimises by default) up to an additive constant in R; note that we pass the log of the parameters to be optimised rather than the parameters themselves in order to ensure positivity of these parameters.

## Negative log-likelihood function where 
## logtheta = (log(sigma2e), log(sigma2), log(tau))
negloglik <- function(logtheta, D, Z) {
  n <- nrow(D)
  sigma2e <- exp(logtheta[1])
  sigma2 <- exp(logtheta[2])
  tau <- exp(logtheta[3])
  
  SIGMA <- covfun(sigma2, tau, D) + sigma2e * diag(n)
  L <- t(chol(SIGMA))
  Zt_SIGMAinv_Z <- as.numeric(crossprod(solve(L, Z)))
  sum(log(diag(L))) + 0.5*Zt_SIGMAinv_Z
}

Next, we call the function optim; we restrict the maximum number of iterations in this example to 20 in order to minimise computation time.

## Maximum likelihood estimation
Dobs <- as.matrix(dist(zdf[,c("s1", "s2")]))
t1 <- Sys.time()
theta <- optim(par = rep(log(0.5), 3), 
               fn = negloglik, 
               D = Dobs, 
               Z = zdf$Z,              
               control = list(maxit = 20))
t2 <- Sys.time()
## Print results
data.frame(True = c(sigma2e, sigma2, tau),
           Est. = exp(theta$par),
           row.names = c("sigma2e", "sigma2", "tau"))
print(t2 - t1)
## Time difference of 23 secs

Maximum Likelihood Estimation (with TensorFlow)

In order to do maximum-likelihood estimation with TensorFlow we first create a placeholder to contain the parameters (specifically the log of the parameters to ensure positivity) we want to estimate. We do not use the conventional TensorFlow variable, since in this example we will optimise using the R function optim. When optimising using TensorFlow itself, you will need to establish mutable variables using tf$Variable().

logtheta_tf <- tf$placeholder(dtype = "float32", shape = list(3L))

Next, we define the immutable objects. In our example, these are the distance matrix (encoding the distances between the observations) and the observations themselves. We define these using the function tf$constant.

Dobs_tf <- tf$constant(Dobs, dtype = "float32")
Z_tf <- tf$constant(zdf$Z, dtype = "float32")

The covariance function is defined in a very similar way, making sure we use tf functions and not the standard R functions (e.g., tf$exp rather than just exp).

covfun_tf <- function(sigma2, tau, D)
  tf$multiply(sigma2, tf$exp(-D / tau))

Now we construct the bulk of the graph that builds from the placeholder inputs to the negative log-likelihood.

n <- nrow(Dobs_tf)
sigma2e_tf <- tf$exp(logtheta_tf[1])
sigma2_tf <- tf$exp(logtheta_tf[2])
tau_tf <- tf$exp(logtheta_tf[3])
SIGMA_tf <- covfun_tf(sigma2_tf, tau_tf, Dobs_tf) + sigma2e_tf * tf$eye(n)
L_tf <- tf$cholesky(SIGMA_tf)
LinvZ_tf <- tf$matrix_solve(L_tf, Z_tf)
Zt_SIGMAinv_Z_tf <- tf$matmul(tf$transpose(LinvZ_tf), LinvZ_tf)
NLL <- tf$reduce_sum(tf$log(tf$diag_part(L_tf))) + 0.5*Zt_SIGMAinv_Z_tf

At the top of the graph we have the variable NLL which contains our negative log likelihood. To execute this graph for particular parameter values we use the function run associated with our session, and pass these parameter values through a dictionary. For example,

fd = dict(logtheta_tf = c(0.1, 0.2, 0.3))
run <- tf$Session()$run
run(NLL, feed_dict = fd)
##      [,1]
## [1,] 1409

returns the negative log-likelihood for \(\log(\boldsymbol\theta) = (0.1, 0.2, 0.3)'\). Therefore, all we need to do is supply optim with a funcion that takes the parameters, and executes NLLon the GPU. The following function is all that is needed.

run_negloglik_tf_optim <- function(logtheta) {
  fd <- dict(logtheta_tf = logtheta)
  run(NLL, feed_dict = fd)
}

That’s it! Now optim will be calling the GPU at each step. To compare apples with apples, we set the maximum number of iterations to 20 as we did earlier,

t3 <- Sys.time()
theta <- optim(par = rep(log(0.5), 3), fn = run_negloglik_tf_optim,
               control = list(maxit = 20))
t4 <- Sys.time()

for which we have the following results (confirming they are the same as those we obtained using the CPU).

## Print results
data.frame(True = c(sigma2e, sigma2, tau),
           Est. = exp(theta$par),
           row.names = c("sigma2e", "sigma2", "tau"))
print(t4 - t3)
## Time difference of 1.3 secs

Using TensorFlow with the GPU required 1.29 seconds, which is much faster than the CPU implementation (which in my case uses OpenBLAS).

Conclusion

By changing the number of observed locations (the variable Nobs above), one can get a feel of the achievable gains using a GPU. On my system (Core i9-7900X CPU with OpenBLAS, 1080Ti GPU), the performance was comparable for Nobs = 300. However, the GPU started to result in large computational gains when Nobs was increased. For the chosen Nobs = 4000 we have the following computational times when running optim for 20 iterations.

data.frame(CPU_OpenBLAS = t2 - t1,
           GPU_TensorFlow = t4 - t3,
       row.names = "Time")