For this vignette we will need the **tensorflow** package for constructing the TensorFlow graph, as well as **dplyr** for data manipulation and **ggplot2** and **gridExtra** for plotting.

```
library(dplyr)
library(tensorflow)
library(ggplot2)
library(gridExtra)
```

First, we will simulate some data from a process that resides on a lattice of size 64 \(\times\) 64 (4096 pixels). The process is a realisation of a Gaussian process with mean zero and exponential covariance function that has variance 1 and length scale 0.1.

```
## Simulate process
set.seed(1)
Nside <- 64L
sigma2 <- 1
tau <- 0.1
covfun <- function(sigma2, tau, D) {
sigma2 * exp(-D / tau)
}
xy <- expand.grid(s1 = seq(0, 1, length.out = Nside),
s2 = seq(0, 1, length.out = Nside))
Dproc <- as.matrix(dist(xy))
L <- t(chol(covfun(sigma2, tau, Dproc)))
xy$Y <- L %*% rnorm(Nside^2)
```

We now assume that 4000 of these pixels are observed, and that the measurements are corrupted with noise that is Gaussian with mean zero and variance 0.2.

```
Nobs <- 4000L
sigma2e <- 0.2
zdf <- sample_n(xy, Nobs)
zdf$Z <- zdf$Y + sqrt(sigma2e)*rnorm(Nobs)
```

Below we plot the simulated process and data:

```
g1 <- ggplot(xy) + geom_tile(aes(s1, s2, fill = Y)) +
scale_fill_distiller(palette="Spectral") + theme_bw() + coord_fixed()
g2 <- ggplot(zdf) + geom_tile(aes(s1,s2, fill = Z)) +
scale_fill_distiller(palette="Spectral") + theme_bw() + coord_fixed()
grid.arrange(g1, g2, nrow = 1)
```

The Gaussian process has mean zero and covariance function \[ C(h) = \sigma^2\exp(-|h|/\tau), \] where \(\sigma^2\) and \(\tau\) are parameters that need to be estimated. The measurement error is Gaussian with mean zero and variance \(\sigma^2_\epsilon\), which also needs to be estimated.

Now, given some data \(\mathbf{Z} \equiv (Z_1,\dots,Z_{m})\), where \(m = 4000\), the log-likelihood function of \(\boldsymbol{\theta} \equiv (\sigma^2_\epsilon, \sigma^2, \tau)'\) is

\[ l(\boldsymbol{\theta}) = -\frac{m}{2}\log{2\pi} - \frac{1}{2}\log|\boldsymbol{\Sigma}| -\frac{1}{2}\mathbf{Z}'\boldsymbol{\Sigma}^{-1}\mathbf{Z}, \] where \[\Sigma_{ij} = \sigma^2\exp(-|h_{ij}|/\tau) + \sigma^2_\epsilon I(i = j),\] \(h_{ij}\) is the displacement between the \(i\)-th and \(j\)-th location, and \(I(\cdot)\) is the indicator function.

Below we write out the *negative* log-likelihood (since `optim`

minimises by default) up to an additive constant in `R`

; note that we pass the log of the parameters to be optimised rather than the parameters themselves in order to ensure positivity of these parameters.

```
## Negative log-likelihood function where
## logtheta = (log(sigma2e), log(sigma2), log(tau))
negloglik <- function(logtheta, D, Z) {
n <- nrow(D)
sigma2e <- exp(logtheta[1])
sigma2 <- exp(logtheta[2])
tau <- exp(logtheta[3])
SIGMA <- covfun(sigma2, tau, D) + sigma2e * diag(n)
L <- t(chol(SIGMA))
Zt_SIGMAinv_Z <- as.numeric(crossprod(solve(L, Z)))
sum(log(diag(L))) + 0.5*Zt_SIGMAinv_Z
}
```

Next, we call the function `optim`

; we restrict the maximum number of iterations in this example to 20 in order to minimise computation time.

```
## Maximum likelihood estimation
Dobs <- as.matrix(dist(zdf[,c("s1", "s2")]))
t1 <- Sys.time()
theta <- optim(par = rep(log(0.5), 3),
fn = negloglik,
D = Dobs,
Z = zdf$Z,
control = list(maxit = 20))
t2 <- Sys.time()
```

```
## Print results
data.frame(True = c(sigma2e, sigma2, tau),
Est. = exp(theta$par),
row.names = c("sigma2e", "sigma2", "tau"))
```

`print(t2 - t1)`

`## Time difference of 23 secs`

In order to do maximum-likelihood estimation with TensorFlow we first create a *placeholder* to contain the parameters (specifically the log of the parameters to ensure positivity) we want to estimate. We do not use the conventional TensorFlow *variable*, since in this example we will optimise using the `R`

function `optim`

. When optimising using TensorFlow itself, you will need to establish mutable variables using `tf$Variable()`

.

`logtheta_tf <- tf$placeholder(dtype = "float32", shape = list(3L))`

Next, we define the immutable objects. In our example, these are the distance matrix (encoding the distances between the observations) and the observations themselves. We define these using the function `tf$constant`

.

```
Dobs_tf <- tf$constant(Dobs, dtype = "float32")
Z_tf <- tf$constant(zdf$Z, dtype = "float32")
```

The covariance function is defined in a very similar way, making sure we use `tf`

functions and not the standard `R`

functions (e.g., `tf$exp`

rather than just `exp`

).

```
covfun_tf <- function(sigma2, tau, D)
tf$multiply(sigma2, tf$exp(-D / tau))
```

Now we construct the bulk of the graph that builds from the placeholder inputs to the negative log-likelihood.

```
n <- nrow(Dobs_tf)
sigma2e_tf <- tf$exp(logtheta_tf[1])
sigma2_tf <- tf$exp(logtheta_tf[2])
tau_tf <- tf$exp(logtheta_tf[3])
SIGMA_tf <- covfun_tf(sigma2_tf, tau_tf, Dobs_tf) + sigma2e_tf * tf$eye(n)
L_tf <- tf$cholesky(SIGMA_tf)
LinvZ_tf <- tf$matrix_solve(L_tf, Z_tf)
Zt_SIGMAinv_Z_tf <- tf$matmul(tf$transpose(LinvZ_tf), LinvZ_tf)
NLL <- tf$reduce_sum(tf$log(tf$diag_part(L_tf))) + 0.5*Zt_SIGMAinv_Z_tf
```

At the top of the graph we have the variable `NLL`

which contains our negative log likelihood. To execute this graph for particular parameter values we use the function `run`

associated with our session, and pass these parameter values through a dictionary. For example,

```
fd = dict(logtheta_tf = c(0.1, 0.2, 0.3))
run <- tf$Session()$run
run(NLL, feed_dict = fd)
```

```
## [,1]
## [1,] 1409
```

returns the negative log-likelihood for \(\log(\boldsymbol\theta) = (0.1, 0.2, 0.3)'\). Therefore, all we need to do is supply `optim`

with a funcion that takes the parameters, and executes `NLL`

on the GPU. The following function is all that is needed.

```
run_negloglik_tf_optim <- function(logtheta) {
fd <- dict(logtheta_tf = logtheta)
run(NLL, feed_dict = fd)
}
```

Thatâ€™s it! Now `optim`

will be calling the GPU at each step. To compare apples with apples, we set the maximum number of iterations to 20 as we did earlier,

```
t3 <- Sys.time()
theta <- optim(par = rep(log(0.5), 3), fn = run_negloglik_tf_optim,
control = list(maxit = 20))
t4 <- Sys.time()
```

for which we have the following results (confirming they are the same as those we obtained using the CPU).

```
## Print results
data.frame(True = c(sigma2e, sigma2, tau),
Est. = exp(theta$par),
row.names = c("sigma2e", "sigma2", "tau"))
```

`print(t4 - t3)`

`## Time difference of 1.3 secs`

Using TensorFlow with the GPU required **1.29** seconds, which is much faster than the CPU implementation (which in my case uses OpenBLAS).

By changing the number of observed locations (the variable `Nobs`

above), one can get a feel of the achievable gains using a GPU. On my system (Core i9-7900X CPU with OpenBLAS, 1080Ti GPU), the performance was comparable for `Nobs = 300`

. However, the GPU started to result in large computational gains when `Nobs`

was increased. For the chosen `Nobs =`

4000 we have the following computational times when running `optim`

for 20 iterations.

```
data.frame(CPU_OpenBLAS = t2 - t1,
GPU_TensorFlow = t4 - t3,
row.names = "Time")
```