Funding Source: This work was supported by the National Science Foundation (SES1851690, https://www.nsf.gov/awardsearch/showAward?AWD_ID=1851690). Illustrative data collection was supported by the National Institute of Deafness and Other Communication Disorders (R01 DC011755).

Introduction

This website provides a tutorial for fitting generalized additive logistic regression models as presented in Cho, Brown-Schmidt, De Boeck, and Naveiras (2022, https://psycnet.apa.org/doiLanding?doi=10.1037%2Fmet0000444) using R. This tutorial uses the illustrative data set in Cho et al. (2022) to demonstrate how generalized additive logistic regression models, as generalized additive mixed models (GAMMs), can be applied to intensive binary time-series eye-tracking data with complex spatial-temporal effects. The tutorial is structured as follows: Data Management, Data Description, Model Specification and Estimation, Model Selection, and Model Evaluation.

Data Management

This section will show what your data should (roughly) resemble when using this tutorial.

First, read the file for your data into R as a data frame:

setwd("C:\\Users\\MyName\\Documents\\Code")   # adjust to the location of your data file
data <- read.table("data.txt", sep="\t", header=TRUE)
head(data, 10)
##    y person trial item talker session sleep1 sleep2 time
## 1  0      1    20    3      1       1   0.25   -0.5  180
## 2  0      1    20    3      1       1   0.25   -0.5  190
## 3  0      1    20    3      1       1   0.25   -0.5  200
## 4  0      1    20    3      1       1   0.25   -0.5  210
## 5  0      1    20    3      1       1   0.25   -0.5  220
## 6  0      1    20    3      1       1   0.25   -0.5  230
## 7  0      1    20    3      1       1   0.25   -0.5  240
## 8  0      1    20    3      1       1   0.25   -0.5  250
## 9  0      1    20    3      1       1   0.25   -0.5  260
## 10 0      1    20    3      1       1   0.25   -0.5  270

Your data should be in a long format, with each row containing the data for a single time point. We’ll define new variables for both (a) the levels of each variable (unique.variablename) and (b) the number of levels for each variable (number.variablename) as follows:

unique.y <- sort(unique(data$y))
unique.person <- sort(unique(data$person))
unique.trial <- sort(unique(data$trial))
unique.item <- sort(unique(data$item))
unique.talker <- sort(unique(data$talker))
unique.session <- sort(unique(data$session))
unique.time <- sort(unique(data$time))

number.y <- length(unique.y)               # 2, because y is a binary outcome (0/1)
number.person <- length(unique.person)     # 60
number.trial <- length(unique.trial)       # 64
number.item <- length(unique.item)         # 8
number.talker <- length(unique.talker)     # 2
number.session <- length(unique.session)   # 2
number.time <- length(unique.time)         # 132

Data Description

In this section we’ll show how to calculate empirical logits (\(empirical \; logit = log(\frac{proportion}{1 - proportion})\)) for your data, and how to use those empirical logits to explore change processes (autocorrelations and trend). We’ll then plot the autocorrelations, partial autocorrelations, and trend (for examining temporal dependence), Pearson residuals (for examining spatial correlations), and the semivariogram (for examining spatial by temporal interactions) using the ggplot2 package (Wickham, 2016).

Temporal Autocorrelations & Trend

Calculating Empirical Logits

First, we’ll create a data frame called empirical.logit containing the empirical logit calculated for each unique combination of time, trial, and session. Note that we don’t include person or item here because the empirical logit will be calculated across persons and because the items are nested within trials. The proportion is calculated as the proportion of correct responses (\(y=1\)) for each unique combination of time, trial, and session (\(proportion_{tls} = mean(y_{tls})\) for time point \(t\), trial \(l\), and session \(s\)). The empirical logit is then calculated based on the proportions: \((empirical \; logit)_{tls} = log(\frac{proportion_{tls}}{1 - proportion_{tls}})\). The proportions and empirical logits are added to the empirical.logit data frame as follows:

rowcount <- number.time*number.trial*number.session
empirical.logit <- data.frame(time=rep(NA,rowcount), trial=rep(NA,rowcount),
  session=rep(NA,rowcount), proportion=rep(NA,rowcount), empirical.logit=rep(NA,rowcount))
index <- 1

for (t in 1:number.time){
  for (l in 1:number.trial){
    for (s in 1:number.session){
      y.tls <- data[(data$time==unique.time[t] & data$trial==unique.trial[l] &
        data$session==unique.session[s]),'y']
      proportion <- mean(y.tls)
      prop <- min(max(10^-9, proportion), 1 - 10^-9)
      emp.logit <- log(prop/(1 - prop))
      empirical.logit[index,] <- c(unique.time[t], unique.trial[l], unique.session[s],
        proportion, emp.logit)
      index <- index + 1
}}}
empirical.logit$time <- as.factor(empirical.logit$time)

Lines 1-4 initialize the empirical.logit data frame and the row index (for each unique combination of time, trial, and session). Lines 6-8 loop through each unique combination of time, trial, and session. Line 11 calculates the proportion, and lines 12-13 calculate the empirical logit. Line 12 places lower and upper bounds (\(10^{-9}\) and \(1 - 10^{-9}\)) on the proportion for extreme cases where the proportion reaches 0 or 1, which would otherwise result in an empirical logit of \(-\infty\) or \(+\infty\), respectively. Lines 14-16 input the calculations into the empirical.logit data frame before updating the row index. Line 18 sets the time variable in empirical.logit to a factor (which is necessary when plotting trend).
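As a quick check of the formula, plugging the first proportion from the output below into the empirical logit formula reproduces the first empirical.logit value:

log(0.1666667/(1 - 0.1666667))  # -1.609438, matching the first row of the output below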

Looking at the resulting empirical.logit data frame, we now have the calculated values for empirical logit:

##    time trial session proportion empirical.logit
## 1   180     1       1 0.16666667       -1.609438
## 2   180     1       2 0.11666667       -2.024382
## 3   180     2       1 0.20000000       -1.386294
## 4   180     2       2 0.10000000       -2.197225
## 5   180     3       1 0.08333333       -2.397895
## 6   180     3       2 0.16666667       -1.609438
## 7   180     4       1 0.10000000       -2.197225
## 8   180     4       2 0.15000000       -1.734601
## 9   180     5       1 0.11666667       -2.024382
## 10  180     5       2 0.11666667       -2.024382

Calculating Autocorrelations & Partial Autocorrelations

In this section we’ll calculate the autocorrelations and partial autocorrelations from the empirical logits we just calculated. Autocorrelations and partial autocorrelations will be calculated for each unique combination of trial and session, at each time.lag ranging from 1 to time.lag.max (for this example, we’ll use a time.lag.max of 20, meaning we’ll examine autocorrelations and partial autocorrelations for time lags 1 through 20):

time.lag.max <- 20
rowcount <- time.lag.max*number.trial*number.session
AC.PAC <- data.frame(time.lag=rep(NA,rowcount), trial=rep(NA,rowcount),
  session=rep(NA,rowcount), AC=rep(NA,rowcount), PAC=rep(NA,rowcount))
index <- 1

for (l in 1:length(unique.trial)){
  for (s in 1:length(unique.session)){
    empirical.logit.ls <- empirical.logit[(empirical.logit$trial==unique.trial[l] &
      empirical.logit$session==unique.session[s]),'empirical.logit']
    ac <- acf(empirical.logit.ls, lag.max = time.lag.max, plot=FALSE)
    ac <- ac$acf[2:(time.lag.max + 1)]
    pac <- c(pacf(empirical.logit.ls, lag.max = time.lag.max,
      na.action=na.pass, plot=FALSE)$acf)
    for (time.lag in 1:time.lag.max){
      AC.PAC[(index + time.lag - 1),] <- c(time.lag, unique.trial[l], unique.session[s],
        ac[time.lag], pac[time.lag])
    }
    index <- index + time.lag.max
}}
AC.PAC$time.lag <- as.factor(AC.PAC$time.lag)

Line 1 selects the maximum time lag that we’re interested in. Lines 2-5 initialize the AC.PAC data frame and the row index. Lines 7-8 loop through each unique combination of trial and session. Lines 11-14 calculate the autocorrelations and partial autocorrelations based on the empirical logits we previously calculated. Note that in line 12 we exclude the first autocorrelation (at time.lag=0), which we aren’t interested in. Lines 15-18 update the AC.PAC data frame for each unique combination of trial, session, and time.lag to include the autocorrelations and partial autocorrelations. Line 21 redefines time.lag as a factor (which is necessary when plotting the autocorrelations and partial autocorrelations in the following section).
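The reason line 12 drops the first element is that acf returns the lag-0 autocorrelation, which is exactly 1 by definition, as the first value of its output:

set.seed(1)
acf(rnorm(50), lag.max = 3, plot = FALSE)$acf[1]  # the lag-0 autocorrelation is always 1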

Looking at AC.PAC, we should now have the calculated values for both autocorrelations and partial autocorrelations:

##    time.lag trial session        AC         PAC
## 1         1     1       1 0.9804831  0.98048310
## 2         2     1       1 0.9597471 -0.04139363
## 3         3     1       1 0.9386191 -0.01977484
## 4         4     1       1 0.9162839 -0.04160785
## 5         5     1       1 0.8936426 -0.01767877
## 6         6     1       1 0.8719185  0.01280519
## 7         7     1       1 0.8496486 -0.02616967
## 8         8     1       1 0.8288226  0.02636468
## 9         9     1       1 0.8112830  0.07066020
## 10       10     1       1 0.7935337 -0.02037809

Plotting Autocorrelations & Partial Autocorrelations

In this step, we’ll use ggplot2 to plot figures for the autocorrelations and partial autocorrelations:

library(ggplot2)

AC.plot.1 <- ggplot(AC.PAC[AC.PAC$session==1,], aes(x=time.lag, y=AC)) +
geom_boxplot() +
labs(title = "Figure 1: Session 1 Autocorrelation", x = "Time Lag", 
y = "Autocorrelation for Trials")

AC.plot.2 <- ggplot(AC.PAC[AC.PAC$session==2,], aes(x=time.lag, y=AC)) +
geom_boxplot() +
labs(title = "Figure 2: Session 2 Autocorrelation", x = "Time Lag", 
y = "Autocorrelation for Trials")

PAC.plot.1 <- ggplot(AC.PAC[AC.PAC$session==1,], aes(x=time.lag, y=PAC)) +
geom_boxplot() +
labs(title = "Figure 3: Session 1 Partial Autocorrelation", x = "Time Lag", 
y = "Partial Autocorrelation for Trials")

PAC.plot.2 <- ggplot(AC.PAC[AC.PAC$session==2,], aes(x=time.lag, y=PAC)) +
geom_boxplot() +
labs(title = "Figure 4: Session 2 Partial Autocorrelation", x = "Time Lag", 
y = "Partial Autocorrelation for Trials")

Line 1 loads the ggplot2 library. Lines 3-6 plot the autocorrelations for session 1. Similarly, lines 8-11, 13-16, and 18-21 plot the autocorrelations for session 2, the partial autocorrelations for session 1, and the partial autocorrelations for session 2 (respectively).

Calling AC.plot.1 gives the following:

The box plot of correlograms across trials presented in Figure 1 above shows that the autocorrelations did not converge to zero, even at large values of lag. This pattern is typical when a time series also exhibits trend (Chatfield, 2004, p. 26).

You should expect a figure somewhat similar to Figure 1 above, with the autocorrelation decreasing as a function of the time lag.

Calling PAC.plot.1 gives the following:

You should expect a figure somewhat similar to Figure 3 above, with a high partial autocorrelation at time.lag=1 that drops off quickly for time.lag > 1. This suggests that first-order autocorrelation should be considered in the model.

Plotting Trend

In this section, we’ll use ggplot2 to plot the empirical logits we calculated above. Each figure shows box plots of the empirical logits across trials over time for one session:

time.breaks <- seq(0,1500,by=100)

empirical.logit.plot.1 <- ggplot(empirical.logit[empirical.logit$session==1,], 
aes(x=time, y=empirical.logit)) +
geom_boxplot() +
labs(title = "Figure 5: Trend", x = "Time", y = "Empirical Logit") +
scale_x_discrete(breaks=time.breaks)

empirical.logit.plot.2 <- ggplot(empirical.logit[empirical.logit$session==2,], 
aes(x=time, y=empirical.logit)) +
geom_boxplot() +
labs(title = "Figure 6: Session 2 Trend", x = "Time", y = "Empirical Logit") +
scale_x_discrete(breaks=time.breaks)

Line 1 sets the discrete time breaks for the tick marks of the \(x\)-axis. For this example, because time ranges from 180 to 1490 in increments of 10, we used tick marks from 0 to 1500 in increments of 100 (to keep the \(x\)-axis from appearing cluttered). Lines 3-7 and lines 9-13 create the plots of empirical logits for session 1 and for session 2, respectively.

Calling empirical.logit.plot.1 gives the following plot:

You should expect a figure somewhat resembling the figure above. The overall trend was positive and (approximately) linear over time. Fitted lines over time were similar between a linear function and a kernel-weighted local polynomial smoothing function, and only small deviations from the linear trend were observed.
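If you want to overlay the two fitted lines yourself, a minimal sketch is the following (using loess, a local polynomial smoother, as a stand-in for the kernel-weighted smoother, and converting the time factor back to numeric):

empirical.logit$time.numeric <- as.numeric(as.character(empirical.logit$time))
ggplot(empirical.logit[empirical.logit$session==1,], aes(x=time.numeric, y=empirical.logit)) +
geom_point(alpha=0.1) +                                    # empirical logits
geom_smooth(method="lm", se=FALSE) +                       # linear fit
geom_smooth(method="loess", se=FALSE, linetype="dashed") + # local polynomial smoother
labs(x = "Time", y = "Empirical Logit")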

Similar patterns in autocorrelations, partial autocorrelations, and trend were observed for session=2 as those presented for session=1. Based on these exploratory analyses of the change processes (linear trend and first-order autocorrelation), we considered a generalized additive logistic regression model to account for the change processes in these binary data.

Spatial Correlations

In this section, we’ll plot the Pearson residuals based on a null model (Equation 5 in Cho et al., 2022) to examine spatial correlations within our data. Rather than modeling across all time points, we’ll reduce the complexity of the data and the resulting model by collapsing the time variable into blocks of time points.

First, we’ll separate the 132 time points (ranging from 180 to 1490 in increments of 10) into 6 equally-sized time blocks containing 22 time points, which we’ll include in the data as the time.block variable:

time.breaks <- c(390, 610, 830, 1050, 1270)
data$time.block <- rep(1, length(data$time))
for (time.break in 1:5){
  data$time.block[which(data$time > time.breaks[time.break])] <- time.break + 1
}

The time.breaks in line 1 are the thresholds between time blocks, splitting time into 6 equally-sized time blocks. Lines 2-5 create the time.block variable, with each value of time.block (ranging from 1 to 6) equal to one more than the number of time.breaks (ranging from 0 to 5) that time exceeds. For example, values of time equal to 200, 450, and 1300 would correspond to time.block equal to 1, 2, and 6, respectively.
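Equivalently, the time.block variable can be created without a loop using base R's findInterval function; setting left.open=TRUE reproduces the strict > comparison above, so a time of exactly 390 stays in block 1:

# Loop-free equivalent: count how many thresholds each time value exceeds
data$time.block <- findInterval(data$time, time.breaks, left.open=TRUE) + 1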

Second, we’ll calculate the Pearson residuals based on a null model with these time blocks using the bam function in the mgcv package:

library(mgcv)
m0 <- bam(y ~ 1 + s(person, by=time.block, bs="re") + s(item, by=time.block, bs="re"), 
data=data, family=binomial, method="ML")
m0.residuals <- residuals(m0, type="pearson")
m0.data <- data.frame(residuals=m0.residuals, time.block=data$time.block, X=data$X, Y=data$Y)

Line 1 loads the mgcv package, which is used to fit the null model in lines 2-3. Lines 4-5 extract the Pearson residuals from the null model and store them alongside the \(x\)- and \(y\)-coordinates X and Y (not to be confused with the binary outcome y) in the m0.data data frame.
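For a binary outcome, the Pearson residual for a given observation is the raw residual scaled by the estimated binomial standard deviation, \(r = \frac{y - \hat{p}}{\sqrt{\hat{p}(1 - \hat{p})}}\), where \(\hat{p}\) is the fitted probability from the null model.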

Third, these residuals and coordinates are plotted in a bubble plot using the bubble function in the sp package (Pebesma & Bivand, 2005):

library(sp)
coordinates(m0.data) <- c("X","Y")
bubble(m0.data, "residuals", col=c("black","grey"), main = "Pearson Residuals",
xlab="x-coordinates (ranging from -1517 to 2943)", ylab="y-coordinates (ranging from -818 to 2136)")

Line 1 loads the sp package. Line 2 defines the variables in m0.data that represent the \(x\)- and \(y\)-coordinates, converting m0.data into a spatial data object. Lines 3-4 generate the following bubble plot:

The Pearson residuals are plotted by direction (gray for positive, black for negative) and magnitude (small to large). The four areas with large positive (gray) residuals in the above plot correspond with the four images that participants viewed, indicating that there are positive spatial correlations among these four images.

Spatial by Temporal Interactions

In this section, we’ll use the Pearson residuals obtained in the previous section to calculate and plot semivariograms, which show how spatial dependence in the \(x\)- and \(y\)-coordinates changes over time (i.e., spatial by temporal interactions). Rather than examining the semivariogram across all time points (which would be cluttered due to the large amount of data), we’ll plot the semivariogram for individual time blocks using the time.block variable created in the previous section.
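For reference, the empirical semivariance at separation distance \(h\) is estimated as \(\hat{\gamma}(h) = \frac{1}{2|N(h)|}\sum_{(i,j) \in N(h)}(r_{i} - r_{j})^{2}\), where \(N(h)\) is the set of pairs of residuals whose locations are approximately distance \(h\) apart; this is the quantity that the variogram function (used below) computes within each distance bin.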

The residuals are used to calculate the semivariogram for a given time block using the variogram function in the gstat package (Pebesma, 2004):

library(gstat)
semivariogram <- variogram(residuals ~ 1, data = m0.data[m0.data$time.block==6,])
ggplot(semivariogram, aes(x=dist, y=gamma)) +
geom_point() +
geom_line() +
labs(title = "Semivariogram for Time Block 6", x = "Distance", y = "Semivariance") +
theme_bw()

Line 1 loads the gstat package. Line 2 calculates the semivariogram for time.block=6 using the variogram function, which is then plotted in lines 3-7. The resulting plot is as follows:

We can also use the variogram function to test the isotropy assumption that spatial dependence is the same in every direction for a given time block. For time.block=6 the directional semivariogram is calculated and plotted as follows:

semivariogram.isotropy <- variogram(residuals ~ 1, data = m0.data[m0.data$time.block==6,], 
alpha=c(0,45,90,135))
plot(semivariogram.isotropy)

The alpha argument defines the directions to be evaluated. For this example, the directions of \(0^{\circ}\) (north), \(45^{\circ}\) (north-east), \(90^{\circ}\) (east), and \(135^{\circ}\) (south-east) are evaluated. The resulting output is the following plot:

The strength and patterns of spatial correlation were similar for the four selected directions of north (\(0^{\circ}\)), northeast (\(45^{\circ}\)), east (\(90^{\circ}\)), and southeast (\(135^{\circ}\)), which indicates that isotropy can be assumed.

Model Specification and Estimation

In this section, we will first set up the data to include several lag covariates that will be used for modeling. This updated data set will then be used to specify generalized additive logistic regression models from which model selection will take place.

Several additional variables in the data will be used during modeling: d (the target distance), c (the competitor distance), the fixation coordinates X and Y, and the experimental condition variables talker.e, session.e, sleep.1, and sleep.2.

We’re going to define lag-1 covariates for y (the outcome), d (target distance), and c (competitor distance). As seen when examining autocorrelation and partial autocorrelation plots, we have good reason to assume that there are first-order correlations that should be considered when modeling. These lag covariates are added to the data as follows:

data$y.lag.1 <- data$d.lag.1 <- data$c.lag.1 <- NA
for (l in 1:length(unique.trial)){
  for (j in 1:length(unique.person)){
    for (s in 1:length(unique.session)){
      index.ljs <- which(data$trial==unique.trial[l] &
        data$person==unique.person[j] &
        data$session==unique.session[s])
      data.ljs <- data[index.ljs,]
      y.ljs <- data.ljs$y
      d.ljs <- data.ljs$d
      c.ljs <- data.ljs$c
      data[index.ljs,'y.lag.1'] <- c(NA, y.ljs[-length(y.ljs)])
      data[index.ljs,'d.lag.1'] <- c(NA, d.ljs[-length(d.ljs)])
      data[index.ljs,'c.lag.1'] <- c(NA, c.ljs[-length(c.ljs)])
}}}

Line 1 initializes the lag variables in the data. Lines 2-4 indicate that we’re creating the lag covariates for each unique combination of trial, person, and session. For example, if for trial=3, person=11, and session=1 we have y=[1,0,0,1,1,0,1,0,1] over nine time points, we want y.lag.1=[NA,1,0,0,1,1,0,1,0]. Lines 5-7 obtain the row indices of the data corresponding to the current combination of variables, and lines 8-11 extract the variables from which we’ll create the lag variables. Lines 12-14 create the lag variables, and insert them into the data.
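As a quick check of the lagging on lines 12-14, applying it to the example vector above:

y.example <- c(1, 0, 0, 1, 1, 0, 1, 0, 1)
c(NA, y.example[-length(y.example)])
## [1] NA  1  0  0  1  1  0  1  0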

Before modeling, we’ll define the necessary variables in the data as factors and add a contrast attribute to the y.lag.1 variable:

data$person <- as.factor(data$person)
data$item <- as.factor(data$item)
data$session <- as.factor(data$session)
data$y.lag.1 <- as.ordered(data$y.lag.1)
contrasts(data$y.lag.1) <- 'contr.treatment'

Lines 1-3 re-code the person, item, and session variables as factors. Line 4 defines the lag-1 outcome covariate y.lag.1 as an ordered factor, and line 5 assigns a treatment contrast (contr.treatment) to it.
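A toy illustration of what lines 4-5 do (the vector f here is hypothetical, used only to show the coding):

f <- as.ordered(c(0, 1, 0, 1))     # ordered factor with levels 0 < 1
contrasts(f) <- 'contr.treatment'  # replace the default polynomial contrasts
contrasts(f)                       # a single treatment-coded column: 0 for level "0", 1 for level "1"

In mgcv, a smooth with an ordered-factor by variable (as used in the models below) estimates a difference smooth for each level above the reference level, which is why y.lag.1 is coded this way.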

The first model we’ll create is the null model, model.null. This null model differs from the one previously examined when exploring the spatial correlations and spatial by temporal interactions, as it no longer uses the time.block variable. The syntax for the null model is specified as follows:

model.null <- bam(y ~ 1 + s(person, bs="re") + s(item, bs="re"), 
data=data, family=binomial, method="ML")

The family=binomial argument specifies the random component as binomial (binary) data with a logit link, and the method="ML" argument selects maximum likelihood as the estimation method. The syntax for model.null is based on the following equation:

\[\begin{equation} {\rm logit}[P(y_{tlji} = 1 | \theta_{j}, \zeta_{i})] = x^{'}\delta_{0} + z_{j}^{'}\theta_{j} + z_{i}^{'}\zeta_{i}, \end{equation}\]

where \(x^{'}\) is a design matrix for fixed intercept parameter \(\delta_{0}\), \(z_{j}\) is a design matrix for parametric random person effect \(\theta_{j}\), and \(z_{i}\) is a design matrix for parametric random item effect \(\zeta_{i}\).

The second model we’ll run, model.A, adds the lag-1 outcome covariate y.lag.1, as well as smooth functions of time:

model.A <- bam(y ~ 1 + y.lag.1 + 
s(time,by=y.lag.1,bs="cr",k=10) + # Added to model.null
s(time,bs="cr",k=10) + # Added to model.null
s(person,bs="re") + s(item,bs="re"), 
data=data, family=binomial, method="ML")

The syntax for model.A is based on the following equation:

\[\begin{equation} {\rm logit}[P(y_{tlji} = 1 | \theta_{j}, \zeta_{i})] = x^{'}\delta_{0} + z_{j}^{'}\theta_{j} + z_{i}^{'}\zeta_{i} + f_{1}(time_{t}) + y^{'}_{(t-1)lji}f_{2}(time_{t}), \end{equation}\]

where \(f_{1}(time_{t})\) is a univariate smooth function to model nonlinear (potentially non-monotonic) trend over time, and \(y^{'}_{(t-1)lji}f_{2}(time_{t})\) is a univariate smooth function of time for the first-order temporal lag covariate.

In the syntax for model.A, the binary variable y is predicted using a fixed intercept (1), the parametric effect of the lag-1 outcome (y.lag.1), a cubic regression spline of time for the lag-1 outcome covariate (s(time,by=y.lag.1,bs="cr",k=10)), a cubic regression spline of time for the overall trend (s(time,bs="cr",k=10)), and random person and item effects (s(person,bs="re") and s(item,bs="re")).

The third model we’ll run, model.B, adds the trial variable, as well as the interaction between trial and time:

model.B <- bam(y ~ 1 + y.lag.1 + 
s(time,by=y.lag.1,bs="cr",k=10) + 
s(time,bs="cr",k=10) + 
s(trial,bs="cr",k=10) +  # Added to model.A
ti(trial,time,bs="cr",k=10) + # Added to model.A 
s(person,bs="re") + s(item,bs="re"), 
data=data, family=binomial, method="ML")

The syntax for model.B is based on the following equation:

\[\begin{equation} {\rm logit}[P(y_{tlji} = 1 | \theta_{j}, \zeta_{i})] = x^{'}\delta_{0} + z_{j}^{'}\theta_{j} + z_{i}^{'}\zeta_{i} + f_{1}(time_{t}) + y^{'}_{(t-1)lji}f_{2}(time_{t}) + f_{3}(trial_{l}) + f_{4}(time_{t},trial_{l}), \end{equation}\]

where \(f_{3}(trial_{l})\) is a univariate smooth function to model nonlinear (potentially non-monotonic) trend over trials, and \(f_{4}(time_{t},trial_{l})\) is a bivariate smooth function to model the interaction between time and trials.

In the syntax for model.B, the following effects were added to model.A: a cubic regression spline of trial (s(trial,bs="cr",k=10)) to model trend over trials, and a tensor product interaction between trial and time (ti(trial,time,bs="cr",k=10)).

The fourth model we’ll run, model.C, adds smooth functions of time for the d.lag.1 and c.lag.1 covariates:

model.C <- bam(y ~ 1 + y.lag.1 + 
s(time,by=y.lag.1,bs="cr",k=10) + 
s(time,by=d.lag.1,bs="cr",k=10) + s(time,by=c.lag.1,bs="cr",k=10) + # Added to model.B
s(time,bs="cr",k=10) + 
s(trial,bs="cr",k=10) + 
ti(trial,time,bs="cr",k=10) +  
s(person,bs="re") + s(item,bs="re"), 
data=data, family=binomial, method="ML")

The syntax for model.C is based on the following equation:

\[ \begin{aligned} {\rm logit}[P(y_{tlji} = 1 | \theta_{j}, \zeta_{i})] = x^{'}\delta_{0} + z_{j}^{'}\theta_{j} + z_{i}^{'}\zeta_{i} + f_{1}(time_{t}) + y^{'}_{(t-1)lji}f_{2}(time_{t}) + f_{3}(trial_{l}) + f_{4}(time_{t},trial_{l}) \\ + \; d^{'}_{(t-1)lji}f_{5}(time_{t}) + c^{'}_{(t-1)lji}f_{6}(time_{t}), \end{aligned} \]

where \(d^{'}_{(t-1)lji}f_{5}(time_{t})\) and \(c^{'}_{(t-1)lji}f_{6}(time_{t})\) are the univariate smooth functions of time for the first-order spatial lag covariates for the target and for the competitor, respectively.

In the syntax for model.C, the following effects were added to model.B: cubic regression splines of time for the lag-1 target-distance covariate (s(time,by=d.lag.1,bs="cr",k=10)) and for the lag-1 competitor-distance covariate (s(time,by=c.lag.1,bs="cr",k=10)).

The fifth model we’ll run, model.D, adds the X and Y (coordinate) variables:

model.D <- bam(y ~ 1 + y.lag.1 + 
s(time,by=y.lag.1,bs="cr",k=10) + 
s(time,by=d.lag.1,bs="cr",k=10) + s(time,by=c.lag.1,bs="cr",k=10) + 
s(time,bs="cr",k=10) + 
s(trial, bs="cr",k=10) +  
ti(trial,time,bs="cr",k=10) +  
s(X,Y,bs="tp",k=10) + # Added to model.C
s(person,bs="re") + s(item,bs="re"), 
data=data, family=binomial, method="ML")

The syntax for model.D is based on the following equation:

\[ \begin{aligned} {\rm logit}[P(y_{tlji} = 1 | \theta_{j}, \zeta_{i})] = x^{'}\delta_{0} + z_{j}^{'}\theta_{j} + z_{i}^{'}\zeta_{i} + f_{1}(time_{t}) + y^{'}_{(t-1)lji}f_{2}(time_{t}) + f_{3}(trial_{l}) + f_{4}(time_{t},trial_{l})\\ + \; d^{'}_{(t-1)lji}f_{5}(time_{t}) + c^{'}_{(t-1)lji}f_{6}(time_{t}) + f_{7}(xposition_{tlji},yposition_{tlji}), \end{aligned} \] where \(f_{7}(xposition_{tlji},yposition_{tlji})\) is a bivariate smooth function to model spatial correlations in the binary fixation data based on the \(x\) and \(y\)-coordinates.

In the syntax for model.D, the following effect was added to model.C: a thin-plate regression spline over the \(x\)- and \(y\)-coordinates (s(X,Y,bs="tp",k=10)) to model spatial correlations.

Model Selection

Prior to adding any additional effects (e.g., experimental condition effects), we consider the models specified in the previous section with different effects to find the best model (regarding both fit and parsimony) to explain the correlations and variabilities in the data. The corrected Akaike information criterion (AIC; Wood et al., 2016) and the Bayesian information criterion (BIC; Schwarz, 1978) were used to compare these models, as they reward good model fit while punishing model complexity.

Because there were convergence problems with model.D (from adding the \(x\) and \(y\)-coordinate variables), it was not considered during model selection. A summary table comparing the remaining candidate models is created as follows:

model.list <- list(model.null, model.A, model.B, model.C)
model.summary <- data.frame(LogLikelihood = sapply(model.list, FUN=logLik.gam),
AIC = sapply(model.list, FUN=AIC),
BIC = sapply(model.list, FUN=BIC),
Deviance = sapply(model.list, deviance),
D_e = sapply(model.list, FUN=function(x){100*(deviance(model.null) - deviance(x))/deviance(model.null)}))

model.summary$D_difference <- c(NA, model.summary$D_e[2:4] - 
model.summary$D_e[1:3])

rownames(model.summary) <- c("Null Model", "Model A", "Model B", "Model C")

Line 1 creates a list of the different candidate models to be compared. Lines 2-6 create the model.summary data frame containing the log likelihood, corrected AIC, BIC, deviance, and the deviance explained for each model. The deviance explained is the percentage of the deviance in the null model that is explained (reduced), and is calculated for model \(m\) (where \(m = null,A,B,C\)) with deviance \(D_{m}\) as: \(D_{e,m} = (\frac{D_{null} - D_{m}}{D_{null}}) \times 100\). Lines 8-9 calculate the difference in deviance explained between successive models, indicating the deviance explained by the variables added to a model (relative to the previous model). Line 11 names the rows of the summary table.
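For example, plugging the deviances of the null model and Model A from the table below into this formula:

(1144937.46 - 90756.13)/1144937.46 * 100  # 92.07327, the D_e value reported for Model A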

Calling model.summary gives the following:

##            LogLikelihood        AIC        BIC   Deviance      D_e D_difference
## Null Model    -572468.73 1145071.36 1145857.31 1144937.46  0.00000           NA
## Model A        -45378.07   90915.11   91848.31   90756.13 92.07327 92.073267241
## Model B        -45365.06   90904.23   91926.25   90730.12 92.07554  0.002271755
## Model C        -45203.53   90603.63   91757.53   90407.06 92.10376  0.028216987

Model C is the best of the candidate models regarding spatial and temporal effects, having the lowest corrected AIC (\(AIC = 90603.63\)) and BIC (\(BIC = 91757.53\)). Although the variables added to model.B (beyond those included in model.A) and the variables added to model.C (beyond those included in model.B) made minimal improvements to the deviance explained, these improvements were considered enough to justify their inclusion in the model (based on corrected AIC and BIC). Based on these results, we selected Model C regarding the spatial and temporal effects.

Next, we added experimental condition effects (i.e., the variables beyond spatial and temporal effects) to model.C. This includes all possible main effects, two-way interactions, and three-way interactions among the talker (talker.e), session (session.e), and sleep (sleep.1 and sleep.2) variables, with the exception that sleep.1 and sleep.2 are never interacted with each other.

These effects were added to model.C to create model.E using the following code:

model.E <- bam(y ~ 1 + y.lag.1 + 
s(time,by=y.lag.1,bs="cr",k=10) + 
s(time,by=d.lag.1,bs="cr",k=10) + s(time,by=c.lag.1,bs="cr",k=10) + 
s(time,bs="cr",k=10) + 
s(trial,bs="cr",k=10) +  
ti(trial,time,bs="cr",k=10) + 
s(person,bs="re") + s(item,bs="re") +
talker.e*session.e*sleep.1 + # Added to model.C
talker.e*session.e*sleep.2,  # Added to model.C
data=data, family=binomial, method="ML")

The main effects, two-way interactions, and three-way interactions among talker.e, session.e, sleep.1, and sleep.2 are added in lines 8-9.

Calling summary(model.E) gives the following output:

Family: binomial 
Link function: logit 

Formula:
y ~ 1 + y.lag.1 + s(time, by = y.lag.1, bs = "cr", k = 10) + 
    s(time, by = d.lag.1, bs = "cr", k = 10) + s(time, by = c.lag.1, 
    bs = "cr", k = 10) + s(time, bs = "cr", k = 10) + s(trial, 
    bs = "cr", k = 10) + ti(trial, time, bs = "cr", k = 10) + 
    s(person, bs = "re") + s(item, bs = "re") + talker.e * session.e * 
    sleep.1 + talker.e * session.e * sleep.2

Parametric coefficients:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                -3.99834    0.10383 -38.508  < 2e-16 ***
y.lag.11                    8.85873    0.04614 192.002  < 2e-16 ***
talker.e                    0.13644    0.02177   6.268 3.65e-10 ***
session.e                   0.07423    0.02175   3.412 0.000644 ***
sleep.1                    -0.09051    0.18815  -0.481 0.630472    
sleep.2                     0.17862    0.16291   1.096 0.272911    
talker.e:session.e         -0.08665    0.04347  -1.993 0.046254 *  
talker.e:sleep.1           -0.03782    0.06146  -0.615 0.538344    
session.e:sleep.1           0.03304    0.06164   0.536 0.591914    
talker.e:sleep.2           -0.04486    0.05313  -0.844 0.398469    
session.e:sleep.2          -0.21079    0.05319  -3.963 7.40e-05 ***
talker.e:session.e:sleep.1 -0.09947    0.12287  -0.810 0.418190    
talker.e:session.e:sleep.2  0.03778    0.10628   0.355 0.722256    
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                    edf Ref.df  Chi.sq  p-value    
s(time):y.lag.11  6.285  7.377  283.21  < 2e-16 ***
s(time):d.lag.1   2.042  2.081  219.58  < 2e-16 ***
s(time):c.lag.1   6.278  7.221   86.49 1.27e-15 ***
s(time)           6.790  7.667  452.79  < 2e-16 ***
s(trial)          2.270  2.827   15.16  0.00175 ** 
ti(trial,time)    3.343  4.340   12.57  0.02141 *  
s(person)        55.434 57.000 1645.49  < 2e-16 ***
s(item)           6.794  7.000  272.62  < 2e-16 ***
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =   0.96   Deviance explained = 92.5%
-ML = 1.3047e+06  Scale est. = 1         n = 927286

Similar to the comparisons made between the null model and models A-C, model.E (with experimental condition effects) can be compared to model.C (without experimental condition effects) as follows:

model.list <- list(model.C, model.E)
model.summary.2 <- data.frame(LogLikelihood = sapply(model.list, FUN=logLik.gam),
AIC = sapply(model.list, FUN=AIC),
BIC = sapply(model.list, FUN=BIC),
Deviance = sapply(model.list, deviance),
D_e = sapply(model.list, FUN=function(x){100*(deviance(model.null) - deviance(x))/deviance(model.null)}))

model.summary.2$D_difference <- c(NA, model.summary.2$D_e[2] - model.summary.2$D_e[1])

rownames(model.summary.2) <- c("Model C", "Model E")

Calling model.summary.2 gives the following:

##         LogLikelihood      AIC      BIC Deviance      D_e D_difference
## Model C     -45203.53 90603.63 91757.53 90407.06 92.10376           NA
## Model E     -45166.45 90547.40 91806.51 90332.90 92.11023  0.006477226

The experimental effects added to model.C to create model.E made a small improvement to the deviance and the deviance explained. Based on corrected AIC (which was smaller for model.E), this decrease in deviance justified the additional parameters created by adding the experimental effects. However, based on BIC (which punishes the number of parameters more harshly than corrected AIC), the additional parameters did not result in a large enough decrease in deviance to justify their inclusion. Based on these results, we selected model.E as the final model. You can refer to pp. 20-22 in Cho et al. (2022) for interpretations regarding the results of model.E.

The smooth functions in model.E can be plotted by calling plot(model.E, select = 1), where the value for the select argument determines which smooth function is plotted (in the order of inclusion in the syntax of model.E): the time-varying autocorrelations (select=1), the target-distance autocorrelation (select=2), the competitor-distance autocorrelation (select=3), the effect of time (select=4), the effect of trial (select=5), the interaction between time and trial (select=6), the person random effect (select=7), and the item random effect (select=8). For example, calling plot(model.E, select = 4) plots the smooth function for the effect of time (s(time, bs = “cr”, k = 10) in the syntax for model.E):

Standard deviations and 95% credible intervals for these smooth functions can be reported by calling gam.vcomp(model.E).

As can be seen in the syntax for model.E, values for the basis dimension (\(k\), see Cho et al. [2022] for details) were selected as \(k=10\) for the majority of the smooth functions used. To evaluate whether \(k=10\) is sufficiently large for each of these smooth functions, gam.check(model.E, type="pearson") can be called, which gives the following output:

Method: ML   Optimizer: outer newton
full convergence after 5 iterations.
Gradient range [-0.05549542,0.06283272]
(score 1304663 & scale 1).
Hessian positive definite, eigenvalue range [0.02596176,27.61129].
Model rank =  209 / 209 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

                    k'   edf k-index p-value  
s(time):y.lag.11  9.00  6.28    0.97   0.080 .
s(time):d.lag.1  10.00  2.04    0.97   0.055 .
s(time):c.lag.1  10.00  6.28    0.97   0.150  
s(time)           9.00  6.79    0.97   0.105  
s(trial)          9.00  2.27    0.99   0.650  
ti(trial,time)   81.00  3.34    0.98   0.420  
s(person)        60.00 55.43      NA      NA  
s(item)           8.00  6.79      NA      NA  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

For each smooth function, a \(k\)-index close to 1, an \(edf\) (effective degrees of freedom) well below \(k'\), and a \(p\)-value greater than .05 all indicate that the selected value of \(k\) is sufficiently large. Note that the last two smooth functions (for the person random effect and the item random effect) do not have \(k\)-indices or \(p\)-values because values of \(k\) were not specified for these random-effect smooths in the syntax for model.E (default values were used instead). Based on these results, we can conclude that \(k=10\) was sufficiently large for each smooth function.

Model Evaluation

This section evaluates model.E regarding its adequacy as a model for describing the data.

First, we’ll plot the residuals of model.E over the range of fitted values as follows:

fit.E <- fitted(model.E)
residual.E <- residuals(model.E, type="pearson")
hist(residual.E, xlab="Pearson Residuals", main="Histogram of Pearson Residuals")
plot(fit.E, residual.E, xlab="Fitted Values", ylab="Pearson Residuals")

Lines 1-2 extract the fitted values and Pearson residuals (respectively) of model.E. Line 3 plots a histogram of the residuals, and line 4 plots the fitted values against the residuals to further examine where the largest residuals occurred. Line 3 gives the following plot:

Although several residuals were very large in magnitude (ranging from -37.48 to 43.59), 99.1% of the Pearson residuals were within an acceptable range of \(\pm1.96\), with \(M=-0.002\) and \(SD=0.988\).
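These summary statistics can be reproduced from residual.E (the values shown are those reported above):

mean(abs(residual.E) <= 1.96)  # proportion within +/- 1.96 (0.991)
range(residual.E)              # approximately -37.48 to 43.59
mean(residual.E)               # approximately -0.002
sd(residual.E)                 # approximately 0.988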

Line 4 gives the following plot:

As seen in the plot above, the most extreme residuals that were observed occurred at the extremes of the distribution of fitted values.

Second, we’ll look at the Somers’ rank correlation between the binary data (y) and the predicted probabilities based on model.E, using the somers2 function in the Hmisc package:

library(Hmisc)
somers2(fit.E, as.numeric(na.omit(data$y)))

The Somers’ rank correlation (Dxy in the resulting output) was .988, indicating that the predictions from model.E (which range from 0 to 1) correlated strongly with the binary outcome.
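The somers2 output also reports the concordance probability C, which is related to the rank correlation by \(D_{xy} = 2(C - 0.5)\); a \(D_{xy}\) of .988 therefore corresponds to \(C = .994\).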

Both of these results indicate that model.E describes the data sufficiently well.

References

Chatfield, C. (2004). The analysis of time series: An introduction (6th ed.). Chapman & Hall/CRC. https://doi.org/10.1007/978-1-4899-2921-1

Cho, S.-J., Brown-Schmidt, S., De Boeck, P., & Naveiras, M. (2022). Space-time modeling of intensive binary time series eye-tracking data using a generalized additive logistic regression model. Psychological Methods. https://doi.org/10.1037/met0000444

Pebesma, E. J. (2004). Multivariable geostatistics in S: The gstat package. Computers & Geosciences, 30(7), 683-691.

Pebesma, E. J., & Bivand, R. S. (2005). Classes and methods for spatial data in R. R News, 5(2), 9-13. https://CRAN.R-project.org/doc/Rnews/

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. New York, NY: Springer-Verlag. https://ggplot2.tidyverse.org.

Wood, S. N., Pya, N., & Säfken, B. (2016). Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association, 111(516), 1548-1563. https://doi.org/10.1080/01621459.2016.1180986