Funding Source: This work was supported by the National Science Foundation (SES1851690, https://www.nsf.gov/awardsearch/showAward?AWD_ID=1851690). Illustrative data collection was supported by the National Institute on Deafness and Other Communication Disorders (R01 DC011755).
This website provides a tutorial for fitting generalized additive logistic regression models, as presented in Cho, Brown-Schmidt, De Boeck, and Naveiras (2022, https://psycnet.apa.org/doiLanding?doi=10.1037%2Fmet0000444), using R. This tutorial uses the illustrative data set in Cho et al. (2022) to demonstrate how generalized additive logistic regression models, framed as generalized additive mixed models (GAMMs), can be applied to intensive binary time-series eye-tracking data with complex spatial-temporal effects. The structure of this tutorial is as follows:
This section will show what your data should (roughly) resemble when using this tutorial.
First, read your data file into R as a data frame:
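For example, if your data are stored in a CSV file (the file name below is only a placeholder for your own file), they can be read in and previewed as follows:
data <- read.csv("your_data.csv")  # replace with the path to your own data file
head(data, 10)                     # preview the first 10 rows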
## y person trial item talker session sleep1 sleep2 time
## 1 0 1 20 3 1 1 0.25 -0.5 180
## 2 0 1 20 3 1 1 0.25 -0.5 190
## 3 0 1 20 3 1 1 0.25 -0.5 200
## 4 0 1 20 3 1 1 0.25 -0.5 210
## 5 0 1 20 3 1 1 0.25 -0.5 220
## 6 0 1 20 3 1 1 0.25 -0.5 230
## 7 0 1 20 3 1 1 0.25 -0.5 240
## 8 0 1 20 3 1 1 0.25 -0.5 250
## 9 0 1 20 3 1 1 0.25 -0.5 260
## 10 0 1 20 3 1 1 0.25 -0.5 270
Your data should be in long format, with each row containing the data for a single time point. We’ll define new variables for both (a) the levels of each variable (unique.variablename) and (b) the number of levels for each variable (number.variablename) as follows:
unique.y <- sort(unique(data$y))
unique.person <- sort(unique(data$person))
unique.trial <- sort(unique(data$trial))
unique.item <- sort(unique(data$item))
unique.talker <- sort(unique(data$talker))
unique.session <- sort(unique(data$session))
unique.time <- sort(unique(data$time))
number.y <- length(unique.y) # 2, because y is a binary outcome (0/1)
number.person <- length(unique.person) # 60
number.trial <- length(unique.trial) # 64
number.item <- length(unique.item) # 8
number.talker <- length(unique.talker) # 2
number.session <- length(unique.session) # 2
number.time <- length(unique.time) # 132
In this section we’ll show how to calculate empirical logits (\(empirical \; logit = log(\frac{proportion}{1 - proportion})\)) for your data, and how to use those empirical logits to explore change processes (autocorrelations and trend). We’ll then plot the autocorrelations, partial autocorrelations, and trend (for examining temporal dependence), Pearson residuals (for examining spatial correlations), and the semivariogram (for examining spatial by temporal interactions) using the ggplot2 package (Wickham, 2016).
First, we’ll create a data frame called empirical.logit containing the empirical logit calculated for each unique combination of time, trial, and session. Note that we don’t include person or item here because the empirical logit will be calculated across persons and because the items are nested within trials. The proportion is calculated as the proportion of correct responses (\(y=1\)) for each unique combination of time, trial, and session (\(proportion_{tls} = mean(y_{tls})\) for time point \(t\), trial \(l\), and session \(s\)). The empirical logit is then calculated based on the proportions: \((empirical \; logit)_{tls} = log(\frac{proportion_{tls}}{1 - proportion_{tls}})\). The proportions and empirical logits are added to the empirical.logit data frame as follows:
rowcount <- number.time*number.trial*number.session
empirical.logit <- data.frame(time=rep(NA,rowcount),trial=rep(NA,rowcount),
session=rep(NA,rowcount),proportion=rep(NA,rowcount),empirical.logit=rep(NA,rowcount))
index <- 1
for (t in 1:number.time){
for (l in 1:number.trial){
for (s in 1:number.session){
y.tls <- data[(data$time==unique.time[t] & data$trial==unique.trial[l] &
data$session==unique.session[s]),'y']
proportion <- mean(y.tls)
prop <- min(max(10^-9, proportion), 1 - 10^-9)
emp.logit <- log(prop/(1 - prop))
empirical.logit[index,] <- c(unique.time[t], unique.trial[l], unique.session[s],
proportion, emp.logit)
index <- index + 1
}}}
empirical.logit$time <- as.factor(empirical.logit$time)
Lines 1-4 initialize the empirical.logit data frame and the row index (for each unique combination of time, trial, and session). Lines 6-8 loop through each unique combination of time, trial, and session. Line 11 calculates the proportion, and lines 12-13 calculate the empirical logit. Line 12 places lower and upper bounds on the proportion when calculating the empirical logit, for extreme cases where the proportion approaches 0 or 1 (which would otherwise result in empirical logits of \(-\infty\) or \(+\infty\), respectively). Lines 14-16 input the calculations into the empirical.logit data frame before updating the row index. Line 18 sets the time variable in empirical.logit to a factor (which is necessary when plotting trend).
Looking at the resulting empirical.logit data frame, we now have the calculated empirical logit values:
## time trial session proportion empirical.logit
## 1 180 1 1 0.16666667 -1.609438
## 2 180 1 2 0.11666667 -2.024382
## 3 180 2 1 0.20000000 -1.386294
## 4 180 2 2 0.10000000 -2.197225
## 5 180 3 1 0.08333333 -2.397895
## 6 180 3 2 0.16666667 -1.609438
## 7 180 4 1 0.10000000 -2.197225
## 8 180 4 2 0.15000000 -1.734601
## 9 180 5 1 0.11666667 -2.024382
## 10 180 5 2 0.11666667 -2.024382
In this section we’ll calculate the autocorrelations and partial autocorrelations using the calculated empirical logits. Autocorrelations and partial autocorrelations will be calculated for each unique combination of trial and session, and for each time.lag ranging from 1 to time.lag.max (for this example, we’ll use a time.lag.max of 20, meaning we’ll examine autocorrelations and partial autocorrelations for each time lag from 1 to 20):
time.lag.max <- 20
rowcount <- time.lag.max*number.trial*number.session
AC.PAC <- data.frame(time.lag=rep(NA,rowcount), trial=rep(NA,rowcount),
session=rep(NA,rowcount), AC=rep(NA,rowcount), PAC=rep(NA,rowcount))
index <- 1
for (l in 1:length(unique.trial)){
for (s in 1:length(unique.session)){
empirical.logit.ls <- empirical.logit[(empirical.logit$trial==unique.trial[l] &
empirical.logit$session==unique.session[s]),'empirical.logit']
ac <- acf(empirical.logit.ls, lag.max = time.lag.max, plot=FALSE)
ac <- ac$acf[2:(time.lag.max + 1)]
pac <- c(pacf(empirical.logit.ls, lag.max = time.lag.max,
na.action=na.pass, plot=FALSE)$acf)
for (time.lag in 1:time.lag.max){
AC.PAC[(index + time.lag - 1),] <- c(time.lag, unique.trial[l], unique.session[s],
ac[time.lag], pac[time.lag])
}
index <- index + time.lag.max
}}
AC.PAC$time.lag <- as.factor(AC.PAC$time.lag)
Line 1 selects the maximum time lag that we’re interested in (for this example, we’ll look at time lags ranging from 1 to 20). Lines 2-5 initialize the AC.PAC data frame and the row index. Lines 7-8 loop through each unique combination of trial and session. Lines 11-14 calculate the autocorrelations and partial autocorrelations based on the empirical logits we previously calculated; note that in line 12 we exclude the first autocorrelation (the autocorrelation at time.lag=0), which we aren’t interested in. Lines 15-18 update the AC.PAC data frame for each unique combination of trial, session, and time.lag to include the autocorrelations and partial autocorrelations. Line 21 redefines time.lag as a factor (which is necessary when plotting autocorrelations and partial autocorrelations in the following section).
Looking at AC.PAC, we should now have the calculated values for both autocorrelations and partial autocorrelations:
## time.lag trial session AC PAC
## 1 1 1 1 0.9804831 0.98048310
## 2 2 1 1 0.9597471 -0.04139363
## 3 3 1 1 0.9386191 -0.01977484
## 4 4 1 1 0.9162839 -0.04160785
## 5 5 1 1 0.8936426 -0.01767877
## 6 6 1 1 0.8719185 0.01280519
## 7 7 1 1 0.8496486 -0.02616967
## 8 8 1 1 0.8288226 0.02636468
## 9 9 1 1 0.8112830 0.07066020
## 10 10 1 1 0.7935337 -0.02037809
In this step, we’ll use ggplot2 to plot figures for the autocorrelations and partial autocorrelations:
library(ggplot2)
AC.plot.1 <- ggplot(AC.PAC[AC.PAC$session==1,], aes(x=time.lag, y=AC)) +
geom_boxplot() +
labs(title = "Figure 1: Session 1 Autocorrelation", x = "Time Lag",
y = "Autocorrelation for Trials")
AC.plot.2 <- ggplot(AC.PAC[AC.PAC$session==2,], aes(x=time.lag, y=AC)) +
geom_boxplot() +
labs(title = "Figure 2: Session 2 Autocorrelation", x = "Time Lag",
y = "Autocorrelation for Trials")
PAC.plot.1 <- ggplot(AC.PAC[AC.PAC$session==1,], aes(x=time.lag, y=PAC)) +
geom_boxplot() +
labs(title = "Figure 3: Session 1 Partial Autocorrelation", x = "Time Lag",
y = "Autocorrelation for Trials")
PAC.plot.2 <- ggplot(AC.PAC[AC.PAC$session==2,], aes(x=time.lag, y=PAC)) +
geom_boxplot() +
labs(title = "Figure 4: Session 2 Partial Autocorrelation", x = "Time Lag",
y = "Autocorrelation for Trials")
Line 1 loads the ggplot2 library. Lines 3-6 plot the autocorrelations for session 1. Similarly, lines 8-11, 13-16, and 18-21 plot the autocorrelations for session 2, the partial autocorrelations for session 1, and the partial autocorrelations for session 2 (respectively).
Calling AC.plot.1 gives the following:
The box plot of correlograms across trials presented in Figure 1 above shows that the autocorrelations did not converge to zero, even at large values of lag. This pattern is typically observed when a time series also exhibits trend (Chatfield, 2004, p. 26).
You should expect a figure somewhat similar to Figure 1 above, with the autocorrelation decreasing as a function of the time lag.
Calling PAC.plot.1 gives the following:
You should expect a figure somewhat similar to Figure 3 above, with high partial autocorrelations at time.lag=1 that drop off quickly for time.lag > 1. This suggests that first-order autocorrelations should be considered.
In this section, we’ll use ggplot2 to plot figures for the empirical logit values we calculated above. Each figure will illustrate the empirical logit across trials for each session over time:
time.breaks <- seq(0,1500,by=100)
empirical.logit.plot.1 <- ggplot(empirical.logit[empirical.logit$session==1,],
aes(x=time, y=empirical.logit)) +
geom_boxplot() +
labs(title = "Figure 5: Trend", x = "Time", y = "Empirical Logit") +
scale_x_discrete(breaks=time.breaks)
empirical.logit.plot.2 <- ggplot(empirical.logit[empirical.logit$session==2,],
aes(x=time, y=empirical.logit)) +
geom_boxplot() +
labs(title = "Figure 6: Session 2 Trend", x = "Time", y = "Empirical Logit") +
scale_x_discrete(breaks=time.breaks)
Line 1 sets the discrete time breaks for the tick marks of the \(x\)-axis. For this example, because time ranges from 180 to 1490 in increments of 10, we used tick marks from 0 to 1500 in increments of 100 (to keep the \(x\)-axis from appearing cluttered). Lines 3-7 and lines 9-13 create the plots of empirical logits for session 1 and for session 2, respectively.
Calling empirical.logit.plot.1 gives the following plot:
You should expect a figure somewhat resembling the figure above. The overall trend was positive and (approximately) linear over time. Fitted lines over time were similar between the linear function and a kernel-weighted local polynomial smoothing function, and only small deviations from the linear trend were observed.
Similar patterns in autocorrelations, partial autocorrelations, and trend were observed for session=2 as those presented for session=1. Based on these exploratory analyses of change processes (linear trend and first-order autocorrelations), we considered a generalized additive logistic regression model to account for the change processes in these binary data.
In this section, we’ll plot the Pearson residuals based on a null model (Equation 5 in Cho et al., 2022) to examine spatial correlations within our data. Rather than modeling across all time points, we’ll reduce the complexity of the data and the resulting model by collapsing the time variable into blocks of time points.
First, we’ll separate the 132 time points (ranging from 180 to 1490 in increments of 10) into 6 equally-sized time blocks containing 22 time points each, which we’ll include in the data as the time.block variable:
time.breaks <- c(390, 610, 830, 1050, 1270)
data$time.block <- rep(1,length(data$time))
for (time.break in 1:5){
data$time.block[which(data$time > time.breaks[time.break])] <- time.break + 1
}
The time.breaks in line 1 are the thresholds between time blocks, splitting time into 6 equally-sized time blocks. Lines 2-5 create the time.block variable, with each value of time.block (ranging from 1 to 6) equal to one more than the number of time.breaks (ranging from 0 to 5) that time exceeds. For example, values of time equal to 200, 450, and 1300 would correspond to time.block values of 1, 2, and 6, respectively.
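As an optional cross-check (not part of the original tutorial), the same blocks can be constructed in a single vectorized call with findInterval; the left.open = TRUE argument reproduces the strict inequality (data$time > time.breaks[time.break]) used in the loop above:
# Vectorized cross-check of the time blocks created by the loop above
time.block.check <- findInterval(data$time, time.breaks, left.open = TRUE) + 1
all(time.block.check == data$time.block)  # should return TRUE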
Second, we’ll calculate the Pearson residuals based on a null model with these time blocks using the bam function in the mgcv package:
library(mgcv)
m0 <- bam(y ~ 1 + s(person, by=time.block, bs="re") + s(item, by=time.block, bs="re"),
data=data, family=binomial, method="ML")
m0.residuals <- residuals(m0, type="pearson")
m0.data <- data.frame(residuals=m0.residuals, time.block=data$time.block, X=data$X, Y=data$Y)
Line 1 loads the mgcv package used to create the null model in lines 2-3. Lines 4-5 extract the Pearson residuals from the null model and store them alongside the \(x\) and \(y\)-coordinates X and Y (not to be confused with the binary outcome y) in the m0.data data frame.
Third, these residuals and coordinates are plotted in a bubble plot using the bubble function in the sp package (Pebesma & Bivand, 2005):
library(sp)
coordinates(m0.data) <- c("X","Y")
bubble(m0.data, "residuals", col=c("black","grey"), main = "Pearson Residuals",
xlab="x-coordinates (ranging from -1517 to 2943)", ylab="y-coordinates (ranging from -818 to 2136)")
Line 1 above loads the sp package, line 2 defines the variables in m0.data that represent the \(x\) and \(y\)-coordinates, and lines 3-4 generate the following bubble plot:
The Pearson residuals are plotted by direction (gray for positive, black for negative) and magnitude (small to large). The four areas with large positive (gray) residuals in the above plot correspond with the four images that participants viewed, indicating that there are positive spatial correlations among these four images.
In this section, we’ll use the Pearson residuals obtained in the previous section to calculate and plot semivariograms, examining spatial dependence in the \(x\) and \(y\)-coordinates over time to explore spatial by temporal interactions. Rather than examining the semivariogram across all time points (which would be cluttered due to the large amount of data), we’ll plot the semivariogram for individual time blocks using the time.block variable created in the previous section.
The residuals are used to calculate the semivariogram for a given time block using the variogram function in the gstat package (Pebesma, 2004):
library(gstat)
semivariogram <- variogram(residuals ~ 1, data = m0.data[m0.data$time.block==6,])
ggplot(semivariogram, aes(x=dist, y=gamma)) +
geom_point() +
geom_line() +
labs(title = "Semivariogram for Time Block 6", x = "Distance", y = "Semivariance") +
theme_bw()
Line 1 loads the gstat package, and line 2 calculates the semivariogram for time.block=6 using the variogram function, which is then plotted in the remaining lines. The resulting plot is as follows:
We can also use the variogram function to check the isotropy assumption that spatial dependence is the same in every direction for a given time block. For time.block=6, the semivariogram is calculated and plotted by direction as follows:
semivariogram.isotropy <- variogram(residuals ~ 1, data = m0.data[m0.data$time.block==6,],
alpha=c(0,45,90,135))
plot(semivariogram.isotropy)
The alpha argument defines the directions to be evaluated. For this example, the directions of \(0^{\circ}\) (north), \(45^{\circ}\) (north-east), \(90^{\circ}\) (east), and \(135^{\circ}\) (south-east) are evaluated. The resulting output is the following plot:
The strength and patterns of spatial correlation were similar for the four selected directions of north (\(0^{\circ}\)), northeast (\(45^{\circ}\)), east (\(90^{\circ}\)), and southeast (\(135^{\circ}\)), which indicates that isotropy can be assumed.
In this section, we will first set up the data to include several lag covariates that will be used for modeling. This updated data set will then be used to specify generalized additive logistic regression models from which model selection will take place.
Two additional variables in the data that we’re interested in including during modeling are:
- d - the Euclidean distance between the current fixation and the centroid of the target image (‘d’ was used to avoid confusion with ‘t’ for time)
- c - the Euclidean distance between the current fixation and the centroid of the competitor image

We’re going to define lag-1 covariates for y (the outcome), d (target distance), and c (competitor distance). As seen when examining the autocorrelation and partial autocorrelation plots, we have good reason to assume that there are first-order correlations that should be considered when modeling. These lag covariates are added to the data as follows:
data$y.lag.1 <- data$d.lag.1 <- data$c.lag.1 <- NA
for (l in 1:length(unique.trial)){
for (j in 1:length(unique.person)){
for (s in 1:length(unique.session)){
index.ljs <- which(data$trial==unique.trial[l] &
data$person==unique.person[j] &
data$session==unique.session[s])
data.ljs <- data[index.ljs,]
y.ljs <- data.ljs$y
d.ljs <- data.ljs$d
c.ljs <- data.ljs$c
data[index.ljs,'y.lag.1'] <- c(NA,y.ljs[-length(y.ljs)])
data[index.ljs,'d.lag.1'] <- c(NA,d.ljs[-length(d.ljs)])
data[index.ljs,'c.lag.1'] <- c(NA,c.ljs[-length(c.ljs)])
}}}
Line 1 initializes the lag variables in the data. Lines 2-4 indicate that we’re creating the lag covariates for each unique combination of trial, person, and session. For example, if for trial=3, person=11, and session=1 we have y=[1,0,0,1,1,0,1,0,1] over nine time points, we want y.lag.1=[NA,1,0,0,1,1,0,1,0]. Lines 5-7 obtain the row indices of the data corresponding to the current combination of variables, and lines 8-11 extract the variables from which we’ll create the lag variables. Lines 12-14 create the lag variables and insert them into the data.
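To make the lag construction concrete, here is a minimal stand-alone illustration of the c(NA, x[-length(x)]) idiom used above, applied to the example vector:
# Illustration of the lag-1 construction: shift the vector forward by one
# time point and pad the first position with NA
y.example <- c(1, 0, 0, 1, 1, 0, 1, 0, 1)
c(NA, y.example[-length(y.example)])
## [1] NA  1  0  0  1  1  0  1  0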
Before modeling, we’ll define the necessary variables in the data as factors and add a contrast attribute to the y.lag.1 variable:
data$person <- as.factor(data$person)
data$item <- as.factor(data$item)
data$session <- as.factor(data$session)
data$y.lag.1 <- as.ordered(data$y.lag.1)
contrasts(data$y.lag.1) <- 'contr.treatment'
Lines 1-3 re-code the person, item, and session variables as factors, and line 4 re-codes the lag-1 covariate y.lag.1 as an ordered factor. Line 5 assigns a treatment contrast (contr.treatment) to y.lag.1.
The first model we’ll create is the null model, model.null. This null model is different from the one previously examined when plotting the spatial correlations and spatial by temporal interactions, as it no longer uses the time.block variable. The syntax for the null model is specified as follows:
model.null <- bam(y ~ 1 + s(person, bs="re") + s(item, bs="re"),
data=data, family=binomial, method="ML")
The family argument specifies the random component as binomial data, and the method="ML" argument selects maximum likelihood as the estimation method. The syntax for model.null is based on the following equation:
\[\begin{equation} {\rm logit}[P(y_{tlji} = 1 | \theta_{j}, \zeta_{i})] = x^{'}\delta_{0} + z_{j}^{'}\theta_{j} + z_{i}^{'}\zeta_{i}, \end{equation}\]
where \(x^{'}\) is a design matrix for fixed intercept parameter \(\delta_{0}\), \(z_{j}\) is a design matrix for parametric random person effect \(\theta_{j}\), and \(z_{i}\) is a design matrix for parametric random item effect \(\zeta_{i}\).
The second model we’ll run, model.A, adds the lag variable y.lag.1, as well as the time variable:
model.A <- bam(y ~ 1 + y.lag.1 +
s(time,by=y.lag.1,bs="cr",k=10) + # Added to model.null
s(time,bs="cr",k=10) + # Added to model.null
s(person,bs="re") + s(item,bs="re"),
data=data, family=binomial, method="ML")
The syntax for model.A is based on the following equation:
\[\begin{equation} {\rm logit}[P(y_{tlji} = 1 | \theta_{j}, \zeta_{i})] = x^{'}\delta_{0} + z_{j}^{'}\theta_{j} + z_{i}^{'}\zeta_{i} + f_{1}(time_{t}) + y^{'}_{(t-1)lji}f_{2}(time_{t}), \end{equation}\]
where \(f_{1}(time_{t})\) is a univariate smooth function to model nonlinear (potentially non-monotonic) trend over time, and \(y^{'}_{(t-1)lji}f_{2}(time_{t})\) is a univariate smooth function of time for the first-order temporal lag covariate.
In the syntax for model.A, the binary variable y is predicted using the following effects:
- s(person,bs="re") for the random person effect \(\theta_{j}\)
- s(item,bs="re") for the random item effect \(\zeta_{i}\)
- y.lag.1 + s(time,by=y.lag.1,bs="cr",k=10) for the effects of the first-order temporal lag over time \(y^{'}_{(t-1)lji}f_{2}(time_{t})\)
- s(time,bs="cr",k=10) for the nonlinear trend of time \(f_{1}(time_{t})\)

The third model we’ll run, model.B, adds the trial variable, as well as the interaction between trial and time:
model.B <- bam(y ~ 1 + y.lag.1 +
s(time,by=y.lag.1,bs="cr",k=10) +
s(time,bs="cr",k=10) +
s(trial,bs="cr",k=10) + # Added to model.A
ti(trial,time,bs="cr",k=10) + # Added to model.A
s(person,bs="re") + s(item,bs="re"),
data=data, family=binomial, method="ML")
The syntax for model.B is based on the following equation:
\[\begin{equation} {\rm logit}[P(y_{tlji} = 1 | \theta_{j}, \zeta_{i})] = x^{'}\delta_{0} + z_{j}^{'}\theta_{j} + z_{i}^{'}\zeta_{i} + f_{1}(time_{t}) + y^{'}_{(t-1)lji}f_{2}(time_{t}) + f_{3}(trial_{l}) + f_{4}(time_{t},trial_{l}), \end{equation}\]
where \(f_{3}(trial_{l})\) is a univariate smooth function to model nonlinear (potentially non-monotonic) trend over trials, and \(f_{4}(time_{t},trial_{l})\) is a bivariate smooth function to model the interaction between time and trials.
In the syntax for model.B, the following effects were added to model.A:
- s(trial,bs="cr",k=10) for the nonlinear trend of trial \(f_{3}(trial_{l})\)
- ti(trial,time,bs="cr",k=10) for the nonlinear interaction between time and trials \(f_{4}(time_{t},trial_{l})\)

The fourth model we’ll run, model.C, adds the d.lag.1 and c.lag.1 variables:
model.C <- bam(y ~ 1 + y.lag.1 +
s(time,by=y.lag.1,bs="cr",k=10) +
s(time,by=d.lag.1,bs="cr",k=10) + s(time,by=c.lag.1,bs="cr",k=10) + # Added to model.B
s(time,bs="cr",k=10) +
s(trial,bs="cr",k=10) +
ti(trial,time,bs="cr",k=10) +
s(person,bs="re") + s(item,bs="re"),
data=data, family=binomial, method="ML")
The syntax for model.C is based on the following equation:
\[ \begin{aligned} {\rm logit}[P(y_{tlji} = 1 | \theta_{j}, \zeta_{i})] = x^{'}\delta_{0} + z_{j}^{'}\theta_{j} + z_{i}^{'}\zeta_{i} + f_{1}(time_{t}) + y^{'}_{(t-1)lji}f_{2}(time_{t}) + f_{3}(trial_{l}) + f_{4}(time_{t},trial_{l}) \\ + \; d^{'}_{(t-1)lji}f_{5}(time_{t}) + c^{'}_{(t-1)lji}f_{6}(time_{t}), \end{aligned} \]
where \(d^{'}_{(t-1)lji}f_{5}(time_{t})\) and \(c^{'}_{(t-1)lji}f_{6}(time_{t})\) are the univariate smooth functions of time for the first-order spatial lag covariates for the target and for the competitor, respectively.
In the syntax for model.C, the following effects were added to model.B:
- s(time,by=d.lag.1,bs="cr",k=10) for the nonlinear effect of the first-order spatial lag for the target over time \(d^{'}_{(t-1)lji}f_{5}(time_{t})\)
- s(time,by=c.lag.1,bs="cr",k=10) for the nonlinear effect of the first-order spatial lag for the competitor over time \(c^{'}_{(t-1)lji}f_{6}(time_{t})\)

The fifth model we’ll run, model.D, adds the X and Y (coordinate) variables:
model.D <- bam(y ~ 1 + y.lag.1 +
s(time,by=y.lag.1,bs="cr",k=10) +
s(time,by=d.lag.1,bs="cr",k=10) + s(time,by=c.lag.1,bs="cr",k=10) +
s(time,bs="cr",k=10) +
s(trial, bs="cr",k=10) +
ti(trial,time,bs="cr",k=10) +
s(X,Y,bs="tp",k=10) + # Added to model.C
s(person,bs="re") + s(item,bs="re"),
data=data, family=binomial, method="ML")
The syntax for model.D is based on the following equation:
\[ \begin{aligned} {\rm logit}[P(y_{tlji} = 1 | \theta_{j}, \zeta_{i})] = x^{'}\delta_{0} + z_{j}^{'}\theta_{j} + z_{i}^{'}\zeta_{i} + f_{1}(time_{t}) + y^{'}_{(t-1)lji}f_{2}(time_{t}) + f_{3}(trial_{l}) + f_{4}(time_{t},trial_{l})\\ + \; d^{'}_{(t-1)lji}f_{5}(time_{t}) + c^{'}_{(t-1)lji}f_{6}(time_{t}) + f_{7}(xposition_{tlji},yposition_{tlji}), \end{aligned} \] where \(f_{7}(xposition_{tlji},yposition_{tlji})\) is a bivariate smooth function to model spatial correlations in the binary fixation data based on the \(x\) and \(y\)-coordinates.
In the syntax for model.D, the following effect was added to model.C:
- s(X,Y,bs="tp",k=10) for the nonlinear interaction between the \(x\) and \(y\)-coordinates \(f_{7}(xposition_{tlji},yposition_{tlji})\)

Prior to adding any additional effects (e.g., experimental condition effects), we consider the models specified in the previous section with different effects to find the best model (regarding both fit and parsimony) to explain the correlations and variabilities in the data. The corrected Akaike information criterion (AIC; Wood et al., 2016) and the Bayesian information criterion (BIC; Schwarz, 1978) were used to compare these models, as they reward good model fit while punishing model complexity.
Because there were convergence problems with model.D (from adding the \(x\) and \(y\)-coordinate variables), it was not considered during model selection. A summary table comparing the remaining candidate models is created as follows:
model.list <- list(model.null, model.A, model.B, model.C)
model.summary <- data.frame(LogLikelihood = sapply(model.list, FUN=logLik.gam),
AIC = sapply(model.list, FUN=AIC),
BIC = sapply(model.list, FUN=BIC),
Deviance = sapply(model.list, deviance),
D_e = sapply(model.list, FUN=function(x){100*(deviance(model.null) - deviance(x))/deviance(model.null)}))
model.summary$D_difference <- c(NA, model.summary$D_e[2:4] -
model.summary$D_e[1:3])
rownames(model.summary) <- c("Null Model", "Model A", "Model B", "Model C")
Line 1 creates a list of the different candidate models to be compared. Lines 2-7 create the model.summary data frame containing the log likelihood, corrected AIC, BIC, deviance, and deviance explained by each model. The deviance explained is the percentage of the deviance in the null model that is explained (reduced), and is calculated for model \(m\) (where \(m = null,A,B,C\)) with deviance \(D_{m}\) as: \(D_{e,m} = (\frac{D_{null} - D_{m}}{D_{null}}) \times 100\). Lines 9-10 calculate the difference in deviance explained between models, indicating the deviance explained by the variables added to a model (relative to the previous model).
Calling model.summary gives the following:
## LogLikelihood AIC BIC Deviance D_e D_difference
## Null Model -572468.73 1145071.36 1145857.31 1144937.46 0.00000 NA
## Model A -45378.07 90915.11 91848.31 90756.13 92.07327 92.073267241
## Model B -45365.06 90904.23 91926.25 90730.12 92.07554 0.002271755
## Model C -45203.53 90603.63 91757.53 90407.06 92.10376 0.028216987
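As a check on the \(D_{e}\) column, the deviance explained by Model A can be computed directly from the table: \(D_{e,A} = (\frac{1144937.46 - 90756.13}{1144937.46}) \times 100 \approx 92.07\%\).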
Model C is the best of the candidate models regarding spatial and temporal effects, having the lowest corrected AIC (\(AIC = 90603.63\)) and BIC (\(BIC = 91757.53\)). Although the variables added to model.B (beyond those included in model.A) and the variables added to model.C (beyond those included in model.B) made minimal improvements to the deviance explained, these improvements were considered enough to justify their inclusion in the model (based on corrected AIC and BIC). Based on these results, we selected Model C regarding the spatial and temporal effects.
Next, we added experimental condition effects (i.e., variables beyond the spatial and temporal effects) to model.C. This includes all possible main effects, two-way interactions, and three-way interactions among the following variables:
- talker.e, an effect-coded variable for the gender of the talker distinguishing female (\(-0.5\)) from male (\(0.5\))
- session.e, an effect-coded variable for whether it was the first session (\(-0.5\)) or the second session (\(0.5\))
- sleep.1 and sleep.2, Helmert-coded variables distinguishing the three experimental conditions that participants were assigned to: a.m. to p.m. (sleep.1 = -0.5, sleep.2 = 0), p.m. to a.m. (sleep.1 = 0.25, sleep.2 = -0.5), and p.m. to p.m. (sleep.1 = 0.25, sleep.2 = 0.5)

These effects were added to model.C to create model.E using the following code:
model.E <- bam(y ~ 1+ y.lag.1 +
s(time,by=y.lag.1,bs="cr",k=10) +
s(time,by=d.lag.1,bs="cr",k=10) + s(time,by=c.lag.1,bs="cr",k=10) +
s(time,bs="cr",k=10) +
s(trial, bs="cr",k=10) +
ti(trial,time,bs="cr",k=10) +
s(person,bs="re") + s(item,bs="re") +
talker.e*session.e*sleep.1 + # Added to model.C
talker.e*session.e*sleep.2, # Added to model.C
data=data, family=binomial, method="ML")
The main effects, two-way interactions, and three-way interactions among talker.e, session.e, sleep.1, and sleep.2 are added in the two lines marked # Added to model.C.
Calling summary(model.E) gives the following output:
Family: binomial
Link function: logit
Formula:
y ~ 1 + y.lag.1 + s(time, by = y.lag.1, bs = "cr", k = 10) +
s(time, by = t.lag.1, bs = "cr", k = 10) + s(time, by = c.lag.1,
bs = "cr", k = 10) + s(time, bs = "cr", k = 10) + s(trial,
bs = "cr", k = 10) + ti(trial, time, bs = "cr", k = 10) +
s(person, bs = "re") + s(item, bs = "re") + talker.e * session.e *
sleep.1 + talker.e * session.e * sleep.2
Parametric coefficients:
Estimate Std. Error z value
(Intercept) -3.99834 0.10383 -38.508
y.lag.11 8.85873 0.04614 192.002
talker.e 0.13644 0.02177 6.268
session.e 0.07423 0.02175 3.412
sleep.1 -0.09051 0.18815 -0.481
sleep.2 0.17862 0.16291 1.096
talker.e:session.e -0.08665 0.04347 -1.993
talker.e:sleep.1 -0.03782 0.06146 -0.615
session.e:sleep.1 0.03304 0.06164 0.536
talker.e:sleep.2 -0.04486 0.05313 -0.844
session.e:sleep.2 -0.21079 0.05319 -3.963
talker.e:session.e:sleep.1 -0.09947 0.12287 -0.810
talker.e:session.e:sleep.2 0.03778 0.10628 0.355
Pr(>|z|)
(Intercept) < 2e-16 ***
y.lag.11 < 2e-16 ***
talker.e 3.65e-10 ***
session.e 0.000644 ***
sleep.1 0.630472
sleep.2 0.272911
talker.e:session.e 0.046254 *
talker.e:sleep.1 0.538344
session.e:sleep.1 0.591914
talker.e:sleep.2 0.398469
session.e:sleep.2 7.40e-05 ***
talker.e:session.e:sleep.1 0.418190
talker.e:session.e:sleep.2 0.722256
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(time):y.lag.11 6.285 7.377 283.21 < 2e-16 ***
s(time):t.lag.1 2.042 2.081 219.58 < 2e-16 ***
s(time):c.lag.1 6.278 7.221 86.49 1.27e-15 ***
s(time) 6.790 7.667 452.79 < 2e-16 ***
s(trial) 2.270 2.827 15.16 0.00175 **
ti(trial,time) 3.343 4.340 12.57 0.02141 *
s(person) 55.434 57.000 1645.49 < 2e-16 ***
s(item) 6.794 7.000 272.62 < 2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.96 Deviance explained = 92.5%
-ML = 1.3047e+06 Scale est. = 1 n = 927286
Similar to the comparisons made between the null model and models A-C, model.E (with experimental condition effects) can be compared to model.C (without experimental condition effects) as follows:
model.list <- list(model.C, model.E)
model.summary.2 <- data.frame(LogLikelihood = sapply(model.list, FUN=logLik.gam),
AIC = sapply(model.list, FUN=AIC),
BIC = sapply(model.list, FUN=BIC),
Deviance = sapply(model.list, deviance),
D_e = sapply(model.list, FUN=function(x){100*(deviance(model.null) - deviance(x))/deviance(model.null)}))
model.summary.2$D_difference <- c(NA, model.summary.2$D_e[2] - model.summary.2$D_e[1])
rownames(model.summary.2) <- c("Model C", "Model E")
Calling model.summary.2 gives the following:
## LogLikelihood AIC BIC Deviance D_e D_difference
## Model C -45203.53 90603.63 91757.53 90407.06 92.10376 NA
## Model E -45166.45 90547.40 91806.51 90332.90 92.11023 0.006477226
The experimental effects added to model.C to create model.E made a small improvement to the deviance and the deviance explained. Based on corrected AIC (which was smaller for model.E), this decrease in deviance justified the additional parameters created by adding the experimental effects. However, based on BIC (which punishes the number of parameters more harshly than corrected AIC), the additional parameters did not result in a large enough decrease in deviance to justify their inclusion. Based on these results, we selected model.E as the final model. You can refer to pp. 20-22 in Cho et al. (2022) for interpretations regarding the results of model.E.
The smooth functions in model.E can be plotted by calling plot(model.E, select = 1), where the value of the select argument determines which smooth function is plotted (in the order of inclusion in the syntax of model.E): the time-varying autocorrelations (select=1), the target-distance autocorrelation (select=2), the competitor-distance autocorrelation (select=3), the effect of time (select=4), the effect of trial (select=5), the interaction between time and trial (select=6), the person random effect (select=7), and the item random effect (select=8). For example, calling plot(model.E, select = 4) plots the smooth function for the effect of time (s(time, bs="cr", k=10) in the syntax for model.E):
Standard deviations and 95% credible intervals for these smooth functions can be reported by calling gam.vcomp(model.E).
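For convenience, the two calls described above can be written out as follows (these are simply the calls already named in the text):
plot(model.E, select = 4)  # smooth function for the effect of time, f1(time)
gam.vcomp(model.E)         # standard deviations and 95% intervals for the smooth terms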
As can be seen in the syntax for model.E, values of the basis dimension (\(k\), see Cho et al. [2022] for details) were selected as \(k=10\) for the majority of the smooth functions used. To evaluate whether \(k=10\) is sufficiently large for each of these smooth functions, gam.check(model.E, type="pearson") can be called, which gives the following output:
Method: ML Optimizer: outer newton
full convergence after 5 iterations.
Gradient range [-0.05549542,0.06283272]
(score 1304663 & scale 1).
Hessian positive definite, eigenvalue range [0.02596176,27.61129].
Model rank = 209 / 209
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(time):y.lag.11 9.00 6.28 0.97 0.080 .
s(time):t.lag.1 10.00 2.04 0.97 0.055 .
s(time):c.lag.1 10.00 6.28 0.97 0.150
s(time) 9.00 6.79 0.97 0.105
s(trial) 9.00 2.27 0.99 0.650
ti(trial,time) 81.00 3.34 0.98 0.420
s(person) 60.00 55.43 NA NA
s(item) 8.00 6.79 NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For each smooth function, a \(k\)-index close to 1, an \(edf\) (effective degrees of freedom) less than \(k\), and a \(p\)-value greater than .05 all indicate that the selected value of \(k\) is sufficiently large. Note that the last two smooth functions (for the person random effect and the item random effect) do not have \(k\)-indices or \(p\)-values because they are random-effect smooths for which values of \(k\) were not specified in the syntax for model.E (default values were used instead). Based on these results, we can conclude that \(k=10\) was sufficiently large for each smooth function.
This section evaluates model.E regarding its adequacy as a model for describing the data.
To evaluate the adequacy of model.E, we’ll first plot the residuals of model.E over the range of fitted values as follows:
fit.E <- fitted(model.E)
residual.E <- residuals(model.E, type="pearson")
hist.residual.E <- hist(residual.E, xlab="Pearson Residuals")
fit.residual.E <- plot(fit.E, residual.E)
Lines 1-2 extract the fitted values and Pearson residuals (respectively) of model.E. Line 3 plots a histogram of the residuals, and line 4 plots the fitted values against the residuals to further examine where the largest residuals occurred. The histogram of residuals is as follows:
Although several residuals were very large in magnitude (ranging from -37.48 to 43.59), 99.1% of the Pearson residuals were within an acceptable range of \(\pm1.96\), with \(M=-0.002\) and \(SD=0.988\).
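These summary statistics can be reproduced from residual.E with base R functions; a minimal sketch:
range(residual.E)                    # smallest and largest Pearson residuals
100 * mean(abs(residual.E) <= 1.96)  # percentage of residuals within +/- 1.96
mean(residual.E)                     # M
sd(residual.E)                       # SD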
The plot of fitted values against the residuals is as follows:
As seen in the plot above, the most extreme residuals that were observed occurred at the extremes of the distribution of fitted values.
Second, we’ll look at the Somers’ rank correlation between the binary data (y) and the predicted probabilities based on model.E by calling the following:
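The code chunk for this step is not displayed on this page; one way to obtain Somers’ Dxy is the somers2 function in the Hmisc package (shown here as a sketch; the original analysis may have used a different call):
library(Hmisc)
# Somers' Dxy between the predicted probabilities and the observed binary outcome;
# model.E$y is the response actually used in the fit, so its length matches fitted(model.E)
somers2(fitted(model.E), model.E$y)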
The Somers’ rank correlation (Dxy in the resulting output) was .988, indicating that the predictions from model.E (which range from 0 to 1) correlated strongly with the binary outcome.
Both of these results indicate that model.E describes the data sufficiently well.
Chatfield, C. (2004). The analysis of time series: An introduction (6th ed.). Chapman & Hall/CRC. https://doi.org/10.1007/978-1-4899-2921-1
Cho, S.-J., Brown-Schmidt, S., De Boeck, P., & Naveiras, M. (2022). Space-time modeling of intensive binary time series eye-tracking data using a generalized additive logistic regression model. Psychological Methods. https://doi.org/10.1037/met0000444
Pebesma, E. J. (2004). Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30, 683-691.
Pebesma, E. J., & Bivand, R. S. (2005). Classes and methods for spatial data in R. R News, 5(2), 9–13. https://CRAN.R-project.org/doc/Rnews/
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. New York, NY: Springer-Verlag. https://ggplot2.tidyverse.org.
Wood, S. N., Pya, N., & Säfken, B. (2016). Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association, 111(516), 1548-1563. https://doi.org/10.1080/01621459.2016.1180986