In psychological and behavioral science, researchers are often interested in studying the developmental changes of a group or of multiple groups of individuals, such as changes in their cognitive levels and behavioral patterns over time. Longitudinal studies are often conducted to investigate these problems, and results from these studies can yield convincing arguments pertaining to the relationships between variables (e.g., directionality of causality) by constructing a theoretical or temporal back-and-forth logic between said variables (Ferrer & McArdle, 2010; Leszczensky & Wolbring, 2022; Toh & Hernán, 2008). The measurement of developmental changes relies on longitudinal data collected using multiple measures of constructs over time. Typically, these changes can be captured with longitudinal latent variable models falling into two main categories (McArdle, 2009; Muthén & Muthén, 2000): (1) longitudinal models that focus on changes in categorical latent variables, such as hidden Markov models (or latent transition models) (e.g., Bartolucci et al., 2013; Collins et al., 1997; Wang, Yang, et al., 2018a), and (2) longitudinal models that focus on metrical changes in continuous latent variables, such as the latent growth curve models (e.g., Bollen & Curran, 2006; Duncan et al., 2006) and longitudinal item response theory (IRT) models (e.g., Andersen, 1985; von Davier et al., 2011; Wang & Nydick, 2020). Most of these longitudinal models use item response accuracy (RA) data, such as binary responses indicating whether an answer to a multiple-choice question is right or wrong or ordinal responses to Likert-type items, as observed indicators of the measured latent construct in psychological studies.

Thanks to advancements in computer- and web-based learning, assessment, and experimental systems, many types of multimodal data, such as reaction or response times (RTs), mouse clicks, action sequences, eye tracking, and brain activation, are accessible in addition to RA data (Gorin, 2006; Jeon et al., 2021; Jiao & Lissitz, 2018; van der Maas & Jansen, 2003; Zhan et al., 2022). Among these types of data, participants’ RTs to test item, which is the amount of time spent by an individual to consider and solve each task or item, have been used to explore behavioral patterns in many psychological and behavioral case studies (e.g., Meijering & van Rijn, 2009; Siegler, 1989; van der Maas & Jansen, 2003) and to address the measurement issues encountered in many methodological studies (e.g., Bolsinova & Tijmstra, 2018; Man & Harring, 2021; van der Linden & Guo, 2008; Zhan et al., 2018). For an assessment at a given time point, RA and RT data are collected simultaneously to provide parallel information pertaining to participants’ cognitive processes or behavioral patterns for the same task (e.g., it takes 60 s for a participant to respond to an item correctly). As a complement to RA data, RT data may offer additional information related to the contents of the cognitive processes that underlie problem-solving on tasks that are difficult to analyze using RA data alone (De Boeck & Jeon, 2019). A joint analysis of RA and RT data can not only directly reveal the relationship between latent ability and processing speed, but also improve the accuracy of parameter estimation with the measurement models (Bolsinova & Tijmstra, 2018; van der Linden, 2007; Zhan et al., 2018).

Several recent computer-based longitudinal studies, such as dynamic studies focusing on learning development (e.g., adaptive learning [Wang, Yang, et al., 2018a] and intelligent tutoring [Woolf, 2009]) and longitudinal behavioral experimental studies focusing on child development (e.g., Ouyang et al., 2022) have facilitated the simultaneous collection of RA and RT data across multiple time points, and we refer to these data as longitudinal RA and longitudinal RT. Moreover, these studies have demonstrated the promise of using both longitudinal RA and longitudinal RT data to study changes in cognitive levels and behavioral patterns. Human behavior or cognitive process is a product of multiple constructs that are systematically related to one another. Hence, individual constructs are likely to undergo developmental change not independently but on the basis of the influences of their respective developmental changes on each other. Such change in constructs that develop separately but influence each other is called “parallel interactive development” in the present study. Our focus is on the parallel interactive development of two latent constructs, namely latent ability and processing speed, which are measured using RA and RT, respectively, with the aim of advancing our holistic understanding of the developmental changes in individuals.

With the availability of longitudinal RA and longitudinal RT data, a statistical method must be used to capture the parallel interactive development of latent ability and processing speed from the data. Joint-hierarchical latent variable modeling is one of the most popular approaches for simultaneously analyzing RA and RT data (van der Linden, 2007). However, this modeling approach is suitable for cross-sectional data, and it assumes a constant continuous ability (reflected by RA) and a constant continuous latent speed (reflected by RT) through a test at a given point in time. Many variants of this model have been proposed to generalize its assumptions to diverse test scenarios and applications, which are nonetheless limited to cross-sectional data (e.g., Fox & Marianti, 2016; Klein Entink et al., 2009; Man et al., 2019; Man & Harring, 2021; Molenaar et al., 2015, 2016; Zhan et al., 2021). Although a few studies have extended this joint modeling framework to track changes in speed and ability, they have defined latent ability as a categorical latent variable and used a cognitive diagnostic model (CDM) (e.g., Junker & Sijtsma, 2001) for measuring RA data (Wang et al., 2020; Wang, Zhang, et al., 2018b). By contrast, our study focuses on detecting the parallel interactive development of two continuous latent variables, namely latent ability and latent processing speed. Continuous latent variables and categorical latent variables represent two different quantitative perspectives on latent constructs, and they are used for different purposes. Many latent constructs, such as intelligence, are considered continuous, and in such a case, we seek a scale on which individuals can be located. In other words, continuous variables portray individual development more finely than categorical variables, especially when the magnitude of change is small (e.g., only quantitative but not qualitative change) (e.g., Zhan, 2021).

The objective of our study is to develop novel statistical modeling approaches for assessing the parallel interactive development of continuous latent ability and processing speed using both longitudinal RA and longitudinal RT data. The proposed longitudinal joint modeling approaches are within the domain of structural equation modeling (SEM) (McArdle & Nesselroade, 2014), and we adapt three widely used modeling approaches in our setup: unstructured-covariance-matrix-based modeling, latent (parallel process) growth curve modeling, and autoregressive cross-lagged modeling. The first modeling approach is often used in longitudinal IRT models, and the latter two are frequently used for analyzing longitudinal data in the field of developmental and applied psychology. In addition, the proposed modeling approaches can not only provide developmental trajectories in terms of changes in latent ability and processing speed over time, but also provide additional information about the across-time relationships between these two constructs. In this sense, these approaches can be viewed as extensions of joint-hierarchical latent variable modeling (van der Linden, 2007) in a longitudinal setup from the SEM perspective.

In the remaining sections, we first describe the proposed longitudinal joint models, including the model formulas, model estimation procedure, and differences from a few related models. Then, two empirical examples are presented to illustrate the applicability of the proposed models, followed by a simulation study to explore the psychometric performance of the proposed models under different simulated test conditions. Finally, we present our findings and provide an outline for future research on this topic.

Longitudinal joint-hierarchical latent variable modeling

Consider N students participating in a longitudinal assessment that integrates learning components across P time points. Let Ip be the index of the number of items administered at time point p (p = 1, …, P) and the entire assessment consists of \(I={\sum}_{p=1}^P{I}_p\) items. Two types of longitudinal data can be collected from these items at each time point: longitudinal RA and longitudinal RT. Let Ynip and Tnip be the n (n = 1, …, N)-th student’s RA and RT for item i (i = 1, …, Ip) at time point p, respectively. Following the SEM framework, the proposed longitudinal joint models consist of two components: a measurement model that describes the associations between the observed variables and latent variables at a given time point and a structural model that describes the changes in and relationships among the latent variables over time. Our focus is on presenting three types of structural models that can be used to describe the mechanisms of change in latent ability and processing speed over time. The following sections describe the measurement model component and the proposed structural models.

Measurement model

At a given time point p, we assume that RA data follow the two-parameter logistic IRT model (Birnbaum, 1968), which can be expressed as follows:

$$\textrm{Prob}\left({Y}_{nip}=1|{\uptheta}_{np},{a}_{ip},{b}_{ip}\right)=\frac{\exp \left({a}_{ip}{\uptheta}_{np}-{b}_{ip}\right)}{1+\exp \left({a}_{ip}{\uptheta}_{np}-{b}_{ip}\right)},$$
(1)

where θnp denotes the latent ability of student n at time point p, and aip and bip denote the discrimination and difficulty of item i at time point p, respectively. Equation (1) represents a two-parameter extension of Andersen’s (1985) longitudinal Rasch model with an additional item-discrimination parameter (see also von Davier et al., 2011).

In addition, at a given time point p, we assume that RT data follow the lognormal RT model (van der Linden, 2006) with an additional time-discrimination parameter (Klein Entink et al., 2009), which can be expressed as follows:

$$\log {T}_{nip}={\upxi}_{ip}-{\upphi}_{ip}{\uptau}_{np}+{\upvarepsilon}_{nip},{\upvarepsilon}_{nip}\sim N\left(0,{\upomega}_{ip}^{-2}\right),$$
(2)

where τnp denotes the latent processing speed of student n at time point p, and ξip, φip, and ωip denote the time-intensity, time-discrimination, and time-precision of item i at time point p, respectively.

Note that the dichotomous RA data are considered only as an example to illustrate the conceptualization of the proposed modeling approaches. Different IRT models can be applied to ordinal or nominal response data with more than two categories (Bock, 1972; Samejima, 1969). A few studies have explored the joint analysis of ordinal response and RT data in cross-sectional personality tests based on rating scales (e.g., Ranger, 2013).

Structural model

Unstructured-covariance-matrix-based structural model

To describe the relationship between θnp and τnp across P time points, one of the most straightforward methods is to construct an unstructured covariance matrix (e.g., von Davier et al., 2011; Zhan et al., 2019), as illustrated in Fig. 1. The structural model component of the proposed joint model based on an unstructured covariance matrix, denoted by COV, is as follows:

$$\left(\begin{array}{c}{\boldsymbol{\uptheta}}_n\\ {}{\boldsymbol{\uptau}}_n\end{array}\right)\sim MVN\left(\left(\begin{array}{c}{\boldsymbol{\upmu}}_{\uptheta}\\ {}{\boldsymbol{\upmu}}_{\uptau}\end{array}\right),\boldsymbol{\Sigma} \right),\boldsymbol{\Sigma} =\left(\begin{array}{cc}\begin{array}{ccc}{\upsigma}_{\uptheta_1}^2& \cdots & {\upsigma}_{\uptheta_1{\uptheta}_P}\\ {}\vdots & \ddots & \vdots \\ {}{\upsigma}_{\uptheta_P{\uptheta}_1}& \cdots & {\upsigma}_{\uptheta_P}^2\end{array}& \begin{array}{ccc}{\upsigma}_{\uptheta_1{\uptau}_1}& \cdots &\ {\upsigma}_{\uptheta_1{\uptau}_P}\\ {}\vdots & \ddots & \vdots \\ {}{\upsigma}_{\uptheta_P{\uptau}_1}& \cdots & {\upsigma}_{\uptheta_P{\uptau}_P}\end{array}\\ {}\begin{array}{ccc}{\upsigma}_{\uptau_1{\uptheta}_1}& \cdots & {\upsigma}_{\uptau_1{\uptheta}_P}\\ {}\vdots & \ddots & \vdots \\ {}{\upsigma}_{\uptau_P{\uptheta}_1}& \cdots & {\upsigma}_{\uptau_P{\uptheta}_P}\end{array}& \begin{array}{ccc}{\upsigma}_{\uptau_1}^2& \cdots & {\upsigma}_{\uptau_1{\uptau}_P}\\ {}\vdots & \ddots & \vdots \\ {}{\upsigma}_{\uptau_P{\uptau}_1}& \cdots & {\upsigma}_{\uptau_P}^2\end{array}\end{array}\right),$$
(3)

where θn = (θn1, …, θnP)′ denotes the latent ability vector consisting of P elements, τn = (τn1, …, τnP)′ denotes the latent processing speed vector consisting of P elements. The vectors μθ = (μθ1, …, μθP)′ and μτ = (μτ1, …, μτP)′ are the population mean vector of latent abilities and population mean vector of latent processing speeds, respectively. Σ is a variance-covariance matrix, in which \({\upsigma}_{\uptheta_p}^2\) is the variance of θp, \({\upsigma}_{\uptau_p}^2\)is the variance of τp, \({\upsigma}_{\uptheta_p{\uptheta}_{p\prime }}\)is the covariance of θp and θp′, \({\upsigma}_{\uptau_p{\uptau}_{p\prime }}\)is the covariance of τp and τp′, and \({\upsigma}_{\uptheta_p{\uptau}_p}\)is the covariance of θp and τp.

Fig. 1
figure 1

Graphical representation of unstructured-covariance-matrix-based longitudinal joint models (three time points)

This structural model directly outputs estimates of the latent ability and processing speed at each time point. Therefore, \({\hat{\boldsymbol{\uptheta}}}_n\) and \({\hat{\boldsymbol{\uptau}}}_n\), respectively, can be used to directly describe the estimated developmental trajectories of the latent ability and processing speeds of individuals. In other words, \({\hat{\uptheta}}_{n\left(p+1\right)}-{\hat{\uptheta}}_{np}\) and \({\hat{\uptau}}_{n\left(p+1\right)}-{\hat{\uptau}}_{np}\) can be used to describe the estimated degrees of changes in individuals' latent abilities and processing speeds. The estimate of the population-level average changes in ability and speed at two adjacent time points can then be denoted by \({\hat{\upmu}}_{\uptheta \left(p+1\right)}-{\hat{\upmu}}_{\uptheta p}\) and \({\hat{\upmu}}_{\uptau \left(p+1\right)}-{\hat{\upmu}}_{\uptau p}\), respectively. Furthermore, the estimate of the population-level scale changes in ability and speed at two adjacent time points can be described by \({\hat{\upsigma}}_{\uptheta \left(p+1\right)}/{\hat{\upsigma}}_{\uptheta p}\) and \({\hat{\upsigma}}_{\uptau \left(p+1\right)}/{\hat{\upsigma}}_{\uptau p}\), respectively (Paek et al., 2014).

The unstructured covariance matrix in COV makes it possible to consider various relationships between the latent constructs across all time points. However, when the number of time points is large, the computational cost increases dramatically, and the nonconvergent estimation issue may be encountered.

Latent growth curve-based structural model

The second structural model is based on the latent growth curve (e.g., Curtis, 2010; Wang & Nydick, 2020), and we denote the joint model based on it as the latent growth curve longitudinal joint model (LGC). This model is illustrated in Fig. 2 and is expressed as follows:

$${\uptheta}_{np}={\uppi}_{n0}+{\uppi}_{n1}\left(p-1\right)+{\upvarepsilon}_{\uptheta_{np}},{\upvarepsilon}_{\uptheta_{np}}\sim N\left(0,{\upsigma}_{\uptheta_p}^2\right),$$
(4)
$${\uptau}_{np}={\updelta}_{n0}+{\updelta}_{n1}\left(p-1\right)+{\upvarepsilon}_{\uptau_{np}},{\upvarepsilon}_{\uptau_{np}}\sim N\left(0,{\upsigma}_{\uptau_p}^2\right),$$
(5)
$$\left(\begin{array}{c}\begin{array}{c}{\uppi}_{n0}\\ {}{\uppi}_{n1}\end{array}\\ {}\begin{array}{c}{\updelta}_{n0}\\ {}{\updelta}_{n1}\end{array}\end{array}\right)\sim MVN\left(\left(\begin{array}{c}\begin{array}{c}{\upmu}_{\uppi_0}\\ {}{\upmu}_{\uppi_1}\end{array}\\ {}\begin{array}{c}{\upmu}_{\updelta_0}\\ {}{\upmu}_{\updelta_1}\end{array}\end{array}\right),\left(\begin{array}{ccc}{\upsigma}_{\uppi_0}^2& \cdots & {\upsigma}_{\uppi_0{\updelta}_1}\\ {}\vdots & \ddots & \vdots \\ {}{\upsigma}_{\updelta_1{\uppi}_0}& \cdots & {\upsigma}_{\updelta_1}^2\end{array}\right)\right),$$
(6)

where πn0 and πn1 are the individual growth intercept and growth slope parameters for latent ability, respectively, and δn0 and δn1 are the individual growth intercept and growth slope parameters for processing speed, respectively. In addition, \({\upmu}_{\uppi_0}\) and \({\upmu}_{\uppi_1}\) denote the population mean and population average developmental change in latent ability, respectively, and \({\upmu}_{\updelta_0}\)and \({\upmu}_{\updelta_1}\) denote the population mean and population average developmental change in processing speed, respectively. Furthermore, \({\upvarepsilon}_{\uptheta_{np}}\) and \({\upvarepsilon}_{\uptau_{np}}\) are the residual terms of latent ability and processing speed, respectively.Footnote 1

Fig. 2
figure 2

Graphical representation of latent growth curve longitudinal joint models (three time points)

Unlike the COV, which directly estimates the values of θnp and τnp at each time point, this model estimates the growth factors (i.e., πn0, πn1, δn0, and δn1) of each individual. Accordingly, \({\hat{\uppi}}_{n1}\) and \({\hat{\updelta}}_{n1}\) can be used to describe the estimated amounts of change in latent ability and processing speed, respectively, of each individual between adjacent time points, and \({\hat{\upmu}}_{\uppi_1}\) and \({\hat{\upmu}}_{\updelta_1}\) can be used to describe the estimated amounts of change in latent ability and processing speed, respectively, of the corresponding population means between adjacent time points.

As expressed in Eq. (6), growth factors are assumed to follow a multivariate normal distribution, indicating that the starting values (i.e., πn0 and δn0) of and the amounts of developmental change (i.e., πn1 and δn1) in the latent constructs mutually influence each other. In such cases, for example, \({\uprho}_{\uppi_0{\updelta}_0}={~}^{{\upsigma}_{\updelta_0{\uppi}_0}}\!\left/ \!{~}_{{\upsigma}_{\updelta_0}{\upsigma}_{\uppi_0}}\right.\) can be used to describe the correlation between latent ability and processing speed at the starting point, and \({\uprho}_{\uppi_0{\uppi}_1}={~}^{{\upsigma}_{\uppi_0{\uppi}_1}}\!\left/ \!{~}_{{\upsigma}_{\uppi_0}{\upsigma}_{\uppi_1}}\right.\) and \({\uprho}_{\updelta_0{\updelta}_1}={~}^{{\upsigma}_{\updelta_0{\updelta}_1}}\!\left/ \!{~}_{{\upsigma}_{\updelta_0}{\upsigma}_{\updelta_1}}\right.\) can be used to describe the correlation between the starting values of and amounts of developmental change in two latent constructs, respectively. If these values are significantly greater than 0, it follows that the higher the starting level, the faster is the individual’s development, and vice versa.

The proposed LGC is a linear growth model. A more complex nonlinear growth process can be considered by adding quadratic terms (e.g., πn2(p − 1)2 and δn2(p − 1)2, where πn2 and δn2 are the quadratic growth parameters of latent ability and processing speed, respectively) to Eqs. (4) and (5), respectively. This nonlinear LGC was also fitted to two data sets presented in the empirical examples section. However, its relative model–data fit, including the model complexity penalty, was worse than that of the linear LGC in empirical example 1 and marginally better than that of the linear LGC in empirical example 2 (Tables S8 and S9 in the online appendix present the relative and absolute model–data fits, respectively). For this reason, we choose to not present this model in this study.

Autoregressive cross-lagged structural model

The last approach we consider for describing the developmental changes in θnp and τnp over time is autoregressive cross-lagged modeling (e.g., Bentler, 1980; Mayer, 1986; McArdle, 2009), as depicted in Fig. 3. We express the structural model of the autoregressive cross-lagged longitudinal joint model (denoted as ACL) as follows:

$${\uptheta}_{np}={\upbeta}_{0p}+{\upbeta}_{1p}{\uptheta}_{n\left(p-1\right)}+{\upbeta}_{2p}{\uptau}_{n\left(p-1\right)}+{\upvarepsilon}_{\uptheta_{np}},$$
(7)
$${\uptau}_{np}={\uplambda}_{0p}+{\uplambda}_{1p}{\uptau}_{n\left(p-1\right)}+{\uplambda}_{2p}{\uptheta}_{n\left(p-1\right)}+{\upvarepsilon}_{\uptau_{np}},$$
(8)
$$\left(\begin{array}{c}{\upvarepsilon}_{\uptheta_{np}}\\ {}{\upvarepsilon}_{\uptau_{np}}\end{array}\right)\sim MVN\left(\left(\begin{array}{c}0\\ {}0\end{array}\right),\left(\begin{array}{cc}{\upsigma}_{\upvarepsilon_{\uptheta p}}^2& {\upsigma}_{\upvarepsilon_{\uptheta p}{\upvarepsilon}_{\uptau p}}\\ {}{\upsigma}_{\upvarepsilon_{\uptau p}{\upvarepsilon}_{\uptheta p}}& {\upsigma}_{\upvarepsilon_{\uptau p}}^2\end{array}\right)\right),$$
(9)

where β0p and λ0p are the intercepts of latent ability and processing speed at time point p, respectively. Moreover, β1p and λ1p denote the autoregressive effects of latent ability and processing speed at time point p, respectively, and they describe the stability of the latent constructs from one time point to the next. β2p and λ2p denote the cross-lagged effects of latent ability and processing speed at time point p, respectively, and they describe the effect of one construct on another measured at a later time point. In addition, \({\upvarepsilon}_{\uptheta_{np}}\) and \({\upvarepsilon}_{\uptau_{np}}\) are the residual terms of latent ability and processing speed, respectively. In contrast to the residual terms in the LGC, the residual terms in the ACL are assumed to follow a bivariate normal distribution for describing the correlation between the unexplained parts of latent ability and processing speed.

Fig. 3
figure 3

Graphical representation of autoregressive cross-lagged longitudinal joint model (three time points)

Compared to the COV (Fig. 1), the ACL strictly limits the directionality of the influence between variables (i.e., prediction). Under strict experimental design, the ACL can be used to reveal causal relationships between variables if irrelevant variables are controlled (Bentler, 1980), while the COV cannot. In addition, similar to the COV, \({\hat{\uptheta}}_{n\left(p+1\right)}-{\hat{\uptheta}}_{np}\) and \({\hat{\uptau}}_{n\left(p+1\right)}-{\hat{\uptau}}_{np}\) can be used to estimate the degrees of change in individuals’ latent ability and processing speed, respectively. However, the calculations of the degrees of change in the population means of latent ability and processing speed at adjacent time points are complex. For ease of understanding, an example is summarized in Table 1.

Table 1 Comparison of individual level and population mean level among the three models at three time points

Summary and comparison between three longitudinal joint models

In summary, we presented three types of joint models for longitudinal RA and RT data with differences in the formulations of the structural models, denoted by COV, LGC, and ACL. Table 1 presents a comparison between individual level and population mean level among the three structural models at three time points. First, at the starting point (i.e., p = 1), the three models are equivalent, though different notations for parameters related to ability and speed are used given their own model formulations. Second, the three models have different assumptions for describing the developmental changes in ability and speed. Specifically, the COV has the most lenient assumption, and it directly estimates the latent ability and processing speed at each time point. The LGC assumes a linear growth and estimates the coefficients of the growth curves (i.e., growth factors) of ability and speed over time. The ACL further models the influence of constructs on themselves and the relationships between constructs at two adjacent time points. Third, both the LGC and the ACL can be treated as special cases of the COV by reparametrizing the mean and the variance of the distributions of the latent variables in the COV. For example, in the LGC, \({\theta}_{np}\sim N\left({\mu}_{\pi_0}+{\mu}_{\pi_1}\left(p-1\right),{\sigma}_{\pi_0}^2+{\sigma}_{\pi_1}^2{\left(p-1\right)}^2+2\left(p-1\right) COV\left({\pi}_0,{\pi}_1\right)+{\upsigma}_{\uptheta_p}^2\right)\). In addition, to our understanding, ACL and LGC develop independently, and there is no theoretical nested relationship between them. Fourth, when P ≥ 3, all three models can be used. When P = 2, the COV and the ACL are recommended because two time points do not satisfy the identifiability requirement of LGC (Bollen & Curran, 2006).Footnote 2 When P = 1, all three models are reduced to the joint-hierarchical item response model for cross-sectional data (Klein Entink et al., 2009; van der Linden, 2007).

Related models

A limited number of models have been proposed for analyzing longitudinal RA and RT data simultaneously. One area of research is within the ambit of the dynamic CDM framework (Chen & Culpepper, 2020; Wang et al., 2020; Wang, Zhang, et al., 2018b). The aforementioned models use the CDM model and the lognormal RT model for measuring RA and RTs, respectively, at a given time point and then model the transition of a discrete latent ability variable based on different assumptions of the latent ability variable and latent speed. The proposed models differ from these joint models mainly in terms of modeling the change in a continuous latent ability instead of the discrete latent ability. In addition, our proposed models allow different possibilities of changes in both abilities and speed, which are more realistic in terms of the application in diverse longitudinal assessment scenarios.

Another branch of research that is closely related to this study is joint modeling of RA and RT data in cross-sectional scenarios. For example, Zhan et al. (2021) proposed a multidimensional joint model for RA and RT data, which can be used to analyze the multidimensionality of ability and speed in cross-sectional assessments. In fact, the COV proposed in this study can be considered an application of the multidimensional joint model to the analysis of longitudinal bimodal data (RA and RTs), similar to the application of multidimensional IRT models to the analysis of longitudinal RA data (von Davier et al., 2011). In addition, to address the within-person relationship between ability and speed in cross-sectional assessments, Fox and Marianti (2016) proposed a joint model of RA and RT and modeled the relationship between latent ability and speed using a latent growth curve model. In contrast to their assumption that an individual has constant ability but variable speed across items, our LGC assumes that both ability and speed change over time but remain constant across items at a specific time point.

Model estimation

Model identification

To ensure the comparability of latent constructs across time points, an anchor-item design or a repeated measurement design (i.e., all items are anchor items) is often used. Kolen and Brennan (2004) recommended that assessments use at least 20% of the items in question to anchor the parameters to the common scale. If an adequate number of items is linked across time and no item parameter drift is assumed, the estimation of model parameters requires the imposition of a few model identifiability constraints. We impose constraints on the mean and variance of ability and the speed to identify the scales of ability and speed. In the proposed models, the mean of latent ability at the first time point is set to zero to identify the mean of the ability scale. Similarly, the mean of processing speed at the first time point is set to zero to identify the mean of the speed scale. Specifically, in the COV, we set μθ1 = μτ1 = 0; in the LGC, we set \({\upmu}_{\uppi_0}={\upmu}_{\updelta_0}=0\); and in the ACL, we set β01 = λ01 = 0. Next, in all the proposed models, the product of item discrimination parameters is restricted to 1 to identify the variance of the ability scale, and the product of item time-discrimination parameters is restricted to 1 to identify the variance of the speed scale. Once the ability and speed scales are identified, all higher-level model parameters can be identified.

Bayesian parameter estimation

The parameters of the proposed models can be estimated using the Markov chain Monte Carlo (MCMC) method within the Bayesian estimation framework. In this study, the PyMC3 package (version 3.11.2) (Salvatier et al., 2016; https://docs.pymc.io) in the Python software environment is used to implement the MCMC method. Moreover, the No-U-Turn Sampler (NUTS) (Hoffman & Gelman, 2014) in PyMC3 is used as the MCMC sampling algorithm because it uses gradient information from the likelihood to converge considerably faster than traditional sampling algorithms (e.g., Gibbs sampling), especially in the case of complexity models.

Herein, medium-informative hyper-priors are used to increase the generalizability of our code. The priors used in this study are listed in Section S1.1 of the online appendix. In addition, a robustness analysis of the model parameter estimation for low-, medium-, and high-informative hyper-priors is presented in Section S1.2 of the online appendix. The results indicate that the three proposed models are highly robust against prior distributions containing different amounts of information.

In longitudinal studies, missing data are encountered commonly owing to factors such as sample attrition and test design (i.e., missing by design). A common practice in Bayesian estimation is to have the algorithm automatically fill in missing values on the basis of the sampled model parameter values (e.g., Pan & Zhan, 2020). However, when the proportion of missing values is high, the computational cost increases considerably. In addition, when the model does not fit the data well, the automatically filled data may have large biases. By contrast, if the data are missing completely at random (MCAR) or missing at random (MAR), we suggest using only the observed data to fit the proposed models.Footnote 3 For the MCAR or MAR, Rubin (1976) postulated that respondents were similar to nonrespondents within a subcategory; methods such as full information maximum likelihood estimation, which ignores missing values and calculates the likelihood function using observed data, will work reasonably well in such cases (Pokropek, 2011). To delete the missing data that are devoid of information at the response level (i.e., missing data of specific persons for specific items), we adjust the data format from the traditional item response matrix to the item response vector with tags that include time points, persons, and items. Further details pertaining to the aforementioned data format conversion can be found in Section S2 in the online appendix.

Applications

The proposed models were applied for analyzing two sets of data. The first set was collected from a computer-based learning platform that aims to improve students’ spatial rotation skills. The experiment was conducted over 1 to 2 hours with short intervals between time points. The second data set was collected from a computer-based behavioral experiment in the field of developmental psychology, and it was characterized by long intervals between time points. In both experiments, the participants’ longitudinal RA and RT data were recorded. The three proposed joint models were fitted to both data sets. In addition, to evaluate the benefit of using a joint modeling framework, we also separately fitted the longitudinal RA and longitudinal RT data by their corresponding measurement model (denoted as sep-COV, sep-LGC, and sep-ACL, respectively). That is, ignoring the structural relationship between latent ability and processing speed.

Empirical example 1: Computer-based spatial rotation learning experiment

Data description

The spatial rotation data set employed herein has been used in a few previous studies to track learning trajectories in dynamic CDM frameworks (e.g., Wang et al., 2020; Wang, Yang, et al., 2018a; Wang, Zhang, et al., 2018b). In the experiment, a total of 350 participants answered 50 questions from five testing blocks sequentially and received a learning intervention between any two adjacent testing blocks. Each testing block contains 10 items and represents a time point. To balance the item positions and avoid the empirical identifiability problem, a total of five versions of the test were developed following the Latin square design, and they were randomly assigned to the participants to guarantee that different test blocks have the same chance of being the first block among all participants. Detailed descriptions of the test questions, learning intervention, and the entire experimental process can be found in Wang, Yang, et al. (2018a).

Essentially, this data set represents a longitudinal assessment with the repeated measure design. To facilitate the proposed model fitting process, we reorganized the data set using 350 individuals’ responses to 250 items at five time points (50 items per time point). At a given time point, a participant only answered 10 items, and responses to the other 40 items remained missing because of the design. These missing data due to test design were considered as MCAR and were ignored in the data analysis. Figure 4 depicts box plots of the total raw score and the average logarithm of the RTs (logRTs) of all students on the 50 items at each time point (excluding the missing values). A clear decreasing trend of the average logRTs can be observed; as for the total raw score, it shows an increase between time points 1 and 2, and the changes between the other time points are negligible.

Fig. 4
figure 4

Box plot of students’ total raw scores and average logarithm of response times at each time point in empirical example 1

Analysis

In this application, latent ability represents a student's spatial rotation ability, while latent processing speed denote a student's latent speed when completing a rotation question. To each model, two Markov chains were applied. In each chain, 10,000 iterations were performed, with the first 5000 iterations as burn-in, and the remaining 5000 iterations (10,000 per chain in total) performed for model parameter inference. PyMC3 provides different initialization schemes for MCMC chains, as well as a set of tools for automatically diagnosing convergence after sampling.Footnote 4 In this study, the advi+adapt_diag initialization scheme, which runs automatic differentiation variational inference and, subsequently, adapts the resulting diagonal mass matrix on the basis of the variance of the tuning samples, was used. This scheme initializes the chain at the test value, which depends on the prior distribution and is usually the mean or mode of the prior distribution. We checked for convergence using the criterion that the potential scale reduction factor (PSRF) should be less than 1.1 (Brooks & Gelman, 1998) or 1.2 (de la Torre & Douglas, 2004).

Posterior predictive model checking (PPMC; Gelman et al., 2014) was used to evaluate the absolute model–data fit. A posterior predictive probability (ppp) value near 0.5 indicates no systematic differences between the realized and predictive values and, thus, an adequate model fit. By contrast, a ppp value smaller than 0.05 or larger than 0.95 was considered the indication of an inadequate model fit. In this study, the ppp value was computed separately for each time point. Only the differences between the observed and predicted data were compared, and these differences were used to compute the ppp values (Levy & Mislevy, 2016). Specifically, for both RA and logRT on each time point, the differences between the observed data, X, and posterior predicted data, Xpostpred, were compared in computing the PPMC, as \(ppp={\sum}_{e=1}^E\left( Sum\left({\boldsymbol{X}}^{postpred(e)}\right)\ge Sum\left(\boldsymbol{X}\right)\right)/E\), where E is the total number of iterations in MCMC sampling; Xpostpred(e) were the posterior predicted data in e-th iteration, which were generated from the item response function (Eq. (1) for RA and Eq. (2) for logRT) based on the samplings of the model parameters from the posterior distributions. The deviance information criterion (DIC) and the widely available information criterion (WAIC) (Gelman et al., 2014) were computed for model selection. Smaller DIC and WAIC values indicate a better model–data fit.

Results

The PSRFs of all the parameters in each model were less than 1.1, indicating good convergence under the specified setting. Figure 5 depicts the ppp values of the six models for the RA and RT data of each time point. The results of absolute fitting of the data with the joint model and its corresponding separate model are generally consistent. Both the COV and the ACL were able to fit these data well at all five time points. As for the LGC, its RA model was not able to fit the RA data at time point 2, mainly because the change in total raw score was not linear with respect to time (as shown in Fig. 4), but its RT model was able to fit the RT data.

Fig. 5
figure 5

Posterior predictive probability values of six models to response accuracy (RA) and response time (RT) data at each time point in empirical example 1. Note. COV unstructured covariance matrix-based longitudinal joint model, LGC latent growth curve longitudinal joint model, ACL autoregressive cross-lagged longitudinal joint model; sep- separate modeling

Table 2 summarizes the relative model–data fit indices of the six models and their computation times. According to the −2LL, among the six models, the COV model provided the best fit for both RA and RT data, regardless of model complexity, because it was the most generalized model. When the model complexity penalty was considered, in terms of the DIC and WAIC, the ACL model provided the best fit for the RA data, and the LGC model provided the best fit for the RT data. In addition, the joint model generally fitted the data better than its corresponding separate model, except the COV. A possible reason is that the COV incorporates more covariance coefficients between latent ability and processing speed, resulting in larger complexity than the sep-COV, and hence higher DIC and WAIC. Furthermore, because the separate models are more parsimonious than the joint models, the computation times of the former were shorter than those of the latter. The following section discusses the results of the joint models.

Table 2 Relative model–data fits and computation times (in seconds) of six models in empirical example 1

Figure 6 displays the developmental trajectories of latent ability and processing speed estimated using the three joint models. First, there was a high degree of consistency among the developmental trajectories estimated using the three models, especially between the results from the COV and ACL models. Second, the latent ability increased clearly between time point 1 and time point 2, but there was almost no change between time point 2 and time point 5. Meanwhile, the processing speed exhibited a steady increasing trend from time point 1 to time point 5 for all three models. These results are consistent with the findings of previous studies (Wang et al., 2020; Wang and Chen, 2020), and they explain the change trends of the observed variables in Fig. 4. Moreover, these results reflect the benefit of additional RT data analysis. That is, the benefit of using the designed intervention is not only in improving the latent ability but also in increasing processing speed., which cannot be detected if using RA data alone.

Fig. 6
figure 6

Estimated developmental trajectories of latent ability and processing speed in empirical example 1. Note. COV unstructured covariance matrix-based longitudinal joint model, LGC latent growth curve longitudinal joint model, ACL autoregressive cross-lagged longitudinal joint model. The thick solid line represents the developmental trajectory of the population mean

Furthermore, in the case of the COV model, the estimated scale changes of latent ability and processing speed over time (i.e., \({\hat{\upsigma}}_{\uptheta \left(p+1\right)}/{\hat{\upsigma}}_{\uptheta p}\) and \({\hat{\upsigma}}_{\uptau \left(p+1\right)}/{\hat{\upsigma}}_{\uptau p}\), respectively) were (1.093, 1.287, 0.849, 1.020)′ and (1.038, 1.120, 0.902, 0.970)′, respectively. In the case of the ACL model, the \({\hat{\upsigma}}_{\upvarepsilon_{\uptheta \left(p+1\right)}}/{\hat{\upsigma}}_{\upvarepsilon_{\uptheta p}}\) and \({\hat{\upsigma}}_{\upvarepsilon_{\uptau \left(p+1\right)}}/{\hat{\upsigma}}_{\upvarepsilon_{\uptau p}}\) over time were (1.110, 1.073, 0.995, 1.008)′ and (1.038, 1.051, 0.964, 0.988)′, respectively. In the case of the LGC model, neither \({\hat{\uprho}}_{\uppi_0{\uppi}_1}=0.296\) (95% highest posterior density [HDP] of [−0.375, 0.931]) nor \({\hat{\uprho}}_{\updelta_0{\updelta}_1}=0.039\) (95% HDP of [−0.664, 0.746]) was significantly different from zero. These results indicate the absence of a significant Matthew effectFootnote 5 for the changes in latent ability and processing speed.

Figure 7 presents the scatter plot between the estimates of latent ability and processing speed across all time points for all models. First, for each model, both latent ability and processing speed exhibit a high degree of correlation among the estimates at five time points, indicating a high degree of consistency in latent ability or processing speed at different time points (mainly because of the short interval between the time points in this test). Second, the estimates of both latent ability and processing speed obtained using different models at a given point exhibit a high degree of correlation, indicating a high degree of consistency between the estimates yielded by different models.

Fig. 7
figure 7

Scatterplot between estimates of latent ability and processing speed in empirical example 1. Note. M1 unstructured-covariance-matrix-based longitudinal joint model, M2 latent growth curve longitudinal joint model, M3 autoregressive cross-lagged longitudinal joint model, Tx xth time point. The upper and lower triangular matrices represent the estimates of processing speed and that of latent ability, respectively. The position of each dot on the horizontal and vertical axes in each scatter plot represents the estimates of the two models corresponding to the row and column in which the plot is located

Figure 8 displays the estimates of the item parameters of all the models considered in this empirical study. The item parameter estimation was highly consistent across the three models, especially the COV and ACL models. Most of the item difficulty parameter estimates were negative, indicating that the overall difficulty of the test was low, and the resulting ceiling effect may explain the observation that the average total raw score hardly changed after time point 2.

Fig. 8
figure 8

Violin plot of estimates of item parameters in empirical example 1. Note. COV unstructured-covariance-matrix-based longitudinal joint model, LGC latent growth curve longitudinal joint model, ACL autoregressive cross-lagged longitudinal joint model

In addition to the comparative results of the three abovementioned models, a few characteristics of the data itself can be identified using the parameter estimates of the different models. For example, Table 3 presents the estimates of correlations among latent ability and latent speed across all time points in the COV model. A low to medium degree of correlation was found between latent ability and latent speed at each of the time points. Figure 9 shows the estimates of the structural model parameters (i.e., correlation coefficients) of the LGC. In these data, there exists a correlation only between the starting level of latent ability and the starting level of processing speed. Moreover, there is no interaction between the development of these two constructs over time. Figure 10 presents the estimates of the structural model parameters (i.e., autoregressive coefficients and cross-lagged coefficients) of the ACL model. In these data, a parallel relationship exists between developmental change in latent ability and developmental change in processing speed, and they do not affect each other over time.

Table 3 Estimates of structural model parameters (i.e., correlation coefficients) in the unstructured-covariance-matrix-based longitudinal joint model in empirical example 1
Fig. 9
figure 9

Estimates of structural model parameters (i.e., correlation coefficients) of the latent growth curve longitudinal joint model in empirical example 1. Note. 95% highest posterior density (HPD) in parentheses. The dotted line indicates that the correlation coefficient is not significantly different from zero (i.e., 95% HPD includes zero). 95% HPD of each estimate was presented in Fig. S3 in the online appendix

Fig. 10
figure 10

Estimates of structural model parameters (i.e., path coefficients) of the autoregressive cross-lagged longitudinal joint model in empirical example 1. Note. 95% highest posterior density (HPD) in parentheses. The dotted line indicates that the path coefficient is not significantly different from zero (i.e., 95% HPD includes zero). 95% HPD of each estimate was presented in Figure S4 in the online appendix

Empirical example 2: Subitizing task in developmental psychology

Data description and analysis

This subitizing task was adapted from LeFevre et al. (2010). In the task, the participants were presented with a series of black dots (the number of dots ranged from 1 to 4) on a computer screen and asked to indicate verbally the number of dots they saw as fast and accurately as possible. In each trial, the picture of the dots was displayed after a 200-ms mask. The testers recorded the RA and RT data of each participant by pressing a key. The test consists of 18 items, and each series of dots (containing 1–4 dots) was presented thrice, albeit at different locations on the screen. A total of 204 Cantonese-speaking children participated in the task. They were followed for three years across five test time points.Footnote 6 At each of these time points, 190, 189, 183, 180, and 186 children completed the same task containing 18 items. The data set used herein was deployed in a previous study related to development psychology (Ouyang et al., 2022).

As depicted in Fig. 11, the average raw score increased over time, while the average RT decreased over time, possibly indicating that the participants’ latent ability and latent processing speed tended to increase and decrease, respectively, over time. Little’s test of MCAR was performed (across the five time points, approximately 8.85% of the data was missing), and the results indicated that the null hypothesis “the missing data are MCAR” cannot be rejected (for RA: χ2 = 534.863, df = 522, p = 0.339; for RT: χ2 = 528.768, df = 522, p = 0.409). The analysis process was identical to that followed for empirical example 1.

Fig. 11
figure 11

Change trends in total raw score and average response times over time (with standard deviation range)

Results

For simplicity, only a few main results are discussed here, and details about the results obtained for this data set can be found in Section S3.3 in the online appendix. The PSRFs of all the parameters of the six models were less than 1.1. Figure S5 in the online appendix displays the ppp values of the six models applied to the RA and RT data at each time point. The results of absolute fitting the data with the joint model and its corresponding separate model are generally consistent. Overall, the COV and ACL models were able to fit the data at all time points (nearly unfitted to the RTs at time point 5), while the LGC model was nearly unfitted to the RA data at time points 1 and 2, and it did not fit the RT data at time points 3–5. Table S11 in the online appendix summarizes the relative model–data fit indices and the computation times of the six models. The ACL model was found to be preferable on the basis of the DIC and WAIC values for both RA and RT data. Since the joint model generally fitted the data better than its corresponding separate model, the following section discusses the results of the joint models.

Figure 12 displays the estimated developmental trajectories of latent ability and processing speed of the three proposed models. There was a high degree of consistency in the estimated developmental trajectories from the three joint models. The estimated mean population latent ability and latent processing speed both increased over time. In addition, there was a significant upward shift in the processing speed of several children at time point 5, indicating the existence of heterogeneity among the children at that time point for some reason (e.g., rapid guessing owing to poor motivation).

Fig. 12
figure 12

Estimated developmental trajectories of latent ability and processing speed in empirical example 2. Note. COV = unstructured-covariance-matrix-based longitudinal joint model, LGC = latent growth curve longitudinal joint model, ACL = autoregressive cross-lagged longitudinal joint model. The thick solid line represents the developmental trajectory of the population mean

Figure S6 in the online appendix shows a scatter plot between the estimates of latent ability and processing speed across all time points for all models. The results indicate a high degree of consistency between the estimates obtained using different models, as well as a moderate correlation between latent abilities at different time points and a weak correlation between processing speeds at different time points. A possible reason for this phenomenon is the long interval between time points. Figure S7 displays the estimated item parameters of all models in the empirical study. There was a high level of agreement between the estimates of each of the item parameters of the three models.

Table 4 presents the estimated correlations among latent abilities and latent speed across all time points in the COV model. A low to moderate degree of correlation was found between latent ability and latent speed at each time point. Figure 13 shows the estimates of structural model parameters of the LGC model.Footnote 7 The results indicate that (a) a high degree of correlation existed between the starting levels of latent ability and processing speed, and (b) the increase in processing speed was slower for children with high starting levels of latent ability and processing speed. Figure 14 displays the estimates of the structural model parameters of the ACL model. The results mainly indicated that the latent ability at the current time point was positively influenced by both latent ability and latent processing speed at the previous time point. These parallel developmental phenomena with cross-time relationships between latent ability and processing speed cannot be observed from RA data alone.

Table 4 Estimated structural model parameters (i.e., correlation coefficients) of the unstructured-covariance-matrix-based longitudinal joint model in empirical example 2
Fig. 13
figure 13

Estimated structural model parameters (i.e., correlation coefficients) of the latent growth curve longitudinal joint model in empirical example 1. Note. 95% highest posterior density (HPD) in parentheses. The dotted line indicates that the correlation coefficient is not significantly different from zero (i.e., 95% HPD includes zero). 95% HPD of each estimate was presented in Figure S8 in the online appendix

Fig. 14
figure 14

Estimated structural model parameters (i.e., path coefficients) of the autoregressive cross-lagged longitudinal joint model in empirical example 1. Note. 95% highest posterior density (HPD) in parentheses. The dotted line indicates that the path coefficient is not significantly different from zero (i.e., 95% HPD includes zero). 95% HPD of each estimate was presented in Figure S9 in the online appendix

Simulation study

A simulation study was conducted to further explore the psychometric performance of the proposed models in different test scenarios. Note that we did not intend to compare the performance across the three modeling approaches, because they have their own modeling assumptions, and therefore cannot be compared fairly using the same data-generation mechanism.

Design, data generation, and analysis

For each model, we considered three factors, namely, sample size N = 250 and 500, test length at each time point Ip = 15 and 30, and the number of test time points, P = 3 and 5. We assume the same items were used at each time point, as in the empirical studies. The true item parameters were simulated by referring to those in the literature (e.g., Bolsinova & Tijmstra, 2018; Fox & Marianti, 2016; Man et al., 2019; Wang, Zhang, et al., 2018b; Zhan et al., 2018). Specifically, \({b}_{it}={b}_i\sim N\left({\upmu}_b,{\upsigma}_b^2\right)=N\left(0,1\right)\),\({a}_{it}={a}_i\sim N\left({\upmu}_a,{\upsigma}_a^2\right)=N\left(1,0.05\right)\), \({\upxi}_{ip}={\upxi}_i\sim N\left({\upmu}_{\upxi},{\upsigma}_{\upxi}^2\right)=N\left(4,0.25\right)\), \({\upphi}_{ip}={\upphi}_i\sim N\left({\upmu}_{\upphi},{\upsigma}_{\upphi}^2\right)=N\left(1,0.05\right)\), and \({\upomega}_{ip}={\upomega}_i\sim N\left({\upmu}_{\upomega},{\upsigma}_{\upomega}^2\right)=N\left(2,0.05\right)\).

The person parameters of the COV model were generated using Eq. (3), where μθ = μτ = (0, 0.25, 0.5, 0.75, 1)′. The generated unstructured covariance matrix Σ is presented in Table S13 in the online appendix, and it was generated mimicking the \(\hat{\boldsymbol{\Sigma}}\) in empirical example 1.The person parameters of the LGC model were generated using Eqs. (4) and (5). Four growth factors were generated using Eq. (6), where \({\upmu}_{\uppi_0}={\upmu}_{\updelta_0}=0\), \({\upmu}_{\uppi_1}={\upmu}_{\updelta_1}=0.25\), \({\upsigma}_{\uppi_0}^2={\upsigma}_{\updelta_0}^2=1\),\({\upsigma}_{\uppi_1}^2={\upsigma}_{\updelta_1}^2=0.02\), \({\uprho}_{\updelta_0{\uppi}_0}=0.5\), and the correlations between all the other growth factors were set to 0.2. The residual terms were generated from \({\upvarepsilon}_{\uptheta_{np}}\sim N\left(0,{\upsigma}_{\uptheta_p}^2\right)=N\left(0,0.001\right)\) and \({\upvarepsilon}_{\uptau_{np}}\sim N\left(0,{\upsigma}_{\uptau_p}^2\right)=N\left(0,0.001\right)\). The person parameters of the ACL model were generated using Eqs. (7) and (8), where β01 = λ01 = 0, β0p ≥ 2 = λ0p ≥ 2 = 0.25, β1p = λ1p = 0.95, and β2p = λ2p = 0.1. In Eq. (9), \(\left(\begin{array}{c}{\upvarepsilon}_{\uptheta_{n1}}\\ {}{\upvarepsilon}_{\uptau_{n1}}\end{array}\right)\sim MVN\left(\left(\begin{array}{c}0\\ {}0\end{array}\right),\left(\begin{array}{cc}1& 0.5\\ {}0.5& 1\end{array}\right)\right)\) and \(\left(\begin{array}{c}{\upvarepsilon}_{\uptheta_{np\ge 2}}\\ {}{\upvarepsilon}_{\uptau_{np\ge 2}}\end{array}\right)\sim MVN\left(\left(\begin{array}{c}0\\ {}0\end{array}\right),\left(\begin{array}{cc}0.001& 0\\ {}0& 0.001\end{array}\right)\right)\). Finally, for each of the three joint models, 50 data sets were generated for each simulated test situation. In each data set, the model parameters were regenerated from the same distribution mentioned above. The three joint models and their corresponding separate models were used to fit the data.

To evaluate the estimation accuracy, we computed bias and root mean square error (RMSE) as \(\textrm{bias}\left(\hat{x}\right)=\frac{\sum_{r=1}^{50}{\hat{x}}_r-{x}_r}{50}\) and \(\textrm{RMSE}\left(\hat{x}\right)=\sqrt{\frac{\sum_{r=1}^{50}{\left({\hat{x}}_r-{x}_r\right)}^2}{50}}\), respectively, where xr is the true value of the model parameter generated in replicate r (r = 1, …, 50), and \({\hat{x}}_r\) is the corresponding parameter estimate in replicate r. In addition, the correlations between the generated and estimated values (denoted as Cor) of the model parameters were computed. Other analysis processes were identical to those in the empirical examples, except that the length of each Markov chain was shortened to 4000 iterations, of which the first 1000 iterations were earmarked for burn-in. The PSRFs of all model parameters were less than 1.1, suggesting good convergence under the specified setting.

Results

Table 5 presents the recovery of the item parameters of the three joint models (Table S14 in online appendix presents the recovery of the item parameters of three separate models). Except for the time precision parameter (ω), which is largely unaffected, the recoveries of the other item parameters of the joint model were better than those of the corresponding separate model. For three joint models, it was possible to recover all the item parameters across all simulated conditions, where all the biases were close to zero, RMSEs were less than 0.15 for the RA model parameters and less than 0.1 for the RT model parameters, and all Cor values were close to 1. A larger sample size and a longer test length improved the recovery of the item parameters. However, the test length had a relatively weaker effect than the sample size. In addition, the number of time points had little effect on the recovery of the item parameters.

Table 5 Recovery of item parameters of joint models in simulation study

Table 6 presents the RMSE and Cor values of the person parameter estimates obtained using the three models (Table S15 in online appendix presents the RMSE and Cor values of the person parameter estimates of three separate models). In addition, bias values of the six models are presented in Tables S16 and S17, respectively, in the online appendix. The recoveries of two person parameters of the joint model were better than those of the corresponding separate model. For three joint models, the recovery of latent processing speed was better than that of latent ability. A longer test length improved the recovery of the person parameters. The increase in the number of time points was beneficial for recovery in the ACL model, but it had no effect on recovery for the COV and LGC models. The sample size had little effect on the recovery of the person parameters for all models.

Table 6 Recovery of person parameters of joint models in simulation study

Furthermore, Table S18 in online appendix presents the average computation time for all models under all conditions. Generally speaking, the computation time of the joint model was longer than that of its corresponding separate model, and the computation time increased as the sample size, number of time points, and number of items increased.

Conclusion and discussion

In this study, we proposed three longitudinal joint modeling approaches to simultaneously analyze longitudinal RA and RT data for assessing the parallel interactive development of latent ability and processing speed: the COV, LGC, and ACL models. Analysis results from these models can not only inform the developmental trajectories of latent ability and processing speed individually, but also show the relationship between changes in latent ability and processing speed through the across-time relations of these two constructs. The results of the empirical studies indicate that (1) all three joint models are practically applicable, and their conclusions were highly consistent in terms of the changes in ability and speed in the analysis of the same data set, and (2) additional analysis of RT data and acquisition of individual processing speed measurements can reveal the parallel interactive development phenomena that are difficult to detect when using RA data alone. In other words, without the measures of processing speed, the interaction between latent ability and processing speed cannot be detected during the development process. In addition, the results of the simulation study demonstrate that the proposed Bayesian MCMC estimation algorithm can ensure accurate model parameter recovery for all three proposed longitudinal joint models. In addition, for both empirical examples and simulation studies, we also compared the fit of joint models and of the separate models on one type of data at a time. Results of these comparisons indicate that the joint analysis of bimodal data by considering the relationships among different latent variables can improve model–data fit and model parameter recovery to some extent compared with the separate models for one type of data at a time. On one hand, we do note that this conclusion may depend on the correlation between latent variables. For example, when the correlation between latent ability and processing speed is low, the joint model does not necessarily fit the data better than its corresponding separate model, especially if one uses the relative model–data fit indices which account for the model complexity penalty. On the other hand, the result from the joint models allows us to capture the exact magnitude of the correlation between different latent variables, which is more informative than not knowing or considering the correlation between them.

The choice of the modeling approach depends on the specific context of a longitudinal analysis. The COV model can be used if the primary objective is to determine the overall growth at each time point. The LCG model is useful for obtaining the developmental trajectories of latent ability and processing speed, such as the consistency of or differences in changes between the two constructs. The ACL model is suitable when the objective is to analyze the self-influences of latent ability and processing speed and determine the relationships between the two constructs at adjacent time points. In addition, the LGC model did not fit well to the two empirical data sets in this study. This is mainly due to its strictest assumptions related to the developmental trajectories of the latent constructs (i.e., linear developmental trajectory) compared with the other two models, and these assumptions are difficult to satisfy in practice. By contrast, the COV and ACL models have more lenient assumptions related to the developmental trajectory of the latent constructs, and therefore they are more inclusive of practice data than the LGC model.

The results of this study shed light on the use of both RA and RT data to solve problems in various subdomains of psychology, including clinical psychology and developmental psychology, which rely heavily on the use of RA data alone. For example, the results of common psychiatric scales in clinical psychology tend to be interpreted using RA data alone. Specifically, decisions regarding whether an individual has a certain mental illness or abnormal tendency are made by comparing the individual's scale score with a certain cutoff point. However, the outward manifestations of an individual’s mental illness may not be fully expressed using their choices of Likert-type items. For example, usually, people with depression or depressive tendencies take more time to complete the depression scale compared to people who are not affected by depression. In this sense, the joint analysis of RA and RT data might provide superior diagnoses of mental illnesses. In addition, as has been demonstrated in our empirical data analysis, the joint analysis of longitudinal RA and RT data may reveal a significant change in speed when the change in ability is not significant (example 1). This might provide additional information for assessing the effectiveness of the interventions related to mental health when a repeated measure design is adopted.

Next, we discuss the limitations of our study. First, we have only considered the single-group situation in this study. That is, all individuals within the population are assumed to have homogeneous average developmental trajectory for a specific construct. To address the heterogeneity of developmental trajectories among individuals, we can extend the current models through the multigroup modeling (e.g., von Davier et al., 2011) and mixture modeling (e.g., Muthén & Shedden, 1999; Zhang & Wang, 2019) frameworks in the future. Second, herein, we assume the lack of item parameter drift throughout all analyses. This means that all item parameters (e.g., item difficulty and time-intensity) of the same item are assumed to be invariant over time. However, item parameter drift might be expected in repeated measure design, and in that case, more work would be required to address this issue when generating estimates using the proposed models. Third, although this study provides insights into the measurement of individual growth from a holistic perspective, only two data modalities, namely RA and RT, and the constructs measured using these modalities, are considered. In recent years, with the increasing popularity of technology-enhanced assessments (Jiao & Lissitz, 2018), the acquisition of multimodal data beyond RA and RTs, such as action sequences (Han et al., 2021), eye-tracking (Man et al., 2022; Zhan et al., 2022), and brain activation (Jeon et al., 2021), has become possible. Technology-enhanced longitudinal assessments could be considered in the future to assess the parallel interactive development of multiple constructs (e.g., latent ability, processing speed, visual engagement, and brain activation) through the analysis of longitudinal multimodal data. Fourth, the proposed modeling approaches are longitudinal extensions of the cross-sectional joint-hierarchical modeling approach (van der Linden, 2007), in which separate measurement models are implemented for RA and RT data under conditional independence assumptions. A few recent studies have proposed the joint cross-loading modeling approach (e.g., Bolsinova & Tijmstra, 2018; Molenaar et al., 2015), which attempts to relax the conditional independence assumptions and improve the precision of latent ability estimation through direct extraction of information from the RT data. We will investigate longitudinal extension of the joint cross-loading modeling approach in the future. Fifth, given the focus of the present study on extending the applicability of the joint-hierarchical latent variable modeling approach (van der Linden, 2007), the proposed models draw on the three most commonly used longitudinal modeling approaches in SEM, but several new extensions of these approaches are not considered in the present study (e.g., Bianconcini & Bollen, 2018; Bishop et al., 2015; Hamaker et al., 2015; Kohli & Harring, 2013).

Finally, we discuss several other directions of study that can be explored in the future. First, in terms of the model assumptions related to dimensionality of the latent construct, one route to increase the capability of the proposed longitudinal joint models in terms of describing the interactions of students and items is to hypothesize that persons vary on a wide range of latent constructs. Cognitive science (Frederiksen et al., 1990) shows that subsets of those constructs are important for correct response to specific items. Hence, we could consider modeling the parallel interactive development of multidimensional latent ability (Reckase, 2009) and multidimensional processing speed (Zhan et al., 2021) in multidimensional longitudinal assessments. Another straightforward extension of the proposed models is to incorporate observed covariates (e.g., background variables and number of interventions) to explain the variations in individual growth. For example, for the COV model, the constructs at a specific time point may be regressed upon the observed covariates. For the LGC model, either the growth intercept or the growth slope or both can be regressed upon the observed covariates to explain the variations in initial statuses or growth rates. For the ACL model, the observed covariates can be directly added to the regression model. Finally, dimensionality-reduction methods (e.g., Cai, 2010; Gibbons & Hedeker, 1992) can be investigated and incorporated into the model estimation procedure. This is especially necessary when there are large numbers of latent constructs and time points, which may lead to the high-dimensionality problem in the proposed joint models.