Chapter 16 Conditioning Variables for Marginal Maximum Likelihood Estimation
In this chapter, we will learn about
- What conditioning variables are
- Why we need to use conditioning variables
16.1 Introduction
As the marginal maximum likelihood (MML) estimation method makes an assumption about the (prior) ability distribution, a mis-specification of this prior will result in incorrect inferences. In this chapter, we will carry out simulations to examine the problems caused by mis-specifying the prior distribution, and the methods to mitigate them.
16.2 Effect of Prior distribution on Posterior distribution
In Chapter 13, Tables 13.2 and 13.3 show that the posterior distribution is obtained by multiplying the prior by the item response probabilities. In this way, the prior distribution acts as a set of weights on the item response probabilities. If the prior consists mostly of low-ability students, the posterior will lean towards low abilities. Conversely, if the prior consists mostly of high-ability students, the posterior will lean towards the higher end of the ability scale, even if the item responses stay the same.
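To make this concrete, the following is a minimal sketch (plain R, not TAM code) that computes the posterior on an ability grid for one fixed response pattern under two different priors. The grid, the item difficulties and the response pattern are made up for illustration.

theta <- seq(-4, 4, len=81)            # ability grid
d <- seq(-2, 2, len=10)                # hypothetical item difficulties
x <- c(1,1,0,1,0,0,1,0,0,0)            # hypothetical response pattern
p <- plogis(outer(theta, d, "-"))      # P(correct) at each grid point
lik <- apply(p, 1, function(pr) prod(pr^x * (1-pr)^(1-x))) # likelihood of the pattern
postLow  <- lik * dnorm(theta, mean=-1) # prior centred at -1
postHigh <- lik * dnorm(theta, mean=1)  # prior centred at 1
postLow  <- postLow / sum(postLow)      # normalise to a distribution
postHigh <- postHigh / sum(postHigh)
sum(theta * postLow)                    # posterior mean under the low prior
sum(theta * postHigh)                   # posterior mean under the high prior

Even though the item responses are identical, the two posterior means differ, because the prior weights pull the posterior towards the prior mean.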
16.3 Simulation
The following is a simulation to demonstrate the effect of the prior distribution. The item responses come from two groups of respondents with mean abilities of -1 and 1 respectively. The item responses of the two groups are combined to form one data set, and IRT scaling is carried out ignoring the group membership of each respondent. Plausible values (PVs) are drawn and the group mean abilities are estimated from the PVs.
library(TAM)
#Function to generate Rasch item responses for N students and I items
generateRasch2 <- function(N,I,m){
  theta <- rnorm( N, mean=m ) # student abilities from normal distribution with mean m and var 1
  p1 <- plogis( outer( theta , seq( -2 , 2 , len=I ) , "-" ) ) #item difficulties from -2 to 2
  resp <- 1 * ( p1 > matrix( runif( N*I ) , nrow=N , ncol=I ) ) # dichotomous item responses
  colnames(resp) <- paste("I" , 1:I, sep="")
  return(list(resp=resp,theta=theta))
}
#Generate Rasch item responses for 2000 students and 30 items, for each group
generateData <- generateRasch2(2000,30,-1) #ability distribution with mean=-1
resp1 <- generateData$resp
generateData <- generateRasch2(2000,30,1) #ability distribution with mean=1
resp2 <- generateData$resp
resp <- rbind(resp1,resp2) #combine both sets of item response data
mod1 <- tam.mml(resp) #IRT scaling ignoring group membership
pvs <- tam.pv(mod1) #draw plausible values
pv1 <- pvs$pv[1:2000,] #PVs for group 1
pv2 <- pvs$pv[2001:4000,] #PVs for group 2
m1 <- apply(pv1[,-1],2,mean) #mean of each PV column, dropping the pid column
m2 <- apply(pv2[,-1],2,mean)
mean(m1) #estimated group 1 mean
mean(m2) #estimated group 2 mean
The estimated mean ability is -0.881 for group 1 and 0.892 for group 2. Because both groups are assumed to come from the same ability distribution, the PVs are ‘shrunken’ towards the mean of that distribution, which is 0 in this case, leading to an over-estimate of the group 1 mean and an under-estimate of the group 2 mean. To solve this problem, we need to specify that the respondents come from two different groups, by setting up a group variable that records each respondent’s group membership.
g <- c(rep(1,2000),rep(2,2000)) #group membership
mod2 <- tam.mml(resp,group=g) #scale with group membership specified
pvs <- tam.pv(mod2)
pv1 <- pvs$pv[1:2000,]
pv2 <- pvs$pv[2001:4000,]
m1 <- apply(pv1[,-1],2,mean)
m2 <- apply(pv2[,-1],2,mean)
mean(m1)
mean(m2)
The estimated mean ability is -0.004 for group 1 and 1.894 for group 2 when we specify the respondents’ group membership. Note that when we have more than one group, the identification constraint is set so that the mean of the first group is zero. The key point is that the difference between the two group means is now recovered well: 1.894 − (−0.004) ≈ 1.9, close to the true difference of 2.
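As a quick check of that last point, we can compute the difference of the estimated group means directly from the PV summaries above (a one-line sketch reusing m1 and m2):

mean(m2) - mean(m1) #estimated group-mean difference; close to the true value of 2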
16.4 JML recovers group means well
If we use the JML estimation method, which makes no assumption about the population distribution, the group means are recovered well.
#JML
mod_jml <- tam.jml(resp)
m1 <- mean(mod_jml$WLE[1:2000]) #group 1 mean of WLE ability estimates
m2 <- mean(mod_jml$WLE[2001:4000]) #group 2 mean of WLE ability estimates
The estimated mean ability is -0.966 for group 1 and 0.962 for group 2. The group means are recovered well. However, the variance estimates will still be affected by measurement error, as discussed in Chapter 13.
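For example, we can inspect the WLE variances directly (a sketch reusing mod_jml from above; the expectation that these exceed the generating variance of 1 follows from the Chapter 13 discussion that observed score variance includes measurement error variance):

#WLE variance includes measurement error variance, so it tends to
#over-estimate the true ability variance of 1 (see Chapter 13)
var(mod_jml$WLE[1:2000]) #group 1
var(mod_jml$WLE[2001:4000]) #group 2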
16.5 Latent Regression
The group variable can be used when there are discrete groups, but sometimes the conditioning variable is continuous. For example, the SES (socio-economic status) measure is on a continuous scale. Suppose each unit of the SES measure increases the mean ability by 0.3. That is, students with an SES measure of 0 come from an ability distribution with mean 0, and students with an SES measure of 1 come from an ability distribution with mean 0.3. Since there are infinitely many possible SES values, there are infinitely many ability distributions for the students.
In this case, we use a “regressor”, specified in TAM through the Y argument. The following is an example where a continuous variable related to ability is added to the model.
library(TAM)
generateRasch3 <- function(N,I){
  theta <- rnorm( N ) # student abilities from normal distribution with mean 0 and var 1
  y <- rnorm(N) #covariate/regressor
  theta <- theta + 0.3*y #add regressor effect to ability
  p1 <- plogis( outer( theta , seq( -2 , 2 , len=I ) , "-" ) ) #item difficulties from -2 to 2
  resp <- 1 * ( p1 > matrix( runif( N*I ) , nrow=N , ncol=I ) ) # item responses
  colnames(resp) <- paste("I" , 1:I, sep="")
  return(list(resp=resp,theta=theta,y=y)) #return the covariate as well
}
generateData <- generateRasch3(2000,30)
resp <- generateData$resp
y <- matrix(generateData$y,ncol=1) #the covariates are stored in a matrix

mod4 <- tam.mml(resp,Y=y) #latent regression model
mod4$beta #the regression coefficient is directly estimated
pvs <- tam.pv(mod4)
pv <- pvs$pv[,-1] #drop the pid column
r <- apply(pv,2,function(x){lm(x~y)$coefficients[2]}) #regress each PV on y
mean(r) #recover regression coefficient from PVs
The latent regression coefficient is directly estimated and the results are stored in mod4$beta:
mod4$beta
## [,1]
## [1,] 0.0000000
## [2,] 0.3107401
We can also “recover” the regression coefficient using PVs.
mean(r)
## [1] 0.3026608
What the regression coefficient means is that a student with an SES measure of x comes from an ability distribution with mean 0.3x, given that the ability distribution has mean 0 when the SES measure is 0.
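As a small worked illustration, we can read the estimated coefficient out of mod4$beta and compute the implied conditional means (a sketch; the SES values here are hypothetical):

b <- mod4$beta[2,1] #estimated regression coefficient, about 0.31
ses <- c(-1, 0, 1, 2) #hypothetical SES measures
b * ses #implied means of the corresponding ability distributions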
16.6 Conditioning variables
The group variables and the latent regressors are called conditioning variables. Since there are many possible grouping variables and regressors, when the item response data are scaled, we need to specify all the regressors we are interested in, so that any group results we report are correct. In PISA, many student background questionnaire variables are used as conditioning variables. The process is described in the PISA 2018 technical report, Chapter 9, p. 21.
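For instance, several conditioning variables can be supplied together as columns of the Y matrix. The following sketch uses simulated covariates invented purely for illustration (ses and gender are not variables from this chapter’s data-generating model, so their coefficients will be near zero here):

set.seed(1)
ses <- rnorm(nrow(resp)) #hypothetical continuous background variable
gender <- rbinom(nrow(resp), 1, 0.5) #hypothetical binary background variable
Y <- cbind(ses, gender) #all conditioning variables in one matrix
mod5 <- tam.mml(resp, Y=Y) #latent regression on all conditioning variables
mod5$beta #one regression coefficient per conditioning variable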
16.7 Summary
To summarise, using the MML estimation method solves many problems due to measurement errors and complex sampling, but if we are interested in subgroup results, the subgroup memberships must be included as conditioning variables in the scaling model.
16.8 Homework
Simulate item responses for two groups of students where the first group has mean ability of -1 and the second group has mean ability of 1. Scale the IRT model without specifying group membership. Explore the over- and under-estimations of the group means as a function of the number of items in the test.
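A possible starting point for the homework is the following sketch, which assumes generateRasch2 from Section 16.3 is already defined; it runs one replication per test length, so add replications for more stable results.

for (I in c(10, 20, 40, 80)) { #vary the number of items
  d1 <- generateRasch2(2000, I, -1) #group 1, mean ability -1
  d2 <- generateRasch2(2000, I, 1) #group 2, mean ability 1
  resp <- rbind(d1$resp, d2$resp)
  mod <- tam.mml(resp, verbose=FALSE) #scale ignoring group membership
  pvs <- tam.pv(mod)
  m1 <- mean(as.matrix(pvs$pv[1:2000, -1])) #estimated group 1 mean
  m2 <- mean(as.matrix(pvs$pv[2001:4000, -1])) #estimated group 2 mean
  cat("items:", I, " group 1 mean:", round(m1,3), " group 2 mean:", round(m2,3), "\n")
}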