Chapter 15 Using Plausible Values to Analyse PISA data

In this session, we will learn about
- How to download PISA data sets and read the data into R.
- How to extract data from the whole data set.
- The meaning of sampling weights and replicate weights.
- How to use plausible values to calculate population estimates.
- How to calculate standard errors of the estimates.

15.1 Introduction

In the previous two chapters, we have used simulation to demonstrate the differences between various ability measures and the use of plausible values. In this chapter, we will carry out data analysis for a ‘real’ data set rather than a simulated data set. In particular, we will look at the PISA data sets for the 2018 test administration.

15.2 Download PISA data

OECD provides links to PISA data sets in either SAS format or SPSS format. The files are very large, so you need to have a reliable internet connection and ample disk space on your computer. For this chapter, we will look at the 2018 student background questionnaire data which includes the plausible values for the cognitive dimensions such as Reading, Mathematics and Science.

Search for “PISA 2018 database” in your web browser, or click this link. The file we need for the exercises in this chapter is under “SPSS Data Files” called “Student questionnaire data file.” The file size is 489 MB, so it will take a while to download the file. Once you have downloaded the file, you will need to unzip (un-compress) the file to extract the data file. The data file, called “CY07_MSU_STU_QQQ.sav,” is 1.8 GB in size. You will need to have storage space on your computer to store this file.

While the data file we download is an SPSS .sav file, we do not need SPSS to process it. R can read in the SPSS file.

15.3 Reading in SPSS .sav file in R

You will need the R package, foreign, to read in the SPSS file into R. So make sure you have installed the package, foreign, first.

The following R code is used to read in the SPSS file we have downloaded. You will need to change the working directory to the folder where you have stored the file. Note that as the file is very large, it can take up to several minutes for R to read this file.



pisa2018 = read.spss("CY07_MSU_STU_QQQ.sav",use.value.labels=FALSE,

Under the “Environment” tab of R Studio, we can see that the object pisa2018 has 612004 observations of 1119 variables!!

It will be better to extract the relevant data from pisa2018, so we will not be reading through the entire data set each time. The following R code extracts Australian data. Note that the variable “CNT” contains three-letter abbreviations of country names. To see the abbreviations for country names, download the CODEBOOK file from the PISA data base site.

AUS <- pisa2018[pisa2018$CNT=="AUS",]

Before we explain about how to use plausible values to estimate population statistics, we will first explain the terms “sampling weights” and “replicate weights.”

15.4 Sampling weights and Replicate Weights

First, it should be noted that sampling weights and replicate weights are two completely separate concepts. Although in usage, sometimes a little confusion occurs, as the replicate weights provided in the PISA data base incorporate the sampling weights.

15.4.1 Sampling weights

When students are sampled from a population, each student in the sample represents a number of students in the population. If the sample is a simple random sample, then each student will have the same sampling weight. For example, if there are 10000 students in the population, and 1000 students are chosen in a sample randomly, then each student in the sample represents 10 students in the population. So the sampling weight for each student is 10. Since the sampling weight is the same for all students, we do not need to use the sampling weights for calculating a population estimate.

In large-scale assessments, the sampling process is complex. Typically, a stratified sampling frame is constructed to represent different geographical regions, sectors of the communities, and many other student/school background variables, so that the sample chosen will be more representative of the population. In addition, a two-stage sampling is used where schools are sampled first, and the students/classes within the sampled schools are chosen. There are some practical reasons for this approach. For example, it will be difficult to construct a sampling frame for all eligible students in a country, but it will be easier to have a sampling frame of all the schools. This two-stage sampling approach adds some complications to the computation of population estimates and their standard errors. In the data releases, each student’s sampling weight is provided, so we must weight each student’s variable value by their sampling weights before calculating any aggregated statistics. In PISA, student sampling weights are stored in a variable called W_FSTUWT. The following shows the sampling weights for the first 16 Australian students in the PISA 2018 database.

##  [1] 21.78073 23.81628 23.81628 23.81628 23.81628 21.78073 21.48297 21.78073
##  [9] 21.78073 23.81628 21.78073 21.48297 21.78073 21.78073 23.81628 21.78073

If we want to compute the mean estimate of a variable, we need to (1) multiple each student’s value on the variable by their sampling weight, (2) add these up, and (3) divide by the sum of the weights. The result is a weighted mean estimate. The following R code shows the computation of the mean value for the variable “Instructional Minutes in Science” (variable name: SMINS). To calculate the mean minutes per week of science instructional time in Australia, we need to first remove missing data, and then compute a weighted average, as shown in the R code block below.

minutes <- AUS$SMINS  #Number of Science instructional minutes per week
missing <- which(  #Find students with missing value
minutes <- minutes[-missing] #Remove students with missing values
weight <- AUS$W_FSTUWT[-missing]  #Remove the weights for students with missing values
ave_minutes <- sum(minutes*weight)/sum(weight) #Calculate weighted mean
## [1] 209.0345

The estimated average number of Science instructional minutes per week in Australian schools is 209 minutes.

As the above R code block shows, it is relatively straightforward to calculate weighted statistics given the sampling weights. However, the computation of the standard errors of the estimates is more complicated in concepts, particularly when the sample is not a simple random sample. The good news, however, is that the commands needed in R are still quite simple.

15.4.2 Replicate weights and standard errors

If a simple random sample is chosen, the standard error of the sample mean is just the standard deviation divided by the square root of the sample size. This is simple to compute. However, when we have two-stage sampling, the standard error of a sample statistic is much larger than for simple random sample, because students from the same school tend to be more similar, so we “waste” some data by testing similar students. Ideally,we would want to have a sample that is “diverse” to represent the population. This may be called the “design effect” which quantifies the loss of efficiency in the sample chosen because students are selected in “clusters.”

The standard error is also called the sampling error. It is the possible variation in the point estimate when different samples are drawn. Since we only have one sample, and we cannot collect more samples for the sake of calculating the sampling error, we need to have other ways of estimating the sampling error from just the one sample we have. As mentioned before, for simple random samples, there is a formula for calculating the sampling error of mean estimates from one sample. But for complex sampling, we cannot use this formula, as the sampling error will depend on the sampling design - that is, we do not have a simple formula that works for all sampling designs.

One way to emulate the process of repeated sampling is to alter the composition of the sample we currently have, and regard these altered samples as samples we could possibly draw, without actually going out and sample the real data. PISA uses a method of changing the weights of each school in our sample, and regarding these samples with changed weights as new samples. The method is called Balanced Repeated Replication Method (BRR). The changed weights for schools, and therefore for the students, are called replicate weights. To emulate many samples so we can calculate the sampling error, there are 80 sets of replicate weights. To calculate the sampling error, we need to compute a statistic using each set of the replicate weights, and then calculate the sampling variance (square of the sample error) using a formula. The following shows the formula for combining results from the 80 replicates of a statistic to compute the sampling variance.

\(\sigma^2_{T} = \frac{1}{20}\sum\limits_{r=1}^R (T_r-T)^2\)

where \(T\) is the estimate calculated in our original sample (with student weights, but not replicate weights), and \(T_r\) is the estimate using the rth set of replicate weights and the student weights. In PISA, the replicate weights already incorporated student sampling weights, so we can just multiply the variable value by the replicate weights.

The sampling error is the square root of the sampling variance.

Sampling error \(=\sqrt{\sigma^2_{T}}\)

While the methodology of computing the standard error seems complex, the R code for this procedure is remarkably simple. The following shows the R code for calculating the standard error of the variable “Instructional time in Science” in Australia. Note that in the PISA data base, the replicate weights have variable names W_FSTURWT1 to W_FSTURWT80, and these have already been multiplied by the student sampling weights.

rep <- subset(AUS,select=(W_FSTUWT:W_FSTURWT80))  #Extract sampling weights and replicate weights. 
rep <- rep[-missing,]  #Remove the weights for students with missing values 
min81 <- apply(rep,2,function(x){sum(minutes*x)/sum(x)}) #Calculate mean for each set of replicate weights.
# The first value in min81 is the mean with sampling weights but not replicate weights.
sig2 <- 0.05*sum((min81-min81[1])^2) #Combine 80 estimates to form sampling variance.
sqrt(sig2) #This is the standard error.
## [1] 1.511606

The standard error for the number of Science instructional minutes in Australia is 1.5.

15.5 Using Plausible Values to compute mean estimates and standard errors

In the above example for the variable “Science Instructional Minutes,” there is one value for each student (except for the missing responses). These values are assumed to be without measurement error. In the computation of standard errors, we use these values as if they are “true” observations for each student. However, when we want to estimate student abilities, the measurement errors associated with ability measures are too large to ignore. For this reason, plausible values are provided as we explained in Chapter 14. To calculate mean ability estimate, we need to aggregate the results from all sets of plausible values. To calculate the standard errors of ability estimates, we need to first calculate sampling error using replication methods for each set of plausible values, average these, and combine the sampling error with the measurement error, as shown in Eq.(14.1). The following shows the sequence of steps of using plausible values to calculate a mean estimate.

  1. Compute weighted mean for each set of PVs: \(T_1, T_2, ..., T_D\), where \(D\) is the number of PV’s per student. Use student sample weights (W_FSTUWT) for this computation.
  2. Average the \(D\) \(T_i\)’s. Call this value \(T\). This is the mean value to report.
  3. Calculate the sample variance of the \(D\) \(T_i\)’s. Call this value \(v_m\).
  4. For each set of PVs, compute sampling variance using 80 replicate weights as outlined above. We will have \(D\) sampling variances: \(\sigma_1^2,\sigma_2^2,..., \sigma_D^2\) .
  5. Average the \(D\) \(\sigma_i^2\)’s. Call this value \(v_s\).
  6. Combine sampling and measurement variances using Eq.(14.1).
    \(v = v_s + (1+\frac{1}{D})v_m\)
  7. Take the square root of \(v\). This is the standard error of \(T\).

While the above steps seem complicated, the R code is quite simple, thanks to the power of the R apply function. With two apply functions nested together, with one cycling through the 80 replicates and one cycling through the 10 PVs, the computation is simple. The following R code shows the computation of mean mathematics score for girls in Australia, with the standard error.

#Math score for Australian girls

AUSg <- AUS[AUS$ST004D01T==1,]  #Gender variable ST004D01T, Code 1 for girls.
MathPV <- subset(AUSg,select=(PV1MATH:PV10MATH)) #Extract PVs for mathematics
rep <- subset(AUSg,select=(W_FSTUWT:W_FSTURWT80))  #Extract sampling weights and replicate 
means81 <- apply(MathPV,2,function(x){apply(rep,2,
T <- mean(means81[1,]) #point estimate of mathematics ability
v_m <- var(means81[1,]) #measurement variance
sig2 <- apply(means81,2,function(x){0.05*sum((x-x[1])^2)})
v_s <- mean(sig2) #sampling variance
se <- sqrt(v_s+(1+1/10)*v_m) #standard error

The estimated PISA mathematics mean score for Australian girls is 488.3, and the standard error is 2.46.

15.6 Homework

Calculate the mean Science score for Australian boys, with standard error.