Tuesday, February 23, 2021

Editor’s note: This blog post is part of an ongoing series entitled “Technically Speaking.” In these posts, we explain in accessible terms the highly technical principles that we use in reading research. We want to improve busy practitioners’ and family members’ abilities to be good consumers of reading research and to deepen their understanding of how our research operates to provide the best information.

As humans, we are predisposed to make causal statements. Often, we believe (or want to believe) that one action caused another. Here are some examples of unjustified causal statements that might be made in our everyday lives:

Unjustified Causal Statement 1: I was late to work because I took a different route, and the traffic was terrible!

Reality: We actually do not know whether the traffic also would have been terrible if the person had taken the typical route to work. There may have been something else happening in town that day that caused traffic to be terrible everywhere.

Unjustified Causal Statement 2: I did well on the exam because I wore my favorite t-shirt.

Reality: Wearing a favorite t-shirt probably does little to nothing to influence the outcome of an exam. It may provide a sense of comfort or confidence that alleviates test anxiety. However, the student likely took other steps to prepare for the exam, besides remembering to wear the favorite t-shirt, that influenced the student’s grade.

Causal inference is a combination of logical arguments and statistical methods. There are several different frameworks for causal inference. Although there are some clear similarities among the frameworks, there also are some key distinctions. For example, some frameworks postulate that causal inference is not possible without manipulating the variable(s) to which a cause will be attributed, while other frameworks do not require manipulation of the variable(s). Despite these differences, most—if not all—frameworks agree that a causal relationship exists if these three conditions are met:

  1. The cause precedes the effect (e.g., the instruction was delivered before a test was administered).
  2. The cause was related to the effect (e.g., the instruction was targeting reading skills that were being tested).
  3. No alternative explanation exists for the effect other than the cause (e.g., the student did not get new glasses that corrected a problem with seeing the print, the test was not made easier than tests the student took previously).

In reading experiments, there can be a little ambiguity about these three conditions. To be a true experiment, the units (e.g., students, classrooms, or schools) must be randomly allocated to a condition, meaning a probabilistic model is used to determine whether the unit will receive the treatment (i.e., the variable being investigated as a cause). The simplest way to think about this process is to imagine the flip of a coin. If the coin lands on heads, a unit is assigned to the new reading program. If the coin lands on tails, the unit is assigned to receive the customary reading instruction.

In practice, researchers do not flip coins to make these decisions. Rather, they write computer code that allows for assigning units to multiple conditions with equal or unequal probabilities, depending on the design of the experiment. After randomly allocating units to conditions, researchers verify that the groups receiving the different reading programs are comparable in expectation, or similar on average, on important observable characteristics. For example, assume it is believed that gender or socioeconomic status may determine how students respond to a reading program or perform on the outcome measure. Once students are randomly allocated to either the new reading program or the typical instruction, the means (or proportions) of males and of students receiving free or reduced-price meals in each group must be compared to determine whether the groups’ compositions are similar on those important characteristics. There are other, more elaborate experimental designs (e.g., randomized block designs) that ensure key characteristics are balanced in the random allocation process. In every case, the purpose is to rule out other plausible explanations for the outcomes.
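The allocation and balance check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in any particular study: the roster of 200 students, the 40% free or reduced-price meal rate, the seeds, and the function names assign_conditions and proportion_by_group are all hypothetical.

```python
import random


def assign_conditions(unit_ids, p_treatment=0.5, seed=2021):
    """Assign each unit independently: 1 = new program, 0 = typical instruction."""
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    return {unit: int(rng.random() < p_treatment) for unit in unit_ids}


def proportion_by_group(assignment, covariate):
    """Share of units with covariate == 1 inside each condition."""
    shares = {}
    for condition in (0, 1):
        members = [u for u, w in assignment.items() if w == condition]
        if members:
            shares[condition] = sum(covariate[u] for u in members) / len(members)
        else:
            shares[condition] = float("nan")  # empty group: nothing to compare
    return shares


# Hypothetical roster: 200 students, about 40% receiving free or reduced-price meals.
roster_rng = random.Random(7)
students = list(range(200))
free_meals = {s: int(roster_rng.random() < 0.4) for s in students}

assignment = assign_conditions(students)
balance = proportion_by_group(assignment, free_meals)
print(balance)  # the two proportions should be close, though rarely identical
```

Changing p_treatment gives unequal allocation probabilities. The sketch assigns each unit independently, like a separate coin flip, so group sizes can differ; designs that fix the group sizes or balance within blocks would need a different allocation routine.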

Potential Outcomes

In causal inference, we ideally would like to know what would have happened if a student exposed to the new reading program (treatment) had instead been exposed to the typical reading instruction (control). An individual causal effect is the difference between what happened to a participant in the treatment condition and what would have happened to the same participant if he or she had instead been exposed to the control condition. Until we are able to travel across time, it is not possible to expose the same unit to two treatment conditions without altering (or modifying) that unit. This is known as the fundamental problem of causal inference (Holland, 1986).

Potential outcomes, also known as the Rubin causal model (Rubin, 1974, 2005), provide a framework to understand this key component. Back to our example experiment, before a student randomly assigned to receive the treatment is exposed to that new reading program, there are at least two potential outcomes for that student. Potential outcomes make clear that only one outcome will be observed after the exposure—the change in the student’s reading performance being attributed to the new reading program. The other outcome (i.e., exposure to the typical reading instruction) is not observed and, as such, is a “potential.” In this framework, the same student receiving a different treatment at a different time would be considered a different unit. The non-observable outcome is called the counterfactual (i.e., relating to or expressing what has not happened or is not the case at the present time for a given unit).
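In symbols, these ideas can be sketched as follows. The notation is an assumption made for illustration: i indexes units, Y_i(1) and Y_i(0) are the potential outcomes under treatment and control, W_i equals 1 if unit i receives the treatment and 0 otherwise, and \tau_i is simply a label introduced here for the individual causal effect.

```latex
\tau_i = Y_i(1) - Y_i(0),
\qquad
Y_i^{\mathrm{obs}} = W_i \, Y_i(1) + (1 - W_i) \, Y_i(0)
```

Because only Y_i^obs is ever recorded, one of the two terms in \tau_i is always missing for a given unit. That missing term is the counterfactual.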

Causal Inference Demonstrates the Importance of Random Allocation of Units

When random allocation is not used in a study, units may be purposefully allocated to conditions. In that case, the simple comparison of average scores between groups may not produce an unbiased estimate of the treatment effect. In other words, it might not be possible to infer that the reading program caused students’ reading outcomes. As an example, let us say there are four students, each assigned to one of two reading programs: Program 1 and Program 0. In the example below (see Table 1), we intentionally pretend that each student has been assigned to the program that produces a higher reading score for that student. We are doing this to demonstrate the importance of the assignment mechanism.

Table 1. Reading Scores for Program 1 and Program 0

Unit (Student)     Y(1)    Y(0)    Y(1) - Y(0)    W
1                  11      16*     -5             0
2                  13      22*     -9             0
3                  19*     18      1              1
4                  21*     20      1              1
Population mean    16      19      -3
Observed mean      20      19

Note. * = observed scores; Population mean = the mean score of all students, regardless of whether they were assigned to that program; Observed mean = the mean score of the students assigned to that program, which is computed using only the scores with an asterisk.

In the table, Y represents the reading score after the student is exposed to the reading program. The column “Y(1)” displays the potential outcomes, or reading scores, under reading Program 1. The column “Y(0)” displays the potential outcomes, or reading scores, under reading Program 0. Recall that, in practice, we would observe a single set of scores for each unit or student because each would be assigned to participate in only one of these programs. Thus, within these columns, values with an asterisk represent the potential outcome for the observed reading program in which the student participated (as indicated in the column “W”). If column “W” has a value of 0, the student received Program 0. A value of 1 in column “W” indicates that the student received Program 1. For example, Student 1 was allocated to the group that received Program 0, and, after the intervention was completed, the reading score (i.e., the outcome) for that student in Program 0 was 16. If the student had been allocated to Program 1 instead, we will pretend that the student’s reading score would have been 11. However, we would have observed only the score under a single condition, Program 0 in this case, so a causal effect for an individual student is not estimable. Thus, values in the "Y(1)" and "Y(0)" columns without an asterisk represent what in reality would be the “missing data” (the reading scores that were not observed) for the reading program in which the student did not participate. In the table, these scores are not shown as missing because we are pretending that a student could actually be in the other program at the same time.

For our example, the unit causal effect is defined as the reading score after exposure to reading Program 1 minus the reading score observed after exposure to reading Program 0, as shown in the column “Y(1) – Y(0).” In this hypothetical scenario, we could compute the causal effect for each individual student. By looking at the information in column “Y(1) – Y(0)” together with the information in column "W” identifying the assigned reading program, we could see that each student was allocated to the program for which the student received a higher reading score. In other words, Student 1 received Program 0, and the “Y(0)” outcome score is higher for this student than his or her “Y(1)” score. The same is true for Student 2 (the other student assigned to Program 0). The two students assigned to Program 1 (Students 3 and 4) received a higher outcome score in Program 1 than Program 0. Finally, the population mean for each condition is computed using all the scores in column “Y(1)” and “Y(0),” respectively. The observed means used only the observed scores, or those marked with an asterisk to indicate the program in which the student actually participated.
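The summary rows of Table 1 can be reproduced with a short Python sketch. The values come straight from the table; the variable names are chosen only for this illustration.

```python
from statistics import mean

# Rows of Table 1: (student, Y(1), Y(0), W), where W marks the program received.
table = [
    (1, 11, 16, 0),
    (2, 13, 22, 0),
    (3, 19, 18, 1),
    (4, 21, 20, 1),
]

# Unit causal effect for each student: Y(1) - Y(0).
unit_effects = [y1 - y0 for _, y1, y0, _ in table]

# Population means use every score in a column, observed or not.
population_mean_y1 = mean(y1 for _, y1, _, _ in table)
population_mean_y0 = mean(y0 for _, _, y0, _ in table)

# Observed means use only the scores marked with an asterisk in the table,
# that is, the score under the program the student actually received.
observed_mean_y1 = mean(y1 for _, y1, _, w in table if w == 1)
observed_mean_y0 = mean(y0 for _, _, y0, w in table if w == 0)

print("Unit effects:", unit_effects)
print("Population means:", population_mean_y1, population_mean_y0)
print("Observed means:", observed_mean_y1, observed_mean_y0)
```

Only the last pair of numbers could be computed in a real study, because the scores without an asterisk would never be recorded.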

Recall that in practice, we would never directly observe both potential outcomes for the same student because a student can participate in only one program at a time. In our example, each student achieved a better score in the assigned program than he or she would have achieved in the other program. In other words, each student purposefully was assigned to the “better” reading program for that individual. Note that just for the purposes of this example, we are pretending that it is known which reading program is “better” for each student. But in a real research study, researchers would not talk about which program would be better for each student because all the information available about both programs would be based on group averages.

What general conclusions do the data support? Looking at the difference between the observed means, the average reading score for those students taught with Program 1 (20) is 1 point higher than for those taught with Program 0 (19). It may seem obvious that reading Program 1 is superior for the typical student, which the four students in the table represent, but this conclusion is wrong. The population means indicate that the typical causal effect favored reading Program 0: the average of Y(1) - Y(0) is -3, a 3-point advantage for Program 0. When random allocation is used, students are not purposefully assigned to the program that is considered “better” for them, as was the case in our example. This illustrates that, in general, simply comparing observed values across treatment conditions gives a trustworthy estimate of the causal effect only when students are randomly allocated to conditions. If the assignment is not random, any analysis of the observed data must take this into account.
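To see why random allocation matters, the Python sketch below uses the Table 1 scores and lists every way of assigning two of the four students to Program 1. One simplifying assumption is not stated in the post: exactly two students receive each program, and every such assignment is equally likely. The function and variable names are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Potential outcomes from Table 1, keyed by student.
y1 = {1: 11, 2: 13, 3: 19, 4: 21}  # scores under Program 1
y0 = {1: 16, 2: 22, 3: 18, 4: 20}  # scores under Program 0
students = sorted(y1)


def difference_in_observed_means(treated):
    """Observed mean under Program 1 minus observed mean under Program 0."""
    control = [s for s in students if s not in treated]
    return mean(y1[s] for s in treated) - mean(y0[s] for s in control)


# Average causal effect, computable only because we pretend to know both columns.
true_average_effect = mean(y1[s] - y0[s] for s in students)

# The purposeful assignment used in Table 1: Students 3 and 4 receive Program 1.
purposeful_estimate = difference_in_observed_means((3, 4))

# Every assignment that places exactly two students in Program 1.
all_estimates = {
    treated: difference_in_observed_means(treated)
    for treated in combinations(students, 2)
}
average_over_randomizations = mean(list(all_estimates.values()))

print("Average causal effect:", true_average_effect)
print("Estimate under the purposeful assignment:", purposeful_estimate)
print("Average estimate across all random assignments:", average_over_randomizations)
```

Any single random assignment can still miss the average causal effect, sometimes by several points with only four students. Across all six possible assignments, however, the estimates average out to it, whereas the purposeful assignment always produces the same misleading answer.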

Ignorability

A related concept in causal inference is ignorability. In an experiment, the assignment of units (e.g., students, classrooms, schools) to treatment conditions should be independent of the potential outcomes (i.e., not based on what is the “better” program for the student, as was done in our previous example) after controlling for key covariates, that is, after using a statistical model to account for them. Covariates are characteristics that are known to be related to students’ performance on the outcome measure, such as race, socioeconomic status, and language background. The ignorability assumption implies that if the key covariates cannot be controlled, the observed relationship represents correlation, not causation. In other words, it is not possible to make a causal inference that the reading program was responsible for students’ reading performance because it is also plausible that their performance was due to those covariates (e.g., race, socioeconomic status, language background).
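One common way to write this assumption down is shown below. The notation is an assumption made for illustration: W_i is the condition assigned to unit i, Y_i(1) and Y_i(0) are that unit's potential outcomes, and X_i stands for the key covariates. The second line is an additional condition often stated alongside ignorability, namely that every unit has some chance of receiving either condition.

```latex
\bigl( Y_i(1),\, Y_i(0) \bigr) \;\perp\; W_i \mid X_i
\\
0 < \Pr(W_i = 1 \mid X_i) < 1
```

In a randomized experiment the first line holds by design, because the allocation depends only on the random mechanism. In the purposeful assignment from the earlier example it fails, because W was chosen by looking at the potential outcomes themselves.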

Thus, it would be important to consider how students ended up receiving a particular reading program and what mechanism influenced those decisions. In fact, this is the most important consideration in a causal inference study.

You may come across statements related to this concept when reading a research article. Below are some explanations of the terms you might encounter related to ignorability:

  • When a statistical model is referred to as “endogenous,” it means that the ignorability assumption was not satisfied (i.e., there are variables that were not measured or controlled for).
  • When a statistical model is “not identified,” it means that the model estimated does not represent the “true” model of how the variables relate to each other and to students’ outcomes.

In classical randomized experiments, each student has an equal probability of receiving each treatment. When the goal is to attribute a causal effect to a reading program, and random allocation is not an option, we need to have a good understanding of how different student characteristics or other variables relate to the reading program, the allocation of students to conditions, and the outcome being measured.

Final Thoughts

Observational designs (i.e., designs that lack random allocation) usually do not rule out alternative explanations for a relationship between two variables; therefore, the relationship may not be causal. Rather, it may be due to a third variable or confound. To make a causal inference statement, the independent variable (the reading program in our examples) is manipulated in the different groups, and all other variables that might affect the outcome are held constant. If certain events or attributes cannot be manipulated for all students in the study to see what happens when students are exposed to them or not, then they cannot be considered causes in experiments. It is much harder to discover the effects of non-manipulable causes. However, there are other causal inference frameworks that allow for causal statements without manipulation. Typically, those frameworks rely on strong causal stories (using theory and previous evidence) to understand what variables need to be included in the statistical models. But this topic deserves its own post in the future.

References

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960. https://doi.org/10.2307/2289064

Rubin, D. (2005). Causal inference using potential outcomes. Journal of the American Statistical Association, 100, 322–331. https://doi.org/10.1198/016214504000001880

Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701. https://doi.org/10.1037/h0037350