Thursday, September 12, 2024

This blog post is part of our Research Article of the Month series. For this month, we highlight “Universal Screening in Grades K–2: A Systematic Review and Meta-Analysis of Early Reading Curriculum-Based Measures,” an article published in the Journal of School Psychology in 2020. Important words related to research are bolded, and definitions of these terms are included at the end of the article in the “Terms to Know” section.

Why Did We Pick This Paper?

A universal screener is a tool teachers use to identify students who may be at risk for literacy difficulties. It is an assessment given to all students several times a year to identify which students are below, at, or above a certain benchmark at a specific point in time. Screening helps teachers identify students who could benefit from additional support or accelerated instruction. It is especially important in early elementary grades so potential difficulties with reading can be identified and addressed as early as possible. 

One kind of assessment used for universal screening is a curriculum-based measure (CBM). CBMs are short, timed assessments that track an individual student’s progress toward a learning goal. Some CBMs measure a specific skill associated with reading ability, such as word identification or letter-sound knowledge. For example, a CBM that assesses letter-sound knowledge might ask a student to say the sound associated with each letter in a list.

Given the prevalence of CBM assessments in universal screening, it is important to understand the validity of these assessments—the extent to which these assessments measure the skill they are designed to measure. This includes concurrent validity (the extent to which student performance on one assessment is confirmed by their performance on another assessment designed to measure the same skill) and predictive validity (the extent to which student performance on one assessment predicts their performance on another assessment administered at a later time). Understanding the validity of CBMs can help educators select universal screening measures that accurately identify students who would benefit from additional support in reading. 

What Are the Research Questions or Purpose?

The researchers aim to evaluate the validity of early reading CBMs administered in Grades K–2. The purposes of the study are as follows:

  • Examine the concurrent validity of early reading CBMs (i.e., how strongly CBM performance correlates with performance on other reading measures administered at roughly the same time)
  • Examine the predictive validity of early reading CBMs (i.e., how strongly CBM performance predicts performance on reading measures administered later)
  • Investigate variables, such as the administration lag between measures, that may moderate these relationships

Note: The authors also intended to assess the classification accuracy of CBMs, but the majority of articles in this meta-analysis did not report the data necessary for this analysis, so the authors were unable to address this purpose of the study. To learn more about the classification accuracy of screeners, read our January 2024 Research Article of the Month.

What Methodology Do the Authors Employ?

The authors conducted a meta-analysis of 54 empirical studies that examined the relationship between early reading CBMs and other measures of reading outcomes. To be included in the analysis, the studies needed to:

  • Include an early reading CBM assessment as a predictor of a reading outcome
  • Focus on students in Grades K–2
  • Report correlation coefficients or the figures necessary to calculate them
  • Specify the timeframe in which the measures were administered
  • Occur in a general education classroom
  • Be published in English

For each of the included studies, researchers examined the students’ performance on a CBM and another student reading outcome measure (e.g., broad reading achievement, reading comprehension and vocabulary, language and listening, oral reading, phonics, or phonological and phonemic awareness).

Researchers also took into account other variables in the studies that could affect students’ reading outcomes. These variables included:

  • Student demographics
    • Gender
    • Grade
    • Race and ethnicity
    • Language status
    • Free and reduced lunch status
    • Special education status
  • Screener information
    • Publisher (FastBridge, DIBELS, easyCBM, aimsweb)
    • Measure type (onset sounds, letter names, phoneme segmentation, word identification, nonsense words)
    • Screening seasons (fall, winter, spring)
  • Reading outcome measure (broad reading achievement, reading comprehension and vocabulary, language and listening, oral reading, phonics, phonological and phonemic awareness)
  • Administration lag in months

The researchers fit separate random effects models for concurrent and predictive validity to estimate the correlation between early reading CBMs and other reading outcome measures. The researchers calculated the correlation coefficients for each reading outcome separately when possible. However, when there were insufficient data to support this kind of analysis, the outcome measures were aggregated to yield more reliable estimates.
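
To make the modeling concrete, below is a minimal sketch of how correlations from multiple studies can be pooled under a random effects model. It assumes Fisher’s z transformation and the DerSimonian-Laird estimator of between-study variance, with made-up study data; the authors’ actual analysis, which also used robust variance estimation, is more involved.

```python
import numpy as np

def pool_correlations(rs, ns):
    """Pool study-level correlations with a random effects model.

    Uses Fisher's z transformation and the DerSimonian-Laird
    estimate of between-study variance (tau^2). A simplified
    sketch, not the authors' exact procedure.
    """
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    z = np.arctanh(rs)       # Fisher's z for each study's r
    v = 1.0 / (ns - 3.0)     # within-study variance of z

    # Fixed-effect weights and the Q statistic for heterogeneity
    w = 1.0 / v
    z_fixed = np.sum(w * z) / np.sum(w)
    q = np.sum(w * (z - z_fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)  # between-study variance

    # Random effects weights add tau^2 to each study's variance
    w_re = 1.0 / (v + tau2)
    z_re = np.sum(w_re * z) / np.sum(w_re)
    return np.tanh(z_re), tau2  # pooled r, heterogeneity estimate

# Three hypothetical studies of the same CBM-outcome correlation
r_pooled, tau2 = pool_correlations([0.48, 0.61, 0.55], [120, 85, 240])
print(f"pooled r = {r_pooled:.3f}, tau^2 = {tau2:.4f}")
```

The key feature of a random effects model is the between-study variance term, which allows the true correlation to vary across studies rather than assuming a single fixed value, matching the heterogeneity the authors observed across studies.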

For the purposes of this study, correlation coefficients were classified as concurrent when the reading outcome measure was administered less than a month after the CBM, and as predictive when the reading outcome measure was administered a month or more after the CBM. The researchers examined the extent to which this administration lag moderated the correlation between early reading CBMs and other reading outcome measures. 
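
Expressed as code, the classification rule is simply a threshold on the administration lag; the one-month cutoff comes directly from the study, while the function itself is just our restatement of it.

```python
def classify_validity(lag_months: float) -> str:
    """Label a CBM-outcome correlation as concurrent or predictive.

    The study treated a correlation as concurrent when the outcome
    measure was administered less than one month after the CBM, and
    as predictive when it was administered one month or more later.
    """
    return "concurrent" if lag_months < 1 else "predictive"

print(classify_validity(0.5))  # concurrent
print(classify_validity(6))    # predictive
```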

What Are the Key Findings?

Alphabet Knowledge

In the studies included in this meta-analysis, there were two early reading CBMs that measured alphabet knowledge: letter sounds and letter naming. These CBMs were only administered in kindergarten and first grade. There was a large concurrent correlation (r = 0.552) between the letter sounds CBM and the composite outcome measure, consisting of phonics, oral reading, and broad reading. Similarly, there was a large concurrent correlation (r = 0.571) between the letter naming CBM and the aggregated outcome measure, consisting of phonics, broad reading, and oral reading. Concerning the abilities of these CBMs to predict students’ future performance, the letter sounds CBM had a large predictive association (r = 0.56) with complex reading skills (a composite of phonics, comprehension, and broad reading outcomes). Similarly, the predictive association between the letter naming CBM and other reading outcomes was also large, ranging from 0.52 for broad reading to 0.64 for oral reading. However, the predictive ability of the letter naming CBM was moderated by administration lag. For every month increase in the lag between the administration of the letter naming CBM and the administration of the outcome measure, the correlation coefficient decreased by 0.01. In other words, the more time that passes between the administration of both assessments, the less accurately the letter naming CBM predicts a student’s later performance. 
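
As a back-of-the-envelope illustration of this moderation effect: the 0.64 oral reading correlation and the 0.01-per-month decrease are reported findings, but extrapolating them linearly across the specific lags below is our simplifying assumption.

```python
# Hypothetical projection of how administration lag attenuates the
# letter naming CBM's predictive correlation with oral reading.
# Baseline r = 0.64 and the -0.01 per month slope come from the
# study; the lags chosen here are purely illustrative.
for lag_months in (1, 3, 6, 9):
    r_expected = 0.64 - 0.01 * lag_months
    print(f"{lag_months} month(s) of lag -> expected r = {r_expected:.2f}")
```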

Phonemic Awareness

In the studies included in this meta-analysis, there were two early reading CBMs that measured phonemic awareness: onset sounds and phoneme segmenting. The onset sounds CBM was administered only in kindergarten, whereas the phoneme segmenting CBM was administered in both kindergarten and first grade. The correlation coefficients of these CBMs were smaller than those of the alphabet knowledge CBMs. The onset sounds CBM had a medium concurrent correlation (r = 0.43) with more complex reading skills. Similarly, the phoneme segmenting CBM had a medium concurrent correlation with both phonics and phonological awareness (r = 0.43) and complex reading skills (r = 0.34). Concerning the predictive abilities of these CBMs, there was a medium predictive correlation (r = 0.424) between the onset sounds CBM and the aggregated outcome measure, consisting of phonics, oral reading, broad reading, and comprehension. Similarly, there was a medium predictive correlation between the phoneme segmenting CBM and the other outcome measures, ranging from 0.350 for oral reading to 0.376 for phonics. These predictive correlations were not moderated by administration lag.

Decoding

Two CBMs in the studies included in this meta-analysis measured decoding skills: nonsense words and word identification. The nonsense words CBM was administered across Grades K–2, whereas the word identification CBM was administered primarily in first grade. Concurrent associations between the nonsense words CBM and other reading skills were large, ranging from 0.60 for broad reading ability to 0.75 for oral reading. Similarly, the word identification CBM had a large concurrent association with complex reading skills (r = 0.70). Concerning the ability of these CBMs to predict students’ later performance, the nonsense words CBM had large predictive correlations with outcome measures, ranging from 0.52 for oral reading to 0.68 for broad reading. However, administration lag was a significant moderator for the nonsense words CBM. Predictive correlations between the word identification CBM and other reading outcomes were also large, ranging from 0.71 for broad reading and comprehension to 0.83 for oral reading. In contrast to the nonsense words CBM, administration lag was not significant for the word identification CBM. 

What Are the Practical Applications of Key Findings?

The study explores the validity of early reading CBMs in identifying at-risk students in Grades K–2. One of the key findings is that CBMs can reliably predict later reading outcomes, particularly for skills including phonics, oral reading, and letter naming. Practically, these findings underscore the importance of timely and frequent assessments to accurately predict students’ later performance and shed light on their reading development. For teachers, this suggests that using CBMs frequently, such as multiple times per school year, can help them identify students who could benefit from additional support or early intervention. Additionally, schools can choose CBM tools that align closely with their instructional goals and refine their universal screening processes to identify early signs of potential reading difficulties. For example, if the goal is to predict decoding skills, CBMs that assess nonsense words and word identification have demonstrated strong predictive validity. These CBMs could support data-based decisions in the classroom and help ensure that students receive appropriate support in the early stages of reading development.

What Are the Limitations of This Paper?

Due to the lack of data reported in the included studies, the researchers were unable to explore how student demographic factors, such as race and ethnicity, socioeconomic background, or English Learner status, might have influenced the findings. Thus, it is unclear whether CBMs perform comparably well for all student groups, and it is difficult to know whether these assessments might have different levels of accuracy or predictive validity for students from diverse backgrounds. Further research on these demographic factors would help educators and schools ensure that CBMs are equitable and effective for all learners. Additionally, the study showed large variability, or heterogeneity, across the included studies in terms of sample sizes, regions, and the specific CBM tools used. While a robust variance estimation method was used to account for this, the variability in how and when CBMs were administered across different contexts could affect the generalizability of the results.

Terms to Know

  • Validity: Validity refers to the extent to which an assessment measures what it was designed to measure. 
  • Concurrent validity: Concurrent validity is the extent to which one measurement is confirmed by another measurement administered at roughly the same time. 
  • Predictive validity: Predictive validity is the extent to which a student’s performance on one measure predicts their performance on another measure later. For example, if a student’s score on a nonsense word reading assessment predicts their later performance on a standardized state assessment, this nonsense word reading assessment would have predictive validity. 
  • Classification accuracy: Classification accuracy refers to the extent to which one measure (e.g., a universal screener) accurately identifies students as “at risk” or “not at risk” based on their performance on another measure (e.g., a standardized state assessment). An assessment with high classification accuracy minimizes false positives (i.e., proficient readers who are incorrectly identified as at risk) and false negatives (at-risk students who are incorrectly identified as proficient readers). Using screeners with high classification accuracy is important to ensure that time and resources are allocated efficiently and that students receive the appropriate level of support in reading.
  • Empirical: Empirical research is a way of gaining knowledge through observation or experience. Empirical research contrasts with theoretical research, which relies on systems of logic, beliefs, and assumptions. 
  • Predictor: A predictor variable is a variable used to forecast the value of another variable (the outcome) in a correlational study. For example, the length of a reading intervention in total minutes (predictor variable) may forecast a student’s composite reading score (outcome variable).
  • Correlation coefficient: A correlation coefficient is a measure of the strength of the relationship between two variables. A correlation between variables means that when one variable changes, the other variable also changes in a specific direction. For example, if the length of intervention and reading comprehension are positively correlated, then as the length of reading intervention increases, student reading comprehension also increases. A common correlation coefficient is Pearson’s correlation coefficient, which is represented by r. Pearson’s correlation coefficient ranges from -1 to 1. Negative values indicate a negative correlation between variables (as one variable changes, the other variable changes in the opposite direction); positive values indicate a positive correlation (as one variable changes, the other variable changes in the same direction). The absolute value, or distance from zero, indicates the strength of the relationship between the variables. A correlation coefficient of ±0.1 is generally considered a small correlation, ±0.3 a medium correlation, and ±0.5 a large correlation (Cohen, 2013). See the short example after this list for how r is computed.
  • Random effects model: A random effects model is a type of statistical model that measures how an independent variable affects a dependent variable across a number of different samples or studies. Unlike a fixed effects model, a random effects model accounts for variability between different groups in a dataset.
  • Effect size: Effect size is a measure of the strength of the relationship between two variables in a statistical analysis. A commonly used interpretation is to refer to effect sizes as small (g = 0.2), medium (g = 0.5), and large (g = 0.8) based on the benchmarks suggested by Cohen (2013), where “g” refers to Hedges’ g, a statistical measure of effect size.
  • Moderator: Moderators are variables that affect the relationship between two other variables. For example, the relationship between the length of a reading intervention and reading comprehension may be stronger for students who are at risk for reading disabilities versus students who are not at risk. In this case, at-risk status would be a moderator.
  • Generalizability: Generalizability refers to the extent to which the findings of one study can be extended to other people, settings, or past/future situations.
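
To ground the correlation coefficient definition above, here is a minimal sketch of computing Pearson’s r; the scores are invented for illustration.

```python
import numpy as np

# Hypothetical scores for the same five students on two measures
cbm_scores = np.array([12, 25, 31, 40, 52])       # e.g., a letter sounds CBM
outcomes   = np.array([310, 342, 360, 371, 405])  # e.g., a later reading test

r = np.corrcoef(cbm_scores, outcomes)[0, 1]  # Pearson's r
print(f"r = {r:.2f}")  # close to +1: a large positive correlation
```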

References

Cohen, J. (2013). Statistical power analysis for the behavioral sciences. Routledge.

January, S. A., & Klingbeil, D. A. (2020). Universal screening in grades K–2: A systematic review and meta-analysis of early reading curriculum-based measures. Journal of School Psychology, 82, 103–122. https://doi.org/10.1016/j.jsp.2020.08.007