Tuesday, September 21, 2010

Report builds mountain out of education ant hill

The Center on Education Policy (CEP) recently issued a report titled “State Test Score Trends Through 2008-09, Part 1, Rising Scores on State Tests and NAEP,” lauding a supposed discovery that improvement on both state public school tests and the National Assessment of Educational Progress (NAEP) is moving in the same direction and looks much better than previously reported.

Unfortunately, the report turns the mathematics of statistical sampling on its ear while attempting to claim that the performance of most states is improving on the NAEP reading assessments.

The facts: Between 2005 and 2009, the period of major concern in the CEP report, NAEP’s own Reading Report Card for 2009 clearly shows that a strong majority of the states did not post statistically significant improvements in either fourth- or eighth-grade reading. That general lack of progress is found both for the percentages of students scoring at the level NAEP calls “Basic” and at the level NAEP calls “Proficient.” Trying to claim otherwise, which the CEP report attempts to do, simply is not statistically defensible.

There are more issues with the CEP report, such as the selection of an unrealistically low target performance level – NAEP Basic – as a suitable comparison level for state assessment programs.

To begin, let’s expand on the conclusions the CEP report incorrectly draws from the NAEP reading data.

The statistics matter

The NAEP is a sampled assessment. Only a fraction of the students in each state take each test. Thus, NAEP results include statistical sampling errors, just like reports of voter polling always include margins of error.

NAEP’s sampling errors make it impossible to confidently detect small changes in a state’s performance, just as it is impossible to call a close election before all the votes are counted. Unless a performance change on the NAEP is large enough to exceed the sampling error, claims that real improvement has occurred lack any validity.

For instance, consider Table A-18 in the 2009 NAEP Report Card. This table shows the percentage of students scoring at or above the “NAEP Basic” level in various states for a number of different years.

Table A-18 shows that while the exact figure varies from state to state, in some cases (e.g. New Mexico) states may need changes of more than 4 points before NAEP can confidently detect genuine improvement.
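As a rough illustration of why a small change cannot be called a gain, the sketch below applies a simple two-sample significance check to a pair of sampled percentages. The percentages and standard errors are hypothetical and are not taken from Table A-18, the function names are my own, and the procedure NAEP actually uses may differ in its details.

```python
import math

def minimum_detectable_change(se_old, se_new, z_crit=1.96):
    """Smallest change (in percentage points) that exceeds sampling
    error at roughly the 95 percent confidence level, assuming the
    two samples are independent."""
    return z_crit * math.sqrt(se_old ** 2 + se_new ** 2)

def is_significant_change(pct_old, se_old, pct_new, se_new, z_crit=1.96):
    """True only if the observed change is larger than the minimum
    detectable change for the given standard errors."""
    diff = pct_new - pct_old
    return abs(diff) > minimum_detectable_change(se_old, se_new, z_crit)

# Hypothetical state: standard error of 1.5 points in both years.
threshold = minimum_detectable_change(1.5, 1.5)
print(round(threshold, 2))                           # about 4.16 points
print(is_significant_change(62.0, 1.5, 65.0, 1.5))   # 3-point rise: False
print(is_significant_change(62.0, 1.5, 67.0, 1.5))   # 5-point rise: True
```

With standard errors of that assumed size, a three-point rise is indistinguishable from no change, which is the situation many states are in.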

That leads to a serious methodological problem in the CEP report. On Page 6 the authors of the CEP study write, “We did not constrain comparisons by limiting NAEP data to statistically significant changes.” Instead, the CEP report treats any numerical increase in NAEP numbers as a valid and true performance increase.

That action seriously violates some very basic statistical rules, including very clear comments found in the NAEP’s own documentation and reports.

For example, this comment appears on Page 6 in the 2009 NAEP Reading Report Card:

“Only those differences that are found to be statistically significant are discussed as higher or lower.”

The important point here is that the experts who run the NAEP do not consider small changes in either direction to be meaningful. It is statistically invalid to claim a true increase in performance on NAEP when the blur caused by statistical sampling error won’t support such an assertion. For small score changes on the NAEP, about the best we can confidently say is that performance is flat.

How does the CEP methodology impact the report’s conclusions?

The impact of the CEP’s methodological error on the report’s findings becomes quite apparent when you examine the percentages of eighth grade students who scored at or above NAEP Basic in reading in 2005 and 2009. Those rates are listed in Table A-18 in the NAEP 2009 Reading Report Card.

Here is an extract of that table with some added highlights for the states in question.

In this NAEP table, all of the 2005 figures that are statistically significantly different from the 2009 rates are marked with an asterisk. Where no asterisk appears next to a state’s 2005 score, the limitations in NAEP’s sampling statistics do not allow us to claim that score is different from the 2009 score.

Now, consider what the CEP report claims. CEP shows in its Table 1-A that 17 states had a gain in NAEP Basic reading performance for their eighth grade students between 2005 and 2009. However, as highlighted by the arrows in the figure above, the NAEP itself only supports such a claim for six of those 17 states (CA, FL, MD, PA, TX and UT).

The other 11 states CEP lists as having made gains in eighth grade reading (AK, AL, AZ, CO, MT, ND, NM, NV, OH, TN and WI) actually have 2005 results that NAEP itself declines to classify as being either higher or lower than the 2009 results. The best conclusion we can draw from the NAEP is that those 11 states had flat results between 2005 and 2009 and should be listed in the CEP’s Table 1-A under the category of “# of states with no change.”
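The reclassification described above amounts to a simple set comparison. The short sketch below uses only the state lists given in this post; the variable names are my own, and the significance flags stand in for the asterisks in NAEP's Table A-18.

```python
# The 17 states CEP counts as eighth-grade NAEP Basic reading gainers, 2005 to 2009.
cep_gain_states = {
    "AK", "AL", "AZ", "CA", "CO", "FL", "MD", "MT", "ND",
    "NM", "NV", "OH", "PA", "TN", "TX", "UT", "WI",
}

# States whose 2005 figure NAEP marks as significantly different from 2009.
naep_significant = {"CA", "FL", "MD", "PA", "TX", "UT"}

# Gains NAEP supports versus changes too small for NAEP to detect.
confirmed_gains = sorted(cep_gain_states & naep_significant)
no_detectable_change = sorted(cep_gain_states - naep_significant)

print(len(confirmed_gains), confirmed_gains)
print(len(no_detectable_change), no_detectable_change)
```

Run as written, this reports six confirmed gains and 11 states that belong in the "no change" column.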

Thus, while the CEP’s report shows that 20 states had gains in their Proficiency rates on their own state eighth-grade reading tests, the NAEP only confirms that six of those states really improved. In the other states that CEP claims also improved – if there is improvement at all – it’s too small for the NAEP to reliably detect. That is a very different picture from that offered in the CEP report.

A similar situation exists with the fourth grade NAEP reading data. To briefly summarize, while CEP claims that 16 states showed improvement against the fourth grade NAEP Basic standard, the NAEP Report Card for 2009 clearly can only support that claim for five states.

Is NAEP Basic a suitable target for state assessment evaluation? Evidence from Kentucky

It is also important to note that the improvement claims from the CEP are based on comparing state tests to the watered-down target score of NAEP Basic. NAEP documents make it clear that NAEP Basic denotes only partial mastery of material, a fact that even the CEP report admits.

Aside from the clear evidence in NAEP documents that NAEP Basic isn’t a good comparison for state testing, additional evidence from other testing in Kentucky disputes the CEP’s assertion that “Proficient” level scoring from state tests should be compared to NAEP Basic, not NAEP Proficient results.

A small study I assembled in the freedomkentucky.org Wiki site using Kentucky’s EXPLORE test results indicates that if we are interested in finding out how well schools are preparing students for postsecondary education and living wage occupations in the workforce, then NAEP Proficient – not NAEP Basic – truly is the appropriate target for state test comparisons.

For example, this graph, taken from that report, shows surprisingly good agreement between NAEP Proficient results and the percentage of students in the same student cohorts who reached or exceeded the EXPLORE Benchmark Score that indicates students are on track for college and career readiness.

You can learn more by clicking the link above to the complete study, which also looks at math performance. Certainly, the evidence from Kentucky’s EXPLORE testing supports the idea that NAEP Proficient is much better aligned to what students really need than NAEP Basic is.

By the way, the Kentucky 2008-09 Interim Performance Report (access from menu here) shows Kentucky Core Content Test results of 68.05 percent proficiency in eighth grade reading. Thus, both NAEP and EXPLORE tell us the Kentucky Core Content Tests are seriously inflated.

That inflation got worse when Kentucky reduced the rigor of scoring of the Kentucky Core Content Tests in 2007. That same scoring inflation, once it became obvious, played a role in the Kentucky Legislature’s vote in 2009 to disband the Kentucky Core Content Tests as soon as decent replacements – ones that will correlate to college preparedness – can be brought on line.

Similar testing changes are in the wind nationwide. Virtually every state has committed to adopting the new Common Core State Standards and to revising their own state assessments. Pressure to do this comes from widespread understanding that state tests are generally providing seriously inflated information about real student proficiency.

Thus, while the CEP report attempts to downplay inflation in existing state assessments, it appears those assessments are unlikely to remain, anyway, at least not in their currently undemanding format. For sure, new tests are coming to Kentucky.


Dick Schutz said...

And For Sure, the new tests are going to look a lot like the old tests. The Common Core English Language Arts is built on the NAEP "Framework."

Richard Innes said...

RE: Dick Schutz's comment

I share that concern. Some advanced information about what the new testing program might include talks about a lot of open response written questions and even portfolios.

Portfolios are a good instructional tool for teachers, but they are a major headache when included in an accountability program, as Kentucky learned between 1992 and 2009 when we finally called it quits.

Furthermore, too many open response written questions lead to all sorts of problems, as well.

Hopefully, someone will wake up and consider the Kentucky experience before these tests get pushed nationwide.

Maybe we will be helping with that process.

Ray Davis said...

The educational bureaucracies at state levels were bad enough. Then, the federal gov't became involved in a big way. For years we have had elective and administrative government at both levels in league with a heavily unionized labor force. Result? The product, the education of our youth, has declined in quality, and its cost has increased faster than overall inflation.

Tweaking this grossly inefficient, inept, top-down bureaucracy won't produce world-class education. It will only produce another "program", like KERA in KY, that professional educators, academics and politicians will herald as the "solution". A decade later we will be "sold" another "solution". I've heard this song a few times before!

It's time to ditch the failed, one-size-fits-all collectivist approach and try approaches our country was founded upon--competition, freedom to innovate at small levels, and giving choice and educational vouchers to the parents of the children receiving the educational services. Let public, private and parochial schools compete for the educational dollars. If this is done, quality of education will rise, costs will decline, good teachers will see their incomes increase, and poor teachers will leave the profession. Most public schools will survive because the teachers' and administrators' job security will be threatened, and they will rise to the challenges. The outcome? In twenty years the U.S. will again become one of the world leaders in education.

The only way the products of any entrenched bureaucracy can be significantly improved is to threaten its existence. Tweaking is fruitless.

Richard Innes said...

RE: Ray Davis' Comments

We absolutely agree that choice and competition are badly needed.

However, we are not going to get rid of the entrenched public education system overnight, and we must find ways to also kick it into much higher performance levels.