Wednesday, March 24, 2010

Kentucky’s new NAEP reading results look good

Maybe too good

The new National Assessment of Educational Progress results for the 2009 reading assessment were released today, and Kentucky’s overall performance really looks good.

We were the only state in the nation to post a statistically significant score increase in both fourth and eighth grade (NAEP does not report on state high school reading performance).

But, the results almost look TOO good.

For example, this map, found on page 13 in the new NAEP Reading Report Card, shows only one other state, Rhode Island, also made statistically significant progress in fourth grade reading since 2007.

This looks like a great performance for Kentucky.

But, hold on. All states have been pushing reading hard since No Child Left Behind was enacted in 2001. Can it really be that hardly any of them are making progress?

Furthermore, this table, which is extracted from a larger table found on page 14 in the Report Card, shows something that seems to conflict with what the map shows for Kentucky.

While Kentucky supposedly had an overall statistically significant score increase, neither the white nor the black fourth graders in the Bluegrass State made statistically detectable progress.

How is that possible?

Notice that in both Washington DC and Rhode Island, the only other places where reading progress supposedly was made, at least one racial group made progress. Not so in Kentucky. Strange, that.

The answer here may lie in statistical sampling. Significance testing requires larger score changes before statistical significance can be established when the sampled groups are smaller.

However, with whites comprising 84 percent of the 2009 sample tested in Kentucky (Report Card, Page 52), it seems that if the overall scores rose by a statistically significant amount, the white students' scores should show a statistically detectable increase as well.
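The sample-size point can be sketched numerically. In the rough model below, every number is hypothetical (the sample sizes and the score standard deviation are made up for illustration, not taken from the Report Card): the standard error of a mean shrinks with the square root of the sample size, so the same 2-point gain can clear the significance bar for a large statewide sample while falling short for a smaller subgroup.

```python
import math

def is_significant(score_change, sd, n_2007, n_2009, z_crit=1.96):
    """Rough two-sample z-test for a change in mean scale score.

    Each year's mean has standard error sd/sqrt(n); the standard
    error of the difference combines the two years' errors.
    (NAEP's actual procedures are more involved; this is a sketch.)
    """
    se_diff = math.sqrt(sd**2 / n_2007 + sd**2 / n_2009)
    return abs(score_change) / se_diff > z_crit

# Hypothetical numbers: a 2-point gain, score SD of 35.
print(is_significant(2, 35, 3000, 3000))  # full state sample: True
print(is_significant(2, 35, 400, 400))    # small subgroup: False
```

This is why an identical point gain can be flagged as significant statewide but not for any individual subgroup.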

The fact that this didn’t happen makes me wonder if something unusual is happening in this NAEP scores report.

There is another curious thing. Overall, Kentucky’s average fourth grade reading proficiency rate on the NAEP was reported as 36 percent, while the national average proficiency rate was only 32 percent. However, when the disaggregated proficiency rates for whites and blacks are examined, things look quite different. Both Kentucky’s whites and blacks scored LOWER than their peers across the nation. Go figure!
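This pattern, where each subgroup trails its national peers yet the overall average leads, is the classic aggregation effect known as Simpson's paradox: Kentucky's tested sample is far more heavily white, and white students score higher than black students in both populations. The sketch below uses the 84 percent white share from the Report Card, but the subgroup proficiency rates and the national racial shares are illustrative numbers chosen only to show the mechanism.

```python
def overall_rate(groups):
    """Weighted average proficiency rate from (share, rate) pairs."""
    return sum(share * rate for share, rate in groups)

# Illustrative figures: each Kentucky subgroup scores a bit below
# its national counterpart, but Kentucky's sample is weighted far
# more heavily toward the higher-scoring group.
kentucky = [(0.84, 0.39), (0.16, 0.18)]  # (share, rate): white, black
national = [(0.55, 0.41), (0.45, 0.21)]

print(round(overall_rate(kentucky), 3))  # 0.356 -- higher overall
print(round(overall_rate(national), 3))  # 0.32  -- despite lower subgroups
```

So an overall lead and subgroup-by-subgroup deficits are not contradictory; they are exactly what heavy weighting toward a higher-scoring group produces.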

There are some other issues in the new data that may make Kentucky’s accomplishment a little less impressive.

For example, our exclusion rate for students with learning disabilities, once again, was well above the national average. That tends to inflate our scores, because these particular students could be expected to score very low if they took the NAEP. Their exclusion artificially raises our scores compared to other states.

In Kentucky, fully seven percent of all the students the NAEP wanted to test were ultimately excluded because they were determined to be too disabled to sit for a reading test. Across the nation, the exclusion rate for learning disabled students was only four percent, a decrease of one point from 2007. Kentucky’s exclusion rate was seven percent in 2007, as well.

Anyway, NAEP analysis has gotten quite involved, and the simplistic examination of overall scores that some others like to engage in can be highly misleading. I’ll be looking at this some more, so stay tuned.


niki hayes said...

You might be interested in my published report about the NAEP scores for Austin, TX. Check

Joan NE said...

It is highly doubtful that the NAEP is without at least SOME shortcomings. I'd bet that the NCES experts and the authors of the NAEP are quite aware of all of them, and have done their best, within reason, to make this the best test it can be. Still, experts say that the NAEP is the gold standard against which all other same-grade educational assessments can be measured. As I understand it, it is ONLY because we have the NAEP, and because it meets high standards for test validity, that we know of the phenomenon of score inflation.

I am not troubled by the two figures that Mr. Innes gave []. His explanation for his observation about the first figure is correct. The seemingly odd observation is easily explained by the effect of sample size on statistical significance. The "Interpreting Statistical Significance" subsection of the Technical Notes section of the NAEP report that Mr. Innes cited [] gives more information about, and a link to, a detailed explanation of the statistical analysis.

The second figure Mr. Innes presented has a simple explanation: the weighting factors for Kentucky lead to this result. Kentucky must have a higher proportion of white students than the nationally aggregated sample does, so the "all students" result for Kentucky is more strongly slanted toward the white Kentucky students' score (39%) than the national result is toward the national "white students" score (41%).

A satisfactory answer - for me at least - to Mr. Innes' first question is in the section titled "Interpreting Statistical Significance." Here is a quote from that section.

"When an estimate has a large standard error, a numerical difference that seems large may not be statistically significant. Differences of the same magnitude may or may not be statistically significant depending upon the size of the standard errors of the estimates. For example, a 2-point change in the average score for White students may be statistically significant, while a 2-point change for American Indian/Alaska Native students may not be."

Here is a paraphrase of this quote that may speak to Mr. Innes' questions: suppose that the 2007-to-2009 change in the average score for "all Kentucky 4th grade NAEP test takers," and the same statistic for each of the ethnic subgroups, is two points. The sample size for the first group is larger than for each of the ethnic subgroups of 4th-grade NAEP test-takers in Kentucky (considerably larger for all but the white subgroup). A 2-point change in the average score for "all students" may therefore be statistically significant, while a 2-point or even larger change for any of the subgroups may not be.

Joan NE said...

There is also nothing in Niki Hayes' report[] that worries me.

Page 42 of the NAEP report that Mr. Innes cited (i.e., the first page of the Technical Notes section) provides, for me, a quite satisfactory answer to Niki's questions about how students are selected to take the NAEP. The student selection procedure, and the statistical methods used to identify and address potential bias in the sample of test-takers, seem to me quite adequate.

The other issue Niki emphasizes is the lack of disaggregation of scores for accommodated and non-accommodated students. I don't see any statistical problem with the way the NAEP gives an overall score for school districts that includes accommodated students.

I agree with Niki that disaggregation can reveal very important information. I am pretty sure that researchers (and probably any member of the public) can download the raw or processed data in order to conduct such valuable disaggregation studies. The report Niki cites doesn't necessarily represent the full span of analyses that have been conducted, or can be conducted, with the NAEP database.