Wednesday, November 24, 2010

No, the goal in NAEP is NOT for kids to score only “Basic”

Education apologists in Kentucky still refuse to admit it, but a lot of experts now understand that our education system needs to strive for much higher student performance than we are currently getting.

Certainly, as I recently blogged, the National Assessment Governing Board, which oversees the National Assessment of Educational Progress, makes that clear with its very simple and easy-to-understand statement, shown below.

[Image: National Assessment Governing Board statement on NAEP achievement levels]
The governing board doesn’t consider the lower score of NAEP “Basic” to be a suitable performance target. NAEP “Basic” only signifies partial mastery of subject material. Per the board, that’s not good enough. “Proficient” is the goal, and nothing less.

But, education ‘status quo-ers’ like Jefferson County Teachers Association president Brent McKim don’t get that. They don’t want you to know it, either.

Offering up a big smoke screen of nonsense that mistakes what is for what needs to be, McKim told the Kentucky Tonight audience on Monday that NAEP “Basic” is a fine target. Basically, McKim claimed that partial subject mastery is an acceptable target for him.

Well, setting unrealistically low education goals only works for status quo educators who don’t have a clue about what our kids are going to need to compete in an increasingly competitive world economy.

However, setting unrealistically low goals is not in line with what the people who actually run the national assessment are telling us. So, one last time, the people who govern the NAEP have spoken, and they are telling us to shoot for NAEP “Proficient,” not something less.

19 comments:

Eternal Pessimist said...

No wonder Jefferson County Schools are in such a mess with leadership like Mr. McKim. Seems like there is another person who needs to be fired in the Jefferson County School District.

Bert said...

NAEP Proficient has been the National Assessment Governing Board’s goal for America’s students since it established the achievement levels about twenty years ago.

In 2001, the Board clarified what Proficient means: “In particular, it is important to understand clearly that the Proficient achievement level does not refer to ‘at grade’ performance. Nor is performance at the Proficient level synonymous with ‘proficiency’ in the subject. That is, students who may be considered proficient in a subject, given the common usage of the term, might not satisfy the requirements for performance at the NAEP achievement level.”

The reality is that NAGB’s policy is not federal law. Unfortunately, NCLB policy is federal law. Since 2002, NCLB has required states to report the percentage of their students performing “at grade.” As a result, “State assessments often define ‘proficiency’ as solid grade-level performance, often indicating readiness for promotion to the next grade. NAEP’s policy definition of its ‘Proficient’ achievement level is ‘competency over challenging subject matter’ and is implicitly intended to be higher than grade-level performance.” -- Andrew Kolstad, Senior Technical Advisor, Assessment Division, National Center for Education Statistics

NCLB also suggests that the Secretary of Education may use NAEP (as well as other data) to confirm state test results. In 2004, a statistical study commissioned by the NAEP Validity Studies Panel concluded that NAEP Basic is the NAEP statistic most comparable to the NCLB-defined “state proficient.” If the intent is to use NAEP achievement scores to confirm (or debunk) state NCLB results, it is clearly inappropriate to compare NAEP Proficient with state proficient.

Nothing said thus far precludes a study of the change over time in a state’s percentage of students at or above NAEP Proficient (i.e., the state’s percentage of students performing higher than grade level). There is no doubt that the higher the percentage of our students at or above NAEP Proficient, the better off America will be. The Board has adopted a worthy aspirational goal.

Anonymous said...

Mr. McKim represents the teachers union. Nowhere is it stated that the teachers union is passionate about anything besides more money, more control, and more protections.

The teachers union does a very good job doing what it is committed to do.

It is Kentucky's education leaders who are being led around by the nose by the union and who are enabling lackluster performance.

Every time Kentucky education leaders let bad performance go unchallenged, a lower standard of what's acceptable has just been set.

Richard Innes said...

RE: Comments from Bert

Towards the end of your comments, you say a 2004 study found that NAEP "Basic" most closely corresponds to what states call "Proficient." That is generally true. But that doesn't make it an acceptable performance target for the state assessments.

You also say, "If the intent is to use NAEP achievement scores to confirm (or debunk) state NCLB results, it is clearly inappropriate to compare NAEP Proficient with state proficient."

That is totally wrong.

You are basically saying it's OK for states to shoot for low standards on their tests that don't relate to what kids really need.

And, you also say it's OK to use NAEP to confirm that states are shooting for a low target, one that NAEP's management team does not accept, and then to feel everything is just fine.

Anyway, most people don't agree, which is why there is a major effort to create the Common Core State Standards and corresponding assessments to lift the sights of many undemanding state assessment programs.

Bert said...

My point is that states should not be condemned for complying with the provisions of the No Child Left Behind Act. Since 2002, states have been required to set state grade-level standards, to administer state on-grade-level tests, to deem “proficient” students who meet or exceed the state’s grade-level expectations, and to report the percentages of students scoring state “proficient” or above. The state’s content standards, the state tests, and the state performance standards all had to survive a federal Peer Review conducted by a team of curriculum and assessment experts to qualify for federal Title I funds. If you find that unacceptable, then condemn NCLB and the Peer Review process and lobby Congress to pass a suitable replacement (or to get out of education altogether). Just don’t condemn state educators unfairly!

Since NAEP Proficient defines a higher than grade-level performance, it is wholly inappropriate to compare it with the state’s NCLB-required grade-level performance, or state proficient, for any purpose. NAEP Proficient typically begins at the B+/A- classroom performance level, while NCLB-defined state proficient typically begins at the C-/C classroom performance level. What useful information does one gain by comparing the percentage of students scoring “B+/A- or higher” to the percentage of students scoring “C-/C or higher”? None whatsoever! Does anybody believe that Congress actually intended in NCLB for every child in the country (including students with disabilities and limited English proficiency) to be performing above grade level, at NAEP Proficient, by 2014? No!

Richard Innes said...

RE: Bert at November 26, 2010 1:23 PM

First, I do not defend the NCLB peer review process. I think it was a joke.

As far as I can tell, the peer review ran largely as a rubber stamp for whatever standards the states submitted.

The complete lack of any real consistency in the peer review process is evidenced by the wide range of NAEP equivalent scores that were recently reported for state Proficiency scores.

For example, Table 1 in the NCES report, “Mapping State Proficiency Standards Onto NAEP Scales: 2005-2007” (online here: http://nces.ed.gov/nationsreportcard/pdf/studies/2010456.pdf), shows that what states call Proficient for grade 4 reading in 2007 mapped to NAEP equivalent scores running from a low of 163 for Mississippi’s state assessment to a high of 232 in Massachusetts. That is simply an enormous range on this test. For comparison, the 2007 NAEP Reading Report Card shows that students with learning disabilities who took the NAEP averaged 190 that year, while the national public school average for reading was only 220. So, Mississippi’s “Peer Reviewed” standard was lower than the average score learning disabled students achieved, while Massachusetts’ was well above the national average.
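For readers who have not dug into how those “NAEP equivalent scores” are produced, the basic idea can be sketched roughly as follows. This is only an illustration with invented numbers; the actual NCES study works school by school with NAEP’s sampling weights and is considerably more involved.

# Rough sketch of the equipercentile idea behind a "NAEP equivalent score":
# find the NAEP score such that the share of a state's students at or above it
# matches the share deemed "Proficient" on the state's own test.
# All numbers here are invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical NAEP-scale reading scores for one state's grade 4 students.
state_naep_scores = rng.normal(loc=218, scale=36, size=10_000)

def naep_equivalent(share_state_proficient):
    # The (1 - share) quantile of the NAEP distribution is the score that the
    # top 'share' of students meet or exceed.
    return float(np.quantile(state_naep_scores, 1 - share_state_proficient))

print(round(naep_equivalent(0.85)))  # a lenient state standard maps low
print(round(naep_equivalent(0.40)))  # a demanding standard maps much higher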

What kind of credibility can a review process that produced those results have? The data being generated under NCLB are simply not comparable from state to state, and they are generally highly misleading within most states.

What is “grade level performance?”

Is it fair to consider current grade level performance to be the actual national average score on the NAEP?

If this is reasonable – and I think it is – and we look at the 2007 NAEP grade 4 reading score, which was 220, then Table 1 in “Mapping State Proficiency Standards” shows most states set their proficiency standard well below typical grade level performance. In fact, even if we use the 1998 NAEP grade 4 reading score of 217, the peer review allowed most states to set their grade 4 reading “Proficiency” standard well below this reasonable definition of grade level performance, as well.

Do you really want to defend that?

By the way, I looked at Section 1111 in NCLB, which covers the state assessments. I didn’t see anywhere that said states were to set “Proficient” according to a “grade level” sort of standard. It just calls for states to set it at a high level that would encourage progress. Maybe Bert knows of something in the enabling regulations (which I think seriously undermined the NCLB law) that he can point to. But that really doesn’t matter. I don’t think Congress intended for states to only shoot for the status quo in their 2001 classrooms (the year the bill actually passed Congress).

Bottom line: States were allowed to set seriously under-demanding targets for Proficient when NCLB began. A lot of research shows that. The research also points to NAEP Proficient being pretty close to what kids need to be on track for college and careers. Excusing the public schools for setting lower standards just to make themselves look good is not helping our kids.

Bert said...

RE: Richard, 11/27/10 10:30 AM

Sorry, but I cannot defend “Mapping State Proficiency Standards onto NAEP Scales: 2005-2007.” While the equating methodology is sound and is widely used, the study at best only approximated the “same students” requirement. The study does not really “map state proficiency standards” (or the rigor of the state’s cut-score) as its authors claimed. A consortium of three small states (Vermont, New Hampshire, and Rhode Island) worked together to implement NCLB. They identified grade-level content standards, developed grade-level tests, and set cut-scores at each grade for “proficient.” The three states implemented identical content, identical tests, and identical cut-scores. Yet, this study produced different “NAEP equivalent scores” among the three states. Figure 2 in the report displays grade 8 reading “NAEP equivalent scores” of 263 for Vermont, 258 for New Hampshire, and 253 for Rhode Island. Figure 3 displays grade 8 mathematics “NAEP equivalent scores” of 284 for Vermont, 282 for New Hampshire, and 279 for Rhode Island.

Moreover, while the belief is strongly held and loudly proclaimed by many, as of yet no one has empirically demonstrated a meaningful relationship between high proficiency standards and high student achievement. In this instance, when I calculated the correlation between the 2007 state “NAEP equivalent scores” from the mapping study and the 2007 state NAEP achievement scores, I found little to no relationship between the rigor of state proficiency standards and overall student achievement.
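If anyone wants to reproduce that kind of check, a minimal Python sketch follows. The numbers are placeholders only, not the actual 2007 values from the mapping study and the NAEP report cards.

import numpy as np

# Placeholder values only; substitute the actual 2007 "NAEP equivalent scores"
# (the rigor of each state's proficiency cut) and the states' 2007 NAEP averages.
state_rigor = np.array([163.0, 181.0, 195.0, 204.0, 214.0, 223.0, 232.0])
state_naep_average = np.array([219.0, 214.0, 224.0, 212.0, 226.0, 217.0, 222.0])

r = np.corrcoef(state_rigor, state_naep_average)[0, 1]
print(f"Pearson r = {r:.2f}")  # a value near zero suggests little relationship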

“Grade-level performance” cannot be defined by a single test. It is the demonstrated level of knowledge and skill attainment over time that experienced classroom teachers tend to describe as “C” on a report card.

Indeed, there are enabling regulations and/or guidelines for NCLB that define “state proficient” as “grade-level performance.” In fact, the Peer Review Team must verify that this is the case before a state qualifies for federal Title I funds.

Richard Innes said...

RE: Bert November 27, 2010 3:50 PM

Bert’s example of the scores from the three small states of Vermont (VT), New Hampshire (NH) and Rhode Island (RI) needs to include one important point.

NAEP is a sampled assessment, and all the things we do with it are impacted by statistical sampling error. In NAEP, that error is expressed in terms of “Standard Error.” Basically, we can be 95% certain that the true NAEP score lies within two Standard Errors of the published score.

That also applies to the Estimated NAEP Equivalent Scores for state assessments.

Table 1 in the report I referenced in my previous comment shows that in grade 8 reading, the standard errors for VT, NH, and RI are 1.4, 1.5 and 1.1, respectively.

So, the true VT state assessment proficiency NAEP equivalent score for reading could be 263 plus or minus 2.8 points, which could work out as low as 260.2. For Rhode Island, the published equivalency score of 253 could be as high as 255.2. A five point difference from two different test samples isn’t much of a deal on a 500 point scale test. Most people would say these are pretty much equivalent.
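For anyone who wants to check those ranges, here is a quick Python sketch using only the published equivalent scores and standard errors from Table 1:

# Published grade 8 reading NAEP equivalent scores and standard errors (Table 1).
estimates = {"VT": (263, 1.4), "NH": (258, 1.5), "RI": (253, 1.1)}

for state, (score, se) in estimates.items():
    low, high = score - 2 * se, score + 2 * se
    print(f"{state}: {low:.1f} to {high:.1f}")
# Prints roughly: VT: 260.2 to 265.8, NH: 255.0 to 261.0, RI: 250.8 to 255.2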

A similar argument applies for the math scores, though for some reason the standard errors are quite a bit lower. Standard errors from Table 2 indicate Vermont’s 284 could really be only 282.2 while Rhode Island’s 279 might really be 280.2. Actually, given that different student samples were tested in each state, I’d say the agreement is remarkably close, which undermines your argument.

However, the huge gap between the Massachusetts test’s NAEP equivalent proficiency score of 232 and the very low 163 equivalent score for Mississippi is not going to be explained away by either sampling error or differences in the students tested. Gaps that large point to a seriously flawed process followed by the peer review panels, as I said earlier.

Regarding Bert’s comments about the lack of correlation between high standards and high performance, I’m not sure the standards in most states have been in place long enough to see much difference. Of particular interest is the fact that South Carolina, traditionally a low-scoring state on NAEP, has set very high standards. I wondered if that had any impact on that state’s NAEP trend over time. I just looked quickly at the NAEP grade 8 Proficient or Above results for South Carolina in the 2009 NAEP Math Report Card.

In 1992 South Carolina had a proficiency rate of only 15 percent while the national average was 20 percent. In 2009, the national average proficiency rate rose to 33 percent, a 13 point rise, but South Carolina moved to 30 percent proficiency, a 15 point rise, and only 3 points below the national average. In fact, in both 2005 and 2007 South Carolina’s proficiency rate in grade 8 NAEP math was actually a point or two above the national average. South Carolina’s NAEP math proficiency rates started to jump up notably in 2003, the year NCLB started taking hold.
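The arithmetic behind that comparison is simple enough to lay out in a few lines of Python:

# Percent at or above Proficient, grade 8 NAEP math (2009 Math Report Card).
national_1992, national_2009 = 20, 33
south_carolina_1992, south_carolina_2009 = 15, 30

print("National gain:", national_2009 - national_1992)                     # 13 points
print("South Carolina gain:", south_carolina_2009 - south_carolina_1992)   # 15 points
print("2009 gap behind the nation:", national_2009 - south_carolina_2009)  # 3 points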

South Carolina’s exclusion rate for learning disabled students actually declined, as well, which normally would be expected to reduce proficiency rates.

That is a quick and dirty analysis, and it only looks at one state. But it looks at a state that set what some would call unrealistically high standards given its earlier weak performance, and it appears that approach worked for South Carolina.

Bert said...
This comment has been removed by the author.
Bert said...

Correcting Vermont’s standard error in my previous post from 1.0 to 1.4 (the conclusion remains the same):

The 95% confidence interval for Vermont’s grade 8 reading “NAEP equivalent score” of 263 with a standard error of 1.4 is 260.26 to 265.74 (i.e., 263 ±2.74). The 95% confidence interval for Rhode Island’s grade 8 reading “NAEP equivalent score” of 253 with a standard error of 1.1 is 250.84 to 255.16 (i.e., 253 ±2.16). Since the 95% confidence intervals for Vermont and Rhode Island do not overlap, the “NAEP equivalent score” difference between Vermont’s 263 and Rhode Island’s 253 is statistically significant at the .05 level. This significant difference for grade 8 reading “NAEP equivalent scores” takes into account any uncertainty in the data that may be attributed to sampling procedures.
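In code, that check looks roughly like the following Python sketch, using only the published scores and standard errors:

# 95% confidence intervals (plus or minus 1.96 standard errors) around the
# grade 8 reading "NAEP equivalent scores," and a check for overlap.
def ci95(score, se):
    half_width = 1.96 * se
    return score - half_width, score + half_width

vermont = ci95(263, 1.4)       # about (260.26, 265.74)
rhode_island = ci95(253, 1.1)  # about (250.84, 255.16)

overlap = vermont[0] <= rhode_island[1] and rhode_island[0] <= vermont[1]
print(vermont, rhode_island)
print("Intervals overlap" if overlap else "No overlap: difference significant at the .05 level")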

Vermont and Rhode Island use the identical grade 8 reading test and the identical cut-score to define grade 8 reading proficiency. The methodology of the mapping study set out to map the relative rigor of the states’ reading proficiency cut-scores. The methodology, however, generated statistically different “NAEP equivalent scores” for the identical cut-score on the identical test used by Vermont and Rhode Island.

Richard Innes said...

RE: Bert at November 28, 2010 1:49 PM

Statistically significant score differences are not necessarily 'significantly' different in the sense that the general public would use the term.

In fact, NAEP reading in 2009 was scored on a 500 point scale. A five point difference is only a one percent difference on the 500 point scale.

Furthermore, just because two states use the same test does not guarantee that the test has a high level of reliability. However, I'll bet most psychometricians would be very happy with a test that generates results as close as those you are talking about here.

Do you know anything about the reliability studies for the tests?

Bert said...

The mapping study was from beginning to end a statistical exercise. To shrug off statistical findings from the study is, intellectually, to shrug off the entire study.

“Statistically significant score differences are not necessarily 'significantly' different in the sense that the general public would use the term,” is an incredible statement for anyone who understands and works with NAEP data. NAEP reports results using terms such as “higher,” “lower,” “larger,” “smaller,” “not significantly different,” all of which are descriptions of statistical relationships. The public generally understands a NAEP report that one state scored higher than another, or that two states were not significantly different.

“A five point difference from two different test samples isn’t much of a deal on a 500 point scale test. Most people would say these are pretty much equivalent.” This claim would not come from someone with a fundamental understanding of statistics and the NAEP scale. Its author, for example, clearly did not understand that the 500 point reading scale covers three grade levels, not just grade 8. In 2009, the average grade 8 reading scores for the 50 states were all above 250 but less than 275.

There is no need to question the reliability of any of the state tests. Each state had to demonstrate to the Peer Review Team that its NCLB assessment met current psychometric standards. The Department of Education has fined at least two states because they failed to implement NCLB assessments meeting current psychometric standards within a reasonable period of time.

In a personal communication, one of the mapping study authors attributed the different rigor scores for one test to the total educational environments in the three states, which enabled more students in one state than in another to score at or above the cut-score on the test. It is a fact (by definition) that the more successful students are, the lower a test’s “rigor” score will be. Applying this explanation to NAEP, NAEP itself is a more rigorous test in Mississippi than in Massachusetts.

Richard Innes said...

RE – Bert on November 29, 2010 11:14 AM

Bert, I really enjoy our discussions. I’m sure you are raising our readers’ interest in the blog as you offer a chance for them to see how a defender of status quo education really thinks.

Regarding your first comment in the referenced post, why are you trying to fool the public about the difference between scores that are only statistically significantly different and scores that are really breathtakingly, dramatically different? A lot of other education apologists have played the same word games for years, trying to confuse the public about minor changes in NAEP scores, calling them “significantly different” without ever explaining that this is only significant in the statistical sense and may not show much real change whatsoever.

Here is what the NAEP web site has to say:

“The term "significant" is not intended to imply a judgment about the absolute magnitude or the educational relevance of the differences. It is intended to identify statistically dependable population differences to help inform dialogue among policymakers, educators, and the public.”

http://nces.ed.gov/nationsreportcard/reading/interpret-results.asp#statistical

In other words, even the people who run the NAEP say that all a finding of statistical significance shows is that there is some degree of difference in performance between the two tested groups. The magnitude and relevance of that difference specifically IS NOT included in the finding.

Your attempt to ridicule me for pointing out this very easy to grasp point probably isn’t adding to your luster with most of our readers.

I’ll deal with your second issue in another comment.

Richard Innes said...

RE – Bert on November 29, 2010 11:14 AM

Bert also takes a patronizing view of my understanding of the common 500-point NAEP scoring scale. Let’s talk about that one. Bert says I don’t understand the scale. I would suggest Bert may not know that some of the theory behind the scale was overrun by reality.

When the NAEP 500-point scale was set up years ago, the theory was that it would be established as an equal interval scale where the scores for all students tested in all grades combined would have a mean of 250 and a standard deviation of 50. Because it was supposed to be an equal interval, common scale, and because the difference in scores between fourth and eighth grade tended to run around 40 points, some testing gurus quickly started to claim that a difference of 10 points on the NAEP scale equated to a full year of extra education. If the theory had held, that would have been about right.
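To show what that original setup was supposed to look like, here is a toy Python sketch. The real scaling works with IRT proficiency estimates and sampling weights; the numbers below are invented purely to illustrate the mean-250, standard-deviation-50 construction.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical base-year proficiency estimates for grades 4, 8, and 12.
theta = np.concatenate([
    rng.normal(-0.8, 1.0, 5000),   # grade 4
    rng.normal(0.0, 1.0, 5000),    # grade 8
    rng.normal(0.6, 1.0, 5000),    # grade 12
])

# Linearly rescale the combined, cross-grade distribution to mean 250, SD 50.
scale_scores = 250 + 50 * (theta - theta.mean()) / theta.std()
print(round(float(scale_scores.mean())), round(float(scale_scores.std())))  # 250 50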

Problem is, real NAEP data overtook this psychometric dream.

Here’s an example.

For the 2009 NAEP reading assessments, the fourth and eighth grade scores can be found in the report card titled “Reading 2009, NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS AT GRADES 4 AND 8.” This document shows that the national average fourth grade reading score was 221 and the eighth grade score was 264.

Another report, “Grade 12 Reading and Mathematics 2009, National and Pilot State Results,” shows the national average 12th grade reading score was 288.

Now, let’s see how that single, 500 point NAEP common scale worked out.

The gap between the grade 4 and grade 8 scores is 43 points.

BUT, the gap between grade 8 and grade 12 is much lower, only 24 points.

So, in practice, the equal interval NAEP 500-point scoring scale theory does not hold up. It also raises questions about whether the often-heard statement that a 10 point difference on NAEP is equivalent to a year of extra schooling is correct. That claim certainly falls apart between grades 8 and 12.
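To lay that arithmetic out in one place, here is a short Python recap using the national averages from the cited report cards:

# National average NAEP reading scale scores, 2009.
grade4, grade8, grade12 = 221, 264, 288

print("Grade 4 to grade 8 gap:", grade8 - grade4)     # 43 points
print("Grade 8 to grade 12 gap:", grade12 - grade8)   # 24 points

# If 10 points really equaled a year of schooling, the four years between
# grade 8 and grade 12 should add roughly 40 points, not 24.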

In fact, if you look at the percentile score reports in the NAEP report cards cited above, you find that in 2009 the 90th percentile of fourth grade readers scored 264. If the 500 point scale really were an equal interval, common scale, that would mean that one in ten fourth grade kids actually reads as well as or better than the lowest-scoring quarter of high school seniors, because the 25th percentile score for 12th grade students in the same year was also 264. Bert, are you really going to try to sell that to us?

Finally, you claim that there is no reason to question the reliability of the state tests. You cleverly don’t mention anything about validity, however.

Come on, Bert. I know you know more about testing than that.

Without validity of test scores, all reliability tells us is that if you give the test again to the same kids, the scores will come out the same. It doesn’t say anything about whether or not the scores validly indicate if kids are learning what they need for life.

Bert, psychometrics has a long way to go. It’s not a very exact science. It didn’t work out for the NAEP 500-point common scale theory. It didn’t work out for Performance Events in Kentucky in the 1990s, either (see our reports on KERA @ 20 if you conveniently forgot about that fiasco, online here: http://www.freedomkentucky.org/index.php?title=KERA_Portal).

Bert said...

I must admit, I was not as well versed on the early development of NAEP's 0-500 scales as I should have been. I have a friend who knows the statistical staff at ETS (the contractor that builds NAEP scales). He asked ETS about some info I found on the NCES website. Here's the response:

"The last paragraph on that [NCES web page about all three grades on the original scale with a mean of 250 and a standard deviation] provides a basic answer, but leaves out some details.

"The 0-300 scale that’s used for within-grade scales (e.g., science) is indeed established by setting the mean and standard deviation at each grade to 150 and 35 in the base year. Most of the new assessments or new frameworks established since 1996 (e.g., science in 1996, civics in 1998, grade 12 math in 2005, economics in 2006) are within-grade scales that use the 0-300 reporting scale.

"The 0-500 scale is used for cross-grade scales, most notably reading and math (although grade 12 was taken off the math cross-grade scale in 2005). The base year for these scales is prior to 1996 (1990 for math, 1992 for reading). In that base year, the scale was established by setting the mean and SD of all grades combined to 250, 50. Data for multiple grades are scaled together ONLY in the base year; in subsequent years, the scaling is done separately within grade, and then each grade is linked back to the initial cross-grade scale.

"There are some subject-specific issues and technical details, but the basic approach for transforming the cross-grade scales to the reporting metric is the same.

"In terms of a reference, the best I can come up with is to suggest going back to the old technical reports that contain more details for each assessment year. For instance, the 1996 technical report describes how the 1996 math results were linked to the original 1990 math cross-grade scale."

The 0-500 scale applies to all three grades combined, not to each grade individually.
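As a toy illustration of that "linked back" step (heavily simplified; the real procedure uses common items and IRT machinery rather than a simple mean-and-sigma match), something like this Python sketch captures the idea. The target mean and SD below are hypothetical values for the linking sample expressed on the base-year scale.

import numpy as np

def link_to_base_scale(new_theta, linking_mean_on_old_scale, linking_sd_on_old_scale):
    # Choose a linear transformation so the linking sample's mean and SD land
    # on target values already expressed on the base-year reporting scale.
    a = linking_sd_on_old_scale / new_theta.std()
    b = linking_mean_on_old_scale - a * new_theta.mean()
    return a * new_theta + b

rng = np.random.default_rng(2)
this_year_grade8 = rng.normal(0.05, 1.02, 8000)  # hypothetical within-grade estimates

reported = link_to_base_scale(this_year_grade8, 264.0, 34.0)
print(round(float(reported.mean())), round(float(reported.std())))  # about 264 and 34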

Thanks for the prompt to learn more about NAEP scaling.

Richard Innes said...

RE: Bert on December 28, 2010 10:02 PM

I'm glad our discussion encouraged you to dig deeper into the NAEP scoring scales.

I suspect that the switch to the within-grade-only 300 point system was partly a reaction to the operational experience with the across-grade 500 point system, as I mentioned in earlier comments.

Richard Innes said...

Bert,

We may have a technical issue with one of your attempts to post a comment. I received an alert notice that you made a comment about test Reliability that has not been posted by Blogger.

Did you delete that comment after posting it?