From the Hechinger Report
by Sarah Garland
In Washington, D.C., one of the first places in the country to use value-added teacher ratings to fire teachers, teacher-union president Nathan Saunders likes to point to the following statistic as proof that the ratings are flawed: Ward 8, one of the poorest areas of the city, has only 5 percent of the teachers defined as effective under the new evaluation system known as IMPACT, but more than a quarter of the ineffective ones. Ward 3, encompassing some of the city’s more affluent neighborhoods, has nearly a quarter of the best teachers, but only 8 percent of the worst.
The discrepancy highlights an ongoing debate about the value-added test scores that an increasing number of states—soon to include Florida—are using to evaluate teachers. Are the best, most experienced D.C. teachers concentrated in the wealthiest schools, while the worst are concentrated in the poorest schools? Or does the statistical model ignore the possibility that it’s more difficult to teach a room full of impoverished children?
Saunders thinks it’s harder for teachers in high-poverty schools. “The fact that kids show up to school hungry and distracted and they have no eyeglasses and can’t see the board, it doesn’t even acknowledge that,” he said.
But many researchers argue that value-added models don’t need to control for demographic factors like poverty, race, English-learner or special-education status at the individual student level, as long as enough test-score data (at least three years) are included in the formula. They say states and districts choose to include demographic characteristics in the models to satisfy unions and other constituents, not because it’s statistically necessary.
William Sanders, a former University of Tennessee researcher now at the SAS Institute Inc., has spent nearly three decades working on a complex statistical formula that’s been adopted in districts serving a total of 12 million students around the country. With at least three years of test-score data from different academic subjects, he says he is able to home in on a good prediction of what a particular student’s progress should look like in a given year, and thus how much a teacher should be expected to teach the student. Adding demographic factors only muddies the picture, he argues.
“If you’ve got a poor black kid and a rich white kid that have exactly the same academic achievement levels, do you want the same expectations for both of them the next year? If the answer is yes, then you don’t want to be sticking things in the model that will be giving the black kid a boost,” he said.
But Eric Isenberg, a Mathematica researcher and one of the designers of the IMPACT value-added model for Washington, D.C., says he’s “never been really compelled by the lower-the-expectations-for-students argument.” The D.C. model uses only one year of data and incorporates the poverty status of individual students, among other factors, to protect against biasing the ratings.
“Nobody ever makes the argument that you’re holding the kids that started at a lower [achievement level] to lower standards,” he said.
There is also debate among researchers about whether the concentration of disadvantaged students in a classroom should be taken into account. Only a handful of value-added models do so.
A large body of research has found that student achievement is affected not only by a student’s individual circumstances at home, but also by the circumstances of other children in the same school and classroom. Studies have found that students surrounded by more advantaged peers tend to score higher on tests than similarly performing students surrounded by less advantaged peers.
To some experts, this research suggests that a teacher with a large number of low-achieving minority children in a classroom, for example, might have a more difficult job than another teacher with few such students.
D.C.’s model doesn’t account for classroom characteristics, but Florida’s model accounts for the percentage of students scoring at similar levels in a class, a variable that may partly address the issue.
Controlling for the demographics of a whole class can be messy, says Douglas Harris, a University of Wisconsin-Madison professor who has studied both value-added modeling and how a student’s peers affect his or her own achievement.
“It’s very hard in a statistical sense to separate for those things,” Harris said. “Accounting for the student level and the classroom and school level is not going to make that much difference.”
Isenberg agrees: “I haven’t seen anything to date that suggests peer effects make a large difference” in the context of value-added teacher evaluations. Nevertheless, he is currently leading research in D.C. and 30 other cities to see if factoring in the concentration of disadvantaged students in a class will make a difference in teachers’ scores.
Daniel McCaffrey, a senior statistician at the RAND Corporation, a nonprofit research group, argues, however, that peer effects can make a difference. If there are enough years of test-score data, “including individual-level race and income … in the model doesn’t matter very much,” he said. On the other hand, including classroom-level data “tends to matter more and can make meaningful changes” to a teacher’s rating.
Sanders says that in his years of research, he has found no correlation between the concentration of disadvantaged students and school performance on value-added measures. “It becomes a question of where do you want to put your risk,” he said. Should school districts risk hiding the fact that high-poverty schools tend to get more ineffective teachers, he asked, or risk rating teachers with high numbers of disadvantaged students incorrectly?