On the stability of performance assessments.

This study examined the stability of scores on two types of performance assessments: an observed hands-on investigation and a notebook surrogate. Twenty-nine sixth-grade students in a hands-on, inquiry-based science curriculum completed three investigations on two occasions separated by 5 months. Results indicated that: (a) the generalizability across occasions for relative decisions was, on average, moderate for both the observed investigations (.52) and the notebooks (.50); (b) the generalizability for absolute decisions was only slightly lower; (c) the major source of measurement error was the person-by-occasion interaction, confounded with residual error; and (d) the procedures students used to carry out the investigations tended to change from one occasion to the other.
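To show how coefficients for relative and absolute decisions relate, here is a minimal sketch. It assumes the simplest one-facet crossed design, persons x occasions (p x o), with one score per person per occasion. The study's actual design may include additional facets, such as investigations or raters. In this simple design, relative error variance is var(po,e)/n_o. Absolute error variance adds the occasion main effect: (var(o) + var(po,e))/n_o. The absolute coefficient can therefore be only slightly lower than the relative one when the occasion effect is small. The function name `g_study_p_by_o` and the example scores are hypothetical illustrations, not the study's data or procedure.

```python
from statistics import mean


def g_study_p_by_o(scores):
    """Hypothetical helper: estimate variance components and G coefficients
    for a persons x occasions (p x o) crossed design, one score per cell.

    scores: list of rows, one per person, each holding n_o occasion scores.
    """
    n_p = len(scores)
    n_o = len(scores[0])
    if n_p < 2 or n_o < 2:
        raise ValueError("need at least 2 persons and 2 occasions")

    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    occasion_means = [mean(row[j] for row in scores) for j in range(n_o)]

    # Sums of squares for a two-way design without replication
    ss_p = n_o * sum((m - grand) ** 2 for m in person_means)
    ss_o = n_p * sum((m - grand) ** 2 for m in occasion_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_o  # person x occasion interaction + error

    ms_p = ss_p / (n_p - 1)
    ms_o = ss_o / (n_o - 1)
    ms_res = ss_res / ((n_p - 1) * (n_o - 1))

    # Solve the expected-mean-square equations; negative estimates are set
    # to zero (a common convention, assumed here)
    var_res = max(ms_res, 0.0)
    var_p = max((ms_p - ms_res) / n_o, 0.0)
    var_o = max((ms_o - ms_res) / n_p, 0.0)

    rel_err = var_res / n_o            # error for relative (rank-order) decisions
    abs_err = (var_o + var_res) / n_o  # adds occasion effect for absolute decisions

    g_relative = var_p / (var_p + rel_err) if var_p + rel_err > 0 else 0.0
    phi_absolute = var_p / (var_p + abs_err) if var_p + abs_err > 0 else 0.0
    return {
        "var_p": var_p,
        "var_o": var_o,
        "var_res": var_res,
        "g_relative": g_relative,
        "phi_absolute": phi_absolute,
    }


if __name__ == "__main__":
    # Hypothetical scores: 3 persons x 2 occasions; everyone gains 1 point
    print(g_study_p_by_o([[1, 2], [3, 4], [5, 6]]))
```

In this made-up example every person gains exactly one point, so the rank order is perfectly stable and the relative coefficient is 1. The uniform gain is an occasion main effect, and it lowers only the absolute coefficient. A large person-by-occasion (residual) component, like the one reported above, would instead lower both coefficients.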