THE biggest influence on student achievement: Hattie
Note: Update at bottom
What is THE biggest influencer of student achievement? A well-known researcher attempts to find out, but what do his findings mean? Among the most famous meta-analysts (if such people can be famous) is John Hattie. In 2009 he published a book called Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement in which he did a meta-analysis of a lot of other meta-analyses and came up with a ranked list of the effects of teacher, teaching, and student influences on achievement (as seen here).
I’ve been taking a close look at these, and frankly am puzzled by the very top one on the list. The thing that Hattie found that had THE biggest effect size was self-report grades. It has an effect size of 1.44. That would be a huge effect. It would mean a kid at the 50th percentile would move to the 93rd percentile after doing this.
Wait a minute… after doing what? Is this a claim that just grading yourself raises your achievement? If it were that easy, wouldn’t we all be doing that?
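To make the percentile arithmetic concrete, here is a minimal sketch (plain Python, assuming normally distributed scores) of how an effect size of 1.44 would move a student at the control-group median:

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# A Cohen's d of 1.44 means the treated group's mean sits 1.44 standard
# deviations above the control mean. A student at the control median
# (50th percentile) would land at the percentile of z = 1.44:
d = 1.44
percentile = normal_cdf(d) * 100
print(round(percentile))  # → 93
```

That is the mapping behind the "50th to 93rd percentile" claim; it is just a lookup on the normal curve, not evidence about any particular intervention.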
Since 2009, Hattie has published other books in which the meaning of this finding has shifted from self-grading to self-expectations, and I am afraid the interpretation is being stretched well beyond what this effect really describes.
In order to try to figure out what the 1.44 effect size is based on, I tried to go back to the studies that Hattie used to get to that effect size. He lists five of them here. I pulled each of them and found this:
- Mabe & West (1982) This is a review of 55 studies. It is framed around the idea of understanding the validity of self-evaluation by correlating self-given grades to other grades. Are the self-given grades accurate? Are they related to more objective measures of achievement? They did NOT investigate whether changing the self-evaluation influences achievement. The average correlation between self-evaluation and achievement was .29, although across studies it ranged from -.26 to .80. The authors identify a number of ways to make the self-evaluations more accurate.
- Falchikov & Boud (1989) This review examined 57 studies. They employed the commonly used effect size measure (a standardized difference between means). However, again, there were no control and experimental groups and no studies of the effect of changing someone’s self-grade or self-expectation. Rather, the self-grade was coded as the experimental group and the teacher grade as the control. The mean effect size is .47, with a range from -.62 to 1.42 across studies. They also report that the mean correlation between self-graded and other-graded work was .39.
- Ross (1998) This study examines self-assessment in the context of whether a self-assessment can be used for placement in language classes as opposed to giving placement tests. They report the correlation between self-report and objective scores across 60 studies reviewed as .63. They then report an effect size, but it is an effect size for the correlation coefficient, not the traditional meta-analysis effect size that compares a control and experimental group. This effect size (g) is 1.63. Again, they don’t compare doing it versus not or any effect on achievement.
- Falchikov & Goldfinch (2000) This study is actually about the relationship between peer grades and teacher grades. The overall correlation was .69, with a range from .14 to .99. Regardless, this study does not seem to fall into the category of self-assessment.
- Kuncel, Credé, & Thomas (2005) This paper again looked at the reliability and validity of self-assessment, BUT only by checking whether the GPA and SAT scores students reported were their real scores. In other words, this isn’t even really a judgment of their own expectations, but of whether they remember and accurately report known scores. They compare reported to actual results from 37 different samples. So, sure, the effect size for reported versus actual GPA was 1.38, but that just means college students can report their already-known GPAs pretty accurately. Interestingly, they were quite poor at reporting SAT scores, with effect sizes of .33 for Verbal and .12 for Math.
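A note on the Ross (1998) figure above: an "effect size for a correlation coefficient" is usually just an algebraic re-expression of r. Assuming the standard conversion d = 2r / √(1 − r²) (my assumption; Ross may have used a variant), a correlation of .63 turns into a large-looking number without any control/experimental comparison ever happening:

```python
from math import sqrt

def r_to_d(r):
    """Re-express a correlation coefficient r as a Cohen's-d-style
    effect size via the standard identity d = 2r / sqrt(1 - r^2)."""
    return 2 * r / sqrt(1 - r ** 2)

# Ross (1998) reports r = .63 between self-report and objective scores.
d = r_to_d(0.63)
print(round(d, 2))  # → 1.62, close to the 1.63 reported
```

The point is that this "effect size" is still just the correlation wearing a different outfit; it says nothing about what happens when you intervene on self-assessment.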
It is clear that these studies only show that there is a correlation between students’ expected grades and their actual grades. That’s it (and sometimes not even that). They just say kids are pretty good judges of their current levels. They do not say anything about how to improve achievement. These studies are not intervention studies. In fact, if I were looking at all the studies about “influences” on achievement, I would not include this line of research. It is not about influencing achievement.
I’m not going to judge the very high effect size Hattie reports one way or the other. It certainly isn’t clear from these five studies how it could be so high, but I’m guessing this is not an exhaustive list of all the studies reviewed. My focus here is different: even if that 1.44 is accurate, what does it really tell us?
In later work, Hattie calls this finding self-expectations. The self-grading research seems to have morphed into the idea that we should help kids prove their expectations wrong, or that raising kids’ expectations will raise their achievement. Both are hypotheses that could be tested. I think the argument is that if children’s performance exceeds their expectations, they will raise their expectations, and that in turn will raise their achievement. That may or may not work.
But that is not what the studies behind the 1.44 effect size examined. They looked at the correlation of self-report to actual grades, often in the context of whether self-report could be substituted for other kinds of assessment. None of them studied the effect of changing those self-reports. As we all know, correlation does not imply causation. This research does not imply that self-expectations cause grades.
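As a purely hypothetical illustration (the setup and numbers are mine, not from any of the reviewed studies), a tiny simulation in which some latent ability drives both the self-report and the actual grade reproduces a strong correlation with zero causal path from self-report to achievement:

```python
import random

random.seed(0)

# Latent ability drives BOTH the self-reported grade and the actual
# grade, so the two correlate strongly even though neither causes
# the other.
n = 10_000
ability = [random.gauss(0, 1) for _ in range(n)]
self_report = [a + random.gauss(0, 0.5) for a in ability]
actual = [a + random.gauss(0, 0.5) for a in ability]

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(round(pearson_r(self_report, actual), 2))  # strong, around 0.8

# "Intervening" on the self-reports (say, adding a point to each one)
# cannot touch the actual grades here, because there is no mechanism
# from self-report to achievement -- the self-report is a symptom of
# ability, not a cause of grades.
boosted = [s + 1.0 for s in self_report]
```

In this toy model you could observe a correlation of about .8 every time, and an intervention on self-reports would still accomplish nothing, which is exactly the gap between what these studies measured and what the "raise expectations" interpretation assumes.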
I am a big fan of Hattie’s work. The task that he undertook was Herculean. However, I am afraid that the message related to that very top, very biggest effect size is being distorted. Am I wrong? What am I missing on the studies that lead to this effect size estimate?
UPDATE: Since publishing this, I have been directed to three criticisms of Hattie’s statistical methods. They are worth your time to check out.
About the Author
Kristen DiCerbo, PhD, was a principal research scientist for the Center for Learning Science & Technology within Pearson’s Research & Innovation Network. Dr. DiCerbo’s research program centered on digital technologies in learning and assessment, particularly on the use of data generated from interactions to inform instructional decisions. She has conducted qualitative and quantitative investigations of games and simulations, particularly focusing on the identification and accumulation of evidence. She previously worked as an educational researcher at Cisco and as a school psychologist. She holds doctorate and master’s degrees in Educational Psychology from Arizona State University.