Educational data, Pearson and the ‘theory gap’
Earlier this month we came across an article in the European Educational Research Journal analysing Pearson’s role in education research. In the spirit of open dialogue, we invited the author, Dr Ben Williamson of Stirling University in the UK, to summarise his points, which he does in the following article. You can also read our response to this piece.
Pearson has recently become the subject of several major research studies. These studies have sounded a largely critical note about Pearson, particularly around its business ambitions and its political influences. One of the reasons for the emerging criticism of Pearson among education researchers, I believe, is that Pearson is beginning to challenge the existing authority of social scientists and psychologists to study, understand and produce new knowledge about key aspects of education such as assessment and learning.
I recently published a research article in the European Educational Research Journal on what I described as Pearson’s ‘digital methods.’ The research set out to identify some of the many research methods Pearson uses to make sense of education. Specifically, it examined the statistical methods and data visualization techniques behind Pearson’s The Learning Curve, as well as the data science methods used by Pearson’s Centre for Digital Data, Analytics and Adaptive Learning.
My argument was that Pearson is becoming a methodological gatekeeper, with the capacity to carry out new forms of educational research using large-scale datasets, big data and data science methods. Many educational researchers working in higher education institutions are ill-equipped to use these approaches, which gives Pearson an advantage as more and more digital data are produced about learning and assessment. As a result, a research centre like Pearson’s Centre for Digital Data, Analytics and Adaptive Learning looks from the outside like a seriously resourced laboratory for educational research and knowledge production, one that challenges the existing methods, knowledge and theories of educational sociology, philosophy and psychology.
For example, John Behrens, the director of the Centre for Digital Data, Analytics and Adaptive Learning, has claimed that data-mining ‘the billions of bits of digital data generated by students’ interactions with online lessons as well as everyday digital activities’ will challenge current theoretical frameworks in education, as ‘new forms of data and experience will create a theory gap between the dramatic increase in data-based results and the theory base to integrate them.’ In a report co-authored with Kristen DiCerbo (also of Pearson), the two authors note that ‘we need further research that brings together learning science and data science to create the new knowledge, processes, and systems this vision requires.’
The ambition to devise new data science methods alongside learning science approaches, and then to use them to identify a ‘theory gap,’ could cause disquiet among some education researchers. Of course, it’s intellectually healthy to challenge old theories; otherwise we would still be trying to construct behaviourist ‘teaching machines’ like those Sidney Pressey built nearly a century ago. But a company of Pearson’s size positioning itself as able to address the theory gap through its massive data-analytic capacity could be seen as a little troubling. Here are two reasons.
First, Pearson promotes The Learning Curve as an ‘open and living database’ that will encourage ‘evidence-informed education policy’ and help ‘identify the common elements of effective education.’ What is less clear to the user is that The Learning Curve was constructed by the Economist Intelligence Unit (EIU, until recently owned by Pearson), whose expertise lies in economic forecasting, business intelligence and national comparison. Although The Learning Curve invites the user to engage with the data through an interactive visual interface, it ultimately limits what kinds of analyses can be done and what can be said about the data, because it has been designed to prioritize the measurement and comparison of ‘effective’ education according to the methodological preferences of the EIU. What Pearson calls ‘effective education,’ or rather what the EIU measures as ‘effective education,’ and indeed what data about ‘effective education’ can be included in The Learning Curve in the first place, all point to its limitations as an impartial, neutral and objective visual and numerical representation of education around the world. The methodological appendix to The Learning Curve even admits as much, stating that ‘because indexes aggregate different data sets on different scales from different sources, building them invariably requires making a number of subjective decisions.’ There is subjectivity to the objectivity offered by The Learning Curve.
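The appendix’s point about subjective decisions can be made concrete with a small sketch. The Python below uses invented figures for three hypothetical school systems; the system names, indicator names, values and weights are all assumptions for illustration and have nothing to do with the actual Learning Curve data or its methodology. It rescales two indicators measured on different scales to a common 0 to 1 range (one of several possible rescaling choices) and then combines them with two different sets of weights.

```python
# Illustrative only: invented figures for three hypothetical school systems.
# These are NOT Learning Curve data, and the weights used below are arbitrary assumptions.
RAW = {
    "System A": {"test_score": 520, "grad_rate": 0.78},
    "System B": {"test_score": 495, "grad_rate": 0.92},
    "System C": {"test_score": 540, "grad_rate": 0.70},
}


def min_max(values):
    """Rescale numbers to the 0..1 range (assumes they are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_ranking(data, weights):
    """Rank systems by a weighted sum of min-max-rescaled indicators."""
    names = list(data)
    # Put each indicator on a common scale so test scores and rates can be added together.
    scaled = {ind: min_max([data[n][ind] for n in names]) for ind in weights}
    scores = {
        n: sum(weights[ind] * scaled[ind][i] for ind in weights)
        for i, n in enumerate(names)
    }
    # Highest composite score first.
    return sorted(names, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Same data, two different weighting decisions.
    print(composite_ranking(RAW, {"test_score": 0.7, "grad_rate": 0.3}))
    # ['System C', 'System A', 'System B']
    print(composite_ranking(RAW, {"test_score": 0.3, "grad_rate": 0.7}))
    # ['System B', 'System A', 'System C']
```

With weights favouring test scores, System C comes out on top; shifting the weights towards graduation rates puts System B first and System C last, even though none of the underlying numbers has changed.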
As an education researcher with a sociological tendency, I find that this subjectivity raises questions about the ‘who’ behind the data: who selected it, and from where; what did they do to prepare it for inclusion; how did they clean it up; how has it been tweaked; how has it been presented; and, crucially, how much interpretation have the designers of The Learning Curve done before presenting it on the site?
Second, Pearson’s Centre for Digital Data, Analytics and Adaptive Learning is premised on a kind of big data belief system which assumes that massive quantities of data can reveal truthful and meaningful patterns about the reality they’re taken from—that the data can speak for themselves free of human bias. Yet as many researchers of big data have pointed out, data do not exist naturally as a ‘raw’ or truthful representation of an underlying reality—they have to be brought into being through human, social, methodological and technical practices, and are constantly reshaped as they move between human actors, software platforms, and institutional structures and settings, all framed by social, political and economic contexts. Again, human hands, minds and biases, as well as technical platforms and business plans, can all affect the ways in which data are collected, calculated, and communicated back out into the world.
These examples are significant because Pearson claims to be opening up a ‘theory gap’ in our understanding of effective education and learning, while at the same time developing new digital methods and data scientific approaches that might produce new knowledge to fill that gap. As a global educational media company, and increasingly a policy influencer, it is therefore very well positioned to turn the insights it gains from the data into new kinds of solutions, in the shape of new software products for schools or even new policy solutions for governments.
You can see why some critically minded education researchers would be sceptical: Pearson is identifying problems for which it might then sell solutions! Others might point out that numerical data (no matter how big) and their visualization as heatmaps, time series graphs and so on are only part of the educational picture, and that they don’t capture social and cultural context, emotional complexity, or the qualitative dimensions of human relations in classrooms.
My own critique is different. My emphasis is on acknowledging the human and social practices that go into generating data at Pearson, now a new source of knowledge production, and on asking how its new digital methods and data scientific approaches might be challenging the long history of educational theorizing, empirical investigation and knowledge production. Pearson is positioning itself as a major source of methodological expertise in educational research, driven by ambitions to reconceptualise education and learning, and it has significant global power to persuade policymakers, politicians and practitioners alike that its data provide the numerical and visualized facts that can fill the theory gap.
There is an exciting line of sociological inquiry to draw on here, into the ‘social life of methods,’ which treats research methods themselves as objects of social scientific inquiry. Those of us trying to understand Pearson from the outside know little about the ‘social life’ of the methodological work being done inside its research centres.
The necessary response, I think, is for education researchers to try to understand the ‘who,’ the ‘how’ and the ‘why’ of Pearson’s current digital ambitions. Who at Pearson is collecting the data, designing the algorithms to analyse it, and checking the analytics for their accuracy—and according to whose policy ambitions, business plans and personal objectives? How are the datasets that Pearson possesses selected, interpreted and presented, and how is the visualization of its data on platforms like The Learning Curve designed in such a way as to shape the possible interpretations that audiences can make? And why is Pearson investing in such a massive effort to conduct educational data science—to identify new market niches for itself, to displace higher education institutions, and to position itself as the dominant global centre of educational expertise and knowledge production?
Answering these questions may require researchers with a more critical set of methodologies and theories to engage in a dialogue with researchers within Pearson, and to understand Pearson from the inside as a new source of methodological expertise and knowledge production rather than criticising it from the outside as a commercial monster. There is an empirical gap in our understanding of how Pearson is approaching the theory gap in educational research.