5. BIG DATA - The Importance of Remembering There Can Be a Data/Facts Disjunction
“Data” is a dangerous word. Those four letters are often spoken as if merely declaring them gives license to present your next set of information as that other four-letter word: “fact”. But, in fact, data is just the name we give to certain sets of information, as much open to interpretation, falsification, and qualification as any other piece of information.
When I receive the “information” from my religious friend that he was visited by God last night in a dream, my instinct as an atheist is to take it with a sceptical pinch of salt:
Dreams are a non-supernatural everyday phenomenon.
Is there anything that happened in this “God Dream” which could not be explained non-supernaturally?
Is my friend’s pre-existing faith reason to recognise a potential bias in his recounting of the information?
Etc…
Likewise, when I hear about a report which denies the existence of climate change, that “information”, too, goes through a process of scepticism:
Does the report come from someone with a vested interest, such as a think tank or polluting industry, that would benefit from the report’s conclusions?
Is the report based on a methodology which yields meaningful information?
Is the information in the report supported by other, independent, research?
Etc…
But there seems to be a feeling in some quarters that the simple presentation of “data” to support an idea is often an intellectual trump card which guarantees that idea’s status as “fact”. The most obvious recent example of this in my own reading is Steven Pinker’s epic use of data in his two books on progress, The Better Angels of Our Nature and Enlightenment Now. Pinker’s two overarching arguments are that:
a) the world is a far less violent place today than it ever was, and
b) that the philosophy of the enlightenment has been the driving force for this, and other, social progress.
Both arguments are well presented, and on the whole I largely agree with Pinker. But study after study is piled up behind each initial hypothesis to help make his case, and with each new data set I actually began to find his overall theses less convincing, as Pinker attempted to translate intellectual ideas into raw, quantitative numbers. His data about wars, for example, relied on some very questionable interpretations and equivocations about what constituted violence, if not outright denial of violence in certain places. So did his data about economics, which suffered throughout from a very clear bias against any form of Marxism or socialism and an equally large bias in favour of markets. Not to mention the data about human nature used to argue for the necessity of government. Now, I didn’t think these issues were enough to topple his overall argument, but they did distract from the point he was actually making and made me question the data’s value as “proof” of anything - especially as war, socialism and anarchism happen to be research interests of mine, areas where I have my own “data” to contrast against Pinker’s. I noticed the distortions and problems in those specific areas only because I happened to know those topics more intimately; if I knew the other areas in Pinker’s books as well, would I find similar flaws in the readings of the data there too?
I remember hearing the magician Penn Jillette once say that every time he read a report in a newspaper about something he knew well, he noticed they got everything wrong. A friend then pointed out that this is probably because they get something wrong in every story - you just don’t know enough about the others to notice.
I’m not saying the data has been misunderstood or manipulated; I am saying that the rightness or wrongness of data lies largely in the interpretation and application of it, and on the initial prior agreement of the soundness of the methodology and focus of the study which generated it. When I say Pinker got something wrong I am not accusing Pinker of anything malicious; I am really saying we disagree on what that particular data shows, or on the soundness and relevance of the data in the first place in application to the issue at hand.
And this is not just a problem within the books of this one thinker. Take philosophy of mind, for example. We have all heard the neuroscientists’ claims that mind can be reduced to brain, and there are many studies showing beautiful images of brain circuitry all lit up, “thinking” and “experiencing” things. The data tends towards eliminativism - “mind” is an outdated piece of “folk psychology” from before we had all this wondrous data about mind.
Except - it doesn’t really. Because mind is a great example of a fundamental data clash: all the physical brain activity recorded in the world will never come close to giving me the same thing as the qualitative phenomenological experience of a particular thought, and, as Galen Strawson has pointed out, the phenomenological experience is indubitable. So our physical picture of the world is incomplete - and perhaps, as Colin McGinn suggests, even beyond the comprehension of brains evolved for survival and not biologically obligated to answer life’s most profound philosophical questions. So while one set of thinkers holds up the data of neuroscience as proof positive that the mind is the brain, others hold it up as evidence that mind, while surely connected to the brain, stands in a relationship to it we may never be able to fully explain or understand. The data itself does not guarantee the conclusion.
Much of which is already known in philosophy, and the subject of great debate. But I also find it interesting as a teacher how much the “data means facts” mentality has seduced its way into education policy without much scrutiny (and certainly without much question of the message our data use sends to our students and their parents, perpetuating this false idea that data = fact).
Data, in theory, means measurability, tracking and accountability, and therefore is favoured and celebrated by those in the business of measuring education, tracking education, and holding education accountable. Every year teachers across the country collect spreadsheet after spreadsheet of meaningless data on their pupils. They analyse that data and discuss it endlessly with managers and leaders and make decisions about future planning based on what the data says. But the data gathered is, invariably, pure garbage.
Take my own subject, for example. When I test a student on their RE, what am I testing? Well, it could be their subject knowledge. It could be their ability to synthesise that subject knowledge with writing technique and successfully explain a difficult concept. It could be their ability to combine and abstract such concepts into a longer piece of explanatory writing, or to evaluate and analyse them into a fully justified argument. I could be checking their use of scripture and scholarship to support ideas. I could be checking their spelling. I could be checking their long-term memory, or their short-term memory. I could be checking all of the above.
Each of these potential focuses for assessment is legitimate and valid, and may even generate a grade or level if they correlate to some sort of comprehensive mark-scheme or assessment criteria for the various skills and knowledge required for the subject. But a grade isn’t really the important thing here; rather, the question of whether or not they have demonstrated the particular skill or knowledge that I’m looking for. And even that isn’t so obvious - because I could just as easily set an assessment task that I know should yield generally positive results (an “easy” test) as one guaranteed to challenge my students and expose their deficits (a “difficult” test). Each approach has pedagogical merits: the former to boost my students’ confidence, the latter to show them what to work on for next time. So it’s not even clear what the grade means without some initial contextual understanding.
There is all manner of data that comes out of a well chosen piece of assessed work, but the sort of data school leaders (and parents) tend to look for are grades. Numbers. And more than that, they are looking for grades to show progression across the year: the student who begins September on a grade 5 should hope to end the year at a grade 6 or higher or else they only demonstrate a year of treading water. Anything lower, and they are failing.
The problem, however, is that true educational progress isn’t neat like that. Many times we step backwards before we move forwards again, or a new context for an old skill throws up challenges we didn’t anticipate. Educational progress is messy. And as the varied list of rationales and approaches to assessment above shows, the data we gather across a year may be completely disconnected from each previous data point: an “easy” test followed by a “difficult” test, a test focusing on a higher-level skill followed by a simple knowledge check, etc. Yet schools and parents tend to care little for nuance and narrative and seek only simplistic data - numbers going up is good, numbers going down is bad - alongside the underlying assumptions: that everything can be reduced to a number, and that we all understand what the numbers mean.
So the grade 8 you got for a drawing project is the same as the grade 8 you got for writing an essay and the grade 8 you got in the group presentation. When, very obviously, each grade 8 comes from a very different skill-set. More to the point, a decline in the group work (because of uncooperative partners or a fear of public speaking), or in the essay (because you draw better than you write) is completely understandable and gives us zero evidence that you are “slipping” or “failing” compared to the previous work as the type of assessment is completely unlike the last one. Yet on a spreadsheet, reduced to mere numbers, mere “data”, none of this is obvious.
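The flattening described above can be sketched in a few lines of purely illustrative Python. The students, grades and assessment types below are invented for the example; the point is only that a spreadsheet column of bare numbers erases exactly the context that makes a “decline” interpretable:

```python
# Illustrative sketch only: invented students, grades and assessment types.
# Each record keeps the context that a bare spreadsheet column throws away.
assessments = [
    {"student": "A", "term": 1, "grade": 8, "type": "essay"},
    {"student": "A", "term": 2, "grade": 6, "type": "group presentation"},
    {"student": "B", "term": 1, "grade": 8, "type": "essay"},
    {"student": "B", "term": 2, "grade": 6, "type": "essay (harder topic)"},
]

# The "spreadsheet view": grades per student with all context stripped out.
flattened = {}
for a in assessments:
    flattened.setdefault(a["student"], []).append(a["grade"])

print(flattened)  # {'A': [8, 6], 'B': [8, 6]}
# Both students "decline" by the same two grades, so the spreadsheet treats
# them identically - yet A switched to a completely different skill-set
# (speaking, not writing), while B sat a deliberately harder test.
# The numbers alone cannot distinguish the two stories.
```

Once the `type` column is dropped, the two “declines” become indistinguishable, which is precisely the information loss the spreadsheet invites.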
The alleged “data” is certainly information - but it is not always the information we think it is, and it is no guarantee of “fact” (especially in a subject like mine, where the grade is also based on the assessor’s own subjective interpretation of the mark scheme). Yet, in education, as in many other fields, neat spreadsheets are wielded as if they really are evidence of something, when more often than not they simply are not.
All of which is to say that we need to stop leaving our critical faculties at the door when presented with “data” to support an idea. Data can tell us a lot, and I am not suggesting for a moment data should be ignored. But data can also mislead and obfuscate, and a philosopher needs to consider the nature and role of the information presented, its relevance and reliability, before granting it the sort of epistemic free pass too many today seem eager to give it.
AUTHOR: D.McKee