Enabling the science of history

One of the great ironies of my academic career was that, throughout my Ph.D. work on a digital critical edition of parts of the text of Matthew of Edessa’s Chronicle, I had only the vaguest inkling that anyone else was doing anything similar. I had heard of Peter Robinson and his COLLATE program, of course, but when I met him in 2007 he only confirmed to me that the program was obsolete and, if I needed automatic text collation anytime soon, I had better write my own program. Through blind chance I was introduced to James Cummings around the same time, who told me of the existence of the TEI guidelines and suggested I use them.

It was, in fact, James who finally gave me a push into the world of digital humanities. I was in the last panicked stages of writing up the thesis when he arranged an invitation for me to attend the first ‘bootcamp’ held by the Interedition project, whose subject was to be none other than text collation tools. By the time the meeting was held I was in that state of anxious bliss of having submitted my thesis and having nothing to do but wait for the viva, so I could bend all my hyperactive energy in that direction. Through Interedition I made some first-rate friends and colleagues with whom I have continued to work and hack to this day, and it was through that project that I met various people within KNAW (the Royal Netherlands Academy of Arts and Sciences).

After I joined Interedition I very frequently found myself talking to its head, Joris van Zundert, about all manner of things in this wide world of digital humanities. At the time I knew pretty much nothing of the people within DH and its nascent institutional culture, and was moreover pretty ignorant of how much there was to know, so as often as not we ended up in some kind of debate or argument over the TEI, over the philosophy of science, over what constitutes worthwhile research. The main object of these debates was to work out who was holding what unstated assumption or piece of background context.

One evening we found ourselves in a heated argument about the application of the scientific method to humanities research. I don’t remember quite how we got there, but Joris was insisting (more or less) that humanities research needed to be properly scientific, according to the scientific method, or else it was rubbish, nothing more than creative writing with a rhetorical flourish, and not worth anyone’s time or attention. Historians needed to demonstrate reproducibility, falsifiability, the whole works. I was having none of it: while I detest evidence-free, assumption-laden excuses for historical argument as much as any scholar with a proper science-based education would, surely Joris and everyone else must understand that medieval history is neither reproducible nor falsifiable, and that the same goes for most other humanities research? What was I to do, write a Second Life simulation to re-create the fiscal crisis of the eleventh century, complete with replica historical personalities, and simulate the whole to see if the same consequences appeared? Ridiculous. But of course, I was missing the point entirely. What Joris was pushing me to do, in an admittedly confrontational way, was to make clear my underlying mental model for how history is done. When I did, it became obvious to me how and where historical research ultimately stands to gain from digital methods.

OK, that’s a big claim, so I had better elucidate this mental model of mine. It should be borne in mind that my experience is drawn almost entirely from Near Eastern medieval history, which is grossly under-documented and fairly starved of critical attention in comparison to its Western cousin, so if any of you historians of other places or eras have a wildly different perspective or model, I’d be very interested to hear about it!

When we attempt a historical reconstruction or create an argument, we begin with a mixture of evidence, report, and prior interpretation. The evidence can be material (mostly archaeological) or documentary, and we almost always wish we had roughly ten times as much of it as we actually do. The reports are usually those of contemporaneous historians, which are of course very valuable but must be examined in themselves for what they aren’t telling us, or what they are misrepresenting, as much as for what they positively tell us. The prior interpretation easily outweighs the evidence, and even the reports, in sheer volume, and it is this that constitutes the received wisdom of our field.

So we can imagine a rhetorical structure of dependency that culminates in a historical argument, or a reconstruction. We marshal our evidence, we examine our reports, we make interpretations in the light of received wisdom and prior interpretations. In effect it is a huge and intricate connected structure of logical dependencies that we carry around in our heads. If our argument goes unchallenged or even receives critical acceptance, this entire structure becomes a ‘black box’ of the sort described by Bruno Latour, labelled only with its main conclusion(s) and ready for inclusion in the dependency structure of future arguments.

Now what if some of our scholarship, some of the received wisdom even, is wide of the mark? Pretty much any historian will relish the opportunity to demonstrate that “everything we thought we knew is wrong”, and in Near Eastern history in particular these opportunities come thick and fast. This is a fine thing in itself, but it poses a thornier problem. When the historian demonstrates that a particular assumption or argument doesn’t hold water, when the paper is published and digested and its revised conclusion accepted, how quickly, or slowly, will the knock-on effects of this new insight make themselves clear? How long will it take for the implications to work themselves out fully? In practice, the weight of tradition and the patterns of historical understanding for Byzantium and the Near East are so strong, and have gone so long unchallenged, that we historians simply haven’t got the capacity to identify all the affected black boxes, to open them up and find the problematic components, and to re-assess each of their conclusions with those components altered or removed. And this, I think, is the biggest practical obstacle to the work of historians being accepted as science rather than speculation or storytelling.

Well. Once I had been made to put all of this into words, it became clear what the most useful and significant contribution of digital technology to the study of history must eventually be. Big data and statistical analysis of the contents of documentary archives are all well and good, but what if we could capture our very arguments, our black boxes of historical understanding, and make them essentially searchable and available for re-analysis when some of the assumptions have changed? They would even be, dare I say it, reproducible and/or falsifiable. Even, perish the thought, computable.
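To give a sense of what ‘computable’ might mean here, the sketch below (in Python, with entirely invented claim names) models each black box as a node in a directed dependency graph; overturning one claim then becomes a simple graph traversal that lists every conclusion due for re-examination. This is an illustration of the idea, not a description of any existing system.

```python
# A minimal sketch of arguments as a dependency graph: each "black box"
# is a node whose conclusion rests on evidence, reports, and prior
# interpretations. All claim names here are invented for illustration.

from collections import defaultdict

# Map each conclusion to the claims it directly depends on.
depends_on = {
    "fiscal_crisis_argument": ["coinage_debasement", "chronicle_report_A"],
    "coinage_debasement": ["hoard_evidence", "prior_metallurgical_study"],
    "chronicle_report_A": ["manuscript_witness"],
}

# Invert the graph: which conclusions rest directly on a given claim?
supports = defaultdict(set)
for conclusion, premises in depends_on.items():
    for premise in premises:
        supports[premise].add(conclusion)

def affected_by(claim):
    """Return every conclusion that transitively rests on `claim`,
    i.e. all the black boxes that would need re-opening if the claim
    were overturned."""
    to_visit, seen = [claim], set()
    while to_visit:
        current = to_visit.pop()
        for dependent in supports[current]:
            if dependent not in seen:
                seen.add(dependent)
                to_visit.append(dependent)
    return seen

# If the metallurgical study is shown to be wrong, which arguments
# must be re-examined?
print(affected_by("prior_metallurgical_study"))
# {'coinage_debasement', 'fiscal_crisis_argument'}
```

A real implementation would of course need provenance, degrees of confidence, and versioning, but even this toy version makes the ‘knock-on effects’ question of the previous paragraph mechanically answerable.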

A few months after this particular debate, I was invited to join Joris and several other members of the Alfalab project at KNAW in preparing a paper for the ‘Computational Turn’ workshop in early 2010; the paper was eventually included in a collection that arose from the workshop. In the article we take a look at the processes by which knowledge is formalized in various fields in the humanities, and at how that formalization can be resisted by scholars within each field. Among other things we presented a form of this idea for the formalization of historical research. Three years later I am still working on making it happen.

I was very pleased to find that Palgrave Macmillan makes its author self-archiving policies clear on its website, for books of collected papers as well as for journals. Unfortunately the policy is that the chapter is under embargo until 2015, so I can’t post it publicly until then, but if you are interested in the meantime and can’t track down a copy of the book then please get in touch!

J. J. van Zundert, S. Antonijevic, A. Beaulieu, K. van Dalen-Oskam, D. Zeldenrust, and T. L. Andrews, ‘Cultures of Formalization – Towards an encounter between humanities and computing’, in Understanding Digital Humanities, edited by D. Berry (London: Palgrave Macmillan, 2012), pp. 279–94.