Hamburg here I come

As I write this I am on my way to Hamburg for DH2012. I’m very much looking forward to the conference this year, not only because of the wide variety of interesting papers and the chance to explore a city I’ve heard a lot of nice things about, but also because this year I feel like I have some substantial research of my own to contribute.

My speaking slot is on Friday morning (naturally opposite a lot of other interesting and influential speakers, but that seems to be the perpetual curse of DH.)  In preparation for that, I thought I might set down the background for the project I have been working on for the last two years, and discuss a little of what I will be presenting on Friday. After all, if I can set it down in a blog post then I can present it, right?

The project is titled The Tree of Texts, and its aim is to provide a basis for empirical modelling of text transmission. It grows out of the problem of text stemmatology, and specifically the stemmatology of medieval texts that were transmitted through manual copies by scribes who were almost never the author of the original text (if, indeed, a single original text ever existed.)

It is well known that texts vary as they are copied, whether through mistakes, changes in dialect, or intentional adaptation of the text to its context; almost as long as texts have been copied, therefore, scholars have tried in one way or another to get past these variations to what they believe to be the original text.  Even in cases where there was never a written original text, or where the interest of the scholar is more in the adaptation than in the starting point, there is a lot to be gained if we can understand how the text changed over time.

Stemmatology, the formal reconstruction of the genesis of a text, developed as a discipline over the course of the nineteenth century; the most common (“Lachmannian”) method is based on the principle that if two or more manuscripts share a copying error, they are likely to have been copied either one from the other or both from the same (lost) exemplar. There has been a lot of effort, scholarship, and argument on the subject of how one distinguishes ‘error’ from an original (or archetypal) reading, and how one distinguishes genealogical error (e.g. the misreading of a few words in a nigh-irreversible way so that the meaning of the entire sentence is changed) from coincidental error (e.g. variation in word spelling or dialect, which probably says more about the scribe than about the manuscript being copied).  The classical Lachmannian method requires the practitioner to decide in advance which variants are likely to have been in the original; more recent and computationally-based neo-Lachmannian methods allow the scholar to withhold that particular pre-judgment, but still require a distinction to be made concerning which shared variants are likely or unlikely to have been coincidental or reversible.
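For the programmers in the audience, the shared-error principle can be sketched in a few lines of Python. This is only an illustration of the idea, not the method of any particular tool: the sigla, the error labels, and the function name `shared_errors` are all invented, and it assumes that someone has already decided which readings count as errors (which is, of course, the hard part).

```python
from itertools import combinations

# Hypothetical data: each witness (manuscript siglum) maps to the set of
# copying errors it contains. Sigla and error labels are invented.
errors_by_witness = {
    "A": {"e1", "e2"},
    "B": {"e1", "e2", "e3"},
    "C": {"e4"},
    "D": {"e4", "e5"},
}

def shared_errors(witnesses):
    """Return {(w1, w2): shared errors} for every pair sharing at least one error."""
    result = {}
    for w1, w2 in combinations(sorted(witnesses), 2):
        common = witnesses[w1] & witnesses[w2]
        if common:
            result[(w1, w2)] = common
    return result

# Pairs that share an error are candidates for a genealogical link:
# one copied from the other, or both copied from the same lost exemplar.
links = shared_errors(errors_by_witness)
```

Here A and B end up linked, as do C and D, while no pair crosses between the two groups. The sketch says nothing about the direction of copying, and it treats every error as genealogically significant.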

A method that requires the scholar to know the answer in advance was always likely to encounter opposition, and Lachmannian stemmatology has spawned entire sub-disciplines in protest at the sheer arrogance (so an anti-Lachmannian might describe it) of claiming to know in advance what is important and what is trivial. Nevertheless the problem remains: how to trace the history of a text, particularly if we begin with the assumption that we know no more, and perhaps considerably less, than the scribes who made the copies?  The first credible answer was borrowed from the field of evolutionary biology, where they have a similar problem in trying to understand the order in which features of species might have evolved and the specific relationships to each other of members of a group.  This is the discipline of phylogenetics, and there are several statistical methods to reconstruct likely family trees based upon nothing more than the DNA sequences of species living today.  Treat a manuscript as an organism, imagine that its text is its DNA sequence, et voilà – you can create an instant family tree.

And yet phylogenetics, if you ask the Lachmannians and other text scholars besides, has its own problems.  First, the phylogenetic model assumes that any species living today is by definition not an ancestor species, and therefore must appear only at the edge of the family tree; in contrast we certainly still possess manuscripts that served as the ‘parent’ of other extant manuscripts.  Second, in evolutionary terms it is reasonable to model the tree as a bifurcating one – that is, a species only ever divides into two, and then as time progresses either or both of these may divide further.  This also fails to match the manuscript model, where it is easy to see a single text spawning two, three, or ten direct copies. Third, where the evolutionary model is assumed to be continuously branching, it is well known that a manuscript can be copied with reference to two, three, or even four exemplars. This is next to impossible to represent in a tree (and indeed is not usually handled in a Lachmannian stemma either, serving more often as a reason why a stemma was not attempted.)  Fourth is the problem of significance of variants – while some scholars will insist that variants should simply not be pre-judged in terms of their significance, most will acknowledge the probable truth that some sorts of variation are more telling than other sorts.  Most phylogenetic programs do not by default take variant significance into account, and most users of phylogenetic trees don’t even try.
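The first three objections are structural, and a small sketch makes them concrete. Below, a stemma is stored as a map from each witness to the exemplars it was copied from, and a helper reports where it departs from the assumptions of a strictly bifurcating tree with extant witnesses only at the edges. The data, the sigla, and the names `children` and `violations` are hypothetical; this is one possible representation, not the one any existing program uses.

```python
# Hypothetical stemma: each witness maps to the exemplars it was copied from.
# "alpha" stands for a lost archetype; all sigla are invented.
parents = {
    "alpha": [],
    "A": ["alpha"],
    "B": ["A"],        # A survives and is also the exemplar of B, C and D
    "C": ["A"],
    "D": ["A"],
    "E": ["B", "C"],   # E was copied with reference to two exemplars
}

def children(parents):
    """Invert the parent map: exemplar -> sorted list of its direct copies."""
    kids = {witness: [] for witness in parents}
    for witness, exemplars in parents.items():
        for exemplar in exemplars:
            kids[exemplar].append(witness)
    return {witness: sorted(copies) for witness, copies in kids.items()}

def violations(parents, extant):
    """List the witnesses that break each of three phylogenetic assumptions."""
    kids = children(parents)
    return {
        # surviving manuscripts that have descendants (not at the edge of the tree)
        "extant_ancestors": sorted(w for w in extant if kids[w]),
        # exemplars with more than two direct copies (not bifurcating)
        "more_than_two_copies": sorted(w for w, c in kids.items() if len(c) > 2),
        # witnesses with more than one exemplar (not a tree at all)
        "multiple_parents": sorted(w for w, p in parents.items() if len(p) > 1),
    }

report = violations(parents, extant={"A", "B", "C", "D", "E"})
```

In this invented example A, B and C are extant ancestors, A has three direct copies, and E has two exemplars, so the structure is a directed graph rather than a tree.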

In a recent paper, some of the luminaries of text phylogeny argue that none of these problems are insurmountable. Neighbor net diagrams can give some clues regarding multiple text parentage; some more recent and specialized algorithms such as Semstem are able to build trees so that a text can be an ancestor of another text, and so that a text can have more (or even fewer) than two descendants.  The authors also argue that the problem of significance can be handled trivially in the phylogenetic analysis by anyone who cares to assign weighting factors to the variant sets s/he provides to the program.

While it is undoubtedly true that automated algorithms can handle assignment of significance (that is, weighting), it also remains true that there are only two options for assigning these weightings:

  1. Treat all variants as equal
  2. Assign the weights arbitrarily, according to philological ‘common sense’, personal experience, or any other criterion that takes your fancy.

This is exactly the ‘missing link’ in text stemmatology: what sorts of variants occurred in medieval copying, how common were they, how commonly were they copied, and how commonly were they changed?  If we can build a realistic picture of what, statistically speaking, variation actually looked like in medieval times, it will be an enormous step toward reconstructing the stemmata by whatever means the philologist chooses, be it neo-Lachmannian, phylogenetic, or a method yet to be invented.
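To see why the choice of weights matters, here is a minimal sketch of the two options listed above: counting every difference equally, or applying a weight per variant type. Everything in it is made up for illustration, including the readings, the variant types, and above all the weights, which are exactly the kind of arbitrary numbers the second option relies on.

```python
# Hypothetical collation: at each variant location, a variant type and the
# reading found in each witness. Readings and types are invented.
collation = [
    {"type": "spelling", "readings": {"A": "honour", "B": "honor", "C": "honor"}},
    {"type": "omission", "readings": {"A": "of the king", "B": "of the king", "C": ""}},
]

# Arbitrary weights, chosen by 'common sense' rather than by evidence.
weights = {"spelling": 0.25, "omission": 1.0}

def distance(w1, w2, collation, weights=None):
    """Sum the differences between two witnesses; equal weights if none given."""
    total = 0.0
    for location in collation:
        readings = location["readings"]
        if readings[w1] != readings[w2]:
            total += 1.0 if weights is None else weights[location["type"]]
    return total

equal_ab = distance("A", "B", collation)                # spelling difference only
equal_bc = distance("B", "C", collation)                # omission only
weighted_ab = distance("A", "B", collation, weights)
weighted_bc = distance("B", "C", collation, weights)
```

With equal weights, B sits exactly as far from A as from C. With the invented weights, B is much closer to A, and any tree built on these distances could change shape accordingly. Nothing in the data tells us which picture is right.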

What we have done in the Tree of Texts project is to create a model for representing text variation, and a model for representing stemmata, and methods for analyzing the text against the stemma in order to answer exactly the questions of what sort of variation occurred when and how.  I’ll be presenting all of these methods on Friday, as well as some preliminary results of the number crunching. If you are at DH I hope to see you there!

Of circumstance and Armenian chroniclers

I promised to start blogging an inventory of my publications back in April. Yes, it’s now July. It turns out that my breezy confidence concerning the ease of discovery of my rights to my own work was…misguided.

My first publication arose from my M.Phil. thesis. The thesis itself was an enormous logic and date-accounting puzzle, which I thought was all kinds of fun but which, when described to fellow students, tended to get the reaction “I’m so sorry, that sounds horribly boring!”  That says something about the geek disposition, I suppose.

The topic of my thesis, and the eventual paper, was the chronological weirdness of the first book of the Chronicle of Matthew of Edessa. There is a back story there, on how a vaguely Byzantium-fancying computer geek came to be writing about an Armenian historical chronicle concerned in large part with a topic (the Crusades) that, had I been asked in 2003, I would have found utterly uninteresting.  It’s also a tale of how the smallest sorts of circumstance can shape a career.

I began grad school on the heels of the Great Dot-com Bust.  My bachelor’s degree was a strange MIT hybrid (“Humanities and Engineering”) which really meant that I had been on course to do a computer science degree when I realized that I could have a lot more fun doing half my coursework in history, and at the end of it I would still probably get a programming job at some Internet startup.  So it came to pass, but I could never shake the urge to go back and give history a more proper study.  In the end the universe did me a perverse sort of favor when my company laid me off just as I was finally resolving to prepare those grad school applications.

This is how I found myself in a room at Exeter College one gorgeous afternoon working out, together with the other new master’s students, what I ought to be doing for the next two years. Among the decisions we needed to make was the language we would study for the examination requirement; the (rather fantastic-sounding) options were Greek, Latin, Armenian, Syriac, Church Slavonic, and Arabic. I had enough Greek and Latin to be getting on with, but my powers as a dead-language autodidact had already failed me once when confronted with Armenian. Why not get some actual tuition in it and see how I did?

Of such whims are career paths made.  Once I had expressed a guarded interest in Armenian language, well, it seemed evident to the assembled dons that I should apply it by studying some Armenian history.  That turned out to be a field so very under-studied that potential thesis topics were lurking under nearly every assigned primary text and journal article.  I resolved eventually to write a thesis on the subject of the Armenian economy of the tenth and eleventh centuries, seeing what we might piece together by looking critically at literary and epigraphic sources. I dutifully began to read, and by August I had a collection of notes on the three main historians of the era (dots indicate approximate note volume):

  • [..]  Aristakes of Lastivert
  • [….]  Stephen of Taron       
  • [……………………………………………………………….]  Matthew of Edessa    

Hm. Clearly my thesis had chosen a direction, even if I hadn’t.  It was not Matthew’s poetic writing, vivid narrative, or historical accuracy that had caught my attention – in the latter case, rather the opposite. How could such a vast history be so very full of such obvious mistakes? Was there any rhyme or reason to them? Could we trust *anything* that Matthew was trying to tell us? If so, what? It took a few months more for the thesis topic to resolve itself to these chronological mistakes, but I got there in the end. The whole process began to turn into an intriguing logic puzzle that I had a lot of fun trying to solve, and it seemed a little unbelievable that no one had beaten me to it.

It took me three years (and another job in industry) to condense the thesis to an article suitable for publication, but I finally submitted it in 2008 to the standard journal for Armenian scholarship, the Revue des études arméniennes. My reward was a charming hand-written letter from the editor acknowledging my contribution and saying that he would be happy to publish it, though he wondered what my view was on certain issues I hadn’t addressed. I got to pretend for a moment that I was about fifty years older than I am, initiated into the academic community in an era where scholarship was carried on through personal correspondence.

As I have not heard anything from Peeters (and cannot find any information online) concerning author rights, and as I don’t believe I actually signed anything handing over any rights in any event, I have chosen to go with the safest reasonable option for open access: the final version of the article content, before typesetting.

Andrews, Tara L., ‘The Chronology of the Chronicle: An Explanation of the Dating Errors within Book 1 of the Chronicle of Matthew of Edessa’, Revue des études arméniennes 32 (2010): 141-64.