David Roffe, Domesday Databases

Domesday databases
Resourcing Resources Seminar, Oxford, 26 June 2000

Isn't it amazing? We Domesday junkies have been sitting around waiting for a Domesday database for the last 900 years and then, all of a sudden, five come into sight at once. I am privileged to have been consulted in the design of one, the Alecto Digital Domesday, to have seen a second, the Hull database (published as Domesday Explorer), in action, and to have extensively used a third, the COEL database. These are my impressions.

Domesday Book is a natural candidate for computerization. It contains masses and masses of data which no normal person could hold in his or her brain. And, incidentally, it is incredibly tedious to read. Like any historical source, however, it does present considerable problems to the database designer.

In looking at the pre-packaged tools available, I have been struck in the past by the extent to which notions of data have been influenced by the natural sciences. Facts are facts: they are discrete and therefore they can be crunched without any scruple. Personally, I think that many scientists are epistemologically challenged. But, be that as it may, historians should not be. They ought to have no difficulty with the notion that facts are normative. Context is everything.

It is not enough, then, merely to fillet Domesday for its 'facts'. To do so would be to lose a large part of the evidence. The data themselves can be best described as fuzzy. We do not always (indeed, not usually) have a firm idea of what they mean. Let me explain. Of the many hundreds of items of information in Domesday Book, there are five that are basic. A tax assessment ('geld'), ploughlands (don't ask), ploughs, population figures, and a value appear for almost every village described. With details of tenure, these matters were clearly central concerns of the compiler of Domesday Book. And yet, incredibly, there is no consensus as to what they mean. To my mind there is an historical uncertainty principle at play here. To others it merely betokens lack of effort. It matters not. We don't know.

The designers of the three databases that I have seen are well aware of this fuzzy data problem. By and large, they have adopted one of two strategies to tackle it. The first is exemplified by the Hull database. I shall characterize it as 'exhaustive coding'. The designer, John Palmer, has not only tagged the thousand and one tangible 'variables' of the text but has also tried to represent something of the form in which they are recorded and sometimes the relationship between them. This is an approach that has a long history in Domesday studies. When a proposal was put forward in the late eighteenth century to print Domesday Book for the first time, it was realized that a transcript was not enough. A font, known as Record Type, was therefore specially designed to represent the manuscript. In fact, it is a sophisticated system of coding. I myself have developed a notation to represent different characteristics of the text.

With sensitivity and care, exhaustive coding can be very effective. The Hull database claims to be able to produce any type of Domesday you please. From what I have seen of it this is no idle boast. The amount of information that can be packed in is considerable. Let's take as an example the record of a humble villein, that is, a villager. The statement 'one villein is there' seems pretty well unambiguous. But what do we think when we get to half a villein? Well, presumably we are talking less about a person than dues. We need to know context. So, it might be coded thus: population, villein, manor, berewick, geld paying. Various other technical tags might also be put in.
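To make the idea concrete, here is a minimal sketch of how such a coded item might be held and queried. It is purely illustrative and is not the Hull database's actual scheme: the class CodedItem, its field names, and the select helper are all hypothetical, and the two records are invented for the example.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class CodedItem:
    """One coded item of information (hypothetical layout, not Hull's)."""
    category: str      # broad class of variable, e.g. "population"
    kind: str          # the thing counted, e.g. "villein"
    count: Fraction    # a fraction, so that 'half a villein' is not lost
    tags: frozenset    # contextual tags, e.g. {"manor", "berewick"}


def select(items, kind, *required_tags):
    """Return the items of the given kind that carry every required tag."""
    wanted = set(required_tags)
    return [item for item in items if item.kind == kind and wanted <= item.tags]


# Invented sample records, for illustration only.
items = [
    CodedItem("population", "villein", Fraction(1),
              frozenset({"manor", "geld paying"})),
    CodedItem("population", "villein", Fraction(1, 2),
              frozenset({"manor", "berewick", "geld paying"})),
]

# The context tags let us ask for villeins recorded in berewicks only.
berewick_villeins = select(items, "villein", "berewick")
total = sum(item.count for item in berewick_villeins)
```

The point of the fractional count and the tag set is that the half villein keeps its context: the coding records how the item appears without forcing a decision on whether it is a person or a render.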

The approach has the great advantage of transparency. As with a good subject index, you know what you can get out of it at a glance. However, its success does depend on the thoroughness and consistency of the designer. Its greatest limitation is the understanding, priorities, and interests that he or she brings to bear on it. The Hull database coding is undoubtedly comprehensive, but it does not tell me, for example, about the form of initial capital letters. Sad, but true, some of us worry about such things.

To be fair, the unforeseen is catered for. John Palmer has mapped each entry to the OS facsimile, so that entry by entry and folio by folio it is possible to observe the data in context for oneself. What is clear is that ultimately there is no substitute for the text. This brings us to the second approach to the fuzzy data problem. That is the 'text-based' database. I'm afraid that it is a sad fact that there is no complete machine-readable Latin transcript of Domesday Book. Amazing, isn't it? The Hull database allows string searches of a translation, but unfortunately this is inconsistent and inaccurate. The Alecto Digital Domesday will use a complete, standardized, and critical translation, again mapped to a facsimile. Here the main items of information are coded - place-names etc - to aid navigation. But the main means of access will be string-searching with some sort of proximity Boolean operators. As we have just seen, COEL uses, inter alia, a subset of the Domesday data - personal names - and gives them in Latin.
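For those unfamiliar with proximity searching, the general technique can be sketched as follows. This is an assumption about what such a search might look like, not a description of the Alecto software: the functions tokens and near are hypothetical, and the sample entry is an invented sentence in the style of a translation, not a quotation from Domesday Book.

```python
import re


def tokens(text):
    """Split text into lower-case words, dropping numerals and punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def near(text, first, second, window):
    """True if `first` and `second` occur within `window` words of each other."""
    words = tokens(text)
    positions_first = [i for i, w in enumerate(words) if w == first.lower()]
    positions_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(a - b) <= window
               for a in positions_first
               for b in positions_second)


# Invented sample entry, for illustration only.
entry = "There is land for 2 ploughs. 1 villein and 3 bordars have 1 plough."

# Proximity tests combine with ordinary Boolean operators.
villein_with_plough = near(entry, "villein", "plough", 4)
villein_without_sokeman = (near(entry, "villein", "bordars", 2)
                           and not near(entry, "villein", "sokeman", 10))
```

Note that the match is on exact word forms, so 'ploughs' does not count as 'plough'. That is precisely why such a tool demands that the user knows all the forms in which information can occur.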

Both of these databases, Alecto and COEL, minimize the intrusion of the database designer and as such enable the user to see the data in context. COEL in particular is attentive to the nuances of the text. Moreover, it gives context in depth by providing documentation of individuals and families in the same sensitive way for the hundred years after Domesday. There are no compromises here. But, we have to be clear that we are dealing with expert systems. The user must be fully conversant with the Domesday text. The Alecto Digital Domesday will be the best tool available for mapping the changing forms of expression in the text, 'the diplomatic' for short, but it is up to the user to identify them. COEL has sophisticated pattern searching tools, but there is no doubt that you have to 'think' Domesday at a very basic level to use it. When I first encountered it I was surprised to find out to what extent I myself tended to think in terms of translations. One has to be fully aware of all the possible forms in which information can occur. Fortunately for all of us, Katharine Keats-Rohan has built in a second, more interpretative, level of access for those who need the information quickly or who simply have tired brains.

In practice, which database will be used will depend on what the researcher is doing. For myself, I shall be using all three if my computer doesn't crash. Each provides unique features, and I suspect that most scholars will follow suit. It is legitimate to ask, then, what effect will they have on Domesday studies? At best they will allow analyses that have probably not been possible before. In using COEL I am appalled at just how easy it is to formulate and test prosopographical hypotheses. It's a PhD machine and will probably be banned for making life too easy for postgraduates. The Hull database not only provides rapid access thanks to its coding, but excellent reporting capacities: its graphical interface facility alone promises radical new insights into the Domesday data. Finally, the Alecto Digital Domesday will greatly facilitate the study of the forms of Domesday Book and the social reality that lies behind them. All three are going to make life much easier for the Domesday scholar. There is no doubt that, when published, they will be the primary tools for accessing Domesday.

The down side may well be that, despite the best efforts of the designers, the data will take over interpretation. Tabulation is always seductive in one way or another and presentation through a monitor does not help: I can only imagine that it is some notion like 'seeing is believing' that confers such great authority on the dots on a cathode ray tube. Even now there are economic and social historians who insist that their computerized Domesday data sets are as transparent as those of the Central Information Office and claim to be able to prove it by statistics. Thus, I find myself continually arguing that the statistical correlation of variables does not in itself validate data. Many years ago I found a strongly positive numerical relationship between the eleventh-century tax assessment of a group of Lincolnshire villages and the number of pubs marked on the modern OS map. I threw away my calculator and my data sets.

Ease of access to the data may also make Domesday studies even more difficult to read. With such a rich source as Domesday Book, there has always been the temptation to transcribe rather than describe. Welldon Finn, one of the most prolific writers on Domesday, had an extensive card index. He wrote from it and it shows. Example follows example and never a conclusion. A database can never be a substitute for synthesis. It's a fact familiar to Buddhists that enlightenment only comes after analysis has stopped. Domesday databases will revolutionize the subject, but only so long as we learn when to turn them off. It has been a long wait, but it will be worth it if we do just that.

© David Roffe, 2000.
