Shakespeare’s language: a voyage through numbers

Numbers can bring some people out in a cold sweat. But fear not.

Let us quickly dispatch some easy-to-understand numbers.

The Arden Encyclopedia of Shakespeare’s Language was published by Bloomsbury last summer. It took 25 years of preparation, a team of up to 25 researchers and over seven years of hard work to reach publication. All this was made possible by an Arts and Humanities Research Council grant.

What is the research reported in The Arden Encyclopedia of Shakespeare’s Language? Some more numbers. Volumes one and two comprise approximately 20,000 word-entries gleaned from a million-word corpus of Shakespeare’s plays. This is the first fully corpus-based dictionary of Shakespeare’s language and the most comprehensive since Alexander Schmidt’s in the early 1870s.

Looking for patterns

Numbers, less trivially, lie at the heart of this research. Don’t stop reading at this point! This won’t be a mathematics lesson.

We all deal with numbers because we all deal with patterns, and patterns naturally involve numbers – we need more than one of something to form a pattern. Patterns become noticeable when they contrast with other patterns. For instance, people in the UK often despair at the fact that two or three buses arrive at the same time, after a long wait at the bus stop, rather than being regularly spaced out.

Much of my working career has been spent looking for patterns in language. Those patterns can help diagnose how language is working, how meanings are constructed. This is exactly what I have been doing in my work on Shakespeare’s language: using computers to identify large-scale and subtle patterns in Shakespeare’s language.

Different from previous research

So? How does that result in anything different from what previous research has done?

For starters, it enables us to accurately identify what senses of a word are common in Shakespeare and what are not. Dictionaries such as the Oxford English Dictionary order the senses of a word according to when a particular sense starts being used, with the oldest sense being listed first.

In the dictionary part of the encyclopaedia, volumes one and two, the most frequent sense, the sense that readers are most likely to encounter, is given first. For example, the most frequent sense of the word ‘good’ in Shakespeare is not a simple moral judgement. ‘Good’ frequently co-occurs with other words to form a polite, deferential addition to salutations, as in ‘My good Lord’.

And frequency of co-occurrence is not restricted to the frequency of words co-occurring with other words. We track, for instance, the patterns of word use according to different kinds of character. For example, the words ‘alas’ and ‘ah’ are revealed to be heavily used by female characters, doing the emotional work of lamentation in the plays (especially the histories).

Looking beyond Shakespeare

But it is not just about looking at patterns within Shakespeare. Other dictionaries define Shakespeare by looking just at Shakespeare. The result is a bit circular, ignoring the fact that Shakespeare’s words had lives amongst his contemporaries. Just as buses arriving in clusters form a notable pattern only in contrast to evenly spaced buses, so we must contrast patterns in Shakespeare with patterns outside Shakespeare.

We compared patterns of occurrence in Shakespeare’s language with a matching million-word corpus of contemporary plays, and also with a huge corpus of 320 million words of various writings from his period.
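For readers who do like numbers, here is a rough sketch of what comparing corpora of very different sizes involves. Because a million-word corpus and a 320-million-word corpus cannot be compared on raw counts, frequencies are usually expressed as a rate, such as occurrences per million words. The code below is an illustration only, not the software used for the encyclopaedia: the function names, the simple tokeniser and the toy sentences are all invented for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple,
    hypothetical tokeniser; hyphenated words stay in one piece)."""
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())

def per_million(word, tokens):
    """Occurrences of `word` per million tokens in a list of tokens."""
    counts = Counter(tokens)
    return counts[word] / len(tokens) * 1_000_000

def compare(word, target_tokens, reference_tokens):
    """Return the per-million rate of `word` in a target corpus and in a
    reference corpus, so that corpora of different sizes can be contrasted."""
    return per_million(word, target_tokens), per_million(word, reference_tokens)

# Toy stand-ins for the two corpora (invented sentences, not real data).
target = tokenize("The wicked king and the wicked queen")
reference = tokenize("The good king and the good queen and the wicked fool")

target_rate, reference_rate = compare("wicked", target, reference)
print(f"target: {target_rate:.0f} per million; reference: {reference_rate:.0f} per million")
```

A word whose rate in one corpus is much higher than in the other is a candidate for a pattern worth investigating. Real corpus work would add a statistical test on top of this simple contrast.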

It is obvious perhaps that the word ‘wicked’ occurs densely in religious texts of the time, but who would have guessed that of the highly frequent word ‘ourselves’? To take another example, we show that the word ‘bastard’ most often appeared in early modern instructional, scientific texts, especially those relating to botany. It frequently referred to a flower that was genetically hybrid; it was not, as today, a simple term of personal abuse. This is not to say that it couldn’t be abusive, as it is in ‘King Lear’, but its use involved a much more offensive claim about somebody being genetically hybrid, fundamentally impure.

What about infrequent words? They are just as interesting. The dictionary identifies the words that occur but once in Shakespeare, such as ‘bone-ache’ (syphilis) or ‘ear-kissing’ (whispering, though other writers used it for flattering), and words that seem to have their earliest occurrence in Shakespeare (including the decidedly modern-sounding ‘self-harming’).
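Finding the words that occur only once is, in principle, a matter of counting. The sketch below shows the basic idea on a toy list of words. It is an assumption-laden illustration rather than the project’s own method: the function name and the sample tokens are invented, and real work on Shakespeare would first have to deal with variant spellings and word forms.

```python
from collections import Counter

def words_occurring_once(tokens):
    """Return, in alphabetical order, the tokens that appear exactly once."""
    counts = Counter(tokens)
    return sorted(word for word, n in counts.items() if n == 1)

# Toy token list (invented for illustration, not drawn from the corpus).
sample = ["good", "lord", "bone-ache", "good", "lord", "ear-kissing", "good"]
print(words_occurring_once(sample))
```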

Coming soon…

And that is just volumes one and two. Volumes three and four will appear within a year. Volume three looks at how words pattern together to create particular plays and characters; volume four looks at how words form social networks of interaction amongst characters. Volume five, examining how words form patterns of meaning across the plays (for example, love, death, body parts, money, time), will be out the following year.

That’ll be the end of the numbers, I promise you!

Jonathan Culpeper

Professor of English Language and Linguistics, Lancaster University
