Saturday, 8 November 2014

cold, bonfire, diploma, stylometry, jan rybicki, stylistics, economist, future of the book, burger-n-picpoul, hollybush

Suddenly, there's cold weather to contend with. And heavens above, it's not really cold for November but just seems that way after a spoilingly hot summer extending well into autumn.

The central heating's now on to supplement the log fire.

The above pic isn't in fact our grate but the remnants of the Wolvercote Green fireworks night bonfire, taken as I headed to the canal on the morning of 6th November.

Alas, one of my last walks this way for some months, I imagine, unless we get an unusually dry spell. The beautiful first quarter of the walk, that feels like you are in the country rather than town, is suitably muddy and while I don't mind this I am only too aware of mud falling off my shoes onto the carpets in meetings and in senior colleagues' offices. So, it's pavement till Summertown and joining the canal there, from now on.

This week, amongst other things, I've been preparing for my long fiction module of the Oxford diploma. It's been fun to revisit the course materials, not least in the light of things I've been doing on other courses and for the Continuing Education Open Day event last Tuesday (which was great fun to do - great group of people taking part). Now looking forward to the first seminar early next week.

Very much enjoyed Jan Rybicki's talk on Stylometry and visualisation on Thursday evening (see last week's post for the outline). Rybicki's approach involves determining the frequency of words in digital texts and then applying a statistical process to the results in order to put them into meaningful forms such as graphs or diagrams and other visualisations.

One surprise to me was that Rybicki works with only a pretty small number of the most frequently used words in any given text - from 100 to 400. Also, these words tend to be common-or-garden ones such as 'the' and 'and'. Well, no great surprise there, given the parameters of the data collection - articles and conjunctions are bound to feature in any list of the most frequently used words. What is really surprising is that it is an author's use of the more mundane words that provides stylometrists with the information needed to attribute authorship and define an author's particular style relative to those of other writers. The words used in the analyses aren't the more esoteric ones that one might have expected to define individual style. Nor are they necessarily the ones that reveal the content of the work (as opposed to its tell-tale style).
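For the curious, here is a rough idea of what the word-counting step might look like in Python. This is only my own minimal sketch, not Rybicki's software: the function name `most_frequent_words`, the crude tokenising rule and the toy sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def most_frequent_words(text, n=100):
    """Return the n most frequent words in text with their relative frequencies.

    Hypothetical helper for illustration only. Words are lower-cased runs of
    letters and apostrophes, which is a deliberately crude tokenising rule.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    # Relative frequency = occurrences of the word / total words in the text
    return [(word, count / total) for word, count in counts.most_common(n)]


# Toy example: even here the mundane words 'the' and 'and' come out on top
sample = "the cat and the dog and the bird"
top = most_frequent_words(sample, n=2)
print(top)
```

In a real study n would be somewhere in the 100 to 400 range mentioned in the talk, and the input would be a whole novel rather than one sentence.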

Without going into too much detail (which would be beyond me, in any case), Rybicki has been using a statistical process known as Burrows's Delta (after its creator, John Burrows) to analyse stylistic variations between ever increasing numbers of authors and works. The method appears to be an extremely accurate way of attributing works to a particular author. It also enables Rybicki to differentiate not just between different authors but also between the works of a given author. The works of Le Carré (whom Rybicki translates into Polish), for example, cluster into different periods of authorship within the overall Le Carré-style grouping.
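As far as I understand it, Delta works roughly like this: take the most frequent words across a whole collection of texts, turn each text's word frequencies into z-scores (how far each frequency sits from the collection's average, measured in standard deviations), and then average the absolute differences between two texts' z-scores. The smaller the result, the closer the two styles. The Python below is a hedged sketch of that idea under my own assumptions, not the implementation Rybicki uses: the function names, the toy texts and the choice of population standard deviation (`pstdev`) are all mine.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Crude tokeniser (an assumption): lower-cased runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def z_scores(texts, n=100):
    """Map each text name to z-scores for the n most frequent words in the corpus.

    texts is a dict of {name: full text}. Words whose frequency is identical
    in every text carry no information and are skipped.
    """
    corpus_counts = Counter()
    for text in texts.values():
        corpus_counts.update(tokenize(text))
    mfw = [word for word, _ in corpus_counts.most_common(n)]

    # Relative frequency of every word in each individual text
    freqs = {}
    for name, text in texts.items():
        words = tokenize(text)
        counts = Counter(words)
        freqs[name] = {w: c / len(words) for w, c in counts.items()}

    scores = {name: [] for name in texts}
    for word in mfw:
        column = [freqs[name].get(word, 0.0) for name in texts]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue
        for name, freq in zip(texts, column):
            scores[name].append((freq - mu) / sigma)
    return scores


def burrows_delta(z_a, z_b):
    """Mean absolute difference between two texts' z-score lists."""
    if not z_a:
        return 0.0
    return mean(abs(a - b) for a, b in zip(z_a, z_b))


# Toy corpus (invented): a1 and a2 lean on 'the' and 'and', b1 on 'a' and 'or'
texts = {
    "a1": "the cat and the dog and the bird and the fish",
    "a2": "the fox and the hen and the owl and the bee",
    "b1": "a cat or a dog or a bird or a fish but not a fox",
}
z = z_scores(texts, n=5)
same_author = burrows_delta(z["a1"], z["a2"])
different_author = burrows_delta(z["a1"], z["b1"])
print(same_author < different_author)
```

With a few sentences this is only a toy, of course. The clustering into authors and periods described in the talk comes from computing such distances across hundreds or thousands of full-length texts.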

By applying the method to thousands of English texts, Rybicki is able to create beautifully striking visualisations that would make Rothko envious and that reveal relationships between authors across many centuries, or across other ways of grouping the word counts. While many of the groupings are as one would expect them to be, it is the small unexpected differences that are especially revealing. Tolkien's style, for example, sets him amongst writers of earlier centuries, and Virginia Woolf crops up in different areas of a diagram because of the different styles in which she apparently wrote. (Cases of statistical methods giving empirical backing to things that critics may have picked up intuitively - or at least less scientifically.)

Rybicki also noticed how women writers in the seventeenth and early eighteenth centuries seemed to have developed male styles in order to get their work published, whereas in later periods women writers' styles became more differentiated from those of male writers of similar eras. That is, until we get to the present, where the styles of male and female authors become much more intermixed and more difficult to define as masculine or feminine.

I was intrigued by a story that Rybicki told against himself. When he looked at translations into Polish from English that he had done (Le Carré, Ishiguro and other authors) in comparison with the translation work done by Polish colleagues, he noticed that his translations seemed to cluster together, stylistically, no matter what author he was translating. With other translators, by contrast, the style varied according to the writer they were translating. He felt that this showed that other translators were better than he was.

For more information about Jan, visit the Computational Stylistics Group website - which includes an excellent HOWTO PDF that explains how to apply his 'stylo' word-counting and statistical analysis method.

It takes me back to Stylistics classes at Oxford with Professor Suzanne Romaine. Stylistics and its successor, Stylometry, have, I think, a very practical relevance to the creative writer, despite their initially rather abstract appearance.

Meanwhile there is a great, great essay in this week's Economist about the future of the book (both physical and e), which includes fascinating insights into new publishing models, as well as observations about the effect of digital reading on literary style and the changing expectations of authors, as far as the reasons why they write are concerned.

Meanwhile, meanwhile, just back from an excellent burger-veg-n-Picpoul lunch at the Hollybush, Witney.
