Blog Post 5: Data Mining and Quantifying Literature
When I first started reading about data mining and distant reading last week, I wasn't really sure what purpose it could have. It seemed to me that this method of interacting with literature or art would leave out important context or subtleties in language. The meaning in literature is created by the careful placement of words, and I couldn't really understand what use it would be to reduce all of this to quantitative data to be analyzed by a machine. However, after reading more about it and working with Voyant myself, I feel like I am getting a better grasp of what looking at art in this way can offer. Close reading and distant reading are not mutually exclusive. Instead, distant reading adds to the interpretive power of close reading by allowing for analysis of a larger body of work. It would be impossible to read every book that was published in a certain decade, but by using distant reading practices a person could identify broader literary patterns and apply these patterns to their analysis of a single novel or work. As Jesse Rosenthal is quoted in "Quantifying Literature," “There’s this huge amount of text and you find yourself studying the little bits that float to the top while just gesturing at the larger body that it’s part of” (Keiger 2010). Extracting quantitative data from literature or art can create a view of the "bigger picture" and help place individual works within it. While I don't think this is the only way literature or art should be studied, I think it has the capability of enhancing current methods.
A lot of the power of data mining is its ability to pick up on larger patterns that may not be visible to a person analyzing a body of work one book or piece at a time. As stated in The Digital Humanities Coursebook, "In many ways, the advantages of digital processing are best appreciated for doing what humans cannot do rather than for trying to emulate our capacities" (Drucker 124). For example, detecting genre is something humans are already very capable of doing, and for a machine to do it, a human still has to decide which words are related to which genre. This is difficult because genres often overlap or are not easily defined. However, the ability of data mining and analysis to find word frequencies and word correlations can aid in creating new insights about the work being analyzed. This again is more of a "bigger picture" analysis. Where a human can more easily interpret emotions and understand things such as humor, sarcasm, irony, or colloquial speech, the computer can readily count and correlate words.
For example, when reading my short story for the distant reading project, "The Old Chest at Wyther Grange" by L. M. Montgomery, it was clear that the contrast between youth and old age was a recurring theme. The beginning of the story is told from the perspective of a very imaginative child, while the latter half is told from the perspective of a grown-up, more mature version of the narrator. When the narrator is young, she has a stark foil in her sensible grandmother, who almost appears trapped in the past. While reading the story I didn't really have a sense of how the "old chest" fit into this broader theme. When I put the story into Voyant and started looking at the correlations in word frequencies throughout the story, I saw that the words "childish" and "chest" had a very strong correlation. This made me go back and read the story a second time, paying attention to this pattern. I realized that the mystery of the chest seems to bring out childish curiosity in all of the characters. It makes sense that the chest is a symbol of youth or naivety once you find out what is in it. By using Voyant I was able to find a pattern I had not picked up on while reading the story, which led to further insights about its meaning.
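For anyone curious what a word correlation like this might look like under the hood, here is a rough Python sketch. It assumes the text is split into equal segments and that per-segment counts of two words are compared with a Pearson correlation; that is my guess at the general idea, not Voyant's actual code. The function names and the toy text are made up for illustration and are not taken from the story.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def segment_counts(tokens, term, segments=10):
    """Count how often `term` appears in each of `segments` roughly equal slices."""
    size = max(1, math.ceil(len(tokens) / segments))
    return [tokens[i * size:(i + 1) * size].count(term) for i in range(segments)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Toy text (invented): two words that show up together early on,
# and two different words that show up together later.
toy_text = "chest childish " * 5 + "grandmother old " * 5
tokens = tokenize(toy_text)

chest = segment_counts(tokens, "chest", segments=4)
childish = segment_counts(tokens, "childish", segments=4)
old = segment_counts(tokens, "old", segments=4)

print("chest vs childish:", pearson(chest, childish))
print("chest vs old:", pearson(chest, old))
```

A value near 1 means the two words tend to rise and fall in the same parts of the text, while a value near -1 means one tends to appear where the other does not. A strong correlation only points to where to look; it still takes a close rereading to decide what the pattern means.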
I really like your breakdown of the chapter's themes and the connection to your own project! Your post helped me further understand the text and how to use patterns in literary works. I like the quote you included from the text about how digital processing is really good for things that humans cannot do, rather than just being more efficient at what humans can do. Voyant is great at analyzing literary pieces and finding patterns humans may not have seen, which creates a better understanding of the context in which the story was written. My Voyant results showed me that the words "silence" and "desolation" are used a lot, but at different times in the story. When the frequency of one picks up, the other goes down, and this reveals the meaning behind the story. At the end, "desolation" completely disappears from the story, while "silence" is used more than ever. This alone helped me connect the pattern to the theme that, in the end, silence is more powerful and frightening than desolation could ever be. Voyant also has some other good data processing tools that helped me articulate this point further, and in the end I do not think I would have found these patterns on my own.
Wow, some great patterns there!
Some interesting findings! Sounds like you have a good start on your analysis!