Blog post 5: Data mining & quantifying literature

Data mining is a way of finding patterns in a piece of text or media by having a computer examine the digital file. Based on this week’s readings, I think that using analytics and computers to extract data from literature and art has both positives and negatives. Tools like Voyant can be helpful for learning the basic measurements of a story. Word count, frequently used terms, their collocations, and their frequency over the course of the text can all be interesting to examine. It is the facts that stick out from the mass of words that give us something substantial to say about the story. More advanced analysis, as described in the textbook chapter, can perhaps even determine genre and theme. This ‘distant reading’ of a text gives quantifiable results, but it has its downsides, too. One quote from the textbook stood out to me: “While patterns emerge, and the large trends can be discerned, the question of what these are indicative of remains. Do they only show trends in the data? Or can they reveal trends in phenomena of the actual and lived world?” (Drucker, pg. 115). It’s true that data mined from literature or art can be concrete, but the key to understanding its significance is context. Words can have complicated and variable meanings depending on their placement, connotation, tense, and part of speech, so without that context the mined data could mean nothing.
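To make those basic measurements a little more concrete, here is a rough Python sketch of the kind of counting involved. It is not how Voyant itself works internally; the function names and the sample sentence are my own made-up assumptions, and the "trend" is just a count of a word in equal slices of the text.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_terms(text, n=3):
    """Return the n most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(n)

def frequency_by_segment(text, term, segments=4):
    """Count a term in up to `segments` roughly equal slices of the text."""
    tokens = tokenize(text)
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    return [tokens[i:i + size].count(term) for i in range(0, len(tokens), size)]

# Hypothetical sample text, not taken from any real story.
sample = "The eye saw the sea. The sea was dark, and the eye was darker."
print(len(tokenize(sample)))                   # total word count
print(top_terms(sample, 1))                    # most frequent word
print(frequency_by_segment(sample, "eye", 2))  # 'eye' in each half of the text
```

Notice that the most frequent word here is just "the", which says nothing about the story. That is the context problem in miniature: the counts are concrete, but a reader still has to decide which of them matter.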

Data mining can show readers things they wouldn’t be able to find themselves without serious physical and mental work. Knowing when a word trends throughout a piece of literature, and which words trend alongside it, wouldn’t be easy without data mining and digital analysis tools like Voyant. For example, in my short story, “The Evil Eye” by Mary Wollstonecraft Shelley, I looked at the correlation between the words ‘evil’ and ‘eye’ and found that only once or twice are they used apart from one another in the story. The term ‘evil eye’ therefore probably carries a greater meaning, since the author uses it so consistently throughout the text. While data mining and distant reading have their advantages, I don’t think they can yet be applied in extremely significant ways, and I don’t believe they can replace actual reading and human analysis. Still, they show promise for what computers are capable of when examining human art.
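The ‘evil’ and ‘eye’ comparison can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `pair_usage` is a hypothetical helper, it only counts the two words when they appear directly next to each other, and the sample sentence is invented rather than quoted from Shelley’s story.

```python
import re

def pair_usage(text, first, second):
    """Count how often `first` is immediately followed by `second`,
    and how often each word appears outside that pairing."""
    tokens = re.findall(r"[a-z']+", text.lower())
    together = sum(
        1 for a, b in zip(tokens, tokens[1:]) if a == first and b == second
    )
    return {
        "together": together,
        first + "_alone": tokens.count(first) - together,
        second + "_alone": tokens.count(second) - together,
    }

# Invented sample text, not a quotation from "The Evil Eye".
sample = "The evil eye fell on him. No evil came of it, yet the evil eye remained."
print(pair_usage(sample, "evil", "eye"))
```

If the "alone" counts stay near zero for a whole story, that is the same pattern I saw in Voyant: the two words almost always travel together as one term.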

Comments

  1. I love these ideas! I had similar thoughts on the uses of Voyant Tools, and I specifically liked what you said about how Voyant Tools allows us to access substantial facts in the short stories. In my group we did a similar analysis of our short stories. We looked at the individual works of Edgar Allan Poe, and I specifically looked at his short story "The Masque of the Red Death". In my individual analysis I found that the most frequent words used were "death", "mask", "red" and "prospero". Each of these words highlighted the themes of morality and the facade of safety. These words and their context all develop the themes and motifs. In our group analysis, we found that the most used word is "man", which revealed that Edgar Allan Poe only writes his protagonists as men and has never written about women. Yet I agree with your statement that distant reading and Voyant Tools cannot be used as the sole form of analysis, and that there has to be a balance of distant and close reading.

