Blog Post 5: Data Mining and Quantifying Literature
Chapter 7 discusses data mining and how it is an efficient tool for understanding aspects of literature that could not be grasped by reading alone. In my last blog post, I defined data mining as "the analysis of digital data through searching for patterns that may assist in determining meaning". Quantifying literature offers a new perspective on how works of writing are viewed. By turning qualitative data into quantitative data, literature can be viewed through a more objective lens. By quantifying literature, the researcher can see how many words are used, how many times each word is used, where each word is used, or even which words often appear alongside other commonly used words throughout the piece. It is up to the researcher to interpret the significance of this data, as it might not be obvious at first glance. Quantifying literature can also help compare two or more different works. This could offer insight into many areas of literature, such as an author’s overall writing style or how the structure of one book compares to another. Overall, data mining is quite useful when analyzing literature, as it provides a new way to interpret it.
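To make the idea of quantifying a text more concrete, here is a small Python sketch of the three measurements mentioned above: how often each word is used, where a word appears, and which words show up near it. This is only my own illustration, not how any particular tool does it. The function names, the two-word window, and the sample sentence are all made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how many times each word is used."""
    return Counter(tokenize(text))


def word_positions(text, word):
    """Return the token positions (starting at 0) where a word is used."""
    return [i for i, tok in enumerate(tokenize(text)) if tok == word]


def neighbors(text, word, window=2):
    """Count the words that appear within `window` tokens of `word`."""
    tokens = tokenize(text)
    nearby = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            start = max(0, i - window)
            nearby.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return nearby


# A made-up sentence, not a quotation from any story.
sample = "The man feared death. Death waited for the man in the dark."

print(word_counts(sample).most_common(3))
print(word_positions(sample, "death"))
print(neighbors(sample, "death").most_common(3))
```

The numbers by themselves do not say anything about meaning. They only point to places in the text that the researcher might want to read more closely.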
In class, we have been using the data mining website Voyant. This website takes a work of literature and converts its qualitative data into quantitative data in a more palatable way. For instance, it offers a “bubbleline” graph to show how the most commonly used words are distributed throughout a work of literature. It also offers statistics that are already calculated and ready for interpretation (e.g., a readability index). In my group, we are using Voyant to further understand our individual works by Edgar Allan Poe and our works as a collective. We are specifically using the “trends” feature, which divides the work into sections and then graphs how often each word is used in each section. For our group project, we noticed in this view that “man” was the most used word throughout, with “death” right behind it. In our presentation, we are going to discuss how Poe’s main genre was horror and how he often focused on themes such as death. We will also talk about how in all of our short stories Poe’s protagonist is a man and none of the stories includes a female character. In my story, “The Pit and the Pendulum,” specifically, I noticed that the word “death” is used consistently throughout, suggesting that death is a main theme of the story. I believe that Poe uses literature to define his relationship with death, one that is rooted in fear. I would not be able to make that assumption with confidence without Voyant, as I can numerically see that death is a pattern in my short story as well as in the three other stories my group members are reading. This shows the importance of data mining, as it offers evidence to support generalizations about an author and their works. I am still trying to understand how Voyant works and how to interpret the data it provides; however, I can see how this tool would be a game changer for the world of digital humanities.
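Since I am still working out how the “trends” view gets its numbers, here is a rough Python sketch of the basic idea as I understand it: split the text into equal sections and count a word in each one. This is not Voyant's actual code. The function name, the default of ten sections, and the test sentence are assumptions I made for illustration.

```python
import re


def segment_counts(text, word, segments=10):
    """Split the text into equal-sized sections and count `word` in each."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = []
    for s in range(segments):
        # Integer arithmetic keeps the sections as even as possible.
        start = s * len(tokens) // segments
        end = (s + 1) * len(tokens) // segments
        counts.append(tokens[start:end].count(word))
    return counts


# A made-up sentence, split into two sections for a quick check.
demo = "Death comes. The man waits. Death again, death."
print(segment_counts(demo, "death", segments=2))
```

Plotting the list that comes back, one point per section, would give a line similar in spirit to the one the trends graph shows for a single word.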
Sounds like your group has a great start!