Data Mining and Quantifying Literature
Chapter 7, “Data Mining and Quantifying Literature,” explains how analyzing data from texts, images, and audio files can point us toward a starting place for research. Data analysis allows us to pull out patterns in a work that reflect its overall theme, topic, and mood. Text analysis in particular is a “subset of data mining that focuses on the analysis of language” (Underwood 2017). Patterns of terminology, vocabulary, and nomenclature all contribute to the textual analysis of language. Furthermore, one can analyze a single text and then compare it to a larger corpus in order to pull out broader themes. Generalizability and reliability “determine the extent to which the results of an analysis can be applied outside the single sample”.
Before textual analysis can take place, distant reading must first be put into practice. Distant reading was first introduced in 2000 by Franco Moretti. Distant reading is “the idea of processing content- subjects, themes, persons, or places- or information about publication date, place, author, or title in a large number of textual items without engaging in the reading of the actual text” (112-113). Used together, distant reading and data analysis support further research into large social and cultural questions, including “what has been included and left out of traditional studies of literary and historical materials” (113).
One problem that arises with data mining is how to measure the volume of consumption. For example, how can we compare the consumption of the Holy Bible, which is mostly kept in the home and passed down between generations, to “the market records of commercial publishers of Harry Potter titles?” (115). Data analysis can also be done with image and audio files. However, gender and racial biases can arise in the analysis of voice inflection and in facial recognition.
The chapter introduces Voyant as an excellent tool for data mining. Voyant was developed by Geoffrey Rockwell and Stéfan Sinclair, and it supports correlation, a crucial feature of data analysis. One explanation I found very useful from the text was how “keywords that repeat are often clues to themes or topics in a body of work and seeing how the terms are resituated is a useful tool for seeing the facets of an argument to which the term is central” (116).
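The idea of seeing how a repeated keyword is “resituated” can be sketched in a few lines of Python as a simple keyword-in-context listing. This is only a rough illustration and not how Voyant itself is implemented; the function name `keyword_in_context`, the `window` parameter, and the sample sentence are all made up for the example.

```python
import re

def keyword_in_context(text, keyword, window=5):
    """Return (left context, keyword, right context) for every occurrence.

    `window` is the number of words kept on each side (an assumed default).
    """
    # Split the text into words, keeping apostrophes inside words.
    words = re.findall(r"[\w']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append((left, word, right))
    return hits

# Toy sentence invented for the example, not a quotation from any text.
sample = "I hope that hope remains"
for left, word, right in keyword_in_context(sample, "hope", window=1):
    print(f"{left} [{word}] {right}")
```

Reading the occurrences side by side like this makes it easier to notice when the same word is doing different work in different places.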
I am analyzing “The Dream” by Mary Shelley. Voyant Tools has helped me notice patterns in Shelley’s changes in mood, terminology, and vocabulary throughout the text. The “trends” tool in particular has been very useful for establishing which terms appear mostly in the first half of the text and which appear only in the second half. The mood of the piece changes drastically halfway through, which can be seen in how terms like “hope” and “love” are very frequent in the beginning, whereas terms like “dark” and “death” appear very frequently at the end. I am excited to explore this shift in mood further and to use Voyant Tools to get quantifiable data from literature.
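The kind of count behind a trends view can be approximated with a short Python sketch: split the text into equal-sized segments and count each term per segment. This is a simplified assumption about the approach, not Voyant's actual code (Voyant can also report relative frequencies); the function name `term_trends`, the `segments` parameter, and the toy text are hypothetical.

```python
import re

def term_trends(text, terms, segments=10):
    """Count how often each term occurs in each equal-sized segment of the text."""
    # Lowercase the text and split it into words.
    words = re.findall(r"[a-z']+", text.lower())
    # Words per segment, rounded up so every word falls into some segment.
    size = max(1, -(-len(words) // segments))
    counts = {term: [0] * segments for term in terms}
    for i, word in enumerate(words):
        if word in counts:
            counts[word][min(i // size, segments - 1)] += 1
    return counts

# Toy text invented for the example, not Shelley's story.
sample = "hope love hope love dark death dark death"
trends = term_trends(sample, ["hope", "dark"], segments=2)
for term, per_segment in trends.items():
    print(term, per_segment)
```

Run on the full text of a story, with terms such as “hope”, “love”, “dark”, and “death”, the per-segment counts would show whether a term clusters toward the beginning or the end.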