The tide is high for textual humanities. A vast amount of digital data is at researchers’ fingertips, as are powerful text mining tools for topic modeling, textual similarity clustering, and sentiment analysis. Each year, more humanities scholars start working in the new digital paradigm, using algorithmic methodologies and machine learning, including deep learning. However, many of these tools were developed for engineering and commercial applications, such as sifting texts for content, classifying sentiment in product reviews, filtering spam e-mails, or detecting political stance. Are these tools really so powerful in the hands of humanities scholars? Do they answer pertinent questions and offer new perspectives? Or do they turn the humanities from a reflective, complex discipline into yet another way of optimizing capitalism’s self-regulation?
In our 18-month Digital Lives project “Research Epistemologies in Text-based Digital Humanities: Analyses of Valuation Practices after the Machine Learning Turn”, funded by the SNSF and starting in December 2018, we take stock of the current state of an emerging field. Our research centers on how two text-based disciplines, corpus linguistics and computational literary distant reading, have been transforming themselves within the digital paradigm by applying and adapting tools originally constructed within natural language engineering. From a humanities point of view, the success of the digital methods is striking, as they usually do not require any expert knowledge about the make-up of texts: machine learning (ML) methods, and particularly deep learning methods such as recurrent neural networks, have proven more successful than rule-based methods or approaches operating directly on linguistic or literary annotations.
In view of the unfolding machine learning turn, we will run two aligned case studies that apply state-of-the-art sentiment detection. Our practical aim in these case studies is to examine how available dictionaries, algorithms, and machine learning procedures work on two sets of textual data extracted from the social web: recipe comments (chefkoch.de) and book reviews (lovelybooks.de). How do users convey their evaluation of cooking and books? Taking traditional humanities approaches of close reading and annotation as the point of comparison, we will examine how well the automatic methods perform.
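To make concrete what a dictionary-based procedure of this kind involves, the following is a minimal sketch in Python. It is not the project's actual method: the tiny lexicon, the negation list, the example sentences, and the function names `score`, `label`, and `agreement` are all hypothetical and chosen purely for illustration. The sketch scores a comment by summing word polarities, flips a polarity when a negator directly precedes the word, and then measures how often the automatic label matches a human annotation.

```python
import re

# Hypothetical toy lexicon for illustration only; a real study would
# load a published German sentiment dictionary instead.
LEXICON = {
    "lecker": 1.0, "toll": 1.0, "spannend": 1.0,
    "langweilig": -1.0, "fad": -1.0, "enttäuschend": -1.0,
}
# Hypothetical, deliberately short list of negation words.
NEGATORS = {"nicht", "kein", "keine"}


def score(text):
    """Sum the polarities of lexicon words, flipping a word's
    polarity when a negator immediately precedes it."""
    tokens = re.findall(r"\w+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        if token in LEXICON:
            polarity = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATORS:
                polarity = -polarity
            total += polarity
    return total


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


def agreement(texts, human_labels):
    """Share of texts where the automatic label equals the human one."""
    hits = sum(label(t) == h for t, h in zip(texts, human_labels))
    return hits / len(texts)
```

Even this small sketch shows where interpretive decisions enter the procedure: which words count as evaluative, how far negation reaches, and what is treated as neutral are all fixed in advance by the lexicon and the rules, not by a reading of the individual text.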
Using this empirical research, in all its stages, as a source, our main aim is to examine its underlying epistemology: What impact does the digital technology have? What research epistemologies are inscribed in the ML methods? To what extent are they compatible with the axioms and methodologies inherited from the humanities tradition? How do the disciplines deal with the “irritation” of the digital technology? It is not our goal to find the best method, but to reveal how underlying worldviews and epistemologies guide the analysis and potentially emerge anew.