The Dangerous Art of Text Mining

In this book, Guldi argues that a world awash in text (over 280 billion emails per day) requires interpretive tools that traditional quantitative science cannot provide. Text mining is dangerous because analysts trained in quantification often lack a sense of what can go wrong when archives are biased, incomplete, or bear evidence of the suppressions of the past. The book presents a catalog of disasters created by data science experts who venture into humanistic study.

To overcome these dangers, the book proposes an approach based on “hybrid knowledge,” in which historical methods direct the choice of algorithm and analysis. Case studies are explored alongside rigorous discussions of historical methodology, grounded in recent work in the philosophy of history (including Koselleck, Erle, Assman, Tanaka, Chakrabarty, and others). While reflecting on the seasoned toolkit of traditional historians, the book also vigorously investigates the quantitative dimension and explores the “fit” of algorithms with each historical frame of reference on the past.

The book concludes with a discussion of historical linguistics, a discipline in which critical work with data helped researchers purge inherited racial bias from their map of the relationships among world languages. It ends with a call for a “cyborg historian”: a hybrid researcher who brings the best of artificial intelligence to the contemplative framework of a traditional historian.


The Dangerous Art of Text Mining maps out a hybrid methodology for the humanities, one that can reconcile the powerful quantitative approach of the data sciences with the nuanced approach of traditional historians. Illuminating the dangers of a naïve approach to archival texts, Dangerous Art charts a bold path for employing technology in the service of humanistic reflection.
