The use of quantitative methods in historical research is not new: quantitative methods were part of economic, political and social history from the 1960s to the 1980s. Modern advances in computing mean that collecting data no longer has to be difficult and time-consuming, so it is easy to see why, in 2012, Danah Boyd and Kate Crawford believed the era of big data had begun. Large data sets, often seen as the defining feature of big data, can now be created and manipulated on desktop computers. These advances have made big data more accessible, and it was naturally introduced into digital humanities. That does not mean it has been welcomed with open arms into either digital humanities or digital history. Scepticism about the arguments and evidence big data has to offer has been high, and its impact on historical research, and on the types of research question it creates, has been a bone of contention for digital historians. This essay will examine whether big data changes the nature of historical research, narrowing the types of research question that can be asked, or whether it opens up other avenues of historical research and allows for the possibility of expanding our historical knowledge.
While quantitative methods have been implemented in historical research, the research method most associated with history is ‘close reading’. Historical research has been dominated by the interrogation of a small number of carefully selected, often text-based, primary sources, which can provide valuable insights. Jean-Baptiste Michel et al. argue that ‘reading small collections of carefully chosen works enables scholars to make powerful inferences about trends in human thought’. Yet, as Franco Moretti argues, the trouble with close reading is that it necessarily depends on an extremely small canon and as such cannot provide insight into the underlying system or social phenomena. The hostility towards big data could be attributed to the challenge it presents to the close reading of primary sources. The expertise of ‘close reading’ is central not only to humanistic disciplines but ‘to the self identity of lots of humanists themselves’. Historical research methods implementing big data can open up other, larger avenues of research to the historian, but doing so involves stepping away from the text, and it is in this way that big data changes historical research.
Distant reading focuses on quantitative data, removing text from its context. Instead of analysing sources for their literary forms, conventions and semantics, the source is text-mined to create statistics and probabilities. Big data offers a different way of reading primary sources, either by reducing the text to units smaller than itself, such as devices, themes or tropes, or by expanding it to include larger units, such as genres or systems. This form of research reveals cultural and historical trends which would otherwise have remained hidden. A good example of historical trends missed by historians but found through distant reading can be seen in the n-grams database created by Erez Lieberman Aiden and Jean-Baptiste Michel. The censorship of unknown authors and artists during the Nazi period was uncovered through analysis of that database, which contains over two trillion words. Thus there are advantages to big data. It has the potential to create a global history by measuring the various changes in human society, and it does this by opening up the bigger research questions historians have always wanted to ask but have been restrained from asking by a lack of the time, funding and resources needed to undertake a task of such magnitude. Boyd and Crawford argue big data creates a radical shift in how we think about research by looking at new objects and new methods of obtaining historical knowledge. For them, big data is not to be considered within the binary conversation of close versus distant reading but in how it ‘reframes key questions about the constitution of knowledge, processes of research and how we should engage with knowledge’. However, it is important not to over-emphasise distant reading as a revolutionary new practice of digital history; big data does have its disadvantages and limitations.
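The kind of word-frequency analysis that distant reading rests on can be sketched in a few lines of Python. This is an illustrative toy, not the pipeline behind the Google Books n-grams database; the function name and the crude tokeniser are my own assumptions.

```python
from collections import Counter
import re


def ngram_frequencies(text, n=2):
    """Count n-gram frequencies in a text: the basic unit of distant reading."""
    # Crude tokenisation: lowercase alphabetic words only.
    # Real corpora need far richer tokenisers and OCR cleanup.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the text and count each n-gram.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)


corpus = "the history of big data is the history of counting"
freqs = ngram_frequencies(corpus, n=2)
print(freqs.most_common(3))
```

At the scale of two trillion words, counts like these become time series of cultural change; at the scale of this toy corpus, they merely show the mechanics.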
The first limitation of big data is its reliance on computers for research purposes. Arguably, the majority of historians and history students are tech-savvy enough to have a good handle on standard programmes such as Microsoft Word and Excel. However, big data requires more sophisticated representations of its research findings than the tables, graphs and charts that can be created with a fair knowledge of Microsoft Office. Working with Extensible Markup Language (XML) and Application Programming Interfaces (APIs), and gathering and analysing large amounts of data, is a ‘skill set generally restricted to those with a computational background’. Perhaps it is this lack of computational knowledge which makes digital historians sceptical about the benefits big data has to offer digital history. Boyd and Crawford argue that humanists use digital resources all the time but, being naive about how those resources work, are unaware of the potential to get more out of them; instead, humanists arrange resources in ways that make them far harder to use. But big data is not limited only by the computer literacy of the digital historian; the data sets themselves carry their own limitations.
It is important to interrogate textual sources before carrying out qualitative research; similarly, it is important to ask critical questions of big data before quantitative research is carried out. Regardless of the size of the data set, its properties and limitations should be understood and its biases determined before carrying out quantitative research. Furthermore, historians should assess the limitations of the questions they can ask of big data and determine which interpretations are the most appropriate. Boyd and Crawford argue big data is problematic in that it enables patterns to be read into the results where they do not exist. But is this not also a problem faced with textual sources? While the research methods differ, both close and distant reading suffer from the same problems of using primary sources. Both require the retention of context. When a word, sentence, phrase or paragraph is removed from the entire document, it loses its context. So too does quantitative data lose context when interpreted at scale, and context is ‘even harder to maintain when data [is] reduced to fit into a model’. But this is not the only similarity between close and distant reading. The process of generalisation in close reading and the process of categorisation in distant reading both rely on historians’ judgements. The fear of big data changing historical research could be a manifestation of the fear of digital history being ‘no more than a colonization [sic] of the humanities by the sciences’.
Big data may apply some of the quantitative methods of computer science, but it does not apply objectivity as readily as its connection with science would imply. Big data requires an interpretive framework and is thus subjective. Big data’s tools of representation, summary tables, charts, line graphs and other modes of visualisation, all require interpretation themselves. Trevor Owens argues data can be treated as constructed artefacts, as texts and as processed information, all of which can be interpreted. Data sets are created by historians, who make choices about what data to collect and how to encode it. Therefore, as constructed artefacts, data has the same characteristics as textual sources created by authors, so consideration of its author’s intended audience and purpose should be undertaken. Even at the visualisation stage, data is not evidence but new artefacts, objects and texts that are generated and which can also be read and explored. Ben Schmidt supports Owens, arguing the ‘graphs derived from big data require nuanced, contextual interpretation [and] . . . give a new source to interpret’. While the objects of interpretation are not manuscripts stored in archives, or digitally for that matter, these new artefacts created by big data are still reliant on literary canons. Big data, for the moment at least, is mainly created from textual sources. Although these textual sources are not studied in depth but are mined for word frequency, semantics or other statistical purposes, the interpretations created from them are still reliant on textual information. Tim Hitchcock argues this is why distant reading, to a certain extent, does not provide new insights to historians: research questions are still being determined by literary canons and still resemble those asked through older technology. This is why big data does not always change the research questions historians ask.
Retention of old concepts and methods can lead historians to impose limitations on big data. By reading data only in terms of the interpretations already amassed, the value of the information the data has to offer is lost. Heuser and Le-Khac have recognised the problems of such impositions on big data. They argue there is a tendency to throw away data that does not fit established concepts, which potentially damages historical knowledge. Big data needs to be more than a validation of existing interpretations, otherwise ‘quantitative methods [will] never produce new knowledge’.
Historians too concerned with whether big data changes historical research miss the opportunities it provides to increase their historical knowledge. Instead of arguing over which research method should be employed in digital history, historians could employ a mixture of both close and distant reading to provide both the depth and the breadth they have been striving for. Owens argues ‘big data is an opportunity . . . to bring the skills . . . honed in the close reading of texts . . . into service for this new species of text’. By employing close reading methods on artefacts created by distant reading, historians gain both the in-depth, specific human experience and the broader trends of the society within which that experience sits. Big data may be a new research method, but the sources it creates can strengthen pre-existing methods.
Quantitative methods may not be new to historical research, but technological advances have allowed quantitative research to penetrate history to such an extent that it has sparked debates over its effects on historical research. This debate has centred on the use of close and distant reading, the advantages and disadvantages of each method, and the contributions each makes to historical knowledge. The main concern of historians is how big data changes research questions. While big data offers the opportunity to explore larger, cultural research questions, its reliance on narrow literary canons allows it to create sources which in turn can be closely read and thus used in the same research questions we ask of non-digital primary sources. Big data has the potential not only to inform historians of new trends, both globally and socially, but also to enhance pre-existing research questions.
 Danah Boyd & Kate Crawford, ‘Critical Questions for Big Data’, Information, Communication & Society, Vol. 15, No. 5 (2012), pp. 662-679
 Jean-Baptiste Michel et al., ‘Quantitative Analysis of Culture Using Millions of Digitized Books’, Science, Vol. 331 (2011), pp. 176-182
 John Bohannon, ‘Google Opens Books to New Cultural Studies’, Science, Vol. 330 (2010), p. 1600
 Boyd & Crawford, ‘Critical’, p. 665
 Boyd & Crawford, ‘Critical’, p. 665
 Boyd & Crawford, ‘Critical’, p. 674
 Boyd & Crawford, ‘Critical’, p. 668
 Boyd & Crawford, ‘Critical’, p. 670
 Boyd & Crawford, ‘Critical’, p. 668
 Boyd & Crawford, ‘Critical’, p. 671
 Ryan Heuser & Long Le-Khac, ‘Learning to Read Data: Bringing out the Humanistic in the Digital Humanities’, Victorian Studies, Vol. 54, No. 1 (2011), pp. 79-86
 Heuser & Le-Khac, ‘Learning’, p. 81
Hitchcock, Tim, ‘Big Data for Dead People: Digital Readings and the Conundrums of Positivism’, Historyonics, http://historyonics.blogspot.co.uk/2013/12/big-data-for-dead-people-digital.html; consulted 10th April 2014
Manning, Patrick, ‘Big Data in History’, We Think History, http://wethink.hypotheses.org/1485; consulted 10th April 2014
Moretti, Franco, ‘Conjectures on World Literature’, New Left Review, Vol. 1 (2000), http://newleftreview.org/II/1/franco-moretti-conjectures-on-world-literature; consulted 12th April 2014
Mullen, Abby, ‘“Big” Data for Military Historians’, Canadian Military History, http://canadianmilitaryhistory.ca/big-data-for-military-historians-by-abby-mullen/; consulted 10th April 2014
Owens, Trevor, ‘Defining Data for Humanists: Text, Artifact, Information or Evidence?’, Journal of Digital Humanities, Vol. 1, No. 1 (2011), http://journalofdigitalhumanities.org/1-1/defining-data-for-humanists-by-trevor-owens/; consulted 11th April 2014
Schmidt, Ben, ‘Assisted Reading vs. Data Mining’, Sapping Attention, http://sappingattention.blogspot.co.uk/2010/12/assisted-reading-vs-data-mining.html; consulted 11th April 2014
Schöch, Christof, ‘Big? Smart? Clean? Messy? Data in the Humanities’, Journal of Digital Humanities, Vol. 2, No. 2 (2013), http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/; consulted 12th April 2014