Over the last two decades, online databases, digital visualization tools and automated data analysis have become key means of coping with the increasing scale and diversity of the scientifically relevant information being accumulated (the so-called ‘big data’). Within the biological and biomedical sciences, digital access to data has revolutionized research methods and ways of doing science, thus also challenging how life is researched and understood. Prominent scientists have characterized this shift as leading to a new, ‘data-intensive’ paradigm for research, encompassing innovative ways to produce, store, disseminate and interpret data. This talk aims to provide a philosophical framework through which the current emphasis on data-intensive biology, and more generally the role played by data in scientific inquiry, can be studied and understood. To achieve this, I focus on what I call data journeys: the ways in which scientific data are disseminated across a multiplicity of contexts in order to function as evidence for knowledge claims. As I will show, the more widely data are disseminated and re-used, the more significant their epistemic role is deemed to be. To be transformed into knowledge, scientific data need to be ordered, labelled and packaged to make them portable, that is, capable of being picked up and transported across different sites. In this talk, I focus on the role of online databases as key sites for data packaging, whose structure and functioning strongly affect how existing data about organisms are transformed into new claims about the biological world.
Building on a close study of the material conditions under which data travel, I then put forward three main arguments: (1) portability is a defining characteristic of data as a component of scientific inquiry, which crucially depends on the specific domains through which data are made to move; (2) what counts as data in the first place depends on the procedures and contexts through which researchers attribute evidential value to objects and processes; and (3) the fruitfulness of data-intensive science can thus be understood as resulting from the skilful use of information technologies to articulate and multiply the contexts in which different types of data can be organised and interpreted.