Sociotechnical Data Studies

Parallel Session 2:
Wednesday 7 June, 14:00 - 15:30  

Grupperom 4, Georg Sverdrups hus

Sam Hind, University of Manchester, and James Steinhoff, University College Dublin: Simulation and a prehistory of synthetic data 

Greti-Iulia Ivana, School of Social and Political Sciences, University of G: Streaming and the Making of Socio-Technical Assets 

Frederik Vejlin, Department of Digital Design and Information Studies: Complex Data: Notes on the complexities of simplification in data-driven HRI experiments 

Tanja Knaus, TIK, University of Oslo: The politics and epistemological challenges of database ethnography: how to conduct empirical studies on data infrastructures 

 

Parallel Session 3:
Thursday 8 June, 09:00 - 10:30  

Grupperom 4, Georg Sverdrups hus

Sini Teräsahde, Faculty of Social Sciences / Faculty of Education and Culture, Tampere University: Datafication in public self-services - Hidden agendas on data providing systems in employment services

Essi Iisakka and Marja Alastalo, University of Eastern Finland: Sociotechnical imaginaries of data-driven welfare services and forgotten data work  

Åshild Kolås, Peace Research Institute Oslo (PRIO): Crisis informatics, disinformation and emergency politics  

Bidisha Chaudhuri, International Institute of Information Technology Bangalore (IIITB): Unpacking the Worlds of Covid-19 Dashboards through a Sociotechnical Lens 

Parallel Session 4:
Thursday 8 June, 11:00 - 12:30  

Grupperom 4, Georg Sverdrups hus

Susanna Lidström, KTH Royal Institute of Technology: Mediating ocean data: from Argo floats to sustainable development 

Michael Anker Petersen Hockenhull, IT University of Copenhagen: Studying the state through digital infrastructure: A sociotechnical digital methods approach 

Jose Antonio Ballesteros Figueroa, The James Hutton Institute: Quantitative Devices as Sociotechnical Infrastructures 

Ursula Plesner, Copenhagen Business School: Accelerated retrofitting: Infrastructuring for openness and security at the digital Nordic Borders through 20 years 

Parallel Session 5:
Thursday 8 June, 16:00 - 17:30  

Grupperom 4, Georg Sverdrups hus

Asbjørn M. Pedersen, Department of Information Studies, Aarhus University: ‘We want to save lives with data’: BI Data Work as a Matter of Care  

Ilpo Helén, University of Eastern Finland: Iteration with multi-purposing of data in healthcare. Observations from planning of a regional data management platform 

Lene Pettersen, Kristiania University College: Magicians, Transporters, and Translators of Digital Data: An Ethnographic Field Study in a Platform Company 

Peter Danholt, Center for STS-studies, Aarhus University: The sociotechnical assembling of the vulnerable child case  

Abstracts

Simulation and a prehistory of synthetic data 

Sam Hind, University of Manchester, and James Steinhoff, University College Dublin 

Synthetic data – data “that computer simulations or algorithms generate as an alternative to real-world data” (Andrews 2021) – is cast as the cure for nearly all the problems associated with data-intensive methods: data collection costs, surveillance, environmental impacts, and bias. Such claims present synthetic data as escaping the real world by operating within a virtual world or simulation. However, we argue that simulation is more connected to the real world than its proponents are willing to admit. 

Whilst Nikolenko (2021) traces the history of synthetic data back to early computer vision research, we offer a longer view by developing a ‘prehistory’ of synthetic data. We consider three regimes of simulation: a) statistical (the ‘Monte Carlo method’ which Galison (1996) calls the first simulation), b) discrete-event (United Steel’s ‘General Simulation Program’), and c) interactive (the graphical ‘Fortran-based Simulation System’). 

Across these regimes, simulation is presented as the means of escaping the real world and its economic constraints (time, labour, data). However, simulation always entails new demands of these same resources. Simulation - whether in the form of synthetic data or otherwise - requires new forms of labour and technical mediation which connect it deeply to the material world it purports to offer escape from. 

References 
Andrews, Gerard (2021) What is synthetic data? The Official NVIDIA Blog. https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data/  
Galison, Peter (1996) Computer Simulations and the Trading Zone. In The Disunity of Science: Boundaries, Contexts, and Power. Stanford: Stanford University Press. 
Nikolenko, Sergey I. (2021) Synthetic Data for Deep Learning. Springer. 

Streaming and the Making of Socio-Technical Assets 

Greti-Iulia Ivana, School of Social and Political Sciences, University of G 

In the literature which discusses the political economies of digital technology, data are commonly conceptualised as assets (Beauvisage and Mellet 2020a, 2020b; Zuboff 2019; Couldry and Mejias 2019). As long as data generate capital by being fed into markets of prediction, user profiling and targeted advertising, this is undeniable. However, there is another aspect of assetisation which is equally important and which has received comparatively less attention in the field: devices and systems which underpin data extraction can and should also be conceptualised as assets. 

In the current paper I argue that streamed videos and podcasts produced by users are key socio-technical operating assets for the sharing platforms where they are available. If data is the new oil, user-generated content is the new oil rig. To analyse the role of such materials, which are typically uploaded to mainstream platforms like YouTube, TikTok or Twitch, in the broader digital economy, I focus mainly on two aspects: 1) video content created by users as an operating (non-current) asset; 2) the making of this asset through the interconnection of human and non-human agency (Gillespie 2017, 2018; Seyfert 2021; Burrell 2016). Additionally, I discuss how the value of user-generated videos is measured through a simple view or subscription count. I also argue that this numeric expression of value both enables and obscures the socio-technical assemblage which is fundamental to the process of assetisation. 

Complex Data: Notes on the complexities of simplification in data-driven HRI experiments 

Frederik Vejlin, Department of Digital Design and Information Studies

Common critiques of the present ‘data moment’ often accuse algorithmic systems of abusing sophisticated technical simplifications to transform, and thereby distort, real-world complexity into machine-legible data through processes of classification, clustering, ranking, and prediction (Birhane 2021; Douglas-Jones, Walford, and Seaver 2021; Mackenzie 2015; Maguire et al. 2020). In STS, similar concerns have been raised in relation to the production of scientific knowledge (e.g., Mol and Law 2002), where observations must be purified of confounding artefacts – like materiality, sociality, and subjectivity (Law 2004) – to become ‘proper data’. In this talk, I bring these two perspectives together to explore the relations between complexity and datafication in so-called data-driven human-robot interaction (HRI). Data-driven HRI was developed at the Hiroshi Ishiguro Laboratories (HIL) in Japan as a hybrid algorithmic-experimental method for automatically generating robotic behaviour by training algorithms on human interaction data. The roboticists recognized that the dynamics of sociality are far too complex to be modelled predictably and require the use of ‘scalable’ methods that somehow both replicate and disregard social complexity (Liu et al. 2016). Based on material from fieldwork at the HIL, I unpack practices of datafication and experimentation in data-driven HRI to show how the interlacing of multiple simplification devices – algorithmic, experimental, technological, social – weaves equivocal relations between allegedly ‘simple’ data and ‘complex’ sociality. Inspired by Talia Dan-Cohen (2017, 2020; also, Mol and Law 2002), I emphasise the importance of probing the ontological politics of complexity enacted both in sociotechnical data studies and processes of datafication in algorithmic and experimental systems. 

The politics and epistemological challenges of database ethnography: how to conduct empirical studies on data infrastructures 

Tanja Knaus, TIK Centre, University of Oslo 

Affective signals of the voice that signify mental states, emotions, pathologies, stress, or a person’s identity are recorded, analyzed, and classified by voice recognition software systems found in many devices today. Tech companies hope that these systems will enable voice-powered devices to react to human emotions, detect diseases, improve human-computer interaction, and augment human-to-human interaction itself. This project follows the automation and standardization of ‘affective data’ inferred from these vocal signals through two case studies: Cogito and audEERING GmbH, which both produce voice recognition software for call centres. In order to conduct empirical research on data infrastructures, we need to further develop ethnographic methodologies that can adequately trace the micro-practices connected to data and their sociotechnical implications. Therefore, I focus primarily on the analysis of the database itself as a site of ethnographic research, in order to follow the data processing methods within these infrastructures. By storing, relating, and providing data, the database is an integral and critical part of the infrastructure that creates the basis for algorithmic modelling and contributes to an ongoing ‘worlding’. I therefore question and further develop the method of ‘database ethnography’ as put forward by Nadine Schuurman (2008) and adopted and expanded by Burns and Wark (2020). I will share the ontological and epistemological challenges that arise when conducting research on sociotechnical regimes that are powered by data pipelines and their promises. 

Datafication in public self-services - Hidden agendas on data providing systems in employment services 

Sini Teräsahde, Faculty of Social Sciences / Faculty of Education and Culture, Tampere University; with Jaana Parviainen, Juho Rantala, Paula Alanen and Anne Koski

In the Nordic countries, the boundary between public and private has become obscure as public services are provided in this borderland. In the digitalization of services, the heterogeneous assemblages of actors who design new technologies become shapers of society. In this paper, we ask what kind of administrative and economic interests actors have in the digitalization of employment services, and how different technological, political, economic, and legislative boundaries and conditions guide actors in the service design. Drawing on actor-network theories and the recent discussions of datafication, we examine how these actors implement and benefit from the state's fast-paced digitization and digitalization policy through employment services. Based on the analysis of expert interviews, we present preliminary observations about the expectations, goals, needs, interests, tensions, uncertainties, and ignorance of the public and private sector actors related to service design, data acquisition and processing. The public sector actors are interested in data produced as a deliverable of employment services in order to foresee employment prospects. However, digital self-services meet the needs of jobseekers poorly. The private software companies, in turn, strive for wider access to and utilisation of the public data pools and data consented to by the data subjects, while being conditioned by legislation and identification standards. Their promise is to inform individual decision-making, but the motivation is to profit by providing platforms for the personnel management of companies. Thus, the public, private and individual intermingle in the datafication processes, resulting in equivocal interpretations of its value and beneficiaries and challenging the understanding of data ethics. 

Sociotechnical imaginaries of data-driven welfare services and forgotten data work 

Marja Alastalo, University of Eastern Finland; with Essi Iisakka 

Data-intensive healthcare and social services have been widely promoted by the state and regional actors as well as think-tanks and private enterprises in Finland. Public databases and their further utilization have been regarded as a means to strategically govern public services. Data-intensive services seek to translate different services and functions into (structured) data that can be combined and further utilized in knowledge-based management, research, innovation, and business. Finland underwent a major public sector reform as the responsibility for organizing healthcare, social welfare and rescue services was transferred from municipalities to wellbeing services counties in January 2023. The new wellbeing services counties face wide-ranging needs to renew and unify their information systems. In our presentation, we discuss how digitalization and data-driven systems are depicted in the statutory wellbeing services counties’ strategy documents. 

What kind of digital and data-driven futures are imagined in the strategy papers? We analyse the strategies using the theoretical concept of sociotechnical imaginaries. These imaginaries are seen as social and political worldbuilding; imagination is a cultural resource that describes and directs possible futures. We interpret the preliminary results against our ethnographic fieldwork on building digital and data-driven systems at a wellbeing services county, and against the literature on data work, to highlight the diverse but often overlooked work that data-driven systems require. We consider the datafication of public welfare services as a complex sociotechnical process and discuss the potential analytical shortcomings of the concept of sociotechnical imaginaries. 

Crisis informatics, disinformation and emergency politics 

Åshild Kolås, Peace Research Institute Oslo (PRIO) 

The software company System Development Corporation (SDC) played a key role in the early days of emergency management studies, establishing the Emergency Operations Research Center (EORC) in Santa Monica, California in the mid-1960s. EORC was a civil application of the simulation and system training technologies originally developed by SDC for military use. Today, a range of new digital tools are available to first responders, including applications that pull content from social media, aggregate, analyze and curate the data, and display information deemed important to the user. This paper digs into the roots of emergency management in mid-20th century USA, reviews the study of crisis informatics and draws on the concept of ‘emergency politics’ to discuss questions that arise from the growing application of user-generated content and geolocation in emergencies. The paper examines digitalization of emergency management as ‘emergency politics’, discussing the tension between authoritarian ‘militaristic’ approaches to emergency management and decentralizing approaches that rely on civil engagement, preparedness and ‘resilience’. A case in point is the monitoring of social media in the United Kingdom for COVID-19 ‘disinformation’, carried out by the UK government’s Counter Disinformation Unit, the Rapid Response Unit and the UK military’s 77th Brigade, set up in 2015 to combat information operations by countries deemed hostile to the UK. This raises questions about the slippery slope of ‘tackling disinformation’ in responses to emergencies, and the militarization of emergency management in the age of information warfare. 

Unpacking the Worlds of Covid-19 Dashboards through a Sociotechnical Lens 

Bidisha Chaudhuri, International Institute of Information Technology Bangalore (IIITB); with Amelia Acker and Megan Finn 

Data dashboards by governments, private agencies, civil society organizations and volunteer networks became ubiquitous during the pandemic. What these dashboards presented as data varied from population-level virus-related data to non-virus deaths, from data about available resources to data about hate crimes. Together, they created and maintained a datafied discourse of the pandemic that accommodated faith in data, lack of data, mistrust in data, and counter-data practices, all at once. In this paper, we unpack this datafied discourse to unravel the underlying networks of data, people, organizations, tools, protocols, and resources (social, political and financial) that shape what we know (or not) about the pandemic and what we consider (or not) relevant information about it. We draw on a year-long qualitative study of 11 Covid-19 data dashboard projects across India and the U.S. to foreground the work of infrastructuring (Pipek and Wulf 2009) these data interfaces, which covered different dimensions of the pandemic and allowed collective actions to be centered around data during the crisis. While discourses around datafication are often framed around data as factual or political, rendering contested versions of reality, we turn to a more sociotechnical understanding of data, that is, how realities of the pandemic rendered data as dashboards. Rather than debating data as “matters of fact”, we engage with data as “matters of concern” (Latour 2004) by focusing on the people who found meaning in data, what meanings they drew from data, and how they constituted these meanings through their everyday work. 

Mediating ocean data: from Argo floats to sustainable development 

Susanna Lidström, KTH Royal Institute of Technology; with Adam Wickberg

Argo floats are autonomous devices that roam the upper 2000 metres of the ice-free ocean collecting data on temperature and salinity. The floats transmit their data via satellites to Argo data management centres around the world, where they are made immediately available for weather forecasting, and quality controlled and eventually produced as high-quality data for scientific purposes, primarily climate studies. At all stages, the data are freely available. Since its inception in the 1990s, the Argo program has revolutionised ocean science in terms of amounts of data available from beneath the surface. This study follows the Argo data from collection by the physical floats, through data management and processing, to implications for and direct usage in the development and formulation of ocean governance goals, especially the Sustainable Development Goals (SDGs) for ocean sustainability and climate change. We also consider the reverse – how environmental governance aims impact the development of Argo. Our aim is to identify and analyse points along this two-way journey where particular decisions and practices shape the relationship between ocean data and their use(r)s, informing potentially conflicting views of ocean exploitation and protection. We are also interested in how the Argo program interacts with SDG targets for increased equity, through technology transfer and capacity building. Our analysis draws on a theoretical framework including environmental history, science and technology studies, media studies and ecocriticism to characterise the environmental and sociotechnical imaginaries that inform – and are informed by – the vast amounts of data collected and produced by the Argo infrastructure. 

Studying the state through digital infrastructure: A sociotechnical digital methods approach 

Michael Anker Petersen Hockenhull, IT University of Copenhagen

The modern state is increasingly characterized by a high level of reliance on digital systems and infrastructures. This paper explores the use of an STS-informed sociotechnical approach to studying the digitalized state and its data. In particular, the paper recounts the use of a digital methods (Rogers, 2009; 2019) approach which takes the material and sociotechnical technicity (Omena, 2021) of digital systems seriously. 

The paper is based on a study of mandatory sustainability reporting amongst large Danish companies, and the use of a state-run data infrastructure, the Virk API, provided by the Danish Business Administration. This API allows registered users to access data on Danish companies, and is frequently used by banks, accountants, investors and other actors to perform due diligence and other required checks. Simultaneously, the API outputs data provided by organizations via annual reports, tax forms and more. Together these data streams and infrastructures form a kind of public-private ecosystem which underwrites the performance as well as the outsourcing of key functions of the state such as auditing, regulation and statistics. The study provides an analysis of the state as digitally enmeshed with the private sector. 

Taking this case as its point of departure, the paper will argue and demonstrate that it is beneficial to combine a sociotechnically STS-informed approach with a more technically digital methods-informed approach to understand this phenomenon. Doing so will enable detailed empirical study of a part of the state, its infrastructures and data. 

Quantitative Devices as Sociotechnical Infrastructures 

Jose Antonio Ballesteros Figueroa, The James Hutton Institute

Indicators influence policy formulation together with political interests, organisational problems, and the measured problem (Boswell and Rodrigues 2016). Therefore, indicators - as quantitative devices - can be included within sociotechnical infrastructures that allow and encourage their use to co-produce particular understandings of and solutions to problems. At the same time, indicators on their own do not shape the way environmental policies are framed. Instead, these tools are part of a more significant epistemology in which the quantification of everything has been reinforced as a lingua franca among policymakers. As this paper suggests, the notion of quantitative devices observes them as part of sociotechnical infrastructures and imaginaries. This paper seeks to introduce the notion of Quantitative Devices (QDs), understood as tools that contribute to the mobilisation, and on occasions imposition (Wynne 2005), of particular imaginaries through discourses that praise the use of quantitative methods and data. The essence of QDs is not in the quantitative methodologies required for their construction but in the sociotechnical elements that allow their performativity. Quantitative devices are mobilised through an infrastructure of quantification where political, technological, statistical and epistemic systems interact. This notion is situated using as a case study the development of a human-based modelling system around circular economy practices in Scotland. The model, developed at the James Hutton Institute, requires a set of political actors, statistical understandings, technological systems and community respondents to operate. The paper theorises a new way to incorporate the sociotechnical infrastructures that are needed for data tools to operate. 

Accelerated retrofitting: Infrastructuring for openness and security at the digital Nordic Borders through 20 years 

Ursula Plesner, Copenhagen Business School; with Luna Secher Rasmussen & Bertil Rolandsen 

This paper conceives of the digital Nordic borders as an infrastructure that supports and generates specific ‘regimes of meaning and power’, focusing specifically on the ideological and practical tensions between openness and security. Through a diachronic analysis of the establishment and maintenance of the border infrastructure, we highlight how ‘cracks’ and ‘shocks’ step by step challenge and forge the unfolding regime of “openness” as the application of digital technologies intensifies. The empirical material consists of documents such as official state statements, legislation, procurements, and popular articles. Analytically, the study examines the ‘infrastructuring’ of the digital Nordic borders by applying the concept of retrofitting (Howe et al. 2016). For infrastructures to operate over long periods of time, they must be constantly retrofitted to meet new contingencies. The analytical strategy operationalizes and develops the concept of retrofitting by introducing ‘shocks’ and ‘cracks’ as instances that allow us to examine major geopolitical events (such as migration waves or Brexit) as well as inconspicuous developments (such as technological innovation). The analysis covers a period of 20 years and shows that the digital infrastructuring of openness surrounding the Nordic borders is a paradoxical project that constantly – through retrofitting – creates borders, enforces digitalization, and offers openness. The paper contributes to STS studies of borders as infrastructures with an empirically grounded theorization of accelerated retrofitting. 

‘We want to save lives with data’: BI Data Work as a Matter of Care 

Asbjørn M. Pedersen, Department of Information Studies, Aarhus University 

Through an ethnographic study of a healthcare Business Intelligence unit, this paper explores BI data work as a ‘matter of care’. 

Big Data in its various forms - be it data science, machine learning, or business intelligence - promises to solve healthcare challenges and make healthcare data-driven. These aspirations for data-driven healthcare are often found in political and management strategies that expect new technologies to enhance management, research, and patient treatment. Meanwhile, Big Data in general has been criticized for data fetishism, mass surveillance, and disregard of the ethical implications and the human impact of data practices. 

Despite this, the desire for data-driven healthcare is not solely imposed from the outside: Big Data and its practitioners (the so-called data professionals) are no longer placed in distant centers of calculation but have moved in with healthcare. In other words, they are situated within the messy, sociotechnical practices of healthcare, co-creating data technologies, practices, and politics with healthcare professionals. I suggest exploring these data work practices of data professionals as a matter of care: entangled practices that involve caring not only for data, but also for healthcare professionals and patients; navigating local needs and demands; entering critical dialogues on data and data representations; and ultimately aiming to save lives with data. 

Iteration with multi-purposing of data in healthcare. Observations from planning of a regional data management platform 

Ilpo Helén, University of Eastern Finland

With the datafication of healthcare, great expectations have emerged regarding the secondary use of patient data for administrative, management and multiple research purposes. Related to these expectations, hopes have also been raised for a better intermingling of clinical data with treatment outcome data and cost data, for better planning and controlling of the functioning of healthcare organisations. My paper is based on an ethnographically inspired study of a planning project for a regional ‘new generation’ patient data management system in Finland that was expected to enable such multi-purposing of health data and the implementation of advanced data analytics solutions, AI included, on the platform. I discuss the issues which the objective of multi-purposing of data engendered in the planning process, as regards adjusting both technical solutions and healthcare practices to the objective, and the iteration and anticipation work by which the expert-developers tried to solve these issues. I claim that multi-purposing of health-related data is inclined to create sociotechnical syncretism in healthcare organisation and practices under datafication. 

Magicians, Transporters, and Translators of Digital Data: An Ethnographic Field Study in a Platform Company 

Lene Pettersen, Kristiania University College

It is argued that large sets of digital data are ushering in a new data economy in which big data infrastructures, products, and services will generate massive economic growth. Despite some early anthropological work concerning data (e.g. Boellstorff and Maurer 2015), anthropologists have until recently been cautious in taking data as their research object (Douglas‐Jones, Walford, and Seaver 2021).  In a data-driven economy, data tend to be understood as valuable because they can be transformed into something else (Walford 2021). 

This abstract presents insights from an ethnographic field study of a platform company in the Nordics that offers an integration platform to clients. The platform enables clients’ data to be combined in new ways and their IT systems’ data to be better exploited. However, the findings show that if data are to be transformed into anything, they must be connected to time and place (context). Three different job roles working with data were found to play important roles; I label these magicians, transporters, and translators. Magicians are programmers working with the functionality of the integration platform that integrates data. They also work as transporters, who set up the dataflows of the clients’ data into the platform. Because of the platform’s very complex functionality and its new way of thinking about data (integrating data rather than IT systems), the work carried out by those whom I label translators is very important. Translators show the customers how they can use the platform and what it can do with their data. In the terminology of ANT, the translators provide the customers with a program: scripts that follow the intended use of a certain technology (Latour 1992). 

Boellstorff, Tom, and Bill Maurer, eds. 2015. Data, Now Bigger and Better! Chicago: Prickly Paradigm Press. 
Douglas‐Jones, Rachel, Antonia Walford, and Nick Seaver. 2021. "Introduction: Towards an anthropology of data."  Journal of the Royal Anthropological Institute 27 (S1):9-25. 
Latour, Bruno. 1992. "Where are the missing masses? The sociology of a few mundane artifacts."  Shaping technology/building society: Studies in sociotechnical change 1:225-258. 
Walford, Antonia. 2021. "Data–ova–gene–data."  Journal of the Royal Anthropological Institute 27 (S1):127-141. 

The sociotechnical assembling of the vulnerable child case 

Peter Danholt, Center for STS-studies, Aarhus University; with Peter Lauritsen

In this paper we report on an ongoing research project on the role of data and datafication in relation to vulnerable children. The project is part of the “SHAPE – shaping digital citizenship” project at Aarhus University, Denmark. Through an ethnographic study of, and in collaboration with, social workers, we investigate the formation of the “digital double” of the vulnerable child, drawing on actor-network theory and data studies. The case shows that the assembling of the vulnerable child case comprises a heterogeneous set of technologies, tools and practices: coping techniques to decrease the stress levels of the child and the child’s family; digital systems for collaboration between the social work professionals and the families; simple quantitative performance indicators that measure goals and progress, and the aggregation of these into data visualizations for management oversight; and others. In the assembling of the child case, we find that the heterogeneity of tools and data (and not just digital data) results in a versatile and sensitive practice well equipped to involve and care for the child and the family. Based on this we discuss the concept of the digital double in relation to a sociotechnical understanding of the digital as comprised of digital and non-digital practices and elements. 

Published June 2, 2023 10:26 AM - Last modified June 5, 2023 10:42 AM