Maquinações
Theoretical experimentations, reading notes, and other disposable commentary

Datafication, dataism and dataveillance

Rafael Gonçalves
27/02/2024
José Van Dijck, Google, big data, information technologies, dataism, metadata, surveillance, reading notes

Reading notes on the article "Datafication, dataism and dataveillance", published by the researcher José Van Dijck1.

Abstract

Metadata and data have become a regular currency for citizens to pay for their communication services and security—a trade-off that has nestled into the comfort zone of most people. This article deconstructs the ideological grounds of datafication. Datafication is rooted in problematic ontological and epistemological claims. As part of a larger social media logic, it shows characteristics of a widespread secular belief. Dataism, as this conviction is called, is so successful because masses of people—naively or unwittingly—trust their personal information to corporate platforms. The notion of trust becomes more problematic because people’s faith is extended to other public institutions (e.g. academic research and law enforcement) that handle their (meta)data. The interlocking of government, business, and academia in the adaptation of this ideology makes us want to look more critically at the entire ecosystem of connective media.

Snowden as the laying bare of ongoing surveillance

Snowden’s disclosures have been more than a wakeup call for citizens who have gradually come to accept the “sharing” of personal information—everything from marital status to colds, and from eating habits to favorite music—via social network sites or apps as the new norm (van Dijck 2013a). Platform owners routinely share users’ aggregated metadata with third parties for the purpose of customized marketing in exchange for free services. Many people may not have realized, up until Snowden’s leaks, that corporate social networks also—willingly or reluctantly—share their information with intelligence agencies. (p. 197)

Metadata as the currency paid for security and communication services

When Barack Obama defended his administration’s policies of mass surveillance saying that there was “no content, just metadata” involved in the PRISM scheme, he added that citizens cannot expect a hundred per cent security and a hundred per cent privacy and no inconvenience. The president’s explanation echoed social media companies’ argument that users have to give up part of their privacy in exchange for free convenient platform services. In other words, metadata appear to have become a regular currency for citizens to pay for their communication services and security—a trade-off that has nestled into the comfort zone of most people. (p. 197-8)

Datafication as the quantification of social action in the form of data

What explains this remarkable tolerance for Big Brother and Big Business routinely accessing citizens’ personal information also known as Big Data? Part of the explanation may be found in the gradual normalization of datafication as a new paradigm in science and society. Datafication, according to Mayer-Schoenberger and Cukier (2013) is the transformation of social action into online quantified data, thus allowing for real-time tracking and predictive analysis. Businesses and government agencies dig into the exponentially growing piles of metadata collected through social media and communication platforms, such as Facebook, Twitter, LinkedIn, Tumblr, iTunes, Skype, WhatsApp, YouTube, and free e-mail services such as gmail and hotmail, in order to track information on human behavior: “We can now collect information that we couldn’t before, be it relationships revealed by phone calls or sentiments unveiled through tweets” (Mayer-Schoenberger and Cukier 2013: 30). Datafication as a legitimate means to access, understand and monitor people’s behavior is becoming a leading principle, not just amongst techno-adepts, but also amongst scholars who see datafication as a revolutionary research opportunity to investigate human conduct. (p. 198)

Dataism as ideology: belief in the objective quantification and potential tracking of human behavior and sociality through digital media technologies

However compelling some examples of applied Big Data research, the ideology of dataism shows characteristics of a widespread belief in the objective quantification and potential tracking of all kinds of human behavior and sociality through online media technologies. Besides, dataism also involves trust in the (institutional) agents that collect, interpret, and share (meta)data culled from social media, internet platforms, and other communication technologies. (p. 198)

Dataveillance as continuous surveillance through the use of (meta)data

Notions of “trust” and “belief” are particularly relevant when it comes to understanding dataveillance: a form of continuous surveillance through the use of (meta)data (Raley 2013). As Snowden’s documents made clear, people have faith in the institutions that handle their (meta)data on the presumption that they comply with the rules set by publicly accountable agents. However, as journalists found out, the N.S.A. regularly defies court rulings on data use, just as corporations are constantly testing legal limits on privacy invasion. More profoundly, the Snowden files have further opened people’s eyes to the interlocking practices of government intelligence, businesses, and academia in the adaptation of dataism’s ideological premises. Therefore, we need to look into the credibility of the whole ecosystem of connective media. What are the distinctive roles of government, corporations and academia in handling our data? And what kind of critical attitude is required in the face of this complex system of online information flows? (p. 198)

Datafication as a paradigm for understanding sociality and social behavior. Metadata as treasure.

Over the past decade, datafication has grown to become an accepted new paradigm for understanding sociality and social behavior. With the advent of Web 2.0 and its proliferating social network sites, many aspects of social life were coded that had never been quantified before — friendships, interests, casual conversations, information searches, expressions of tastes, emotional responses, and so on. As tech companies started to specialize in one or several aspects of online communication, they convinced many people to move parts of their social interaction to web environments. Facebook turned social activities such as “friending” and “liking” into algorithmic relations (Bucher 2012; Helmond and Gerlitz 2013); Twitter popularized people’s online personas and promoted ideas by creating “followers” and “retweet” functions (Kwak et al. 2010); LinkedIn translated professional networks of employees and job seekers into digital interfaces (van Dijck 2013b); and YouTube datafied the casual exchange of audiovisual content (Ding et al. 2011). Quantified social interactions were subsequently made accessible to third parties, be it fellow users, companies, government agencies, or other platforms. The digital transformation of sociality spawned an industry that builds its prowess on the value of data and metadata—automated logs showing who communicated with whom, from which location, and for how long. Metadata—not too long ago considered worthless byproducts of platform-mediated services—have gradually been turned into treasured resources that can ostensibly be mined, enriched, and repurposed into precious products. (p. 198-9)

The presupposition of passive capture vs. algorithmic mediation on platforms

The industry-driven datafication view resonates not only in entrepreneurs’ auspicious gold rush metaphors, but also in researchers’ claims hailing Big Data as the holy grail of behavioral knowledge. Data and metadata culled from Google, Facebook, and Twitter are generally considered imprints or symptoms of people’s actual behavior or moods, while the platforms themselves are presented merely as neutral facilitators. Twitter supposedly enables the datafication of people’s sentiments, thoughts, and gut-feelings as the platform records “spontaneous” reactions; users leave traces unconsciously, so data can be “collected passively without much effort or even awareness on the part of those being recorded” (Mayer-Schoenberger and Cukier 2013: 101). Analysts often describe the large-scale gauging of tweets as using a thermometer to measure feverish symptoms of crowds reacting to social or natural events—an assumption founded on the idea that online social traffic flows through neutral technological channels. In this line of reasoning, neither Twitter’s technological mediation by hashtags, retweets, algorithms, and protocols, nor its business model seems relevant (Gillespie 2010). (p. 199)

Prediction as a case of life mining

Identifying patterns of conduct or activities out of unconsciously left (meta)data on social network sites increasingly serves to predict future behavior. Information scientists Weerkamp and De Rijke (2012) state it very clearly: “We are not interested in current or past activities of people, but in their future plans. We propose the task of activity prediction, which revolves around trying to establish a set of activities that are likely to be popular at a later time.” They position activity prediction as a special case of “life mining”, a concept defined as “extracting useful knowledge from the combined digital trails left behind by people who live a considerable part of their life online.” The phrase “useful knowledge” begs the question: useful for whom? According to Weerkamp and De Rijke, social media monitoring provides meaningful information for police and intelligence services to forecast nascent terrorist activity or calculate crowd control, and for marketers to predict future stock market prices or potential box office revenues (see also Asur and Huberman 2011). From the viewpoints of surveillance and marketing, predictive analytics—relating (meta)data patterns to individuals’ actual or potential behavior and vice versa—yields powerful information about who we are and what we do. When it comes to human behavior, though, this logic may also reveal a slippery slope between analysis and projection, between deduction and prediction (Amoore 2011). (p. 199)

The paradox of big data thinking: neutral collection and active manipulation

A “big data mindset” also seems to favor the paradoxical premise that social media platforms concomitantly measure, manipulate, and monetize online human behavior. Even though metadata culled from social media platforms are believed to reflect human behavior-as-it-is, the algorithms employed by Google, Twitter and other sites are intrinsically selective and manipulative; both users and owners can game the platform. For instance, when Diakopoulos and Shamma (2010) predict political preferences by analyzing debate performance through tweets, they seem to ignore the potential for spin-doctors or partisan twitterers to influence Twitter debates in real time. In marketing circles, the prediction of future customers’ needs is akin to the manipulation of desire: detecting specific patterns in consumer habits often results in simultaneous attempts to create demand—a marketing strategy that is successfully monetized through Amazon’s famed recommendation algorithm (Andrejevic 2011). Social media content, just like internet searches, is subject to personalization and customization, tailoring messages to specific audiences or individuals (Pariser 2011; Bucher 2012). Promoting the idea of metadata as traces of human behavior and of platforms as neutral facilitators seems squarely at odds with the well-known practices of data filtering and algorithmic manipulation for commercial or other reasons. (p. 200)

Data is not raw; it demands interpretation

The idea of (meta)data being “raw” resources waiting to be processed perfectly fits the popular life-mining metaphor. According to Mayer-Schoenberger and Cukier (2013), each single data set is likely to have some intrinsic, hidden, not yet unearthed value, and companies are engaged in a race to discover how to capture and rate this value. But as Lisa Gitelman aptly states, “raw data” is an oxymoron: “Data are not facts, they are ‘that which is given prior to argument’ given in order to provide a rhetorical basis. Data can be good or bad, better or worse, incomplete and insufficient” (Gitelman 2013: 7). Automated data extraction performed on huge piles of metadata generated by social media platforms reveals no more information about specific human behavior than large quantities of sea water yield information about pollution—unless you interpret these data using specific analytical methods guided by a focused query. (p. 201)

Dataism

The compelling logic of dataism is often fueled by the rhetoric of new frontiers in research, when large sets of unconsciously left data, never available before, are opening up new vistas. Dataism thrives on the assumption that gathering data happens outside any preset framework—as if Twitter facilitates microblogging just for the sake of generating “life” data—and data analysis happens without a preset purpose—as if data miners analyze those data just for the sake of accumulating knowledge about people’s behavior. It may not always be simple to identify in what context (meta)data are generated and for what purposes they are processed. And yet it is crucial to render hidden prerogatives explicit if researchers want to keep up users’ trust in the datafication paradigm. Trust is partly grounded in the persuasive logic of a dominant paradigm; for another part, though, faith resides with the institutions that carry the belief in Big Data. (p. 202)

The role of institutions in dataism

A second line of critical inquiry is leveled at the institutional structures that scaffold Big Data thinking. Data companies, government agencies and researchers alike underscore the importance of users’ trust in societies where growing parts of civilian life—from application procedures to medical records and financial transactions—are moved onto online platforms. Establishing and maintaining the system’s integrity is often assigned as a task to “the state”—whereas “the platforms” have to comply with the rules set by government agencies. When Mayer-Schoenberger and Cukier (2013) address the perils of metadata’s ubiquitous availability—i.e. profiling based on stereotypes, penalties based on propensities, surveillance based on association, a weakened right to privacy—they hold governments responsible for taking measures to avert these potential risks. The authors of Big Data call for a new “caste of big-data auditors we call algorithmists” to “secure a fair governance of information in the big-data era” (Mayer-Schoenberger and Cukier 2013: 184). Academics, too, count on national governments to regulate possibly adverse effects of datafication; but they also turn to data companies when they call for “trust and goodwill” from corporations and ask them to give users “transparency and control” over their information (Kosinski et al. 2013). In striving for trust and credibility, there is a presumed separation of public, corporate, and state institutions as autonomous bodies that each has a distinctive relationship with users—whether consumers or citizens. (p. 202-3)

The entanglement of public services, academia, and corporate technologies

And yet, if the Snowden files have taught us anything, it is probably that institutions gathering and processing Big Data are not organized apart from the agencies that have the political mandate to regulate them. In fact, all three apparatuses—corporate, academic, and state—are highly staked in getting unrestrained access to metadata as well as in the public’s acceptance of datafication as a leading paradigm. Scientists, government agencies and corporations, each for different reasons, have a vested interest in datafied relationships and in the development of methods that allow for prediction as well as manipulation of behavior. The aspirations of all agents to know, predict, and control human behavior overlap to some extent but differ on other accounts. Data firms want their platforms to be acknowledged as objective, standardized aggregators of metadata—better and more precise than the tools government agencies or academics use for measuring consumer sentiment, public health, or social movements.5 When government agencies and academics adopt commercial social media platforms as the gold standard for measuring social traffic, they in fact transfer the power over data-collection and interpretation from the public to the corporate sector. As boyd and Crawford (2012: 14) argue: “There is a deep government and industrial drive toward gathering and extracting maximal value from data, be it information that will lead to more targeted advertising, product design, traffic planning, or criminal policing.” (p. 203)

In this tripartite alignment of forces, government, academia, and data firms are interconnected at the level of personnel as well as through their exchange of innovative technologies, i.e. by co-developing data mining projects. In an article on the Snowden case for The New York Times, reporters Risen and Wingfield (2013) bare close connections between Silicon Valley and the N.S.A.: “Both hunt for ways to collect, analyze and exploit large pools of data about millions of Americans. The only difference is that the N.S.A. does it for intelligence, and Silicon Valley does it to make money.” Links between data firms and state intelligence agencies show how technical experts rotate jobs between academia and health industries, and move from data firms to financial services or intelligence agencies. The interests of corporations, academics, and state agencies converge in various ways. For instance, Skype and its owner Microsoft readily engaged with the C.I.A. on Project Chess aimed at making Skype calls useable to law enforcement officials. As Timothy Garton-Ash (2013) quipped in an op-ed in The Guardian: if Big Brother came back in the 21st century, “he would return as a private-public partnership.” (p. 203)

On the predictive capacity of Google vs. that of states

5: Google executives argue that Google search data can reveal trends a week or two earlier than official government statistics (Aspen Institute Report 2010). In addition, it is argued that Google Flu Trends is a better instrument to measure for emerging flu epidemics than national surveillance systems for influenza-like symptoms (Wilson et al. 2009). (p. 203)

Dataism presupposes trust in quantitative methods and in institutional independence and integrity

What is at issue here is not just an embrace of dataism as a technique of knowing social action—human behavior being measured, analyzed, and predicted on the basis of large sets of metadata—but also as a faith in high-tech companies’ and government agencies’ intention to protect user data from exploitation. Dataism presumes trust in the objectivity of quantified methods as well as in the independence and integrity of institutions deploying these methods—whether corporate platforms, government agencies, or academic researchers. Trust and independence, however, are embattled notions in an ecosystem of connectivity where all online platforms are inevitably interconnected, both on the level of infrastructure and on the level of operational logic (van Dijck 2013a; van Dijck and Poell 2013). When everything and everyone is connected through the same infrastructure and operates through the same logic—a view theorized by Foucault well before the advent of online technologies—[…] (p. 204)

For instance, the logic of predictive analytics appears to be corroborated by governments, researchers, and corporations alike. Google claims they are much better than state agencies in forecasting unemployment statistics or flu epidemics because their web crawlers can determine when an individual is about to start looking for a new job or starts seeking information about influenza. Facebook Likes can potentially predict which young mothers may be likely to malnourish their children—information which state health agencies may act upon. And the N.S.A. declares they have prevented at least fifty terrorist attacks due to the PRISM scheme, based on data culled from social media platforms and e-mail services. Problematic in these institutional forms of dataism is not only the fact that we lack insight in the algorithmic criteria used to define what counts as job seeking, dysfunctional motherhood, or terrorism. More questionably, the contexts in which the data were generated and processed—whether through commercial platforms or public institutions—all appear to be interchangeable. (p. 204)

Dataveillance vs. surveillance

Dataveillance—the monitoring of citizens on the basis of their online data—differs from surveillance on at least one important account: whereas surveillance presumes monitoring for specific purposes, dataveillance entails the continuous tracking of (meta)data for unstated preset purposes. Therefore, dataveillance goes well beyond the proposition of scrutinizing individuals as it penetrates every fiber of the social fabric (Andrejevic 2012: 86). Dataveillance is thus a far-reaching proposition with profound consequences for the social contract between corporate platforms and government agencies on the one hand and citizens-consumers on the other. Let’s look more closely at the distinctive role of each actor in this battle for credibility and trust. (p. 205)

The role of academia in the credibility of the dataveillance ecosystem

Responsibility for maintaining credibility of the ecosystem as a whole also resides with academics. The unbridled enthusiasm of many researchers for datafication as a neutral paradigm, reflecting a belief in an objective quantified understanding of the social, ought to be scrutinized more rigorously. Uncritical acceptance of datafication’s underpinning ideological and commercial premises may well undermine the integrity of academic research in the long run. To keep and maintain trust, Big Data researchers need to identify the partial perspectives from which data are analyzed; rather than maintaining claims to neutrality, they ought to account for the context in which data sets are generated and pair off quantitative methodologies with qualitative questions. Moreover, the viability and verifiability of predictive analytics as a scientific method deserves a lot more interdisciplinary enquiry, combining for instance computational, ethnographic and statistical approaches (Giglietto et al. 2012: 155). (p. 206)

The role of citizens

Meanwhile, as Edward Snowden’s unscrupulous actions show, there is an overarching significant actor in the fight for credibility that is often overlooked: users-citizens. When Snowden made the choice to go public with his inside information on N.S.A. dataveillance practices, he not only showed the power of an individual employee to unveil and unsettle a complex state-industrial-academic complex of forces. He also counted on the vigilance of many citizens—researchers, influential bloggers, journalists, lawyers and activists—to take public his concern about the structural flaws in the ecosystem that is currently developing. Over the past decade, the actual power of users-citizens vis-à-vis corporate platforms and the state has triggered substantial debate, albeit mostly in activist and academic circles. Some have found the ability of users to resist platforms’ privacy policies and surveillance tactics to be quite limited; individuals are steered by the technologies and business models of single platforms while it is extremely hard to gain insight in the system’s interdependence and complexity (Draper 2012; Hartzog and Selinger 2013; Mager 2012). Other researchers have argued in favor of strengthening digital (consumer) literacy particularly at the level of understanding privacy and security in relation to social data (Pierson 2012). And there is a growing mass of critical scholarship stressing the importance of users in baring how connective media are forging a new social contract on societies while refurbishing sociality and democracy in online environments (Langlois 2013; Lovink 2012). (p. 206)


  1. DIJCK, J. VAN. Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology. Surveillance & Society, v. 12, n. 2, p. 197–208, 9 May 2014.