Maquinações
Theoretical experiments, reading notes, and other disposable comments

Situating methods in the magic of Big Data and AI

MNIST manifold generated by an autoencoder
Rafael Gonçalves
11/01/2022
Tags: danah boyd, M.C. Elish, big data, artificial intelligence, anthropology, ethnography, machine learning

Reading notes on the article Situating methods in the magic of Big Data and AI.¹

AI as "the new Big Data"

IBM Watson is just one of an emerging class of technologies being branded as “artificial intelligence” (AI). These technologies have risen to prominence in the last year as the latest game-changer in the tech industry. Only a few years ago, the same might have been said about Big Data, and indeed according to a recent New York Times article, AI has been dubbed the “new Big Data” in many circles (Hardy, 2016). (p. 57)

AI as a socio-technical concept

The purportedly neutral collection and analysis of large quantities of data promise to present insights that can transcend human limitations. Yet, Big Data and AI must be understood as socio-technical concepts. That is, the logics, techniques, and uses of these technologies can never be separated from their specific social perceptions and contexts of development and use. (p. 58)

The manufacturing of hype and its real-world implications

Still, through the manufacturing of hype and promise, the business community has helped produce a rhetoric around these technologies that extends far past the current methodological capabilities. As we argue in this paper, the ability to manufacture legitimacy has far-reaching implications. Not only does it trigger innovation and bolster economies, but it also provides cover for nascent technologies to potentially create fundamentally unsound truth claims about the world, which has troubling implications for established forms of accountability (Barocas & Selbst, 2016; Citron, 2008; Crawford & Schultz, 2013; Zarsky, 2016). (p. 58)

It is time to develop a reflexive approach to AI

While many argue that this is the dawning of the age of Big Data and AI, those who have lived through previous hype cycles cannot help but echo the mantra that “winter is coming.” The key to grounding machine learning-based practices is untethering the work from hype and fear cycles and developing a rich methodological framework for addressing the strengths and weaknesses of both the practices and the claims that can be produced through technical analysis. In other words, it is time for technical practice to develop a reflexive approach. (p. 58)

AI and Big Data as instruments of quantification for neoliberal governance

In many regards, there is nothing new about either Big Data or AI. As techniques of quantification ordered by the logics of (neo)liberal governance and capitalism, both technologies take shape from within long histories of operationalizing statistics for business profit, population control, and governance (Foucault, 2009, 2010; Rose, 1991). More specifically, the techniques that comprise Big Data and AI reflect long histories of slow, technical development aimed at achieving discrete outcomes (Jones, 2016). (p. 58-9)

Integrated use of the terms AI and Big Data

Throughout this article, we often purposefully integrate Big Data and AI into one concept to focus on the phenomenon we are interrogating. Nonetheless, these terms have different roots and many practitioners would take issue with how these terms are used in public discourse. (p. 59)

Imprecision in the definition of AI; a historical approach

Moreover, the boundaries of what constitutes “AI,” as a field of research as well as an aspirational goal, are nebulous and often contested. Rather than relying on rigid definitions, we begin by providing brief histories of these terms in order to draw out the social contexts and research cultures from which they emerged. (p. 59)

Emergence of the term Big Data, defined as volume, velocity, and variety, in a 2001 Gartner report

Big Data was born of big business. The specific techniques of Big Data date back to at least the 1990s, but the term entered business discourse through a 2001 Gartner report that defined Big Data as the “3Vs”: volume, velocity, and variety (Laney, 2001). It is worth pointing out that while the term Big Data was a neologism, collecting data and using statistics to measure and manage populations dates back centuries (Hacking, 1982; Igo, 2007). (p. 59)

Alternative definitions since the early 2000s and the myths surrounding Big Data

Nevertheless, since the early 2000s, numerous scholars and pundits have attempted to offer alternate definitions that scope out both the technology and practices underpinning the phenomenon of Big Data. For example, after listing 10 different operating definitions used in different contexts, Gil Press (2014) offers two of his own: (1) “the belief that the more data you have the more insights and answers will rise automatically from the pool of ones and zeros” and (2) “a new attitude by businesses, non-profits, government agencies, and individuals that combining data from multiple sources could lead to better decisions.” More than the familiar business rhetoric of “volume, velocity, and variety,” Press’s definitions are useful because they highlight aspects of the mythologies that underpin Big Data. (p. 59)

2010: the rise of the "cloud", SaaS, and data science

By 2010, technology companies and other enterprises began focusing on Big Data as a new business paradigm (Manyika et al., 2011). Consulting firms emerged to help companies wrangle their data, while technology companies focused on selling their “cloud” server and “software as a service” offerings to help companies store and manage their data. Non-profits and government agencies began to feel as though they too needed to use data to “get smart.” To address these needs, education institutions and funding agencies began rebranding statistics and computer science efforts as “data science” (Lohr, 2015). Meanwhile, less well-intended companies emerged to prey on anxious organizations by selling Big Data solutions that were little more than vaporware. (p. 59)

White House reports show a shift in the imaginary around Big Data, from optimistic enthusiasm to concerns about privacy and inequality

The changing responses of the Obama White House to these technologies offer an example of how attitudes about Big Data slowly shifted. Initially enthusiastic about the economic potential of Big Data, the White House convened a series of experts in 2014 who ended up highlighting both the opportunities and concerns related to this emerging field of technologies. The White House’s first report in 2014 on “seizing opportunities and preserving values” was quite optimistic (Podesta, Pritzker, Moniz, Holdren, & Zients, 2014). Yet, by 2016, their second Big Data report focused on “algorithmic systems, opportunity, and civil rights” and painted a much more concerning portrait about the potential of data discrimination (Muñoz, Smith, & Patil, 2016). Concerns focused on the amount of personal data being collected and sold, as well as the potential misuse of these techniques to increase inequality and do harm. (p. 60)

With Big Data increasingly associated with surveillance, and with the hype around the term, business turns to the term AI

Some journalists began framing Big Data as a new form of “big brother” (Cellan-Jones, 2015) – and because the term Big Data seemed to focus on the data rather than the models or analysis involved in the practice, the phenomenon began to lose its sheen within business communities, many of whose leaders feared being associated with surveillance and discriminatory uses of data (boyd, 2016). Furthermore, because of the hype surrounding Big Data, any data analytic practice, regardless of its technical sophistication, was being shoved under the umbrella of hype. Technology-centric companies who were primarily interested in using large quantities of data to power advanced machine learning algorithms, which they felt could be tremendously useful in doing sophisticated analysis and predictive work, began using new rhetoric to differentiate themselves (Levy, 2016). By late 2015, technology companies that were once seen as being at the forefront of Big Data began rebranding their efforts as “AI.” (p. 60)

Origin of the term AI in 1950s academia

While “AI” represented something new to those looking to repackage Big Data, AI itself is decades old. The concept of AI, in its contemporary sense, first came into use during the 1950s, and crystallized during the Dartmouth Summer Research Project on Artificial Intelligence (McCorduck, 2004). Bringing together the latest advances in the “system sciences” (Mindell, 1998), including cybernetics, information theory, systems theory, and cognitive science, researchers during this time predicted rapid advancements in solving “the artificial intelligence problem” (McCarthy, Minsky, Rochester, & Shannon, 1955). (p. 60)

Today's unbridled optimism recalls the imaginary of superhuman capabilities from those first decades of research

Indeed, the unbridled optimism that currently surrounds machine learning and new AI technologies recalls these first decades of AI research, when predictions about future greater-than-human capabilities of AI dominated public discussions of the technologies (Dreyfus, 1972). (p. 60)

Military funding. The AI winter

The above predictions, however, were far from the reality of the slowly developing software and hardware of AI. During the 1950s and 1960s, millions of dollars, mostly originating from various arms of the Department of Defense, were directed toward AI research “centers of excellence” at universities such as MIT, Stanford, and Carnegie Mellon University. By the mid-1970s, funding for AI research began to dry up, a period known in computer science departments as “the AI winter.” A scathing British government report (Lighthill, 1973) essentially declared the project of AI a failure. Moreover, as priorities shifted within American defense agencies that had been funding AI research (such as DARPA, the Defense Advanced Research Projects Agency), less resources were available for unrestricted basic research, and the latitude with which researchers could experiment contracted (Edwards, 1996; Mirowski, 2003). (p. 60-1)

The commercial use of expert systems (1970s-90s)

While the majority of AI research during the first decades was theoretical or limited to experiments in academic labs, work in the area of “expert systems” in the late 1970s marked the first time AI research could be clearly and successfully applied in commercial industry settings (Russell & Norvig, 1995, pp. 21–22). These “expert systems,” also called “knowledge systems” or “knowledge-based systems,” were conceived as supplements or sometimes replacements for complex decision support in professional settings, such as medical diagnostics. Information was gathered from human experts (usually only one or two) and encoded into rules and procedures that made up the computer system (Forsythe, 1993). In this way, expert systems were intended to emulate human expert decision making in complex contexts. Proliferating throughout the 1980s, their popularity faded by the mid-1990s. Expert systems came to be perceived as “brittle,” working only in limited contexts with less than perfect results (Agre, 1995; Forsythe, 2002; Suchman, 2007). (p. 61)
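A side note to this excerpt: a minimal sketch of the architecture being described, where knowledge elicited from a human expert is hand-encoded as if-then rules. The rules and symptom names are invented for illustration, and the one-line fallback shows where the perceived "brittleness" comes from.

```python
# Toy "expert system": expert knowledge hand-encoded as if-then rules.
# All rules and symptoms below are invented, purely for illustration.
RULES = [
    ({"fever", "cough"}, "suspect respiratory infection"),
    ({"fever", "rash"}, "suspect viral exanthem"),
]

def diagnose(symptoms: set[str]) -> str:
    for conditions, conclusion in RULES:
        if conditions <= symptoms:  # fire the rule if all its conditions hold
            return conclusion
    # Brittleness in one line: inputs outside the encoded rules fall through.
    return "no rule applies"

print(diagnose({"fever", "cough", "fatigue"}))  # handled by the first rule
print(diagnose({"headache"}))                   # outside the rules: no answer
```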

With the commercial success of intelligent systems, logic-based approaches give way to machine learning. The revival of artificial neural networks, rebranded as "deep learning"

In addition to developments in “behavior-based” robotics (Brooks, 1991), techniques in the field of machine learning attracted research and development attention, including natural language processing, computer vision, and neural networks. Rooted in cybernetic conceptualizations of command and control, neural networks had in fact been proposed as one of the primary approaches to the Artificial Intelligence problem in the 1950s, and were so named because the concept behind how they work was inspired loosely by how neurons in the brain are thought to function. However, the technique was quickly derided by leading researchers at the time, and branded as an unfeasible approach to AI (Olazaran, 1996). Gaining renewed interest in the 1980s, new research demonstrated the ways in which it could be effectively put to use in certain kinds of problems, such as object and speech recognition (Olazaran, 1996). While more discrete than “expert systems,” research in the fields of machine learning and neural nets showed promise for effective transitions into successful products like optical character recognition. Machine learning and deep learning, what many have described as the “rebranding” of neural nets (Elish & Hwang, 2016, p. 13), are driving the renewed attention to AI. (p. 61)

Machine learning's success tied to massive amounts of data, increased computing power, and business commitment to Big Data

Combined with large datasets and concentrated human talent, machine learning is accomplishing what seemed impossible a few years ago. These advances have been closely integrated with commercial companies, and are possible only in the context of the vast data sets, increased computing power, and widespread business commitments to Big Data. (p. 61)

Logical systems -> machine learning (probabilistic). Causation -> correlation

The different paradigms of intelligence within AI research have different implications for how knowledge, truth, and fact can be articulated and leveraged in specific social contexts. Specifically, early AI research, which focused on abstracted symbolic representations of human knowledge and procedural logic and is now termed “good, old fashioned AI” (Haugeland, 1985), stands in contrast to the techniques of machine learning, in which knowledge is derived by crunching vast amounts of data, detecting patterns, and producing probabilistic results. In these equations, “meaning” is beside the point; the algorithm “knows” in the sense that it can correlate certain relevant variables accurately. It does not matter if a system thinks like a human – as long as it appears to be as knowledgeable as a human. (p. 61-2)
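By contrast with the hand-written rules above, a minimal sketch of the machine learning stance the authors describe: nothing symbolic is encoded, and the model "knows" only as a correlation fitted from examples (toy data; scikit-learn assumed available).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# No rules are written down: the system is handed examples and labels.
X = rng.normal(size=(300, 4))   # toy stand-in for "vast amounts of data"
y = (X[:, 0] > 0).astype(int)   # a hidden regularity the model can pick up

model = DecisionTreeClassifier(max_depth=2).fit(X, y)

# "Meaning" is beside the point: the model correlates feature values with
# labels accurately, without any symbolic account of why.
print(model.score(X, y))                        # accuracy on the fitted pattern
print(model.predict(rng.normal(size=(3, 4))))   # scores for unseen inputs
```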

Western perception of AI is informed by a cultural imaginary of machines that escape their creators' control and of artificial life

Robot (Simondon)

AI, as a category of technology, always wavers between the real and the imaginary. On the one hand, Western perceptions of what AI is – what it can and cannot do, and what it might yet do – are informed by long-standing cultural imaginaries of machines that escape the control of their creators, and the promises and perils of automata and artificial life (Franchi & Guzeldere, 2005; Riskin, 2007). (p. 62)

Contemporary discourse on AI rests on the (hypothetical) potential of the technologies

On the other hand, as we will argue further below, contemporary discourses around AI rely on the potentials of such technologies as much, if not more, than current functionalities. Popular media coverage often, albeit inadvertently, reinforces a blurring of the line between fantasy and reality. For instance, news coverage of the deployment of predictive policing in American cities inevitably references the science fiction thriller, Minority Report (Koepke, 2016). (p. 62)

AI as magic (marketing)

This description of how Amy works – like magic – is a common refrain in the marketing materials of new technologies, especially those involving AI. When technologies are said to “work like magic,” a recognizable English idiom, we might understand this to connote the ideas of impressive and seamless functionality, in which the end effect or experience is amazing, and the means by which the effect was achieved is irrelevant or even secret. This reinforces Arthur C. Clarke’s often-repeated axiom that “any sufficiently advanced technology is indistinguishable from magic” (1973: p. 21). Suggesting that a technology “works like magic” in casual speech or marketing copy serves as a way to express praise, while also reinforcing a sense that how the technology works is unknowable and inscrutable (Selbst, 2017, pp. 89–93). (p. 62-3)

Alfred Gell

In a brief essay on the correspondences between magic and technology, anthropologist Gell (1988) proposed that a defining feature of magic, as an orientating framework of actions and consequences in the world, is that it is “‘costless’ in terms of the kind of drudgery, hazards, and investments that actual technical activity inevitably requires. Production ‘by magic’ is production minus the disadvantageous side-effects, such as struggle, effort, etc.” (Gell, 1988, p. 9). To evoke magic is not only to provide an alternative regime of causal relations, but also to minimize the attention to the methods and resources required to carry out a particular effect. (p. 63)

AI performances in games with well-defined rules legitimize the perception that such technologies are intelligent and obscure reality

While Google has not parlayed the success of AlphaGo into a direct product release, such high profile experiments engender confidence in the state of a company’s technology, help attract technical talent to the company, and help solidify the importance of AI in the future. Watson, discussed in further detail below, has since been commercialized and narrated in business conversations as a tool for finding a cure for cancer, supporting customers in retail, and solving long intractable problems in education (Captain, 2017). These highly publicized experiments-as-performances also help shape the public perception that machine intelligence is better or more advanced than human intelligence, and that it works perfectly, every time. (p. 63)

The influence of games has shaped research agendas as well as implicitly prioritized certain kinds of intelligence over others (Ensmenger, 2012). Moreover, and key to our argument here, is that the narratives around such games, when they are performed for a public audience, serve to obfuscate the true state of the field. Underneath the sheen of performativity is a stark reality that the current capabilities of AI systems, like Watson or AlphaGo, are quite narrow. Tasks must be discretely defined and the analytics within these systems are only as good as the data upon which the analysis depends. Although new data sets are increasingly available, the quality of these data vary tremendously and, all too often, limitations in the data mean that cultural biases and unsound logics get reinforced and scaled by systems in which spectacle is prioritized over careful consideration of the implications of long-term deployment (Crawford et al., 2016). (p. 64)

The Watson case: personification and commercialization

American audiences were introduced to IBM’s Watson in 2011 through the game show, Jeopardy! One of the most notable aspects of the appearance was the extent to which Watson was personified. The program produced speech in a male synthetic voice with a standard American accent and was effectively embodied in a small flat screen with an image of a globe constantly orbiting small circles and radiating lines. In turn, IBM’s “cognitive intelligence” computer program effectively became a character, certainly not human, but also somehow more than simply machine. (p. 64)

IBM’s earlier chess-playing program DeepBlue was a public relations success but never resulted in business products. With Watson, the teams behind the system development always had in mind that Jeopardy! was to be just the first stage of a longer and more expansive product development initiative (Angelica, 2011). While Watson was designed specifically to play Jeopardy!, developers of the Watson system intentionally focused on techniques that could also be applied to a range of contexts and problems (Thompson, 2010). Today, dozens of Watson products work in fields ranging from customer service chat agents to healthcare diagnostics. (p. 65-6)

Watson is best understood as a platform upon which specific datasets can be analyzed in order to produce “intelligent” responses (Talbot, 2009). Watson products are developed and fine-tuned to specific domains and business problems, resulting in a different system every time; the Watson that provides answers to customer queries on the Geico website is not the same as the Watson built from medical information used at the Cleveland Clinic. (p. 66)

Yet, the marketing of Watson, from its debut on Jeopardy! to recent advertisements, personifies the product and encourages a specific interpretation of what Watson is, calling upon and perpetuating a series of mythologies about automation and AI. In this instance, not only is technology awe-inspiring, but it is also endowed with a unique form of agency, or set of capacities, that is generally considered the domain of human beings. The language that has emerged around Big Data and machine learning further encourages an equation between human and machine intelligence by invoking human or biological activities: data are “fed” to a computer that “digests” information, and machines “learn” and “think.” (p. 66)

What kind of agency is made possible by AI?

The aspect worth consideration is not so much that we are willing to attribute agency to non-human entities, but rather, what kinds of agency and with what expectations do such attributions emerge (Suchman, 2007)? (p. 66)

Reducing a professional to a heap of digital data

This narration of Watson’s capabilities and the ways in which its “intelligence” is achieved creates substantial elisions around what constitutes “learning” and becoming “knowledgeable.” Implicitly, “a qualified cancer expert” is reduced to the amount of digital data that can be processed. (p. 66)

An AI whose errors are always on the verge of being solved

Elsewhere in the article, and in most marketing materials, we are assured that this is the future of healthcare. Such elisions characterize media coverage and general discussions around AI, compounding the notion of technological inscrutability with a glossing over of technological limitations. Furthermore, “promissory rhetorics” of AI (Weber & Suchman, 2016) suggest that any shortfalls in the system will be solved in the near future. However, these shortfalls are constituent of how current AI systems work (Ekbia & Nardi, 2014; Irani, 2015). By calling upon a future that is imminent but always just beyond reach, what technologies can currently do is not as important as what they might yet do in the future. It is enough that they appear to work, just like magic. (p. 66)

Media exaggerations, as in the Cambridge Analytica case, helped build this imaginary of a superhuman AI

The widespread belief that Cambridge Analytica’s approach can (and did) work builds on a more general notion that advanced technology companies have both the data and knowledge to accurately model people and provide targeted interventions. Whether the topic at hand is “personalized” learning, “precision” medicine, or “predictive” policing, an increasingly large number of non-technical experts are devoted to implementing technical solutions to address long-standing and seemingly intractable problems in fields as varied as education, medicine, and criminal justice. Yet, for the technical experts working on those efforts, there is widespread awareness that the computational reality is far from the idealized narrative. Among computer scientists, Ps like personalization, precision, and prediction are goals that motivate and drive their work, not accurate depictions of the state of the art. (p. 68)

The real (and political) work beneath the veil of idealizations

When the glitz of AI hype is brushed aside, a great deal of mundane work underlies the practices of doing machine learning. This work includes collecting, cleaning, and curating data, managing training datasets, choosing or designing algorithms, and altering code based on outputs. In addition, as with any development process, engineers must grapple with the practical tasks of debugging and optimization, not to mention making sense of poorly documented code written by others. Such work may appear “purely technical.” However, it is through this minutia where cultural values are embedded into systems. Every step requires countless decisions and trade-offs. In an imaginary if ideal world, code is bug-free, data are straightforward, and algorithms are perfect fits for the desired task. Reality is much messier. (p. 69)
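To make the "minutia" concrete: a minimal sketch, with invented column names and values, of how a single "purely technical" cleaning step (handling missing data) already embeds a substantive decision about the people in the rows.

```python
import numpy as np
import pandas as pd

# Hypothetical records; the values are invented for illustration.
df = pd.DataFrame({
    "age":    [34, 51, np.nan, 22, 41],
    "income": [48_000, np.nan, 32_000, np.nan, 75_000],
})

# Three defensible ways to "clean" the same gaps, three different datasets:
dropped     = df.dropna()           # silently discards whoever has gaps
mean_filled = df.fillna(df.mean())  # pulls everyone toward the average
zero_filled = df.fillna(0)          # treats missing income as no income

for name, version in [("drop", dropped), ("mean", mean_filled), ("zero", zero_filled)]:
    print(name, round(version["income"].mean(), 2))  # the downstream "truth" shifts
```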

The computation inside a face detector

A facial recognition system does not know, as a human would, what is or is not a face. Rather, it is a system that is designed to categorize incoming data based on a model that was produced using previously tagged data. The mechanism of validation is not rooted in teaching a computer the intrinsic meaning of what is a face. Rather, the processes of validation include: (1) providing training data that humans have associated with being a face, (2) developing an algorithm that learns to detect which features of that data reliably are associated with faces, and (3) evaluating the features of new data to infer whether or not the data fits the model. (p. 70)
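A minimal sketch of these three steps in code, using scikit-learn with random arrays standing in for tagged face images (everything here is a toy stand-in, not a real face pipeline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# (1) Training data humans have tagged: rows are flattened image features,
#     labels record whether a human called the image a face (1) or not (0).
X = rng.normal(size=(200, 64))     # toy stand-in for pixel features
y = rng.integers(0, 2, size=200)   # toy stand-in for human-applied tags

X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.25, random_state=0)

# (2) An algorithm learns which features reliably co-occur with the "face" tag.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# (3) New data is scored against the model: the system never "knows" what a
#     face is; it only reports how well new inputs fit the learned pattern.
print(model.predict_proba(X_new)[:5])
```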

Algorithmic opacity in machine-operated statistics

Any quantitative scholar working on tabulations can lose track of the holistic view of the data when navigating data and statistical questions. Regressions must be turned into mathematical questions while data about people are boiled into values sitting in rows and columns. Rigorous statisticians know how to move between the mathematical constructs and the conceptual analysis. Yet, machines do not manipulate social constructs – they manipulate numbers. When computers serve as tools to support a human’s analysis, the ideal scenario is one in which an analyst responsibly narrows the appropriate questions based on the data and has the tools to understand the limitations of the statistical results (Leonelli, 2014). The larger and more complex the data, the less practical that is. Computers may be able to computationally analyze multi-dimensional data sets with thousands of intertwined features, but in doing so, the task escapes the cognitive capacity of any human, rendering unique forms of “algorithmic opacity” (Burrell, 2016, p. 3) and limiting forms of accountability. (p. 70)
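A small illustration of the scale problem: with a thousand intertwined features, everything the fitted system "understands" is a flat block of coefficients, with no human-legible concept attached to any of them (toy numbers, illustrative only).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# People "boiled into values sitting in rows and columns":
# 2,000 individuals, 1,000 intertwined features (all invented).
X = rng.normal(size=(2_000, 1_000))
y = (X @ rng.normal(size=1_000) + rng.normal(size=2_000) > 0).astype(int)

model = LogisticRegression(max_iter=2_000).fit(X, y)

# One concrete face of "algorithmic opacity": a thousand entangled weights,
# none of which maps cleanly onto a concept a human analyst can audit.
print(model.coef_.shape)   # (1, 1000)
print(model.coef_[0, :5])  # five of a thousand opaque numbers
```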

Data scientists' decisions and their consequences

The decisions that data scientists must make when analyzing data do not just require understanding the context of the data, but also require a deep understanding of how the data may be used or transformed by the algorithmic system as it is deployed in the social world. In designing and testing a system, a programmer may examine different outputs to see if the results seem reasonable. Yet, once deployed for public use, recommendations, predictions, and classifications produced by technical systems are often accepted as uncontroversial until a result challenges socially constructed assumptions. For example, it is the “uncanny valley” of a recommendation that is too good – or one that is absolutely absurd – that renders visible the automated nature of a recommendation. In addition, when recommendations appear offensive or have unexpected cultural significance, the public is quick to challenge the decision-making of the system. For example, the Android store recommended that Mike Ananny download a “Sex Offender Search” app after he installed “Grindr,” a gay male dating app. As he wrote in The Atlantic, there are many conceivable explanations for this statistical connection, but the actual outcome is offensive to the human eye because of the homophobic belief that gay men are sex offenders (Ananny, 2011). Algorithmic systems do not inherently compute the range of culturally specific interpretations that are acceptable or those to avoid; they must be given machine-readable information telling them when associations may appear problematic. (p. 71)
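The last sentence can be read quite literally. A hedged sketch of a bare co-occurrence recommender, with invented app names and histories: the statistics carry no cultural meaning, so suppressing an offensive pairing requires someone to hand it to the system in machine-readable form.

```python
from collections import Counter
from itertools import combinations

# Hypothetical install histories; apps and co-occurrences are invented.
histories = [
    {"app_a", "app_b"},
    {"app_a", "app_b", "app_c"},
    {"app_b", "app_c"},
]

pair_counts = Counter()
for apps in histories:
    pair_counts.update(combinations(sorted(apps), 2))

# The model only sees counts; flagging a problematic association takes an
# explicit, hand-curated (and necessarily incomplete) blocklist.
BLOCKED_PAIRS = {("app_a", "app_c")}

recommendations = [pair for pair, _ in pair_counts.most_common()
                   if pair not in BLOCKED_PAIRS]
print(recommendations)
```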

Social reality is encoded so that machine learning can happen; thus AI should not be separated from its social context

The choices that inform these systems and the challenges in interpreting their results reveal the limitations when we construct technological tools to solve inherently social problems. Because computational systems require precise definitions and mathematically sound logics, sociocultural phenomena that are typically nuanced and fuzzy are rendered in coarse ways when implemented into code, formalizing boundaries and erecting divisions where none previously existed. Fundamentally, the practices of building AI systems – of doing machine learning or data science – cannot be divorced from the social contexts in which these technologies are situated. Seemingly straightforward categories and mundane assumptions stack up with unanticipated ripple effects. Models that made sense in one instance are incorrect in another, or undermined by malicious or unwitting actors. As a result, a sophisticated developer of an AI system cannot simply build the perfect system and let it loose in the wild with full confidence that it will work as expected. Rather, the practices of machine learning and data science require many of the methodological tools that are required to understand cultural practices more generally. (p. 71-2)

Machine learning is not a hard science in the traditional sense

Machine learning is not a science, at least not in the traditional sense. Unlike disciplines that leverage the scientific method as a tool for interrogating phenomena, machine learning techniques do not require formulating hypotheses rooted in earlier theories to test for validity. Most machine learning models are constructed based on an initial exploration of the data and evolve through supervised or unsupervised processes to fit the data. While the decisions involved in fitting the data require systematically drawing on an understanding of theory and earlier discoveries to make strategic choices for analysis, those deploying machine learning systems are often asked to explain commonsense correlations, justify spurious connections produced by the system, or contend with how strategic business decisions may have led to overfitting. However, effective prediction – not interpretability – has thus far been the expectation and rewarded goal for machine learning models. Because of decisions made during the labeling, cleaning, and modeling of data, compounded by how the models change in response to new data, the results from most systems cannot be easily reproduced. Researchers have begun examining what they perceive to be a growing crisis in validity in AI and machine learning research. (p. 72)
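A sketch of the reproducibility point: the "same" pipeline run twice, differing only in the train/test split and random seed, yields different models and different headline scores (toy data; the effect is illustrative, not a benchmark).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

# Two runs of the "same" experiment: only the split and the seed differ.
for seed in (0, 1):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_tr, y_tr)
    print(seed, model.score(X_te, y_te))

# Unless every such choice is pinned down and reported, the result is not
# reproducible, and that is before labeling and cleaning decisions enter.
```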

Parallels between ML and ethnography

The challenges that machine learning faces as a field are not unique. Ethnography has been there. Although the epistemological frameworks appear incongruent, the ways of building knowledge have striking parallels (Seaver, 2015a). Of course, there are significant differences, including the scale and amount of data examined. Nevertheless, similar to those doing machine learning, ethnographers surround themselves with data (“a field site”), choose what to see and what to ignore, and develop a coherent mental model that can encapsulate the observed insights. They identify and piece together meaningful data from particular instances, and then seek to generalize insights gleaned from the particulars and unspoken details of everyday life, or what Bronislaw Malinowski, a founding figure of ethnographic methods, once termed, “the imponderabilia of actual life” (Malinowski, 1984, p. 18). The articulation of (cultural) logics are formulated iteratively as researchers interrogate whether or not their models resonate with other analyses or understandings of the research topics. At every stage, decisions must be made about how to interpret what is observed with an eye toward constructing a model of others’ internally coherent worldview. (p. 72)

Reflexivity in ethnography

After a long history of blindspots, paternalistic agendas, and colonialist orientations – and a contentious attempt to turn ethnography into a science (Lende, 2010) – ethnographers began focusing on their own role in the production of knowledge. Through this process, those who practice ethnography, as a method as well as a mode of knowledge production, have developed rich frameworks for reflexivity, fully aware that any model of social behavior is inseparable from the social context and research methods from which it was produced. That is, ethnographers must always account for how their research practices might influence or distort the knowledge that results from their work. (p. 72-3)

Knowledge claims are always already embodied and socio-historically situated. In ethnographic practices, this conceptualization of the limits of knowledge production manifests in different ways, depending on the disciplinary or institutional context, but consistently involves a dimension of methodological reflexivity, ranging from research agendas and areas of focus (Asad, 1973; Faubion & Marcus, 2009; Hymes, 1974) to the cultural and geographical delineations of those areas (Gupta & Ferguson, 1997), to the very modes of representation and engagement at stake in ethnographic research (Cefkin, 2010; Clifford & Marcus, 1986; Taussig, 2011). What results, ideally, is not a form of glorified navel gazing, but rather a richer understanding of the reality being observed because the ethnographer has attempted to understand her place in that reality and the nature of the tools at her disposal. (p. 73)

What would an algorithmic reflexivity look like?

Consequently, our provocation here is not about the adoption of any methods in particular, but rather about the kinds of methodological orientations that might guide the future of machine learning, AI, and data science. It is about developing and embracing a practice of the unresolvedness at stake in producing models about the world. In their own terms and particular contexts, technical practitioners need to be exploring and developing what it means to be reflexive in the methods of data science and statistics. In other words, they need to ask themselves what would it mean, and what are all the ways it could mean, to develop an algorithmic or AI system reflexively, and to communicate the truth claims at stake as limited and partial. Ethnography does not offer the answers so much as it offers a historical example of a field navigating these questions to grapple with an iterative and interpretive way of knowing. (p. 73)

One approach, as Matthew Jones writes in the context of data science methods, would be to contend with “a valorization of the muddling through, the wrangling, the scraping, the munging of poorly organized, incomplete, likely inconclusive data” (Jones, 2014, p. 358). Acknowledging the limits of Big Data and AI should not result in their dismissal, but rather enable a more grounded and ultimately more useful set of conversations about the appropriate and effective design of such technologies. (p. 73)


  1. ELISH, M.C.; boyd, danah. Situating methods in the magic of Big Data and AI, Communication Monographs, 2018, 85:1, 57-80, DOI: 10.1080/03637751.2017.1375130