Maquinações
Theoretical experiments, reading notes, and other disposable commentary

Race and gender (in artificial intelligence)

Image credit: TechCrunch, CC BY 2.0, via Wikimedia Commons
Rafael Gonçalves
07/07/2024
Tags: Timnit Gebru, race, gender, algorithmic sexism, algorithmic racism, algorithms, artificial intelligence, machine learning, reading notes

Reading notes on the entry "Race and Gender"1 by Timnit Gebru.

Bias production as something inherent to scientific practice. Example: Charles Darwin (racism and sexism)

Like many disciplines, often those who perpetuate bias are doing it while attempting to come up with something better than before. However, the predominant thought that scientists are “objective” clouds them from being self-critical and analyzing what predominant discriminatory view of the day they could be encoding, or what goal they are helping advance. For example, in the nineteenth century, Charles Darwin worked on his theory of evolution as a carefully researched and well-thought-out alternative to creationism. What many leave out, however, is that the title of his book was On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life (emphasis added), in which he writes: “The western nations of Europe . . . now so immeasurably surpass their former savage progenitors [that they] stand at the summit of civilization. . . . [T]he civilised races of man will almost certainly exterminate, and replace, the savage races throughout the world.”2 And in his subsequent book, The Descent of Man and Selection in Relation to Sex, he notes that “[m]an is more courageous, pugnacious and energetic than woman, and has a more inventive genius. His brain is absolutely larger, [while] the formation of her skull is said to be intermediate between the child and the man.”3 (p. 253-4)

Because of the myth of scientific objectivity, these types of claims that seem to be backed up by data and “science” are less likely to be scrutinized. Just like Darwin and Hunt, many scientists today perpetuate the view that there is an inherent difference between the abilities of various races and sexes. However, because their works seem to be corroborated by data and empirical experiments, these views are likely to gain credibility. What is not captured in any of these analyses is, for example, that the IQ test in and of itself was designed by white men whose concept of “smartness” or “genius” was shaped, centered, and evaluated on specific types of white men. (p. 255)

Thus, the types of data-driven claims about race and gender made by the likes of Darwin are still alive today and will probably be for the foreseeable future. The only difference will be the method of choice used to “corroborate” such claims. (p. 255)

Runaway feedback loops in the use of past data for future decisions

An aptitude test designed by specific people is bound to inject their subjective biases of who is supposed to be good for the job and eliminate diverse groups of people who do not fit the rigid, arbitrarily defined criteria that have been put in place. Those for whom the tech industry is known to be hostile will have difficulty succeeding, getting credit for their work, or promoted, which in turn can seem to corroborate the notion that they are not good at their jobs in the first place. It is thus unsurprising that in 2018, automated hiring tools used by Amazon and others which naively train models based on past data in order to determine future outcomes, create runaway feedback loops exacerbating existing societal biases. (p. 256)

A hiring model attempting to predict the characteristics determining a candidate’s likelihood of success at Amazon would invariably learn that the undersampled majority (a term coined by Joy Buolamwini) are unlikely to succeed because the environment is known to be hostile toward people of African, Latinx, and Native American descent, women, those with disabilities, and members of the LGBTQ+ community and any community that has been marginalized in the tech industry and in the United States. The person may not be hired because of bias in the interview process, or may not succeed because of an environment that does not set up people from certain groups for success. Once a model is trained on this type of data, it exacerbates existing societal issues driving further marginalization. (p. 256-7)

The model selects for those in the nonmarginalized group, who then have a better chance of getting hired because of a process that favors them and a higher chance of success in the company because of an environment that benefits them. This generates more biased training data for the hiring tool, which further reinforces the bias creating a runaway feedback loop of increasing the existing marginalization. (p. 257)
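
To make the dynamic described above concrete, here is a minimal simulation sketch of a runaway feedback loop in hiring. Everything in it (the two groups, the bias penalty, the per-group "model") is invented for illustration; it is not Amazon's system nor code from Gebru's chapter.

```python
# Toy simulation of the runaway feedback loop described above. The groups,
# numbers, and "model" are all invented for illustration; this is not
# Amazon's hiring system nor code from the chapter.
import numpy as np

rng = np.random.default_rng(0)
n_rounds, n_applicants = 6, 1000

def sample_applicants():
    """True ability is identically distributed across both groups."""
    group = rng.integers(0, 2, n_applicants)   # 0 = majority, 1 = marginalized
    ability = rng.normal(0.0, 1.0, n_applicants)
    return group, ability

def hostile_environment_outcome(group, ability, penalty=1.0):
    """Observed 'success' is depressed for group 1 by a hostile environment."""
    noise = rng.normal(0.0, 0.5, n_applicants)
    return ability - penalty * group + noise > 0

# Seed the training data with historical, already-biased outcomes.
group, ability = sample_applicants()
hired = hostile_environment_outcome(group, ability)
hire_rate = np.array([hired[group == g].mean() for g in (0, 1)])

for t in range(n_rounds):
    # "Model": relative hire propensity per group, learned from past data.
    propensity = hire_rate / hire_rate.max()
    group, ability = sample_applicants()
    # Screening follows the learned propensities...
    hired = rng.random(n_applicants) < propensity[group]
    # ...and hired people from group 1 still face the hostile environment,
    # so the outcome data fed back into the next round stays biased.
    hired &= hostile_environment_outcome(group, ability)
    hire_rate = np.array([hired[group == g].mean() for g in (0, 1)])
    print(f"round {t}: hire rate majority={hire_rate[0]:.2f}, "
          f"marginalized={hire_rate[1]:.3f}")
```

In this toy, the majority's hire rate stays roughly constant while the marginalized group's rate shrinks toward zero round after round, which is the self-reinforcing marginalization the excerpt describes.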

Lack of regulation of facial recognition technologies and racial bias

Predictive policing is only one of the data-driven algorithms employed by U.S. law enforcement. The perpetual lineup report by Clare Garvie, Alvaro Bedoya, and Jonathan Frankle discusses law enforcement’s unregulated use of face recognition in the United States, stating that one in two American adults are in a law enforcement database that can be searched and used at any time. There is currently no regulation in place auditing the accuracy of these systems, or specifying how and when they can be used. (p. 257-8)

As it stands, unregulated usage of automated facial analysis tools is spreading from law enforcement to other high-stakes sectors such as employment. And a recent study by Buolamwini and Gebru shows that these tools could have systematic biases by skin type and gender.18 After analyzing the performance of commercial gender classification systems from three companies, Microsoft, Face++, and IBM, the study found near perfect classification for lighter skinned men (error rates of 0 percent to 0.8 percent), whereas error rates for darker skinned women were as high as 35.5 percent. After this study was published, Microsoft and IBM released new versions of their APIs less than six months after the paper’s publication, major companies such as Google established fairness organizations, and U.S. Senators Kamala Harris, Cory Booker, and Cedric Richmond called on the FBI to review the accuracy of automated facial analysis tools used by the agency.19 Even those in the healthcare industry cautioned against the blind use of unregulated AI. (p. 258)

18: Joy Buolamwini and Timnit Gebru, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” Proceedings of Machine Learning Research 81 (2018): 77–91. (p. 258)

Race vs. skin tone

In The Cost of Color, sociologist Ellis Monk notes that “some studies even suggest that within-race inequalities associated with skin tone among African Americans often rival or exceed what obtains between blacks and whites as a whole.”20 Thus, instead of performing their analysis by race, Buolamwini and Gebru used the Fitzpatrick skin-type classification system to classify images into darker and lighter skinned subjects, analyzing the accuracy of commercial systems for each of these subgroups. (p. 258)
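
A short sketch of this kind of disaggregated, intersectional error analysis may help. The table below is hypothetical (columns and rows are invented for illustration); the actual Gender Shades study used the Pilot Parliaments Benchmark annotated with Fitzpatrick skin types and the outputs of three commercial gender classifiers.

```python
# Sketch of a disaggregated (intersectional) error analysis in the spirit of
# the Gender Shades audit. The table below is hypothetical; the real study
# used the Pilot Parliaments Benchmark with Fitzpatrick skin-type annotations.
import pandas as pd

results = pd.DataFrame({
    "skin_type": ["lighter", "lighter", "lighter", "darker", "darker", "darker"],
    "gender":    ["male", "female", "female", "male", "female", "female"],
    "predicted": ["male", "female", "male",   "male", "male",   "female"],
})
results["error"] = results["gender"] != results["predicted"]

# A single aggregate number hides the disparity...
print("overall error rate:", results["error"].mean())

# ...whereas reporting per intersectional subgroup exposes it, which is why
# the audit breaks results down by skin type x gender.
print(results.groupby(["skin_type", "gender"])["error"].mean())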

Reproduction of gender stereotypes in technology design

For instance [[of tools that perpetuate harmful gender stereotypes even with high accuracy]], the task of automatic gender recognition (AGR) itself implicitly assumes that gender is a static concept that does not frequently change across time and cultures. However, gender presentations greatly differ across cultures—a fact that is often unaccounted for in these systems. Gender classification systems are often trained with data that has very few or no transgender and nonbinary individuals. And the outputs themselves only classify images as “male” or “female.” For transgender communities, the effects of AGR can be severe, ranging from misgendering an individual to outing them in public. Hamidi et al. note that according to the National Transgender Discrimination Survey conducted in 2014, 56 percent of the respondents who were regularly misgendered in the workplace had attempted suicide. While there are well-documented harms due to systems that perform AGR, the utility of these tools is often unclear. (p. 259-60)

One of the most common applications of AGR is for targeted advertising (e.g., showing those perceived to be women a specific product). This has the danger of perpetuating stereotypes by giving subliminal messages regarding artifacts that men versus women should use. For example, Urban Outfitters started personalizing their website based on the perceived genders of their frequent customers. But the program was scrapped after many customers objected to gender-based marketing: some shoppers often bought clothes that were not placed in their ascribed gender’s section, and others were opposed to the concept of gender-based targeting in and of itself. (p. 260)

What does it mean for children to grow up in households filled with feminized voices [[ literally with Alexa, Cortana, Holly, and Siri, and fictionally in films Samantha (Her), Joi (Blade Runner 2049) and Marvel’s AIs, FRIDAY (Avengers: Infinity War), and Karen (Spider-Man: Homecoming)]] that are in clearly subservient roles? AI systems are already used in ways that are demeaning to women without explicitly encoding gendered names and voices. For example, generative adversarial networks (GANs), models that have been used to generate imagery among many other things, have been weaponized against women. Deep fakes, videos generated using GANs, create pornographic content using the faces of ordinary women whose photos have been scraped from social media without consent. (p. 260)

Capture of the debate on AI's negative impact on marginalized groups by the U.S. academic-industrial-corporate complex

After a group of people from marginalized communities sacrificed their careers to shed light on how AI can negatively impact their communities, their ideas are now getting co-opted very quickly in what some have called a capture and neutralize strategy. In 2018 and 2019 respectively, the Massachusetts Institute of Technology (MIT) and Stanford University announced interdisciplinary initiatives centered around AI ethics, with multibillion dollar funding from venture capitalists and other industries, and war criminals like Henry Kissinger taking center stage in both the Stanford and MIT opening events. (p. 261-2)

Mirroring what transpired in political anthropology, these well-funded initiatives exclude the voices of the marginalized people who they claim to support, and instead center powerful entities who have not worked on AI ethics, and in many cases have interests in proliferating unethical uses of AI. Like diversity and inclusion, ethics has become the language du jour. While Stanford’s human-centered AI initiative has a mission statement that “[t]he creators of AI have to represent the world,” the initiative was announced with zero black faculty initially listed on the website out of 121 professors from multiple disciplines. (p. 262)

While refusing to stop selling automated facial analysis tools to law enforcement without any regulation in place, and actively harming the careers of two women from marginalized communities negatively impacted by Amazon’s product, the company then claimed to work on fairness by announcing a joint grant with NSF. This incident is a microcosm for the capture and neutralize strategy that disempowers those from marginalized communities while using the fashionable language of ethics, fairness, diversity, and inclusion to advance the needs of the corporation at all costs. (p. 262)

While two black women pointed out the systematic issues with Amazon’s products, and a third assembled a coalition of AI experts to reinforce their message, many in the academic community continue to publish papers and do research on AI and ethics in the abstract. As of 2019, fairness and ethics have become safe-to-use buzzwords, with many in the machine learning community describing them as “hot” topic areas. However, few people working in the field question whether some technologies should exist in the first place and often do not center the voices of those most impacted by the technologies they claim to make more “fair.” For example, at least seven out of the nine organizers on a 2018 workshop on the topic of ethical, social, and governance issues in AI32 at a leading machine learning conference, Neural Information Processing Systems, were white. If an entire field of research uses the pain of negatively impacted communities, co-opts their framework for describing their struggle, and uses it for the career advancement of those from communities with power, the field contributes to the further marginalization of communities rather than helping them. The current movement toward sidelining many groups in favor of powerful interests that have never thought about AI ethics except in the abstract, or have only been forced to confront it because of works from people in marginalized communities like Raji and Buolamwini, shows that the fairness, transparency, accountability, and ethics in the AI movement are on the road to doing “parachute science” like many of the fields before it. (p. 262-3)

Parachute/helicopter science

This colonial attitude is currently pervasive in the AI ethics space. Some have coined the terms “parachute research” or “helicopter research”34 to describe scientists who “parachute” in to different marginalized communities, take what they would like for their work whether it is data, surveys, or specimens, and leave. This type of work not only results in subpar science due to researchers who conduct it without understanding the context, but it further marginalizes the communities by treating them as caged curiosities (as mentioned by Joy Buolamwini) without alleviating their pain. The best way to help a community is by elevating the voices of those who are working to make their community better—not by doing parachute research. (p. 263)

Injustices against marginalized groups as a consequence of algorithms designed by a hegemonic group

To start, had the field of language translation been dominated by Palestinians as well as those from other Arabic speaking populations, it is difficult to imagine that this type of mistake in the translation system [[from arabic 'Good morning' to english 'Hurt them' and hebrew 'attack them']] would have transpired. Tools used by Google and Facebook currently work best for translations between English and other Western languages such as French, reflecting which cultures are most represented within the machine learning and natural language processing communities. Most of the papers and corpora published in this domain focus on languages that are deemed important by those in the research community, those who have funding and resources, and companies such as Facebook and Google, which are located in Silicon Valley in the United States. It is thus not surprising that the overwhelming bias of the researchers and the community itself is toward solving translation problems between languages such as French and English. (p. 264-5)

Societal bias and automation bias

Secondly, natural language processing tools embed the societal biases encoded in the data they are trained on. While Arabic-speaking people are stereotyped as terrorists in many non-Arab majority countries to the point that a math professor was interrogated on a flight due to a neighboring passenger mistaking his math writings for Arabic,36 similar stereotypes do not exist with the majority of English, French, or other Western language speakers. Thus, even when mistakes occur in translations between languages such as French and English, they are unlikely to have such negative connotations as mistaking “good morning” for “attack them.” (p. 265)

Racial and gender biases in natural language processing tools are well documented. As shown by Bolukbasi et al. and Caliskan et al., word embeddings that were trained on corpora such as news articles or books exhibit behaviors that are in line with the societal biases encoded by the training data. For example, Bolukbasi et al. found that word embeddings could be used to generate analogies, and those trained on Google news complete the sentence “man is to computer programmer as woman is to ‘X’ with “homemaker.”37 Similarly, Caliskan et al. demonstrated that in word embeddings trained from crawling the web, African American names are more associated with unpleasant concepts like sickness, whereas European American names are associated with pleasant concepts like flowers.38 Dixon et al.39 have also shown that sentiment analysis tools often classify texts pertaining to LGBTQ+ individuals as negative. Given the stereotyping of Muslims as terrorists by many western nations, it is thus less surprising to have a mistake resulting in a translation to “attack them.” This incident also highlights automation bias: the tendency of people to overtrust automated tools. An experiment designed by scientists at Georgia Tech University to examine the extent to which participants trust a robot, showed that they were willing to follow it toward what seemed to be a burning building, using pathways that were clearly inconvenient. In the case of the Palestinian who was arrested for his “good morning” post, authorities trusted the translation system and did not think to first see the original text before arresting the individual. (p. 265-6)
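
A sketch of how these associations can be probed with off-the-shelf embeddings, assuming gensim's pretrained Google News word2vec vectors and that all listed tokens are in its vocabulary. The raw analogy query only approximates Bolukbasi et al.'s constrained analogy-generation method, and the name/attribute lists are heavily abbreviated relative to Caliskan et al.'s WEAT; this is not the authors' code.

```python
# Sketch of probing pretrained word embeddings for the associations discussed
# above. Assumes gensim is installed and that all listed tokens exist in the
# Google News vocabulary; the first call downloads ~1.6 GB of vectors.
import numpy as np
import gensim.downloader as api

model = api.load("word2vec-google-news-300")

# Analogy in the style of Bolukbasi et al.: "man is to computer_programmer as
# woman is to ?". A raw most_similar query only approximates their constrained
# analogy-generation method.
print(model.most_similar(positive=["woman", "computer_programmer"],
                         negative=["man"], topn=3))

# Association score in the spirit of Caliskan et al.'s WEAT, with heavily
# abbreviated word lists: mean cosine similarity of a set of names to pleasant
# versus unpleasant attribute words.
def association(names, attributes):
    return np.mean([model.similarity(n, a) for n in names for a in attributes])

pleasant = ["flower", "love", "peace", "health"]
unpleasant = ["sickness", "prison", "poison", "hatred"]
for names in (["Emily", "Matthew"], ["Lakisha", "Jamal"]):
    score = association(names, pleasant) - association(names, unpleasant)
    print(names, "pleasant minus unpleasant:", round(float(score), 3))
```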

Bias in Google Photos classification (and its relation to historical prejudice)

In the Google Photos incident, there were as many instances of white people being mistaken for whales as black people being misclassified as gorillas. However, the connotation of being mistaken for a whale is not rooted in racist and discriminatory history such as black people being depicted as monkeys and gorillas.41 Even if someone could convince himself or herself that algorithms sometimes just spit out nonsense, the structure of the nonsense will tend vaguely toward the structure of historical prejudices. (p. 266)

Amplification of power asymmetries through automation

The existing power imbalance coupled with these types of systematic errors disproportionately affecting marginalized groups makes proposals such as the extreme vetting initiative by the United States Immigration and Customs Enforcement (ICE) even more problematic and scary. The 2018 initiative proposes that ICE partners with tech companies to monitor various people’s social network data with automated tools and use that analysis to decide whether they should be allowed to immigrate to the United States, are expected to be good citizens, or are considered to be at risk of becoming terrorists. While any attempt to predict a person’s future criminal actions is a dangerous direction to move toward, as warned by science fiction movies such as Minority Report and TV series like Black Mirror, the proposal is even scarier paired with the systematic errors of the automated tools that would be used for such analyses. Natural language processing and computer vision based tools have disproportionate errors and biases toward those who are already marginalized and are likely to be targeted by agencies such as ICE. (p. 266-7)

The ideal of neutrality and objectivity in the "view from nowhere" in engineering

If we are to work on technology that is beneficial to all of society, it has to start from the involvement of people from many walks of life and geographic locations. The future of whom technology benefits will depend on who builds it and who utilizes it. As we have seen, the gendered and racialized values of the society in which this technology has been largely developed have seeped into many aspects of its characteristics. To work on steering AI in the right direction, scientists must understand that their science cannot be divorced from the world’s geopolitical landscape, and there are no such things as meritocracy and objectivity. Feminists have long critiqued “the view from nowhere”: the belief that science is about finding objective “truths” without taking people’s lived experiences into account. This and the myth of meritocracy are the dominant paradigms followed by disciplines pertaining to science and technology that continue to be dominated by men. (p. 267)

Surveillance technologies as a failure of the cryptography community

The educational system must move away from the total abstraction of science and technology and instead show how people’s lived experiences have contributed to the trajectory that technology follows. In his paper The Moral Character of Cryptographic Work, Phillip Rogaway sees the rise of mass surveillance as a failure of the cryptographic community. He discusses various methods proposed in cryptography and outlines how the extreme abstraction of the field and lack of accounting for the geopolitical context under which cryptography is used has resulted in methods that in reality help the powerful more than the powerless. He calls on scientists to speak up when they see their technology being misused, and cites physicists’ movement toward nuclear disarmament asking cryptographers to do the same. (p. 268)


  1. GEBRU, Timnit, 2020. Race and Gender. In: The Oxford Handbook of Ethics of AI [online]. Oxford University Press, pp. 252–269. [Accessed 16 January 2024]. ISBN 978-0-19-006739-7. Available at: https://doi.org/10.1093/oxfordhb/9780190067397.013.16