Artificial Intelligence’s Human Biases

Artificial intelligence used to be science fiction, but since the middle of the 20th century, the technology has grown rapidly to permeate every sector of the economy. Facial recognition and natural language processing are prime examples of our society’s technological prowess, but these technologies also reinforce pre-existing societal biases through the over- or under-representation of specific populations in their training data and, consequently, in the people for whom they work. To build a more just society, we must recognize and combat this trend.


The MIT Technology Review defines artificial intelligence (AI) as “the quest to build machines that can reason, learn, and act intelligently” — abilities typically associated with human intelligence. While this technology has gone through many iterations, experts agree that today one of its pillars is data. Much of artificial intelligence is currently implemented with machine learning: “algorithms [that] use statistics to find patterns in massive amounts of data.” This approach proves extremely useful for complex tasks that lack clearly defined rules. Two prominent applications of machine learning are facial recognition technology — software that identifies individuals from images of their faces — and natural language processing — a branch of AI focused on computers’ ability to interpret and manipulate human language. While these technologies bring extraordinary possibilities, they also raise serious problems, among them the over- or under-representation of certain segments of the population in their training data.
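The idea that a model’s behavior is determined entirely by the statistics of its training data can be made concrete with a toy example. The sketch below is a deliberately minimal one-nearest-neighbor classifier; the data and labels are invented for illustration and bear no relation to any real system.

```python
# A minimal sketch of "statistics finding patterns in data": a toy
# one-nearest-neighbor classifier. Everything it "knows" comes from
# the (hypothetical) labeled examples it is given.

def nearest_neighbor(train, query):
    """Return the label of the training point closest to `query`."""
    return min(train, key=lambda pair: abs(pair[0] - query))[1]

# Hypothetical training data: (feature, label) pairs.
train = [(1.0, "cat"), (1.2, "cat"), (5.0, "dog"), (5.5, "dog")]

print(nearest_neighbor(train, 1.1))  # -> cat
print(nearest_neighbor(train, 5.2))  # -> dog
```

The point of the sketch is that there are no hand-written rules here: change the training pairs and the predictions change with them, which is exactly why skewed training data produces skewed behavior.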

Facial Recognition Software

Despite its many benefits, facial recognition unfairly disadvantages women and people of color. The software has been used productively in law enforcement, in casinos to help self-identified gambling addicts stay away, in hotels to help staff greet guests, at a college in the U.K. to take attendance, in supermarkets to prevent underage drinking, and to prevent identity theft and other types of fraud. Multiple governments have adopted such systems for targeted purposes; India, for instance, recently announced plans for a national facial recognition system to identify criminals and missing children.

However, studies have shown that facial recognition works less accurately for individuals with darker skin. In 2018, researchers from MIT and Stanford University tested three commercially available facial recognition programs on their ability to correctly classify gender. They found that while the programs misidentified lighter-skinned men only 0.8% of the time, they misidentified darker-skinned women up to 46.8% of the time. If law enforcement attempted to identify a criminal using this software, individuals with darker skin and women would therefore be more likely to be misidentified as culprits — accusations that can drastically alter individuals’ lives. Moreover, because of these biases, women of color will be unable to reap the benefits of facial recognition technology to the extent that white men will. This technology thus puts people of color and women at a disadvantage compared to their white and male counterparts.
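The kind of audit described above amounts to disaggregating a classifier’s error rate by demographic subgroup rather than reporting a single overall accuracy. The sketch below shows one way such a disaggregated evaluation might be computed; the records are invented for illustration, not data from the study.

```python
# A hedged sketch of a disaggregated accuracy audit: compute the
# error rate separately for each subgroup. The records below are
# hypothetical (group, was_prediction_correct) pairs.

from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, was_correct) pairs."""
    totals, errors = defaultdict(int), defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        if not correct:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

records = [
    ("lighter-skinned men", True), ("lighter-skinned men", True),
    ("darker-skinned women", False), ("darker-skinned women", True),
]
rates = error_rates_by_group(records)
print(rates)
```

A system can look excellent on aggregate accuracy while failing badly on a minority subgroup, which is why this per-group breakdown, and not the overall number, is what revealed the disparity.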

Moreover, facial recognition technology reinforces systemic racism already present in law enforcement. In 2017, the Pew Research Center reported that although African Americans constitute only 12% of the American population, they account for 30% of sentenced individuals. Additionally, the rise of the Black Lives Matter movement has shed light on the bias in some officers’ conduct; a 2018 article in the Journal of Epidemiology and Community Health reported that black and Native American individuals are disproportionately likely to lose their lives in police shootings. These disparities reflect the racial bias prevalent in our criminal justice system and mean that, because of their disproportionate incarceration rates, people of color constitute more than their share of criminal data. Law enforcement databases are biased because our system is unjust, so any technology relying on this data — the data from which machine learning processes learn — will perpetuate that bias. Research at Georgetown Law School suggests that when facial recognition software is used in law enforcement, African American individuals are the most likely to be selected as potential suspects because of the demographic’s disproportionate representation in law enforcement databases. The overrepresentation of people of color in prisons deepens our society’s inequality and ruins the lives of members of the overrepresented groups; by extending this bias, facial recognition exacerbates the systemic problem.

Natural Language Processing

A similar phenomenon exists in natural language processing. Applications of natural language processing include apps that help individuals with disabilities communicate or that convert sign language to text, machine translation systems such as Google Translate, and algorithms that help police predict and identify motives. However, these advantages exist only for certain populations. A study published by the Association for Computational Linguistics found that YouTube is more likely to autocaption female speakers incorrectly than male speakers. Thus, search systems — which sometimes rely on indexing text from videos and which users need in order to find information — will be less likely to surface information provided by women. Additionally, viewers who must rely on automatic captioning, such as the hearing-impaired, will have a harder time understanding what women have to say than what men have to say. Consequently, women are put at a disadvantage when it comes to communicating information.
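Caption-accuracy studies like the one above typically measure word error rate (WER): the word-level edit distance between a human reference transcript and the automatic caption, divided by the length of the reference. The sketch below implements the standard calculation; the transcripts are invented examples, not data from the study.

```python
# A sketch of word error rate (WER), the usual metric for comparing an
# automatic caption against a reference transcript. Computed as the
# word-level edit distance divided by the reference length.

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(word_error_rate("the quick brown fox", "the quick brown fox"))  # 0.0
print(word_error_rate("the quick brown fox", "a quick fox"))          # 0.5
```

Computing this metric separately for captions of male and female speakers is how a disparity like the one reported can be made visible and tracked over time.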

Similarly, these systems have more difficulty interpreting dialects of English — which tend to be spoken by minority populations — than what is considered standard English. A study conducted by a lab at the University of Massachusetts Amherst reports that tweets written in African American English, a dialect spoken largely by African Americans, are more likely to be classified as non-English than tweets written in standard English. Language identification is often the first step in a natural language processing pipeline; if certain dialects of English are not classified as English, data written in them will be excluded, erasing the representation of the populations that use those dialects.
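The mechanism of that exclusion is easy to see in miniature. The sketch below uses a deliberately crude stand-in for a language-identification step — a hypothetical detector that only recognizes a tiny “standard” vocabulary — to show how dialect text gets silently filtered out of a corpus before any downstream processing happens. The vocabulary, threshold, and example tweets are all invented for illustration.

```python
# A sketch of why misclassification at the language-identification
# step removes whole populations from downstream data. This detector
# is a hypothetical, deliberately crude stand-in for a real system.

def naive_english_detector(tweet):
    """Flags text as English only if most words are in a small
    'standard' vocabulary (both vocab and threshold are invented)."""
    standard_vocab = {"the", "is", "going", "to", "store", "he", "she"}
    words = tweet.lower().split()
    known = sum(w in standard_vocab for w in words)
    return known / len(words) >= 0.8

corpus = [
    "she is going to the store",   # standard English
    "he finna go to the store",    # African American English
]

# The filtering step that a real NLP pipeline performs first:
english_only = [t for t in corpus if naive_english_detector(t)]
print(english_only)  # only the standard-English tweet survives
```

Both tweets are English, but only one survives the filter; everything built on the filtered corpus then simply never sees the dialect or its speakers.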

Representing all members of our society is crucial to portraying an accurate image of who we are and to including everyone. Under-representing certain populations, or over-representing others in harmful contexts, suggests that some are not equal members of our society. To maintain a society with diverse perspectives and opinions — which are critical for innovation and growth — all members must be adequately represented; technology that does otherwise is destructive to our progress.


While these findings may make us wary of AI, steps can be taken to address the issue and improve the accuracy of this software. The first step is to increase transparency about how AI works. Artificial intelligence is often a black box; very few people know the details of how its processes work. If we better understand how this technology arrives at its conclusions, we will be better able to pinpoint the sources of bias in the data it relies on and to dispel the blind trust placed in it. Researchers must therefore focus on the explainability of artificial intelligence.

Moreover, an April 2019 study conducted by AI Now, a research group at New York University, suggests that the technology’s limitations stem from a lack of diversity not just in training data sets but also among the individuals who implement it. Over 80% of AI professors are male, and the gender gap in industry is even worse: women constitute only 15% of AI researchers at Facebook and 10% at Google. Furthermore, public data on employees who are trans or belong to other gender minorities does not exist, and only 2.5% of Google’s workforce is black, compared with 4% at Microsoft and Facebook. To combat these gender and race gaps, AI companies must promote diversity through a greater commitment to hiring and retaining underrepresented groups and by fostering inclusive workplace environments.

Finally, despite the evidence of data’s partiality, numbers and technology are generally assumed to be free of prejudice or bias. People trust statistics blindly, treat data-driven sources as unbiased by default, and thereby allow the continued use of biased technology built on biased data — which only perpetuates those biases at a greater rate. Organizations that create AI must therefore scrutinize the bias in their data, and we as a society must recognize the subjectivity and prejudice present in these artificial intelligence systems so that we do not blindly enforce the societal biases this software reflects.

The examples discussed above demonstrate that facial recognition and natural language processing software disadvantage people of color and women in terms of representation. The solution is not as simple as procuring more and better data. Data reflects society; because society is biased, the data is inherently biased, and any representative sample will carry the same bias into the software trained on it. To gather truly unbiased data, we would have to resolve the biases present in society itself — an exceedingly difficult, perhaps impossible, task — rather than continue building technology that rampantly enforces those biases. But if we acknowledge the biases in the data that artificial intelligence relies on, increase its transparency, and remedy the lack of representation in its workplaces, there is no reason this innovative technology must stop altogether; in the future, this software could be an asset not just for a select few, but for everyone.

(This article is up to date as of January 6, 2020.)