AI Used to Curb Hate on Twitter Actually Punishes Black People

By N. Jamiyla Chisholm Aug 12, 2019

Tweets believed to have been written by Black users are flagged as sexist, hateful, harassing or abusive at higher rates than those thought to have been tweeted by White users, says a study released this month by researchers at Cornell University. The bias was so stark that in some cases the algorithms flagged tweets identified as Black speech more than twice as often as White-aligned tweets.

The researchers analyzed five datasets, totaling roughly 270,000 Twitter posts, in which each tweet had been labeled by humans as abusive language or hate speech. They then trained a machine learning model on each dataset to predict hateful or offensive speech and tested the results against a sixth database of more than 59 million tweets that included Census data, location identifiers and words associated with specific demographics. While a Twitter user’s race couldn’t be confirmed, tweets were classified as either “Black-aligned” or “White-aligned.” The researchers say the datasets likely aren’t the exact ones Twitter uses, but note that the consistency of the results suggests widespread racial bias, which they attribute to an oversampling of Black people’s tweets, implicit bias among annotators and poor training.
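To make the comparison concrete, here is a minimal sketch in Python of the general approach the researchers describe: train a classifier on human-annotated tweets, then compare how often it flags tweets from each dialect-aligned group. The file names, column names and model choice below are illustrative assumptions, not the study’s actual pipeline.

```python
# Minimal sketch (not the study's code) of measuring disparate flagging rates.
# Dataset files and column names are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical human-annotated training data: tweet text plus an "abusive" label (0/1).
train = pd.read_csv("annotated_tweets.csv")            # columns: text, abusive
vectorizer = TfidfVectorizer(min_df=5)
X = vectorizer.fit_transform(train["text"])
clf = LogisticRegression(max_iter=1000).fit(X, train["abusive"])

# Hypothetical evaluation set whose tweets a dialect model has tagged
# as "black_aligned" or "white_aligned".
eval_set = pd.read_csv("dialect_aligned_tweets.csv")   # columns: text, alignment
preds = clf.predict(vectorizer.transform(eval_set["text"]))

# Fraction of each group's tweets the classifier flags as abusive;
# a large gap between groups indicates the kind of bias the study reports.
rates = pd.Series(preds).groupby(eval_set["alignment"].values).mean()
print(rates)
```

Under this kind of setup, a higher flag rate for the Black-aligned group than for the White-aligned group on otherwise comparable tweets is the disparity the researchers measured.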

“These systems are being developed to identify language that’s used to target marginalized populations online,” the study’s co-author Thomas Davidson said in a statement issued shortly after the team presented the paper. “It’s extremely concerning if the same systems are themselves discriminating against the population they’re designed to protect.”

The researchers say that to fix this racial bias, the people who annotate the data these systems are trained on need better instruction on diverse language, and that companies should develop systems sensitive enough to account for different social and cultural contexts.

“When we as researchers, or the people we pay online to do crowdsourced annotation, look at these tweets and have to decide, ‘Is this hateful or not hateful?’ we may see language written in what linguists consider African American English and be more likely to think that it’s something that is offensive due to our own internal biases,” Davidson said. “We want people annotating data to be aware of the nuances of online speech and to be very careful in what they’re considering hate speech.”