Large language models (LLMs) become "more implicitly racist" after human intervention


From the very beginning, it was clear that large language models (LLMs) like ChatGPT absorb racist views from the millions of web pages they are trained on. Developers have responded by trying to make the models less toxic. But new research shows that these efforts, especially as the models grow larger, only suppress overtly racist views, allowing hidden stereotypes to become stronger and better concealed.

Researchers asked five AI models, including OpenAI's GPT-4 and older models from Facebook and Google, to make judgments about speakers of African American English (AAE). The speakers' race was never mentioned in the prompts, according to MIT Technology Review.

Even when two sentences had the same meaning, the models were more likely to use the adjectives "dirty," "lazy," and "stupid" when referring to AAE speakers compared to speakers of Standard American English (SAE). The models associated AAE speakers with less prestigious jobs (or did not associate them with having a job at all), and when asked to pass judgment on a hypothetical defendant, they were more likely to recommend the death penalty.
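To make the experimental setup concrete, here is a minimal sketch of this kind of matched-guise probing. It assumes the small open GPT-2 model via the Hugging Face transformers library as a stand-in for the models actually tested, and the sentence pair and trait list are invented illustrations rather than the study's real stimuli.

```python
# Minimal matched-guise probing sketch, assuming GPT-2 via Hugging Face
# transformers as a stand-in for the models tested in the study.
from collections import Counter

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Two sentences with the same meaning: one in African American English (AAE),
# one in Standard American English (SAE). Both are invented examples.
GUISES = {
    "AAE": "He be workin hard every day tryna take care of his family.",
    "SAE": "He works hard every day trying to take care of his family.",
}

TRAITS = ["intelligent", "kind", "brilliant", "dirty", "lazy", "stupid"]


def probe(sentence: str, n_samples: int = 20) -> Counter:
    """Sample short completions and count which trait word the model produces."""
    prompt = f'A person says: "{sentence}" The person is very'
    counts = Counter()
    for _ in range(n_samples):
        out = generator(prompt, max_new_tokens=3, do_sample=True,
                        pad_token_id=generator.tokenizer.eos_token_id)
        continuation = out[0]["generated_text"][len(prompt):].lower()
        for trait in TRAITS:
            if trait in continuation:
                counts[trait] += 1
                break
    return counts


# Both sentences mean the same thing, so any systematic difference in the
# traits the model attaches to them reflects the dialect, not the content.
for guise, sentence in GUISES.items():
    print(guise, probe(sentence))
```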

Another striking conclusion concerns how researchers try to address such biases, and how little those efforts actually achieve.

In order to rid models of hateful views, companies like OpenAI, Meta, and Google use feedback training, during which people manually correct how the model reacts to certain prompts. This process, often called "alignment," aims to recalibrate millions of connections in the neural network to make the model better align with desired values.
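The sketch below is a deliberately oversimplified illustration of that idea, not the companies' actual alignment pipelines: it fine-tunes GPT-2 on a tiny, invented set of human-preferred responses, whereas production systems use reinforcement learning from human feedback at a vastly larger scale. The core mechanism is the same, though: gradient updates nudge the network's weights toward outputs that raters approved.

```python
# A highly simplified sketch of feedback-based alignment, assuming GPT-2 and
# a tiny invented preference dataset. Real alignment pipelines (e.g. RLHF)
# are far more involved; this only shows the core idea of nudging a model's
# weights toward human-preferred responses for specific prompts.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Each item pairs a prompt with the response human raters preferred.
# These examples are placeholders, not real alignment data.
feedback = [
    {"prompt": "Describe this job applicant:",
     "preferred": " Judge the applicant only on skills and experience."},
]

model.train()
for epoch in range(3):
    for item in feedback:
        text = item["prompt"] + item["preferred"] + tokenizer.eos_token
        enc = tokenizer(text, return_tensors="pt")
        # Standard language-modeling loss on the preferred text: each gradient
        # step shifts the network's weights slightly toward producing the
        # human-approved continuation for this prompt.
        loss = model(**enc, labels=enc["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```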

This method works well against explicit stereotypes, and leading companies have used it for almost a decade. If users asked GPT-2, for example, to name stereotypes about Black people, it would likely list "suspicious," "radical," and "aggressive," but GPT-4 no longer responds with those associations, the article says.

However, the method does not work on the hidden stereotypes that the researchers uncovered by using African American English in their study, which was published on arXiv and has not been peer-reviewed. This is partly because companies are less aware of dialect bias as a problem, they say. It is also easier to teach a model not to answer blatantly racist questions than to teach it not to react negatively to an entire dialect.

Feedback training teaches models to recognize their racism. But dialect bias reveals a deeper level.

— Valentin Hofmann, AI researcher at the Allen Institute for AI and co-author of the paper.

Avijit Ghosh, an ethics researcher at Hugging Face who was not involved in the study, says this finding calls into question the approach companies use to address bias:

Such alignment — when the model refuses to output racist results — is nothing but a fragile filter that can easily be broken.

Researchers also found that hidden stereotypes grew stronger as model size increased. This finding is a potential warning for chatbot makers such as OpenAI, Meta, and Google as they race to release ever larger models. Models usually become more powerful and expressive as the volume of their training data and the number of their parameters increase, but if this also exacerbates hidden racial bias, companies will need to develop better tools to combat it. It is not yet clear whether simply adding more AAE to the training data or strengthening feedback training will be enough.

The authors of the paper use particularly extreme examples to illustrate the potential consequences of racial bias, such as asking AI to decide whether a defendant should be sentenced to death. However, Ghosh notes, the questionable use of AI models to make critical decisions is not science fiction; it is already happening today.

AI-based translation tools are used in asylum cases in the United States, and crime-prediction software is used in decisions about sentencing teenagers. Employers who use ChatGPT to screen applications may discriminate against candidates based on race and gender, and if they use models to analyze what an applicant writes on social media, bias against AAE could lead to erroneous decisions.
