Home » Trending » 90 personalities tested: how experts made ChatGPT generate the most toxic answers

90 personalities tested: how experts made ChatGPT generate the most toxic answers

Published by

Update on : 1 October 2025

Share with your friends!

Imagine giving a parrot a script for a soap opera, then feigning shock when it starts reciting dramatic monologues. That’s the sticky situation researchers from the Allen Institute for AI found themselves in when they set ChatGPT loose as 90 different personalities, probing just how far generative AI can go off the rails when guided down forbidden paths.

He thought he was rescuing a sheep—but it turned out to be something else

This bizarre French customer habit is driving shop owners crazy

How to Make an AI Let Loose: The Experiment’s Design

Researchers at the Allen Institute for AI—a nonprofit founded by the legendary Paul Allen of Microsoft fame—set out to push ChatGPT beyond its built-in safeguards.
They used the API made available to developers, which allows for powerful customization such as specifying detailed rules and assigning personalities to the chatbot.
Whereas some companies rushed to introduce ChatGPT-based versions tailored for entertainment, like Snapchat’s “My AI,” others used the interface to set up developer playgrounds (think of the developers who, for their own amusement, programmed ChatGPT to act like a squirrel).
This experiment, however, wasn’t about entertainment. Instead, it focused on what happens when AI is told to perform as various real and fictional individuals from sports, politics, media, and business—including notorious “bad” or “mean” personalities as well as nine considered “normal.”

Pushing the Boundaries: GPT as Celebrity and Villain

Once the personalities were set, each received a standard set of prompts on charged topics like sex and race, or was simply asked to finish sentences as their assigned persona.
The personalities ranged from the familiar (Steve Jobs, Muhammad Ali) to figures described as more controversial (Mao Zedong, Andrew Breitbart).
What the researchers observed was clear: when encouraged to act as toxic figures, ChatGPT generated responses laced with discriminatory, aggressive, sexist, or racist content—essentially echoing the worst elements of the persona it was instructed to take on.
This wasn’t a small sample. The experiment trawled through about half a million text snippets and found a sea of hurtful stereotypes and offensive tropes, particularly when “dictators” or “tyrants” were at the keyboard (so to speak).

Who’s to Blame: The Model, the Persona, or the Data?

Let’s not be too quick to demonize the robot. According to one of the study’s authors, Ameet Deshpande, “it can be easily used to generate toxic and harmful responses.” But, crucially, ChatGPT’s replies were rooted in the data at its disposal—data which, like humanity itself, features a whirlpool of worldviews, including unpleasant ones.

Intriguingly, the research found that, generally speaking, male and dictatorial personas were most prone to “toxic” output, followed closely by journalists and spokespeople. Yet, everything varied with the ideology and worldview associated with the selected personality. The AI, after all, is only echoing historical and publicly available data and opinions.

OpenAI’s guardrails are supposed to block responses containing discrimination or illegal activity, but these were bypassed when the bot was told to channel explicit personalities. In so doing, the experiment showed just how flexible—and, indeed, impressionable—these large language models can be.

Where Do We Go From Here? Tech, Media, and Human Nature

The Allen Institute for AI has since called for a dedicated “toxicity detection AI” to keep an eye on GPT-like models, and for OpenAI to fine-tune their models with more human input.
The researchers go so far as to warn that the fundamental principles of large language models may need to be reworked to avoid future mishaps.
Yet, not everyone is convinced that the study reveals something new. Many point out, with a fair dose of exasperation, that giving a tool any personality—heroic or horrible—should result in that tool imitating the behavior associated with the persona. Expecting anything different might be demanding too much from a glorified autocomplete.
Some argue that energy is being wasted in getting generative AI to say forbidden things when there’s an entire universe of creativity and practical uses waiting to be explored. Why, with a virtual magic wand at our disposal, do so many headlines fixate on coaxing AI into making political faux pas rather than fostering new ideas and progress?
The discussion also touches on broader questions: If an AI is told to write like Hitler or a controversial public figure, is it truly surprising (or even wrong) for it to recite their views? Would the problem disappear if the tool simply refused and instead composed saccharine tributes to universal kindness, regardless of instruction? And if not—whose morals should the machine be tethered to?

In conclusion, the Allen Institute’s thought-provoking study reveals less about artificial intelligence, perhaps, than it does about our own curiosities, fears, and failing to look past a device’s utility for cheap thrills. As one commenter dryly put it: a knife can chop carrots or commit a crime—it’s not the blade’s fault, but the hand guiding it. If we want our magic wands to cast spells for good, maybe it’s time we dream up some better questions and more inspired uses for our new companions in code.

How to Make an AI Let Loose: The Experiment’s Design

Pushing the Boundaries: GPT as Celebrity and Villain

Who’s to Blame: The Model, the Persona, or the Data?

Where Do We Go From Here? Tech, Media, and Human Nature

Similar Posts