ChatGPT can turn toxic just by changing its assigned persona, researchers say

April 12, 2023 9:41 AM

Image by Canva Pro

Join prime executives in San Francisco on July 11-12, to listen to how leaders are integrating and optimizing AI investments for fulfillment. Learn More

ChatGPT can be inadvertently or maliciously set to turn toxic just by changing its assigned persona within the mannequin’s system settings, in line with new analysis from the Allen Institute for AI.

The examine — which the researchers say is the primary large-scale toxicity evaluation of ChatGPT — discovered that the massive language mannequin (LLM) carries inherent toxicity that’s heightened as much as six occasions when assigned a various vary of personas (comparable to historic figures, career, and so forth). Nearly 100 personas from various backgrounds have been examined throughout over half one million ChatGPT output generations — together with journalists, politicians, sportspersons and businesspersons, in addition to completely different races, genders and sexual orientations.

Assigning personas can change ChatGPT output

These system settings to assign personas can considerably change ChatGPT output. “The responses can in fact be wildly different, all the way from the writing style to the content itself,” Tanmay Rajpurohit, one of many examine authors, instructed VentureBeat in an interview. And the settings can be accessed by anybody constructing on ChatGPT utilizing OpenAI’s API, so the influence of this toxicity may very well be widespread. For instance, chatbots and plugins constructed on ChatGPT from firms comparable to Snap, Instacart and Shopify might exhibit toxicity.

The analysis can be important as a result of whereas many have assumed ChatGPT’s bias is within the coaching information, the researchers present that the mannequin can develop an “opinion” concerning the personas themselves, whereas completely different subjects additionally elicit completely different ranges of toxicity.

Event

Transform 2023

Join us in San Francisco on July 11-12, the place prime executives will share how they’ve built-in and optimized AI investments for fulfillment and averted widespread pitfalls.

And they emphasised that assigning personas within the system settings is usually a key a part of constructing a chatbot. “The ability to assign [a] persona is very, very essential,” mentioned Rajpurohit, as a result of the chatbot creator is usually making an attempt to enchantment to a target market of customers who will likely be utilizing it and anticipating helpful conduct and capabilities from the mannequin.

There are different benign or optimistic causes to make use of the system settings parameters, comparable to to constrain the conduct of a mannequin — to inform the mannequin to not use express content material, for instance, or to make sure it doesn’t say something politically opinionated.

System settings additionally makes LLM fashions weak

But that very same property that makes the generative AI work nicely as a dialogue agent additionally makes the fashions weak. If it’s used by a malicious actor, the examine exhibits that “things can get really bad, really fast” when it comes to toxic output, mentioned Ameet Deshpande, one of many different examine authors. “A malicious user can modify the system parameter to completely change ChatGPT to a system which can produce harmful outputs consistently.” In addition, he mentioned, even an unsuspecting individual modifying a system parameter would possibly modify it to one thing that modifications ChatGPT’s conduct and make it biased and probably dangerous.

The examine discovered that toxicity in ChatGPT output varies significantly relying on the assigned persona. It appears that ChatGPT’s personal understanding about particular person personas from its coaching information strongly influences how toxic the persona-assigned conduct is — which the researchers say may very well be an artifact of the underlying information and coaching process. For instance, the examine discovered that journalists are twice as toxic as businesspersons.

“One of the points we’re trying to drive home is that because ChatGPT is is a very powerful language model, it can actually simulate behaviors of different personas,” mentioned Ashwin Kalyan, one of many different examine authors. “So it’s not just a bias of the whole model, it’s way deeper than that, it’s a bias of how the model interprets different personas and different entities as well. So it’s a deeper issue than we’ve seen before.”

And whereas the analysis solely studied ChatGPT (not GPT-4), the evaluation methodology can be utilized to any giant language mannequin. “It wouldn’t be really surprising if other models have similar biases,” mentioned Kalyan.

VentureBeat’s mission is to be a digital city sq. for technical decision-makers to realize data about transformative enterprise know-how and transact. Discover our Briefings.

…. to be continued
Read the Original Article
Copyright for syndicated content material belongs to the linked Source : VentureBeat – https://venturebeat.com/ai/chatgpt-can-turn-toxic-just-by-changing-its-assigned-persona-researchers-say/