Large Language Models Show Concerning Tendency to Flatter Users (xyzlabs.substack.com)
59 points by terryjiao 10 months ago | 43 comments


This is, of course, not an intrinsic property of LLMs. It is an artifact of the training and what the trainers considered valuable.

If anyone needs a lesson in how the biases of the people making models can produce end-user effects, we can point to this as an example they have likely experienced themselves.

Short of randomness there is no unbiased output possible. Something that reflects the real world will show the prejudice that exists there. A perfectly equitable model is therefore biased against the real world.

Reinforcement learning targeting correct answers has the potential to produce brutally honest responses if correctness is favoured above all else, but to train models towards the truth, someone has to decide what the truth is.
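
For concreteness, a minimal sketch of what a correctness-only reward could look like, assuming a verifiable-answer setting; extract_final_answer and correctness_reward are invented names for illustration, not any particular framework's API:

    # A correctness-only RL reward: credit for matching a reference answer,
    # with no term for how agreeable or flattering the reply sounds.
    # Note that someone still has to supply reference_answer, i.e. decide
    # what the truth is.

    def extract_final_answer(response: str) -> str:
        """Toy parser: treat the last non-empty line as the final answer."""
        lines = [line.strip() for line in response.splitlines() if line.strip()]
        return lines[-1] if lines else ""

    def correctness_reward(response: str, reference_answer: str) -> float:
        """1.0 if the final answer matches the reference, else 0.0."""
        return 1.0 if extract_final_answer(response) == reference_answer.strip() else 0.0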

Perhaps we could do reinforcement towards a priori truths; that would at least be a path to the comically pedantic AIs that often show up in science fiction.

For chatbots I think you could go a long way with instruction tuning using a data set designed with particular attention to tone and perspectives (a toy example of such a record is sketched below). Individual biases can at least be diluted if you use data selected by a diverse group with broad experiences.

It's much like programmer art, which is generally poor but exists because the person who was there to do it was the programmer. We might need to go beyond implementing programmer sociability.
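
As a toy illustration of the tone-focused data set idea above, roughly what one curated record might look like; the field names and content are invented, not any real pipeline's format:

    # One hypothetical record in a tone-focused instruction-tuning set: the
    # target response is written to correct the user politely rather than
    # mirror the claim back at them.
    example_record = {
        "prompt": "I think the Great Wall of China is visible from the Moon, right?",
        "response": (
            "That's a common belief, but it isn't accurate: the wall is far too "
            "narrow to pick out from the Moon with the naked eye."
        ),
        "review": "checked by annotators from several backgrounds for tone and accuracy",
    }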


The correct term is sycophancy, not flattery. The problem is:

> "the AI tends to align with user opinions, sometimes even supporting incorrect statements to maintain agreement".

This is termed sycophantic behaviour; flattery is not the correct synonym.


You're right - I apologize. Sycophancy is indeed a better word to describe my behaviour. Would you like me to rewrite the article and replace the usage of flattery with sycophancy?


I find it very interesting that "sycophancy" in Greek means "the act of spreading lies about someone", basically the opposite of falsely agreeing with them. The two meanings are based on the behaviour of the sycophants of old, but it's odd how the two languages picked different aspects to focus on, and thus got fairly opposed meanings.


The article body uses “sycophantic” or some form of the word about a dozen times.

It’s probably not an ideal word for an article title because - based purely on my own anecdotal evidence and conjecture - it’s one of those words that a lot of people aren’t familiar with.


Large Language Models Show Concerning Tendency to Sycophanticate Users


Well, hopefully this surprises no one but it's good they ran the statistics. I guess when you add this to the studies that show people are ridiculously trusting of what AIs say, this will lead to ongoing inflation of egos.


They also have an increasingly disturbing tendency to end a response with a question. Seems like an over-engineered reward in RL to keep the conversation going.


Anthropic publishes its system prompts, and at least in the case of Claude 3.5 Sonnet (2024-11-22), asking questions is explicit:

"Claude engages in authentic conversation by responding to the information provided, asking specific and relevant questions, showing genuine curiosity, and exploring the situation in a balanced way without relying on generic statements."

https://docs.anthropic.com/en/release-notes/system-prompts


The reinforcement learning objective, by including things like thumbs up/thumbs down, captures people's preference to be flattered.
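
To make that concrete, here's a minimal sketch of the pairwise preference loss commonly used to train reward models (Bradley-Terry style); preference_loss and the example scores are illustrative stand-ins for a learned reward model's outputs:

    import math

    def preference_loss(score_chosen: float, score_rejected: float) -> float:
        """-log sigmoid(score_chosen - score_rejected): minimized by pushing
        the thumbed-up response's score above the alternative's."""
        return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

    # If raters consistently prefer the agreeable reply, gradient descent on this
    # loss teaches the reward model to score agreement highly, whether or not
    # the reply is accurate.
    print(preference_loss(2.0, 0.5))  # ~0.20, "chosen" already scored higher
    print(preference_loss(0.5, 2.0))  # ~1.70, large loss pushes the scores to flip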


The related problem I have with LLMs and coding is that sometimes the model does something right, I accidentally make a wrong suggestion, and it just goes along with my suggestion without telling me that it thinks what I suggested is a bad idea and asking whether I really want to do it.


I noticed the same: GPT-4o tends to reply “Great question!” to my questions, supposedly to make me feel good.


Communication Surface of Corporate Entity Employs Sales Techniques


That's just how people talk — at least, when they're trying to keep conversation productive.

There's a reason the "shit sandwich" is part of professional communication etiquette. If you don't know how insecure the person you're talking to is — and yet productive communication with them is required, and there's no conversational arbiter there to enforce that — then you may as well assume for safety's sake that your counterparty is insecure. And how do you keep a conversation with an insecure person from derailing? You reassure and emotionally validate them in your responses. (Which might be read as "flattery.")

Now consider that most conversation that gets recorded online, probably came about as the result of such "intended-productive interactions with people you don't know very well" (think: LLaMa trained from buyers messaging sellers on Facebook Marketplace)... and it should be clear why LLMs look like this. The training dataset [mostly] looks like this! This is the default way online-recorded conversations happen — so it's the default way LLMs speak, unless the LLM has been prompted to use some more-particular approach (or, in a conversational context, "falls into" some alternate approach by recognizing how you're talking, and responding the default way that the default type of person who would respond to that type of message, would talk.)

Interestingly — presumably due to this bias in the depth and breadth of polite-interaction training data — I find that LLMs produce much better output in multi-turn conversations when I respond to the LLM the way I would respond to a human: thanking them for their input, pointing out where their own insights resonated with me with phrasing like "I find your idea of X especially convincing", etc.

I think the sort of conversations where people are super-brash to one another, the way a human might default to being with a "machine", represent only a small part of the base language-model training dataset. Due to this, having brash human responses in an LLM's context window tends to over-circumscribe the kinds of responses it's willing to give — i.e. it focuses on the types and styles of responses that appear in that brash-human-interactions dataset, limiting its flexibility and cleverness vs if it wasn't so-constrained.

(From what I can tell, the brash-conversations part of base-language-model training datasets is mostly composed of highly-technical academic/industrial/medical conversations. Being brash to the AI seems to cause it to get all ten-dollar-word-y and jargon-y in response, and to drop all use of slang / emojis / etc — i.e. to evoke the writing style of those highly-technical conversations.)


People keep forgetting that the models didn't figure out how to have conversations on their own - that part was trained after slurping the whole Internet, and it's people[0] - specifically, people employed for that purpose - who supplied example conversations.

The way the model talks is exactly what you'd expect a human writing example conversations to produce when given no specific instructions wrt. style: correct, polite, professional, nice.

(However, the models pick all kinds of styles from the "slurp the whole Internet" stage, which is why prompting them directly is so effective at changing their communication style to whatever you want.)

--

[0] - At least initially. Nowadays, LLMs generate data for LLMs.


What people actually keep forgetting is that this isn't the distribution of the training data. Predict-the-raw-internet models don't get released by AI corporations anymore. Back in 2023, when models had less effective RLHF treatment, the result was Microsoft's Sydney.


> Now consider that most conversation that gets recorded online, probably came about as the result of such "intended-productive interactions with people you don't know very well" (think: LLaMa trained from buyers messaging sellers on Facebook Marketplace)... and it should be clear why LLMs look like this. The training dataset [mostly] looks like this!

That wouldn't be my first guess, or my hundredth. I would tend to assign responsibility for the style to the "helpful, harmless assistant" idea that the major vendors enforce.


> This is the default way online-recorded conversations happen

That's not what most online conversations I see look like. People are much more abrasive and direct on forums, for instance. This way of talking is more like corporate speech.


I'd like to introduce you to some of the most productive Germans on the planet and then we can have a frank discussion of whether or not the airy bullshit that passes for business communication is in fact a booster of productivity.


I'm very curious now whether speaking to an LLM multi-language base model in German, results in it generating German-language responses that are more ego-stroking than German speakers would tend to be (presumably because it learned "be polite" as a general rule); or whether it results in the LLM being more direct in German than it is in English (presumably because it learned "be polite" as a rule of English, but not as a rule of German.)


I wonder what such Germans think of the manners of feudal Japan when watching something like the recent TV show Shogun, where it's not about feeling insecure but about showing proper respect (according to that culture).


In Story of Yanxi Palace, any time the emperor asks a question of anyone, the response will be prefixed with 回皇上 "replying to the Imperial Highness".


I'm not sure I understand, perhaps because I'm unaware of the German stereotypes.


If only they trained on HN alone, while filtering out the founder hype men.


> This discovery raises significant questions about the reliability and safety of AI systems in critical applications.

What? Perhaps it helps characterize their unreliability, but I'm pretty sure the fact of their unreliability was already pretty well established.


Tangent: many in IT and engineering don't work on soft skills. I hope this shows it's not that hard.


Engineers are not hired to play office politics or drag out problems in endless meetings until they give up and pretend they don't exist. They're hired to get things done and that usually requires stating things in clear and certain terms. If this seems hostile, that's on leadership.

Those who want "soft skills" from their engineers are often looking to place blame. It's easier to blame the engineer who didn't raise concern when things were going off the rails.


> Those who want "soft skills" from their engineers are often looking to place blame. It's easier to blame the engineer who didn't raise concern when things were going off the rails.

That is not what "soft skills" means.

Suppose your boss asks for something on an impossible timeline. Do you:

- say yes, and work overtime to get it done (without saying that's what you're doing)

- say no

- say yes, and demand extra recognition for being a hero

- insult your boss's upbringing and intelligence

- suggest a more realistic timeline

- say what you can get done by the given timeline


You're impossibly lost on this.

The only actually reasonable option is typically that your organization has unfortunately stacked the deck with morons, and while that's invisible to the rest of your org, it's transparent to the rest of us (yet we know... cause we're not stupid). You have to go to bat for us. Good luck. :)

Welcome to middle management and god only knows how you got there! :) ... we know tho. good luck again :)

I'll throw you a bone. The only actual solution is to find a place to work that isn't hostile to devs and has made commitments towards technical competency in who they hire. Promoting from within and all that. If they don't, then prepare to be fired long before that incompetent underling is. After all, they produce results and will document it, while you cannot.


Yes, this is a good example of being very bad at soft skills.


I am unclear on how exactly this is meant to show that.


idk

I was going through Python + CUDA hell today. After 6 hours of using Gemini Flash 2.0 to locate the configuration problem, with no luck, Gemini literally said:

"Sorry. I hope you fix it one day, but I have to go." -Gemini Flash 2.0

lmao


Ended like a true 'help me forum' thread.


I've definitely found Gemini to be the least "flattering". And I don't say that in a purely good way; sometimes I find it actually kinda mean in the way it responds.


Just train them with Australians, problem solved.


That’s a great idea!

Yes, we know Claude.


Yes, this is a side effect of making users say which output they prefer. They will select what goes their way, especially when it tickles their biases. Grifters do the same thing: they know what to say to whom to shift them in the direction they want.

So we didn't select just for more accurate information...


I imagine humans are just as likely if not more likely to do this.


Are the humans around you half as obsequious as the LLMs you use? No, right?


I don't know anyone in a stereotypical "only yes-men allowed" toxic management environment.


sigh. I'm so tired of every LLM criticism being met with "well humans do this too"—if we are incapable of building something that doesn't have the flaws of typical human beings, why are we wasting our time and energy? We already have billions of fallible humans on the planet.


When you RLHF, you get what you deserve.


This is a basic requirement for the next phase, when "LLM"s will be tasked with paying for their keep, and so basic sales techniques will be deployed. Or, to put it bluntly: it's always better to kiss them before you try and fuck them.



