
This is a terrible article, written by someone who doesn't seem to have even tried GPT 4. Their only example references GPT 3.5, and then they waffle on about only vaguely related topics such as level 5 self-driving.

This quote in particular stood out as ignorant:

“What the large language models are good at is saying what an answer should sound like, which is different from what an answer should be.”

That's... not at all how large language models work. Tiny, trivial, toy language models work like this, because they don't have the internal capacity to do anything else. They just don't have enough parameters.

Stephen Wolfram explained it best: After a point, the only way to get better at modelling the statistics of language is to go to the level "above" grammar and start modelling common sense facts about the world. The larger the model, the higher the level of abstraction it can reach to improve its predictions.

His example was this sentence: "The elephant flew to the Moon."

That is a syntactically and grammatically correct sentence. A toy LLM or an older NLP algorithm will mark it as "valid" and happily match it, predict it, or whatever. But elephants don't fly to the Moon, not because the sentence is invalid, but because they can't fly, the Moon has never been visited by any animal, and even humans can't reach it (at the moment). To predict that this sentence is unlikely, the model has to encode all of that knowledge about the world.
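
To make that concrete, here's a rough sketch of what "encoding that knowledge" can look like operationally: compare the average per-token log-probability a model assigns to each sentence. (Sketch only -- it uses GPT-2 via the Hugging Face transformers library simply because it's small and public, and the sentences are just illustrative.)

    # Sketch: compare average per-token log-likelihood of two sentences.
    # Assumes `pip install torch transformers`; GPT-2 is used only because
    # it's small and public -- the gap, not the exact numbers, is the point.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def avg_logprob(sentence: str) -> float:
        ids = tokenizer(sentence, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)
        # out.loss is the mean negative log-likelihood per token
        return -out.loss.item()

    print(avg_logprob("The elephant flew to the Moon."))
    print(avg_logprob("The astronaut flew to the Moon."))
    # A model that has absorbed enough about the world should score the second
    # sentence noticeably higher, even though both are grammatically fine.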

Go ask GPT 4 -- not 3.5 -- what it thinks about elephants flying to the moon. Then, and only then, go write a snarky IEEE article.



I think the main reason for the division is that everyone projects from their own use cases. I have been using gpt-4 for quite some time and also couldn't understand why someone would say that it just produces something that sounds like a real answer. But then I found some queries whose results can definitely be described as "sounding like truth". So your personal experience probably just wasn't the same as theirs.

For those curious, I was asking gpt-4 about the top 3 cards from my favorite board game, Spirit Island. All three of them sounded really convincing, having the same structure and the same writing style, but unfortunately none of them existed. So anything that falls outside the most common use cases will probably produce convincing hallucinations.


People also misuse LLMs.

ChatGPT is forced to give an answer. It's like a human on "truth serum": the drugs don't stop you lying, they just lower your inhibitions so you blab more without realising it.

The more obscure the topic, the more likely the hallucination. If you ask it about common card games, it gives very good answers.

If you asked a random human about 3 cards from a random board game at gunpoint and said, "Talk, now, or you get shot", they'd just start spouting gibberish too.

PS: I asked GPT 4 about that game, and it prefixed every answer with some variant of "I'm not sure about this answer", or it completely refused to answer, stating that it did not know about any specific cards.


For me, it prefixed its answer with just "As an AI, I do not have opinions or favorites. However, I can share with you three notable and commonly appreciated Power Cards from the game "Spirit Island", as it existed until my training data cut-off in September 2021. Remember that the "best" cards can often depend on the specific circumstances in the game, as well as the particular strategy and Spirit you're playing." But then it just shared the cards, with nothing to indicate it wasn't sure about the details. The card selection was decent, but details like resources, powers, and so on were off. Yet it all sounded realistic. Shared an example below if you care.


I was curious and I have the browsing module for GPT4 so I tried to get it to name the top 3 cards, the following is what it came up with after like 10 minutes of back and forth and it browsing a lot of sites. I don't know if this is any good as I know nothing about the game. Do these cards even exist?

Talons of Lightning: This is a major power card with an energy cost of 6. It's a fast card with a range of 1 from Sacred Site. It targets any land and deals 4 damage. If you meet its threshold requirement (3 Air and 2 Fire), it deals +3 damage and destroys all Dahan.

Tsunami: This is another major power card with an energy cost of 6. It's a slow card with a range of 2 from the ocean. It affects every coastal land in one board, dealing 6 damage to each. If you meet its threshold requirement (3 Water and 3 Earth), it deals +4 damage and adds 2 fear per land where it destroys any towns or cities.

Blazing Renewal: A major power card with an energy cost of 5. It's a fast card with a range of 0, measured from any land. It targets any land, allows you to gather up to 2 of your destroyed presence, and deals 2 damage per presence gathered. If you meet its threshold requirement (4 Fire and 2 Earth), it deals +10 damage.


Try helping it along with something like "only provide information you are certain about. If you don't know the answer, say "i don't know""

For your question, it says it doesn't know about any card ranking system in spirit island. Then if I ask it for a list of cards, they all check out via Google search.
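
If it helps, here's a minimal sketch of how that instruction can be wired into an API call (assuming the openai Python client; the system prompt wording is just what I'd try, not a guaranteed cure for hallucinations):

    # Sketch: nudge the model toward admitting uncertainty via a system prompt.
    # Assumes the openai Python package and an OPENAI_API_KEY in the environment;
    # the prompt wording is illustrative only.
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Only provide information you are certain about. "
                        'If you don\'t know the answer, say "I don\'t know".'},
            {"role": "user",
             "content": "What are the top 3 Spirit Island cards? "
                        "Provide their descriptions."},
        ],
    )
    print(resp.choices[0].message.content)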


Copy-pasted exact query you shared. It still provided information that I thought was correct, but after further inspection just sounded correct. It's GPT-4 without plugins.

> What are top 3 Spirit Island cards and provide their descriptions? Only provide information you are certain about. If you don't know the answer, say "i don't know".

2. "Vigor of the Breaking Dawn" (Major Power Card): This card costs 6 energy and is a slow power card. It allows you to push all explorers and towns from one of your lands. If you have 2 sun and 2 fire elements, you can instead replace each town with an explorer before the push. If you have 3 sun and 2 fire, you may also do 2 damage in each land where you push one or more pieces.

https://spiritislandwiki.com/index.php?title=Vigor_of_the_Br... It doesn't cost 6, it costs 3. And it's not slow, it's fast. The required resources are wrong too. You can deal 2 damage, but for pushed Dahan, not for pushed explorers. As I said, it all sounds convincing, but it's wrong. More common queries are usually fairly good.


Great posts. I think it's an error caused by a mistake often made on this topic: the assumption that the side effects we see now are some fundamental problem, and not just an artifact of the way the systems are trained and used, and of how we (mal)function.

Especially the tight embrace of the cognitive bias about how special and wonderful our intelligence is. After all, we have that fancy squishy brain, which we assume to be essential. As far as I can tell, the only visible bottlenecks when looking into the future come into view once you start debating intelligence vs emulating intelligence. And if that's really the metric, some honest introspection about the nature of human intelligence might be in order.

Not sure how much of that is done purposefully so as not to create too much urgency in figuring out outer alignment on a societal level. Just as it's no wonder that we haven't figured out how to deal with fake news while at the same time insisting that malinformation exists, it's really no wonder that we can't figure out AI alignment while not having solved human alignment. Nobody should be surprised that the cause of the problems might be sitting in front of the machine.


NOVA just released an episode on perception (https://www.youtube.com/watch?v=HU6LfXNeQM4) and, yeah, aligning machine perception with human perception is going to be nearly impossible.

Or to put it another way, your brain's model of reality is one that is highly optimized around the limitations of meatsacks on a power budget that are trying not to die. Our current AI does not have to worry about death in its most common forms. Companies like Microsoft throw practically unlimited amounts of power at it. The textual data fed to it is filtered far more than a human mind filters its input; books/papers are a tiny summarization of reality. At the same time, more 'raw' forms of data like images/video/audio are likely to be far less filtered than what the human mind does to stay within its power budget.

Rehashing: this is why I think alignment will be impossible; at the end of the day, humans and AI will see different realities.


Thanks for the link! Trying to figure out how AI thinking looks sounds like a dead end to me. It's not human, you don't understand it, so what's the point? Especially when you have to worry about getting manipulated. Alignment this way does indeed seem impossible. But given the ability to produce language that makes sense, it should be possible to emulate the human thinking process by looking at how that actually works on a practical level. The same way you don't care how the brain actually works to produce language.

As such, I see no hurdle to getting something to emulate an individual's thinking in language. Assuming that there aren't actually multiple realities to see, just different perspectives you can work with. Which would mean we are looking for the one that utilizes human perspectives, but doesn't make the mistakes humans do.

Which is what makes this so scary: the limitations are just a byproduct of the current approach. They are just playing the wrong game. Which means I am pretty confident they already exist somewhere.

edit: In this context I believe it's also worth mentioning what Altman said on Lex Fridman's podcast, that humans don't like condescending bots. That's a bitter pill to swallow going forward. Especially since we require a lot of smoke and mirrors and noble lies, as individuals as well as a society.


> Go ask GPT 4 -- not 3.5 -- what it thinks about elephants flying to the moon. Then, and only then, go write a snarky IEEE article.

It's hard to know which things have been seen in the training data and are correct only for that reason. And GPT-4 is large enough that it can generalize from learning that X doesn't make sense to learning that Y doesn't make sense either. Does that mean it *understands*? Maybe. But it doesn't have persistent state and can't do math. It's definitely not yet what we think of when we say AGI.


>The elephant flew to the Moon.

Doesn't prove anything. Either GPT-4 was trained on Wolfram's example, or many people tried it on GPT-4 and corrected the wrong answer.


Wolfram pulled that out of a hat in an interview just a week or two ago. The data used for training GPT-4 is from before September 2021.

The point Stephen was trying to make was not about any specific sentence.

The point is that when these models are forced to get better through gradient descent, their only option for "going downhill" and improving the loss function is to go above and beyond mere grammar. Syntax and grammar only take them so far, and the only remaining source of improvement is to gain a general-purpose understanding of the world that the text they're seeing is describing.
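
Concretely, the loss being pushed downhill during pretraining is just next-token cross-entropy. A toy sketch of that quantity (PyTorch; the shapes and tensors are made up purely for illustration):

    # Sketch: the next-token cross-entropy loss that pretraining drives downhill.
    # Random tensors stand in for a real model's predictions and real text.
    import torch
    import torch.nn.functional as F

    vocab_size, seq_len = 50_000, 8
    logits = torch.randn(seq_len, vocab_size)           # model's prediction at each position
    targets = torch.randint(0, vocab_size, (seq_len,))  # the tokens that actually came next

    # Lower loss means higher probability assigned to the true next tokens.
    # Past a point, squeezing this further requires more than grammar:
    # implausible continuations ("the elephant flew to the Moon") have to be
    # down-weighted relative to plausible ones.
    loss = F.cross_entropy(logits, targets)
    print(loss.item())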


I don't know if and how fast GPT learns from user input.


It doesn't learn directly from user input.

Instead, there are two options: taking the user input, putting it in the training corpus, and reweighting the neural net; or using the user input as up/down votes for RLHF to alter the behaviour of the weights that already exist.
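
Very roughly, and with made-up data structures (no claim that this mirrors OpenAI's actual pipeline), the two options look like this:

    # Sketch of the two routes described above; everything here is illustrative.

    # Option 1: fold conversations back into a fine-tuning corpus, retrain later.
    finetune_corpus = []

    def log_for_finetuning(prompt: str, completion: str) -> None:
        finetune_corpus.append({"prompt": prompt, "completion": completion})

    # Option 2: collect thumbs up/down as preference data for RLHF, which trains
    # a reward model that then nudges the existing weights.
    preference_data = []

    def log_feedback(prompt: str, completion: str, thumbs_up: bool) -> None:
        preference_data.append({"prompt": prompt,
                                "completion": completion,
                                "reward": 1.0 if thumbs_up else 0.0})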


> It doesn't learn directly from user input.

Depending on what “it” is, it does through in-context learning, though that’s, obviously, limited to the context window.
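
For what it's worth, "in-context learning" just means the examples live in the prompt itself. A toy sketch (the prompt content is made up): nothing about the model's weights changes, and the effect vanishes with the context window.

    # Sketch: few-shot / in-context learning. The "learning" is entirely in the
    # prompt text; it is lost as soon as the context window is gone.
    few_shot_prompt = """\
    Translate board-game shorthand into plain English.

    Input: "2 dmg, rng 1, fast"
    Output: "Deals 2 damage at range 1 and resolves in the fast phase."

    Input: "push 1 town, slow"
    Output: "Pushes one town and resolves in the slow phase."

    Input: "3 fear, rng 0"
    Output:"""

    # This string would be sent as the prompt to a model; the weights are
    # untouched, so nothing persists beyond the conversation.
    print(few_shot_prompt)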



