
They can’t summarize something that hasn’t been summarized before.


About a year ago, I gave a film script to an LLM and asked for a summary. It was written by a friend and there was no chance it or its summary was in the training data.

It did a really good -- surprisingly good -- job. That incident has been a reference point for me. Even if it is anecdotal.


I'm not as cynical as others about LLMs, but it's extremely unlikely that script had multiple truly novel things in it. Broken down into sufficiently small pieces, it's very likely every story element was present multiple times in the LLM's training data.



I'm not sure I understand the philosophical point being made here. The LLM has "watched" a lot of movies and so understands the important parts of the original script it's presented with. Are we not describing how human media literacy works?


The point is that if you made a point to write a completely novel script, with (content-wise, not semantically) zero DNA in it from previous movie scripts, and with an unambiguous but incoherent and unstructured plot, your average literate human would be able to summarize what happened on the page, for all that they'd be annoyed and likely distressed by how unusual it was; but an LLM would do a disproportionately bad job compared to how well it does at other things, which makes us reevaluate what these models are actually doing and how they do it.

It feels like they've mastered language, but it's looking more and more like they've actually mastered canon. Which is still impressive, but very different.


This tracks, because the entire system reduces to a sophisticated regression analysis. That's why we keep talking about parameters and parameter counts: they're literally the numbers that get weighted during training. Beyond that there are some mathematical choices in how you interrelate the parameters, which yields some interesting emergent phenomena, and there are architecture choices to be made there. But the whole thing boils down to regression, and regression is at its heart a development of a canon from a representative variety of examples.

We are warned in statistics to be careful when extrapolating from a regression analysis.
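To make that extrapolation warning concrete, here is a minimal sketch (the sine "ground truth", the cubic fit, and the [0, 1] training range are all arbitrary illustrative choices, not claims about how any LLM is built): a regression tracks the region its examples cover and can go badly wrong outside it.

    # Minimal illustration of the extrapolation risk in regression.
    # The "true" function and training range are arbitrary choices.
    import numpy as np

    rng = np.random.default_rng(0)

    # Training data only covers x in [0, 1]: the "canon".
    x_train = rng.uniform(0.0, 1.0, 200)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.1, x_train.size)

    # Fit a cubic regression to that region.
    coeffs = np.polyfit(x_train, y_train, deg=3)

    # Inside the training range the fit is in the right ballpark...
    print(np.polyval(coeffs, 0.5), np.sin(2 * np.pi * 0.5))
    # ...far outside it, the prediction diverges from the true curve.
    print(np.polyval(coeffs, 3.0), np.sin(2 * np.pi * 3.0))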


And have you managed to perform such a test, or is that just an imaginary result you're convinced will happen? Not trying to be snarky here, but I see this kind of thing a lot, and "this is my model of how LLMs work and so this is how they would behave in this test I cannot verify" is very uncompelling.


I'm not making a philosophical point. The earlier comment was "I uploaded a new script and it summarized it"; I was simply saying the odds of that script actually being new are very slim. Even though that script and its summaries obviously don't appear in their entirety in the training data, its individual elements almost certainly do. So it's not really a novel (pun unintended?) summarization.



I'd like to see some examples of when it struggles to do summaries. There were no real examples in the text, besides one hypothetical which ChatGPT made up.

I think LLMs do great summaries. I am not able to come up with anything where I could criticize it and say "any human would come up with a better summary". Are my tasks not "truly novel"? Well, then I am not able, as a human, to come up with anything novel either.


They can; they just can't do it well. At no point does any LLM understand what it's doing.


If you think they can't do this task well I encourage you to try feeding ChatGPT some long documents outside of its training cutoff and examining the results. I expect you'll be surprised!
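If you want to try that experiment yourself, a minimal sketch with the OpenAI Python SDK might look like this (the model name and file path are placeholders; it assumes OPENAI_API_KEY is set in the environment):

    # Rough sketch of the test: summarize a document the model cannot
    # have seen in training, then judge the result against your own reading.
    # Placeholder model name and file path; requires OPENAI_API_KEY.
    from openai import OpenAI

    client = OpenAI()

    # Any long document written after the model's training cutoff,
    # e.g. an unpublished script or a recent internal report.
    with open("novel_document.txt", encoding="utf-8") as f:
        document = f.read()

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model; use whatever you're testing
        messages=[{
            "role": "user",
            "content": "Summarize the following document in one paragraph:\n\n" + document,
        }],
    )

    print(response.choices[0].message.content)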


It can produce something that looks like a summary based on similar texts it has seen.

How unique the text is determines how accurate the summary is likely to be.



