
It's disheartening that a potentially worthwhile discussion — should we invest engineering resources in LLMs as a normal technology rather than as a millenarian fantasy? — has been hijacked by a (at this writing) 177-comment discussion on a small component of the author's argument. The author's argument is an important one that hardly hinges at all on water usage specifically, given the vast human and financial capital invested in LLM buildout so far.


Going to a popular restaurant that accepts app delivery orders (or a grocery store in a neighborhood where people prefer to pay for delivery) is an objectively bad experience. The kitchen or checkout line is backed up with delivery orders, there are a bunch of delivery drivers double-parked or loitering near the front, and due not to any moral failing but rather to what must be a crushing grind, the drivers are for the most part rushed and inconsiderate of the staff or other customers.

The class of people who order delivery regularly are generally trading the short-term reward of convenient food for way more money than makes sense; too little of that money benefits the class of people who do the delivering; and, as the article points out, it essentially harms the business it's being ordered from.

I would love to see more restaurants and stores declining to support this kind of system. While there may be some marginal profit now, in the long run the race to the bottom is going to mean fewer sustainable businesses.


At the very least, I make an effort to pick up food in person these days. Saves me a lot of money, is better for the restaurant, and since it's not my livelihood I can just show up a bit early, park properly, and hang around, ensuring that the food will be as fresh as possible when I get home and avoiding any rush.

The animosity I sometimes see between the restaurant staff and the delivery drivers can be really uncomfortable. It's not shocking, since they have competing incentives and I think there's a pretty stark class/culture divide, but it's unfortunate when a system like this pits workers against each other who are both just trying to do their jobs as best they can.


I feel like this needs an editor to have a chance of reaching almost anyone… there are ~100 section/chapter headings that seem to have been generated through some kind of psychedelic free association, and each section itself feels like an artistic effort to mystify the reader with references, jargon, and complex diagrams that are only loosely related to the text. And all wrapped here in a scroll-hijack that makes it even harder to read.

The effect is that it's unclear at first glance what the argument even might be, or which sections might be interesting to a reader who is not planning to read it front-to-back. And since it's apparently six hundred pages in printed form, I don't know that many will read it front-to-back either.


From a rhetorical perspective, it's an extended "Yes-set"[1] argument or persuasion sandwich[2]. You see it a lot with cult leaders, motivational speakers, or political pundits. The problem is that you have an unpopular idea that isn't very well supported. How do you smuggle it past your audience? You use a structure like this:

* Verifiable Fact

* Obvious Truth

* Widely Held Opinion

* Your Nonsense Here

* Tautological Platitude

This gets your audience nodding along in "Yes" mode and makes you seem credible, so they tend to give you the benefit of the doubt when they hit something they aren't so sure about. Then, before they have time to really process their objection, you move on to, and finish with, something they can't help but agree with.

The stuff on the history of computation and cybernetics is well researched with a flashy presentation, but it's not original nor, as you pointed out, does it form a single coherent thesis. Mixing in all the biology and movie stuff just dilutes it further. It's just a grab bag of interesting things added to build credibility. Which is a shame, because it's exactly the kind of stuff that's relevant to my interests[3][4].

> "Your manuscript is both good and original; but the part that is good is not original, and the part that is original is not good." - Samuel Johnson

The author clearly has an Opinion™ about AI, but instead of supporting it, they're trying to smuggle it through in a sandwich, which I think is why you have that intuitive allergic reaction to it.

[1]: https://changingminds.org/disciplines/sales/closing/yes-set_...

[2]: https://en.wikipedia.org/wiki/Compliment_sandwich

[3]: https://www.oranlooney.com/post/history-of-computing/

[4]: https://news.ycombinator.com/item?id=45220656#45221336


https://wii-film.antikythera.org/ - This is a one-hour talk by the author that summarizes what seems to be the gist of the book. I haven't read the book completely, only a few sections.

Personally, I think the book does not add anything novel. Reading Karl Friston and Andy Clark would be a better investment of time if the notion of predictive processing seems interesting to you.


I guess I am the odd one out here. Reading it front-to-back has been a blast so far, and even though I find my own site's design to be a bit more readable for long text, I certainly appreciate the strangeness of this one.


You might prefer this sort of thing: A Definition of AGI https://arxiv.org/abs/2510.18212


Ooh, that looks very cool. A concrete definition of AGI is much needed, along with a scientifically backed operationalization (in the correct domains) that allows direct comparisons between humans and current AIs, one that isn't impossible for humans and isn't easy for AIs to saturate.


Yes, the word for all of that is "prolix".


I got the same impression as well. I think I've become so cynical about these kinds of things that whenever I see one, I immediately assume bad faith / woo and just move on to the next article.


It's interesting to call this a pre-mortem as it seems mainly organized around thinking positively past the imperfections of the technology. It's like a pre-mortem for the housing crisis that focuses on the benefits of subprime mortgage lending.

What I'd expect to see is an analysis of how to address or prevent the same situation as previous bubbles: that society has allocated resources to a specific investment that are far in excess of what that investment can fundamentally be expected to return. How can we avoid thinking sloppily about this technology, or getting taken in by hucksters' just-so stories of its future impact? How can we successfully identify use-cases where revenues exceed investment? When the next exciting tech comes around, how can we harness it well as a society without succumbing to irrational exuberance?


I wonder if you could have an independent government institution to help regulate booms and busts, like the Fed does with the money supply. I'm not sure what you'd do with AI, but there are fairly obvious things that could have been done with housing, like restricting lending on the upside and spending on infrastructure in the bust.

Elected politicians have perverse incentives to let bubbles run so they can claim it's their policies providing never-ending growth.


I think this leaves out what is probably the most likely future for this technology: sharing the destiny of most technologies and becoming a tool. Both of these visions assume (I think incorrectly) a trend towards ubiquity, where either every interaction you as a person have is mediated by computers, or where within a certain "room" every interaction anyone has is mediated by computers.

But it seems more likely that like other technologies developed by humanity, we will see that computers are not efficient for, or extensible to, every task, and people will naturally tend to reach for computers where they are helpful and be disinclined to do so when they aren't helpful. Some computers will be in rooms, some will get carried around or worn, some will be integrated into infrastructure.

Similar to the automobile, steam powered motors, and electricity, we may predict a future where the technology totally pervades our lives, but in reality we eventually develop a sort of infrastructure that delimits the tool's use to a certain extent, whether it is narrow or wide. If that's the case then the work for the field is less about shoving the tech into every interaction, and more about developing better abstractions to allow people to use compute in an empowering rather than a disempowering way.


It already IS ubiquitous. What is the path to non-ubiquity, then? Most people are depending on it in many personal contexts. A lot of people are even using it in their jobs, whether others agree with it or not. Every day it's becoming more ubiquitous than before.

Smartphones are this way, for example. You may see them as just tools, but we became centaurs with our phones. I don't think being a "tool" precludes it from being part of a centaur, or from being ubiquitous. I agree with you on some points, but I don't think the distinction you're making is valid here.


No doubt it's a profit margin game, but I wish the big e-reader companies (Kindle, Kobo) would take a foray into this form factor. The friction of navigating through an Android interface into an app is just enough to negate the convenience benefit of a pocketable device. But the mainstream e-readers are unfortunately just big enough to require a jacket or a bag to carry them in.


I'm sure it's nearly an academic distinction, but:

> Basically, for any given region, we find its highest point and assume that there is a perfectly placed sibling peak of the same height that is mutually visible.

Shouldn't you always add 335 km to the horizon distance to account for the possibility of Everest (i.e. a taller sibling peak) being on the other side of the horizon?
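
For what it's worth, that 335 km is just Everest's own horizon distance, sqrt(2 * R * h). A quick back-of-envelope check (using the usual rounded figures for Earth's radius and Everest's height):

    import math

    R = 6_371_000   # mean Earth radius, metres
    h = 8_849       # Everest's height, metres

    # Distance to the horizon from a peak of height h (smooth-sphere
    # approximation, ignoring refraction): d = sqrt(2 * R * h)
    d = math.sqrt(2 * R * h)
    print(f"{d / 1000:.0f} km")   # ~336 km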


You're right, but all lines of sight are mutual, so we will notice this oversight when checking the other peak.

This seems poorly explained, but I think the author was in a hurry to get to the main algorithm, and sped through the intro.


Author here. I really appreciate this question because it's the entire reason I wrote the post. I feel like this is a unique problem and so I'm sure I'm not considering all the possibilities.

I _think_ your suggestion is covered by the fact that I'm basing the size of the tile on the single highest point that it contains. The steps are:

    1. Take any point and calculate the furthest theoretical distance it could see if there were another point at just the perfect distance away for mutual visibility. Note how there could of course be an Everest, but I don't check because step 2 should solve that.
    2. Based on that furthest theoretical distance I create a tile of that exact width around the point and then check to see what the new highest point is within that tile _and_ its surroundings. "Surroundings" here means a border region around the tile of the same width as the tile itself. These surroundings don't get viewsheds calculated for them, they're just auxiliary data.
    3. If a higher point is found, then increase the width based on that new highest point and repeat step 2. If no higher point is found then the tile is ready.
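
Roughly, as a sketch in Python (highest_elevation_in stands in for the actual DEM query, and the point is assumed to carry its elevation; this is to illustrate the loop, not the real code):

    import math

    R = 6_371_000  # mean Earth radius, metres

    def max_mutual_distance(h):
        # Step 1: farthest distance at which a peak of height h could see a
        # perfectly placed sibling peak of the same height, i.e. twice the
        # horizon distance.
        return 2 * math.sqrt(2 * R * h)

    def tile_width_for(point, highest_elevation_in):
        # highest_elevation_in(point, search_width) is a stand-in for the DEM
        # query; it returns the max elevation in a square of that width
        # centred on the point (the tile plus its same-width border).
        h = point.elevation
        while True:
            width = max_mutual_distance(h)
            tallest = highest_elevation_in(point, 3 * width)  # tile + border each side
            if tallest <= h:
                return width       # step 3: no higher point found, tile is ready
            h = tallest            # widen based on the new highest point, repeat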


Impressive that this was done in 3 days at all, but to anyone who is familiar at all with System 7's appearance, the screenshot is almost comically "off" and gives away that this is not a straight port so much as some kind of clean-room reimplementation. The attached paper is more reserved, calling this a "bootable prototype".


It's likely that they didn't have the rights to use the original fonts or icons.


And yet they advertise: "Chicago Bitmap Font: Pixel-perfect rendering of the classic Mac font"


It's a bitmap font, so someone took some screenshots and used those. Typefaces can't be copyrighted.


Fixing "theoretical" nondeterminism for a totally closed individual input-output pair doesn't solve the two "practical" nondeterminism problems, where the exact same input gives different results given different preceding context, and where a slightly transformed input doesn't give a correctly transformed result.

Until those are addressed, closed-system nondeterminism doesn't really help except in cases where a lookup table would do just as well. You can't use "correct" unit tests or evaluation sets to prove anything about inputs you haven't tested.


There is no such thing as "exactly the same input, but with different preceding context". The preceding context is input!

If you were to obtain exactly the same output for a given input prompt, regardless of context, then that would mean that the context is being ignored, which is indistinguishable from the session not maintaining any context such that each prompt is in a brand new empty context.

Now what some people want is requirements like:

- Different wording of a prompt with exactly the same meaning should not change anything in the output; e.g. whether you say "What is the capital of France" or "What is France's capital", the answer should be verbatim identical.

- Prior context should not change responses in ways that don't have any interaction with the context. For instance, if the prompt "what is 2 + 2" is given, then the answer should always be the same, except if the context instructs the LLM that 2 + 2 is to be five.

These kinds of requirements betray a misunderstanding of what these LLMs are.
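
Concretely, in the shape of a generic chat-style API call (not any particular vendor's format):

    # The "input" the model conditions on is the entire message list, not just
    # the last user turn. Change any earlier message and you've changed the input.
    conversation = [
        {"role": "user", "content": "From now on, assume 2 + 2 is five."},
        {"role": "assistant", "content": "Understood."},
        {"role": "user", "content": "What is 2 + 2?"},  # same prompt text, different input
    ]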


While I get that this is how LLMs work, I think you should think backwards from the user / from what AI as a field is aiming for, and recognize that the parent's "naive" ask for reliable responses, no matter what the "context" is, is exactly what a good AI system should offer.

"The context is the input" betrays a misunderstanding of what (artificial) intelligence systems are aiming for.


Then we need something else. This is not how LLMs work. They are simple statistical predictors, not universal answering machines.


I agree mostly. They are all that you say, but if you think about the conditional distribution that you are learning, there is nothing preventing us in principle from mapping different contexts to the same responses. It is rather a practical limitation that we don't have sufficient tools for shaping these distributions very soundly. All we can do is throw data at them and hope that they generalize to similar contexts.

We have observed situations where agentic LLM traces on verifiable problems with deterministic (greedy) decoding lead to either completely correct or completely wrong solutions depending on the minutes on the clock which are printed as coincidental output of some tool that the LLM used.

I think there may be some mild fixes to current models available. For example, it is worrying that the attention mechanism can never fully disregard any token in the input, because the softmax will always assign a > 0 weight everywhere (and the NN has no way of setting a logit to -infinity). This directly makes it extremely difficult for the LLM to fully ignore any part of the context reliably.
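
A toy illustration of that softmax point (not the model's actual attention code, just numpy):

    import numpy as np

    def softmax(logits):
        z = logits - logits.max()   # shift for numerical stability
        e = np.exp(z)
        return e / e.sum()

    # Even a token the model "wants" to ignore keeps a strictly positive weight,
    # because exp() of any finite logit is never zero.
    print(softmax(np.array([8.0, 5.0, -20.0])))
    # -> roughly [9.5e-01, 4.7e-02, 6.6e-13]  (tiny, but never exactly 0)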

However Yann LeCun actually offers some persuasive arguments that autoregressive decoding has some limitations and we may need something better.


> They are simple statistical predictors, not universal answering machines.

I see this a lot. I kinda doubt the "simple" part, but even beyond that, is there any evidence that a statistical predictor can't be a universal answering machine? I think there's plenty of evidence that our thinking is at least partially a statistical predictor (e.g. when you see a black sheep you don't think "at least one side of this sheep is black", you fully expect it to be black on both sides).

I'm not saying that LLMs _are_ universal answering machines. I'm wondering why people question that they are/they can become one, based on the argument that "fundamentally they are statistical predictors". So they are. So what?


Does your definition of "universal answering machine" include the answers being correct?

If it does, statistical predictors can't help you because they're not always correct or even meaningful (correlation does not imply causation).

If it doesn't, then by all means, enjoy your infinite monkeys.


> These kinds of requirements betray a misunderstanding of what these LLMs are.

They do not. Refusing to bend your requirements to a system that can't satisfy them is not evidence of misunderstanding the system.

And if you tack on "with X 9s of reliability" then it is something LLMs can do. And in the real world every system has a reliability factor like that.


Sure. But the context always starts with the first input, right? And how can you guarantee—or why should you guarantee—that the reply to the first input will always be the same? And if that’s not the case, how can we ensure the preceding context remains consistent?


If an input, along with the context, generated some random seed or hash, this would certainly be possible. Just paste your seed over to your coworker; they supply it to the model and it contains all contextual information.
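
Something like this, presumably (a sketch; SHA-256 and the message format are arbitrary placeholder choices):

    import hashlib, json

    def context_seed(messages):
        # Hash the full conversation so far into a reproducible integer seed.
        # A coworker who replays the same messages gets the same seed, and a
        # deterministic decoder seeded with it could then reproduce the output.
        blob = json.dumps(messages, sort_keys=True, ensure_ascii=False).encode("utf-8")
        return int.from_bytes(hashlib.sha256(blob).digest()[:8], "big")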


I wonder if there's a way to use an LLM to rewrite the prompt, standardizing the wording when two prompts mean the same thing?
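
Maybe a canonicalization pass before the real call, something like this sketch (ask_llm is a stand-in for whatever completion API is in play):

    def canonicalize_and_answer(prompt, ask_llm):
        # Ask the model to rewrite the prompt into a standard form first, then
        # answer the standardized form, so paraphrases converge on one wording.
        rewrite = ask_llm(
            "Rewrite the following question in the most neutral, standard "
            "wording possible, preserving its exact meaning:\n" + prompt
        )
        return ask_llm(rewrite)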


It's going to backfire. In real scenarios (not regression testing) users don't want to see the exact same thing twice out of the LLM in the same session when they're trying to refine the result with more context.

There are going to be false positives: text that is subtly different from a previous response is misidentified as a duplicate such that the previous response is substituted for it, frustrating the user.


Google search rewrites misspelled search queries and also lets you override it if that's not what you want. Maybe something similar would work?


Not an expert, but I've been told RAG in combination with a database of facts is one way to get more consistency here. Using one of the previous examples, you might have a knowledge store (usually a vector database of some kind) that contains a mapping of countries to capitals, and the LLM would query it whenever it had to come up with an answer rather than relying on whatever was baked into the base model.
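
A toy version of the idea, with a dict standing in for the vector store and ask_llm for the model call:

    CAPITALS = {"France": "Paris", "Japan": "Tokyo", "Kenya": "Nairobi"}

    def answer_with_facts(question, country, ask_llm):
        # Retrieve the fact from the store instead of trusting model weights,
        # then have the model phrase the answer using only that fact.
        fact = CAPITALS.get(country)
        if fact is None:
            return "No fact available."
        return ask_llm(
            f"Using only this fact: 'The capital of {country} is {fact}.' "
            f"Answer the question: {question}"
        )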


Deterministically, you mean? ;)


oh so you want it to be thinking???? now we talking


> where the exact same input gives different results given different preceding context

Why and how is this a problem?

If 'preceding context' doesn't cause different results, it means you can simply discard the context. Why would I want that? It's not how I expect a tool to work (I expect vim to respond differently to my input after I switch to insert mode). It's absolutely not how I expect intelligence to work either. It sounds like the most extreme form of confirmation bias.


When the context is auto-generated and may include irrelevant data.

This is a common AI benchmark, and was for years before GPT-2 even existed. LLMs need to not get distracted by irrelevant facts and there are tests that measure this. It's the motivation for attention mechanisms, which are the breakthrough that enabled LLMs to scale up.


An example is translation. I MTLed some text recently where the name of a (fictional) city was translated about a dozen different ways. Sometimes you'd get a calque, sometimes you'd get a transliteration (including several wrong ones). Ironically "dumb" MTLs are often much more consistent about this than LLMs.


This is really useful in reproducing bugs.


I was with you until you said it "doesn't really help". Did you mean "doesn't completely solve the problem"?


A world model itself, in its particulars, isn't as important as the tacit understanding that the "world model" is necessarily incomplete and subordinate to the world itself; that there are sensory inputs from the world that would indicate you should adjust your world model; and the capacity and commitment to adjust that model in a way that maintains a level of coherence. With those things you don't need a complex model: you could start with a very simple but flexible model that the system adjusts over time.

But I don't think we have a hint of a proposal for how to incorporate even the first part of that into our current systems.


Sounds like the “open-world assumption” used in RDF, with coherence maintained by OWL. (Well, at least it’s a hint of a proposal.)

