
> Most importantly, Olaf can speak and engage in conversations, creating a truly one-of-a-kind experience.

We already live in the world where hackers are pwning refrigerators, I can't wait for prompt injection attacks on animatronic cartoon characters.


> We already live in the world where hackers are pwning refrigerators, I can't wait for prompt injection attacks on animatronic cartoon characters.

It's not necessarily AI controlling the communication. Disney has long had 'puppet' characters whose communication is controlled by a human behind the scenes.


They're already using similar tech for the Mickey meet and greets and the Galaxy's Edge stormtroopers. The details aren't public, but it seems to be a mix of complex dialogue trees with interrupts or context switches, controlled in real time by the actor or operator.

It's not even complex, just some pre-recorded lines that the character can trigger via finger movements. You can watch them do it and it becomes very obvious.

That's interesting; if you're doing human in the loop, I would have thought it'd be easier to just do voice swapping. Or did the technology not quite line up?

Someone linked the Defunctland video essay on these characters elsewhere in this thread; I highly recommend watching it, since it goes into this in detail.

But the main reason is that there's a lot of brand image on the line with these interactions: someone putting on a voice, or using a voice changer, could make a mistake. Disney instead has a conversation tree with pre-recorded voice lines that a remote operator can control. Much harder to mess up.
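To make that concrete, here's a hypothetical sketch (the node names and clip files are made up, not Disney's actual system) of an operator-driven conversation tree where every line is pre-recorded:

  # Hypothetical sketch: every line is a pre-recorded clip; the remote operator
  # only picks which branch to play next, so nothing is improvised or synthesized live.
  TREE = {
      "greeting":  {"clip": "olaf_hello.wav",    "next": ["ask_name", "warm_hugs"]},
      "ask_name":  {"clip": "olaf_ask_name.wav", "next": ["warm_hugs", "goodbye"]},
      "warm_hugs": {"clip": "olaf_hugs.wav",     "next": ["goodbye"]},
      "goodbye":   {"clip": "olaf_bye.wav",      "next": []},
  }

  def operator_select(current: str, choice: str) -> str:
      """Operator picks one of the allowed follow-ups; invalid picks are rejected."""
      if choice not in TREE[current]["next"]:
          raise ValueError(f"{choice!r} is not a valid branch from {current!r}")
      return TREE[choice]["clip"]  # clip to play through the character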


And possibly more importantly, much easier to keep doing for hours on end. There's no need for a highly trained actor.

Yep, in this case everything is controlled through a Steam Deck.

Maybe I'm not late enough in my career to understand what you're saying, but what kind of problems are you helping the business solve with code that hasn't been proven to work?

Sorry I wrote that hastily and my wording seems to have caused much confusion. Here's a rewrite:

> The job is to help the business solve a problem, not just to ship code. In cases where delivering code actually makes sense, then yeah you should absolutely be able to prove it works and meets the requirements like the OP says. But there are plenty of cases where writing code at all is the wrong solution, and that’s an important distinction I didn’t really understand until later in my career.

Although funnily enough, the meaning you interpreted also has its own merit. Like other commenters have mentioned, there's always a cost tradeoff to evaluate. Some projects can absolutely cut corners to, say, ship faster to validate some result or gain users.


Getting a big customer to pay for a product that your sales team said could do X, Y, and Z, when Y wasn't part of the product, and now you need some plausible semblance of Y added so that you can send an invoice. If it doesn't work, that can be addressed later.

Getting a big sale by hacking together a demo that wouldn't scale up in the slightest without a complete rework of your backend.

Was it ever explained or understood why ChatGPT Images always has (had?) that yellow cast?

My pet theory is that OpenAI screwed up the image normalization calculation and was stuck with the mistake since that's something that can't be worked around.

At the least, it's not present in these new images.
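If that normalization theory were right, the mechanism would be something like this hypothetical sketch (all constants are invented for illustration): a small per-channel mismatch between the statistics used in training and the ones used to de-normalize at decode time shows up as a uniform cast on every image, baked into the model's output pipeline.

  import numpy as np

  # Hypothetical: decoder outputs are roughly normalized per channel and get
  # mapped back into [0, 1] using per-channel mean constants.
  out = np.random.randn(64, 64, 3) * 0.25           # stand-in for decoder output
  train_mean  = np.array([0.48, 0.46, 0.41])        # constants used during training (made up)
  decode_mean = np.array([0.52, 0.49, 0.38])        # constants baked into decoding (made up)

  img_ok  = np.clip(out + train_mean, 0, 1)
  img_bad = np.clip(out + decode_mean, 0, 1)
  # img_bad is nudged up in red/green and down in blue on every pixel,
  # i.e. a global yellow cast, independent of what the image depicts.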


There's still something off in the grading, and I suspect they worked around it

(although I get what you mean, not easily since you already trained)

I'm guessing when they get a clean slate we'll have Image 2 instead of 1.5. In LMArena it was immediately apparent it was an OpenAI model based on visuals.


wdym it can't be worked around when there exist literal yellow-tint corrector models/tools haha

There's a possibility that any automatic correction could have false positives (since the yellow tint doesn't happen 100% of the time), which creates different problems where an image could have an even weirder hue.

Yeah, though I can imagine a conversation like this:

SWE: "Seriously? import PIL \ read file \ == (c + 10%, m = m, y = y, k = k) \ save file done!"

Exec: "Yeah, and first blogger get's a hold of image #1 they generate, starts saying 'Hey! This thing's been color corrected w/o AI! lol lame'"

Or not, no idea. I've not understood the choice either, especially since very intelligent AI-driven auto touch-up for lighting/color correction has been a thing for a while. It's just that, of the head-scratcher decisions I do end up finding an answer for, maybe 25% turn out to have a reasonable, if non-intuitive, explanation. Here? I haven't been able to figure one out yet, or find a reason/mention by someone who appears to have an inside line on it.
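For what it's worth, the naive post-hoc fix the imagined SWE is gesturing at really is only a few lines with Pillow; a sketch (here knocking roughly 10% off the yellow channel rather than adding cyan, the exact numbers being as arbitrary as in the joke):

  from PIL import Image

  img = Image.open("generated.png").convert("CMYK")
  c, m, y, k = img.split()
  y = y.point(lambda v: int(v * 0.9))  # take ~10% off the yellow channel
  Image.merge("CMYK", (c, m, y, k)).convert("RGB").save("corrected.png")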


Meta's codec avatars all have a green cast because they spent millions on the rig to capture whole bodies and even more on rolling it out to get loads of real data.

They forgot to calibrate the cameras, so everything had a green tint.

Meanwhile all the other teams had a billion Macbeth charts lying around just in case.


Also, you'd be shocked at how few developers know anything at all about sRGB (or any other gamut/encoding), other than perhaps the name. Even people working in graphics, writing 3D game engines, working on colorist or graphics artist tools and libraries.

Not really, but there's a number of theories. The simplest one is that they "style tuned" the AI on human preference data, and this introduced a subtle bias for yellow.

And I say "subtle" - but because that model would always "regenerate" an image when editing, it would introduce more and more of this yellow tint with each tweak or edit. Which has a way of making a "subtle" bias anything but.


There was also the theory that it was because they scanned a bunch of actual real books, and book paper has a slight yellow hue.

That seems unlikely, as we didn't see anything like that with Dall-E, unless the autoregressive nature of gpt-image somehow was more influenced by it.

My pet theory is that this is the "Mexico filter" from movies leaking through the training data.

I never heard anything concrete offered. At least it's relatively easy to work around with tone mapping / LUTs.

I'm guessing that it was intentional all along, as no other models exhibit this behavior. It was so it could be instantly recognized as ChatGPT

There's definitely an analysis on the net somewhere, can't remember the details though.

maybe their version of synth-id? it at least helps me spot gpt images vs gemini's

Colloquially called the urine filter

let's not mince words, it's called the "piss filter"

Not always, it started at a very specific point. Studio Ghibli craze + reinforcement learning on the likes.

The Studio Ghibli craze started with the initial release of images in ChatGPT, and the yellow filter has always existed, even at that time. They did not make changes to the model as a result of RL (until potentially today, with a new model).

That's not how it works; the model doesn't just update in real time based on likes, and besides, it was already yellow upon release.

That's pretty fun.

It's not surprising per se but it does put things in perspective that Texas has a bigger footprint than every country in Europe.


There is a much nicer visual tool that helps you visualize this: https://thetruesize.com/. (It works best on desktop)

You can place a state/country on top of another country and see the true size. Helps to make up for the improper sizing caused by map projections.

I use it to help my lovely Dutch friends realize why I can't just bike to work. :)


Yeah, money machine go brrrr is a great sign of "footprint"; let's just ignore millennia of inventions, technology, and other things coming from Europe, before the US was even a colony. Texas GDP was $x millions last year, clearly a larger footprint on the world :)

It's actually pretty fun and interesting the different bubbles we all live in, for better or worse.


A lot of those discoveries were actually made elsewhere (not the majority, but an embarrassingly significant amount):

https://www.manchester.ac.uk/about/news/indians-predated-new...

https://sd2.org/bibha-chowdhuri-a-woman-of-firsts-with-no-re...

> After the war ended, Cecil Powell, a British physicist, continued the research in England using similar methods with more sensitive plates, detecting a new particle and winning him the Nobel Prize in 1950. Chowdhury and Bose’s work was acknowledged in his book, but their recognition quickly faded.

https://www.theguardian.com/world/article/2024/sep/01/hidden...

https://www.cs.umd.edu/~gasarch/BLOGPAPERS/fibfibs.pdf


> let's just ignore millennia of inventions, technology, and other things coming from Europe, before the US was even a colony

Those people are dead. They did great things. But it's irrelevant to their standing and influence today.


True, like how Silicon Valley should change its name, because Gordon Moore died, so let's forget everything he ever did.

No place in the US has "Silicon Valley" as its formal name.

I meant the geographic footprint. I was surprised by how big Texas is, even though it is famously big.

This post is a mess. The best advice is clear and specific, and this is neither.

The examples are at best loosely related to the points they're supposed to illustrate.

It's honestly so bad that I cynically suspect this post was created solely as a way to promote click3 in the first bullet, and then 4 more bullets were generated to make it a "whole" post.


The five anti-patterns (or remedies, rather):

  1. Don't re-send info you've already sent (be resourceful)
  2. Play to model strengths (e.g. generating an image of text vs generating text in an image, or coding/executing a string counter rather than counting a search string; tiny example below)
  3. Stay aware of declining accuracy as the context window fills up
  4. Don't ask for things it doesn't know (e.g. obscure topics or anything outside the cutoff window)
  5. Be careful with vibe coding
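For point 2, a minimal illustration of the "have it write and run code instead of counting" idea, using the usual strawberry example:

  # Rather than asking the model to count letters directly (which it often
  # gets wrong), ask it to produce and execute a snippet like this:
  text, target = "strawberry", "r"
  print(f"{target!r} appears {text.count(target)} time(s) in {text!r}")  # -> 3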


Maybe this guy has been searching so hard for structural issues with his LLM interaction because his writing is just so so bad.


Yeah, there's not a lot that's actionable here... mostly boils down to "try a lot of stuff yourself and find out what LLMs are good and bad at." Rs in strawberry, generating some specific text in Nano Banana, what knowledge it knows, etc. Don't do those specific things because obviously (?) models are bad at them.


Yeah I was hoping for a lot more from the title.


A 3x price drop almost certainly means Opus 4.5 is a different and smaller base model than Opus 4.1, with more fine tuning to target the benchmarks.

I'll be curious to see how performance compares to Opus 4.1 on the kind of tasks and metrics they're not explicitly targeting, e.g. eqbench.com


Why? They just closed a $13B funding round. Entirely possible that they're selling below-cost to gain marketshare; on their current usage the cloud computing costs shouldn't be too bad, while the benefits of showing continued growth on their frontier models is great. Hell, for all we know they may have priced Opus 4.1 above cost to show positive unit economics to investors, and then drop the price of Opus 4.5 to spur growth so their market position looks better at the next round of funding.


Nobody subsidizes LLM APIs. There is a reason to subsidize free consumer offerings: those users are very sticky, and won't switch unless the alternative is much better.

There might be a reason to subsidize subscriptions, but only if your value is in the app rather than the model.

But for API use, the models are easily substituted, so market share is fleeting. The LLM interface being unstructured plain text makes it simpler to upgrade to a smarter model than it used to be to swap a library or upgrade to a new version of the JVM.

And there is no customer loyalty. Both the users and the middlemen will chase after the best price and performance. The only choice is at the Pareto frontier.

Likewise there is no other long-term gain from getting a short-term API user. You can't train or tune on their inputs, so there is no classic Search network effect either.

And it's not even just about the cost. Any compute they allocate to inference is compute they aren't allocating to training. There is a real opportunity cost there.

I guess your theory of Opus 4.1 having massive margins while Opus 4.5 has slim ones could work. But given how horrible Anthropic's capacity issues have been for much of the year, that seems unlikely as well. Unless the new Opus is actually cheaper to run, where are they getting the compute from for the massive usage spike that seems inevitable?


LLM APIs are more sticky than many other computing APIs. Much of the eng work is in the prompt engineering, and the prompt engineering is pretty specific to the particular LLM you're using. If you randomly swap out the API calls, you'll find you get significantly worse results, because you tuned your prompts to the particular LLM you were using.

It's much more akin to a programming language or platform than a typical data-access API, because the choice of LLM vendor then means that you build a lot of your future product development off the idiosyncrasies of their platform. When you switch, you have to redo much of that work.


No, LLMs really are not more sticky than traditional APIs. Normal APIs are unforgiving in their inputs and rigid in their outputs. No matter how hard you try, Hyrum's Law will get you over and over again. Every migration is an exercise in pain. LLMs are the ultimate adapting, malleable tool. It doesn't matter if you'd carefully tuned your prompt against a specific six months old model. The new model of today is sufficiently smarter that it'll do a better job despite not having been tuned on those specific prompts.

This isn't even theory, we can observe the swings in practice on Openrouter.

If the value was in prompt engineering, people would stick to specific old versions of models, because a new version of a given model might as well be a totally different model. It will behave differently, and will need to be qualified again. But of course only few people stick with the obsolete models. How many applications do you think still use a model released a year ago?


A full migration is not always required these days.

It is possible to write adapters to API interfaces. Many proprietary APIs become de facto standards when competitors start creating those compatibility layers out of the box to convince you it is a drop-in replacement. The S3 API is a good example: every major (and most minor) provider, with the glaring exception of Azure, supports the S3 API out of the box now. The psql wire protocol is another similar example; many databases support it these days.

In the LLM inference world, the OpenAI API spec is becoming that kind of de facto standard.

There are always caveats of course, and switches rarely go without bumps. It depends on what you are using: a few popular, widely and fully supported features, or some niche feature of the API that is likely not properly implemented by a given provider; in the latter case you will get some bugs.

In most cases, bugs in the API interface world are relatively easy to solve, as they can be replicated and logged as exceptions.

In the LLM world there are few "right" answers on inference outputs, so it is a lot harder to catch and replicate bugs in a way that can be fixed without breaking something else. You end up retuning all your workflows for the new model.
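To illustrate how thin that compatibility layer usually is, here's a sketch using the official OpenAI Python client; the base URL, key, and model name are placeholders rather than any specific provider's real values. Against an OpenAI-compatible provider, switching is often just a matter of changing the endpoint and the model id:

  from openai import OpenAI

  # Point the standard OpenAI client at any OpenAI-compatible endpoint.
  client = OpenAI(
      base_url="https://api.example-provider.com/v1",  # placeholder endpoint
      api_key="PROVIDER_API_KEY",                      # that provider's key
  )

  resp = client.chat.completions.create(
      model="some-model-id",                           # placeholder model name
      messages=[{"role": "user", "content": "Hello"}],
  )
  print(resp.choices[0].message.content)

The prompt-tuning caveats from upthread still apply, of course; the wire format being interchangeable doesn't mean the outputs are.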


> But for API use, the models are easily substituted, so market share is fleeting. The LLM interface being unstructured plain text makes it simpler to upgrade to a smarter model than it used to be to swap a library or upgrade to a new version of the JVM.

Agree that the plain text interface (which enables extremely fast user adoption) also makes the product less sticky. I wonder if this is part of the incentive to push for specialized tool calling interfaces / MCP stuff - to engineer more lock in by increasing the model specific surface area.


Eh, I'm testing it now and it seems a bit too fast to be the same size, almost 2x the Tokens Per Second and much lower Time To First Token.

There are other valid reasons for why it might be faster, but faster even while everyone's rushing to try it at launch + a cost decrease leaves me inclined to believe it's a smaller model than past Opus models


It could be a combination of over-provisioning for early users, smaller model and more quantisation.


It does seem too fast to be a huge model, but it also is giving me the vibes of the typical Opus level of intelligence. So who knows.


It's double the speed: 60 t/s vs 30. Combined with the price drop, it's a strong signal that this is a smaller model or a more efficient architecture.


Probably more sparse (MoE) than Opus 4.1. Which isn't a performance killer by itself, but is a major concern. Easy to get it wrong.


We already know distillation works pretty well, so it would definitely make sense that Opus 4.5 is effectively smaller (like someone else said, it could be via MoE or some other technique too).

We know the big labs are chasing efficiency gains where they can.


It seems plausible that it's a similar size model and that the 3x drop is just additional hardware efficiency/lowered margin.


Or just pressure from Gemini 3


Maybe it's AWS Inferentia instead of NVidia GPUs :)


It is considered valuable and worthwhile for a society to educate all of its children/citizens. This means we have to develop systems and techniques to educate all kinds of people, not just the ones who can be dropped off by themselves at a library when they turn five, and picked up again in fifteen years with a PhD.


Sure. People who are self motivated are who will benefit the earliest. If a society values ensuring every single citizen gets a baseline education they can figure out how to get an AI to persuade or trick people into learning better than a human could.


Sure, but the point is that if 5% of students are using AI, then you have to assume that any work done outside the classroom has used AI, because otherwise you're giving a massive advantage to the 5% of students who used AI, right?


> The students remain motivated to learn how to solve problems without AI because they know they will be evaluated without it in class later.

Learning how to prepare for in-class tests and writing exercises is a very particular skillset which I haven't really exercised a lot since I graduated.

Never mind teaching the humanities, for which I think this is a genuine crisis, in class programming exams are basically the same thing as leetcode job interviews, and we all know what a bad proxy those are for "real" development work.


> in class programming exams are basically the same thing as leetcode job interviews, and we all know what a bad proxy those are for "real" development work.

Confusing university learning for "real industry work" is a mistake and we've known it's a mistake for a while. We can have classes which teach what life in industry is like, but assuming that the role of university is to teach people how to fit directly into industry is mistaking the purpose of university and K-12 education as a whole.

Writing long-form prose and essays isn't something I've done in a long time, but I wouldn't say it was wasted effort. Long-form prose forces you to do things that you don't always do when writing emails and powerpoints, and I rely on those skills every day.


There's no mistake there for all the students looking at job listings that treat having a college degree as a hard prerequisite for even being employable.


I use it every day.

Preparing for a test requires understanding what the instructor wants. Concentrate on the wrong thing and you get marked down.

Same applies to working in a corporation. You need to understand what management wants. It’s a core requirement.


Art theft is a pretty cool crime.


Only if it's a proper heist. I don't need more guys just walking in and taking something like they're shoplifting a candy bar. I need guys meticulously planning and executing a theft that dodges the very latest in alarm and anti-theft technology.


Bonus points for any rappelling and using tools that cut circular holes in glass.


"How to steal a million" - a boomerang rather than to screwdriver..


Agreed. If no one uses gymnastics to traverse a laser filled room it's actually pretty lame.


See "The Hot Rock".


Topkapi (1964)


Taking the discussion seriously, a case study of a well-planned heist that culminated in someone walking in at the right time and just taking the thing could actually be pretty interesting.


Right, all of these amateurs wanting to spend all this money on special glass cutting tools, rappelling equipment, bypassing alarms, or even some Ocean's 11 EMP ridiculousness when you just need a ~$10 tool and a big pair of brass ones to pull it off.


No crime should be described as "cool". Adherence is the foundation of a functioning society.

Although you could argue the law is not the best arbiter of morality.


Lots of crimes are cool. Adherence is the foundation of slavery.

Functioning societies need every rule and law tested, and retested continually for suitability.


Rosa Parks did a cool crime


You may want to re-examine your own username.


It was civil disobedience then. What was the point? No idea, but that’s art for you.


There's nothing cool about stealing cultural artifacts and depriving society of the ability to enjoy them.


Arguably high profile thefts increase interest in art and therefore more people enjoy art.

Also artworks can still be enjoyed post-theft through replicas etc.

And if the artwork is returned, as in this case, it's just a big win all round. Creating a new performance artwork in the process.


I mean, compared to arson, sure.

Compared to growing psychedelic mushrooms, I don't think so.

