Computer use is the most important AI benchmark to watch if you're trying to for...

poopiokaka · 2025-10-08T01:47:59 1759888079

Not the current benchmarks, no. The demos in this post are so slow. Between writing the prompt, waiting a long time, and checking the work, I’d just rather do it myself.

panarky · 2025-10-08T03:29:04 1759894144

It's not about being faster than you.

It's about working independently while you do other things.

ssl-3 · 2025-10-08T04:10:24 1759896624

And it's a neat-enough idea for repetitive tasks.

For instance: I do periodic database-level backups of a very closed-source system at work. It doesn't take much of my time, but it's annoying in its simplicity: Run this GUI Windows program, click these things, select this folder, and push the go button. The backup takes as long as it takes, and then I look for obvious signs of either completion or error on the screen sometime later.

With something like this "Computer Use" model, I can automate that process.

It doesn't matter to anyone at all whether it takes 30 seconds or 30 minutes to walk through the steps: It can be done while I'm asleep or on vacation or whatever.

I can keep tabs on it with some combination of manual and automatic review, just like I would be doing if I hired a real human to do this job on my behalf.

(Yeah, yeah. There's tons of other ways to back up and restore computer data. But this is the One, True Way that is recoverable on a blank slate in a fashion that is supported by the manufacturer. I don't get to go off-script and invent a new method here.

But a screen-reading button-clicker? Sure. I can jive with that and keep an eye on it from time to time, just as I would be doing if I hired a person to do it for me.)

thewebguyd · 2025-10-08T14:30:03 1759933803

Have you tried AutoHotkey for that? It can do GUI automation. It's not an LLM, but you can pre-record mouse movements and clicks. I've used it a ton to automate old Windows apps.

ssl-3 · 2025-10-08T18:48:52 1759949332

I've tried it previously, and I've also given up on it. I may try it again at some point.

It is worth noting that I am terrible at writing anything resembling "code" on my own. I can generally read it and follow it and understand how it does what it does, why it does that thing, and often spot when it does something that is either very stupid or very clever (or sometimes both), but producing it on a blank canvas has always been something of a quagmire from which I have been unable to escape once I tread into it.

But I can think through abstract processes of various complexities in tiny little steps, and I can also describe those steps very well in English.

Thus, it is without any sense of regret or shame that I say that the LLM era has been a boon for me in terms of the things I've been able to accomplish with a computer...and that it is primarily the natural-language instructional input of this LLM "Computer Use" model that I find rather enticing.

(I'd connect the dots and use the fluencies I do have to get the bot to write a functional AHK script, but that sounds like more work than the reward of solving this periodic annoyance is worth.)

redman25 · 2025-10-08T13:53:26 1759931606

They could literally run 24/7 overnight, assuming they eventually become good enough to not need hand-holding.