Nah, it's all pattern matching. This is how automated theorem provers like Isabelle are built: they apply operations to lemmas/expressions until they reach a proof.
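To make that concrete, here is a toy Lean 4 sketch (my own illustration, not Isabelle's internals): the rw tactic pattern-matches the goal against the statement of a known lemma and rewrites with it until the goal closes.

    -- Toy illustration of proof-by-rewriting: rw unifies the pattern
    -- from Nat.add_comm : ∀ n m, n + m = m + n against the left-hand
    -- side of the goal, rewrites a + b to b + a, and the resulting
    -- b + a = b + a closes by reflexivity.
    example (a b : Nat) : a + b = b + a := by
      rw [Nat.add_comm]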
I'm sure if you pick a sufficiently broad definition of pattern matching your argument is true by definition!
Unfortunately that has nothing to do with the topic of discussion, which is the capabilities of LLMs, and which may require a narrower definition of pattern matching.
This is our earlier work. Since May we've made it really easy for the community to build their own agents: you can now hook up your terminal and have Claude Code play the game.
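For anyone wondering what that hookup looks like, here is a minimal sketch of a stdio MCP server using the MCP Python SDK's FastMCP; the server name and the single stubbed tool below are simplified placeholders rather than the actual FLE server and its tool set.

    # toy_fle_server.py - simplified sketch, not the real FLE MCP server.
    # Claude Code can attach to a stdio MCP server like this one and then
    # call its tools to drive the game from your terminal.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("toy-factorio")

    @mcp.tool()
    def run_console_command(command: str) -> str:
        """Pretend to send a console command to a running Factorio server."""
        # A real implementation would forward the command (e.g. over RCON)
        # and return the game's output to the model.
        return f"(stub) would execute: {command}"

    if __name__ == "__main__":
        mcp.run()  # stdio transport, so an MCP client such as Claude Code can attach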
(just for clarity: links to past threads in no way imply that the new post isn't welcome! They're just because some readers enjoy poking back through past related discussions as well)
I am really keen on plugging into Age of Empires 2 - although practically I think we need a couple of years of improvements before LLMs are smart and fast enough to react to the game in real time. Currently they can't react fast enough, although specially trained networks could be viable.
I'm pretty sure that AI did take at least a few games off the pros. IIRC the professional team only had one win, in the last match.
I do agree that the game was terribly dumbed down to make it tractable. I keep hoping they'll revisit Dota 2 to see if they can find meaningful improvements and tackle the full game.
Yes, the OpenAI Five bots won a best of three in their custom format back in 2019. The bots won the first two games, then a third game was played which the humans won; that was the point I was trying to make (I'm not the GP).
Unless you know of another time the bots were deployed formally against a pro team more recently, which I'd love to hear about.
As I recently commented on Bluesky, I want to write a contemporary choral setting of Spem in Alium (hope in another) but write the title Spem in Allium (hope in garlic) and see if it can make it to publication before anyone notices.
1. These are additions to our existing Factorio Learning Environment, an extensive environment for evaluating pre-trained LLM agents in an unbounded/open-ended setting in the game of Factorio (see the sketch below this list). I don't agree that it is trivial; there is significant infrastructure in place to support Factorio as an LLM eval.
2. Factorio is an unsolved game in multi-agent research.
3. This is a research environment. You can read our paper on arXiv if you're interested! Nobody will make any money off this.
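To give a feel for what evaluating pre-trained LLM agents in an open-ended setting means mechanically, here is a minimal gym-style loop; the class and method names (StubEnv, reset, step) are illustrative stand-ins rather than the actual FLE API.

    # Illustrative agent/environment loop (simplified names, not the FLE API).
    # An "agent" here is any callable mapping an observation string to an
    # action string, e.g. a thin wrapper around an LLM chat completion.

    class StubEnv:
        """Stand-in environment so the loop runs; FLE exposes far richer state."""
        def reset(self) -> str:
            self.steps = 0
            return "inventory: empty; goal: craft an iron gear wheel"

        def step(self, action: str):
            self.steps += 1
            done = "craft" in action or self.steps >= 3
            score = 1.0 if "craft" in action else 0.0
            return f"after '{action}': step {self.steps}", score, done

    def evaluate(agent, env, max_steps: int = 100) -> float:
        """Run one episode and return the last score the environment reports."""
        observation = env.reset()
        score = 0.0
        for _ in range(max_steps):
            action = agent(observation)   # the LLM picks the next action/program
            observation, score, done = env.step(action)
            if done:
                break
        return score

    if __name__ == "__main__":
        print(evaluate(lambda obs: "craft iron-gear-wheel", StubEnv()))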
It's Mart, Neel and Jack from the Factorio Learning Environment team.
Since our initial release, we have been working hard to expand the environment to support multi-agent scenarios, reasoning models and MCP for human-in-the-loop evals.
We have also spent time experimenting with different ways to elicit more performance out of agents in the game, namely tools for vision and reflection.
Today, we are proud to release v0.2.0, which includes several exciting new features and improvements.
This is true - there are simpler benchmarks that can saturate planning ability for these models. We were motivated to create a broader-spectrum eval that tests multiple capabilities at once and remains viable into the future.
That's fair enough, but you should test other frontier model types to see if the benchmark makes sense for them.
For example, the shortest-path benchmark is largely useless when you look at reasoning models - since they have the equivalent of scratch paper to work through their answers, the limitation becomes their context length rather than any innate ability to reason.
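To make that concrete, here is a sketch of what a shortest-path eval item might look like: build a small random weighted graph, compute the ground truth with Dijkstra, and grade the model's numeric answer against it. The generator and prompt format are my own illustration, not any particular published benchmark.

    # Sketch of a shortest-path eval item (illustrative, not a specific benchmark).
    import heapq
    import random

    def random_graph(n=8, extra_p=0.3, seed=0):
        rng = random.Random(seed)
        adj = {v: [] for v in range(n)}
        def add_edge(u, v, w):
            adj[u].append((v, w))
            adj[v].append((u, w))
        for u in range(n - 1):               # spanning path keeps every node reachable
            add_edge(u, u + 1, rng.randint(1, 9))
        for u in range(n):                   # extra edges make the shortest path non-trivial
            for v in range(u + 2, n):
                if rng.random() < extra_p:
                    add_edge(u, v, rng.randint(1, 9))
        return adj

    def dijkstra(adj, src, dst):
        dist = {src: 0}
        heap = [(0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                return d
            if d > dist.get(u, float("inf")):
                continue                     # stale heap entry
            for v, w in adj[u]:
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        return float("inf")

    adj = random_graph()
    edges = ", ".join(f"({u}, {v}, {w})" for u in adj for v, w in adj[u] if u < v)
    prompt = f"Undirected weighted edges (u, v, w): {edges}. What is the length of the shortest path from node 0 to node 7?"
    expected = dijkstra(adj, 0, 7)
    # A reasoning model with a scratchpad can simply run Dijkstra step by step,
    # so the binding constraint becomes context length, not 'innate' reasoning.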