No, it can still be modeled as a finite state machine. Each state just encodes the configuration of your memory. I.e. if you have 8 bits of memory, your state space just encodes 2^8 states for each memory configuration.
Any real-world deterministic thing can be encoded as a FSM if you make your state space big enough, since it by definition there has only a finite number of states.
You could model a specific instance of using your computer this way, but you could not capture the fact that you can execute arbitrary programs with your PC represented as an FSM.
Your computer is strictly more computationally powerful than an FSM or PDA, even though you could represent particular states of your computer this way.
The fact that you can model an arbitrary CFG as an regular language with limited recursion depth does not mean there’s no meaningful distinction between regular languages and CFG.
> you can execute arbitrary programs with your PC represented as an FSM
You cannot execute arbitrary programs with your PC, your PC is limited in how much memory and storage it has access to.
>Your computer is strictly more computationally powerful
The abstract computer is, but _your_ computer is not.
>model an arbitrary CFG as an regular language with limited recursion depth does not mean there’s no meaningful distinction between regular languages and CFG
Yes this I agree. But going back to your argument, claiming that LLMs with a fixed context-window are basically markov chains so they can't do anything useful is reductio ad absurdum in the exact same way as claiming that real-world computers are finite state machines.
A more useful argument on the upper-bound of computational power would be along the lines of circuit complexity I think. But even this does not really matter. An LLM does not need to be turing complete even conceptually. When paired with tool-use, it suffices that the LLM can merely generate programs that are then fed into an interpreter. (And the grammar of turing-complete programming languages can be made simple enough, you can encode Brainfuck in a CFG). So even if an LLM could only ever produce programs with a CFG grammar, the combination of LLM + brainfuck executor would give turing completeness.
I never claimed that. They demonstrate just how powerful Markov chains can be with sophisticated state representations. Obviously LLMs are useful, I have never claimed otherwise.
Additionally, it doesn’t require any logical leaps to understand decoder only LLMs as Markov Chains, they preserve the Markov Property and otherwise be have exactly like them. It’s worth noting that encoder-decoder LLMs do not preserve the Markov property and can not be considered Markov chains.
Edit: I saw that post and at the time was disappointed by how confused the author was about those topics and how they apply to the subject.