
It has to be addressed architecturally with some sort of extension to transformers that can focus the attention on just the relevant context.

People have tried to expand context windows by reducing the O(n^2) attention mechanism to something sparser, and it tends to perform very poorly. It will take a fundamental architectural change.
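For concreteness, here's a minimal NumPy sketch of the kind of sparsification being described (illustrative only, not any particular paper's method): a sliding-window mask limits each token to a local neighborhood, cutting the number of attended pairs from O(n^2) to roughly O(n*w).

    import numpy as np

    def attention(q, k, v, mask=None):
        # Scaled dot-product attention: the score matrix is O(n^2) in sequence length.
        scores = q @ k.T / np.sqrt(q.shape[-1])
        if mask is not None:
            scores = np.where(mask, scores, -1e9)  # block disallowed pairs
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    def sliding_window_mask(n, window=4):
        # Each position attends only to itself and `window` previous tokens,
        # so allowed pairs grow as O(n * window) instead of O(n^2).
        i = np.arange(n)[:, None]
        j = np.arange(n)[None, :]
        return (j <= i) & (j >= i - window)

    n, d = 16, 8
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    dense = attention(q, k, v)                           # full attention
    sparse = attention(q, k, v, sliding_window_mask(n))  # sparse variant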



I'm not an expert, but it seems fairly reasonable to me that a hierarchical model would be needed to approach what humans can do, since that's basically how we process information as well.

That is, humans usually don't store exactly what was written in a sentence five paragraphs ago, but rather the concept or idea conveyed. If we need the details, we go back and reread.

And when we write or talk, we first form an overall thought about what to say, then break it into pieces and order the pieces somewhat logically, before finally forming the words that make up sentences for each piece.

From what I can see there's work on this, such as [1] and the more recent [2]. Again, I'm not an expert, so I can't comment on the quality of the references; they're just some I found.

[1]: https://aclanthology.org/2022.findings-naacl.117/

[2]: https://aclanthology.org/2025.naacl-long.410/
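To make the hierarchical idea above concrete, here's a toy sketch (my own illustration, not the method from either reference): each chunk of tokens is pooled into a single "concept" vector, a query attends over those few summaries, and only the most relevant chunk is "reread" at full detail.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attend(q, k, v):
        return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

    rng = np.random.default_rng(0)
    # Token embeddings grouped into "chunks" (sentences/paragraphs).
    chunks = [rng.standard_normal((rng.integers(5, 12), 16)) for _ in range(6)]

    # Level 1: compress each chunk into one summary vector (mean pooling here),
    # mirroring "store the idea conveyed, not the exact sentence".
    summaries = np.stack([c.mean(axis=0) for c in chunks])

    # Level 2: the query attends over the short list of summaries instead of
    # every token; only the chunk it weights most gets reread in full.
    query = rng.standard_normal((1, 16))
    weights = softmax(query @ summaries.T / np.sqrt(16))
    top_chunk = int(weights.argmax())            # "go back and reread" this one
    detail = attend(query, chunks[top_chunk], chunks[top_chunk])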


>extension to transformers that can focus the attention on just the relevant context.

That is what transformer attention does in the first place, so you would just be stacking two transformers.
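As a quick illustration of that point: the softmax over query-key similarities is exactly the mechanism that weights "relevant" context more heavily (toy NumPy example, illustrative only).

    import numpy as np

    def attention_weights(q, K):
        # Softmax over query-key similarities: this is the built-in
        # "focus on relevant context" of standard attention.
        scores = K @ q / np.sqrt(q.shape[0])
        e = np.exp(scores - scores.max())
        return e / e.sum()

    rng = np.random.default_rng(1)
    K = rng.standard_normal((5, 8))          # 5 context tokens
    q = K[2] + 0.1 * rng.standard_normal(8)  # query resembles token 2
    print(attention_weights(q, K))           # weight concentrates on index 2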


Can one instruct an LLM to pick the parts of the context that will be relevant going forward? And then discard the existing context, replacing it with the new 'summary'?
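People do this in practice (often called context compaction or summarization). A rough sketch of the loop, where call_llm is a hypothetical placeholder for whatever completion API is actually being used:

    def call_llm(prompt: str) -> str:
        # Hypothetical placeholder: wire up the real model/API here.
        raise NotImplementedError

    def compact(context: str, max_chars: int = 8000) -> str:
        """If the context is too long, replace it with an LLM-written summary
        of the parts likely to matter going forward."""
        if len(context) <= max_chars:
            return context
        return call_llm(
            "Summarize the conversation below, keeping only the facts, "
            "decisions, and open questions that will matter later:\n\n" + context
        )

    # Usage: after each turn, carry forward the compacted context instead of
    # the full transcript.
    # context = compact(context + "\nUser: ...\nAssistant: ...")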



