You can already do what you're looking for by reading the browser cache as new data is cached. That would let you see the site as it was originally loaded, instead of just fetching an updated view from the URL. The on-disk cache layouts for Firefox and Chrome are documented online.
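If you want to poke at this yourself, here's a crude sketch that scans Firefox's cache2 entry files for URL-shaped byte strings rather than properly parsing the documented entry format; the profile path is hypothetical and OS-specific (this one is the usual Linux location):

    #!/usr/bin/env python3
    """Crude sketch: pull URL-like strings out of Firefox's on-disk cache."""
    import re
    from pathlib import Path

    # hypothetical profile dir; on Linux the disk cache lives under ~/.cache
    ENTRIES = Path.home() / ".cache/mozilla/firefox/abc123.default-release/cache2/entries"

    URL_RE = re.compile(rb"https?://[\x21-\x7e]+")

    for entry in ENTRIES.iterdir():
        # scan each cache entry file for URL-shaped keys
        for match in URL_RE.findall(entry.read_bytes()):
            print(entry.name, match.decode("ascii", "replace"))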
They'd probably reject that idea under some bullshit privacy or security excuse; Wayland-like reasoning. It's also why we don't have XUL extensions anymore, and why they'll eventually copy Chrome on that Manifest V3 crap.
Knowing this is the direction things are headed, I have been trying to get Firefox and Google to add a feature that archives your browser history and pipes a stream of it in real time, so that open-source personal AI engines can ingest and index it.
AFAICS this has nothing to do with "open-source personal AI engines".
The recorded history is stored in a SQLite database and is quite trivial to examine[0][1]. A simple script could extract the information and feed it to your indexer of choice; writing such a script isn't a job for a browser engineering team.
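For instance, a minimal sketch against Firefox's places.sqlite; the moz_places table and its url/title/last_visit_date columns are real, but the profile path is hypothetical and varies by OS:

    #!/usr/bin/env python3
    """Sketch: dump Firefox history as JSON lines for an external indexer."""
    import json
    import shutil
    import sqlite3
    import tempfile
    from pathlib import Path

    PROFILE = Path.home() / ".mozilla/firefox/abc123.default-release"  # hypothetical

    def dump_history(profile: Path):
        with tempfile.TemporaryDirectory() as tmp:
            db = Path(tmp) / "places.sqlite"
            # work on a copy; Firefox locks the live database while running
            shutil.copy(profile / "places.sqlite", db)
            con = sqlite3.connect(db)
            # one row per URL; last_visit_date is microseconds since the epoch
            rows = con.execute(
                "SELECT url, title, last_visit_date FROM moz_places "
                "WHERE last_visit_date IS NOT NULL ORDER BY last_visit_date")
            for url, title, ts in rows:
                print(json.dumps({"url": url, "title": title, "visited_us": ts}))
            con.close()

    dump_history(PROFILE)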
The question remains whether the indexer would really benefit from real-time ingestion while browsing.
Due to the dynamic nature of the Web, URLs don't map to what you've seen. If I visit a URL at a certain time, the content I see differs from the content you see, or from what I'd see visiting the same URL later. For example, if we want to know whether the tweets I'm seeing are the same as the tweets you're seeing, and haven't been subtly modified by an AI, how do you do that? In the age of AIs programming people, this will be important.
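Concretely: even a trivial byte-level check shows that a URL doesn't pin down content. This sketch hashes the same page a minute apart; a static page may match, but any dynamic page (feeds, timelines) will diverge, and nothing in the URL tells you how:

    #!/usr/bin/env python3
    """Sketch: the bytes behind a URL are a moving target."""
    import hashlib
    import time
    import urllib.request

    def digest(url: str) -> str:
        with urllib.request.urlopen(url) as resp:
            return hashlib.sha256(resp.read()).hexdigest()

    URL = "https://news.ycombinator.com/"  # stand-in for any dynamic page
    first = digest(URL)
    time.sleep(60)  # revisit a minute later
    second = digest(URL)
    print("unchanged" if first == second else "content changed under the same URL")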
I'm confused: do you want more than the browser history, then? Something like Microsoft's Recall? Browsers currently don't store what they've rendered, and for good reason. I was with you for a sec, but good luck convincing Mozilla to propagate rendered pages to other processes!
I understand GP as wanting to browse normally and have that session's history feed into another indexing process via some IPC mechanism like D-Bus; the channel is meant to carry human-initiated events out of the browser.
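The indexer side of that could look something like this pydbus sketch; to be clear, the org.example.HistoryFeed interface and its PageVisited signal are hypothetical, since no browser emits anything like this today:

    #!/usr/bin/env python3
    """Sketch: an indexer subscribing to a (hypothetical) browser history signal."""
    from pydbus import SessionBus      # pip install pydbus
    from gi.repository import GLib

    def on_page_visited(sender, obj, iface, signal, params):
        url, title = params                 # assumed signal payload: (url, title)
        print(f"index {title!r} at {url}")  # hand off to your indexer here

    bus = SessionBus()
    bus.subscribe(
        iface="org.example.HistoryFeed",    # hypothetical interface name
        signal="PageVisited",
        signal_fired=on_page_visited,
    )
    GLib.MainLoop().run()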
Chrome DevTools MCP, on the other hand, is a browser automation tool: its purpose is to make it trivial to send programmed events and event flows into a browser session.
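For contrast, here's roughly what that automation channel looks like one layer down, speaking the raw Chrome DevTools Protocol that the MCP server builds on; this assumes Chrome was started with --remote-debugging-port=9222 and uses the websocket-client package:

    #!/usr/bin/env python3
    """Sketch: sending a programmed event (a navigation) over raw CDP."""
    import json
    import urllib.request
    import websocket  # pip install websocket-client

    # list debuggable targets and pick the first open page
    targets = json.load(urllib.request.urlopen("http://localhost:9222/json"))
    page = next(t for t in targets if t["type"] == "page")

    ws = websocket.create_connection(page["webSocketDebuggerUrl"])
    ws.send(json.dumps({"id": 1, "method": "Page.navigate",
                        "params": {"url": "https://example.com"}}))
    print(ws.recv())  # acknowledgement for command id 1
    ws.close()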
The universities need to get together and develop their own open-source search engine as an ongoing research project, hosted in a distributed fashion across the universities themselves. They have the expertise and the resources these days to do it, and much of the high-quality content on the public web originates from the universities anyway. It would be like the Library of Alexandria, and not subject to censorship.
There needs to be a browser that archives your browser history and pipes a stream of it in real time so that open-source personal AI engines can ingest it and index it. The future of the Web will be built on this. Google may not do it. Firefox could.
Firefox probably won't suddenly have the best AI, but it could be the only browser that does this. Previous: https://news.ycombinator.com/item?id=46018789