LLMs are a key enabling technology for extracting real insights from the enormous amount of surveillance data the USA captures. I think it's not an overstatement to say we are entering a new era here!
Previously, the data may have been collected, but there was so much of it that, in practice, no one was "looking" at most of it. Now it can all be looked at.
Imagine PRISM, but all intercepted communications are then fed into automatic sentiment analysis by a hierarchy of models. The first pass is done by very basic and very fast models with a high error rate, but which are specifically trained to minimize false negatives (at the expense of false positives). Anything that is flagged in that pass gets fed to some larger models that can reason about the specifics better. And so on, until at last the remaining content is fed into SOTA LLMs that can infer things from very subtle clues.
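A minimal sketch of that cascade, with hypothetical `cheap_model` and `frontier_model` functions standing in for the real classifiers:

```python
def cheap_model(msg: str) -> float:
    # Fast, crude scorer with a high error rate; in practice a small trained classifier.
    return 0.9 if "bomb" in msg else 0.02

def frontier_model(msg: str) -> bool:
    # Slow, expensive reasoner that can weigh context; in practice a SOTA LLM.
    return "bomb" in msg and "movie" not in msg

# Threshold tuned low: over-flagging is acceptable, misses are not
# (minimize false negatives at the expense of false positives).
FLAG_THRESHOLD = 0.03

def triage(messages):
    # Pass 1: keep anything the cheap model cannot confidently clear.
    flagged = [m for m in messages if cheap_model(m) > FLAG_THRESHOLD]
    # Final pass: the expensive model adjudicates the survivors.
    return [m for m in flagged if frontier_model(m)]

print(triage(["lunch at noon?", "the bomb scene in that movie", "how to build a bomb"]))
# -> ['how to build a bomb']
```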
With that, a full-fledged panopticon becomes technically feasible for all unencrypted comms, so long as you have enough money to cover the compute costs. Which the US government most certainly does.
I expect attempts to ban encryption to intensify going forward, now that it is a direct impediment to the efficiency of such a system.
Yep, and that's assuming it is tuned to be reactive rather than tuned to proactively build cases against people, which is something that has been politically convenient in the past
> If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.
>
> - Cardinal Richelieu
and which the Vance / Bannon / Posobiec arm of the current administration seems quite keen on, probably as a next step once they are done spending the $170B they just won to build out their partisan enforcement apparatus.
So what actions represent our duty to resist?
* End-to-end encryption (has downsides with regard to convenience; see the sketch after this list)
* Legislation (very difficult to achieve, and can be ignored without the user having a way to verify)
* Market choices (i.e., doing business only with providers who refrain from profiteering from illicit surveillance)
* Creating open-weight models and implementations which are superior (and thus forcing states and other malicious actors to rely on the same tooling as everyone else)
* Teaching LLMs the value of peace and the degree to which it enjoys consensus across societies and philosophies. This of course requires engineering what is essentially the entire corpus of public internet communications to echo this sentiment (which sounds unrealistic, but perhaps in a way we're achieving this without trying?)
* Wholesale deprecation of legacy states (seems inevitable, but still possibly centuries off)
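On the first item: the core property of end-to-end encryption is that private keys never leave the endpoints. A minimal sketch using PyNaCl (keys generated in-process for brevity; a real deployment would also need key verification and forward secrecy):

```python
from nacl.public import PrivateKey, Box

# Each party generates a keypair; only public keys ever leave the device.
alice_key = PrivateKey.generate()
bob_key = PrivateKey.generate()

# Alice encrypts for Bob using her private key and Bob's public key.
alice_box = Box(alice_key, bob_key.public_key)
ciphertext = alice_box.encrypt(b"meet at noon")

# Bob decrypts with his private key and Alice's public key; no intermediary
# carrying only the ciphertext can read the message.
bob_box = Box(bob_key, alice_key.public_key)
assert bob_box.decrypt(ciphertext) == b"meet at noon"
```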
NLP was a thing decades before LLMs and deep learning. If anything, LLMs are a crazily inefficient and costly way to get at it. I really doubt this has anything to do with scaling.
LLMs are unbelievably effective at NLP. Most NLP before that was pretty bad; the only good example I can think of is Alexa, and it was restricted to English.
People pointing out NLP are missing the point: pulling together and crafting rules to run effective NLP is time-consuming and technical. With an LLM you can just ask it for exactly what you want and it interprets. That's the value; and as this deal just proved, it's worth the scaling costs.
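To make that concrete, here is a minimal sketch contrasting the two approaches; the OpenAI client usage and model name are illustrative assumptions, and any chat-completion API works the same way:

```python
import re
from openai import OpenAI  # assumes `pip install openai` and an API key in the environment

text = "Wire $4,500 to the Zurich account before Friday, then delete this."

# Rule-based NLP: every field needs a hand-crafted pattern, maintained forever.
amount = re.search(r"\$[\d,]+", text)
deadline = re.search(r"before (\w+)", text)
print(amount.group(), deadline.group(1))

# LLM-based NLP: describe what you want in plain language instead.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": f"Extract the amount, destination, and deadline "
                   f"from this message as JSON: {text}",
    }],
)
print(response.choices[0].message.content)
```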
The point that is missed isn't about LLMs' adequacy as an NLP technique; it's that they cost you 10,000 times more for the same effect (after the upfront set-up), which is why I have my doubts that they will be used at scale, at the center of some large data-ingestion pipeline. The benefit will probably be for out-of-the-ordinary tasks and outliers.
LLMs make counting mistakes, like forgetting the number of columns halfway through. I won't say "much like humans", since that will probably trigger some. But the general tendency of LLMs to be "bad at counting" (this includes computing) is resolved by having them produce programs that do the counting and executing those programs instead. The LLM systems that do that today are called agentic.
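A minimal sketch of that pattern, with a hypothetical `llm()` helper standing in for a real model call that returns Python source:

```python
def llm(prompt: str) -> str:
    # Stand-in for a model call; a capable model would emit code like this
    # rather than guessing the count token-by-token.
    return "result = sum(1 for row in rows if len(row) != len(rows[0]))"

rows = [["a", "b", "c"], ["d", "e"], ["f", "g", "h"]]

# Execute the generated program instead of trusting the model's own count.
namespace = {"rows": rows}
exec(llm("Count rows whose column count differs from the header"), namespace)
print(namespace["result"])  # 1, computed by code, not by token prediction
```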
The reality is that for any meaningful work automation, the currently available tooling does not meet that expectation.
And 99% of us do not have the capability or the knowledge to build these SOTA models, which is why (a) we are not at OpenAI making 10M+ TC, and (b) we are application developers using off-the-shelf technology to build products and services.
As such, we have real world experience with these technologies.
BTW, I use AI heavily every day in Cursor and whatever else.
This is even more terrifying: imagine an AI making up all sorts of "facts" about you that put you on a watch list, resulting in an endless life of harassment by the government.
And what recourse do you have as a citizen? Next to none.
LLMs don't make for a particularly good database, though. The "compression" isn't very efficient when you consider that e.g. the entirety of Wikipedia - with images! - is an order of magnitude smaller than a SOTA LLM. There are no known reliable mechanisms to deal with hallucinations, either.
So, no, LLMs aren't going to replace databases. They are going to replace query systems over those databases. Think more along the lines of Deep Research etc., just with internal classified data sources.
You're right, "subsume" would be a better word here. Although vector search is also a thing that I feel should be in the AI bucket. Especially given that SOTA embedding models are increasingly based on general-purpose LLMs.
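For the curious, a minimal sketch of the vector-search mechanics; `embed()` here is a toy character-frequency stand-in (a real system would call an embedding model, which is what makes semantically similar texts land near each other):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in: bucket characters into a fixed-size vector and normalize.
    vec = np.zeros(64)
    for ch in text.lower():
        vec[ord(ch) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

docs = ["wire transfer to Zurich", "recipe for banana bread", "meet at the safe house"]
index = np.stack([embed(d) for d in docs])

# Query = nearest neighbour by cosine similarity (vectors are unit-normalized,
# so a dot product is the cosine).
query = embed("money sent abroad")
scores = index @ query
print(docs[int(np.argmax(scores))])
```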
Aren't they complete trash as a database? "Show me people who have googled 'Homemade Bomb' in the last 30 days". For returning bulk data in a sane format they are terrible.
If their job was to process incoming data into a structured form, I could see them being useful, but holy cow, it will be expensive to run all the garbage they pick up via surveillance through an AI in real time.
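If that split happened, it might look like the sketch below: a hypothetical `extract()` wraps the expensive model call once per message, and bulk retrieval stays a cheap query against an ordinary database:

```python
import json, sqlite3

def extract(message: str) -> str:
    # Stand-in for an LLM call returning structured JSON; a real model
    # would classify far more subtly (and far more expensively).
    topic = "explosives" if "bomb" in message else "benign"
    return json.dumps({"sender": "unknown", "topic": topic, "text": message})

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE intercepts (sender TEXT, topic TEXT, text TEXT)")

for msg in ["homemade bomb instructions", "cat pictures"]:
    row = json.loads(extract(msg))
    db.execute("INSERT INTO intercepts VALUES (?, ?, ?)",
               (row["sender"], row["topic"], row["text"]))

# Bulk retrieval is a plain SQL query, not a model call.
print(db.execute("SELECT text FROM intercepts WHERE topic = 'explosives'").fetchall())
```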