The association between pathogens and cancer is under-appreciated, mostly due to limitations in detection methods.
For instance, it is not uncommon for cancer studies to design assays around non-oncogenic strains, or for assays to use primer sequences with binding sites mismatched to a large number of NCBI GenBank genomes.
Another example: studies relying on The Cancer Genome Atlas (TCGA), a rich database for cancer investigations. TCGA made a deliberate tradeoff: it standardized quantification of eukaryotic coding transcripts at the cost of excluding non-poly(A) transcripts like EBER1/2 and other viral non-coding RNAs -- thus potentially understating viral presence.
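As a toy illustration of the primer point (not any study's actual pipeline; in practice you'd run BLAST or Primer-BLAST against GenBank, and the primer and genome strings below are made up):

    # Naive check: how many genomes still contain a binding site for a primer,
    # allowing a small number of mismatches. Reverse complements are ignored
    # for simplicity; real pipelines use alignment tools, not substring scans.
    def binds(primer: str, genome: str, max_mismatches: int = 1) -> bool:
        k = len(primer)
        for i in range(len(genome) - k + 1):
            mismatches = sum(a != b for a, b in zip(primer, genome[i:i + k]))
            if mismatches <= max_mismatches:
                return True
        return False

    primer = "ATGCGTACGTTAGC"  # hypothetical primer
    genomes = {
        "strain_A": "TTATGCGTACGTTAGCAA",  # exact binding site
        "strain_B": "TTATGCGTACGATAGCAA",  # one mismatch
        "strain_C": "TTAAGGCCTTAAGGCCTT",  # no usable site
    }
    hits = [name for name, seq in genomes.items() if binds(primer, seq)]
    print(f"{len(hits)}/{len(genomes)} genomes have a binding site")  # 2/3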
A more accurate title: "Are Cornell Students Meritocratic and Efficiency-Seeking? Evidence from 271 MBA students and 67 Undergraduate Business Students."
This topic is important and the study interesting, but the methods exhibit the same generalizability bias as the famous Dunning-Kruger study.
The referenced MBA students -- and by extension, the elites -- only reflect 271 students across two years, all from the same university.
By analyzing biased samples, we risk misguided discourse on a sensitive subject.
Even Google DeepMind's relabeled MedQA dataset, created for MedGemini in 2024, has flaws.
Many healthcare datasets/benchmarks contain dirty data because accuracy incentives are absent and few annotators are qualified.
We had to pay Stanford MDs to annotate 900 new questions to evaluate frontier models and will release these as open source on Hugging Face for anyone to use. They cover VQA and specialties like neurology, pediatrics, and psychiatry.
If labs want early access, please reach out. (Info in profile.) We are finalizing the dataset format.
Unlike general LLMs, where noise is tolerable and sometimes even desirable, training on incorrect/outdated information may cause clinical errors, misfolded proteins, or drugs with off-target effects.
Complicating matters, shifting medical facts may invalidate training data and model knowledge. What was true last year may be false today. For instance, in April 2024 the U.S. Preventive Services Task Force reversed its longstanding advice and now urges biennial mammograms starting at age 40 -- down from the previous benchmark of 50 -- for average-risk women, citing rising breast-cancer incidence in younger patients.
This is true for every subfield I have been working on for the past 10 years. The dirty secret of ML research is that Sturgeon's law applies to datasets as well - 90% of the data out there is crap. I have seen NLP datasets with hundreds of citations that were obviously worthless as soon as you put the "effort" in and actually looked at the samples.
100% agreed. I also advise you not to read many cancer papers, particularly ones investigating viruses and cancer. You would be horrified.
(To clarify: this is not the fault of scientists. This is a byproduct of a severely broken system with the wrong incentives, which encourages publication of papers and not discovery of truth. Hug cancer researchers. They have accomplished an incredible amount while being handcuffed and tasked with decoding the most complex operating system ever designed.)
> this is not the fault of scientists. This is a byproduct of a severely broken system with the wrong incentives, which encourages publication of papers and not discovery of truth
Are scientists not writing those papers? There may be bad incentives, but scientists are responding to those incentives.
That is axiomatically true, but both harsh and useless, given that (as I understand from HN articles and comments) the choice is "play the publishing game as it is" vs "don't be a scientist anymore".
I agree, but there is an important side-effect of this statement: it's possible to criticize science without criticizing scientists. Or at least without criticizing rank-and-file scientists.
There are many political issues where activists claim "the science has spoken." When critics respond by saying, "the science system is broken and is spitting out garbage", we have to take those claims very seriously.
That doesn't mean the science is wrong. Even though the climate science system is far from perfect, climate change is real and human-made.
On the other hand, some of the science on gender medicine is not as established as medical associations would have us believe (though this might change in a few years). But that doesn't stop reputable science groups from making false claims.
If we’re not going to hold any other sector of the economy personally responsible for responding to incentives, I don’t know why we’d start with scientists. We’ve excused folks working for Palantir around here - is it that the scientists aren’t getting paid enough for selling out, or are we just throwing rocks in glass houses now?
Valid critique, but one addressing a problem above the ML layer, at the human layer. :)
That said, your comment has an implication: in which fields can we trust data if incentives are poor?
For instance, many Alzheimer's papers were undermined after journalists unmasked foundational research as academic fraud. Which conclusions are reliable and which are questionable? Who should decide? Can we design model architectures and training to grapple with this messy reality?
These are hard questions.
ML/AI should help shield future generations of scientists from poor incentives by maximizing experimental transparency and reproducibility.
Apt quote from Supreme Court Justice Louis Brandeis: "Sunlight is the best disinfectant."
Not an answer, but a contributory idea: meta-analysis. There are plenty of strong meta-analyses out there, and one of the things they tend to do is weight the methodological rigour of the papers along with their overlap with the combined question being analyzed. Could we use this weighting explicitly in the training process?
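A rough sketch of what that explicit weighting could look like during training (per-sample cross-entropy scaled by a hypothetical rigor score per source paper; nothing here is an established method, just the mechanics):

    import torch
    import torch.nn.functional as F

    def rigor_weighted_loss(logits, labels, rigor_scores):
        # per-sample loss, scaled by how methodologically rigorous the source paper is
        per_sample = F.cross_entropy(logits, labels, reduction="none")
        weights = rigor_scores / rigor_scores.sum()  # normalize so the scale stays stable
        return (weights * per_sample).sum()

    # dummy batch: 4 samples from 4 papers, 3 classes
    logits = torch.randn(4, 3)                  # in practice: model(inputs)
    labels = torch.tensor([0, 2, 1, 1])
    rigor = torch.tensor([0.9, 0.4, 0.7, 0.2])  # hypothetical appraisal scores in [0, 1]
    print(rigor_weighted_loss(logits, labels, rigor))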
Thanks. This is helpful. Looking forward to more of your thoughts.
Some nuance:
What happens when the methods are outdated/biased? We highlight a potential case in breast cancer in one of our papers.
Worse, who decides?
To reiterate, this isn’t to discourage the idea. The idea is good and should be considered, but doesn’t escape (yet) the core issue of when something becomes a “fact.”
Scientists are responding to the incentives of a) wanting to do science, and b) wanting that science to benefit the public. There was one game in town to do this: the American public grant scheme.
This game is being undermined and destroyed by infamous anti-vaxxer, non-medical expert, non-public-policy expert RFK Jr.[1] The disastrous cuts to the NIH's public grant scheme are likely to amount to $8,200,000,000 ($8.2 billion USD) in terms of years of life lost.[2]
So, should scientists not write those papers? Should they not do science for public benefit? These are the only ways to not respond to the structure of the American public grant scheme. It seems to me that, if we want better outcomes, then we should make incremental progress to the institutions surrounding the public grant scheme. This seems far more sensible than installing Bobby Brainworms to burn it all down.
If you download datasets for classification from Kaggle or CIFAR, or for search ranking from TREC, it is the same. Typically 1-2% of the judgements in that kind of dataset are just wrong, so if you are aiming for the last few points of AUC you have to confront that.
I still want to jump off a bridge whenever someone thinks they can use the twitter post and movie review datasets to train sentiment models for use in completely different contexts.
To elaborate, errors go beyond data and reach into model design. Two simple examples:
1. Nucleotides are a form of tokenization and encode bias. They're not as raw as people assume. For example, classic FASTA treats modified and canonical C as identical, yet the difference may alter gene expression -- akin to "polish" vs. "Polish". (See the toy sketch after this list.)
2. Sickle-cell anemia and other diseases are linked to nucleotide differences. These single nucleotide polymorphisms (SNPs) mean hard attention for DNA matters and single-base resolution is non-negotiable for certain healthcare applications. Latent models have thrived in text-to-image and language, but researchers cannot blindly carry these assumptions into healthcare.
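Toy sketch of point 1, with a made-up extended alphabet where "M" stands for 5-methyl-C (classic FASTA has no such symbol, so the distinction simply vanishes at the tokenization step):

    # Character-level DNA tokenizers: the FASTA alphabet collapses methylated C
    # into plain C, so two functionally different sequences get identical token ids.
    VOCAB_FASTA = {c: i for i, c in enumerate("ACGT")}
    VOCAB_EXTENDED = {c: i for i, c in enumerate("ACGTM")}  # "M" = 5mC (hypothetical symbol)

    def tokenize(seq, vocab):
        # symbols missing from the vocab fall back to canonical C, mimicking
        # how standard FASTA records only the canonical base
        return [vocab.get(base, vocab["C"]) for base in seq]

    canonical = "ACGCT"
    methylated = "ACGMT"  # same sequence, but the fourth base is methylated

    print(tokenize(canonical, VOCAB_FASTA) == tokenize(methylated, VOCAB_FASTA))        # True: difference erased
    print(tokenize(canonical, VOCAB_EXTENDED) == tokenize(methylated, VOCAB_EXTENDED))  # False: difference kept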
There are so many open questions in biomedical AI. In our experience, confronting them has prompted (pun intended) better inductive biases when designing other types of models.
We need way more people thinking about biomedical AI.
> What was true last year may be false today. For instance, ...
Good example of a medical QA dataset shifting but not a good example of a medical "fact" since it is an opinion. Another way to think about shifting medical targets over time would be things like environmental or behavioral risk factors changing.
Anyways, thank you for putting this dataset together; we certainly need more third-party benchmarks with careful annotation. I think it would be wise to segregate tasks between factual observations of data, population-scale opinions (guidelines/recommendations), and individual-scale opinions (prognosis/diagnosis). Ideally there would eventually be some formal taxonomy for this, like OMOP CDM; maybe there already is in some dusty corner of PubMed.
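To make the suggestion concrete, a hypothetical tagging scheme along those lines (field names and categories are mine, not an existing standard):

    from dataclasses import dataclass
    from enum import Enum

    class ClaimType(Enum):
        FACTUAL_OBSERVATION = "factual"    # e.g., lab reference ranges, imaging findings
        POPULATION_OPINION = "guideline"   # e.g., screening recommendations
        INDIVIDUAL_OPINION = "judgement"   # e.g., diagnosis/prognosis for one case

    @dataclass
    class BenchmarkItem:
        question: str
        answer: str
        claim_type: ClaimType
        as_of: str  # date the answer was last verified, since guidelines shift

    item = BenchmarkItem(
        question="At what age does the USPSTF recommend starting biennial mammography?",
        answer="40",
        claim_type=ClaimType.POPULATION_OPINION,
        as_of="2024-04-30",
    )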
What if there is significant disagreement within the medical profession itself? For example, isotretinoin is prescribed for acne in many countries, but in other countries the drug is banned or access is restricted due to adverse side effects.
Wouldn't one approach be to just ensure the system has all the data: relevance to the different systems being addressed, side effects, and legal constraints? Then, when making a recommendation, it can account for all factors, not just prior use cases.
If you agree that ML starts with philosophy, not statistics, this is but one example highlighting how biomedicine helps model development, LLMs included.
Every fact is born an opinion.
This challenge exists in most, if not all, spheres of life.
I think an often overlooked aspect of training data curation is the value of accurate but oblique data. Much of the “emergent capabilities” of LLMs comes from information embedded in the data: implied or inferred semantic information that is not readily obvious. Extracting this highly useful information, in contrast to specific factoids, requires a lot of off-axis images of the problem space, like a CT scan of the field of interest. The value of adjacent, oblique datasets should not be underestimated.
I noticed this when adding citations to wikipedia.
You may find a definition of what a "skyscraper" is, by some hyperfocused association, but you'll get a bias towards a definite measurement like "skyscrapers are buildings between 700m and 3500m tall", which might be useful for some data mining project but is not at all what people mean by it.
The actual definition is not in any one source but in the way the word is used across other sources, like "the Manhattan skyscraper is one of the most iconic skyscrapers". In the aggregate you learn what it is, but that usage isn't very citable on its own, which gives WP that pedantic bias.
Synthetic data generation techniques are increasingly being paired with expert validation to scale high-quality biomedical datasets while reducing annotation burden - especially useful for rare conditions where real-world examples are limited.
I think their question is a good one, and not being taken charitably.
Let's take the medical assistant example.
> Medical assistants are unlicensed, and may only perform basic administrative, clerical and technical supportive services as permitted by law.
If they're labelling data as "tumor" or "not tumor", with any agency over the process, does that fit within their unlicensed scope? Or would that labelling be closer to a diagnosis?
What if the AI is eventually used to diagnose, based on data that was labeled by someone unlicensed? Should there need to be a "chain of trust" of some sort?
I think the answer to liability will be all on the doctor agreeing/disagreeing with the AI...for now.
To answer this, I would think we should consider other cases where someone could practice medicine without legally doing so. For example, could they tutor a student and help them? Go through unknown cases and make judgements, explaining their reasoning? As long as they don't oversell their experience in a way that might be considered fraud, I don't think this would be practicing medicine.
It does open something of a loophole. Oh, I wasn't diagnosing a friend, I was helping him label a case just like his as an educational experience. My completely IANAL guess would be that judges would look on it based on how the person is doing it, primarily if they are receiving any compensation or running it like a business.
But wait... the example the OP was talking about is doing it like a business and likely doesn't have any disclaimers properly sent to the AI, so maybe that doesn't help us decide.
A bit simpler, but if they are training the AI to answer law questions or medical questions (specific to a case, and not general), then that's what I would argue is unlicensed practice.
Of course it's the org and not the individual who would be practicing, as labelling itself is not practicing.
The author is a respected voice in tech and a good proxy of investor mindset, but the LLM claims are wrong.
They are not only unsupported by recent research trends and general patterns in ML and computing, but also by emerging developments in China, which the post even mentions.
Nonetheless, the post is thoughtful and helpful for calibrating investor sentiment.
Agreed. There is deep potential for ML in healthcare. We need more contributors advancing research in this space. One opportunity as people look around: many priors merit reconsideration.
For instance, genomic data that may seem identical may not actually be identical. In classic biological representations (FASTA), canonical cytosine and methylated cytosine are both collapsed into the letter "C" even though differences may spur differential gene expression.
What's the optimal tokenization algorithm and architecture for genomic models? How about protein binding prediction? Unclear!
There are so many open questions in biomedical ML.
The openness-impact ratio is arguably as high in biomedicine as anywhere else: if you help answer some of these questions, you could save lives.
Hopefully, awesome frameworks like this lower barriers and attract more people.
I'd love to hear more of your thoughts re open questions in biomedical ML. You sound like you have a crisp, nuanced grasp of the landscape, which is rare. That would be very helpful to me, as an undergrad in CS (with bio) trying to crystallize research to pursue in bio/ML/GenAI.
Thanks, but no one truly understands biomedicine, let alone biomedical ML.
Feynman's quote -- "A scientist is never certain" -- is apt for biomedical ML.
Context: imagine the human body as the most devilish operating system ever: 10b+ lines of code (more than merely genomics), tight coupling everywhere, zero comments. Oh, and one faulty line may cause death.
Are you more interested in data, ML, or biology (e.g., predicting cancerous mutations or drug toxicology)?
Biomedical data underlies everything and may be the easiest starting point because it's so bad/limited.
We had to pay Stanford doctors to annotate QA questions because existing datasets were so unreliable. (MCQ dataset partially released, full release coming).
For ML, MedGemma from Google DeepMind is open and at the frontier.
Biology mostly requires publishing, but still there are ways to help.
After sharing preferences, I can offer a more targeted path.
ML first, then Bio and Data. Of course, interconnectedness runs high (e.g., I just read about ML for non-random missingness in med records), and data is the foundational bottleneck/need across the board.
More like an alarming anecdote. :) Google did a wonderful job relabeling MedQA, a core benchmark, but even they missed some errors (e.g., question 448 in the test set remains wrong according to Stanford doctors).
For ML, start with MedGemma. It's a great family. 4B is tiny and easy to experiment with. Pick an area and try finetuning.
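If it helps, a minimal LoRA fine-tuning skeleton with Hugging Face transformers/peft/trl might look roughly like this. Everything below is a placeholder sketch: the model id is assumed from the model card (MedGemma is gated, and the 4B checkpoint is multimodal, so check the card for the right model class and chat template), and the dataset path and hyperparameters are made up.

    # Hypothetical LoRA supervised fine-tune; treat ids, paths, and params as placeholders.
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    model_id = "google/medgemma-4b-it"  # assumed HF id; requires accepting the license
    train_ds = load_dataset("json", data_files="my_cases.jsonl", split="train")
    # each jsonl row should carry a "text" field with the prompt + answer already formatted

    trainer = SFTTrainer(
        model=model_id,  # trl loads the model and tokenizer for you
        train_dataset=train_ds,
        peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear",
                               task_type="CAUSAL_LM"),
        args=SFTConfig(
            output_dir="medgemma-lora",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            num_train_epochs=1,
        ),
    )
    trainer.train()
    trainer.save_model("medgemma-lora")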
Note the new image encoder, MedSigLIP, which leverages another cool Google model, SigLIP. It's unclear if MedSigLIP is the right approach (open question!), but it's innovative and worth studying for newcomers. Follow Lucas Beyer, SigLIP's senior author and now at Meta. He'll drop tons of computer vision knowledge (and entertaining takes).
For bio, read 10 papers in a domain of passion (e.g., lung cancer). If you (or AI) can't find one biased/outdated assumption or method, I'll gift a $20 Starbucks gift card. (Ping on Twitter.) This matters because data is downstream of study design, and of course models are downstream of data.
Thank you both for an illuminating thread. Comments were concise, curious, and dense with information. Most notably, there was respectful disagreement and a levelheaded exchange of perspective.
To provide more color on cancers caused by viruses, the World Health Organization (WHO) estimates that 9.9% of all cancers are attributable to viruses [1].
Cancers with established viral etiology or strong association with viruses include: cervical and oropharyngeal cancers (HPV), hepatocellular carcinoma (HBV and HCV), Burkitt lymphoma and nasopharyngeal carcinoma (EBV), Kaposi sarcoma (HHV-8/KSHV), adult T-cell leukemia/lymphoma (HTLV-1), and Merkel cell carcinoma (Merkel cell polyomavirus).
Nvidia (NVDA) generates revenue with hardware, but digs moats with software.
The CUDA moat is widely underappreciated and misunderstood. Dethroning Nvidia demands more than SOTA hardware.
OpenAI, Meta, Google, AWS, AMD, and others have long failed to eliminate the Nvidia tax.
Without diving into the gory details, the simple proof is that billions were spent on inference last year by some of the most sophisticated technology companies in the world.
They had the talent and the incentive to migrate, but didn't.
In particular, OpenAI spent $4 billion, 33% more than on training, yet still ran on NVDA. Google owns leading chips and leading models, and could offer the tech talent to facilitate migrations, yet still cannot cross the CUDA moat and convince many inference customers to switch.
People are desperate to quit their NVDA-tine addiction, but they can't for now.
[Edited to include Google, even though Google owns the chips and the models; h/t @onlyrealcuzzo]
The CUDA moat is largely irrelevant for inference. The code needed for inference is small enough that there are e.g. bare-metal CPU only implementations. That isn't what's limiting people from moving fully off Nvidia for inference. And you'll note almost "everyone" in this game are in the process of developing their own chips.
My company recently switched from A100s to MI300s. I can confidently say that in my line of work, there is no CUDA moat. Onboarding took about a month, but afterwards everything was fine.
Alternatives exist, especially for mature and simple models. The point isn't that Nvidia has 100% market share, but rather that they command the most lucrative segment and none of these big spenders have found a way to quit their Nvidia addiction, despite concerted efforts to do so.
For instance, we experimented with AWS Inferentia briefly, but the value prop wasn't sufficient even for ~2022 computer vision models.
The calculus is even worse for SOTA LLMs.
The more you need to eke out performance gains and ship quickly, the more you depend on CUDA and the deeper the moat becomes.
Google was omitted because they own the hardware and the models, but in retrospect, they represent a proof point nearly as compelling as OpenAI. Thanks for the comment.
Google has leading models operating on leading hardware, backed by sophisticated tech talent who could facilitate migrations, yet Google still cannot leap over the CUDA moat and capture meaningful inference market share.
Yes, training plays a crucial role. This is where companies get shoehorned into the CUDA ecosystem, but if CUDA were not so intertwined with performance and reliability, customers could theoretically switch after training.
Both matter quite a bit. The first-mover advantage obviously rewards OEMs in a first-come, first-serve order, but CUDA itself isn't some light switch that OEMs can flick and get working overnight. Everyone would do it if it was easy, and even Google is struggling to find buy-in for their TPU pods and frameworks.
Short-term value has been dependent on how well Nvidia has responded to burgeoning demands. Long-term value is going to be predicated on the number of Nvidia alternatives that exist, and right now the number is still zero.
It's unclear why this drew downvotes, but to reiterate, the comment merely highlights historical facts about the CUDA moat and deliberately refrains from assertions about NVDA's long-term prospects or that the CUDA moat is unbreachable.
With mature models and minimal CUDA dependencies, migration can be justified, but this does not describe most of the LLM inference market today nor in the past.
Nadella is a superb CEO, inarguably among the best of his generation. He believed in OpenAI when no one else did and deserves acclaim for this brilliant investment.
But his "below them, above them, around them" quote on OpenAI may haunt him in 2025/2026.
OAI or someone else will approach AGI-like capabilities (however nebulous the term), fostering the conditions to contest Microsoft's straitjacket.
Of course, OAI is hemorrhaging cash and may fail to create a sustainable business without GPU credits, but the possibility of OAI escaping Microsoft's grasp grows by the day.
Coupled with research and hardware trends, OAI's product strategy suggests the probability of a sustainable business within 1-3 years is far from certain but also higher than commonly believed.
If OAI becomes a $200b+ independent company, it would be against incredible odds given the intense competition and the Microsoft deal. PG's cannibal quote about Altman feels so apt.
To address the downvotes, this comment isn't guaranteeing OAI's success. It merely notes the remarkably elevated probability of OAI escaping Nadella's grip, which was nearly unfathomable 12 months ago.
Even after breaking free, OAI must still contend with intense competition at multiple layers, including UI, application, infrastructure, and research. Moreover, it may need to battle skilled and powerful incumbents in the enterprise space to sustain revenue growth.
While the outcome remains highly uncertain, the progress since the board fiasco last year is incredible.
Why wouldn't it be? In many ways they've clearly lost the fight. They're much smaller and less supported than the entities they intend to regulate. There is a known revolving door problem between federal and commercial employment. The natural mission of regulating food _and_ drugs is no longer sensible in our current social and political environment.
> The natural mission of regulating food _and_ drugs is no longer sensible in our current social and political environment.
Speaks volumes about the state of the USA given that y'all's regulations on food are so lax that the topic already tanked an agreement with the EU (TTIP) as well as a bilateral agreement with the UK (the one the Brexiteers proclaimed would be possible once Brexit came, but still isn't there).
The most obvious differences are washing eggs, washing chicken carcasses with chlorine and prophylactic (or worse, growth-stimulating) usage of antibiotics. All of that is banned here, but allowed in the US - mostly to mask the horrible sanitary and working conditions in farms and slaughterhouses. I'm not going to act like European slaughterhouses are paradises because they are everything but that, but nowhere near the levels of horror from the US.
When even regulation to prevent the worst of the worst isn't feasible any more, frankly I'd say your system has failed entirely.
Ag lobbies are one thing (and they're pretty problematic as well, not shying away from extortion and some IMHO are even bordering on terrorism), but rest assured our populations absolutely and vocally do not want chlorinated chickens, nor do we want GMO food.
This is a good question and encapsulates the challenges of food and drug regulation.
Yes and no. At certain concentrations, many safe compounds become dangerous in humans.
Even at tiny doses, foods like peanuts may be safe for the vast majority yet lethal for a minority.
Given how devilishly heterogeneous the human race is, the ideal solution provides safety testing at the individual level, not the population level [0]. But this is years away until computational and biological breakthroughs arrive.
[0] Population level is a misnomer. FDA trial sizes below.
Keep in mind that very commonly accepted safe substances such as carrots or water also become lethal at high enough doses, for all humans. It's often stated, and it can be tiring to hear, but the dose really does make the poison.
And context matters. Folks with this or that kidney problem can die from basically nothing, while the median bloke doesn't even notice that their piss is a tad darker.
That's an excellent point, and of course individuals can have pretty stark differences in metabolism, with the alcohol flush reaction being a great and highly visible example.
At a high enough dose, sure. But study after study over the last decade has shown coffee to have a positive effect on everything from diabetes to Alzheimer's.
Not for me. Coffee, anything caffeinated actually, makes me sick like I have the flu. The older I get, the more sensitive I become. I can't even eat chocolate anymore because of the caffeine content.
Also, I don't think most people drink more than 2 cups of coffee per day -- if I know someone who drinks more than 2 or 3 I'd think they have a problem.
I'm not a medical professional, but my understanding is that at least for the anti-Alzheimer's effect, it is due to the caffeine, specifically the effect it has on dilating capillaries in the brain.
Genuine question: does that mean when people die after consuming way too much caffeine, it's not because of "caffeine toxicity" per se, but because the effects of the caffeine put too much strain on their body?
Nobody is dying from the caffeine in coffee (like 50-100mg per cup). According to the NIH, LD50 of caffeine is 150-200mg/kg (so say 10g for a small person). That's like 100 cups of coffee. Even with espresso that's hard to imagine.
It would need to be in powder form or concentrated in something far beyond natural levels of coffee/tea/matcha/etc.
1 cup filter coffee can be 170mg or more. And LD50 isn’t really relevant here, even LD1 levels are deadly to hundreds of millions of people. It’s entirely possible for what some might consider a “normal” amount of coffee to be deadly to many. See other comment for espresso calculations.
LD50 is (an estimate) of the 50th percentile (i.e., 50% chance of dying), but that doesn't mean it's linear. It _certainly_ doesn't mean that 1% of people will die at 2% of that value, which I think is what you're implying.
The lowest example of a lethal dose I can find in the literature is 57mg/kg. Caffeine overdoses are so rare that we don't know the true distribution, but it's clearly not the case that millions of people will die from a few coffees.
Your other comment calculated the lethal dose as *a gallon of espresso*. That's like 125 shots. That is not a remotely normal amount of coffee. It would take multiple people over an hour to make that much espresso for you.
---
Edit: I can't reply, but "LD1" isn't a group of people and you can't just claim it's 1% of the population. LD50 doesn't imply anything about the population distribution or how it varies by person. It refers to a particular experimental set up (or estimate from a natural experiment) in which 50% of the subjects died after a certain dosage.
For example, the LD50 of falling is ~50ft. Some people will be more susceptible to dying by falling a certain distance than others, but there are many other factors involved and it makes no sense to say someone is in 1% of falling-death-probability.
I agree that LD50 doesn't tell you everything you'd want to know, like the lowest possible dose that might kill someone. There might be people who are extremely sensitive to a substance, or situations in which it's particularly dangerous (in combination with other substances or another health condition, for example). For something safe and widely used like caffeine, I'd expect that the vast majority of people would experience roughly similar toxicity (say, within 2x of the median) with a tiny population of outliers; but you can't just assume that there's 1% of the population that's drastically more sensitive.
That’s not what I was implying at all I have no clue how you arrived at that. I’m saying an LD1 does exist – it’s the dose that would be fatal to 1% of a population (and further a LD0.1 and 0.0001 exist). These doses are lower than the LD50, fatal to millions, and approach what some would consider normal. For instance: https://www.nbcnews.com/news/amp/ncna759716
The cooldown period built into HN is there for a reason: taking time to reflect on messages and do any necessary background research makes for better discussions than impulsively saying whatever is top of mind. I suggest you use this time to understand what an LD50 actually is and how the concept generalizes.
(cc @dang, seeing this growing trend of people misunderstanding the missing reply button and evading the timer via edits, perhaps UI affordances could be developed to better introduce folks to the feature?)
"It's entirely possible" isn't the way to think about estimating risk because it assumes the risk goal is zero (ie any risk > 0 means the outcome is "possible"). A dose greater than LD50 means "more probable than not" of dying, absent additional information, which is a more appropriate framing.
Similarly with caffeine content "can be". All kinds of variables like roasting time affect the dose. But the semi-standardized dose for a cup of coffee is about 100mg. Related to your link, that was a much larger cup of coffee for comparison. If you normalize it to the standard coffee size, it comes to 100mg caffeine, so right in line with what would be expected.
Classic Bayesian error. The population in question here isn’t the globe, but rather people who have already died from caffeine-related causes. Naturally the rate of increased caffeine sensitivity amongst those folks will be different from the population at large.
I never said random selection from the global population. The point still holds, even if sampling from only those who have died from caffeine: the LD1 dose, by definition, is still safe for almost all of the population. That’s why arguing about LD1 or LD.0001 isn’t particularly useful and comes across as overly pedantic.
Also, FWIW the LD50 can be calculated with censored populations (ie not all subjects have died.) Think about it: if I administer a dose that kills half but leaves the other half living, the LD50 remains unchanged even if I continue increasing the dosage until all have died (or not). LD50 does not require a complete set.
LD50 for espresso is roughly 1 gallon per 50kg body mass. I wouldn’t want to, but I could drink 2 gallons of water without significant issue. If we accept that some people will naturally have a lower tolerance (and that espresso isn’t the strongest drink in the world), it’s not hard to see a caffeine overdose itself being fatal.
(based on 36ml espresso having 110mg caffeine, LD50 caffeine is 150-200mg/kg)
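Back-of-envelope check of that claim (hypothetical 50 kg person, using the numbers above):

    body_kg = 50
    mg_per_shot, ml_per_shot = 110, 36
    for ld50_mg_per_kg in (150, 200):
        lethal_mg = ld50_mg_per_kg * body_kg
        shots = lethal_mg / mg_per_shot
        liters = shots * ml_per_shot / 1000
        print(f"{ld50_mg_per_kg} mg/kg -> {shots:.0f} shots, {liters:.2f} L (~{liters / 3.785:.2f} US gal)")
    # prints roughly 68 shots / 2.45 L (0.65 gal) and 91 shots / 3.27 L (0.86 gal),
    # so "roughly 1 gallon per 50 kg" is the right order of magnitude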
Let’s go back to the OP, which asked about coffee. A quick search shows the LD50 for coffee is about 118 cups. At 6 oz per cup, that’s roughly 21 liters. The LD50 for water is listed as 6 liters (below what you’d drink “without significant issue” btw). So someone is much more likely to reach the LD50 for water well before caffeine when drinking coffee.
Are there other caffeine delivery mechanisms that differ? Of course, but that’s not what the OP asked. The question was about the toxicity of coffee. That’s why it’s not worth arguing when something like caffeine powder accounts for the majority of ODs. Likewise, there’s going to be variation in toxicity between individuals, but those numbers are intended to generalize to a population.
Panera used to sell a drink that contained close to the FDA maximum recommended daily quantity of caffeine, and also allowed free refills. Several people died, and sales were halted after some wrongful death lawsuits.
This isn't to say that caffeine is dangerous. Danger isn't an intrinsic property of a substance but rather an emergent property of the context in which it is used. (This is why the schedules of the Controlled Substance Act are inherently stupid.)
Yes, but I get the impression that a lot of people have a limit on the number of cups they can drink per day (usually below 5), after which they get various symptoms.
Enjoy the rabbit hole. :)