
DeepSeek pioneered automatic prefix caching and caches on SSD. SSD reads are so fast compared to LLM inference that I can't think of a reason to waste RAM on it.
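The core trick is simple enough to sketch. A minimal, hypothetical version (not DeepSeek's actual implementation; all names here are made up): key the cache by a hash of the token prefix, persist the KV tensors to disk, and on a hit skip the prefill for the cached prefix.

  import hashlib
  import os
  import pickle

  CACHE_DIR = "/tmp/kv_cache"  # assumed location

  def prefix_key(token_ids: list[int]) -> str:
      # Identical token prefixes hash to the same cache entry.
      return hashlib.sha256(repr(token_ids).encode()).hexdigest()

  def save_kv(token_ids: list[int], kv_tensors) -> None:
      os.makedirs(CACHE_DIR, exist_ok=True)
      with open(os.path.join(CACHE_DIR, prefix_key(token_ids)), "wb") as f:
          pickle.dump(kv_tensors, f)

  def load_kv(token_ids: list[int]):
      path = os.path.join(CACHE_DIR, prefix_key(token_ids))
      if os.path.exists(path):
          with open(path, "rb") as f:
              return pickle.load(f)  # hit: skip prefill for this prefix
      return None  # miss: run the normal prefill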


It's not instantly fast though. The KV cache is probably ~20 GB of VRAM at max context size. That's going to take some time to read from SSD no matter what.
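Back-of-the-envelope, with assumed dimensions (a GQA model in fp16 at 128K context; not any particular model):

  # All dimensions assumed for illustration only.
  layers = 40           # transformer layers
  kv_heads = 8          # KV heads under GQA
  head_dim = 128        # dimension per head
  seq_len = 128 * 1024  # max context length
  bytes_per = 2         # fp16/bf16

  # 2x for keys and values
  kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per
  print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # 20.0 GiB

  ssd_bps = 7e9  # ~PCIe 4.0 NVMe sequential read, bytes/s
  print(f"SSD load: {kv_bytes / ssd_bps:.1f} s")  # ~3.1 s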

TTFT (time to first token) will get slower if you offload the KV cache to SSD.
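Roughly how much slower depends on where the cache sits. A sketch with assumed bandwidths:

  # Time to restore a 20 GiB KV cache from different tiers
  # (bandwidth figures are rough assumptions).
  kv_bytes = 20 * 2**30
  tiers = [("VRAM (HBM, ~2 TB/s)", 2e12),
           ("Host RAM over PCIe (~50 GB/s)", 5e10),
           ("NVMe SSD (~7 GB/s)", 7e9)]
  for name, bw in tiers:
      print(f"{name}: {kv_bytes / bw:.2f} s")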



