patpatpat 41 days ago | on: Nvidia Nemotron 3 Family of Models

FWIW, it runs just fine on my 16 GB 9060 XT (AMD) without any tweaks. It's very usable. I asked it to write a prime sieve in C#; it started responding in 0.38 seconds and wrote an implementation at 20 tokens/sec.

Tepix 39 days ago

But you're using a 3rd party quant of unknown quality. Nvidia is only providing weights as BF16 and FP8.

genpfault 41 days ago

Getting ~150 tok/s on an empty context with a 24 GB 7900 XTX via llama.cpp's Vulkan backend.

Tepix 39 days ago

Again, you're using some 3rd party quantisations, not the weights supplied by Nvidia (which don't fit in 24GB).
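
For anyone wanting to sanity-check the "doesn't fit in 24GB" point, here is a rough back-of-the-envelope sketch. The thread never states a parameter count, so `ASSUMED_PARAMS` below is purely a placeholder assumption, as is the 4-bit row standing in for a generic third-party quant. It only counts the weights themselves (no KV cache, activations, or runtime overhead), so real usage would be higher.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the weights alone are smaller than the given VRAM budget."""
    return weight_gb(n_params, bits_per_weight) < vram_gb


# ASSUMPTION: placeholder parameter count, not taken from the thread.
ASSUMED_PARAMS = 30e9

# BF16 and FP8 are the formats mentioned above; the 4-bit row is a
# hypothetical stand-in for a third-party quantisation.
for name, bits in [("BF16", 16), ("FP8", 8), ("4-bit (hypothetical)", 4)]:
    size = weight_gb(ASSUMED_PARAMS, bits)
    print(f"{name}: ~{size:.0f} GB of weights, fits in 24 GB: {fits(ASSUMED_PARAMS, bits, 24)}")
```

Under that assumed size, neither BF16 nor FP8 weights would fit on a 24 GB card, while a roughly 4-bit quant would, which is consistent with the point that the numbers reported above come from quantised builds rather than Nvidia's own weights.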
