Hacker News
patpatpat, 41 days ago, on: Nvidia Nemotron 3 Family of Models
FWIW, it runs just fine on my 9060 XT (AMD) 16 GB without any tweaks. It's very usable. I asked it to write a prime sieve in C#; it started responding in 0.38 seconds and wrote an implementation at 20 tokens/sec.
Tepix, 39 days ago
But you're using a third-party quant of unknown quality. Nvidia only provides the weights as BF16 and FP8.
genpfault, 41 days ago
Getting ~150 tok/s on an empty context with a 24 GB 7900 XTX via llama.cpp's Vulkan backend.
Tepix, 39 days ago
Again, you're using third-party quantisations, not the weights supplied by Nvidia (which don't fit in 24 GB).
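To make the "doesn't fit in 24 GB" point concrete, here is a rough back-of-the-envelope sketch of weight storage at different precisions. The parameter count (30 billion) is an assumption for illustration only and is not stated anywhere in this thread; the estimate also ignores KV cache, activations, and per-format overhead, so real usage would be higher.

```python
# Rough weight-memory estimate: parameters * bits per weight / 8 bytes.
# ASSUMED_PARAMS is a hypothetical figure for illustration, not taken from the thread.
GB = 1e9  # decimal gigabytes


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (ignores KV cache and activations)."""
    return n_params * bits_per_weight / 8 / GB


def fits(n_params: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the weights alone would fit in the given amount of VRAM."""
    return weight_gb(n_params, bits_per_weight) <= vram_gb


ASSUMED_PARAMS = 30e9  # assumption, see note above

for name, bits in [("BF16", 16), ("FP8", 8), ("4-bit quant", 4)]:
    size = weight_gb(ASSUMED_PARAMS, bits)
    print(f"{name}: {size:.0f} GB, fits in 24 GB: {fits(ASSUMED_PARAMS, bits, 24)}")
```

Under that assumed size, only something around 4 bits per weight would leave room on a 16 GB or 24 GB card, which would explain why the speeds reported above come from third-party quants rather than the BF16 or FP8 weights.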