
Yeah, I'd like to be able to run it locally. It should fit well on my 12 GB GPU.


The model seems to be based on Qwen2.5-Coder-7B. I currently run a quantized variant of Qwen2.5-Coder-7B locally with llama.cpp, and it fits nicely in the 8 GB of VRAM on my Radeon 7600 (with excellent performance, by the way), so running it locally should be perfectly possible.

I would also only use Zeta locally.
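
For anyone who wants to try the same setup, here's a minimal sketch using llama-cpp-python (the Python bindings for llama.cpp). The file name and settings are assumptions, not my exact configuration; a Q4_K_M quant of a 7B model is roughly 4.5 GB, which is why it fits comfortably in 8 GB of VRAM.

    # Minimal sketch, assuming a local Q4_K_M GGUF of Qwen2.5-Coder-7B.
    # The file name is hypothetical; point model_path at whatever quant you have.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen2.5-coder-7b-q4_k_m.gguf",  # hypothetical path
        n_gpu_layers=-1,  # offload every layer to the GPU
        n_ctx=4096,       # context window; raise it if VRAM allows
    )

    out = llm("def fibonacci(n):", max_tokens=128, stop=["\n\n"])
    print(out["choices"][0]["text"])

With all layers offloaded, the whole model sits in VRAM and generation speed is bound by the GPU rather than system memory bandwidth.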


Are you happy with the speed on your 8 GB GPU?



