
How much more work is it to get those up and running?


Almost none if you already have Python installed. Download exl2 and exui from GitHub and run a few terminal commands. This lets me run 120B-parameter models, which won't fit in my VRAM if I use llama.cpp.
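
For a sense of what the exl2 backend looks like once it's set up, here's a minimal sketch based on the exllamav2 example scripts -- class and method names may differ slightly between versions, and the model path is just a placeholder:

  from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
  from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

  # Point this at a local exl2 quant (placeholder path)
  config = ExLlamaV2Config()
  config.model_dir = "/models/goliath-120b-exl2-3bpw"
  config.prepare()

  model = ExLlamaV2(config)
  cache = ExLlamaV2Cache(model, lazy=True)   # allocate cache as the model loads
  model.load_autosplit(cache)                # split layers across available GPUs
  tokenizer = ExLlamaV2Tokenizer(config)

  generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
  settings = ExLlamaV2Sampler.Settings()
  settings.temperature = 0.8

  print(generator.generate_simple("The quick brown fox", settings, 100))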


Wait, so using large models isn't limited by VRAM anymore?


It is. I have 48 GB of VRAM. But exl2 is more efficient, and it supports fractional bits per weight, so you can run quantizations like 4.75 bpw, etc.

I can run 120B models at 3 bpw.

Larger models like this are also less affected by quantization (the increase in perplexity is smaller).
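
For a rough sense of why those numbers work out, here's a back-of-envelope estimate for the weights alone (KV cache, activations and overhead add a few more GB on top):

  def approx_weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
      """Approximate VRAM for the weights alone, in decimal GB.
      Ignores KV cache, activations and per-tensor overhead."""
      return params_billion * 1e9 * bits_per_weight / 8 / 1e9

  for bpw in (4.75, 3.0):
      print(f"120B at {bpw} bpw -> ~{approx_weight_vram_gb(120, bpw):.0f} GB")

  # 120B at 4.75 bpw -> ~71 GB  (won't fit in 48 GB)
  # 120B at 3.0 bpw  -> ~45 GB  (just squeezes into 48 GB, with little left for context)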


Did you have to quantize it yourself to 4.75bpw and 3bpw or are they readily available for download?


Most of the time they're readily available, e.g.:

Panchovix/goliath-120b-exl2 (there's a different branch for each size)

Some of them I've had to do myself, e.g. I wanted a Q2 GGUF of Falcon 180B.

There's a user on Hugging Face called "TheBloke" who does GGUF, AWQ and GPTQ quants for most models. For exl2, you can usually just search for "exl2" and find them.
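
If you want to script the download, huggingface_hub can pull a specific branch -- something like this (the branch name "3bpw" is a guess, check the repo's branch list for the real names):

  from huggingface_hub import snapshot_download

  # Each quant size sits on its own branch, selected via `revision=`.
  # "3bpw" is a guessed branch name -- check the repo for the actual ones.
  path = snapshot_download(
      repo_id="Panchovix/goliath-120b-exl2",
      revision="3bpw",
  )
  print("Downloaded to:", path)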


Thanks, friend!



