
> So using 2 NVLinked GPU's with inference is not supported?

To make better use of multiple GPUs, we suggest employing a dedicated backend for serving the model. Please refer to https://tabby.tabbyml.com/docs/references/models-http-api/vl... for an example.
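For instance, vLLM (the backend the linked page covers) can shard a model across both NVLinked GPUs with tensor parallelism and expose an OpenAI-compatible endpoint for Tabby to consume. A rough sketch, with the model name and port as placeholders:

    # Serve an OpenAI-compatible endpoint, splitting the model across 2 GPUs.
    # Model name and port are placeholders; adjust to your setup.
    vllm serve Qwen/Qwen2.5-Coder-7B \
      --tensor-parallel-size 2 \
      --port 8000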



I see. So either I can have Tabby be my LLM server with this limitation, or I can turn that feature off and point Tabby at my self-hosted LLM as I would any other OpenAI-compatible endpoint?


Yes - however, the FIM model requires careful configuration to properly set the prompt template.
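In case it helps, here is a rough sketch of what that looks like in Tabby's config.toml, based on my reading of the linked models-http-api reference. The model name, endpoint, and FIM template below are assumptions for a Qwen2.5-Coder model served by vLLM; the sentinel tokens in prompt_template must match whatever FIM format your model was actually trained with.

    # ~/.tabby/config.toml -- sketch only; field names per the models-http-api docs,
    # values are placeholders for a Qwen2.5-Coder model behind a vLLM server.
    [model.completion.http]
    kind = "vllm/completion"
    model_name = "Qwen/Qwen2.5-Coder-7B"
    api_endpoint = "http://localhost:8000/v1"
    api_key = ""
    # FIM sentinels are model-specific; these are the Qwen2.5-Coder ones.
    prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"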



