There are lots of third-party providers that will host your fine-tuned model for you, and just charge per token like OpenAI. Here are some of the providers I've personally used and would vouch for in production, along with their costs per 1M input tokens for Llama 3 8B, as a point of comparison:

- Replicate: $0.05 input, $0.25 output

- Octo: $0.15

- Anyscale: $0.15

- Together: $0.20

- Fireworks: $0.20
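For a rough sense of what those rates mean at volume, here's a minimal sketch that turns the per-1M input prices above into a monthly bill. The 500M-token workload is a made-up figure for illustration, and the calculation covers input tokens only, since output tokens are billed separately (Replicate's $0.25 output rate is the only one listed here).

```python
# USD per 1M input tokens for Llama 3 8B, copied from the list above.
INPUT_PRICE_PER_1M = {
    "Replicate": 0.05,
    "Octo": 0.15,
    "Anyscale": 0.15,
    "Together": 0.20,
    "Fireworks": 0.20,
}


def input_cost(provider: str, input_tokens: int) -> float:
    """Return the USD cost of sending `input_tokens` input tokens to `provider`."""
    return INPUT_PRICE_PER_1M[provider] * input_tokens / 1_000_000


# Hypothetical workload (an assumption, not a figure from any provider).
MONTHLY_INPUT_TOKENS = 500_000_000

# Print providers from cheapest to most expensive on input tokens alone.
for name in sorted(INPUT_PRICE_PER_1M, key=INPUT_PRICE_PER_1M.get):
    print(f"{name:<10} ${input_cost(name, MONTHLY_INPUT_TOKENS):,.2f}")
```

Swap in your own token counts to compare; if your workload is output-heavy, the ranking may change once output pricing is included.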

If you're looking for an end-to-end flow that will help you gather the training data, validate it, run the fine-tune, and then define evaluations, you could also check out my company, OpenPipe (https://openpipe.ai/). In addition to hosting your model, we help you organize your training data, relabel it if necessary, define evaluations on the finished fine-tune, and monitor its performance in production. Our inference prices are higher than those of the providers above, but once you're happy with your model you can always export your weights and host them with one of those providers!