You can use caching like https://platform.openai.com/docs/guides/prompt-caching and note that "Cache hits are only possible for exact prefix matches within a prompt", that the cache covers "Messages: The complete messages array, encompassing system, user, and assistant interactions.", and that "Prompt Caching does not influence the generation of output tokens or the final response provided by the API. Regardless of whether caching is used, the output generated will be identical." So it's matching the prefix and skipping the reprocessing, but in a way that is functionally identical to reprocessing the whole conversation from scratch. Consider what happens if the server holding your cached conversation crashed: it would simply be replayed on one without the cache.
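To make the "transparent" part concrete, here's a minimal sketch against the Chat Completions API (the model name is just an example, and caching only kicks in above a minimum prompt size per the docs): the client resends the complete messages array every turn, and a cache hit on the exact prefix only shows up as an accounting detail in the usage block, never in the response content.

```python
# Minimal sketch: nothing cache-specific is requested by the client.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You are a helpful assistant."}]

for user_turn in ["First question...", "Follow-up question..."]:
    messages.append({"role": "user", "content": user_turn})

    # The complete history goes out on every call; the server reuses the
    # cached prefix only if it matches exactly, and the answer is the same
    # either way.
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    messages.append({"role": "assistant", "content": resp.choices[0].message.content})

    # The only visible trace of a hit is the cached-token count in usage
    # (it may be zero or absent for short prompts).
    details = resp.usage.prompt_tokens_details
    print("cached prompt tokens:", getattr(details, "cached_tokens", 0) if details else 0)
```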
If you're selectively faking things, you don't care. You may not even be aware, because the caching is transparent to you: you send the whole set of messages to the system each time either way. From the perspective of someone faking their model to look better than it is, it requires no special implementation changes.
And if you're faking your model to look better than it is, you probably aren't sending every call out to the paid 3rd party; you're more likely intentionally only using it to guide your model periodically.
> aren't sending every call out to the paid 3rd party; you're more likely intentionally only using it to guide your model periodically.
If you do that, you're going to pay for each token multiple times: once as an inferred token on your model, and again as input tokens on both the third party and your model.
If the conversations are long enough (I didn't do the math, but I suspect they don't even need to be that long), it's going to be costlier than just using the paid model with caching.
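To make that arithmetic concrete, here's a rough tally under made-up assumptions: 500-token user turns and replies, a "guidance" call to the paid model every 4th turn, and the simplifying assumption that those intermittent calls miss the prefix cache. It only counts input tokens (output tokens are roughly the same per turn in both schemes); multiply by whatever per-token rates apply to your model and the paid one.

```python
# Rough input-token tally; all parameters are made up for illustration.
def input_token_tally(turns=50, per_turn=500, guide_every=4):
    hybrid_own = hybrid_paid_full = 0  # your model every turn, paid model every Nth turn (uncached)
    direct_full = direct_cached = 0    # paid model every turn, history prefix served from cache
    history = 0
    for turn in range(turns):
        hybrid_own += history + per_turn            # your model rereads the whole conversation
        if turn % guide_every == 0:
            hybrid_paid_full += history + per_turn  # so does the paid model, at the full input rate
        direct_cached += history                    # direct use: old prefix billed at the cached rate
        direct_full += per_turn                     # only the new message at the full rate
        history += 2 * per_turn                     # user turn + assistant reply join the history
    return {"hybrid_own": hybrid_own, "hybrid_paid_full": hybrid_paid_full,
            "direct_full": direct_full, "direct_cached": direct_cached}

print(input_token_tally())
# e.g. {'hybrid_own': 1250000, 'hybrid_paid_full': 318500,
#       'direct_full': 25000, 'direct_cached': 1225000}
```

The resent history grows quadratically with conversation length on both legs of the hybrid setup, while direct use bills most of it at the cheaper cached rate; which side wins in dollars then depends on the per-token rates you plug in.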