A solution might be: use two different AIs. The first one you can prompt to your heart's content. The second one is never prompted by anyone except the service provider. The second one does the filtering.
If it's filtering by taking the output of the first model as a prompt (with some framing), then that is equally susceptible to prompt engineering. Indeed, you can already tell ChatGPT to write a prompt for itself to do such and such, and it will do so. You can even tell it to write a prompt to write a prompt.