Maybe I'm misunderstanding, but it seems like you're just talking about AI inpainting. That's one of the first things people did with deep-learning image models, well before diffusion took off. NVIDIA published a research paper on it (using partial convolutions) back in 2018: https://arxiv.org/abs/1804.07723
Inpainting is harder on videos than on images, since the filled-in region has to stay consistent from frame to frame, but there are plenty of models that can do it. Google's Veo 3 can remove objects from videos: https://deepmind.google/models/veo/