The GitHub repo includes (among other things) a script (relying on python-pptx) that outputs the decomposed layer images into a .pptx file “where you can edit and move these layers flexibly.” (I've never used PowerPoint for this, but maybe it's good enough for the job and ubiquitous enough that this is a sensible choice?)
I saw that some people at a company called Pruna AI got it down to 8 seconds with Cloudflare/Replicate, but I don't know whether that was on consumer hardware or on an A100/H100/H200, or whether the inference optimization is open-source yet.