Hacker News

Any plans to offer speech-to-speech models that keep prosody, intonation, and timing intact? ElevenLabs is getting expensive for this.


We'll keep expanding these GPT-4o based models with more controls. Is the main feature you're missing custom voices?


No, not custom voices - but voices that can be influenced by a recording. As in, a male voice actor records a part, and the model transforms it to a female part - keeping all the prosody, intonation and timing in the original recording. This would allow one voice actor to do many roles.



