Hot take: I think all these dictation tools are solving the wrong problem: they're optimizing for accurate transcription (and latency) when users actually need intelligent interpretation. People don't speak in perfect emails; they speak in scattered thoughts and intentions that require contextual understanding.
I totally agree with this hot take. Whispering isn't there yet, but I eventually want it to store all of your transcripts as plain-text markdown files, alongside your audio files, in a folder.
The idea is that as we add more local-first apps into the ecosystem (writing, etc.), they'll share this context. Transcription would benefit immensely if you also had a writing app that you could trust to store your data. To execute that vision, we needed a transcription app where we have control over how data is stored, and the best solution was to build our own.
You can use the unstructured chunkers and still use this to enhance the quality of the actual chunks. The chunkers from unstructured help you take a more informed approach for a specific type of data, but they don't by themselves give you more accurate results for your use case.
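One way to read this split: let a chunker (e.g. unstructured's `chunk_by_title`) handle the structural pass, then run your own use-case-specific refinement over the resulting chunks. The sketch below uses plain Python; `refine_chunks` and its `enhance` hook are hypothetical names, and in practice `enhance` might call an LLM or a domain-specific cleanup step.

```python
from typing import Callable

def refine_chunks(
    chunks: list[str],
    doc_title: str,
    enhance: Callable[[str], str] = lambda c: c.strip(),
) -> list[str]:
    """Post-process chunks from any chunker with use-case context.

    Hypothetical second pass: drop empty chunks, apply a caller-supplied
    `enhance` step, and prepend document-level context so each chunk
    stays interpretable on its own.
    """
    return [f"[{doc_title}] {enhance(c)}" for c in chunks if c.strip()]
```

The chunker decides *where* to cut; this pass decides *what each chunk should carry* for your retrieval or generation use case.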