Speech-to-text is a great feature and is a machine inference application whether...

polishdude20 · on Aug 11, 2024

About speech-to-text: are there models or apps that do speech-to-text but use a bit of AI to infer around the "umms" and "uhhs"?

Like if I'm doing a stream-of-consciousness talk about something while I'm on a hike, there are loads of utterances that would be converted to stuff I'd need to edit out if it were a blog post.

Or better yet, have me be able to say "oh no, remove that last thing I said about the leprechauns."

Rinzler89 · on Aug 11, 2024

I was talking specifically about generative AI, not text-to-speech or speech-to-text since I don't consider those to be generative AI in the colloquial sense that's pushed nowadays. I don't want to be all pedantic about it and start splitting hairs on what technically is and what isn't; I'm just keeping the mainstream frame of what the device manufacturers claim to be gen-AI.

alephnerd · on Aug 11, 2024

> not text-to-speech or speech-to-text since I don't consider those to be generative AI

Plenty of TTS, STT, autocorrect, etc. applications are now leveraging LLMs like Llama.

Functionally, GenAI is just a reskin around "NLP" like Siri.

> keeping the mainstream frame of what the device manufacturers claim to be gen-AI

If you're leveraging an LLM, it's safe to call it a GenAI product.

halJordan · on Aug 11, 2024

It's being shoved down your throat, but even with such a close view you have no idea what it is. And you don't want to discuss it because that's "pedantic"?

Rinzler89 · on Aug 11, 2024

Where did I say you can't discuss it? Feel free to discuss it if you want. Why do you need my permission? You'll just do it without me, since you seem to have a chip on your shoulder for no reason and I don't want to reward such attitudes.

I just clarified the gen-AI meaning I used in the context of my comment, which is also the context manufacturers are referring to, and not the scientific definitions the AI experts are thinking about, since your average consumer has no idea about ML and transformers and all the inner workings of what they call AI.

yunohn · on Aug 11, 2024

What part of TTS is "generative"? It can definitely use ML/AI, but I fail to see the generative component.

jeffbee · on Aug 11, 2024

It's probably pointless to quibble over the definition since the term is now completely unmoored from whatever term of art it originally resembled.

Retric · on Aug 11, 2024

Be careful about straining your vocal cords. Dictation is only fine if you’re using it sparingly.

sincerely · on Aug 11, 2024

Do you dictate differently than you normally talk? Or do you tell people who have conversations to be worried about their vocal cords?

jeffbee · on Aug 11, 2024

This has to have been a wry joke, otherwise it's insane.

Example of a non-strenuous dictation task: if I am driving, my Android will read my texts and allow me to reply by voice, a speech-to-text and text-to-speech task that is damned handy.

Retric · on Aug 11, 2024

Individual non-strenuous tasks still add up; it’s the total amount per day that matters. If you’re just dictating on the drive to work then it’s no big deal, but just because X is fine doesn’t automatically mean 2X is fine.

Conversation between multiple people doesn’t involve one person speaking continuously for hours. As such, you can end up speaking for a lot more hours per day when dictating than is normal. That’s the risk, not simply talking an extra 20 minutes per day.

JumpCrisscross · on Aug 11, 2024

> Dictation is only fine if you’re using it sparingly

Most correspondence throughout history was dictated.

Retric · on Aug 11, 2024

Conversation between people involves a lot of pauses and different people speaking.

Dictation for a few hours is a lot more stressful than normal conversation, and you’re also speaking in your normal life; it adds up.