This demo runs Voxtral-Mini-3B, a new audio language model from Mistral, enabling state-of-the-art audio transcription directly in your browser. Everything runs locally, meaning none of your data is sent to a server (and your transcripts are stored on-device).
It took some time, but we finally got Kokoro TTS (v1.0) running in-browser w/ WebGPU acceleration! This enables real-time text-to-speech without the need for a server. Looking forward to your feedback!
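For anyone who wants to embed it in their own page, the core usage is only a few lines via the kokoro-js package (the model id, dtype/device options, and voice name below are assumptions based on the demo setup, so double-check the README for the exact options):

```js
import { KokoroTTS } from "kokoro-js";

// Load the model once; the ONNX weights are downloaded and then cached by the browser.
// dtype/device are assumptions -- drop "device" to fall back to the WASM backend.
const tts = await KokoroTTS.from_pretrained("onnx-community/Kokoro-82M-v1.0-ONNX", {
  dtype: "fp32",
  device: "webgpu",
});

// Generate speech with one of the built-in voices (voice id is an example).
const audio = await tts.generate("Hello from the browser!", { voice: "af_heart" });
audio.save("output.wav"); // or convert to a Blob and play it with an <audio> element
```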
This is brilliant. All we need now is for someone to code a frontend for it so we can input an article's URL and have this voice read it out loud... built-in local voices on macOS are not even close to this Kokoro model.
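Here's a rough sketch of the glue code such a frontend would need (purely hypothetical; fetching arbitrary article URLs from the browser usually requires a CORS proxy or a small server-side fetcher, and a real app would use a proper readability library for text extraction):

```js
// Hypothetical: fetch an article, strip it down to plain text, and speak it
// with an already-loaded KokoroTTS instance (see the snippet above).
async function readArticleAloud(url, tts) {
  const html = await fetch(url).then((res) => res.text()); // may need a CORS proxy
  const doc = new DOMParser().parseFromString(html, "text/html");

  // Naive text extraction.
  const text = (doc.body.textContent ?? "").replace(/\s+/g, " ").trim();

  // Generate sentence-sized chunks so playback can start quickly.
  for (const sentence of text.match(/[^.!?]+[.!?]+/g) ?? [text]) {
    const audio = await tts.generate(sentence, { voice: "af_heart" });
    // play `audio` here, e.g. via the Web Audio API
  }
}
```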
Yes, I am saying they might include features for TTS in addition to their current STT feature set. It seems like many of these sorts of apps are looking to add both to become more full-fledged.
It uses OpenAI's set of Whisper models, which support multilingual transcription and translation across 100 languages. Since the models run entirely locally in your browser (thanks to Transformers.js), no data leaves your device! Huge for privacy!
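If you want to reproduce the core of the demo yourself, it's roughly this (the checkpoint and options are examples; the package is @huggingface/transformers in v3, @xenova/transformers in older versions):

```js
import { pipeline } from "@huggingface/transformers";

// Build the speech-recognition pipeline; weights are downloaded once, then cached.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "Xenova/whisper-tiny" // example checkpoint; larger Whisper exports work the same way
);

// Input can be a URL, a File/Blob, or a Float32Array of PCM samples.
const result = await transcriber("audio.wav", {
  language: "french", // source language (omit to auto-detect)
  task: "translate",  // "transcribe" keeps the source language, "translate" outputs English
});
console.log(result.text);
```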
Models are cached on a per-domain basis (using the Web Cache API), meaning you don’t need to re-download the model on every page load. If you would like to persist the model across domains, you can create browser extensions with the library! :)
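To make that a bit more concrete, here's roughly what the caching behaviour looks like from user code (the env.useBrowserCache flag is from the library's docs; the internal cache name is an implementation detail, so the last line is just a way to peek at what's stored for the current origin):

```js
import { env, pipeline } from "@huggingface/transformers";

// Model files are stored with the Web Cache API, which is scoped per origin,
// so each site keeps (and downloads) its own copy of the weights.
env.useBrowserCache = true; // default; set to false to always re-fetch

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

// Inspect what has been cached for this origin:
console.log(await caches.keys());
```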
As for your last point, there are efforts underway, but nothing I can speak about yet!
Why is only one of them on WebGPU? Is it because there are additional tricky steps required to make a model work on WebGPU, or is there a limitation on what ops are supported there?
I'm keen to do more stuff with WebGPU, so very interested to learn about challenges and limitations here.
To answer your question, while there are certain ops missing, the main limitation at the moment is for models with decoders... which are not very fast (yet) due to inefficient buffer reuse and many redundant copies between CPU and GPU. We're working closely with the ORT team to fix these issues though!
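For anyone experimenting: in practice this means encoder-only models (e.g. embedding models) are already a good fit for WebGPU, while models with decoders may still run better on the default WASM backend for now. A rough sketch of how device selection looks in Transformers.js v3 (treat the option names and checkpoints as examples if you're on an older version):

```js
import { pipeline } from "@huggingface/transformers";

// Encoder-only models already map well to WebGPU...
const embedder = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
  device: "webgpu",
  dtype: "fp32",
});

// ...while decoder-heavy models may still be faster on WASM until the
// buffer-reuse / CPU<->GPU copy issues mentioned above are resolved.
const generator = await pipeline("text-generation", "Xenova/gpt2", {
  device: "wasm",
});
```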
Thank you for the reply. Seems like all of the links are down at the moment, but it does sound a bit more feasible for some applications than I had assumed.
Really glad to hear the last part. Some of the new capabilities seem fundamental enough that they ought to be in browsers, in my opinion.
Hi everyone, Joshua from Hugging Face (and the creator of Transformers.js) here.
Starting with embeddings, we hope to simplify and improve the developer experience when working with embeddings. Supabase already has great support for storage and retrieval of embeddings (thanks to pgvector) [0], so it feels like this collaboration was long overdue!
Open-source embedding models are both smaller and more performant [1] than closed-source alternatives, so it's quite surprising that 98% of Supabase applications currently use OpenAI's text-embedding-ada-002 [2]. Probably because it is just easier to access? Well... that changes today! You can also iterate extremely quickly: experiment with and choose the model that works best for you (no vendor lock-in)! In fact, since the article was written, a new leader has just appeared at the top of the MTEB leaderboard [3].
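To give a concrete picture of the workflow, here's a minimal sketch of generating an embedding in the browser and storing it in Supabase (the table/column names and the gte-small checkpoint are illustrative; the only real requirement is that the pgvector column's dimension matches the model's output, e.g. vector(384) for gte-small):

```js
import { pipeline } from "@huggingface/transformers";
import { createClient } from "@supabase/supabase-js";

const supabase = createClient("https://your-project.supabase.co", "your-anon-key");

// Example open-source embedding model (384-dimensional output).
const extractor = await pipeline("feature-extraction", "Supabase/gte-small");

// Mean-pool and normalize to get a single sentence embedding.
const output = await extractor("The sky is blue", { pooling: "mean", normalize: true });
const embedding = Array.from(output.data);

// Store the text alongside its embedding; retrieval is then a pgvector
// similarity search over the same column (e.g. via an RPC function).
await supabase.from("documents").insert({ content: "The sky is blue", embedding });
```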
I look forward to answering any questions you have!
This web app fixes the two main problems of OpenAI's tokenizer playground: (1) being capped at 50k characters, and (2) not supporting GPT-4/GPT-3.5 tokenizers.
Everything runs in-browser thanks to Transformers.js.
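The tokenization itself is just a couple of lines (the Xenova/gpt-4 tokenizer id is the cl100k_base export I'd expect an app like this to use, but treat it as an example):

```js
import { AutoTokenizer } from "@huggingface/transformers";

// Load a GPT-4 / GPT-3.5 (cl100k_base) compatible tokenizer.
const tokenizer = await AutoTokenizer.from_pretrained("Xenova/gpt-4");

const token_ids = tokenizer.encode("hello world"); // e.g. [15339, 1917]
console.log(`${token_ids.length} tokens`);
```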