This demo runs Voxtral-Mini-3B, a new audio language model from Mistral, enabling state-of-the-art audio transcription directly in your browser. Everything runs locally, meaning none of your data is sent to a server (and your transcripts are stored on-device).
It took some time, but we finally got Kokoro TTS (v1.0) running in-browser w/ WebGPU acceleration! This enables real-time text-to-speech without the need for a server. Looking forward to your feedback!
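For anyone who wants to embed it in their own page, the core usage is only a few lines via the kokoro-js package (the model id, dtype/device options, and voice name below are assumptions based on the demo setup, so double-check the README for the exact options):

```js
import { KokoroTTS } from "kokoro-js";

// Load the model once; the ONNX weights are downloaded and then cached by the browser.
// dtype/device are assumptions -- drop "device" to fall back to the WASM backend.
const tts = await KokoroTTS.from_pretrained("onnx-community/Kokoro-82M-v1.0-ONNX", {
  dtype: "fp32",
  device: "webgpu",
});

// Generate speech with one of the built-in voices (voice id is an example).
const audio = await tts.generate("Hello from the browser!", { voice: "af_heart" });
audio.save("output.wav"); // or convert to a Blob and play it with an <audio> element
```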
This is brilliant. All we need now is for someone to code a frontend for it so we can input an article's URL and have this voice read it out loud... built-in local voices on macOS are not even close to this Kokoro model.
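Here's a rough sketch of the glue code such a frontend would need (purely hypothetical; fetching arbitrary article URLs from the browser usually requires a CORS proxy or a small server-side fetcher, and a real app would use a proper readability library for text extraction):

```js
// Hypothetical: fetch an article, strip it down to plain text, and speak it
// with an already-loaded KokoroTTS instance (see the snippet above).
async function readArticleAloud(url, tts) {
  const html = await fetch(url).then((res) => res.text()); // may need a CORS proxy
  const doc = new DOMParser().parseFromString(html, "text/html");

  // Naive text extraction.
  const text = (doc.body.textContent ?? "").replace(/\s+/g, " ").trim();

  // Generate sentence-sized chunks so playback can start quickly.
  for (const sentence of text.match(/[^.!?]+[.!?]+/g) ?? [text]) {
    const audio = await tts.generate(sentence, { voice: "af_heart" });
    // play `audio` here, e.g. via the Web Audio API
  }
}
```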
Yes, I am saying they might include features for TTS in addition to their current STT feature set. It seems like many of these sorts of apps are looking to add both to become more full-fledged.
It uses OpenAI's set of Whisper models, which support multilingual transcription and translation across 100 languages. Since the models run entirely locally in your browser (thanks to Transformers.js), no data leaves your device! Huge for privacy!
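If you want to reproduce the core of the demo yourself, it's roughly this (the checkpoint and options are examples; the package is @huggingface/transformers in v3, @xenova/transformers in older versions):

```js
import { pipeline } from "@huggingface/transformers";

// Build the speech-recognition pipeline; weights are downloaded once, then cached.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "Xenova/whisper-tiny" // example checkpoint; larger Whisper exports work the same way
);

// Input can be a URL, a File/Blob, or a Float32Array of PCM samples.
const result = await transcriber("audio.wav", {
  language: "french", // source language (omit to auto-detect)
  task: "translate",  // "transcribe" keeps the source language, "translate" outputs English
});
console.log(result.text);
```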
Models are cached on a per-domain basis (using the Web Cache API), meaning you don’t need to re-download the model on every page load. If you would like to persist the model across domains, you can create browser extensions with the library! :)
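To make that a bit more concrete, here's roughly what the caching behaviour looks like from user code (the env.useBrowserCache flag is from the library's docs; the internal cache name is an implementation detail, so the last line is just a way to peek at what's stored for the current origin):

```js
import { env, pipeline } from "@huggingface/transformers";

// Model files are stored with the Web Cache API, which is scoped per origin,
// so each site keeps (and downloads) its own copy of the weights.
env.useBrowserCache = true; // default; set to false to always re-fetch

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

// Inspect what has been cached for this origin:
console.log(await caches.keys());
```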
As for your last point, there are efforts underway, but nothing I can speak about yet!
Why is only one of them on WebGPU? Is it because there are additional tricky steps required to make a model work on WebGPU, or is there a limitation on what ops are supported there?
I'm keen to do more stuff with WebGPU, so very interested to learn about challenges and limitations here.
To answer your question, while there are certain ops missing, the main limitation at the moment is for models with decoders... which are not very fast (yet) due to inefficient buffer reuse and many redundant copies between CPU and GPU. We're working closely with the ORT team to fix these issues though!
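For anyone experimenting: in practice this means encoder-only models (e.g. embedding models) are already a good fit for WebGPU, while models with decoders may still run better on the default WASM backend for now. A rough sketch of how device selection looks in Transformers.js v3 (treat the option names and checkpoints as examples if you're on an older version):

```js
import { pipeline } from "@huggingface/transformers";

// Encoder-only models already map well to WebGPU...
const embedder = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2", {
  device: "webgpu",
  dtype: "fp32",
});

// ...while decoder-heavy models may still be faster on WASM until the
// buffer-reuse / CPU<->GPU copy issues mentioned above are resolved.
const generator = await pipeline("text-generation", "Xenova/gpt2", {
  device: "wasm",
});
```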
Thank you for the reply. Seems like all of the links are down at the moment, but it does sound a bit more feasible for some applications than I had assumed.
Really glad to hear the last part. Some of the new capabilities seem fundamental enough that they ought to be in browsers, in my opinion.
Hi everyone, Joshua from Hugging Face (and the creator of Transformers.js) here.
Starting with embeddings, we hope to simplify and improve the developer experience when working with embeddings. Supabase already has great support for storage and retrieval of embeddings (thanks to pgvector) [0], so it feels like this collaboration was long overdue!
Open-source embedding models are both smaller and more performant [1] than closed-source alternatives, so it's quite surprising that 98% of Supabase applications currently use OpenAI's text-embedding-ada-002 [2]. Probably because it is just easier to access? Well... that changes today! You can also iterate extremely quickly: experiment with and choose the model that works best for you (no vendor lock-in)! In fact, since the article was written, a new leader has just appeared at the top of the MTEB leaderboard [3].
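To give a concrete picture of the workflow, here's a minimal sketch of generating an embedding in the browser and storing it in Supabase (the table/column names and the gte-small checkpoint are illustrative; the only real requirement is that the pgvector column's dimension matches the model's output, e.g. vector(384) for gte-small):

```js
import { pipeline } from "@huggingface/transformers";
import { createClient } from "@supabase/supabase-js";

const supabase = createClient("https://your-project.supabase.co", "your-anon-key");

// Example open-source embedding model (384-dimensional output).
const extractor = await pipeline("feature-extraction", "Supabase/gte-small");

// Mean-pool and normalize to get a single sentence embedding.
const output = await extractor("The sky is blue", { pooling: "mean", normalize: true });
const embedding = Array.from(output.data);

// Store the text alongside its embedding; retrieval is then a pgvector
// similarity search over the same column (e.g. via an RPC function).
await supabase.from("documents").insert({ content: "The sky is blue", embedding });
```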
I look forward to answering any questions you have!
This web app fixes the two main problems of OpenAI's tokenizer playground: (1) being capped at 50k characters, and (2) not supporting GPT-4/GPT-3.5 tokenizers.
Everything runs in-browser thanks to Transformers.js.
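The tokenization itself is just a couple of lines (the Xenova/gpt-4 tokenizer id is the cl100k_base export I'd expect an app like this to use, but treat it as an example):

```js
import { AutoTokenizer } from "@huggingface/transformers";

// Load a GPT-4 / GPT-3.5 (cl100k_base) compatible tokenizer.
const tokenizer = await AutoTokenizer.from_pretrained("Xenova/gpt-4");

const token_ids = tokenizer.encode("hello world"); // e.g. [15339, 1917]
console.log(`${token_ids.length} tokens`);
```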