xenova's comments | Hacker News

This demo runs Voxtral-Mini-3B, a new audio language model from Mistral, enabling state-of-the-art audio transcription directly in your browser. Everything runs locally, meaning none of your data is sent to a server (and your transcripts are stored on-device).
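For a rough idea of what this looks like from the Transformers.js side, here is a minimal sketch. The model id and the assumption that Voxtral is exposed through the standard automatic-speech-recognition pipeline are illustrative, not the demo's actual code:

    import { pipeline } from '@huggingface/transformers';

    // Hypothetical model id; check the Hugging Face Hub for the real ONNX repo.
    const transcriber = await pipeline(
      'automatic-speech-recognition',
      'onnx-community/Voxtral-Mini-3B-ONNX',
      { device: 'webgpu' }, // run on the GPU in supported browsers
    );

    // Accepts a URL, file path, or raw Float32Array of audio samples.
    const { text } = await transcriber('audio.wav');
    console.log(text);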


It took some time, but we finally got Kokoro TTS (v1.0) running in-browser w/ WebGPU acceleration! This enables real-time text-to-speech without the need for a server. Looking forward to your feedback!
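If you want to try it in your own project, a minimal sketch with the kokoro-js package looks roughly like this (repo id, voice name, and options are taken from its README at the time of writing; treat them as assumptions):

    import { KokoroTTS } from 'kokoro-js';

    const tts = await KokoroTTS.from_pretrained('onnx-community/Kokoro-82M-v1.0-ONNX', {
      dtype: 'q8',      // quantized weights for a smaller download
      device: 'webgpu', // assumed option for GPU acceleration in the browser
    });

    const audio = await tts.generate('Hello from the browser!', {
      voice: 'af_heart', // one of the bundled voices
    });
    audio.save('audio.wav');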


Now that's what I call "server-less" computing!


Amazing! I'm interested in models running locally, and Kokoro looks fantastic. Are you aware of similar models for speech-to-text?



The realtime Whisper demo is amazing.

How can I understand what's in the compiled JS though? Is there some source for that?


Whisper


This is brilliant. All we need now is for someone to code a frontend for it so we can input an article's URL and have this voice read it out loud... the built-in local voices on macOS are not even close to this Kokoro model.


There are a few already; I assume MacWhisper will add it. That being said, I am also working on a (cross-platform, in Flutter) UI for this.


My understanding is that MacWhisper is a front-end for Whisper.cpp, so... it does speech-to-text? (transcribing what you dictate)

Here I'm talking about the model shared in this thread, which is text-to-speech (reading content from the web out loud).


Yes, I am saying they might include features for TTS in addition to their current STT feature set. It seems like many of these sorts of apps are looking to add both to become more full-fledged.



For those interested in learning more, the source code is available on GitHub: https://github.com/huggingface/transformers.js-examples/tree...


It uses OpenAI's set of Whisper models, which support multilingual transcription and translation across 100 languages. Since the models run entirely locally in your browser (thanks to Transformers.js), no data leaves your device. Huge for privacy!

Source code: https://github.com/xenova/whisper-web/tree/experimental-webg...
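For reference, the underlying call with Transformers.js looks roughly like this (a sketch, not the app's actual code; the model id and file name are chosen for illustration):

    import { pipeline } from '@huggingface/transformers';

    const transcriber = await pipeline(
      'automatic-speech-recognition',
      'Xenova/whisper-small',
    );

    const output = await transcriber('interview.mp3', {
      language: 'french',  // source language
      task: 'transcribe',  // or 'translate' to translate into English
      chunk_length_s: 30,  // split long audio into 30-second chunks
    });
    console.log(output.text);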


We’ve put out a ton of demos that use much smaller models (10-60 MB), including:

- (44MB) In-browser background removal: https://huggingface.co/spaces/Xenova/remove-background-web. (We also put out a WebGPU version: https://huggingface.co/spaces/Xenova/remove-background-webgp...).

- (51MB) Whisper Web for automatic speech recognition: https://huggingface.co/spaces/Xenova/whisper-web (just select the quantized version in settings).

- (28MB) Depth Anything Web for monocular depth estimation: https://huggingface.co/spaces/Xenova/depth-anything-web

- (14MB) Segment Anything Web for image segmentation: https://huggingface.co/spaces/Xenova/segment-anything-web

- (20MB) Doodle Dash, an ML-powered sketch detection game: https://huggingface.co/spaces/Xenova/doodle-dash

… and many, many more! Check out the Transformers.js demos collection for some others: https://huggingface.co/collections/Xenova/transformersjs-dem....
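Under the hood, most of these demos boil down to a single pipeline call. As a sketch, using the depth-estimation demo above (the file name is illustrative):

    import { pipeline } from '@huggingface/transformers';

    const estimator = await pipeline('depth-estimation', 'Xenova/depth-anything-small-hf');

    // Returns the raw tensor plus a visualizable depth map image.
    const { depth } = await estimator('photo.jpg');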

Models are cached on a per-domain basis (using the Web Cache API), meaning you don’t need to re-download the model on every page load. If you would like to persist the model across domains, you can create browser extensions with the library! :)
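You can see this for yourself with the standard Cache API. A sketch (the cache name below is an assumption; check the Cache Storage panel in DevTools for the exact one):

    // Lists the model files Transformers.js has cached for this origin.
    const cache = await caches.open('transformers-cache');
    const requests = await cache.keys();
    console.log(requests.map((req) => req.url));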

As for your last point, there are efforts underway, but nothing I can speak about yet!


Why is only one of them on WebGPU? Is it because there are additional tricky steps required to make a model work on WebGPU, or is there a limitation on what ops are supported there?

I'm keen to do more stuff with WebGPU, so very interested to learn about challenges and limitations here.


We have some other WebGPU demos, including:

- WebGPU embedding benchmark: https://huggingface.co/spaces/Xenova/webgpu-embedding-benchm...

- Real-time object detection: https://huggingface.co/spaces/Xenova/webgpu-video-object-det...

- Real-time background removal: https://huggingface.co/spaces/Xenova/webgpu-video-background...

- WebGPU depth estimation: https://huggingface.co/spaces/Xenova/webgpu-depth-anything

- Image background removal: https://huggingface.co/spaces/Xenova/remove-background-webgp...

You can follow the progress for full WebGPU support in the v3 development branch (https://github.com/xenova/transformers.js/pull/545).
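In v3, opting in to WebGPU is designed to be a one-line change. A sketch of the API (treat the exact option name as subject to change while the branch is in development):

    import { pipeline } from '@huggingface/transformers';

    const classifier = await pipeline(
      'text-classification',
      'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
      { device: 'webgpu' }, // instead of the default WASM backend
    );
    console.log(await classifier('WebGPU is great!'));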

To answer your question, while there are certain ops missing, the main limitation at the moment is for models with decoders... which are not very fast (yet) due to inefficient buffer reuse and many redundant copies between CPU and GPU. We're working closely with the ORT team to fix these issues though!


Thank you for the reply. Seems like all of the links are down at the moment, but it does sound a bit more feasible for some applications than I had assumed.

Really glad to hear the last part. Some of the new capabilities seem fundamental enough that they ought to be in browsers, in my opinion.


Odd, the links seem to work for me. What error do you see? Can you try on a different network (e.g., mobile)?


Error is "xenova-segment-anything-web.static.hf.space unexpectedly closed the connection."

Works on mobile network, though, so might just be my internet connection.


The 8-bit quantized version of the RMBG-v1.4 model is ~45MB, which makes it perfect for in-browser usage (it even works on mobile)!

Link to model: https://huggingface.co/briaai/RMBG-1.4
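Usage follows the model card (a sketch; if the repo's preprocessing config is missing or incomplete, you may need to pass it explicitly as shown there):

    import { AutoModel, AutoProcessor, RawImage } from '@huggingface/transformers';

    // The repo ships a custom architecture, hence the explicit model_type.
    const model = await AutoModel.from_pretrained('briaai/RMBG-1.4', {
      config: { model_type: 'custom' },
    });
    const processor = await AutoProcessor.from_pretrained('briaai/RMBG-1.4');

    const image = await RawImage.fromURL('photo.jpg');
    const { pixel_values } = await processor(image);

    // Output is a per-pixel foreground matte you can apply as an alpha channel.
    const { output } = await model({ input: pixel_values });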



Hi everyone, Joshua from Hugging Face (and the creator of Transformers.js) here.

Starting with embeddings, we hope to simplify and improve the developer experience. Supabase already has great support for storage and retrieval of embeddings (thanks to pgvector) [0], so it feels like this collaboration was long overdue!

Open-source embedding models are both smaller and more performant [1] than closed-source alternatives, so it's quite surprising that 98% of Supabase applications currently use OpenAI's text-embedding-ada-002 [2]. Probably because it is just easier to access? Well... that changes today! You can also iterate extremely quickly: experiment with and choose the model that works best for you (no vendor lock-in)! In fact, since the article was written, a new leader has appeared at the top of the MTEB leaderboard [3].
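For a concrete picture, generating an embedding client-side (or in an edge function) is a few lines with Transformers.js; the gte-small model below is one of the open-source options discussed in the article:

    import { pipeline } from '@huggingface/transformers';

    const extractor = await pipeline('feature-extraction', 'Supabase/gte-small');

    // Mean-pool and normalize to get a single 384-dim vector,
    // ready to store in a pgvector column.
    const output = await extractor('Hello world', { pooling: 'mean', normalize: true });
    const embedding = Array.from(output.data);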

I look forward to answering any questions you have!

[0] https://supabase.com/vector

[1] https://huggingface.co/spaces/mteb/leaderboard

[2] https://supabase.com/blog/hugging-face-supabase

[3] https://huggingface.co/BAAI/bge-large-en


This web app fixes the two main problems of OpenAI's tokenizer playground: (1) being capped at 50k characters, and (2) not supporting the GPT-4/GPT-3.5 tokenizers.

Everything runs in-browser thanks to Transformers.js.
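The relevant calls are only a couple of lines (a sketch; the Xenova/gpt-4 repo hosts a converted copy of the GPT-4 tokenizer):

    import { AutoTokenizer } from '@huggingface/transformers';

    const tokenizer = await AutoTokenizer.from_pretrained('Xenova/gpt-4');

    const tokens = tokenizer.encode('hello world'); // array of token ids
    console.log(tokens.length); // token count, with no 50k-character cap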

