
For someone looking to switch over to Gemini from OpenAI, are there any gotchas one should be aware of? E.g. I heard some mention of API limits and approvals? Or in terms of prompt writing? What advice do people have?

https://epoch.ai/benchmarks/simplebench

Just do it.

I use a service that gives me access to all the SOTA models and many open-source models, so I can switch models within a chat and use MCPs: e.g. start a chat with Opus doing a search through the Perplexity and Grok DeepSearch MCPs plus Google Search, run the next query with GPT-5 Thinking (xhigh), the next with Gemini 3 Pro, all in the same conversation. It's fantastic! I can't imagine going back to being locked into one (or two) companies. I have nothing to do with the guys who run it (the hosts of the podcast This Day in AI), but if you're interested have a look in the simtheory.ai Discord.

I don't know how people who stick to one service can manage...


99% of what I do is fine-tuned models, so there is a certain level of commitment I have to make around training and time to switch.

Still no GPT 5.x fine tuning?

I emailed support a while back to see if there was an early access program (99.99% sure the answer is yes). This is when I discovered that their support is 100% done by AI and there is no way to escalate a case to a human.


Containment breach is going to occur from a disgruntled customer convincing the customer service bot it needs to get a hold of a supervisor

I tried this!

Me: Is it possible to get early access to GPT-5 fine-tuning? It is much better at non-English languages, which is the majority of what we do, so it would be greatly beneficial to us.

Bot: I'm an AI support agent. I don't have information about early access to GPT-5 fine-tuning or unannounced features. Currently, there is no public information or program for early access to GPT-5 fine-tuning. Please keep an eye on official OpenAI channels for updates. Let me know if you have any questions about current features or products!

Me: Can you ask someone from the team if I can be part of the early access? I know there is no public information, but there is an early access program and I would like to be part of it.

Bot: I'm an AI support agent and do not have the ability to forward requests or make inquiries about early access programs with the OpenAI team. If early access becomes available, it will be announced through official channels. Let me know if you have any other questions about OpenAI products or current fine-tuning options.

Me: How do I get in touch with the open ai team?

Bot: I'm an AI support agent and can only assist with OpenAI products or services through this support channel. There is no separate contact method for the OpenAI team beyond official support and public announcements. If you have questions about current products or features, I'm here to help

Should have tried a jailbreak maybe...


Yea you might have to convince it someone is in mortal danger or something

You're talking about a model that will notoriously refuse to say a racist word even if you tell it that it's the only way to prevent a nuclear apocalypse.

100% agree. Will never understand the love for the global menu bar, it makes no sense.


You don't have to like it, but the global menu bar is at the top of the screen, which means you just fling the mouse to the top and then move left or right, instead of also having to hit the right vertical position.


True. This could be nicely solved by placing a non-global menu bar all the way at the top of the window, so that you can still fling the cursor to the top of the screen if the window is maximized or otherwise sits along the top edge of the screen.


I mean, it does take less space (compared to having one per window)


Only if you have two or more windows vertically stacked, which isn’t what you’re likely to be doing when you want to maximize vertical space.


but if they're stacked horizontally, buttons don't fit


Isn’t this solved in 3.14/PEP-649?


I want to say it (or something similar at least) was originally addressed by from __future__ import annotations back in Python 3.7/3.8 or thereabouts? I definitely remember having to use stringified types a while back but I haven't needed to for quite a while now.


Yes, the annotations future import lets you use the declared types as written, no strings.
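A minimal sketch of what that looks like, assuming Python 3.10+ (the class and fields are made up for illustration):

```python
# With the future import, annotations are stored as strings rather than
# evaluated at definition time, so a forward reference no longer has to be
# written out manually as the string "Node".
from __future__ import annotations

from dataclasses import dataclass, field
import typing

@dataclass
class Node:
    value: int = 0
    parent: Node | None = None                # previously: "Node | None"
    children: list[Node] = field(default_factory=list)

# Resolve the stored strings back to real type objects at runtime:
print(typing.get_type_hints(Node))
```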


It turns them into thunks (formerly strings) automatically, an important detail if you're inspecting annotations at run time because the performance hit of resolving the actual type can be significant.


TIL, thanks! It looks like 3.14 is also changing it so that all evaluations are lazy.


At last, Pi-thon.


I've been using yt-dlp to download transcripts. Are there alternatives that don't require going through all these hoops? I'm guessing no.
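For reference, the yt-dlp route can also be driven from its Python API; a rough sketch (option names follow yt-dlp's documented keys, the URL is a placeholder):

```python
# Download only the transcript/captions, no media.
import yt_dlp

opts = {
    "skip_download": True,        # transcripts only
    "writesubtitles": True,       # creator-provided captions, if any
    "writeautomaticsub": True,    # YouTube's auto-generated captions
    "subtitleslangs": ["en"],
    "subtitlesformat": "vtt",
}
with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download(["https://www.youtube.com/watch?v=VIDEO_ID"])
```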


I thought transcripts already broke a while ago. Are they working again?


Whisper is unusable IMO because of the hallucinations. Widely documented. Removing silence from audio clips helps, but even then it will auto-correct grammar, translate bilingual speech, etc. This is improved in the latest audio models but not solved [1]

1. https://news.ycombinator.com/item?id=43427376


I wouldn't describe it as "unusable" so much as needing to understand its constraints and how to work around them. I built a business on top of Whisper [1] and one of the early key insights was to implement a good voice activity detection (VAD) model in order to reduce Whisper's hallucinations on silence.

[1] https://speechischeap.com


How does this make a profit? Whisper should cost $0.006 to $0.010 per minute, but your rate is less than $0.001? Do you speed the audio up 10x?


Thanks for noticing. It took a lot of effort to optimize the pipeline every step of the way. VAD, inference server, hardware optimization, etc. But nothing that would compromise on quality. The audio is currently transcribed at its original speed. I'll be sure to publish something if I manage to speed it up without any loss in WER.


That's the problem with raw large models: they should always be coupled with small satellite models and logic. It's (probably) easier to detect hallucinations with a traditional ML/DL model that catches mismatches (it's easy to build a synthetic dataset for this) than it is to transcribe. And the simplest piece of code can detect a silence and check that it maps to no text.
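As a toy illustration of that last point, here's a minimal sketch that flags transcript segments whose audio is essentially silent (the threshold and segment format are assumptions, loosely modeled on Whisper-style output):

```python
import numpy as np

def is_silent(samples: np.ndarray, threshold: float = 1e-3) -> bool:
    """True if the RMS energy of a float32 audio chunk is negligible."""
    return float(np.sqrt(np.mean(samples ** 2))) < threshold

def suspicious_segments(audio: np.ndarray, sr: int, segments: list[dict]) -> list[dict]:
    """Return segments (dicts with 'start'/'end' in seconds and 'text')
    that contain text even though the underlying audio is silent."""
    flagged = []
    for seg in segments:
        chunk = audio[int(seg["start"] * sr):int(seg["end"] * sr)]
        if seg["text"].strip() and is_silent(chunk):
            flagged.append(seg)  # likely hallucinated over silence
    return flagged
```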


well, auto-correcting grammar happens in normal subtitles too... "Why don't subtitles match dubbing?" by Tom Scott: https://youtu.be/pU9sHwNKc2c


I wish Google would provide a WebRTC endpoint for their Live mode like OpenAI does for their Realtime models [1]. It makes it so much easier to deploy without needing something like LiveKit or Pipecat.

1. https://platform.openai.com/docs/guides/realtime#connect-wit...


Appreciate the efforts. It’s not there yet, but when it gets there it will open up a lot of use cases.

Any fine-tuning for s2s on the horizon?


1) Previous TTS models had major problems with accents. E.g. a Spanish sentence could drift from a Spain accent to Mexican to American all within one sentence. Has this been improved and/or is it still a WIP?

2) What is the latency?

3) Your STT API/Whisper had MAJOR problems with hallucinating things the user didn't say. Is this fixed?

4) Whisper and your audio models often auto corrected speech, e.g. if someone made a grammatical error. Or if someone is speaking Spanish and inserted an English word, it would change the word to the Spanish equivalent. Does this still happen?


1/ we've been working a lot on accents, so expect improvements with these models... though we're not done. Would be curious how you find them. And try giving specific detailed instructions + examples for the accents you want

2/ We're doing everything we can to make it fast. Very critical that it can stream audio meaningfully faster than realtime

3+4/ I wouldn't call hallucinations "solved", but it's been the central focus for these models. So I hope you find it much improved


As mentioned in another comment, the British accents are very far from being authentic.


3) Whisper really needs to be paired with Silero VAD, otherwise the hallucination problem makes it almost unusable.
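For anyone curious what that pairing looks like in practice, a rough sketch using the open-source silero-vad and openai-whisper packages (exact APIs vary by version, so treat this as illustrative):

```python
import torch
import whisper

# Load Silero VAD plus its helper utilities from torch.hub.
vad_model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, collect_chunks = utils

audio = read_audio("input.wav", sampling_rate=16000)
speech_ts = get_speech_timestamps(audio, vad_model, sampling_rate=16000)

# Feed Whisper only the voiced chunks so it never "sees" long silences,
# which is where most of the hallucinated text comes from.
speech_only = collect_chunks(speech_ts, audio)
asr = whisper.load_model("base")
result = asr.transcribe(speech_only.numpy())
print(result["text"])
```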


100% and I’ve done this, but it’s still there.


This is a fantastic idea. Now how to get it on RDS…



PostgreSQL Anonymizer developer here: this was possible with version 0.8, which is now completely unmaintained.

This extension is currently not available on RDS, but it is available on many other DBaaS providers: Azure SQL, Google Cloud SQL, Crunchy Bridge, ...


Assuming it's for a support team or internal users with a lower SLA, I wonder if it's possible to have a small self-hosted PostgreSQL server that basically acts as a shim, holding a foreign data wrapper connection to the actual RDS instance.
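A rough sketch of what that shim might look like: a local Postgres reaching the RDS instance through postgres_fdw, with masking views or extensions layered on top locally. Hostnames, credentials, and table names are all placeholders.

```python
import psycopg2

SHIM_DSN = "dbname=shim user=postgres host=localhost"

SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER IF NOT EXISTS rds_prod
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'my-instance.rds.amazonaws.com', dbname 'app', port '5432');

CREATE USER MAPPING IF NOT EXISTS FOR postgres
    SERVER rds_prod
    OPTIONS (user 'readonly_user', password 'secret');

-- Pull the remote tables in as foreign tables; masking views or extensions
-- that RDS won't allow can then run against them on the shim.
IMPORT FOREIGN SCHEMA public FROM SERVER rds_prod INTO public;
"""

with psycopg2.connect(SHIM_DSN) as conn, conn.cursor() as cur:
    cur.execute(SETUP_SQL)
```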



In RDS, if you cannot use this, you can create a masked view and use query rewriting to make it work (sketch below).

In my experience PG anonymizer has performance issues when it comes to large queries.
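For illustration, a minimal sketch of the masked-view half of that suggestion. The table, columns, role, and anonymization rules are all invented; a real setup would pick masking functions appropriate to the data.

```python
import psycopg2

MASKING_SQL = """
CREATE SCHEMA IF NOT EXISTS masked;

-- Expose an anonymized view and point lower-trust roles at it
-- instead of at the raw table.
CREATE OR REPLACE VIEW masked.users AS
SELECT id,
       md5(email) || '@example.com' AS email,      -- stable but anonymous
       'REDACTED'                   AS full_name,
       created_at
FROM public.users;

GRANT USAGE  ON SCHEMA masked TO support_role;
GRANT SELECT ON masked.users  TO support_role;
"""

with psycopg2.connect("dbname=app user=admin host=my-rds-host") as conn:
    with conn.cursor() as cur:
        cur.execute(MASKING_SQL)
```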


Version 2.0 was released a few days ago. It's a complete rewrite in Rust.

Performance should be better than with v1.x


+1 for RDS support. I have wanted to use this for a while in our production systems.



Same. Lack of RDS support is the only reason we aren’t using this.





Just to jump in here: Neosync supports RDS and more, and you can self-host.

https://github.com/nucleuscloud/neosync

(I'm one of the co-founders)


I tried to figure out how/if this does what I need, and your README had no examples. I clicked a couple of levels deep, found no obvious demonstrations, and left.

I checked the homepage but I do not watch Loom-style demos personally, definitely not 5 minute ones, and so I left.

-

When I click on OP's link, or just search for it on Google, it takes less than a full page for the extension to show me an extremely straightforward demonstration of its value. You should have something like that.

A simple example of what queries will look like, what setup will look like, all concisely communicated, no 5 minute lectures involved.

