For someone looking to switch over to Gemini from OpenAI, are there any gotchas one should be aware of? E.g. I heard some mention of API limits and approvals? Or in terms of prompt writing? What advice do people have?
I use a service where I have access to all the SOTA models and many open-source models, so I switch models within chats, using MCPs. E.g. I start a chat with Opus doing a search via the Perplexity and Grok DeepSearch MCPs plus Google Search, the next query goes to GPT-5 Thinking (xhigh), the next to Gemini 3 Pro, all in the same conversation. It's fantastic! I can't imagine going back to being locked into one (or two) companies. I have nothing to do with the guys who run it (the hosts of the podcast This Day in AI), but if you're interested, have a look in the simtheory.ai Discord.
I don't know how people who use only one service can manage...
I emailed support a while back to see if there was an early access program (99.99% sure the answer is yes). This is when I discovered that their support is 100% done by AI and there is no way to escalate a case to a human.
Me: Is it possible to get early access to GPT-5 fine-tuning? It is much better at non-English languages, which is the majority of what we do, so it would be greatly beneficial to us.
Bot: I'm an AI support agent. I don't have information about early access to GPT-5 fine-tuning or unannounced features. Currently, there is no public information or program for early access to GPT-5 fine-tuning. Please keep an eye on official OpenAI channels for updates. Let me know if you have any questions about current features or products!
Me: Can you ask someone from the team if I can be part of the early access? I know there is no public information, but there is an early access program and I would like to be part of it.
Bot: I'm an AI support agent and do not have the ability to forward requests or make inquiries about early access programs with the OpenAI team. If early access becomes available, it will be announced through official channels. Let me know if you have any other questions about OpenAI products or current fine-tuning options.
Me: How do I get in touch with the OpenAI team?
Bot: I'm an AI support agent and can only assist with OpenAI products or services through this support channel. There is no separate contact method for the OpenAI team beyond official support and public announcements. If you have questions about current products or features, I'm here to help.
You're talking about a model that will notoriously refuse to say a racist word even if you tell it that it's the only way to prevent a nuclear apocalypse.
You don't have to like it, but the global menu bar is at the top of the screen, which means you just fling the mouse to the top and then move left or right, instead of also having to hit the right vertical position.
True. This could be nicely solved by placing the non-global menu bar all the way at the top edge of the window, so that you can still fling the cursor to the top of the screen whenever the window is maximized or otherwise sits along the top edge of the screen.
I want to say it (or something similar at least) was originally addressed by from __future__ import annotations back in Python 3.7/3.8 or thereabouts? I definitely remember having to use stringified types a while back but I haven't needed to for quite a while now.
It turns them into thunks (formerly strings) automatically, an important detail if you're inspecting annotations at run time because the performance hit of resolving the actual type can be significant.
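For anyone who hasn't followed the PEPs, here is a minimal sketch of the 3.7+ behaviour being discussed (PEP 563); the class and names are made up for illustration, and the newer change mentioned above replaces the stored strings with lazily evaluated thunks:

    # With the future import, annotations are stored as plain strings,
    # so forward references like "Node" work without manual quoting.
    from __future__ import annotations
    import typing

    class Node:
        next: Node   # would be a NameError at class-definition time without the import
        value: int

    print(Node.__annotations__)         # {'next': 'Node', 'value': 'int'}  -- raw strings
    print(typing.get_type_hints(Node))  # resolves the strings to the actual classes, on demand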
Whisper is unusable IMO because of the hallucinations. Widely documented. Removing silence from audio clips helps, but even then it will auto-correct grammar, translate bilingual speech, etc. Improved in the latest audio models but not solved [1]
I wouldn't describe it as "unusable" so much as needing to understand its constraints and how to work around them. I built a business on top of Whisper [1] and one of the early key insights was to implement a good voice activity detection (VAD) model in order to reduce Whisper's hallucinations on silence.
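For anyone curious what that looks like in practice, here's a minimal sketch (not the poster's actual pipeline) using faster-whisper, which bundles a Silero VAD pass that drops silent stretches before they reach the model; the file name and parameter values are placeholders:

    # Sketch: transcribe with a VAD pre-pass so Whisper never sees long silences,
    # which is where most of its hallucinated text tends to appear.
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v3", device="cuda", compute_type="float16")

    segments, info = model.transcribe(
        "meeting.wav",                 # placeholder input file
        vad_filter=True,               # run Silero VAD and skip non-speech audio
        vad_parameters={"min_silence_duration_ms": 500},  # illustrative value
    )

    for seg in segments:
        print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")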
Thanks for noticing. It took a lot of effort to optimize the pipeline every step of the way: VAD, inference server, hardware optimization, etc. But nothing that would compromise on quality. The audio is currently transcribed at its original speed. I'll be sure to publish something if I manage to speed it up without any loss in WER.
That's the problem with raw large models: they should always be coupled with satellite small models and plain logic. It's (probably) easier to detect hallucinations using a traditional ML/DL model that catches mismatches (it's easy to build a synthetic dataset for this) than it is to do the transcription itself. And the simplest piece of code can detect a silent segment and check that it matches no text.
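As a concrete example of that last point, a crude RMS-energy check like the one below can flag any non-empty transcript produced for a silent segment; the threshold and helper names are made up for illustration, not a tuned recipe:

    import numpy as np

    def is_silence(samples: np.ndarray, threshold: float = 1e-3) -> bool:
        """Return True if the clip's RMS energy falls below the threshold."""
        rms = np.sqrt(np.mean(np.square(samples.astype(np.float64))))
        return rms < threshold

    # If a segment is silent, any non-empty transcript for it is suspect:
    #   if is_silence(segment) and transcript.strip():
    #       flag_as_hallucination(segment, transcript)   # hypothetical handler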
I wish Google would provide a WebRTC endpoint for their Live mode like OpenAI does for their Realtime models [1]. It makes it so much easier to deploy without needing something like LiveKit or Pipecat.
1) Previous TTS models had major problems with accents. E.g. a Spanish sentence could drift from a Spain accent to Mexican to American, all within one sentence. Has this been improved, and/or is it still a WIP?
2) What is the latency?
3) Your STT API/Whisper had MAJOR problems with hallucinating things the user didn't say. Is this fixed?
4) Whisper and your audio models often auto-corrected speech, e.g. if someone made a grammatical error. Or if someone speaking Spanish inserted an English word, it would change the word to the Spanish equivalent. Does this still happen?
1/ we've been working a lot on accents, so expect improvements with these models... though we're not done. Would be curious how you find them. And try giving specific detailed instructions + examples for the accents you want
2/ We're doing everything we can to make it fast. Very critical that it can stream audio meaningfully faster than realtime
3+4/ I wouldn't call hallucinations "solved", but it's been the central focus for these models. So I hope you find it much improved
PostgreSQL Anonymizer developer here: this was possible with version 0.8, which is now completely unmaintained.
This extension is currently not available on RDS, but it is available on many other DBaaS providers: Azure SQL, Google Cloud SQL, Crunchy Bridge, ...
Assuming it's for a support team or internal users with a lower SLA, I wonder if it's possible to have a small self-hosted PostgreSQL server that basically acts as a shim, holding a foreign-data wrapper connection to the actual RDS instance.
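For what it's worth, a rough sketch of what that shim could look like with postgres_fdw, driven from Python here; the host, credentials, and schema names are placeholders, not a tested setup:

    # Hypothetical shim: a self-hosted Postgres mirrors the RDS tables over
    # postgres_fdw so extensions unavailable on RDS can run locally.
    import psycopg2

    SETUP_SQL = """
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER IF NOT EXISTS rds_upstream
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'mydb.example.rds.amazonaws.com', dbname 'prod', port '5432');

    CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER
        SERVER rds_upstream
        OPTIONS (user 'readonly_user', password 'placeholder');

    -- Expose the production tables locally as foreign tables.
    CREATE SCHEMA IF NOT EXISTS upstream;
    IMPORT FOREIGN SCHEMA public FROM SERVER rds_upstream INTO upstream;
    """

    with psycopg2.connect("host=localhost dbname=shim") as conn:  # the self-hosted instance
        with conn.cursor() as cur:
            cur.execute(SETUP_SQL)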
I tried to figure out how/if this does what I need, and your README had no examples. I clicked a couple of levels deep, found no obvious demonstrations, and left.
I checked the homepage, but I do not watch Loom-style demos personally, definitely not 5-minute ones, and so I left.
-
When I click on OP's link, or just search for it on Google, it takes less than a full page for the extension to show me an extremely straightforward demonstration of its value. You should have something like that.
A simple example of what queries will look like, what setup will look like, all concisely communicated, no 5 minute lectures involved.