Hacker News | robbiemitchell's comments

This appears to be an excuse to rep your own AI startup.


Yes I agree.

You should be transparent about promoting your own company. Sneaking it in like this will definitely backfire.

What prevents you from being open about it?


I get a better first pass at code by asking it to write code at the level of a "staff level" or "principal" engineer.

For any task, whether code or a legal document, immediately asking "What can be done to make it better?" and/or "Are there any problems with this?" typically leads to improvement.
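That two-step pattern can be sketched as a tiny loop, with `ask` standing in for whatever chat-model call you use (all names and prompts here are illustrative, not from any particular API):

```python
def draft_and_review(ask, task):
    """Hypothetical sketch: a 'staff level' first pass, then a critique pass.

    `ask` is any function that sends a prompt to a chat model and returns text.
    """
    draft = ask(f"Acting as a staff-level engineer, {task}")
    critique = ask(
        "Are there any problems with this? "
        f"What can be done to make it better?\n\n{draft}"
    )
    # Feed the critique back in for a revised final pass.
    return ask(
        f"Revise the following to address this feedback.\n\n"
        f"Feedback:\n{critique}\n\nOriginal:\n{draft}"
    )
```

The same shape works for a legal document or any other artifact: draft, ask for problems, revise.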


One helpful addition would be Requests Per Minute (RPM), which varies wildly and is critical for streaming use cases -- especially with Bedrock, where the quota is account-wide.


For training, would it be useful to stabilize the footage first?


Stabilization appears to be a subset of a literally wider, but more rewarding, challenge: reconstructing the whole area scanned by the camera. It could be better to work on that challenge rather than on simple stabilization.

That's similar to how the human visual system 'paints' a coherent scene from a quite narrow field of high-resolution view, with educated guesses and assumptions.


https://vidpanos.github.io/

There are other recent ones that synthesize a new camera from any vantage point, not just the rotation+FOV changes of the above. But they still might want stabilized video as the baseline input if they don't already use it.

Besides saccades and tracking, your eyes also do a lot of stabilization, even counter-rotating on the roll axis as you lean your head to the side. I'm not sure if they roll when tracking a subject that rolls; I'd think that's not common enough to need to be a thing.


Thanks - that link is very interesting. You can see some distortion and 'hallucination', which would be a risk with my suggestion. Their video output is great work, but the far end of the fence on the right-hand side glitches and vanishes at about the 4-5 second mark, for instance.


I guess yes. Having worked on video processing, it's always better if you can stabilize, because it significantly reduces the number of unique tokens, which would be even more useful for the present method. However, you probably lose generalization performance, and not all videos can be stabilized.


> some gimmick

"key differentiator" and not necessarily easy to pull off or pay for


It wasn't even set up for success at selling.

After years of raising 3 kids, you would think that if I asked it to add diapers to the cart, it would know something. But no, it would just go with whatever is the top recommendation, or first in a search, or something like that. Nothing using the brand or the most recent sizes we purchased.

There was no serious attempt to drive real commerce. Instead, Alexa became full of recommendation slots that PMs would battle over. "I set that timer for you. Do you want to try the Yoga skill?"

On the other hand, they have taken on messy problems and solved them well, though not using technology, and for no real financial gain. For example, if you ask for the score of the Tigers game, Alexa has to reconcile which "Tigers" sports team you mean, considering both your own geography and teams worldwide, at every level from international to local, across every sport, any of which might have had games of interest. People worked behind the scenes to manage this manually, tracking teams of interest and filling intent slots daily.


The insane lack of basic heuristics in everyday apps to do very obvious things like you mentioned baffles me. They can build huge-scale fuzzy vector-search AI suggestion systems for a billion users, but can't think to do stuff like only suggesting things available in your size?

I'm actually working on an app that solves this for a specific use case, though it isn't in the retail space.

Voice assistants are particularly egregious - they've done all that work to correctly recognise the words I said - i.e. the hard part - but then it breaks because I said "set reminder" instead of "create reminder"??


My impression with a lot of products is that no one substantially involved in making them actually uses them themselves.


I’m convinced nobody on the Google/Nest Home teams has ever used the product outside of testing in a VM.


Product managers are no longer about the users. They just want to have some "impact" and then they move onto another product.


Same when you know how the sausage is made.


My question is always whether it's just the company I'm at or whether it's how the whole industry is run. The more companies I work at, the more I realize it's the latter.

On the flip side, it's easy to take for granted what DOES work when you know how much better it could be. I was sitting at dinner yesterday with a 73-year-old man who couldn't stop talking about how amazing Siri is because it'll tell him the population of some country.


That is when you force yourself to eat your own dogfood to make sure it is good.

But when $$$ are rolling in anyway, who cares enough?


This only goes so far, trying to use your own head as a simulation or approximation of the user experience. Some of us will be building software for people we will never be in our lifetimes.


They're hard in different ways (and ML helped with voice recognition to a degree that PhD linguists struggled to achieve for years).

But to your example: OK, set and create probably mean the same thing in the context of a reminder. Probably add and a few other things too. Should this go on some running ToDo list app I use? Should it ask me for a due date? Should it go on my calendar? And that's a very simple example.
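The synonym half of that problem is the easy part; a toy illustration, assuming a hand-maintained verb table (everything here is hypothetical, and real assistants need far more than this):

```python
# Hypothetical verb-synonym table for one intent; the open questions above
# (which app, due date, calendar) start where this sketch ends.
REMINDER_VERBS = {"set", "create", "add", "make", "new"}

def parse_reminder_intent(utterance):
    """Return 'create_reminder' if any synonym verb appears with 'reminder'."""
    words = set(utterance.lower().split())
    if "reminder" in words and words & REMINDER_VERBS:
        return "create_reminder"
    return "unknown"
```

The table approach breaks down fast, which is part of why intent parsing stayed hard until language models got involved.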


> Should this go on some running ToDo list app I use? Should it ask me for a due date? Should it go on my calendar? And that's a very simple example.

Yes. If it isn't obvious from the context, it should ask.

What it should not do is demand that you issue all your commands in the format "${brand 1}, do ${something} with ${brand 2} in ${brand 3}". That's what makes current voice assistants cringeworthy.


> Voice assistants are particularly egregious - they've done all that work to correctly recognise the words I said - i.e. the hard part - but then it breaks because I said "set reminder" instead of "create reminder"??

They hardly even managed the hard part. What's surprising to me is that for a year now, the ChatGPT app has been miles ahead of the SOTA in voice assistants in terms of speech-to-text with whatever it is they're using, and somehow none of the voice assistants have managed to improve. OpenAI could blow them all out of the water today if they delegated a couple of engineers to spend a week integrating their app deeper into the Android intent system - and 90% of that wouldn't be because of GPT-4, but because of a speech-to-text model that doesn't suck donkey balls.


> somehow none of the voice assistants managed to improve.

No one has been working on the old generation of assistants for years now. They all basically came to the conclusion that the architecture that everyone had settled on was a dead end and wouldn't get any better, so they directed their attention elsewhere.

Now Google is working on it again, but just using an LLM for better intent parsing isn't exciting enough to warrant attention, so in classic Google fashion they launched a brand new product (Gemini) that's going to run alongside Assistant for a few years confusing everyone until they yank Assistant (which still will have features that haven't been ported).

Apple seems to be working on improving Siri rather than starting fresh, but it's taken them a while to get it ready because Apple never moves on something fast.


Actually, speech-to-text benefits massively from a good language model. It's impossible to do speech-to-text if you don't understand the language. The better you understand the language and the context of what is being said, the better you will be at speech-to-text. So it's no surprise that the best-in-class language model would also have best-in-class speech-to-text.

I think a lot of people underestimate how disconnected simple sound patterns are from human speech. It's hard, if not impossible, to even recognize word boundaries on a spectrogram of regular human speech, even for highly eloquent speakers in formal settings. And many sounds are entirely ambiguous; people rarely notice the exact phonemes they use in practice. For example, most native English speakers pronounce the "peech" part of "speech" more like "beach" than like "peach", if you look at a spectrogram [0]. Phonetics is really complicated, and varies far more between languages than people tend to assume.

[0] https://www.youtube.com/watch?v=U37hX8NPgjQ


> but then it breaks because I said "set reminder" instead of "create reminder"??

Which is wild to me. If my Google Home even slightly mishears "lights on", I get random Spotify. But "cut the lights"? Works every time to turn them off.


Buying the promoted products is the point; they get advertising revenue that way


If they rebuy your most recent purchase instead of the promoted brand, they don't get advertising revenue


step 1: remove three sponsored diaper brands from the top of your cart...


Processing high volumes of unstructured data (text)… we're using a STAG (Stream-Triggered Augmented Generation) architecture.

- Generate targeted LLM micro summaries of every record (ticket, call, etc.) continually

- Use layers of regex, semantic embeddings, and scoring enrichments to identify report rows (pivots on aggregates) worth attention, running on a schedule

- Proactively explain each report row by identifying what’s unusual about it and LLM-summarizing a subset of the micro summaries.

- Push the result to webhook
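The steps above might be sketched roughly like this, with every function and the threshold stubbed out as an assumption rather than the actual implementation:

```python
def run_scheduled_pass(records, summarize, score, explain, push, threshold=0.8):
    """One scheduled pass: micro-summarize, score, explain outliers, push.

    `summarize`, `score`, `explain`, and `push` stand in for the LLM calls,
    regex/embedding/scoring enrichments, and webhook delivery described
    above; the 0.8 threshold is purely illustrative.
    """
    summaries = [summarize(r) for r in records]   # targeted micro-summaries
    for s in summaries:
        if score(s) >= threshold:                 # a report row worth attention
            push(explain(s))                      # proactive explanation to webhook
```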

Lack of JSON schema restriction is a significant barrier to entry on hooking LLMs up to a multi step process.

Another is preventing LLMs from adding intro or conclusion text.


> Lack of JSON schema restriction is a significant barrier to entry on hooking LLMs up to a multi step process.

(Plug) I shipped a dedicated OpenAI-compatible API for this, jsonmode.com, a couple of weeks ago, and just integrated Groq (they were nice enough to bump up the rate limits), so it's crazy fast. It's a WIP, but so far very comparable to JSON output from frontier models, with some bonus features (web crawling, etc.).


The Metallica-esque lightning logo is cool.


We actually built an error-tolerant JSON parser to handle this. Our customers were reporting exactly the same issue: trying a bunch of different techniques to get more usefully structured data out.

You can check it out over at https://github.com/BoundaryML/baml. Would love to talk if this is something that seems interesting!


> Lack of JSON schema restriction is a significant barrier to entry on hooking LLMs up to a multi step process.

How are you struggling with this, let alone as a significant barrier? JSON adherence with a well-thought-out schema hasn't been a worry in a while, between improved model performance and the various grammar-based constraint systems.

> Another is preventing LLMs from adding intro or conclusion text.

Also trivial to work around by pre-filling and stop tokens, or just extremely basic text parsing.

Also, I would recommend writing out "Stream-Triggered Augmented Generation", since the term is so rarely used that it might as well be made up, from the POV of someone trying to understand the comment.


Asking even a top-notch LLM to output well-formed JSON simply fails sometimes. And when you're running LLMs at high volume in the background, you can't use the best available until the last mile.

You work around it with post-processing and retries. But it’s still a bit brittle given how much stuff happens downstream without supervision.
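A minimal sketch of that post-processing-plus-retries pattern (names are illustrative; `generate` stands in for whatever model call you're making):

```python
import json

def extract_json(text):
    """Strip common wrapper text (markdown fences, intro prose) and parse."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found")
    return json.loads(text[start:end + 1])

def call_with_retries(generate, required_keys, max_attempts=3):
    """Call `generate()` until the output parses and has the expected keys."""
    for _ in range(max_attempts):
        try:
            obj = extract_json(generate())
        except ValueError:          # covers json.JSONDecodeError too
            continue
        if required_keys <= obj.keys():
            return obj
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts")
```

This is exactly the brittleness being described: it usually works, but every retry adds latency and cost, and a schema violation that slips past the key check still flows downstream unsupervised.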


Constrained output with GBNF or JSON is much more efficient and less error-prone. I hope nobody outside of hobby projects is still using error/retry loops.


Constraining output means you don't get to use ChatGPT or Claude, though, and now you have to run your own stuff. Maybe for some folks that's OK, but it's really annoying for others.


You're totally right, I'm in my own HPC bubble. The organizations I work with create their own models and it's easy for me to forget that's the exception more than the rule. I apologize for making too many assumptions in my previous comment.


Not at all!

Out of curiosity: do those orgs not find the loss of generality that comes from custom models to be an issue? E.g., vs. using Llama or Mistral or some other open model?


I do wonder why, though. Constraining output based on logits is a fairly simple and easy-to-implement idea, so why is this not part of, e.g., the OpenAI API yet? They don't even have to expose it at the lowest level; just use it to force valid JSON in the output on their end.
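The core of the idea really is small: mask disallowed tokens before sampling. A toy sketch, with the grammar reduced to a set of allowed token ids (a real implementation recomputes that set from grammar state at every step):

```python
import math

def constrain(logits, allowed_ids):
    """Set every disallowed token's logit to -inf so it can never be sampled."""
    return [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]

def greedy_constrained_pick(logits, allowed_ids):
    """Greedy decode one step under the constraint."""
    masked = constrain(logits, allowed_ids)
    return max(range(len(masked)), key=masked.__getitem__)
```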


… why would you have the LLM spit out JSON rather than define the JSON yourself and have the LLM supply values?


If the LLM doesn't output data that conforms to a schema, you can't reliably parse it, so you're back to square one.


It’s significantly easier to output an integer than a JSON object with a key-value structure where the value is an integer and everything else is exactly as desired.


That's because you've dumbed down the problem. If it was just about outputting one integer, there would be nothing to discuss. Now add a bunch more fields, add some nesting and other constraints into it...


The more complexity you add, the less likely the LLM is to give you a valid response in one shot. It’s still going to be easier to get the LLM to supply values to a fixed schema than to get the LLM to give both the answers and the schema.


Is there a general model that got fine-tuned on these JSON schema/output pairs?

Seems like it would be universally useful.


How would I do this reliably? E.g., give me 10 different values, all in one prompt for performance reasons?

Might not need JSON but whatever format it outputs, it needs to be reliable.
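One hedged way to keep a many-values-in-one-prompt call reliable without JSON: ask for exactly one answer per numbered line, then parse strictly and retry the whole call if anything is missing (the format choice here is mine, not from the thread):

```python
import re

# Matches lines like "3. some answer text".
LINE = re.compile(r"^(\d+)\.\s*(.+)$")

def parse_numbered(text, expected):
    """Parse a numbered-line response; raise if any of 1..expected is missing."""
    values = {}
    for raw in text.strip().splitlines():
        m = LINE.match(raw.strip())
        if m:
            values[int(m.group(1))] = m.group(2).strip()
    if set(values) != set(range(1, expected + 1)):
        raise ValueError("missing or extra answers")
    return [values[i] for i in range(1, expected + 1)]
```

A flat line-based format like this is much easier for a model to emit consistently than nested JSON, while still failing loudly when the response is incomplete.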


Don’t do it all in one prompt.


Right, but now I’m basically taking a huge performance hit, I need to parallelize my queries, etc.

I was parsing a document recently, with 10-ish questions per document; that would make things expensive.

Might be what’s needed but not ideal.


LLM performance is a function of the number of tokens, not queries


The phrase you want to search is "constrained decoding".


The best available actually have the fewest knobs for JSON schema enforcement (i.e., OpenAI's JSON mode, which technically can still produce incorrect JSON).

If you're using anything less, you should have a grammar that enforces exactly which tokens are allowed to be output. Fine-tuning can help too, in case you're worried about the effects of constraining the generation, but in my experience that's not really a thing.


I only became aware of it recently and therefore haven't done more than play with it in a fairly cursory way, but unstructured.io seems to have a lot of traction, and certainly in my little toy tests their open-source stuff seems clearly better than the status quo.

Might be worth checking out.


“Use layers of regex, semantic embeddings, and scoring enrichments to identify report rows (pivots on aggregates) worth attention, running on a schedule”

This is really interesting. Are there any architecture docs/articles you can recommend?


I'm late to this party, but here's a post I wrote about it. It's more motivation than technical detail, but we are working on technical posts/papers for release. Happy to field emails in the meantime if this is timely for you.

https://www.linkedin.com/pulse/ai-2024-more-answers-fewer-qu...


Awesome! Thank you


It is workable. It’s like $6/month.

The real problem is that most people don’t want to pay for things.


I used IFTTT for a while to manage smart sockets. Then I moved to Home Assistant.

It works better and doesn't require a monthly payment. Just a one-time purchase of hardware.

It was unnecessarily complex to set up, though, because manufacturers lock down their hardware to their closed clouds. Then they demand monthly payments because of "ongoing costs" that shouldn't exist in the first place (IFTTT mentioned the API fees of various manufacturers as the main reason for more aggressive monetization).

I'm happy to pay for a product. I'm not going to pay a company that constructs a closed ecosystem just because it allows them to extract a monthly fee from me.


I think HA combined with something like Node-RED would actually solve OP's question. The only thing is that it's still more of a techy thing than a polished "drop-in" solution.


As a Zapier user looking to go with something private for new use cases, how does this compare to an on-prem n8n.io?


Great question!

We're different from n8n in the sense that we help you build native integrations with APIs, not in a no-code way.

We help you build an integration with Slack (for example) within your product for your users/customers, natively.

That means your end-users or customers will not know that you are using our APIs underneath, as Revert can be fully white-labeled.

We don't offer a workflow UI like n8n at all.

Happy to answer any further questions on this. Feel free to book a time from our cal if you think this could be relevant for you!

Cheers.


You can do this in Spotify now by creating a playlist and then playing the station driven from that playlist.


They removed this feature. Now there is a pseudo-replacement called "enhanced shuffle" which interleaves recommended songs into your shuffled playlist. Unfortunately, this drastically reduces the amount of song discovery per unit time.


This is a huge bummer! I wonder why they removed the playlist station feature.


I find that Spotify will tend to just smash music from my other playlists and history into that station. I can never get it to recommend me new stuff.


The autogenerated genre playlists have this issue too. "House Music" is a very broad genre (or metagenre, given how many subgenres exist), so you'd think they wouldn't end up largely composed of tracks I already have on my own playlist. And the rest end up being kind of bland and not really quite what I'm looking for.

I seem to remember that Spotify used to have a heavy focus on employee-operated playlists that curated new releases, though I didn't really take advantage of them at the time. The newer (cheaper) algorithm-driven ones don't really cut it.

