Hacker News | robbiemitchell's comments

This appears to be an excuse to rep your own AI startup.


Yes I agree.

You should be transparent about promoting your own company. Sneaking it in like this will definitely backfire.

What prevents you from being open about it?


I get a better first pass at code by asking it to write code at the level of a "staff level" or "principal" engineer.

For any task, whether code or a legal document, immediately asking "What can be done to make it better?" and/or "Are there any problems with this?" typically leads to improvement.
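That two-step pattern can be sketched as a tiny loop, with `ask` standing in for whatever chat-model call you use (all names and prompts here are illustrative, not from any particular API):

```python
def draft_and_review(ask, task):
    """Hypothetical sketch: a 'staff level' first pass, then a critique pass.

    `ask` is any function that sends a prompt to a chat model and returns text.
    """
    draft = ask(f"Acting as a staff-level engineer, {task}")
    critique = ask(
        "Are there any problems with this? "
        f"What can be done to make it better?\n\n{draft}"
    )
    # Feed the critique back in for a revised final pass.
    return ask(
        f"Revise the following to address this feedback.\n\n"
        f"Feedback:\n{critique}\n\nOriginal:\n{draft}"
    )
```

The same shape works for a legal document or any other artifact: draft, ask for problems, revise.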


One helpful addition would be Requests Per Minute (RPM), which varies wildly and is critical for streaming use cases -- especially with Bedrock, where the quota is account-wide.


For training, would it be useful to stabilize the footage first?


Stabilization appears to be a subset of a literally wider, but more rewarding, challenge: reconstructing the whole area scanned by the camera. It could be better to work on that challenge rather than on simple stabilization.

That's similar to how the human visual system 'paints' a coherent scene from a quite narrow field of high-resolution view, with educated guesses and assumptions.


https://vidpanos.github.io/

There are other recent ones that synthesize a new camera from any vantage point, not just the rotation+FOV changes of the above. But they still might want stabilized video as the baseline input if they don't already use it.

Besides saccades and tracking, your eyes also do a lot of stabilization, even counter-rotating on the roll axis as you lean your head to the side. I'm not sure if they roll when tracking a subject that rolls; I'd think that's not common enough to need to be a thing.


Thanks - that link is very interesting. You can see some distortion and 'hallucination', which would be a risk with my suggestion. Their video output is great work, but the far end of the fence on the right-hand side glitches and vanishes at about the 4-5 second mark, for instance.


I guess yes. Having worked on video processing, it's always better if you can stabilize, because it significantly reduces the number of unique tokens, which would be even more useful for the present method. However, you probably lose generalization performance, and not all videos can be stabilized.


> some gimmick

"key differentiator" and not necessarily easy to pull off or pay for


It wasn't even set up for success at selling.

After years of raising 3 kids, you would think that if I asked it to add diapers to the cart, it would know something. But no, it would just go with whatever is the top recommendation, or first in a search, or something like that. Nothing using the brand or the most recent sizes we purchased.

There was no serious attempt to drive real commerce. Instead, Alexa became full of recommendation slots that PMs would battle over. "I set that timer for you. Do you want to try the Yoga skill?"

On the other hand, they have taken on messy problems and solved them well, though not using technology, and for no real financial gain. For example, if you ask for the score of the Tigers game, Alexa has to reconcile which "Tigers" sports team you mean, considering both your own geography and teams worldwide, at every level from international to local, across every sport, any of which might have had games of interest. People worked behind the scenes to manage this manually, tracking teams of interest and filling intent slots daily.


The insane lack of basic heuristics in everyday apps to do very obvious things like you mentioned baffles me. They can build huge-scale fuzzy vector-search AI suggestion systems for a billion users, but can't think to do stuff like only suggesting things available in your size?

I'm actually working on an app that solves this for a specific use case, though it isn't in the retail space.

Voice assistants are particularly egregious - they've done all that work to correctly recognise the words I said - i.e. the hard part - but then it breaks because I said "set reminder" instead of "create reminder"??


My impression with a lot of products is that no one substantially involved in making them actually uses them themselves.


I’m convinced nobody on the Google/Nest Home teams has ever used the product outside of testing in a VM.


Product managers are no longer about the users. They just want to have some "impact" and then they move onto another product.


Same when you know how the sausage is made.


My question is always whether it's just the company I'm at or whether it's how the whole industry is run. The more companies I work at, the more I realize it's the latter.

On the flip side, it's easy to take for granted what DOES work when you know how much better it could be. I was sitting at dinner yesterday with a 73-year-old man who couldn't stop talking about how amazing Siri is because it'll tell him the population of some country.


That is when you force yourself to eat your own dogfood to make sure it is good.

But when $$$ are rolling in anyway, who cares enough?


This only goes so far, trying to use your own head as a simulation or approximation of the user experience. Some of us will be building software for people we will never be in our lifetimes.


They're hard in different ways (and ML helped with voice recognition to a degree that PhD linguists struggled to achieve for years).

But to your example: OK, set and create probably mean the same thing in the context of a reminder. Probably add and a few other things too. Should this go on some running ToDo list app I use? Should it ask me for a due date? Should it go on my calendar? And that's a very simple example.
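The synonym half of that problem is the easy part; a toy illustration, assuming a hand-maintained verb table (everything here is hypothetical, and real assistants need far more than this):

```python
# Hypothetical verb-synonym table for one intent; the open questions above
# (which app, due date, calendar) start where this sketch ends.
REMINDER_VERBS = {"set", "create", "add", "make", "new"}

def parse_reminder_intent(utterance):
    """Return 'create_reminder' if any synonym verb appears with 'reminder'."""
    words = set(utterance.lower().split())
    if "reminder" in words and words & REMINDER_VERBS:
        return "create_reminder"
    return "unknown"
```

The table approach breaks down fast, which is part of why intent parsing stayed hard until language models got involved.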


> Should this go on some running ToDo list app I use? Should it ask me for a due date? Should it go on my calendar? And that's a very simple example.

Yes. If it isn't obvious from the context, it should ask.

What it should not do is demand that you issue all your commands in the format "${brand 1}, do ${something} with ${brand 2} in ${brand 3}". That's what makes current voice assistants cringeworthy.


> Voice assistants are particularly egregious - they've done all that work to correctly recognise the words I said - i.e. the hard part - but then it breaks because I said "set reminder" instead of "create reminder"??

They hardly even managed the hard part. What's surprising to me is that for a year now, the ChatGPT app has been miles ahead of the SOTA in voice assistants in terms of speech-to-text with whatever it is they're using, and somehow none of the voice assistants have managed to improve. OpenAI could blow them all out of the water today if they delegated a couple of engineers to spend a week integrating their app deeper into the Android intent system - and 90% of that wouldn't be because of GPT-4, but because of a speech-to-text model that doesn't suck donkey balls.


> somehow none of the voice assistants managed to improve.

No one has been working on the old generation of assistants for years now. They all basically came to the conclusion that the architecture that everyone had settled on was a dead end and wouldn't get any better, so they directed their attention elsewhere.

Now Google is working on it again, but just using an LLM for better intent parsing isn't exciting enough to warrant attention, so in classic Google fashion they launched a brand new product (Gemini) that's going to run alongside Assistant for a few years confusing everyone until they yank Assistant (which still will have features that haven't been ported).

Apple seems to be working on improving Siri rather than starting fresh, but it's taken them a while to get it ready because Apple never moves on something fast.


Actually, speech-to-text benefits massively from a good language model. It's impossible to do speech-to-text if you don't understand the language. The better you understand the language and the context of what is being said, the better you will be at speech-to-text. So it's no surprise that the best-in-class language model would also have best-in-class speech-to-text.

I think a lot of people underestimate how disconnected simple sound patterns are from human speech. It's hard, if not impossible, to even recognize word boundaries on a spectrogram of regular human speech, even for highly eloquent speakers in formal settings. And many sounds are entirely ambiguous; people rarely notice the exact phonemes they use in practice. For example, most native English speakers pronounce the "peech" part of "speech" more like "beach" than like "peach", if you look at a spectrogram [0]. Phonetics is really complicated, and varies far more between languages than people tend to assume.

[0] https://www.youtube.com/watch?v=U37hX8NPgjQ


> but then it breaks because I said "set reminder" instead of "create reminder"??

Which is wild to me. If my Google Home even slightly mishears "lights on", I get random Spotify. But "cut the lights"? Works every time to turn them off.


Buying the promoted products is the point; they get advertising revenue that way


If they rebuy your most recent purchase instead of the promoted brand, they don't get advertising revenue


step 1: remove three sponsored diaper brands from the top of your cart...


Processing high volumes of unstructured data (text)… we're using a STAG (Stream-Triggered Augmented Generation) architecture.

- Generate targeted LLM micro summaries of every record (ticket, call, etc.) continually

- Use layers of regex, semantic embeddings, and scoring enrichments to identify report rows (pivots on aggregates) worth attention, running on a schedule

- Proactively explain each report row by identifying what’s unusual about it and LLM-summarizing a subset of the micro summaries.

- Push the result to webhook
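The steps above might be sketched roughly like this, with every function and the threshold stubbed out as an assumption rather than the actual implementation:

```python
def run_scheduled_pass(records, summarize, score, explain, push, threshold=0.8):
    """One scheduled pass: micro-summarize, score, explain outliers, push.

    `summarize`, `score`, `explain`, and `push` stand in for the LLM calls,
    regex/embedding/scoring enrichments, and webhook delivery described
    above; the 0.8 threshold is purely illustrative.
    """
    summaries = [summarize(r) for r in records]   # targeted micro-summaries
    for s in summaries:
        if score(s) >= threshold:                 # a report row worth attention
            push(explain(s))                      # proactive explanation to webhook
```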

Lack of JSON schema restriction is a significant barrier to entry on hooking LLMs up to a multi step process.

Another is preventing LLMs from adding intro or conclusion text.


> Lack of JSON schema restriction is a significant barrier to entry on hooking LLMs up to a multi step process.

(Plug) I shipped a dedicated OpenAI-compatible API for this, jsonmode.com, a couple of weeks ago, and just integrated Groq (they were nice enough to bump up the rate limits), so it's crazy fast. It's a WIP, but so far very comparable to JSON output from frontier models, with some bonus features (web crawling, etc.).


The Metallica-esque lightning logo is cool.


We actually built an error-tolerant JSON parser to handle this. Our customers were reporting exactly the same issue: trying a bunch of different techniques to get more usefully structured data out.

You can check it out over at https://github.com/BoundaryML/baml. Would love to talk if this is something that seems interesting!


> Lack of JSON schema restriction is a significant barrier to entry on hooking LLMs up to a multi step process.

How are you struggling with this, let alone as a significant barrier? JSON adherence with a well-thought-out schema hasn't been a worry in a while, between improved model performance and the various grammar-based constraint systems.

> Another is preventing LLMs from adding intro or conclusion text.

Also trivial to work around by pre-filling and stop tokens, or just extremely basic text parsing.

Also, I would recommend writing out "Stream-Triggered Augmented Generation", since the term is so rarely used that it might as well be made up, from the POV of someone trying to understand the comment.


Asking even a top-notch LLM to output well-formed JSON simply fails sometimes. And when you're running LLMs at high volume in the background, you can't use the best available until the last mile.

You work around it with post-processing and retries. But it’s still a bit brittle given how much stuff happens downstream without supervision.
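A minimal sketch of that post-processing-plus-retries pattern (names are illustrative; `generate` stands in for whatever model call you're making):

```python
import json

def extract_json(text):
    """Strip common wrapper text (markdown fences, intro prose) and parse."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found")
    return json.loads(text[start:end + 1])

def call_with_retries(generate, required_keys, max_attempts=3):
    """Call `generate()` until the output parses and has the expected keys."""
    for _ in range(max_attempts):
        try:
            obj = extract_json(generate())
        except ValueError:          # covers json.JSONDecodeError too
            continue
        if required_keys <= obj.keys():
            return obj
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts")
```

This is exactly the brittleness being described: it usually works, but every retry adds latency and cost, and a schema violation that slips past the key check still flows downstream unsupervised.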


Constrained output with GBNF or JSON is much more efficient and less error-prone. I hope nobody outside of hobby projects is still using error/retry loops.


Constraining output means you don't get to use ChatGPT or Claude, though, and now you have to run your own stuff. Maybe for some folks that's OK, but it's really annoying for others.


You're totally right, I'm in my own HPC bubble. The organizations I work with create their own models and it's easy for me to forget that's the exception more than the rule. I apologize for making too many assumptions in my previous comment.


Not at all!

Out of curiosity: do those orgs not find the loss of generality that comes from custom models to be an issue? E.g., vs. using Llama or Mistral or some other open model?


I do wonder why, though. Constraining output based on logits is a fairly simple and easy-to-implement idea, so why is this not part of, e.g., the OpenAI API yet? They don't even have to expose it at the lowest level; just use it to force valid JSON in the output on their end.
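The core of the idea really is small: mask disallowed tokens before sampling. A toy sketch, with the grammar reduced to a set of allowed token ids (a real implementation recomputes that set from grammar state at every step):

```python
import math

def constrain(logits, allowed_ids):
    """Set every disallowed token's logit to -inf so it can never be sampled."""
    return [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]

def greedy_constrained_pick(logits, allowed_ids):
    """Greedy decode one step under the constraint."""
    masked = constrain(logits, allowed_ids)
    return max(range(len(masked)), key=masked.__getitem__)
```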


… why would you have the LLM spit out JSON rather than define the JSON yourself and have the LLM supply values?


If the LLM doesn't output data that conforms to a schema, you can't reliably parse it, so you're back to square one.


It’s significantly easier to output an integer than a JSON object with a key-value structure where the value is an integer and everything else is exactly as desired.


That's because you've dumbed down the problem. If it was just about outputting one integer, there would be nothing to discuss. Now add a bunch more fields, add some nesting and other constraints into it...


The more complexity you add, the less likely the LLM is to give you a valid response in one shot. It’s still going to be easier to get the LLM to supply values to a fixed schema than to get the LLM to give both the answers and the schema.


Is there a general model that got fine-tuned on these JSON schema/output pairs?

Seems like it would be universally useful.


How would I do this reliably? E.g., give me 10 different values, all in one prompt for performance reasons?

Might not need JSON but whatever format it outputs, it needs to be reliable.
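One hedged way to keep a many-values-in-one-prompt call reliable without JSON: ask for exactly one answer per numbered line, then parse strictly and retry the whole call if anything is missing (the format choice here is mine, not from the thread):

```python
import re

# Matches lines like "3. some answer text".
LINE = re.compile(r"^(\d+)\.\s*(.+)$")

def parse_numbered(text, expected):
    """Parse a numbered-line response; raise if any of 1..expected is missing."""
    values = {}
    for raw in text.strip().splitlines():
        m = LINE.match(raw.strip())
        if m:
            values[int(m.group(1))] = m.group(2).strip()
    if set(values) != set(range(1, expected + 1)):
        raise ValueError("missing or extra answers")
    return [values[i] for i in range(1, expected + 1)]
```

A flat line-based format like this is much easier for a model to emit consistently than nested JSON, while still failing loudly when the response is incomplete.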


Don’t do it all in one prompt.


Right, but now I’m basically taking a huge performance hit, I need to parallelize my queries, etc.

I was parsing a document recently, with 10-ish questions per document; that would make things expensive.

Might be what’s needed but not ideal.


LLM performance is a function of the number of tokens, not queries


The phrase you want to search is "constrained decoding".


The best available actually have the fewest knobs for JSON schema enforcement (i.e., OpenAI's JSON mode, which technically can still produce incorrect JSON).

If you're using anything less, you should have a grammar that enforces exactly which tokens are allowed to be output. Fine-tuning can help too, in case you're worried about the effects of constraining the generation, but in my experience that's not really a thing.


I only became aware of it recently and therefore haven't done more than play with it in a fairly cursory way, but unstructured.io seems to have a lot of traction, and certainly in my little toy tests their open-source stuff seems clearly better than the status quo.

Might be worth checking out.


“Use layers of regex, semantic embeddings, and scoring enrichments to identify report rows (pivots on aggregates) worth attention, running on a schedule”

This is really interesting. Are there any architecture docs/articles you can recommend?


I'm late to this party, but here's a post I wrote about it. It's more motivation than technical detail, but we are working on technical posts/papers for release. Happy to field emails in the meantime if this is timely for you.

https://www.linkedin.com/pulse/ai-2024-more-answers-fewer-qu...


Awesome! Thank you


It is workable. It’s like $6/month.

The real problem is that most people don’t want to pay for things.


I used IFTTT for a while to manage smart sockets. Then I moved to Home Assistant.

It works better and doesn't require a monthly payment. Just a one-time purchase of hardware.

It was unnecessarily complex to set up, though, because manufacturers lock down their hardware to their closed clouds. Then they demand monthly payments because of "ongoing costs" that shouldn't exist in the first place (IFTTT mentioned the API fees of various manufacturers as the main reason for more aggressive monetization).

I'm happy to pay for a product. I'm not going to pay a company that constructs a closed ecosystem just because it allows them to extract a monthly fee from me.


I think HA combined with something like Node-RED would actually solve OP's question. The only thing is that it's still more of a techy thing than a polished "drop-in" solution.


As a Zapier user looking to go with something private for new use cases, how does this compare to an on-prem n8n.io?


Great question!

We're different from n8n in the sense that we help you build native integrations with APIs, not in a no-code way.

We help you build an integration with Slack (for example) within your product for your users/customers, natively.

That means your end-users or customers will not know that you are using our APIs underneath, as Revert can be fully white-labeled.

We don't offer a workflow UI like n8n at all.

Happy to answer any further questions on this. Feel free to book a time from our cal if you think this could be relevant for you!

Cheers.


You can do this in Spotify now by creating a playlist and then playing the station driven from that playlist.


They removed this feature. Now there is a pseudo-replacement called "enhanced shuffle" which interleaves recommended songs into your shuffled playlist. Unfortunately, this drastically reduces the amount of song discovery per unit time.


This is a huge bummer! I wonder why they removed the playlist station feature.


I find that Spotify will tend to just smash music from my other playlists and history into that station. I can never get it to recommend me new stuff.


The autogenerated genre playlists have this issue too. "House Music" is a very broad genre (or metagenre, given how many subgenres exist), so you'd think they wouldn't end up largely composed of tracks I already have on my own playlist. And the rest end up being kind of bland and not really quite what I'm looking for.

I seem to remember that Spotify used to have a heavy focus on employee-operated playlists that curated new releases, though I didn't really take advantage of them at the time. The newer (cheaper) algorithm-driven ones don't really cut it.

