That’s fair! It has been mentioned before, so we’ll likely build that into the platform. Would you like us to upload the model to your Hugging Face account, let you download the weights, or upload it to an inference provider of your choice?
I love the idea of the product! I would trust your solution for very simple use cases, but not for multi-step or ReAct agents. Any thoughts / insights on that?
I think the demo could be more exciting, the voice of the person talking sounds like he's bored haha
Ha - here's the advice I give to YC startups about making demo videos for HN:
"What works well for HN is raw and direct, with zero production values. Skip any introductions and jump straight into showing your product doing what it does best. Voiceover is good, but no marketing slickness—no fancy logos or background music!"
I guess there's zero production values and zero production values...
Totally agree. Raw is great, but energy matters too. If the person sounds bored, it's hard to get excited about the product—even if it's amazing. Passion is contagious.
That's true, thanks for the feedback! In the end it wasn't boredom, just long hours; too much energy went into the platform ;) Taking it to heart for the next one!
Yes, great point. We are currently working on multistep RL.
The big problem with the trivial approach (giving a single reward to the entire ReAct trajectory) is that the model receives a weak learning signal per decision (the credit assignment problem in the literature): the individual decisions are not properly accounted for, which makes training unstable. This has been an open problem for a long time, but it wasn't looked at much, since generalist "planning" agents weren't a big thing in RL until o1/DeepSeek.
IMO, the most promising approach is something along the lines of MA-RLHF (https://arxiv.org/abs/2410.02743), adapted to the real world: splitting up the reward model to grade individual actions inside the trajectory, reducing the "attention distance" between the reward and each decision.
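The difference between the two credit-assignment schemes can be sketched in a few lines. This is a toy REINFORCE-style surrogate, not MA-RLHF's actual formulation; the log-probs and per-step grades are made-up numbers for illustration:

```python
def reinforce_loss(step_log_probs, advantages):
    # REINFORCE-style surrogate: -sum_t log pi(a_t) * A_t.
    # Each step's gradient is scaled by whatever advantage it receives.
    return -sum(lp * adv for lp, adv in zip(step_log_probs, advantages))

# A hypothetical 3-step ReAct trajectory:
# a good tool call, a bad tool call, a good final answer.
logps = [-0.2, -1.5, -0.3]

# Trivial approach: the single trajectory reward is broadcast to every
# step, so the bad middle action gets reinforced along with the good ones.
trajectory_advs = [1.0] * len(logps)

# Per-action grading (the MA-RLHF-style idea): a reward model scores each
# action, so the signal reaching a decision reflects that decision.
per_action_advs = [1.0, -1.0, 1.0]

print(reinforce_loss(logps, trajectory_advs))   # all steps pushed up equally
print(reinforce_loss(logps, per_action_advs))   # the bad step is pushed down
```

The point isn't the loss values themselves; it's that under the trajectory-level scheme the gradient on the bad middle action has the same sign as on the good ones, which is exactly the weak per-decision signal described above.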
Hello, fifi here. I made this app because I was super tired of my friends not shipping. They were always talking about what they would do and then proceeded to never do it.
Funnily enough, I got half of them motivated to accomplish what they set out to do, but the other half just admitted they're not that driven.
Anyway, have a try and let me know your thoughts :)
I don't know what they've done, but eventually it unsticks if you just keep swiping. Then it gets stuck again, then it scrolls again. It's navigable, given sufficient patience and effort.
Set C1 = A1 + B1, with A1 = 3 and B1 = 4, so C1 = 7.
Now change C1 to 14. Expected (scaling proportionally): A1 = 6, B1 = 8.
What it did: A1 = 7, B1 = 7.
great
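The proportional back-solve expected here is simple to state in code. A minimal sketch, assuming the initial values were A1 = 3 and B1 = 4 (implied by the expected results 6 and 8); `redistribute` is a hypothetical helper, not anything the product exposes:

```python
def redistribute(parts, new_total):
    # Scale each addend by the same factor so they sum to new_total,
    # preserving the original ratio between them.
    old_total = sum(parts)
    return [p * new_total / old_total for p in parts]

# A1 = 3, B1 = 4 gives C1 = 7; setting C1 = 14 should double both addends.
print(redistribute([3, 4], 14))  # [6.0, 8.0]
```

What the app apparently did instead was an even split, `new_total / len(parts)`, which discards the ratio between the addends.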