Congratulations! This looks really great. What've you found to be the best hands / end effectors these days? When do you think we'll have good, reliable 5-finger hands that are ~reasonably priced?
I'm not convinced that 5-finger hands are necessary right now, but there is a really long tail of hand suppliers that we've been exploring to help get the price down.
I think at volume the price for a good set of hands should settle somewhere around $300-500. Most of it comes down to meeting suppliers where they're at and negotiating mutually beneficial deals. It's not magic but it does require having a good understanding of the hardware in order to negotiate well.
Actually yeah, the benefit of our parallel gripper is that we get some proprioceptive feedback, which we can't get from the current 5-finger hand. I'm not sure how important this will be long term; I think vision can eventually mostly compensate if the ML models are good enough.
Off-the-shelf robots -- we've got our models running on a dozen+ different robot types (and have this specific generalization demo working on multiple platforms too).
I'll bite. I _worked_ at Stripe. Stripe has no authority in the decision making here; the issuing bank (i.e. the customer's bank) decides who wins and loses chargebacks. Stripe is a conduit of information, not a party to the decision.
I saw your foundation model is trained on data from several different robots. Is the plan to eventually train a foundation model that can control any robot zero-shot? That is, the effect of actuations on video/sensor input is collected and understood in-context, and actuations are corrected to yield the intended behavior. All in-context. Is this feasible?
More specifically, has your model already exhibited this type of capability, in principle?
Nearly 2 years ago I bet a roboticist $10 that we’d have “sci-fi” robots in 2 years.
Now, we didn’t set good criteria for the bet (it was late at night). However, my personal criteria for “sci-fi” are twofold:
1. Robots that are able to make peanut butter sandwiches without explicit training
2. Robots able to walk on sand (e.g. Tatooine)
Based on your current understanding, who won the bet? Also, what kind of physical benchmarks do you associate with “sci-fi robots”?
Hi! Very cool results. Are you able to share some numbers about the slope of the scaling curve you found, i.e. how performance responds to a growing number of demonstrations?
Academically I'd also be very interested in how much of a data-efficiency improvement you achieved with the pretrained model + task-specific post-training versus from-scratch task-specific training. For example, if post-training requires say 50 additional demos, and training a smaller model from scratch requires say 250 demos (or whatever) to match performance, that would be an interesting quantification of the efficiency benefit of using the big foundation model.
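To make that concrete, here's the trivial arithmetic I have in mind (the 50/250 figures are just the made-up numbers above, not anything you've reported):

    # Purely illustrative: 50 and 250 are hypothetical demo counts, not reported results.
    post_training_demos = 50   # demos needed with the pretrained model + post-training
    from_scratch_demos = 250   # demos needed to match performance training from scratch

    efficiency_multiplier = from_scratch_demos / post_training_demos
    print(f"data-efficiency multiplier: {efficiency_multiplier:.1f}x")  # -> 5.0x

Even a rough multiplier like that, averaged over a few tasks, would be a really interesting data point.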
How does the post-training step work? In the case of t-shirt folding, does a supervisor perform the folding first, many times? Or is the learning interactive, where a supervisor corrects the robot if it does something wrong?
(i'm an investor in the company, and invested over 3 years ago.)
this product has always been about AI—what they launched is almost exactly what they pitched me. their expectation of where the world was going ended up being prescient.
Call them tomorrow and ask them to fire whoever caused the AI to hallucinate on their tech demo and give a wholly incorrect answer to the question of "where is the next eclipse". Get them to fire whoever didn't check that video for accuracy while you're at it.
this is probably the most privacy-forward hardware device on the market—you have to physically be making contact with the device for it to begin listening (at which point an LED is prominently visible) and it will stop listening as soon as you break contact.
Echo devices, for example, were sold as having a "hardware mute switch" from day one. Sure enough, teardown after teardown[0] has confirmed the hardware mute switch actually physically disables the mic (cuts power to the ADC, mic lines, etc).
If this is implemented in software it's no different than a phone and worse than an Echo.
Physically make contact as in, tap it with your hand TNG-style? Or worse, hold contact with your hand? How do you project the laser display and talk without using both hands? Is the hand criss-crossing difficult in this situation?
An LED comes on... Is it bright? Can you see it in a direct line of sight from your eye to your shirt without fussing with it? Is the LED just there for others to know the owner isn't recording the conversation?
Could a firmware update let the camera passively watch? Is the physical-contact requirement for audio capture enforced in hardware or in software?
Honestly, I don't think the concern is this particular product or company, even if they can truly adhere to a privacy-first policy. For me the consideration is a slow erosion of privacy from any company or product. For instance: twenty years ago it wasn't possible to quickly take a discreet, high-quality video with something in your pocket. Smartphones made that possible, then we saw things like Google Glass, and now these accessory pendant devices will make it even easier. To be clear, I'm not against things like the Humane and Rewind pendants; I'm just curious about how they will impact society, especially considering how quickly we're moving without putting much thought into those impacts.
A lot of people are saying that using this device would require speaking all kinds of private things out loud in public, but people would likely alter their behavior and use of the device in public. The nature of the questions they ask would be different; they would self-censor. In private they'd use it differently. People don't watch porn on their phone in the subway (mostly), and they wouldn't state their credit card info out loud using this on a subway either. If you have to say "take a photo" then the people around you know you are taking a photo. If it can record video it should beep occasionally or something. I still don't have a complete idea of how the UI works, though. Can the projector project onto a wall instead of your hand? Can you listen to replies via wireless earbuds? I'd like to see something more in-depth about how to use it and what can be accomplished with it.
- he's not touching it during the phone call
- it's not super clear in the demo when he says "your engagement comes through your voice, touch, gesture, or the laser display"
How do you engage through (a) voice or (b) gesture then?
Even if that's true, if it sees any success it will both normalize that type of device in public and very shortly see AliExpress flooded with a bunch of cheap clones from companies with no such beliefs.
tbf, that precedent went out the window a long time ago, when most people got powerful computers with sophisticated voice-recording capabilities in their pockets and even on their wrists...
That might be nice, but it only covers the information-capture window. What about the bigger problem of all your data being beamed to OAI servers?
Privacy-forward hardware for the sensors, sure.
But I am assuming all input to the model can be used for training, just as is currently standard for any AI assistant?
So my conversations, my calendar, everything will be open to one entity; otherwise it would not be able to condense them into a summary. I don't have privacy anymore if the entity reads everything, even if my friends do not consent to their texts being fed into the AI as input.
This is a privacy issue for me, regardless of whether I allow it to activate the camera or microphone. Nothing stops a hacker with access from querying the entity to spit out all users whose texting history has a depressed sentiment analysis, etc.
I believe they're looking to imply here that they own their own charter, rather than renting someone else's, which is how almost all U.S. fintech companies operate (look in the website footer of, say, Unit and you'll see: "Banking services are provided by Unit's partner banks who are Member FDIC.")
this is an important point. There are other providers who offer programmatic creation of bank accounts, payments, etc. But all existing solutions wrap a bank, which in turn wraps middleware providers and core systems. When you work with Column, you're working with only Column. This has implications for cost (fewer people taking a slice of the pie), performance/usability/experience (modern, tightly integrated systems), and development velocity (fewer players in the game of Telephone). Column collapses the layers of the financial services stack and exposes this functionality via API.
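To put the stack-collapsing point in very rough terms (the names below are generic placeholders, not real vendors and not Column's actual architecture), the difference is just how many parties sit between the developer and the ledger:

    # Hypothetical sketch only: placeholder party names, not real vendors or
    # Column's actual architecture; it just counts intermediaries in each model.
    wrapped_stack = [
        "BaaS API provider",            # the layer the developer integrates with
        "program manager / middleware", # compliance and ledger glue
        "core banking system",          # system of record
        "partner bank",                 # holds the actual charter
    ]
    direct_stack = [
        "chartered bank with its own API",  # charter, ledger, and API in one place
    ]

    print(f"wrapped model: {len(wrapped_stack)} parties taking a slice of the pie")
    print(f"direct model:  {len(direct_stack)} party")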
I’m sure cost is part of the equation, but I’d imagine the control is far more important. Reselling legacy bank services means you’re limited to what they can do, which is usually not much. Most finance technology is heavily limited in what it can do because of its partners, and that’s why such products are usually just nicer interfaces to the same old services. Hence banks like Monzo in the UK building their own infrastructure from the ground up too. The less you’re dependent on legacy technology, the more you can do.
Stripe Reader may be a more accurate link: https://stripe.com/terminal/stripe-reader. This is Stripe announcing its own hardware that developers can use to build their own point-of-sale payments experiences.