- We are already at Level 1+ with GPT-4, but these systems are basically assistants and not truly AGI.
- Level 2 "Competent level" is basically AGI (capable of actually replacing many humans in real world tasks). These systems are more generalized, capable of understanding and solving problems in various domains, similar to an average human's ability.
The jump from Level 1 to Level 2 is significant as it involves a transition from basic and limited capabilities to a more comprehensive and human-like proficiency.
However, the exact definition is tautological - capabilities better than 50% of skilled adults.
So IMO the paper basically states, in a lot of words, that we are not at AGI, and restates the common understanding of AGI as "Level 2 Competent", but doesn't otherwise really add to our understanding of AGI.
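For concreteness, here's a minimal sketch of that percentile-based level scheme as I read the paper's table; the thresholds (50th/90th/99th/100th percentile of skilled adults) are my paraphrase, so check them against the paper before relying on them:

```python
def performance_level(percentile_of_skilled_adults: float) -> str:
    """Map 'outperforms this percentile of skilled adults' to a performance level.

    Thresholds are my reading of the paper's table, not canonical; Level 1
    ("Emerging") is really defined against unskilled humans, so the >0 check
    below is only a crude stand-in.
    """
    if percentile_of_skilled_adults >= 100:
        return "Level 5: Superhuman"
    if percentile_of_skilled_adults >= 99:
        return "Level 4: Virtuoso"
    if percentile_of_skilled_adults >= 90:
        return "Level 3: Expert"
    if percentile_of_skilled_adults >= 50:
        return "Level 2: Competent"
    if percentile_of_skilled_adults > 0:
        return "Level 1: Emerging"
    return "Level 0: No AI"
```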
AGI now means many different things to many different people. I don't think there's really any "common definition" anymore.
For some, it's simply Artificial and Generally Intelligent (performs many tasks, adapts). For some, it might mean that it needs to do everything a normal human can. For some, non-biological life axiomatically cannot become AGI. For some, it must be "conscious" and "sentient".
For some, it might require literal omniscience and omnipotence, and accepting anything as AGI would mean, to them, that they are being told to worship it as a God. For some, it might mean something more like an AI that is more competent than the most competent human at literally every task.
For some, acknowledging it means that we must acknowledge it has person-like rights. For some, it cannot be AGI if it lies. For some, it cannot be AGI if it makes any mistake. For some, it cannot be AGI until it has more power than humans. These definitions and implications partially or wholly conflict with one another, yet I have seen different people claim each one of them as the definition of AGI.
I've got a much simpler definition: an AGI should be able to produce a better version of itself.
I'm not saying this would necessarily lead to the technological singularity: maybe it's somehow a dead end. Maybe the "better version of itself, which shall itself build a better version of itself" will get stuck at some point, hitting some limit that would still leave it less intelligent than the most intelligent humans. That I don't know.
But what I know is that an AI that is incapable of producing a better version of itself is less intelligent than the humans who created it in the first place.
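If you wanted to turn that into a test, a minimal sketch might look like the following; `build_successor` and `evaluate` are hypothetical placeholders, and defining them rigorously is of course the entire difficulty:

```python
def passes_self_improvement_test(ai, build_successor, evaluate) -> bool:
    """Proposed criterion: the system can produce a strictly better version
    of itself, where 'better' is whatever benchmark `evaluate` encodes."""
    successor = build_successor(ai)            # the AI designs/trains its own successor
    return evaluate(successor) > evaluate(ai)  # successor must outperform its parent
```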
I actually really like this definition and will be giving it some thought. But right off the bat, that's not how most people will see it - so while this definition is certainly thought-provoking and useful, it doesn't specify much that's relatable to other tasks, and therefore I think it will always be a niche definition.
An AI that can make a better version of itself may not be able to communicate in any human language, for example; and that is now a de facto requirement for most people to see something as AI, I think.
> I'm not saying this would necessarily lead to the technological singularity:
You kinda are, though: if it hits a limit and can no longer make a better version of itself, then your definition means the final one in the sequence isn't an AGI even though its worse parent is.
> But what I know is that an AI that is incapable of producing a better version of itself is less intelligent than the humans who created it in the first place.
Neither necessary nor sufficient:
(1) they are made by teams of expert humans, so an AGI could be smarter than any one of them and still not as capable as the group (kinda like how humans are smarter than evolution, but not smart enough to make a superhuman intelligence even though evolution made us by having lots of entities and time)
(2) one that can do this can still merely be a special-purpose AI that's no good at anything else (like an optimising compiler told to compile its own source code)
(3) what if it can only make its own equal, being already at some upper limit?
They do. 99% of what you think of as human intelligence is social, and has been obtained by previous generations and passed down to the person. In a sense, we are hugely overfitted on distilled knowledge; our actual biological capabilities are much less impressive.
Okay but in reality an AGI is just an agent that can learn new things and reapply existing knowledge to new problems. Generally intelligent, it doesn’t have to mean anything more nor does it imply godlike intelligence.
Everything you just mentioned seems to be some philosophy of sentience or something. A few years ago, when ANNs became popular for everything, general intelligence just meant "can do things it wasn't explicitly trained on".
This definition is itself tautological and also quite flawed. For example, at what point in this machine's development has it attained AGI? What if it learns to, or is taught to, stop learning? What if the machine is not capable of, e.g., math? What kind of knowledge is legitimate vs. illegitimate? In many ways the concept of AGI masks a fundamental social context in which the machine is expected to obey standards and only adopt the "correct" knowledge. This is why, e.g., instruction tuning or RLHF was such a leap for the perception of intelligence: the machines obeyed a social contract with users that was designed into them.
This is one of the problems we've had with intelligence for a very long time: we've not been able to break it down well into distinct pieces for classification. You either have all the pieces of human intelligence, or you're not intelligent at all.
That sets the bar unreasonably high, in my opinion. Almost all of humanity does not have skills 'better than 50% of skilled adults' by definition, and those people definitely qualify as generally intelligent.
It's also rather vague, and at least on my first-pass skim I'm not seeing them define what it means to be skilled or unskilled. So I'm not sure the metric is even meaningful without this, because it's not like you're unskilled at driving a car one day and "skilled" the next. Does that mean anyone with a driver's license? A professional driver? Are we talking taxi driver, NASCAR, rally racing, F1? What? Skills are on a continuous distribution, and our definitions of skilled vs. unskilled are rather poorly defined and typically revolve around employment rather than capabilities.
I hope I just missed it, because the only clarification I saw was this example:
> e.g., “Competent” or higher performance on a task such as English writing ability would only be measured against the set of adults who are literate and fluent in English
That's the whole problem with all these definitions: they are rooted in very imprecise terms whose meaning seems to depend on the beholder of the prose.
Wow, what a massive jump between Level 0 and Level 2. They state that their goal is to help facilitate conversation within the community, but I feel like such a gap very obviously does not help. The arguments are specifically within these regions, and we're less concerned about arguing whether a 50th-percentile general AI is AGI vs a 99th-percentile one. It's only the hype people (e.g. the X-risk crowd and Elon, who has no realistic qualifications here) who are discussing those levels.
I know that however you define things, people will be upset and argue, but the point of a work like this is just to make something we can all point to and refine from, even if it's messy. With this large a gap, though, such refinement becomes difficult, and I do not suspect we'll be using these terms as we move forward. After all, with most technology the approach is typically slow before accelerating (since there's a momentum factor). A lot of the disagreement in the community is over whether we're in the bottom part of the S-curve or at the beginning of the exponential part. I have opinions, but no one really knows, so that's where we need flags placed to reduce stupid fights (fights that can be significantly reduced by recognizing when we're not using the same definitions but assuming the other person is).
> Level 2 "Competent level" is basically AGI (capable of actually replacing many humans in real world tasks).
I still find that a weird definition for two reasons:
1. All the narrow systems have already replaced humans in many real-world tasks.
2. I'm dubious there is much difference between a Level 2 and a Level 4 general AI. And outperforming a percentage of humans seems like an odd metric, given that humans have diverse skill sets. A saner metric would be Nth percentile on M% of the tasks human workers perform in 2023 (rough sketch below).
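A rough sketch of what the metric in point 2 could look like; `percentile_by_task` is a hypothetical mapping from each 2023 human-worker task to the percentile the AI reaches among human workers on that task:

```python
def reaches_level(percentile_by_task: dict[str, float],
                  n_percentile: float, m_fraction: float) -> bool:
    """True if the system is at or above the Nth percentile of human workers
    on at least a fraction m_fraction of the tasks considered."""
    scores = list(percentile_by_task.values())
    qualifying = sum(1 for p in scores if p >= n_percentile)
    return qualifying / len(scores) >= m_fraction

# e.g. "50th percentile on 80% of the tasks human workers perform in 2023":
# reaches_level(percentile_by_task, n_percentile=50, m_fraction=0.8)
```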
That's a speculative claim because we don't really know what's involved. We could be one simple generalization trick away, which wouldn't be very significant in terms of effort, just effect.
A mediocre human armed with a modern smartphone with connectivity, a hundred popular apps, YouTube's wealth of knowledge, and a search engine is clearly in the 99th percentile in most areas of life compared, for example, with people armed with tech from the 1940s. A search engine itself is AI in this sense.
When an algorithm becomes an everyday occurrence, it stops being AI and becomes an appliance.
In chess, it only took a few years to go from the human being optional to a human in the loop being non-optimal, to the point that it would cause them to lose most of the time.
I expect that this will become the case in a rapidly increasing number of situations, including many workplaces and even in the military.
A decent workmanlike overview. If I had to criticize:
Reducing things to quasi-orthogonal categories (here quality and scope) is a human habit that's good for learning and characterizing, but perhaps not for accuracy. I would instead have hoped for some algorithmic categories (math, semantics, reasoning (deductive, inductive, probabilistic/weighted...), evaluation (summarization, synthesis), interactivity, etc.). That seems closer to the creature's natural joints (to use Plato's phrase).
"Emergence" is mentioned as prior art, but is otherwise largely ignored. To me, that's a feature that distinguishes intelligence from calculation/algorithm. (That it's hard to operationalize makes it no less key.)
Usability is only mentioned as part of UI for autonomy for hybrid human-AI systems. But the overall question is about measuring AGI for purposes of assessing utility and deployment risk. To me, usability is the metric that determines how valuable AGI would be, in broadly-available applications. Its relative unimportance suggests that people are mainly concerned about AI's use in highly-targeted (if also not highly-leveraged) applications, like market or election influence.
Finally, the notion of "progress" itself is a somewhat charged but relatively unnecessary concept. I believe the goal is to adopt the lens that will highlight effectiveness in the high-value and high-risk applications (something like heightened scrutiny in legal cases). In that case, we'd enumerate the value and fault models before deciding on principles and constraining the ontology.
There are lots of problems for which algorithms (with no data/learning required) are trivially superhuman, in the sense that they perform tasks no human could hope to do. AlphaFold is not 'only' an algorithm, as it also needs data to perform well. But this doesn't seem like a satisfactory reason to call AlphaFold "ASI" and not apply this term to your run-of-the-mill mixed-integer-program-solver for some difficult scheduling problem.
These are cases of AI being superintelligent in narrow fields. For now, they're not AGI, but I do expect these abilities can be integrated into generic, multimodal AGIs within our lifetime, and maybe this decade.
Instead of one acronym that now means a million different things to a million different people, there are "levels" that correspond to what percentage of skilled human workers it can replace.
I need to read the full paper now, but going from just the table you posted, I see a problem with their "Narrow" classification, in that it doesn't correct for task characteristics and obscurity - which means the "Narrow" rating isn't useful without also giving the specific task to which it applies, and overall it feels off.
I mean, they put GPT-4 ("SOTA LLMs") for a subset of tasks at Level 2N, a spell checker at Level 3N, and chess and Go solvers at Levels 4N and 5N. It feels to me that any useful grading scheme would give the opposite ordering. Sure, the current classification meets their definitions - "outperforms X% of humans" - but at least with chess and Go, the rating is dominated entirely by the fact that very few people play these games at a non-joke level, and the activity itself is super specific and super narrowly scoped, mathematically speaking. It feels almost like calling CPUs Level 5N at "adding numbers together", since they clearly outperform 100% of humans at that. Or, much less jokingly, one could convincingly argue a PID controller is Level 5 Narrow, because not only is it going to outperform 100% of humans at its task, it's also learning and adapting (within its operational envelope).
Computer Chess and Go is so far ahead of human ability now that it doesn't matter how many humans decide to take up the game seriously. I see your broader point but it doesn't really matter here.
Yes, what is designated a task is a fair bit arbitrary, but game-playing AIs have been popularly designated narrow (well, basically anything besides LLMs is narrow) for a long time now.
Sure. My point is that this classification is useful only within its category. Say I have a specific problem, like "play chess" or "play StarCraft" or "do maths" or "spellcheck essays". It would be great to have a task-specific list of algorithms ranked by their "Narrow" levels. It would help with use cases like: "oh, model X here spellchecks at level 5N, but is super complex, meanwhile algorithm Y is only Level 3N, but it fits on a napkin; Y is more than sufficient for my MVP". But this rating doesn't let us compare between tasks. That's fine by itself, but then it doesn't make sense to use the same rating for both specialized and general algorithms.
I think "general" needs to be more than a boolean though: The standard "wide range of non-physical tasks, including metacognitive abilities like learning new skills" is met by ChatGPT (as in the chart), which is much better in some languages than others, and at some tasks than others.
But how wide does it have to be to count as "wide"? A brain upload of any human ought to count, but most of us have much less breadth than any LLM.
I think they make a category error by putting ChatGPT etc. in the General column. As far as I can tell, we only have narrow definitions of 'intelligence', and ChatGPT falls into one of those. I don't know of any general agreement on what 'general intelligence' is in people, so how can we categorise anything as AGI? Knowing a bit about how ChatGPT works, I feel it is a lot more like a chess program than a human.
ChatGPT is the most general system ever created and packaged. You can throw arbitrary problems at it and get half-decent solutions for most of them. It can summarize and expand text, translate both explicitly and internally[0], play games, plan, code, transcode, draw, cook, rhyme, solve riddles and challenges, do basic reasoning, and many, many other things. Whether one is leaning more towards "stochastic parrot" or more towards "sparks of AGI" - it's undeniable that it's a general system.
--
[0] - The whole "fine-tune an LLM on a super-specific task but only in language X (which is not English), and its performance on that task improves in languages other than X" phenomenon, indicating it's not just learning tokens, but the meanings behind them too.
It can't cook; it can talk about cooking. It wouldn't be able to get a pan out of a drawer. I know all we do these days is produce text tokens on the internet, but that is in fact in itself a domain-specific task. If you can talk about opening a can of beans, you're an LLM. If you can do that and actually open the physical can, we may be a little bit further towards general intelligence.
We don't even have a full self-driving system, the limited systems we have are not LLMs, and there isn't even a system on the horizon that can drive, talk to you about the news, and cook you dinner.
If that were a valid criticism of its intelligence, Stephen Hawking would have spent most of his life categorised as a vegetable.
Also:
> We don't even have a full self driving system,
debatable given the accident rate of the systems we do have
> the limited systems we have are not LLMs,
they tautologically are LLMs
> and there isn't even a system on the Horizon that can drive and talk to you about the news and cook you a dinner
There are at least four cooking robots in use, and that's just narrow AI, used to show off. Here's one from 14 years back: https://youtu.be/nv7VUqPE8AE
Stephen Hawking lost the capacity to move because his ALS paralyzed him, not because his brain lacked the capacity to do so, come on this has to be the worst analogy of the year. Also no, driving systems are not LLMs. LLMs are large language models, no existing self driving system runs on a language model. And also, that's not what the word "tautology" means. "All bachelors are unmarried" is a tautology.
Ah, you wrote unclearly, it sounded like you were asserting that no system was an LLM rather than no driving system.
So while your claim is still false, I will accept that it isn't tautologically so.
Likewise, I am demonstrating that the actual definition you're using here is poor, because it rules out Stephen Hawking; for that purpose, the reason why he couldn't do things is unimportant: you still ruled him out with your standard.
Transformer models are surprisingly capable in multiple domains, so although ChatGPT hasn't got relatively many examples of labeled motor control input/output sequences and corresponding feedback values, this was my first search result for "llm robot control": https://github.com/GT-RIPL/Awesome-LLM-Robotics (note several are mentioning specifically ChatGPT).
That's not a category error. GPT is general. It is able to perform many tasks: creative writing, playing chess, poker, and other games, language translation, coding, robot piloting, etc.
Arguments like yours are why I regard "generality" (certainly in the context of AGI) as a continuum rather than a boolean. AlphaZero is more general than AlphaGo Zero, as the former can do three games and the latter only one. All LLMs are much more general than those game playing models, even if they aren't so wildly superhuman on any specific skill, and `gpt-4-vision-preview` is more general than any 3.5 model as 4 can take image inputs while the 3.5's can't.
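To make the continuum concrete, here's a toy illustration under my own assumptions (the task lists are illustrative, not measured): generality as the count of task families a system can engage with at all, independent of how superhuman it is on any single one.

```python
# Illustrative only: breadth of task coverage as a crude generality measure.
TASK_COVERAGE = {
    "AlphaGo Zero": {"go"},
    "AlphaZero": {"go", "chess", "shogi"},
    "gpt-3.5-turbo": {"translation", "coding", "summarization", "planning",
                      "riddles", "weak chess"},
    "gpt-4-vision-preview": {"translation", "coding", "summarization", "planning",
                             "riddles", "weak chess", "image understanding"},
}

# Sort from least to most general by this crude count.
for system, tasks in sorted(TASK_COVERAGE.items(), key=lambda kv: len(kv[1])):
    print(f"{system}: engages with {len(tasks)} task families")
```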
Yes. If you read "Computing Machinery and Intelligence" this idea of generality being a continuum is a point that Turing makes actually (albeit in different words). What constitutes generality of an AI is really going to be very sensitive to your metric and the assessment is going to vary a lot from observer to observer.
Paid marketing content isn't really relevant. Jobs are being lost left and right due to the overall economy. The only valid takeaway is that BCG consultants are of such low quality that a chatbot can improve their productivity. I'd avoid such consultants, but that's already known.
Additionally, ChatGPT is a running joke - full of mistakes and misinformation. It probably appeals to the market segment that's easily gullible and falls for whatever conspiracy theory is popular at the time.
>While strong AI might be one path to achieving AGI, there is no scientific consensus on methods for determining whether machines possess strong AI attributes such as consciousness
Maybe we should employ the methods we use to ascertain that fellow human beings are conscious entities with subjective experience?
Also, consciousness is probably optional for intelligence.
> Maybe we should employ the methods we use to ascertain that fellow human beings are conscious entities with subjective experience.
Historically, this has frequently included refusing to accept $outgroup are real people with any subjective experience (or at least, any that matters).
I'd like us to do better — not that I can actually suggest any test that would do this objectively, but I would like us to do better.
> Maybe we should employ the methods we use to ascertain that fellow human beings are conscious entities with subjective experience?
What methods? Are there any? Is there suddenly some consensus among philosophers about the p-zombie thought experiment?
AFAIK effectively the best we can do is (a) ascertain that you yourself are a conscious entity with subjective experience and then (b) assume - with no way to ascertain that - that the humans around you are like you.
Such a fantastic paper overall; it was a pleasure to read, very accessible, and greatly informative. If anyone is new to the idea and seeking a definition of AGI, reading this paper is easy and immeasurably superior to merely googling or reading the Wikipedia article.
My only criticism of the article, within the particular set of goals outlined above, is this:
The paper seems to be under-exploring two aspects that appear to be worth exploring explicitly and in detail:
1. The ability to rapidly learn from a very limited amount of instructional data post-deployment and substantially advance its abilities in the domain of that learning, as opposed to possessing a certain level of professional skill immediately on deployment.
2. The ability to invent entirely new ideas, for instance an entirely new number system or some other symbolic system, on its own, in order to advance its current goals.
Both are needed in part to distinguish an AGI from a large collection of glued-together Narrow AIs, each purpose-built for a specific but entire domain of fairly loosely related tasks, and in part to ensure that a high-level AGI system always appears at least as intelligent as an average human teenager across the full spectrum of possible cognitive and metacognitive interactions with said teenager (be those interactions initiated by another human or by the cognitive projection of the environment).
Without these abilities, there could be a system - it could be argued, I believe - that technically (or at least arguably) satisfies the paper's definition of an ASI-level AGI, yet to which an average human child or teenager may appear more intelligent in comparison, exceeding the said system in plasticity and real-time, limited-input adaptability of intellect rather than in off-the-shelf proficiency at trained adult human tasks. A high-level AGI system might be initially trained on trillions of tokens of input data, but once deployed, it needs to be able to acquire new skills and proficiencies from mere tens to thousands of input examples, as humans do.
Perhaps the framework presented by the paper was intended to silently encompass these abilities and the remarks above, but surely they deserved a separate discussion, just as other aspects of the definitions and the framework proposed by the paper are indeed explicitly discussed.
Similarly, not including "autonomy" in the "six principles" (making them seven) for composing a definition of an AGI, and only discussing it briefly and as an aside, also appears to be a questionable choice, for the same reasons.
Frankly, I think the pathwai.org taxonomy is far more useful in the long term, even if many of the individual attributes are speculative. I'm a little surprised that DeepMind neglected to cite pathwai.org's research. I'd highly recommend checking it out (though, full disclosure, as the author I have my biases): http://www.pathwai.org/index.html (Desktop only, best at high resolutions)
Our ethics largely revolve around harm and permanence.
It's rarely unethical to do something that is easily reversed or that causes no harm. That's pretty likely to be the case with any AI...we can checkpoint their memory, or whatever.
So, right or wrong, I think our current ethical standards will lead us to believe that we can do anything we like to an AI that is operating in a temporary-memory mode, and any harm caused could be easily reset like it never happened.
Also, I think "torture" is a real stretch. Torture is a state that can be triggered in us by various methods; we can just remove that state as a possibility for digital intelligence (probably?).
On a high level, assuming the concept of an AGI is even possible, I think AGI would be a system that creates ideas and makes decisions on its own without being "instructed" what to do, unlike the LLMs we are seeing today.
This is what happens when you co-opt vernacular language to draw in research money.
Had researchers stuck with a new technical term for their innovations, they wouldn’t have to convince people that their metaphor is not a metaphor.
They could just pursue some "hypercapable general computing system" that does all the same things as "Level 5 AGI" and barely stir a whiff of controversy or debate. Regulation would be a boring technical matter rather than an opportunity for political grandstanding, and HN would be cluttered by 44.34% fewer endlessly repetitive comment threads.