Saturday, April 25, 2026

Complementary Intelligence

In the following, I will try to sketch a way of thinking about human intelligence and human nature by emphasizing how they differ from the various methods and systems we call artificial intelligence. This is rooted in my strong belief that talking about a single-dimensional “intelligence” that one can have more or less of, and the obvious extension to asking whether humans or machines are “more intelligent”, is actively harmful for understanding both human and machine intelligence. What matters are the qualitative differences: between humans and machines, but also between different approaches to AI. We can even think of the different approaches in a geometric framework, with the implication that any type of intelligence must have a direction as well as a magnitude. This perspective, which I call Complementary Intelligence, also suggests a positive research program that seeks to find the types of intelligence that we do not currently have but that could be interesting to humans, rather than simply imitating human intelligence.

Ok, this was a lot. Let’s rewind the tape, and start by looking at the history of artificial intelligence and the ways we think about it.


You can understand the human mind by looking at the history of our failures at modeling it. For a very long time, we have tried to make machines in our image. After we invented the digital computer, this development sped up and we called it Artificial Intelligence. During the last 70 years or so, the AI research community invented a number of clever ways to make computers do things that so far only humans could. Usually, we set ourselves some problem – play Chess, prove mathematical theorems, translate from Russian to English, or something like that – that humans were good at. Then, we came up with some way of making a computer perform the task. Success!

But when we look closer, we find that the way the computer does the task is typically quite different from how humans do it. The computer may be much better than humans in some ways, and much worse in other ways, and in general just different. So we conclude that we didn’t really achieve “real” artificial intelligence after all. Maybe we were trying to solve the wrong task? So we find another task to solve, and another way of making the computer solve it, and try again. As a result, the study of artificial intelligence has contributed a wide range of technologies, many of which are crucial to our technological civilization. In our urge to talk about AI as a single thing, it is often underappreciated how many of these technologies there are, and also how different they are from one another. Path finding, object-oriented programming, and optical character recognition are quite different things, but they are all outcomes of AI research. They are also in use in myriad places, and the world would grind to a halt if they disappeared.

Another result of this process is that we know more about what we are not. Every time we realize that the successful solution we have built is fundamentally different from us, we learn something about ourselves. We learn that we are not like that technology, or not only like that technology. However good that machine is at identifying traffic signs or playing Pac-Man, it does so in a fundamentally different way than we do.

In a sense, this is just a continuation of our long history of using the defining technology of whatever age we are in as a metaphor or lens for understanding ourselves. Descartes, living in an age recently transformed by the mechanical measurement of time, thought of animals and humans as being like clockworks. Freud thought of our drives as producing something like pneumatic pressure requiring outlets, much like the steam engines that pulled trains and powered factories. The telephone switchboard was a popular metaphor in early 20th century neuroscience. But of course, if you try to actually build a mind that functions like a clockwork, a steam engine, or a telephone switchboard, you rapidly realize that there’s a lot missing. And from the incompleteness of the metaphor, you conclude that we are much more, and quite different.

Let us therefore try to sketch a history of AI focusing on not only its successes, but its complement: what we have learned about what we are not. This will by necessity be a very potted history.

The earliest successes of what is now known as AI were based on planning. This includes early Chess and Checkers players, and automated theorem provers such as Logic Theorist. The basic idea is to start at some state (such as a board position in Chess, or an axiom from which you want to derive a theorem) and consider the various possible actions available from there (moves in Chess, transformations in theorem proving). As considering all possible consequences of all possible actions recursively becomes computationally intractable for all but trivial problems, much of the art of planning is in the heuristics for which actions to consider at each point. Already in the 1950s we had theorem proving systems that could rediscover some previously discovered theorems much faster than humans, and in the next decades we saw major successes. In 1996 the Robbins conjecture, a long-open problem, was proven by a search-based theorem prover. Similarly, planning approaches led to superhuman play in classic board games such as Checkers and Chess.
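To make the basic idea concrete, here is a minimal sketch of depth-limited game-tree search in Python. The function names (legal_moves, apply_move, evaluate) are placeholders I made up rather than any real engine’s API, and real Chess programs add far more sophisticated pruning and move-ordering heuristics, but the skeleton is the same: recurse over the available actions, and fall back on a heuristic evaluation when the search has to stop.

```python
# Minimal sketch of depth-limited game-tree search (negamax form) with a
# heuristic evaluation at the frontier. The callbacks legal_moves, apply_move
# and evaluate are placeholders, not any real engine's API.

def negamax(state, depth, legal_moves, apply_move, evaluate):
    """Return the best score the player to move can guarantee, up to `depth` plies."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)              # heuristic guess where the search stops
    best = float("-inf")
    for move in moves:                      # consider every available action...
        child = apply_move(state, move)
        # ...and recurse: the opponent's best outcome, negated, is our outcome.
        best = max(best, -negamax(child, depth - 1,
                                  legal_moves, apply_move, evaluate))
    return best
```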

For board games in particular, there has been quite a bit of work comparing humans to planning algorithms. It turns out we are not alike. The algorithms explore many, many more potential moves and board states. Humans tend to explore just a few move sequences, but are much better at evaluating the resulting positions.

In the 1970s and 1980s, expert systems were one of the main foci of AI research. The idea was to encode the knowledge of human experts in a form amenable to logical reasoning, and then let the computer do the reasoning rather than the human expert. Alas, it was not so easy.

Extracting the requisite knowledge from human experts turned out to be an enormous time sink. Humans, experts or not, seemed to have a hard time expressing what they knew. In particular, they found it very hard to express procedural knowledge (how to do things) in a way that could be formalized as rules. The finished systems often turned out to be brittle and inflexible, requiring humans to check their decisions, which obviously severely limits the usefulness of the system. An often-used example is MYCIN, which was developed at Stanford in the early 70s to diagnose bacterial infections of the blood. It took years to encode the 500 or so rules that the system used, and despite good performance in trials, MYCIN was never used in clinical practice.
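To get a feel for what these systems were doing, here is a toy rule engine that chains simple if-then rules forward from a set of facts. The facts and rules below are invented for illustration; MYCIN itself reasoned backward from hypotheses and attached certainty factors to its rules, but the underlying pattern of “if these conditions hold, conclude this” is the same.

```python
# Toy forward-chaining rule engine in the spirit of classic expert systems.
# The facts and rules are invented for illustration only.

rules = [
    ({"gram_negative", "rod_shaped", "anaerobic"}, "organism_may_be_bacteroides"),
    ({"organism_may_be_bacteroides", "infection_site_blood"}, "recommend_anti_anaerobic_therapy"),
]

def forward_chain(facts, rules):
    """Keep firing rules whose conditions are all satisfied until nothing new follows."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)       # the rule fires; add its conclusion
                changed = True
    return facts

print(forward_chain({"gram_negative", "rod_shaped", "anaerobic", "infection_site_blood"}, rules))
```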

The most reasonable explanation for the limited success of expert systems is that humans do not store their knowledge as a set of logical statements and rules. This might seem like a pretty obvious thing. Did anyone ever think that our brains operated this way? Surprisingly, yes. A long history of thought – ever since Aristotle – has postulated logic not just as a normative ideal for how we should think, but as a theory of how we actually think. More concretely, the computer metaphor of the mind that became popular with the rise of cognitive psychology and the advent of cognitive science explicitly compares the functioning of the human mind to a standard computer, von Neumann architecture and all. The best interpretation of the relative failure of classic expert systems is that the computer metaphor cannot literally be true, at least at the level of how we encode knowledge.

Neural networks in one form or another underlie almost all modern AI. That much is generally known. Less often mentioned is that the earliest computer models of neural networks were proposed back in the 1940s, and the backpropagation algorithm that is the direct predecessor of the optimizers used in modern deep learning was invented in the 1970s. While there have been numerous minor and medium-size inventions in neural networks since, the remarkable success of the neural network approach is to a large extent due to us having more data and more compute so we can train larger networks. 
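For concreteness, here is the core loop on the smallest possible example: gradient descent on a single sigmoid neuron, trained with numpy on an invented toy dataset. There is no hidden layer here, so this is not full backpropagation, but backpropagation is the same chain-rule bookkeeping applied through many layers, and modern deep learning is essentially this loop with millions of parameters and automatic differentiation doing the backward pass.

```python
import numpy as np

# A single sigmoid neuron trained by gradient descent on a toy dataset:
# compute predictions, measure the error, push its gradient back to the
# weights, and nudge them a little. Repeat many times.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # toy labels

w, b, lr = np.zeros(2), 0.0, 0.5
for step in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # forward pass
    grad_z = (p - y) / len(y)                  # gradient of the cross-entropy loss
    w -= lr * (X.T @ grad_z)                   # gradient step on the weights
    b -= lr * grad_z.sum()                     # ...and on the bias

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print("training accuracy:", ((p > 0.5) == y).mean())
```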

Much has been said about how neural networks mimic some features of human learning, such as learning hierarchies of representations. Less is said about how profoundly different they are from human brains. To begin with, there is no evidence of anything like backpropagation going on in the brain. This is reflected in how differently neural networks learn. In most settings, a neural network must see many more training examples than a human to learn the same concept. And when a concept is learned, it seems to be brittle. For any given model, it seems to be possible to find “attacks”, where changing a few tiny elements of an image completely throws the neural network, making it classify a panda as a gibbon or an abstract pattern of yellow and black as a school bus. Modern foundation models remain susceptible to jailbreaks and prompt injection attacks. For all their proficiency at recognizing patterns, neural networks clearly do this in a different way than we do.
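Here is a sketch of the sign-gradient trick that underlies many of these attacks, applied to an invented high-dimensional linear classifier rather than a real vision model. The point it illustrates is that a perturbation which is tiny in every single feature can still swing the output completely; attacks on deep networks do the same thing, using backpropagation to obtain the gradient with respect to the input image.

```python
import numpy as np

# Sign-gradient perturbation of a made-up 1000-dimensional linear classifier.
# Each feature is changed by only 0.01 (a tenth of the typical feature size),
# yet the classifier's score swings from clearly positive to clearly negative.

rng = np.random.default_rng(0)
d = 1000
w = rng.normal(size=d)                       # toy classifier weights
x = rng.normal(size=d) * 0.1                 # a toy input
x += (2.0 - w @ x) / (w @ w) * w             # shift it so its score is exactly +2

eps = 0.01                                   # per-feature perturbation budget
x_adv = x - eps * np.sign(w)                 # for a linear model, the input gradient is w

print("clean score:    ", w @ x)             # +2.0: confidently "positive"
print("perturbed score:", w @ x_adv)         # about 2 - eps * sum(|w|), far below zero
```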

Similar things can be said about reinforcement learning. Seemingly miraculously, we can train neural networks to play games or control robots based only on feedback on their behavior. It’s astonishing that this works at all; it’s essentially trial-and-error on a massive scale. But why is such massive scale necessary? DeepMind’s classic experiments on learning to play Atari games with deep reinforcement learning saw each game being played for an equivalent of 38 days of game time. In contrast, a human can usually learn to play such games in less than an hour, sometimes in mere minutes. Of course, humans do this partly based on their familiarity with other games, as well as a lifetime of learning other visuomotor skills, from hopscotch to chopping onions. Artificial reinforcement learning systems are not good at this. Typically, they struggle to generalize beyond the narrow setting they have been trained on. Those networks that spent 38 simulated days to learn a simple Atari game? If you make a tiny change to the game, such as remapping the colors, or changing a few pixels here and there, they become utterly helpless.
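For a sense of what trial and error on this scale means mechanically, here is tabular Q-learning on a made-up ten-cell corridor, about as simple as a reinforcement learning problem gets. Even this toy needs thousands of individual steps of blind exploration before the values settle; deep RL on Atari runs the same kind of update over tens of millions of frames.

```python
import random

# Tabular Q-learning on a ten-cell corridor: reward only at the right end.
# The agent learns purely from trial and error, one step at a time.

N_STATES, ACTIONS = 10, (-1, +1)                     # move left or move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def pick_action(s):
    if random.random() < epsilon:                    # explore occasionally
        return random.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)           # otherwise act greedily,
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])  # ties broken at random

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        a = pick_action(s)
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # the core update: nudge Q toward reward plus discounted future value
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

print([round(max(Q[(s, a)] for a in ACTIONS), 2) for s in range(N_STATES)])
```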

There are other paradigms within AI that contextualize our own intelligence in other ways, such as evolutionary computation. By (often crudely) mimicking Darwinian evolution, we can solve a large variety of problems. Evolution can come up with new designs for antennas, surprising but lucrative trading portfolios, useful software, and many other things. Evolution can also be used for supervised learning and reinforcement learning, often with results more or less as good as those of the more commonly used gradient descent methods. But isn’t this weird? How can we get such good results through a completely different type of algorithm? Clearly, the currently dominant paradigm of AI is not the only way of solving the various problems we use AI to solve.
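As an illustration, here is a bare-bones evolutionary algorithm on the classic OneMax toy problem: evolve a bit string toward all ones using nothing but mutation and selection. Real evolutionary computation, whether it designs antennas or trading rules, uses far richer representations and variation operators, but the loop is the same.

```python
import random

# Bare-bones evolutionary algorithm on OneMax: maximize the number of ones
# in a bit string through mutation and truncation selection.

LENGTH, POP, GENERATIONS, MUT = 50, 20, 200, 0.02
fitness = sum                                        # fitness = number of ones

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for gen in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)       # best genomes first
    parents = population[: POP // 2]                 # keep the better half
    offspring = [[1 - bit if random.random() < MUT else bit for bit in p]
                 for p in parents]                   # mutated copies of the parents
    population = parents + offspring                 # next generation

print("best fitness:", fitness(max(population, key=fitness)), "out of", LENGTH)
```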

The viability of evolutionary computation also reminds us about the perhaps greatest product of natural evolution: us. We are evolved beings, with an evolved culture. The main reason that we perceive ourselves as being generally intelligent is that we have built a world tailored to our shared cognitive capabilities. These capabilities have evolved over hundreds of millions of years. When we come to the world and start thinking, we are not blank slates; we build on an intricate neurophysiology and a vast repertoire of skills, instincts, and perspectives, some of which might at some point have helped our ancestors pick non-poisonous fruit, outwit crocodiles, or predict when the rain would come. In contrast, a machine learning model is quite literally a blank slate before training starts. Or rather, a blank matrix. Unlike all AI in existence, we are “trained” in a multi-timescale distributed process, encompassing our whole phylogenetic lineage as well as our whole culture.

Which brings us to the present day. We now have large language models, and they are like the mind of god. At least according to breathless hypesters and accelerationists. More sober commentators still recognize that they are some of the most impressive technology we have ever seen, and that they may well turn out to be almost uniquely consequential. We have all been humbled by LLMs doing something we didn’t think they could. Some of us multiple times. What are their shortcomings?

To begin with, they are good at tasks largely in proportion to how easily these tasks can be represented as strings. If the input is text and the output is text, chances are the LLM can solve the task very well. Modern multimodal models are also now very good at generating and classifying images, which are internally represented as strings of tokens. But spatial reasoning and interaction is another matter. Currently, huge resources are spent on trying to make these models interact reliably with graphical user interfaces. Granted, they are getting better at it. But they are still atrociously bad at, for example, playing video games. (Unless the game is very well known and you build an elaborate harness for it.)

It is likely that multimodal models will soon get much better at spatial interaction, at least for tasks that are economically relevant. The bigger issue is the lack of memory and continual learning. The current state of LLM memory is like that of the protagonist of the movie Memento, an amnesiac man who can’t form new memories, and therefore has to write little notes to himself (or tattoo notes on his body) to remind himself who he is and what he is doing. This is because an LLM does not modify its parameters as you interact with it. All the little numbers that define it remain frozen in time. Instead, it keeps a short-term memory of its interaction in its context, but the length of this context is necessarily limited. To achieve something akin to long-term memory, the harness around the LLM will at intervals summarize its context as a text file and store it away in a kind of database, which it can then access in the future. Rather like writing little notes to itself. This is likely to be a fundamental limitation of LLMs, not in the sense that it cannot be overcome, but in the sense that the solution will look quite different from LLMs as we know them today.
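A sketch of this notes-to-self pattern might look something like the following. The llm() function is just a stand-in for a call to whatever model one uses (it is not any vendor’s actual API), and real agent harnesses are far more elaborate, but the structure is the one described above: when the context fills up, compress it into a note, store the note, and feed recent notes back in.

```python
# Sketch of the "notes to self" memory pattern. The llm() function is a
# placeholder for a call to some language model, not a real API; everything
# here is deliberately simplified for illustration.

def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a call to some language model")

MAX_TURNS_IN_CONTEXT = 20
notes = []          # long-term "memory": summaries written to durable storage
context = []        # short-term memory: the recent conversation itself

def chat(user_message: str) -> str:
    global context
    if len(context) >= MAX_TURNS_IN_CONTEXT:
        # Context is full: compress it into a note, store the note,
        # and restart the context from that note.
        summary = llm("Summarize what matters in this conversation:\n"
                      + "\n".join(context))
        notes.append(summary)
        context = [f"(earlier notes) {summary}"]
    context.append(f"user: {user_message}")
    reply = llm("\n".join(notes[-3:] + context))     # recent notes plus live context
    context.append(f"assistant: {reply}")
    return reply
```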

There are other ways in which LLMs differ from us that we do not yet fully understand, because the technology is so new. For example, just like other forms of machine learning, LLMs appear to have a strong bias towards problems that are in their training set. One way this manifests is a curious lack of novel insights stemming from LLMs recombining existing knowledge. Much, if not most, of human creativity comes from recombining existing knowledge. Now, LLMs have a broader range of “expertise” than any human ever. Which actual human would simultaneously have detailed knowledge of peat bogs, pupillometry, polymerization, Pasadena, and Paul Krugman? In fact, frontier LLMs have had this staggering breadth of knowledge for at least 3 years, since GPT-4. A human with such range would surely make a stream of unexpected connections. Yet, few if any truly novel insights are directly attributable to LLMs. Why? We don’t know. We also don’t know whether this is a fundamental limitation of this approach to AI.

So far, we have only talked about intelligence in a relatively abstract information-processing sense. But we are not just brains; we are whole bodies. As you may have noticed, the way you think is strongly affected by whether you are hungry, horny, angry, or something else. And much of your thinking involves your body in some way, whether it is walking, tying your shoelaces, or typing on a keyboard. Some argue that all of your thinking is rooted in your body. Opinions diverge within cognitive science as to how important the body is to thinking. But what is plain to see is that physical robots are far, far behind non-embodied AI. Robots struggle to do things that are trivial for us, such as turning door handles. This is not a new issue: it has been the case for the whole history of AI, and has been much commented on.

Replaying the history of AI this way, we can sketch a different understanding of human intelligence and human nature than what we would get from using the AI we built as a metaphor for ourselves. More precisely, we can paint a picture of intelligence that emphasizes the parts which our AI systems are not good at, or which they do in a very different way to us. We can emphasize the complementary part.

Let us consider the difference. If we did use AI as a metaphor for our own intelligence, or as a lens for understanding it, similar to how previous generations used clockworks or steam power, we would arrive at what could be described as a rather classicist picture. Human intelligence, in this picture, considers a large range of alternatives, tries to solve specific tasks that have well-defined rewards and can be clearly separated from other tasks, learns each skill on its own, starts from a blank slate when learning, and sees task descriptions and world descriptions largely as text. This picture has echoes not only of philosophies of past centuries, but also of modern management thinking and the kind of postmodern thinking which sees everything as a “text”.

The complementary intelligence view is instead that we are creatures that are deeply rooted in our history: our evolutionary history as a species, our cultural history, and our personal history. Context is what we excel at. Most of what we do cannot easily be stated as separate tasks with well-defined rewards. We learn and reason slowly in terms of clock time, but effectively in terms of the number of examples we need to see. We almost never operate according to logical rules, though we may tack them on as justification for what we did. Text is just one of our modalities, and somewhat “tacked on” compared to, for example, sight, smell, or proprioception. The body plays an important role in our thinking, and fine manipulation is another thing we excel at.

Neither human nor artificial intelligence is “general” in anything but a trivial sense, and could never be. The reason we believe we have general intelligence is that we live in a world we have constructed over the course of our civilization to fit our capabilities perfectly. Our societies, technologies, and built environments are scaffolding and support systems for our very particular type of intelligence. This makes us feel very smart and powerful. But thinking that the particular capabilities which this constructed world tests and amplifies are all there is to intelligence is a very parochial view.

Looking at human intelligence this way gives us perspective on the rapidly advancing capabilities of AI. It is often asked when AI will overtake human intelligence. But this assumes that intelligence is a single-dimensional quantity. The various types of machine intelligence we have created can instead be seen as vectors pointing in different directions. Classic symbolic planning is one vector, LLMs are another, and fuzzy logic yet another. Human intelligence points in a different direction still. Moving further along in one direction (increasing the magnitude of the vector) may have limited bearing when projected on other intelligence vectors.
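To make the geometric picture slightly more concrete (purely as an illustration, not a measured model): if u is some machine-intelligence vector and h the human one, the part of u that registers along our axis is its projection onto h,

$$\mathrm{proj}_{h}(u) \;=\; \frac{u \cdot h}{\lVert h \rVert^{2}}\, h, \qquad \lVert \mathrm{proj}_{h}(u) \rVert \;=\; \lVert u \rVert \cos\theta,$$

where θ is the angle between the two directions. Increasing the magnitude of u pays off only in proportion to cos θ; a capability that points nearly orthogonally to ours barely registers on our axis, however large it grows.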

So, to directly answer the question about when artificial intelligence will surpass human intelligence: it did so a long time ago, many times, and it never will. Various technologies that we refer to as artificial intelligence have surpassed humans at calculating, planning, solving logical puzzles, factual recall, and many other things. Yet, it is extremely unlikely that any technology would have exactly the same intelligence vector as human intelligence. Because these machines are not humans, their intelligence will always point in different directions.

Interestingly, the history of AI can be seen as a sequence of attempts at approximating the human intelligence vector. The moving goalpost phenomenon then becomes a game of finding a particular point on this vector, describing it in only one or a few dimensions, and trying to invent something that reaches this point. We then reach that point, only to discover that we did so by following a completely different vector than the one we tried to imitate. So we find a new point, and repeat the procedure.

We can use complementary intelligence as a term to describe this view, but we can also see it as a positive research program. Complementary intelligence as a direction is, basically, to lean into the difference. We should not try to eliminate it; instead, we should support it. Recognize the strengths of the human intelligence vector and build AI systems that amplify it.

At the same time, we should try to move away from trying to approximate the human intelligence vector. For example, plenty of AI researchers around the world are currently working on how to augment LLMs with continual learning, because they realize that’s not something that will be found along the intelligence vector of current LLMs. I think we should take our efforts elsewhere. Simply imitating human capabilities is perhaps the least interesting way of building artificial intelligence. It’s so unimaginative. And it leaves so many potential capabilities on the table. You could even argue that fully imitating human intelligence is immoral. We don’t actually want artificial intelligence that has all the capabilities we have, and is better at all of them. Because we want to matter. And we don’t want to be replaced.

Instead we should seek to amplify machines’ capabilities at tasks that we humans are not particularly good at, or do not want to do. In the best case, tasks that are not done at all, because we can’t or won’t do them, even though we might want to. We want AI that lets us focus on the things that we want to be good at, and give us abilities we didn’t have before.

To take some very quotidian examples: high-frequency trading is an example of complementary intelligence, because we can’t trade that fast. It is literally physically impossible for humans. AI deployed inside a video game, to generate levels, control non-player characters, or something like that, is another example of complementary intelligence, because you could not have humans implementing every NPC, and they would probably be very bored if they were asked to follow the rules that NPCs follow. AI methods for helping us make sense of modalities we are not attuned to, from WiFi reflections to gravitational fields, are also good examples, as they expand our perceptual space. Then there are of course the boring but immensely impactful technologies that make the modern world possible and do not replace any cognitive work people actually want to do, such as databases and web search.

But beyond this, there is a virtual infinity of new types of intelligence we could develop, and new tasks we could discover, and new solutions we could invent. Taking the metaphor of intelligence vectors seriously, we could envision a hypersphere of capabilities on which every possible intelligence vector, at its maximum magnitude, could only reach a particular point. By definition, we have only explored an infinitesimal part of the interior volume of this hypersphere. There is so much more to do.

Most of these types of intelligence would likely be uninteresting and indeed incomprehensible to us humans and our society. But there is likely to be a practically infinite number of directions we could appreciate and build exciting new capabilities around, if we first invented them. I don’t know what these capabilities would be, because they have not been invented yet. But I consider the semi-automated, open-ended search for new types of intelligence and associated tasks that would likely be of interest to humans to be the most exciting direction for AI I can imagine. This will definitely require new thinking about open-ended search, the discovery of new ways of measuring the search space, and clever measures of what humans find interesting.

As you can tell, there’s a lot to be worked out here. I’m thinking I should write my next book about Complementary Intelligence, so I get a chance to work some of it out. What do you think, should I?