Sunday, May 11, 2025

On the death of the lecture

I would like to say that predictions about the death of the lecture as a mode of knowledge transmission are as old as the lecture, but I don't think that's entirely accurate. As far as I can tell, people only started predicting the death of the lecture with the proliferation of book printing and (upper class) literacy. For example, here is a prediction from the late 18th century:

"People have nowadays…got a strange opinion that everything should be taught by lectures. Now, I cannot see that lectures can do as much good as reading the books from which the lectures are taken. Lectures were once useful, but now, when all can read, and books are so numerous, lectures are unnecessary."

The luminary behind these words is none other than Samuel Johnson, a man of letters if there ever was one. (Cited here.) And, you know, I kind of agree. I typically prefer reading a book to listening to a lecture. I don't have the attention span necessary for following a lecture, and my thoughts will start wandering off as I start doodling, scrolling, or playing a game on my phone.

I have learned, however, that I am in the minority. I don't listen to podcasts either, can't stand talk radio, and despise audiobooks. I much prefer the interactive nature of the printed page, where you can read at your own pace, flip forwards and backwards, and stop to think. You are also not distracted by the author's voice. I mean the author’s actual, physical voice, from their vocal cords. You may very well be distracted by the author’s imagined voice produced by their imaginary vocal cords operating inside your own head as you read their writing. Yes, that’s quite the image. You’re welcome. Anyway, where were we, something about distractions?

Why do people even go to lectures? I guess it varies, but much of it is really about being there. Next week, I plan to attend a lecture here at NYU, largely to be seen by my colleagues as being there, but also to force myself to listen to what is said, see how people react to it, and hear which questions are asked. I also look forward to chatting with my colleagues before and afterwards; the actual content of the lecture may or may not be what we talk about, but it will certainly be a relevant backdrop. I will probably be reading something else or playing a game on my phone during part of the lecture, listening with one ear. And: this is fine. All of these are perfectly good reasons and behaviors.

Back in my undergrad days, back before I had a phone to scroll or play on, I used to doodle in my notebooks while listening with varying attention to the lecture. The “notes” I took from my philosophy classes are largely drawings of bizarre creatures sprinkled with the names of philosophers and their arguments, sometimes illustrated in cartoon form. Sometimes I would chat with whoever sat next to me, sometimes read a book, and often I would daydream. I have fond memories of looking out the window at the wind rustling the leaves in autumnal Lund while listening to lectures on epistemology. I remember the room I was in when I first felt the force of Quine’s incommensurability thesis and was gripped by an urge to vanquish it in single combat. I would not have had that memory if I had just read about it in a book. But I did also read about Quine’s incommensurability thesis in a book, and that made me understand it much better. (But can I really compare these two modes of learning?)

Maybe you read this and think that I’m down on lectures because I’m a bad lecturer. But I’m a pretty good lecturer, at least according to what my students say. Well, at least those few students that actually fill out the course satisfaction surveys. They say that my lectures are engaging, funny even. I think that’s true. They also say that I’m disorganized and chronically late with feedback and grades. Also true. But we were talking about lectures here (fun), not grading (boring). I strongly believe that me being such a bad listener makes me a better lecturer. My inability to focus on what lecturers say means that I’m constantly paranoid that nobody is listening to me, so I do what I can to remain a strong attractor in attention space. Switch things up. And again. Yes, I have learned a decent model of my students’ attention, but beyond that, I feel the strong need to avoid boring myself as I lecture. It’s a dialog with the audience/students, whether they say anything or not, and above all it’s a live performance. It’s a tension between improvisation and the strict structure of the slides. But actually–did you know this?–you can edit the slides as you lecture. I usually do. That’s why I never give students my slides in advance: they are not finished until after the lecture.

I remember the discussions around 2012 or so, when Massive Open Online Courses (MOOCs) were all the rage. Various colleagues of mine, including some senior and very accomplished professors, argued that university teaching as we knew it was on its way out, to be replaced with prerecorded videos and integrated assessments. Because while we might be decent lecturers ourselves, we couldn’t compete with the real pros, who also had real resources to prepare and produce their courses. Sal Khan, Andrew Ng, these kinds of people. Because lectures are infinitely reproducible, economies of scale would win out.

This hasn’t happened. So far. MOOCs exist, and many students watch these lectures as a complement to their regular lectures, while many others don’t. Many others who are not students also watch such lectures, and I’m not even sure there’s a meaningful boundary to be drawn between MOOCs, podcasts, and general influencer content. That’s fine with me, I don’t really care about any of that. I’m just noting that these online videos fulfill another purpose than the in-person lecture.

As an aside, the MOOC idea was itself largely reheated leftovers. Distance education via snail mail has existed for at least a century or so. In many countries, educational content has been delivered via TV and radio, sometimes including whole school curricula as well as university-level courses. Apparently, there was even at some point a business in recording lectures on VHS tapes and mailing them to learners. The more things change…

Reliable assessment of online-only courses was always a tricky thing, and I suppose that AI developments have now completely killed off any chance of simultaneously scalable and reliable online assessment. I mean, the LLM can just do your homework, dude. The only kind of online assessment you can AI-proof for the foreseeable future is likely oral exams. But they don’t scale well, which negates the whole idea of online classes being infinitely scalable. So we continue lecturing, mostly in person.

See what I did there? I waited more than ten paragraphs before mentioning AI, and then I didn’t mention it in the context of AI systems replacing lectures. I bet that’s what you thought this piece was going to be about when you started reading. And what can I say, asking Claude or Gemini to explain things to me is pretty nifty. The ability to ask follow-up questions is even niftier. I have learned things that way, and as certain people never tire of saying, this is the worst these models will ever be. Still, as someone who cares about accuracy, I go to a source I have some reason to trust to check any fact I care enough about.

If you have followed me this far, I suppose you expect some kind of conclusion here. Not sure this is that kind of post, though. I guess my conclusion is: to each their own. Modes of knowledge transmission are largely complementary. Most people seem to like to listen to other people talking, and I like to talk. I’m not going anywhere, and neither are lectures. Thanks for coming to my TED talk.





Tuesday, May 06, 2025

Write an essay about Julian Togelius

I am well-known enough that most LLMs know about me, but few know me well. I also have a unique name. So one of my go-to tests for new LLMs is to ask them to write an essay about me. It's very enlightening: most of them hallucinate wildly. So far, only Gemini 2.5 Pro (with web search capabilities) gets it mostly (not completely) right.

Even the much-hyped o3, for all its agentic prowess, is very bad at factuality. There's something wrong in every paragraph. Better than an average 7b model, but worse than Llama 70b or Mistral Large. Knowing the subject (myself) intimately is also interesting in that it helps with tracing where the hallucinated "facts" come from. For example, LLMs sometimes claim that I work at the University of Malta (like Georgios Yannakakis) or the University of Central Florida (like Ken Stanley used to do). I guess I'm close to Georgios and Ken in some sort of conceptual space. This exercise is also a sobering counter to Gell-Mann amnesia. If the LLMs get so many things wrong about me, how could I trust them on other somewhat obscure topics?

Sunday, April 27, 2025

A smartphone analogy

In the early 2000s, there were various attempts at smartphones, but they were just not good enough. Then the iPhone came along in 2007, and actually worked! I remember trying one after having used a couple of proto-smartphones, and it was a revelation. So usable, so functional. Everybody rightly predicted that smartphones would be huge, tech companies poured ludicrous amounts of money into keeping up, and a zillion startups were founded with the premise of doing things on/with your phone.

And for a few years, progress really was great. Photos became so good that you could leave your camera at home, and then video became good, and you could share photos and videos directly on social media. Location became reliable and didn't drain the battery, and you could share it with people. Games got good, and inventive. Swipe typing, fingerprint scanners, car integration. The synergies kept coming.


Then, smartphones peaked. Sure, they keep getting technically better. Gigabytes and megapixels keep going up, nanometers and milliseconds keep going down. But no one except enthusiasts really cares anymore. It hasn't felt like phones have been able to do qualitatively new things for the last ten years or so. And the skills you need to operate them have stayed the same. You go buy the latest iPhone or Pixel or Samsung, and expect it to do what the last one did, just a little better. Therefore, the smartphone brands largely market their phones with lifestyle marketing, rarely mentioning those gigabytes and megapixels. In fact, you rarely think about your phone, while you use it all the time. It has become part of you and therefore invisible. Like a part of your body.


What has changed is the rest of the tech stack, and indeed the rest of society. You are now expected to always carry a smartphone and use it for a wide variety of things, from logging onto all your digital services, to editing and signing documents, taking the bus, entering the gym, splitting the dinner bill, keeping up with friends, watching movies, and so on. We're always on our phones. That last sentence felt almost painful to write because it is such a cliché. And it is such a cliché because it is true.


Imagine life without a smartphone in 2025. Yes, you'd be kind of helpless. For perspective on this, try traveling to China without installing a VPN on your phone (so you can access your Western apps) and without installing any of the apps that Chinese society runs on, such as WeChat. You will feel like an alien or a time traveler, suddenly materializing in a society which you lack the basic means of interfacing with.


Some things we were promised from the beginning, like augmented reality based on sensors that rapidly and reliably model the physical world around us and incorporate it into the virtual world, have still not materialized and we don't know when or even if we will ever get there. Connectivity is still not guaranteed, and might cut out in unexpected places. Battery life is still bad. Screens still crack. Videos buffer. Pressure on business models has led to the average new smartphone game arguably getting worse, although the best ones are excellent. There are still spam calls. Remarkably, I still cannot walk into a store and be guided to the shelves where I can find the items on my online shopping list, even if I can find them on the store's webpage.


Now think of ChatGPT as the iPhone moment of Large Language Models (I include multimodal models in this term). Then, LLMs are currently where smartphones were in 2010 or so. Let's follow this thought and see where it leads. What would this mean?


Here are some speculations:


Numbers will keep going up, benchmarks will keep being broken, but this will have little impact on most people's use cases. The models will already be good enough for most things you'd want to do with them. Most people don't prove theorems or write iambic pentameter as part of their daily work or life. So the announcement that Claude 8 or Gemini 7 finally beats the HumanitysLastExamFinalFinalThisOneLatest.docx benchmark will be greeted with a ¯\_(ツ)_/¯, much like the announcement that iPhone 16 Pro finally has Hybrid Focus Pixels for its Ultra Wide camera.


Some of the dominant players might be the same as today, others will change. The cost of entering the market will not increase, because there will be a good supply of components (e.g. data, pretrained models) for cheap or free. Apple and Samsung may be the kings of smartphones, but nobody has a majority of the market globally, and there's a constant churn of competitors, some of them really good.


Costs will come down and stay down. You can buy a no-name phone that's good enough for your daily use for $100, or a brand-name one (Motorola) for $200. Similarly, there will keep being good enough LLMs available for free, and an abundance of choice if you're willing to pay. Differentiation will be hard, as all the useful features will rapidly be copied by competitors.


However, society and our tech stack will wrap themselves around the ubiquitous availability of good LLMs. We will use LLM-powered software for everything, all the time. These things will be thought companions for most of us, and we will be expected to be in touch with our LLM-powered companions and agents on a more or less constant basis. Imagine life without LLM-powered software in 2040: you will feel mentally naked, a bit stupid, and out of touch with the world around you.


There will be some things that we were promised from the start that will keep on not materializing. I personally believe that hallucinations and jailbreaks will never be "solved", just learned to reckon with. There will also keep being a "normie bias", where LLMs will output things that feel generic and do better the more similar the tasks are to what they have seen before. Yet, they will be incredibly useful for thousands of things, and at least moderately useful for almost anything that can be put into words.


And of course, AI progress will continue. But the interesting progress may not be in feeding token streams to transformers.


I have no particular evidence that the future will play out like this. This was literally just a random thought I had during lunch that got too long for a tweet, so it became a blog post instead. But given the quality of AI forecasting we see these days, it strikes me as just as good a guess as any of the others.


By the way, if you haven't already, you should absolutely read AI as Normal Technology.

Thursday, January 23, 2025

Stop talking about AGI, it's lazy and misleading




The other week, I was interviewed about the discourse around AGI and why people like Sam Altman say that we will reach AGI soon and continue towards superintelligence. I said that people should stop using the term AGI, because it's lazy and misleading. Here are the relevant paragraphs, for context:



Some people have asked what I mean by this. It would seem to be a weird thing to say for someone who recently wrote a (short) book with the title Artificial General Intelligence. But a central argument of my book is that AGI is undefinable and unlikely to ever be a useful concept. Let me explain.


What would AGI mean? An AI system that can do everything? But what is "everything"? If you interpret this as "solve every possible problem (within a fixed time frame)", that is impossible per the No Free Lunch theorem. Further, we don't even know what kind of space every possible problem would be defined in. Or whether such a space would be relevant to the kinds of problems humans care about, or the kind of thinking humans are good at. Comparing ourselves with other animals, and with computers, it seems that our particular cognitive capacities are a motley bunch occupying a rather limited part of possible cognition. We are good at some things, bad at others, even compared with a raven, a cuttlefish, or a Commodore 64. Psychologists claim that they have a measure of something they call "general intelligence", but that really only means factor analysis on a bunch of different tests they have invented, and different tests would yield a different measure.


But let's say we mean by AGI a computer system that is good at roughly the kind of thinking we are good at. Ok, so what counts as thinking here? Is falling in love thinking? What about tying your shoelaces? Making a carbonara? Understanding your childhood trauma? Composing a symphony, planning a vacation, proving the Riemann hypothesis? Being a good friend, and living a good life?


Additionally, there is the issue of whether these capabilities would come "out of the box", or do they need some kind of training, or prompting? How extensive would that preparation be? Humans train a long time to be good at things. How hard is it to instruct the AI system to use this capacity? How fast can it do it, and how much does it cost? How good does the end result need to be? Would an AGI system also need to be bad at things humans are bad at? And what about when it is unclear what good and bad means? For example, our aesthetic judgments partly depend on the limits of our sensory processing and pattern recognition.


One way of resolving these questions is to say that AGI would be an AI system that could do a large majority (say, 90%) of economically important tasks, excluding those that require direct manipulation of the physical world. Such a system should be able to do these tasks with minimum instruction, perhaps a simple prompt or a single example, and it would do them fast enough (and cheaply enough in terms of computation) that it would be economically competitive with an average human professional. The quality of the end result would also be competitive with an average human professional.


The above paragraph is my best attempt at "steelmanning" the concept of AGI, in the sense that it is the most defensible definition I can think of that is relevant to actual human concerns. We can call it the "economic definition" of AGI. Note that it is much narrower than the naïve idea of AGI as being able to do literally anything. It excludes vast spaces of potential cognitive ability, including tasks that require physical manipulation, things we haven't figured out how to monetize, things that cannot easily be defined as tasks, and of course all kinds of cognition humans can't carry out or have not figured out how to do well yet. (We are very bad at coming up with examples of cognitive tasks that neither we nor our machines can do, because we have constructed our world so that it mostly poses us cognitive challenges we can handle. We can call this process civilization.)


Alas, even the economic definition is irredeemably broken. This is because which tasks are economically important is relative to our current economy and technology. Spellchecking is not a viable job for humans because computers do that now; typesetting has not been a viable job since desktop publishing; and once upon a time, before the printing press, manually copying texts ("manuscripts") was an intellectual job performed by highly trained monks. Throughout human history, new technologies (machines, procedures, and new forms of organization) have helped us do the tasks that are important to us faster, better, and simpler. Again and again. So if you take the economic definition of AGI literally, we have reached AGI several times in the history of civilization.


Still, unemployment has been more or less constant for as long as we have been able to estimate it (when smoothed over a few decades). This is because we find new things to do. New needs to fulfil. As Buddha taught, human craving is insatiable. We don't know in advance which the new jobs will be and which kind of cognitive skills they would require. Historically, our track record in predicting what people will work with in the future is pretty bad; it seems that we are mostly unable to imagine jobs that don't exist yet. There have been many predictions that we will only work a few hours a day, or even a few hours per week by now. But somehow, there are still needs that are unfulfilled, so we invent more work. Most people today work in jobs that would be unimaginable to someone living 200 years ago. Even compared to when I was born 45 years ago, people may have the same job titles (graphic designer, travel agent, bank teller, car mechanic etc) but the actual tasks done within these jobs are quite different.


One attempt to salvage the economic definition of AGI would be to say that AGI is a system that can perform 90% of the tasks that are economically valuable right now, January 2025. Then AGI will mean something else next year. This sounds like a viable definition of something, but I would have expected this much talked-about concept to be a little less ephemeral.


Alternatively, you could argue that AGI means a system that could do 90% of all economically valuable tasks now, and also all those that become important after this system is introduced, in perpetuity. This means that whenever we come up with a new need, an existing AGI system will be ready to satisfy that. The problem with this is that we don't know which tasks will be economically important in the future, we only know that they will be tasks that become important because AGI (or, more generally, technology) can do the tasks that were economically important previously. So… that means that AGI would be a system that could do absolutely everything that a human could potentially do (to some extent and capacity)? But we don't even know what humans can do, because we keep inventing new tasks and exploring new capacities as we go along. Jesus might have been a capable carpenter but could neither know that we would one day need software engineering nor that humans could actually do it. And we certainly don't know what humans will find important in the future. This definition becomes weirdly expansive and, crucially, untestable. We could basically never know whether we had achieved AGI, because we would have to wait for decades of social progress to see whether the system was good enough.


This is getting exhausting, don't you think? This initially intuitive concept got surprisingly slippery. But wait, there's more. There are a bunch of other definitions of AGI out there which are not formulated in terms of the ability of some systems to perform tasks or solve problems. For example, pioneering physicist David Deutsch thinks that AGI is qualitatively different from today's AI methods, and that true AGI is computationally universal, can create explanatory knowledge, and can be disobedient. Other definitions emphasize autonomy, embodiedness, or even consciousness. Yet other definitions emphasize the internal working of the system, and tend to exclude pure autoregressive modeling. Many of these definitions are not easily operationalizable. Most importantly, they are surprisingly different from each other.


Now, we might accept that we cannot precisely define AGI, and still think that it's a useful term. After all, we need some way of talking about the increasingly powerful abilities of modern AI, and AGI is as good a term as any, right?


Wrong. It's lazy and misleading. Why?


Lazy: Using the term AGI is a cop out of having to be clear about which particular system capabilities you are talking about, and which domains they have impact on. Genuine and impactful discussion about the progress of AI capabilities and their impacts on the world requires being concrete about the capabilities in question and the aspects of the world they would impact. This requires engaging deeply with these topics, which is hard work.


Misleading: As the term AGI will inevitably mean different things for different people, there will be misunderstandings. When someone says that AGI will arrive by time T and it will lead to X, some people will understand AGI as referring to autonomous robots, others as a being with godlike powers, yet others as a digital copy of a human being, while the person who said it might really just mean a souped-up LLM that can write really good Python code and convincing essays. And vice versa. None of these understandings is necessarily wrong, as there is no good definition of AGI and many bad ones.


Misleading: The way the term AGI is used implies that it is a single thing, and that reaching AGI is a discrete event. It can also imply that general intelligence is a single quantity. When people hear talk about AGI appearing at a certain date, they tend to think of time as divided into before and after AGI, with different rules applying. All of those are positions you can hold, but they do not have particularly strong evidence in their favor. If you want to argue those positions, you should argue them separately, not smuggle them in via terminology.


Misleading: To many, AGI sounds like something that would replace them. That's scary. If you want to engage people in honest and productive discussion, you don't want to start by essentially threatening them. Given that the capabilities of existing, historical, or foreseeable AI methods and systems are very uneven (what Ethan Mollick calls the "jagged frontier") it makes most sense to talk about the particular concrete capabilities that we can foresee such systems having.


I would like to clarify what I am not saying here. I am not saying we should stop talking about the progress of AI capabilities and how they might transform society. On the contrary, we should talk more about this. AI capabilities of various kinds are advancing rapidly and we are not talking enough about how they will affect us all. But we need to improve the quality of the discussion. Using hopelessly vague and ambiguous terms like AGI as a load-bearing part of an argument makes for bad discussion, limited understanding, and ultimately bad policy. Every time you use the term AGI in your argument you owe it to yourself, and your readers/listeners, to replace it with a more precise term. This will likely require hard thinking and might change your argument, often by narrowing it.


I would also like to clarify that I am accusing a whole lot of people, including some rich and/or famous people, of being intellectually lazy and making misleading arguments. They can do better. We can all do better. We should.


Not everyone argues this way. There are plenty of thoughtful thinkers who bother to be precise. Even leaders of large industrial AI labs. For example, Dario Amodei of Anthropic wrote a great essay on what "powerful AI" might mean for the world; he avoids the term AGI (presumably because of the conceptual baggage discussed here) and goes into commendable detail on particular fields of human enterprise. He is also honest about which domains he does not know much about. Another example is Shane Legg of DeepMind, the originator of the term AGI, who co-wrote a paper breaking down the concept along the axes of performance and generality. It is worth noting that even the person who came up with the term (and may have thought deeper about it than anyone else) happily acknowledges that it is very hard to define, and is perhaps better seen as a spectrum or an aspiration. The difference between us is that I think that such an acknowledgement is a good reason to stop using the term.


If you have read all the way here but for some reason would like to read more of my thoughts about AGI, I recommend that you read my book. It's short and non-technical, so you can give it to your friends or parents when you're done.


If you find yourself utterly unconvinced by my argument, you may want to know that I gave this text to Gemini, Claude, and R1, and they all thought it was well-argued and had no significant criticisms. But what do they know, it's not like they are general intelligences, are they?