
Welcome to the Diamond Age

It’s been a while since I’ve written anything, and what better topic to bring me out of my hiatus than AI. Let’s kick things off in an especially cringey way, with a reference to my own tweet:

For those unfamiliar, Snow Crash and The Diamond Age are dystopian futuristic novels by the prolific sci-fi author Neal Stephenson. Snow Crash, despite being published in 1992, popularized the concept of the avatar and invented the idea of a “metaverse”. Unsurprisingly, the book resurfaced in recent years as Zuckerberg made his big bets on a VR metaverse, culminating in the renaming of Facebook to Meta.

While less discussed in popular culture, I would argue that The Diamond Age is more relevant to the current moment, particularly with regard to generative AI and large language models (LLMs). The book is set in a world revolutionized by nanotechnology and centers on the coming-of-age story of a young protagonist, Nell, who receives a stolen copy of a highly sophisticated interactive book entitled A Young Lady’s Illustrated Primer. The Primer is intended to help young women reach their full potential through a completely personalized curriculum that relies heavily on narratives that adapt to the reader and their environment. Despite being totally implausible at the time of the book’s publication in 1995, the Primer is eerily similar to large language models like GPT4.

Here is an excerpt of Nell’s interaction with the Primer as it tells her a story it generated:

[Primer] “Once upon a time there was a little Princess named Nell who was imprisoned in a tall dark castle on an island—”

[Nell] “Why?” 

[Primer] “Nell and Harv had been locked up in the Dark Castle by their evil stepmother.” 

[Nell] “Why didn't their father let them out of the Dark Castle?” 

[Primer] “Their father, who had protected them from the whims of the wicked stepmother, had gone sailing over the sea and never come back.”

Forget 1995 - I think it’s worth taking a moment to recognize how unthinkable this sort of interaction with a machine was just a few years ago. And yet it is really quite similar to the kinds of interactions over 100 million people have had with ChatGPT.

Here’s my attempt at recreating such an interaction in ChatGPT using its GPT4 model:

[Screenshot of the ChatGPT conversation]

Note that I interrupted ChatGPT’s responses as it went, trying to mimic the way Nell interrupts the Primer with her curiosity.

So how did we get here?

There’s been so much written about how LLMs work by people far more qualified than I am. Probably my favorite piece so far is by Stephen Wolfram, no slouch himself when it comes to advancing AI. But if you boil it down, the essence of an LLM like GPT4 is predicting the next word (more precisely, the next token) in a sequence of text, based on the words that precede it. Really, that’s pretty much it. It just turns out that doing it well is hard - but once you do, you start to be able to generate human-like responses to a given input. This has sparked a debate about the very nature of human intelligence: are we just prediction machines, making an educated guess about what our output should be for a given input?
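
To make that concrete, here is a deliberately tiny sketch of the task in Python: a word-frequency model that, given a word, guesses the word most likely to follow it based on a toy training text. Real LLMs use transformer neural networks over tokens and billions of parameters, so this illustrates only the shape of the problem, not how GPT4 actually works.

```python
# Toy illustration of "predict the next word from the words before it".
# Real LLMs use transformer networks over tokens; this only shows the task's shape.
from collections import Counter, defaultdict

corpus = "once upon a time there was a little princess named nell".split()

# Count which word follows each word in the (tiny) training text.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often after `word` in training."""
    candidates = next_word_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("a"))       # -> "time" (ties broken by insertion order)
print(predict_next("little"))  # -> "princess"
```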

Regardless of the answer, everyone sensible agrees that GPT4 is not artificial general intelligence, or AGI. An LLM like GPT4 is trained on a huge amount of data - much of the text on the internet - but it struggles to generalize beyond that training data. Deep learning researcher Francois Chollet defines general intelligence as follows:

According to Chollet, LLMs still fail to learn patterns and acquire skills when shown a handful of examples that they haven’t seen before - an ability that humans naturally possess. 

Given that GPT4 is not AGI but is still very powerful, the two questions I have a vested interest in answering are: how will LLMs change the ways we interact with software - and, perhaps even more critically, how will they not?

What will change

The biggest change I foresee in how software will function might sound prosaic or even downright banal - unless, of course, you’ve ever tried to onboard new users to a product you’ve launched, in which case you know how hard it is for users to “get” what the software can do. Right now, people learn how to use software - particularly the more powerful, creative sort - through trial and error. If they’re feeling really frustrated and have exhausted all other options, they might turn to the software’s dreaded documentation and “help” content. But the reality remains: there is a giant loss in potential productivity and utility in the gap between what users wish to do with software and what they are actually able to achieve and express, merely because they don’t know how to access functionality that already exists. This lost potential holds even for supposedly non-technical or “no-code” products.

Documentation, a concept associated with boredom and drudgery, will mean something totally different in a post-GPT4 world. Much like the Young Lady’s Primer made learning come alive for its reader, LLMs will make learning about software a joy. A user will be able to have a conversation with their software, asking it questions the way they would ask an already skilled user - except without shame or embarrassment. And because a model like GPT4 can make sense of a wide variety of data types and symbolic forms (including code itself), the range of topics the user can explore is no longer limited to what the creator bothered to document.

The ability of GPT4 to make sense of new data it is exposed to, without being entirely retrained, is one of its most remarkable properties. While more ambitious companies can fine-tune an LLM on proprietary data, an even simpler hack is to make sure the data you want it to make sense of fits within GPT4’s 32,768-token context limit.

Previously, machine learning models had to be retrained on domain-specific data. Models like GPT4 are mysteriously capable of something called in-context learning: instead of being retrained, the model can be shown just a few labeled examples in a prompt and then generate the desired output. While researchers still debate how this is even possible, the consequence is that virtually anyone can spin up their own domain-specific model simply by sending the right prompt to a model like GPT4.
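
As a rough sketch of what that looks like in practice, here is a few-shot prompt sent through the OpenAI Python client. The ticket categories and example tickets are made up for illustration, and the exact client syntax may differ depending on the library version you have installed:

```python
# Sketch of in-context learning: no retraining, just a few labeled examples in the prompt.
# Assumes the openai package (v1-style client) and an API key in the OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()

# A handful of invented labeled examples, followed by the input we want classified.
prompt = """Classify each support ticket as BILLING, BUG, or FEATURE_REQUEST.

Ticket: "I was charged twice this month." -> BILLING
Ticket: "The export button crashes the app." -> BUG
Ticket: "Please add a dark mode." -> FEATURE_REQUEST
Ticket: "My invoice shows the wrong plan." ->"""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content.strip())  # e.g. "BILLING"
```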

A more personalized path to learning new software is only part of the story: Geoffrey Litt has an amazing piece on how LLMs might allow users to alter the functionality of the software itself. Litt writes:

LLMs will represent a step change in tool support for end-user programming: the ability of normal people to fully harness the general power of computers without resorting to the complexity of normal programming. Until now, that vision has been bottlenecked on turning fuzzy informal intent into formal, executable code; now that bottleneck is rapidly opening up thanks to LLMs.

The desire of users to customize their software beyond its out-of-the-box capabilities is part of what makes Excel so popular - and arguably the world’s biggest programming language. Users of Excel can create their own formulas and chain functions by passing the output of one cell as the input to another; this is really a sort of functional programming made accessible. A more skilled Excel user (but not necessarily a programmer) can create a nice template or workbook with modifiable inputs, and then share it with a slightly less skilled colleague to play with and modify.
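
To make the cell-chaining analogy concrete, here is a toy sketch in Python, where each “cell” is simply a value computed from the one before it. The numbers and formulas are invented for illustration:

```python
# Excel-style chaining as plain function composition: each "cell" consumes
# the output of the one before it - no conventional programming required.
sales = [120, 95, 143, 210]      # A1:A4 - raw inputs

subtotal = sum(sales)            # B1 = SUM(A1:A4)
with_tax = subtotal * 1.08       # B2 = B1 * 1.08
rounded = round(with_tax, 2)     # B3 = ROUND(B2, 2)

print(rounded)  # 613.44
```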

Excel, however, has been limited to specific types of numerical tasks that can be represented in a tabular format. In most other domains, there has long been an implicit trade-off between usability and flexibility that has stumped product developers. Thankfully, LLMs push that frontier way, way out. My own startup, Composer, now offers an escape hatch from our visual, no-code interface, under the assumption that even people without coding skills can use GPT4 to modify raw code. Users can now perform tasks that would otherwise be very cumbersome in a visual interface, and even use GPT4 to explain, modify and improve the output generated by the visual interface. You can check out a demo here.

The ability to modify software in a way previously reserved for engineers comes from GPT4’s capacity to translate from natural language to formal programming languages and back again. Roon, an AI researcher, describes how text is the universal interface:

Slowly but surely, we can see a new extension to the UNIX credo being born. Those who truly understand the promise of large language models, prompt engineering, and text as a universal interface are retraining themselves to think in a new way. They start with the question of how any new business process or engineering problem can be represented as a text stream. What is the input? What is the output? Which series of prompts do we have to run to get there?
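
In that spirit, here is a hedged sketch of treating a task as a text-in, text-out pipeline: a plain-English request goes in as the prompt, and code comes out. The helper function, prompt wording and model name are illustrative assumptions, not a prescription:

```python
# "Text as the universal interface": natural language in, formal code out.
# natural_language_to_code is a hypothetical helper wrapping one prompt/response round trip.
from openai import OpenAI

client = OpenAI()

def natural_language_to_code(request: str, language: str = "Python") -> str:
    """Ask the model to translate a plain-English request into code."""
    prompt = (
        f"Translate the following request into {language} code. "
        f"Return only the code.\n\nRequest: {request}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(natural_language_to_code("Sum the values in the 'revenue' column of data.csv"))
```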

What stays the same(-ish)

Even if text becomes the universal interface, it is not necessarily the ideal end-user interface for every context. Sometimes you actually want to use computers for what they were originally intended to be: ultra-powerful calculators. LLMs like GPT4 are probabilistic models, meaning that, like humans, they mostly get things right - but sometimes they make big mistakes.

Stephen Wolfram shares some examples where GPT4 is inferior to a formal computational language like the one running Wolfram Alpha. In one such example, ChatGPT flubs the distance from Chicago to Tokyo:

The real answer, it turns out, is 6,313 miles - an answer confirmed by Wolfram Alpha. The difference is that ChatGPT returns a plausible, somewhat-likely answer to the question, whereas Wolfram Alpha is precise and deterministic. Wolfram Alpha isn’t nearly as flexible and forgiving as GPT4, which is why it isn’t getting the limelight right now. But there are still tasks at which it is superior, and I have no doubt that OpenAI would concede the point, given that they are rolling out a Wolfram Alpha plugin for ChatGPT.
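
For this particular question you don’t even need Wolfram Alpha: the great-circle distance is a short, deterministic computation. Here is a sketch using the haversine formula with approximate city-center coordinates; it lands within roughly a dozen miles of the 6,313-mile figure, with the residual difference coming down to which coordinates and Earth model you pick:

```python
# Deterministic great-circle distance via the haversine formula - the kind of
# precise, repeatable computation a symbolic/computational engine is built for.
# City coordinates are approximate city centers; Earth is treated as a sphere.
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a spherical Earth, in miles."""
    earth_radius_miles = 3958.8
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * earth_radius_miles * asin(sqrt(a))

# Chicago (41.88 N, 87.63 W) to Tokyo (35.68 N, 139.65 E)
print(round(haversine_miles(41.88, -87.63, 35.68, 139.65)))  # ~6,300 miles
```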

With all the excitement over advances in LLMs, it’s worth remembering that humanity has dealt with the limitations of natural language before. Modern mathematical notation, a formal symbolic system we take for granted today, did not exist before the 16th century. As mathematics professor Joseph Mazur points out:

Even our wonderful symbol for equality – you know, those two parallel lines – was not used in print before 1575, when the Welsh mathematician and physician Robert Recorde wrote an algebra book that he called the Whetstone of Witte. (We can only guess that the title is a pun on sharpening mathematical wit.) In it he wrote “is equal to” almost two hundred times for the first two hundred pages before finally declaring that he could easily “avoid the tedious repetition” of those three words by designing the symbol “=====” to represent them.

I don’t know about you, but I’d rather not go back to spelling out “is equal to” when a simple “=” will suffice. More seriously, there are times when you want to work directly with mathematical notation in its modern symbolic form, because it is more precise, more compact and ultimately more expressive than plain English. Mazur describes the power of modern notation beautifully: “In reading an algebraic expression, the experienced mathematical mind leaps through an immense number of connections in relatively short neurotransmitter lag times, cutting to the chase of compact understanding.”

Beyond the realm of mathematics, we often need software to be precise. At Composer, we allow people to deploy automated trading strategies and put their hard-earned money behind them. If an order is sent to buy a specific security at a specific price in a specific amount, our customers have no tolerance for that trade being executed “mostly correctly”. They want, and are paying for, the superhuman precision that only computers can provide.

In other contexts, the challenge is less one of precision and more one of usability and speed - of both user input and feedback. To use a deliberately absurd example, no one wants to drive a car by natural language commands; telling the car to “turn left 13 degrees” is a lot less intuitive than yanking the steering wheel. Less silly examples abound in software, too. Much of the power of a visual interface comes from what designers (and I’m not one) refer to as signifiers and affordances:

Signifiers are perceptible cues that designers include in (e.g.) interfaces so users can easily discover what to do. Signifiers optimize affordances, the possible actions an object allows, by indicating where and how to take action. Designers use marks, sounds and other signals to help people perform appropriate tasks.

ChatGPT, as impressive as it is, is a bit of a blank slate. You can type anything. That’s really powerful, but also kind of intimidating and cognitively demanding. Compare that to the various signifiers - buttons, sliders, and tabs - in, say, Spotify. I often just click around Spotify without explicitly searching for anything when I’m feeling lazy and want to discover music without putting in much energy. And there’s a big play button I can hit when I’m ready.

Ultimately, the main thing I think won’t change is the value of good design. It will be the job of talented product designers to work out where natural language is the best interface, where other types of interfaces make more sense, and how the two modes should work together. The best customer experiences are typically the result of the product developer taking end-to-end ownership, often by vertically integrating the product’s value chain. Customers don’t really care what technology you use; they care that you fuse the right technologies in a way that actually solves their problems. And at the end of the day, even the best LLMs have no intrinsic preferences or taste. If money can’t buy taste, neither can more parameters or GPUs.