AI + Us: Who decides what comes next?
Am I just a next-word-prediction tool? Are we really all that different from AI?
A common claim around generative AI and the likes of ChatGPT is that it’s not really intelligent: it’s just a next-word-predictor.
I don’t disagree with the statement - all the model does is basically predict words based on a given context (a prompt or question) - but it’s fun to ask, isn’t that all we do too? (Well, maybe only fun in certain settings. Maybe the kind of thing that might be interesting as a late-night debate.)
As this might sit kinda weird for some folk, I’ll get my caveat in upfront: These aren’t serious views, it’s just fun to think through some of this stuff.
So whatever, don’t take it too seriously.
To go through this, we’ll look at a fundamental building block of generative AI and how it works (from a super high-level, non-technical view), as well as some human behaviour, thinking through the case for us just being next-word-predictors.
It’s called a neural network for a reason
The core, underlying technology that generative AI models such as ChatGPT are based on is called a Neural Network. Neural Networks are a concept that has been around for decades, and a Neural Network is basically a software model of the human brain (you’ll have to forgive this simplification - but the idea was born out of some ideas about how we think the brain works).
The human brain is made up of cells called neurons, connected via synapses. The brain has around 86 billion neurons, all interconnected by something in the region of 100 trillion synapses. Each neuron, when given some input or stimulus, will either fire or not (basically, each of these billions of cells is capable of giving a yes/no answer when given some input), sending its yes/no answer to all the connected neurons (which then go through the same process of deciding yes/no and passing that on to the next neurons) - nothing more complicated than that. It’s the network of all these billions of cells combining their yes/no answers that produces the output - the action/response/thought from the brain.
This is exactly the basis for neural networks in AI - they have billions of artificial “neurons” interconnected, all capable of saying yes/no. ChatGPT, and neural networks in general, measure the size of the model in terms of “parameters”, which roughly equates to the number of adjustable connections between the artificial neurons (so closer to the synapses than the neurons themselves) - GPT-3.5 reportedly had 175 billion parameters, so not at the scale of the human brain, but just because it’s smaller, that doesn’t mean it’s not conceptually similar (though of course not as powerful).
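If it helps to see that yes/no idea written down, here’s a tiny sketch of a single artificial neuron in Python - every number in it is made up, it’s purely to show the mechanics of “weigh up the inputs, then fire or don’t”:

```python
# A single artificial "neuron": weigh up its inputs and either fire (1) or not (0).
# All the numbers here are made up purely for illustration.

def neuron_fires(inputs, weights, threshold):
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total > threshold else 0  # the yes/no answer

# Three incoming yes/no signals from other neurons, and how strongly
# this neuron "listens" to each of those connections:
incoming = [1, 0, 1]
connection_strengths = [0.6, 0.2, 0.4]

print(neuron_fires(incoming, connection_strengths, threshold=0.5))  # -> 1, it fires
```

A network is just billions of these wired together, with each neuron’s yes/no feeding into the next layer of neurons.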
Neural Network training aka brain training
The process of training this dumb collection of neurons looks much the same for human learning and AI learning: they get trained by consuming a load of data along with the expected answers. If you are familiar with the Look, Cover, Write, Check process for children learning to spell, it’s basically that approach (and if you aren’t familiar, hopefully you can work out what it involves from the name…).
Let’s say you want to train a Neural Network to identify whether a photo is of a dog. You might start out with a huge bank of photos, all labelled as dog-or-not (the labelling is the important part - the network needs to work out if it’s correct or not). You repeatedly feed in the photos, and the Neural Network tries to identify whether each picture is a dog.
At first it will have no idea whatsoever and just randomly label things, but each time it gets one wrong it goes back and does some maths to adjust its network, then keeps trying. Over time, those mathematical adjustments result in your IsItADogGPT1 Neural Network getting better: maybe it will start getting good success rates when the photos are of dogs vs, let’s say, people, but then show it a photo of a fox and it will get stumped. From there we keep feeding in photos - both positive and negative examples - until eventually it becomes pretty good at reliably identifying only dogs.
Now, what can also happen here is that, rather than learning the general traits of dogs, if you train it for too long on too narrow a data set it risks basically learning to recognise your particular collection of photos - and if you then show it a brand new photo of a dog it might perform terribly. This is a concept called over-fitting. Our end goal is to achieve generalisation, that is, good performance on previously unseen data - over-fitting is where our model ends up memorising our data set too rigidly.
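For the curious, here’s a toy sketch of that whole training loop in Python. Real photos are stood in for by made-up lists of numbers (a real system would use pixel values and a far bigger network - this is just the adjust-on-error idea), and at the end we check it against photos it never trained on, which is the generalisation test that over-fitting fails:

```python
import random

random.seed(0)

# Toy stand-in for photos: each "photo" is 3 made-up features, labelled 1 (dog) or 0 (not dog).
training_photos = [
    ([0.9, 0.1, 0.8], 1), ([0.8, 0.2, 0.7], 1), ([0.7, 0.3, 0.9], 1),
    ([0.1, 0.9, 0.2], 0), ([0.2, 0.8, 0.1], 0), ([0.3, 0.7, 0.3], 0),
]
unseen_photos = [([0.85, 0.15, 0.75], 1), ([0.15, 0.85, 0.2], 0)]  # never used for training

weights = [random.uniform(-1, 1) for _ in range(3)]  # starts out knowing nothing

def predict(features):
    score = sum(f * w for f, w in zip(features, weights))
    return 1 if score > 0 else 0

# Repeatedly show it the labelled photos; every time it's wrong, nudge the weights a little.
for _ in range(20):
    for features, label in training_photos:
        error = label - predict(features)            # 0 if right, +/-1 if wrong
        weights = [w + 0.1 * error * f for w, f in zip(weights, features)]

print("training photos:", [predict(f) == y for f, y in training_photos])
print("unseen photos:  ", [predict(f) == y for f, y in unseen_photos])
# Good generalisation = it also gets the unseen photos right.
# Over-fitting would look like: perfect on the training photos, poor on the unseen ones.
```

The last two lines are the whole point: doing well on the training photos is easy, doing well on the unseen ones is what we actually care about.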
Why are we talking about this? Well, it’ll come up later, so just remember this.
It’s all just child’s play
Essentially, if you have watched children grow up (or can remember a time when you tried to learn something yourself), the process above very much mirrors the way a human learns.
Examples from early childhood are the easiest to consider when comparing with the learning process of Neural Networks (AI). A child learning to identify animals goes through that same early process - once they can identify a horse, understandably, lots of other four-legged creatures such as ponies, zebras, even antelope, might get misidentified as a horse. Likewise with a dog, as per our earlier example: show a child an animal picture book and they might learn to spot a picture of a dog, but show them a picture of a fox or a wolf and they will probably mis-categorise it as a dog as well.
My favourite anecdotal example of this can be seen in lots of modern children. Kids these days often get more early exposure to touchscreen devices such as mobile phones and tablets than to other devices - so they learn to identify all computing screens as touchscreens, and instinctively attempt to swipe and touch laptop screens. On more than one occasion I have seen children encounter a TV, instinctively assume it was a touchscreen, and try to swipe it to change things.
As already mentioned, the common schooling technique of Look, Cover, Write, Check is used to help children learn how to spell words - it’s a technique that is effective at teaching children to spell the specific words under test, but not at teaching them more broadly about language and comprehension. Memorisation through repetition is the same process as training Neural Networks/AI, and it goes through the same phases of learning.
You’ve got an organ going there, no wonder the sound has so much body
I’m not a biologist, so maybe I’ll be dragged for this one, but the brain is an organ (remember my earlier caveat: don’t take this too seriously). Your heart, stomach and liver are all just organs too. Seemingly, those are all fairly dumb, in the sense that they have tasks they are dedicated to, and they manage them reliably. We don’t think the stomach has special powers - it digests food, manages acidity, etc. (whatever else the stomach does - see the above note about me not being a biologist). Of course all organs are kind of incredible in their own right - but they are organs, they perform tasks, and many of them perform tasks that we can model and simulate in the real world with machines.
So why should we imagine the brain to be different? Of course, yes, it’s incredibly complicated, and we don’t understand lots about how it works, but it’s an organ, so why would it be so unimaginable to think that we could model and simulate it elsewhere? And if we did, what would that mean? What would it mean for how we think about consciousness?
For me, this is one of the most interesting questions:
Put aside any doubts you might have about ChatGPT and the current crop of Generative AI - if in the future, and maybe it’s years away, we understand how the brain works and manage to model it completely in a computer, what would it mean? What are the implications? Does it change how we think about consciousness, or what it is to be alive? Is it ok to turn it off?
Sing for the moment
I also like to use the compulsion to sing as a good example of over-fitting (I told you we’d come back to it). As I have mentioned in other articles already, I have a strange compulsion to use song lyrics in titles (there are two in this newsletter alone), and I often find that there are phrases or words that will immediately trigger song lyrics in my head. If you are into music, or have spent a lot of time listening to it, then you might find you often spontaneously burst into song (either out loud or in your head) on hearing particular words or phrases too - and I’m not sure what that is, if not an outright example of next-word-prediction. Just like a Large Language Model (LLM/AI/ChatGPT), I’m given a prompt and my brain predicts the next words.
When this happens it won’t be with any old song, of course - having listened to loads of music, it will be songs I have listened and re-listened to many times over. Having heard those songs and lyrics so many times, they reinforce the neural pathways in my brain to the point of over-fitting - I can’t hear those phrases or words without my brain always going to that same next word.
This is what happens when you learn the words for a play or song - you repeat the words over and over again, re-reading them, over-fitting your brain so it automatically predicts the same exact words when prompted. If you wanted to train a Neural Network to perform a play, then over-fitting is exactly what you’d want (you’d train it on the script itself, and on any alternative non-script test data you’d almost be looking to maximise the error) - but of course, that’s not what we want from the likes of ChatGPT; we want our AI to be able to generalise and vary its answers.
Just the other week I had two incidents that fall into this next-word prediction example:
I had a delivery, and my son asked me “what’s in the box? what’s in the box?!” (you can probably see where this is going) - to which, without hesitation, I shouted back “what’s in the box?! what’s in the box?!”, imitating the famous closing scene of the movie Seven. I asked ChatGPT what came to mind on hearing the phrase too…
I had an email from someone whose name has similar syllables and sounds to the singer Jason Derulo - and as soon as I saw his name in my inbox I started singing his name in the style of Jason Derulo (this isn’t because I have listened to lots of Jason Derulo, but rather that every song he sings, he sings his own name - if you aren’t familiar with what I mean, check out 57 minutes of Jason Derulo singing his own name)
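To stretch the analogy even further, here’s a toy “next-word predictor” in Python - nothing like how an LLM actually works under the hood, it just tallies which word tends to follow each pair of words in everything it has “heard”. Over-expose it to one phrase and, much like my brain, it can’t help completing it:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count what word followed each pair of words in
# everything it has "heard", then always predict the most common follow-on.
# (A real LLM is far more sophisticated - this is purely to illustrate the idea.)

heard = (
    "what's in the box what's in the box " * 50   # the over-heard, Se7en-style phrase
    + "the box was left in the porch "            # heard only once
).split()

follow_ups = defaultdict(Counter)
for a, b, nxt in zip(heard, heard[1:], heard[2:]):
    follow_ups[(a, b)][nxt] += 1

def predict_next(prompt):
    last_two = tuple(prompt.split()[-2:])
    counts = follow_ups.get(last_two)
    return counts.most_common(1)[0][0] if counts else "?"

print(predict_next("what's in the"))       # -> 'box', completing the over-heard phrase
print(predict_next("it was left in the"))  # -> still 'box': the over-trained phrase wins even in a new context
```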
Misspeaking
Another favourite example of my brain just carrying out routine next-word-prediction is misspeaking. When I say misspeak, I don’t mean fumbling my words - I mean when, in conversation, I state something that my brain definitely knows is wrong, but because it’s predicting words in a sequence it gets one wrong.
The kind of conversation I mean might go as follows:
A: Did you see the game last night?
Me: Yeah, was great, couldn’t believe they managed to hold out like that, really thought Man Utd were going to score
Me: I mean, Man City
(first off, I should address the terrible example conversation - it feels like that episode of The IT Crowd where they try to talk about football, but I couldn’t remember precise examples of when this has happened to me so I went with what I felt might be an accessible example)
The kind of scenario I’m trying to describe is where you 100% know the facts and details of an event, but in the context of a faster moving conversation, you misstate some fact and say something that your brain definitely knows is categorically incorrect.
Oftentimes you will immediately, in the next breath, correct yourself - so there can be no doubt that your brain confidently knows the correct word. So what happened? Why did you say the wrong word (one that you know, immediately, is 100% incorrect)? It looks an awful lot like next-word-prediction in action to me: in a moving conversation your brain looked at the potential words that might come next and picked the wrong one.
In a higher-speed, more open context, the questions and text our brain is processing become a lot blurrier. My brain is no longer dealing with a simple, single question (“which teams played in the match?”), which it could easily answer; instead it’s dealing with a stream of text and the context of an entire conversation. In the context of that sentence a lot of answers look like the right answer, and in lots of other contexts those other teams (words) would have made perfect sense as the next predicted word. All that has happened is my brain simply picked the wrong word. I have talked about word prediction in LLMs previously, in the context of ChatGPT struggling in exactly the same way whilst answering questions about rugby, where lots of possible words look like they might be correct - my brain is doing the same thing here.
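As a rough illustration of that “lots of answers look right” point, here’s a sketch with completely made-up probabilities for the word that fills the gap in “…really thought ___ were going to score”. When a couple of candidates sit that close together, occasionally blurting out the wrong one isn’t so surprising:

```python
import random

random.seed(42)

# Made-up probabilities for which word comes next in that football chat.
# In a narrow context the right answer stands out; in a blurrier one,
# several words look almost equally plausible.
next_word_probs = {
    "Man City": 0.34,
    "Man Utd": 0.31,   # wrong, but barely less plausible as a *word* in this sentence
    "they": 0.25,
    "Liverpool": 0.10,
}

words = list(next_word_probs)
weights = list(next_word_probs.values())

picks = [random.choices(words, weights=weights)[0] for _ in range(10)]
print(picks)  # typically a mix - the "wrong" but plausible team comes out some of the time
```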
When talking about how LLMs process text, we talk about something called a “context window” - which is basically how much text input the LLM can take. Initially it was quite a small window, but models are continually increasing it (Google’s recent models can deal with 1 million tokens). When I am asked a question, I am dealing with a very specific and small context window (just the question), which is easy; but when I am in a conversation, my context window becomes a lot bigger, as my brain has to continually process the ongoing conversation. For example, someone might spend 5 minutes going on about how well their team was playing, and all the good points, but then say “but then they got a goal in the last minute!”, referring to the opposition team last mentioned some time ago.
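And one last toy sketch, for the context window idea - a pretend model that can only “see” the most recent dozen words of a conversation (real models count tokens rather than words, and their windows are vastly bigger):

```python
# A toy "context window": the model only ever sees the most recent N words,
# so anything said earlier in the conversation falls out of view.
# (Real models count tokens, not words, and the window sizes are far larger.)

CONTEXT_WINDOW = 12  # made-up, tiny, purely for illustration

conversation = (
    "United were poor all game, never looked like scoring, barely got out "
    "of their own half, wasted every set piece they won, "
    "but then they got a goal in the last minute"
).split()

visible = conversation[-CONTEXT_WINDOW:]
print(" ".join(visible))
# Only the tail of the conversation is still visible - "United", mentioned right
# at the start, has fallen out of the window, so working out who "they" refers to
# needs a bigger window (or a better memory).
```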
Conclusion
So, here’s the case for us being not that different after all:
AI is modelled on some elements of what we know about the human brain
The human brain is just an organ, doing (albeit complex) tasks
We can observe that children learn and behave the same way as AI does
Humans can memorise fixed texts or scripts through a process that, behaviourally, Neural Networks also mimic: over-fitting
For phrases that my brain has been over-exposed to (or over-trained on), it can’t help but immediately start predicting the relevant following words when prompted (again, if that’s not just next-word-prediction then what is?)
In conversation, or higher-speed interactions, my brain will misspeak and state facts I know are not correct - my brain is predicting the next word and sometimes, just like ChatGPT, it hallucinates and predicts the wrong word.
I have previously written about AI + creativity which touches on some similar ideas too, if that interests you!
But does it really make any difference? I’m not sure it does. And these are just my incredible thoughts anyway, so conclude what you like..
I’m calling it IsItADogGPT as a fun reference to ChatGPT and to make the example a bit more accessible. I know GPT stands for “Generative Pre-trained Transformer”, and that it’s an extension of the vanilla, feed-forward, back-prop style of neural network I am talking about here. Calling it this was meant as a joke, not a technical reference. Don’t @ me.