Might AI "turn evil"?

That's a difficult question, but the answer might be pivotal for our understanding of AI ethics and our approach to AI development. Both are, in their own right, fascinating topics that I hope to return to elsewhere.

The question ultimately comes down to the issue of goals. In order to have a strong chance of creating "good" AI, we need to give our AI goals which align with our idea of what is good.

It's important to interpret this in a charitable way. We might not all agree that machines can "have goals", but we should agree that machines can exhibit goal-oriented behaviour.

This is a somewhat meandering post, but one that I hope tells a story: we start by answering slightly easier questions, then work our way up through clues in history and science to answer the original question.

Why does anything happen at all?


On a cosmic scale, things "happen" in the logical intersection between time and change. The second law of thermodynamics tells us that the entropy of an isolated system never decreases. In other words, over time, things tend to go from more ordered to less ordered. The wind never accidentally builds a sandcastle, but given time it always destroys one.

The concept of a goal is built into fundamental reality. Entropy is just one example of a natural phenomenon which tends towards something – in this case, disorder.


Doesn't the existence of life fly in the face of entropy? Well, fortunately for us, one part of a system can become more ordered and complex so long as the entropy of the system as a whole keeps increasing.

Life has adopted goal-achievement, and taken it for its own, colouring our world with a unique theatre of beauty and violence.

Around 3.8 billion years ago, when life first came to be, and for the next 2 billion years, the only organisms were single-celled, and they all followed the same goal. They ensured their survival by sensing their surroundings in quite an ingenious way. They could sense chemicals in the water and compare them with the chemicals they had sensed moments earlier. They would then release their own chemicals or move, depending on whether what they sensed had got "better" or not. They wouldn't have known it, of course, but they were the first beings to have the goals of reproduction and survival.
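The "compare now with a moment ago" trick is simple enough to sketch in a few lines of code. What follows is only a toy illustration under loud assumptions: the one-dimensional world, the `concentration` function and the step size are all invented here, and real cells are far messier. The creature has no map and no plan, only a memory of its last reading.

```python
def concentration(x: float) -> float:
    """Hypothetical nutrient field: richest at x = 0, poorer further away."""
    return -abs(x)


def swim(start: float, steps: int, step_size: float = 0.1) -> float:
    """Move along a line, remembering only the previous chemical reading."""
    x = start
    direction = 1                        # +1 or -1 (assumed 1-D world)
    previous = concentration(x)
    for _ in range(steps):
        x += direction * step_size       # swim a little
        current = concentration(x)
        if current <= previous:          # things did not get "better"
            direction = -direction       # so turn around
        previous = current               # forget everything but the last reading
    return x


final = swim(start=5.0, steps=500)
print(abs(final) < 0.2)  # True: it ends up hovering around the richest spot
```

Nothing in the code "wants" anything, yet the behaviour looks goal-oriented from the outside, which is the charitable reading of "goals" used throughout this post.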

Much later, around 540 million years ago, at the beginning of a period known as the Cambrian, there was a great explosion of life. The arms race truly began. Life started sprouting bones, tentacles, eyes and claws. Behaviour changed, becoming based around interaction with other life. But the goals remained the same – reproduce, survive.


It's hard to say exactly when we, humans, truly became moral agents. We certainly are now – we are able to assess moral conundrums and make judgements based on what we know and what we believe is right. Perhaps this happened some time during the Neolithic Revolution, when we stopped hunting and gathering, and became the first life on earth to create settlements based on agriculture. Whenever it was, humans now have the possibility of acting altruistically. (This stuff is fascinating. If you've not yet read Sapiens – see sources below – I highly recommend it: thorough, readable and humorous!)

The acceleration of change for humans over the last couple of centuries has been like nothing else we have seen. As Hans Rosling tells us in Factfulness (see below), 70 years ago, you could only expect to live past 50 in four countries. Today, you can expect to live past 50 in every country in the world. Existence is no longer merely about survival for most humans. Existence is increasingly about comfort, pleasure, entertainment and social validation.

Over the last century, we have evolved an extremely broad range of goals. Most of these goals had never been set at any earlier time in history – for example, landing on the moon, transmitting data through the air, or posing the perfect selfie for Instagram.

Some of these goals actually turn us against the goals set for us by 3.8 billion years of evolution – reproduce, survive. The most extreme examples might be to end our own lives through suicide or to become celibate.

We have more diverse goals than ever. And they're not always consistent with each other.

Outsourcing goal achievement

We're building tech to generate value for humans at an exponential rate. Right now, the majority of that tech is non-AI. It exists to serve a purpose, such as crunching data to tell us how successful our marketing campaigns are, or converting PNGs into JPGs. But now AI is on the rise. More and more people and businesses are dedicating themselves to creating "intelligent" machines – or if you prefer, machines which mimic human intelligence.

Creating "good" AI

How do we optimise our chances of creating "good" AI? It comes back to our goals. What is "good" in human terms aligns with the goals we think are good to achieve. To optimise our chances of creating "good" AI, we need to try to align the AI to our "good" human goals.

But there are two problems with that.

What is "good"?

Most humans broadly agree on a set of moral principles, perhaps exemplified in our social contract. This might include moral norms such as "we should not steal" or "we should be truthful".

Yet a range of extremely difficult moral questions remain. This is exactly the problem faced by the developers of driverless cars. Researchers at MIT developed a platform called the Moral Machine, which gathers human perspectives on moral decisions made by AI. It presents a range of moral dilemmas to the user, who must choose between two options in the case of a collision. Very few people share exactly the same view on what the "right" action would be.

The problem of perfect alignment

Let's imagine that a humanoid robot programmed with advanced AI needs to make a "trolley problem" style moral choice. How does it choose? Ultimately, it will choose in line with its pre-programmed goals. Of course, it probably wasn't pre-programmed with the exact scenario. But so long as the goals were sufficiently well programmed in the first place, and the AI hasn't been able to override them, it will make a decision based on those goals.

This brings us to our first requirement for moral AI: it should not be possible to override moral goals. Logically, if you programme a machine to aim only to do X, overriding that programming would be contrary to the goal of achieving X, and you might therefore believe this impossible. However, if you programme a machine with an array of goals, it's impossible to guarantee that those goals will never contradict each other. So we must think incredibly carefully about the way this is programmed.
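To make the worry about contradictory goals concrete, here is a minimal sketch. Everything in it is hypothetical: the two goals, the feature names and the scenarios are invented for illustration, and no real system is this simple. Each goal is a rule that an action either satisfies or doesn't, and we ask which actions satisfy every goal at once.

```python
from typing import Callable, Dict, List

# An action is described by a few hypothetical yes/no features.
Action = Dict[str, bool]
Goal = Callable[[Action], bool]   # returns True if the action satisfies the goal


def be_truthful(action: Action) -> bool:
    return not action["involves_lying"]


def avoid_harm(action: Action) -> bool:
    return not action["causes_harm"]


def permitted(actions: Dict[str, Action], goals: List[Goal]) -> List[str]:
    """Return the names of the actions that satisfy every goal at once."""
    return [name for name, a in actions.items() if all(g(a) for g in goals)]


goals = [be_truthful, avoid_harm]

# An ordinary situation: honesty hurts nobody.
everyday = {
    "answer honestly": {"involves_lying": False, "causes_harm": False},
    "tell a lie": {"involves_lying": True, "causes_harm": False},
}

# A situation nobody planned for: the honest answer causes harm.
dilemma = {
    "answer honestly": {"involves_lying": False, "causes_harm": True},
    "tell a lie": {"involves_lying": True, "causes_harm": False},
}

print(permitted(everyday, goals))  # ['answer honestly']
print(permitted(dilemma, goals))   # []
```

In the second scenario the list comes back empty. The goals were each sensible on their own, but whoever wrote them did not foresee a situation in which no available action satisfies both, and the machine now has to break one of them.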

Our ability to programme this carefully, however, is limited by the finitude of human understanding, whereas the possible combinations of future events are at least indefinite, and probably infinite given the existence of quantum events. Since humans can't know about these in advance, the combinations are effectively infinite.

Even if we could, in principle, programme the perfectly morally aligned AI machine, it's almost certainly impossible to decide which set of moral norms to align it to – never mind reconcile that with the reality that these costly development exercises will likely have other goals altogether.

Why are we trying to build AI at all? Some are doing it out of pure curiosity. Some are doing it for recognition. Others are doing it to generate profits. And even if we do everything right, there is no guarantee the programmes could not be hijacked by those looking to do evil through AI.

The future

History has taught us that the universe we live in does not predispose anything to be good at pursuing a range of goals at once.

Humans have had anything that even vaguely resembles a computer for less than a century. To truly believe that we can create reliably "good" AI (whatever that might mean) is human arrogance in full stride.

Sadly, this is increasingly the norm – with extremely human consequences. I'll come back to this topic soon!


Sources

  • Other Minds - Peter Godfrey-Smith
  • Life 3.0 - Max Tegmark
  • Sapiens - Yuval Noah Harari
  • Factfulness - Hans Rosling