## How We Know What We Know – Diagonal Proof

(This post continues a series that I started writing last year. Click on the “How we know what we know” tag to read those earlier posts. In particular, if you’re not familiar with the concept of proof by contradiction, you might want to read the third post before reading this.)

The idea of infinity is one that has caused a lot of people a lot of trouble over the years, from theologians asking whether an omnipotent God is capable of creating a stone he can’t lift, to physicist David Deutsch, whose 2011 book *The Beginning Of Infinity* tries, among other things, to use transfinite mathematics to prove that Britain’s thoroughly broken electoral system is actually the best possible system.

A lot of these problems come from the fact that we use ‘infinity’ as a number, when in fact it’s no such thing – it’s the place where numbers break down. It’s a placeholder for something that doesn’t exist. Thinking of ‘infinity’ as a number is like thinking of the word ‘person’ as a person – it’s a category error.

Infinity doesn’t behave like a number. If you multiply it by any positive number, you get infinity. If you add any ordinary number to it, you get infinity. If you divide it by any positive number, you get infinity. If you take any ordinary number away from it, you get infinity.

So it should hardly be surprising that the whole idea of talking about ‘infinity’ is wrong. We should, in fact, be talking about infinit*ies*.

Because there are infinities of different sizes. This was proved by Georg Cantor in his famous ‘diagonal proof’.

In mathematics, two sets are the same size if there is what is called a one-to-one correspondence between them. So, for example, the set containing all the natural numbers [FOOTNOTE These are the 'counting numbers' - one, two, three and so on] is exactly the same size as the set containing all the even numbers, because for every natural number you can assign an even number to it:

1 – 2

2 – 4

3 – 6

4 – 8

…

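And so forth. No matter how far you go, you will always be able to find an even number that is double any natural number, and every even number is double exactly one natural number, so nothing on either side is left over or used twice. So the set of even numbers is the same size as the set of natural numbers.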

The same goes for the odd numbers:

1 – 1

2 – 3

3 – 5

4 – 7

…

This makes a kind of sense, even though it seems a bit odd. It feels as if there should be only half as many even numbers as there are natural numbers, but half of infinity is still infinity.
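A computer can’t check infinitely many pairs, so what follows is only an illustration, not a proof. It is a short Python sketch that applies the two pairings above (n to 2n for the evens, n to 2n - 1 for the odds) to the first thousand natural numbers, and checks that they land on the first thousand even (or odd) numbers with no repeats and no gaps. The function names are made up for this example.

```python
def pair_with_evens(n: int) -> int:
    """Pair the natural number n with the even number 2n."""
    return 2 * n


def pair_with_odds(n: int) -> int:
    """Pair the natural number n with the odd number 2n - 1."""
    return 2 * n - 1


def matches_exactly(pairing, count: int, targets: set[int]) -> bool:
    """True if the naturals 1..count land on `targets` with no repeats and no gaps."""
    images = [pairing(n) for n in range(1, count + 1)]
    no_repeats = len(set(images)) == len(images)
    no_gaps = set(images) == targets
    return no_repeats and no_gaps


N = 1_000
first_evens = set(range(2, 2 * N + 1, 2))  # 2, 4, ..., 2000
first_odds = set(range(1, 2 * N, 2))       # 1, 3, ..., 1999

print(matches_exactly(pair_with_evens, N, first_evens))
print(matches_exactly(pair_with_odds, N, first_odds))
```

Raising `N` changes nothing: the rule never runs out of even or odd numbers to pair with, which is exactly the point.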

But now let’s look at the real numbers. These are all the decimal numbers – 0.5, 1.2548356, and so on. To make it simple, we’ll look just at those decimal numbers that are greater than 0 and less than 1.

We can try to match these up with the naturals, too. It doesn’t really matter what order we match them up in, so long as we can match each one up with a single natural number:

1 – 0.123456… [FOOTNOTE - Here an ellipsis means the number carries on indefinitely]

2 – 0.135468…

3 – 0.954651…

4 – 0.154684…

5 – 0.364548…

6 – 0.584678…

And so on. So far, so good, right? Natural numbers, matched up with decimal numbers.

What Cantor then did was take the first digit of the first number, the second digit of the second number, and so on:

1 – 0.**1**23456…

2 – 0.1**3**5468…

3 – 0.95**4**651…

4 – 0.154**6**84…

5 – 0.3645**4**8…

6 – 0.58467**8**…

This gives, in our example, the number 0.134648…

The clever thing Cantor then did was to add one to each digit (ticking over so that nine becomes zero), getting, in our example, the number 0.245759…

That number is now very interesting, because it does not appear anywhere on the list, no matter how far you go down. Its first digit is different from the first digit of the first number, so it can’t be the first number. Its second digit is different from the second digit of the second number, so it can’t be the second number. The seven-billion-and-sixty-ninth digit, if we continued looking that far, would be different from the seven-billion-and-sixty-ninth digit of the seven-billion-and-sixty-ninth number.

So this number doesn’t appear anywhere on the list. It can’t.
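The construction is mechanical enough to write as a few lines of Python. This is only a sketch: a computer can only hold a finite list of finitely many digits, so it uses just the six example numbers above, cut off after six digits, and the name `diagonal_number` is made up for the example. It applies the same add-one rule (nine ticks over to zero) down the diagonal, then checks that the result differs from every number on the list.

```python
def diagonal_number(rows: list[str]) -> str:
    """Build a number that differs from every number on the list.

    rows[i] holds the digits after "0." of the i-th number on the list.
    Digit i of the result is digit i of rows[i] plus one, with 9 ticking over to 0.
    """
    new_digits = []
    for i, row in enumerate(rows):
        digit = int(row[i])                       # i-th digit of the i-th number
        new_digits.append(str((digit + 1) % 10))  # add one, wrapping 9 round to 0
    return "".join(new_digits)


# The six numbers from the example above, cut off after six digits each.
listed = ["123456", "135468", "954651", "154684", "364548", "584678"]
diag = diagonal_number(listed)
print("0." + diag)  # prints 0.245759

# The new number disagrees with number i at digit i, so it can't be any of them.
for i, row in enumerate(listed):
    assert diag[i] != row[i]
```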

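Nothing in this argument depended on the particular list we started with, so the same trick finds a missing number in *any* list you could possibly write down. This can only mean one thing – that there are more real numbers between zero and one than there are natural numbers. So some infinities are bigger than others.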

For a long time, people thought that Cantor’s proof must be mistaken in some way, that it must be the equivalent of those ‘proofs’ you sometimes see that one equals two, most of which have a division by zero hidden in them somewhere. Surely infinity just meant infinity. The idea of a smaller and a larger infinity made no sense to most people. (Cantor labelled the size of the set of natural numbers “aleph-null”, and conjectured that the real numbers were the next size up, “aleph-one”; that guess, the continuum hypothesis, later turned out to be something the standard axioms of mathematics can neither prove nor disprove.) Those who did think about it thought it was mostly a curiosity, rather than a particularly important result.

But then in the twentieth century, Cantor’s argument became the basis of a mathematical proof which completely changed how mathematicians think about what they do, and which in turn led to the invention of the computer. We’ll pick up on that next time…

## Part 5: Zatanna

This essay appears in a revised form in my book *An Incomprehensible Condition: An Unauthorised Guide To Grant Morrison’s Seven Soldiers*.

This isn’t going to be about what you expect it to be.

Other than Mister Miracle, Zatanna is probably the most explicit statement of the basic themes of Seven Soldiers that Morrison could make, and yet people have been so confused by its form (a parody of another comic) that they really haven’t looked. It’s a great piece of sleight of hand by Morrison. While everyone is laughing at references to beards, the real information is getting slipped in under our noses.

The ‘M’ in M-theory very deliberately doesn’t stand for anything at all. The letter was taken from ‘membrane’, as in the membrane universes the theory describes, but Edward Witten, its creator, says “M can stand variously for ‘magic’, ‘mystery’, or ‘matrix’, according to one’s taste”, while Michio Kaku favours ‘mother’.

There’s an area of physics called ‘string theory’. As a matter of fact, both that name and ‘M-theory’ are misnomers. A theory, in science, has predictive power – people have been able to come up with tests of the theory, and run those tests, and the result has been consistent with the theory. String ‘theory’ should really be called the string hypothesis – as it makes no predictions which are currently testable, let alone actually tested. Unlike quantum theory, or thermodynamics, it’s not made a single prediction which can be confirmed in the observable physical world. In fact, possibly even ‘hypothesis’ is too strong a word – string philosophy, or string religion, might be better.

But despite this complete lack of testable predictions, physicists have been working on string theory for over forty years. This is because we currently have two separate theories of the universe – General Relativity and Quantum Mechanics – which are both, as far as we can see, absolutely accurate, with no exceptions to either ever having been found, but which are incompatible.

And the reason for this is gravity – General Relativity explains gravity perfectly, while Quantum Mechanics doesn’t. QM *does*, though, describe all the other fundamental forces – the strong and weak nuclear forces and electromagnetism (which itself unifies such apparently-disparate phenomena as light, radio waves, magnetism and electricity) – in the same mathematical language, and strongly suggests that they are really all different aspects of the same thing. {FOOTNOTE: I am oversimplifying enormously here, but the gist of this is correct. If you want to understand all the details, read The Feynman Lectures On Physics, follow it with The Road To Reality by Roger Penrose (which is a much worse book but covers the decades of scientific progress since the Feynman lectures were released) and then read The Fabric Of Reality by David Deutsch to disabuse yourself of some of the wrong notions in The Road To Reality. At which point you’ll know about as much about this stuff as I do – which is to say you’ll *realise* you know nothing.} And physicists think that any successful ‘theory of everything’ will show that gravity is really the same thing as all the other forces, because it would be neater that way.

This isn’t as stupid a reason as it sounds, if you know about things like Kolmogorov Complexity, Solomonoff Induction and message entropy – and it’s how people like Einstein worked. Einstein didn’t get his theories of relativity by checking experimental results, but by trying to remove various bits of mathematical ugliness and come up with more universal equations.

Remember, though, what I said in the last essay – saying “everything is connected to everything else” is the same as saying “nothing is connected to anything” as far as information goes. Physicists look for symmetries, but it’s where symmetries break that the interesting stuff happens. A universe where everything was exactly the same as everything else would be a universe with nothing at all in it.

And so, whether gravity is in some sense ‘the same’ as electricity, as magnetism, as light, as the forces that hold atoms together – and we have every reason to think it is – in important ways *it is still different*. And without those differences – without those unique properties of gravity – apples wouldn’t fall to the ground and black holes wouldn’t exist. It’s in the differences, not the similarities, that the flavour of the world resides.

But nonetheless, we do think those similarities are there, and we want to find them, so we can better understand this universe in which we find ourselves.

There have been several attempts at Theories Of Everything that do this over the years – Einstein spent the last thirty years of his life working on various dead-end attempts, and the physicist Frank Tipler has argued in a rather wonderful paper that Richard Feynman actually *did* discover the theory of everything, back in the 1960s, but hadn’t realised it because his theory unfortunately required an infinite number of terms in the equations.{FOOTNOTE: Tipler has *also* argued at times that he’s proved the existence of God, that Barack Obama is evil because he doesn’t believe in aether, and that if we clone Jesus using genetic material from the Turin Shroud we’ll be able to figure out how to get free energy from baryon annihilation. He’s one of the more…original…thinkers in physics. But in this case he makes a reasonable argument.} But none of these have had much success among what for want of a better term we can call the physics ‘community’, in part because they’re not neat. They’re not nice.

String theory is nice. And it ties up gravity and electromagnetism in a neat little bow.

What string theory says is that rather than particles being 0-dimensional points, as conventional physics says, they’re actually tiny one-dimensional lines (‘strings’) that can vibrate in more dimensions than we can see. In the same way that a guitar string vibrating up and down can make different musical notes, a one-dimensional string vibrating in ten dimensions can give the appearance of a zero-dimensional particle moving in a four-dimensional spacetime.

In this model a photon (the particle that carries the electromagnetic force – the ‘light particle’) is one of the things you get from a string whose ends are dangling loose, while a graviton (the hypothetical particle that would carry the gravitational force, and that has never yet been observed) would be what you’d get from a string whose ends were joined, forming a loop.

The only slight problem with this – a beautiful piece of mathematics – was that people very quickly noticed that there’s more than one way of doing this, and by the early 1990s there were five different string theories. All of them had the same basic idea – that you have 1-d strings vibrating in ten dimensions – but they differed in the kinds of strings and symmetries they allowed, and made different predictions (without any of them making the kind of prediction *that can be tested*). If string theory was going to survive at all, something else had to come along.

That something was M-theory.

**Matrix Theory**

What M-theory says is that there are actually even more dimensions than that – that our 0-D particles in 4D spacetime that are really 1-D strings in 10D spacetime are *really* 1-D slices of 2-D sheets (membranes, or ‘branes’ for short) in an 11-D spacetime. All of the competing string theories were just selecting different sets of ten dimensions out of the eleven ‘real’ ones (think of the blind people and the elephant). The reason why gravity looks different from the other forces is that the strings that cause the ‘normal’ forces are open-ended, but the ends are stuck to p-dimensional ‘branes (p-branes for short; this is physicist humour), while gravitons move freely between different ‘branes because their loop structure stops them sticking to anything.

M-theory also gives an explanation, of sorts, for the existence of the universe. At least one popular M-theory model of cosmology says that multi-dimensional ‘branes are rippled, and that two of them at some point banged together – and our universe is a four-dimensional interference pattern from the ripples on those two p-branes. The ‘lumpiness’ of the universe (the way matter clusters together into galaxies with vast tracts of space in between) comes from some of the ripples cancelling each other out and others reinforcing each other, while the expansion is caused by the two branes moving.

Now, this is pretty much exactly like the way holograms are created {FOOTNOTE: If you don’t know about how holograms are created, Wikipedia has a good explanation} and indeed it is {FOOTNOTE: I think. This is not my area of expertise – I’ve skim-read tons of papers on cosmology and particle physics, but my main scientific interests are rather more esoteric areas to do with the application of pure mathematics. Please don’t blame me for any epistemic failures caused by this essay.} a special case of a rather more general area of string theory, the ‘holographic universe’ principle.

This principle says that rather than being, as we appear, a three-dimensional {FOOTNOTE: Here I’m talking only of spacelike dimensions} universe, we’re actually only a two-dimensional pattern of information – like the panels of a comic book – ‘painted on’ the cosmological horizon (the part of the universe past which it’s impossible even in principle to see anything). But that information encodes a third dimension implicitly – the same way you can get a three-dimensional hologram on a two-dimensional image.

To explain why, we need to look at the connections between information, entropy, gravity and black holes. {FOOTNOTE: For more on all these things, and on Seven Soldiers, and many other subjects that connect to this series of essays, see my book Sci-Ence! Justice Leak!}

The reason for this is something called the Black Hole Information Paradox, discovered by Stephen Hawking (more or less as a trivial lemma based on the more important work of Jakob Bekenstein). Black holes must have entropy, as Bekenstein showed, because otherwise we could violate the Second Law of Thermodynamics (just get a piece of the highest-entropy matter you can find and throw it into the black hole – the entropy outside the black hole decreases, so the entropy inside the black hole must increase). Unfortunately, they also have something called Hawking Radiation – they let out energy, and given enough time they evaporate away completely. But that energy is – has to be – random. Which means that information that goes into the black hole has to stay there, and when the black hole finally evaporates, that information has been destroyed as far as the outside universe is concerned. Which shouldn’t happen – conservation of information is actually the same thing as the Second Law. {FOOTNOTE: The best guess at the moment is that the energy coming out is not *quite* random, so information can eventually leak out of a black hole, given enough time. Hawking now claims that everything, yes everything, can escape the deadly gravitational pull of a black hole – it just takes a while.}

But the interesting thing is that black holes must have the highest possible information density, because of this – you cannot have something that contains more information in a given space than a black hole. And Bekenstein worked out how much information this is – it’s called the Bekenstein Bound – and discovered it was

$$I \le \frac{2\pi R E}{\hbar c \ln 2}$$

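Here I is the information (in bits), R is the radius of the sphere, E is the total energy inside it (mass included), and ħ and c are the reduced Planck constant and the speed of light. The important thing to note is how this grows for the most information-dense object possible, a black hole: a black hole's mass, and so its energy, grows in direct proportion to its radius, so the maximum information grows as R squared rather than R cubed. In other words, it increases with the surface area of the sphere, not the volume. In other other words, if you have a sphere of any size – even universe size – and it's got maximum information density, you can get all the information that's in it just from its surface, without having to look inside.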
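As a rough check on that scaling, here is a minimal Python sketch that evaluates the Bekenstein bound for black holes. It assumes standard SI values for the constants, and it assumes (neither is spelled out in the text) that a black hole's radius is its Schwarzschild radius, R = 2GM/c², and its energy is E = Mc². The solar mass figure is approximate, and the function names are made up for the example.

```python
import math

# Physical constants in SI units (CODATA 2018 values; the solar mass is approximate).
HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 299_792_458.0        # speed of light, m/s
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
SOLAR_MASS = 1.989e30    # kg


def bekenstein_bound_bits(radius_m: float, energy_j: float) -> float:
    """Maximum information, in bits, in a sphere of the given radius and total energy."""
    return 2 * math.pi * radius_m * energy_j / (HBAR * C * math.log(2))


def black_hole_bits(mass_kg: float) -> float:
    """Evaluate the bound for a black hole (assumed: Schwarzschild radius, E = m c^2)."""
    radius = 2 * G * mass_kg / C**2
    energy = mass_kg * C**2
    return bekenstein_bound_bits(radius, energy)


one_sun = black_hole_bits(SOLAR_MASS)
two_suns = black_hole_bits(2 * SOLAR_MASS)
print(f"one solar mass: {one_sun:.2e} bits")
print(f"doubling the mass multiplies the bound by {two_suns / one_sun:.2f}")
```

For one solar mass this comes out on the order of 10^77 bits. Doubling the mass doubles both the radius and the energy, so the bound goes up fourfold, just as it would if it tracked the surface area.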

Which means from an information point of view, the whole visible universe might as well be inside a black hole – and when the universe expands, that's other stuff falling into the black hole from outside.

And another way of saying that is that the whole three-dimensional spatial universe is just a mathematical artefact, and we're 'really' a two-dimensional pattern of information, spread infinitely thinly on the outside of a three-dimensional bubble. It just feels to us like we're inside.

Note that while the holographic principle – the idea that we are a hologram – depends on string theory, the rest of this doesn't. That *is* the maximum amount of information that can be contained in a sphere, and it *is* the amount that is contained in a black hole. Whether we're holograms or not, we *can* be described – 100% accurately – by just the information on the surface of the smallest possible sphere we could fit in. What's on the inside doesn't count – surfaces matter.

**Mystery Theory**

But just what *is* information?

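As defined by Claude Shannon, information is the same thing as unpredictability – if you’re given a sequence, the information in the next item in the sequence is the logarithm of the inverse of the probability with which you could have predicted it from the previous items. (Using base-2 logarithms measures it in bits: something you’d have guessed correctly half the time carries exactly one bit.)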

For example, if I give you a sequence 1, 2, 3, 4, 5, 6…, telling you the next number is seven gives you very little new information, because you could have predicted it with very high probability from the previous numbers.

If I say “my love is like a red, red”, you can guess that the next word is ‘rose’ – saying ‘rose’ gives you almost no new information. But if it turns out that my love is, in fact, like a red, red baboon’s bottom, then you’ve got some new information.
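Shannon’s measure is simple enough to compute directly. The sketch below, in Python, turns a probability into bits of information; the probabilities attached to the examples are invented purely for illustration, not measured from anything.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Shannon self-information of an outcome: log2(1 / p), measured in bits."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be greater than 0 and at most 1")
    return math.log2(1 / probability)


# Probabilities invented purely for illustration.
examples = {
    "a fair coin comes up heads": 0.5,
    "'rose' after 'my love is like a red, red'": 0.95,
    "'baboon's bottom' after 'my love is like a red, red'": 0.0001,
}

for event, p in examples.items():
    print(f"{event}: {surprisal_bits(p):.2f} bits")
```

The near-certain ‘rose’ carries a small fraction of a bit, while the baboon carries over thirteen bits.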

Now, the interesting thing about this is that information and entropy are the same thing. I’m not going to show you a formal proof of that here, but I can sketch it informally:

You can think of the information content of something as being the length of the shortest message you could write giving a precise description of it. Imagine you have a perfectly cubic crystal, made of just one type of atom, with no impurities, and it’s precisely one centimetre on each side. To describe that, you just say “a 1 cm cubic crystal of atom X”, and that contains *all* the information about it.

Now suppose you drop the crystal on the floor and it shatters into a thousand pieces, all of them irregular. To describe that perfectly, you need to describe the shape of all the different pieces and where they are in relation to each other. You’d need a rather large book to give all that information. A loss of order has become a gain in information (a gain in the information in the object, that is. You’ve lost the information you had about the object).
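The ‘shortest description’ idea here is what’s formally called Kolmogorov complexity, and it can’t be computed exactly for arbitrary data. A common practical stand-in, swapped in here purely for illustration, is the size of the data after compression. This Python sketch (it needs Python 3.9 or later for `randbytes`) compares a hypothetical ‘crystal’, one byte repeated, with hypothetical ‘shards’, random bytes from a fixed seed, compressing both with the standard-library `zlib` module.

```python
import random
import zlib

SIZE = 10_000  # bytes in each hypothetical "object"

# The perfect crystal: one byte value repeated, so a short rule describes all of it.
crystal = b"X" * SIZE

# The shattered crystal: random bytes (seeded so the run is repeatable), no rule to exploit.
shards = random.Random(42).randbytes(SIZE)

# Compressed length is only an upper bound on the true shortest description.
crystal_size = len(zlib.compress(crystal, 9))
shards_size = len(zlib.compress(shards, 9))

print(f"crystal: {SIZE} bytes -> {crystal_size} bytes compressed")
print(f"shards:  {SIZE} bytes -> {shards_size} bytes compressed")
```

The repeated bytes shrink to a tiny fraction of their original size, while the random ones don’t shrink (if anything, they grow slightly), which is the crystal-versus-shards difference in miniature.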

This is a rather more important thing than you might realise – this is the reason why entropy always increases. Because there is only *one* way for the atoms in that cube to be arranged in a perfect crystalline cube, but a functionally-infinite number of ways for the atoms to be arranged in ways that *aren’t* a perfect crystalline cube. Any deviation at all from an ordered state is far, far more likely to go to a disordered state (a state that takes more information to describe) than to an ordered one. And a disordered state is, for the same reason, far more likely to go to another disordered state than back to the ordered one.
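The counting argument can be made concrete with a toy model. The Python sketch below replaces the crystal’s atoms with a hypothetical row of 100 coins, each either heads or tails, and uses `math.comb` to count how many arrangements give each overall pattern: exactly one arrangement is all heads, while an even mix of heads and tails can be made in an astronomically large number of ways.

```python
from math import comb

COINS = 100  # a hypothetical toy model: 100 "atoms", each either heads or tails

ordered = comb(COINS, COINS)     # ways to have every coin heads: exactly one
mixed = comb(COINS, COINS // 2)  # ways to have exactly 50 heads and 50 tails
total = 2 ** COINS               # every possible arrangement of the coins

print(f"all heads:         {ordered} arrangement")
print(f"50 heads/50 tails: {mixed:.3e} arrangements")
print(f"chance a random arrangement is all heads: {ordered / total:.1e}")
```

A random shuffle of the coins is therefore overwhelmingly likely to land on some mixed, disordered pattern, and the chance of it landing back on perfect order is about one in 10^30.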

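Information is the same as entropy, and so processing information produces waste heat – erasing even a single bit has to give off a minimum amount of heat (a result known as Landauer’s principle). That’s why no computer can ever run completely cold, though your laptop gets hot mostly because real chips give off vastly more than that theoretical minimum.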
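The minimum itself is tiny. According to Landauer’s principle, erasing one bit at temperature T must release at least k·T·ln 2 of heat, where k is Boltzmann’s constant. A minimal Python sketch, assuming room temperature is about 300 K:

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K (exact by definition in the SI since 2019)


def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum heat released by erasing one bit of information at the given temperature."""
    return BOLTZMANN * temperature_k * math.log(2)


# Assume "room temperature" is roughly 300 K.
print(f"{landauer_limit_joules(300):.2e} J per bit erased")
```

That works out at roughly 3 × 10⁻²¹ joules per bit, far below what real hardware actually dissipates.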

And increase in entropy is the same thing as time.

This may not seem intuitively obvious, but it’s a fact. In general, the laws of physics are time-symmetric – they don’t have an arrow of time built in. Newton’s laws of motion, for example, look exactly the same going forwards and backwards in time – if you took a film of the solar system, with all the planets going round the sun, and ran it backwards, there would be nothing there that looked wrong. There are very good mathematical reasons for thinking that time does not, in any real sense, exist at all.

What do exist, though, are different states of entropy, different configurations of matter. And each of those configurations (let’s call them ‘universes’ for now) contains information about other configurations. And that information always seems to describe another, slightly more ordered, configuration (it couldn’t describe a less ordered one, because that would take more space than there is in the universe, obviously). We call that described configuration ‘the past’. We call those configurations that are more disordered than this one, that can be predicted from this one (but not perfectly, otherwise the description would take up more space than there is in the universe) ‘the future’.

This is why we can know the past but not know the future – why, indeed, there are always many possible futures but only one past. Because the number of more disordered states is always greater than the number of more ordered states. {FOOTNOTE: For more on this see Julian Barbour’s excellent book The End Of Time. In fairness, I should point out that Barbour’s timeless, Machian, formulation of physics is just as speculative as string theory. The difference is that while string theory is messy and postulates many extra dimensions we can’t see, Barbour’s formulation is beautiful and does away with one. I should be very surprised to see string theory or M-theory lead to a successful, testable theory except via the sort of simplifying process by which phlogiston led to oxygen or the Lorentz contraction to relativity, but I should be even more surprised if something like Barbour’s formulation doesn’t eventually become the basis of our standard understanding of physics.}

In fact, information is so crucial – information, entropy and time are so tied up – that several physicists have suggested that information, rather than matter or energy, is what the universe is made of. Perhaps most famously, John Wheeler {FOOTNOTE: Wheeler is a contender for greatest American physicist of the twentieth century, possibly only topped by his student Richard Feynman. It would take more space than I have here to explain why his opinion matters. Just trust me – he knew what he was talking about.} wrote:

It from bit. Otherwise put, every ‘it’—every particle, every field of force, even the space-time continuum itself—derives its function, its meaning, its very existence entirely—even if in some contexts indirectly—from the apparatus-elicited answers to yes-or-no questions, binary choices, bits. ‘It from bit’ symbolizes the idea that every item of the physical world has at bottom—a very deep bottom, in most instances—an immaterial source and explanation; that which we call reality arises in the last analysis from the posing of yes–no questions and the registering of equipment-evoked responses; in short, that all things physical are information-theoretic in origin and that this is a participatory universe.

Now, my own opinion is that It’s More Complicated Than That, and that Wheeler was in a sense being confused by the Copenhagen interpretation which he never abandoned (even though he put his name to his graduate student Hugh Everett’s explanation of the more reasonable Many Worlds theory), but in another, deeper sense he was right. E.T. Jaynes showed that we can derive probability theory from pure logic. Time, entropy and many conservation laws in physics can be derived from probability theory. So it’s entirely possible that when we get the final Theory Of Everything, it will be derivable entirely from pure logic and computation on a small amount of initial information.

So if all that is right, then what are we? Rather than a three-dimensional universe existing in time, we’re a whole series of still, two-dimensional patterns of information – two-dimensional patterns on the surface of a three-dimensional bubble – and we don’t have any existence in time at all. There’s just a lot of two-dimensional patterns, next to each other in some sense, which you can put in order and perceive as a story.

When Morrison wants us to have empathy for comic characters – when he gets us to reach out our hand and touch Zatanna’s, to help her save herself (and is there *any* reader, no matter how sceptical and materialist, who *didn’t* touch Zee’s hand when they got to that part? Who *didn’t* reach out to help her? I hope I never meet someone so lacking in feelings…), he really wants us to save *ourselves*. One of the big, big themes of Seven Soldiers, one that Morrison practically bludgeons us over the head with, is that we should be careful what we create, and be kind to our creations. Be they robots, golems, amorphous beings taking the shape of our perfect lover, or be they our children – or the comic characters we create – we should help them up when they fall. {FOOTNOTE: And if physicist Max Tegmark is to be believed, many of the things we ‘create’ have their own objective existence as separate universes. According to Tegmark’s Ultimate Ensemble Theory, not only is the universe made of information, but it’s specifically a mathematical formula – and every other mathematical formula is just as real. If so, as far as I can see, that means that every equation, every poem, every piece of music, every computer program – in short every *thought* – is a universe to itself, as real as this one.}

Because if we’re made of information, then we’re made of *words*. We can’t avoid eating the fruit of the tree of knowledge – everything we do, everything we are, is information processing. Berkeley was right when he said *esse est percipi* (and right when he attacked Newton on the basis that nothing is absolute, though as wrong as you can get about the infinitesimals in calculus) – nothing can exist without being perceived. But at the same time the mere act of perception is a destructive one – we increase the order in our brains by destroying the order outside. There is no such thing as a non-destructive act, or a harmless thought.

Life – and intelligence – is a constant, permanent struggle against entropy, but entropy has loaded the dice against us. We can’t possibly win, but nor can we possibly give up and admit defeat. The best thing – the only thing – we can do is to keep fighting anyway, and offer a hand up to anyone who falls in the struggle, as we ourselves have already fallen.

“We have found a strange footprint on the shores of the unknown. We have devised profound theories, one after another, to account for its origins. At last, we have succeeded in reconstructing the creature that made the footprint. And lo! It is our own.”

Sir Arthur Eddington, Space, Time, and Gravitation, 1920

**Comic issues** Zatanna #1-4

**Artists** Ryan Sook (pencils), Mick Gray (inks), Nathan Eyring (colours)

**Other credits** Jared K Fletcher (letters), Harvey Richards (asst editor), Peter Tomasi (editor)

**Connected Morrison works** Animal Man deals with many of the same themes slightly more explicitly, as does The Invisibles, but probably the most thematically-similar work, though different in flavour, is The Filth

**Look Out For** 2D projections of 3D spaces, dice, form and in-form-ation, top hats, “if you can’t keep it down, don’t bring it up”, hands, ‘mortal clay’ and parent problems.

**Who breaks a butterfly on the wing? How to keep young and beautiful! And a cat in a Morrison story that doesn’t die!**

Still to come in Seven Soldiers

## How We Know What We Know 3: Proof By Contradiction

Before we start, I’d just like to apologise for the lack of replies to comments recently, especially on part 2 of this series. My life recently has been… not bad, but full of incident, and it’s taken all my spare energy to get even the few posts I’ve managed up. Those of you who’ve followed my blog for a while will know that I go through phases of productivity and phases where getting words out is like pulling teeth. I’m in the latter at the moment but hope to be in the former again soon. That said, let’s move on…

So after parts one and two, we’re going to move away from the scientific method for a little while, and talk about mathematics. This jump between subjects is the biggest one we’re going to see in this series of posts, and it might seem that we’ve gone completely off-topic. In fact, this is the necessary background we need in order to put the topics of the first two posts into a more formal context. By post twelve in this series, we will have the outlines of a formal proof of the efficacy of the scientific method.

(Don’t worry, I’m not going to throw a huge set of equations or diagrams at anyone – it’s all going to be much the same kind of thing as the posts so far have been).

So we’re going to go back to very basic maths, of the kind that used to be taught as a matter of course in schools (it may still be, but as an adult I am legally obliged to assume that no child educated after the day I left school was taught anything). If you’ve studied anything about mathematical proofs at all, you can ignore this post – it’s aimed at those with little or no maths who want to understand what’s coming up.

This is a technique of proof that dates back at least to Euclid, and is referred to as *reductio ad absurdum* or proof by contradiction. It’s a very simple technique, but it’s very powerful. To prove a statement must be true, assume its opposite and then see what follows. If you can get a contradiction then that proves the opposite of the statement must be false, so the statement must be true.

As an example, here’s how Euclid used it to prove there is no highest prime number:

Assume there is a highest prime number – say, for argument’s sake, we choose 37. Now, by definition, a prime number is a number that has no divisors except itself and one. So if 37 is the highest prime number, all numbers over 37 must be divisible by a number between two and thirty-seven inclusive, by definition.

But what about the number (2 × 3 × … × 35 × 36 × 37) + 1?

It’s not divisible by two, because if you try you’re left with a remainder of one. It’s not divisible by three, because if you try you’re left with a remainder of one… and the same goes for every number up to thirty-seven, because each of them divides 2 × 3 × … × 37 exactly, so the extra one is always left over.

So that leaves two alternatives. Either (2 × 3 × … × 35 × 36 × 37) + 1 is itself a prime, or there is some number higher than 37 but lower than that high number which is a divisor of (2 × 3 × … × 35 × 36 × 37) + 1 but isn’t itself divisible by any number from two to thirty-seven (and the smallest such divisor must itself be a prime).

In other words, if 37 is the highest prime number, then there has to be a prime number higher than 37. This is a contradiction, so we’re left with 37 *not* being the highest prime. But this can work for *any* number – you can just stick n in there and have (2 × 3 × … × (n − 1) × n) + 1, for any value of n you choose.

So no number can be the highest prime number, so there must be an infinite number of primes, ever higher.
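The remainder trick is easy to check by machine for small cases. In this Python sketch, `math.factorial(n)` stands in for 2 × 3 × … × n (multiplying by 1 changes nothing), and a simple trial-division helper, `smallest_factor`, made up for this example, finds the smallest divisor of each number. Trial division is far too slow for anything as big as 37 factorial plus one, so for 37 the sketch only checks the remainders.

```python
from math import factorial


def smallest_factor(n: int) -> int:
    """Smallest divisor of n greater than 1, found by trial division (n >= 2)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to the square root, so n itself is prime


# For 37 itself: dividing by anything from 2 to 37 leaves a remainder of one.
big = factorial(37) + 1
assert all(big % d == 1 for d in range(2, 38))

# For small n, find the smallest factor of (2 x 3 x ... x n) + 1.
for n in range(2, 11):
    m = factorial(n) + 1
    p = smallest_factor(m)
    print(f"n = {n}: {m} has smallest factor {p}")
```

Sometimes the number is prime itself (n = 3 gives 7), and sometimes it isn’t (n = 5 gives 121 = 11 × 11), but either way its smallest factor is a prime bigger than n, which is exactly the pair of alternatives described above.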

Another example:

Once we know Pythagoras’ theorem (that given a right-angled triangle with sides a, b and c, where c is the hypotenuse, c^2 = a^2 + b^2), we can prove that the hypotenuse is always shorter than the sum of the other two sides (c < a+b).

So assume that a+b is less than or equal to c, the opposite of what we want to prove:

a+b <= c

Square both sides. Both are positive lengths, so squaring keeps the inequality the right way round:

a^2 + 2ab + b^2 <= c^2

We know that c^2 = a^2 + b^2, from Pythagoras’ theorem, so we can substitute that in for c^2:

a^2 + 2ab + b^2 <= a^2 + b^2

Take a^2 + b^2 from both sides:

2ab <= 0

But we're talking about a triangle, so all the sides must have positive lengths. That means the product ab, and so 2ab, must be greater than zero. So a+b can't be less than or equal to c. So c must be less than a+b.
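The algebra is the proof, but a quick numerical check can make it feel more concrete. This Python sketch is a sanity check, not a proof. It only covers whole-number right triangles with legs up to a `limit` I picked arbitrarily. For every case it tries, it confirms that (a+b)^2 - c^2 comes out as exactly 2ab, and so c < a+b.

```python
from math import isqrt


def count_right_triangles(limit):
    """Check c < a + b for every whole-number right triangle with legs up to `limit`.

    Returns how many triangles were checked. A sanity check, not a proof.
    """
    checked = 0
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            c = isqrt(a * a + b * b)
            if c * c != a * a + b * b:
                continue  # hypotenuse is not a whole number; skip this pair
            # (a + b)^2 - c^2 works out to exactly 2ab, which is positive...
            assert (a + b) ** 2 - c * c == 2 * a * b > 0
            # ...so the hypotenuse is shorter than the other two sides put together.
            assert c < a + b
            checked += 1
    return checked


print(count_right_triangles(20))  # 7 triangles, starting with 3-4-5
```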

Despite its simplicity, this technique is quite a beautiful one, mathematically. Partly, this is because it's so audacious and almost arrogant – "OK, we'll try it your way, and assume that there's a highest prime… oh look, we've proved there isn't a highest prime! Why are you hitting yourself?"

But also because, while mathematics isn't a science (though as we shall see, it may be rather more than 'a science' and may include all the sciences within it), it shows an admirably scientific attitude – *start off assuming you’re wrong, because nobody ever learned anything new by thinking they already knew everything*.

And finally, because it’s *brave*, because it exposes the fatal weakness of logical systems. If you can prove a contradiction, if you can actually show that a contradiction must be true within your system, then the whole system becomes worse than useless. If you can prove that A *and* not-A are true, then you can prove literally anything with your system. With something like mathematics, which started out as an attempt to find absolute truths, the idea that you could prove a contradiction was, for a lot of the subject’s history, very, very scary to a lot of people. Proof *by* contradiction comes quite close to this, mentally.
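That step, from a contradiction to *anything*, is usually called the principle of explosion (*ex falso quodlibet*). In a proof assistant it takes one line. Here is a minimal sketch in Lean 4, where A and B are placeholder propositions:

```lean
-- From a proof of A and a proof of not-A, we can produce a proof of any B at all.
example (A B : Prop) (h : A) (hn : ¬A) : B :=
  absurd h hn
```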

In fact, one of the main mathematical projects of the early part of the last century, one that took the work of some of the best mathematicians and logicians of their – or any – generation, was an attempt to prove that mathematics could not possibly have any contradictions in it, and ensure that no-one would ever find any. They failed, but to see why, we’ll have to first go back to the nineteenth century, to Georg Cantor, and take a closer look at infinity…

## How We Know What We Know 2: Occam’s Razor

(Continues from part one)

So far we’ve examined how we form a scientific theory. What we need to know now is what makes a *good* theory – how do we choose between two theories which make the same predictions?

The answer is a principle which has been known since the fourteenth century, but which is still widely misunderstood – Occam’s Razor.

What Occam’s Razor says is that when given two competing explanations, all things being equal, we should prefer the simpler one.

Intuitively, this makes sense – if we have two explanations of why telephones ring, one of which is “electrical pulses are sent down a wire” and the other is “electrical pulses are sent down a wire, except for my phone, which has magic invisible pixies which make a ringing noise and talk to me in the voices of my friends”, we can be pretty confident in dismissing the second explanation and thinking no more about it – it introduces additional unnecessary complexities into things.

It is important, however, to note that this only applies if the two competing hypotheses make the same predictions. If the magic pixie hypothesis also predicted, for example, that none of my friends would remember any of the phone calls I remembered having with them (because they were really with the pixies) then if that were correct we would have a good reason for preferring the more complex hypothesis over the less complex one – it would explain the additional datum. (In reality, we would need slightly more evidence than just my friends’ forgetfulness before we accepted the pixie hypothesis, but it would be a way to distinguish between the two hypotheses).
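Here is a toy Python sketch of the rule as stated. It throws out any hypothesis that gets an observation wrong, then picks the simplest of what's left. The hypothesis names and prediction tables are invented for illustration. Measuring simplicity by the length of a description is a crude assumption of mine, not a standard measure.

```python
# Toy hypotheses: a plain-English description plus a table of predictions.
WIRE = {
    "description": "electrical pulses are sent down a wire",
    "predictions": {"friend calls": "phone rings",
                    "ask friend about call": "friend remembers"},
}
PIXIES = {
    "description": ("electrical pulses are sent down a wire, except for my phone, "
                    "where invisible pixies ring and talk in my friends' voices"),
    "predictions": {"friend calls": "phone rings",
                    "ask friend about call": "friend forgets"},
}


def choose_hypothesis(hypotheses, observations):
    """Keep the hypotheses that match every observation, then prefer the simplest.

    Description length is only a crude stand-in for simplicity (an assumption).
    """
    consistent = [
        h for h in hypotheses
        if all(h["predictions"].get(situation) == outcome
               for situation, outcome in observations)
    ]
    return min(consistent, key=lambda h: len(h["description"]), default=None)


same_data = [("friend calls", "phone rings")]
extra_data = same_data + [("ask friend about call", "friend forgets")]
print(choose_hypothesis([WIRE, PIXIES], same_data) is WIRE)     # True
print(choose_hypothesis([WIRE, PIXIES], extra_data) is PIXIES)  # True
```

When both hypotheses fit the data, the shorter one wins. Once an observation only the pixie hypothesis predicts turns up, the simpler hypothesis is eliminated before simplicity is even considered.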

Another example – “There is a force that acts on all bodies, such that they are attracted to other bodies in proportion to the product of their masses and in inverse proportion to the square of the distance between them”. Compare to “Angels push all bodies, in such a way that they move in the same way that they would if there was a force that acted upon them, such that they were attracted to other bodies in proportion to the product of their masses and in inverse proportion to the square of the distance between them”. The two hypotheses make the same predictions, so we go with Newton’s theory of universal gravitation rather than the angel theory. If we discovered that the angels would stop pushing when we asked them very nicely by name, we would have a good reason to accept the angel hypothesis.
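For reference, the usual form of Newton's law has the force falling off with the *square* of the distance: F = G m1 m2 / r^2. Here is a minimal Python sketch. The function name is mine, and G is the standard gravitational constant, rounded.

```python
G = 6.674e-11  # Newtonian gravitational constant in N m^2 / kg^2 (rounded)


def gravitational_force(m1, m2, r):
    """Attractive force in newtons between masses m1, m2 (kg) whose centres are r metres apart."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return G * m1 * m2 / r ** 2


# Doubling the distance quarters the force; doubling one mass doubles it.
base = gravitational_force(1000.0, 1000.0, 10.0)
print(gravitational_force(1000.0, 1000.0, 20.0) / base)
print(gravitational_force(2000.0, 1000.0, 10.0) / base)
```

The same single rule covers both falling objects and orbiting planets. That is the point the post returns to below, when it discusses what counts as an explanation.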

A third, real-life example – “life-forms evolve by competing for resources, with those best able to gain resources surviving to reproduce. Over many millions of years, this competition gives rise to the vast diversity of life-forms we see around us.” versus “God made every life form distinctly, just over six thousand years ago, and planted fake evidence to make it look like life forms evolve by competing for resources, with those best able to gain resources surviving to reproduce and giving rise to the vast diversity of life-forms we see around us, in order to test our faith.”

Any possible piece of evidence for the first hypothesis is a piece of evidence for the second, and vice versa. Under those circumstances, we need to discard the second hypothesis. (Note that in doing so we are not discarding the God hypothesis altogether – this comparison says nothing about the God or gods believed in by intelligent religious people such as, say, Andrew Rilstone or Fred Clark, though of course there may well be equally good arguments against those deities. But it does give us more-than-ample reason to dismiss without further thought the vicious, evil deities worshipped by Tim LaHaye or Fred Phelps.)

But hang on, doesn’t it work the other way, too? Can’t we say “that big long explanation about masses and distances is far more complicated than just saying ‘angels did it’, so we should just say that”?

Well, no… remember that what we’re trying to do is find the simplest *explanation* for a phenomenon. If you accept gravity as an explanation, that’s a single explanation for everything. If you use the angel explanation, you have to ask about every apparent act of gravity “Why did that happen?” and get the answer “angel number forty-nine trillion decided to push that molecule in that direction” – you’re just shifting all the complexity into the word ‘angel’, not getting rid of it.

So the question now is: what do we mean by ‘explanation’? After all, nothing is ever ultimately explained. We ask why things fall to the ground, we get ‘because gravity’. We ask why gravity exists, and after a few centuries we discover it’s because mass warps space-time. We ask why that happens… and so far answer came there none. Ultimately with *any* question you can keep asking ‘why?’ and at some point we hit the boundaries of what is explicable. Does this mean that there’s no such thing as an explanation?

Clearly it doesn’t – we have an intuitive understanding of what the word ‘explanation’ means – but how can we formalise that understanding in a way that allows us to discuss it properly?

I would suggest this as a rough definition – **something counts as an explanation if it is the answer to two separate questions**.

By which I mean, if the force of gravity were *only* the answer to the question “why do things fall down?” then it would be no answer at all, really – it’s just shifting the problem across. “Things fall because there is a force of things-fallingness” sounds like an explanation to many people, but it doesn’t actually tell you anything new.

However, gravity is *also* the answer to the question “why do planets go in elliptical orbits around the sun?” – two apparently unrelated facts, things falling and planets going in orbit, can be explained by the same principle.

This kind of explanation can happen in all the sciences – and explanations can even cross sciences. Take cancer as an example. There are several diseases that we call cancer (lung cancer is not the same disease as leukaemia is not the same disease as a brain tumour), and they all have the same explanation – a cell starts replicating too much, and the replicated cells themselves also reproduce too fast. They compete for resources with the normal cells, and eventually starve them out, because they can reproduce faster. That explanation works for all the different diseases we call cancer, whatever their outcomes, and whatever their original cause.

But that explanation can then even be taken off into other fields. I once worked for a company that wasn’t making very many sales, and had the sales people on a salary, not just commission. They took on more sales staff, because they weren’t making very many sales – but the new sales staff didn’t make enough more sales to justify their salaries. So they took on more sales staff, because they weren’t making very many sales…

I realised, just looking at the organisation, that the sales department had literally become a cancer in the business. It was draining the business’ resources and using them to grow itself at a frightening rate while the rest of the business was being starved. I quit that job, and within six months the company had been wound up.

That’s the power of a really good explanation – it will be applicable to multiple situations, and tell you what is happening in all of them. The explanation “parts of a system that take resources from the rest of the system to grow at a rapid rate without providing resources back to the rest of the system will eventually cause the system to collapse” works equally well for biological systems and for companies. That principle is a powerful explanation, and it’s the simplest one that will make those predictions.

So now we have the two most important tools of empiricism, the basis of science – we have the concept of the simplest explanation that fits the facts, and we have the idea of feedback. Those two are all you *need* to be doing science – and we’ll come back to both of them later, when we talk about Bayes’ Theorem, Solomonoff Induction and Kolmogorov Complexity – but if those are your only tools it’ll take you a while to get anywhere. We also need to be able to think rigorously about our results, and the best tool we have for that is mathematics. Next, we’ll look at proof by contradiction, the oldest tool for rigorous mathematical thinking that we know of.

## How We Know What We Know: 1 – Feedback

One of the reasons I’ve started this series of posts is because I have a huge respect for the scientific method – in fact, I’d go so far as to say that I think the scientific method is the only means we have of actually knowing anything about the world, or indeed anything at all – but I think that even many other people who claim to believe science to be important don’t fully understand how it works. I also think that many of the people who do know how the scientific method works are not fully aware of the implications of this.

This is not to say, of course, that I am an authority or an expert – in fact, questioning authority and experts is one of the things that defines the scientific method – but it does mean that I’ve thought about this stuff a lot, and might have something worthwhile to say.

To start with, let’s look at what the scientific method isn’t. When I talk about the scientific method here I’m talking about what is, in effect, a Platonic ideal version of science. Science as it is actually practiced has all sorts of baggage that comes with being a human being, or with working in a university environment. Try and imagine here that I am talking about the things that a hypothetical alien race’s science would have in common with ours.

The most important thing for us to note as being unnecessary for science is peer review. That’s not to say peer review is a bad thing – in fact it can be a very good thing, a way to separate out crackpottery from real science, and more importantly a way to discover what your embarrassing mistakes are before you become committed to believing in them – but it’s not necessary for doing science. That can be shown rather easily by the fact that neither Newton’s *Principia* nor Darwin’s *On The Origin Of Species* was peer-reviewed, but it would be hard to argue that Newton and Darwin weren’t scientists.

More importantly, there’s some evidence that peer review actually doesn’t do any better at telling good science from bad than choosing at random. I have some problems with the methodology of that study (I think meta-analyses are, if anything, actively bad science rather than just being neutral as peer review is), but other studies have shown that in fact the majority of published studies in peer-reviewed journals are likely to be false.

So if I’m not talking about science-as-it-is-practiced, with all its flaws and human errors, what am I talking about? What is the core of the scientific method?

Well, the first, and most important, part is feedback.

Feedback may be the single most important concept in science – so much so that it’s been reinvented under different names in several different disciplines. Feedback is the name it’s given in cybernetics – the science of control systems, which is what I’m most familiar with – and in information theory and engineering. In computer programming it’s known as recursion. In biology it’s known as evolution by natural selection. And in mathematics it’s called iteration. All of these are the same concept.

Feedback is what happens when the output of a system is used as one of the inputs (or the only input) of that system. So musicians will know that if you prop an electric guitar up against an amp, or have your microphone too near a speaker, you quickly get a high-pitched whining tone. That’s because the tone from the speaker is going into the guitar’s pickups, or into the mic, in such a way that the low frequencies cancel out while the high frequencies add up. The sound goes straight out of the speaker and back into the pickup or mic, and can quickly become overwhelmingly loud.

That’s what we call ‘positive feedback’. Positive feedback leads to exponential growth very quickly – in fact it’s pretty much always the cause of exponential growth. We can see how easily this happens using a computer program:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $myNumber = 2;

# As long as $myNumber is greater than 0 (which it always is), print it
# to the screen and then multiply it by itself. The loop never stops:
# once the number overflows, Perl treats it as infinity, which is still > 0.
while ( $myNumber > 0 ) {
    print $myNumber . " ";
    $myNumber *= $myNumber;
}
```

This program starts with the number two, multiplies it by itself, and then takes the number it gets and uses that as its input, multiplying it by itself. When I ran this program on my computer, the numbers got so big that the computer couldn’t cope with them before I had a chance to blink – it just kept saying the answer was infinity. The first few outputs, though, were 2, 4, 16, 256, 65536, 4294967296, 1.84467440737096 x 10^19. That last number is roughly a two with nineteen noughts following it, for those of you who don’t know exponential notation.

So positive feedback can make things change a huge amount very, very quickly. So what does negative feedback do?

Negative feedback does the opposite, of course, which means that it keeps things the same. The easiest example of negative feedback at work I can think of is a thermostat. A thermostat is set for a temperature – say eighteen degrees – and controls a heating and a cooling device. When the temperature hits nineteen degrees, it turns the heater off and the cooler on, and when it hits seventeen it turns the cooler off and the heater on. Again, the output (the temperature) is being used as the input, but this time the output does the opposite of what the input is doing – if the input moves up the output moves down – and so it keeps it steady.
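The thermostat rule above can be sketched in a few lines of Python. The switching rule comes straight from the description: heater off and cooler on at nineteen degrees, the reverse at seventeen. The "physics" is an invented assumption to make the loop run: whichever device is on moves the temperature by half a degree per step.

```python
def thermostat_step(temperature, heater_on, cooler_on, target=18.0, band=1.0):
    """Return the new (heater_on, cooler_on) state for the current temperature."""
    if temperature >= target + band:      # too warm: heater off, cooler on
        return False, True
    if temperature <= target - band:      # too cold: cooler off, heater on
        return True, False
    return heater_on, cooler_on           # in between: leave both as they are


def simulate(start_temp, steps=60, change_per_step=0.5):
    """Run the feedback loop: the temperature (output) is fed back in as the input."""
    temperature, heater_on, cooler_on = start_temp, False, False
    history = []
    for _ in range(steps):
        heater_on, cooler_on = thermostat_step(temperature, heater_on, cooler_on)
        # Invented toy physics: the active device shifts the temperature a fixed amount.
        if heater_on:
            temperature += change_per_step
        if cooler_on:
            temperature -= change_per_step
        history.append(temperature)
    return history


print(simulate(10.0)[-6:])
print(simulate(30.0)[-6:])
```

Whether it starts at ten degrees or at thirty, the temperature ends up bouncing between seventeen and nineteen. That is the negative feedback holding it steady.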

Negative feedback is used in all control systems, because negative feedback looks just like an intelligence trying to find a particular goal. That’s because it *is* how intelligent agents (like people) try to get to their goals.

Imagine you’re driving a car – the input is what you see through the windscreen, while the output is the way your hands turn the steering wheel. You want to go in a straight line, but you see that the car is veering to the left – as a result, you turn the steering wheel slightly to the right. If it veers to the right, you turn the steering wheel to the left. If you’re a good driver, this feedback becomes almost automatic and you do this in a series of almost imperceptible adjustments. (If you’re me, you veer wildly all over the road and your driving instructor quits in fear for his life).
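The driving example is negative feedback too, but with corrections proportional to how far off you are. This is a toy Python sketch, and the `gain` parameter is my own invention. So is the assumption that each correction moves the car sideways by exactly that amount. A moderate gain pulls the car back to the line, while a large one overcorrects:

```python
def steer(offset, gain):
    """Correction to apply: drifting left (positive offset) means turning right (negative)."""
    return -gain * offset


def drive(initial_offset, steps=20, gain=0.5):
    """Return the car's distance left of the straight line (negative = right) after each step."""
    offset = initial_offset
    path = [offset]
    for _ in range(steps):
        # Assumption: each correction moves the car sideways by exactly that amount.
        offset += steer(offset, gain)
        path.append(offset)
    return path


print([round(x, 3) for x in drive(2.0, steps=5)])            # careful driver
print([round(x, 1) for x in drive(1.0, steps=5, gain=2.5)])  # overcorrecting
```

With `gain=0.5` the drift halves every step. With `gain=2.5` each correction overshoots by more than the original error, so the car swings further and further from side to side.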

So what happens when you put positive and negative feedback together? The answer is you get evolution by natural selection.

A lot of people, for some reason, seem to have difficulty grasping the idea of evolution (and not just religious fundamentalists, either). Evolution by natural selection is actually a stunningly simple idea – if you get something that copies itself (like an amoeba, or a plant, or a person), eventually you’ll get tons of copies of it all over the place – positive feedback. But things that copy themselves need resources – like food and water – in order to make more copies. If there aren’t enough resources for everything, then some of them will die (negative feedback from the environment – the environment ‘saying’ “OK, we’ve got enough of you little replicators now”).

Only the ones that live will be able to make more copies of themselves, so if some of the copies are slightly different (giraffes with longer necks, or people who are clever enough to avoid being eaten by sabre-toothed tigers), the ones whose differences help them live longest will make the most copies.

And those differences will then be used as the starting point for the next rounds of feedback, both positive and negative – so the differences get amplified very quickly when they’re useful, and die off very quickly when they’re useless, so you soon end up with giraffes whose necks are taller than my house, and humans who can invent quantum physics and write *Finnegans Wake*, within what is, from the point of view of the universe, the blink of an eye.
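The positive-plus-negative-feedback recipe is simple enough to simulate. This is a deliberately crude Python sketch, not a model of real giraffes. Everything in it is invented for illustration: the size of the copying errors, the fixed food supply of `capacity` giraffes, and the assumption that longer necks always win the food.

```python
import random


def evolve(generations, capacity=20, start_neck=2.0, seed=0):
    """Toy giraffes: copying (positive feedback) plus limited food (negative feedback)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    population = [start_neck] * capacity
    for _ in range(generations):
        # Positive feedback: every giraffe leaves two young, copied with small random errors.
        young = [neck + rng.gauss(0.0, 0.1) for neck in population for _ in range(2)]
        # Negative feedback: only `capacity` can be fed. Assumption: the longest necks
        # reach the most leaves, so they are the ones that survive to breed.
        young.sort(reverse=True)
        population = young[:capacity]
    return population


for gens in (0, 10, 50):
    necks = evolve(gens)
    print(gens, len(necks), round(sum(necks) / len(necks), 2))
```

Nothing in the code aims for long necks. Selection alone pushes the average neck length up, generation after generation, while the population stays at the size the food allows.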

But what has that to do with the scientific method?

Everything – in fact, in essence, it *is* the scientific method.

To do science, you need to do three – and only three – things. You need to have a hypothesis, perform an experiment to test that hypothesis, and revise your hypothesis in accordance with the result. It’s a process exactly like that of natural selection.

In particular, for science we want negative feedback – we desperately want to prove ourselves wrong. We come up with a hypothesis – let’s say “All things fall to the ground, except computer monitors, which float”. We now want to see if our hypothesis will survive, just like our giraffes or people did. So we want negative feedback. So we have to ask: what test could prove us wrong?

What we don’t want is a test that seems to confirm our hypothesis – that’s boring. We got our hypothesis from looking at the world – maybe I dropped a cup on the floor and it broke (that’s where positive feedback from the environment comes in – we need something from the environment to start the ball rolling). So we don’t want to run a test where we already know the answer – we’re not trying to prove to ourselves that we’re right. So we don’t try dropping another cup.

A test that might go wrong there is dropping a computer monitor. If we try that, we discover that our initial hypothesis was wrong – computer monitors don’t float. So we revise our hypothesis – maybe to “All things fall to the ground, and if you put your foot under a monitor when you drop it, it really hurts” – and then we test the new hypothesis.
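The hypothesis, test, revise loop can be sketched the same way. In this toy Python version, the "world", the objects and the dictionary format for a hypothesis are all invented for illustration. It skips tests whose answers we already know and revises whenever a prediction fails:

```python
def world(obj):
    """Stand-in for reality in this toy: everything falls when dropped."""
    return "falls"


def predict(hypothesis, obj):
    """What the hypothesis says will happen when we drop `obj`."""
    return hypothesis["exceptions"].get(obj, hypothesis["default"])


def experiment_and_revise(hypothesis, candidates, already_known):
    """Test only what we don't already know, revising whenever a prediction fails."""
    tests_run = []
    for obj in candidates:
        if obj in already_known:
            continue  # dropping another cup teaches us nothing new
        observed = world(obj)
        tests_run.append(obj)
        if predict(hypothesis, obj) != observed:
            exceptions = dict(hypothesis["exceptions"])
            if observed == hypothesis["default"]:
                exceptions.pop(obj, None)   # the special case was wrong: drop it
            else:
                exceptions[obj] = observed  # a genuine exception: record it
            hypothesis = {"default": hypothesis["default"], "exceptions": exceptions}
        already_known.add(obj)
    return hypothesis, tests_run


first_guess = {"default": "falls", "exceptions": {"monitor": "floats"}}
revised, tests_run = experiment_and_revise(first_guess, ["cup", "monitor"], {"cup"})
print(tests_run)  # ['monitor']
print(revised)    # {'default': 'falls', 'exceptions': {}}
```

Dropping another cup is skipped. Dropping the monitor refutes the special case, which is removed, leaving the simpler "all things fall".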

When your hypothesis matches experiment time and again – when everything you or anyone else can think to throw at it, that might prove it wrong, matches what your hypothesis says – then you’ve got a theory you can use to make predictions. You’ve suddenly got the ability to predict the future! That’s pretty impressive, for something that is, in essence, no different from what my guitar does when leaned against an amp.

You can also use it to ‘predict’ the past, in the same way – which is why things like paleontology are sciences, and why social sciences like history are called social sciences rather than arts. You can do the same thing there, except that the experiments involve looking for things that have already happened but that you don’t yet know about, rather than trying new things and seeing what happens. You might, for example, come up with the hypothesis “*Tyrannosaurus rex* was actually a vegetarian.” Using that hypothesis, you’d make various predictions – that if you looked at a *T. rex* skull it would have lots of flat teeth, suitable for grinding vegetation, for example. Then you’d go and look at the skull, and examine the teeth, and see that in fact it had tons of razor-sharp teeth suitable for ripping flesh, and revise your hypothesis, maybe coming up with “*Tyrannosaurus rex* was actually *not* a vegetarian.”

(Apologies to my friends Mike and Debi, whose field I have grossly oversimplified there).

This is the big difference between scientists and other groups – like conspiracy theorists or a sadly-large number of politicians. Conspiracy theorists go looking for evidence that confirms their ‘theories’, and they find it. You can always find confirmation of anything, if you’re willing to ignore enough negative evidence. If you go looking for evidence that you’re wrong – and you do so sincerely, and invite others to aid you in your search – and you don’t find it, you’re probably right.

Next week – how to choose between alternative theories.
