How We Know What We Know 2: Occam’s Razor

(Continues from part one)

So far we’ve examined how we form a scientific theory. What we need to know now is what makes a *good* theory – how do we choose between two theories which make the same predictions?
The answer is a principle which has been known since the fourteenth century, but which is still widely misunderstood – Occam’s Razor.
What Occam’s Razor says is that when given two competing explanations, all things being equal, we should prefer the simpler one.

Intuitively, this makes sense – if we have two explanations of why telephones ring, one of which is “electrical pulses are sent down a wire” and the other is “electrical pulses are sent down a wire, except for my phone, which has magic invisible pixies which make a ringing noise and talk to me in the voices of my friends”, we can be pretty confident in dismissing the second explanation and thinking no more about it – it introduces additional unnecessary complexities into things.

It is important, however, to note that this only applies if the two competing hypotheses make the same predictions. If the magic pixie hypothesis also predicted, for example, that none of my friends would remember any of the phone calls I remembered having with them (because they were really with the pixies) then if that were correct we would have a good reason for preferring the more complex hypothesis over the less complex one – it would explain the additional datum. (In reality, we would need slightly more evidence than just my friends’ forgetfulness before we accepted the pixie hypothesis, but it would be a way to distinguish between the two hypotheses).

Another example – “There is a force that acts on all bodies, such that they are attracted to other bodies in proportion to the product of their masses and in inverse proportion to the square of the distance between them”. Compare to “Angels push all bodies, in such a way that they move in the same way that they would if there were a force that acted upon them, such that they were attracted to other bodies in proportion to the product of their masses and in inverse proportion to the square of the distance between them”. The two hypotheses make the same predictions, so we go with Newton’s theory of universal gravitation rather than the angel theory. If we discovered that if we asked the angels very nicely by name to stop pushing they would, we would have a good reason to accept the angel hypothesis.
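
(For reference – and using the standard textbook symbols rather than anything from this post – that force can be written as

F = G m_1 m_2 / r^2

where F is the force, m_1 and m_2 are the two masses, r is the distance between them, and G is the gravitational constant. The angel version has to reproduce exactly the same numbers, which is rather the point.)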

A third, real-life example – “life-forms evolve by competing for resources, with those best able to gain resources surviving to reproduce. Over many millions of years, this competition gives rise to the vast diversity of life-forms we see around us.” versus “God made every life form distinctly, just over six thousand years ago, and planted fake evidence to make it look like life forms evolve by competing for resources, with those best able to gain resources surviving to reproduce and giving rise to the vast diversity of life-forms we see around us, in order to test our faith.”

Any possible piece of evidence for the first hypothesis is a piece of evidence for the second, and vice versa. Under those circumstances, we need to discard the second hypothesis. (Note that in doing so we are not discarding the God hypothesis altogether – this comparison says nothing about the God or gods believed in by intelligent religious people such as, say, Andrew Rilstone or Fred Clark, though of course there may well be equally good arguments against those deities. But it does give us more-than-ample reason to dismiss without further thought the vicious, evil deities worshipped by Tim LaHaye or Fred Phelps.)

But hang on, doesn’t it work the other way, too? Can’t we say “that big long explanation about masses and distances is far more complicated than just saying ‘angels did it’, so we should just say that”?

Well, no… remember what we’re trying to do is find the simplest explanation for a phenomenon. If you accept gravity as an explanation, that’s a single explanation for everything. If you use the angel explanation, you have to ask about every apparent act of gravity “Why did that happen?” and get the answer “angel number forty-nine trillion decided to push that molecule in that direction” – you’re just shifting all the complexity into the word ‘angel’, not getting rid of it.

So the question now is what do we mean by ‘explanation’? After all, nothing is ever ultimately explained. We ask why things fall to the ground, we get ‘because gravity’. We ask why does gravity exist, and after a few centuries we discover it’s because mass warps space-time. We ask why that happens… and so far answer came there none. Ultimately with *any* question you can keep asking ‘why?’ and at some point we hit the boundaries of what is explicable. Does this mean that there’s no such thing as an explanation?

Clearly it doesn’t – we have an intuitive understanding of what the word ‘explanation’ means – but how can we formalise that understanding in a way that allows us to discuss it properly?

I would suggest this as a rough definition – something counts as an explanation if it is the answer to two separate questions.

By which I mean, if the force of gravity were *only* the answer to the question “why do things fall down?” then it would be no answer at all, really – it’s just shifting the problem across. “Things fall because there is a force of things-fallingness” sounds like an explanation to many people, but it doesn’t actually tell you anything new.

However, gravity is *also* the answer to the question “why do planets go in elliptical orbits around the sun?” – two apparently unrelated facts, things falling and planets going in orbit, can be explained by the same principle.

This kind of explanation can happen in all the sciences – and explanations can even cross sciences. Take cancer as an example. There are several diseases that we call cancer (lung cancer is not the same disease as leukaemia is not the same disease as a brain tumour), and they all have the same explanation – a cell starts replicating too much, and the replicated cells themselves also reproduce too fast. They compete for resources with the normal cells, and eventually starve them out, because they can reproduce faster. That explanation works for all the different diseases we call cancer, whatever their outcomes, and whatever their original cause.

But that explanation can then even be taken off into other fields. I once worked for a company that wasn’t making very many sales, and had the sales people on a salary, not just commission. They took on more sales staff, because they weren’t making very many sales – but the new sales staff didn’t make enough more sales to justify their salaries. So they took on more sales staff, because they weren’t making very many sales…

I realised, just looking at the organisation, that the sales department had literally become a cancer in the business. It was draining the business’ resources and using them to grow itself at a frightening rate while the rest of the business was being starved. I quit that job, and within six months the company had been wound up.

That’s the power of a really good explanation – it will be applicable to multiple situations, and tell you what is happening in all of them. The explanation “parts of a system that take resources from the rest of the system to grow at a rapid rate without providing resources back to the rest of the system will eventually cause the system to collapse” works equally well for biological systems and for companies. That principle is a powerful explanation, and it’s the simplest one that will make those predictions.

So now we have the two most important tools of empiricism, the basis of science – we have the concept of the simplest explanation that fits the facts, and we have the idea of feedback. Those two are all you *need* to be doing science – and we’ll come back to both of them later, when we talk about Bayes’ Theorem, Solomonoff Induction and Kolmogorov Complexity – but if those are your only tools it’ll take you a while to get anywhere. We also need to be able to think rigorously about our results, and the best tool we have for that is mathematics. Next, we’ll look at proof by contradiction, the oldest tool for rigorous mathematical thinking that we know of.

How We Know What We Know: 1 – Feedback

One of the reasons I’ve started this series of posts is because I have a huge respect for the scientific method – in fact, I’d go so far as to say that I think the scientific method is the only means we have of actually knowing anything about the world, or indeed anything at all – but I think that even many other people who claim to believe science to be important don’t fully understand how it works. I also think that many of the people who do know how the scientific method works are not fully aware of the implications of this.

This is not to say, of course, that I am an authority or an expert – in fact, questioning authority and experts is one of the things that defines the scientific method – but it does mean that I’ve thought about this stuff a lot, and might have something worthwhile to say.

To start with, let’s look at what the scientific method isn’t. When I talk about the scientific method here I’m talking about what is, in effect, a Platonic ideal version of science. Science as it is actually practiced has all sorts of baggage that comes with being a human being, or with working in a university environment. Try and imagine here that I am talking about the things that a hypothetical alien race’s science would have in common with ours.

The most important thing for us to note as being unnecessary for science is peer review. That’s not to say peer review is a bad thing – in fact it can be a very good thing, a way to separate out crackpottery from real science, and more importantly a way to discover what your embarrassing mistakes are before you become committed to believing in them – but it’s not necessary for doing science. That can be shown rather easily by the fact that neither Newton’s Principia nor Darwin’s On The Origin Of Species was peer-reviewed, but it would be hard to argue that Newton and Darwin weren’t scientists.

More importantly, there’s some evidence that peer review actually doesn’t do any better at telling good science from bad than choosing at random. I have some problems with the methodology of that study (I think meta-analyses are, if anything, actively bad science rather than just being neutral as peer review is), but other studies have shown that in fact the majority of published studies in peer-reviewed journals are likely to be false.

So if I’m not talking about science-as-it-is-practiced, with all its flaws and human errors, what am I talking about? What is the core of the scientific method?

Well, the first, and most important, part is feedback.

Feedback may be the single most important concept in science – so much so that it’s been reinvented under different names in several different disciplines. Feedback is the name it’s given in cybernetics – the science of control systems, which is what I’m most familiar with – and in information theory and engineering. In computer programming it’s known as recursion. In biology it’s known as evolution by natural selection. And in mathematics it’s called iteration. All of these are the same concept.

Feedback is what happens when the output of a system is used as one of the inputs (or the only input) of that system. So musicians will know that if you prop an electric guitar up against an amp, or have your microphone too near a speaker, you quickly get a high-pitched whining tone. That’s because the tone from the speaker is going into the guitar’s pickups, or into the mic, in such a way that the low frequencies cancel out while the high frequencies add up. The sound goes straight out of the speaker and back into the pickup or mic, and can quickly become overwhelmingly loud.

That’s what we call ‘positive feedback’. Positive feedback leads to exponential growth very quickly – in fact it’s pretty much always the cause of exponential growth. We can see how easily this happens using a computer program:

#!/usr/bin/perl

$myNumber = 2;

while ( $myNumber > 0 ) {

    print $myNumber . " ";
    $myNumber *= $myNumber;

    # This says that as long as myNumber is greater than
    # 0 – which it always is – the program should
    # multiply it by itself, after printing it to the
    # screen.

}

This program starts with the number two, multiplies it by itself, and then takes the number it gets and uses that as its input, multiplying it by itself. When I ran this program on my computer, the numbers got so big that the computer couldn’t cope with them before I had a chance to blink – it just kept saying the answer was infinity. The first few outputs, though, were 2, 4, 16, 256, 65536, 4294967296, 1.84467440737096 x 10^19. That last number is roughly a two with nineteen noughts following it, for those of you who don’t know exponential notation.
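
(To see why it runs away so fast: each pass through the loop squares the previous number, so – writing it in standard notation, which isn’t anything that appears in the program itself – the sequence is

x_{n+1} = x_n^2, with x_0 = 2, so x_n = 2^{2^n}

– the exponent itself doubles on every pass, which is why the seventh number printed out is already 2^64.)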

So positive feedback can make things change a huge amount very, very quickly. So what does negative feedback do?

Negative feedback does the opposite, of course, which means that it keeps things the same. The easiest example of negative feedback at work I can think of is a thermostat. A thermostat is set for a temperature – say eighteen degrees – and controls a heating and a cooling device. When the temperature hits nineteen degrees, it turns the heater off and the cooler on, and when it hits seventeen it turns the cooler off and the heater on. Again, the output (the temperature) is being used as the input, but this time the output does the opposite of what the input is doing – if the input moves up the output moves down – and so it keeps it steady.
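
To make that concrete, here’s a quick Perl sketch of a thermostat like that. The seventeen- and nineteen-degree thresholds match the example above, but the half-degree-a-minute heating and cooling rates are just numbers I’ve picked for illustration:

#!/usr/bin/perl
use strict;
use warnings;

# A toy thermostat. The room temperature is the system's output,
# and it is fed straight back in as the input that decides which
# device runs - negative feedback.

my $temperature = 18;

# 1 means the heater is on, 0 means the cooler is on - one of them always runs.
my $heating = 1;

for my $minute ( 1 .. 20 ) {

    if    ( $temperature >= 19 ) { $heating = 0; }    # too warm: heater off, cooler on
    elsif ( $temperature <= 17 ) { $heating = 1; }    # too cold: cooler off, heater on

    # Made-up rates: the heater warms the room by half a degree a
    # minute, and the cooler cools it by the same amount.
    $temperature += $heating ? 0.5 : -0.5;

    printf "minute %2d: %.1f degrees, %s on\n",
        $minute, $temperature, $heating ? "heater" : "cooler";
}

If you run it, the temperature just bounces gently between about seventeen and nineteen degrees forever – which is exactly the ‘keeping things the same’ behaviour we want from negative feedback.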

Negative feedback is used in all control systems, because negative feedback looks just like an intelligence trying to find a particular goal. That’s because it is how intelligent agents (like people) try to get to their goals.

Imagine you’re driving a car – the input is what you see through the windscreen, while the output is the way your hands turn the steering wheel. You want to go in a straight line, but you see that the car is veering to the left – as a result, you turn the steering wheel slightly to the right. If it veers to the right, you turn the steering wheel to the left. If you’re a good driver, this feedback becomes almost automatic and you do this in a series of almost imperceptible adjustments. (If you’re me, you veer wildly all over the road and your driving instructor quits in fear for his life).

So what happens when you put positive and negative feedback together? The answer is you get evolution by natural selection.

A lot of people, for some reason, seem to have difficulty grasping the idea of evolution (and not just religious fundamentalists, either). Evolution by natural selection is actually a stunningly simple idea – if you get something that copies itself (like an amoeba, or a plant, or a person), eventually you’ll get tons of copies of it all over the place – positive feedback. But things that copy themselves need resources – like food and water – in order to make more copies. If there aren’t enough resources for everything, then some of them will die (negative feedback from the environment – the environment ‘saying’ “OK, we’ve got enough of you little replicators now”).

Only the ones that live will be able to make more copies of themselves, so if some of the copies are slightly different (giraffes with longer necks, or people who are clever enough to avoid being eaten by sabre-toothed tigers), the ones whose differences help them live longest will make the most copies.

And those differences will then be used as the starting point for the next rounds of feedback, both positive and negative – so the differences get amplified very quickly when they’re useful, and die off very quickly when they’re useless, so you soon end up with giraffes whose necks are taller than my house, and humans who can invent quantum physics and write Finnegans Wake, within what is, from the point of view of the universe, the blink of an eye.
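
Again, you can see the shape of this in a few lines of Perl. This is a deliberately silly toy – the ‘creatures’ are just numbers standing for how good they are at getting resources, and the population cap and the size of the random variation are invented – but it shows positive feedback (copying) and negative feedback (limited resources) working together:

#!/usr/bin/perl
use strict;
use warnings;

# Each 'creature' is just a number: how good it is at getting resources.
my @population = (1.0) x 10;    # start with ten identical replicators
my $resources  = 50;            # the environment only supports fifty

for my $generation ( 1 .. 20 ) {

    # Positive feedback: everything copies itself, imperfectly.
    my @offspring;
    for my $fitness (@population) {
        push @offspring, $fitness, $fitness + ( rand() - 0.5 ) * 0.1;
    }

    # Negative feedback: only the best competitors for the limited
    # resources survive to the next generation.
    @offspring = sort { $b <=> $a } @offspring;
    splice @offspring, $resources if @offspring > $resources;
    @population = @offspring;

    printf "generation %2d: population %2d, best fitness %.3f\n",
        $generation, scalar @population, $population[0];
}

Run it and the population shoots up to the cap within a few generations, and after that the only thing that changes is that the surviving numbers creep upwards – useful differences get amplified, useless ones die off.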

But what has that to do with the scientific method?

Everything – in fact, in essence, it is the scientific method.

To do science, you need to do three – and only three – things. You need to have a hypothesis, perform an experiment to test that hypothesis, and revise your hypothesis in accordance with the result. It’s a process exactly like that of natural selection.

In particular, for science we want negative feedback – we desperately want to prove ourselves wrong. We come up with a hypothesis – let’s say “All things fall to the ground, except computer monitors, which float”. We now want to see if our hypothesis will survive, just like our giraffes or people did. So we want negative feedback. So we have to ask what test will prove us wrong?

What we don’t want is a test that seems to confirm our hypothesis – that’s boring. We got our hypothesis from looking at the world – maybe I dropped a cup on the floor and it broke (that’s where positive feedback from the environment comes in – we need something from the environment to start the ball rolling). So we don’t want to run a test where we already know the answer – we’re not trying to prove to ourselves that we’re right. So we don’t try dropping another cup.

A test that might go wrong there is dropping a computer monitor. If we try that, we discover that our initial hypothesis was wrong – computer monitors don’t float. So we revise our hypothesis – maybe to “All things fall to the ground, and if you put your foot under a monitor when you drop it, it really hurts” – and then we test the new hypothesis.

When your hypothesis matches experiment time and again – when everything you or anyone else can think to throw at it, that might prove it wrong, matches what your hypothesis says – then you’ve got a theory you can use to make predictions. You’ve suddenly got the ability to predict the future! That’s pretty impressive, for something that is, in essence, no different from what my guitar does when leaned against an amp.
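
If you like, you can even write the loop itself down in code. This is just a cartoon of the process – the ‘observations’ and the hard-coded hypotheses are made up to match the monitor example above – but the shape of it, predict-test-revise, is the whole of the method:

#!/usr/bin/perl
use strict;
use warnings;

# Each observation records whether the object fell when dropped.
my @observations = (
    { object => 'cup',     fell => 1 },
    { object => 'book',    fell => 1 },
    { object => 'monitor', fell => 1 },    # the test that could go wrong
);

# Start with the hypothesis from the example above.
my $name       = 'everything falls, except monitors, which float';
my $hypothesis = sub { my ($obs) = @_; $obs->{object} eq 'monitor' ? 0 : 1 };

for my $obs (@observations) {
    my $prediction = $hypothesis->($obs);
    next if $prediction == $obs->{fell};    # hypothesis survived this test

    # Negative feedback: the prediction failed, so revise the hypothesis.
    print "'$name' failed on the $obs->{object} - revising\n";
    $name       = 'everything falls';
    $hypothesis = sub { 1 };
}

print "best hypothesis so far: $name\n";

The important bit is the ‘next if’ line – a hypothesis only earns its keep by surviving tests that could have killed it.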

You can also use it to ‘predict’ the past, in the same way – which is why things like paleontology are sciences, and why social sciences like history are called social sciences rather than arts. You can do the same thing there, except that the experiments involve looking for things that have already happened but you don’t know, rather than trying new things and seeing what happened. You might, for example, come up with the hypothesis “Tyrannosaurus Rex was actually a vegetarian.” Using that hypothesis, you’d make various predictions – that if you looked at a T. Rex skull it would have lots of flat teeth, suitable for grinding vegetation, for example. Then you’d go and look at the skull, and examine the teeth, and see that in fact it had tons of razor-sharp teeth suitable for ripping flesh, and revise your hypothesis, maybe coming up with “Tyrannosaurus Rex was actually not a vegetarian.”

(Apologies to my friends Mike and Debi, whose field I have grossly oversimplified there).

This is the big difference between scientists and other groups – like conspiracy theorists or a sadly-large number of politicians. Conspiracy theorists go looking for evidence that confirms their ‘theories’, and they find it. You can always find confirmation of anything, if you’re willing to ignore enough negative evidence. If you go looking for evidence that you’re wrong – and you do so sincerely, and invite others to aid you in your search – and you don’t find it, you’re probably right.

Next week – how to choose between alternative theories.

How We Know What We Know: Introduction

I’ve been reading up a lot over the last few years about a large variety of subjects, not science as such but how we do science and how we actually know what we know. I’ve written about some of these things before, in Sci-Ence! Justice Leak!, but there I was looking at stuff for its science-fictional or storytelling possibilities.

However, I want to write about this stuff seriously. Partly, that’s to help organise my own thoughts – I’m an autodidact, and I’ve read a VAST amount without trying to organise it except in an ad hoc manner. But also, it’s because I find this stuff absolutely fascinating. So I’ve come up with a through-line, and I’m going to try to do a post a week for the next twelve weeks. I’m going to try to be properly accurate, but still convert this all into vernacular English.

What I’m going to talk about is the scientific method – what it is, why it’s important, and how developments in computer science have meant we can create and prove, based on a very small set of assumptions, a mathematically rigorous formulation of the scientific method. Not only that, but we can use that to prove what the optimal thing to do is in all circumstances (given enough computing power…)

There will be twelve parts to this series:

1 – Feedback
Explaining possibly the most important concept in human thought, and looking at the hypothesise-experiment-revise process in science.

2 – Occam’s Razor
The single most important tool in modern science, invented by a mediaeval friar.

3 – Proof By Contradiction
A mathematical technique, first formulated by Euclid, that’s the basis for much modern mathematics.

4 – Diagonal Proof
Georg Cantor’s proof and why it’s important

5 – Turing and Gödel
On notions of computability, and what a computer program is.

6 – Kolmogorov Complexity
What’s the smallest computer program that could print out this essay?

7 – Bayes’ Theorem
An 18th century vicar shows us how to make decisions in the absence of information.

8 – Ashby’s Law
Cybernetics and attempting to control the uncontrollable

9 – Thermodynamics and Shannon
What is information, and how is it related to chaos?

10 – Solomonoff Induction
How to predict the future

11 – Hutter’s algorithm
Universal artificial intelligence

Epilogue
In which we look at what we’ve learned.

This will be summarising stuff from many books and articles, but in particular The Fabric Of Reality by David Deutsch, Probability Theory — The Logic Of Science by E.T. Jaynes, Information Theory, Inference, and Learning Algorithms by David MacKay, some of the posts on the LessWrong group blog, the lectures in Scott Aaronson’s sidebar and An Introduction To Cybernetics by W. Ross Ashby. Mistakes are, of course, mine, not theirs. Part 1 in this series will come next week.

(More generally my plan at the moment is to have four big series of posts on the go – my Beach Boys reviews, starting up my Doctor Who reviews again, this series and a series of posts on Cerebus – all posting roughly weekly, with the other three days of the week left either for linkblogs or for rants on whatever comes to mind in comics or politics).

The Grandfather Paradox: Experimentally Resolved?

A revised and improved version of this essay is in my book Sci-Ence! Justice Leak! – hardback, paperback, PDF Kindle (US), Kindle (UK), all other ebook formats

I am utterly astounded that I’d never seen this before today – an experiment that may have more profound implications for our worldview than… maybe any experiment since the Michelson-Morley experiment?

I’m going to assume here that everyone knows about the Grandfather Paradox. This is just the simple question “What happens if you have a time machine, and go back and kill your granddad so you can never be born?”, the staple of many TV science fiction shows.

Now the normal answer to that question is “You can’t, so don’t be daft”. But for physicists, that’s not good enough – apart from anything else, General Relativity allows for the existence of ‘closed timelike curves’. These are paths through space-time that act much like paths through space – you can go in at one end and pop out the other – except that the other end is somewhere else in time as well as space. So it’s theoretically possible that you *could* do that, and we’d quite like to know what would happen if you did before everyone’s granddad starts retroactively never-having-existed.

Now, the main hypothesis in physics up to now has been, in effect, that it doesn’t matter. David Deutsch, a quantum computing expert at Oxford University, demonstrated that in quantum-mechanical terms you could have an outcome that makes sense so long as you accepted the many-worlds version of reality. Essentially, the probability that you were ever born, and the probability that you killed your grandfather, would both be 1/2 – or in other words the ‘you’ in a universe where you were born would travel to a universe where you were never born, kill your grandfather there, then come back to one where you’d never killed your grandfather. Nice and simple.

However, Seth Lloyd, a quantum physicist at MIT, never liked the many-worlds hypothesis (for reasons which, I have to say, make no sense at all to me), and he and a team of colleagues came up with another, simpler, idea, which is just that if you go back in time and try to shoot your grandfather, something will stop you. Maybe the gun will misfire, maybe you’ll be arrested, maybe your grandma was having an affair with the milkman and you’re his biological grandchild – something will just make sure that you can’t do that, because it would be cheating.

Now, there are huge, huge, MASSIVE problems with this – it gets rid of causality, it allows information to come from nowhere, and it just seems like a gigantic handwave. It makes no sense at all, and just seems like a desperate attempt to try to get out of the obvious, blatant, truth that the Many-Worlds interpretation is the only one consistent with the experiments and maths. When I first read about it, I thought it was just a neat way of avoiding the truth.

Unfortunately, it appears to be true. What I hadn’t realised was that they’d *actually done the experiment*!

Lloyd and his colleagues came up with an ingenious experiment, which I’m not entirely sure I’m capable of explaining, as it’s not really sunk in yet. This will be a GROSS oversimplification, and is just designed to get the idea across – please don’t kill me for inaccuracies. The full description is in the linked PDF. This is what Pratchett, Stewart and Cohen call lies-to-adults – the story is right, but each individual fact is wrong.

Essentially, photons (light particles) can be polarised a couple of ways, and they’ll only go through stuff that’s polarised the same way. That’s why Polaroid sun-glasses work – they block all the photons that are polarised the wrong way, so only let some light through.

Now, until something detects it, a photon isn’t in any particular polarisation – it’s in all of the possible polarisations at once. But once something has detected what kind of polarisation a photon is in, it’s always been that way – quantum causality works both ways in time. So you can set up an experiment that only detects photons of one polarisation, and that way you can send a message back to the past, to the photon emitter (light source) saying “Only send photons of this type”. If you do this the right way, you can send a photon back in time (but you can’t look at the photon that’s been sent back in time until it’s come back to the time you sent it from, or the experiment can’t work). That might sound mad, but it’s the way things are – accept it for now.

Now, by doing this, you can set up a kind of quantum ‘gun’ – set it up so that the photon going back in time tries to cancel out itself coming forward in time – all you do is put something in the middle that tries to change the polarisation of the backwards-in-time photon to the opposite of the forwards-in-time one. Changing polarisation is easy, and works about 96% of the time.

It never worked on the backwards-in-time photons.

This means that if you went back in time and tried to kill your grandfather, the gun really *would* misfire! Every time.

Now, assuming their experimental design wasn’t flawed and their maths works – and it looks OK to me, but I’m not a quantum physicist – then that means a lot of things:

Firstly, it means the universe is completely deterministic. There’s no such thing as chance.
Secondly, it’s strong evidence *against* the many-worlds hypothesis – the first such evidence I’ve ever heard of. It almost certainly means there’s a single universe.

Most interestingly, it means we can say goodbye to cause-and-effect. Effects can cause their own cause. For science-fiction fans, we’re living in the universe of Bill & Ted, the Doctor Who story Blink, and By His Bootstraps (EDIT: or of this rather nice short-short story by Simon Bucher-Jones), rather than Back To The Future or Pyramids Of Mars.

This of course means that access to a closed timelike curve (something that has never been observed in the real universe, but is theoretically possible) gives you essentially godlike powers. Got a closed timelike curve and want a million quid? Just put two pence in the bank and say “tomorrow, if my account has two million pounds or less in it, I’ll take half of the money out and bring it back today and stick it in the account.” So if tomorrow you’ve still got 2p, you’d go back and put an extra penny in, which means that actually tomorrow you’ve got 3p in, which means… and the only stable way that can work out (other than you dying or something over the next day) is for the million pounds just to appear in your bank account.

Want to write a bestselling novel? Decide to print out five hundred pages just covered with the letter “A” and send it to a publisher. If they publish it and it becomes a bestseller, you send that back to yourself. If they don’t, you print out all the letter “A” apart from one “B” at the end and send that back to yourself to try that, and repeat – the only stable outcome is that you have a novel arrive that you never actually wrote but that will be an instant bestseller. And so on.

The possibility of time-travel in a *single, consistent universe* has never been one that’s really been taken seriously before, because it was just so absurd. I’m still 90% sure that there must be a mistake somewhere – the many-worlds hypothesis, as odd as it may sound, is far, FAR less ridiculous than this. But this is one of those things where either in a few months we’ll have a very quiet paper by Lloyd saying “Oops, I was totally wrong about everything because I forgot to carry the one” or in a hundred years’ time we’ll have a totally new understanding of physics based around this paper. I really can’t see a middle ground here…

Bullet-Biters And Bomb-Testers

Sometimes serendipity happens. I was trying to think of a way to link together a couple of sections of the Hyperpost book, when I found this old post from Scott Aaronson’s blog Shtetl-Optimised.

In it, Aaronson talks about how he’d noticed that there was a lot of overlap between Libertarians and proponents of the Many-Worlds Hypothesis in quantum physics, and had tried to figure out why:

Some connections are obvious: libertarianism and MWI are both grand philosophical theories that start from premises that almost all educated people accept (quantum mechanics in the one case, Econ 101 in the other), and claim to reach conclusions that most educated people reject, or are at least puzzled by (the existence of parallel universes / the desirability of eliminating fire departments)…

My own hypothesis has to do with bullet-dodgers versus bullet-swallowers. A bullet-dodger is a person who says things like:

“Sure, obviously if you pursued that particular line of reasoning to an extreme, then you’d get such-and-such an absurd-seeming conclusion. But that very fact suggests that other forces might come into play that we don’t understand yet or haven’t accounted for. So let’s just make a mental note of it and move on.”

Faced with exactly the same situation, a bullet-swallower will exclaim:

“The entire world should follow the line of reasoning to precisely this extreme, and this is the conclusion, and if a ‘consensus of educated opinion’ finds it disagreeable or absurd, then so much the worse for educated opinion! Those who accept this are intellectual heroes; those who don’t are cowards.”

I think he’s on to something, but I think there’s a second aspect, which is what happens when those ideas actually hit reality.

Because Libertarianism and the Many Worlds Hypothesis have one big difference between them – one has immediate real-world consequences, and the other doesn’t. And that means that it is no longer a purely intellectual exercise.

Leaving aside whether the claims for Libertarianism (of the Ayn Rand type, which is what Aaronson is referring to) stack up logically, and assume for a moment one believes them to be correct, should you *act* as if you believe the claims to be correct? To take Aaronson’s example, should we privatise the fire service?

If you’re a libertarian, you believe the answer should be yes – that privatising the fire service would have the end result of fewer fires, and those fires being fought more cheaply. But what if you’re wrong? If you’re wrong, then the result would be people – potentially a lot of people – losing their homes.

So there’s a second level of calculation to be done here – how sure are you of your own reasoning ability and the information (your priors, in Bayesian terms) you use to come to your conclusions? *WHEN YOU FACTOR IN THE PROBABILITY OF YOU BEING WRONG* does the expected benefit if you’re right outweigh the expected loss if you’re wrong?
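
You can write that second-level calculation down in one line. If p is how confident you are that you’re right, B is the benefit if you are, and C is the cost if you’re not (p, B and C are just my shorthand here), then acting on the belief only makes sense when

p \cdot B > (1 - p) \cdot C

– and the point of the fire-service example is that C can be so enormous that even a very high p doesn’t save you.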

Now, on this blog I often fall into the ‘bullet biter’ side of things *when talking about ideas with no real-world immediate consequences*, because it’s both intellectually right and more interesting. But take the Many-Worlds hypothesis. I consider this the most likely of the various explanations of quantum theory I’ve read, and would put my confidence in that judgement at about 80% – I’m a bullet-biter there, and proud of it.

And I’m a bullet-biter when it comes to certain forms of alternative medicine. I’m convinced from the experimental evidence, for example, that taking certain vitamin supplements in large doses will massively decrease the risk of cancer, and have stated that on this blog too. And again, I’d put my confidence in that at about 80% (I rarely put my confidence in *anything* much above that).

Now, the downside with taking vitamins is that there’s a cost of maybe a pound a day and – if you believe the very worst possible reports, which as far as I can see have no evidentiary basis, but if we’re assuming I’m wrong we’re assuming I’m wrong – a very small increased risk of kidney stones. The benefit, if I’m right, is not getting cancer. An 80% chance of ‘not getting cancer’ outweighs a 20% chance of a 1% increase in kidney stones, so it’s worth the pound a day to me to put my money where my mouth is and actually take the vitamins.

On the other hand, one can come up with a real-world test for the Many-Worlds Hypothesis. If it’s true then, were I to stand at ground zero of a nuclear weapons test, I should expect to live through it. There would be a googolplex or so universes where I’d die instantly, but I would not experience those, because I’d die too quickly. On the other hand, there’d be a one-in-a-googolplex chance of me surviving, which according to Many-Worlds means there’s a universe where I *would* survive. That would be the only one I’d experience, so from my own point of view I’d survive.

But even though I am persuaded by the Many-Worlds hypothesis, I’m not going to try that one out.

However, there are people out there who *would* do it, who would say “No, I’ll be fine! Drop the bomb!” – let’s call them bomb-testers.

And I think while being a bullet-biter can be a good thing, being a bomb-tester never is.

A bullet-biter might say “I’m convinced the Singularity is coming, but I’ll give some money to Greenpeace just in case” while the bomb-tester would say “I’m convinced the Singularity is coming, so I’m not going to support environmental protection measures, because we’ll be gods in twenty years anyway”.
A bullet-biter might say “I’m convinced the Bible is literally true, but I’m not going to hurt anyone who thinks differently”. A bomb-tester would say “I’m convinced the Bible is literally true, so I’ll persecute homosexuals”.

I think a lot of people – particularly in the ‘skeptic’ community – think of themselves as being bullet-biters when they’re actually bomb-testers. They’ve reached a logical conclusion, and are going to act on that and damn the consequences. This is why some people say Richard Dawkins and fundamentalist Christians are the same kind of person – not because their beliefs are equally unjustifiable, but because they are both certain enough of their own rightness that they’ll act on it even when the downside of that action looks to the rest of us far worse than whatever upside they believe in.

Which is not to say that “acting on one’s beliefs” is a bad thing. One reason I have more respect for Eliezer Yudkowsky (of Less Wrong) than for other Singulatarians is that he’s willing to act on his beliefs (even though I don’t find his arguments convincing, and think he has more than a little of a Messianic streak at times). But his actions *take into account the possibility he’s wrong* – he’s acting in a way to minimise expected harm. If he’s right and he doesn’t act, the world will end. If he’s wrong and he does act, then he wastes his time and looks a fool. Were I to find his general arguments convincing, I’d be doing the same.

If you find yourself defending an intellectual position that others don’t hold, then you’re quite possibly an ‘intellectual hero’. But if you find yourself acting on that position without considering what might happen if you’re wrong, then you’ll end up a real-world villain…

Geeks Dig Metaphors: Paradigm A Dozen

All work and no play makes Jack a dull boy, all work and no play makes Jack a dull boy, all work…

This series of posts has become rather longer than the very short thing I was originally going to write, but we’re heading into the home stretch now. (Parts one, two and three for latecomers.)

This post is the part that inspired the overall title for this mini-series, and is probably going to be the least convincing. But I find it the most convincing.

You see, in large part I agree with the Singulatarians, and that’s precisely why I disagree with them.

Let me explain.

Belief in the Singularity is part of what we might call a ‘paradigm’ or ‘meme-plex’ (depending on precisely what species of wanker we are), or a world-view. It’s one that, in its broadest outlines, I share, and it is that the universe can be regarded as pure information.

People arrive at this position – a sort of scientific neo-Platonism – from a variety of scientific sources, but you can get to it from proper computer science (see Scott Aaronson’s wonderful series of lectures on Quantum Computing Since Democritus), information theory, cybernetics, quantum theory via either the Copenhagen or Many-Worlds interpretations, Bayes’ theorem, Solomonoff induction or probably a dozen other ways. Almost all these fields, incidentally, come originally from work by John von Neumann…

In brief, this world-view could be summarised as:

  • Most of modern science is more-or-less correct. In particular, relativity, evolution and quantum physics are largely correct
  • It makes no sense to talk about things that are outside of the physical world, such as souls or gods, unless those things can be proved to exist by some effect they have on the physical world
  • Any physical system can be modelled by a Turing machine, given enough time and memory
  • Any two things which are isomorphic are the same (the identity of indiscernibles)
  • The scientific method – form a hypothesis, make a prediction from that hypothesis, test the prediction, revise the hypothesis in light of the results – is the only way of obtaining accurate information about the universe
  • The mind is a purely physical process
  • If you want a book explaining this viewpoint in great detail, I recommend David Deutsch’s The Fabric Of Reality (which I reviewed here)

Now, most of this is stuff which is fairly sensible, and with which I (and I suspect most people) could agree. And it leads to the belief that both the universe and the human mind can be thought of in some sense as computer programs, or as mathematical formalisms.

(Those of you who know a little of the history of philosophy will now get why I referred to the attitude of Singulatarians as Panglossian in the last post – Doctor Pangloss in Candide being of course a satire of Leibniz, whose ideas are very much a 17th century precursor to this worldview).

At one extreme, this belief that the universe can be modelled as a computer program simply leads to things like Steve Yegge’s argument that we should treat questions like ‘what’s outside the universe?’ the same way we should treat an undef in programming. At the other, it leads to the ideas of mathematical physicist Max Tegmark, who argues that all mathematical formal systems have an objective reality in exactly the same way our universe does.

This worldview does impact on the Singulatarians, in a variety of ways, from shaping their view of the end result of the Singularity, to their thoughts on how it should be created (a lot of the discussions around the Singularity Institute involve people trying to come up with a rigorous decision theory, based on Bayesian probabilities, that would work in a quantum multiverse, because they believe this to be necessary for the creation of an artificial intelligence that won’t harm humanity).

But while this worldview is probably the closest we’ve got to a ‘correct understanding of the universe’ so far, it is only a model. And I think going from that model to statements that the mind ‘is’ a computer program, or that the universe ‘is’, is a step too far – confusing the map with the territory. Our models – our worldviews – are metaphors. They’re ways of understanding the universe. They’re not the actual universe itself, any more than Burns’ love really was a red red rose.

Every other model we’ve had of the universe so far – the Aristotelean worldview, the clockwork universe of Newton and so on – has proved incorrect. Those models all worked for a restricted domain – those cases that could be understood and measured at the time, and that people had bothered to check. But it was the edge cases – those areas in which those worldviews were stretched to their limits – that caused those models to fall down.

And every time, while the predictions made for things that were already known stayed the same (Aristotle, Newton and Einstein all predict that things will fall to the ground), the underlying view of the universe changed immeasurably, along with the predictions for the unknown.

Our knowledge of science is immeasurably better now than, say, a hundred years ago, but it’s not yet complete. It may never be, but no matter what, things like a quantum theory of gravity, if we ever find one, *will* bring with them new ways of looking at the world, and I have no doubt that saying the universe is a computer program, or that the human mind is one, will look as ridiculous as saying that things move towards their natural place based on how much earth, air, fire or water they contain.

The Singularity is, pretty much by definition, the place where our current thinking breaks down, even if you accept all the arguments for it. Now, either we’ve managed to get everything exactly right for the first time in history, and what’s more that getting everything exactly right will lead to immortality just before Ray Kurzweil would otherwise die, followed by the creation of heaven on Earth, or there’s a mistake in our current scientific thinking.

I’d like to believe the former, but I’m not putting money on it…

Geeks Dig Metaphors: The Technical Problems With The Singularity

Back to introduction

I have come to the conclusion that anyone who talks about how easy it’s going to be to simulate a human brain in a computer either understands computers but doesn’t understand biology, or doesn’t understand computers but understands biology. I’m currently studying for a Master’s in Bioinformatics, so I have an equal lack of understanding of both subjects.

The argument seems to be “the genome is like a computer program – it contains all the information needed to build a person. The genome’s only a few gigabytes long, so the Kolmogorov complexity of the ‘create a brain’ program must be less than that. We have computers that can run programs that long, so it’s only a matter of time before we can run a ‘create a brain’ program on our computers”.

Now, firstly, I simply don’t believe that one can reduce the problem in this way. Intuitively, it doesn’t make much sense. I have a little over 20GB of Beach Boys MP3s/FLACs on my hard drive. They couldn’t be compressed much more than that without loss of information. The human brain is supposed to be the most complex known object in the universe. I simply don’t believe that the most complex known object in the universe has a lower Kolmogorov complexity than the surf-pop harmony stylings of the Wilson brothers. I mean, I’ve not even counted my Jan and Dean MP3s in there!

But let’s ignore the intuitive problems, and also ignore various practical issues like epigenetic inheritance, and assume for the moment that the human genetic code is a computer program, and this 3GB (or “3/4 of the size of the directory which just contains my very favourite Beach Boys live bootlegs” in human terms) program will, if run on the correct hardware, produce a human body, including the brain. Here is where we hit the problem with the concept of Kolmogorov complexity, so freely bandied around by a lot of these people.

Basically, Kolmogorov complexity is a measure of the size of the smallest computer program that can produce a given output. For example, say we want to run a program that outputs “Hello World!” and a line break. In Perl (the language with which I’m most familiar) this would be:

#!/usr/bin/perl
print "Hello World!\n";

That’s 39 bytes long. This means that we know the Kolmogorov complexity of a Hello, World program must be 39 bytes or less. It might be possible to do it in fewer bytes in some other programming language, but we know that any program more than 39 bytes long isn’t the shortest possible program that does that.

Now, the reason Kolmogorov complexity is a useful measure is that it doesn’t vary *much* between languages and platforms. Say you have a program written in perl, but for some reason you want to run it in Java. ‘All’ you need to do is wrap it in another program, which converts perl to Java, so if your ‘convert perl to Java’ program is, say, 1.2 megabytes (that’s the size of the /usr/bin/perl program on my GNU/Linux system, which converts perl to machine code, so that’s a reasonable size), the length of the shortest Java program to do that thing must be at most the length of the perl program plus 1.2 megabytes.

As program size gets bigger, that ‘plus 1.2 megabytes’ gets swamped by the size of the program, so Kolmogorov complexity is a very good measure of the complexity of *getting a particular computer to perform a task*.
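
That informal argument is usually written down as what’s called the invariance theorem. In the standard notation (which doesn’t appear anywhere else in this post):

K_A(x) \le K_B(x) + c_{A,B}

– that is, the shortest program for machine or language A that outputs x is at most the length of the shortest program for B, plus a constant, where the constant is the size of a B-interpreter written for A (the ‘1.2 megabytes’ above) and doesn’t depend on x at all.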

But the problem is that it doesn’t take into account the complexity of *the hardware performing the task*, so it’s not very good when moving between vastly different types of hardware.

Take a jukebox as an example. If you want it to play, say, Good Vibrations by the Beach Boys, you look for the code for that (say, A11) and punch that into the jukebox, which executes that ‘program’. Now, that proves that the Kolmogorov complexity of the ‘play Good Vibrations’ program is at most a couple of bytes long.

But if I want my computer to play Good Vibrations, the simplest program that will do it is ‘playsound /media/disk/Beach\ Boys/Smiley\ Smile/Good\ Vibrations.mp3’ – that’s thirty-five times the length of the jukebox ‘program’. But that’s not all – you have to count the size of the ‘playsound’ program (15 kilobytes) and the MP3 file (3.8 megabytes). Moving our ‘program’ from the jukebox to my computer has made it several million times as long, because we’ve had to take information that was previously in hardware (the physical Beach Boys CD within the jukebox and the ability of the jukebox to play music) and convert it into software (the MP3 file and the playsound program).

Now, I never normally talk about my day job here, because I don’t want to give anyone an excuse to confuse my views with those of my employers, but it’s almost impossible not to, here. The lab in which I work produces a piece of software which allows you to run programs compiled for one kind of computer on another kind. The product I work on allows you to take programs which are compiled for computers with x86 processors (such as, almost certainly, the one you’re using to read this) that run GNU/Linux, and run them on machines which have POWER chips, which also run GNU/Linux.

Now, this program took many person-decades of work by some very, very bright people, and a huge amount of money, to develop. It’s a very complex, sophisticated piece of software. Every time even something relatively small is changed, it has to go through a huge battery of tests because something put in to make, say, Java run faster might make, for example, the Apache web server break. (This is lucky for me as it means I have a job). Even given this, it’s still not perfect – programs run slower on it than they would on an x86 box (sometimes not very much slower, but usually at least a bit slower), and there are some programs that can’t work properly with it (not many, but some). It’s astonishingly good at what it does, but what it does is, by necessity, limited. (To the extent that, for example, the programs still have to be for the same GNU/Linux distro the POWER machine is running – you can’t use it to run, for example, a Red Hat 5 program on a Red Hat 4 box).

Now, this hugely complex, sophisticated, expensive-to-create program converts from one chip type to another. But both chips are Von Neumann architectures. Both use the same peripheral devices, and the same interfaces to those devices. Both are designed by human beings. And the people writing that program have access to information about the design of both types of chip, and can test their program by running the same program on an x86 box and on a POWER box with their program and seeing what the result is.

Now, when it comes to the ‘program’ that is the genetic code, none of that’s true. In this case, the hardware+operating system is the cell in which the genetic code is embedded, plus the womb in which that cell gets embedded, the umbilical cord that brings it nutrients, the systems that keep the mother’s body temperature regulated, the hormone levels that change at different times… basically, instead of having two chips, both of which you can examine, and leaving everything else the same and trying to get your three gig program to run (which I know from experience is in itself a massive problem), you have to simulate an entire human being (or near as dammit) in software in order to run the genetic code program – which we’re running, remember, *in order to simulate part of a human being!*

And you have to do that with no access to source code, with no way of testing like-for-like (unless there are women who are lining up to be impregnated with randomly-altered genetic material to see what happens), and with the knowledge that the thing you’re creating isn’t just a computer program, but at least potentially a sentient being, so a coding error isn’t going to just cause someone to lose a day’s work because of a crash, but it’ll give this sentient being some kind of hideous genetic illness.

And what are you left with, at the end of that effort? A baby. One that can’t interact with the physical world, and that is many, many times slower than a real baby. And that will undoubtedly have bugs in (any computer program longer than about thirty lines has bugs in). And that requires huge amounts of energy to run.

I can think of more fun, more reliable ways of making a baby, should I happen to want one.

But the point is, that even that would be a phenomenal, incredible achievement. It would dwarf the Manhattan Project and the moon landings and the Human Genome Project. It would require billions in funding, thousands of people working on it, many decades of work, and several huge conceptual breakthroughs in both computer science and biology.

Which is not to say that it’s impossible. I’ve never seen a good argument against it being theoretically possible to create artificial general intelligence, and I’ve not seen any convincing ones against the possibility of uploading and emulating a particular person’s brain state. And assuming that technology continues to improve and civilisation doesn’t collapse, it may well happen one day. But people like Kurzweil arguing that the relatively small size of the genome makes it a trivial problem, one that will be solved in the next couple of decades, are like those people who drew graphs in 1950 showing that if top speeds achieved by humanity carried on increasing we’d be at the speed of light in the 1990s. The first example of that I saw was in Heinlein’s essay Pandora’s Box. The retro-futurology page has an examination of some of the other predictions Heinlein made in that essay. Suffice it to say, he didn’t do well. And Heinlein was far more intelligent and knowledgeable than Kurzweil.

And of course, hidden in that paragraph above is a huge assumption – “assuming that technology continues to improve and civilisation doesn’t collapse”. It’s that, in part, that I want to talk about in the next part of this, coming up in a few hours.