Tuesday, February 5, 2013

Algorithmic probability: an explanation for programmers

Suppose you do a double slit experiment using a low-luminosity light source, with a night vision device as the viewer.

You see a flash of light - a photon has hit the detector. Then you see another flash. If you mark the flashes on grid paper you will, over time, see something like the image on the right:

Suppose you want to predict your future observations. Doing so is called 'induction': inferring future observations from past ones.

A fairly general, formalized way to do so is the following: You have a hypothesis pool, initially consisting of all possible hypotheses. Each hypothesis is a computer program that outputs a movie (like computer demos). As you look through the night vision device and see flashes of light, you discard every program whose movie did not match your observations so far.
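A minimal sketch of this filtering loop. The demos here are toy stand-ins (plain Python lambdas with made-up names) rather than enumerated programs, but the structure - keep only the hypotheses whose output matches the observations so far - is the one described above:

```python
# Sketch of hypothesis filtering. Each "demo" is a function returning a
# sequence of frames; the names and demos are illustrative stand-ins for
# "all programs up to some length".

def enumerate_demos():
    return {
        "all_zeros": lambda: [0, 0, 0, 0],
        "alternating": lambda: [0, 1, 0, 1],
        "all_ones": lambda: [1, 1, 1, 1],
    }

def surviving_demos(observed):
    """Keep only demos whose output matches the observations so far."""
    pool = enumerate_demos()
    return {
        name: demo
        for name, demo in pool.items()
        if demo()[:len(observed)] == observed
    }

print(sorted(surviving_demos([0, 1])))  # only "alternating" survives
```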

To predict the observations, you can look at what the movies show beyond the current time. But different movies show different things - how do you pick the most likely one? Note that among the 1024-bit demos (yes, you can write a graphics program in 128 bytes), a 1023-bit demo will appear twice - once followed by a 1 and once followed by a 0 (it does not matter what comes after a program); a 1000-bit demo will appear roughly 16 million times (2^24), and so on. Thus the movies produced by shorter demos will be more common - the commonality of a movie is 2^-l, where l is the bit length of the demo.
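The counting argument can be checked directly by brute force: among all n-bit strings, a fixed l-bit prefix occurs 2^(n-l) times, so its share of the pool is 2^-l regardless of n:

```python
from itertools import product

def prefix_share(prefix, n):
    """Fraction of all n-bit strings that begin with the given prefix."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    matching = [s for s in strings if s.startswith(prefix)]
    return len(matching) / len(strings)

# A 3-bit "demo" among 10-bit strings: its share is 2**-3 = 0.125,
# no matter what bits follow the program.
print(prefix_share("101", 10))  # 0.125
```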

This is the basic idea behind algorithmic probability, with the distinction that algorithmic probability uses a universal Turing machine (to which your computer is roughly equivalent, apart from its limited memory).

It is interesting to speculate about what those demos may be calculating. Very simple demos can look like a sequence of draw_point(132,631); skipframes(21); draw_point(392,117); and so on, hard-coding the points. Demos of this kind that haven't been thrown away yet grow in size in proportion to the number of points seen on the screen.
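A toy illustration of why hard-coding loses: generating such a demo's source (in the made-up draw_point/skipframes notation above) shows its length growing linearly with the observations it has to reproduce:

```python
def hardcoded_demo(points):
    """Source of a demo that hard-codes every observed flash as a
    draw_point/skipframes pair. Illustrative, not a real demo language;
    the source length grows linearly with the number of points seen."""
    lines = []
    for (x, y), skip in points:
        lines.append(f"draw_point({x},{y});")
        lines.append(f"skipframes({skip});")
    return "\n".join(lines)

one = hardcoded_demo([((132, 631), 21)])
hundred = hardcoded_demo([((132, 631), 21)] * 100)
# hundred is roughly 100 times longer than one
```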

We can do better. The distribution of points in the double slit experiment is not uniform. Demos that map a larger fraction of random bit sequences to the more common points on the screen, and a smaller fraction to the less common points, will most often produce the observed distribution. For instance, to produce a Gaussian distribution, you can count the bits set to 1 in a long sequence of random bits.
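The set-bit trick can be sketched directly: counting ones in a long random bit string gives a binomial(n, 1/2) distribution, which approaches a Gaussian for large n, without the probability distribution ever being computed explicitly:

```python
import random

def gaussian_ish(n_bits, rng):
    """Count set bits in a random bit string: binomially distributed,
    approximating a Gaussian with mean n/2 for large n."""
    return sum(rng.randint(0, 1) for _ in range(n_bits))

rng = random.Random(0)  # seeded for reproducibility
samples = [gaussian_ish(1000, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
# mean is close to 500, and the samples cluster in a bell shape around it
```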

Demos of this sort can resemble quantum mechanics, where the solution is obtained by calculating complex amplitudes at points on the screen and then converting them into a probability distribution by applying the Born rule.
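A toy double-slit calculation along those lines - sum the complex amplitudes from each slit at every screen point, square their magnitudes (Born rule), normalize, then pick a single flash. All geometry and numbers here are illustrative, not a physical simulation:

```python
import cmath
import math
import random

def slit_probabilities(n_points, wavelength=1.0, slit_sep=5.0, screen_dist=100.0):
    """Sum complex amplitudes from two slits at each screen point,
    then apply the Born rule: probability ~ |amplitude|^2."""
    xs = [-20 + 40 * i / (n_points - 1) for i in range(n_points)]
    probs = []
    for x in xs:
        amp = 0j
        for slit_y in (-slit_sep / 2, slit_sep / 2):
            path = math.hypot(screen_dist, x - slit_y)
            amp += cmath.exp(2j * math.pi * path / wavelength)
        probs.append(abs(amp) ** 2)
    total = sum(probs)
    return [p / total for p in probs]  # normalized distribution

def draw_flash(probs, rng):
    """Pick a single screen point - the 'flash' - from the distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = slit_probabilities(201)
flash = draw_flash(probs, random.Random(0))
```

Note that the demo still ends by drawing one point; the amplitude calculation only shapes where flashes land more often.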

Can we do better still? Can we discard the part where we pick just one point? Can we just output a screen of probabilities?

Not within our framework. First, Turing machines do not output or process real numbers at all; real numbers have to be approximately encoded in some manner. (If you ever hear of a Solomonoff induction code 'assigning' probabilities, this works through the code converting subsequent random bit strings into specific observations, producing, in the limit, more of some observations than others.) Second, we haven't observed a screen of probabilities; when you toss a coin once, you don't see a 50% probability of it landing heads - you observe heads (or tails). What we actually saw was flashes of light at well-defined points. The part of the demo where it picks a single point to draw a flash at is as important as the part where it finds the probability distribution over those points. The two parts are not even separate in our Gaussian example, where the probability distribution is never explicitly computed but is instead generated by counting the set bits.
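Here is one way a code can 'assign' probabilities through random bit strings, sketched with a toy two-bit decoder (the mapping is illustrative): three of the four equally likely input strings decode to observation A, so A gets probability 3/4 in the limit, even though no probability is ever written down:

```python
from itertools import product

def decode(bits):
    """Map a 2-bit random string to an observation: 'A' gets three of
    the four strings, 'B' gets one, so 'A' has probability 3/4."""
    return "A" if bits != "11" else "B"

counts = {"A": 0, "B": 0}
for bits in product("01", repeat=2):
    counts[decode("".join(bits))] += 1
print(counts)  # {'A': 3, 'B': 1}
```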

Can we output a multitude of predictions with one code and then look through them for the correct ones? We can't do that without losing predictive power - we would end up with a very short demo that outputs sequences from a counter, eventually emitting every possible video sequence.
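Such a counter demo is indeed very short, which is exactly the problem - it matches any observation and therefore predicts nothing:

```python
from itertools import count, product

def all_videos():
    """A tiny 'counter' demo: enumerates every finite bit string in
    order, so it eventually emits any video sequence whatsoever."""
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

gen = all_videos()
first_few = [next(gen) for _ in range(6)]
print(first_few)  # ['0', '1', '00', '01', '10', '11']
```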

[This blog post is a work in progress; todo is to take photos of double slit experiment using a laser pointer, and to make other illustrations as appropriate]


  1. I assume this is the start of a sequence in which you try to genuinely apply a quantitative Occam's razor to the interpretation of quantum mechanics?

    1. I am kind of unsure. My expertise is largely within the field of programming, so I understand the CS-related aspect of it well, but when it comes to quantum mechanics I do not want to claim expertise (though I definitely can write a simulator of the double slit experiment). If you can collaborate on the physics, we can do that. I myself do not think that Solomonoff Induction is a good idea. I imagine that the hypotheses should be ranked by how many symmetries they have, and the size of the guess space, justifying Occam's razor not as a law of nature but as an observation that shooting blind in a lower dimensional space has better chance of hitting the target.

    2. "justifying Occam's razor not as a law of nature but as an observation that shooting blind in a lower dimensional space has better chance of hitting the target." - that's excellent and should go in the post somewhere.