(Typing it up to reference in discussions; it is by no means a detailed overview)
It is unanimously agreed that statistics depends
somehow on probability. But, as to what probability is and how it is
connected with statistics, there has seldom been such complete
disagreement and breakdown of communication since the Tower of Babel.
Doubtless, much of the disagreement is merely terminological and would
disappear under sufficiently sharp analysis.
Leonard Savage, The foundations of statistics, 1954.
The origin of classical probability
When you throw a symmetrical 6-sided die, after sufficiently many bounces, the probability of getting one specific face is 1/6.
We arrive at this number by considering the symmetry of the die and the physical process of its bouncing. We cannot predict which side the die will land on - due to the extreme sensitivity to initial conditions - but the symmetry permits us to conclude something about the way the die will land on average if we perform many trials. It is not a philosophical stance that this probability represents the frequency of occurrence in an infinite number of trials - that is just what it is, in our partial model of the die's physics.
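As a quick illustration of this frequency reading, here is a minimal sketch of my own (using a pseudo-random generator in place of the physical die, much as the graphics section below suggests): over many simulated throws, the observed frequency of each face approaches 1/6.

```python
import random
from collections import Counter

# Count how often each face comes up over many simulated throws;
# the pseudo-random generator stands in for the bouncing of the die.
n = 600_000
counts = Counter(random.randint(1, 6) for _ in range(n))
for face in range(1, 7):
    print(face, round(counts[face] / n, 4))  # each frequency is close to 1/6 ~ 0.1667
```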
Probability as the observer's belief
The observer's probability that a die on the table has rolled some number may have to be further adjusted based on extra knowledge. For instance, you can look at the die and see that it rolled 5; now your probability that the die has rolled 5 is nearly 100%.
For an example of a more involved event, you can make a robot that throws a die and tosses a coin; if the coin lands heads, it tells you the number that the die rolled, otherwise it tells you a uniform random number between 1 and 7 inclusive (which it obtains, say, by spinning a small roulette). Exercise for the reader: the robot gave you 6; what is the probability that the die rolled 6, and what is the probability that the robot was answering using the roulette?
Bayes' rule
Bayes' theorem is the rule by which probabilities affect other probabilities in examples such as this robot problem (which I recommend you solve on your own).
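For reference, Bayes' theorem says P(H|E) = P(E|H) P(H) / P(E) for a hypothesis H and evidence E. Below is a minimal Monte Carlo sketch of the robot problem (my own reading of the setup above, with the roulette uniform over 1 through 7 inclusive); it is meant for checking your analytic answer numerically after you have worked it out, not for replacing the exercise.

```python
import random

def robot_report():
    """One run of the robot: roll a die, toss a coin, report accordingly."""
    die = random.randint(1, 6)          # the die roll
    heads = random.random() < 0.5       # the coin toss
    if heads:
        report = die                    # heads: report the true die roll
    else:
        report = random.randint(1, 7)   # tails: spin the roulette, uniform 1..7
    return die, heads, report

n = 1_000_000
saw_6 = die_was_6 = used_roulette = 0
for _ in range(n):
    die, heads, report = robot_report()
    if report == 6:
        saw_6 += 1
        die_was_6 += (die == 6)
        used_roulette += (not heads)

print("P(die rolled 6  | report = 6) ~", die_was_6 / saw_6)
print("P(roulette used | report = 6) ~", used_roulette / saw_6)
```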
Uses of probability theory in computer graphics (and other applied physics)
Probability theory is widely used in computer graphics, for instance to calculate illumination values by averaging the number of photons that hit a specific area. Pseudo-random number generators are employed in place of the die toss; a pseudo-random number generator is actually rather similar, in essence, to the bouncing of a die. Literal computation of the average number of photons is often prohibitively slow for high-quality imagery, as the error decreases proportionally to the inverse square root of the number of simulated photons; a wide variety of more advanced methods, for example Metropolis light transport, are used to improve the asymptotic convergence. Sometimes the convergence can be improved by forcing the photons into a regular rather than random pattern and by re-regularizing the photon field. Bayes' rule also pops up once in a while.
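The slow 1/sqrt(n) convergence is easy to see on a toy stand-in for the photon average (a sketch of my own, not an actual renderer): estimating the average of sin(x) over [0, pi] by plain Monte Carlo.

```python
import math
import random

def estimate(n):
    """Plain Monte Carlo estimate of the average of sin(x) over [0, pi]."""
    total = 0.0
    for _ in range(n):
        total += math.sin(random.uniform(0.0, math.pi))
    return total / n

true_value = 2.0 / math.pi  # exact average of sin over [0, pi]
for n in (100, 10_000, 1_000_000):
    err = abs(estimate(n) - true_value)
    print(f"n = {n:>9}: error ~ {err:.5f} (expected scale ~ {1.0 / math.sqrt(n):.5f})")
```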
Probability theory as the logic of uncertainty
Classical logic processes certain propositions and their relations to obtain conclusions about the world. Probability theory gives the same results as classical logic in the limit of certainty, and can also be used to process uncertain propositions.
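A tiny numerical illustration of that limit (my own toy example): with premises held at probability 1, the law of total probability reproduces modus ponens exactly, and with nearly-certain premises it degrades gracefully.

```python
p_a = 1.0              # P(A): the premise "A" is certain
p_b_given_a = 1.0      # P(B|A): "B follows from A" is certain
p_b_given_not_a = 0.3  # P(B|not A): irrelevant while A is certain

# Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
print(p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a))  # 1.0, as in classical logic

# Slightly uncertain premises give a slightly uncertain conclusion:
p_a, p_b_given_a = 0.95, 0.99
print(p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a))  # ~0.96
```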
Hypotheses and probability
Suppose you have a hypothesis that a coin is biased and always lands heads. How do you test this hypothesis? You can adopt a strategy of tossing the coin 20 times and believing the coin to be biased if it lands heads every time. Then an unbiased coin will trick you approximately 1 time out of a million. Your degree of confidence in the coin being biased is thus described as: I assume it is biased on the basis of the success of an experiment which had a one in a million chance of an unbiased coin tricking me. (You can choose an adequate number of tosses for the experiment on the basis of the cost of a mistake and the cost of a toss.) This is one of the basic concepts of the scientific method.
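The one-in-a-million figure is just 0.5 raised to the 20th power; a quick check (a sketch of my own):

```python
# Chance that a fair coin lands heads 20 times in a row
p_trick = 0.5 ** 20
print(p_trick)      # ~9.54e-07
print(1 / p_trick)  # 1,048,576 -- roughly one in a million
```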
But wait, you say. The coin might be biased, or it might not be; it is uncertain whether it is or isn't! How can I assume it is biased? Wouldn't it be useful to find the probability that the coin is biased? If you knew a priori the probability that the coin is biased, you could calculate the probability that the coin is biased after performing a series of experiments, using the above-mentioned likelihood of being tricked, in combination with Bayes' theorem.
So there is a strong desire to assign some more or less arbitrary prior probability to the coin being biased, and then update it using Bayes' theorem. While after a multitude of updates you become more correct, this still has all the obvious disadvantages of introducing a made-up, arbitrary number into your calculations.
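As a minimal sketch of such an update (my own example; the 1% prior is exactly the kind of made-up number the paragraph above is complaining about):

```python
prior_biased = 0.01          # made-up prior P(coin always lands heads)
likelihood_biased = 1.0      # P(20 heads | always-heads coin)
likelihood_fair = 0.5 ** 20  # P(20 heads | fair coin)

# Bayes' theorem: P(biased | 20 heads) = P(20 heads | biased) P(biased) / P(20 heads)
evidence = likelihood_biased * prior_biased + likelihood_fair * (1.0 - prior_biased)
posterior_biased = likelihood_biased * prior_biased / evidence
print(posterior_biased)      # ~0.99991 -- near certainty even from a 1% prior
```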
The former way of representing the partial resolution of uncertainty by experiment is often called "frequentist", while the latter is called "Bayesian".
Length-dependent prior probabilities for hypotheses
One can assign lower prior probabilities to more complicated hypotheses. Formally, one can represent a hypothesis as a Turing machine input tape of length l, and assign it a probability proportional to 2^-l. This is called the 'Solomonoff prior'. You can strike out the hypotheses that do not match the observations; this is called Solomonoff induction.
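Here is a deliberately toy, computable caricature of the idea (my own sketch; actual Solomonoff induction is uncomputable, and the hypotheses and description lengths below are made up): weight each hypothesis by 2^-l, strike out those contradicted by the observations, and renormalize.

```python
observations = [0, 1, 0, 1, 0, 1]  # hypothetical observed bit sequence

# (description, length l in "bits", predictor: index -> next bit)
hypotheses = [
    ("all zeros",     3, lambda i: 0),
    ("all ones",      3, lambda i: 1),
    ("alternate 0,1", 5, lambda i: i % 2),
    ("alternate 1,0", 5, lambda i: (i + 1) % 2),
]

weights = {}
for name, length, predict in hypotheses:
    fits = all(predict(i) == bit for i, bit in enumerate(observations))
    # Solomonoff-style prior 2^-l, zeroed out if the hypothesis contradicts the data
    weights[name] = 2.0 ** -length if fits else 0.0

total = sum(weights.values())
for name, w in weights.items():
    print(name, w / total if total else 0.0)
```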
Note that it is mathematically equivalent to a belief, held as a dogmatic certainty, that the ultimate physics of the world we live in is a prefix Turing machine fed random bits on the input tape (the probability of a specific starting sequence of length l is then 2^-l).