Bayesianism gives a pretty compelling account of probability as a subjective measure of uncertainty. It’s my favourite major approach to the issue, and I think for the most part it’s an intuitive theory that runs into fewer weird philosophical pitfalls than accounts like Frequentism. But I also think the extent to which it’s taken to “solve” probability is a bit overstated. In particular, I don’t think Bayesianism fully analyses what probabilities are without kicking the can down the road to other murky concepts like “degree of belief” or “subjective uncertainty” that aren’t all that well analysed themselves. I think a big part of why it gets a pass on this is that, taken in the context of the 20th century work that led to its formulation, it’s a *huge* step forward. In a 1929 lecture Bertrand Russell said:
Probability is the most important concept in modern science, especially as nobody has the slightest notion what it means.
It’s sometimes hard to put yourself back in the shoes of someone in the grips of a pre-paradigmatic confusion – probability might seem very intuitive to you now, but a lot of incredibly smart people in the early 20th century had no clue what was going on with it, to the point that they were genuinely a bit worried given how ubiquitous its usage was. “Foundational” questions about probability back then weren’t a matter of pedantic weirdos asking “but what is ‘subjective uncertainty’?”. They were more general, open ones about the nature of probability – was it subjective or objective? Why does it seem to show up in decision-making so much? Is there something “necessary” about the use of probability in decision-making or can we use something else? Can we be rational in the face of uncertainty without using probabilities?
The 20th century was (among a few other things) a tour de force of ground-breaking work on these kinds of questions. Bayesianism is a school of thought that emerged from that work and gives (in my view pretty good) answers to all of the above questions. So it’s natural, from the point of view of these kinds of questions, to view it as giving a firm grounding for the use of probabilities, or maybe even “solving” them. It certainly made it a lot clearer what the hell was going on when we talked about them.
But at the same time, there’s a sense in which I think the Bayesian answer to probabilities feels a bit circular, or at the very least merely pushes a lot of the analysis a step back. If you ask “What does it mean when an agent says that the probability of a coin landing heads is 0.5?” and get back an answer like “It means that their subjective uncertainty given their state of knowledge about the world corresponds to 0.5 on heads” or “their degree of belief is 0.5” or even “It means that their credence function assigns 0.5 to heads”, you might be a bit frustrated and want to ask “Ok but what does it mean for an agent’s subjective uncertainty given their information to be a specific value? Why is their degree of belief 0.5 and not another number? And what the hell is a credence function? That just sounds like another word for ‘probability’ to me”. In general, when we’re analysing a concept we want the concepts being used in the definiens to be “primitive” in some sense, or at least concepts which are already well-understood and analysed.
I think there are two obvious competing accounts of what something like “subjective uncertainty” is supposed to mean in a non-circular way, and I’m not sure either works. The first is what I call the “modelling” approach, and the second is a more dispositional, functionalist account. I kind of like the second approach better, but properly spelling it out is tough and it has some harder-to-articulate problems, so I’m going to save that for the next post. For the third post I’ll try to write up what I think might be a more promising angle to take, as confused as it currently is. Here I’m going to run briefly through how I understand the modelling approach, and why I think it fails, because I think it’s the more common one (at least implicitly).
The Modelling approach to defining subjective uncertainty
This view, broadly speaking, says that when we assign probabilities to hypotheses, we have a mental model of the possibilities consistent with our knowledge, and this determines our subjective uncertainty. Most commonly the type of model used seems to be a possible-world structure, in some guise or other. This is often literally reasoning over a set of possible worlds, but it could be a set of Turing machines outputting world histories, or something even funkier. For the sake of simplicity I’ll just stick to talking about reasoning over sets of possible worlds, but the specifics don’t matter so much. When I see people apply this kind of process in real life, it’s almost always confined to reasoning over finite sets of possible worlds, where each “atomic” possible world is equally likely, with a hypothesis being more probable if a greater fraction of the possible worlds make that hypothesis true. Intuitively speaking, there are lots of ways the world could be given our knowledge, and our degree of belief in a proposition goes up the more of those worlds are ones in which the proposition is true.
In the finite-set case this yields obvious answers to probability questions: when I consider a fair dice, maybe I have a toy model of the world which has six cases – one possible outcome for each way the dice could land. Of these 6, only one corresponds to the dice landing on 3, so the probability is ⅙ – easy. If we want to answer more complicated questions, like “what’s the probability the dice lands below 2 or above 3?” we can also just count. However, a key component of why this answer feels so natural and “well-founded” is that our model was artificially constrained to make things easy. When we consider uncertain outcomes, there aren’t really single possible worlds corresponding to each distinct outcome. My model of which world I might be in contains a world in which the dice landed on 6 AND it’s raining in Sydney, as well as one in which the dice landed on 6 AND it isn’t raining in Sydney. If I’m being thorough there’s actually a mind-boggling proliferation of distinct possible worlds, whether I’m only implicitly imagining them or not. This is fine for the above approach as long as the set stays only “very big”. If there’s a raining-in-Sydney and a not-raining-in-Sydney world for each way the dice lands, then everything essentially cancels out.
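To make the counting picture concrete, here is a minimal sketch in Python. The names (`probability`, `coarse_worlds`, `fine_worlds`) are made up for illustration, and representing a world as a tuple of facts is just an assumption of the toy model, not a claim about how anyone actually represents possible worlds.

```python
from fractions import Fraction
from itertools import product

def probability(worlds, event):
    """Fraction of equally weighted worlds in which `event` holds."""
    worlds = list(worlds)
    favourable = sum(1 for w in worlds if event(w))
    return Fraction(favourable, len(worlds))

# Coarse toy model: one world per face of the dice.
coarse_worlds = [(face,) for face in range(1, 7)]

# Finer model: every face paired with rain / no rain in Sydney.
fine_worlds = list(product(range(1, 7), [True, False]))

def lands_on_3(world):
    # The first entry of a world is always the dice result.
    return world[0] == 3

p_coarse = probability(coarse_worlds, lands_on_3)
p_fine = probability(fine_worlds, lands_on_3)
print(p_coarse, p_fine)  # 1/6 1/6
```

Doubling the number of worlds by adding the rain fact leaves the answer at ⅙, which is the "everything cancels out" behaviour, but only because every face gets split into the same number of finer worlds.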
Objection: This only works for finite, toy cases
Unfortunately the set of possible worlds isn’t only “very big”, it’s infinite. There’s a possible world, for every n, in which “the maximum GDP of humanity across its history is n units of whatever the most popular currency at the time is” is true. At some point the numbers get ridiculous, but not impossible, so there’s a possible world for them out there somewhere. If you don’t like that example, consider the set of possible worlds in which the universe is infinite. Famously, anything naive to do with counting or comparing sizes just kind of breaks down when you get to infinities – if there are infinitely many possible worlds which contain the result of my dice, I can put a bijection between the worlds in which the dice lands 6 and ones in which the dice doesn’t. Does this mean that the dice lands on 6 with probability 0.5? It doesn’t feel like it, especially since that particular bijection was totally arbitrary – we can pick whatever mapping we like to make whatever two events we like be equally likely in a “how many worlds are they true in” sense.
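One way to see the arbitrariness is to look at the limiting frequency of "six" worlds under two different enumerations of the same infinite set. The sketch below is only an illustration under a made-up labelling: worlds are indexed by natural numbers, and world n counts as a "six" world exactly when n is divisible by 6. Both orderings eventually list every world exactly once.

```python
from itertools import count, islice

def is_six(n):
    """Hypothetical labelling: the dice lands 6 in world n iff n % 6 == 0."""
    return n % 6 == 0

def natural_order():
    """Worlds listed as 0, 1, 2, 3, ..."""
    return count(0)

def alternating_order():
    """Same worlds, listed as: six-world, non-six-world, six-world, ..."""
    sixes = (n for n in count(0) if is_six(n))
    others = (n for n in count(0) if not is_six(n))
    while True:
        yield next(sixes)
        yield next(others)

def frequency_of_six(ordering, first_n):
    """Share of six-worlds among the first `first_n` worlds of an ordering."""
    sample = list(islice(ordering, first_n))
    return sum(is_six(n) for n in sample) / first_n

freq_natural = frequency_of_six(natural_order(), 60_000)
freq_alternating = frequency_of_six(alternating_order(), 60_000)
print(freq_natural, freq_alternating)
```

The first frequency sits at about ⅙ and the second at exactly ½, even though nothing about the worlds themselves has changed, only the order in which they were paired off and counted.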
Response: Put a measure over the set of possible worlds
The issue above was due to an overly-simplistic approach to reading off probability from our set of possible worlds – one that breaks down when we consider an infinite set of possibilities. Maybe the problem was the naive kind of counting approach, which works for toy examples but not for infinite models. One thing we can do instead is have some function which assigns to each possible world a non-negative real number, satisfying some constraints (e.g. the sum over all worlds must be 1, and the sum over all worlds in which X is true must equal 1 minus the sum over all worlds in which X is false). What we’re really doing is defining a measure over S, the set of possible worlds. There’s a bit more technical detail than this, and things are a bit more complicated if we think the set of possible worlds is uncountable, but let’s leave those aside as I don’t think they’re super relevant to the point here. The key point is that we just need to have an appropriate “weighting” function, and we can make sense of some subsets of possible worlds being more likely than other, equally sized subsets.
If we have a nice measure like this, then we seem to be able to make perfect sense of what our model says about the probability that the dice landed 6 – it’s just whatever the weights of the (countable) set of worlds in which the dice landed 6 sum to under our measure. One interesting caveat here is that, whatever our measure is, it can’t assign equal weight to all possible worlds. Since measures are countably additive, if we assign each world 0, the sum over all worlds is 0, and if we assign each world some p > 0, there’s some finite number n such that the sum over n worlds is > 1. Both are obviously Bad News, so our measure needs to not be uniform. The Solomonoff prior is an obvious example of how to do this – if we have our set of possible worlds in our model be those whose histories are outputs of a Turing machine, then we can weight each possible world proportional to 2^-k, where k is the length of the description of the Turing machine. It’s pretty easy to see that every possible world will have non-zero weight while the weights still sum to 1 (given a prefix-free encoding of the machines, so that the total is finite and can be normalised), which is great. This isn’t the *only* way to assign a well-behaved distribution over the possible worlds, but it’s a famous one.
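Both halves of that argument can be checked with a small sketch. To be clear about the assumptions: the geometric weighting 2^-(k+1) over worlds indexed k = 0, 1, 2, ... is a stand-in chosen for simplicity, not the Solomonoff prior itself, and the function names are hypothetical.

```python
from fractions import Fraction
import math

def worlds_needed_to_exceed_one(p):
    """Smallest n with n * p > 1, for a uniform per-world weight p > 0."""
    return math.floor(1 / p) + 1

def geometric_weight(k):
    """Non-uniform weight 2^-(k+1) for the world with index k."""
    return Fraction(1, 2 ** (k + 1))

def partial_sum(n):
    """Total weight of the first n worlds under the geometric weighting."""
    return sum(geometric_weight(k) for k in range(n))

# Uniform weights: however small p is, finitely many worlds overshoot 1.
n_overshoot = worlds_needed_to_exceed_one(Fraction(1, 1000))
print(n_overshoot)  # 1001

# Geometric weights: every world gets positive weight and the partial
# sums equal 1 - 2^-n, so they approach 1 without ever exceeding it.
print(partial_sum(10))  # 1023/1024
```

Exact fractions are used rather than floats so the partial sums can be compared to 1 without rounding getting in the way.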
Counter-response: This makes subjective uncertainty either totally arbitrary or circular
The issue with the above approach is that assigning a measure over our set of possible worlds here is equivalent to defining a probability distribution over it. There are many ways to assign appropriate weightings over possible worlds, and so our degree of belief is determined by which one we choose. Either we choose arbitrarily, which doesn’t feel right (it doesn’t seem like I can “choose” to weight a possible world in my prior more), or there’s some criterion by which I can choose the “right” weighting, presumably that one weighting represents my subjective uncertainty better than the others. But that means there’s a “subjective uncertainty” that isn’t being determined by my model of possible worlds, because it’s the thing determining that model. In other words, to determine what our subjective uncertainty in a proposition is given the set of worlds in which it is true, we need to first decide which worlds are more likely, even though this model is what was supposed to be grounding our probability judgements in the first place. In other other words, to have a model of the possible worlds we need to choose a prior, but our prior was supposed to be the thing that our model explained!
I think it’s important to distinguish this from the more straightforward claim that any prior is “valid”, and that rational agents can have different priors on the same hypothesis. It’s obviously true that two rational agents can have arbitrarily different priors for a hypothesis, but neither of their priors is “arbitrary” – they both come from different models which are supposed to be determining the agent’s subjective uncertainty. The issue here is that for a fixed agent, with a fixed set of possible worlds under consideration, that model over possible worlds can’t explain their uncertainty unless there’s either arbitrariness or some other type of subjective uncertainty governing their weightings in the model.
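A rough illustration of how much work the choice of weighting is doing: the sketch below puts two different, equally well-behaved weightings on the same set of worlds and reads off two different probabilities for the dice landing 6. Everything here is an assumption made for the example: worlds are indexed k = 0, 1, 2, ..., world k is a "six" world iff k % 6 == 0, and both weightings are invented. Sums are truncated to the first 60 worlds and renormalised, which happens to give the limiting values exactly for these two weightings.

```python
from fractions import Fraction

def is_six(k):
    """Hypothetical labelling: the dice lands 6 in world k iff k % 6 == 0."""
    return k % 6 == 0

def weight_by_index(k):
    """Weighting A: world k gets 2^-(k+1)."""
    return Fraction(1, 2 ** (k + 1))

def weight_by_block(k):
    """Weighting B: each block of six worlds shares 2^-(block+1) equally."""
    block = k // 6
    return Fraction(1, 6 * 2 ** (block + 1))

def probability_of_six(weight, n_worlds):
    """Normalised weight of the six-worlds among the first n_worlds."""
    total = sum(weight(k) for k in range(n_worlds))
    six_mass = sum(weight(k) for k in range(n_worlds) if is_six(k))
    return six_mass / total

p_index = probability_of_six(weight_by_index, 60)
p_block = probability_of_six(weight_by_block, 60)
print(p_index, p_block)  # 32/63 1/6
```

Same worlds, same facts about which ones contain a six, and the answer is either roughly a half or exactly a sixth depending on which weighting was picked. Nothing in the set of worlds itself says which one to use.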
There are a lot of back-and-forths and rabbit-holes that spin off from the above, but I don’t think any of them head anywhere promising. I think a big objection to my argument above might be that it gets the “explanatory direction” the wrong way round – maybe you think we don’t actually have some rarefied model inside us that informs our behaviour; rather, our internal models are there as a kind of organised representation of our dispositions towards uncertain lotteries, which are the real “bedrock” concept in this analysis. In other words, a world w1 gets weighted twice as heavily as a world w2 in my model just because I’d take a bet we were in w1 vs w2 at 1:2 odds. This makes betting behaviour the fundamental explanatory concept, with fancy possible-world models being just ways for us to internally keep track of and organise these betting dispositions. This approach – subjective uncertainty as a fundamentally behavioural/dispositional account – is really interesting, and although it’s promising I think it has a ton of issues of its own. I’ll hopefully write those up soon, but in the meantime please reach out if you think anything I’ve said here is especially stupid and/or wrong, or if you have any thoughts on it.