What are Subjective Probabilities? (Part 1)

Bayesianism gives a pretty compelling account of probability as a subjective measure of uncertainty. It’s my favourite major approach to the issue, and I think for the most part it’s an intuitive theory that runs into fewer weird philosophical pitfalls than accounts like Frequentism. At the same time, the extent to which it’s taken to “solve” probability seems a bit overstated to me. In particular, I don’t think Bayesianism fully analyses what probabilities are without kicking the can down the road to other murky concepts like “degree of belief” or “subjective uncertainty” that aren’t themselves all that well analysed. I think a big part of why it gets a pass on this is that, taken in the context of the 20th century work that led to its formulation, it’s a *huge* step forward. In a 1929 lecture Bertrand Russell said:

Probability is the most important concept in modern science, especially as nobody has the slightest notion what it means.

It’s sometimes hard to put yourself back in the shoes of someone in the grips of a pre-paradigmatic confusion – probability might seem very intuitive to you now, but a lot of incredibly smart people in the early 20th century had no clue what was going on with it, to the point that they were genuinely a bit worried given how ubiquitous its use was. “Foundational” questions about probability back then weren’t a matter of pedantic weirdos asking “but what is ‘subjective uncertainty’?”. They were more general, open ones about the nature of probability – was it subjective or objective? Why does it seem to show up in decision-making so much? Is there something “necessary” about the use of probability in decision-making, or can we use something else? Can we be rational in the face of uncertainty without using probabilities?

The 20th century was (among a few other things) a tour de force of ground-breaking work on these kinds of questions. Bayesianism was a school of thought that emerged from this work, and it gives (in my view pretty good) answers to all of the above questions. So it’s natural, from the point of view of these kinds of questions, to view it as giving a firm grounding for the use of probabilities, or maybe even as “solving” them. It certainly made it a lot clearer what the hell was going on when we talked about them.

But at the same time, there’s a sense in which I think the Bayesian answer to probabilities feels a bit circular, or at the very least merely pushes a lot of the analysis a step back. If you ask “What does it mean when an agent says that the probability of a coin landing heads is 0.5?” and get back an answer like “It means that their subjective uncertainty given their state of knowledge about the world corresponds to 0.5 on heads” or “their degree of belief is 0.5” or even “It means that their credence function assigns 0.5 to heads”, you might be a bit frustrated and want to ask “Ok, but what does it mean for an agent’s subjective uncertainty given their information to be a specific value? Why is their degree of belief 0.5 and not another number? And what the hell is a credence function? That just sounds like another word for ‘probability’ to me”. In general, when we’re analysing a concept we want the concepts used in the definiens to be “primitive” in some sense, or at least concepts which are already well-understood and analysed.

I think there are two obvious competing accounts of what something like “subjective uncertainty” is supposed to mean in a non-circular way, and I’m not sure either works. The first is what I call the “modelling” approach, and the second is a more dispositional, functionalist account. I kind of like the second approach better, but it’s tough to spell out properly and there are some harder-to-articulate problems with it, so I’m going to save that for the next post. For the third post I’ll try to write up what I think might be a more promising angle to take, as confused as it currently is. Here I’m going to run briefly through how I understand the modelling approach, and why I think it fails, because I think it’s the more common one (at least implicitly).

The Modelling approach to defining subjective uncertainty

This view, broadly speaking, says that when we assign probabilities to hypotheses, we have a mental model of the possibilities consistent with our knowledge, and this determines our subjective uncertainty. Most commonly the type of model used seems to be a possible-world structure, in some guise or other. This is often literally reasoning over a set of possible worlds, but it could be a set of Turing machines outputting world histories, or something even funkier. For the sake of simplicity I’ll just stick to talking about reasoning over sets of possible worlds, but the specifics don’t matter so much. When I see people apply this kind of process in real life, it’s almost always confined to reasoning over finite sets of possible worlds, where each “atomic” possible world is equally likely, with a hypothesis being more probable if a greater fraction of the possible worlds make that hypothesis true. Intuitively speaking, there are lots of ways the world could be given our knowledge, and our degree of belief in a proposition goes up the more of those worlds are ones in which the proposition is true.

In the finite-set case this yields obvious answers to probability questions: when I consider a fair dice, maybe I have a toy model of the world which has six cases – one possible outcome for each way the dice could land. Of these 6, only one corresponds to the dice landing on 3, so the probability is ⅙ – easy. If we want to answer more complicated questions, like “what’s the probability the dice lands below 2 or above 3?”, we can also just count. However, a key part of why this answer feels so natural and “well-founded” is that our model was artificially constrained to make things easy. When we consider uncertain outcomes, there aren’t really single possible worlds corresponding to each distinct outcome. My model of which world I might be in contains a world in which the dice landed on 6 AND it’s raining in Sydney, as well as one in which the dice landed on 6 AND it isn’t raining in Sydney. If I’m being thorough there’s actually a mind-boggling proliferation of distinct possible worlds, whether I’m only implicitly imagining them or not. This is fine for the above approach as long as the set stays only “very big”. If there’s a raining-in-Sydney and a not-raining-in-Sydney world for each way the dice lands, then everything essentially cancels out.
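As a toy illustration of the counting picture, here’s a minimal Python sketch – the world structure (a die outcome plus whether it’s raining in Sydney) is just an assumption for the example:

```python
from itertools import product
from fractions import Fraction

# A toy "possible world" is a pair: (die outcome, is it raining in Sydney?).
worlds = list(product(range(1, 7), [True, False]))  # 12 equally-weighted worlds

def probability(event):
    """Probability of a hypothesis = fraction of possible worlds where it holds."""
    return Fraction(sum(1 for w in worlds if event(w)), len(worlds))

# The irrelevant rain dimension cancels out: 2 of the 12 worlds -> 1/6.
print(probability(lambda w: w[0] == 3))              # 1/6
# A compound event is just more counting: outcomes {1, 4, 5, 6} -> 2/3.
print(probability(lambda w: w[0] < 2 or w[0] > 3))   # 2/3
```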

Objection: This only works for finite, toy cases

Unfortunately the set of possible worlds isn’t only “very big”, it’s infinite. There’s a possible world, for every n, in which “the maximum GDP of humanity across its history is n units of whatever the most popular currency at the time is” is true. At some point the numbers get ridiculous, but not impossible, so there’s a possible world for them out there somewhere. If you don’t like that example, consider the set of possible worlds in which the universe is infinite. Famously, anything naive to do with counting or comparing sizes just kind of breaks down when you get to infinities – if there are infinitely many possible worlds which contain the result of my dice, I can put a bijection between the worlds in which the dice lands 6 and ones in which the dice doesn’t. Does this mean that the dice lands on 6 with probability 0.5? It doesn’t feel like it, especially since that particular bijection was totally arbitrary – we can pick whatever mapping we like to make whatever two events we like be equally likely in a “how many worlds are they true in” sense. 
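To see how arbitrary these pairings are, here’s a minimal sketch of one explicit bijection, under the toy assumption that a world can be represented as a die outcome plus a single integer indexing “everything else” about it:

```python
# Toy worlds: (die outcome, n), where n indexes everything else about the world.
#   W6     = {(6, k) : k = 0, 1, 2, ...}
#   W_not6 = {(d, k) : d in {1,...,5}, k = 0, 1, 2, ...}

def f(world):
    """One bijection from W6 to W_not6."""
    _, k = world
    return (1 + k % 5, k // 5)

def f_inverse(world):
    """Its inverse, from W_not6 back to W6."""
    d, k = world
    return (6, 5 * k + (d - 1))

# Spot-check that the two maps really are mutually inverse on a sample.
assert all(f_inverse(f((6, k))) == (6, k) for k in range(1000))
assert all(f(f_inverse((d, k))) == (d, k) for d in range(1, 6) for k in range(200))
```

Nothing about this particular pairing is privileged – we could just as easily have paired each dice-lands-6 world with five, or five million, of the others – which is exactly why raw counting can’t be doing any work here.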

Response: Put a measure over the set of possible worlds

The issue above was due to an overly simplistic approach to reading off probability from our set of possible worlds – one that breaks down when we consider an infinite set of possibilities. Maybe the problem was the naive counting approach, which works for toy examples but not for infinite models. One thing we can do instead is have some function which assigns to each possible world a non-negative real number, satisfying some constraints (e.g. the weights over all worlds must sum to 1, and the total weight of the worlds in which X is true must equal 1 minus the total weight of the worlds in which X is false). What we’re really doing is defining a measure over S, the set of possible worlds. There’s a bit more technical detail than this, and things are a bit more complicated if we think the set of possible worlds is uncountable, but let’s leave those aside as I don’t think they’re super relevant to the point here. The key point is that we just need an appropriate “weighting” function, and then we can make sense of some subsets of possible worlds being more likely than other, equally sized subsets.
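Here’s what that looks like as a minimal sketch – the particular worlds and weights are made up purely for illustration:

```python
from fractions import Fraction

# A tiny toy measure over possible worlds: non-negative weights summing to 1.
# Any assignment of weights summing to 1 would do; these are invented for the demo.
measure = {
    ("die=6", "raining"):      Fraction(1, 12),
    ("die=6", "not raining"):  Fraction(1, 12),
    ("die!=6", "raining"):     Fraction(5, 12),
    ("die!=6", "not raining"): Fraction(5, 12),
}
assert sum(measure.values()) == 1

def probability(event):
    """Probability of a hypothesis = total weight of the worlds where it's true."""
    return sum(weight for world, weight in measure.items() if event(world))

p_six = probability(lambda w: w[0] == "die=6")
print(p_six)                                           # 1/6
print(p_six + probability(lambda w: w[0] != "die=6"))  # 1, i.e. P(X) + P(not X) = 1
```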

If we have a nice measure like this, then we seem to be able to make perfect sense of what our model says about the probability that the dice landed 6 – it’s just the total weight our measure assigns to the (countable) subset of worlds in which the dice landed 6. One interesting caveat here is that, whatever our measure is, it can’t assign equal weight to all possible worlds. Since measures are countably additive, if we assign each world 0, the sum over all worlds is 0, and if we assign each world some common p > 0, there’s some finite number of worlds whose weights already sum to more than 1. Both are obviously Bad News, so our measure needs to not be uniform. The Solomonoff prior is an obvious example of how to do this – if we take the set of possible worlds in our model to be those whose histories are outputs of a Turing machine, then we can weight each possible world proportional to 2^-k, where k is the length of the description of the Turing machine. It’s pretty easy to see that every possible world gets non-zero weight, but (modulo some care about how descriptions are encoded) the weights over all possible worlds still sum to 1, which is great. This isn’t the *only* way to assign a well-behaved distribution over the possible worlds, but it’s a famous one.
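As a sanity check on the “non-uniform but still totals to 1” idea, here’s a minimal sketch using the simplest such weighting – world number n gets weight 2^-(n+1) – as a stand-in for the fancier description-length-based version (the mapping from world index to die outcome is invented purely for the demo):

```python
# A simple non-uniform weighting over countably many worlds: world n gets 2^-(n+1).
# This is a stand-in for the 2^-k description-length weighting; the point is just
# that every world gets positive weight while the total still converges to 1.

def weight(n):
    return 2.0 ** -(n + 1)

def die_outcome(n):
    # Invented for the demo: pretend world n's history includes this die result.
    return (n % 6) + 1

N = 200  # truncate the infinite sum for a numerical check
total = sum(weight(n) for n in range(N))
p_six = sum(weight(n) for n in range(N) if die_outcome(n) == 6)

print(round(total, 12))  # ~1.0: non-uniform weights, but they still sum to 1
print(p_six)             # whatever this particular weighting happens to say about "die = 6"
```

A different but equally well-behaved weighting would, of course, have said something completely different about the dice – which is exactly the worry below.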

Counter-response: This makes subjective uncertainty either totally arbitrary or circular

The issue with the above approach is that assigning a measure over our set of possible worlds here is equivalent to defining a probability distribution over it. There are many ways to assign appropriate weightings over possible worlds, and so our degree of belief is determined by which one we choose. Either we choose arbitrarily, which doesn’t feel right (it doesn’t seem like I can “choose” to weight a possible world in my prior more), or there’s some criterion by which I can choose the “right” weighting – presumably that it represents my subjective uncertainty better. But that means there’s a “subjective uncertainty” that isn’t being determined by my model of possible worlds, because it’s the thing determining that model. In other words, to determine what our subjective uncertainty in a proposition is given the set of worlds in which it is true, we need to first decide which worlds are more likely, even though this model is what was supposed to be grounding our probability judgements in the first place. In other other words, to have a model of the possible worlds we need to choose a prior, but our prior was supposed to be the thing that our model explained!

I think it’s important to distinguish this from the more straightforward claim that any prior is “valid”, and that rational agents can have different priors on the same hypothesis. It’s obviously true that two rational agents can have arbitrarily different priors for a hypothesis, but neither of their priors are “arbitrary” – they both come from different models which are supposed to be determining the agent’s subjective uncertainty. The issue here is that for a fixed agent, with a fixed set of possible worlds under consideration, that model over possible worlds can’t explain their uncertainty unless there’s either arbitrariness or some other type of subjective uncertainty governing their weightings in the model.

What else?

There are a lot of back-and-forths and off-shooting rabbit-holes that spin off from the above, but I don’t think any of them head anywhere promising. I think a big objection to my arguing above might be that you think it gets the “explanatory direction” the wrong way round – maybe you think we don’t actually have some rarefied model inside us that informs our behaviour; rather, our internal models are there as a kind of organised representation of our dispositions towards uncertain lotteries, which are the real “bedrock” concept in this analysis. In other words, a world w1 gets weighted twice as heavily as a world w2 in my model just because I’d take a bet we were in w1 vs w2 at 1:2 odds. This makes betting behaviour the fundamental explanatory concept, with fancy possible-world models being just ways for us to internally keep track of and organise these betting dispositions. This approach – subjective uncertainty as a fundamentally behavioural/dispositional account – is really interesting, and although promising I think it has a ton of issues of its own. I’ll hopefully write those up soon, but in the meantime please reach out if you think anything I’ve said here is especially stupid and/or wrong, or if you have any thoughts on it.

Infinite Ordinals and Naive Utilitarianism

Weird Intuitions

One of the interesting things about taking naïve utilitarianism to its logical conclusion is that it recommends things which are intuitively very hard to swallow. In particular, things which otherwise have very low or even negligible value can become dominating in ethical considerations at enough scale. One example that’s particularly salient today in EA discussions is the suffering of non-human animals, especially “lower” creatures like arthropods. I think most utilitarian-minded people – even very pro-animal-welfare ones – would assign vastly lower value to the mental states (if they exist) of a mayfly than to those of a human. But if both have non-zero value, then in enough numbers (and if insects have anything, they have numbers) their aggregate intrinsic value will outweigh that of a human – perhaps even humanity as a whole. There are approximately 1.4 billion insects (only a subset of arthropods) for each human on Earth – if your moral framework assigns your average insect more than (1/1,400,000,000)th the worth of a human, naïve utilitarianism would seem to imply the intrinsic moral value of insects on Earth outweighs that of humans. It seems like the only alternative is to deny that insects have non-zero intrinsic value, so that no amount of them is ever “worth” even one human. This probably seems odd to a lot of people, who might have an intuition along the lines of the following:

“Sure, bug suffering is bad, and all else equal I’d love to have more happy bugs frolicking about – I do care about them. But humans seem intrinsically valuable in a kind of qualitatively “overriding” sense, such that no amount of insects is worth to me one human”.

Note that we’re restricting ourselves to talking about intrinsic moral value here – the real picture is obviously complicated by the fact that humans have vastly greater instrumental moral value than insects. Maybe a utilitarian says that, intrinsically speaking, 10 billion happy insects are worth more than a human, but a human can create more value for other intrinsically valuable beings which counterfactually wouldn’t have been there. Those considerations aren’t what I’m getting at here – I’m getting at the intuition some people have that that first clause – that enough bugs can be as intrinsically valuable as one human – is just wrong. The intuition can be probed (more compellingly IMO) from the other direction, looking at disutility. “Dustspecks in eyes” is a bit of a meme, but it gets at an initially uncomfortable truth that, according to naïve utilitarianism, an almost imperceptibly mild inconvenience afflicting enough people will be more important to remedy than the protracted, horrific suffering of a relative few.

What can we say about this intuition? In particular, how do we square it with the naïve utilitarian calculus, which says that if you care about both things, there should be some amount of the less valuable one which in aggregate becomes more valuable than the other?

One natural response is that the intuition is just misguided, and it’s purely a failure of imagination. Our moral intuitions are generally pretty shaky at the extremes – it’s hard to viscerally “feel” as much compassion for an individual in the remote future as for someone living today, even if intellectually you don’t believe there is any intrinsic moral difference between them. It’s hard to “feel” the suffering of someone on the other side of the world as much as you feel that of someone right in front of you, even if you don’t think there’s an intrinsic difference. Similarly, people are really, really bad at intuitively grasping huge numbers. Maybe it’s no wonder that you don’t feel that moved by a Graham’s number’s worth of people with dustspecks in their eyes, because Graham’s number just doesn’t “feel” like anything more than “a big number”, which is ever-so-slightly understating it. Maybe, if you could really intuit how big Graham’s number was, you would begin uncontrollably weeping at the thought of all that dust in all those eyes. Similarly, it might “feel” like no amount of mayflies is worth one human – but only because your intuitions about big numbers are probably already creaking under the weight of counting up to a billion. According to this response, you’ve laid out your moral “axioms” already, and should trust their logical conclusions over your intuitions, especially in areas where you know your intuitions tend to be flaky.

I think I largely agree with this response, and I think a lot of cases where utilitarianism seems to recommend something crazy *do* come from our limited ability to extrapolate our intuitions to very extreme scenarios. At the same time I think this response maybe overshoots, and importantly maybe precludes a type of utilitarianism that can consistently allow for some of these overriding judgements to be valid. After all, those axioms whose conclusions we’re telling people to trust over their intuitions started out as attempts to formalize those very intuitions! If something seems overwhelmingly compelling to someone (as rejecting the dustspecks thought experiment might to some), they should see if they can capture that intuition in their utilitarian calculus.

How might we capture this idea – that a mayfly does matter a non-zero amount, but that no amount of mayflies could matter as much as one human? It might seem impossible on a naïve utilitarian view, but this is only because the values we’re assigning are implicitly all finite. If we extend the co-domain of our utility function to include infinite ordinals, then we can plausibly get a richer form of utilitarianism which preserves most of our “common-sense” intuitions about moral judgements while allowing for some things to “override” any amount of other valuable things.

Infinite Ordinals

In case you haven’t seen infinite ordinals before, they’re an extension of the idea of counting numbers/rankings beyond the finite, in the same way that infinite cardinals extend the concept of “size” beyond the finite.

The standard finite ordinals start like this:

0, 1, 2, 3, …

You may have come across them before. In the lingo of ordinals, each ordinal is the successor of the previous one – the ordinal successor function is simply “+1”. Implicitly, when you’re thinking in utilitarian terms you’re ranking outcomes, and so your options must be embeddable somewhere into these ordinals (what it means for a bunch of things to be ordered by a consistent preference function is that they can be embedded in a set of ordinals). This is where the counter-intuitive thought experiments arise from: no matter how big the gap in value between two things (assign a mayfly the ordinal x = 1 and a human the ordinal y = 10^100), there is some finite number of times we can apply the successor function to x such that we end up with an ordinal > y. As long as the intrinsic value of increasing amounts of bugs doesn’t asymptote (i.e. there’s no finite number n such that no amount of bugs can be worth n – which would in a roundabout way be a vindication of the “unimaginative” intuition), then we can always eventually make our bugs more valuable than a human (and by extension, more valuable than all humans). The picture changes if we add infinite ordinals. The first is:

ω

and is defined as the first ordinal “after” all of the finite ordinals, or in other words the first ordinal that cannot be reached from a finite ordinal by any finite number of applications of the successor function. This type of ordinal – one that is “inaccessible” from the ordinals below it no matter how many times you increment them – is called a limit ordinal, and limit ordinals obviously don’t occur anywhere in the set of finite ordinals.

Things don’t stop there: there is (always) a “next” ordinal, which as per our successor function above will be ω+1. You may (if you think very hard) be able to see what will come next. Eventually though, there will be a new limit, the ordinal which is the limit of:

ω, ω+1, ω+2, ω+3, …

which is ω + ω (or ω·2 for short). And mind-bendingly enough we also eventually get:

ω·2, ω·3, ω·4, ω·5, …

up until another limit ordinal, ω·ω, or ω². I’m going to stop here before it starts getting silly, but if you find this stuff interesting, John Carlos Baez has a fantastic series of posts where he explores just how deep (or high?) this rabbit-hole goes. For now let’s just restrict ourselves to ordinals we can build this way – finite sums of powers of ω.

Limit ordinals allow for a lot of interesting structure that finite ordinals lack. Suppose our utility function “embeds” beings into the ordinals, and that a mayfly is assigned 1 while a human is assigned ω. What kind of actions will our moral theory recommend?

Well, it’ll make some pretty common-sense recommendations: two mayflies will be worth more than one, and two humans will be worth more than one (again assuming naïve additivity), since ω·2 is by definition > ω. One human vs one human and one mayfly will also be easy, because again – ω+1 > ω by definition. This jibes with the vague intuition that “all else equal, mayflies matter”. But what happens if we suddenly have to make a trade-off between mayflies and humans? Well, when we compare a human and a mayfly, since ω >> 1, our theory will recommend valuing the human over the mayfly (so far so good). But what’s interesting is that as we keep increasing the number of mayflies under consideration, the preference never changes! Because by definition, no repeated application of the successor function to an ordinal (in this case, no adding in some finite number of extra mayflies) can ever “reach” the limit ordinal. Note that this two-tier example is a very limited application – in theory we could have as many “qualitatively different” valuable types of being as we like, each of which is more valuable than any amount of “lesser” creatures, but which similarly can never itself be worth one creature from the “level above”. Maybe you want to compare 1 human, 0 chickens and 500 bugs vs. 0 humans, 1000 chickens and 10^100 bugs. One embedding into the ordinals (bugs at ω, chickens at ω², humans at ω³) could map this onto comparing e.g.

ω³ + ω·500  vs.  ω²·1000 + ω·10^100

In which case the former dominates. In general, when comparing mixtures of valuable things, our theory will “go down the list” in order of which things are individually most valuable until it finds a disagreement, and then side with whichever scenario is greater at that “level”. Everything is therefore still intrinsically valuable, although it will only be a “tie-breaker” and relevant to our decision if the two scenarios agree on everything else more valuable. This seems (to me) to square quite naturally with people’s intuitions about this class of problems, something like “Yeah sure, happier bugs are better – but only after making sure the humans are at least no worse off first”.
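Since we’re only ever adding up whole numbers of beings at each tier, every value we care about here is a finite sum of powers of ω, and comparing them is just lexicographic comparison of the coefficients. Here’s a minimal Python sketch of that, assuming the embedding above (bugs at ω, chickens at ω², humans at ω³) and naïve additivity within each tier:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    """Total value as the ordinal ω³·humans + ω²·chickens + ω·bugs."""
    humans: int
    chickens: int
    bugs: int

    def ordinal(self):
        # Cantor-normal-form coefficients, highest power of ω first.
        return (self.humans, self.chickens, self.bugs)

def better(a, b):
    """Go down the tiers until the scenarios disagree (lexicographic comparison)."""
    return a.ordinal() > b.ordinal()

# With merely finite values (say 1 per bug, 10**100 per human), enough bugs overtake:
print(10**100 + 1 > 10**100)                                    # True

# With the ordinal embedding they never do, but bugs still break ties:
print(better(Scenario(1, 0, 500), Scenario(0, 1000, 10**100)))  # True: ω³ + ω·500 wins
print(better(Scenario(1, 0, 2), Scenario(1, 0, 1)))             # True: tie-break on bugs
```

The bugs coefficient only ever matters once the human and chicken coefficients agree, which is exactly the “tie-breaker” behaviour described above.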

Maybe this is all sounding a bit gross at this point – maybe qualitatively screening off bugs or chickens from ever being as valuable as one human sounds like the kind of thing generations in the future will look back on as incredibly bigoted. Maybe it’s a bit scary that this framework in theory allows for someone to consistently value humans but at the same time believe there is some special thing S such that no amount of humans can ever come close to being worth S. Would an agent who held such moral beliefs act convincingly morally until they all of a sudden did something unimaginably horrific? Maybe you just think that the system makes sense in theory, but that a much better response is to stick to the charge that all of this is really due to a lack of imagination, rather than the sceptical intuitions needing a more expressive framework for their moral beliefs. I’m kind of inclined to agree with all of these points (I think? I don’t know, my moral beliefs on this have been in flux lately). In any case, I think extending utilitarian thinking to infinite ordinals as above is still a good idea. Worst-case scenario, our moral framework will just make do with the subset of finite ordinals and can pretend those weird limit ordinals aren’t even there. But on the off-chance that that isn’t the case, it could be useful.

Deconstructing the Löbian 5-10 Problem

A pretty neat thought experiment that’s been floating around for quite a while now in the AI alignment/decision theory/nerd-sniping space is the 5-10 problem. I’m not sure where exactly it originated, but I’ve seen it discussed most by MIRI/MIRI-adjacent folk and it seems to have influenced their research a fair bit, so I assume it came from them. It’s ostensibly a toy example which shows how sufficiently logical and self-reflective agents do some pretty wildly confusing and seemingly irrational things, and it’s a good intuition pump for why reasoning about/aligning such agents is very hard. I think it’s a cool idea but that it doesn’t work, for reasons I haven’t really seen discussed elsewhere, so I thought I’d write them up with the expectation that I’ll immediately look like an idiot for missing something obvious. It should probably be caveated that there seem to be a few versions of this problem which (IMO) are very different problems under the hood – what I’m talking about is the Löbian version specifically, since Löb’s Theorem in the context of self-reflection seems to be a popular topic.

Anyways the set-up is as follows: we have an AI/agent considering whether to take $5 or $10. They like money and they’ll do whatever they think gets more money, so obviously they’ll take the $10. Except wuh-oh, maybe they won’t. Because what they’re going to do is start searching through proofs about the outcomes of the choices, and if they search for long enough they’ll stumble across the following proof (where A stands for what the agent is choosing and U stands for their reward):

  1. (A = 5) → (U = 5)
  2. (A = 5) → ((A = 10) → (U = 0))
  3. □(((A = 5) → (U = 5)) ∧ ((A = 10) → (U = 0))) → (A = 5)
  4. □(((A = 5) → (U = 5)) ∧ ((A = 10) → (U = 0))) → (((A = 5) → (U = 5)) ∧ ((A = 10) → (U = 0)))
  5. ((A = 5) → (U = 5)) ∧ ((A = 10) → (U = 0))
  6. A = 5

The crux in less formal language is this: if our agent proves that choosing $5 results in $5 but choosing $10 results in $0, then they’ll take the $5 because they want more money. But if they take the $5, then both halves of that proposition are true – choosing $5 does result in $5 (that’s just the set-up), and “choosing $10 results in $0” is true because it’s false that they choose $10, and any conditional with a false antecedent is true. So we have a scenario where the provability of the proposition implies its truth, and somewhere in its proof search our agent will prove this implication. They’ll then inevitably derive, via Löb’s theorem, the proposition outright – hence choosing $5 over the pathetic alternative of $10, which only nets them $0.

Step (3.) is the crucial one here, but unfortunately I think it’s also the one that doesn’t work. I think seeing this is hard because most discussions of the problem seem to have been slightly informal about the process of what the agent actually does, but it’s worth being really explicit. On the most natural interpretation (IMO), the agent does something like this:

Proof Search: Loop through all possible proofs. Keep going till I find a valid proof of a sentence of the form “((A = 5) → (U = X)) ∧ ((A = 10) → (U = Y))”. At this point, halt and choose $5 if X > Y, else $10.

(there are some other alternatives for how to cash this out, I think, but as far as I can tell they either run into similar issues to the one below, or else are so unnatural that they make the above behaviour make sense)
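To pin the interpretation down, here’s a minimal Python sketch of that agent. The proof enumerator and the exact string encoding of the target sentences are placeholders I’ve made up – the point is just the shape of the control flow, in particular the halting condition:

```python
import re

# The kind of sentence the agent is searching for:
#   "((A = 5) → (U = X)) ∧ ((A = 10) → (U = Y))"
TARGET = re.compile(r"\(\(A = 5\) → \(U = (\d+)\)\) ∧ \(\(A = 10\) → \(U = (\d+)\)\)")

def agent(enumerate_proofs):
    """Sketch of the proof-searching agent. `enumerate_proofs` is a made-up
    placeholder that yields (proof, conclusion) pairs in some fixed order."""
    for proof, conclusion in enumerate_proofs():
        match = TARGET.fullmatch(conclusion)
        if match:
            x, y = int(match.group(1)), int(match.group(2))
            # Halting condition: act on the FIRST sentence of this form it proves.
            return 5 if x > y else 10
    # If no sentence of this form ever turns up, the search simply never halts
    # (or falls through here for a finite enumeration) and the agent never acts.
```

The halting condition is what does the work in the argument below: knowing that some proof of the target sentence exists somewhere in the enumeration isn’t the same as knowing it’s the first thing of that form the loop encounters.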

So this proof search algorithm is fine, but it means that the agent can’t prove step (3.) above. The reason is that the halting condition of our algorithm means that even if “((A = 5) → (U = 5)) ∧ ((A = 10) → (U = 0))” is provable, the agent doesn’t know that it will find such a proof before its proof search halts, and it thus can’t conclude anything about its own behaviour from merely knowing a proof exists out there. In other words, it needs to be able to prove that “((A = 5) → (U = 5)) ∧ ((A = 10) → (U = 0))” is provable AND that no other conflicting proposition of an acceptable form has a proof occurring earlier in its proof search. One way to patch this would be for the agent to prove that it’s consistent, and therefore be assured that it doesn’t prove any contradictory propositions anywhere, let alone earlier in the proof search. But by Gödel’s Second Incompleteness Theorem it can’t do this, assuming it is in fact consistent. It is possible for the theory to prove of most true sentences that they have a proof with no shorter disproof (although, interestingly, Rosser’s Theorem tells us not for every true sentence), but even if our agent can prove of a given proof-enumeration that it yields a proof of the target conjunction before any other candidates, Löb’s theorem no longer applies.

To see this, let’s dig into the structure of Löb’s Theorem a bit more. The theorem states that if our agent can prove □P → P for some sentence P, then it can prove P outright. Above we set P = ((A = 5) → (U = 5)) ∧ ((A = 10) → (U = 0)), but we then pointed out that for the agent to derive anything about its actions it needs the “augmented” proposition Q = P ∧ [No shorter contradiction of P]. So ok fine, we just sub Q in for P above and repeat the argument, right? Except no, we can’t rerun this argument with Q unfortunately – the agent can prove that Q implies P, but it *cannot* prove that the provability of the “[No shorter contradiction of P]” bit actually implies “[No shorter contradiction of P]”.

I think a key insight is that P was actually fairly special in this respect – the agent’s understanding of its own source code and its environment allowed it to prove that certain actions were consequences of its proofs, and that certain sentences would be true as a consequence of those actions. The set-up of the environment and the agent’s “hard-coded” rules are what allowed us to derive truths from proofs of provability. As a general rule though, we don’t get to take sentences for free from knowing they’re provable – this is precisely why Löb’s Theorem doesn’t blow up in terrifying ways and imply anything we want, the way it might seem to when you first come across it. The point is a pretty subtle one – the agent can indeed derive, from the provability of Q, the fact that it takes the $5, but since that fact is not the same as Q itself (which it cannot derive from the provability of Q), the antecedent of Löb’s Theorem isn’t satisfied and the argument doesn’t go through.

I don’t think this is an especially big deal – it’s relevant pretty much exclusively to a fairly niche line of argument invoking Löb’s Theorem, whereas, like I said, there are plenty of semi-related thought experiments that don’t rely on this but rather on the general weirdness that arises from logical counterfactuals. I think that’s a very different issue though, so really they shouldn’t be lumped together. Still, I think it’s a worthwhile thing to point out about what’s been a pretty popular thought experiment.