This Paradox Used to Split Smart People 50/50, But Now Things Have Changed
What is Newcomb's Paradox?
Imagine two boxes in front of you. The transparent Box A definitely contains $1,000, while the opaque Box B contains either $1,000,000 or nothing.
You may choose only Box B (one-boxing), or take both boxes (two-boxing).
But before you choose, a highly accurate "super predictor" has already predicted your behavior:
- If it predicts that you will take both boxes, it leaves Box B empty.
- If it predicts that you will take only Box B, it puts $1,000,000 in Box B.
Now the money has already been placed or not placed, and the predictor will not change anything anymore. What would you choose?
If you have never seen this paradox before, pause for a moment and think about your own answer first. Then watch Veritasium's video and fill out its poll: This Paradox Splits Smart People 50/50.
In fact, both one-boxing and two-boxing are internally coherent choices. But what interested me was this: why would the split be 50/50? I chose one-boxing almost immediately. In Veritasium's poll, more than 70% also chose one box, and in an informal poll among people around me, one-boxing was also the majority. So what is going on here? Something must be off.
To fully understand the logic behind it, I abstracted the problem into a mathematical formula and thought it through step by step.
Q: What exactly is Newcomb's Paradox calculating?
At its core, this is a problem of maximizing expected payoff.
Let

- $x \in \{0, 1\}$ be your choice: $x = 1$ means one-boxing (take only Box B), and $x = 0$ means two-boxing.
- $p \in [0, 1]$ be the probability that Box B contains the $1,000,000, i.e., that the predictor predicted one-boxing.

Assume that both amounts are valued linearly in dollars, so maximizing expected money is the entire objective.

After simplification, we get the final payoff function:

$$E[V(x)] = \$1{,}000 \cdot (1 - x) + \$1{,}000{,}000 \cdot p$$

This leads directly to the core disagreement in the paradox: how should we define $p$, as a constant or as a variable that depends on $x$?
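To make the formula concrete, here is a minimal Python sketch. The function name `expected_payoff` and the encoding of `x` are my own choices for illustration; the dollar amounts come from the setup above.

```python
def expected_payoff(x: int, p: float) -> float:
    """Expected payoff in dollars for Newcomb's game.

    x: 1 = one-boxing (take only Box B), 0 = two-boxing (take both boxes).
    p: probability that Box B contains the $1,000,000,
       i.e. that the predictor predicted one-boxing.
    """
    box_a = 1_000 * (1 - x)    # Box A's $1,000 is collected only when two-boxing
    box_b = 1_000_000 * p      # Box B pays $1,000,000 with probability p
    return box_a + box_b
```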
Q: If $p$ is a constant, what value of $x$ maximizes the payoff?
If we treat the "super predictor's" decision as an already completed fact from yesterday, something absolutely impossible to change today, then $p$ is a fixed constant that my choice cannot move.
In the formula above, the coefficient of $x$ is $-\$1{,}000$: one-boxing costs you Box A's $1,000 no matter what $p$ is. The payoff is therefore maximized at $x = 0$. Take both boxes, and you walk away $1,000 richer whatever the predictor did.
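This dominance argument can be checked mechanically. Reusing the `expected_payoff` sketch above, a quick sweep over values of $p$ shows that two-boxing beats one-boxing by exactly $1,000 every time:

```python
# Constant-p reading: the predictor's decision is fixed, whatever it was.
for p in (0.0, 0.5, 0.99, 1.0):
    advantage = expected_payoff(0, p) - expected_payoff(1, p)
    print(f"p = {p}: two-boxing gains {advantage:+,.0f}")  # always +1,000
```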
Q: What if $p$ is a variable positively correlated with $x$?
If we believe the machine is extremely accurate, then my choice ($x$) effectively determines $p$: in the limit of a perfect predictor, $p = x$.
At that point, the game becomes a confrontation between the $1,000 you give up and the $1,000,000 you stand to gain. Substituting $p = x$ gives $E[V] = \$1{,}000 + \$999{,}000 \cdot x$, which is maximized at $x = 1$: take only Box B.
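To see how little accuracy the predictor actually needs, we can parameterize it. In the sketch below (again reusing `expected_payoff`), `q` is an assumed probability that the prediction matches your actual choice; the breakeven works out to $q = 0.5005$, so anything meaningfully better than a coin flip favors one-boxing:

```python
def edt_payoff(x: int, q: float) -> float:
    """Expected payoff when the prediction matches the choice with probability q."""
    p = q if x == 1 else 1 - q   # Box B is full iff the predictor foresaw one-boxing
    return expected_payoff(x, p)

q = 0.9
print(edt_payoff(1, q))  # one-box:  1,000,000 * 0.9         = 900,000.0
print(edt_payoff(0, q))  # two-box:  1,000 + 1,000,000 * 0.1 = 101,000.0
# Breakeven: 1,000,000*q = 1,000 + 1,000,000*(1-q)  =>  q = 0.5005
```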
Q: So what is this question really testing?
After thinking it through, I realized the problem contains two layers of ambiguity:
- At the intuitive level, if your intuition is driven by the lure of the $1,000,000, you choose one box; if your intuition is driven by loss aversion and temporal logic, you choose both boxes.
- After fully understanding the setup, if you insist that "correlation does not imply causation," you choose both boxes; if you think "correlation can point to causation," you choose one box.
For decades, these two intuitions were roughly evenly matched in the population. But now the balance has shifted.
Q: Why do recent polls show an overwhelming majority choosing one box?
This is very different from the earlier 50/50 split. The deeper reason is that the technological background of our era has reshaped human intuition at a subconscious level.
The setup describes the predictor using the language of a "supercomputer," and that naturally makes people think of LLMs. An LLM is the textbook example of a model that predicts behavior purely from statistical distributions, and Hugging Face even literally names that model family CausalLM.
So the problem statement now carries an implicit suggestion that "correlation can indicate causation," which nudges more people toward one-boxing. In earlier surveys, by contrast, very few people believed such a "supercomputer" could really exist, so answers stayed closer to an even split.
How the Times Reshape Intuition
Newcomb's Paradox was proposed in 1969, an era when personal computers did not yet exist and computers were still room-sized punch-card machines. Back then, "a supercomputer that can perfectly predict human behavior" sounded like theology or magic.
- People in the past, especially philosophers: they saw that kind of prediction as absurd, so the mind automatically retreated to the strongest defensive line available, namely classical causality and the arrow of time. Yesterday cannot be changed by today, so they firmly chose two boxes.
- People like us today: we are "mind-read" by recommendation algorithms every day, and we watch AI predict and even manipulate human behavior through probability distributions. Forget LLMs for a moment: what software product today does not ship a recommendation algorithm? People in the past had no such mental model; anyone who could do this would have seemed like a mystic. Now people even use DeepSeek for fortune-telling, and large models are starting to take over the old mystic's job. We already have a felt sense that a machine can see through us. The predictor no longer feels theological; it feels like an engineering reality already on the horizon. So we are more willing to loosen our attachment to strict temporal causality, embrace evidential decision theory, and choose one box.