Subjective implication decision theory in critical agentialism

This is a follow-up to a previous post on critical agentialism, to explore the straightforward decision-theoretic consequences. I call this subjective implication decision theory, since the agent is looking at the logical implications of their decision according to their beliefs.

We already covered observable action-consequences. Since these are falsifiable, they have clear semantics in the ontology. So we will in general assume observable rewards, as in reinforcement learning, while leaving unobservable goals for later work.

Now let’s look at a sequence of decision theory problems. We will assume, as before, the existence of some agent that falsifiably believes itself to run on at least one computer, C.

5 and 10

Assume the agent is before a table containing a 5 dollar bill and a 10 dollar bill. The agent will decide which dollar bill to take. Thereafter, the agent will receive a reward signal: 5 if the 5 dollar bill is taken, and 10 if the 10 dollar bill is taken.

The agent may have the following beliefs about action-consequences: “If I take action 5, then I will get 5 reward. If I take action 10, then I will get 10 reward.” These beliefs follow directly from the problem description. Notably, the beliefs include beliefs about actions that might not actually be taken; it is enough that these actions are possible for their consequences to be falsifiable.

Now, how do we translate these beliefs about action-consequences into decisions? The most straightforward way to do so is to select the policy that is believed to return the most reward. (This method is ambiguous under conditions of partial knowledge, though that is not a problem for 5 and 10).

This method (which I will call “subjective implication decision theory”) yields the action 10 in this case.
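A minimal sketch of this selection rule, assuming the beliefs can be written as a table from each possible action to its believed reward. The names `beliefs` and `sidt_choose` are illustrative and do not come from any library or from the previous post.

```python
# Believed action-consequences for 5 and 10:
# "If I take action a, then I will get beliefs[a] reward."
beliefs = {"take_5": 5, "take_10": 10}


def sidt_choose(believed_reward):
    """Return the action believed to yield the most reward.

    Assumes every possible action has exactly one believed reward;
    the partial-knowledge case mentioned above is not handled.
    """
    return max(believed_reward, key=believed_reward.get)


choice = sidt_choose(beliefs)
print(choice)  # take_10
```

The table includes an entry for the action that will not be taken; the rule needs both entries to compare them.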

This is all extremely straightforward. We directly translated the problem description into a set of beliefs about action-consequences. And these beliefs, along with the rule of subjective implication decision theory, yield an optimal action.

The difficulty of 5 and 10 comes when the problem is naturalized. The devil is in the details: how to naturalize the problem? The previous post examined a case of both external and internal physics, compatible with free will. There is no obvious obstacle to translating these physical beliefs to the 5 and 10 case: the dollar bills may be hypothesized to follow physical laws, as may the computer C.

Realistically, the agent should assume that the proximate cause of the selection of the dollar bill is not their action, but C’s action. Recall that the agent falsifiably believes it runs on C, in the sense that its observations/actions necessarily equal C’s.

Now, “I run on C” implies in particular: “If I select ‘pick up the 5 dollar bill’ at time t, then C does. If I select ‘pick up the 10 dollar bill’ at time t, then C does.” And the assumption that C controls the dollar bill implies: “If C selects ‘pick up the 5 dollar bill’ at time t, then the 5 dollar bill will be held at some time between t and t+k”, and likewise for the 10 dollar bill (for some k that is an upper bound on the time it takes for the dollar bill to be picked up). Together, these beliefs imply: “If I select ‘pick up the 5 dollar bill’ at time t, then the 5 dollar bill will be held at some time between t and t+k”, and likewise for the 10 dollar bill. At this point, the agent’s beliefs include ones quite similar to those in the non-naturalized case, and so subjective implication decision theory selects the 10 dollar bill.
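The chain of implications can be sketched as a composition of lookup tables. This is only an illustration under simplifying assumptions: time indices are dropped, and the table names and the `compose` helper are hypothetical.

```python
# Belief 1 ("I run on C"): my selection equals C's selection.
i_run_on_c = {"pick_5": "C_picks_5", "pick_10": "C_picks_10"}
# Belief 2 (C controls the bill): C's selection leads to that bill being held.
c_controls_bill = {"C_picks_5": "hold_5", "C_picks_10": "hold_10"}
# Belief 3 (reward signal): the bill held determines the reward.
reward_of = {"hold_5": 5, "hold_10": 10}


def compose(*implications):
    """Chain implication tables: follow each antecedent through every table in order."""
    first, *rest = implications
    chained = {}
    for antecedent, consequent in first.items():
        for table in rest:
            consequent = table[consequent]
        chained[antecedent] = consequent
    return chained


# Derived belief: "If I select X, then I will get this reward."
derived = compose(i_run_on_c, c_controls_bill, reward_of)
best = max(derived, key=derived.get)
print(derived, best)
```

The derived table has the same shape as the beliefs in the non-naturalized case, which is why the same selection rule applies.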

Twin prisoner’s dilemma

Consider an agent that believes itself to run on computer C. It also believes there is another computer, C’, which has identical initial state and dynamics to C.

Each computer will output an action. The agent will receive 10 reward if C’ cooperates (and 0 if C’ defects), plus 1 reward if C defects.

As in 5 and 10, the agent believes: “If I cooperate, C cooperates. If I defect, C defects.” However, this does not specify the behavior of C’ as a function of the agent’s action.

It can be noted at this point that, because the agent believes C’ has identical initial state and dynamics to C, the agent believes (falsifiably) that C’ must output the same actions as C on each time step, as long as C and C’ receive identical observations. Since, in this setup, observations are assumed to be equal until C receives the reward (with C’ perhaps receiving a different reward), these beliefs imply: “If I cooperate, C’ cooperates. If I defect, C’ defects”.

In total we now have: “If I cooperate, C and C’ both cooperate. If I defect, C and C’ both defect”. Thus the agent believes itself to be straightforwardly choosing between a total reward of 10 for cooperation, and a total of 1 reward for defection. And so subjective implication decision theory cooperates.

Note that this comes apart from the conventional interpretation of CDT, which considers interventions on C’s action, rather than on “my action”. CDT’s hypothesized intervention updates C but not C’, as C and C’ are physically distinct.
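The contrast can be sketched as follows, assuming the payoff rule stated above. The function names are illustrative, and `believed_reward` encodes only the agent's subjective implications, not a full model of either decision theory.

```python
COOPERATE, DEFECT = "cooperate", "defect"


def reward(c_action, c_prime_action):
    """Reward to the agent: 10 if C' cooperates, plus 1 if C defects."""
    return (10 if c_prime_action == COOPERATE else 0) + (1 if c_action == DEFECT else 0)


def believed_reward(my_action):
    """Reward implied by the agent's beliefs about its own action."""
    c_action = my_action        # "I run on C": C's action equals mine.
    c_prime_action = c_action   # C' has identical state, dynamics, and observations.
    return reward(c_action, c_prime_action)


beliefs = {a: believed_reward(a) for a in (COOPERATE, DEFECT)}
choice = max(beliefs, key=beliefs.get)
print(beliefs, choice)

# Intervening on C's action alone, with C' held fixed, makes defection
# look better by 1 whichever action C' takes.
defect_dominates = all(
    reward(DEFECT, fixed) > reward(COOPERATE, fixed) for fixed in (COOPERATE, DEFECT)
)
print(defect_dominates)
```

The two calculations use the same payoff function; they differ only in whether C’ is treated as varying with the action under consideration.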

Newcomb’s problem

This is very similar to the twin prisoner’s dilemma. The agent may falsifiably believe: “The Predictor filled box A with $1,000,000 if and only if I will choose only box A.” From here it is straightforward to derive that the agent believes: “If I choose to take only box A, then I will have $1,000,000. If I choose to take both boxes, then I will have $1,000.” Hence subjective implication decision theory selects only box A.

The usual dominance argument for selecting both boxes does not apply. The agent is not considering interventions on C’s action, but rather on “my action”, which is falsifiably predicted to be identical with C’s action.
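A minimal sketch of the derivation, assuming the standard setup in which the second box (called box B here) always holds $1,000. The function names are illustrative.

```python
def box_a_contents(predicted_choice):
    """The Predictor fills box A iff it predicts that only box A is taken."""
    return 1_000_000 if predicted_choice == "one_box" else 0


def believed_payout(my_choice):
    """Payout implied by the belief that the prediction equals my choice."""
    prediction = my_choice      # believed falsifiably, not derived causally
    box_a = box_a_contents(prediction)
    box_b = 1_000               # assumption: box B always holds $1,000
    return box_a if my_choice == "one_box" else box_a + box_b


beliefs = {c: believed_payout(c) for c in ("one_box", "two_box")}
choice = max(beliefs, key=beliefs.get)
print(beliefs, choice)
```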

Counterfactual mugging

In this problem, a Predictor flips a coin; if the coin is heads, the Predictor asks the agent for $10 (and the agent may or may not give it); if the coin is tails, the Predictor gives the agent $1,000,000 iff the Predictor predicts the agent would have given $10 in the heads case.

We run into a problem with translating this to a critical agential ontology. Since both branches don’t happen in the same world, it is not possible to state the Predictor’s accuracy as a falsifiable statement, as it relates two incompatible branches.

To avoid this problem, we will say that the Predictor predicts the agent’s behavior ahead of time, before flipping the coin. This prediction is not told to the agent in the heads case.

Now, the agent falsifiably believes the following:

  • If the coin is heads, then the Predictor’s prediction is equal to my choice.
  • If the coin is tails, then I get $1,000,000 if the Predictor’s prediction is that I’d give $10, otherwise $0.
  • If the coin is heads, then I get $0 if I don’t give the predictor $10, and -$10 if I do give the predictor $10.

From the last point, it is possible to show that, after the agent observes heads, the agent believes they get $0 if they don’t give $10, and -$10 if they do give $10. So subjective implication decision theory doesn’t pay.

This may present a dynamic inconsistency, in that the agent’s decision does not agree with what they would previously have wished they would decide. Let us examine this.

In a case where the agent chooses their action before the coin flip, the agent believes that, if they will pay up, the Predictor will predict this, and likewise for not paying up. Therefore, the agent believes they will get $1,000,000 if they decide to pay up and then the coin comes up tails.

If the agent weights the heads/tails branches evenly, then the agent will decide to pay up. This presents a dynamic inconsistency.
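The two evaluations can be compared in a short sketch. It assumes the even weighting of branches mentioned above and treats dollars as reward directly; the function names are illustrative.

```python
def payoff(coin, gives, prediction_gives):
    """Dollar outcome in one branch of the problem."""
    if coin == "heads":
        return -10 if gives else 0
    return 1_000_000 if prediction_gives else 0


def ex_post_heads(gives):
    """Believed outcome after heads has been observed: only the heads branch counts."""
    return payoff("heads", gives, prediction_gives=gives)


def ex_ante(gives, p_heads=0.5):
    """Believed outcome when choosing before the flip.

    The prediction is believed to equal the eventual choice in both branches.
    """
    return (p_heads * payoff("heads", gives, gives)
            + (1 - p_heads) * payoff("tails", gives, gives))


print(ex_post_heads(True), ex_post_heads(False))  # paying looks worse after heads
print(ex_ante(True), ex_ante(False))              # paying looks better before the flip
```

The same payoff table yields opposite rankings depending on when the choice is evaluated, which is the inconsistency in question.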

My sense is that this inconsistency should be resolved by considering theories of identity other than closed individualism. That is, it seems possible that the abstraction of receiving an observation and taking an action on each time step, while having a linear lifetime, is not a good-enough fit for the counterfactual mugging problem to achieve dynamic consistency.


Conclusion

It seems that subjective implication decision theory agrees with timeless decision theory and evidential decision theory on the problems considered, while diverging from causal decision theory and functional decision theory.

I consider this a major advance, in that the ontology is more cleanly defined than the ontology of timeless decision theory, which considers interventions on logical facts. It is not at all clear what it means to “intervene on a logical fact”; the ontology of logic does not natively contain the affordance of intervention. The motivation for considering logical interventions was the belief that the agent is identical with some computation, such that its actions are logical facts. Critical agential ontology, on the other hand, does not say the agent is identical with any computation, but rather that the agent effectively runs on some computer (which implements some computation), while still being metaphysically distinct. Thus, we need not consider “logical counterfactuals” directly; rather, we consider subjective implications, and consider whether these subjective implications are consistent with the agent effectively running on some computer.

To handle cases such as counterfactual mugging in a dynamically consistent way (similar to functional decision theory), I believe that it will be necessary to consider agents outside the closed-individualist paradigm, in which one is assumed to have a linear lifetime with memory and observations/actions on each time step. However, I have not yet explored this direction.

[ED NOTE: After the time of writing I realized subjective implication decision theory, being very similar to proof-based UDT, has problems with spurious counterfactuals by default, but can similarly avoid these problems by “playing chicken with the universe”, i.e. taking some action it has proven it will not take.]

A critical agential account of free will, causation, and physics

This is an account of free choice in a physical universe. It is very much relevant to decision theory and philosophy of science. It is largely metaphysical, in terms of taking certain things to be basically real and examining what can be defined in terms of these things.

The starting point of this account is critical and agential. By agential, I mean that the ontology I am using is from the point of view of an agent: a perspective that can, at the very least, receive observations, have cognitions, and take actions. By critical, I mean that this ontology involves uncertain conjectures subject to criticism, such as criticism of being logically incoherent or incompatible with observations. This is very much in a similar spirit to critical rationalism.

Close attention will be paid to falsifiability and refutation, principally for ontological purposes, and secondarily for epistemic purposes. Falsification conditions specify the meanings of laws and entities relative to the perspective of some potentially falsifying agent. While the agent may believe in unfalsifiable entities, falsification conditions will serve to precisely pin down that which can be precisely pinned down.

I have only seen “agential” used in the philosophical literature in the context of agential realism, a view I do not understand well enough to comment on. I was tempted to use “subjective”; however, while subjects have observations, they do not necessarily have the ability to take actions. Thus I believe “agential” has a more concordant denotation.

You’ll note that my notion of “agent” already assumes one can take actions. Thus, a kind of free will is taken as metaphysically basic. This presupposition may cause problems later. However, I will try to show that, if careful attention is paid, the obvious problems (such as contradiction with determinism) can be avoided.

The perspective in this post can be seen as starting from agency, defining consequences in terms of agency, and defining physics in terms of consequences. In contrast, the most salient competing decision theory views (including framings of CDT, EDT, and FDT) define agency in terms of consequences (“expected utility maximization”), and consequences in terms of physics (“counterfactuals”). So I am rebasing the ontological stack, turning it upside-down. This is less absurd than it first appears, as will become clear.

(For simplicity, assume observations and actions are both symbols taken from some finite alphabet.)

Naive determinism

Let’s first, within a critical agential ontology, disprove some very basic forms of determinism.

Let A be some action. Consider the statement: “I will take action A”. An agent believing this statement may falsify it by taking any action B not equal to A. Therefore, this statement does not hold as a law. It may be falsified at will.

Let f() be some computable function returning an action. Consider the statement: “I will take action f()”. An agent believing this statement may falsify it by taking an action B not equal to f(). Note that, since the agent is assumed to be able to compute things, f() may be determined. So, indeed, this statement does not hold as a law, either.
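A minimal sketch of this argument, assuming an alphabet with at least two actions. The predictor `f` below is a hypothetical stand-in for any computable function returning an action.

```python
ACTIONS = ("A", "B")


def f():
    """A hypothetical computable predictor of the agent's next action."""
    return "A"


def contrarian_agent(predictor, actions=ACTIONS):
    """Compute the predicted action, then take some other action."""
    predicted = predictor()
    return next(a for a in actions if a != predicted)


taken = contrarian_agent(f)
falsified = taken != f()
print(taken, falsified)  # B True
```

The sketch relies on `f()` terminating; a predictor that the agent cannot finish computing before acting is a different case, taken up later in the discussion of paradoxes.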

This contradicts a certain strong formulation of naive determinism: the idea that one’s action is necessarily determined by some known, computable function.


Action-consequences

But wait, what about physics? To evaluate what physical determinism even means, we need to translate physics into a critical agential ontology. However, before we turn to physics, we will first consider action-consequences, which are easier to reason about.

Consider the statement: “If I take action A, I will immediately thereafter observe O.” This statement is falsifiable, which means that if it is false, there is some policy the agent can adopt that will falsify it. Specifically, the agent may adopt the policy of taking action A. If the agent will, in fact, not observe O after taking this action, then the agent will learn this, falsifying the statement. So the statement is falsifiable.

Finite conjunctions of falsifiable statements are themselves falsifiable. Therefore, the conjunction “If I take action A, I will immediately thereafter observe O; if I take action B, I will immediately thereafter observe P” is, likewise, falsifiable.
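As a sketch, a conjunction of action-consequence beliefs can be written as a table, and falsifiability amounts to the existence of an action whose observed result contradicts its entry. The `environment` function is a hypothetical stand-in for the world, and the search over actions is performed from outside the agent; within a single time step the agent would test just one action.

```python
def find_falsifier(beliefs, environment):
    """Return an action whose actual observation contradicts the belief, or None.

    beliefs maps each action to the observation predicted to follow it.
    environment(action) returns the observation that actually follows.
    """
    for action, predicted in beliefs.items():
        if environment(action) != predicted:
            return action
    return None


beliefs = {"A": "O", "B": "P"}

# Hypothetical world in which the belief about B is false.
world = {"A": "O", "B": "Q"}.get
print(find_falsifier(beliefs, world))  # B

# Hypothetical world in which both beliefs hold: no falsifying policy exists.
true_world = {"A": "O", "B": "P"}.get
print(find_falsifier(beliefs, true_world))  # None
```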

Thus, the agent may have falsifiable beliefs about observable consequences of actions. This is a possible starting point for decision theory: actions having consequences is already assumed in the ontology of VNM utility theory.

Falsification and causation

Now, the next step is to account for physics. Luckily, the falsificationist paradigm was designed around demarcating scientific hypotheses, such that it naturally describes physics.

Interestingly, falsificationism takes agency (in terms of observations, computation, and action) as more basic than physics. For a thing to be falsifiable, it must be able to be falsified by some agent, seeing some observation. And the word able implies freedom.

Let’s start with some basic Popperian logic. Let f be some testable function (say, connected to a computer terminal) taking in a natural number and returning a Boolean. Consider the hypothesis: “For all x, f(x) is true”. This statement is falsifiable: if it’s false, then there exists some action-sequence an agent can take (typing x into the terminal, one digit at a time) that will prove it to be false.

The given hypothesis is a kind of scientific law. It specifies a regularity in the environment.
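A sketch of an attempted falsification of such a law, assuming the agent can only test finitely many inputs. The test bound and the example functions are hypothetical.

```python
def try_to_falsify(f, max_x):
    """Test f on 0..max_x and return the first counterexample, or None.

    A return value of None means only that no refutation was found
    within the tested range; it does not verify the universal claim.
    """
    for x in range(max_x + 1):
        if not f(x):
            return x
    return None


def fails_at_seven(x):
    """A hypothetical testable function that violates the law at x = 7."""
    return x != 7


print(try_to_falsify(fails_at_seven, 100))  # 7
print(try_to_falsify(lambda x: True, 100))  # None
```

The asymmetry is the usual Popperian one: a single counterexample refutes the law, while no finite run of successes establishes it.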

Note that there is a “bridge condition” at play here. That bridge condition is that the function f is, indeed, connected to the terminal, such that the agent’s observations of f are trustworthy. In a sense, the bridge condition specifies what f is, from the agent’s perspective; it allows the agent to locate f as opposed to some other function.

Let us now consider causal hypotheses. We already considered action-consequences. Now let us extend this analysis to reasoning about causation between external entities.

Consider the hypothesis: “If the match is struck, then it will alight immediately”. This hypothesis is falsifiable by an agent who is able to strike the match. If the hypothesis is false, then the agent may refute it by choosing to strike the match and then seeing the result. However, an agent who is unable to strike the match cannot falsify it. (Of course, this assumes the agent may see whether the match is alight after striking it.)

Thus, we are defining causality in terms of agency. The falsification conditions for a causal hypothesis refer to the agent’s abilities. This seems somewhat wonky at first, but it is quite similar to Pearlian causality, which defines causation in terms of metaphysically-real interventions. This order of definition radically reframes the apparent paradox of determinism vs. free will, by defining the conditions of determinism (causality) in terms of potential action.

External physics

Let us now continue, proceeding to more universal physics. Consider the law of gravity, according to which a dropped object will accelerate downward at a near-constant rate. How might we port this law into an agential ontology?

Here is the assumption about how the agent interacts with gravity. The agent will choose some natural number as the height of an object. Thereafter, the object will fall, while a camera will record the height of the object at each natural-number time expressed in milliseconds, to the nearest natural-number millimeter from the ground. The agent may observe a printout of the camera data afterwards.

Logically, constant gravity implies, and is implied by, a particular quadratic formula for the height of the object as a function of the object’s starting height and the amount of time that has passed. This formula implies the content of the printout, as a function of the chosen height. So, the agent may falsify constant gravity (in the observable domain) by choosing an object-height, placing an object at that height, letting it fall, and checking the printout, which will show the law of constant gravity to be false, if the law in fact does not hold for objects dropped at that height (to the observed level of precision).
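A sketch of this falsification test, assuming the object is released from rest, g = 9.81 m/s^2 (a value not given in the text), no air resistance, and a camera that reports 0 once the object has reached the ground. The quadratic formula used is h(t) = h0 - g t^2 / 2, and the function names are illustrative.

```python
G_MM_PER_MS2 = 9.81e-3  # assumed: 9.81 m/s^2 expressed in millimeters per millisecond squared


def predicted_height_mm(h0_mm, t_ms):
    """Predicted camera reading: h0 - g*t^2/2, rounded to the nearest mm, floored at 0."""
    height = h0_mm - 0.5 * G_MM_PER_MS2 * t_ms ** 2
    return max(0, round(height))


def predicted_printout(h0_mm, n_samples):
    """Predicted readings at t = 0, 1, ..., n_samples - 1 milliseconds."""
    return [predicted_height_mm(h0_mm, t) for t in range(n_samples)]


def is_falsified(h0_mm, observed_printout):
    """True if any observed reading differs from the reading the law predicts."""
    return observed_printout != predicted_printout(h0_mm, len(observed_printout))


print(predicted_height_mm(1000, 100))  # 951
```

The agent's free choice enters through `h0_mm`: the law is falsifiable in this domain because, if it fails for some starting height, the agent can choose that height and compare the printout.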

Universal constant gravity is not similarly falsifiable by this agent, because this agent may only observe this given experimental setup. However, a domain-limited law, stating that the law of constant gravity holds for all possible object-heights in this setup, up to the camera’s precision, is falsifiable.

It may seem that I am being incredibly pedantic about what a physical law is and what the falsification conditions are; however, I believe this level of pedantry is necessary for critically examining the notion of physical determinism to a high-enough level of rigor to check interaction with free will.

Internal physics

We have, so far, considered the case of an agent falsifying a physical law that applies to an external object. To check interaction with free will, we must interpret physical law applied to the agent’s internals, on which the agent’s cognition is, perhaps, running in a manner similar to software.

Let’s consider the notion that the agent itself is “running on” some Turing machine. We will need to specify precisely what such “running on” means.

Let C be the computer that the agent is considering whether it is running on. C has, at each time, a tape-state, a Turing machine state, an input, and an output. The input is attached to a sensor (such as a camera), and the output is attached to an actuator (such as a motor).

For simplicity, let us say that the history of tapes, states, inputs, and outputs is saved, such that it can be queried at a later time.

We may consider the hypothesis that C, indeed, implements the correct dynamics for a given Turing machine specification. These dynamics imply a relation between future states and past states. An agent may falsify these dynamics by checking the history and seeing if the dynamics hold.
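A sketch of such a history check, under a simplification: the tape and the machine state are folded into a single `state` value, and the specification is given as a `step` function. Both the record layout and the parity machine are hypothetical examples.

```python
def first_violation(history, step):
    """Return the index of the first record contradicting the dynamics, or None.

    history is a time-ordered list of (state, inp, out) records.
    step(state, inp) returns (next_state, out) according to the specification.
    """
    for i, (state, inp, out) in enumerate(history):
        next_state, expected_out = step(state, inp)
        if out != expected_out:
            return i
        if i + 1 < len(history) and history[i + 1][0] != next_state:
            return i
    return None


def parity_step(state, inp):
    """Hypothetical specification: track and output the parity of the inputs so far."""
    nxt = state ^ inp
    return nxt, nxt


good_history = [(0, 1, 1), (1, 1, 0), (0, 0, 0)]
bad_history = [(0, 1, 1), (1, 1, 1), (0, 0, 0)]
print(first_violation(good_history, parity_step))  # None
print(first_violation(bad_history, parity_step))   # 1
```

Only states that actually appear in the history are checked, which matches the point about unreachable states made next.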

Note that, because some states or tapes may be unreachable, it is not possible to falsify the hypothesis that C implements correct dynamics starting from unreachable states. Rather, only behavior following from reachable states may be checked.

Now, let us think on an agent considering whether they “run on” this computer C. The agent may be assumed to be able to query the history of C, such that it may itself falsify the hypothesis that C implements Turing machine specification M, and other C-related hypotheses as well.

Now, we can already name some ways that “I run on C” may be falsified:

  • Perhaps there is a policy I may adopt, and a time t, such that if I implement this policy, I will observe O at time t, but C will observe something other than O at time t.
  • Perhaps there is a policy I may adopt, and a time t, such that if I implement this policy, I will take action A at time t, but C will take an action other than A at time t.

The agent may prove these falsification conditions by adopting a given policy until some time t, and then observing C’s observation/action at time t, compared to their own observation/action.
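The comparison can be sketched as a search for the first time step at which the two records differ. The trace format is an assumption made for illustration: each trace is a list of (observation, action) pairs indexed by time.

```python
def first_divergence(my_trace, c_trace):
    """Return the first time t at which my observation/action differs from C's, or None."""
    for t, (mine, cs) in enumerate(zip(my_trace, c_trace)):
        if mine != cs:
            return t
    return None


my_trace = [("O1", "A"), ("O2", "B"), ("O3", "A")]
matching_c = [("O1", "A"), ("O2", "B"), ("O3", "A")]
diverging_c = [("O1", "A"), ("O2", "A"), ("O3", "A")]

print(first_divergence(my_trace, matching_c))   # None
print(first_divergence(my_trace, diverging_c))  # 1
```

A result of None for one policy does not confirm the hypothesis, since the falsification conditions quantify over every policy the agent might adopt.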

I do not argue that the converse of these conditions exhausts what it means that “I run on C”. However, they at least restrict the possibility space by a very large amount. For the falsification conditions given to not hold, the observations and behavior of C must be identical with the agent’s own observations and behavior, for all possible policies the agent may adopt.

I will name the hypothesis with the above falsification conditions: “I effectively run on C”. This conveys that these conditions may not be exhaustive, while still being quite specific, and relating to effects between the agent and the environment (observations and actions).

Note that the agent can hypothesize itself to effectively run on multiple computers! The conditions for effectively running on one computer do not contradict the conditions for effectively running on another computer. This naturally handles cases of identical physical instantiations of a single agent.

At this point, we have an account of an agent who:

  • Believes they have observations and take free actions
  • May falsifiably hypothesize physical law
  • May falsifiably hypothesize that some computer implements a Turing machine specification
  • May falsifiably hypothesize that they themselves effectively run on some computer

I have not yet shown that this account is consistent. There may be paradoxes. However, this at least represents the subject matter covered in a unified critical agential ontology.

Paradoxes sought and evaluated

Let us now seek out paradox. We showed before that the hypothesis “I take action f()” may be refuted at will, and therefore does not hold as a necessary law. We may suspect that “I effectively run on C” runs into similar problems.


Remember that, for the “I effectively run on C” hypothesis to be falsified, it must be falsified at some time, at which the agent’s observation/action comes apart from C’s. In the “I take action f()” case, we had the agent simulate f() in order to take an opposite action. However, C need not halt, so the agent cannot simulate C until halting. Instead, the agent may select some time t, and run C for t steps. But, by the time the agent has simulated C for t steps, the time is already past t, and so the agent may not contradict C’s behavior at time t, by taking an opposite action. Rather, the agent only knows what C does at time t at some time later than t, and only their behavior after this time may depend on this knowledge.

So, this paradox is avoided by the fact that the agent cannot contradict its own action before knowing it, but cannot know it before taking it.

We may also try to create a paradox by assuming an external super-fast computer runs a copy of C in parallel, and feeds this copy’s action on subjective time-step t into the original C’s observation before time t; this way, the agent may observe its action before it takes it. However, now the agent’s action is dependent on its observation, and so the external super-fast computer must decide which observation to feed into the parallel C. The external computer cannot know what C will do before producing this observation, and so this attempt at a paradox cannot stand without further elaboration.

We see, now, that if free will and determinism are compatible, it is due to limitations on the agent’s knowledge. The agent, knowing it runs on C, cannot thereby determine what action it takes at time t, until a later time. And the initial attempt to provide this knowledge externally fails.

Downward causation

Let us now consider a general criticism of functionalist views, which is that of downward causation: if a mental entity (such as observation or action) causes a physical entity, doesn’t that either mean that the mental entity is physical, or that physics is not causally closed?

Recall that we have defined causation in terms of the agent’s action possibilities. It is straightforwardly the case, then, that the agent’s action at time t causes changes in the environment.

But, what of the physical cause? Perhaps it is also the case that C’s action at time t causes changes in the environment. If so, there is a redundancy, in that the change in the environment is caused both by the agent’s action and by C’s action. We will examine this possible redundancy to find potential conflicts.

To consider ways that C’s action may change the environment, we must consider how the agent may intervene on C’s action. Let us say we are concerned with C’s action at time t. Then we may consider the agent at some time u < t taking an action that will cause C’s action at time t to be over-written. For example, the agent may consider programming an external circuit that will interact with C’s circuit (“its circuit”).

However, if the agent performs this intervention, then the agent’s action at time t has no influence on C’s action at time t. This is because C’s action is, necessarily, equal to the value chosen at time u. (Note that this lack of influence means that the agent does not effectively run on C, for the notion of “effectively run on” considered! However, the agent may be said to effectively run on C with one exception.)

So, there is no apparent way to set up a contradiction between these interventions. If the agent decides early (at time u) to determine C’s action at time t, then that decision causes C’s action at time t; if the agent does not do so, then the agent’s decision at time t causes C’s action at time t; and these are mutually exclusive. Hence, there is not an apparent problem with redundant causality.


Epiphenomenalism

It may be suspected that the agent I take to be real is epiphenomenal. Perhaps all may be explained in a physicalist ontology, with no need to posit that there exists an agent that has observations and takes actions. (This is a criticism levied at some views on consciousness; my notion of metaphysically-real observations is similar enough to consciousness that these criticisms are potentially applicable.)

The question in regards to explanatory power is: what is being explained, in terms of what? My answer is: observations are being explained, in terms of hypotheses that may be falsified by action/observations.

An eliminativist perspective denies the agent’s observations, and thus fails to explain what ought to be explained, in my view. However, eliminativists will typically believe that “scientific observation” is possible, and seek to explain scientific observations.

A relevant point to make here is that the notion of scientific observation assumes there is some scientific process happening that has observations. Indeed, the scientific method includes actions, such as testing, which rely on the scientific process taking actions. Thus, scientific processes may be considered as agents in the sense I am using the term.

My view is that erasing the agency of both individual scientists, and of scientific processes, puts the ontological and epistemic status of physics on shaky ground. It is hard to say why one should believe in physics, except in terms of it explaining observations, including experimental observations that require taking actions. And it is hard to say what it means for a physical hypothesis to be true, with no reference to how the hypothesis connects with observation and action.

In any case, the specter of epiphenomenalism presents no immediate paradox, and I believe that it does not succeed as a criticism.

Comparison to Gary Drescher’s view

I will now compare my account to Gary Drescher’s view, which I have found to be both particularly systematic and compelling, and quite similar to the views of other relevant philosophers such as Daniel Dennett and Eliezer Yudkowsky. This comparison will dispel the illusion that I am not saying anything new.

Notably, Drescher makes a similar observation to mine on Pearl: “Pearl’s formalism models free will rather than mechanical choice.”

Quoting section 5.3 of Good and Real:

Why did it take that action? In pursuit of what goal was the action selected? Was that goal achieved? Would the goal have been achieved if the machine had taken this other action instead? The system includes the assertion that if the agent were to do X, then Y would (probably) occur; is that assertion true? The system does not include the assertion that if it were to do P, Q would probably occur; is that omitted assertion true? Would the system have taken some other action just now if it had included that assertion? Would it then have better achieved its goals?

Insofar as such questions are meaningful and answerable, the agent makes choices in at least the sense that the correctness of its actions with respect to its designated goals is analyzable. That is to say, there can be means-end connections between its actions and its goals: its taking an action for the sake of a goal can make sense. And this is so despite the fact that everything that will happen, including every action taken and every goal achieved or not, is inalterably determined once the system starts up. Accordingly, I propose to call such an agent a choice machine.

Drescher is defining conditions of choice and agency in terms of whether the decisions “make sense” with respect to some goal, in terms of means-end connections. This is an “outside” view of agency in contrast with my “inside” view. That is, it says some thing is an agent when its actions connect with some goal, and when the internal logic of that thing takes into account this connection.

This is in contrast to my view, which takes agency to be metaphysically basic, and defines physical outside views (and indeed, physics itself) in terms of agency.

My view would disagree with Drescher’s on the “inalterably determined” assertion. In an earlier chapter, Drescher describes a deterministic block-universe view. This view-from-nowhere implies that future states are determinable from past states. In contrast, the view I present here rejects views-from-nowhere, instead taking the view of some agent in the universe, from whose perspective the future course is not already determined (as already argued in examinations of paradox).

Note that these disagreements are principally about metaphysics and ontology, rather than scientific predictions. I am unlikely to predict the results of scientific experiments differently from Drescher on account of this view, but am likely to account for the scientific process, causation, choice, and so on in different language, and using a different base model.

Conclusion and further research

I believe the view I have presented to be superior to competing views on multiple fronts, most especially logical/philosophical systematic coherence. I do not make the full case for this in this post, but take the first step, of explicating the basic ontology and how it accounts for phenomena that are critically necessary to account for.

An obvious next step is to tackle decision theory. Both Bayesianism and VNM decision theory are quite concordant with critical agential ontology, in that they propose coherence conditions on agents, which can be taken as criticisms. Naturalistic decision theory involves reconciling choice with physics, and so a view that already includes both is a promising starting point.

Multi-agent systems are quite important as well. The view presented so far is near-solipsistic, in that there is a single agent who conceptualizes the world. It will need to be defined what it means for there to be “other” agents. Additionally, “aggregative” agents, such as organizations, are important to study, including in terms of what it means for a singular agent to participate in an aggregative agent. “Standardized” agents, such as hypothetical skeptical mathematicians or philosophers, are also worthy subjects of study; these standardized agents are relevant in reasoning about argumentation and common knowledge. Also, while the discussion so far has been in terms of closed individualism, alternative identity views such as empty individualism and open individualism are worth considering from a critical agential perspective.

Other areas of study include naturalized epistemology and philosophy of mathematics. The view so far is primarily ontological, secondarily epistemological. With the ontology in place, epistemology can be more readily explored.

I hope to explore the consequences of this metaphysics further, in multiple directions. Even if I ultimately abandon it, it will have been useful to develop a coherent view leading to an illuminating refutation.

On the falsifiability of hypercomputation, part 2: finite input streams

In part 1, I discussed the falsifiability of hypercomputation in a typed setting where putative oracles may be assumed to return natural numbers. In this setting, there are very powerful forms of hypercomputation (at least as powerful as each level in the Arithmetic hierarchy) that are falsifiable.

However, as Vanessa Kosoy points out, this typed setting has difficulty applying to the real world, where agents may only observe a finite number of bits at once:

The problem with constructive halting oracles is, they assume the ability to output an arbitrary natural number. But, realistic agents can observe only a finite number of bits per unit of time. Therefore, there is no way to directly observe a constructive halting oracle. We can consider a realization of a constructive halting oracle in which the oracle outputs a natural number one digit at a time. The problem is, since you don’t know how long the number is, a candidate oracle might never stop producing digits. In particular, take any non-standard model of PA and consider an oracle that behaves accordingly. On some machines that don’t halt, such an oracle will claim they do halt, but when asked for the time it will produce an infinite stream of digits. There is no way to distinguish such an oracle from the real thing (without assuming axioms beyond PA).

This is an important objection. I will address it in this post by considering only oracles which return Booleans. In this setting, there is a form of hypercomputation that is falsifiable, although this hypercomputation is less powerful than a halting oracle.

Define a binary Turing machine to be a machine that outputs a Boolean (0 or 1) whenever it halts. Each binary Turing machine either halts and outputs 0, halts and outputs 1, or never halts.

Define an arbitration oracle to be a function that takes as input a specification of a binary Turing machine, and always outputs a Boolean in response. This oracle must always return 0 if the machine eventually outputs 0, and must always return 1 if the machine eventually outputs 1; it may decide arbitrarily if the machine never halts. Note that this can be emulated using a halting oracle, and is actually less powerful. (This definition is inspired by previous work on reflective oracles.)

The hypothesis that a putative arbitration oracle (with the correct type signature, MachineSpec → Boolean) really is one is falsifiable. Here is why:

  1. Suppose for some binary Turing machine M that halts and returns 1, the oracle O wrongly has O(M) = 0. Then this can be proven by exhibiting M along with the number of steps required for the machine to halt.
  2. Likewise if M halts and returns 0, and the oracle O wrongly has O(M) = 1.
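
To make the falsification procedure concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: a "binary machine" is modeled as a generator function in which each `yield` counts as one step and the return value is the output, and the names `run_for`, `is_falsification`, `halts_with_one`, `loops_forever`, and `always_zero` are hypothetical, not part of the original argument.

```python
from typing import Callable, Generator, Optional

# Assumed toy model: a machine is a zero-argument generator function.
# Each `yield` is one step; the generator's return value (0 or 1) is the output.
Machine = Callable[[], Generator[None, None, int]]
Oracle = Callable[[Machine], int]


def run_for(machine: Machine, max_steps: int) -> Optional[int]:
    """Run the machine for at most max_steps steps; return its output, or None."""
    gen = machine()
    for _ in range(max_steps + 1):
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value
    return None


def is_falsification(oracle: Oracle, machine: Machine, steps: int) -> bool:
    """Check a claimed certificate (machine, steps) against the oracle's answer."""
    output = run_for(machine, steps)
    # The certificate is valid only if the machine halts within `steps`
    # and the oracle's answer disagrees with the machine's actual output.
    return output is not None and oracle(machine) != output


def halts_with_one():
    for _ in range(5):
        yield
    return 1


def loops_forever():
    while True:
        yield


def always_zero(machine: Machine) -> int:
    # A putative oracle that answers 0 for every machine.
    return 0
```

Here `is_falsification(always_zero, halts_with_one, 5)` succeeds, because the machine halts with output 1 within 5 steps. No step count falsifies the answer on `loops_forever`, which matches the definition: the oracle may decide arbitrarily on machines that never halt.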

Since the property of some black-box being an arbitration oracle is falsifiable, we need only show at this point that there is no computable arbitration oracle. For this proof, assume (for the sake of contradiction) that O is a computable arbitration oracle.

Define a binary Turing machine N() := 1 – O(N). This definition requires quining, but this is acceptable for the usual reasons. Note that N always halts, as O always halts. Therefore we must have N() = O(N). However also N() = 1 – O(N), a contradiction (as O(N) is a Boolean).

Therefore, there is no computable arbitration oracle.
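
The diagonal argument can be sketched in Python. This is an illustration rather than a proof: machines are modeled as zero-argument Python functions, a closure that refers to itself stands in for quining, and the candidate oracles are hypothetical total functions that answer without running their input.

```python
from typing import Callable, Tuple

Machine = Callable[[], int]
Oracle = Callable[[Machine], int]


def diagonal_machine(oracle: Oracle) -> Machine:
    """Build N with N() = 1 - O(N); the self-reference plays the role of quining."""
    def n() -> int:
        return 1 - oracle(n)
    return n


def refute(oracle: Oracle) -> Tuple[int, int]:
    """Return (the oracle's answer about N, N's actual output)."""
    n = diagonal_machine(oracle)
    return oracle(n), n()


# Hypothetical candidate "computable arbitration oracles".
candidates = [
    lambda m: 0,
    lambda m: 1,
    lambda m: len(m.__name__) % 2,
]

# For each candidate, the claimed and actual outputs of N disagree.
results = [refute(o) for o in candidates]
```

For every total candidate, the pair returned by `refute` has unequal components, so the candidate is wrong about a machine that halts. A candidate that tried to answer by running its input would recurse forever on N, and so would fail the requirement that the oracle always returns.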

Higher hypercomputation?

At this point, it is established that there is a form of hypercomputation (specifically, arbitration oracles) that is falsifiable. But, is this universal? That is, is it possible that higher forms of hypercomputation are falsifiable in the same setting?

We can note that it’s possible to use an arbitration oracle to construct a model of PA, one statement at a time. To do this, first note that for any statement, it is possible to construct a binary Turing machine that returns 1 if the statement is provable, 0 if it is disprovable, and never halts if neither is the case. So we can iterate through all PA statements, and use an arbitration oracle to commit to that statement being true or false, on the basis of provability/disprovability given previous commitments, in a way that ensures that commitments are never contradictory (as long as PA itself is consistent). This is essentially the same construction idea as in the Demski prior over logical theories.

Suppose there were some PA-definable property P that a putative oracle O (mapping naturals to Booleans) must have (e.g. the property of being a halting oracle, for some encoding of Turing machines as naturals). Then, conditional on the PA-consistency of the existence of an oracle with property P, we can use the above procedure to construct a model of PA + existence of O satisfying P (i.e. a theory that says what PA says and also contains a function symbol O that axiomatically satisfies P). For any PA-definable statement about this oracle, this procedure will, at some finite time, have made a commitment about this statement.

So, access to an arbitration oracle allows emulating any other PA-definable oracle, in a way that will not be falsified by PA. It follows that hypercomputation past the level of arbitration oracles is not falsifiable by a PA-reasoner who can access the oracle, as PA cannot rule out that it is actually looking at something produced by only arbitration-oracle levels of hypercomputation.

Moreover, giving the falsifier access to an arbitration oracle can’t increase the range of oracles that are falsifiable. This is because, for any oracle-property P, we may consider a corresponding property on an oracle-pair (which may be represented by a single oracle-property through interleaving), stating that the first oracle is an arbitration oracle, and the second satisfies property P. This oracle pair property is falsifiable iff the property P is falsifiable by a falsifier with access to an arbitration oracle. This is because we may consider a joint search for falsifications, that simultaneously tries to prove the first oracle isn’t an arbitration oracle, and one that tries to prove that the second oracle doesn’t satisfy P assuming the first oracle is an arbitration oracle. Since the oracle pair property is PA-definable, it is emulable by a Turing machine with access to an arbitration oracle, and the pair property is unfalsifiable if it requires hypercomputation past arbitration oracle. But this implies that the original oracle property P is unfalsifiable by a falsifier with access to an arbitration oracle, if P requires hypercomputation past arbitration oracle.

So, arbitration oracles form a ceiling on what can be falsified unassisted, and also are unable to assist in falsifying higher levels of hypercomputation.


Given that arbitration oracles form a ceiling of computable falsifiability (in the setting considered here, which is distinct from the setting of the previous post), it may or may not be possible to define a logic that allows reasoning about levels of computation up to arbitration oracles, but which does not allow computation past arbitration oracles to be defined. Such a project could substantially clarify logical foundations for mathematics, computer science, and the empirical sciences.

On the falsifiability of hypercomputation

[ED NOTE: see Vanessa Kosoy’s comment here; this post assumes a setting in which the oracle may be assumed to return a standard natural.]

It is not immediately clear whether hypercomputers (i.e. objects that execute computations that Turing machines cannot) are even conceivable, hypothesizable, meaningful, clearly definable, and so on. They may be defined in the notation of Peano arithmetic or ZFC, however this does not imply conceivability/hypothesizability/etc. For example, a formalist mathematician may believe that the Continuum hypothesis does not have a meaningful truth value (as it is independent of ZFC), and likewise for some higher statements in the arithmetic hierarchy that are independent of Peano Arithmetic and/or ZFC.

A famous and useful criterion of scientific hypotheses, proposed by Karl Popper, is that they are falsifiable. Universal laws (of the form “∀x. p(x)”) are falsifiable for testable p, as they can be proven false by exhibiting some x such that p(x) is false. In an oracle-free computational setting, the falsifiable hypotheses are exactly those in ∏₁ (i.e. of the form “∀n. p(n)” for natural n and primitive recursive p).

However, ∏₁ hypotheses do not hypothesize hypercomputation; they hypothesize (computably checkable) universal laws of the naturals. To specify the falsifiability criterion for hypercomputers, we must introduce oracles.

Let a halting oracle be defined as a function O which maps Turing machine-specifications to Booleans, which outputs “true” on exactly those Turing machines which eventually halt. Then, we can ask: is the hypothesis that “O is a halting oracle” falsifiable?

We can immediately see that, if O ever outputs “false” for a Turing machine which does eventually halt, it is possible to exhibit a proof of this, by exhibiting both the Turing machine and the number of steps it takes to halt. On the other hand, if O ever outputs “true” for a Turing machine which never halts, it is not in general possible to prove this; to check such a proof in general would require solving the halting problem, which is uncomputable.

Therefore, the hypothesis that “O is a halting oracle” is not falsifiable in a computational setting with O as an oracle.

However, there is a different notion of halting oracle whose definition is falsifiable. Let a constructive halting oracle be defined as a function O which maps Turing machine-specifications to elements of the set {∅} ∪ ℕ (i.e. either a natural number or null), such that it returns ∅ on those Turing machines which never halt, and returns some natural on Turing machines that do halt, such that the machine halts by the number of steps given by that natural. This definition corresponds to the most natural definition of a halting oracle in Heyting arithmetic, a constructive variant of Peano Arithmetic.

We can see that:

  1. If there exists a machine M such that O(M) = ∅ and M halts, it is possible to prove that O is not a constructive halting oracle, by exhibiting M and the time step on which M halts.
  2. If there exists a machine M such that O(M) ≠ ∅ and M does not halt by O(M) time steps, it is possible to prove that O is not a constructive halting oracle, by exhibiting M.

Therefore, the hypothesis “O is a constructive halting oracle” is computably falsifiable.
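
The two cases can be sketched as a certificate checker in Python. This is a minimal sketch under assumptions: machines are modeled as generator functions (one `yield` per step), the oracle returns `None` in place of ∅, and the helper names `make_counter`, `halts_within`, and `refutes` are hypothetical.

```python
from typing import Callable, Generator, Optional

Machine = Callable[[], Generator[None, None, None]]
Oracle = Callable[[Machine], Optional[int]]


def make_counter(k: int) -> Machine:
    """A toy machine that halts after exactly k steps."""
    def machine():
        for _ in range(k):
            yield
    return machine


def loop_forever():
    while True:
        yield


def halts_within(machine: Machine, max_steps: int) -> bool:
    """True iff the machine halts after at most max_steps steps."""
    gen = machine()
    for _ in range(max_steps + 1):
        try:
            next(gen)
        except StopIteration:
            return True
    return False


def refutes(oracle: Oracle, machine: Machine,
            witness_steps: Optional[int] = None) -> bool:
    """Check whether `machine` (plus an optional step count) refutes the oracle."""
    answer = oracle(machine)
    if answer is None:
        # Case 1: the oracle says "never halts"; a halting time refutes it.
        return witness_steps is not None and halts_within(machine, witness_steps)
    # Case 2: the oracle says "halts by `answer` steps"; just run that long.
    return not halts_within(machine, answer)
```

In case 2 no extra certificate is needed: the oracle's own natural number bounds the check, which is what makes the constructive definition falsifiable where the Boolean one is not.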

What about higher-level constructive halting oracles, corresponding to Σₙ in the Heyting Arithmetic interpretation of the arithmetic hierarchy? The validity of a constructive Σₙ-oracle is, indeed, falsifiable for arbitrary n, as shown in the appendix.

Therefore, the hypothesis that some black-box is a (higher-level) constructive halting oracle is falsifiable, in an idealized computational setting. It is, then, meaningful to speak of some black-box being a hypercomputer or not, on an account of meaningfulness at least as expansive as the falsifiability criterion.

This provides a kind of bridge between empiricism and rationalism. While rationalism may reason directly about the logical implications of halting oracles, empiricism is more skeptical about the meaningfulness of the hypothesis. However, by the argument given, an empiricism that accepts the meaningfulness of falsifiable statements must accept the meaningfulness of the hypothesis that some black-box O is a constructive halting oracle.

I think this is a fairly powerful argument that hypercomputation should not be ruled out a-priori as “meaningless”, and should instead be considered a viable hypothesis a-priori, even if it is not likely given other evidence about physics, anthropics, etc.

Appendix: higher-level halting oracles

We will now reason directly about the Heyting arithmetic hierarchy rather than dealing with Turing machines for simplicity, though these are logically equivalent. Σₙ₊₁ propositions can be written as ∃x₁∀y₁…∃xₙ∀yₙ.f(x₁, y₁, …, xₙ, yₙ) for some primitive-recursive f. The converse of this proposition (which is in ∏ₙ₊₁) is of the form ∀x₁∃y₁…∀xₙ∃yₙ.¬f(x₁, y₁, …, xₙ, yₙ).

An oracle O constructively deciding Σₙ₊₁ is most naturally interpreted as a function from a specification of f to (∃x₁∀y₁…∃xₙ∀yₙ.f(x₁, y₁, …, xₙ, yₙ)) ∨ (∀x₁∃y₁…∀xₙ∃yₙ.¬f(x₁, y₁, …, xₙ, yₙ)); that is, it decides whether the Σₙ₊₁ proposition is true or its converse ∏ₙ₊₁ proposition is, and provides a witness either way.

What is a natural interpretation of the witness? A witness for Σₙ₊₁ maps y₁…yₙ to x₁…xₙ (and asserts f(x₁, y₁, …, xₙ, yₙ)), while a witness for ∏ₙ₊₁ maps x₁…xₙ to y₁…yₙ (and asserts ¬f(x₁, y₁, …, xₙ, yₙ)). (Note that the witness must satisfy regularity conditions, e.g. the x₁ returned by a Σₙ₊₁ witness must not depend on the witness’s input; we assume that even invalid witnesses still satisfy these regularity conditions, as it is easy to ensure they are satisfied by specifying the right witness type.)

Now, we can ask four questions:

  1. Fix f; suppose the Σₙ₊₁ proposition is true, and an invalid Σₙ₊₁-witness is returned; then, is it possible to prove the oracle false?
  2. Fix f; suppose the ∏ₙ₊₁ proposition is true, and an invalid ∏ₙ₊₁-witness is returned; then, is it possible to prove the oracle false?
  3. Fix f; suppose the Σₙ₊₁ proposition is true, and a ∏ₙ₊₁-witness is returned; then, is it possible to prove the oracle false?
  4. Fix f; suppose the ∏ₙ₊₁ proposition is true, and a Σₙ₊₁-witness is returned; then, is it possible to prove the oracle false?

These conditions are necessary and sufficient for O’s correctness to be falsifiable, because O will satisfy one of the above 4 conditions for some f iff it is invalid.

First let’s consider question 1. Since the witness (call it g) is invalid, it maps some y₁…yₙ to some x₁…xₙ such that ¬f(x₁, y₁, …, xₙ, yₙ). We may thus prove the witness’s invalidity by exhibiting y₁…yₙ. So the answer is yes, and similarly for question 2.

Now for question 3. Let the witness be g. Since the Σₙ₊₁ proposition is true, there is some x₁ for which ∀y₁…∃xₙ∀yₙ.f(x₁, y₁, …, xₙ, yₙ). Now, we may feed x₁ into the witness g to get a y₁ for which the oracle asserts ∀x₂∃y₂…∀xₙ∃yₙ.¬f(x₁, y₁, …, xₙ, yₙ). (Note, g’s returned y₁ must not depend on x’s after x₁, by regularity, so we may set the rest of the x’s to 0)

We proceed recursively, yielding x₁…xₙ and y₁…yₙ for which f(x₁, y₁, …, xₙ, yₙ), and for which the oracle asserts ¬f(x₁, y₁, …, xₙ, yₙ), hence proving the oracle invalid (through exhibiting these x’s and y’s). So we may answer question 3 with a “yes”.

For question 4, the proof proceeds similarly, except we start by getting x₁ from the witness. The answer is, then, also “yes”.

Therefore, O’s validity as a constructive Σₙ₊₁-oracle is falsifiable.
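
The recursive step for question 3 can be illustrated with a toy example in Python. The sketch makes assumptions beyond the text: it uses two quantifier pairs, a made-up predicate f(x₁, y₁, x₂, y₂) := x₂ > y₁, and it assumes a valid strategy for the true Σ proposition is in hand (the argument above only needs such x's to exist). Regularity is enforced by the types: each move sees only the earlier moves of the other side.

```python
from typing import Callable, List, Sequence, Tuple

Strategy = Callable[[Tuple[int, ...]], int]


def f(xs: Sequence[int], ys: Sequence[int]) -> bool:
    # Hypothetical predicate: f(x1, y1, x2, y2) := x2 > y1.
    return xs[1] > ys[0]


def sigma_strategy(ys_so_far: Tuple[int, ...]) -> int:
    # A valid strategy for the true Sigma proposition: x1 = 0, x2 = y1 + 1.
    return 0 if len(ys_so_far) == 0 else ys_so_far[0] + 1


def bogus_pi_witness(xs_so_far: Tuple[int, ...]) -> int:
    # The oracle's (invalid) Pi witness: y1 = x1 + 1, y2 = 0.
    return xs_so_far[0] + 1 if len(xs_so_far) == 1 else 0


def play(sigma: Strategy, pi: Strategy, n: int) -> Tuple[List[int], List[int]]:
    """Alternate moves: x_k depends on y_1..y_{k-1}; y_k depends on x_1..x_k."""
    xs: List[int] = []
    ys: List[int] = []
    for _ in range(n):
        xs.append(sigma(tuple(ys)))
        ys.append(pi(tuple(xs)))
    return xs, ys


xs, ys = play(sigma_strategy, bogus_pi_witness, 2)
oracle_refuted = f(xs, ys)  # the oracle asserted not-f at these values
```

The resulting x's and y's are exactly the finite certificate described above: f holds on them, while the oracle's witness commits it to ¬f on the same values.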

Philosophical self-ratification

“Ratification” is defined as “the act or process of ratifying something (such as a treaty or amendment) : formal confirmation or sanction”. Self-ratification, then, is assigning validity to one’s self. (My use of the term “self-ratification” follows philosophical usage in analysis of causal decision theory)

At first this seems like a trivial condition. It is, indeed, easy to write silly sentences such as “This sentence is true and also the sky is green”, which are self-ratifying. However, self-ratification combined with other ontological and epistemic coherence conditions is a much less trivial condition, which I believe to be quite important for philosophical theory-development and criticism.

I will walk through some examples.

Causal decision theory

Formal studies of causal decision theory run into a problem with self-ratification. Suppose some agent A is deciding between two actions, L and R. Suppose the agent may randomize their action, and that their payoff equals their believed probability that they take the action other than the one they actually take. (For example, if the agent takes action L with 40% probability and actually takes action R, the agent’s payoff is 0.4)

If the agent believes they will take action L with 30% probability, then, if they are a causal decision theorist, they will take action L with 100% probability, because that leads to 0.7 payoff instead of 0.3 payoff. But, if they do so, this invalidates their original belief that they will take action L with 30% probability. Thus, the agent’s belief that they will take action L with 30% probability is not self-ratifying: the fact of the agent having this belief leads to the conclusion that they take action L with 100% probability, not 30%, which contradicts the original belief.

The only self-ratifying belief is that the agent will take each action with 50% probability; this way, both actions yield equal expected utility, and so a policy of 50/50 randomization is compatible with causal decision theory, and this policy ratifies the original belief.
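
As a quick check of this fixed point, here is a small Python sketch. The function names are hypothetical, and exact fractions are used to avoid floating-point ties; a belief p (the believed probability of taking L) counts as self-ratifying when some payoff-maximizing policy takes L with probability exactly p.

```python
from fractions import Fraction
from typing import Tuple


def payoffs(p_left: Fraction) -> Tuple[Fraction, Fraction]:
    """Payoff of actually taking L or R, given the believed probability of L."""
    payoff_left = 1 - p_left   # believed probability of the other action, R
    payoff_right = p_left      # believed probability of the other action, L
    return payoff_left, payoff_right


def is_self_ratifying(p_left: Fraction) -> bool:
    payoff_left, payoff_right = payoffs(p_left)
    if payoff_left > payoff_right:
        return p_left == 1     # CDT takes L for sure
    if payoff_right > payoff_left:
        return p_left == 0     # CDT takes R for sure
    return True                # indifferent: any mixture is optimal


grid = [Fraction(i, 100) for i in range(101)]
fixed_points = [p for p in grid if is_self_ratifying(p)]
```

On this grid the only self-ratifying belief is 1/2. For instance, `payoffs(Fraction(3, 10))` gives 7/10 for L and 3/10 for R, so the 30% belief pushes the agent to take L with certainty and thereby undermines itself.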

Genetic optimism

(This example is due to Robin Hanson’s “Uncommon Priors Require Origin Disputes”.)

Suppose Oscar and Peter are brothers. Oscar is more optimistic than Peter. Oscar comes to believe that the reason he is more optimistic is due to inheriting a gene that inflates beliefs about positive outcomes, whereas Peter did not inherit this same gene.

Oscar’s belief-set is now not self-ratifying. He believes the cause of his belief that things will go well to be a random gene, not correlation with reality. This means that, according to his own beliefs, his optimism is untrustworthy.

Low-power psychological theories

Suppose a psychological researcher, Beth, believes that humans are reinforcement-learning stimulus-response machines, and that such machines are incapable of reasoning about representations of the world. She presents a logical specification of stimulus-response machines that she believes applies to all humans. (For similar real-world theories, see: Behaviorism, Associationism, Perceptual Control Theory)

However, a logical implication of Beth’s beliefs is that she herself is a stimulus-response machine, and incapable of reasoning about world-representations. Thus, she cannot consistently believe that her specification of stimulus-response machines is likely to be an accurate, logically coherent representation of humans. Her belief-set, then, fails to self-ratify, on the basis that it assigns to herself a level of cognitive power insufficient to come to know that her belief-set is true.

Moral realism and value drift

Suppose a moral theorist, Valerie, believes:

  • Societies’ moral beliefs across history follow a random walk, not directed anywhere.
  • Her own moral beliefs, for the most part, follow society’s beliefs.
  • There is a true morality which is stable and unchanging.
  • Almost all historical societies’ moral beliefs are terribly, terribly false.

From these it follows that, absent further evidence, the moral beliefs of Valerie’s society should not be expected to be more accurate (according to estimation of the objective morality that Valerie believes exists) than the average moral beliefs across historical societies, since there is no moral progress in expectation. However, this implies that the moral beliefs of her own society are likely to be terribly, terribly false. Therefore, Valerie’s adoption of her society’s beliefs would imply that her own moral beliefs are likely to be terribly, terribly false: a failure of self-ratification.

Trust without honesty

Suppose Larry is a blogger who reads other blogs. Suppose Larry believes:

  • The things he reads in other blogs are, for the most part, true (~90% likely to be correct).
  • He’s pretty much the same as other bloggers; there is a great degree of subjunctive dependence between his own behavior and other bloggers’ behaviors (including their past behaviors).

Due to the first belief, he concludes that lying in his own blog is fine, as there’s enough honesty out there that some additional lies won’t pose a large problem. So he starts believing that he will lie and therefore his own blog will contain mostly falsehoods (~90%).

However, an implication of his similarity to other bloggers is that other bloggers will reason similarly, and lie in their own blog posts. Since this applies to past behavior as well, a further implication is that the things he reads in other blogs are, for the most part, false. Thus the belief-set, and his argument for lying, fail to self-ratify.

(I presented a similar example in “Is Requires Ought”.)

Mental nonrealism

Suppose Phyllis believes that the physical world exists, but that minds don’t exist. That is, there are not entities that are capable of observation, thought, etc. (This is a rather simple, naive formulation of eliminative materialism)

Her reason for this belief is that she has studied physics, and believes that physics is sufficient to explain everything, such that there is no reason to additionally posit the existence of minds.

However, if she were arguing for the accuracy of her beliefs about physics, she would have difficulty arguing except in terms of e.g. physicists making and communicating observations, theorists having logical thoughts, her reading and understanding physics books, etc.

Thus, her belief that minds don’t exist fails to self-ratify. It would imply that she lacks evidential basis for belief in the accuracy of physics. (On the other hand, she may be able to make up for this by coming up with a non-mentalistic account for how physics can come to be “known”, though this is difficult, as it is not clear what there is that could possibly have knowledge. Additionally, she could believe that minds exist but are somehow “not fundamental”, in that they are determined by physics; however, specifying how they are determined by physics requires assuming they exist at all and have properties in the first place.)


I hope the basic picture is clear by now. Agents have beliefs, and some of these beliefs imply beliefs about the trustworthiness of their own beliefs, primarily due to the historical origins of the beliefs (e.g. psychology, society, history). When the belief-set implies that it itself is untrustworthy (being likely to be wrong), there is a failure of self-ratification. Thus, self-ratification, rather than being a trivial condition, is quite nontrivial when combined with other coherence conditions.

Why would self-ratification be important? Simply put, a non-self-ratifying belief set cannot be trustworthy; if it were trustworthy then it would be untrustworthy, which shows untrustworthiness by contradiction. Thus, self-ratification points to a rich set of philosophical coherence conditions that may be neglected if one is only paying attention to surface-level features such as logical consistency.

Self-ratification as a philosophical coherence condition points at naturalized epistemology being an essential philosophical achievement. While epistemology may possibly start non-naturalized, as it gains self-consciousness of the fact of its embeddedness in a natural world, such self-consciousness imposes additional self-ratification constraints.

Using self-ratification in practice often requires flips between treating one’s self as a subject and as an object. This kind of dual self-consciousness is quite interesting and is a rich source of updates to both self-as-subject beliefs and self-as-object beliefs.

Taking coherence conditions including self-ratification to be the only objective conditions of epistemic justification is a coherentist theory of justification; note that coherentists need not believe that all “justified” belief-sets are likely to be true (and indeed, such a belief would be difficult to hold given the possibility of coherent belief-sets very different from one’s own and from each other).

Appendix: Proof by contradiction is consistent with self-ratification

There is a possible misinterpretation of self-ratification that says: “You cannot assume a belief to be true in the course of refuting it; the assumption would then fail to self-ratify”.

Classical logic permits proof-by-contradiction, indicating that this interpretation is wrong. The thing that a proof by contradiction does is show that some other belief-set (not the belief-set held by the arguer) fails to self-ratify (and indeed, self-invalidates). If the arguer actually believed in the belief-set that they are showing to be self-invalidating, then, indeed, that would be a self-ratification problem for the arguer. However, the arguer’s belief is that some proposition P implies not-P, not that P is true, so this does not present a self-ratification problem.

High-precision claims may be refuted without being replaced with other high-precision claims

There’s a common criticism of theory-criticism which goes along the lines of:

Well, sure, this theory isn’t exactly right. But it’s the best theory we have right now. Do you have a better theory? If not, you can’t really claim to have refuted the theory, can you?

This is wrong. This is falsification-resisting theory-apologism. Karl Popper would be livid.

The relevant reason why it’s wrong is that theories make high-precision claims. For example, the standard theory of arithmetic says 561+413=974. Not 975 or 973 or 974.0000001, but exactly 974. If arithmetic didn’t have this guarantee, math would look very different from how it currently looks (it would be necessary to account for possible small jumps in arithmetic operations).

A single bit flip in the state of a computer process can crash the whole program. Similarly, high-precision theories rely on precise invariants, and even small violations of these invariants sink the theory’s claims.

To a first approximation, a computer either (a) almost always works (>99.99% probability of getting the right answer) or (b) doesn’t work (<0.01% probability of getting the right answer). There are edge cases such as randomly crashing computers or computers with small floating point errors. However, even a computer that crashes every few minutes functions very precisely correctly in >99% of seconds that it runs.

If a computer makes random small errors 0.01% of the time in e.g. arithmetic operations, it’s not an almost-working computer, it’s a completely non-functioning computer, that will crash almost immediately.
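
A back-of-the-envelope calculation shows why. The sketch below assumes errors are independent across operations and, purely for illustration, a rate of one billion operations per second; neither assumption comes from the argument above.

```python
def prob_no_error(error_rate: float, n_ops: int) -> float:
    """Probability that n_ops independent operations all succeed."""
    return (1 - error_rate) ** n_ops


ERROR_RATE = 1e-4        # 0.01% of operations go wrong
OPS_PER_SECOND = 1e9     # assumed, illustrative clock rate

# Expected number of operations until the first error (geometric distribution).
expected_ops_to_first_error = 1 / ERROR_RATE

# Expected time until the first error, in seconds.
expected_seconds_to_first_error = expected_ops_to_first_error / OPS_PER_SECOND

survival_10k = prob_no_error(ERROR_RATE, 10_000)     # roughly 1/e
survival_1m = prob_no_error(ERROR_RATE, 1_000_000)   # astronomically small
```

Under these assumptions the first error is expected after about ten thousand operations, which is on the order of microseconds, and the chance of getting through a million operations cleanly is negligible.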

The claim that a given algorithm or circuit really adds two numbers is very precise. Even a single pair of numbers that it adds incorrectly refutes the claim, and very much risks making this algorithm/circuit useless. (The rest of the program would not be able to rely on guarantees, and would instead need to know the domain in which the algorithm/circuit functions; this would significantly complicate the reasoning about correctness)

Importantly, such a refutation does not need to come along with an alternative theory of what the algorithm/circuit does. To refute the claim that it adds numbers, it’s sufficient to show a single counterexample without suggesting an alternative. Quality assurance processes are primarily about identifying errors, not about specifying the behavior of non-functioning products.
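
In code, the asymmetry looks like this. The sketch is illustrative only: `faulty_add` is a made-up 8-bit adder that drops the carry out of its low four bits, and `find_counterexample` is a hypothetical brute-force checker. The checker refutes the claim "this adds" without saying anything about what the faulty circuit computes instead.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def faulty_add(a: int, b: int) -> int:
    """Hypothetical adder that loses the carry out of the low four bits."""
    low = (a & 0xF) + (b & 0xF)
    high = (a >> 4) + (b >> 4)
    return (high << 4) | (low & 0xF)


def correct_add(a: int, b: int) -> int:
    return a + b


def find_counterexample(
    adder: Callable[[int, int], int],
    inputs: Iterable[Tuple[int, int]],
) -> Optional[Tuple[int, int]]:
    """Return the first input pair on which `adder` disagrees with addition."""
    for a, b in inputs:
        if adder(a, b) != a + b:
            return (a, b)
    return None


def all_pairs():
    return product(range(256), repeat=2)


counterexample = find_counterexample(faulty_add, all_pairs())
no_counterexample = find_counterexample(correct_add, all_pairs())
```

A single returned pair is enough to reject the claim. Characterizing the inputs on which `faulty_add` happens to be right is a separate and harder task, and the refutation does not depend on it.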

A Bayesian may argue that the refuter must have an alternative belief about the circuit. While this is true assuming the refuter is Bayesian, such a belief need not be high-precision. It may be a high-entropy distribution. And if the refuter is a human, they are not a Bayesian (that would take too much compute), and will instead have a vague representation of the circuit as “something doing some unspecified thing”, with some vague intuitions about what sorts of things are more likely than other things. In any case, the Bayesian criticism certainly doesn’t require the refuter to replace the claim about the circuit with an alternative high-precision claim; either a low-precision belief or a lack-of-belief will do.

The case of computer algorithms is particularly clear, but of course this applies elsewhere:

  • If there’s a single exception to conservation of energy, then a high percentage of modern physics theories completely break. The single exception may be sufficient to, for example, create perpetual motion machines. Physics, then, makes a very high-precision claim that energy is conserved, and a refuter of this claim need not supply an alternative physics.
  • If a text is claimed to be the word of God and totally literally true, then a single example of a definitely-wrong claim in the text is sufficient to refute the claim. It isn’t necessary to supply a better religion; the original text should lose any credit it was assigned for being the word of God.
  • If rational agent theory is a bad fit for effective human behavior, then the precise predictions of microeconomic theory (e.g. the option of trade never reducing expected utility for either actor, or the efficient market hypothesis being true) are almost certainly false. It isn’t necessary to supply an alternative theory of effective human behavior to reject these predictions.
  • If it is claimed philosophically that agents can only gain knowledge through sense-data, then a single example of an agent gaining knowledge without corresponding sense-data (e.g. mental arithmetic) is sufficient to refute the claim. It isn’t necessary to supply an alternative theory of how agents gain knowledge for this to refute the strongly empirical theory.
  • If it is claimed that hedonic utility is the only valuable thing, then a single example of a valuable thing other than hedonic utility is sufficient to refute the claim. It isn’t necessary to supply an alternative theory of value.

A theory that has been refuted remains contextually “useful” in a sense, but it’s the walking dead. It isn’t really true everywhere, and:

  • Machines believed to function on the basis of the theory cannot be trusted to be highly reliable
  • Exceptions to the theory can sometimes be manufactured at will (this is relevant in both security and philosophy)
  • The theory may make significantly worse predictions on average than a skeptical high-entropy prior or low-precision intuitive guesswork, due to being precisely wrong rather than imprecise
  • Generative intellectual processes will eventually discard it, preferring instead an alternative high-precision theory or low-precision intuitions or skepticism
  • The theory will go on doing damage through making false high-precision claims
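
The point about being "precisely wrong rather than imprecise" can be made concrete with a toy calculation. This is a minimal sketch with made-up numbers, not something taken from a real theory: assume a binary event that actually happens 60% of the time, a refuted high-precision theory that predicts it with 99% confidence, and a skeptical high-entropy prior that assigns 50%. The function name `expected_log_loss` and all constants are illustrative assumptions.

```python
import math

def expected_log_loss(true_p, predicted_p):
    """Expected surprise, in bits per observation, of predicting
    `predicted_p` for a binary event whose real frequency is `true_p`."""
    return -(true_p * math.log2(predicted_p)
             + (1 - true_p) * math.log2(1 - predicted_p))

# Illustrative assumptions, not data:
TRUE_P = 0.6            # how often the event really happens
PRECISE_THEORY = 0.99   # confident prediction of the refuted theory
SKEPTICAL_PRIOR = 0.5   # maximum-entropy guess for a binary event

loss_theory = expected_log_loss(TRUE_P, PRECISE_THEORY)
loss_prior = expected_log_loss(TRUE_P, SKEPTICAL_PRIOR)

print(f"precise but wrong theory: {loss_theory:.3f} bits per observation")
print(f"skeptical prior:          {loss_prior:.3f} bits per observation")
```

Under these assumptions the confident theory scores worse on average than the uninformative prior, because it is heavily penalized every time the event it nearly ruled out actually occurs.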

The fact that false high-precision claims are generally more damaging than false low-precision claims is important ethically. High-precision claims are often used to ethically justify coercion, violence, and so on, where low-precision claims would have been insufficient. For example, imprisoning someone for a long time may be ethically justified if they definitely committed a serious crime, but is much less likely to be if the belief that they committed a crime is merely a low-precision guess, not validated by any high-precision checking machine. Likewise for psychiatry, which justifies incredibly high levels of coercion on the basis of precise-looking claims about different kinds of cognitive impairment and their remedies.

Therefore, I believe there is an ethical imperative to apply skepticism to high-precision claims, and to allow them to be falsified by evidence, even without knowing what the real truth is other than that it isn’t as the high-precision claim says it is.

On hiding the source of knowledge

I notice that when I write for a public audience, I usually present ideas in a modernist, skeptical, academic style; whereas, the way I come up with ideas is usually in part by engaging in epistemic modalities that such a style has difficulty conceptualizing or considers illegitimate, including:

  • Advanced introspection and self-therapy (including focusing and meditation)
  • Mathematical and/or analogical intuition applied everywhere with only spot checks (rather than rigorous proof) used for confirmation
  • Identity hacking, including virtue ethics, shadow-eating, and applied performativity theory
  • Altered states of mind, including psychotic and near-psychotic experiences
  • Advanced cynicism and conflict theory, including generalization from personal experience
  • Political radicalism and cultural criticism
  • Eastern mystical philosophy (esp. Taoism, Buddhism, Tantra)
  • Literal belief in self-fulfilling prophecies, illegible spiritual phenomena, etc, sometimes with decision-theoretic and/or naturalistic interpretations

This risks hiding where the knowledge actually came from. Someone could easily be mistaken into thinking they can do what I do, intellectually, just by being a skeptical academic.

I recall a conversation I had where someone (call them A) commented that some other person (call them B) had developed some ideas, then afterwards found academic sources agreeing with these ideas (or at least, seeming compatible), and cited these as sources in the blog post write-ups of these ideas. Person A believed that this was importantly bad in that it hides where the actual ideas came from, and assigns credit for them to a system that did not actually produce the ideas.

On the other hand, citing academics that agree with you is helpful to someone who is relying on academic peer-review as part of their epistemology. And, similarly, offering a rigorous proof is helpful for convincing someone of a mathematical principle they aren’t already intuitively convinced of (in addition to constituting an extra check of this principle).

We can distinguish, then, the source of an idea from the presented epistemic justification of it. And the justificatory chain (to a skeptic) doesn’t have to depend on the source. So, there is a temptation to simply present the justificatory chain, and hide the source. (Especially if the source is somehow embarrassing or delegitimized)

But, this creates a distortion, if people assume the justificatory chains are representative of the source. Information consumers may find themselves in an environment where claims are thrown around with various justifications, but where they would have quite a lot of difficulty coming up with and checking similar claims.

And, a lot of the time, the source is important in the justification, because the source was the original reason for privileging the hypothesis. Many things can be partially rationally justified without such partial justification being sufficient for credence, without also knowing something about the source. (The problems of skepticism in philosophy in part relate to this: “but you have the intuition too, don’t you?” only works if the other person has the same intuition (and admits to it), and arguing without appeals to intuition is quite difficult)

In addition, even if the idea is justified, the intuition itself is an artifact of value; knowing abstractly that “X” does not imply the actual ability to, in real situations, quickly derive the implications of “X”. And so, sharing the source of the original intuition is helpful to consumers, if it can be shared. Very general sources are even more valuable, since they allow for generation of new intuitions on the fly.

Unfortunately, many such sources can’t easily be shared. Some difficulties with doing so are essential and some are accidental. The essential difficulties have to do with the fact that teaching is hard; you can’t assume the student already has the mental prerequisites to learn whatever you are trying to teach, as there is significant variation between different minds. The accidental difficulties have to do with social stigma, stylistic limitations, embarrassment, politics, privacy of others, etc.

Some methods for attempting to share such intuitions may result in text that seems personal and/or poetic, and is out of place in a skeptical academic context. This is in large part because such text isn’t trying to justify itself by the skeptical academic standards, and is nevertheless attempting to communicate something.

Noticing this phenomenon has led me to appreciate forewords and prefaces of books more. These sections often discuss more of the messiness of idea-development than the body of the book does. There may be a nice stylistic way of doing something similar for blog posts; perhaps, an extended bibliography that includes free-form text.

I don’t have a solution to this problem at the moment. However, I present this phenomenon as a problem, in the spirit of discussing problems before proposing solutions. I hope it is possible to reduce the accidental difficulties in sharing sources of knowledge, and actually-try on the essential difficulties, in a way that greatly increases the rate of interpersonal model-transfer.

On the ontological development of consciousness

This post is about what consciousness is, ontologically, and how ontologies that include consciousness develop.

The topic of consciousness is quite popular, and confusing, in philosophy. While I do not seek to fully resolve the philosophy of consciousness, I hope to offer an angle on the question I have not seen before. This angle is that of developmental ontology: how are “later” ontologies developed from “earlier” ontologies? I wrote on developmental ontology in a previous post, and this post can be thought of as an elaboration, which can be read on its own, and specifically tackles the problem of consciousness.

Much of the discussion of stabilization is heavily inspired by On the Origin of Objects, an excellent book on reference and ontology, to which I owe much of my ontological development. To the extent that I have made any philosophical innovation, it is in combining this book’s concepts with the minimum-description-length principle, and analytic philosophy of mind.

World-perception ontology

I’m going to write a sequence of statements, which each make sense in terms of an intuitive world-perception ontology.

  • There’s a real world outside of my head.
  • I exist and am intimately connected with, if not identical with, some body in this world.
  • I only see some of the world. What I can see is like what a camera placed at the point my eyes are can see.
  • The world contains objects. These objects have properties like shape, color, etc.
  • When I walk, it is me who moves, not everything around me. Most objects are not moving most of the time, even if they look like they’re moving in my visual field.
  • Objects, including my body, change and develop over time. Changes proceed, for the most part, in a continuous way, so e.g. object shapes and sizes rarely change, and teleportation doesn’t happen.

These all seem common-sensical; it would be strange to doubt them. However, achieving the ontology by which such statements are common-sensical is nontrivial. There are many moving parts here, which must be working in their places before the world seems as sensible as it is.

Let’s look at the “it is me who moves, not everything around me” point, because it’s critical. If you try shaking your head right now, you will notice that your visual field changes rapidly. An object (such as a computer screen) in your vision is going to move side-to-side (or top-to-bottom), from one side of your visual field to another.

However, despite this, there is an intuitive sense of the object not moving. So, there is a stabilization process involved. Image stabilization (example here) is an excellent analogy for this process (indeed, the brain could be said to engage in image stabilization in a literal sense).

The world-perception ontology is, much of the time, geocentric, rather than egocentric or heliocentric. If you walk, it usually seems like the ground is still and you are moving, rather than the ground moving while you’re still (egocentrism), or both you and the ground moving very quickly (heliocentrism). There are other cases such as vehicle interiors where what is stabilized is not the Earth, but the vehicle itself; and, “tearing” between this reference frame and the geocentric reference frame can cause motion sickness.

Notably, world-perception ontology must contain both (a) a material world and (b) “my perceptions of it”. Hence, the intuitive ontological split between material and consciousness. To take such a split to be metaphysically basic is to be a Descartes-like dualist. And the split is ontologically compelling enough that such a metaphysics can be tempting.

Pattern-only ontology

William James famously described the baby’s sense of the world as a “blooming, buzzing confusion”. The image presented is one of dynamism and instability, very different from world-perception ontology.

The baby’s ontology is closer to raw percepts than an adult’s is; it’s less developed, fewer things are stabilized, and so on. Babies generally haven’t learned object permanence; this is a stabilization that is only developed later.

The most basic ontology consists of raw percepts (which cannot even be considered “percepts” from within this ontology), not even including shapes; these percepts may be analogous to pixel-maps in the case of vision, or spectrograms in the case of hearing, but I am unsure of these low-level details, and the rest of this post would still apply if the basic percepts were e.g. lines in vision. Shapes (which are higher-level percepts) must be recognized in the sea of percepts, in a kind of unsupervised learning.

The process of stabilization is intimately related to a process of pattern-detection. If you can detect patterns of shapes across time, you may reify such patterns as an object. (For example, a blue circle that is present in the visual field, and retains the same shape even as it moves around the field, or exits and re-enters, may be reified as a circular object). Such pattern-reification is analogous to folding a symmetric image in half: it allows the full image to be described using less information than was contained in the original image.

In general, the minimum description length principle says it is epistemically correct to posit fewer objects to explain many. And, positing a small number of shapes to explain many basic percepts, or a small number of objects to explain a large number of shapes, are examples of this.
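
The folding analogy can be sketched in code. This is a toy illustration under simplifying assumptions: the "image" is a small grid of numbers, description length is counted as the number of stored pixel values (ignoring the constant cost of saying "mirror this"), and the helper names `full_description`, `folded_description`, and `unfold` are hypothetical.

```python
def full_description(image):
    """Describe the image by listing every pixel."""
    return [pixel for row in image for pixel in row]

def folded_description(image):
    """If every row is left-right symmetric, describe the image by its
    left half only (including the middle column when the width is odd).
    Return None when the symmetry does not hold."""
    width = len(image[0])
    half = (width + 1) // 2
    if any(row != row[::-1] for row in image):
        return None
    return [pixel for row in image for pixel in row[:half]]

def unfold(half_pixels, width, height):
    """Reconstruct the full image from its folded description."""
    half = (width + 1) // 2
    rows = []
    for r in range(height):
        left = half_pixels[r * half:(r + 1) * half]
        right = left[:width - half][::-1]  # mirror image of the left part
        rows.append(left + right)
    return rows

# A small mirror-symmetric "image" (made-up values).
image = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

full = full_description(image)
folded = folded_description(image)

print(len(full), "values without the symmetry")
print(len(folded), "values once the symmetry is posited")
print("lossless:", unfold(folded, 4, 3) == image)
```

Positing the symmetry is what licenses the shorter description; if the image were not symmetric, `folded_description` would return `None` and the full listing would be needed.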

From having read some texts on meditation (especially Mastering the Core Teachings of the Buddha), and having meditated myself, I believe that meditation can result in getting more in-touch with pattern-only ontology, and that this is an intended result, as the pattern-only ontology necessarily contains two of the three characteristics (specifically, impermanence and no-self).

To summarize: babies start from a confusing point, where there are low-level percepts; patterns are progressively recognized in these percepts, developing an ontology that includes shapes and objects.

World-perception ontology results from stabilization

The thesis of this post may now be stated: world-perception ontology results from stabilizing a previous ontology that is itself closer to pattern-only ontology.

One of the most famous examples of stabilization in science is the movement from geocentrism to heliocentrism. Such stabilization explains many epicycles in terms of few cycles, by changing where the center is.

The move from egocentrism to geocentrism is quite analogous. An egocentric reference frame will contain many “epicycles”, which can be explained using fewer “cycles” in geocentrism.

These cycles are literal in the case of a person spinning around in a circle. In a pattern-only ontology (which is, necessarily, egocentric, for the same reason it doesn’t have a concept of self), that person will see around them shapes moving rapidly in the same direction. There are many motions to explain here. In a world-perception ontology, most objects around are not moving rapidly; rather, it is believed that the self is spinning.

So, the egocentric-to-geocentric shift is compelling for the same reason the geocentric-to-heliocentric shift is. It allows one to posit that there are few motions, instead of many motions. This makes percepts easier to explain.
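
The spinning example can be worked through numerically. This is a minimal sketch under stated assumptions: objects are reduced to compass bearings, the observer turns by a fixed made-up amount, and a "motion" is counted whenever a bearing changes by more than a small tolerance. All names and numbers are illustrative.

```python
import math

# Illustrative assumptions: four stationary objects and one turning observer.
WORLD_BEARINGS = [0.3, 1.2, 2.5, 4.0]  # radians, fixed in the world
SELF_TURN = 0.1                        # radians the observer turns per step
TOLERANCE = 1e-9

def egocentric_bearing(world_bearing, heading):
    """Direction of an object relative to where the observer is facing."""
    return (world_bearing - heading) % (2 * math.pi)

def angular_diff(a, b):
    """Signed smallest difference a - b, wrapped into [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi

before = [egocentric_bearing(b, 0.0) for b in WORLD_BEARINGS]
after = [egocentric_bearing(b, SELF_TURN) for b in WORLD_BEARINGS]

# Egocentric description: every object appears to move.
apparent_motions = [angular_diff(a, b) for a, b in zip(after, before)]
egocentric_motion_count = sum(abs(m) > TOLERANCE for m in apparent_motions)

# Geocentric description: posit a single self-rotation that cancels the
# shared apparent motion, then count what is left over.
inferred_self_turn = -sum(apparent_motions) / len(apparent_motions)
residuals = [m + inferred_self_turn for m in apparent_motions]
geocentric_motion_count = 1 + sum(abs(r) > TOLERANCE for r in residuals)

print("motions to explain, egocentric frame:", egocentric_motion_count)
print("motions to explain, geocentric frame:", geocentric_motion_count)
```

In this toy setup the egocentric frame has one motion per object, while the geocentric frame needs only the observer's own rotation; the gap widens as more objects are added.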

Consciousness in world-perception ontology

The upshot of what has been said so far is: the world-perception ontology results from Occamian symmetry-detection and stabilization starting from a pattern-only ontology (or, some intermediate ontology).

And, the world-perception ontology has conscious experience as a component. For, how else can what were originally perceptual patterns be explained, except by positing that there is a camera-like entity in the world (attached to some physical body) that generates such percepts?

The idea that consciousness doesn’t exist (which is asserted by some forms of eliminative materialism) doesn’t sit well with this picture. The ontological development that produced the idea of the material world, also produced the idea of consciousness, as a dual. And both parts are necessary to make sense of percepts. So, consciousness-eliminativism will continue to be unintuitive (and for good epistemic reasons!) until it can replace world-perception ontology with one that achieves percept-explanation that is at least as effective. And that looks to be difficult or impossible.

To conclude: the ontology that allows one to conceptualize the material world as existing and not shifting constantly, includes as part of it conscious perception, and could not function without including it. Without such a component, there would be no way to refactor rapidly shifting perceptual patterns into a stable outer world and a moving point-of-view contained in it.

Is requires ought

The thesis of this post is: “Each ‘is’ claim relies implicitly or explicitly on at least one ‘ought’ claim.”

I will walk through a series of arguments that suggest that this claim is true, and then flesh out the picture towards the end.

(note: I discovered after writing this post that my argument is similar to Cuneo’s argument for moral realism; I present it anyway in the hope that it is additionally insightful)

Epistemic virtue

There are epistemic virtues, such as:

  • Try to have correct beliefs.
  • When you’re not sure about something, see if there’s a cheap way to test it.
  • Learn to distinguish between cases where you (or someone else) is rationalizing, versus when you/they are offering actual reasons for belief.
  • Notice logical inconsistencies in your beliefs and reflect on them.
  • Try to make your high-level beliefs accurately summarize low-level facts.

These are all phrased as commands, which are a type of ought claim. Yet, they all assist one following such commands to have more accurate beliefs.

Indeed, it is hard to imagine how someone who does not (explicitly or implicitly) follow rules like these could come to have accurate beliefs. There are many ways to end up in lala land, and guidelines are essential for staying on the path.

So, “is” claims that rely on the speaker of the claim having epistemic virtue to be taken seriously, rely on the “ought” claims of epistemic virtue itself.

Functionalist theory of mind

The functionalist theory of mind is “the doctrine that what makes something a mental state of a particular type does not depend on its internal constitution, but rather on the way it functions, or the role it plays, in the system of which it is a part.” For example, according to functionalism, for myself to have a world-representing mind, part of my brain must be performing the function of representing the world.

I will not here argue for the functionalist theory of mind, and instead will assume it to be true.

Consider the following “is” claim: “There is a plate on my desk.”

I believe this claim to be true. But why? I see a plate on my desk. But what does that mean?

Phenomenologically, I have the sense that there is a round object on my desk, and that this object is a plate. But it seems that we are now going in a loop.

Here’s an attempt at a way out. “My visual system functions to present me with accurate information about the objects around me. I believe it to be functioning well. And I believe my phenomenological sense of there being a plate on my desk to be from my visual system. Therefore, there is a plate on my desk.”

Well, this certainly relies on a claim of “function”. That’s not an “ought” claim about me, but it is similar (and perhaps identical) to an “ought” claim about my visual system: that presenting me with information about objects is what my visual system ought to do.

Things get hairy when examining the second sentence. “I believe it to be functioning well.” Why do I believe that?

I can consider evidence like “my visual system, along with my other sensory modalities, presents me with a coherent world that has few anomalies.” That’s a complex claim, and checking it requires things like checking my memories of how coherent the world my senses present to me is, which is again relying on the parts of my mind to perform their functions.

I can’t doubt my mind except by using my mind. And using my mind requires, at least tentatively, accepting claims like “my visual system is there for presenting me with accurate information about the objects around me.”

Indeed, even making sense of a claim such as “there is a plate on my desk” requires me to use some intuition-reliant faculty I have of mapping words to concepts; without trust in such a faculty, the claim is meaningless.

I, therefore, cannot make meaningful “is” claims without at the same time using at least some parts of my mind as tools, applying “ought” claims to them.

Social systems

Social systems, such as legal systems, academic disciplines, and religions, contain “ought” claims. Witnesses ought to be allowed to say what they saw. Judges ought to weigh the evidence presented. People ought not to murder each other. Mathematical proofs ought to be checked by peers before being published.

Many such oughts are essential for the system’s epistemology. If the norms of mathematics do not include “check proofs for accuracy” and so on, then there is little reason to believe the mathematical discipline’s “is” claims such as “Fermat’s last theorem is true.”

Indeed, it is hard for claims such as “Fermat’s last theorem is true” to even be meaningful without oughts. For, there are oughts involved in interpreting mathematical notation, and in resolving verbal references to theorems. Such as, “the true meaning of ‘+’ is integer addition, which can be computed using the following algorithm.”

Without mathematical “ought”s, “Fermat’s last theorem is true” isn’t just a doubtful claim, it’s a meaningless one, which is not even wrong.

Language itself can be considered as a social system. When people misuse language (such as by lying), their statements cannot be taken seriously, and sometimes can’t even be interpreted as having meaning.

(A possible interpretation of Baudrillard’s simulacrum theory is that level 1 is when there are sufficient “ought”s both to interpret claims and to ensure that they are true for the most part; level 2 is when there are sufficient “ought”s to meaningfully interpret claims but not to ensure that they are true; level 3 is when “ought”s are neither sufficient to interpret claims nor to ensure that they are true, but are sufficient for claims to superficially look like meaningful ones; and level 4 is where “ought”s are not even sufficient to ensure that claims superficially look meaningful.)

Nondualist epistemology

One might say to the arguments so far:

“Well, certainly, my own ‘is’ claims require some entities, each of which may be a past iteration of myself, a part of my mind, or another person, to be following oughts, in order for my claims to be meaningful and/or correct. But, perhaps such oughts do not apply to me, myself, here and now.”

However, such a self/other separation is untenable.

Suppose I am a mathematics professor, who is considering committing academic fraud, to ensure that false theorems end up in journals. If I corrupt the mathematical process, then I cannot, in the future, rely on the claims of mathematical journals to be true. Additionally, if others are behaving similarly to me, then my own decision to corrupt the process is evidence that others also decide to corrupt the process. Some of these others are in the past; my own decision to corrupt the process is evidence that my own mathematical knowledge is false, as it is evidence that those before me have decided similarly. So, my own mathematical “is” claims rely on myself following mathematical “ought” claims.

(More precisely, both evidential decision theory and functional decision theory have a notion by which present decisions can have past consequences, including past consequences affecting the accuracy of presently-available information)

Indeed, the idea of corrupting the mathematical process would be horrific to most good mathematicians, in a quasi-religious way. These mathematicians’ own ability to take their work seriously enough to attain rigor depends on such a quasi-religious respect for the mathematical discipline.

Nondualist epistemology cannot rely on a self/other boundary by which decisions made in the present moment have no effects on the information available in the present moment. Lying to similar agents, thus, undermines both the meaningfulness and the truth of one’s own beliefs.


I will summarize the argument thusly:

  • Each “is” claim may or may not be justified.
  • An “is” claim is only justified if the system producing the claim is functioning well at the epistemology of this claim.
  • Specifically, an “is” claim that you make is justified only if some system you are part of is functioning well at the epistemology of that claim. (You are the one making the claim, after all, so the system must include the you who makes the claim)
  • That system (that you are part of) can only function well at the epistemology of that claim if you have some function in that system and you perform that function satisfactorily. (Functions of wholes depend on functions of parts; even if all you do is listen for a claim and repeat it, that is a function)
  • Therefore, an “is” claim that you make is justified only if you have some specific function and you expect to perform that function satisfactorily.
  • If a reasonable agent expects itself to perform some function satisfactorily, then according to that agent, that agent ought to perform that function satisfactorily.
  • Therefore, if you are a reasonable agent who accepts the argument so far, you believe that your “is” claims are only justified if you have oughts.

The second-to-last point is somewhat subtle. If I use a fork as a tool, then I am applying an “ought” to the fork; I expect it ought to function as an eating utensil. This is similar to using another person as a tool (alternatively “employee” or “service worker”), giving them commands and expecting that they ought to follow them. If my own judgments functionally depend on myself performing some function, then I am using myself as a tool (expecting myself to perform that function). To avoid self-inconsistency between myself-the-tool-user and myself-the-tool, I must accept an ought, which is that I ought to satisfactorily perform the tool-function I am expecting myself to perform; if I do not accept that ought, I must drop any judgment whose justification requires me to perform the function generating this ought.

It is possible to make a similar argument about meaningfulness; the key point is that the meaningfulness of a claim depends on the functioning of an interpretive system that this claim is part of. To fail to follow the oughts implied by the meaningfulness of one’s statements is not just to be wrong, but to collapse into incoherence.

Certainly, this argument does not imply that all “ought”s can be derived from “is”es. In particular, an agent may have degrees of freedom in how it performs its functions satisfactorily, or in doing things orthogonal to performing its functions. What the argument suggests instead is that each “is” depends on at least one “ought”, which itself may depend on an “is”, in a giant web of interdependence.

There are multiple possible interdependent webs (multiple possible mind designs, multiple possible social systems), such that a different web could have instead come into existence, and our own web may evolve into any one of a number of future possibilities. Though, we can only reason about hypothetical webs from our own actual one.

Furthermore, it is difficult to conceive of what it would mean for the oughts being considered to be “objective”; indeed, an implication of the argument is that objectivity itself depends on oughts, at least some of which must be pre-objective or simultaneous with objectivity.

Relatedly, at least some of those oughts that are necessary as part of the constitution of “is”, must themselves be pre-“is” or simultaneous with “is”, and thus must not themselves depend on already-constituted “is”es. A possible candidate for such an ought is: “organize!” For the world to produce a map without already containing one, it must organize itself into a self-representing structure, from a position of not already being self-representing. (Of course, here I am referring to the denotation of “organize!”, which is a kind of directed motion, rather than to the text “organize!”; the text cannot itself have effective power outside the context of a text-interpretation system)

One can, of course, sacrifice epistemology, choosing to lie and to confuse one’s self, in ways that undermine both the truth and meaningfulness of one’s own “is” claims.

But, due to the anthropic principle, we (to be a coherent “we” that can reason) are instead at an intermediate point of a process that does not habitually make such decisions, or one which tends to correct them. A process that made such decisions without correcting them would result in rubble, not reason. (And whether our own process results in rubble or reason in the future is, in part, up to us, as we are part of this process)

And so, when we are a we that can reason, we accept at least those oughts that our own reason depends on, while acknowledging the existence of non-reasoning processes that do not.

Truth-telling is aggression in zero-sum frames

If you haven’t seen The Invention Of Lying, watch some of this clip (1 minute long).

If you’re like most people, this will induce a cringe reaction. The things these people are saying, while true, are rude and would ordinarily be interpreted as socially aggressive.

In a world where white lies (and hiding things for the sake of politeness) are normalized, such truth-telling is highly unusual. One automatically suspects the motives of the truth-teller. Maybe the waiter is saying “I’m embarrassed I work here” in order to manipulate the others by garnering pity. Maybe the woman is saying the man is unattractive in order to lower his self-esteem and gain advantage over him.

These interpretations are false in the world of The Invention Of Lying, because everyone talks that way. So, revealing such information does not indicate any special plot going on, it’s just the thing to do.

In our world, revealing such information does (usually) indicate a special plot, because it is so unusual. It’s erratic, and quite possibly dangerous.

Special social plots are usually interpreted as aggressive. It’s as if the game has reached an equilibrium state, and out-of-equilibrium actions are surprise attacks.

Wiio’s law states: “communication usually fails, except by accident”. The equilibrium of the game is for no communication to happen. Breaks in the game allow real communication, something most hope for but rarely find.

If we adopt a frame that says that unusual social plots are actions that are against someone (which is a zero-sum frame), this leads to the conclusion that truth-telling is aggression, as it is necessarily part of an unusual social plot.

Non-zero-sum frames, of course, usually interpret truth-telling positively: it contributes to a shared information commons, which helps just about everyone, with few exceptions. People are often capable of switching to non-zero-sum frames in natural emergency situations, but such situations are rare.

To transition from a zero-sum frame to a non-zero-sum frame, from normalized lying to normalized truth-telling, requires a special social plot involving unusual truth-telling. Because it almost never happens by default.

Such plots are always acts of aggression, when interpreted from within a zero-sum frame. And this concern is not without merit. When lying is built into the system, and so is punishment for actions labeled as “lying”, punishment of an ordinary instance of lying (a likely result of uncareful truth-telling) isn’t part of a functional behavioral control system, it’s a random act of scapegoating.

And so, there is, in practice, a limit on the rate that truth will be told. Because truth-telling uncovers local norm-violations (which are normal), leading to scapegoating. And people who fear this (or, who detect an unusual social plot happening and reflexively oppose it) will coordinate to suppress truth-telling.