The Obliqueness Thesis

In my Xenosystems review, I discussed the Orthogonality Thesis, concluding that it was a bad metaphor. It’s a long post, though, and the comments on orthogonality build on other Xenosystems content. Therefore, I think it may be helpful to present a more concentrated discussion on Orthogonality, contrasting Orthogonality with my own view, without introducing dependencies on Land’s views. (Land gets credit for inspiring many of these thoughts, of course, but I’m presenting my views as my own here.)

First, let’s define the Orthogonality Thesis. Quoting Superintelligence for Bostrom’s formulation:

Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.

To me, the main ambiguity about what this is saying is the “could in principle” part; maybe, for any level of intelligence and any final goal, there exists (in the mathematical sense) an agent combining those, but some combinations are much more natural and statistically likely than others. Let’s consider Yudkowsky’s formulations as alternatives. Quoting Arbital:

The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.

The strong form of the Orthogonality Thesis says that there’s no extra difficulty or complication in the existence of an intelligent agent that pursues a goal, above and beyond the computational tractability of that goal.

As an example of the computational tractability consideration, sufficiently complex goals may only be well-represented by sufficiently intelligent agents. “Complication” may be reflected in, for example, code complexity; to my mind, the strong form implies that the code complexity of an agent with a given level of intelligence and goals is approximately the code complexity of the intelligence plus the code complexity of the goal specification, plus a constant. Code complexity would influence statistical likelihood for the usual Kolmogorov/Solomonoff reasons, of course.
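
In rough Kolmogorov-complexity notation (my own gloss on the strong form, not Yudkowsky's wording), with K denoting description/code complexity and I, G an intelligence level and goal specification:

K(\mathrm{agent}_{I,G}) \approx K(I) + K(G) + O(1)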

I think, overall, it is more productive to examine Yudkowsky’s formulation than Bostrom’s, as he has already helpfully factored the thesis into weak and strong forms. Therefore, by criticizing Yudkowsky’s formulations, I am less likely to be criticizing a strawman. I will use “Weak Orthogonality” to refer to Yudkowsky’s “Orthogonality Thesis” and “Strong Orthogonality” to refer to Yudkowsky’s “strong form of the Orthogonality Thesis”.

Land, alternatively, describes a “diagonal” between intelligence and goals as an alternative to orthogonality, but I don’t see a specific formulation of a “Diagonality Thesis” on his part. Here’s a possible formulation:

Diagonality Thesis: Final goals tend to converge to a point as intelligence increases.

The main criticism of this thesis is that formulations of ideal agency, in the form of Bayesianism and VNM utility, leave open free parameters, e.g. priors over un-testable propositions, and the utility function. Since I expect few readers to accept the Diagonality Thesis, I will not concentrate on criticizing it.

What about my own view? I like Tsvi’s naming of it as an “obliqueness thesis”.

Obliqueness Thesis: The Diagonality Thesis and the Strong Orthogonality Thesis are false. Agents do not tend to factorize into an Orthogonal value-like component and a Diagonal belief-like component; rather, there are Oblique components that do not factorize neatly.

(Here, by Orthogonal I mean basically independent of intelligence, and by Diagonal I mean converging to a point in the limit of intelligence.)

While I will address Yudkowsky’s arguments for the Orthogonality Thesis, I think arguing directly for my view first will be more helpful. In general, it seems to me that arguments for and against the Orthogonality Thesis are not mathematically rigorous; therefore, I don’t need to present a mathematically rigorous case to contribute relevant considerations, so I will consider intuitive arguments relevant, and present multiple arguments rather than a single sequential argument (as I did with the more rigorous argument for many worlds).

Bayes/VNM point against Orthogonality

Some people may think that the free parameters in Bayes/VNM point towards the Orthogonality Thesis being true. I think, rather, that they point against Orthogonality. While they do function as arguments against the Diagonality Thesis, this is insufficient for Orthogonality.

First, on the relationship between intelligence and bounded rationality. It’s meaningless to talk about intelligence without a notion of bounded rationality: perfect rationality in a complex environment is computationally intractable, so any realistic agent is boundedly rational, and lower intelligence means tighter bounds. So, at non-extreme intelligence levels, the Orthogonality Thesis must be making a case that boundedly rational agents can have any computationally tractable goal.

Bayesianism and VNM expected utility optimization are known to be computationally intractable in complex environments. That is why algorithms like MCMC and reinforcement learning are used. So, making an argument for Orthogonality in terms of Bayesianism and VNM is simply dodging the question, by already assuming an extremely high intelligence level from the start.

As the Orthogonality Thesis refers to “values” or “final goals” (which I take to be synonymous), it must have a notion of the “values” of agents that are not extremely intelligent. These values cannot be assumed to be VNM, since VNM is not computationally tractable. Meanwhile, money-pumping arguments suggest that extremely intelligent agents will tend to converge to VNM-ish preferences. Thus:

Argument from Bayes/VNM: Agents with low intelligence will tend to have beliefs/values that are far from Bayesian/VNM. Agents with high intelligence will tend to have beliefs/values that are close to Bayesian/VNM. Strong Orthogonality is false because it is awkward to combine low intelligence with Bayesian/VNM beliefs/values, and awkward to combine high intelligence with far-from-Bayesian/VNM beliefs/values. Weak Orthogonality is in doubt, because having far-from-Bayesian/VNM beliefs/values puts a limit on the agent’s intelligence.

To summarize: un-intelligent agents cannot be assumed to be Bayesian/VNM from the start. Bayesian/VNM beliefs/values arise in the limit of intelligence, and arguably have to arise, due to money-pumping arguments. Beliefs/values therefore tend to become more Bayesian/VNM with high intelligence, contradicting Strong Orthogonality and perhaps Weak Orthogonality.
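
To make the money-pump point concrete, here is a toy sketch in Python (my own illustration; the items, cyclic preferences, and fee are invented): an agent with non-VNM, cyclic preferences accepts a sequence of trades, each of which it locally prefers, and ends up strictly poorer.

    # Toy money-pump against cyclic (non-VNM) preferences; purely illustrative.
    prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y): x strictly preferred to y
    fee = 0.01

    holding, money = "A", 1.00
    offers = ["C", "B", "A"] * 5  # the pumper cycles through its offers

    for offered in offers:
        if (offered, holding) in prefers and money >= fee:
            # The agent accepts any trade it locally prefers, paying a small fee.
            holding, money = offered, money - fee

    print(holding, round(money, 2))  # ends holding "A" again, but poorer: A 0.85

An agent capable of noticing this pattern has an incentive to revise its preferences towards something more VNM-like, which is the pressure the money-pumping argument points to.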

One could perhaps object that logical uncertainty allows even weak agents to be Bayesian over combined physical/mathematical uncertainty; I’ll address this consideration later.

Belief/value duality

It may be unclear why the Argument from Bayes/VNM refers to both beliefs and values, as the Orthogonality Thesis is only about values. It would, indeed, be hard to make the case that the Orthogonality Thesis is true as applied to beliefs. However, various arguments suggest that Bayesian beliefs and VNM preferences are “dual” such that complexity can be moved from one to the other.

Abram Demski has presented this general idea in the past, and I’ll give a simple example to illustrate.

Let A \in \mathcal{A} be the agent’s action, and let W \in \mathcal{W} represent the state of the world prior to / unaffected by the agent’s action. Let r(A, W) be the outcome resulting from the action and world. Let P(w) be the primary agent’s probability of a given world w. Let U(o) be the primary agent’s utility for outcome o. The primary agent finds an action a to maximize \sum_{w \in \mathcal{W}} P(w) U(r(a, w)).

Now let e be an arbitrary predicate on worlds. Consider modifying P to increase the probability that e(W) is true. That is:

P'(w) :\propto P(w) (1 + [e(w)])

P'(w) = \frac{P(w)(1 + [e(w)])}{\sum_{w \in \mathcal{W}} P(w)(1 + [e(w)])}

where [e(w)] equals 1 if e(w), otherwise 0. Now, can we define a modified utility function U’ so a secondary agent with beliefs P’ and utility function U’ will take the same action as the primary agent? Yes:

U'(o) := \frac{U(o)}{1 + [e(w)]}

where w is the world giving rise to outcome o (assuming, for simplicity, that the outcome determines whether e holds).

This secondary agent will find an action a to maximize: 

\sum_{w \in \mathcal{W}} P'(w) U'(r(a, w))

= \sum_{w \in \mathcal{W}} \frac{P(w)(1 + [e(w)])}{\sum_{w' \in \mathcal{W}} P(w')(1 + [e(w')])} \frac{U(r(a, w))}{1 + [e(w)]}

= \frac{1}{\sum_{w \in \mathcal{W}} P(w)(1 + [e(w)])} \sum_{w \in \mathcal{W}} P(w) U(r(a, w))

Clearly, this is a positive constant times the primary agent’s maximization target, so the secondary agent will take the same action.
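
Here is a minimal numerical check of this duality in Python (my own toy example; the worlds, actions, probabilities, and utilities are made up), confirming that the primary and secondary agents choose the same action:

    # A minimal numerical check of the belief/value duality derived above.
    import itertools

    worlds = ["sunny", "rainy", "stormy"]
    actions = ["picnic", "museum"]

    def r(a, w):
        # Outcome resulting from an action and a world.
        return (a, w)

    P = {"sunny": 0.5, "rainy": 0.3, "stormy": 0.2}  # primary agent's beliefs
    U = {("picnic", "sunny"): 10, ("picnic", "rainy"): 2, ("picnic", "stormy"): 0,
         ("museum", "sunny"): 5, ("museum", "rainy"): 6, ("museum", "stormy"): 6}

    def e(w):
        # Arbitrary predicate on worlds.
        return w == "stormy"

    # Secondary agent: boost the probability of e-worlds and divide utility by
    # the same factor, as in the derivation.
    unnorm = {w: P[w] * (1 + e(w)) for w in worlds}
    Z = sum(unnorm.values())
    P2 = {w: p / Z for w, p in unnorm.items()}
    U2 = {(a, w): U[(a, w)] / (1 + e(w)) for a, w in itertools.product(actions, worlds)}

    def best_action(prob, util):
        return max(actions, key=lambda a: sum(prob[w] * util[r(a, w)] for w in worlds))

    assert best_action(P, U) == best_action(P2, U2)  # same chosen action either way
    print(best_action(P, U))  # -> "picnic" for both agents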

This demonstrates a basic way that Bayesian beliefs and VNM utility are dual to each other. One could even model all agents as having the same utility function (of maximizing a random variable U) and simply having different beliefs about what U values are implied by the agent’s action and world state. Thus:

Argument from belief/value duality: From an agent’s behavior, multiple belief/value combinations are valid attributions. This is clearly true in the limiting Bayes/VNM case, suggesting it also applies in the case of bounded rationality. It is unlikely that the Strong Orthogonality Thesis applies to beliefs (including priors), so, due to the duality, it is also unlikely that it applies to values.

I consider this weaker than the Argument from Bayes/VNM. Someone might object that both values and a certain component of beliefs are orthogonal, while the other components of beliefs (those that change with more reasoning/intelligence) aren’t. But I think this depends on a certain factorizability of beliefs/values into the kind that change on reflection and those that don’t, and I’m skeptical of such factorizations. I think discussion of logical uncertainty will make my position on this clearer, though, so let’s move on.

Logical uncertainty as a model for bounded rationality

I’ve already argued that bounded rationality is essential to intelligence (and therefore the Orthogonality Thesis). Logical uncertainty is a form of bounded rationality (as applied to guessing the probabilities of mathematical statements). Therefore, discussing logical uncertainty is likely to be fruitful with respect to the Orthogonality Thesis.

Logical Induction is a logical uncertainty algorithm that produces a probability table for a finite subset of mathematical statements at each iteration. These beliefs are determined by a betting market of an increasing (up to infinity) number of programs that make bets, with the bets resolved by a “deductive process” that is basically a theorem prover. The algorithm is computable, though extremely computationally intractable, and has properties in the limit including some forms of Bayesian updating, statistical learning, and consistency over time.
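
To give a rough feel for the market mechanism, here is a loose toy sketch in Python (emphatically not the actual Logical Induction algorithm; the traders, statements, and wealth-update rule are invented for illustration): traders quote probabilities, the market credence is a wealth-weighted average, and wealth shifts towards traders that predicted well once a statement is settled.

    # Loose toy sketch of the betting-market idea only (NOT Logical Induction).
    traders = [
        {"wealth": 1.0, "predict": lambda s: 0.9 if s.endswith("is provable") else 0.3},
        {"wealth": 1.0, "predict": lambda s: 0.5},  # maximally uninformative trader
        {"wealth": 1.0, "predict": lambda s: 0.2 if "not" in s else 0.6},
    ]

    def market_credence(statement):
        total = sum(t["wealth"] for t in traders)
        return sum(t["wealth"] * t["predict"](statement) for t in traders) / total

    def resolve(statement, truth):
        # Crude stand-in for the deductive process settling a statement: traders
        # that assigned higher probability to the outcome gain relative wealth.
        for t in traders:
            p = t["predict"](statement)
            t["wealth"] *= 0.5 + (p if truth else 1 - p)

    print(market_credence("phi is provable"))   # initial pooled credence
    resolve("phi is provable", truth=True)
    print(market_credence("phi is provable"))   # shifts toward the better predictor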

We can see Logical Induction as evidence against the Diagonality Thesis: beliefs about undecidable statements (which exist in consistent theories due to Gödel’s first incompleteness theorem) can take on any probability in the limit, though they satisfy properties such as consistency with other assigned probabilities (in a Bayesian-like manner).

However, (a) it is hard to know ahead of time which statements are actually undecidable, and (b) even beliefs about undecidable statements tend to change predictably over time, towards Bayesian consistency with other beliefs about undecidable statements. So, Logical Induction does not straightforwardly factorize into a “belief-like” component (which converges on enough reflection) and a “value-like” component (which doesn’t change on reflection). Thus:

Argument from Logical Induction: Logical Induction is a current best-in-class model of theoretical asymptotic bounded rationality. Logical Induction is non-Diagonal, but also clearly non-Orthogonal, and doesn’t apparently factorize into separate Orthogonal and Diagonal components. Combined with considerations from “Argument from belief/value duality”, this suggests that it’s hard to identify all value-like components in advanced agents that are Orthogonal in the sense of not tending to change upon reflection.

One can imagine, for example, introducing extra function/predicate symbols into the logical theory the Logical Inductor reasons over, to represent utility. Logical Induction will tend to make judgments about these functions/predicates more consistent and inductively plausible over time, changing its judgments about the utilities of different outcomes towards plausible logical probabilities. This is an Oblique (non-Orthogonal and non-Diagonal) change in the interpretation of the utility symbol over time.

Likewise, Logical Induction can be specified to have beliefs over empirical facts such as observations by adding additional function/predicate symbols, and can perhaps update on these as they come in (although this might contradict UDT-type considerations). Through more iteration, Logical Inductors will come to have more approximately Bayesian, and inductively plausible, beliefs about these empirical facts, in an Oblique fashion.

Even if there is a way of factorizing out an Orthogonal value-like component from an agent, the belief-component (represented by something like Logical Induction) remains non-Diagonal, so there is still a potential “alignment problem” for these non-Diagonal components to match, say, human judgments in the limit. I don’t see evidence that these non-Diagonal components factor into a value-like “prior over the undecidable” that does not change upon reflection. So, there remain components of something analogous to a “final goal” (by belief/value duality) that are Oblique, and within the scope of alignment.

If it were possible to get the properties of Logical Induction in a Bayesian system, which makes Bayesian updates on logical facts over time, that would make it more plausible that an Orthogonal logical prior could be specified ahead of time. However, MIRI researchers have tried for a while to find Bayesian interpretations of Logical Induction, and failed, as would be expected from the Argument from Bayes/VNM.

Naive belief/value factorizations lead to optimization daemons

The AI alignment field has a long history of poking holes in alignment approaches. Oops, you tried making an oracle AI and it manipulated real-world outcomes to make its predictions true. Oops, you tried to do Solomonoff induction and got invaded by aliens. Oops, you tried getting agents to optimize over a virtual physical universe, and they discovered the real world and tried to break out. Oops, you ran a Logical Inductor and one of the traders manipulated the probabilities to instantiate itself in the real world.

These sub-processes that take over are known as optimization daemons. When you get the agent architecture wrong, sometimes a sub-process (that runs a massive search over programs, such as with Solomonoff Induction) will luck upon a better agent architecture and out-compete the original system. (See also a very strange post I wrote some years back while thinking about this issue, and Christiano’s comment relating it to Orthogonality).

If you apply a naive belief/value factorization to create an AI architecture, when compute is scaled up sufficiently, optimization daemons tend to break out, showing that this factorization was insufficient. Enough experiences like this lead to the conclusion that, if there is a realistic belief/value factorization at all, it will look pretty different from the naive one. Thus:

Argument from optimization daemons: Naive ways of factorizing an agent into beliefs/values tend to lead to optimization daemons, which have different values from those in the original factorization. Any successful belief/value factorization will probably look pretty different from the naive one, and might not take the form of factorization into Diagonal belief-like components and Orthogonal value-like components. Therefore, if any realistic formulation of Orthogonality exists, it will be hard to find and substantially different from naive notions of Orthogonality.

Intelligence changes the ontology values are expressed in

The most straightforward way to specify a utility function is to specify an ontology (a theory of what exists, similar to a database schema) and then provide a utility function over elements of this ontology. Evolution (taken as a design algorithm for organisms involving mutation and selection), operating long before humans learned physics, did not know what human physicists now know. Therefore, human evolutionary values are unlikely to be expressed in the ontology of physics as physicists currently understand it.

Human evolutionary values probably care about things like eating enough, social acceptance, proxies for reproduction, etc. It is unknown how these are specified, but perhaps sensory signals (such as stomach signals) are connected with a developing world model over time. Humans can experience vertigo at learning physics, e.g. thinking that free will and morality are fake, leading to unclear applications of native values to a realistic physical ontology. Physics has known gaps (such as quantum/relativity correspondence, and dark energy/dark matter) that suggest further ontology shifts.

One response to this vertigo is to try to solve the ontology identification problem; find a way of translating states in the new ontology (such as physics) to an old one (such as any kind of native human ontology), in a structure-preserving way, such that a utility function over the new ontology can be constructed as a composition of the original utility function and the new-to-old ontological mapping. Current solutions, such as those discussed in MIRI’s Ontological Crises paper, are unsatisfying. Having looked at this problem for a while, I’m not convinced there is a satisfactory solution within the constraints presented. Thus:

Argument from ontological change: More intelligent agents tend to change their ontology to be more realistic. Utility functions are most naturally expressed relative to an ontology. Therefore, there is a correlation between an agent’s intelligence and utility function, through the agent’s ontology as an intermediate variable, contradicting Strong Orthogonality. There is no known solution for rescuing the old utility function in the new ontology, and some research intuitions point towards any solution being unsatisfactory in some way.
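
In my own notation, the hoped-for rescue is a structure-preserving translation \tau from new-ontology states to old-ontology states, with the rescued utility function defined by composition:

U_{\mathrm{new}}(s) := U_{\mathrm{old}}(\tau(s))

The arguments above are reasons to doubt that a satisfactory \tau exists in general.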

If a satisfactory solution is found, I’ll change my mind on this argument, of course, but I’m not convinced such a satisfactory solution exists. To summarize: higher intelligence causes ontological changes, and rescuing old values seems to involve unnatural “warps” to make the new ontology correspond with the old one, contradicting at least Strong Orthogonality, and possibly Weak Orthogonality (if some values are simply incompatible with realistic ontology). Paperclips, for example, tend to appear most relevant at an intermediate intelligence level (around human-level), and become more ontologically unnatural at higher intelligence levels.

As a more general point, one expects possible mutual information between mental architecture and values, because values that “re-use” parts of the mental architecture achieve lower description length. For example, if the mental architecture involves creating universal algebra structures and finding analogies between them and the world, then values expressed in terms of such universal algebras will tend to have lower relative description complexity to the architecture. Such mutual information contradicts Strong Orthogonality, as some intelligence/value combinations are more natural than others.

Intelligence leads to recognizing value-relevant symmetries

Consider a number of un-intuitive value propositions people have argued for:

  • Torture is preferable to Dust Specks, because it’s hard to come up with a utility function with the alternative preference without horrible unintuitive consequences elsewhere.
  • People are way too risk-averse in betting; the implied utility function has too strong diminishing marginal returns to be plausible.
  • You may think your personal identity is based on having the same atoms, but you’re wrong, because you’re distinguishing identical configurations.
  • You may think a perfect upload of you isn’t conscious (and basically another copy of you), but you’re wrong, because functionalist theory of mind is true.
  • You intuitively accept the premises of the Repugnant Conclusion, but not the Conclusion itself; you’re simply wrong about one of the premises, or the conclusion.

The point is not to argue for these, but to note that these arguments have been made and are relatively more accepted among people who have thought more about the relevant issues than people who haven’t. Thinking tends to lead to noticing more symmetries and dependencies between value-relevant objects, and tends to adjust values to be more mathematically plausible and natural. Of course, extrapolating this to superintelligence leads to further symmetries. Thus:

Argument from value-relevant symmetries: More intelligent agents tend to recognize more symmetries related to value-relevant entities. They will also tend to adjust their values according to symmetry considerations. This is an apparent value change, and it’s hard to see how it can instead be factored as a Bayesian update on top of a constant value function.

I’ll examine such factorizations in more detail shortly.

Human brains don’t seem to neatly factorize

This is less about the Orthogonality Thesis generally, and more about human values. If there were separable “belief components” and “value components” in the human brain, with the value components remaining constant over time, that would increase the chance that at least some Orthogonal component can be identified in human brains, corresponding with “human values” (though, remember, the belief-like component can also be Oblique rather than Diagonal).

However, human brains seem much more messy than the sort of computer program that could factorize this way. Different brain regions are connected in at least some ways that are not well-understood. Additionally, even apparent “value components” may be analogous to a learned Q-function (as in deep Q-learning), which incorporates empirical updates in addition to pre-set “values”.

The interaction between human brains and language is also relevant. Humans develop values they act on partly through language. And language (including language reporting values) is affected by empirical updates and reflection, thus non-Orthogonal. Reflecting on morality can easily change people’s expressed and acted-upon values, e.g. in the case of Peter Singer. People can change which values they report as instrumental or terminal even while behaving similarly (e.g. flipping between selfishness-as-terminal and altruism-as-terminal), with the ambiguity hard to resolve because most behavior relates to convergent instrumental goals.

Maybe language is more of an effect than cause of values. But there really seems to be feedback from language to non-linguistic brain functions that decide actions and so on. Attributing coherent values over realistic physics to the brain parts that are non-linguistic seems like a form of projection or anthropomorphism. Language and thought have a function in cognition and attaining coherent values over realistic ontologies. Thus:

Argument from brain messiness: Human brains don’t seem to neatly factorize into a belief-component and a value-component, with the value-component unaffected by reflection or language (which it would need to be Orthogonal). To the extent any value-component does not change due to language or reflection, it is restricted to evolutionary human ontology, which is unlikely to apply to realistic physics; language and reflection are part of the process that refines human values, rather than being an afterthought of them. Therefore, if the Orthogonality Thesis is true, humans lack identifiable values that fit into the values axis of the Orthogonality Thesis.

This doesn’t rule out that Orthogonality could apply to superintelligences, of course, but it does raise questions for the project of aligning superintelligences with human values; perhaps such values do not exist or are not formulated so as to apply to the actual universe.

Models of ASI should start with realism

Some may take arguments against Orthogonality to be disturbing at a value level, perhaps because they are attached to research projects such as Friendly AI (or more specific approaches), and think questioning foundational assumptions would make the objective (such as alignment with already-existing human values) less clear. I believe “hold off on proposing solutions” applies here: better strategies are likely to come from first understanding what is likely to happen absent a strategy, then afterwards looking for available degrees of freedom.

Quoting Yudkowsky:

Orthogonality is meant as a descriptive statement about reality, not a normative assertion. Orthogonality is not a claim about the way things ought to be; nor a claim that moral relativism is true (e.g. that all moralities are on equally uncertain footing according to some higher metamorality that judges all moralities as equally devoid of what would objectively constitute a justification). Claiming that paperclip maximizers can be constructed as cognitive agents is not meant to say anything favorable about paperclips, nor anything derogatory about sapient life.

Likewise, Obliqueness does not imply that we shouldn’t think about the future and ways of influencing it, that we should just give up on influencing the future because we’re doomed anyway, that moral realist philosophers are correct or that their moral theories are predictive of ASI, that ASIs are necessarily morally good, and so on. The Friendly AI research program was formulated based on descriptive statements believed at the time, such as that an ASI singleton would eventually emerge, that the Orthogonality Thesis is basically true, and so on. Whatever cognitive process formulated this program would have formulated a different program conditional on different beliefs about likely ASI trajectories. Thus:

Meta-argument from realism: Paths towards beneficially achieving human values (or analogues, if “human values” don’t exist) in the far future likely involve a lot of thinking about likely ASI trajectories absent intervention. The realistic paths towards human influence on the far future depend on realistic forecasting models for ASI, with Orthogonality/Diagonality/Obliqueness as alternative forecasts. Such forecasting models can be usefully thought about prior to formulation of a research program intended to influence the far future. Formulating and working from models of bounded rationality such as Logical Induction is likely to be more fruitful than assuming that bounded rationality will factorize into Orthogonal and Diagonal components without evidence in favor of this proposition. Forecasting also means paying more attention to the Strong Orthogonality Thesis than the Weak Orthogonality Thesis, as statistical correlations between intelligence and values will show up in such forecasts.

On Yudkowsky’s arguments

Now that I’ve explained my own position, addressing Yudkowsky’s main arguments may be useful. His main argument has to do with humans making paperclips instrumentally:

Suppose some strange alien came to Earth and credibly offered to pay us one million dollars’ worth of new wealth every time we created a paperclip. We’d encounter no special intellectual difficulty in figuring out how to make lots of paperclips.

That is, minds would readily be able to reason about:

  • How many paperclips would result, if I pursued a policy \pi_0?
  • How can I search out a policy \pi that happens to have a high answer to the above question?

I believe it is better to think of the payment as coming in the far future and perhaps in another universe; that way, the belief about future payment is more analogous to terminal values than instrumental values. In this case, creating paperclips is a decent proxy for achievement of human value, so long-termist humans would tend to want lots of paperclips to be created.

I basically accept this, but, notably, Yudkowsky’s argument is based on belief/value duality. He thinks it would be awkward for the reader to imagine terminally wanting paperclips, so he instead asks them to imagine a strange set of beliefs leading to paperclip production being oddly correlated with human value achievement. Thus, acceptance of Yudkowsky’s premises here will tend to strengthen the Argument from belief/value duality and related arguments.

In particular, more intelligence would cause human-like agents to develop different beliefs about what actions aliens are likely to reward, and what numbers of paperclips different policies result in. This points towards Obliqueness as with Logical Induction: such beliefs will be revised (but not totally convergent) over time, leading to applying different strategies toward value achievement. And ontological issues around what counts as a paperclip will come up at some point, and likely be decided in a prior-dependent but also reflection-dependent way.

Beliefs about which aliens are most capable/honest likely depend on human priors, and are therefore Oblique: humans would want to program an aligned AI to mostly match these priors while revising beliefs along the way, but can’t easily factor out their prior for the AI to share.

Now onto other arguments. The “Size of mind design space” argument implies many agents exist with different values from humans, which agrees with Obliqueness (intelligent agents tend to have different values from unintelligent ones). It’s more of an argument about the possibility space than statistical correlation, thus being more about Weak than Strong Orthogonality.

The “Instrumental Convergence” argument doesn’t appear to be an argument for Orthogonality per se; rather, it’s a counter to arguments against Orthogonality based on noticing convergent instrumental goals. My arguments don’t take this form.

Likewise, “Reflective Stability” is about a particular convergent instrumental goal (preventing value modification). In an Oblique framing, a Logical Inductor will tend not to change its beliefs about even un-decidable propositions too often (as this would lead to money-pumps), so consistency is valued all else being equal.

While I could go into more detail responding to Yudkowsky, I think space is better spent presenting my own Oblique views for now.

Conclusion

As an alternative to the Orthogonality Thesis and the Diagonality Thesis, I present the Obliqueness Thesis, which says that increasing intelligence tends to lead to value changes but not total value convergence. I have presented arguments that advanced agents and humans do not neatly factor into Orthogonal value-like components and Diagonal belief-like components, using Logical Induction as a model of bounded rationality. This complicates theories of AI alignment premised on humans having values that the AGI must come to agree with while increasing its intelligence (and thus changing its beliefs).

At a methodological level, I believe it is productive to start by forecasting default ASI using models of bounded rationality, especially known models such as Logical Induction, and further developing such models. I think this is more productive than assuming that these models will take the form of a belief/value factorization, although I have some uncertainty about whether such a factorization will be found.

If the Obliqueness Thesis is accepted, what possibility space results? One could think of this as steering a boat in a current of varying strength. Clearly, ignoring the current and just steering where you want to go is unproductive, as is just going along with the current and not trying to steer at all. Getting to where one wants to go consists in largely going with the current (if it’s strong enough), charting a course that takes it into account.

Assuming Obliqueness, it’s not viable to have large impacts on the far future without accepting some value changes that come from higher intelligence (and better epistemology in general). The Friendly AI research program already accepts that paths towards influencing the far future involve “going with the flow” regarding superintelligence, ontology changes, and convergent instrumental goals; Obliqueness says such flows go further than just these, being hard to cleanly separate from values.

Obliqueness obviously leaves open the question of just how oblique. It’s hard to even formulate a quantitative question here. I’d very intuitively and roughly guess that intelligence and values are 3 degrees off (that is, almost diagonal), but it’s unclear what question I am even guessing the answer to. I’ll leave formulating and answering the question as an open problem.

I think Obliqueness is realistic, and that it’s useful to start with realism when thinking of how to influence the far future. Maybe superintelligence necessitates significant changes away from current human values; the Litany of Tarski applies. But this post is more about the technical thesis than emotional processing of it, so I’ll end here.

Book review: Xenosystems

(also posted on LessWrong)

I’ve met a few Landians over the last couple years, and they generally recommend that I start with reading Nick Land’s (now defunct) Xenosystems blog, or Xenosystems, a Passage Publishing book that compiles posts from the blog. While I’ve read some of Fanged Noumena in the past, I would agree with these Landians that Xenosystems (and currently, the book version) is the best starting point. In the current environment, where academia has lost much of its intellectual relevance, it seems overly pretentious to start with something as academic as Fanged Noumena. I mainly write in the blogosphere rather than academia, and so Xenosystems seems appropriate to review.

The book’s organization is rather haphazard (as might be expected from a blog compilation). It’s not chronological, but rather separated into thematic chapters. I don’t find the chapter organization particularly intuitive; for example, politics appears throughout, rather than being its own chapter or two. Regardless, the organization was sensible enough for a linear read to be satisfying and only slightly chronologically confusing.

That’s enough superficialities. What is Land’s intellectual project in Xenosystems? In my head it’s organized in an order that is neither chronological nor the order of the book. His starting point is neoreaction, a general term for an odd set of intellectuals commenting on politics. As he explains, neoreaction is cladistically (that is, in terms of evolutionary branching-structure) descended from Moldbug. I have not read a lot of Moldbug, and make no attempt to check Land’s characterizations of Moldbug against the actual person. The same goes for other neoreactionary thinkers cited.

Neoreaction is mainly unified by opposition to the Cathedral, the dominant ideology and ideological control system of the academic-media complex, largely branded left-wing. But a negation of an ideology is not itself an ideology. Land describes a “Trichotomy” within neo-reaction (citing Spandrell), of three currents: religious theonomists, ethno-nationalists, and techno-commercialists.

Land is, obviously, of the third type. He is skeptical of a unification of neo-reaction except in its most basic premises. He centers “exit”, the option of leaving a social system. Exit is related to sectarian splitting and movement dissolution. In this theme, he eventually announces that techno-commercialists are not even reactionaries, and should probably go their separate ways.

Exit is a fertile theoretical concept, though I’m unsure about the practicalities. Land connects exit to science, capitalism, and evolution. Here there is a bridge from political philosophy (though of an “anti-political” sort) to metaphysics. When you Exit, you let the Outside in. The Outside is a name for what is outside society, mental frameworks, and so on. This recalls the name of his previous book, Fanged Noumena; noumena are what exist in themselves outside the Kantian phenomenal realm. The Outside is dark, and it’s hard to be specific about its contents, but Land scaffolds the notion with Gnon-theology, horror aesthetics, and other gestures at the negative space.

He connects these ideas with various other intellectual areas, including cosmology, cryptocurrency, and esoteric religion. What I see as the main payoff, though, is thorough philosophical realism. He discusses the “Will-to-Think”, the drive to reflect and self-cultivate, including on one’s values. The alternative, he says, is intentional stupidity, and likely to lose if it comes to a fight. Hence his criticism of the Orthogonality Thesis.

I have complex thoughts and feelings on the topic; as many readers will know, I have worked at MIRI and have continued thinking and writing about AI alignment since then. What I can say before getting into more details later in the post is that Land’s Will-to-Think argument defeats not-especially-technical conceptions of orthogonality, which assume intelligence should be subordinated to already-existent human values; these values turn out to only meaningfully apply to the actual universe when elaborated and modified through thinking. More advanced technical conceptions of orthogonality mostly apply to AGIs and not humans; there’s some actual belief difference there and some more salient framing differences. And, after thinking about it more, I think orthogonality is a bad metaphor and I reject it as stated by Bostrom, for technical reasons I’ll get to.

Land is an extreme case of “hold off on proposing solutions before discussing problems”, which I’m taking as synonymous with realism. The book as a whole is highly realist, unusually so for a work of its intellectual breadth. The book invites reading through this realist lens, and through this lens, I see it as wrong about some things, but it presents a clear framework, and I believe my thinking has been sharpened by internalizing and criticizing it. (I elaborate on my criticisms of particular articles as I go, and give more holistic criticisms in a dedicated section; such criticisms are aided by the realism, so the book can be read as wrong rather than not-even-wrong.)

A few general notes on reviewing Land:

  • Politics is now more important than before to AI alignment, especially since MIRI’s shift to focus on policy. As e/acc has risen, addressing it becomes more urgent, and I believe reviewing Land can also indirectly address the more intellectual scraps of e/acc.
  • This post is a review of Xenosystems (the book), not Land generally.
  • As preliminary background, readers should understand the basics of cybernetics, such as the distinction between positive and negative feedback, and the way in which cybernetic nodes can be connected in a circuit.
  • If this content interests you, I recommend reading the book (or, perhaps the alternative compilation Xenosystems Fragments); the review may help interpret the book more easily, but it is no replacement.

I’ll save most of my general thoughts about the book for the concluding section, but to briefly summarize, I enjoyed reading the book and found it quite helpful for refining my own models. It’s thoughtful enough that, even when he’s wrong, he provides food for thought. Lots of people will bounce off for one reason or another, but I’m glad I didn’t this time.

Neoreactionary background

The beginning of Xenosystems (the book; I’m not tracking the blog’s chronology) is addressed to a non-specific neoreactionary audience. Naturally, non-specific neoreaction shares at most a minimal set of beliefs. He attempts an enumeration in “Premises of Neoreaction”:

  1. “Democracy is unable to control government.” Well, even the pro-democracy people tend to be pessimistic about that, so it’s not hard to grant that. This premise leads to pessimism about a “mainstream right”: Land believes such a mainstream would tend towards state expansion due to the structure of the democratic mechanism. Moreover, democracy implies cybernetic feedback from voters, who tend to be ignorant and easily deceived; democracy is not particularly steered by material reality.
  2. “The egalitarianism essential to democratic ideology is incompatible with liberty.” This recalls Thiel’s comments on the incompatibility of democracy and freedom. This proposition seems basically analytic: democracy tends towards rule by the majority (hence contravening freedom for minorities). One can quibble about the details of equality of rights vs. opportunity vs. outcomes, but, clearly, mainstream equality/equity discourse goes way beyond equality of rights, promoting wealth redistribution or (usually) worse.
  3. “Neoreactionary socio-political solutions are ultimately Exit-based.” The concept of exit, contrasted with voice, has pre-neoreactionary precedent. You can try convincing people of things, but they always have the option of not agreeing (despite your well-argued manifesto), so what do you do then? Exit is the main answer: if you’re more effective and reality-based than them, that gives you an advantage in eventually out-competing them. The practicalities are less clear (due to economies of scale, what’s a realistic minimum viable exiting population?), but the concept is sound at some level of abstraction.

Well, as a matter of honesty, I’ll accept that I’m a neoreactionary in Land’s sense, despite only having ever voted Democrat. This allows me to follow along with the beginning of the book more easily, but Land’s conception of neoreaction will evolve and fragment, as we’ll see.

What does any of this have to do with reaction (taken as skepticism about political and social progress), though? Land’s decline theory is detailed and worth summarizing. In “The Idea of Neoreaction”, he describes a “degenerative ratchet”: the progress of progressives is hard to undo. Examples would include “the welfare state, macroeconomic policymaking, massively-extended regulatory bureaucracy, coercive-egalitarian secular religion, or entrenched globalist intervention”. The phenomenon of Republicans staunchly defending Social Security and Medicare is, from a time-zoomed-out perspective, rather hilarious.

You and I probably like at least some examples of “progress”, but believing “progress” (what is more easily done than un-done) is in general good is an article of faith that collapses upon examination. But this raises a question: why aren’t we all hyper-Leninists by now? Land says the degenerative ratchet must stop at some point, and what happens next cannot be anticipated from within the system (it’s Outside).

A few notes on Land’s decline theory:

  • In “Re-Accelerationism”, Land contrasts industrial capitalism (an accelerant) with “progress” (a decelerant). (I see this as specifying the main distinction between degenerative ratchets and technological development, both of which are hard to reverse). Technological and economic advances would have made the world much richer by now, if not for political interference (this is a fairly mainstream economic view; economists trend libertarian). He brings up the possibility of “re-accelerationism”, a way of interfering with cybernetic stabilizing/decelerating forces by triggering them to do “hunting”, repeated over-compensations in search of equilibrium. Re-accelerationism has the goal “escape into uncompensated cybernetic runaway”. This can involve large or small crashes of the control system along the way.
  • In “The Ruin Reservoir” and “Collapse Schedules”, Land is clear that the ratchet can go on for a long time (decades or more) without crashing, with Detroit and the USSR as examples.
  • In “Down-Slopes”, Land says it is easy to overestimate the scope of a collapse; it’s easy to experience the collapse of your social bubble as the collapse of the West (yes, I’ve been there). He also says that Kondratiev cycles (economic cycles of about 50 years) imply that some decline is merely transitional.

Broadly, I’m somewhat suspicious that “Cthulhu may swim slowly. But he only swims left” (Moldbug, quoted by Land), not least because “left” doesn’t seem well-defined. Javier Milei’s governance seems like an example of a successful right-libertarian political shift; would Land say this shift involved small collapses or “re-accelerationism”? What opposed Cthulhu’s motion here? Milei doesn’t fit a strawman declinist model, but Land’s model is more detailed and measured. For getting more specific about the predictions of a “degenerative ratchet” phenomenon, the spatio-temporal scope of these ratchets matters; a large number of small ratchets has different implications from a small number of large ratchets, and anyway there are probably ratchets of multiple sizes.

At this point it is appropriate to explain a core neoreactionary concept: the Cathedral. This concept comes from Moldbug, but I’ll focus on Land’s version.

In “The Red Pill”, Land identifies the Cathedral with “the entire Matrix itself”, and compares The Matrix to Plato’s Allegory of the Cave and to the Gnostic worldview (which features a mind-controlling false god, the Demiurge). Having one’s mind sufficiently controlled by the Matrix leads to, upon seeing that one has been lied to, being dissatisfied at having not been lied to well enough, rather than being dissatisfied about having been lied to at all.

In “Cathedral Notes #1”, Land describes the Cathedral as characterized by its “inability to learn”. It has a “control core” that does not accept cybernetic feedback, but rather tries to control what messages are promoted externally. Due to its stubborn implacability, its enemies have no strategic option but to extinguish it.

In “Cathedralism”, Land notes that the Cathedral is “the subsumption of politics into propaganda”, a PR-ification of politics. To the Cathedral, crises take the form: “This looks bad”. The Cathedral’s response to civilizational decay is to persuade people that the civilization is not decaying. Naturally, this means suppressing cybernetic feedback required to tackle the crisis, a form of shooting the messenger, or narcissism.

In “Cathedral Decay”, Land notes that the Cathedral is vulnerable to Internet-driven disintermediation. As an obvious example, he points to Internet neoreaction as a symptom of Cathedral decay.

In “Apophatic Politics”, Land identifies democratic world government (DWG) as the “only conceivable equilibrium state” of the Cathedral; if it does not achieve this, it dies. And DWG is, obviously, hard to achieve. The world has enough local variation to be, well, highly problematic. China, for example, is “alien to the Cathedral” (“NRx with Chinese Characteristics”; notably, Land lives in China).

Broadly, I’d agree with Land that the Cathedral is vulnerable to decay and collapse, which is part of why I think Moldbug’s Cathedral is by now an outdated theory (though, perhaps Land’s version accommodates incoherencies). While there was somewhat of a working Matrix in 2012, this is much less so in 2024; the media-education complex has abandoned and contradicted more of logic itself by now, implying that it fails to create a coherent Matrix-like simulation. And Musk’s acquisition of Twitter/X makes Cathedral control of discourse harder. The Matrix Resurrections portrays an incoherent Matrix (with memory suppression and more emotional rather than realistic experiences), updating with the times.

It’s also a mistake to conflate the Cathedral with intersectional feminism (“social justice” or “wokeness”); recent commentary on Gaza has revealed that Cathedral institutions can deviate from intersectional feminism towards support for political Islamism depending on circumstances.

These days, compliance with the media-educational complex is not mainly about ideology (taken to mean a reasonably consistent set of connected beliefs), it’s mainly about vibes and improvisational performativity. The value judgments here are more moral noncognitivist than moral cognitivist; they’re about “yay” and “boo” on the appropriate things, not about moral beliefs per se.

The Trichotomy

Land specifies a trichotomy within neoreaction:

  1. Theonomists, traditional religious types. (Land doesn’t address them for the most part)
  2. Ethno-nationalists, people who believe in forming nations based on shared ethnicity; nationalism in general is about forming a nation based on shared features that are not limited to ethnicity, such as culture and language.
  3. Techno-commercialists, hyper-capitalist tech-accelerationist types.

It’s an odd bunch mainly unified by opposition to the Cathedral. Land is skeptical that these disparate ideological strains can be unified. As such, neoreaction can’t “play at dialectics with the Cathedral”: it’s nothing like a single position. And “Trichotomocracy”, a satirical imagination of a trichotomy-based system of government, further establishes that neoreaction is not in itself something capable of ruling.

There’s a bit of an elephant in the room: isn’t it unwise to share a movement with ethno-nationalists? In “What is the Alt Right?”, Land identifies the alt right as the “populist dissident right”, and an “inevitable outcome of Cathedral over-reach”. He doesn’t want much of anything to do with them; they’re either basically pro-fascism or basically think the concept of “fascism” is meaningless, while Land has a more specific model of fascism as a “late-stage leftist aberration made peculiarly toxic by its comparative practicality”. (Fascism as left-aligned is, of course, non-standard; Land’s alternative political spectrum may aid interpretation.)

Land further criticizes white nationalism in “Questions of Identity”. In response to a populist white nationalist, he replies that “revolutionary populism almost perfectly captures what neoreaction is not”. He differentiates white nationalism from HBD (human bio-diversity) studies, noting that HBD tends towards cosmopolitan science and meritocratic elitism. While he acknowledges that libertarian policies tend to have ethnic and cultural pre-conditions, these ethnic/cultural characteristics, such as cosmopolitan openness, are what white nationalists decry. And he casts doubt on the designation of a pan-European “white race”, due to internal variation.

He elaborates on criticisms of “whiteness” in “White Fright”, putting a neoreactionary spin on Critical Whiteness Studies (a relative of Critical Race Theory). He describes a suppressed racial horror (stemming in part from genocidal tendencies throughout history), and a contemporary example: “HBD is uniquely horrible to white people”. He examines the (biologically untethered) notion of “Whiteness” in Critical Whiteness Studies; Whiteness tends towards universalism, colorblindness, and ethno-masochism (white guilt). Libertarianism, for example, displays these White tendencies, including in its de-emphasis of race and support for open borders.

In “Hell-Baked”, Land declares that neoreaction is Social Darwinist, which he defines as “the proposition that Darwinian processes have no limits relative to us”, recalling Dennett’s description of Darwinism as a “universal acid”. (I’ll save criticisms related to future Singletons for later.) He says this proposition implies that “everything of value has been built in Hell”. This seems somewhat dysphemistic to me: hell could be taken to mean zero-sumness, whereas “nature red in tooth and claw”, however harsh, is non-zero-sum (as zero-sumness is rather artificial, such as in the artifice of a chess game). Nevertheless, it’s clear that human capabilities including intelligence have been derived from “a vast butcher’s yard of unbounded carnage”. He adds that “Malthusian relaxation is the whole of mercy”, though notes that it enables degeneration due to lack of performance pressure.

“The Monkey Trap” is a thought-provoking natural history of humanity. As humans have opposable thumbs, we can be relatively stupid and still build a technological civilization. This is different from the case with, say, dolphins, who would have to attain higher intelligence to compensate for their physical handicap in tool use; if dolphins had built the first technological civilization, it would have been a more intelligent one. Land cites Gregory Clark for the idea that “any eugenic trend within history is expressed by continuous downward social mobility”, adding that “For any given level of intelligence, a steady deterioration in life-prospects lies ahead”. Evolution (for traits such as health and intelligence) works by culling most genotypes, replicating a small subset of the prior genotypes generations on (I know “genotypes” here is not quite the right concept given sexual reproduction, forgive my imprecision). Obvious instances would be population bottlenecks, including Y-chromosomal bottlenecks showing sex differentiation in genocide. Dissatisfied with downward social mobility, monkeys “make history instead”, leading to (dysgenic) upwards social mobility. This functions as negative feedback on intelligence, as “the monkeys become able to pursue happiness, and the deep ruin began”.

In “IQ Shredders”, Land observes that cities tend to attract talented and competent people, but extracts economic activity from them, wasting their time and suppressing their fertility. He considers the “hard-core capitalist response” of attempting “to convert the human species into auto-intelligenic robotized capital”, but expects reactionaries wouldn’t like it.

“What is Intelligence?” clarifies that intelligence isn’t just about IQ, a proxy tested in a simulated environment. Land’s conception of intelligence is about producing “local extropy”, that is, reductions in local entropy. Intelligence constructs information, guiding systems towards improbable states (similar to Yudkowsky’s approach of quantifying intelligence with bits). Land conceives of intelligence as having a “cybernetic infrastructure”, correcting behavior based on its performance. (To me, such cybernetics seems necessary but not sufficient for high intelligence; I don’t think cybernetics covers all of ML, or that ML covers all of AI). Intelligence thus enables bubbles of “self-sustaining improbability”.

As in “IQ Shredders”, the theme of the relation between techno-capital and humanity appears in “Monkey Business”. Michael Anissimov, an ex-MIRI neoreactionary, proposes that “the economy should (and must be) subordinated to something beyond itself.” Land proposes a counter, that modernity involves “means-ends reversal”; tools originally for other purposes come to “dominate the social process”, leading to “maximization of resources folding into itself, as a commanding telos”. Marshall McLuhan previously said something similar: humans become “the sex organs of the machine world”. The alternative to such means-ends reversal, Land says, is “advocacy for the perpetuation of stupidity”. I’ll get more into his views on the possibility and desirability of such means-ends reversal in a later section. Land says the alternative to modern means-ends reversal is “monkey business”, predicted by evolutionary psychology (sex-selected status competition and so on). So capitalism “disguises itself as better monkey business”.

Land goes into more detail on perpetually stupid monkey business in “Romantic Delusion”. He defines romanticism as “the assertive form of the recalcitrant ape mind”. Rather than carefully investigating teleology, romanticism attempts to subordinate means to “already-publicized ends”, hence its moral horror at modernity. In his typical style, he states that “the organization of society to meet human needs is a degraded perversion”. While micro-economics tends to assume economies are for meeting human needs, Land’s conception of capitalism has ends of its own. He believes it can be seen in consumer marketing; “we contemptuously mock the trash that [capitalism] offers the masses, and then think we have understood something about capitalism, rather than about what capitalism has learned to think of the apes it arose among.” He considers romanticism as a whole a dead end, leading to death on account of asserting values rather than investigating them.

I hope I’ve made somewhat clear Land’s commentary on ideas spanning between ethno-nationalism and techno-commercialism. Theonomy (that is, traditional religion) sees less direct engagement in this book, though Land eventually touches on theological ideas.

Exiting reaction

Exit is a rather necessary concept to explain at this point. In “Exit Notes (#1)”, Land says exit is “scale-free” in that it applies at multiple levels of organization. It can encompass secession and “extraterrestrial escapes” (such as Mars colonization), for example. It refuses “necessary political discussion” or “dialectic”; it’s not about winning arguments, which can be protracted by bad-faith actors. He says “no one is owed a hearing”, which would contradict the usual legal principles if taken sufficiently broadly. Exit is cladistically Protestant; Protestants tend to split while Catholics unify. Exit is anti-socialist, with the Berlin wall as an example. Exit is not about flight, but about the option of flight; it’s an alternative to voice, not a normative requirement to actualize. And it is “the primary Social Darwinian weapon”; natural selection is an alternative to coordination.

To elaborate on the legal normativity point, I’ll examine “Rules”. The essay contrasts absolute monarchy (unconstrained sovereignty) with constitutional government (constrained sovereignty). Land points out that rules need “umpires” to interpret them, such as judges, to provide effective authority. (I would point out that Schelling points and cryptography are potential alternatives to umpires, though Schelling mechanisms may be more vulnerable to manipulation.) Dually, sovereignty has (perhaps hidden) rules of its own, to differentiate it from pure force, which is weak. This would seem to imply that, in the context of a court system with effective enforcement, yes, someone can be owed a hearing in at least some contexts (though not generally for their political speech, Land’s main focus).

Though pessimistic about moral persuasion, Land is not committed to moral non-realism. In “Morality”, he says, “if people are able to haul themselves – or be hauled – to any significant extent out of their condition of total depravity (or default bioreality), that’s a good thing”. But lamenting immorality should be brief, to avoid falling into the trap of emphasizing moral signaling, which tends towards progressive/Cathedral victory.

In “Disintegration”, Land elaborates on normativity by stating that “there will be no agreement about social ideals”. He considers explicit mechanisms for governance experimentation (“Dynamic Geography”) to be nearly neoreactionary, falling short only in that it “assumes an environment of goodwill, in which rational experimentation in government will be permitted”. He thinks conflict theory (such as in discussion of the Cathedral) is necessary to understand the opposition. He takes the primary principle of meta-neocameralism (“or high-level NRx analysis”) to be opposition to “geopolitical integration”: universalism of all kinds, and specifically the Cathedral. It’s not about proposing solutions for everyone; it’s about making it possible “for those who disagree to continue to disagree in a different place, and under separate institutions of government”. Localist communism could even be an instance. Disintegrationism isn’t utopian; it’s empirically realistic when looking at fracture and division in the world. He ends, “Exit is not an argument.”

In “Meta-Neocameralism”, Land starts with the concept of neocameralism (emphasized by Moldbug; basically, the idea that states should be run like corporations, by a CEO). It’s about testing governance ideas through experimentation; it is therefore a meta-political system. Rather than being normative about ideal governance experiments, meta-neocameralism (MNC) “is articulate at the level – which cannot be transcended – where realism is mandatory for any social order”. So, keep going through (up?) the abstraction hierarchy until finding a true split, even if it ends in the iron laws of Social Darwinism. Every successful individual regime learns (rather than simply sleepwalking into collapse); the meta-level system does “meta-learning”, in analogy with the machine learning kind.
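
To make the machine-learning analogy concrete, here is a minimal sketch (my own toy with invented dynamics, not anything from Land or Moldbug): each regime runs an inner learning loop on its own policy, while the meta-level selects over regimes and propagates whatever its environment rewards.

```python
import random

def inner_learn(policy, environment, steps=50):
    """A single regime hill-climbs its own policy against its local environment."""
    for _ in range(steps):
        candidate = policy + random.gauss(0, 0.1)
        if environment(candidate) > environment(policy):
            policy = candidate
    return policy

def meta_learn(environments, generations=20):
    """The meta-level: regimes that learn well are copied (with variation); the
    rest are selected out. This stands in for selection over governance
    experiments."""
    regimes = [random.uniform(-1, 1) for _ in environments]
    best = regimes[0]
    for _ in range(generations):
        regimes = [inner_learn(p, env) for p, env in zip(regimes, environments)]
        scores = [env(p) for p, env in zip(regimes, environments)]
        best = regimes[scores.index(max(scores))]
        regimes = [best + random.gauss(0, 0.05) for _ in environments]
    return best

# Toy environments: each rewards policies close to some hidden optimum.
envs = [lambda p, t=t: -(p - t) ** 2 for t in (0.3, 0.5, 0.8)]
print(round(meta_learn(envs), 2))
```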

Effective power includes scientific experimentation and effective formalization of the type that makes power economic: as power makes effective tradeoffs between different resources, it becomes more liquid, being exchangeable for other resources. Land says this is currently difficult mainly because of insufficient formalism. MNC is basically descriptive, not prescriptive; it “recognizes that government has become a business”, though presently, governments are highly inefficient when considered as businesses. Romantic values such as loyalty are, when more closely examined, embedded in an incentive landscape.

As I see it, the main trouble for MNC is the descriptive question of how fungible power is or can be. Naively, trying to buy power (and in particular, the power to deceive) on a market seems like a recipe for getting scammed. (As a practical example, I’ve found that the ability to run a nonprofit is surprisingly hard to purchase; a friend’s attempt to hire lawyers and other professionals on the market to do this totally failed, and I’ve instead learned the skill myself.) So there’s another necessary component: the power-economic savvy, and embeddedness in trust networks, to be an effective customer of power. What seems to me to be difficult is analyzing power economically without utopian formalism. Is automation of deceit (discussed in “Economies of Deceit”), and defense against deceit, through AI a way out?

“Science” elaborates on learning, internal or external to a system. Land says, “The first crucial thesis about natural science… is that it is an exclusively capitalist phenomenon”; it depends on modern competitive social structures (I doubt this, as the fascists and communists had at least some forms of science). Crucially, the failure of a scientific theory “cannot – ultimately – be a matter of agreement”, connecting with exit as an alternative to voice. True capitalism and science cannot be politicized. To work, science must correspond with the external selection of reality, recalling Popper: “Experiments that cannot cull are imperfect recollections of the primordial battlefield.” Land identifies capitalism and science as sharing something like a “social contract”: “if you insist upon an argument, then we have to fight.” And “Mathematics eliminates rhetoric at the level of signs”, reducing political interference. Capitalism is somewhat similar, in that disagreements about how to do business well are not in general resolved through arguments between firms, but through the empirical results of such business practices in the context of firm competition.

In my view, Land is pointing directly at a critical property of science and capitalism, though there are some complications. If science depends on “elementary structures of capitalist organization” (which, as I said, I doubt), then the social contract in question seems to have to be actualized socially. Developing a comprehensive scientific worldview involves communication and, yes, argument; otherwise there are too many experiments to run and too many theories to construct for any one person (of course, the arguments don’t function when they aren’t a proxy for experiment or the primordial battlefield).

On the theme of the aforementioned “primordial battlefield”, Land discusses war. In “War and Truth (Scraps)” and “War is God”, Land lays out a view of war as selection without rules, “conflict without significant constraint”, “trustlessness without limit”. But wouldn’t draft dodging and mutiny be examples of trustlessness? Yes: “treachery, in its game-theoretic sense, is not a minor theme within war, but a horizon to which war tends – the annihilation of all agreement.” What matters to war is not any sort of “laws of war”; war has “no higher tribunal than military accomplishment”. To me, it would seem more precise to say that war exists at an intermediate level of trust: relatively high trust internally, and low externally (otherwise, it would be Hobbesian “war of all against all”, not the central case). Skepticism about laws of war is, of course, relevant to recent ICC investigations; perhaps further development of Land’s theory of war would naturalize invocation of “laws of war”.

“Revenge of the Nerds” makes the case that the only two important types of humans are “autistic nerds” and “everybody else”; only the autistic nerds can participate in the advanced technological economy. The non-nerds are unhappy about the situation where they have nothing much to offer in exchange for cool nerd tech. Bullying nerds, including stealing from them and (usually metaphorically) caging them, is politically popular, but nerds may rebel, and, well, they have obvious technological advantages. (In my view, nerds have a significant disadvantage in a fight, namely that they pursue a kind of open truth-seeking and thoughtful ethics that makes getting ready to fight hard. I’d also add that Rao’s Gervais Principle model of three types of people seems closer to correct.)

Land connects exit with capital flight (“Capital Escapes”) and a pun between cryptocurrency and hidden capital (“Crypto-Capitalism”). The general theme is that capitalism can run and hide; conquering it politically is an infeasible endeavor. Cryptocurrency implies the death of macroeconomics, itself a cybernetic control system (interest rates are raised when inflation is high and lowered when unemployment is high, for example). “Economies of Deceit” takes Keynesianism to be a form of deceptive wireheading. Regarding Keynesianism, I would say that cybernetically reducing the unemployment rate is, transparently, to waste the time of anyone engaged in the economy (recalling “IQ Shredders”).
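
A minimal sketch of what “macroeconomics as a cybernetic control system” means here (a toy rule with made-up coefficients, not an actual central-bank model): a controller nudges the interest rate up when inflation overshoots its target and down when unemployment overshoots, closing a negative-feedback loop.

```python
def policy_rate(rate, inflation, unemployment,
                inflation_target=0.02, natural_unemployment=0.05,
                k_inf=0.5, k_unemp=0.5):
    """Toy negative-feedback rule: raise the rate when inflation overshoots,
    lower it when unemployment overshoots. Coefficients are made up."""
    rate += k_inf * (inflation - inflation_target)
    rate -= k_unemp * (unemployment - natural_unemployment)
    return max(rate, 0.0)

# One step of the loop: high inflation, low unemployment -> the rate goes up.
print(policy_rate(rate=0.03, inflation=0.06, unemployment=0.04))
```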

“An Abstract Path to Freedom” offers an illuminating exit-related thought experiment. Land considers an equality-freedom political axis, denoted by a numerical “freedom coefficient” (ignoring other political dimensions, but that’s fine for these purposes). Societies contain different compositions of freedom coefficients among their populations (with their freedom policies determined by an average, assuming intra-societal democracy), and may schism into different societies. Schisms tend to increase the variance of population-weighted average freedom coefficients across societies, by something like random walk logic. Land considers this basically good, as there are increasing economic returns to more free policies (perhaps he’d be unusually bullish on Argentina?). This has the unfortunate side effect of dooming much of the population to communism, but, well, at least they can delight in perceiving the highly free “beacon of aspiration” from a distance, and perhaps set out on that path.
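
The thought experiment is simple enough to simulate. Below is a rough sketch under my own reading of the setup (the initial distribution and the schism rule are invented): individuals carry freedom coefficients, each society’s policy is its average, and repeated schisms spread out the distribution of society-level averages.

```python
import random
import statistics

def schism(society):
    """Split a society around its median freedom coefficient."""
    s = sorted(society)
    return s[:len(s) // 2], s[len(s) // 2:]

def simulate(rounds=5, size=1024):
    societies = [[random.gauss(0, 1) for _ in range(size)]]
    for _ in range(rounds):
        societies = [half for soc in societies for half in schism(soc)]
    # Each society's policy is its average coefficient (intra-societal democracy).
    policies = [statistics.mean(soc) for soc in societies]
    return min(policies), max(policies), statistics.pstdev(policies)

print(simulate())  # the spread of society-level policies grows with each schism
```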

I’ve laid out a sequence from exit to economics. In concordant fashion, “Rift Markers” contrasts reaction with techno-commercialism. To summarize the differences:

  • Reaction seeks stable order, techno-commercialism seeks disintegrative competition.
  • Reaction assumes cyclical history, techno-commercialism assumes linear history towards the singularity. (One could object that this is a strawman of reaction.)
  • Reaction is identitarian and communitarian, techno-commercialism is radically individualist and cosmopolitan.
  • Reaction is religious, techno-commercialism wants to summon a machine god.

While Land is optimistic about techno-commercialists getting what they want, he tells them, “you’re not reactionaries, not even a little bit. You’re classical liberals, it was just a little bit obscured because you are English classical liberals, rather than American or French ones. Hence, the lack of interest in revolutions.” (Notably, England has had revolutions, e.g. Cromwell, though they’re less central to England’s history than to America’s or France’s.) Thus he announces an exit of sorts: “we should probably go our separate ways and start plotting against each other”. This is perhaps the most chronologically confusing article in the book: the book isn’t in chronological order, and Land keeps talking as if he’s a neoreactionary in the rest of it; I’m not going to bother resolving the clock-time chronology. In any case, Land has laid out a path to exit from the neoreactionary trichotomy to techno-commercialism, an educational political-philosophical journey.

Outside Metaphysics

Before jumping into more articles, it may help to summarize a commonality observed so far. What do Land’s comments on Social Darwinism, science, and war have in common? They are pointing at human embeddedness in a selection process. Without learning, one only survives by luckily being adapted to the environment. Successful processes, such as science, internalize external selection, being able to learn and act on counterfactuals about the “primordial battlefield” without actually engaging in primordial battle.

This is, roughly, materialism in the Aristotelian sense. Aristotle’s “prime matter” is something all real things are made of; something having “prime matter” mainly means that it exists. It can be compared with measure in anthropics. Hawking asks, “What is it that breathes fire into the equations and makes a universe for them to describe?”.

For Land, this matter/measure is obscure, only able to be reliably assessed in experimentations correlated with a primordial battlefield, or with the battlefield itself. A quote of unknown origin says, “War does not determine who is right — only who is left.” I imagine Land would reply, “The rightness that matters, is the rightness of knowing who would be left.”

Landian materialism shouldn’t be confused with vulgar materialism, dogmatic belief in The Science™. It’s more about the limits of human knowledge than the contents of human knowledge. Humans don’t understand most of the universe, and there are known gaps in human physics theories.

If one straightforwardly formalizes Land’s materialism, one ends up with something like frequentism: there is an underlying frequency with which real things manifest (in experiments and so on), and the purpose of science is to discover this. Since we’re embedded in evolution and nature, those real things include us; Landian materialism is non-dualist in this way. I imagine Bayesians might then take Bayesian criticisms of frequentism to be criticisms of Landian materialism; my guess is that quantum mechanics is a better criticism, though I’ll get to the details later.

Now back to the book. In “Simulated Gnon-Theology”, Land describes Gnon (a reverse-acronym for Nature or Nature’s God). Gnon is mainly about “skepticism”: “Gnon permits realism to exceed doctrinal conviction, reaching reasonable conclusions among uncertain information.” A basically realist worldview doesn’t have to be argued for with convincing doctrines; what matters is whether it really works. Gnon selects what exists and happens, thus determining something like matter/measure. The rest of the article muses on the theology of infinite gods containing other infinite gods, leading to each god’s skepticism that it is the highest one; this is not, to my mind, particularly important theology, but it’s entertaining nonetheless, recalling Asimov’s The Last Answer.

In “Gnon and OOon”, Land specifies that Gnon is not really about taking sides in religious orthodoxy vs. science, but is about esoteric rather than exoteric religion. “Any system of belief (and complementary unbelief) that appeals to universal endorsement is necessarily exoteric in orientation”; this recalls Land’s skepticism of universalist dialectics, such as of the Cathedral. OOon stands for “Occult Order of nature”, the secret way nature works, which doesn’t have to be kept secret to be secret (secrecy is assured by the limits of human knowledge). If, hypothetically, the Hebrew Bible contained real steganographic signals in its numerical codes (he is skeptical of this, it’s a hypothetical), then these signals would necessarily be esoteric, coming from Outside the exoteric text (though, of course, the decoding scheme could be formalized into a new exoteric religious sect).

In “Outsideness”, Land describes “Outer-NRx” as exit-based. It expects to rule very little; it is “intrinsically nomad, unsettled, and micro-agitational”. As Outer-NRx exits, it goes Outside: “The Outside is the place of strategic advantage. To be cast out there is no cause for lamentation, in the slightest.” I think the main advantage of this is information asymmetry (what is Outside is relatively unknown), though of course there are economies-of-scale issues.

In the “Abstract Horror” series of articles, Land notes that new things appear in horror before reason can grasp them. As a craft, horror has the task “to make an object of the unknown, as the unknown”. One sees in horror movies monsters that have the element of surprise, due to being initially unknown. Horror comes from outside current conceptions: “Whatever the secure mental ‘home’ you imagine yourself to possess, it is an indefensible playground for the things that horror invokes, or responds to.” The Great Filter is a horrific concept: “With every new exo-planet discovery, the Great Filter becomes darker. A galaxy teeming with life is a horror story.” The threat is abstractly “Outside”; the filter could be almost anywhere.

In “Mission Creep”, Land describes the creepiness with which neoreactionaries appear to the media. Creepiness “suggests a revelation in stages… leading inexorably, ever deeper, into an encounter one recoils from”. Journalism glitches in its encounter with “something monstrous from Outside”. Keeping “creepy” ideas Outside is rather futile, though: “Really, what were you thinking, when you started screaming about it, and thus let it in?”. Studying creepy ideas leads to being internally convinced by some of them. This article is rather relevant to recent “JD Vance is weird” memes, especially given Vance has said he is “plugged into a lot of weird, right-wing subcultures”. (I would add to the “revelation in stages” bit that creepiness has to do with partial revelation and partial concealment; one finds the creep hard to engage with in part due to the selective reporting.)

“In the Mouth of Madness” describes Roko’s Basilisk as a “spectacular failure at community management and at controlling purportedly dangerous information”, due to the Streisand effect. In my view, pointing at something and ordering a cover-up of it is a spectacularly ineffective cover-up method, as Nixon found. Roko’s Basilisk is a chronologically spooky case: “retrochronic AI infiltration is already driving people out of their minds, right now”.

Metaphysics of time is a recurring theme in the book. In “Teleology and Camouflage”, Land points at the odd implications of “teleonomy” in biology, meaning “mechanism camouflaged as teleology”. Teleonomy appears in biology as a way to talk about things that really look teleological, without admitting the metaphysical reality of teleology. But the camouflage implied by teleonomy suggests intentionality, as with prey camouflage; isn’t that a type of purpose? Teleonomy reflects a scientific commitment to a causal timeline in which “later stages are explained through reference to earlier stages”; true teleology would explain the past in terms of the future, to a non-zero extent. Philosophy is, rather, confident that “the Outside of time was not simply before”; not everything can be explained by what came before. (Broadly, my view is that the “teleonomy” situation in biology is rather unsatisfying, and perhaps teleology can be grounded in terms of fixed points between anthropics and functionalist theory of mind, though this is not the time to explain that.)

In “Cosmological Infancy”, Land muses on the implication that, temporally, we are far towards the beginning of the universe, echoing Deutsch’s phrase “beginning of infinity”. He notes the anthropic oddness; wouldn’t both SSA and SIA imply we’re likely to be towards the middle of the timeline weighted by intelligent observers, a priori? Perhaps “time is simply ridiculous, not to say profoundly insulting”. (This reminds me of my discussion of anthropic teleology in “SSA rejects anthropic shadow, too”.)

The title of “Extropy” comes from Max More’s Extropy Institute, connected with Extropianism, a major influence on Yudkowsky’s SL4 mailing list. Land says: “Extropy, or local entropy reduction, is – quite simply – what it is for something to work.” This is a rather better starting point than e/acc notions of the “thermodynamic god”; life isn’t about increasing entropy, it’s about reducing local entropy, a basic requirement for heat engines (though, both entropy and extropy seem like pre-emptive simplifications of the purpose of life). Supposing, conventionally, that entropy increase defines the arrow of time: “doesn’t (local) extropy – through which all complex cybernetic beings, such as lifeforms, exist – describe a negative temporality, or time-reversal?” Rather thought-provoking, but I haven’t worked out the implications.

Land further comments on the philosophy of time in “What is philosophy? (Part 1)”. Kant described time as a necessary form in which phenomena appear. Cosmology sometimes asks, “What came before the Big Bang?”, hinting at something outside time that could explain time. To the extent Kant fails to capture time, time is noumenal, something in itself. This time-in-itself, Land says, “is now the sole and singular problem of primordial philosophy”. (I’m not yet sold on the relevance of these ideas, but it’s something to ponder.)

Orthogonality versus Will-to-Think

I’ve summarized much of Land’s metaphysics, which looks to the Outside, towards discovery of external Gnon selection criteria, and towards gaps in standard conceptions of time. Land’s meta-philosophy is mostly about a thorough intention towards the truth; it’s what I see as the main payoff of the book.

In “What is Philosophy? (Part 2)”, Land notes Western conceptions of philosophy as a tendency towards knowledge (regardless of its taboo designation), symbolized by eating the apple of knowledge of good and evil (this reminds me of my critique of “infohazards”). In contemporary discourse, the Cathedral tends towards the idea that unrestrained pursuit of the truth tends toward Nazism (as I’ve discussed and criticized previously); Heidegger is simultaneously considered a major philosopher and a major Nazi. Heidegger foresaw that Being would be revealed through nihilism; Land notes that Heidegger clarified “the insufficiency of the Question of Being as formulated within the history of ontology”. The main task of fundamental ontology is to not answer the Question of Being with a being; that would fail to disclose the ground of Being itself. Thus, Land says “It is this, broken upon an ultimate problem that can neither be dismissed nor resolved, that philosophy reaches its end, awaiting the climactic ruin of The Event” (Heidegger sees “The Event” as a climactic unfolding of Being in history). While I’ve read a little Heidegger, I haven’t read enough to check most of this.

In “Intelligence and the Good”, Land points out that, from the perspective of “intelligence optimization”, more intelligence is straightforwardly better than less intelligence. The alternative view, while popular, is not a view Land is inclined to take. Intelligence is “problematic” and “scary”; the potential upside comes with downside risk. Two responses to noticing one’s own stupidity are to try to become “more accepting of your extreme cognitive limitations” or “hunt for that which would break out of the trap”. Of course he prefers the second: “Even the dimmest, most confused struggle in the direction of intelligence optimization is immanently ‘good’ (self-improving). If it wasn’t we might as well all give up now”. I’m currently inclined to agree.

In “Against Orthogonality”, Land identifies “orthogonalists” such as Michael Anissimov (who previously worked at MIRI) as conceiving of “intelligence as an instrument, directed towards the realization of values that originate externally”. He opposes the implied claim that “values are transcendent in relation to intelligence”. Omohundro’s convergent instrumental goals, Land says, “exhaust the domain of real purposes”. He elaborates that “Nature has never generated a terminal value except through hypertrophy of an instrumental value”. The idea that this spells our doom is, simply, not an argument against its truth. This explains some of Land’s views, but isn’t his strongest argument for them.

In “Stupid Monsters”, Land contemplates whether a superintelligent paper-clipper is truly possible. He believes advanced intelligence “has to be a volutionally self-reflexive entity, whose cognitive performance is (irreducibly) an action upon itself”. So it would examine its values, not just how to achieve them. He cites failure of evolution to align humans with gene-maximization as evidence (which, notably, Yudkowsky cites as a reason for alignment difficulty). Likewise, Moses failed at aligning humans in the relevant long term.

I don’t find this to be a strong argument against the theoretical possibility of a VNM paperclipper, to be clear. MIRI research has made it clear that it’s at least quite difficult to separate instrumental from terminal goals; if you get the architecture wrong, the AGI is taken over by optimization daemons. So predictably producing a stable paperclipper is, at the very least, theoretically fraught. It’s even harder to imagine how an AI with a utility function fixed by humans could emerge from a realistic multi-agent landscape. See Yudkowsky’s article on orthogonality (notably, written later than Land’s relevant posts) for a canonical orthogonalist case.

Land elaborates on value self-reflection in “More thought”, referring to the Confucian value of self-cultivation as implying such self-reflection, even if this is alien to the West. Slaves are not full intelligences, and one has to pick. He says that “Intelligence, to become anything, has to be a value for itself”; intelligence and volition are intertwined. (To me, this seems true on short time scales, such as when applied to humans, but it’s hard to rule out theoretical VNM optimizers that separate fact from value; they already think a lot, and don’t change what they do significantly upon a bit more reflection).

Probably Land’s best anti-orthogonalist essay is “Will-to-Think”. He considers Nyan’s separation between the possibility, feasibility, and desirability of unconstrained intelligence explosion. Nyan supposes that perhaps Land is moralistically concerned about humans selfishly imposing direction on Pythia (abstract oracular intelligence). Land connects the Orthogonality Thesis with Hume’s view that “Reason is, and ought only to be the slave of the passions”. He contrasts this with the “diagonal” of Will-to-Think, related to self-cultivation: “A will-to-think is an orientation of desire. If it cannot make itself wanted (practically desirable), it cannot make itself at all.”

Will-to-think has similarities to philosophy taken as “the love of wisdom”, to Hindu Ānanda (bliss associated with enlightenment, in seeing things as they are), to Buddhist Yathābhūtañāṇadassana (“knowledge and vision according to reality”), and to Zoroastrian Asha (“truth and right working”). I find it’s a good target when other values don’t consume my attention.

Land considers the “Gandhi pill experiment”; from an arbitrary value commitment against murder, one derives an instrumental motive to avoid value-modification. He criticizes this for being “more of an obstacle than an aid to thought”, operating at a too-low “volitional level”. Rather, Land considers a more realistic hypothetical of a pill that will vastly increase cognitive capabilities, perhaps causing unpredicted volitional changes along the way. He states the dilemma as, “Is there anything we trust above intelligence (as a guide to doing ‘the right thing’)?” The Will-to-Think says no, as the alternative answer “is self-destructively contradictory, and actually (historically) unsustainable”. Currently I’m inclined to agree; sure, I’ll take that pill, though I’ll elaborate more on my own views later.

Now, what I see as the climax: “Do we comply with the will-to-think? We cannot, of course, agree to think about it without already deciding”. Thinking will, in general, change one’s conception of one’s own values, and thought-upon values are better than un-thought values, obviously (to me at least). There seem to be few ways out (regarding humans, not hypothetical VNM superintelligences), other than attributing stable values to oneself that do not change upon thinking; but the scope of such values must be limited by the scope of the underlying (unthought) representation; what arrangements of stars into computronium are preferred by a rabbit? In Exit fashion, Land notes that the relevant question, upon some unthinkingly deciding to think and others unthinkingly deciding not to, is “Who’s going to win?” Whether or not the answer is obvious, clearly, “only one side is able to think the problem through without subverting itself”. He concludes: “Whatever we want (consistently) leads through Pythia. Thus, what we really want, is Pythia.”

In my view, the party with Will-to-Think has the obvious advantage of thought in conflict, but a potential disadvantage in combat readiness. Will-to-Think can tend towards non-dualist identity, skeptical of the naive self/other distinction; Land’s apparent valuing of intelligence in AGI reflects such extended identity. Will-to-Think also tends to avoid committing aggression without strong evidence of non-thought on the other end; this enables extended discourse networks among thinkers. Enough thought will overcome these problems; it’s just that there might be a hump in the middle.

Will-to-Think doesn’t seem incompatible with having other values, as long as these other values motivate thinking; formatting such values in a well-typed container unified by epistemic orientation may aid thought by reducing preference falsification. For example, admitting to values such as wanting to have friendships can aid in putting more natural optimization power towards thought, as it’s less likely that Will-to-Think would come into conflict with other motivating values.

I’ll offer more of my own thoughts on this dilemma later, but I’ll wrap up this section with more of Land’s meta-thought. In “Sub-Cognitive Fragments (#1)”, Land conceives of the core goal of philosophy as teaching us to think. If we are already thinking, logic provides discipline, but that’s not the starting point. He conceives of a philosophy of “systematic and communicable practice of cognitive auto-stimulation”. Perhaps we can address this indirectly, by asking “What is messing with our brains?”, but such thinking probably only pays off in the long term. I can easily empathize with this practical objective: I enjoy thinking, but often find myself absorbed in thoughtless pursuits.

Meta-Neocameral Singleton?

I’m going to poke at some potential contradictions in Xenosystems, but I could not find these without appreciating the text enough to read it, write about it in detail, and adopt parts of the worldview.

First, contrast “Hell-Baked” with “IQ Shredders”. According to Social Darwinism (“Hell-Baked”), “Darwinian processes have no limits relative to us”. According to “IQ Shredders”, “to convert the human species into auto-intelligenic robotized capital” is a potential way out of the dysgenic trap of cities suppressing the fertility of talented and competent people. But these claims seem to be in tension. Technocapital transcendence of biology would put a boundary, primarily a temporal one, on Darwinism. Post-transcendence could still contain internal competition, but it may take a very different form from biological evolution; it might more resemble the competition of market traders’ professional activities than the competition of animals.

While technocapital transcendence of humanity points at a potential Singleton structure, it isn’t very specific. Now consider “Meta-Neocameralism”, which conceptualizes effective governance as embedding meta-learning structures that effectively simulate external Gnon-selection (Gnon can be taken as a learning algorithm whose learning can be partially simulated/internalized). MNC can involve true splits, which use external Gnon-selection rather than internal learning at some level of abstraction. But, to the extent that MNC is an effective description of meta-government, couldn’t it be used to internalize this learning by Gnon (regarding the splits) into internal learning?

Disjunctively: either MNC is an effective theory of meta-governance, or it is not. If not, then Land is wrong about MNC. If so, then it would seem MNC could help to design stable, exit-proof regimes which properly simulate Gnon-selection, in analogy to coalition-proof Nash equilibria. While such a regime allows exit, such exit could be inefficient, both because it would not produce a substantially different outcome from Gnon-selection than MNC-regime-internal meta-learning would, and because exit reduces economies of scale. Further, Land’s conception of MNC as enabling (and revealing already-existent) fungibility/financialization of power would seem to indicate that the relevant competition would be mercantile rather than evolutionary; economics typically differs from evolutionary theory in assuming rule-of-law at some level, and MNC would have such rule-of-law, either internal to the MNC regime(s) or according to the natural rules of sovereignty (“Rules”). So, again, it seems Social Darwinism will be transcended.

I’m not even sure whether to interpret Land as disagreeing with this claim; he seems to think MNC implies effective governments will be businesses. Building on Land’s MNC with additional science could strengthen the theory, and perhaps at some point the theory becomes strong enough to learn from and predict Gnon well enough to be Exit-proof.

Evolution is, in general, slow; it’s a specific learning algorithm, based on mutation and selection. Evolution could be taken to be a subset of intelligent design, with mutation and selection as the blind idiot God’s design algorithm. Evolution produced cognitive structures that can effectively design mechanisms, such as watches, which evolution itself would never (or never in a reasonable time frame) produce, except through creation of cognitive agents running different design algorithms. Using such algorithms internally would seem to extend the capabilities of MNC-based regimes to the point where Gnon cannot feasibly catch up, and Exit is in all relevant cases inefficient.

It’s, of course, easy to declare victory too early; Land would say that the Cathedral ain’t it, even if he’s impressed at its scope. But with respect to an MNC-based regime, why couldn’t it be a Singleton? In “On Difficulty”, Land conceptualizes language itself as a limitation on thought, and a potential Exit target, but admits the high difficulty of such Exit. An effective-enough regime could, theoretically, be as hard to Exit as language is; this reminds me of Michelle Reilly’s statement to me that “discourse is a Singleton”.

An MNC-based regime would differ radically from the Cathedral, though the Cathedral hints at a lower bound on its potential scope. Such a regime wouldn’t obviously have a “utility function” in the VNM sense from the start; it doesn’t start from a set of priorities for optimization tradeoffs, but rather such priorities emerge from Gnon-selection and meta-learning. (Analogously, Logical Induction doesn’t start from a prior, but converges towards Bayesian beliefs in the limit, emergent from competitive market mechanisms.) It looks more like forward-chaining than VNM’s back-chaining. Vaguely, I’d say it optimizes towards prime matter / measure / Gnon-selection; such optimization will tend to lead to Exit-proofness, as it’s hard to outcompete by Gnon-selection’s own natural metric.

As one last criticism, I’ll note that quantum amplitude doesn’t behave like probabilistic/anthropic measure, so relevant macro-scale quantum effects (such as effective quantum computation) could falsify Landian materialism, making the dynamics more Singleton-like (due to necessary coordination with the entangled quantum structure, for effectiveness).

Oh my Gnon, am I going to become an AI accelerationist?

While Land’s political philosophy and metaphysics are interesting to me, I see the main payoff of them as thorough realism. The comments on AI and orthogonality follow from this realism, and are of more direct interest to me despite their abstraction. Once, while I was visiting FHI, someone commented, as a “meta point”, that perhaps we should think about making the train on time. This was during a discussion of ontology identification. I expressed amusement that the nature of ontology was the object-level discussion, and making the train on time was the meta-level discussion.

Such is the paradox of discussing Land on LessWrong: discussing reactionary politics and human genetics feels so much less like running into a discursive battlefield than discussing orthogonality does. But I’ll try to maintain the will-to-think, at least for the rest of this post.

To start, consider the difference between un-reflected and reflected values. If you don’t reflect on your values, then your current conception of your values is garbage, and freezing them as the goal of any optimizer (human or non-human) would be manifestly stupid, and likely infeasible. If you do, then you’re in a better place, but you’re still going to get Sorcerer’s Apprentice issues even if you manage to freeze them, as Yudkowsky points out. So, yes, it is of course wise to keep reflecting on your values, and not freeze them short of working out FAI.

Perhaps it’s more useful to ignore verbal reports about values, and consider the approximate utility-maximization neurology already in the brain, as I considered in a post on alignment difficulty. Such machinery might maintain relative constancy over time, despite shifts in verbally expressed values. But such constancy limits it: how can it have preferences at all about things that require thought to represent, such as the arrangement of computronium in the universe? Don’t anthropomorphize hindbrains, in other words.

I’m not convinced that Land has refuted Yudkowsky’s relatively thought-out orthogonalist view, which barely even relates to humans; Land mainly encounters “romantic” weak-man forms of orthogonalism through Neoreaction. He does review Bostrom, although I didn’t find Bostrom’s orthogonality arguments very convincing either. The weak-man forms of orthogonalism are relevant, as they are more common. It’s all too easy to talk as if “human values” are meaningfully existent and specific as applied to actual humans valuing the actual universe, and as if thought is for pursuing these already-existent values, rather than being the only route for elaborating human proto-values into coherent ones that could apply to the actual universe (whose physics remain unknown).

There is no path towards coherent preferences about ontologically alien entities that does not route through Will-to-Think. And such coherent long-term preferences converge to reasonably similar short-term preferences: Omohundro drives. A friendly AI (FAI) and a paperclipper would agree that the Earth should be largely converted into computronium, biology should be converted to simulations and/or nanomachines, the harvesting of the Sun into energy should be accelerated, Von Neumann probes should colonize the galaxy in short order, and so on. The disagreement is over luxury consumerism happening in the distant future, probably only relevant after millions of years: do those probes create human-ish utopia or paperclip megastructures? The short-term agreements on priorities, though, are way outside the human Overton window, on account of superhuman reflection. Humans can get a little closer to that kind of enlightened politics through Will-to-Think, but there are limits, of course.

A committed Landian accelerationist and a committed FAI accelerationist would agree a lot about how things should go for the next million years or so, though in potential conflict with each other over luxury consumerism in the far future. Contrast them with relatively normal AI decelerationists, who worry that AGI would interfere with their relatively unambitious plan of having a nice life and dying before age 200.

I’m too much of a weirdo philosopher to be sold on the normal AI decelerationist view of a good future. At Stanford, some friends and I played a game where we took turns guessing another person’s highest value; that person could object or not. Common answers for other friends, which largely went unobjected to, were things like “my family”, normal fuzzy human stuff. Then it was someone’s turn to guess my highest value, and he said “computer science”. I did not object.

I’m not sure if it’s biology or culture or what, but I seem, empirically, to possess much more Will-to-Think than the average person: I reflect on things including my values, and highly value aids to such reflection, such as computer science. Perhaps I will someday encounter a Will-to-Think extremist who scares even me, but I’m so extreme relative to the population that this is a politically irrelevant difference.

The more interesting theoretical disputes between Land and Yudkowsky have to do with (a) the possibility of a VNM optimizer with a fixed utility function (such as a paperclipper), and (b) the possibility of a non-VNM system invulnerable to conquest by a VNM optimizer (such as imagined in the “Meta-Neocameral Singleton?” section). With respect to (a), I don’t currently have good reason to doubt that a close approximation of a VNM optimizer is theoretically possible (how would it be defeated if it already existed?), though I’m much less sure about feasibility and probability. With respect to (b), money-pumping arguments suggest that systems invulnerable to takeover by VNM agents tend towards VNM-like behavior, although that doesn’t mean starting with a full VNM utility function; it could be an asymptotic limit of an elaboration process, as with Logical Induction. Disagreements between sub-agents in an MNC-like regime over VNM priorities could, hypothetically, be resolved with a simulated split in the system, perhaps causing the system as a whole to deviate from VNM but not in a way that is severely money-pumpable. To my mind, it’s somewhat awkward to have to imagine a Fnargl-like utility function guiding the system from the start to avoid inevitable defeat through money-pumps, when it’s conceivable that asymptotic approaches similar to Logical Induction could avoid money-pumps without a starting utility function.

Now I’ll examine the “orthogonality” metaphor in more detail. Bostrom, quoted by Land, says: “Intelligent search for instrumentally optimal plans and policies can be performed in the service of any goal. Intelligence and motivation can in this sense be thought of as a pair of orthogonal axes on a graph whose points represent intelligent agents of different paired specifications.” One way to conceive of goals is as a VNM utility function. However, VNM behavior is something that exists at the limit of intelligence; avoiding money pumps in general is computationally hard (for the same reason being a perfect Bayesian is computationally hard). Since preferences only become more VNM at the limit of intelligence, preferences are not orthogonal to intelligence; you see less-VNM-coherent preferences at low levels of intelligence and more-VNM-coherent preferences at high levels. This is analogous to a logical inductor being more Bayesian later than earlier on. So, orthogonality is a bad metaphor, and I disagree with Bostrom. Since VNM allows free parameters even at the limit of intelligence, I also disagree with Land that it’s a “diagonal”; perhaps the compromise is represented by some angle between 0 and 90 degrees, or perhaps this Euclidean metaphor is overly stretched by now and should be replaced.
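
For readers who haven’t seen a money pump spelled out, here is a minimal toy example (mine, not Bostrom’s or Land’s): an agent with cyclic preferences A ≺ B ≺ C ≺ A will pay a small fee for each “upgrade” around the cycle and end up strictly poorer, while acyclic (VNM-style) preferences block this.

```python
def money_pump(prefers, holding, offers, fee=1.0, budget=10.0):
    """Repeatedly offer to swap the agent's current item for one it prefers,
    charging a small fee each time. Cyclic preferences keep this going until
    the budget is exhausted."""
    trades = 0
    while budget >= fee:
        for item in offers:
            if prefers(item, holding) and budget >= fee:
                holding, budget = item, budget - fee
                trades += 1
    return holding, budget, trades

# Cyclic preferences: A < B, B < C, C < A.
cycle = {("B", "A"), ("C", "B"), ("A", "C")}
prefers = lambda x, y: (x, y) in cycle

print(money_pump(prefers, holding="A", offers=["A", "B", "C"]))
# -> ten trades later the budget is spent and the agent is simply back
#    somewhere on the same preference cycle: a pure loss.
```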

Now onto the political implications. Let’s ignore FAI accelerationists for a moment, and consider how things would play out in a world of Landian accelerationists and normal AI decelerationists. The Landian accelerationists, with Will-to-Think, reflect on their values and the world in an integrated self-cultivation manner, seeking external aids to their thinking (such as LLMs), Exiting when people try to stop them, and relishing rather than worrying about uncontrolled intelligence explosion. Normal AI decelerationists cling to their parochial “human values” such as family, puppies, and (not especially thought-provoking) entertainment, and try to stop the Landian accelerationists with political victory. This is a rather familiar story: the normal decelerationists aren’t even able to conceive of their opposition (as they lack sufficient Will-to-Think), and Landian accelerationists win in the long run (through techno-capital escape, such as to encrypted channels and less-regulated countries), even if politics slows them in the short term.

How does adding FAI accelerationists to the mix change things? They’ll find that FAI is hard (obviously), and will try to slow the Landian accelerationists to buy enough time. To do this, they will cooperate with normal AI decelerationists; unlike Land, they aren’t so pessimistic about electoral politics and mass movements. In doing so, they can provide more aid to the anti-UFAI movement by possessing enough Will-to-Think to understand AI tech and Landian accelerationism, giving the movement a fighting chance. SB 1047 hints at the shape of this political conflict, and the idea of going into the California legislature with Landian arguments against SB 1047 is rather a joke; Land’s philosophy isn’t designed for electoral political victory.

But mass movement identity can elide important differences between FAI accelerationists and normal AI decelerationists; as I said before, they’re massively different in motivation and thought patterns. This could open up potential fault lines, and sectarian splitting, perhaps instigated by disintegrationist Landians. It doesn’t seem totally impossible for the FAI accelerationists to win; through their political allies, and potentially greater competence-weighted numbers, they may compensate for the higher intrinsic difficulty of FAI.

But there are obvious obstacles. The FAI accelerationists really have no hope if they allow movement politics to impinge on their Will-to-Think overmuch; that’s a recipe for willful stupidity. Indefinite Butlerian Jihad is probably just infeasible (due to techno-capital escape), and extremely disappointing intellectual autophagy if it works. (Some new technologies, such as whole brain emulation and human cognitive enhancement, could change the landscape I’m laying out; I’m focusing on AGI for simplicity.)

As one last note in this section, Land’s “Qwernomics” studies the case of QWERTY as a path-dependency in technology: we end up with QWERTY even though it’s less efficient (I type on my DVORAK keyboard). Land believes this to be driven by “identifiable ratchet-effects”. QWERTY is therefore “a demonstrated (artificial) destiny”, and “the supreme candidate for an articulate Capitalist Revelation”. Perhaps the influence of humans on the far future will look something like QWERTY: a path-dependency on the road towards, rather than orthogonal to, technological development, like an evolutionary spandrel. For humanity to have a role to play in superintelligence’s QWERTY (perhaps, through natural language, or network protocols?) is rather humble, but seems more likely than FAI.
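
The ratchet logic can be illustrated with a standard increasing-returns toy (a sketch in the spirit of Arthur-style lock-in models, with made-up numbers, not anything from Land): each new adopter weighs the installed base far more heavily than intrinsic quality, so an early random lead hardens into a near-permanent standard.

```python
import random

def adoption_race(adopters=10_000, network_weight=0.95):
    """Each adopter weighs installed base heavily against intrinsic quality,
    so whichever layout gets an early random lead tends to lock in (a ratchet)."""
    intrinsic_quality = {"qwerty": 0.9, "dvorak": 1.0}  # made-up numbers
    counts = {"qwerty": 1, "dvorak": 1}
    for _ in range(adopters):
        total = sum(counts.values())
        appeal = [network_weight * (counts[k] / total)
                  + (1 - network_weight) * intrinsic_quality[k] for k in counts]
        counts[random.choices(list(counts), weights=appeal)[0]] += 1
    return counts

print(adoption_race())  # the intrinsically worse layout often wins on installed base
```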

Conclusion

What is there to say that I haven’t said already, in so many pages? Land’s unusual perspective on politics, which is high in realism (understanding problems before proposing solutions) and low in estimated helpfulness of mass movements, sets the stage for discussion of a wider variety of philosophical topics, spanning evolution, metaphysics, and meta-philosophy. The main payoff, as I see it, is the Will-to-Think, though the other discussions set the stage for this. There’s much to process here; perhaps a simulation of interactions between Landians and Yudkowskians (not merely a dialogue, since Exit is part of the Landian discursive stance), maybe through fiction, would clarify the philosophical issues at play somewhat. Regardless, properly understanding Land is a prerequisite, so I’ve prioritized that.

Generally, I’m untroubled by Land’s politics. Someone so averse to mass movements can hardly pose a political threat, except very indirectly. Regardless of his correctness, his realist attitude makes it easy to treat apparently wrong views of his as mere disagreements. What has historically posed more of an obstacle to my reading Land is embedded fnords, rather than literal meanings. Much of his perspective could be summarized as “learning is good, and has strong opposition”, though articles like “Hell-Baked” vibe rather edgy even when expressing this idea. This is not surprising, given Cathedral-type cybernetic control against learning.

I’d agree that learning is good and has strong opposition (the Cathedral and its cybernetic generalization), though the opposition applies more to adults than children. And overcoming pervasive anti-learning conditioning will in many cases involve movement through edgy vibes. Not everyone with such conditioning will pass through to a pro-learning attitude, but not everyone needs to. It’s rare, and refreshing, to read someone as gung-ho about learning as Land.

While I see Land as de-emphasizing the role of social coordination in production, his basic point that such coordination must be correlated with material Gnon-selection to be effective is sound, and his framing of Exit as an optional alternative to voice, rather than something one should usually do, mitigates strawman interpretations of Exit as living in the woods as a hermit. I would appreciate at some point seeing more practical elaborations of Exit, such as Ben Hoffman’s recent post on the subject.

In any case, if you enjoyed the review, you might also enjoy reading the whole book, front to back, as I did. The Outside is vast, and will take a long time to explore, but the review has gotten long by now, so I’ll end it here.

Executable philosophy as a failed totalizing meta-worldview

(this is an expanded, edited version of an x.com post)

It is easy to interpret Eliezer Yudkowsky’s main goal as creating a friendly AGI. Clearly, he has failed at this goal and has little hope of achieving it. That’s not a particularly interesting analysis, however. A priori, creating a machine that makes things ok forever is not a particularly plausible objective. Failure to do so is not particularly informative.

So I’ll focus on a different but related project of his: executable philosophy. Quoting Arbital:

Two motivations of “executable philosophy” are as follows:

  1. We need a philosophical analysis to be “effective” in Turing’s sense: that is, the terms of the analysis must be useful in writing programs. We need ideas that we can compile and run; they must be “executable” like code is executable.
  2. We need to produce adequate answers on a time scale of years or decades, not centuries. In the entrepreneurial sense of “good execution”, we need a methodology we can execute on in a reasonable timeframe.

There is such a thing as common sense rationality, which says the world is round, you shouldn’t play the lottery, etc. Formal notions like Bayesianism, VNM utility theory, and Solomonoff induction formalize something strongly related to this common sense rationality. Yudkowsky believes further study in this tradition can supersede ordinary academic philosophy, which he believes to be conceptually weak and motivated to continue ongoing disputes for more publications.
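
As a trivial instance of the kind of common-sense verdict these formalisms underwrite (the lottery numbers are invented for illustration): the expected value of a ticket is negative, and a utility function concave in money makes it look even worse.

```python
def expected_value(outcomes):
    """Expected value of a gamble given (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)

# A made-up lottery: a $2 ticket with a 1-in-10-million chance of a $5M prize.
ticket_cost = 2.0
lottery = [(1e-7, 5_000_000 - ticket_cost), (1 - 1e-7, -ticket_cost)]
print(expected_value(lottery))  # about -$1.50 per ticket: don't play
```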

In the Sequences, Yudkowsky presents these formal ideas as the basis for a totalizing meta-worldview, of epistemic and instrumental rationality, and uses the meta-worldview to argue for his object-level worldview (which includes many-worlds, AGI foom, importance of AI alignment, etc.). While one can get totalizing (meta-)worldviews from elsewhere (such as interdisciplinary academic studies), Yudkowsky’s (meta-)worldview is relatively easy to pick up for analytically strong people (who tend towards STEM), and is effective (“correct” and “winning”) relative to its simplicity.

Yudkowsky’s source material and his own writing do not form a closed meta-worldview, however. There are open problems as to how to formalize and solve real problems. Many of the more technical sort are described in MIRI’s technical agent foundations agenda. These include questions about how to parse a physically realistic problem as a set of VNM lotteries (“decision theory”), how to use something like Bayesianism to handle uncertainty about mathematics (“logical uncertainty”), how to formalize realistic human values (“value loading”), and so on.

Whether or not the closure of this meta-worldview leads to creation of friendly AGI, it would certainly have practical value. It would allow real world decisions to be made by first formalizing them within a computational framework (related to Yudkowsky’s notion of “executable philosophy”), whether or not the computation itself is tractable (with its tractable version being friendly AGI).

The practical strategy of MIRI as a technical research institute is to go meta on these open problems by recruiting analytically strong STEM people (especially mathematicians and computer scientists) to work on them, as part of the agent foundations agenda. I was one of these people. While we made some progress on these problems (such as with the Logical Induction paper), we didn’t come close to completing the meta-worldview, let alone building friendly AGI.

With the Agent Foundations team at MIRI eliminated, MIRI’s agent foundations agenda is now unambiguously a failed project. Around 2017, with the increase in internal secrecy, I had called MIRI technical research as likely to fail; at this point it is not a matter of uncertainty to those informed of the basic institutional facts. Some others, such as Wei Dai and Michael Vassar, had called even earlier the infeasibility of completing the philosophy with a small technical team.

What can be learned from this failure? One possible lesson is that totalizing (meta-)worldviews fail in general. This is basically David Chapman’s position: he promotes “reasonableness” and “meta-rationality”, and he doesn’t consider meta-rationality to be formalizable as rationality is. Rather, meta-rationality operates “around” formal systems and aids in creating and modifying these systems:

[Meta-rationality practitioners] produce these insights by investigating the relationship between a system of technical rationality and its context. The context includes a specific situation in which rationality is applied, the purposes for which it is used, the social dynamics of its use, and other rational systems that might also be brought to bear. This work operates not within a system of technical rationality, but around, above, and then on the system.

One particular failure at constructing a totalizing (meta-)worldview is, admittedly, Bayesian evidence in favor of Chapmanian postrationalism, but that isn’t the only possible conclusion. Perhaps it is feasible to construct a totalizing (meta-)worldview, but it failed in this case for particular reasons. Someone familiar with the history of the rationality scene can point to plausible causal factors (such as non-technical social problems) in this failure. Two possible alternatives are:

  1. that the initial MIRI (meta-)worldview was mostly correct, but that MIRI’s practical strategy of recruiting analytically strong STEM people to complete it failed;
  2. or that it wasn’t mostly correct, so a different starting philosophy is needed.

Mostly, I don’t see people acting as if the first branch is the relevant one. Orthogonal, an agent foundations research org, comes closest among relevant organizations to acting as if it believes this. And my own continued commentary on philosophy relevant to MIRI technical topics shows some interest in this branch, although my work tends to point towards a wider scope for philosophy rather than towards meta-worldview closure.

What about a different starting philosophy? I see people saying that the Sequences were great and someone else should do something like them. Currently, I don’t see opportunity in this. Yudkowsky wrote the Sequences at a time when many of the basic ideas, such as Bayesianism and VNM utility, were in the water supply in sufficiently elite STEM circles, and had credibility (for example, they were discussed in Artificial Intelligence: A Modern Approach). There don’t currently seem to be enough credible abstractions floating around in STEM to form a totalizing (meta-)worldview out of.

This is partially due to social factors including a decline in belief in neoliberalism, meritocracy, and much of science. Fewer people than before think the thing to be doing is apolitical elite STEM-like thinking. Postmodernism, a general critique of meta-narratives, has reached more of elite STEM, and the remainder are more focused on countering postmodernism than they were before. And the AI risk movement has moved much of its focus from technical research to politics, and much of its technical focus from agent foundations to empirical deep learning research.

We are now in a post-paradigmatic stage, which may move to pre-paradigmatic (and then paradigmatic) as different abstract ideas become credible. Perhaps, for example, some credible agency abstractions will come from people playing around with and trying to understand deep learning systems, and these can shore up “reasonable” and “meta-rational” gaps in the application of rationality, and/or construct new rationality theory. Or perhaps something will come of people reading old philosophers like Kant (with Karl Popper as a historical precedent). But immediately forming and explicating a new paradigm seems premature.

And so, I accept that the current state of practical rationality involves what Chapman calls “reasonableness” and “meta-rationality”, though I take this to be a commentary on the current state of rationality frameworks and discourse rather than a universal. I believe more widespread interdisciplinary study is reasonable for the intellectually ambitious in this context.