# A short conceptual explainer of Immanuel Kant’s Critique of Pure Reason

## Introduction

While writing another document, I noticed I kept referring to Kantian concepts. Since most people haven’t read Kant, that would lead to interpretation problems by default. I’m not satisfied with any summary out there for the purpose of explaining Kantian concepts as I understand them. This isn’t a summary of the work as a whole, given that I’m focusing on the parts that I actually understood and continue to find useful.

I will refer to computer science and statistical concepts, such as Bayesianism, Solomonoff induction, and AI algorithms. Different explainers are, of course, appropriate to different audiences.

Last year I had planned on writing a longer explainer (perhaps chapter-by-chapter); however, that became exhausting due to the length of the text. So I’ll instead focus on what still stuck after a year, and that I keep wanting to refer to. This is mostly concepts from the first third of the work.

This document is structured similarly to a glossary, explaining concepts and how they fit together.

Kant himself notes that the Critique of Pure Reason is written in a dry and scholastic style, with few concrete examples, and therefore “could never be made suitable for popular use”. Perhaps this explainer will help.

## Metaphysics

We are compelled to reason about questions we cannot answer, like whether the universe is finite or infinite, or whether god(s) exist. There is an “arena of endless contests” between different unprovable assumptions, called Metaphysics.

Metaphysics, once the “queen of all the sciences”, has become unfashionable due to lack of substantial progress.

Metaphysics may be categorized as dogmatic, skeptical, or critical:

• Dogmatic metaphysics makes and uses unprovable assumptions about the nature of reality.
• Skeptical metaphysics rejects all unprovable assumptions, in the process ceasing to know much at all.
• Critical metaphysics is what Kant seeks to do: find the boundaries of what reason can and cannot know.

Kant is trying to be comprehensive, so that “there cannot be a single metaphysical problem that has not been solved here, or at least to the solution of which the key has not been provided.”  A bold claim.  But, this project doesn’t require extending knowledge past the limits of possible experience, just taking an “inventory of all we possess through pure reason, ordered systematically”.

## The Copernican revolution in philosophy

Kant compares himself to Copernicus; the Critique of Pure Reason is commonly referred to as a Copernican revolution in philosophy.  Instead of conforming our intuition to objects, we note that objects as we experience them must conform to our intuition (e.g. objects appear in the intuition of space).  This is sort of a reverse Copernican revolution; Copernicus zooms out even further from “the world (Earth)” to “the sun”, while Kant zooms in from “the world” to “our perspective(s)”.

## Phenomena and noumena

Phenomena are things as they appear to us; noumena are things as they are in themselves (or “things in themselves”). Rational cognition can only know things about phenomena, not noumena. “Noumenon” is essentially a limiting negative concept, constituting any remaining reality other than what could potentially appear to us.

Kant writes: “this conception [of the noumenon] is necessary to restrain sensuous intuition within the bounds of phenomena, and thus to limit the objective validity of sensuous cognition; for things in themselves, which lie beyond its province, are called noumena for the very purpose of indicating that this cognition does not extend its application to all that the understanding thinks. But, after all, the possibility of such noumena is quite incomprehensible, and beyond the sphere of phenomena, all is for us a mere void… The conception of a noumenon is therefore merely a limitative conception and therefore only of negative use. But it is not an arbitrary or fictitious notion, but is connected with the limitation of sensibility, without, however, being capable of presenting us with any positive datum beyond this sphere.”

It is a “problematical” concept; “the class of noumena have no determinate object corresponding to them, and cannot therefore possess objective validity”; it is more like a directional arrow in the space of ontology than like any particular thing within any ontology. Science progresses in part by repeatedly pulling the rug on the old ontology, “revealing” a more foundational layer underneath (a Kuhnian “paradigm shift”), which may be called more “noumenal” than the previous layer, but which is actually still phenomenal, in that it is cognizable through the scientific theory and corresponds to observations; “noumena”, after the paradigm shift, is a placeholder concept that any future paradigm shifts can fill in with their new “foundational” layer.

Use of the word “noumenon” signals a kind of humility, of disbelieving that we have access to “the real truth”, while being skeptical that anyone else does either.

In Bayesianism, roughly, the “noumenon” is specified by the hypothesis, while the “phenomena” are the observations.  Assume for now the Bayesian observation is a deterministic function of the hypothesis; then, multiple noumena may correspond to a single phenomenon.  Bayesianism allows for gaining information about the noumenon from the phenomenon.  However, all we learn is that the noumenon is some hypothesis which corresponds to the phenomenon; in the posterior distribution, the hypotheses compatible with the observations maintain the same probabilities relative to each other that they did in the prior distribution.

(In cases where the observation is not a deterministic function of the hypothesis, as in the standard Bayes’ Rule, consider replacing “hypothesis” above with the “(hypothesis, observation)” ordered pair.)
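As a minimal sketch of the deterministic case described above, here is a toy Bayesian update in Python. The hypothesis names, the prior weights, and the `predicts` table are all made up for illustration; the only point is the renormalization step.

```python
from fractions import Fraction

# Hypothetical toy setup: each hypothesis ("noumenon") deterministically
# fixes the observation ("phenomenon") it would produce.
prior = {
    "h1": Fraction(1, 2),
    "h2": Fraction(1, 4),
    "h3": Fraction(1, 8),
    "h4": Fraction(1, 8),
}
predicts = {"h1": "rain", "h2": "sun", "h3": "rain", "h4": "sun"}


def posterior(prior, predicts, observation):
    """Condition on an observation that is a deterministic function of the hypothesis."""
    # Hypotheses that predict a different observation are ruled out.
    compatible = {h: p for h, p in prior.items() if predicts[h] == observation}
    total = sum(compatible.values())
    # Renormalize the survivors; their ratios to one another are untouched.
    return {h: p / total for h, p in compatible.items()}


post = posterior(prior, predicts, "rain")
print(post)
print(prior["h1"] / prior["h3"], post["h1"] / post["h3"])  # same ratio before and after
```

Conditioning on “rain” rules out `h2` and `h4`, but `h1` stays four times as probable as `h3`, exactly as in the prior. The observation narrows down which hypotheses remain without saying anything further about which of the remaining ones is the case.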

In Solomonoff Induction, there is only a limited amount we can learn about the “noumenon” (stochastic Turing machine generating our observations + its bits of stochasticity), since there exist equivalent Turing machines.

## A priori and a posteriori

A priori refers to the epistemic state possessed before taking in observations. In Bayesianism this is the P(X) operator unconditioned on any observations.

A posteriori refers to the epistemic state possessed after taking in observations. In Bayesianism this is P(X | O) where O refers to past observation(s) made by an agent, which may be numbered to indicate time steps, as in a POMDP.

While Kant and Hume agree that we can’t learn about universal laws from experience (due to Hume’s problem of induction), Hume concludes that this means we don’t know about universal laws, while Kant instead argues that our knowledge about universal laws must involve a priori judgments, e.g. geometric or arithmetic judgments. (One man’s modus ponens is another’s modus tollens…)

## Analytic and synthetic

Analytic propositions are ones that can be verified as true by expanding out definitions and doing basic formal operations. A common example is “All bachelors are unmarried”, which can be verified by replacing “bachelor” with “unmarried man”.

Synthetic propositions can’t be verified as true this way, e.g. “All bachelors are alone”. They can be true regardless. Causal judgments are synthetic; we can’t get the Principle of Sufficient Reason analytically.

Contemporary STEM people are likely to think something like this at this point: “Doesn’t that just mean analytic propositions are mathematical, and synthetic propositions are empirical/scientific?”

An immediate problem with this account: Kant doesn’t think geometric propositions are analytic.  Consider the proposition “A square is equal to itself when turned 90 degrees on its center”.  It’s not apparent how to verify the proposition as true by properly defining “square” and so on, and doing basic logical/textual transformations.  Instead we verify it by relating it to possible experience, imagining a rotating square in the visual field.

From a geometrical proof that a square is equal to itself when turned 90 degrees on its center, a prediction about possible experience can be derived, namely, that turning a square piece of wood 90 degrees by its center results in a wood square having the same shape and location in the visual field as it did previously.  Mathematics needs to correspond to possible experience to have application to the perceptible outside world.

Kant doesn’t even think arithmetic propositions are analytic. To get 2+2=4 from “analytic” operations, we could try defining 2=1+1, 4=1+1+1+1, then observing 2+2=(1+1)+(1+1)=1+1+1+1=4; however, this requires using the associative property of addition. Perhaps there is an alternative way to prove this “analytically”, but neither Kant nor I know of one. Instead we can verify addition synthetically, by putting numbers in correspondence with our fingers, which “automatically” get commutative/associative properties.
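To make the role of associativity explicit, here is the same derivation written out step by step. Reading the unparenthesized sum 1+1+1+1 as left-nested is an assumption made here for concreteness; the argument above doesn’t fix a bracketing.

```latex
\begin{aligned}
2 &:= 1 + 1, \qquad 4 := ((1 + 1) + 1) + 1 \\
2 + 2 &= (1 + 1) + (1 + 1) && \text{(definition of 2, used twice)} \\
      &= ((1 + 1) + 1) + 1 && \text{(associativity: } a + (b + c) = (a + b) + c,\ a = 1 + 1,\ b = c = 1\text{)} \\
      &= 4 && \text{(definition of 4)}
\end{aligned}
```

The first and last steps only expand definitions. The middle step is the one that isn’t a definitional substitution, which is where the appeal to something beyond “analytic” operations comes in.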

## The synthetic a priori

Besides the issue that math relates to possible experience, another problem with “analytic = mathematical” is that, as Kant argues, some propositions are both synthetic and a priori, and “the real problem of pure reason” is how we can know such propositions.

Here’s an argument for this. Suppose we first observe O and then conclude that P is true. If we’re reasoning validly, P is true a posteriori (relative to O). But this whole thought experiment pre-supposes that there is a time-structure in which we first see O and then we make a judgment about P. This time-structure is to some degree present even before seeing O, such that O can be evidence about P.

Imagine trying to argue that it’s raining outside to a person who doesn’t believe in time (including their own memory). Since you’re both inside and there are no windows, they can’t see that it’s raining. You try to argue that you were previously both outside and saw that it was raining. But they think there’s no such thing as “the past” so this is not convincing.

To make the argument to them successfully, their mind has to already implement certain dynamics even before receiving observations.

A baby is already born with cognitive machinery; it can’t “learn” all of that machinery from data, since the process of learning itself requires this machinery to be present in order to structure observations and relate them to future ones. (In some sense there was no cognition prior to abiogenesis, though; there is a difference between the time ordering of science and of cognitive development.)

In Bayesianism, to learn P from O, it must be the case a priori that P is correlated with O. This correlational structure could be expressed as a Bayesian network. This network would encode an a priori assumption about how P and O are correlated.
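A small sketch of this point, using two hypothetical joint priors over the pair (P, O) that I’ve made up for illustration. Both assign P a prior probability of 1/2; they differ only in whether P and O are correlated a priori.

```python
from fractions import Fraction


def posterior_of_p(joint, observed_o):
    """Return P(P=True | O=observed_o) from a joint prior over (p, o) pairs."""
    numerator = sum(pr for (p, o), pr in joint.items() if p and o == observed_o)
    evidence = sum(pr for (p, o), pr in joint.items() if o == observed_o)
    return numerator / evidence


# Hypothetical prior A: P and O are correlated a priori.
correlated = {
    (True, True): Fraction(4, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(1, 10),
    (False, False): Fraction(4, 10),
}

# Hypothetical prior B: P and O are independent a priori
# (the joint is the product of the marginals).
independent = {
    (True, True): Fraction(1, 4),
    (True, False): Fraction(1, 4),
    (False, True): Fraction(1, 4),
    (False, False): Fraction(1, 4),
}

print(posterior_of_p(correlated, True))   # 4/5: observing O moved the belief in P
print(posterior_of_p(independent, True))  # 1/2: observing O changed nothing
```

Under the independent prior, no amount of observing O can teach the agent anything about P. Whether O counts as evidence at all is settled by the structure of the prior, before any observation arrives.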

Solomonoff induction doesn’t encode a fixed network structure between its observations; instead it uses a mixture model over all stochastic Turing machines. However, all these machines have something in common: they’re all stochastic Turing machines producing an output stream of bits. Solomonoff induction assumes a priori that “my observations are generated by a stochastic Turing machine”; it doesn’t learn this from data.

One could try pointing to problems with this argument, e.g. perhaps “there is time” isn’t a valid proposition, and time is a non-propositional structure in which propositions exist. But now that I just wrote that, it seems like I’m asserting a proposition about time to be true, in a contradictory fashion. The English language is more reflective than the language of Bayesian networks, allowing statements about the structure of propositions to themselves be propositions, as if the fact of the Bayesian network being arranged a particular way were itself represented by an assignment of a truth value to a node in that same Bayesian network.

## Transcendental

Philosophers today call Kant’s philosophy “transcendental idealism”. Kant himself uses the word “transcendental” to refer to cognitions about how cognitions are possible a priori.

This is in part an archaeological process. We see, right now, that we live in a phenomenal world that is approximately Euclidean. Was our phenomenal world always Euclidean, or was it non-Euclidean at some point and then switched over to Euclidean, or is time not enough of a real thing for this to cover all the cases? This sort of speculation about what the a priori empty mind is, from our a posteriori sense of the world, is transcendental.

One angle on the transcendental is, what else has to be true for the immediate (immanent) experience you are having right now to be what it is? If you are seeing a chair, that implies that chairs exist (at least as phenomena); if you see the same chair twice, that means that phenomenal objects can re-occur at different times; and so on.

## The transcendental aesthetic

Aesthetic means sense. The transcendental aesthetic therefore refers to the a priori cognitive structures necessary for us to have sensation.

Mainly (Kant argues) these are space and time. I often call these “subjective space”, “subjective time”, “subjective spacetime”, to emphasize that they are phenomenal and agent-centered.

## Space

Most of our observations appear in space, e.g. visual input, emotional “felt senses” having a location in the body. To some extent we “learn” how to see the world spatially, however some spatial structures are hardwired (e.g. the visual cortex). Image processing AIs operate on spatial images stored as multidimensional arrays; arbitrarily rearranging the array would make some algorithms (such as convolutional neural networks) operate worse, indicating that the pre-formatting of data into a spatial array before it is fed into the algorithm is functional.
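To give a rough feel for what the spatial pre-formatting buys, here is a toy sketch in plain Python. It is not a convolutional network; it only measures how similar neighbouring pixels are, which is the kind of local regularity that convolution-style algorithms rely on. The “image”, the roughness measure, and the function names are all hypothetical choices for illustration.

```python
import random

SIZE = 16
# Hypothetical "image": a smooth diagonal gradient, so neighbouring pixels are similar.
image = [[x + y for x in range(SIZE)] for y in range(SIZE)]


def neighbour_roughness(img):
    """Mean absolute difference between horizontally adjacent pixels."""
    diffs = [abs(row[i] - row[i + 1]) for row in img for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def permute_pixels(img, seed=0):
    """Apply one fixed, arbitrary rearrangement of the pixel positions."""
    flat = [value for row in img for value in row]
    random.Random(seed).shuffle(flat)
    width = len(img[0])
    return [flat[i:i + width] for i in range(0, len(flat), width)]


shuffled = permute_pixels(image)
print(neighbour_roughness(image))     # 1.0 for this gradient
print(neighbour_roughness(shuffled))  # much larger; exact value depends on the shuffle
```

The shuffled array contains exactly the same pixel values, so nothing has been lost in a purely set-theoretic sense. What is gone is the arrangement in which “nearby in the array” means “nearby in space”, and that arrangement was supplied before any learning took place.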

If space weren’t a priori then we couldn’t become fully confident of geometrical laws such as “a square turned 90 degrees about its center is the same shape”; we’d have to learn these laws from experience, running into Hume’s problem of induction.

There is only one space, since when attempting to imagine two spaces, one is putting them side by side; there must be some outermost container.

Space is infinite, unbounded.  This doesn’t imply that the infinity is all represented, just that the concept allows for indefinite extension.  Finite space can be derived by adding a bound to infinite space; this is similar to Spinoza’s approach to finitude in the Ethics.

Space isn’t a property of things in themselves, it’s a property of phenomena, things as they relate to our intuition.  When formalizing mathematical space, points are assigned coordinates relative to the (0, 0) origin.  We always intuit objects relative to some origin, which may be near the eyes or head.  At the same time, space is necessary for objectivity; without space, there is no idea of external objects.

Our intuitions about space can only get universal geometric propositions if these propositions describe objects as they must necessarily appear in our intuition, not if they are describing arbitrary objects even as they may not appear in our intuition.  As a motivating intuition, consider that non-Euclidean geometry is mathematically consistent; if objective space were non-Euclidean, then our Euclidean intuitions would not yield universally valid geometric laws about objective space. (As it turns out, contemporary physics theories propose that space is non-Euclidean.)

## Time

We also see observations extending over time. Over short time scales there is a sense of continuity in time; over long time scales we have more discrete “memories” that refer to previous moments, making those past moments relevant to the present. The structuring of our experiences over time is necessary for learning, otherwise there wouldn’t be a “past” to learn from. AIs are, similarly, fed data in a pre-coded (not learned) temporal structure, e.g. POMDP observations in a reinforcement learning context.

The time in which succession takes place is, importantly, different from objective clock time, though (typically) these do not disagree about ordering, only pacing.  For example, there is usually only a small amount of time remembered during sleep, relative to the objective clock time that passes during sleep.  (The theory of relativity further problematizes “objective clock time”, so that different clocks may disagree about how much time has passed.)

We may, analogously, consider the case of a Solomonoff inductor that is periodically stopped and restarted as a computer process; while the inductor may measure subjective time by number of observation bits, this does not correspond to objective clock time, since a large amount of clock time may pass between when the inductor is stopped and restarted.

Kant writes, “Different times are only parts of one and the same time”.  Perhaps he is, here, too quick to dismiss non-linear forms of time; perhaps our future will branch into multiple non-interacting timelines, and perhaps this has happened in the past.  One especially plausible nonlinear timelike structure is a directed acyclic graph. Still, DAGs have an order; despite time being nonlinear, it still advances from moment to moment.  It is also possible to arbitrarily order a DAG through a topological sort, so the main relevant difference is that DAGs may drop this unnecessary ordering information.
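The point about topological sorting can be sketched in a few lines of Python. The branching timeline below is a made-up example, and the sort is a standard Kahn-style algorithm, written out by hand so that the block has no dependencies beyond the standard library.

```python
from collections import deque

# Hypothetical branching timeline: each moment maps to the moments that directly follow it.
timeline = {
    "now": ["a1", "b1"],
    "a1": ["a2"],
    "b1": ["b2"],
    "a2": [],
    "b2": [],
}


def topological_order(dag):
    """Kahn's algorithm: return one linear order consistent with every edge."""
    indegree = {node: 0 for node in dag}
    for successors in dag.values():
        for nxt in successors:
            indegree[nxt] += 1
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in dag[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(dag):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order


def respects(order, dag):
    """True if every moment appears before all of its successors."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[a] < position[b] for a in dag for b in dag[a])


order = topological_order(timeline)
print(order)
print(respects(["now", "b1", "b2", "a1", "a2"], timeline))  # a different, equally valid order
```

Both linearizations respect every “before/after” relation in the graph. They differ only in how they interleave the two non-interacting branches, and that interleaving is the ordering information the DAG leaves out.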

Time is by default boundless but can be bounded, like space.

“Time is nothing other than the form of the inner sense, i.e.  of the intuition of our self and our inner sense”, in contrast to space, which is the form of the external sense. To give some intuition for this, suppose I have memory of some sequence of parties I have experienced; perhaps the first consists of myself, Bob, and Sally, the second consists of myself, Sally, and David, and the third consists of myself, Bob, and David.  What is common between all the parties I remember is that I have been at all of them; this is true for no one other than myself.  So, my memory is of situations involving myself; “me” is what is in common between all situations occurring in my subjective timeline.

Since time is the form of the inner sense, it applies to all representations, not only ones concerning outer objects, since all representations are in the mind.

Time is, like space, a condition for objects to appear to us, not a property of things in themselves.

## Relational knowledge

Kant clarifies the way in which we fail to cognize objects in themselves, with the example of a triangle: “if the object (that is, the triangle) were something in itself, without relation to you the subject; how could you affirm that that which lies necessarily in your subjective conditions in order to construct a triangle, must also necessarily belong to the triangle in itself?”

Relational knowledge allows us to know objects as they relate to us, but not as they don’t relate to us.  Geometry applies to objects that have locations in spacetime; for objects to appear in subjective spacetime, they must have coordinates relative to the (0, 0) origin, that is, the self; therefore, geometry applies to objects that have locations relative to the self; without a location relative to the self, the object would not appear in subjective spacetime.

It may seem silly to say that this “merely relational” knowledge fails to understand the object in itself; what properties are there to understand other than relational properties?  A triangle “in itself” independent of space (which relates the different parts of the triangle to each other) is a rather empty concept.

What is given up on, here, is an absolute reference frame, a “view from nowhere”, from which objects could be conceived of in a way that is independent of all subjects; instead, we attain a view from somewhere, namely, from subjective spacetime.

Einstein’s theory of special relativity also drops the absolute reference frame; however, it specifies connections and translations between subjective reference frames in a way that Kant’s theory doesn’t.

## Sensibility and understanding

The sensibility is the faculty of passively receiving impressions, which are approximately “raw sense-data”. The understanding is the faculty of spontaneously conceptualizing an object by means of these impressions.

To recognize an object (such as an apple), the mind must do something; with no mental motion, the light pattern of the apple would hit the retina, but no object would be represented accordingly. In general, the understanding synthesizes raw data into a coherent picture.

## Manifold of intuition

Without concepts, sense data would be a disorganized flux, like a video of white noise; Kant terms this flux a “manifold of intuition”.  When I think of this, I think of a bunch of sheets of space tied together by a (curved) timeline holding them together, with pixel-like content in the space. Judgments, which are propositions about the content of our understanding (e.g. “there is a cat in front of me”), depend on the “unity among our representations”; what is needed is a “higher [representation], which comprehends this and other representations under itself”.  To judge that there is a cat in front of me, I must have parsed the manifold into concepts such as “cat” which relate to each other in a logically coherent universe; I cannot make a judgment from un-unified raw pixel-data. AI object recognition is an attempt to artificially replicate this faculty.

## Synthesis

Synthesis is the process of “putting different representations together with each other and comprehending their manifoldness in one cognition”.  This is an action of the spontaneity of thought, processing the manifold of intuition into a combined representation.

This relates to the phenomenal binding problem: how do we get a sense of a “unified” world from disconnected sensory data?

In Solomonoff Induction, the manifold of intuition would be the raw observations, and the manifold is synthesized by the fact that there is a universal Turing machine producing all the observations with a hidden state.  This is the case a priori, not only after seeing particular observations.  Similarly with other Bayesian models such as dynamic Bayesian networks; the network structure is prior to the particular observations.

## Transcendental Categories

“There is only one experience, in which all perceptions are represented as in thoroughgoing and lawlike connection, just as there is only one space and time…”

Different experiences are connected in a lawlike way, e.g. through causality and through re-occurring objects; otherwise, it would be unclear how to even interpret memories as referring to the same world. The transcendental categories (which are types of judgment) are ways in which different representations may be connected with each other.

Kant gives 12 transcendental categories, meant to be exhaustive. These include: causation/dependence, existence/nonexistence, necessity/contingency, unity, plurality. I don’t understand all of these, and Kant doesn’t go into enough detail to understand all of them. Roughly, these are different ways experiences can connect with each other, e.g. a change in an experienced object can cause a change in another, and two instances of seeing an object can be “unified” in the sense of being recognized as seeing the same object.

## Schema

A schema (plural schemata) relates the manifold of intuition (roughly, sense data) to transcendental categories or other concepts. As a simple example, consider how the concept of a cat relates to cat-related sense data. The cat has a given color, location, size, orientation, etc., which relate to a visual coordinate system. A cat object-recognizer may recognize not only that a cat exists, but also the location and possibly even the orientation of the cat.

Without schemata, we couldn’t see a cat (or any other object); we’d see visual data that doesn’t relate to the “cat” concept, and separately have a “cat” concept. In some sense the cat is imagined/hallucinated based on the data, not directly perceived: “schema is, in itself, always a mere product of the imagination”. In Solomonoff induction, we could think of a schema as some intermediate data and processing that comes between the concept of a “cat” (perhaps represented as a generative model) and the observational sense data, translating the first to the second by filling in details such as color and location.

This applies to more abstract concepts/categories such as “cause” as well. When X causes Y, there is often a spacetime location at which that cause happens, e.g. a moment that one billiard ball hits another.

Kant writes: “Now it is quite clear that there must be some third thing, which on the one side is homogeneous with the category, and with the phenomenon on the other, and so makes the application of the former to the latter possible. This mediating representation must be pure (without any empirical content), and yet must on the one side be intellectual, on the other sensuous. Such a representation is the transcendental schema.”

Schemata are transcendental because they are necessary for some phenomenal impressions, e.g. the impression that a cat is at some location. They are necessary for unifying the manifold of intuition (otherwise, there wouldn’t be an explanation of correlation between different individual pixel-like pieces of sense data).

## Consciousness of the self

Kant discusses consciousness of the self: “The consciousness of oneself in accordance with the determinations of our state in internal perception is merely empirical, forever variable; it can provide no standing or abiding self in this stream of inner appearances, and is customarily called inner sense or empirical apperception. That which should necessarily be represented as numerically identical cannot be thought of as such through empirical data. There must be a condition that precedes all experience and makes the latter itself possible, which should make such a transcendental presupposition valid.”

The idea of a lack of a fixed or permanent self in the stream of internal phenomena will be familiar to people who have explored Buddhism.  What you see isn’t you; there are phenomena that are representations of the engine of representation, but these phenomena aren’t identical with the engine of representation in which they are represented.

The self is, rather, something taken as “numerically identical with itself” which is a condition that precedes experience. Imagine a sequence of animation frames in a Cartesian coordinate system. In what sense are they part of “the same sequence”? Without knowing more about the sequence, all we can say is that they’re all part of the same sequence (and have an ordering within it); the sameness of the sequence of each frame is “numerical identity” similar to the identity of an object (such as a table) with itself when perceived at different times.

## Dialectic

Kant writes: “We termed dialectic in general a logic of appearance.” Dialectic is a play of appearances, claiming to offer knowledge, but instead offering only temporary illusions; different sophists argue us into different conclusions repeatedly, perhaps in a cyclical fashion.

Dialectic is an error it is possible to fall into when reasoning is not connected with possible experience. Kant writes about dialectic in part to show how not to reason. One gets the impression that Kant would have thought Hegel and his followers were wasting their time by focusing so much on dialectic.

## The Antinomies of Pure Reason

As an example of dialectic, Kant argues that time and space are finite and that they are infinite; that everything is made of simple parts and that nothing is simple; that causality doesn’t determine everything (requiring spontaneity as an addition) and that it does; that there is an absolutely necessary being and that there isn’t. Each of these pairs of contradictory arguments is an antinomy.

Philosophers have argued about these sorts of questions for millennia without much resolution; it’s possible to find somewhat convincing arguments on both sides, as Kant demonstrates.

## Ens Realissimum

Kant writes: “This conception of a sum-total of reality is the conception of a thing in itself, regarded as completely determined; and the conception of an ens realissimum is the conception of an individual being, inasmuch as it is determined by that predicate of all possible contradictory predicates, which indicates and belongs to being.”

Say some objects are cold and some are hot. Well, they still have some things in common, they’re both objects. There is a distinction being made (hot/cold), and there is something in common apart from that distinction. We could imagine a single undifferentiated object, that is neither hot nor cold, but which can be modified by making it hot/cold to produce specific objects.

This is similar to Spinoza’s singular infinite substance/God, of which all other (possibly finite) beings are modifications, perhaps made by adding attributes.

The ens realissimum has a similar feel to the Tegmark IV multiverse, which contains all mathematically possible universes in a single being, or a generative grammar of a Turing complete language. It is a common undifferentiated basis for specific beings to be conceptualized within.

Kant considers deriving the existence of a supreme being (God) from the ens realissimum, but the concept is too empty to yield properties attributed to God, such as benevolence, being the intelligent creator of the universe, or providing an afterlife. He goes on to critique all supposed rational proofs of the existence of God, but then says that he posits God and an afterlife because such posits are necessary to believe that the incentive of pleasure-seeking is aligned with acting morally. (Wishful thinking much?)

## Conclusion

What is Kant getting at, in all this? I think he is trying to get readers to attend to their experience, the spacetimelike container of this experience, and the way their world-model is constructed out of their experience. For example, the idea that time is the form of the inner sense is apparent from noting that all accessible memories include me, but it’s possible to “forget” about this subjective timeline and instead conceptualize time as observer-independent. The idea that the manifold of intuition must be actively synthesized into a representation containing objects (which is in line with cognitive science) challenges the idea that the world is “given”, that “we” are simply inhabitants of a stable world. The idea of the “noumenon” as a negative, limiting concept points us at our experience (and what our experience could be) as an alternative to interminably angsting about whether what we experience is “really real” or about metaphysical concepts like God, which makes it easier to get on with positivist math, science, economics, and scholarship without worrying too much about its foundations.

The sense I get reading Kant is: “You live in a world of phenomena synthesized by your mind from some external data, and that’s fine, in a sense it’s all you could ever hope for. You have plenty of phenomena and generalities about them to explore, you can even inquire into the foundations of what makes them possible and how your mind generates them (I’ve already done a lot of that for you), but there’s no deep Outside demanding your attention, now go live!”

When I take this seriously I worry about getting lost in my head, and sometimes I do get lost in my head, and the Outside does impinge on my cozy mental playground (demanding my attention, and loosening my mental assumptions structuring the phenomenal world), but things calm after a while and I experience the phenomenal world as orderly once again.

# On the paradox of tolerance in relation to fascism and online content moderation

It has been common over the last decade or so to see people invoke Karl Popper’s Paradox of Tolerance as a justification for de-platforming sufficiently fascist, reactionary, or bigoted content. Here’s a comic that’s been shared a lot:

The message from this is: “Nazis are bad, if you ‘give Nazis a chance’ (by e.g. listening to them or allowing them to express their views to others), they’ll end up taking over and silencing and possibly killing lots of people including you, so you should kick them out of society when they start saying things that imply not tolerating others.”

An immediate problem that comes up is that pretty much all political philosophies refuse to tolerate some things, e.g. crimes like theft. Of course one could draw distinctions between intolerance of behavior versus intrinsic identity, but this gets into nuances not determined by simplistic arguments about “intolerance”.

## Who is intolerant according to Karl Popper’s criterion?

What did Karl Popper himself have to say about the matter? Quoting The Open Society and its Enemies:

Less well known [than other paradoxes] is the paradox of tolerance: Unlimited tolerance must lead to the disappearance of tolerance. If we extend unlimited tolerance even to those who are intolerant, if we are not prepared to defend a tolerant society against the onslaught of the intolerant, then the tolerant will be destroyed, and tolerance with them.—In this formulation, I do not imply, for instance, that we should always suppress the utterance of intolerant philosophies; as long as we can counter them by rational argument and keep them in check by public opinion, suppression would certainly be most unwise. But we should claim the right to suppress them if necessary even by force; for it may easily turn out that they are not prepared to meet us on the level of rational argument, but begin by denouncing all argument; they may forbid their followers to listen to rational argument, because it is deceptive, and teach them to answer arguments by the use of their fists or pistols. We should therefore claim, in the name of tolerance, the right not to tolerate the intolerant. We should claim that any movement preaching intolerance places itself outside the law and we should consider incitement to intolerance and persecution as criminal, in the same way as we should consider incitement to murder, or to kidnapping, or to the revival of the slave trade, as criminal.

(emphasis mine)

This adds some detail that helps to better specify the claim.  It would be unwise to counter positions with suppression when they can be argued against; if our position indeed has better evidence behind it and the opponents are willing to submit their position to rational argument/debate, then we should expect our position to “win” in such a rational argument, and it would be unwise to suppress the position.

Why might suppression be unwise? If those disagreeing are rationally persuadable, then suppression removes an opportunity for dialogue that could convince them of the truth. Even if they are not themselves rationally persuadable, rational third parties will be more convinced if the position is argued with rather than suppressed; suppressing the position can make it look better than it is, since supporters of that position can claim that it’s being unfairly ignored and suppressed by the establishment.

Another consideration against suppression is that it’s sometimes hard to be certain of who the “intolerant” one is. After all, suppressing speech is the kind of thing intolerant people do; if we’re designing rules that are supposed to be applicable by a diverse population, some of that population will be “intolerant”, and ideally these rules would advise these people and those watching them about how to determine whether or not they are.

Luckily, Karl Popper clarifies later in the section who is intolerant: “they are not prepared to meet us on the level of rational argument, but begin by denouncing all argument; they may forbid their followers to listen to rational argument, because it is deceptive, and teach them to answer arguments by the use of their fists or pistols.”

Hmm, who does this describe better in the contemporary political argument? The first example I think of is people who say that debating reactionaries is futile, who think debate leads to fascism, who think fascists should be punched instead of debated.  The New York Times recently published an article about debate on college campuses, in which a student reported that college campuses are less receptive to debate than they used to be, instead producing a climate of fear about speaking one’s mind.

There are good criticisms of this article, for example in noting that the student complained about a group of people including a teacher all disagreeing with her statement; that reflects unwillingness to debate fairly. However, in response to this article, some “leftist”-identifying commentators said things along the lines of “debate is bad, it leads to more fascism”. That is, they claimed that debate itself is the generator of fascism, not merely an arena that fascism utilizes.

To be fair to such arguments, debate is an adversarial process, but it’s an adversarial process much older than the fascism of Mussolini or Hitler, engaged in by the likes of Ben Franklin and Abraham Lincoln. Chess and football are also adversarial processes, and probably have more in common with fascism than debate does, due to chess being a war simulation and football being a physical team-versus-team activity similar to military exercises.

Under Karl Popper’s criterion, those saying that debate itself leads to fascism (and is therefore not worth engaging with, and worth dis-endorsing in general) would certainly be “intolerant”; they denounce rational argument, encourage their followers to do so, and posture towards committing violence against those who debate by labeling those in favor of open debate as “fascists” who can be punched under a “punch fascists/Nazis” rule. (The call for “punching Nazis” is, ironically, concordant with Nazism: Triumph of the Will approvingly depicted Nazis punching Nazis in a burst of masculine passion.)

It’s important to remember that the historical Nazis did not take power through means that were legitimate in the Weimar Republic; they took power in large part through paramilitary units that clashed with police forces, and are strongly suspected to have started the Reichstag fire, which led to emergency powers being granted that reduced the freedom of speech of the German citizenry.

These are not the actions a political unit that thought open rational debate was on its side would take. These are, instead, the actions of a political unit that thinks it can only take power by violently disrupting the legitimate societal processes, and being in hiding most of the time while taking power. In general, if someone in a fight is hiding, they are hiding because they believe that they would lose an open fight; if they thought they would win an open fight, they would likely initiate such a fight. In RPG terms, a warrior will generally win an open fight with a thief, and the thief tries to win through stealth and picking surprising engagements; warriors are more likely to win in well-lit areas, thieves in poorly-lit areas.

While the comic citing Popper uses the phrase “preaching intolerance”, which Popper also uses, reading the full passage shows that Popper’s concern is about directly causing intolerance, “inciting” it, rather than rationally arguing for it. Rational arguments for intolerance (including the “paradox of tolerance” itself) can be dealt with at the level of rational argument, and those that the evidence is against can be defeated by opposing rational arguments. The problem comes in eschewing rational argument and encouraging others to do so as well.

It’s especially ironic for debate to be considered fascist (therefore antisemitic) when Judaism itself involves a lot of debate. There’s a long tradition of technical argumentation about how to interpret the Torah, including accounts of what happened historically and what laws Jews are bound to follow. I heard about an inter-faith dialogue event in which the Jewish speaker said that the essence of Judaism was that when speaking with someone you love, you should be willing to rationally argue against their beliefs, and be more willing to “lay into them” harder the more you love them, as becoming more right through rational argument is beneficial. I remember my Jewish father and grandfather being proud as I developed logical arguments for atheism around age 11. And I recently talked with some Hasidic Jews who believed it was important for social media not to censor Nazis, since they deserve a fair shot in open debate. These are signs of a culture that is enthusiastically in favor of open debate.

Given this, suppression of debate is directly antisemitic, even when it claims to be indirectly preventing future antisemitism by suppressing antisemitic fascists.

To add some nuance, this doesn’t imply that all behaviors that look like “debate” are actually the sort of attempts at intellectual improvement that help correct incorrect beliefs and which are valued by Judaism. Jean-Paul Sartre’s “Anti-Semite and Jew” (summary by Sarah Constantin here) discusses the traits of antisemites. These include laziness, people-orientation, impulsiveness, bullying, conformity, irrationality, mysticism, anti-intellectualism, and being part of a mob. Quoting the “irrational” section of Sarah’s summary:

They are irrational. “The anti-Semite has chosen to live on the plane of passion.” They like being angry (at the Jews), and seek out opportunities to work themselves up into a rage. They deliberately say trollish things that make no sense: “Never believe that anti‐Semites are completely unaware of the absurdity of their replies. They know that their remarks are frivolous, open to challenge. But they are amusing themselves, for it is their adversary who is obliged to use words responsibly, since he believes in words. The anti‐Semites have the right to play.”

Antisemites troll the process of rational discourse itself by saying things that make no sense, but which take more difficulty to refute rationally than to say, since the refutation has to abide by the laws of reason whereas the initial statement does not. The statements can, instead, be statements about who to affiliate with and who to bully, thinly disguised as rational arguments.

This is a consideration against continuing to platform someone after they make enough absurd arguments that are easy to demonstrate false; continuing to argue with them is wasting one’s time, since most of what they say is noise or, worse, deliberately-misleading statements, or instructions to others to commit violence in opposition to reason.  It would be rare, however, to find someone on the “pro-debate” side of contemporary discourse who wouldn’t agree that debate is in some cases simply not worth the effort due to the irrationality of the other side.

There are open questions in terms of which acts of speech are practicing intolerance and how they should be suppressed. At one extreme, criminal conspiracy and death threats are forms of speech that intrinsically move people towards harmful, “intolerant” actions. Someone who is only pretending to give rational arguments may still be whipping up an irrational mob, e.g. someone who proposes a law pardoning anyone who murders <some particular person> on <some particular date> is likely to be trying to stoke mob violence towards this person, rather than propose a law they think will be passed and which has good reasons for it.

## Elon Musk, Twitter, and free speech

This brings us to Mike Solana’s article (posted on Bari Weiss’s Substack) about the response to Elon Musk’s takeover of Twitter:

“Free speech is the bedrock of a functioning democracy, and Twitter is the digital town square where matters vital to the future of humanity are debated. I also want to make Twitter better than ever by enhancing the product with new features, making the algorithms open source to increase trust, defeating the spam bots, and authenticating all humans.”

“Freedom,” “open source technology,” and “man, I really hate these spam bots.” The media’s reaction to these ambitions was instant and apoplectic. They were akin, we were told, to literal Nazism.

Out of the gate, it was incoherent fury, with no consensus motive. We were told that Elon, who explicitly opposes censorship, intended to deplatform, and ultimately destroy, all of his critics, who are themselves explicitly in favor of censorship. We were told that Elon was building a propaganda engine. We were told that Twitter, which was until last week apparently a peaceful, utopian haven for principled discourse, would now revert to some earlier, imagined world of carnage (very bad tweets). The case was made, with zero evidence, that Elon is a racist. It was all just table stakes, really.

After a week or so, in brutal, Darwinian competition for attention, arguments against Musk blossomed into something more colorful. From Axios, a company committed in writing to never sharing an opinion, it was “reported” that Elon, once likened to Iron Man, was now behaving “like a supervillain.” His ownership of Twitter would lead to World War III, the case was made elsewhere. In one of my favorite moments of derangement, NPR helpfully reminded us that Elon is an imperialist. The basis for such an incredible charge? In the tradition of America’s Apollo Moon landing, one of the most celebrated accomplishments in human history, Elon wants to settle Mars, an uninhabited desert planet 155 million miles from Earth. This is just like colonial-era Britain’s brutal conquest of half the world, when you think about it.

The takes were all extraordinarily stupid, and yes, I loved every single one of them.

The worst people on the internet, delirious with rage, couldn’t stop themselves from saying the dumbest things they’ve ever said since last week and listen, again, yes, I love this. But as funny as the insanity is, it’s important to remember it’s all just that—insane. Irrelevant. Not remotely about what is actually at stake.

The incoherence of the anti-takeover arguments shows an attitude towards rational argument similar to the antisemites profiled by Sartre: absurd replies that can hardly be interpreted as corresponding to rational considerations. Mike Solana “loves” them because the over-the-top absurdity reveals the charade unambiguously.

Given how little Elon has said about how he plans to change the platform, there is little that his critics could be going off of to decide to criticize the takeover in this manner. The most obviously criticizable thing he said is that he is in favor of “free speech”. (I would, personally, most object to “authenticating all humans”, as that would harm many pseudonymous accounts I know and love, but that hasn’t been the main focus in the mainstream discourse.) An outside observer (e.g. foreign) would take from this that the mainstream American media is against free speech.

Now, it’s certainly possible to invoke “free speech” asymmetrically to only protect opinions that are endorsed by current power structures, but the natural way for a pro-free-speech person to object to this is to point out the asymmetry (as some anarchists do), not to give up the very idea of “free speech” to power worshippers. This whole discussion reminds me of the era of Reddit where there was a social justice subreddit called “ShitRedditSays” in which posters often made fun of redditors saying they were having their free speech rights violated by misspelling “free speech” as “freeze peach” and posting emojis of peaches, dissolving any semantic content of “free speech” into an absurdity.

Why would it make sense to cede the ideal of “free speech” to “right-wing fascist sympathizers”, when fascists are against free speech and liberals are for it? I think that’s the wrong question because it’s assuming that the strategy is selected because it makes sense, whereas it may alternatively (by Sartre’s model) be a mob strategy for synchronizing with others around an irrational pseudo-worldview. Someone might join such a mob if they find that they themselves cannot speak freely, and cannot unironically invoke the idea of “free speech” to un-silence themselves; they may see those who do so as “privileged” since only they find this strategy successful, and be encouraged to inflict their trauma of being silenced on such privileged people, so as to level the playing field. (Such traumatized people would, of course, be “intolerant” under Karl Popper’s definition, and illiberal.)

It is in fact the case that many “outsider” platforms with light moderation, such as Gab, attract a high concentration of fascists and Nazis. I’ve heard from a friend who tracks various political discussion groups that a common pattern is for subreddits/forums that allow Nazis to be taken over by Nazis, and the resulting group then goes on to take over the next most Nazi-tolerant forum, and so on. However, this is in large part an artifact of the fact that Nazis are suppressed from mainstream discussion fora. The situation with light moderation implemented more generally (e.g. on Twitter) would be more like the days of the early Internet (Usenet, early Reddit, and so on), which did not have a high concentration of Nazism.

It would be hard for me to come up with better propaganda for Nazism than the supposedly anti-Nazi statement “As we say in Germany, if there’s a Nazi at the table and 10 other people sitting there talking to him, you got a table with 11 Nazis.” Imagine replacing Nazi with “fan of Rammstein”, for example: “As we say in Germany, if there’s a fan of Rammstein at a table and 10 other people sitting there talking to him, you got a table with 11 fans of Rammstein.” That makes Rammstein sound like a really compelling band, one that can quickly convert people by mere exposure. If I’d like Rammstein upon giving them a listen, doesn’t that mean Rammstein scores highly according to my preferences at an approximation of reflective equilibrium, and is therefore a great band?

The fact that this saying comes from Germany should not at all dissuade us from the idea that it’s Nazi propaganda; after all, many Germans were once Nazis, or are near descendants of former Nazis, Nazism was originally developed in Germany, and most Nazi propaganda has historically come from Germany.

If the saying were literally true, that would suggest a possible resolution to the Nazi-Jewish conflict: put some Jewish leaders (who are often in favor of open debate) in the same room as a Nazi, and soon almost all Jews will, through the Jewish tradition of debate, be converted to Nazis, accepting their own racial inferiority to the Aryans to the point where there is no longer any conflict, and Judaism can become a subsidiary of an international Nazi order.

The fact that this hypothetical is so ridiculous suggests that this saying is, in fact, not true.

What’s going through the minds of people who say things like this? One might imagine Nazism as a forbidden fruit, something we’d find delicious if we gave it a real chance, but would corrupt us, a veritable infohazard on par with the original forbidden fruit granting knowledge of good and evil…

…that starts sounding really awesome and appealing, an Eldritch horror that can “turn” people by mere exposure, as our brains are saturated with hedonic reward that overcomes all other faculties.  But wait, in such a world, why did the Nazis lose WWII so that we have so much anti-Nazi propaganda in the present moment? Couldn’t they just airdrop copies of “Mein Kampf” (translated to English/French/etc) on opposing territories to convince everyone of the greatness of Nazism, winning the “war” without any resistance?  If Nazism is so convincing in an open discussion, there is no explanation for fervent Allied resistance to Nazism.

The fact that Nazis took over using violence indicates that Nazism doesn’t win through rational debate of the merits of one regime or another; a look at the Nazi regime by foreigners reveals a terrifying, suppressive order with a high death rate and no end in sight, not a utopia that anyone would rationally prefer to the alternative systems of government. Rather, Nazism wins by convincing everyone that Nazis are powerful, that brownshirts will kill you if you stand up to it; people are terrorized into not wanting to be the first target, and thereby go along with the Nazi regime without resisting. It is, therefore, Nazi propaganda to exaggerate the degree to which Nazis are powerful, including the degree to which they’re effective at spreading their political views.

In discussions of the threat of “Nazism” or the “far right”, it is rarely clear how big the threat is, statistically. Is the number of Nazis in America more like the number of avowed neo-Nazis (~thousands), or more like the number of conservative Republicans (~tens of millions)? Is it a niche position, largely discredited, and dying out, or is it gaining ground and approaching a quorum?

Umberto Eco names, as one of the 14 elements of “ur-fascism”: “[…] the followers must be convinced that they can overwhelm the enemies. Thus, by a continuous shifting of rhetorical focus, the enemies are at the same time too strong and too weak.” That sounds familiar…

Really, it’s not possible to even know what a Nazi or fascist is without reading something about their political beliefs, e.g. Mussolini’s The Doctrine of Fascism; otherwise, how could someone possibly know that they are opposing rather than promoting fascism? It makes no sense for people concerned about fascism to isolate themselves from any material written by fascists, which might help them know anything at all about what they’re supposedly concerned about, rather than generating fear about a mysterious foreign Other, as the original fascists did.

## What do the speech regulators even want?

Consider a proposal to regulate everyone’s speech in a uniform manner to oppose racism and allow many cultures to thrive. Now, there’s an immediate contradiction in such a proposal: regulating everyone’s speech uniformly requires a discourse community to impose its norms on everyone, which is a form of imperialism that would extinguish alternative cultures of discourse.

Okay, but what about just regulating white people’s speech? This may be implied by the idea that “it’s not possible to be racist against white people”, that the objective of anti-racism is to frustrate pro-white racism, not anti-white racism which is more likely to be defensive. However, this is still pan-ethno-nationalist: it’s grouping together white people (a diverse, multi-ethnic group) who live in many places around the world into a single nation-like regulatory order, which, among other things, groups Jews in with Germans and regulates their speech the same way, absorbing them into the same speech-regulating nation. This does not seem like a proposal that a significant number of non-white people or Jews would be in favor of.

Moreover, the fact that white people’s speech is being regulated doesn’t mean that white people’s power is taken away. Quite the opposite: it’s typical in class societies for higher-class people to obey (and enforce upon each other) norms that are not enforced upon the lower classes. This allows the higher classes to distinguish themselves from the crowd, considering themselves refined enough to follow tighter social norms. In legal terms, there is discussion of a “right to be sued”: if you can be sued for not doing something, that means you can be accountable for doing it, which enables more trade opportunities. Similarly, uniform regulations within a nation can enable larger-scale social organization such as large freight transport and commodities markets.

Maybe the problem isn’t white people as a whole, it’s Anglos, i.e. cultures that are largely downstream of Great Britain from the 1700s-1900s. Since Great Britain colonized and enslaved so much of the world, maybe that’s what to be concerned about. It’s common for social justice people to cite German, German-Jewish, and French sources (Hegel, Marx, Heidegger, Adorno, Habermas, Sartre, Simone de Beauvoir, Foucault, Deleuze, Derrida…); this is predicted by “social justice” being grouped into “continental philosophy”. One gets the sense that the target of such discourse is to expand the reach of German philosophy beyond German masculine intellectual culture into a variety of differently-colored and differently-gendered contexts, which are basically “re-skins” of German masculine intellectual culture in that they cite the same sources and have the same talking points, just with different aesthetics for different identity groups. One has to notice how similar social justice discourse is to itself, despite the diversity of identities; even relatively small ideological deviations are easy to classify as reactionary heresies. The resulting anti-Anglo censorship regime would, then, be quite consistent with German nationalism, as it would take out Germany’s main intellectual competitor (the Anglos, with the French being much more easily defeated by Germany historically, therefore a useful short-term ally) while promoting German intellectual culture; laid out this way, it’s not confusing what it is.

If I translate what speech regulators are asking for into an implied political philosophy, I can’t come up with any alternatives to universal cultural imperialism, white pan-ethno-nationalism, or anti-Anglo German colonial nationalism. Maybe this is in part due to a deficit of creativity on my part, but even if so, it’s even more due to the incoherence of the political position being taken, and/or lack of clarity about the consistent principles (if any) behind such a position.

What seems most likely is that general opposition to free speech in the name of anti-fascism is ironic: its surface-level expression is in tension with its actual situation. Such ironic anti-fascist expression leads to the sort of ridiculous pseudo-arguments cited in Mike Solana’s article. Being ironically against fascism is, in the general case, being in favor of fascism, since ironic expressions of anti-fascism displace actual anti-fascism while providing cover for fascism in the ensuing confusion (including confusion about what fascism even is).

Simulacra have originals; “anti-fascism” would not be an appealing target of ironic imitation if there weren’t at some point an anti-fascist movement that had some good properties, that people could believe in and assign moral authority to. Given the current situation, one has to conclude that antifa has been infiltrated and usurped by this point.

So, what is actually good about anti-fascism? Time to get object-level!

Fascism focuses on the status of people and groups within society rather than the overall freedom and prosperity of the society itself. However, this does not systematically lead to higher quality of life for fascists. It is preferable to, over a lifetime, fall socioeconomically by a decile (within the population) while the society becomes significantly more free and prosperous, rather than rising two deciles while society becomes significantly less free and prosperous. Only a false assumption that quality of life is zero-sum could lead to the opposite conclusion.

It is somewhat inauthentically idealistic (therefore easy to “cringe” at) to trade one’s own rank position in society for the flourishing of society at a rate that is self-sacrificing. However, it is also self-sacrificing to cause society to flourish less while increasing one’s own position within it in a way that causes one to need to protect oneself, rather than being protected by society, from a succession of ever less abstract, more short-term dangers.

There is always some misalignment between the interests of an individual and the interests of society; incentives are not perfectly aligned. However, self-interested individuals will seek to reduce the degree of misalignment, creating good incentives, which include opportunities to authentically signal virtue (i.e. take actions that make others think one is actually likely to in general improve one’s society, rather than take actions that make others think one is only pretending to do so to an undiscerning audience; the latter are what is more often called “virtue signaling” in contemporary discourse).

The individual who opposes improvement of society and signals vice is actually more self-sacrificing than the individual who improves society more than is actually selfishly optimal, in the general case; it makes a big difference to treat the interests of the individual and society as opposed rather than similar but somewhat misaligned. It is unnatural to be confused into thinking that the life of a gang leader is safer and nicer than the life of a postal worker.

## Conclusion

While people cite Popper’s “paradox of tolerance” to justify suppressing fascist, reactionary, or bigoted speech in a generalized fashion, Popper’s argument actually only supports suppressing speech that enacts intolerance in contradiction to rational argumentation, and explicitly rules out suppressing speech that remains at the level of rational debate, even if this speech is an argument for some form of intolerance.

There really is such a thing as fascism, and it really does have negative consequences and involves mob violence in opposition to reason. However, general instructions to avoid talking with fascists, without clarifying what fascism even is (e.g. with material written by fascists explaining their political position), will serve this agenda, as they are also mob violence in opposition to reason, the kind of intolerance Karl Popper’s argument implies should not be tolerated.

# A method of writing content easily with little anxiety

Hello, I have today been informed about a method for writing content that is relatively easy and painless. The idea is that by default a human will when reading predict what word is going to come next. Studies show that humans can predict what letter is going to come next and get pretty good entropy scores. That means that even the process of reading involves generating plausible next words.
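To make “entropy scores” a little more concrete, here is a minimal sketch of how one could score a next-letter predictor in bits per letter. It is not the procedure used in any particular study; the predictor here is a simple letter-pair (bigram) counter with add-one smoothing, and the names `bits_per_char`, `uniform_bits`, and the toy sentence are all made up for illustration. The lower the score, the more predictable the text was to the model.

```python
import math
from collections import Counter, defaultdict


def bits_per_char(train: str, test: str, alpha: float = 1.0) -> float:
    """Average number of bits the model needs per letter of `test`.

    The model (an assumption of this sketch) predicts each letter from
    the single letter before it, using counts gathered from `train`.
    """
    if len(test) < 2:
        raise ValueError("need at least two characters to score a prediction")
    alphabet = set(train) | set(test)
    counts = defaultdict(Counter)
    for prev, nxt in zip(train, train[1:]):
        counts[prev][nxt] += 1

    total_bits = 0.0
    steps = 0
    for prev, nxt in zip(test, test[1:]):
        seen = counts[prev]
        # Add-alpha smoothing so unseen letter pairs get nonzero probability.
        p = (seen[nxt] + alpha) / (sum(seen.values()) + alpha * len(alphabet))
        total_bits -= math.log2(p)
        steps += 1
    return total_bits / steps


def uniform_bits(text: str) -> float:
    """Baseline: bits per letter if every letter seen were equally likely."""
    return math.log2(len(set(text)))


sentence = "the cat sat on the mat. "
score = bits_per_char(sentence * 50, sentence)
baseline = uniform_bits(sentence)
print(f"model: {score:.2f} bits/letter, uniform guess: {baseline:.2f} bits/letter")
```

A reader (human or otherwise) who predicts well scores far below the uniform baseline, which is the sense in which reading already involves generating plausible continuations.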

And so it is possible to similarly do the same when writing. When writing a sequence of words, after the end there will be a feeling that some word will come next. It is unexpected and abrupt for the sentence to immediately end, so there is a feeling of continuation, that something will come after. One can while writing simply look at the space right to the right of the text that has already been written and there will be a feeling there about what word will come next.

Do not worry if this is difficult and slow at first. A practice exercise is to write a few words and then take some time to get a feeling of what word is going to come next. With meditation it is possible to notice experiential phenomena that are in one’s experience and sometimes have a spatial location. Well, maybe someone who knows how to find these phenomena is going to be able to find some sense of “next word” ness when focusing on the end of a sentence fragment. That will tell them what word they think is going to come next.

It is possible to simply, when noticing what word is going to come next, type that word. That is in fact what I am doing right now and what I have been doing for this entire post. It is rather easy and painless to do. It is not anxiety-provoking because it is a meditative, instruction-following exercise where I simply get a sense of what word will come next and type it. I am not responsible for the output of my text predictive model. I am not to blame. I am not unworthy.

This shortcuts various anxiety loops that are possible when writing in a more traditional manner. At that point, while there are many sentence continuations that are invisibly generated, all are filtered out because it would feel somehow bad to have written that sentence. Maybe the sentence would be cringeworthy or stylistically bad or offensive or embarrassing or I-don’t-know-what, it might just be off, and someone with enough of an aesthetic taste might simply be repulsed by the idea of actually writing out that sentence. There is no filter this way, because the sense of what word is going to come next is always present even when reading text, and so it is not specific to writing and doesn’t indicate that the person having such a prediction is, themselves, responsible for that text.

I know this is a somewhat silly way of thinking about moral responsibility, but it is emotionally compelling enough that it can increase writing output by a lot. And output is not always good, quality matters too. It would be to be susceptible to Goodhart’s law to say that writing more words is always better. However, writer’s block is a pain, it is really bad if it isn’t possible to write anything at all, or to only be able to write very short sentences and paragraphs. That way it is possible to go months without making any progress on a writing project.

I prefer to have something written even if it is low-quality. That way, it is possible to edit or re-try later. That will increase the level of quality. And I know that that seems like a confident statement, that it will increase the quality, not that it might. The difference is that right now I do not care about seeming overconfident, since I am simply reporting on what word I consider most likely, and not worrying that the word I think is most likely to come next is not actually the most likely, I am not even considering alternatives (more than 1) because I am writing so fast, so the words just come out and I am not worried about the implied overconfidence.

This is nearing the end of this essay. There is a job to conclude everything. I am not sure how to do that except by rating the quality of what I have already written. I have done practically no editing so far, only spelling and that kind of thing, and maybe pressed backspace a few times to erase the last word. But there is basically no editing here. And is it high quality? I will leave that for readers to judge.

In any case, this method has produced an artifact that explains the very method and provides a way of judging the method by the very artifact produced by the method in a rather recursive loop that might ground out somehow in some kind of probabilistic or utility theoretic calculation made by someone who is deciding whether to use this method to write, or not and to instead use their previous writing method exclusively at the risk of hitting writer’s block.

The end.

[ED NOTE: This took about 6 minutes to write. I previously described this method on Twitter. I made a single edit after, to add a missing word.]

# “Infohazard” is a predominantly conflict-theoretic concept

Nick Bostrom writes about “information hazards”, or “infohazards”:

Information hazards are risks that arise from the dissemination or the potential dissemination of true information that may cause harm or enable some agent to cause harm. Such hazards are often subtler than direct physical threats, and, as a consequence, are easily overlooked. They can, however, be important. This paper surveys the terrain and proposes a taxonomy.

The paper considers both cases of (a) the information causing harm directly, and (b) the information enabling some agent to cause harm.

The main point I want to make is: cases of information being harmful are easier to construct when different agents’ interests/optimization are misaligned; when agents’ interests are aligned, infohazards still exist, but they’re weirder edge cases.  Therefore, “infohazard” being an important concept is Bayesian evidence for misalignment of interests/optimizations, which would be better-modeled by conflict theory than mistake theory.

Most of the infohazard types in Bostrom’s paper involve conflict and/or significant misalignment between different agents’ interests:

1.  Data hazard: followed by discussion of a malicious user of technology (adversarial)

2.  Idea hazard: also followed by discussion of a malicious user of technology (adversarial)

3.  Attention hazard: followed by a discussion including the word “adversary” (adversarial)

4.  Template hazard: follows discussion of competing firms copying each other (adversarial)

5.  Signaling hazard: follows discussion of people avoiding revealing their properties to others, followed by discussion of crackpots squeezing out legitimate research (adversarial)

6.  Evocation hazard: follows discussion of activation of psychological processes through presentation (ambiguously adversarial, non-VNM)

7.  Enemy hazard: by definition adversarial

8.  Competitiveness hazard: by definition adversarial

9.  Intellectual property hazard: by definition adversarial

11. Knowing-too-much hazard: followed by discussion of political suppression of information (adversarial)

12. Norm hazard: followed by discussion of driving on sides of road, corruption, and money (includes adversarial situations)

13. Information asymmetry hazard: followed by discussion of “market for lemons” (adversarial)

14. Unveiling hazard: followed by discussion of iterated prisoner’s dilemma (misalignment of agents’ interests)

15. Recognition hazard: followed by discussion of avoiding common knowledge about a fart (non-VNM, non-adversarial, ambiguous whether this is a problem on net)

16. Ideological hazard: followed by discussion of true-but-misleading information resulting from someone starting with irrational beliefs (non-VNM, non-adversarial, not a strong argument against generally spreading information)

17. Distraction and temptation hazards: followed by discussion of TV watching (non-VNM, though superstimuli are ambiguously adversarial)

18. Role model hazard: followed by discussion of copycat suicides (non-VNM, non-adversarial, ambiguous whether this is a problem on net)

19. Biasing hazard: followed by discussion of double-blind experiments (non-VNM, non-adversarial)

20. De-biasing hazard: follows discussion of individual biases helping society (misalignment of agents’ interests)

21. Neuropsychological hazard: followed by discussion of limitations of memory architecture (non-VNM, non-adversarial)

22. Information-burying hazard: follows discussion of irrelevant information making relevant information harder to find (non-adversarial, though uncompelling as an argument against sharing relevant information)

23. Psychological reaction hazard: follows discussion of people being disappointed (non-VNM, non-adversarial)

24. Belief-constituted value hazard: defined as a psychological issue (non-VNM, non-adversarial)

25. Disappointment hazard: subset of psychological reaction hazard (non-VNM, non-adversarial, ambiguous whether this is a problem on net)

26. Spoiler hazard: followed by discussion of movies and TV being less fun when the outcome is known (non-VNM, non-adversarial, ambiguous whether this is a problem on net)

27. Mindset hazard: followed by discussion of cynicism and atrophy of spirit (non-VNM, non-adversarial, ambiguous whether this is a problem on net)

28. Embarrassment hazard: followed by discussion of self-image and competition between firms (non-VNM, includes adversarial situations)

29. Information system hazard: follows discussion of viruses and other inputs to programs that cause malfunctioning (includes adversarial situations)

30. Information infrastructure failure hazard: definition mentions cyber attacks (adversarial)

31. Information infrastructure misuse hazard: follows discussion of Stalin reading emails, followed by discussion of unintentional misuse (includes adversarial situations)

32. Robot hazard: followed by discussion of a robot programmed to launch missiles under some circumstances (includes adversarial situations)

33. Artificial intelligence hazard: followed by discussion of AI outcompeting and manipulating humans (includes adversarial situations)

Of these 33 types, 12 are unambiguously adversarial, 5 include adversarial situations, 2 are ambiguously adversarial, and 2 include significant misalignment of interests between different agents.  The remaining 12 generally involve non-VNM behavior, although there is one case (information-burying hazard) where the agent in question might be a utility maximizer (though, this type of hazard is not an argument against sharing relevant information).  I have tagged multiple of these as “ambiguous whether this is a problem on net”, to indicate the lack of a strong argument that the information in question (e.g. disappointing information) is actually bad for the receiver on net.

Simply counting examples in the paper isn’t a particularly strong argument, however.  Perhaps the examples have been picked through a biased process.  Here I’ll present some theoretical arguments.

There is a standard argument that the value of information is non-negative, that every rational agent from its own perspective cannot expect to be harmed by learning anything.  I will present this argument here.

Let’s say the actual state of the world is $W \in \mathcal{W}$, and the agent will take some action $A \in \mathcal{A}$.  The agent’s utility will be $u(W, A) \in \mathbb{R}$.  The agent starts with a distribution over $W$, $P(W)$.  Additionally, the agent has the option of observing an additional fact $m(W) \in \mathcal{O}$, which it will in the general case not know at the start.  (I chose $m$ to represent “measure”.)

Now, the question is, can the agent achieve lower utility in expectation if they learn $m(W)$ than if they don’t?

Assume the agent doesn’t learn $m(W)$.  Then the expected utility by taking some action $a$ equals $\sum_{w \in \mathcal{W}}P(W=w)u(w, a)$.  The maximum achievable expected utility is therefore

$\max_{a \in \mathcal{A}} \sum_{w \in \mathcal{W}} P(W=w) u(w, a)$.

On the other hand, suppose the agent learns $m(W) = o$.  Then the expected utility by taking action $a$ equals $\sum_{w \in \mathcal{W}} P(W=w|m(W)=o)u(w, a)$, and the maximum achievable expected utility is

$\max_{a \in \mathcal{A}} \sum_{w \in \mathcal{W}} P(W=w|m(W)=o)u(w, a)$.

Under uncertainty about $m(W)$, the agent’s expected utility equals

$\sum_{o \in \mathcal{O}}P(m(W)=o) \max_{a \in \mathcal{A}} \sum_{w \in \mathcal{W}}P(W=w|m(W)=o)u(w, a)$.

Due to convexity of the $\max$ function, this is greater than or equal to:

$\max_{a \in \mathcal{A}} \sum_{o \in \mathcal{O}}P(m(W)=o) \sum_{w \in \mathcal{W}}P(W=w|m(W)=o)u(w, a)$.

Re-arranging the summation and applying the definition of conditional probability, this is equal to:

$\max_{a \in \mathcal{A}} \sum_{o \in \mathcal{O}} \sum_{w \in \mathcal{W}} P(m(W)=o \wedge W = w)u(w, a)$.

Marginalizing over $o$, this is equal to:

$\max_{a \in \mathcal{A}} \sum_{w \in \mathcal{W}} P(W=w) u(w, a)$.

But this is the same as the utility achieved without learning $m(W)$.  This is sufficient to show that, by learning $m(W)$, the agent does not achieve a lower expected utility.

(Note that this argument is compatible with the agent getting lower utility $u(W, A)$ in some possible worlds due to knowing $m(W)$, which would be a case of true-but-misleading information; the argument deals in expected utility, implying that the cases of true-but-misleading information are countervailed by cases of true-and-useful information.)
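To make the argument concrete, here is a minimal numerical sketch of it in Python.  The weather example, the utility numbers, and the function names (`best_expected_utility`, `value_with_observation`) are all illustrative assumptions of mine and do not come from Bostrom's paper; the code only checks the inequality on one small finite case.

```python
def best_expected_utility(dist, utility, actions):
    """max_a sum_w P(W=w) u(w, a) for a finite distribution over worlds."""
    return max(sum(p * utility[(w, a)] for w, p in dist.items()) for a in actions)


def value_with_observation(prior, utility, actions, measure):
    """Expected utility when the agent sees m(W) before choosing an action."""
    # Group the joint probability P(m(W)=o and W=w) by observation o.
    by_obs = {}
    for w, p in prior.items():
        by_obs.setdefault(measure(w), {})[w] = p
    total = 0.0
    for joint in by_obs.values():
        p_o = sum(joint.values())
        if p_o == 0:
            continue  # observations with zero probability contribute nothing
        posterior = {w: p / p_o for w, p in joint.items()}
        total += p_o * best_expected_utility(posterior, utility, actions)
    return total


# Hypothetical example: two worlds, two actions, made-up utilities.
prior = {"rain": 0.3, "sun": 0.7}
actions = ["umbrella", "none"]
utility = {
    ("rain", "umbrella"): 1.0,
    ("rain", "none"): 0.0,
    ("sun", "umbrella"): 0.5,
    ("sun", "none"): 1.0,
}

without_info = best_expected_utility(prior, utility, actions)
with_info = value_with_observation(prior, utility, actions, measure=lambda w: w)
with_useless_info = value_with_observation(prior, utility, actions, measure=lambda w: 0)
```

With these numbers, acting without the observation gives an expected utility of about 0.7, while observing the weather first gives about 1.0.  An uninformative measurement (a constant $m$) leaves the agent exactly where it started, which is the equality case of the argument.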

Is it possible to construct a multi-agent problem, where the agents have the same utility function, and they are all harmed by some of them learning something? Suppose Alice and Bob are deciding on a coffee shop to meet without being able to communicate beforehand, by finding a Schelling point.  The only nearby coffee shop they know about is Carol’s.  Derek also owns a coffee shop which is nearby.  Derek has the option of telling Alice and Bob about his coffee shop (and how good it is); they can’t contact him or each other, but they can still receive his message (e.g. because he advertises it on a billboard).

If Alice and Bob don’t know about Derek’s coffee shop, they successfully meet at Carol’s coffee shop with high probability.  But, if they learn about Derek’s coffee shop, they may find it hard to decide which one to go to, and therefore fail to meet at the same one.  (I have made the point previously that about-equally-good options can raise problems in coordination games.)

This result is interesting because it’s a case of agents with the same goal (meeting at a good coffee shop) accomplishing that goal worse by knowing something than by not knowing it.  There are some problems with this example, however.  For one, Derek’s coffee shop may be significantly better than Carol’s, in which case Derek informing both Alice and Bob leads to them both meeting at Derek’s coffee shop, which is better than Carol’s.  If Derek’s coffee shop is significantly worse, then Derek informing Alice and Bob does not impact their ability to meet at Carol’s coffee shop.  So Derek could only predictably make their utility worse if somehow he knew that his shop was about as good to them as Carol’s.  But then it could be argued that, by remaining silent, Derek is sending Alice and Bob a signal that his coffee shop is about as good, since he would not have remained silent in other cases.
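The coffee shop example can be sketched as a toy model.  Everything quantitative here is an assumption of mine for illustration: the common payoff is the quality of the shop if Alice and Bob meet and zero otherwise, a clearly better shop acts as the Schelling point once the quality gap reaches a hypothetical `salience_gap`, and when the shops are about equally good each person independently picks one with probability 1/2.

```python
def meeting_utility(q_carol, q_derek=None, salience_gap=0.2):
    """Expected common payoff for Alice and Bob in the toy coffee shop model.

    q_derek=None means they never learned about Derek's shop.
    """
    if q_derek is None:
        # Only one known shop: they meet at Carol's.
        return q_carol
    if abs(q_carol - q_derek) >= salience_gap:
        # One shop is clearly better, so both go there.
        return max(q_carol, q_derek)
    # About equally good: each picks independently and uniformly, so they
    # meet at Carol's with probability 1/4 and at Derek's with probability 1/4.
    return 0.25 * q_carol + 0.25 * q_derek


uninformed = meeting_utility(1.0)
similar = meeting_utility(1.0, 1.05)      # Derek's shop is about as good
much_better = meeting_utility(1.0, 2.0)   # Derek's shop is clearly better
much_worse = meeting_utility(1.0, 0.3)    # Derek's shop is clearly worse
```

Under these assumptions, learning about Derek's shop only lowers the expected payoff in the narrow band where the two shops are about equally good; in the other two cases the information helps or is harmless, which matches the problems with the example described above.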

So even when I try to come up with a case of infohazards among cooperative agents, the example has problems.  Perhaps other people are better than me at coming up with such examples.  (While Bostrom presents examples of information hazards among agents with aligned interests in the paper, these lack enough mathematical detail to formally analyze them with utility theory to the degree that the coffee shop example can be analyzed.)

It is also possible that utility theory is substantially false, that humans don’t really “have utility functions” and therefore there can be information hazards.  Bostrom’s paper presents multiple examples of non-VNM behavior in humans.  This would call for revision of utility theory in general, which is a project beyond the scope of this post.

It is, in contrast, trivial to come up with examples of information hazards in competitive games.  Suppose Alice and Bob are playing Starcraft.  Alice is creating lots of some unit (say, zerglings).  Alice could tell Bob about this.  If Bob knew this, he would be able to prepare for an attack by this unit.  This would be bad for Alice’s ability to win the game.

It is still the case that Bob gains higher expected utility by knowing about Alice’s zerglings, which makes it somewhat strange to call this an “information hazard”; it’s more natural to say that Alice is benefitting from an information asymmetry.  Since she’s playing a zero-sum game with Bob, anything that increases Bob’s (local) utility function, including having more information and options, decreases Alice’s (local) utility function.  It is, therefore, unsurprising that the original “value of information is non-negative” argument can be turned on its head to show that “your opponent having information is bad for you”.

There are, of course, games other than common-payoff games and zero-sum games, which could also contain cases of some agent being harmed by another agent having information.

It is, here, useful to distinguish the broad sense of “infohazard” that Bostrom uses, which includes multi-agent situations, from a narrower sense of “self-infohazards”, in which a given individual gains a lower utility by knowing something.  The value-of-information argument presented at the start shows that there are no self-infohazards in an ideal game-theoretic case.  Cooperative situations, such as the coffee shop example, aren’t exactly cases of a self-infohazard (which would violate the original value-of-information theorem), although there is a similarity in that we could consider Alice and Bob as parts of a single agent given that they have the same local utility function.  The original value of information argument doesn’t quite apply to these (which allows the coffee shop example to be constructed), but almost does, which is why the example is such an edge case.

Some apparent cases of self-infohazards are actually cases where it is bad for some agent A to be believed by some agent B to know some fact X.  For example, the example Bostrom gives of political oppression of people knowing some fact is a case of the harm to the knower coming not from their own knowledge, but from others’ knowledge of their knowledge.

The Sequences contain quite a significant amount of advice to ignore the idea that information might be bad for you, to learn the truth anyway: the Litany of Tarski, the Litany of Gendlin, “that which can be destroyed by the truth should be”.  This seems like basically good advice even if there are some edge-case exceptions; until coming up with a better policy than “always be willing to learn true relevant information”, making exceptions risks ending up in a simulacrum with no way out.

A case of some agent A denying information to some agent B with the claim that it is to agent B’s benefit is, at the very least, suspicious.  As I’ve argued, self-infohazards are impossible in the ideal utility theoretic case.  To the extent that human behavior and values deviate from utility theory, such cases could be constructed.  Even if such cases exist, however, it is hard for agent B to distinguish this case from one where agent A’s interests and/or optimization are misaligned with B’s, so that the denial of information is about maintaining an information asymmetry that advantages A over B.

Sociologically, it is common in “cult” situations for the leader(s) to deny information to the followers, often with the idea that it is to the followers’ benefit, that they are not yet ready for this information.  Such esotericism allows the leaders to maintain an information asymmetry over the followers, increasing their degree of control.  The followers may trust the leaders to really be withholding only the information that would be harmful to them.  But this is a very high degree of trust.  It makes the leaders effectively unaccountable, since they are withholding the information that could be used to evaluate their claims, including the claim that withholding the information is good for the followers.  The leaders, correspondingly, take on quite a high degree of responsibility for the followers’ lives, like a zookeeper takes on responsibility for the zoo animals’ lives; given that the followers don’t have important information, they are unable to make good decisions when such decisions depend on this information.

It is common in a Christian context for priests to refer to their followers as a “flock”, a herd of people being managed and contained, partially through information asymmetry: use of very selective readings of the Bible, without disclaimers about the poor historical evidence for the stories’ truth (despite priests’ own knowledge of Biblical criticism), to moralize about ways of life.  It is, likewise, common for parents to lie to children partially to “maintain their innocence”, in a context where the parents have quite a lot of control over the children’s lives, as their guardians.  My point here isn’t that this is always bad for those denied information (although I think it is in the usual case), but that it requires a high degree of trust and requires the information-denier to take on responsibility for making decisions that the one denied information is effectively unable to make due to the information disadvantage.

The Garden of Eden is a mythological story of a self-infohazard: learning about good and evil makes Eve and Adam less able to be happy animals, more controlled by shame.  It is, to a significant degree, a rigged situation, since it is set up by Yahweh.  Eve’s evaluation, that learning information will be to her benefit, is, as argued, true in most cases; she would have to extend quite a lot of trust to her captor to believe that she should avoid information that would be needed to escape from the zoo.  In this case her captor is, by construction, Yahweh, so a sentimentally pro-Yahweh version of the story shows mostly negative consequences from this choice.  (There are also, of course, sentimentally anti-Yahweh interpretations of the story, in Satanism and Gnosticism, which consider Eve’s decision to learn about good and evil to be wise.)

The closest corporate case I know of to belief in self-infohazards is in a large tech company which has a policy of not allowing engineers to read the GDPR privacy law; instead, their policy is to have lawyers read the law, and give engineers guidelines for “complying with the law”.  The main reason for this is that following the law literally as stated would not be possible while still providing the desired service.  Engineers, who are more literal-minded than lawyers, are more likely to be hindered by knowing the literal content of the law than they are if they receive easier guidelines from lawyers.  This is still somewhat of an edge case, since information isn’t being denied to the engineers for their own sake so much as so the company can claim to not be knowingly violating the law; given the potential for employees to be called to the witness stand, denying information to employees can protect the company as a whole.  So it is still, indirectly, a case of denying information to potential adversaries (such as prosecutors).

In a legal setting, there are cases where information is denied to people, e.g. evidence is considered inadmissible due to police not following procedure in gaining that information.  This information is not denied to the jury primarily because it would be bad for the jury; rather, it’s denied to them because it would be unfair to one side in the case (such as the defendant), and because admitting such information would create bad incentives for information-gatherers such as police detectives, which is bad for information-gatherers who are following procedure; it would also increase executive power, likely at the expense of the common people.

So, invocation of the notion of a self-infohazard is Bayesian evidence, not just of a conflict situation, but of a concealed conflict situation, where outsiders are more likely than insiders to label the situation as a conflict, e.g. in a cult.

It is important to keep in mind that, for A to have information they claim to be denying to B for B’s benefit, A must have at some point decided to learn this information.  I have rarely, if ever, heard cases where A, upon learning the information, actively regrets it; rather, their choice to learn about it shows that they expected such learning to be good for them, and this expectation is usually agreed with later.  I infer that it is common for A to be applying a different standard to B than to A; to consider B weaker, more in need of protection, and less agentic than A.

Empathy based ethics in a darwinian organism often boils down to “Positive utilitarianism for me, negative utilitarianism for thee.”

Different standards are often applied because the situation actually is more of a conflict situation than is being explicitly represented.  One applies to one’s self a standard that values positively one’s agency, information, capacity, and existence, and one applies to others a standard that values negatively their agency, information, capacity, and existence; such differential application increases one’s position in the conflict (e.g. evolutionary competition) relative to others.  This can, of course, be rhetorically justified in various ways by appealing to the idea that the other would “suffer” by having greater capacities, or would “not be able to handle it” and is “in need of protection”.  These rhetorical justifications aren’t always false, but they are suspicious in light of the considerations presented.

Nick Bostrom, for example, despite discussing “disappointment hazards”, spends quite a lot of his time thinking about very disappointing scenarios, such as AGI killing everyone, or nuclear war happening.  This shows a revealed preference for, not against, receiving disappointing information.

An important cultural property of the word “infohazard” is that it is used quite differently in a responsible/serious and a casual/playful context.  In a responsible/serious context, the concept is used to invoke the idea that terrible consequences, such as the entire world being destroyed, could result from people talking openly about certain topics, justifying centralization of information in a small inner ring.  In a casual/playful context, “infohazard” means something other people don’t want you to know, something exciting the way occult and/or Eldritch concepts are exciting, something you could use to gain an advantage over others, something delicious.

Here are a few Twitter examples:

• “i subsist on a diet consisting mostly of infohazards” (link)
• “maintain a steady infohazard diet like those animals that eat poisonous plants, so that your mind will poison those that try to eat it” (link)
• “oooooh an information hazard… Googling, thanks” (link)
• “are you mature/cool enough to handle the infohazard that a lot of conversations about infohazards are driven more by games around who is mature/cool enough than by actual reasoned concern about info & hazards?” (link)

The idea that you could come up with an idea that harms people in weird ways when they learn about it is, in a certain light, totally awesome, the way mind control powers are awesome, or the way being an advanced magical user (wizard/witch/warlock/etc) is awesome.  The idea is fun the way the SCP wiki is fun (especially the parts about antimemetics).

It is understandable that this sort of value inversion would come from an oppositional attitude to “responsible” misinforming of others, as a form of reverse psychology that is closely related to the Streisand effect.  Under a conflict theory, someone not wanting you to know something is evidence for it being good for you to learn!

This can all still be true even if there are some actual examples of self-infohazards, due to non-VNM values or behavior in humans.  However, given the argument I am making, the more important the “infohazard” concept is considered, the more evidence there is of a possibly-concealed conflict; continuing to apply a mistake theory to the situation becomes harder and harder, in a Bayesian sense, as this information (about people encouraging each other not to accumulate more information) accumulates.

As a fictional example, the movie They Live (1988) depicts a situation in which aliens have taken over and are ruling Earth.  The protagonist acquires sunglasses that show him the ways aliens control himself and others.  He attempts to put the sunglasses on his friend, to show him the political situation; however, his friend physically tries to fight off this attempt, treating the information revealed by the sunglasses as a self-infohazard.  This is in large part because, by seeing the concealed conflict, the friend could be uncomfortably forced to modify his statements and actions accordingly, such as by picking sides.

The movie Bird Box (2018) is a popular and evocative depiction of a self-infohazard (similar in many ways to Langford’s basilisk), in the form of a monster that, when viewed, causes the viewer to die with high probability, and with low probability to become a “psycho” who forcefully tries to show the monster to everyone else.  The main characters use blindfolds and other tactics to avoid viewing the monster.  There was a critical discussion of this movie that argued that the monster represents racism.  The protagonists, who are mostly white (although there is a black man who is literally an uncle named “Tom”), avoid seeing inter-group conflict; such a strategy only works for people with a certain kind of “privilege”, who don’t need to directly see the conflict to navigate daily life.  Such an interpretation of the movie is in line with the invocation of “infohazards” being Bayesian evidence of concealed conflicts.

What is one to do if one feels like something might be an “infohazard” but is convinced by this argument that there is likely some concealed conflict?  An obvious step is to model the conflict, as I did in the case of the tech company “complying” with GDPR by denying engineers information.  Such a multi-agent model makes it clear why it may be in some agents’ interest for some agents (themselves or others) to be denied information.  It also makes it clearer that there are generally losers, not just winners, when information is hidden, and makes it clearer who those losers are.

There is a saying about adversarial situations such as poker games: “If you look around the table and you can’t tell who the sucker is, it’s you”.  If you’re in a conflict situation (which the “infohazard” concept is Bayesian evidence for), and you don’t know who is losing by information being concealed, that’s Bayesian evidence that you are someone who is harmed by this concealment; those who are tracking the conflict situation, by knowing who the losers are, are more likely to ensure that they end up ahead.

As a corollary of the above (reframing “loser” as “adversary”): if you’re worried about information spreading because someone might be motivated to use it to do something bad for you, knowing who that someone is and the properties of them and their situation allows you to better minimize the costs and maximize the benefits of spreading or concealing information, e.g. by writing the information in such a way that some audiences are more likely than others to read it and consider it important.

Maybe the “infohazard” situation you’re thinking about really isn’t a concealed conflict and it’s actually a violation of VNM utility; in that case, the consideration to make clear is how and why VNM doesn’t apply to the situation.  Such a consideration would be a critique, of the sort studied in behavioral economics, of applying Bayesian/utility-based models to humans.  I expect that people will often be biased towards looking for exceptions to VNM rather than looking for concealed conflicts (as they are, by assumption, concealing the conflict); however, that doesn’t mean that such exceptions literally never occur.

# Selfishness, preference falsification, and AI alignment

If aliens were to try to infer human values, there are a few information sources they could start looking at.  One would be individual humans, who would want things on an individual basis.  Another would be expressions of collective values, such as Internet protocols, legal codes of states, and religious laws.  A third would be values that are implied by the presence of functioning minds in the universe at all, such as a value for logical consistency.

It is my intuition that much less complexity of value would be lost by looking at the individuals than looking at protocols or general values of minds.

Let’s first consider collective values.  Inferring what humanity collectively wants from internet protocol documents would be quite difficult; the fact that a SYN packet must be followed by a SYN-ACK packet is a decision made in order to allow communication to be possible rather than an expression of a deep value.  Collective values, in general, involve protocols that allow different individuals to cooperate with each other despite their differences; they need not contain the complexity of individual values, as individuals within the collective will pursue these anyway.

Distinctions between different animal brains form more natural categories than distinctions between institutional ideologies (e.g. in terms of density of communication, such as in neurons), so that determining values by looking at individuals leads to value-representations that are more reflective of the actual complexity of the present world in comparison to determining values by looking at institutional ideologies.

There are more degenerate attractors in the space of collective values than in individual values, e.g. with each person trying to optimize “the common good” in a way that means that they say they want “the common good”, which means “the common good” (as a rough average of individuals’ stated preferences) thinks their utility function is mostly identical with “the common good”, such that “the common good” becomes a mostly self-referential phrase, referring to something with little resemblance to what anyone wanted in the first place.  (This has a lot in common with Ayn Rand’s writing in favor of “selfishness”.)

There is reason to expect that spite strategies, which involve someone paying to harm others, are collective, rather than individual.  Imagine that there are 100 different individuals competing, and that they have the option of paying 1 unit of their own energy to deduct 10 units of another individual’s energy.  This is clearly not worth it in terms of increasing their own energy, and is also not worth it in terms of increasing the percentage of the total energy owned by them, since paying 1 energy only deducts 0.1 units of energy from the average individual.  On the other hand, if there are 2 teams fighting each other, then a team that instructs its members to hurt the other team (at cost) gains in terms of the percentage of energy controlled by the team; this situation is important enough that we have a common term for it, “war”.  Therefore, collective values are more likely than individual values to encode conflicts in a way that makes them fundamentally irreconcilable.
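
The arithmetic above can be checked with a minimal sketch.  It assumes (these numbers are mine, chosen only for illustration) that every individual starts with 100 units of energy and that the two teams have 50 members each; the function name `share_after_spite` is hypothetical.

```python
from fractions import Fraction


def share_after_spite(own, others_total, cost=1, damage=10):
    """Share of total energy held by a party before and after one spite action.

    The party pays `cost` from its own energy to remove `damage` from the rest.
    """
    before = Fraction(own, own + others_total)
    after = Fraction(own - cost, (own - cost) + (others_total - damage))
    return before, after


# Assumption: 100 individuals, each starting with 100 units of energy.
START = 100

# Case 1: a lone individual spites one of the 99 others.
ind_before, ind_after = share_after_spite(START, 99 * START)

# Case 2: a member of a 50-person team spites the opposing 50-person team.
team_before, team_after = share_after_spite(50 * START, 50 * START)

print("individual share:", float(ind_before), "->", float(ind_after))
print("team share:      ", float(team_before), "->", float(team_after))
```

Under these assumptions the individual’s share of total energy falls (from 100/10000 to 99/9989), while the team’s share rises (from 1/2 to 4999/9989), which is the asymmetry the paragraph describes.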

Let us also consider values necessary for minds-in-general.  I talked with someone at a workshop recently who had the opinion that AGI should optimize an agent-neutral notion of “good”, coming from the teleology of the universe itself, rather than human values specifically, although it would optimize our values to the extent that our values already align with the teleology.  (This is similar to Eliezer Yudkowsky’s opinion in 1997.)

There are some values embedded in the very structure of thought itself, e.g. a value for logical consistency and the possibility of running computations.  However, none of these values are “human values” exactly; at the point where these are the main thing under consideration, it starts making more sense to talk about “the telos of the universe” or “objective morality” than “human values”.  Even a paperclip maximizer would pursue these values; they appear as convergent instrumental goals.

Even though these values are important, they can be assumed to be significantly satisfied by any sufficiently powerful AGI (though probably not optimally); the difference in the desirability between a friendly and unfriendly AGI, therefore, depends primarily on other factors.

There is a somewhat subtle point, made by Spinoza, which is that the telos of the universe includes our own values as a special case, at our location; we do “what the universe wants” by pursuing our values.  Even without understanding or agreeing with this point, however, we can look at the way pure pursuit of substrate-independent values seems subjectively wrong, and consider the implications of this subjective wrongness.

“I”, “you”, “here”, and “now” are indexicals: they refer to something different depending on when, where, and by whom they are spoken. “My values” is indexical; it refers to different value-representations (e.g. utility functions) for different individuals.

“Human values” is also effectively indexical.  The “friendly AI (FAI) problem” is framed as aligning artificial intelligence with human values because of our time and place in history; in another timeline where octopuses became sapient and developed computers before humans, AI alignment researchers would be talking about “octopus values” instead of “human values”. Moreover, “human” is just a word; we interpret it by accessing actual humans, including ourselves and others, and that is always indexical, since which humans we find depends on our location in spacetime.

Eliezer’s metaethics sequence argues that our values are, importantly, something computed by our brains, evaluating different ways the future could go.  That doesn’t mean that “what score my brain computes on a possible future” is a valid definition of what is good, but rather, that the scoring is what leads to utterances about the good.

The fact that actions, including actions about what to say is “good”, are computed by the brain does mean that there is a strong selection effect in utterances about “good”.  To utter the sentence “restaurants are good”, the brain must decide to deliver energy towards this utterance.

The brain will optimize what it does to a significant degree (though not perfectly) for continuing to receive energy, e.g. handling digestion and causing feelings of hunger that lead to eating.  This is a kind of selfishness that is hard to avoid.  The brain’s perceptors and actuators are indexical (i.e. you see and interact with stuff near you), so at least some preferences will also be indexical in this way.  It would be silly for Alice’s brain to directly care about Bob’s digestion as much as it cares about Alice’s digestion; there is a separation of concerns, implemented by the presence of nerves running directly from Alice’s brain to Alice’s digestive system but not to Bob’s digestive system.

For an academic to write published papers about “the good”, they must additionally receive enough resources to survive (e.g. by being paid), provide a definition that others’ brains will approve of, and be part of a process that causes them to be there in the first place (e.g. which can raise children to be literate).  This obviously causes selection issues if the academics are being fed and educated by a system that continues asserting an ideology in a way not responsive to counter-evidence.  If the academics would lose their job if they defined “good” in a too-heretical way, one should expect to see few heretical papers on normative ethics.

(It is usual in analytic philosophy to assume that philosophers are working toward truths that are independent of their individual agendas and incentives, with bad academic incentives being a form of encroaching badness that could impede this, whereas in continental philosophy it is usual to assert that academic work is done by individuals who have agendas as part of a power structure, e.g. Foucault saying that schools are part of an imperial power structure.)

It’s possible to see a lot of bad ethics in other times and places as resulting from this sort of selection effect (e.g. people feeling pressure to agree with prevailing beliefs in their community even if they don’t make sense), although the effect is harder to see in our own time and place due to our own socialization.  It’s in some ways a similar sort of selection effect to the fact that utterances about “the good” must receive energy from a brain process, which means we refer to “human values” rather than “octopus values” since humans, not octopuses, are talking about AI alignment.

In optimizing “human values” (something we have little choice in doing), we are accepting the results of evolutionary selection that happened in the past, in a “might makes right” way; human values are, to a significant extent, optimized so that humans having these values successfully survive and reproduce.  This is only a problem if we wanted to locate substrate-independent values (values applicable to minds in general); substrate-dependent values depend on the particular material history of the substrate, e.g. evolutionary history, and environmentally-influenced energy limitations are an inherent feature of this history.

In optimizing “the values of our society” (also something we have little choice in, although more than in the case of “human values”), we are additionally accepting the results of historical-social-cultural evolution, a process by which societies change over time and compete with each other.  As argued at the beginning, parsing values at the level of individuals leads to representing more of the complexity of the world’s already-existing agency, compared with parsing values at the level of collectives, although at least some important values are collective.

This leads to another framing on the relation between individual and collective values: preference falsification.  It’s well-known that people often report preferences they don’t act on, and that these reports are often affected by social factors.  To the extent that we are trying to get at “intrinsic values”, this is a huge problem; it means that with rare exceptions, we see reports of non-intrinsic values.

A few intuition pumps for the commonality of preference falsification:

1. Degree of difference in stated values in different historical time periods, exceeding actual change in human genetics, often corresponding to over-simplified values such as “maximizing productivity”, or simple religious values.

2. Commonality of people expressing lack of preference (e.g. about which restaurant to eat at), despite the experiences resulting from the different choices being pretty different.

3. Large differences between human stated values and predictions of evolutionary psychology, e.g. commonality of people asserting that sexual repression is good.

4. Large differences in expressed values between children and adults, with children expressing more culturally-neutral values and adults expressing more culturally-specific ones.

5. “Akrasia”, people saying they “want” something without actually having the “motivation” to achieve it.

6. Feelings of “meaninglessness”, nihilism, persistent depression.

7. Schooling practices that have the effect of causing the student’s language to be aimed at pleasing authority figures rather than self-advocating.

Michelle Reilly writes on preference falsification:

Preference falsification is a reversal of the sign, and not simply a change in the magnitude, regarding some of your signaled value judgments. Each preference falsification creates some internal demand for ambiguity and a tendency to reverse the signs on all of your other preferences. Presumptively, any claim to having values differing from that which you think would maximize your inclusive fitness in the ancestral environment is either a lie, an error (potentially regarding your beliefs about what maximizes fitness, for instance, due to having mistakenly absorbed pop darwinist ideology), or a pointer to the outcome of a preference falsification imposed by culture.

(The whole article is excellent and worth reading.)

In general, someone can respond to a threat by doing what the threatener is threatening them to do, which includes hiding the threat (sometimes from consciousness itself; Jennifer Freyd’s idea of betrayal trauma is related) and saying what one is being threatened into saying.  At the end of 1984, after being confined to a room and tortured, the protagonist says “I love Big Brother”, in the ultimate act of preference falsification.  Nothing following that statement can be taken as a credible statement of preferences; his expressions of preference have become ironic.

I recently had a conversation with Ben Hoffman where he zoomed in on how I wasn’t expressing coherent intentions.  More of the world around me came into the view of my consciousness, and I felt like I was representing the world more concretely in a way that led me to expressing simple preferences, such as that I liked restaurants and looking at pretty interesting things, while also feeling fear at the same time, as it seemed that what I had been doing previously was trying to be “at the ready” to answer arbitrary questions in a fear-based way; the moment faded, such that I am led to believe that it is uncommon for me to feel and express authentic preferences.  I do not think I am unusual in this regard; Michael Vassar, in a podcast with Spencer Greenberg (see also a summary by Eli Tyre), estimates that the majority of adults are “conflict theorists” who are radically falsifying their preferences, which is in line with Venkatesh Rao’s estimate that 80% of the population are “losers” who are acting from defensiveness and trying to make information relevant to comparisons between people illegible. In the “postrationalist” memespace, it is common to talk as if illegibility were an important protection; revealing information about one’s self is revealing vulnerabilities to potential attackers, making “hiding” as a generic, anonymous, history-free, hard-to-single-out person harder.

Can people who deeply falsify their preferences successfully create an aligned AI?  I argue “probably not”.  Imagine an institution that made everyone in it optimize for some utility function U that was designed by committee. That U wouldn’t be the human utility function (unless the design-by-committee process reliably determines human values, which would be extremely difficult), so forcing everyone to optimize U means you aren’t optimizing the human utility function; it has the same issues as a paperclip maximizer.

What if you try setting U = “make FAI”? “FAI” is a symbolic token (Eliezer writes about “LISP tokens”); for it to have semantics it has to connect with human value somehow, i.e. someone actually wanting something and being assisted in getting it.

Maybe it’s possible to have a research organization where some people deeply preference-falsify and some don’t, but for this to work the organization would need a legible distinction between the two classes, so no one gets confused into thinking they’re optimizing the preference-falsifiers’ utility function by constraining them to act against their values.  (I used the term “slavery” in the comment thread, which is somewhat politically charged, although it’s pointing at something important, which is that preference falsification causes someone to serve another’s values (or an imaginary other’s values) rather than their own.)

In other words: the motion that builds a FAI must chain from at least one person’s actual values, but people under preference falsification can’t do complex research in a way that chains from their actual values, so someone who actually is planning from their values must be involved in the project, especially the part of the project that is determining how human values are defined (at object and process levels).

Competent humans are both moral agents and moral patients.  A sign that someone is preference-falsifying is that they aren’t treating themselves, or others like them, as moral patients.  They might signal, at cost, that they aren’t optimizing for themselves, that they’re optimizing for the common good, against their own interests.  But at least some intrinsic preferences are selfish, due to both (a) indexicality of perceptors/actuators and (b) evolutionary psychology.  So purely-altruistic preferences will, in the usual case, come from subtracting selfish preferences from one’s values (or, sublimating them into altruistic preferences).  Eliezer has written recently about the necessity of representing partly-selfish values rather than over-writing them with altruistic values, in line with much of what I am saying here.

How does one treat one’s self as a moral agent and patient simultaneously, in a way compatible with others doing so?  We must (a) pursue our values and (b) have such pursuit not conflict too much with others’ pursuit of their values.  In mechanism design, we simultaneously have preferences over the mechanism (incentive structure) and the goods mediated by the incentive structure (e.g. goods being auctioned).  Similarly, Kant’s Categorical Imperative is a criterion for object-level preferences to be consistent with law-level preferences, which are like preferences about what legal structure to occupy; the object-level preferences are pursued subject to obeying this legal structure.  (There are probably better solutions than these, but this is a start.)

What has been stated so far is, to a significant extent, an argument for deontological ethics over utilitarian ethics.  Utilitarian ethics risks constraining everyone into optimizing “the common good” in a way that hides original preferences, which contain some selfish ones; deontological ethics allows pursuit of somewhat-selfish values as long as these values are pursued subject to laws that are willed in the same motion as willing the objects of these values themselves.

Consciousness is related to moral patiency (in that e.g. animal consciousness is regarded as an argument in favor of treating animals as moral patients), and is notoriously difficult to discuss.  I hypothesize that a lot of what is going on here is that:

1. There are many beliefs/representations that are used in different contexts to make decisions or say things.

2. The scientific method has criteria for discarding beliefs/representations, e.g. in cases of unfalsifiability, falsification by evidence, or complexity that is too high.

3. A scientific worldview will, therefore, contain a subset of the set of all beliefs had by someone.

4. It is unclear how to find the rest of the beliefs in the scientific worldview, since many have been discarded.

5. There is, therefore, a desire to be able to refer to beliefs/representations that didn’t make it into the scientific worldview, but which are still used to make decisions or say things; “consciousness” is a way of referring to beliefs/representations in a way inclusive of non-scientific beliefs.

6. There are, additionally, attempts to make consciousness and science compatible by locating conscious beliefs/representations within a scientific model, e.g. in functionalist theory of mind.

A chemist will have the experience of drinking coffee (which involves their mind processing information from the environment in a hard-to-formalize way) even if this experience is not encoded in their chemistry papers.  Alchemy, as a set of beliefs/representations, is part of experience/consciousness, but is not part of science, since it is pre-scientific.  Similarly, beliefs about ethics (at least, the ones that aren’t necessary for the scientific method itself) aren’t part of the scientific worldview, but may be experienced as valence.

Given this view, we care about consciousness in part because the representations used to read and write text like this “care about themselves”, wanting not to erase themselves from their own product.

There is, then, the question of how (or if) to extend consciousness to other representations, but at the very least, the representations used here-and-now for interpreting text are an example of consciousness.  (Obviously, “the representations used here-and-now” is indexical, connecting with the earlier discussion on the necessity of energy being provided for uttering sentences about “the good”.)

The issue of extension of consciousness is, again, similar to the issue of how different agents with somewhat-selfish goals can avoid getting into intractable conflicts.  Conflicts would result from each observer-moment assigning itself extreme importance based on its own consciousness, and not extending this to other observer-moments, especially if these other observer-moments are expected to recognize the consciousness of the first.

I perceive an important problem with the idea of “friendly AI” leading to nihilism, by the following process:

1. People want things, and wants that are more long-term and common-good-oriented are emphasized.

2. This leads people to think about AI, as it is important for automation, increasing capabilities in the long term.

3. This leads people to think about AI alignment, as it is important for the long-term future, given that AI will be relevant.

4. They have little actual understanding of AI alignment, so their thoughts are based on others’ thought, their idea of what good research should look like.

In the process their research has become disconnected from their original, ordinary wanting, which becomes subordinated to it.  But an extension of the original wanting is what “friendly AI” is trying to point at.  Unless these were connected somehow, there would be no reason or motive to value “friendly AI”; the case for it is based on reasoning about how the mind evaluates possible paths forward (e.g. in the metaethics sequence).

It becomes a paradoxical problem when people don’t feel motivated to “optimize the human utility function”.  But their utility function is what they’re motivated to do, so this is absurd, unless there is mental damage causing failure of motivations to cohere at all.  This could be imprecisely summarized as: “If you don’t want it, it’s not a friendly AI”.  The token “FAI” is meaningless unless it connects with a deep wanting.

This leads to a way that a friendly AI project could be more powerful than an unfriendly AI project: the people working on it would be more likely to actually want the result in a relatively-unconfused way, so they’d be more motivated to actually make the system work, rather than just pretending to try to make the system work.

Alignment researchers who were in touch with “wanting” would be treating themselves and others like them as moral patients.  This ties in to my discussion of my own experiences as an alignment researcher.  I said at the end:

Aside from whether things were “bad” or “not that bad” overall, understanding the specifics of what happened, including harms to specific people, is important for actually accomplishing the ambitious goals these projects are aiming at; there is no reason to expect extreme accomplishments to result without very high levels of epistemic honesty.

This is a pretty general statement, but now it’s possible to state the specifics better.  There is little reason to expect that alignment researchers who don’t treat themselves and others like them as moral patients are actually treating the rest of humanity as moral patients.  From a historical outside view, this is intergenerational trauma, “hurt people hurt people”, people who are used to being constrained/dominated in a certain way passing that along to others, which is generally part of an imperial structure that extends itself through colonization; colonizers often have narratives about how they’re acting in the interests of the colonized people, but these narratives can’t be evaluated neutrally if the colonized people in question cannot speak.  (The colonization of Liberia is a particularly striking example of colonial trauma.)  Treating someone as a moral patient requires accounting for costs and benefits to them, which requires either discourse with them or extreme, unprecedented advances in psychology.

I recall a conversation in 2017 where a CFAR employee told someone I knew (who was a trans woman) that there was a necessary decision between treating the trans woman in question “as a woman” or “as a man”, where “as a man” meant “as a moral agent” and “as a woman” meant “as a moral patient”, someone who’s having problems and needs help.  That same CFAR person later told me about how they are excited by the idea of “undoing gender”.  This turns out to align with the theory I am currently advocating, that it is necessary to consider one’s self as both a moral agent and a moral patient simultaneously, which is queer-coded in American 90s culture.

I can see now that, as long as I was doing “friendly AI research” from a frame of trying not to be bad or considered bad (implicitly, trying to appear to serve someone else’s goals), everything I was doing was a total confusion; I was pretending to try to solve the problem, which might have possibly worked for a much easier problem, but definitely not one as difficult as AI alignment.  After having left “the field” and gotten more of a life of my own, where there is relatively less requirement to please others by seeming abstractly good (or abstractly bad, in the case of vice signaling), I finally have an orientation that can begin to approach the real problem while seeing more of how hard it is.

The case of aligning AI with a single human is less complicated than the problem of aligning it with “all of humanity”, but this problem still contains most of the difficulty.  There is a potential failure mode where alignment researchers focus too much on their own utility function at the expense of considering others’, but (a) this is not the problem on the margin given that the problem of aligning AI with even a single human’s utility function contains most of the difficulty, and (b) this could potentially be solved with incentive alignment (inclusive of mechanism design and deontological ethics) rather than enforcing altruism, which is nearly certain to actually be enforcing preference-falsification given the difficulty of checking actual altruism.

# “Credibility” for being unbelievable

The word “credible” is perversely ambiguous.  On the face of it, it means: being trustworthy, being believable (in a Bayesian sense), being likely to make true statements and pay one’s debts.  But there’s another way the word is used, which is to indicate authority and prestige: control over which propositions are considered “truthy” (and/or agreement with controlling processes), rather than prediction of which statements are actually true.

Control over narratives, however, is anticorrelated with, and opposed to, actual believability.  If you can control the narrative to say that some proposition is either X or ~X at will, arbitrarily, then you’re using a symmetric process for “convincing” others: it’s just as easy to use it to convince of falsehood as of truth.  This is as opposed to asymmetric processes which are easier to use to convince of truth than of falsehood, e.g. public experiments, logical debate.
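
One way to make the symmetric/asymmetric distinction concrete, as a rough sketch rather than a formal definition, is Bayes’ rule: a listener should update on an assertion only to the extent that the asserting process is more likely to assert the claim when it is true than when it is false.  The function name `posterior` and the probabilities below are hypothetical, chosen only for illustration.

```python
def posterior(prior, p_assert_if_true, p_assert_if_false):
    """Probability that X is true after hearing a process assert X (Bayes' rule)."""
    evidence_for = prior * p_assert_if_true
    evidence_against = (1 - prior) * p_assert_if_false
    return evidence_for / (evidence_for + evidence_against)


# Symmetric process (assumed numbers): it asserts X equally readily whether
# or not X is true, so hearing the assertion carries no information.
symmetric = posterior(0.5, 0.9, 0.9)

# Asymmetric process (assumed numbers): e.g. a public experiment that is
# harder to make come out in favor of X when X is false.
asymmetric = posterior(0.5, 0.9, 0.3)

print("symmetric process: ", symmetric)
print("asymmetric process:", asymmetric)
```

With these assumed numbers, the symmetric process leaves the listener at the prior of 0.5, while the asymmetric one moves them to about 0.75.  Being able to assert X or ~X at will corresponds to the first case.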

(The word “authority” is interesting here: “authority”, “authoritarian”, and “author” come from the same root, indicating a relation between the “authoring” of arbitrary narratives, “authoritarian” use of force by some parties to control others, and “authority” assigned to statements and producers of statements.)

While oracular reality-trackers discern facts, authority creates facts, primarily social facts; if these are the “facts” used to determine credibility, then authority and those close to it can “win” credibility, while having no corresponding ability to discern truth.

Being in a position to control narratives means having power: having maneuvered into a position to exert arbitrary influence on others.  Since power is rivalrous (it can’t be the case that everyone has lots of arbitrary influence on everyone else), acquiring power requires winning zero-sum games.  Winning zero-sum games requires allocating attention to the game itself; unless the game is set up so as to correlate with truth (e.g. a formal debate judged according to pro-epistemology standards such as logical rigor and consistency with evidence), it will be won by actors who are barely paying attention to the truth, who are bullshitting (not simply lying!).

Beyond this, zero-sum game play is opposed to revelation of information; such revelation is interpreted as aggression, as it breaks the “nothing changes” power-maintaining equilibrium.

The “calling a deer a horse” story is illustrative, demonstrating more severity than simply not paying attention to the truth.  When Zhao Gao points to a deer and says it is a horse, he effectively controls the narrative: those who want to live will “agree” with him that it’s a horse.  He isn’t believable, but he’s authoritative; he’s “credible”, as are those who submit to the threat and “agree” (ironically) with him.  (Ironic agreement is a state of doublethink, of internally disbelieving while outwardly agreeing; such ironic states of mind are suited to environments of reversed credibility.)

This story is more severe than simple bullshit, in that it involves selectively promoting false statements.  Paying enough attention to the truth to invert it and thus gain an advantage over truth-based actors is, of course, compatible with zero-sum play.

If a government known to promote lots of false stories promotes a false story as part of mobilization of military/police threat (say, the story that Saddam Hussein purchased yellowcake), is this story “credible” or “non-credible”?  It will be printed in prestigious newspapers, and will become a default assumption in many discussions, but people tracking history will have a sense of the government’s track record and know that the claim is made by the sort of actor who gets there by bullshitting.

Fiat currency is an interestingly explicit case.  The US adopted a metallic standard in 1785; government-issued money notes (US dollars) were exchangeable for a particular amount of a precious metal, initially silver and then gold.  To value US dollars is to bet that the government will be willing to exchange them for silver/gold; the money is valuable insofar as this promise is credible.

However, around WWI (1914-1918), many governments (including the US) suspended convertibility.  If the value of the money were simply based on the belief that it could be exchanged for precious metal, then the value would plummet accordingly.  But by then the money unit was well-integrated into the economy: it was used to set prices, pay wages, pay taxes, denominate bank savings and loans, and so on.  Changing protocols everywhere to adopt a new currency would be slow and difficult, and (given taxation) would run into conflict with the government.  While the value of money did reduce substantially (e.g. prices doubled in the US), this was not the totalizing devaluation that would be naively expected from a collapse of convertibility.

During the Great Depression, through Executive Order 6102 of 1933, the US government confiscated the vast majority of gold, “exchanging” it for a fixed amount of US dollars.  By the time the government is confiscating almost all gold, it’s obvious that US dollars are not valued primarily due to the expectation that they could be exchanged for gold.

So, though the “credibility” (market value) of the US dollar originally came from the belief that it could be exchanged for gold, its credibility over time shifted to be backed primarily by the authority of the US government, which is opposed to the expectation that it will pay debts.  Even if US dollars can’t be exchanged for precious metal, they are (since 1884) legal tender, valid for paying public debts (e.g. taxes) and private debts.  Since US dollars are valid for private debts (according to US courts), it’s impractical for private debts between Americans to not be reliant on the “credibility” of the US dollar.

US dollars are, at this point, a stage 3-4 simulacrum with respect to the original claim of value.  This paves the way for further manipulation of currency through Federal Reserve policy implementing Keynesian macroeconomics, a form of military mobilization (the relation between macroeconomics and mobilization is de-obfuscated by Modern Monetary Theory).  Direct manipulation of the currency is, of course, a form of authority, opposed to believability, in that it undermines use of the currency to denominate unironic debts.

Back to the more general problem.  If you asked an average college-educated American whether institutions such as the CDC or the WHO are credible, they would probably say “yes”.  However, these institutions repeatedly made hard-to-believe claims during COVID, such as the claim that masks were unhelpful, or the claim that the virus was not airborne.  Prestigious news outlets such as the New York Times did not call out these claims as false early on, which is correlated with such outlets’ “credibility”; they’re “credible” due to repeating claims made by authoritative narrative-controllers (thus, being part of the narrative-control apparatus), not due to tracking reality.

As Nick Land asks: “Assuming the WHO, CDC, and FDA wanted to kill you, how would their behavior differ?”  It wouldn’t be a coincidence for authoritative institutions to be trying to kill those they exert authority over: power is the ability to threaten others, and threats can control narratives.

I’ve seen a lot of discussions where people with some shared explicit agenda (e.g. Effective Altruists) talk about the need to “gain credibility”, and assume that the way to do so is to be closer to power; their central example of a “credible” person would be a high-level corporate/government strategic consultant or a journalist of a prestigious publication.  Such talk doesn’t distinguish between credibility-as-believability and credibility-as-authority: is being a strategic consultant helpful for convincing others because it is correlated with saying true propositions, or is it helpful because the authority of the institution (or upstream institutions) intimidates people into accepting claims made by its members despite their unbelievability?

In conclusion:

• “Credibility” conflates believability (Bayesian evidence) with authority (ability to control narratives arbitrarily).
• Authority is derived from zero-sum game play, which is opposed to revelation of new information, and which threatens those over whom authority is exerted.
• Thus, these different properties being conflated are opposed.

# On commitments to anti-normativity

Normativity: morality, ethics, doing the right thing, treating others as one would want to be treated, respecting moral symmetries, telling the truth, keeping commitments, following rules that are there to restrict harmful behavior, behaving in a way that contributes to the benefit of one’s society.

The idea of commitment to normativity is familiar.  Someone can be committed to behaving ethically, to the point that they forego some narrowly self-interested benefit to avoid behaving unethically.

What about commitment to anti-normativity?  This is commitment to doing the wrong thing, treating others as one wouldn’t want to be treated, disregarding moral symmetries, lying, breaking commitments, preventing rules from being followed, and parasitizing one’s society.

It is, naively, unsurprising that some people behave non-normatively, because non-normative behavior can bring a selfish benefit.  It is rather more surprising that commitment to anti-normativity may be a thing; such a commitment would cause one to continue behaving anti-normatively, even when normative behavior would be selfishly optimal.

Let’s look at some examples of anti-normativity:

• The phrase “snitches get stitches”, and the idea that whistleblower protections might be necessary, points at the commonality of criminal conspiracies, which punish members not for breaking the law, but for causing the law to be enforceable.  Turning in other members of a conspiracy one is part of is, in a sense, aggressing upon them: it’s causing them to face negative consequences they expected not to face.  Members of a conspiracy commit to hiding themselves and each other from the law.
• Privacy-related social norms are optimized for obscuring behavior that could be punished if widely known.  A common justification for such norms is that behavior that would be punished if known about is common, hence actual punishment is unfair scapegoating based on unpredictable factors; under privacy norms, revelation is more rare.  Such norms are sometimes enshrined into law, e.g. the Right to be Forgotten, by which some people can force records of their own behavior to be deleted.  (Note, privacy norms are an example of a paradoxical norm that is opposed to enforcement of norms-in-general).
• Traumatized people are forcefully made part of a conspiracy, and learn to side with the transgressor who is aggressing upon them.  Such learning generalizes to siding with transgressors in general, as described in The Body Keeps the Score; while watching a play about dating violence, the traumatized children yell things like “kill the bitch”, siding with the transgressor in the scene.  This is despite this transgressor not actually being powerful; in the outer setting in which the play is being put on, such behavior is frowned upon, so the traumatized kids are going against powerful social structures.  (It is easy for traumatized people to conflate transgressiveness with power, but these frequently come apart)
• It’s very common to want to exclude people who are too “moralistic” or “judgy” from social groups.  If this were just a matter of disagreeing with these people about morality, then moral argumentation would be the most natural response; what is being opposed is, rather, individuals making moral judgments in a way that implies that some normal behaviors are unacceptable.  Being committed to behaving normally, then, means being committed not to follow moral laws that would compel behaving abnormally.  (Relatedly, “vice signalling”, e.g. smoking, can make others less afraid of moral judgment, as the vice signaller has morally lowered themselves, having less optionality to claim the moral high ground.  Many Christian teachings, e.g. “judge not lest you be judged”, “Recognize always that evil is your own doing, and to impute it to yourself.”, recommend the social strategy of not claiming moral high ground.)
• Some social groups separate themselves from the “commoners”, making it clear that they’re a different class, not subject to the rules that constrain the commoners, e.g. militaries, intelligence agency members, high-level corporate executives, some professional classes, some spiritual practitioners, aristocracies throughout history.  The Inner Ring describes a general dynamic of this form.  They may transgression-bond with each other to show that they are not subject to the normal rules.  Nazi legal theorist Carl Schmitt writes that “Sovereign is he who decides on the exception”, i.e. the truly autonomous leader can allow rules to be broken at will; David Graeber describes royal and ritualistic power as involving socially tolerated value inversion in the last chapter of On Kings.

Why would dynamics like these result in commitments to anti-normativity? In some cases, like criminal conspiracy, the answer is obvious: exiting the conspiracy is, by default, dangerous. In general, being part of a conspiracy for enough time will cause conspiratorial behavior to seem “normal”, such that going back to non-conspiratorial behavior requires resetting one’s sense of normal behavior, as in cult deconversion.

Anti-normativity is closely related to motive ambiguity; if there is ambiguity between the motives of normativity and of local expediency (or other local social motives), then behaving anti-normatively signals that local expediency is what is being optimized for, and shows that one is giving up the option of blaming others for behaving non-normatively.

A bubble of anti-normativity is one where members are constantly signalling that they are behaving non-normatively and are encouraging others to behave non-normatively as well.  Such a bubble (essentially, a conspiracy) can maintain itself as long as it can continue meeting its constraints, e.g. intaking enough resources and not being successfully opposed.

How is anti-normativity related to oppression?  In a society that runs on normativity, there can be something approaching equality of opportunity; people can gain for themselves by following the rules and providing value to others.  In a society that runs on anti-normativity, such strategies will fail.  Instead of following the rules being the way to get ahead, accommodating anti-normativity while still conforming to local cultural expectations is necessary to get ahead.  Kelsey Piper recently described dynamics in bureaucracies by which lower-class people get treated worse than upper-middle-class people, despite appealing to the same rules.  Simply depending on the bureaucracies to follow the rules fails, since they don’t follow the rules; instead, it’s necessary to have more subtle social skills, such as knowing when to appeal, talking to people in a polite yet demanding way, seeming like the kind of person who society generally treats well, seeming to be expensive to mess with, and so on.

Our society has a term for people who follow rules consistently (Asperger’s syndrome); it is considered a mental disorder, one that sharply reduces people’s social skills.  While Asperger’s is adaptive in lawful societies, it is maladaptive in anti-lawful societies, such as Nazi Germany, where the term was coined.  Hans Asperger was a Nazi who euthanized some of his patients; he identified the flaw of Asperger’s patients as failure to be absorbed into the national super-organism, a flaw also attributed to Jews, who have a highly lawful religion and are disproportionately likely to be diagnosed with Asperger’s.

If bureaucracies followed rules consistently, then Asperger’s would not be a social disadvantage; it would imply a high ability to navigate society.  In a society where corporations and other bureaucracies are anti-normative moral mazes, Asperger’s is a disadvantage, because appealing to rules alone is not an effective way to cause bureaucracies to provide service.

(A common intuition is that bureaucracies are bad because they follow the rules consistently, lacking subtle human factors.  As a counter to this intuition, consider the case of MMORPGs; the game mechanics function as a rule-following bureaucracy, e.g. the mechanics of stores and banking in the game.  Such games are fun because of the consistency of the software rules; inconsistency in game mechanics decreases predictability of effects of action, thereby decreasing effective planning horizons and increasing perception of unfairness.)

One can appeal to institutions on the basis of rules, or one can appeal on the basis of privilege, being the sort of person who should be rewarded for no reason.  Social classes are a matter of privilege, of people being treated one way or another because of who they are, what category they fit in, based on largely aesthetic properties.

If treatment by institutions is a matter of illegible cultural factors, then a large part of what is important is to be “normal”: being near the center of some Gaussian-ish distribution over people, such as a social class.  When everyone is transgressing, non-transgression isn’t a defense, while not standing out from the crowd (hiding as a statistic) is, since it prevents being singled out for scapegoating. The behavior is much more Fristonian (avoiding surprise) than decision-theoretic (trying to accomplish something that isn’t already the case).

Culture is correlated with race, both because people of different ancestry have different histories, and because people treat each other differently depending on appearance.  If society’s institutions are disproportionately occupied by people of some cultural group, then their sense of “normal” will accord with what is normal for that cultural group, not what is normal for other cultural groups.

So, anti-normativity is racially/culturally biased by default, in a way that normativity isn’t, or at least is much less so.  While explicit rules can be followed by people of a variety of different cultures, implicit social expectations are naturally particular to a narrow set of cultures.  Anti-normativity will tend to force behavior to follow a Gaussian-like distribution, where more central behavior is, by default, more rewarded than extremal behavior (with the exception of savvy extremal behavior optimized for taking advantage of the anti-normative dynamic).

Therefore, explicit anti-racism is much more necessary for mitigating oppression if anti-normativity is dominant than if normativity is dominant; having institutions staffed by people of a variety of different cultures broadens the set of what is considered normal by people in the institution, causing it to be more natural for the institution to service people of a variety of races/cultures.  This is, obviously, nowhere near a good solution, since institutions are still not following the rules, and not all cultures can be represented in a given institution; it is, rather, a harm-reduction measure given an already-bad situation.

# Many-worlds versus discrete knowledge

[epistemic status: I’m a mathematical and philosophical expert but not a QM expert; conclusions are very much tentative]

There is tension between the following two claims:

• The fundamental nature of reality consists of the wave function whose evolution follows the Schrödinger equation.
• Some discrete facts are known.

(What is discrete knowledge? It is knowledge that some nontrivial proposition X is definitely true. The sort of knowledge a Bayesian may update on, and the sort of knowledge that logic applies to.)

The issue here is that facts are facts about something. If quantum mechanics has any epistemic basis, then at least some things are known, e.g. the words in a book on quantum mechanics, or the outcomes of QM experiments. The question is what this knowledge is about.

If the fundamental nature of reality is the wave function, then these facts must be facts about the wave function. But, this runs into problems.

Suppose the fact in question is “A photon passed through the measurement apparatus”. How does this translate to a fact about the wave function?

The wave function consists of a mapping from the configuration space (some subset of R^n) to complex numbers. Some configurations (R^n points) have a photon at a given location and some don’t. So the fact of a photon passing through the apparatus or not is a fact about configurations (or configuration-histories), not about wave functions over configurations.

Yes, some wave functions assign more amplitude to configurations in which the photon passes through the apparatus than others. Still, this does not allow discrete knowledge of the wave function to follow from discrete knowledge of measurements.
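The type mismatch can be made concrete with a toy sketch. It is only an illustration under loud assumptions: the configuration space is shrunk to two hypothetical labels, and the names `passed`, `psi`, and `weight_of` are invented here. A discrete fact is a predicate on a single configuration; applied to a wave function, the same predicate yields only a weight (total squared amplitude), not a truth value.

```python
# Toy sketch: a "wave function" over a two-element configuration space.
# The labels and all names below are illustrative assumptions.

def passed(config: str) -> bool:
    """A discrete fact: defined on a single configuration."""
    return config == "photon_passed"

# A wave function maps every configuration to a complex amplitude.
psi = {"photon_passed": complex(0.6, 0.0), "photon_absent": complex(0.0, 0.8)}

def weight_of(predicate, wave_function) -> float:
    """Total squared amplitude on configurations satisfying the predicate."""
    return sum(abs(amp) ** 2 for cfg, amp in wave_function.items() if predicate(cfg))

print(passed("photon_passed"))  # a definite True/False about one configuration
print(weight_of(passed, psi))   # a number between 0 and 1 about the wave function
```

Nothing in `psi` singles out one configuration as the actual one, which is the gap the argument above points at.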

The Bohm interpretation, on the other hand, has an answer to this question. When we know a fact, we know a fact about the true configuration-history, which is an element of the theory.

In a sense, the Bohm interpretation states that indexical information about which world we are in is part of fundamental reality, unlike the many-worlds interpretation which states that fundamental reality contains no indexical information. (I have discussed the trouble of indexicals with respect to physicalism previously)

Including such indexical information as “part of reality” means that discrete knowledge is possible, as the discrete knowledge is knowledge of this indexical information.

For this reason, I significantly prefer the Bohm interpretation over the many-worlds interpretation, while acknowledging that there is a great deal of uncertainty here and that there may be a much better interpretation possible. Though my reservations about the many-worlds interpretation had led me to be ambivalent about the comparison between the many-worlds interpretation and the Copenhagen interpretation, I am not similarly ambivalent about Bohm versus many-worlds; I significantly prefer the Bohm interpretation to both many-worlds and to the Copenhagen interpretation.

# Modeling naturalized decision problems in linear logic

The following is a model of a simple decision problem (namely, the 5 and 10 problem) in linear logic. Basic familiarity with linear logic is assumed (enough to know what it means to say linear logic is a resource logic), although knowing all the operators isn’t necessary.

The 5 and 10 problem is, simply, a choice between taking a 5 dollar bill and a 10 dollar bill, with the 10 dollar bill valued more highly.

While the problem itself is trivial, the main theoretical issue is in modeling counterfactuals. If you took the 10 dollar bill, what would have happened if you had taken the 5 dollar bill? If your source code is fixed, then there isn’t a logically coherent possible world where you took the 5 dollar bill.

I became interested in using linear logic to model decision problems due to noticing a structural similarity between linear logic and the real world, namely irreversibility. A vending machine may, in linear logic, be represented as a proposition “$1 → CandyBar”, encoding the fact that $1 may be exchanged for a candy bar, being consumed in the process. Since the $1 is consumed, the operation is irreversible. Additionally, there may be multiple options offered, e.g. “$1 → Gumball”, such that only one option may be taken. (Note that I am using “→” as notation for linear implication.)

This is a good fit for real-world decision problems, where e.g. taking the $10 bill precludes also taking the $5 bill. Modeling decision problems using linear logic may, then, yield insights regarding the sense in which counterfactuals do or don’t exist.

## First try: just the decision problem

As a first try, let’s simply try to translate the logic of the 5 and 10 situation into linear logic. We assume logical atoms named “Start”, “End”, “$5”, and “$10”. Respectively, these represent: the state of being at the start of the problem, the state of being at the end of the problem, having $5, and having $10.

To represent that we have the option of taking either bill, we assume the following implications:

TakeFive : Start → End ⊗ $5

TakeTen : Start → End ⊗ $10

The “⊗” operator can be read as “and” in the sense of “I have a book and some cheese on the table”; it combines multiple resources into a single linear proposition.

So, the above implications state that it is possible, starting from the start state, to end up in the end state, yielding $5 if you took the five dollar bill, and $10 if you took the 10 dollar bill.

The agent’s goal is to prove “Start → End ⊗ $X”, for X as high as possible. Clearly, “TakeTen” is a solution for X = 10. Assuming the logic is consistent, no better proof is possible. By the Curry-Howard isomorphism, the proof represents a computational strategy for acting in the world, namely, taking the $10 bill.
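The resource reading of these implications can be sketched in a few lines of Python. This is not a linear logic prover, only a minimal illustration: the state is a multiset of atoms, and the `RULES` table and `apply_rule` helper are hypothetical names introduced here. Applying a rule consumes its inputs, so once Start is spent the other option is no longer available.

```python
from collections import Counter

# Hypothetical encoding: each rule is (consumed resources, produced resources).
RULES = {
    "TakeFive": (Counter({"Start": 1}), Counter({"End": 1, "$5": 1})),
    "TakeTen": (Counter({"Start": 1}), Counter({"End": 1, "$10": 1})),
}

def apply_rule(state: Counter, name: str) -> Counter:
    """Consume the rule's inputs from the state and add its outputs."""
    consumed, produced = RULES[name]
    if any(state[res] < n for res, n in consumed.items()):
        raise ValueError(f"{name} needs resources that are not available")
    return state - consumed + produced

start = Counter({"Start": 1})
after_ten = apply_rule(start, "TakeTen")
print(dict(after_ten))

try:
    apply_rule(after_ten, "TakeFive")  # Start was already consumed
except ValueError as err:
    print("blocked:", err)
```

The choice between the two rules lives outside the state: it is made by whoever constructs the sequence of rule applications, which mirrors the role of the proof in the text.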

## Second try: source code determining action

The above analysis is utterly trivial. What makes the 5 and 10 problem nontrivial is naturalizing it, to the point where the agent is a causal entity similar to the environment. One way to model the agent being a causal entity is to assume that it has source code.

Let “M” be a Turing machine specification. Let “Ret(M, x)” represent the proposition that M returns x. Note that, if M never halts, then Ret(M, x) is not true for any x.

How do we model the fact that the agent’s action is produced by a computer program? What we would like to be able to assume is that the agent’s action is equal to the output of some machine M. To do this, we need to augment the TakeFive/TakeTen actions to yield additional data:

TakeFive : Start → End ⊗ $5 ⊗ ITookFive

TakeTen : Start → End ⊗ $10 ⊗ ITookTen

The ITookFive / ITookTen propositions are a kind of token assuring that the agent (“I”) took five or ten. (Both of these are interpreted as classical propositions, so they may be duplicated or deleted freely).

How do we relate these propositions to the source code, M? We will say that M must agree with whatever action the agent took:

MachineFive : ITookFive → Ret(M, “Five”)

MachineTen : ITookTen → Ret(M, “Ten”)

These operations yield, from the fact that “I” have taken five or ten, that the source code “M” eventually returns a string identical with this action. Thus, these encode the assumption that “my source code is M”, in the sense that my action always agrees with M’s.

Operationally speaking, after the agent has taken 5 or 10, the agent can be assured of the mathematical fact that M returns the same action. (This is relevant in more complex decision problems, such as twin prisoner’s dilemma, where the agent’s utility depends on mathematical facts about what values different machines return)

Importantly, the agent can’t use MachineFive/MachineTen to know what action M takes before actually taking the action. Otherwise, the agent could take the opposite of the action they know they will take, causing a logical inconsistency. The above construction would not work if the machine were only run for a finite number of steps before being forced to return an answer; that would lead to the agent being able to know what action it will take, by running M for that finite number of steps.

This model naturally handles cases where M never halts; if the agent never executes either TakeFive or TakeTen, then it can never execute either MachineFive or MachineTen, and so cannot be assured of Ret(M, x) for any x; indeed, if the agent never takes any action, then Ret(M, x) isn’t true for any x, as that would imply that the agent eventually takes action x.
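A rough Python sketch of the second-try setup follows. It simplifies heavily and is an assumption-laden illustration rather than a faithful semantics: linear resources are modeled as a plain set (ignoring multiplicity), classical facts as a second set that is never consumed, and the names `ACTIONS`, `MACHINE_RULES`, `act`, and `derivable_ret_facts` are invented here. The point it shows is that no Ret(M, x) fact is obtainable until an action has actually been taken.

```python
# Hypothetical encoding of the second-try setup. Linear resources are consumed;
# classical facts (ITookFive, Ret(...)) persist once derived.

ACTIONS = {
    "TakeFive": {"yields": {"End", "$5"}, "token": "ITookFive"},
    "TakeTen": {"yields": {"End", "$10"}, "token": "ITookTen"},
}
MACHINE_RULES = {  # stand-ins for MachineFive / MachineTen
    "ITookFive": 'Ret(M, "Five")',
    "ITookTen": 'Ret(M, "Ten")',
}

def derivable_ret_facts(facts: set) -> set:
    """Ret(M, x) facts obtainable from the classical tokens held so far."""
    return {MACHINE_RULES[t] for t in facts if t in MACHINE_RULES}

def act(linear: set, facts: set, action: str):
    """Consume Start, yield the action's resources and its classical token."""
    if "Start" not in linear:
        raise ValueError("Start has already been consumed")
    spec = ACTIONS[action]
    return (linear - {"Start"}) | spec["yields"], facts | {spec["token"]}

linear, facts = {"Start"}, set()
print(derivable_ret_facts(facts))  # nothing is known about M before acting
linear, facts = act(linear, facts, "TakeTen")
print(derivable_ret_facts(facts))  # only now is a Ret fact available
```

If `act` is never called, `derivable_ret_facts` stays empty forever, matching the non-halting case described above.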

## Interpreting the counterfactuals

At this point, it’s worth discussing the sense in which counterfactuals do or do not exist. Let’s first discuss the simpler case, where there is no assumption about source code.

First, from the perspective of the logic itself, only one of TakeFive or TakeTen may be evaluated. There cannot be both a fact of the matter about what happens if the agent takes five, and a fact of the matter about what happens if the agent takes ten. This is because even defining both facts at once requires re-using the Start proposition.

So, from the perspective of the logic, there aren’t counterfactuals; only one operation is actually run, and what “would have happened” if the other operation were run is undefinable.

On the other hand, there is an important sense in which the proof system contains counterfactuals. In constructing a linear logic proof, different choices may be made. Given “Start” as an assumption, I may prove “End ⊗ $5” by executing TakeFive, or “End ⊗ $10” by executing TakeTen, but not both.

Proof systems are, in general, systems of rules for constructing proofs, which leave quite a lot of freedom in which proofs are constructed. By the Curry-Howard isomorphism, the freedom in how the proofs are constructed corresponds to freedom in how the agent behaves in the real world; using TakeFive in a proof has the effect, if executed, of actually (irreversibly) taking the $5 bill. So, we can say, by reasoning about the proof system, that if TakeFive is run, then $5 will be yielded, and if TakeTen is run, then $10 will be yielded, and only one of these may be run.

The logic itself says there can’t be a fact of the matter about both what happens if 5 is taken and if 10 is taken. On the other hand, the proof system says that both proofs that get $5 by taking 5, and proofs that get $10 by taking 10, are possible.

How to interpret this difference? One way is by asserting that the logic is about the territory, while the proof system is about the map; so, counterfactuals are represented in the map, even though the map itself asserts that there is only a singular territory.

And, importantly, the map doesn’t represent the entire territory; it’s a proof system for reasoning about the territory, not the territory itself. The map may, thus, be “looser” than the territory, allowing more possibilities than could possibly be actually realized.

What prevents the map from drawing out logical implications to the point where it becomes clear that only one action may possibly be taken? Given the second-try setup, the agent simply cannot use the fact of their source code being M, until actually taking the action; thus, no amount of drawing implications can conclude anything about the relationship between M and the agent’s action. In addition to this, reasoning about M itself becomes harder the longer M runs, i.e. the longer the agent is waiting to make the decision; so, simply reasoning about the map, without taking actions, need not conclude anything about which action will be taken, leaving both possibilities live until one is selected.

## Conclusion

This approach aligns significantly with the less-formal descriptions given of subjective implication decision theory and counterfactual nonrealism. Counterfactuals aren’t real in the sense that they are definable after having taken the relevant action; rather, an agent in a state of uncertainty about which action it will take may consider multiple possibilities as freely selectable, even if they are assured that their selection will be equal to the output of some computer program.

The linear logic formalization increases my confidence in this approach, by providing a very precise notion of the sense in which the counterfactuals do and don’t exist, which would be hard to make precise without similar formalism.

I am, at this point, less worried about the problems with counterfactual nonrealism (such as global accounting) than I was when I wrote the post, and more worried about the problems of policy-dependent source code (which requires the environment to be an ensemble of deterministic universes, rather than a single one), such that I have updated towards counterfactual nonrealism as a result of this analysis, although I am still not confident.

Overall, I find linear logic quite promising for modeling embedded decision problems from the perspective of an embedded agent, as it builds critical facts such as non-reversibility into the logic itself.

## Appendix: spurious counterfactuals

The following describes the problem of spurious counterfactuals in relation to the model.

Assume the second-try setup. Suppose the agent becomes assured that Ret(M, “Five”); that is, that M returns the action “Five”. From this, it is provable that the agent may, given Start, attain the linear logic proposition 0, by taking action “Ten” and then running MachineTen to get Ret(M, “Ten”), which yields inconsistency with Ret(M, “Five”). From 0, anything follows, e.g. $1000000, by the principle of explosion.

If the agent is maximizing guaranteed utility, then they will take the $10 bill, to be assured of the highest utility possible. So, it cannot be the case that the agent can be correctly assured that they will take action five, as that would lead to them taking a different action. If, on the other hand, the agent would have provably taken the $5 bill upon receiving the assurance (say, because they notice that taking the $10 bill could result in the worst possible utility), then there is a potential issue with this assurance being a self-fulfilling prophecy. But, if the agent is constructing proofs (plans for action) so as to maximize guaranteed utility, this will not occur.

This solution is essentially the same as the one given in the paper on UDT with a known search order.

# Topological metaphysics: relating point-set topology and locale theory

The following is an informal exposition of some mathematical concepts from Topology via Logic, with special attention to philosophical implications. Those seeking more technical detail should simply read the book.

There are, roughly, two ways of doing topology:

• Point-set topology: Start with a set of points. Consider a topology as a set of subsets of these points which are “open”, where open sets must satisfy some laws.
• Locale theory: Start with a set of opens (similar to propositions), which are closed under some logical operators (especially and and or), and satisfy logical relations.

What laws are satisfied?

• For point-set topology: The empty set and the full set must both be open; finite intersections and arbitrary (including infinite) unions of opens must be open.
• For locale theory: “True” and “false” must be opens; the opens must be closed under finite “and” and arbitrary (including infinite) “or”; and some logical equivalences must be satisfied, such that “and” and “or” work as expected.
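For a finite set of points, the point-set laws above can be checked mechanically. The following is a minimal sketch; the function name `is_topology` and the example collections are assumptions made for illustration.

```python
from itertools import combinations

def is_topology(points: frozenset, opens: set) -> bool:
    """Check the open-set laws for a finite collection of subsets of `points`."""
    if not all(u <= points for u in opens):
        return False  # every open must be a subset of the point set
    if frozenset() not in opens or points not in opens:
        return False  # the empty set and the full set must be open
    # On a finite collection, pairwise closure implies closure under all
    # finite intersections and all unions.
    return all(
        (u & v) in opens and (u | v) in opens
        for u, v in combinations(opens, 2)
    )

X = frozenset({0, 1})
sierpinski = {frozenset(), frozenset({1}), X}
not_a_topology = {frozenset(), frozenset({0}), frozenset({1})}  # missing X

print(is_topology(X, sierpinski))
print(is_topology(X, not_a_topology))
```

The infinite case cannot be checked by enumeration like this; the sketch only illustrates what the laws say.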

Roughly, open sets and opens both correspond to verifiable propositions. If X and Y are both verifiable, then both “X or Y” and “X and Y” are verifiable; and, indeed, even countably infinite disjunctions of verifiable statements are verifiable, by exhibiting the particular statement in the disjunction that is verified as true.
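The claim about countable disjunctions can be sketched as a dovetailed search. The modeling is an assumption made here, not something from the book: statement i is represented by a function that takes a step budget and returns True once the statement has been verified within that budget. The search halts exactly when some disjunct gets verified, and learns nothing otherwise.

```python
from itertools import count

def verify_disjunction(statements, max_rounds=None):
    """Dovetail over statements; return the index of one that gets verified.

    `statements(i)` returns a function taking a step budget and returning
    True once statement i has been verified within that budget.
    """
    rounds = count(1) if max_rounds is None else range(1, max_rounds + 1)
    for k in rounds:
        for i in range(k):
            if statements(i)(k):
                return i
    return None  # not verified within the round limit; says nothing about falsity

# Hypothetical family: only statement 7 is ever verified, after 49 steps.
def statements(i):
    return lambda budget: i == 7 and budget >= i * i

print(verify_disjunction(statements))
```

The optional `max_rounds` cutoff exists only so that the sketch can be run on an unverifiable family without looping forever.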

What’s the philosophical interpretation of the difference between point-set topology and locale theory, then?

• Point-set topology corresponds to the theory of possible worlds. There is a “real state of affairs”, which can be partially known about. Open sets are “events” that are potentially observable (verifiable). Ontology comes before epistemology. Possible worlds are associated with classical logic and classical probability/utility theory.
• Locale theory corresponds to the theory of situation semantics. There are facts that are true in a particular situation, which have logical relations with each other. The first three lines of Wittgenstein’s Tractatus Logico-Philosophicus are: “The world is everything that is the case. / The world is the totality of facts, not of things. / The world is determined by the facts, and by these being all the facts.” Epistemology comes before ontology. Situation semantics is associated with intuitionist logic and Jeffrey-Bolker utility theory (recently discussed by Abram Demski).

Thus, they correspond to fairly different metaphysics. Can these different metaphysics be converted to each other?

• Converting from point-set topology to locale theory is easy. The opens are, simply, the open sets; their logical relations (and/or) are determined by set operations (intersection/union). They automatically satisfy the required laws.
• To convert from locale theory to point-set topology, construct possible worlds as sets of opens (which must be logically coherent, e.g. the set of opens can’t include “A and B” without including “A”), which are interpreted as the set of opens that are true of that possible world. The open sets of the topology correspond with the opens, as sets of possible worlds which contain the open.
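The second conversion can be tried out by brute force on a tiny example. In the sketch below, “logically coherent” is made precise in one standard way for the finite case (a prime filter): “true” holds, “false” does not, “A and B” holds exactly when both hold, and “A or B” holds exactly when at least one holds. The function name `coherent_worlds` and the choice of the Sierpinski space as the example are assumptions made for illustration.

```python
from itertools import combinations

def coherent_worlds(opens: set, top: frozenset) -> list:
    """Enumerate coherent sets of opens (prime filters) of a finite frame."""
    bottom = frozenset()
    candidates = list(opens)
    worlds = []
    for r in range(len(candidates) + 1):
        for chosen in combinations(candidates, r):
            world = set(chosen)
            if top not in world or bottom in world:
                continue  # "true" must hold and "false" must not
            ok = all(
                # "A and B" holds exactly when both A and B hold
                ((u & v) in world) == (u in world and v in world)
                # "A or B" holds exactly when A holds or B holds
                and ((u | v) in world) == (u in world or v in world)
                for u in opens for v in opens
            )
            if ok:
                worlds.append(frozenset(world))
    return worlds

X = frozenset({0, 1})
opens = {frozenset(), frozenset({1}), X}  # Sierpinski space, read as a locale
worlds = coherent_worlds(opens, X)
# Each open becomes the set of worlds in which it is true.
extent = {u: frozenset(w for w in worlds if u in w) for u in opens}
print(len(worlds))
```

In this example the recovered worlds coincide with the neighborhood sets of the two original points, so the round trip loses nothing; that is not guaranteed for every starting topology.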

From assumptions about possible worlds and possible observations of it, it is possible to derive a logic of observations; from assumptions about the logical relations of different propositions, it is possible to consider a set of possible worlds and interpretations of the propositions as world-properties.

Metaphysically, we can consider point-set topology as ontology-first, and locale theory as epistemology-first. Point-set topology starts with possible worlds, corresponding to Kantian noumena; locale theory starts with verifiable propositions, corresponding to Kantian phenomena.

While the interpretation of a given point-set topology as a locale is trivial, the interpretation of a locale theory as a point-set topology is less so. What this construction yields is a way of getting from observations to possible worlds. From the set of things that can be known (and knowable logical relations between these knowables), it is possible to conjecture a consistent set of possible worlds and ways those knowables relate to the possible worlds.

Of course, the true possible worlds may be finer-grained than this consistent set; however, they cannot be coarser-grained, or else the same possible world would result in different observations. No finer potentially-observable (verifiable or falsifiable) distinctions may be made between possible worlds than the ones yielded by this transformation; making finer distinctions risks positing unreferenceable entities in a self-defeating manner.

How much extra ontological reach does this transformation yield? If the locale has a countable basis, then the point-set topology may have an uncountable point-set (specifically, of the same cardinality as the reals). The continuous can, then, be constructed from the discrete, as the underlying continuous state of affairs that could generate any given possibly-infinite set of discrete observations.

In particular, the reals may be constructed from a locale based on open intervals whose beginning/end are rational numbers. That is: a real r may be represented as a set of (a, b) pairs where a and b are rational, and a < r < b. The locale whose basis is rational-delimited open intervals (whose elements are countable unions of such open intervals, and which specifies logical relationships between them, e.g. conjunction) yields the point-set topology of the reals. (Note that, although including all countable unions of basis elements would make the locale uncountable, it is possible to weaken the notion of locale to only require unions of recursively enumerable sets, which preserves countability)
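As a concrete instance, the real number √2 can be handled purely through its membership test on rational-delimited intervals, without ever writing it down as a decimal. The sketch below uses exact rational arithmetic; the helper names `sqrt2_in` and `narrow` are assumptions introduced for illustration.

```python
from fractions import Fraction

def sqrt2_in(a: Fraction, b: Fraction) -> bool:
    """Decide a < sqrt(2) < b using only rational arithmetic."""
    below = a < 0 or a * a < 2   # a < sqrt(2)
    above = b > 0 and b * b > 2  # sqrt(2) < b
    return below and above

def narrow(a: Fraction, b: Fraction, steps: int):
    """Bisect a rational interval containing sqrt(2), keeping sqrt(2) inside."""
    for _ in range(steps):
        mid = (a + b) / 2
        # mid is rational, so mid * mid never equals 2
        a, b = (mid, b) if sqrt2_in(mid, b) else (a, mid)
    return a, b

lo, hi = narrow(Fraction(1), Fraction(2), 20)
print(float(lo), float(hi))
```

Each interval produced is one of the (a, b) pairs in the set representing the real, and the intervals can be made as narrow as desired, which is the sense in which discrete observations pin down a point of the continuum.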

If metaphysics may be defined as the general framework bridging between ontology and epistemology, then the conversions discussed provide a metaphysics: a way of relating that-which-could-be to that-which-can-be-known.

I think this relationship is quite interesting and clarifying. I find it useful in my own present philosophical project, in terms of relating subject-centered epistemology to possible centered worlds. Ontology can reach further than epistemology, and topology provides mathematical frameworks for modeling this.

That this construction yields continuous from discrete is an added bonus, which should be quite helpful in clarifying the relation between the mental and physical. Mental phenomena must be at least partially discrete for logical epistemology to be applicable; meanwhile, physical theories including Newtonian mechanics and standard quantum theory posit that physical reality is continuous, consisting of particle positions or a wave function. Thus, relating discrete epistemology to continuous ontology is directly relevant to philosophy of science and theory of mind.