While writing another document, I noticed I kept referring to Kantian concepts. Since most people haven’t read Kant, that would lead to interpretation problems by default. I’m not satisfied with any summary out there for the purpose of explaining Kantian concepts as I understand them. This isn’t summarizing the work as a whole given I’m focusing on the parts that I actually understood and continue to find useful.
I will refer to computer science and statistical concepts, such as Bayesianism, Solomonoff induction, and AI algorithms. Different explainers are, of course, appropriate to different audiences.
Last year I had planned on writing a longer explainer (perhaps chapter-by-chapter), but that became exhausting due to the length of the text. So I’ll instead focus on what has stuck after a year and that I keep wanting to refer to. These are mostly concepts from the first third of the work.
This document is structured similarly to a glossary, explaining concepts and how they fit together.
Kant himself notes that the Critique of Pure Reason is written in a dry and scholastic style, with few concrete examples, and therefore “could never be made suitable for popular use”. Perhaps this explainer will help.
We are compelled to reason about questions we cannot answer, like whether the universe is finite or infinite, or whether god(s) exist. There is an “arena of endless contests” between different unprovable assumptions, called Metaphysics.
Metaphysics, once the “queen of all the sciences”, has become unfashionable due to lack of substantial progress.
Metaphysics may be categorized as dogmatic, skeptical, or critical:
- Dogmatic metaphysics makes and uses unprovable assumptions about the nature of reality.
- Skeptical metaphysics rejects all unprovable assumptions, in the process ceasing to know much at all.
- Critical metaphysics is what Kant seeks to do: find the boundaries of what reason can and cannot know.
Kant is trying to be comprehensive, so that “there cannot be a single metaphysical problem that has not been solved here, or at least to the solution of which the key has not been provided.” A bold claim. But this project doesn’t require extending knowledge past the limits of possible experience, just taking an “inventory of all we possess through pure reason, ordered systematically”.
The Copernican revolution in philosophy
Kant compares himself to Copernicus; the Critique of Pure Reason is commonly referred to as a Copernican revolution in philosophy. Instead of conforming our intuition to objects, we note that objects as we experience them must conform to our intuition (e.g. objects appear in the intuition of space). This is sort of a reverse Copernican revolution; Copernicus zooms out even further from “the world (Earth)” to “the sun”, while Kant zooms in from “the world” to “our perspective(s)”.
Phenomena and noumena
Phenomena are things as they appear to us; noumena are things as they are in themselves (or “things in themselves”). Rational cognition can only know things about phenomena, not noumena. “Noumenon” is essentially a limiting negative concept, constituting any remaining reality other than what could potentially appear to us.
Kant writes: “this conception [of the noumenon] is necessary to restrain sensuous intuition within the bounds of phenomena, and thus to limit the objective validity of sensuous cognition; for things in themselves, which lie beyond its province, are called noumena for the very purpose of indicating that this cognition does not extend its application to all that the understanding thinks. But, after all, the possibility of such noumena is quite incomprehensible, and beyond the sphere of phenomena, all is for us a mere void… The conception of a noumenon is therefore merely a limitative conception and therefore only of negative use. But it is not an arbitrary or fictitious notion, but is connected with the limitation of sensibility, without, however, being capable of presenting us with any positive datum beyond this sphere.”
It is a “problematical” concept; “the class of noumena have no determinate object corresponding to them, and cannot therefore possess objective validity”; it is more like a directional arrow in the space of ontology than like any particular thing within any ontology. Science progresses in part by repeatedly pulling the rug on the old ontology, “revealing” a more foundational layer underneath (a Kuhnian “paradigm shift”), which may be called more “noumenal” than the previous layer, but which is actually still phenomenal, in that it is cognizable through the scientific theory and corresponds to observations; “noumena”, after the paradigm shift, is a placeholder concept that any future paradigm shifts can fill in with their new “foundational” layer.
Use of the word “noumenon” signals a kind of humility, of disbelieving that we have access to “the real truth”, while being skeptical that anyone else does either.
In Bayesianism, roughly, the “noumenon” is specified by the hypothesis, while the “phenomena” are the observations. Assume for now the Bayesian observation is a deterministic function of the hypothesis; then, multiple noumena may correspond to a single phenomenon. Bayesianism allows for gaining information about the noumenon from the phenomenon. However, all we learn is that the noumenon is some hypothesis which corresponds to the phenomenon; in the posterior distribution, the hypotheses compatible with the observations maintain the same probabilities relative to each other that they did in the prior distribution.
(In cases where the observation is not a deterministic function of the hypothesis, as in the standard Bayes’ Rule, consider replacing “hypothesis” above with the “(hypothesis, observation)” ordered pair.)
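A minimal sketch of the deterministic case may help here. The hypothesis names, prior weights, and observations are made up for illustration; the point is only that conditioning removes incompatible hypotheses and leaves the ratios among the remaining ones unchanged.

```python
from fractions import Fraction

# Hypothetical toy setup: each hypothesis ("noumenon") deterministically
# produces exactly one observation ("phenomenon").
prior = {
    "h1": Fraction(1, 2),
    "h2": Fraction(1, 4),
    "h3": Fraction(1, 8),
    "h4": Fraction(1, 8),
}
observation_of = {"h1": "red", "h2": "blue", "h3": "blue", "h4": "red"}


def posterior(prior, observation_of, seen):
    """Condition on an observation that is a deterministic function of the hypothesis."""
    # Keep only the hypotheses compatible with what was seen, then renormalize.
    compatible = {h: p for h, p in prior.items() if observation_of[h] == seen}
    total = sum(compatible.values())
    return {h: p / total for h, p in compatible.items()}


post = posterior(prior, observation_of, "blue")
print(post["h2"], post["h3"])            # 2/3 1/3
print(prior["h2"] / prior["h3"])         # 2
print(post["h2"] / post["h3"])           # 2 (same ratio as in the prior)
```

Seeing “blue” tells us the noumenon is h2 or h3, but nothing about which of the two beyond what the prior already said.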
In Solomonoff Induction, there is only a limited amount we can learn about the “noumenon” (stochastic Turing machine generating our observations + its bits of stochasticity), since there exist equivalent Turing machines.
A priori and a posteriori
A priori refers to the epistemic state possessed before taking in observations. In Bayesianism this is the P(X) operator unconditioned on any observations.
A posteriori refers to the epistemic state possessed after taking in observations. In Bayesianism this is P(X | O) where O refers to past observation(s) made by an agent, which may be numbered to indicate time steps, as in a POMDP.
While Kant and Hume agree that we can’t learn about universal laws from experience (due to Hume’s problem of induction), Hume concludes that this means we don’t know about universal laws, while Kant instead argues that our knowledge about universal laws must involve a priori judgments, e.g. geometric or arithmetic judgments. (One man’s modus ponens is another’s modus tollens…)
Analytic and synthetic
Analytic propositions are ones that can be verified as true by expanding out definitions and doing basic formal operations. A common example is “All bachelors are unmarried”, which can be verified by replacing “bachelor” with “unmarried man”.
Synthetic propositions can’t be verified as true this way, e.g. “All bachelors are alone”. They can be true regardless. Causal judgments are synthetic: we can’t get the Principle of Sufficient Reason analytically.
Contemporary STEM people are likely to think something like this at this point: “Doesn’t that just mean analytic propositions are mathematical, and synthetic propositions are empirical/scientific?”
An immediate problem with this account: Kant doesn’t think geometric propositions are analytic. Consider the proposition “A square is equal to itself when turned 90 degrees on its center”. It’s not apparent how to verify the proposition as true by properly defining “square” and so on, and doing basic logical/textual transformations. Instead we verify it by relating it to possible experience, imagining a rotating square in the visual field.
From a geometrical proof that a square is equal to itself when turned 90 degrees on its center, a prediction about possible experience can be derived, namely, that turning a square piece of wood 90 degrees by its center results in a wood square having the same shape and location in the visual field as it did previously. Mathematics needs to correspond to possible experience to have application to the perceptible outside world.
Kant doesn’t even think arithmetic propositions are analytic. To get 2+2=4 from “analytic” operations, we could try defining 2=1+1, 4=1+1+1+1, then observing 2+2=(1+1)+(1+1)=1+1+1+1=4; however, this requires using the associative property of addition. Perhaps there is an alternative way to prove this “analytically”, but neither Kant nor I know of one. Instead we can verify addition by, synthetically, corresponding numbers to our fingers, which “automatically” get commutative/associative properties.
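To make the role of associativity concrete, here is a rough sketch that treats sums as purely textual objects (nested tuples). The left-nested bracketing chosen for 4 is my assumption, since “1+1+1+1” doesn’t specify one. Substituting definitions alone leaves two differently bracketed expressions; they only match once the bracketing is thrown away, which is the associative step.

```python
# Expressions are nested tuples ("+", left, right) over the literal 1.
ONE = 1
TWO = ("+", ONE, ONE)                            # 2 := 1+1
FOUR = ("+", ("+", ("+", ONE, ONE), ONE), ONE)   # 4 := ((1+1)+1)+1 (assumed bracketing)
TWO_PLUS_TWO = ("+", TWO, TWO)                   # 2+2 expands to (1+1)+(1+1)


def flatten(expr):
    """Discard the bracketing, i.e. assume that addition is associative."""
    if expr == ONE:
        return [ONE]
    _, left, right = expr
    return flatten(left) + flatten(right)


# Definition expansion alone does not make the two expressions identical...
print(TWO_PLUS_TWO == FOUR)                      # False
# ...but they agree once bracketing is ignored.
print(flatten(TWO_PLUS_TWO) == flatten(FOUR))    # True
```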
The synthetic a priori
Besides the issue that math relates to possible experience, another problem with “analytic = mathematical” is that, as Kant argues, some propositions are both synthetic and a priori, and “the real problem of pure reason” is how we can know such propositions.
Here’s an argument for this. Suppose we first observe O and then conclude that P is true. If we’re reasoning validly, P is true a posteriori (relative to O). But this whole thought experiment pre-supposes that there is a time-structure in which we first see O and then we make a judgment about P. This time-structure is to some degree present even before seeing O, such that O can be evidence about P.
Imagine trying to argue that it’s raining outside to a person who doesn’t believe in time (including their own memory). Since you’re both inside and there are no windows, they can’t see that it’s raining. You try to argue that you were previously both outside and saw that it was raining. But they think there’s no such thing as “the past” so this is not convincing.
To make the argument to them successfully, their mind has to already implement certain dynamics even before receiving observations.
A baby is already born with cognitive machinery; it can’t “learn” all of that machinery from data, since the process of learning itself already requires this machinery to be present in order to structure observations and relate them to future ones. (In some sense there was no cognition prior to abiogenesis, though; there is a difference between the time ordering of science and of cognitive development.)
In Bayesianism, to learn P from O, it must be the case a priori that P is correlated with O. This correlational structure could be expressed as a Bayesian network. This network would encode an a priori assumption about how P and O are correlated.
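As a small illustration, consider two joint priors over a pair of binary variables P and O. The numbers are invented; both priors give P a probability of 1/2, but only one of them correlates P with O. Observing O moves the belief in P only under the correlated prior.

```python
from fractions import Fraction


def prob_p_given_o(joint, o_value):
    """Return P(P=True | O=o_value) from a joint table keyed by (p, o)."""
    numerator = joint[(True, o_value)]
    denominator = joint[(True, o_value)] + joint[(False, o_value)]
    return numerator / denominator


# Hypothetical prior in which P and O tend to agree.
correlated = {
    (True, True): Fraction(4, 10), (True, False): Fraction(1, 10),
    (False, True): Fraction(1, 10), (False, False): Fraction(4, 10),
}
# Hypothetical prior with the same marginals but no correlation.
independent = {key: Fraction(1, 4) for key in correlated}

print(prob_p_given_o(correlated, True))    # 4/5: seeing O is evidence for P
print(prob_p_given_o(independent, True))   # 1/2: seeing O teaches nothing about P
```

The correlation itself is part of the prior in both cases; no observation of O alone could install it.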
Solomonoff induction doesn’t encode a fixed network structure between its observations; instead it uses a mixture model over all stochastic Turing machines. However, all these machines have something in common: they’re all stochastic Turing machines producing an output stream of bits. Solomonoff induction assumes a priori “my observations are generated by a stochastic Turing machine”; it doesn’t learn this from data.
One could try pointing to problems with this argument, e.g. perhaps “there is time” isn’t a valid proposition, and time is a non-propositional structure in which propositions exist. But now that I just wrote that, it seems like I’m asserting a proposition about time to be true, in a contradictory fashion. The English language is more reflective than the language of Bayesian networks, allowing statements about the structure of propositions to themselves be propositions, as if the fact of the Bayesian network being arranged a particular way were itself represented by an assignment of a truth value to a node in that same Bayesian network.
Philosophers today call Kant’s philosophy “transcendental idealism”. Kant himself uses the word “transcendental” to refer to cognitions about how cognitions are possible a priori.
This is in part an archaeological process. We see, right now, that we live in a phenomenal world that is approximately Euclidean. Was our phenomenal world always Euclidean, or was it non-Euclidean at some point and then switched over to Euclidean, or is time not enough of a real thing for this to cover all the cases? This sort of speculation about what the a priori empty mind is, from our a posteriori sense of the world, is transcendental.
One angle on the transcendental is, what else has to be true for the immediate (immanent) experience you are having right now to be what it is? If you are seeing a chair, that implies that chairs exist (at least as phenomena); if you see the same chair twice, that means that phenomenal objects can re-occur at different times; and so on.
The transcendental aesthetic
Aesthetic means sense. The transcendental aesthetic therefore refers to the a priori cognitive structures necessary for us to have sensation.
Mainly (Kant argues) these are space and time. I often call these “subjective space”, “subjective time”, “subjective spacetime”, to emphasize that they are phenomenal and agent-centered.
Most of our observations appear in space, e.g. visual input, emotional “felt senses” having a location in the body. To some extent we “learn” how to see the world spatially, however some spatial structures are hardwired (e.g. the visual cortex). Image processing AIs operate on spatial images stored as multidimensional arrays; arbitrarily rearranging the array would make some algorithms (such as convolutional neural networks) operate worse, indicating that the pre-formatting of data into a spatial array before it is fed into the algorithm is functional.
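Here is a crude sketch of why the array layout matters. It is not a convolutional network, just a single local difference filter over a made-up 8×8 image, but it relies on the same thing convolutions do: that neighboring array cells are neighboring points in space. The image, the filter, and the fixed shuffle seed are all illustrative assumptions.

```python
import random

SIZE = 8
# Left half dark (0), right half bright (1): one clean vertical edge.
image = [[0] * (SIZE // 2) + [1] * (SIZE // 2) for _ in range(SIZE)]


def edge_responses(img):
    """Count horizontally adjacent pixel pairs that differ (a crude local edge filter)."""
    return sum(
        abs(row[x + 1] - row[x])
        for row in img
        for x in range(len(row) - 1)
    )


def shuffle_pixels(img, seed=0):
    """Rearrange all pixels with one fixed, arbitrary permutation."""
    flat = [pixel for row in img for pixel in row]
    random.Random(seed).shuffle(flat)
    width = len(img[0])
    return [flat[i:i + width] for i in range(0, len(flat), width)]


scrambled = shuffle_pixels(image)
print(edge_responses(image))      # 8: one response per row, all along the edge
print(edge_responses(scrambled))  # far more responses, scattered with no structure
```

The scrambled array contains exactly the same pixel values, so what the filter loses is purely the spatial pre-formatting.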
If space weren’t a priori then we couldn’t become fully confident of geometrical laws such as “a square turned 90 degrees about its center is the same shape”, we’d have to learn these laws from experience, running into Hume’s problem of induction.
There is only one space, since when attempting to imagine two spaces, one is putting them side by side; there must be some outermost container.
Space is infinite, unbounded. This doesn’t imply that the infinity is all represented, just that the concept allows for indefinite extension. Finite space can be derived by adding a bound to infinite space; this is similar to Spinoza’s approach to finitude in the Ethics.
Space isn’t a property of things in themselves, it’s a property of phenomena, things as they relate to our intuition. When formalizing mathematical space, points are assigned coordinates relative to the (0, 0) origin. We always intuit objects relative to some origin, which may be near the eyes or head. At the same time, space is necessary for objectivity; without space, there is no idea of external objects.
Our intuitions about space can only get universal geometric propositions if these propositions describe objects as they must necessarily appear in our intuition, not if they are describing arbitrary objects even as they may not appear in our intuition. As a motivating intuition, consider that non-Euclidean geometry is mathematically consistent; if objective space were non-Euclidean, then our Euclidean intuitions would not yield universally valid geometric laws about objective space. (As it turns out, contemporary physics theories propose that space is non-Euclidean.)
We also see observations extending over time. Over short time scales there is a sense of continuity in time; over long time scales we have more discrete “memories” that refer to previous moments, making those past moments relevant to the present. The structuring of our experiences over time is necessary for learning, otherwise there wouldn’t be a “past” to learn from. AIs are, similarly, fed data in a pre-coded (not learned) temporal structure, e.g. POMDP observations in a reinforcement learning context.
The time in which succession takes place is, importantly, different from objective clock time, though (typically) these do not disagree about ordering, only pacing. For example, there is usually only a small amount of time remembered during sleep, relative to the objective clock time that passes during sleep. (The theory of relativity further problematizes “objective clock time”, so that different clocks may disagree about how much time has passed.)
We may, analogously, consider the case of a Solomonoff inductor that is periodically stopped and restarted as a computer process; while the inductor may measure subjective time by number of observation bits, this does not correspond to objective clock time, since a large amount of clock time may pass between when the inductor is stopped and restarted.
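A toy version of this, with a hypothetical `Inductor` class that does no actual induction and only counts the bits it is fed. The event log and its timestamps are invented; the process is imagined to be paused between clock time 2 and clock time 1000.

```python
from dataclasses import dataclass, field


@dataclass
class Inductor:
    """Hypothetical stand-in for an inductor: it only records observation bits."""
    bits: list = field(default_factory=list)

    def observe(self, bit):
        self.bits.append(bit)

    @property
    def subjective_time(self):
        # Subjective time is just the number of observation bits seen so far.
        return len(self.bits)


# (clock_time_in_seconds, bit) pairs; the large gap is the pause.
events = [(0, 1), (1, 0), (2, 1), (1000, 1), (1001, 0)]

inductor = Inductor()
for _clock_time, bit in events:
    inductor.observe(bit)  # the clock reading is never shown to the inductor

clock_elapsed = events[-1][0] - events[0][0]
print(inductor.subjective_time)  # 5
print(clock_elapsed)             # 1001
```

The ordering of observations agrees with the clock, but the pacing does not: the pause is invisible from the inside.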
Kant writes, “Different times are only parts of one and the same time”. Perhaps he is, here, too quick to dismiss non-linear forms of time; perhaps our future will branch into multiple non-interacting timelines, and perhaps this has happened in the past. One especially plausible nonlinear timelike structure is a directed acyclic graph. Still, DAGs have an order; despite time being nonlinear, it still advances from moment to moment. It is also possible to arbitrarily order a DAG through a topological sort, so the main relevant difference is that DAGs may drop this unnecessary ordering information.
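The topological sort point can be sketched with Python’s standard `graphlib` module. The branching timeline below is invented for illustration; each moment maps to the moments directly before it.

```python
from graphlib import TopologicalSorter

# Hypothetical branching timeline: moment -> set of immediately preceding moments.
predecessors = {
    "start": set(),
    "branch_a": {"start"},
    "branch_b": {"start"},
    "a_later": {"branch_a"},
    "b_later": {"branch_b"},
}

# One linear order consistent with the DAG (there are several valid ones).
order = list(TopologicalSorter(predecessors).static_order())
position = {moment: index for index, moment in enumerate(order)}

# Every moment appears after all of its predecessors.
respects_dag = all(
    position[before] < position[moment]
    for moment, befores in predecessors.items()
    for before in befores
)
print(order)
print(respects_dag)  # True
```

Which of the two branches gets listed first is arbitrary, which is exactly the ordering information the DAG itself declines to specify.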
Time is by default boundless but can be bounded, like space.
“Time is nothing other than the form of the inner sense, i.e. of the intuition of our self and our inner state”, in contrast to space, which is the form of the external sense. To give some intuition for this, suppose I have memory of some sequence of parties I have experienced; perhaps the first consists of myself, Bob, and Sally, the second consists of myself, Sally, and David, and the third consists of myself, Bob, and David. What is common between all the parties I remember is that I have been at all of them; this is true for no one other than myself. So, my memory is of situations involving myself; “me” is what is in common between all situations occurring in my subjective timeline.
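The party example amounts to a set intersection, which can be checked directly:

```python
# The three remembered parties from the example above.
parties = [
    {"me", "Bob", "Sally"},
    {"me", "Sally", "David"},
    {"me", "Bob", "David"},
]

# Who was present at every remembered party?
common = set.intersection(*parties)
print(common)  # {'me'}
```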
Since time is the form of the inner sense, it applies to all representations, not only ones concerning outer objects, since all representations are in the mind.
Time is, like space, a condition for objects to appear to us, not a property of things in themselves.
Kant clarifies the way in which we fail to cognize objects in themselves, with the example of a triangle: “if the object (that is, the triangle) were something in itself, without relation to you the subject; how could you affirm that that which lies necessarily in your subjective conditions in order to construct a triangle, must also necessarily belong to the triangle in itself?”
Relational knowledge allows us to know objects as they relate to us, but not as they don’t relate to us. Geometry applies to objects that have locations in spacetime; for objects to appear in subjective spacetime, they must have coordinates relative to the (0, 0) origin, that is, the self; therefore, geometry applies to objects that have locations relative to the self; without a location relative to the self, the object would not appear in subjective spacetime.
It may seem silly to say that this “merely relational” knowledge fails to understand the object in itself; what properties are there to understand other than relational properties? A triangle “in itself” independent of space (which relates the different parts of the triangle to each other) is a rather empty concept.
What is given up on, here, is an absolute reference frame, a “view from nowhere”, from which objects could be conceived of in a way that is independent of all subjects; instead, we attain a view from somewhere, namely, from subjective spacetime.
Einstein’s theory of special relativity also drops the absolute reference frame, however it specifies connections and translations between subjective reference frames in a way that Kant’s theory doesn’t.
Sensibility and understanding
The sensibility is the faculty of passively receiving impressions, which are approximately “raw sense-data”. The understanding is the faculty of spontaneously conceptualizing an object by means of these impressions.
To recognize an object (such as an apple), the mind must do something; with no mental motion, the light pattern of the apple would hit the retina, but no object would be represented accordingly. In general, the understanding synthesizes raw data into a coherent picture.
Manifold of intuition
Without concepts, sense data would be a disorganized flux, like a video of white noise; Kant terms this flux a “manifold of intuition”. When I think of this, I think of a bunch of sheets of space tied together by a (curved) timeline holding them together, with pixel-like content in the space. Judgments, which are propositions about the content of our understanding (e.g. “there is a cat in front of me”), depend on the “unity among our representations”; what is needed is a “higher [representation], which comprehends this and other representations under itself”. To judge that there is a cat in front of me, I must have parsed the manifold into concepts such as “cat” which relate to each other in a logically coherent universe; I cannot make a judgment from un-unified raw pixel-data. AI object recognition is an attempt to artificially replicate this faculty.
Synthesis is the process of “putting different representations together with each other and comprehending their manifoldness in one cognition”. This is an action of the spontaneity of thought, processing the manifold of intuition into a combined representation.
This relates to the phenomenal binding problem: how do we get a sense of a “unified” world from disconnected sensory data?
In Solomonoff Induction, the manifold of intuition would be the raw observations, and the manifold is synthesized by the fact that there is a universal Turing machine producing all the observations with a hidden state. This is the case a priori, not only after seeing particular observations. Similarly with other Bayesian models such as dynamic Bayesian networks; the network structure is prior to the particular observations.
“There is only one experience, in which all perceptions are represented as in thoroughgoing and lawlike connection, just as there is only one space and time…”
Different experiences are connected in a lawlike way, e.g. through causality and through re-occurring objects; otherwise, it would be unclear how to even interpret memories as referring to the same world. The transcendental categories (which are types of judgment) are ways in which different representations may be connected with each other.
Kant gives 12 transcendental categories, meant to be exhaustive. These include: causation/dependence, existence/nonexistence, necessity/contingency, unity, plurality. I don’t understand all of these, and Kant doesn’t go into enough detail to understand all of them. Roughly, these are different ways experiences can connect with each other, e.g. a change in an experienced object can cause a change in another, and two instances of seeing an object can be “unified” in the sense of being recognized as seeing the same object.
A schema (plural schemata) relates the manifold of intuition (roughly, sense data) to transcendental categories or other concepts. As a simple example, consider how the concept of a cat relates to cat-related sense data. The cat has a given color, location, size, orientation, etc, which relate to a visual coordinate system. A cat object-recognizer may recognize not only that a cat exists, but also the location and possibly even the orientation of the cat.
Without schemata, we couldn’t see a cat (or any other object); we’d see visual data that doesn’t relate to the “cat” concept, and separately have a “cat” concept. In some sense the cat is imagined/hallucinated based on the data, not directly perceived: “schema is, in itself, always a mere product of the imagination”. In Solomonoff induction, we could think of a schema as some intermediate data and processing that comes between the concept of a “cat” (perhaps represented as a generative model) and the observational sense data, translating the first to the second by filling in details such as color and location.
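As a very loose sketch of the concept-to-data direction only: below, a hypothetical “concept” is a shape template with no position or color, and a hypothetical `render` function plays the schema’s role by filling in those details to produce pixel-like data. The template, grid size, and parameter names are all made up, and nothing here does recognition.

```python
# Hypothetical "concept": a bare shape with no location or color attached.
CAT_TEMPLATE = ["#.#",
                "###"]


def render(template, top, left, color, height=4, width=6):
    """Fill in location and color to turn the template into a grid of 'pixels'."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for dy, row in enumerate(template):
        for dx, cell in enumerate(row):
            if cell == "#":
                grid[top + dy][left + dx] = color
    return ["".join(row) for row in grid]


for line in render(CAT_TEMPLATE, top=1, left=2, color="o"):
    print(line)
# ......
# ..o.o.
# ..ooo.
# ......
```

The same template rendered with a different `top`, `left`, or `color` gives different sense data for the same concept.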
This applies to more abstract concepts/categories such as “cause” as well. When X causes Y, there is often a spacetime location at which that cause happens, e.g. a moment that one billiard ball hits another.
Kant writes: “Now it is quite clear that there must be some third thing, which on the one side is homogeneous with the category, and with the phenomenon on the other, and so makes the application of the former to the latter possible. This mediating representation must be pure (without any empirical content), and yet must on the one side be intellectual, on the other sensuous. Such a representation is the transcendental schema.”
Schemata are transcendental because they are necessary for some phenomenal impressions, e.g. the impression that a cat is at some location. They are necessary for unifying the manifold of intuition (otherwise, there wouldn’t be an explanation of correlation between different individual pixel-like pieces of sense data).
Consciousness of the self
Kant discusses consciousness of the self: “The consciousness of oneself in accordance with the determinations of our state in internal perception is merely empirical, forever variable; it can provide no standing or abiding self in this stream of inner appearances, and is customarily called inner sense or empirical apperception. That which should necessarily be represented as numerically identical cannot be thought of as such through empirical data. There must be a condition that precedes all experience and makes the latter itself possible, which should make such a transcendental presupposition valid.”
The idea of a lack of a fixed or permanent self in the stream of internal phenomena will be familiar to people who have explored Buddhism. What you see isn’t you; there are phenomena that are representations of the engine of representation, but these phenomena aren’t identical with the engine of representation in which they are represented.
The self is, rather, something taken as “numerically identical with itself” which is a condition that precedes experience. Imagine a sequence of animation frames in a Cartesian coordinate system. In what sense are they part of “the same sequence”? Without knowing more about the sequence, all we can say is that they’re all part of the same sequence (and have an ordering within it); the sameness of the sequence of each frame is “numerical identity” similar to the identity of an object (such as a table) with itself when perceived at different times.
Kant writes: “We termed dialectic in general a logic of appearance.” Dialectic is a play of appearances, claiming to offer knowledge, but instead offering only temporary illusions; different sophists argue us into different conclusions repeatedly, perhaps in a cyclical fashion.
Dialectic is an error it is possible to fall into when reasoning is not connected with possible experience. Kant writes about dialectic in part to show how not to reason. One gets the impression that Kant would have thought Hegel and his followers were wasting their time by focusing so much on dialectic.
The Antinomies of Pure Reason
As an example of dialectic, Kant argues that time and space are finite and that they are infinite; that everything is made of simple parts and that nothing is simple; that causality doesn’t determine everything (requiring spontaneity as an addition) and that it does; that there is an absolutely necessary being and that there isn’t. Each of these pairs of contradictory arguments is an antinomy.
Philosophers have argued about these sorts of questions for millennia without much resolution; it’s possible to find somewhat convincing arguments on both sides, as Kant demonstrates.
Kant writes: “This conception of a sum-total of reality is the conception of a thing in itself, regarded as completely determined; and the conception of an ens realissimum is the conception of an individual being, inasmuch as it is determined by that predicate of all possible contradictory predicates, which indicates and belongs to being.”
Say some objects are cold and some are hot. Well, they still have something in common: they’re all objects. There is a distinction being made (hot/cold), and there is something in common apart from that distinction. We could imagine a single undifferentiated object, that is neither hot nor cold, but which can be modified by making it hot/cold to produce specific objects.
This is similar to Spinoza’s singular infinite substance/God, of which all other (possibly finite) beings are modifications, perhaps made by adding attributes.
The ens realissimum has a similar feel to the Tegmark IV multiverse, which contains all mathematically possible universes in a single being, or a generative grammar of a Turing complete language. It is a common undifferentiated basis for specific beings to be conceptualized within.
Kant considers deriving the existence of a supreme being (God) from the ens realissimum, but the concept is too empty to yield properties attributed to God, such as benevolence, being the intelligent creator of the universe, or providing an afterlife. He goes on to critique all supposed rational proofs of the existence of God, but then says that he posits God and an afterlife because such posits are necessary to believe that the incentive of pleasure-seeking is aligned with acting morally. (Wishful thinking much?)
What is Kant getting at, in all this? I think he is trying to get readers to attend to their experience, the spacetimelike container of this experience, and the way their world-model is constructed out of their experience. For example, the idea that time is the form of the inner sense is apparent from noting that all accessible memories include me, but it’s possible to “forget” about this subjective timeline and instead conceptualize time as observer-independent. The idea that the manifold of intuition must be actively synthesized into a representation containing objects (which is in line with cognitive science) challenges the idea that the world is “given”, that “we” are simply inhabitants of a stable world. The idea of the “noumenon” as a negative, limiting concept points us at our experience (and what our experience could be) as an alternative to interminably angsting about whether what we experience is “really real” or about metaphysical concepts like God, which makes it easier to get on with positivist math, science, economics, and scholarship without worrying too much about its foundations.
The sense I get reading Kant is: “You live in a world of phenomena synthesized by your mind from some external data, and that’s fine, in a sense it’s all you could ever hope for. You have plenty of phenomena and generalities about them to explore, you can even inquire into the foundations of what makes them possible and how your mind generates them (I’ve already done a lot of that for you), but there’s no deep Outside demanding your attention, now go live!”
When I take this seriously I worry about getting lost in my head, and sometimes I do get lost in my head, and the Outside does impinge on my cozy mental playground (demanding my attention, and loosening my mental assumptions structuring the phenomenal world), but things calm after a while and I experience the phenomenal world as orderly once again.