The Unprovability of the Continuum Hypothesis Using the Method of Forcing


U.U.D.M. Project Report 2016:32

Degree project in mathematics, 15 credits. Supervisor: Vera Koponen

Examiner: Jörgen Östensson. June 2016

Department of Mathematics Uppsala University


Alvar Bjerkeng van Keppel


Abstract

The continuum hypothesis is the statement that no set has cardinality strictly greater than that of ℕ and strictly smaller than that of ℝ. We show that if ZFC is consistent, the continuum hypothesis is not provable from ZFC.

This is done using the method of forcing, pioneered by Paul Cohen in 1963. Proofs and concepts are given with lots of detail to make reading as simple as possible.


1 Introduction

This thesis is meant to be a self-contained, gentle and to-the-point introduction to the method of forcing.

The reader is expected to have come into contact with mathematical logic and ZFC set theory. For ease of reading, many basic results in set theory are stated (but not proven) to refresh the reader's memory.

Forcing is a way of expanding a certain kind of model of set theory to a larger model. In this thesis forcing is used to prove that if ZFC is consistent, so is ZFC together with the negation of the continuum hypothesis (CH). We denote this by Cons(ZFC) ⇒ Cons(ZFC + ¬CH). This is one of the two parts in proving that the continuum hypothesis is independent of ZFC. The result was first published by Paul Cohen in his 1963 article “The Independence of the Continuum Hypothesis” [1]. In this article Cohen refers to earlier work by Kurt Gödel from 1940 proving (among other things) the other direction, Cons(ZFC) ⇒ Cons(ZFC + CH). We will only look at the Cons(ZFC) ⇒ Cons(ZFC + ¬CH) direction.

This thesis mainly follows the approach of Kunen's excellent book [3].

2 The Logic of ZFC

The underlying logic of Zermelo-Fraenkel with the axiom of choice (ZFC) is classical first order logic with equality and a binary relation symbol ∈. Thus the formulas will be of the form

(x = y), (x ∈ y), (φ ∧ ψ), (¬φ), (∃x φ)

where x and y are variables and φ, ψ are formulas. In practice we drop parentheses if no clarity is lost.

We let φ ∨ ψ abbreviate the logically equivalent ¬(¬φ ∧ ¬ψ), and similarly for φ → ψ and φ ↔ ψ. In the same vein, ∀x φ abbreviates ¬∃x ¬φ. This is done so that proofs by induction on the structure of formulas have fewer cases. As a final convenience, the symbol ⊥ is used in a few places to denote falsehood in a formula. A possible definition of this symbol is thus ⊥ :⇔ ∃x(x ∈ x ∧ x ∉ x).

At a glance, the language of first order logic does not seem expressive enough, or at least not terse enough, to formulate interesting notions in. We deal with this by defining new notation as shorthand for long formulas. Two examples of this are

subset: x ⊆ y :⇔ ∀z(z ∈ x → z ∈ y)

and

set comprehension: x = {y ∈ Y | φ(y)} :⇔ ∀y(y ∈ x ↔ y ∈ Y ∧ φ(y)).

If the reader wants to refresh his or her memory about the axioms of ZFC, they are included in an appendix at the back.

3 Metatheory and Formalism

As in the vast majority of mathematical works, we will be using mathematical prose as opposed to some formal (and therefore tedious) proof system. We will however implicitly assume that any of our proofs can be translated into a formal proof in some proof system for classical first order logic, except in a few special cases. These cases touch on the two separate types of logic used: on the one hand we have the logic of ZFC, and on the other hand the logic of the metatheory.

One problematic kind of proof is of the form “for all formulas φ₁, . . . , φ_n, X is true” where X depends on φ₁, . . . , φ_n. Since we cannot quantify over formulas within first order logic, such statements and proofs cannot be translated as they stand. However, in the few cases where this comes up, we will see that for each concrete choice of formulas φ₁, . . . , φ_n a formal proof in first order logic can be extracted for X in a straightforward way. A special case of this is proofs by induction on the structure of formulas. From such a proof we can extract a first order proof by manually unwinding the inductive steps until no reference to the induction hypothesis is needed. This is akin to proving P(83) without induction given a proof by induction of ∀n P(n). We then have a proof of P(0) and a proof of ∀n(P(n) → P(n + 1)), where we for simplicity assume no induction is used. By specializing ∀n(P(n) → P(n + 1)) for each n = 0, . . . , 82 and applying modus ponens 83 times, we get a proof of P(83) without using induction.
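The unwinding of an induction proof described above can be made concrete with a small sketch. It is only an illustration: formulas are plain strings, the function name unwind_induction is hypothetical, and nothing here is a real proof checker.

```python
# Illustrative only: "proof lines" are strings, not checked derivations.

def unwind_induction(target: int) -> list[str]:
    """List an induction-free derivation of P(target) from the base case
    P(0) and instances of the step formula forall n (P(n) -> P(n+1))."""
    steps = ["P(0)    [base case]"]
    for n in range(target):
        # Specialize the step formula at the concrete number n.
        steps.append(f"P({n}) -> P({n + 1})    [instance of the step]")
        # Combine it with the previously derived P(n).
        steps.append(f"P({n + 1})    [modus ponens]")
    return steps

proof = unwind_induction(83)
modus_ponens_count = sum("[modus ponens]" in line for line in proof)
print(modus_ponens_count)  # 83
print(proof[-1])           # the final line derives P(83)
```

The resulting list mentions only concrete instances, which is the sense in which no reference to the induction hypothesis remains.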


The underlying structure of the proof is as follows. To prove Cons(ZFC) ⇒ Cons(ZFC + ¬CH), we show the equivalent

(ZFC + ¬CH ⊢ ⊥) ⇒ (ZFC ⊢ ⊥), (⋆)

where T ⊢ φ stands for “there is a formal proof of φ using assumptions from T”. To do this we assume the finite assumption property of the formal proof system: if T ⊢ ψ for some possibly infinite list of formulas T, then there is a finite sublist φ₁, φ₂, . . . , φ_n of T such that φ₁, φ₂, . . . , φ_n ⊢ ψ. We use this to reformulate (⋆) into cases, one for each finite sublist φ₁, φ₂, . . . , φ_n of ZFC, where we let Φ ≡ φ₁ ∧ φ₂ ∧ . . . ∧ φ_n:

(Φ ∧ ¬CH ⊢ ⊥) ⇒ (ZFC ⊢ ⊥). (†)

Let us consider a fixed Φ. We prove (†) by constructing a model¹ of Φ ∧ ¬CH in ZFC using the method of forcing pioneered by Paul Cohen [1]. This is the meat of the thesis. This model M[G] is the expansion of a base model M which we augment with an extra set G, similar to how the field extension F(g) is the smallest subfield of K containing F ∪ {g}, where F is a subfield of a field K and g ∈ K. To make “a model of” more precise, we define in our metalogic a transformation (·)^N on formulas, where φ^N is read as “φ relativized to N”. It simply bounds all quantifications to a set N and leaves everything else intact. As an example,

(∃x (x ∈ y → x = z))^N becomes ∃x ∈ N (x ∈ y → x = z).

The construction of the model can then be stated as

ZFC ⊢ ∃M[G] (Φ ∧ ¬CH)^{M[G]}.

We will gloss over something that is intuitively clear but requires a rigorous metalogical framework to define, namely that we can use deduction inside models N :

(ψ ⊢ χ) ⇒ (ZFC ⊢ ψ^N → χ^N).

This is the last piece needed to prove (†).

1. ZFC ⊢ ∃M[G] (Φ ∧ ¬CH)^{M[G]} (the constructed model)

2. Φ ∧ ¬CH ⊢ ⊥ (the supposition in (†))

3. ZFC ⊢ (Φ ∧ ¬CH)^{M[G]} → ⊥^{M[G]} (deduction inside M[G])

4. ZFC ⊢ ∃M[G] ⊥^{M[G]} (by lines 1 and 3)

5. ZFC ⊢ ⊥ (simplification)

This concludes the rough sketch of the overarching structure of the proof. Our metatheory thus needs to support manipulation of the formal proof system, be able to prove the finite assumption property and support induction and recursion over formulas.

4 Well-founded Relations

Two concepts that are not really interesting to formalize outside of set theory are that of classes and of class functions. The intuition is simply that a class is a collection of sets that may be “bigger” than a set and a class function is a function whose domain and range are classes. Examples of classes are the class of all singleton sets and the class of all groups. An example of a class function is the Kuratowski operator that sends arbitrary sets x and y to {{x}, {x, y}}.

Definition 1. Let s_1, . . . , s_n be sets.

• A class A is defined by a formula φ_A(x, z_1, . . . , z_n) and the sets s_i. We write a ∈ A if φ_A(a, s_1, . . . , s_n) holds.

• A class function F is defined by a formula ψ_F(x_1, . . . , x_m, y, z_1, . . . , z_n) and the sets s_i, where for all sets r_1, . . . , r_m there is a unique set y such that ψ_F(r_1, . . . , r_m, y, s_1, . . . , s_n) holds. If F is a class function and ψ_F(a_1, . . . , a_m, b, s_1, . . . , s_n) is true we use F(a_1, . . . , a_m) as a shorthand for b.

¹ The exact meaning of “model” will be explained later.


The class of all sets, called the universe of sets, is denoted by V. The choice of formula for a class or class function is usually implicit. For φ_V any tautology will do, x = x being the canonical choice.

Every set is a class (fix z as the desired set in φ(x, z) :⇔ x ∈ z) but there are classes which are not sets. It should also be noted that classes and class functions are not objects in the underlying first-order logic; they are just an abstraction used to hide the formula they represent. Keep this in mind, as certain operations on classes cannot be justified by ZFC (such as forming the collection of all subclasses) while others yield new classes without causing any trouble. If A is a class it is for example sane to speak of the class {a ∈ A : φ(a)} for some φ, since this is represented by the formula φ_A(x) ∧ φ(x).

Definition 2. A well-founded class W is a class W with a binary relation R (a class whose members are pairs of elements from W) where every nonempty subset of W has an R-minimal element; or, expressed by a formula,

∀S ⊆ W (S ≠ ∅ → ∃m ∈ S ∀s ∈ S ¬sRm).

Note that for any well-founded class, ¬xRx (otherwise the set {x} has no minimal element). By using the axiom of choice there is an alternative characterization of well-founded relations that hopefully sheds more light on what structure well-foundedness imposes.

Lemma 3. The following two statements are equivalent.

• (W, R) is well-founded.

• Every function f : ℕ → W such that for all n ∈ ℕ either f(n + 1) R f(n) or f(n + 1) = f(n) satisfies ∃N ∈ ℕ ∀n ≥ N (f(N) = f(n)).

The second statement can be phrased as: every R-descending ℕ-sequence converges, that is, is eventually constant.

Proof. Assume that (W, R) is well-founded and take a function f as above. Take an N ∈ ℕ such that f(N) is an R-minimal element of range(f). Then

f(N) = f(N + 1) = · · · = f(N + n) = . . .

since, by induction on i, f(N + i) = f(N) is R-minimal in range(f), so f(N + i + 1) R f(N + i) is impossible and hence f(N + i + 1) = f(N + i).

In the other direction, assume that every descending function f converges and, towards a contradiction, that there is a nonempty set S ⊆ W with no R-minimal element. Let c_S be a choice function on the nonempty subsets of S and let s ∈ S. Then

f(0) = s, f(n + 1) = c_S({x ∈ S | x R f(n)})

defines a function, since the set {x ∈ S | x R f(n)} being empty is equivalent to f(n) being a minimal element in S. This f contradicts our assumption since f(n + 1) R f(n), and therefore f(n) ≠ f(n + 1), for all n ∈ ℕ.

This tells us that however we drop a bouncy ball down a well-founded staircase, it will only bounce on a finite number of stairs. The canonical example of a well-founded class is of course (V, ∈). That this is a well-founded class follows directly from the axiom of foundation:

∀x [∃y(y ∈ x) → ∃y(y ∈ x ∧ ¬∃z(z ∈ x ∧ z ∈ y))].

Definition 4. The powerset of a set A is denoted by P(A).

Definition 5. The union {x | ∃a ∈ A (x ∈ a)} of a set A is denoted by ⋃A or ⋃_{a∈A} a.

Definition 6. A relation R on a class A is set-like if for every a ∈ A the class {x ∈ A : xRa} is a set.

Definition 7. Let R be a set-like relation on a class A and let x ∈ A. We define pred (as in predecessor) and closure by recursion as follows:

pred(A, x, R) = {a ∈ A : aRx}

pred_0(A, x, R) = pred(A, x, R)

pred_{n+1}(A, x, R) = ⋃{pred(A, a, R) : a ∈ pred_n(A, x, R)}

The set {pred_n(A, x, R) : n ∈ ℕ} exists by the replacement axiom, so

closure(A, x, R) = ⋃{pred_n(A, x, R) : n ∈ ℕ}

is well defined.
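For a finite set A the operations pred and closure can be computed directly. The following is a minimal sketch under the assumption that A is a finite Python set and R is a set of pairs (a, b) meaning a R b; the function names mirror the definition, but the representation is our own choice.

```python
def pred(A, x, R):
    """pred(A, x, R) = {a in A : a R x}."""
    return {a for a in A if (a, x) in R}

def pred_n(A, x, R, n):
    """pred_0 = pred; pred_{n+1} = union of pred(A, a, R) for a in pred_n."""
    level = pred(A, x, R)
    for _ in range(n):
        level = set().union(*(pred(A, a, R) for a in level))
    return level

def closure(A, x, R):
    """Union of all the levels pred_n(A, x, R). On a finite A we can stop
    as soon as a level adds nothing new, since later levels then add
    nothing new either."""
    result, level = set(), pred(A, x, R)
    while not level <= result:
        result |= level
        level = set().union(*(pred(A, a, R) for a in level))
    return result

# Example: the (non-transitive) successor relation n R n+1 on {0, ..., 4}.
A = set(range(5))
R = {(n, n + 1) for n in range(4)}
print(pred(A, 4, R))       # {3}
print(pred_n(A, 4, R, 1))  # {2}
print(closure(A, 4, R))    # {0, 1, 2, 3}
```

The example uses a relation that is not transitive, so that closure genuinely collects more than the immediate predecessors.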


Proposition 8. If a relation R is set-like and well-founded on a class W then any nonempty subclass W′ of W has an R-minimal element.

Proof. This is seen by fixing a v ∈ W′ and taking an R-minimal element m of the set S = W′ ∩ ({v} ∪ closure(W′, v, R)).

Any m′ ∈ W′ such that m′Rm would lie in S by construction of S, contradicting the minimality of m in S. So m must be an R-minimal element of W′.

We now have all the machinery to prove that induction over well-founded sets is valid. To get a familiar special case of this general induction principle, take (W, R) = (ℕ, <). This becomes

[∀w ∈ ℕ ((∀w′ < w φ(w′)) → φ(w))] → ∀w ∈ ℕ φ(w),

the total-induction principle on the natural numbers. If we fix w = 0 in the first quantifier above we get (∀w′ < 0 φ(w′)) → φ(0), or equivalently φ(0), the base case in the induction.

Theorem 9. Given a well-founded set-like relation R on W and a formula φ, the following induction principle holds:

[∀w ∈ W ((∀w′ ∈ W (w′Rw → φ(w′))) → φ(w))] → ∀w ∈ W φ(w).

In other words, to show that all elements of W have the property φ, it suffices to show that if all elements of W which are R-smaller than w satisfy φ then w satisfies φ.

Proof. Assume towards a contradiction that ∀w ∈ W ((∀w′Rw φ(w′)) → φ(w)) holds but that there is a w ∈ W such that ¬φ(w). By Proposition 8 we may assume that w is an R-minimal element not satisfying φ. Then φ(w′) holds for all w′Rw by minimality of w. By assumption (∀w′Rw φ(w′)) → φ(w), so φ(w) holds, a contradiction.

It depends from case to case whether proof by induction or proof by a contradictory minimal element is preferable. Hand in hand with induction comes recursion, our next topic. For this we will need some notation.

Definition 10. We define some standard notation concerning functions.

• The restriction of a (class) function f to a set A is f ↾ A = {⟨a, b⟩ | a ∈ A ∧ f(a) = b}.

• The image of a set A under a function f is f[A] = {f(a) | a ∈ A}.

• The domain of a set R is dom(R) = {a | ∃b (aRb)} if R is a binary relation and empty otherwise.

• The range of a set R is range(R) = {b | ∃a (aRb)} if R is a binary relation and empty otherwise.

Theorem 11 (Recursion Theorem). Let R be a well-founded and set-like relation on A and F : A × V → V a class function. Then there is a unique class function G : A → V such that

∀x ∈ A [G(x) = F(x, G ↾ pred(A, x, R))]. (∗)

Proof. For the reader that has never seen this definition of a recursively defined function, a short explanation is in order. In a recursive definition we are allowed to make use of any previously defined values, in this case the values of G at points below x in the R-order. What values G takes at these points is encoded in G ↾ pred(A, x, R). Finally F decides what value G(x) has by looking at x itself and the previously defined values in G ↾ pred(A, x, R). But what about base cases? These are sneakily covered by the phrasing “previously defined values”, so for a base case b ∈ A,

G(b) = F(b, G ↾ pred(A, b, R)) = F(b, G ↾ ∅) = F(b, ∅).


To prove uniqueness of G, assume we have G_1, G_2 satisfying the above and that a is R-minimal in A satisfying G_1(a) ≠ G_2(a). By R-minimality G_1 ↾ pred(A, a, R) = G_2 ↾ pred(A, a, R), since if a′Ra and G_1(a′) ≠ G_2(a′) then a is not minimal with the property G_1(a) ≠ G_2(a). This yields the contradiction

G_1(a) = F(a, G_1 ↾ pred(A, a, R)) = F(a, G_2 ↾ pred(A, a, R)) = G_2(a).

The proof of existence is done by defining a G that satisfies the equation. Call a function g an x-approximation of G if

dom(g) = {x} ∪ closure(A, x, R) and ∀y ∈ dom(g) [g(y) = F(y, g ↾ pred(A, y, R))].

The definition of G(a) is simply G(a) = g_a(a) where g_a is an a-approximation. Thus we need to prove that this definition is well-defined by showing uniqueness and existence of these approximations.

Uniqueness: Given an x-approximation g_x, a y-approximation g_y and a z ∈ dom(g_x) ∩ dom(g_y), then in the same fashion as in the uniqueness argument about G, g_x(z) = g_y(z). In particular, two x-approximations coincide everywhere and are thus equal.

Existence: By induction on a ∈ A. Assume that approximations exist for all x such that xRa. We define g_a as follows:

g_a(x) = F(a, {⟨y, g_y(y)⟩ : y ∈ pred(A, a, R)}) if x = a,

g_a(x) = g_x(x) if x ∈ closure(A, a, R).

Then g_a is an approximation since for any x ∈ {a} ∪ closure(A, a, R),

g_a(x) = g_x(x) = F(x, {⟨y, g_y(y)⟩ : y ∈ pred(A, x, R)})

= F(x, g_x ↾ pred(A, x, R))

= F(x, g_a ↾ pred(A, x, R)),

where the last equality holds since pred(A, x, R) ⊆ dom(g_x) and g_x ⊆ g_a.

The only part left to prove is that G satisfies the equation ∀x ∈ A [G(x) = F(x, G ↾ pred(A, x, R))]. To see that this is the case, fix an x ∈ A and chase the equalities:

G(x) = g_x(x) = F(x, {⟨y, g_y(y)⟩ : y ∈ pred(A, x, R)})

= F(x, {⟨y, G(y)⟩ : y ∈ pred(A, x, R)})

= F(x, G ↾ pred(A, x, R)).

In practice we do not mention F at all and just define G in terms of base cases and earlier defined values. We sometimes define functions by recursion on some well-founded class (U, R) instead of V, the class of all sets. In this case we do not care about the values of G outside of U, and to appease the recursion theorem we define G(x) = ∅ for x ∉ U. Note that if R well-founds U it also well-founds V.
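On a finite set the recursion theorem can be mirrored by a short program. This is a sketch under stated assumptions: A is a finite set, R is a set of pairs (a, b) meaning a R b and is well-founded (otherwise the recursion below does not terminate), and the restriction G ↾ pred(A, x, R) is passed to F as a dictionary. The name recursion is hypothetical.

```python
def recursion(A, R, F):
    """Return G as a dict satisfying G[x] = F(x, G restricted to pred(A, x, R))."""
    G = {}

    def value(x):
        if x not in G:
            below = {a for a in A if (a, x) in R}      # pred(A, x, R)
            restricted = {y: value(y) for y in below}  # G restricted to pred
            G[x] = F(x, restricted)
        return G[x]

    for x in A:
        value(x)
    return G

# Example: Fibonacci numbers by recursion on (A, <) with A = {0, ..., 5}.
A = set(range(6))
R = {(a, b) for a in A for b in A if a < b}

def F(x, g):
    # Base cases are covered automatically: g lacks the needed earlier values.
    return x if x < 2 else g[x - 1] + g[x - 2]

G = recursion(A, R, F)
print([G[n] for n in range(6)])  # [0, 1, 1, 2, 3, 5]
```

As in the text, F never calls G itself; it only inspects x and the values handed to it for the R-predecessors of x.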

In a few places we will define relations on well-founded classes. To see that this is well defined, we formally view the relation Q as a function G_Q with the same domain as Q:

G_Q(x⃗) = 1 if the definition of Q, with every occurrence of Q(y⃗) replaced with G_Q(y⃗) = 1, holds, and G_Q(x⃗) = 0 otherwise.

Then the recursion theorem tells us G_Q is well defined and hence the relation Q is as well.

5 Well-Orders, Ordinals, Cardinals And Cardinality

Apart from König's theorem, the rank function and a few results about cofinality at the end, this section is a refresher on the ordinal and cardinal numbers. As ordinals and cardinals are assumed to be familiar, basic results are only stated, not proven. For proofs, curious, oblivious and forgetful readers alike are directed to [2] or [4].


5.1 Ordinal Numbers

Definition 12.

• A partially ordered set or poset is a binary relation R on a class A such that for a, b, c ∈ A, R is:

transitive: aRb and bRc implies aRc.

antisymmetric: If aRb and bRa then a = b.

irreflexive: ¬aRa.

• A relation R on a class A is called a well-order if it is both well-founded and totally ordered.

• A class A is transitive if x ∈ y ∈ A implies x ∈ A or equivalently, if y ∈ A implies y ⊆ A.

• An ordinal or an ordinal number is a transitive set α that is totally ordered by ∈. Due to the axiom of foundation, this is equivalent to being well-ordered by ∈. The class of ordinals is denoted by Ord. We use the convention that the first Greek letters α, β, γ and δ always denote ordinals while λ is sometimes used to denote a limit ordinal. The symbols < and ∈ are used interchangeably when dealing with ordinals and α ≤ β has the usual meaning of α < β or α = β.

• The successor function S_W on a well-order W is defined at every non-maximal element in W by S_W(x) = min{y ∈ W : x <_W y} and is undefined otherwise.

• The successor of a set x is defined as S(x) := x ∪ {x}. For ordinals α, S(α) is sometimes denoted as α + 1. This definition is motivated by the next proposition.

Proposition 13.

• If x ∈ α ∈ Ord then x ∈ Ord.

• If β ∈ α and β is not maximal in α then S_α(β) = β ∪ {β} = S(β).

Definition 14. The first ordinal ∅ is denoted as 0. An ordinal α is called a successor if α = S(β) for some other ordinal β. It is called a limit if it is neither a successor nor 0.

Proposition 15. If A is a transitive subset of an ordinal α then A is an ordinal and either A = α or A ∈ α.

Proposition 16. For ordinals α and β exactly one of α ∈ β, α = β or α ∋ β is true.

Theorem 17. The class of ordinals is well-ordered and set-like under the ∈-order.

Lemma 18. There exists a limit ordinal. In particular, there is a smallest limit ordinal.

Definition 19. The smallest limit ordinal is called ω. All ordinals smaller than ω are called finite and the rest are called infinite.

In ZFC it is common practice to identify the finite ordinals with the natural numbers. The usual operations (addition, multiplication and so on) can then be defined by recursion on ℕ = ω.

Proposition 20. If α and β are ordinals and x is a set of ordinals then

• α ∪ β = max(α, β),

• α ∩ β = min(α, β) and

• the union ⋃x is the least upper bound of x in Ord. This is sometimes denoted as sup x.

The next theorem together with the well-ordering principle justifies indexing an arbitrary set with some ordinal.

Definition 21. A function f : V → W between two partial orders is an order isomorphism if it is bijective and satisfies v <_V v′ ⇔ f(v) <_W f(v′).


Theorem 22. Every well-ordered set is order-isomorphic to a unique ordinal.

Definition 23. The class function rank : V → Ord is defined with recursion on ∈ by rank(x) = sup{rank(y) + 1 | y ∈ x}.

The rank function simply counts how “deep” a set is.

Proposition 24. Every ordinal α has rank α.

Proof. By induction. Whether α is 0, a successor or a limit,

rank(α) = sup{rank(γ) + 1 | γ < α} = sup{γ + 1 | γ < α} = α,

where the second equality holds by the induction hypothesis.
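For hereditarily finite sets the rank function of Definition 23 can be computed by a direct transcription of its recursive definition. The sketch below models sets as nested Python frozensets; the helper ordinal, which builds the finite von Neumann ordinals, is our own illustrative name.

```python
def rank(x: frozenset) -> int:
    """rank(x) = sup{rank(y) + 1 : y in x}; the sup of the empty set is 0."""
    return max((rank(y) + 1 for y in x), default=0)

def ordinal(n: int) -> frozenset:
    """The finite von Neumann ordinal n = {0, 1, ..., n-1}."""
    s = frozenset()
    for _ in range(n):
        s = s | {s}  # S(s) = s ∪ {s}
    return s

# Proposition 24 on a few finite ordinals.
print([rank(ordinal(n)) for n in range(5)])  # [0, 1, 2, 3, 4]

# The Kuratowski pair <0, 1> = {{0}, {0, 1}} sits two levels above its entries.
zero, one = ordinal(0), ordinal(1)
pair = frozenset({frozenset({zero}), frozenset({zero, one})})
print(rank(pair))  # 3
```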

Definition 25. The class function R(α) is defined on all ordinals by

R(0) = ∅,

R(α + 1) = P(R(α)),

R(λ) = ⋃_{γ<λ} R(γ) for limit ordinals λ.

The “depth” of a set in R(α + 1) \ R(α) is then, reasonably enough, α: such a set is a subset of R(α) that is not a subset of any R(β) with β < α. The next proposition shows this connection formally.

Proposition 26. For all α ∈ Ord, R(α) = {x | rank(x) < α}.

Proof. By induction over α where we assume the proposition holds for β < α.

α = 0: In this case R(0) = ∅ = {x | rank(x) < 0}.

α = S(β): To show R(α) ⊆ {x | rank(x) < α}, note that any subset of {x | rank(x) < β} has rank at most β:

{x | rank(x) < α} ⊇ P({x | rank(x) < β}) = P(R(β)) = R(α).

In the R(α) ⊇ {x | rank(x) < α} direction, take a set y such that rank(y) < α. For any x ∈ y, rank(x) < β so y ⊆ {x | rank(x) < β} = R(β), hence y ∈ P(R(β)) = R(β + 1) = R(α).

α is a limit: R(α) = ⋃_{γ<α} R(γ) = ⋃_{γ<α} {x | rank(x) < γ} = {x | rank(x) < α}, where the second equality holds by the induction hypothesis.
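The first few levels R(0), R(1), . . . are finite and small enough to enumerate, which allows a sanity check of Proposition 26 in one direction. This is only a sketch for finite levels, with sets modelled as Python frozensets; the helper names are our own.

```python
from itertools import chain, combinations

def powerset(s: frozenset) -> frozenset:
    """All subsets of s, each as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return frozenset(frozenset(c) for c in subsets)

def R(n: int) -> frozenset:
    """R(0) = empty set, R(n + 1) = P(R(n)); finite levels only."""
    level = frozenset()
    for _ in range(n):
        level = powerset(level)
    return level

def rank(x: frozenset) -> int:
    """rank(x) = sup{rank(y) + 1 : y in x}."""
    return max((rank(y) + 1 for y in x), default=0)

sizes = [len(R(n)) for n in range(5)]
print(sizes)                                # [0, 1, 2, 4, 16]
print(all(rank(x) < 4 for x in R(4)))       # True
print({rank(x) for x in R(4) - R(3)})       # {3}
```

The last line illustrates that the sets appearing for the first time in R(4) are exactly those of rank 3 among the enumerated ones. R(5) already has 65536 elements, so the check is kept to small levels.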

5.2 Cardinality

Definition 27. A set X has the same cardinality as a set Y if there is a bijection between X and Y. This is written as X =_c Y. A set X has cardinality less than or equal to that of Y if there is an injection from X to Y. This is written as X ≤_c Y. If X ≤_c Y and X ≠_c Y we say X has strictly smaller cardinality than Y and we write X <_c Y.

Theorem 28 (Cantor-Schröder-Bernstein). If X ≤_c Y and Y ≤_c X then X =_c Y.

This theorem can be proven in a weak subtheory of ZFC, notably without the axiom of choice and replacement.

Cantor showed that cardinality among the infinite sets is a nontrivial concept.

Theorem 29 (Cantor). The cardinality of P(X) is always strictly greater than the cardinality of X.

Proof. Since x ↦ {x} is an injection, X ≤_c P(X). If also P(X) ≤_c X, then by the Cantor-Schröder-Bernstein theorem there is a bijection f : X → P(X). Assume f is such a bijection and pick y ∈ X with f(y) = {x ∈ X | x ∉ f(x)}. Then y ∈ f(y) ⇔ y ∉ f(y), so no such f can exist and X <_c P(X).
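The diagonal set used in the proof can be checked exhaustively on a small finite set. The sketch below, with our own helper name diagonal, runs through every function f from a three-element set X to its powerset and confirms that the diagonal set is never a value of f.

```python
from itertools import product

def diagonal(X, f):
    """The set {x in X : x not in f(x)} from the proof; f is a dict."""
    return frozenset(x for x in X if x not in f[x])

X = [0, 1, 2]
# All 8 subsets of X.
subsets = [
    frozenset(x for x, keep in zip(X, bits) if keep)
    for bits in product([False, True], repeat=len(X))
]

# Every function f : X -> P(X) is a choice of one subset per element (8^3 cases).
all_missed = all(
    diagonal(X, dict(zip(X, images))) not in images
    for images in product(subsets, repeat=len(X))
)
print(all_missed)  # True
```

On finite sets this is of course also clear by counting; the point of the sketch is that the same diagonal set works uniformly for every f.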


As a consequence of the well-ordering principle and the fact that every well-order is isomorphic to some ordinal, we can define the notion of cardinal number.

Definition 30. A cardinal or cardinal number is an ordinal such that there is no bijection to any smaller ordinal. Cardinals are denoted by the letter κ and sometimes λ, although λ may also stand for a limit ordinal. The cardinality of a set X, denoted by |X|, is defined as the least ordinal bijective with X. For every ordinal α, we denote the αth infinite cardinal by ℵ_α. A cardinal κ is a successor cardinal if it is the smallest cardinal larger than some cardinal λ. A cardinal is a limit cardinal if it is neither 0 nor the successor of any smaller cardinal.

There is a natural correspondence between the ordering of cardinals and existence of injections, surjections and bijections between sets.

Proposition 31.

• There is an injection f : X → Y if and only if |X| ≤ |Y |.

• There is a bijection f : X → Y if and only if |X| = |Y |.

• There is a surjection f : X → Y if and only if |X| ≥ |Y |.

Proposition 32. Either X <_c Y, X =_c Y or X >_c Y.

Proof. Follows from the total order of the ordinals.

We define the usual arithmetic operations on cardinals and state some of their properties.

Definition 33.

• κ + λ := |κ ⊎ λ| = |{⟨α, i⟩ | (α ∈ κ ∧ i = 0) ∨ (α ∈ λ ∧ i = 1)}|

• κ · λ := |κ × λ|

• κ^λ:= |{f | f : λ → κ}|

Proposition 34. Let κ and λ be nonzero cardinals with max(κ, λ) ≥ ℵ₀. Then κ + λ = κ · λ = max(κ, λ).

Proposition 35. If X and its members are of at most cardinality κ ≥ ℵ₀ then |⋃X| ≤ κ.

Proposition 36. For cardinals κ, λ and µ, (κ^λ)^µ = κ^{λ·µ}.

Definition 37. The product of sets B_i indexed by i ∈ I is defined as

∏_{i∈I} B_i = {f | (f is a function) ∧ dom(f) = I ∧ ∀i ∈ I (f(i) ∈ B_i)}.

Theorem 38 (König). Let I be a nonempty index set and let A_i and B_i be sets for every i ∈ I. If A_i <_c B_i for every i ∈ I then

⋃_{i∈I} A_i <_c ∏_{i∈I} B_i.

Proof. We show that the union has a strictly smaller cardinality by proving that no function from ⋃_{i∈I} A_i to ∏_{i∈I} B_i is surjective. Pick an arbitrary s : ⋃_{i∈I} A_i → ∏_{i∈I} B_i and let π_j : ∏_{i∈I} B_i → B_j be the projection onto the jth coordinate. Since A_j <_c B_j, no function f : A_j → B_j can be a surjection; specifically, π_j ∘ (s ↾ A_j) cannot. There is thus, by the axiom of choice, a function b ∈ ∏_{i∈I} B_i such that b(j) ∈ B_j \ (π_j ∘ s)[A_j] for every j ∈ I. For all j ∈ I and a ∈ A_j, π_j(s(a)) ≠ π_j(b), so b is not in the range of s and s is not a surjection.

König's theorem is special because it guarantees a strict cardinal inequality. Given König's theorem one can deduce Cantor's theorem by setting I to the set in question, A_i = {i} and B_i = {0, 1}. Then ⋃_{i∈I} {i} = I <_c ∏_{i∈I} {0, 1}, where the product is the set of characteristic functions of subsets of I. Another neat corollary is the axiom of choice: if ∅ <_c B_i for i ∈ I then ⋃_{i∈I} ∅ = ∅ <_c ∏_{i∈I} B_i; that is to say, there is a choice function for the set {B_i | i ∈ I} if all B_i are nonempty.
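The point b constructed in the proof of König's theorem can be computed explicitly for finite families. This is a sketch under our own representation: A and B are dictionaries from indices to finite sets with |A[i]| < |B[i]|, a point of the product is a dictionary from indices to values, and min plays the role of the choice function. The name missed_point is hypothetical.

```python
def missed_point(A, B, s):
    """Return a point b of the product of the B[i] outside the range of s.

    s maps each element of the union of the A[i] to a point of the product,
    given as a dict i -> element of B[i].
    """
    b = {}
    for j in A:
        hit = {s[a][j] for a in A[j]}  # the image of A_j under pi_j o s
        b[j] = min(B[j] - hit)         # nonempty because |A_j| < |B_j|
    return b

A = {0: {"a"}, 1: {"b", "c"}}
B = {0: {0, 1}, 1: {0, 1, 2}}
s = {
    "a": {0: 0, 1: 0},
    "b": {0: 1, 1: 1},
    "c": {0: 0, 1: 2},
}
b = missed_point(A, B, s)
print(b)                     # {0: 1, 1: 0}
print(b not in s.values())   # True
```

For every a in A[j] the point b differs from s(a) in coordinate j, which is exactly the argument of the proof.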


5.3 Cofinality

The results and definitions in this section will only be used in the last section, so it can be skipped at first and returned to when needed.

Definition 39. A set A ⊂ Ord is unbounded in α if ¬∃β ∈ α∀γ ∈ A(γ < β), that is, no strict bound of all of A exists in α.

Definition 40. A function f : α → β is cofinal if the range of f is unbounded in β.

Definition 41. The cofinality of an ordinal β, denoted cf β is defined as the least ordinal α such that there exists a cofinal function f : α → β.

As an immediate consequence of the definition, the identity map is cofinal, so cf β ≤ β and the cofinality of an ordinal is always defined. Also, by the function ∅, cf 0 = 0, and by the function (0 ↦ α) from 1 to α + 1, cf(α + 1) = 1.

Lemma 42. For every ordinal β, there is a nondecreasing cofinal map f : cf β → β.

Proof. Let f : cf β → β be a cofinal map. We define the nondecreasing function g : cf β → Ord as follows:

g(γ) = sup range(f ↾ S(γ)).

Since f(γ) ≤ g(γ) for every γ ∈ cf β, range(g) is clearly unbounded in β, but is it a subset of β?

Assume that it is not and that α < cf β is the least ordinal such that g(α) ≥ β. Then α cannot be 0, since g(0) = f(0) < β, nor a successor S(δ), since g(S(δ)) = max(g(δ), f(S(δ))) < β; so α is a limit ordinal. But if sup range(f ↾ S(α)) ≥ β, then either f ↾ α : α → β is cofinal, contradicting the minimality of cf β, or f(α) ≥ β, contradicting that f maps into β. Thus range(g) ⊆ β and g : cf β → β is cofinal and nondecreasing.

Proposition 43. If f : α → β and g : β → γ are nondecreasing cofinal maps then their composition g ◦ f : α → γ is nondecreasing and cofinal as well.

Definition 44. An ordinal β is regular if cf β = β and singular otherwise.

Proposition 45. Every regular ordinal is a cardinal.

Proof. A bijection b : α → β is always cofinal so no regular β can be bijective with a smaller α.

Lemma 46. For every β, cf β is a regular cardinal.

Proof. First of all, cf cf 0 = cf 0 = 0 and cf cf S(α) = cf 1 = 1 are both regular. For limit ordinals β, let f : cf β → β and g : cf cf β → cf β be nondecreasing cofinal maps. The composition f ◦ g : cf cf β → β is cofinal so by the minimality of cf β, cf cf β = cf β, meaning cf β is regular. The previous proposition then tells us cf β is a cardinal.

Proposition 47. The cofinality of every infinite limit ordinal is a limit ordinal. In particular, cf ω = ω, so ω is regular.

Proof. Let λ be a limit ordinal. The cofinality of an ordinal is itself regular and since cf(α + 1) = 1, the only regular nonlimit ordinals are 0 and 1. All we have to do is exclude those possibilities. The cofinality of λ can clearly not be 0 or 1 since λ is nonempty and α ∈ λ is strictly bounded by α + 1 ∈ λ, respectively. Since ω is the least limit ordinal and cf α ≤ α, cf ω = ω.

Lemma 48. If there is a nondecreasing cofinal map from α to β then cf α = cf β.

Proof. Let f : α → β be a nondecreasing cofinal map. The composition of f with a nondecreasing cofinal map f′ : cf α → α shows that cf β ≤ cf α.

To show that cf α ≤ cf β, let g : cf β → β be a nondecreasing cofinal function and consider the mapping

h : cf β → α, γ ↦ {δ ∈ α | f(δ) ≤ g(γ)}.

First of all, it maps ordinals to ordinals since f is nondecreasing and every transitive subset A of an ordinal (δ < δ′ ∧ δ′ ∈ A ⇒ δ ∈ A) is an ordinal. Second, for every δ ∈ α there is some γ ∈ cf β such that f(δ) ≤ g(γ) by cofinality of g, so h : cf β → α is cofinal. By definition of cofinality, cf α ≤ cf β.


Proposition 49. Every infinite successor cardinal is regular.

Proof. Let κ be the successor cardinal of the cardinal µ and assume towards a contradiction that cf(κ) < κ. Let f : cf κ → κ be cofinal. Then for every γ ∈ cf κ, f(γ) <_c κ, and κ = ⋃_{γ<cf κ} f(γ) since range(f) is unbounded in κ. Further, cf κ ≤_c µ, and since a union of at most µ sets of cardinality at most µ has cardinality at most µ, κ = ⋃_{γ<cf κ} f(γ) ≤_c µ < κ. This contradiction means our assumption cf(κ) < κ is wrong and the only other option is cf(κ) = κ.

Proposition 50. For an infinite limit cardinal λ, the successor cardinals in λ are unbounded in λ.

Proof. We wish to show that for every ordinal α < λ there is a successor cardinal κ ∈ λ greater than α.

Since α ∈ λ and |α| ≤ α, |α| ∈ λ. Let κ be the successor cardinal of |α|. Then α < κ by being a larger cardinal than |α| and κ ∈ λ since λ is a limit cardinal.

As a small bonus we can derive two neat results linking cofinality to cardinal exponentiation.

Proposition 51. For any infinite cardinal κ, κ <_c cf(2^κ).

Proof. Assume that λ = cf(2^κ) ≤_c κ and let f : λ → 2^κ be cofinal. By applying König's theorem and a bit of cardinal arithmetic we get the following contradiction:

2^κ = ⋃_{i∈λ} f(i) <_c ∏_{i∈λ} 2^κ =_c (2^κ)^λ = 2^{κ·λ} = 2^κ.

By applying proposition 50 to the map f(n) = ℵ_n, dom(f) = ω, we see it is cofinal in ℵ_ω and by proposition 47, cf ℵ_ω ≤ ω. Thus we see that by the last proposition, 2^{ℵ₀} = ℵ_ω is not a possibility.

Proposition 52. For every infinite cardinal κ, κ <_c κ^{cf κ}.

Proof. Let f : cf κ → κ be a cofinal map. Since f(γ) <_c κ for each γ ∈ cf κ, we can apply König's theorem:

κ = ⋃_{γ∈cf κ} f(γ) <_c ∏_{γ∈cf κ} κ = κ^{cf κ}.

6 Relativization

The only mathematical structures we are interested in are the ones with a single binary relation, that is to say the ones that could possibly be models of ZFC. But as it turns out we can get by using only structures where the binary relation is ∈. We therefore define what it means for a formula to hold in a transitive class M. Recall that a class A is transitive if y ∈ A implies y ⊆ A.

Definition 53. Let M be a transitive class. The relativization of a formula φ to M, denoted by φ^M is defined by recursion (in the metalogic) on the structure of φ as follows.

1. (x = y)^M is x = y.

2. (x ∈ y)^M is x ∈ y.

3. (¬ψ)^M is ¬ψ^M.

4. (ψ ∧ χ)^M is ψ^M ∧ χ^M.

5. (∃x ψ)^M is ∃x(x ∈ M ∧ ψ^M).


The rest of the logical connectives and the universal quantifier are, as mentioned, defined to be shorthands containing only ¬, ∧ and existential quantifiers, i.e. φ ∨ ψ := ¬(¬φ ∧ ¬ψ) and ∀x φ := ¬∃x(¬φ).

We abbreviate ∃x(x ∈ y ∧ φ) by ∃x ∈ y φ and ∀x(x ∈ y → φ) by ∀x ∈ y φ for our common mental well being’s sake. Note that the defined universal quantifier works as expected under relativization in the following sense:

(∀x φ)^M :⇔ (¬∃x ¬φ)^M :⇔ ¬(∃x ¬φ)^M :⇔ ¬∃x(x ∈ M ∧ ¬φ^M)

⇔ ¬∃x ¬(x ∈ M → φ^M) :⇔ ∀x(x ∈ M → φ^M) :⇔ ∀x ∈ M φ^M.

Here :⇔ marks steps that hold by definition. The reason for defining relativization and using the notation φ^M instead of the more familiar M ⊨ φ is that [3] does so. We read φ^M as “φ is true in M”. Similarly, if S is a list of sentences in the metalogic, we say that “M is a model of S” or “S is true in M” if every sentence φ in S is true in M².
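Since relativization is defined by recursion on the structure of formulas, it is straightforward to write down as a program on syntax trees. The following sketch uses our own encoding of formulas as nested tuples, ("eq", x, y), ("in", x, y), ("not", p), ("and", p, q) and ("exists", x, p); the encoding and the function names are illustrative assumptions.

```python
def relativize(phi, M="M"):
    """Return phi^M, following the five clauses of the definition."""
    op = phi[0]
    if op in ("eq", "in"):
        return phi  # atomic formulas are left intact
    if op == "not":
        return ("not", relativize(phi[1], M))
    if op == "and":
        return ("and", relativize(phi[1], M), relativize(phi[2], M))
    if op == "exists":
        x, body = phi[1], phi[2]
        # (exists x psi)^M is exists x (x in M and psi^M)
        return ("exists", x, ("and", ("in", x, M), relativize(body, M)))
    raise ValueError(f"unknown connective: {op}")

def forall(x, p):
    """The shorthand: forall x p abbreviates not exists x not p."""
    return ("not", ("exists", x, ("not", p)))

print(relativize(("exists", "x", ("eq", "x", "z"))))
# ('exists', 'x', ('and', ('in', 'x', 'M'), ('eq', 'x', 'z')))
print(relativize(forall("x", ("in", "x", "y"))))
# ('not', ('exists', 'x', ('and', ('in', 'x', 'M'), ('not', ('in', 'x', 'y')))))
```

The second output is the syntactic form ¬∃x(x ∈ M ∧ ¬φ^M) that appears in the computation above for the relativized universal quantifier.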

Definition 54. For any constant c uniquely defined by some formula φ_c(y) (such as ω and ∅) and any class function F(x_1, . . . , x_n) defined by some formula φ_F(x_1, . . . , x_n, y), let c^M be the object defined by φ_c^M(y) and F^M(x_1, . . . , x_n) be defined as the y making φ_F^M(x_1, . . . , x_n, y) hold. This notation will only be used when such a y is guaranteed to exist and is unique.

Definition 55. Let M be a transitive class. A formula φ with free variables among x₁, . . . , x_n is called absolute for M if

∀x₁, . . . , x_n∈ M(φ(x₁, . . . , x_n) ↔ φ^M(x₁, . . . , x_n))

holds. A formula φ is simply called absolute if it is absolute for an M given by the context, or if it is absolute for every transitive class M.

Lemma 56. Let M be a transitive class and let φ and ψ be absolute for M. Then ¬φ, φ ∧ ψ, ∃x ∈ y φ and ∀x ∈ y φ are absolute for M.

Proof. The first two cases are true by definition of relativization on negations and conjunctions. In the existential case, fix values in M for y and the free variables x₁, . . . , x_n of ∃x ∈ y φ. Then

(∃x ∈ y φ)^M ≡ (∃x(x ∈ y ∧ φ))^M (def)

≡ ∃x(x ∈ M ∧ (x ∈ y)^M ∧ φ^M) (def)

⇔ ∃x(x ∈ M ∧ x ∈ y ∧ φ) (⋆)

⇔ ∃x(x ∈ y ∧ φ) (†)

≡ ∃x ∈ y φ. (def)

The (⋆)-equivalence follows by absoluteness of φ and x ∈ y. The left to right direction of the (†)-equivalence is clear. For the right to left direction, note that y ∈ M and the transitivity of M yield x ∈ y ⇒ x ∈ M.

The universal quantifier case follows from the existential case and that ∀x ∈ y φ is equivalent to ¬∃x ∈ y ¬φ.

Definition 57. A formula is ∆₀ if it is of the form

1. x ∈ y or x = y,

2. ¬φ or φ ∧ ψ where φ and ψ are ∆₀, or

3. ∃x (x ∈ y ∧ φ) or ∀x (x ∈ y → φ) where φ is ∆₀.

Note that ∀x ∈ y φ is short for ¬∃x(x ∈ y ∧ ¬φ) and is therefore also ∆₀ if φ is. As it happens, this type of formula with only bounded quantifiers is relevant in other topics as well. In arithmetic, where (<) bounds variables instead of (∈), we can for such a formula φ(x₁, . . . , x_m) replace the free variables with numbers n₁, . . . , n_m and check whether φ(n₁, . . . , n_m) is true by mechanically searching up to the given bound for each quantifier. Every relation on the natural numbers defined by such a formula is therefore primitive recursive.

The curious reader is referred to [6].
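The mechanical check described for arithmetic can be illustrated with a short Python sketch. The helper names exists_below and forall_below are hypothetical, and the bounded formula used for primality, 1 < n ∧ ¬∃a < n ∃b < n (a · b = n), is an example chosen here and not taken from the thesis.

```python
def exists_below(bound, pred):
    """Bounded quantifier: is there an x < bound with pred(x)? Finite search."""
    return any(pred(x) for x in range(bound))


def forall_below(bound, pred):
    """Bounded quantifier: does pred(x) hold for every x < bound? Finite search."""
    return all(pred(x) for x in range(bound))


def is_prime(n):
    # 1 < n and there are no a < n, b < n with a * b = n
    return 1 < n and not exists_below(
        n, lambda a: exists_below(n, lambda b: a * b == n)
    )


print([n for n in range(20) if is_prime(n)])
```

Every quantifier is searched only up to its bound, so the evaluation always terminates.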

Corollary 58. ∆₀ formulas are absolute. Alternatively, any formula whose quantifiers are all bounded is absolute.

² By true, we here mean derivable from ZFC.


Definition 59. An ordered pair ⟨x, y⟩ is coded as the set {{x}, {x, y}}. The class function taking x, y to {{x}, {x, y}} is called the Kuratowski operator. With this in mind we let

Q⟨a, b⟩ ∈ x : φ abbreviate Qw ∈ x ∃p ∈ w ∃a, b ∈ p (w = ⟨a, b⟩ ∧ φ)

where Q is one of the quantifiers ∀ or ∃. Note that if φ is ∆₀, so is Q⟨a, b⟩ ∈ x : φ.

To demonstrate what can be deduced from the absoluteness of ∆₀ formulas, here is a list of formulas accompanied with ∆₀ equivalents. That these actually are equivalent is “easily seen” by inspection. Note that the “definition” of intersection has a quirk: to make it well defined for all sets we say that ⋂∅ has the arbitrary value ∅.

x ∈ y ⇔ x ∈ y

x = y ⇔ x = y

x ⊆ y ⇔ ∀w ∈ x(w ∈ y)

z = {x, y} ⇔ x ∈ z ∧ y ∈ z ∧ ∀w ∈ z(w = x ∨ w = y)

z = {x} ⇔ z = {x, x}

z = ⟨x, y⟩ ⇔ z = {{x}, {x, y}}

z = ∅ ⇔ ∀w ∈ z(w ≠ w)

z = x ∪ y ⇔ x ⊆ z ∧ y ⊆ z ∧ ∀w ∈ z(w ∈ x ∨ w ∈ y)

z = x ∩ y ⇔ z ⊆ x ∧ z ⊆ y ∧ ∀w ∈ x(w ∈ y → w ∈ z)

z = x \ y ⇔ z ⊆ x ∧ ∀w ∈ x(w ∉ y ↔ w ∈ z)

z = S(x) ⇔ x ⊆ z ∧ x ∈ z ∧ ∀w ∈ z(w ∈ x ∨ w = x)

(x is transitive) ⇔ ∀y ∈ x ∀z ∈ y(z ∈ x)

x ∈ Ord ⇔ ∀y, z ∈ x(y ∈ z ∨ y = z ∨ y ∋ z) ∧ (x is transitive)

x = 0 ⇔ x = ∅

(x is a successor ordinal) ⇔ ∃y ∈ x(S(y) = x) ∧ x ∈ Ord

(x is a limit ordinal) ⇔ ¬∃y ∈ x(S(y) = x) ∧ x ∈ Ord ∧ x ≠ 0

z = ω ⇔ (z is a limit ordinal) ∧ ¬∃y ∈ z(y is a limit ordinal)

z = ⋃x ⇔ ∀y ∈ x(y ⊆ z) ∧ ∀w ∈ z ∃y ∈ x(w ∈ y)

z = ⋂x ⇔ (∀y ∈ x(z ⊆ y)) ∧ (x = ∅ → z = ∅) ∧ ∀y ∈ x ∀w ∈ y((∀y′ ∈ x(w ∈ y′)) → w ∈ z)

z = A × B ⇔ ∀w ∈ z ∃⟨x, y⟩ ∈ z(x ∈ A ∧ y ∈ B ∧ w = ⟨x, y⟩) ∧ ∀x ∈ A ∀y ∈ B ∃w ∈ z(w = ⟨x, y⟩)

R is a relation ⇔ ∀w ∈ R ∃⟨y, z⟩ ∈ R(w = ⟨y, z⟩)

z = dom R ⇔ (R is a relation) ∧ (∀x ∈ z ∃⟨a, b⟩ ∈ R(a = x)) ∧ (∀⟨a, b⟩ ∈ R ∃x ∈ z(a = x))

z = range R ⇔ (R is a relation) ∧ (∀x ∈ z ∃⟨a, b⟩ ∈ R(b = x)) ∧ (∀⟨a, b⟩ ∈ R ∃x ∈ z(b = x))

R is a function ⇔ (R is a relation) ∧ ∀⟨a, b⟩, ⟨a′, b′⟩ ∈ R(a = a′ → b = b′)

z = R(x) ⇔ (R is a function) ∧ ∃⟨a, b⟩ ∈ R(a = x ∧ b = z)

R is an injective function ⇔ (R is a function) ∧ ∀⟨a, b⟩, ⟨a′, b′⟩ ∈ R(b = b′ → a = a′)

Proposition 60. The formulas to the right of the ‘⇔’ signs are ∆₀ so their equivalent counterparts on the left are therefore absolute.
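To get a feeling for why bounded quantifiers give absoluteness, one can evaluate a few of the listed formulas on hereditarily finite sets. The Python sketch below is an illustration under the assumption that sets are modelled as nested frozensets; the function names are hypothetical. Each check only ever inspects members of the sets it is given, which is the informal reason the answer cannot depend on the surrounding transitive class.

```python
EMPTY = frozenset()


def pair(x, y):
    """Kuratowski pair {{x}, {x, y}} (Definition 59)."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def successor(x):
    """S(x) = x ∪ {x}."""
    return x | frozenset({x})


def is_transitive(x):
    # ∀y ∈ x ∀z ∈ y (z ∈ x)
    return all(z in x for y in x for z in y)


def is_ordinal(x):
    # ∀y, z ∈ x (y ∈ z ∨ y = z ∨ y ∋ z) ∧ (x is transitive)
    return all(
        y in z or y == z or z in y for y in x for z in x
    ) and is_transitive(x)


def is_relation(r):
    # ∀w ∈ r ∃p ∈ w ∃a, b ∈ p (w = ⟨a, b⟩)
    return all(
        any(w == pair(a, b) for p in w for a in p for b in p) for w in r
    )


zero = EMPTY
one = successor(zero)
two = successor(one)
print(is_ordinal(two), is_transitive(frozenset({one})))
```

Here two = {∅, {∅}} passes the ordinal test, while {{∅}} fails transitivity because ∅ is a member of its only member but not of the set itself.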

Lemma 61. The formula z = {w ∈ y | φ(w, x₁, . . . , x_n)} where z is not a parameter of φ is ∆₀ if φ is ∆₀ and absolute if φ is absolute.

Proof. The formula is really a notational convenience for z ⊆ y ∧ ∀w ∈ y(w ∈ z ↔ φ(w, x₁, . . . , x_n)). This formula is clearly ∆₀ if φ is, and by lemma 56 it is absolute if φ is absolute.


7 Model Construction

The goal of this section is to show that for every finite sublist φ₁, . . . , φ_n of ZFC,

ZFC ⊢ ∃M ((M is transitive) ∧ (M ≤_c ℵ₀) ∧ φ₁^M ∧ · · · ∧ φ_n^M).

At a glance, this looks a lot like the prerequisites for the compactness theorem of first order logic, but it differs somewhat. Let “M ⊨ s” be defined (in ZFC) in the usual way in the spirit of Tarski’s truth definition by recursion over encoded formulas³. For sets S of encoded formulas we use the standard abuse of notation M ⊨ S to mean ∀s ∈ S(M ⊨ s). The set of all encoded axioms of ZFC is denoted by ⌜ZFC⌝. The compactness theorem specialized to the theory ⌜ZFC⌝ can then be stated as the sentence

(∀S ∈ P(⌜ZFC⌝)[S <_c ℵ₀ → ∃M : M ⊨ S]) → (∃M : M ⊨ ⌜ZFC⌝),

where the first parenthesized part is the antecedent and the second is the consequent.

It is provable for transitive models M that M ⊨ ⌜φ⌝ is equivalent to φ^M, so this is only a syntactic difference. In order to deduce the consequent from the compactness theorem we would need to prove the antecedent, that is, show in a single first order proof that all finite subsets of ⌜ZFC⌝ have models. This is in stark contrast to the section goal: for a fixed finite sublist of ZFC we prove that there is a countable transitive model of these axioms, and we can tailor our use of axioms in the proof to suit the required axioms.

This somewhat motivates why the section goal is achievable and why the antecedent of the compactness theorem would be harder, if possible at all, to prove. In fact, the consequent is not provable in ZFC! This follows from Gödel’s second incompleteness theorem, which tells us that ZFC ⊬ ¬Provable_{⌜ZFC⌝}(⌜⊥⌝), and by standard results from mathematical logic⁴, ¬Provable_{⌜ZFC⌝}(⌜⊥⌝) is equivalent to ∃M : M ⊨ ⌜ZFC⌝. But how is this possible? The section goal and the antecedent look so similar. Do they not say the same thing? Not necessarily: If we take a model theoretic approach and assume that we in the metatheory have a model M of ZFC then ω^M might be nonisomorphic to the natural numbers in the metatheory⁵. If this is the case, there are so called nonstandard numbers in ω^M that have no corresponding natural number in the metatheory. This implies that “finite” inside of M is not the same thing as “finite” in the metatheory. Together with the encodings of formulas from the metatheory, the set of encoded formulas in M will contain codes for formulas whose lengths are nonstandard. The encoded proofs will similarly include the encodings of proofs from the metatheory as well as proofs that include nonstandard formulas or are of nonstandard length.

Definition 62. A formula is defined to have the following subformulas:

• x ∈ y and x = y have no subformulas

• ∃x φ and ¬φ have φ as a subformula and

• φ ∧ ψ has φ and ψ as subformulas.

Definition 63. A list φ1, . . . , φn (in the metalogic) is subformula closed if, for any φi, all subformulas of φi are in the list.

The following lemma is not the usual form of the Tarski-Vaught test⁶ but the core idea is the same.

Lemma 64 (Tarski-Vaught Test). Let M, N be transitive classes such that M ⊂ N and let the list φ₁, . . . φ_n be subformula closed. Then the following clauses are equivalent:

(a) For every φ_i,

∀x1, . . . , xn∈ M(φ^M_i (x1, . . . , xn) ↔ φ^N_i (x1, . . . , xn)) is true.

³ An example of an encoding is the Gödel coding of arithmetic formulas down to natural numbers. In the text we refer to some unspecified way of mapping formulas in the metatheory to sets.

⁴ If a theory is consistent it has a model and vice versa.

⁵ This is similar to nonstandard models of Peano arithmetic.

⁶ The usual Tarski-Vaught test can be found in [5].


(b) Whenever φ_i(y₁, . . . , y_m) ≡ ∃x φ_j(x, y₁, . . . , y_m) (with all free variables displayed),

∀y₁, . . . , y_m ∈ M [∃x ∈ N φ_j^N(x, y₁, . . . , y_m) → ∃x ∈ M φ_j^N(x, y₁, . . . , y_m)].

As a special case, if N = V this test gives a criterion for when a formula is absolute. Whenever we refer to this result we will be referring to (b)⇒(a).

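Proof. To show (a)⇒(b), take a φ_i(y₁, . . . , y_m) ≡ ∃x φ_j(x, y₁, . . . , y_m) and apply (a) to φ_i and to its subformula φ_j: if ∃x ∈ N φ_j^N holds then so does φ_i^N, hence φ_i^M, which gives an x ∈ M with φ_j^M and therefore φ_j^N. For (b)⇒(a), we proceed by induction on φ_i, taking as induction hypothesis that (a) is true for every subformula of φ_i. All cases except for the quantifier case are trivial: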

(x ∈ y)^M ⇔ x ∈ y ⇔ (x ∈ y)^N (by definition, for x, y ∈ M)

(x = y)^M ⇔ x = y ⇔ (x = y)^N (by definition, for x, y ∈ M)

(¬ψ)^M ⇔ ¬ψ^M (def)

⇔ ¬ψ^N (IH)

⇔ (¬ψ)^N (def)

(ψ ∧ χ)^M ⇔ ψ^M ∧ χ^M (def)

⇔ ψ^N ∧ χ^N (IH)

⇔ (ψ ∧ χ)^N. (def)

Fix y₁, . . . , y_m ∈ M and assume φ_i ≡ ∃x φ_j(x, y₁, . . . , y_m); then

φ_i^M ⇔ ∃x ∈ M φ_j^M(x, y₁, . . . , y_m) (def)

⇔ ∃x ∈ M φ_j^N(x, y₁, . . . , y_m) (IH)

⇔ ∃x ∈ N φ_j^N(x, y₁, . . . , y_m) (b)

⇔ φ_i^N. (def)

This is all the preparation we need to create the initial model of our finite fragment.

Theorem 65 (The Reflection Theorem). Given any list (in the metatheory) of formulas φ₁, . . . , φ_n and an ordinal α there is a β > α such that φ₁, . . . , φ_n are absolute for R(β).

Proof. We assume that φ₁, . . . , φ_n is subformula closed; if it is not, append the missing formulas to get a longer finite list of formulas. We will use the Tarski-Vaught test on R(β) and V to show the absoluteness of φ₁, . . . , φ_n. For each φ_i ≡ ∃x φ_j(x, y₁, . . . , y_m), define F_i and G_i as

G_i : V^m → Ord, G_i(y₁, . . . , y_m) = γ if γ is the smallest ordinal such that ∃x ∈ R(γ) φ_j(x, y₁, . . . , y_m), and G_i(y₁, . . . , y_m) = 0 if no such γ exists,

F_i : Ord → Ord, F_i(γ) = sup{G_i(y₁, . . . , y_m) | y₁, . . . , y_m ∈ R(γ)}.

For all other φi, let Fi(γ) = 0 and Gi(x1, . . . , xn) = 0 for every γ ∈ Ord and x1, . . . , xn ∈ V. We now recursively define an ω-sequence by

β₀ = α, β_{k+1} = max(β_k + 1, F₁(β_k), . . . , F_n(β_k))

and let β = sup_{k∈ω} β_k. The sequence is strictly increasing so β must be a limit ordinal and R(β) = ⋃_{k∈ω} R(β_k). Assume that for y₁, . . . , y_m ∈ R(β),

φ_i(y₁, . . . , y_m) ≡ ∃x φ_j(x, y₁, . . . , y_m) holds.

For p from 1 to m, let k_p ∈ ω be minimal such that y_p ∈ R(β_{k_p}). Let k = max(k₁, . . . , k_m). By construction of β_{k+1}, there is an x ∈ R(β_{k+1}) ⊂ R(β) such that φ_j(x, y₁, . . . , y_m). This is all that the Tarski-Vaught test requires, so the formulas are absolute for R(β).

Corollary 66. Any finite fragment φ1, . . . , φn of ZFC has a model.

It is worth pointing out that m, n and p in the proof of the reflection theorem live in the metalogic. This is why this argument cannot be extended to all of ZFC: the proposition would be neither expressible nor provable in first order logic. Next we prove that we can cut out a small chunk of the model while preserving the absoluteness of the formulas. The proof uses the same technique as the downward Löwenheim-Skolem theorem⁷ from mathematical logic and is therefore called the same thing.

⁷ For the usual downward Löwenheim-Skolem theorem, see [5].


Theorem 67 (Löwenheim-Skolem Downwards). Given formulas φ₁, . . . , φ_n and a set X there is a set A ⊇ X where |A| ≤ max(ℵ₀, |X|) such that φ₁, . . . , φ_n are absolute for A.

Proof. We once again assume that the list is subformula closed. Apply the reflection theorem to φ₁, . . . , φ_n with α ≥ max(1, rank(X)) to get a β > α. The reason we require α to be positive is that the construction assumes that ∅ ∈ R(β). Then X ⊆ R(α) ⊆ R(β) and the formulas are absolute for R(β). We will then construct a set A such that X ⊆ A ⊆ R(β) and |A| ≤ max(ℵ₀, |X|), where A and R(β) pass the Tarski-Vaught test. It then follows that φ_i(. . . ) ⇔ φ_i^{R(β)}(. . . ) ⇔ φ_i^A(. . . ) and the formulas are thus absolute for A.

Pick a choice function on R(β). For each φ_i ≡ ∃x φ_j(x, y₁, . . . , y_{l_i}) with all free variables shown, where we assume that l_i ≥ 1 (otherwise ∃x (x = x) and other sentences become problematic), define the function H_i : R(β)^{l_i} → R(β) by

H_i(y₁, . . . , y_{l_i}) = x, where x ∈ R(β) is the chosen element such that φ_j^{R(β)}(x, y₁, . . . , y_{l_i}), and H_i(y₁, . . . , y_{l_i}) = ∅ if no such x ∈ R(β) exists.

For the remaining φ_i, let H_i[A_k^{l_i}] denote ∅. Let

A₀ = X, A_{k+1} = H₁[A_k^{l₁}] ∪ · · · ∪ H_n[A_k^{l_n}], A = ⋃_{k∈ω} A_k.

Then all the A_k have (by induction) cardinality at most max(ℵ₀, |X|), so the union has cardinality at most ℵ₀ · max(ℵ₀, |X|) = max(ℵ₀, |X|). Furthermore, since A and R(β) by construction pass the Tarski-Vaught test for φ₁, . . . , φ_n and since these formulas are absolute for R(β), they are also absolute for A.
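The idea of closing a seed set under witness-picking functions can be illustrated on a finite toy structure. The Python sketch below is not the construction of the proof: the name skolem_hull is hypothetical, the universe is the integers modulo 12 instead of R(β), the single witness function is addition modulo 12, and the sketch accumulates every stage into the next one and stops at a fixed point, which is possible only because the toy universe is finite.

```python
from itertools import product


def skolem_hull(seed, witness_functions):
    """Close `seed` under the given (function, arity) pairs.

    Each round applies every function to every tuple from the current hull
    and adds the results, until a round produces nothing new.
    """
    hull = set(seed)
    while True:
        new = {
            h(*args)
            for h, arity in witness_functions
            for args in product(hull, repeat=arity)
        }
        if new <= hull:
            return hull
        hull |= new


def add_mod_12(a, b):
    # toy witness function: the x with x = a + b in the integers modulo 12
    return (a + b) % 12


print(sorted(skolem_hull({4}, [(add_mod_12, 2)])))
```

Starting from {4} the closure is {0, 4, 8}, a small substructure that already contains a witness for every sum of its own elements.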

Definition 68. A relation R is extensional on A (or simply extensional when A is clear from the context) if

∀x, y ∈ A (x = y ↔ ∀z ∈ A(zRx ↔ zRy)).

Definition 69. Given a class A and a set-like, well-founded relation R, the Mostowski collapsing function is defined as

F : A → V, F(a) = {F(a′) | a′Ra}.

We call {F(a) | a ∈ A} the Mostowski collapse of A.

Lemma 70. If R is a set-like, well-founded, extensional relation on A then the Mostowski collapsing function is a bijection from A onto the Mostowski collapse of A, and for a, b ∈ A, aRb ⇔ F(a) ∈ F(b). Further, the Mostowski collapse of A is transitive.

Proof. If a′ ≠ b′ there is by extensionality of R some x ∈ A such that either xRa′ and ¬(xRb′) or ¬(xRa′) and xRb′. Either way, F(x) will be a member of F(a′) or F(b′) but not both; thus F(a′) ≠ F(b′) and F is injective. It is by definition surjective and thus also bijective. That aRb implies F(a) ∈ F(b) follows from the definition. The converse is true by injectivity of F: If F(a) ∈ F(b) there is some a′Rb where F(a) = F(a′) and thus aRb. The Mostowski collapse is transitive by definition: For any x ∈ y ∈ {F(a) | a ∈ A} there are a′, a″ ∈ A such that F(a′) = x, F(a″) = y, hence x ∈ {F(a) | a ∈ A}.
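For a finite set with a well-founded relation the collapsing function of Definition 69 can be computed directly. The following Python sketch is an illustration; the function name mostowski_collapse and the example (the numbers 3, 5, 8 ordered by <) are assumptions made here and do not come from the thesis.

```python
def mostowski_collapse(elements, related):
    """Return a dict a -> F(a) with F(a) = {F(b) | b R a}.

    `related(b, a)` decides b R a. The recursion terminates because the
    relation is assumed to be well-founded on the finite list `elements`.
    """
    memo = {}

    def collapse(a):
        if a not in memo:
            memo[a] = frozenset(collapse(b) for b in elements if related(b, a))
        return memo[a]

    return {a: collapse(a) for a in elements}


F = mostowski_collapse([3, 5, 8], lambda b, a: b < a)
print(F[3], F[5], F[8])
```

The three numbers collapse to ∅, {∅} and {∅, {∅}}, that is, to the ordinals 0, 1 and 2, and b < a holds exactly when F(b) ∈ F(a).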

Proposition 71. If X is a set where ∈_X = {⟨x, y⟩ | x, y ∈ X ∧ x ∈ y} is extensional, the Mostowski collapse M of X will for any formula φ satisfy

∀x₁, . . . , x_n ∈ X (φ^X(x₁, . . . , x_n) ↔ φ^M(F(x₁), . . . , F(x_n))).

Proof. The proof is by induction on formulas with the induction hypothesis that the proposition holds for all subformulas. Let x′ = F(x) and y′ = F(y). It is true for the atomic formulas:

(x ∈ y)^X ≡ x ∈ y ⇔ x′ ∈ y′ ≡ (x′ ∈ y′)^M (lemma 70)

(x = y)^X ≡ x = y ⇔ x′ = y′ ≡ (x′ = y′)^M. (injectivity of F)


Let y₁, . . . , y_n ∈ X and let y⃗ denote y₁, . . . , y_n and z⃗ denote F(y₁), . . . , F(y_n). It then follows that

(ψ(y⃗) ∧ χ(y⃗))^X ≡ ψ^X(y⃗) ∧ χ^X(y⃗)

⇔ ψ^M(z⃗) ∧ χ^M(z⃗) (IH)

≡ (ψ(z⃗) ∧ χ(z⃗))^M

(¬ψ(y⃗))^X ≡ ¬ψ^X(y⃗)

⇔ ¬ψ^M(z⃗) (IH)

≡ (¬ψ(z⃗))^M.

Finally, when the formula is an existential quantification,

(∃x ψ(x, y⃗))^X ≡ ∃x ∈ X ψ^X(x, y⃗)

⇔ ∃x ∈ X ψ^M(F(x), z⃗) (IH)

⇔ ∃x ∈ M ψ^M(x, z⃗)

≡ (∃x ψ(x, z⃗))^M.

It should be noted that ∈_X is not always extensional. Consider the set X = {∅, {{∅}}}:

∅ ∉ ∅, {{∅}} ∉ ∅, ∅ ∉ {{∅}}, {{∅}} ∉ {{∅}}.

If ∈_X were extensional then (∅ = {{∅}})^X would be true.
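The failure of extensionality for X = {∅, {{∅}}} can be confirmed by exhaustive search. The Python sketch below models sets as nested frozensets; the function names are hypothetical, and the comparison set Y = {∅, {∅}} is added here only for contrast.

```python
EMPTY = frozenset()


def is_extensional(elements, related):
    """Check ∀x, y ∈ A (x = y ↔ ∀z ∈ A (z R x ↔ z R y)) by exhaustive search."""
    return all(
        (x == y) == all(related(z, x) == related(z, y) for z in elements)
        for x in elements
        for y in elements
    )


def member(a, b):
    """The relation ∈ between two frozensets."""
    return a in b


X = [EMPTY, frozenset({frozenset({EMPTY})})]  # {∅, {{∅}}}
Y = [EMPTY, frozenset({EMPTY})]               # {∅, {∅}}
print(is_extensional(X, member), is_extensional(Y, member))
```

In X neither element has a member that lies in X, so the two distinct elements cannot be told apart from inside X. In Y the element {∅} has the member ∅, which separates it from ∅.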

We can at last stitch these steps together and get a countable transitive model.

Theorem 72 (Countable Transitive Model Theorem). For any list φ1, . . . , φn of axioms of ZFC there is a countable transitive set M such that φ1, . . . , φn are true in M .

Proof. Start by adding the axiom of extensionality to the list if it is not already present. Apply the Löwenheim-Skolem Downwards theorem with X = ∅. The resulting set A is then countable and our formulas are true in A. The formulas are by the last proposition also true in the Mostowski collapse of A. By the previous lemma the Mostowski collapse is transitive and bijective with A and thus also countable.

An odd consequence of this theorem is that if we add a constant symbol N to our formal first order language then the theory T_N = ZFC + ZFC^N + (N is transitive) + (N ≤_c ℵ₀) is a conservative extension of ZFC. To clarify, ZFC^N means the axioms of ZFC relativized to N and “conservative extension” means that if T_N ⊢ φ and φ does not contain the symbol N then ZFC ⊢ φ. An informal argument for conservativity goes like this: Let p be a proof of φ from T_N and let φ₁^N, . . . , φ_n^N be the assumptions from ZFC^N used in p. The countable transitive model theorem then says there exists a set M satisfying φ₁^M, . . . , φ_n^M that has the needed properties to replace N in the proof p. Every occurrence of φ_i^N can be replaced by a derivation of φ_i^M, and similarly for (N is transitive) and (N ≤_c ℵ₀). With this in mind we will from now on work in the theory T_M. As stated in section 3, Metatheory and Formalism, we will assume that if φ ⊢ ψ then T ⊢ φ^M → ψ^M. A proof of this assumption would be of a proof theoretic nature and is therefore outside the scope of this thesis.

We end the section by proving a few properties of M .

Lemma 73. Let u₁, . . . , u_n ∈ M and let φ(x, u₁, . . . , u_n) be absolute for M. If φ(x, u₁, . . . , u_n) defines a unique x ∈ V and φ^M(y, u₁, . . . , u_n) holds for some y ∈ M then x = y.

Proof. Let x be the unique set satisfying φ(x, u1, . . . , un). If y ∈ M satisfies φ^M(y, u1, . . . , un) then by absoluteness, φ(y, u1, . . . , un) so x = y.

This lemma highlights a mental shortcut in reasoning about objects in M. The implicit reasoning in a lot of places goes as follows: A certain construct C can be made by appealing to ZFC. Some formula φ_C(x, u₁, . . . , u_n) describes the construct uniquely. By reasoning inside M, φ_C^M(x) holds for some x ∈ M. If the formula happens to be absolute for M, φ_C(x) also holds. Examples of this pattern are in order:

Absolute set comprehensions If φ(x) is an absolute formula then by lemma 61, so is z = {w ∈ y | φ(w, x₁, . . . , x_n)}. Thus

(z = {w ∈ y | φ(w, x1, . . . , xn)})^M ⇔ z = {w ∈ y | φ(w, x1, . . . , xn)}

or equivalently, ({w ∈ y | φ(w, x1, . . . , xn)})^M = {w ∈ y | φ(w, x1, . . . , xn)}.


Empty set The empty set is absolute by proposition 60 (the gigantic list of equivalent formulas) so ∅^M = ∅.

Domain and range The domain and range of a binary relation are by the same proposition absolute so dom^M R = dom R and range^M R = range R.

Union Again, our friend proposition 60 tells us that unions are absolute, so (x ∪ y)^M = x ∪ y and (⋃x)^M = ⋃x.

Lemma 74. If X ⊆ M and X has a finite cardinality then X ∈ M.

Proof. The proof is by induction on the cardinality of X.

|X| = 0 : If |X| = 0 then X = ∅ and by lemma 73 and proposition 60, ∅ ∈ M.

|X| = n + 1 : Pick x ∈ X. Then x ∈ M and |X \ {x}| = n. The induction hypothesis yields X \ {x} ∈ M. By lemma 73 and proposition 60, {x} ∈ M and by the same results X = (X \ {x}) ∪ {x} ∈ M.

Some of the functions and relations used, notably the forcing relation (that will be defined later) and the rank function, are defined by recursion on some well-founded relation. The following result gives us a criterion for when such a function or relation is absolute.

Theorem 75. Let A be well-founded by R and F : A × V → V a defined function. If A, R and F are absolute for M , (R is set-like)^M and

∀x ∈ A^M (pred(A, x, R) ⊆ M ) (†)

then the defined function G characterized by

∀x ∈ A [G(x) = F(x, G ↾ pred(A, x, R))]

is absolute for M .

Proof. We start by proving that (R is well-founded)^M. By absoluteness R^M = R ∩ (M × M ). For any nonempty S ∈ M , S ⊆ A^M we see that there is an R-minimal element in S. This is also an R^M-minimal element so (R is well-founded)^M. We can therefore use the recursion theorem (theorem 11) to define G inside of M such that

(∀x ∈ A [G(x) = F(x, G ↾ pred(A, x, R))])^M

⇔ ∀x ∈ A^M [G^M(x) = F^M(x, G^M ↾ pred(A^M, x, R^M))].

We note that pred(A, x, R) = pred(A^M, x, R^M) by (†) and by absoluteness of A and R in M. Assume there is an R-minimal x ∈ A^M such that G(x) ≠ G^M(x). Then G(y) = G^M(y) for all yRx so G ↾ pred(A, x, R) = G^M ↾ pred(A, x, R) and we get the contradiction

G(x) = F(x, G ↾ pred(A, x, R)) = F^M(x, G^M ↾ pred(A^M, x, R^M)) = G^M(x).

Lemma 76. The rank function is absolute for M .

Proof. First of all, rank(x) = sup{rank(y) + 1 | y ∈ x}. The well-founded relation is ∈, which is set-like in M since M is transitive. The domain A = V is clearly absolute. The function F uses supremum (i.e. union), successor and function application, all shown to be absolute. Therefore according to theorem 75, rank is absolute for M.
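The recursion rank(x) = sup{rank(y) + 1 | y ∈ x} is easy to run on hereditarily finite sets, where the supremum of a finite set of natural numbers is its maximum. The Python sketch below is an illustration that assumes sets are modelled as nested frozensets.

```python
def rank(x):
    """rank(x) = sup{rank(y) + 1 | y ∈ x}; the supremum of the empty set is 0."""
    return max((rank(y) + 1 for y in x), default=0)


EMPTY = frozenset()
one = frozenset({EMPTY})                     # {∅}
x = frozenset({EMPTY, frozenset({one})})     # {∅, {{∅}}}
print(rank(EMPTY), rank(one), rank(x))
```

The computation only descends into members of x, in the same way that the recursion in theorem 75 only consults pred(A, x, R).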