
INDEPENDENT PROJECTS IN MATHEMATICS

DEPARTMENT OF MATHEMATICS, STOCKHOLM UNIVERSITY

Character Recognition Using Persistent Homology

by

Martin Strandgren

2017 - No 38


Character Recognition Using Persistent Homology

Martin Strandgren

Independent project in mathematics, 15 higher education credits, first cycle. Supervisor: Olof Bergvall


Final thesis

Character Recognition Using Persistent Homology

by

Martin Strandgren


Abstract

We investigate the potential of using persistent homology, based on curvature-filtered tangent complexes, to classify hand-written letters for recognition.

Attempting to replicate results from similar studies on a data set of diverse hand-written letters, we employ a variety of pre-processing methods to arrive at a barcode representation. Through a metric defined on the barcodes, we can then cluster and compare different letters. Unfortunately, with the methods investigated, no combination of parameters and alternative processing steps results in robust classification, or even clustering, of the hand-written letters in our data set. We provide a full theoretical background as well as openly available implementations of all algorithms tested.


Contents

1 Theoretical Background
  1.1 Metric Spaces
  1.2 Topological Spaces
    1.2.1 Homeomorphisms
  1.3 Simplicial Complexes and Triangulation
  1.4 Homology
    1.4.1 Chains
    1.4.2 Cycles and Boundaries
    1.4.3 The Homology Group
  1.5 Persistent Homology
    1.5.1 Filtered Simplicial Complex
    1.5.2 Barcode Descriptors

2 Application
  2.1 Problem
  2.2 The Filtered Tangent Complex
    2.2.1 Calculating Tangents
    2.2.2 Metric Tangent Space
    2.2.3 Calculating Curvature
  2.3 Triangulation
    2.3.1 Čech Complex
    2.3.2 Rips Complex
    2.3.3 Delaunay Complex
    2.3.4 α-Complex
    2.3.5 Witness Complex
  2.4 Computing Barcodes
  2.5 Barcode Distances
    2.5.1 Calculation

3 Results
  3.1 Pre-processing
  3.2 Tangent and Curve Estimation
  3.3 Downsampling
  3.4 Triangulation
  3.5 Barcode Distance

4 Discussion


Chapter 1

Theoretical Background

As long as we're working with real numbers in two or three dimensions, we can create visualizations, enabling quick intuitive understanding of abstract mathematical concepts. This has led to a prolific invention of concepts and tools that mathematicians are keen on using in other circumstances that lend themselves less readily to intuition.

We'll take a closer look at the particularly useful concept of continuous functions. One fairly intuitive definition is that as a point x approaches a point c, the function value f(x) should approach f(c):

$$\lim_{x \to c} f(x) = f(c)$$

More broadly, this means that points that are near one another in the domain of f correspond to points that are near one another in the range of f. I.e., if x is near c, then f(x) is near f(c). So as long as we can define what we mean by near, continuity is a piece of cake.

1.1 Metric Spaces

One colloquially accepted meaning of near is that x is near c if the distance between them is small. It turns out this works well mathematically; as long as we have a way to measure distance, continuity is again a piece of cake.

Definition 1. Take X to be a set, and let the distance between two points x1, x2 ∈ X be given by a function d : X × X → R. Then f : X → X is continuous if for any ε > 0 there is a corresponding δ > 0 so that

d(x1, x2) < δ ⇒ d(f(x1), f(x2)) < ε

This works just as well if the range of f is different from its domain:

(14)

CHAPTER 1. THEORETICAL BACKGROUND

Definition 2. If X is a space with distance function dX(x1, x2) and Y is a space with distance function dY(y1, y2), then f : X → Y is continuous if for any ε > 0 there is a corresponding δ > 0 so that

dX(x1, x2) < δ ⇒ dY(f(x1), f(x2)) < ε

So from being confined to Rn, we are now free to work with any space (or set) that has a defined distance measure. The distance measure can be any function that fulfills the following rules:

• d(x1, x2) ≥ 0, and d(x1, x2) = 0 if and only if x1 = x2

• d(x1, x2) = d(x2, x1) (symmetry)

• d(x1, x2) + d(x2, x3) ≥ d(x1, x3) (triangle inequality)

Then we call d : X × X → R a metric on X, and X a metric space.
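As a standard example (not specific to this thesis), the euclidean distance on Rn satisfies all three rules, the triangle inequality following from the Minkowski inequality:

$$d(x_1, x_2) = \|x_1 - x_2\| = \sqrt{\sum_{i=1}^{n} (x_{1,i} - x_{2,i})^2}$$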

1.2 Topological Spaces

What about spaces that don’t come with a concept of distance? As long as we can agree on what near means, there’s no reason to limit ourselves.

Let’s rewrite our definition of continuity above in terms of open balls.

dX(x1, x2) < δ ⇒ x2 ∈ BX(x1, δ)

where BX(x1, δ) is the open ball around x1 with radius δ:

BX(x1, δ) = {x ∈ X ; dX(x1, x) < δ}

Then, with the same X and Y as in section 1.1, f : X → Y is continuous if for any ε > 0 there is a corresponding δ > 0 so that

x2 ∈ BX(x1, δ) ⇒ f(x2) ∈ BY(f(x1), ε)

This still requires a metric, so to go beyond we have to rethink what an open ball is. An open ball in X is an open subset of X with some restrictions related to distance. If we do away with the notion of distance, the closest thing to an open ball is an open neighborhood of x1 ∈ X, defined simply as an open subset of X containing x1. Then f : X → Y is continuous if for any open neighborhood V of f(x1) in Y there is a corresponding open neighborhood U of x1 in X, so that x ∈ U implies that f(x) ∈ V.

So continuity means points (or elements) in the domain that are near one another correspond to points (or elements) near one another in the range, if we define near to mean "in the same open subset". We can rewrite this definition more simply:

Definition 3. f : X → Y is continuous if whenever V ⊂ Y is open in Y, then f−1(V) is open in X.


Or even more succinctly:

Definition 4. f : X → Y is continuous if the preimage of an open set under f is an open set.

Now we only need to define what an open subset is. Much like with a metric, we can define openness however we want, so long as our definition behaves similarly to our intuitive definition of openness in the less general metric spaces. More strictly, in a set X we can define a family of open subsets T so that

• The empty set and X itself are in T

• Any union of sets in T is again in T

• Any finite intersection of sets in T is again in T

It’s trivial to see that open sets in a metric space abide by these rules. We call our definition of open sets in X a topology T on X, and X a topological space.

1.2.1 Homeomorphisms

The point of topology as a field of study is to investigate properties that are preserved under continuous deformation. So from the perspective of topology, spaces are "equal" if they are continuous deformations of each other. We say that the topological spaces are homeomorphic.

Definition 5. Topological spaces X and Y are homeomorphic if there exists a map f : X → Y that is continuous, invertible, and whose inverse is also continuous. Then we call f a homeomorphism, and write X ≅ Y.

Remembering our definition of continuity from definition 2, a less rigorous wording is that f is a 1–1 map that preserves openness. Since a homeomorphism is invertible, and any composition of homeomorphisms is trivially also a homeomorphism, this is indeed an equivalence relation that can be used to classify spaces in useful ways. In this sense, a square is "equal" to a circle, a cube to a sphere, and a donut to a coffee cup.

1.3 Simplicial Complexes and Triangulation

An interesting application is that if we find ourselves with some complicated-looking set X that doesn't lend itself well to some type of calculations, we can try and find a simpler set Y ≅ X and perform our calculations there instead. In euclidean space of one to three dimensions we can imagine the simplest sets possible:

• 0 dimensions: a point


• 1 dimension: a closed line segment

• 2 dimensions: a triangle

• 3 dimensions: a tetrahedron (solid)

We call such a simple set an n-simplex (for n dimensions). Generally, an n-simplex is the convex hull of n + 1 points in Rn+1 that are in general linear position (i.e. affinely independent, so that the hull does not collapse into a lower dimension).

Each of the n + 1 points is called a vertex, and the convex hull of any subset of those vertices is called a face. Colloquially we'd probably say that the faces of a tetrahedron are triangles, but mathematically the edges of the tetrahedron are also faces by definition.

We can build more intricate shapes by joining together simplices into a simplicial complex:

Definition 6. A simplicial complex Σ is any finite union of simplices (possibly of varying dimension) where:

• Any face of a simplex in Σ is also in Σ

• Any non-empty intersection of simplices in Σ is also a simplex in Σ

So for a complicated-looking topological space X, we want to find a simplicial complex K ∼= X to perform our calculations on. If this is possible, we call X triangulable, and the process of constructing K triangulation [1].

Most shapes that we encounter in euclidean space are triangulable, e.g. circles and squares are homeomorphic to triangles, a sphere is homeomorphic to the boundary of a tetrahedron, and with a little creativity we can find a triangulation for our donut/coffee cup (for an example, see figure 1.1).

1.4 Homology

1.4.1 Chains

We want to classify our triangulated spaces in a way reminiscent of homeomorphisms, but that is easier to calculate. For a simplicial complex K we can start combining simplices in K into chains, by creating formal linear combinations:

$$c = \sum_{\sigma \in K} a_\sigma \sigma$$

where c is called a d-chain of K, σ ranges over the d-simplices in K and aσ are coefficients. For our purposes, it works well to take aσ ∈ {0, 1}, i.e. a simplex is either included in the chain or it's not (no halfsies). For a triangulated torus (donut shape), a 2-chain is a combination of triangles (a "patch", if you will), while a 1-chain is a combination of edges (similar to a path, though chains needn't be contiguous, somewhat breaking the intuitive examples). See figure 1.1 for examples.

Keeping our aσ ∈ {0, 1}, we can rewrite the linear combination as a vector product nd · Kd, where nd = (aσ1, aσ2, . . .) is the vector of {0, 1} coefficients and Kd = (σ1, σ2, . . .) is a vector of all d-simplices in K. We can then define addition of chains as entry-wise mod 2 addition of the nd vectors, and define a chain group Cd(K) of all the d-chains under this operation.
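As a small illustration (our own sketch, not code from the thesis repository [8]), this mod 2 addition is simply an entry-wise XOR of the coefficient vectors:

import numpy as np

# Five d-simplices, indexed 0..4; a chain is a 0/1 coefficient vector.
c1 = np.array([1, 1, 0, 0, 1], dtype=np.uint8)  # chain containing simplices 0, 1, 4
c2 = np.array([0, 1, 1, 0, 1], dtype=np.uint8)  # chain containing simplices 1, 2, 4

# Entry-wise mod 2 addition: shared simplices cancel (1 + 1 = 0).
c_sum = (c1 + c2) % 2                            # -> [1, 0, 1, 0, 0]
assert np.array_equal(c_sum, np.bitwise_xor(c1, c2))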

1.4.2 Cycles and Boundaries

The boundary of a d-chain is the mod 2 sum of the (d − 1)-faces of all the d-simplices in the chain, and the boundary operator is written ∂d : Cd(K) → Cd−1(K). A simple example is a 2-chain in our triangulated torus. The chain is a set of triangles, and the boundary is the sum of all the edges of those triangles. If two triangles are adjacent, i.e. share an edge, that edge is excluded from the boundary (1 + 1 = 0). Consider D in figure 1.1: a combination of three adjacent triangles (2-simplices), with its boundary formed by the non-shared edges.

A d-cycle is a d-chain that has no boundary, i.e. the set of d-cycles is the kernel of the boundary operator ∂d. In the triangulated torus in figure 1.1 a 2-cycle would be the entire surface of the torus, and a 1-cycle would be a collection of edges stitched together in a loop with no loose ends, e.g. C, E, F or H. A vertex has no boundary, which means all 0-chains are cycles.

It's easy to see that in a d-cycle, each (d − 1)-simplex occurs an even number of times. In the case of 1-cycles, i.e. combinations of edges, the endpoint vertices (0-simplices) of each edge coincide with the endpoints of other edges in the chain. If a vertex occurred an odd number of times, it would form a boundary! Though it's harder to visualize, it follows from the mod 2 addition that this holds for higher dimensions as well.

1.4.3 The Homology Group

We call the group of d-boundaries Bd(K) and the group of d-cycles Zd(K), both subgroups of Cd(K). We can also see that Bd(K) is a subgroup of Zd(K) (all boundaries are cycles, but not all cycles are boundaries). The quotient group of cycles modulo boundaries is called the homology group:

$$H_d(K) = Z_d(K)/B_d(K)$$

Two d-cycles in K correspond to the same element in the d-th homology group Hd(K) if the difference between them is a boundary. In figure 1.1, the two 1-cycles F and H together form the boundary of the cylinder segment G between them, and their corresponding elements in the homology group H1(K) are thus equal, or homologous. The dimension of the d-th homology group, equal to the number of d-dimensional holes in K, is called the d-th Betti number:

$$\beta_d = \dim(H_d(K))$$
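For the torus in figure 1.1 this gives the well-known values (a standard computation, stated here without proof):

$$\beta_0 = 1, \qquad \beta_1 = 2, \qquad \beta_2 = 1$$

that is, one connected component, two independent 1-cycle classes (the two essentially different ways to loop around the torus), and one enclosed void.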


Figure 1.1: A triangulated torus (donut) shape with examples of chains, cycles and boundaries: A) a vertex, or 0-simplex, B) a 1-chain, C) a 1-cycle, D) a 2-chain and its 1-boundary, E) a 1-cycle that is not a boundary, F) a 1-cycle that is homologous to E, G) a 2-chain bounded by {F, H}

1.5 Persistent Homology

When it comes to applications in computer graphics and geometric modeling, topology and homology are often too blunt as tools to be particularly useful on their own. Shapes that are intuitively quite different have equivalent topologies. We can increase the precision by parameterizing our topological space with a filter function [6]. Let X be a topological space, and f : X → R be a filter function. We call Xt = f−1((−∞, t]) sublevel sets, and we look at how the structure of Xt changes as t goes from −∞ to ∞.

1.5.1 Filtered Simplicial Complex

In the case of homology, recall that a simplicial complex K is a set of simplices σ so that if σ ∈ K and τ is a face of σ, then τ ∈ K. A filtration on K is a nested sequence ∅ = K0 ⊆ K1 ⊆ K2 ⊆ . . . ⊆ Km = K. We then call K a filtered complex. In practice, we define a filter function f : K → N, so that Ki consists of all simplices σ for which f(σ) ≤ i. We can think of f(σ) as the birth time of the simplex σ.


If our filtration parameter comes from an uncountable set, e.g. t ∈ R, we can define a filter function f : K → R. A sublevel complex Kt then consists of all simplices σ such that f(σ) ≤ t, and we note that K−∞ = ∅ and Kt1 ⊆ Kt2 for every t1 ≤ t2.

As an example, we consider an ellipse in R2, which we triangulate and filter on the curvature κ at each vertex. The filtered complex Kκ thus consists of all vertices with a curvature less than κ, and all edges bounded by those vertices. In figure 1.2 we clearly see how simplices with lower curvature appear first.

Figure 1.2: Filtration of an ellipse based on curvature κ. Note that each plot shows Kκ for a different κ, but the steps are too small for the precision in the plot titles.

1.5.2 Barcode Descriptors

Recall from section 1.4.3 that the d-th homology group Hd is the d-th cycle group modulo the d-th boundary group, and the Betti number βd is the rank of Hd. We consider a 2D filtered complex similar to the ellipse in figure 1.2, where we only have vertices and edges and no higher-dimensional simplices. Then

• $\beta_0^\kappa$, the 0-th Betti number of Kκ, is the number of connected components of Kκ


• $\beta_1^\kappa$ is the number of 1-cycles in Kκ (as we have no 2-simplices, all 1-cycles are distinct classes in H1)

To see what happens with $\beta_0^\kappa$ and $\beta_1^\kappa$ as κ grows, we can construct a barcode descriptor of our filtered complex. 0-bars are constructed as follows:

• identify each connected component of Kκ by its oldest vertex

• represent each connected component by an ordered interval P = {i, j}, where i is the birth time of the component's oldest vertex, and j is the birth time of the edge that connects the component to another component, thus removing that component from the homology group

1-bars are identified by the edge that closes a component into a cycle; the start of the interval is the birth time of that edge. As we don't have any higher-dimensional simplices in our example, all 1-bars have infinite length. If we had 2-simplices, a 1-bar would end when a 2-simplex is born that makes the cycle into a boundary.
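This 0-bar construction is exactly what a union-find pass over the filtration computes. The sketch below is our own minimal illustration (the function name and data layout are hypothetical, not from the thesis repository [8]), applying the rule that the older component survives each merge:

def zero_bars(vertex_birth, edges, edge_birth):
    """vertex_birth[v]: birth time of vertex v; edges[i] = (u, v) with birth
    time edge_birth[i]. Returns a list of (birth, death) intervals for H_0."""
    parent = {v: v for v in vertex_birth}

    def find(v):                        # root of v's component (with path halving)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    bars = []
    for (u, v), t in sorted(zip(edges, edge_birth), key=lambda e: e[1]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                    # the edge closes a 1-cycle instead
        if vertex_birth[ru] <= vertex_birth[rv]:
            old, young = ru, rv         # the older component survives the merge
        else:
            old, young = rv, ru
        bars.append((vertex_birth[young], t))   # the younger component dies at t
        parent[young] = old
    roots = {find(v) for v in vertex_birth}     # components that never merge
    bars += [(vertex_birth[r], float("inf")) for r in roots]
    return bars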

Compare the barcode for our filtered ellipse in figure 1.3 to its filtration in figure 1.2. 0-dimensional bars are bounded by square markers, 1-dimensional by round markers. The lowermost bar is the component on the left side, which never exits the homology group. The second bar is the component on the right, which is eventually connected to the first curve and removed from the homology group. Most 0-bars are of zero length, i.e. a vertex that is born and immediately connected to one of the existing connected components. The topmost bar is the 1-cycle created when the ellipse is completed.

Figure 1.3: Barcode descriptor of the ellipse filtered on curvature κ (cf. the filtration in figure 1.2)


Chapter 2

Application

2.1 Problem

We apply persistent homology to the problem of character recognition. Given a hand-written letter, we want to describe its persistent topological features in order to identify which character it represents. Optical character recognition (OCR) is a mature field, and the purpose is not to try and outperform leading methods, but to indicate possible future applications for topology and homology.

2.2 The Filtered Tangent Complex

Firstly, we need to create a simplicial complex representation of our letter. Collins et al. [4] suggest a filtered tangent complex.

Definition 7. Let X be a curve in R2. We define T0(X) ⊆ X × S1 to be the set of the tangents at all points of X, that is,

$$T^0(X) = \left\{ (x, \zeta) \;\middle|\; \lim_{t \to 0} \frac{d(x + t\zeta, X)}{t} = 0 \right\} \tag{2.1}$$

A point (x, ζ) is thus a tangent vector at the point x, in the direction ζ ∈ S1. The tangent complex is the closure of T0: $T(X) = \overline{T^0(X)} \subseteq X \times S^1$.

We filter the tangent complex using the curvature κ, i.e. $T^0_\kappa$ is the set of points (x, ζ) ∈ T0(X) where the curvature κ(x) is less than κ. The filtered tangent complex Tfilt(X) is the parametrized family of spaces $\{T_\kappa(X)\}_{\kappa \geq 0}$, where $T_\kappa(X) = \overline{T^0_\kappa(X)}$.

2.2.1 Calculating Tangents

From a point cloud representation of the curve X, we use the approach from [4] and approximate T(X) by calculating the tangent at each point using a total least squares (TLS) fit, which minimizes the perpendicular distances of the tangent line to the point's k nearest neighbors. I.e., we want to find the hyperplane with normal vector n that minimizes

$$\sum_{i=1}^{k} \left( (x_i - x_0) \cdot n \right)^2$$

where xi is a neighbor point and $x_0 = \frac{1}{k} \sum_{i=1}^{k} x_i$ is the center point of all neighbors.

We create a matrix M where row i is (xi − x0) and rewrite the expression to minimize as |Mn|2. Then the eigenvector corresponding to the smallest eigenvalue of the covariance matrix MᵀM is the normal to the hyperplane. Our tangent vector is thus the eigenvector corresponding to the larger eigenvalue. TLS works better than ordinary least squares fitting here as it's independent of the parametrization of the points [4].

The appropriate number k of nearest neighbors to consider depends on sample density and noisiness of the underlying data. We have set this value empirically, and have not considered possible automatic ways to calculate an optimal value.
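A minimal numpy sketch of the TLS fit described above (our own illustration; the function name is ours): the tangent at each point is the eigenvector for the largest eigenvalue of the local covariance matrix of its k nearest neighbors.

import numpy as np
from scipy.spatial import cKDTree

def estimate_tangents(points, k=8):
    """points: (N, 2) array. Returns (N, 2) unit tangent vectors, one per point."""
    tree = cKDTree(points)
    tangents = np.empty_like(points)
    for i, p in enumerate(points):
        _, idx = tree.query(p, k=k)        # indices of the k nearest neighbors
        nbrs = points[idx]
        M = nbrs - nbrs.mean(axis=0)       # center the neighborhood
        # Eigenvectors of the covariance matrix M^T M: the smallest eigenvalue
        # gives the normal, the largest gives the tangent.
        w, v = np.linalg.eigh(M.T @ M)     # eigh returns eigenvalues in ascending order
        tangents[i] = v[:, -1]
    return tangents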

In figure 2.1 we have estimated the tangents for the point cloud representation of the skeleton of a hand-written P.

Figure 2.1: Tangents in each point of a sampled P.

2.2.2 Metric Tangent Space

For our application we also need a metric on our tangent space T(X). Collins et al. [4] define the squared distance between two points τ = (x, ζ) and τ′ = (x′, ζ′) to be:

$$d^2(\tau, \tau') = \sum_{i=1}^{n} (x_i - x'_i)^2 + \omega^2 \sum_{i=1}^{n} (\zeta_i - \zeta'_i)^2 \tag{2.2}$$

where ω is a scaling factor. For the difference ζi − ζ′i we use the chord length rather than the arc length, a decent approximation that makes calculations vastly simpler.

One problem with this approach is that tangent vectors pointing in opposite directions will yield a large difference. With our tangent estimation method from section 2.2.1, it makes sense to consider tangent vectors pointing in opposite directions as equal.

Provided our point cloud data is two-dimensional, we can use an alternative distance measure to take this into consideration:

Definition 8. We want the squared distance between two points τ = (x, ζ) and τ′ = (x′, ζ′). Set

$$\tau_e = (x_1, \; x_2, \; \omega \cos 2\varphi, \; \omega \sin 2\varphi)^T$$

where ϕ = arg(ζ). Then we simply use the euclidean norm, so that

$$d^2(\tau, \tau') = |\tau_e - \tau'_e|^2 \tag{2.3}$$

Note that aside from the double angle trick, expression 2.3 is equal to 2.2. We can thus view our approximation of T(X) as a set of points in R4, which will make life easy when exploring different triangulation algorithms later. We refer to this representation as our 4D representation, an expansion of our 2D point cloud data.
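The 4D embedding of definition 8 is direct to implement; the sketch below is our own (with ω as a free parameter). The double angle 2ϕ is what identifies opposite tangent directions, after which plain euclidean distance in R4 reproduces expression 2.3.

import numpy as np

def embed_4d(points, tangents, omega=0.3):
    """points: (N, 2), tangents: (N, 2) unit vectors. Returns the (N, 4)
    representation (x1, x2, omega*cos(2*phi), omega*sin(2*phi))."""
    phi = np.arctan2(tangents[:, 1], tangents[:, 0])   # phi = arg(zeta)
    return np.column_stack([points,
                            omega * np.cos(2 * phi),
                            omega * np.sin(2 * phi)])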

2.2.3 Calculating Curvature

To estimate the curvature at a point x, we again sample the k nearest neighbors (though the value for k needn’t be the same as in section 2.2.1).

Definition 9. The curvature κ(x, ζ) at a point x in direction ζ is

$$\kappa(x, \zeta) = \frac{1}{\rho(x, \zeta)} \tag{2.4}$$

where ρ(x, ζ) is the radius of the osculating circle to X at x in direction ζ.

Instead of calculating the radius of the osculating circle, we can consider the osculating parabola. Two curves y = f(x) and y = g(x) in the plane have second order contact at x0 iff f(x0) = g(x0), f′(x0) = g′(x0) and f″(x0) = g″(x0). Placing x0 at the origin, the osculating circle with radius ρ has the equation $x^2 + (y - \rho)^2 = \rho^2$, so at the origin y = 0, y′ = 0 and y″ = 1/ρ. The parabola $y = x^2/2\rho$ has second order contact with the osculating circle, and thus also with X. In figure 2.2 we see the osculating circle and parabola for a curve X.

So we need to find a parabola centered at x0 that fits the neighbors of x0. We do this by transforming the neighbor points so that the origin is at x0 and the x-axis is parallel to the tangent at x0. From there we fit a second-degree polynomial using ordinary least squares (see [4]).

Figure 2.2: The osculating circle and parabola to X at x (dashed). The circle has center (0, ρ), the parabola has focus (0, ρ/2) and the curvature κ of X at x is 1/ρ.

We show the curvature of the letters P, U and V in figure 2.3. Note the high curvature estimates at the left tip of the U: the estimate is quite correct, but not a property you would expect from a typical U, indicating how sensitive the algorithm is to variations in actual handwriting.

Figure 2.3: Curvature κ in each point of sampled letters. Curvature goes from dark blue (low curvature) to bright yellow (high curvature).
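The parabola fit can be sketched as follows (our own illustration, assuming tangents have already been estimated as in section 2.2.1): rotate each neighborhood into the tangent frame, fit y = cx² by least squares, and read off κ = 1/ρ = 2|c| from y = x²/2ρ.

import numpy as np

def estimate_curvature(points, tangents, neighbor_indices):
    """neighbor_indices[i]: indices of the k nearest neighbors of point i.
    Returns an (N,) array of curvature estimates."""
    kappa = np.empty(len(points))
    for i, idx in enumerate(neighbor_indices):
        t = tangents[i]
        n = np.array([-t[1], t[0]])            # normal to the tangent
        local = points[idx] - points[i]        # translate x0 to the origin
        u, v = local @ t, local @ n            # coordinates in the tangent frame
        # Least squares fit of v = c * u^2; then kappa = 1/rho = 2|c|.
        c = np.sum(u**2 * v) / np.sum(u**4)
        kappa[i] = 2 * abs(c)
    return kappa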


2.3 Triangulation

Triangulating our shape, i.e. connecting the vertices by edges, is a crucial step in the algorithm. A bad triangulation will either fail to connect all vertices, or create additional 1-cycles. Both cases will create additional infinite-length bars in the barcode, which will make any classification impossible. We look at a few common triangulation methods to see which could be suitable for our application.

Collins et al. [4] applied triangulation to the tangent space and achieved good results. With a less well-behaved dataset, however, noisy tangent estimates can make the result of such triangulation unpredictable. We have therefore tested triangulation both on our 4D representation (including tangent data, as described in section 2.2.2), and on the original 2D point cloud.

2.3.1 Čech Complex

Take a set of points X in some metric space and a real number ε > 0. For each subset S ⊆ X of points, form an (ε/2)-ball around each point, and include S as a simplex if there is a point contained in all the balls with centers in S.

The Čech complex Cε is the union of all such simplices S in X, or formally:

$$C_\varepsilon(X) = \left\{ \mathrm{conv}\, S \;\middle|\; S \subseteq X, \; \bigcap_{x \in S} B_{\varepsilon/2}(x) \neq \emptyset \right\}$$

We see that any subset S′ ⊆ S of a simplex S is also a simplex, satisfying the requirements for a simplicial complex. Unfortunately, the Čech complex is quite expensive to compute. We may also end up with high-dimensional simplices even for a two-dimensional point set: if four balls have a common intersection in two dimensions, the Čech complex will include a three-dimensional simplex.

Because of the computational complexity and the resulting high-dimensional simplices, we skipped the Čech complex in our application and focused on the similar Rips complex, described below.

2.3.2 Rips Complex

The Rips complex (or Vietoris–Rips complex) is a cheaper-to-compute approximation of the Čech complex. Again we form (ε/2)-balls around the points in X, but we only look at pair-wise intersecting balls to find our 1-simplices, and add higher-dimension simplices when all of their lower sub-simplices are present:

$$R_\varepsilon(X) = \{ \mathrm{conv}\, S \mid S \subseteq X, \; d(s, t) \leq \varepsilon \;\; \forall s, t \in S \}$$
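A brute-force sketch of the Rips construction up to dimension 2 (our own code, meant only to illustrate the definition; it enumerates all pairs and triples, so it is far from the optimized graph-based algorithms mentioned below):

import numpy as np
from scipy.spatial.distance import pdist, squareform
from itertools import combinations

def rips_complex(points, eps):
    """Returns (edges, triangles) of the Vietoris-Rips complex at scale eps."""
    D = squareform(pdist(points))          # pairwise distance matrix
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2) if D[i, j] <= eps]
    edge_set = set(edges)
    # A 2-simplex is included as soon as all three of its edges are present.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_set and (i, k) in edge_set and (j, k) in edge_set]
    return edges, triangles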


Figure 2.4: Difference between the Čech and Rips complexes for 2D point cloud data: (a) Čech complex: a 2-simplex is formed as all three ε-balls overlap; (b) Čech complex: no 2-simplex, as the ε-balls overlap only pair-wise; (c) Rips complex: a 2-simplex is formed as soon as its boundary edges are present, regardless of overlap.

Limiting ourselves to two dimensions and simplices of dimension ≤ 1, the Rips complex is equivalent to the Čech complex. The difference between the Čech complex and the Rips complex becomes obvious with multiple balls intersecting, as shown in figure 2.4.

The Rips complex is also quite expensive to compute, but as it can be represented as a graph, there are many optimized algorithms available, which makes it popular to use in practice.

The Rips complex gives intuitive triangulations for both 4D and 2D representations, as can be seen in figure 2.5. Performance depends quite heavily on the sampling of the underlying shape as well as the parameter settings. Rips triangulation also tends to create a lot of extra connections and 1-cycles. This is not a big issue as long as we only care about β0, i.e. the number of connected components, but it does make subsequent calculations more expensive.

2.3.3 Delaunay Complex

Given our point set X, we define the Voronoi cell Vp of a point p as follows:

$$V_p = \{ x \in \mathbb{R}^n \mid d(x, p) \leq d(x, q) \;\; \forall q \in X \}$$

We then define the Delaunay triangulation of X to be (isomorphic to) the nerve of the collection of Voronoi cells:

$$\mathrm{Del}(X) = \left\{ \sigma \subseteq X \;\middle|\; \bigcap_{p \in \sigma} V_p \neq \emptyset \right\}$$

That is, the Delaunay triangulation connects all vertices whose Voronoi cells share a common face, as seen in figure 2.6. A set of vertices σ ⊆ X forms a simplex in Del(X) iff these vertices all lie on a common (d−1)-sphere in Rd. If all vertices are in general position, i.e. no d + 2 Voronoi cells have a non-empty common intersection, the Delaunay triangulation is a simplicial complex.

Figure 2.5: Rips triangulation of a P. When applied to the 4D representation (left), we get two distinct components; the intersections don't connect due to the large difference in tangent angle (as expected and intended). When applied only to the 2D vertices (right), we get one component, and the intersections are full of "extra" edges.

Compared to the Čech and Rips complexes, the Delaunay complex does not factor in the distance between the vertices. In the case of hand-written letters, Delaunay triangulation is largely useless; we include it mainly as a theoretical introduction to the more promising α-complex. In figure 2.7 we have applied Delaunay triangulation to our sampled P.

Figure 2.6: (a) Voronoi cells, (b) Delaunay complex, (c) α-complex. The Delaunay and α-complexes are both based on Voronoi cells, but the α-complex only connects vertices sufficiently close to one another, resulting in fewer simplices.


Figure 2.7: Delaunay triangulation of a P. The Delaunay complex connects vertices regardless of distance, which makes it quite useless for this application. We include it as a theoretical introduction to the α-complex.
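In practice the Delaunay triangulation need not be derived from the Voronoi diagram by hand; for instance (a usage sketch, not the thesis implementation), SciPy computes it directly:

import numpy as np
from scipy.spatial import Delaunay

points = np.random.rand(30, 2)          # a toy 2D point cloud
tri = Delaunay(points)
print(tri.simplices)                    # (M, 3) array of vertex indices, one row per triangle
# The edges of the complex are the vertex pairs within each triangle.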

2.3.4 α-Complex

To remedy this shortcoming, the α-complex is simply a parametrized Delaunay triangulation where only points sufficiently close to one another are considered for pairing. More formally, for a point p we form a ball Bε(p) of radius ε/2 around p and intersect it with p's Voronoi cell Vp to form Rε(p) = Bε(p) ∩ Vp. We see that these sets are convex, and that the union of these sets equals the union of the balls ($\bigcup_p R_\varepsilon(p) = \bigcup_p B_\varepsilon(p)$). The α-complex is defined as the nerve of these sets:

$$\alpha(X, \varepsilon) = \left\{ \sigma \subseteq X \;\middle|\; \bigcap_{p \in \sigma} R_\varepsilon(p) \neq \emptyset \right\}$$

See figure 2.6 for an illustration.

When applying α triangulation to our sampled P, we get fairly intuitive triangulations in both the 2D and 4D representations (see figure 2.8). Compared to the Rips complex, it creates fewer extra simplices, and seems to work better with a slightly sparser sampling. It's still sensitive to the radius parameter, however, and it's difficult to find a parameter setting that works for all letters.

2.3.5 Witness Complex

The witness complex, suggested by de Silva & Carlsson [5], is a triangulation suitable for dense point sets. A subset L ⊂ X of landmark points is chosen as the vertex set, and we use the remaining points to find higher-dimension simplices.

Figure 2.8: α triangulation of a P. The α-complex performs similarly to the Rips complex, but is more reliable with sparser sampling.

Definition 10. Let D be an n × N distance matrix between a set of n landmarks and N data points. We define the strict witness complex W(D), with vertex set {1, 2, . . . , n}, as follows:

• The edge σ = (a, b) belongs to W(D) iff there exists a data point 1 ≤ i ≤ N such that D(a, i) and D(b, i) are the smallest two entries in the i-th column of D, in some order.

• By induction on p: suppose all the faces of the p-simplex σ = (a0, a1, . . . , ap) belong to W(D). Then σ itself belongs to W(D) iff there exists a data point 1 ≤ i ≤ N such that D(a0, i), D(a1, i), . . . , D(ap, i) are the smallest p + 1 entries in the i-th column of D, in some order.

We refer to i as a witness to the existence of σ. See figure 2.9 for an illustration.
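The witness test for edges is straightforward to express in code; the sketch below is our own illustration of the first bullet of definition 10:

import numpy as np

def witness_edges(D):
    """D: (n, N) matrix of distances from n landmarks to N data points.
    Returns the edge set of the strict witness complex W(D)."""
    edges = set()
    for i in range(D.shape[1]):
        # The two landmarks closest to data point i are witnessed as an edge by i.
        a, b = np.argsort(D[:, i])[:2]
        edges.add((min(a, b), max(a, b)))
    return edges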

In the same way that the Rips complex approximates the Čech complex, we can approximate the strict witness complex W(D) with a lazy witness complex W1(D) ⊇ W(D).

• W1(D) has the same 1-skeleton (set of 1-simplices) as W(D).

• The p-simplex σ = (a0, a1, . . . , ap) belongs to W1(D) iff all of its edges belong to W1(D).


Figure 2.9: The witness complex: (a) point cloud (gray) with landmark points (black), (b) witness points (red), (c) lazy witness complex.

Analogous to the Rips/Čech case, W1(D) is far faster to compute than W(D), and therefore more commonly used in practice.

De Silva & Carlsson [5] suggest two ways to choose landmark points: randomly or by maxmin. The latter consists of the following procedure:

• Pick l1 ∈ X randomly.

• Inductively, if {l1, l2, . . . , li−1} have been picked, let li be the point that maximizes the function z ↦ min(D(z, l1), D(z, l2), . . . , D(z, li−1)), i.e. the point farthest away from the previously selected points.

• Continue until the desired number of points have been selected.

Maxmin gives evenly spread-out landmark points, but has a tendency to pick out extremes. In our case, even spread is important, while extremes have relatively limited impact, so maxmin is a better fit for our purposes.
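A sketch of maxmin selection (our own code; the seed parameter is ours): each new landmark is the point farthest from all landmarks chosen so far.

import numpy as np

def maxmin_landmarks(points, n_landmarks, seed=0):
    """Greedy maxmin selection. Returns the indices of the chosen landmarks."""
    rng = np.random.default_rng(seed)
    landmarks = [int(rng.integers(len(points)))]   # pick l1 at random
    # min_dist[z]: distance from z to the nearest landmark chosen so far
    min_dist = np.linalg.norm(points - points[landmarks[0]], axis=1)
    while len(landmarks) < n_landmarks:
        nxt = int(np.argmax(min_dist))             # the point farthest from all landmarks
        landmarks.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(points - points[nxt], axis=1))
    return landmarks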

As we can see in figure 2.10, the witness complex works very well on the 2D point cloud data; it yields an intuitive representation without extra simplices, and is not as sensitive to exact parameter settings as the α- or Rips complexes. However, using the 2D space discards a lot of the information that was meant to help distinguish the different letters.

Applied to the 4D representation, the witness complex creates fewer accidental 1-cycles than the α and Rips complexes, but is less reliable at creating the same number of connected components for different variations of the same letter. In the left subplot of figure 2.10 we see this as a gap in the rightmost curve of the P's eye. We could not find a parameter setting that split the triangulation at the expected places, as we could for the Rips and α-complexes.

2.4 Computing Barcodes

From our filtered simplicial complex we can compute a barcode representation (as described in section 1.5.2) using a simplified version of the more generic algorithm described in [10]. We show a pseudocode version of our implementation in algorithm 1 and provide the complete implementation in a public online repository [8].


Algorithm 1 Algorithm to get the barcode descriptor of a filtered complex. S is the filtered complex, with simplices ordered by birth time; Sm is the set of marked simplices, T stores reduced boundary chains indexed by their youngest simplex, and deg(σ) is the birth time of σ.

function GetExtendedBoundary(σ, T, Sm)
    k = |σ|
    d = ∂k(σ) ∩ Sm
    while d ≠ ∅ do
        σi = YoungestSimplex(d)
        if T[σi] = ∅ then
            break
        d = d + T[σi]
    return d

function GetBarCode(S)
    Sm = T = P = ∅
    for all σ ∈ S do
        d = GetExtendedBoundary(σ, T, Sm)
        if d = ∅ then
            Sm = Sm ∪ {σ}
        else
            σi = YoungestSimplex(d)
            k = |σi|
            T[σi] = d
            Pk = Pk ∪ {deg(σi), deg(σ)}
    for all σm ∈ Sm do
        if T[σm] = ∅ then
            k = |σm|
            Pk = Pk ∪ {deg(σm), ∞}
    return P
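For reference, a compact runnable rendering of algorithm 1 (our own sketch, not the implementation from [8]; simplices are sorted tuples of vertex indices, chains are sets of simplices, and mod 2 addition is symmetric difference):

from itertools import combinations

def boundary(simplex):
    """The (d-1)-faces of a d-simplex, as a set (a mod 2 chain)."""
    if len(simplex) == 1:
        return set()
    return set(combinations(simplex, len(simplex) - 1))

def barcode(filtration):
    """filtration: list of simplices (sorted tuples of vertex indices) ordered
    by birth time; the birth time of a simplex is its position in the list.
    Returns {d: list of (birth, death) intervals for H_d}."""
    deg = {s: i for i, s in enumerate(filtration)}  # birth times
    marked, T, bars = set(), {}, {}
    for s in filtration:
        d = boundary(s) & marked                    # restrict to marked simplices
        while d:
            pivot = max(d, key=deg.get)             # youngest simplex in d
            if pivot not in T:
                break
            d = d ^ T[pivot]                        # mod 2 addition of chains
        if not d:
            marked.add(s)                           # s creates a cycle
        else:
            pivot = max(d, key=deg.get)
            T[pivot] = d                            # s kills the cycle created by pivot
            bars.setdefault(len(pivot) - 1, []).append((deg[pivot], deg[s]))
    for s in marked:
        if s not in T:                              # cycles that never die
            bars.setdefault(len(s) - 1, []).append((deg[s], float("inf")))
    return bars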


Figure 2.10: The witness complex works very well when taken on the 2D vertex set (right). When triangulating the tangent space (left), however, it produced less reliable triangulations than Rips and α-complexes in our experiments.

2.5 Barcode Distances

After computing the barcode descriptors for our hand-written letters we compare them using the quasi-metric for barcodes defined by Collins et al. [4]:

Definition 11. Let I, J be any two intervals in a barcode. We define their dissimilarity δ(I, J) as the size of their symmetric difference: δ(I, J) = |I ∪ J − I ∩ J|.

Given a pair of barcodes B1 and B2, we create a matching

$$M(B_1, B_2) \subseteq B_1 \times B_2 = \{ (I, J) \mid I \in B_1, J \in B_2 \}$$

so that any interval in B1 or B2 occurs in at most one pair (I, J). Let N be the set of non-matched intervals from B1 and B2, i.e. the set of all intervals that do not occur in any pair in M. Then we define the distance of B1 and B2 relative to M to be the sum of the pair-wise differences of the matched intervals, plus the sum of the lengths of the non-matched intervals:

$$D_M(B_1, B_2) = \sum_{(I,J) \in M} \delta(I, J) + \sum_{L \in N} |L| \tag{2.5}$$

The general distance between B1 and B2 is defined as the distance for the best possible matching, i.e.:

$$D(B_1, B_2) = \min_M D_M(B_1, B_2) \tag{2.6}$$


2.5.1 Calculation

To calculate the difference between two barcodes we use a more transparent, but likely less performant, variation of the algorithm used by Collins et al. [4]. We start by comparing the number of half-infinite intervals in B1 and B2. If they aren't equal, the unmatched half-infinite interval will cause the sum to be infinite, so we return ∞. If the number of half-infinite intervals is the same, we sort the starting points of those intervals and create matching pairs.

We then store the absolute difference of those pairs as the total difference of the half-infinite intervals. In the third step, we calculate the difference of the finite intervals. For two intervals I = (i1, i2) and J = (j1, j2), there are three cases:

1. I and J are disjoint, i.e. I ∩ J = ∅, so δ(I, J) = |I| + |J|

2. One interval contains the other, so that |I ∩ J| = min(|I|, |J|), and δ = ||I| − |J||

3. I and J overlap partially, so that |I ∩ J| < min(|I|, |J|), and δ = |j2 − i1 − (i2 − j1)|

We set x = i2 − j1 and y = j2 − i1, and note that x + y = |I| + |J|, so that

• if either x ≤ 0 or y ≤ 0, then I and J are disjoint and δ = x + y

• otherwise δ = x + y − 2 min(x, y, |I|, |J|), which reduces to |x − y| in the partial overlap case

From this we can construct a distance matrix between B1 and B2 over all possible pairings. Then we can use the Kuhn–Munkres optimization algorithm (details omitted here, see [7] for a complete description) to find the best pairing, and trivially sum the lengths of the non-matched intervals to get the final distance.
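This computation can be sketched with SciPy's implementation of the Kuhn–Munkres algorithm (our own code; unmatched intervals are handled by padding the cost matrix with dummy partners whose cost is the interval's length):

import numpy as np
from scipy.optimize import linear_sum_assignment

def interval_delta(I, J):
    """Symmetric-difference dissimilarity of two finite intervals."""
    (i1, i2), (j1, j2) = I, J
    overlap = max(0.0, min(i2, j2) - max(i1, j1))
    return (i2 - i1) + (j2 - j1) - 2 * overlap

def barcode_distance(B1, B2):
    """Distance between two barcodes of finite intervals (half-infinite
    intervals are assumed to have been matched separately)."""
    n1, n2 = len(B1), len(B2)
    C = np.zeros((n1 + n2, n1 + n2))
    for i, I in enumerate(B1):
        C[i, n2:] = I[1] - I[0]              # leaving I unmatched costs |I|
        for j, J in enumerate(B2):
            C[i, j] = interval_delta(I, J)
    for j, J in enumerate(B2):
        C[n1:, j] = J[1] - J[0]              # leaving J unmatched costs |J|
    rows, cols = linear_sum_assignment(C)    # optimal matching (Kuhn-Munkres)
    return C[rows, cols].sum()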


Chapter 3

Results

3.1 Pre-processing

We get images of hand-written letters from the Chars74k dataset [3], described in [2], and we focus on capital letters. We crop and thin each image to a skeleton before randomly sampling the nonzero pixels and scaling the resulting vertices to [−1, 1] to get our point cloud data. See figure 3.1.

The thinning step prevents the errors in tangent and curvature estimation that result from too-thick lines, described by Collins et al. in [4], and also allows for a sparser representation and a cleaner triangulation later on. Collins et al. seem to have had a data set of already thin letters, which is preferable, since the thinning itself can result in artifacts. We use the skeletonize function in scikit-image's morphology package, which in turn is based on [9].
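The pipeline can be sketched as follows (our own illustration; the sampling parameters are hypothetical):

import numpy as np
from skimage.morphology import skeletonize

def preprocess(binary_image, n_samples=400, seed=0):
    """binary_image: 2D boolean array of the letter. Returns an (n, 2) point
    cloud sampled from the skeleton and scaled to [-1, 1]."""
    skeleton = skeletonize(binary_image)            # thin to 1-pixel-wide strokes
    ys, xs = np.nonzero(skeleton)
    pts = np.column_stack([xs, ys]).astype(float)
    rng = np.random.default_rng(seed)
    n = min(n_samples, len(pts))
    pts = pts[rng.choice(len(pts), size=n, replace=False)]
    # Scale to [-1, 1], preserving the aspect ratio.
    pts -= pts.min(axis=0)
    pts = 2 * pts / pts.max() - 1
    return pts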

3.2 Tangent and Curve Estimation

We estimate tangent vectors ζ and curvature κ for all points using the methods described in sections 2.2.1 and 2.2.3 respectively. We did experiment with removing points with low-confidence tangent estimates (based on the eigenvalue ratio) as described in [4]. However, with our more diverse dataset, this method lacked the desired precision and would remove large parts of several letters, prompting us to skip this step altogether.

3.3 Downsampling

As a last step before triangulation, we downsample our point cloud using the maxmin method described in section 2.3.5. For the witness complex, this step is a part of the triangulation, but we found it to be useful for the other triangulation methods as well. It allows us to use a denser sampling for tangent and curvature estimation, which yields better results, while using a sparser triangulation, which makes barcode calculation cheaper. The number of 1-simplices potentially scales with n², which quickly makes this step a bottleneck for large sets.

Figure 3.1: Image pre-processing steps

3.4 Triangulation

We then triangulate the acquired point set using one of the methods described in section 2.3. Here, however, our results start to diverge from those described in [4]. Collins et al. seem to have a very well-behaved data set, with straight lines, perfect curves and no stray strokes. With our more "realistic" dataset, the expected effects from triangulating the full tangent space fail to appear reliably for different instances of the same letter.

For instance, in [4] the difference between the letters U and V is detected through the fact that V, due to the high curvature at the tip, will result in two separate components (as the tangential distance at the tip is too large for the two components to connect in triangulation). The V's in our dataset are not pointy enough for us to find a parameter setting that reliably results in this distinction in representation. In figure 3.2, we see three different V's and their respective triangulations with the Rips complex and α-complex, using carefully hand-tuned parameters. Moreover, when triangulating different U's with the same parameter setting, we occasionally get a gap there as well (see figure 3.3).

All triangulation methods are extremely sensitive to noise in the tangent estimates, as a stray tangent will potentially split the final complex T(X) and spawn an extra connected component. This will result in an extra half-infinite bar, making the letter infinitely dissimilar to letters lacking this gap, i.e. the distance between the barcodes in our barcode quasi-metric (from definition 11) will be infinite.

We achieved more consistent results by disregarding the tangent space altogether and performing triangulation on the 2D vertices. In 2D, the witness complex outperforms the other triangulation methods, and is the least sensitive to parameters. Disregarding tangents unfortunately means losing a large part of the descriptive data, which makes topologically similar shapes hard to distinguish.

3.5 Barcode Distance

Collins et al. [4] use only 0-bars in their experiments, i.e. the number of connected components at different filtration steps. Initially we hoped to get a more complete descriptor by also considering 1-bars, the number of 1-cycles, e.g. to distinguish the final topology of a C from an O. However, triangulation often results in a number of small "accidental" loops, which result in half-infinite 1-bars. The bottom left T in figure 3.4 would thus be infinitely dissimilar to the other two T's on the bottom row. In the end, we also consider only 0-bars.

We calculate barcodes for J variants of I different letters, and create a distance matrix that we can plot (using our quasi-metric from definition 11). A dark shade means a lower distance; white means dissimilar. A useful result would be dark J × J squares along the diagonal and brighter shades everywhere else.
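For reference, the matrix can be produced as follows (our own sketch; barcode_distance stands for the quasi-metric implementation sketched in section 2.5.1):

import numpy as np
import matplotlib.pyplot as plt

def plot_distance_matrix(barcodes, barcode_distance):
    """barcodes: list of I*J barcodes, grouped so that the J variants of each
    of the I letters are adjacent."""
    n = len(barcodes)
    D = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            D[a, b] = D[b, a] = barcode_distance(barcodes[a], barcodes[b])
    finite = np.isfinite(D)
    D[~finite] = D[finite].max()        # plot infinite distances as the maximum
    plt.imshow(D, cmap="gray")          # dark = similar, white = dissimilar
    plt.colorbar()
    plt.show()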

The result of a test with 2D witness complex triangulation is shown in figure 3.5. Though the triangulation itself (see figure 3.6) looks good, the barcode descriptors are practically useless for classifying letters, as most letters are considered similar.

Applying the algorithm to the full 4D representation gives the opposite problem: erratic triangulation results make most letters infinitely dissimilar, even when the data describes the same letter. In figure 3.7 we show the distance matrix for α-complexes of the same letters. As discussed earlier, even if we can get a decent triangulation of one letter, it's impossible to find parameter settings that work for all letters. The triangulation, shown in figure 3.8, thus contains gaps at unpredictable locations on most letters, and the number of connected components often differs between instances of the same letter.

Different triangulation parameters and sample densities do affect the result a great deal, as does the choice of triangulation method. However, even the most carefully hand-tuned settings invariably result in very poor recognition.

Figure 3.2: Triangulation of different V's, using the 4D representation. The expected gap at the point of the V does not occur reliably, even with carefully tuned parameters.

Figure 3.3: Triangulation of different U's, using the 4D representation, with the same parameter settings as in figure 3.2. We get a gap at the bottom of some U's as well, making reliable distinction between U and V difficult.

Figure 3.4: 2D triangulation of different T's. The witness complex performs best, but even there we sometimes get a 1-cycle near intersections, making it impossible to use 1-bars in our descriptor.

As the distance matrices in figures 3.5 and 3.7 show, we could not find any configuration within the investigated methods that yielded a descriptor useful for classifying letters. The matrices only show five letters, but the results are similar for all letters. The underlying problem is the unpredictability of the triangulation step: the same letter results in different triangulations depending on minor differences in penmanship, pre-processing artifacts and even the random sampling. As the triangulation determines the barcode, inconsistent triangulations make it impossible to even cluster letters based on this representation.

It was possible to classify letters based on the final first Betti number β1, the number of 1-cycles in the complete letter, provided a triangulation without gaps and extra 1-cycles. However, we were not able to achieve such a triangulation without manually removing small cycles (which is quite expensive), and the result only manages to classify the letters of the alphabet into three groups. Additionally, this method fails for sloppy penmanship, where the writer doesn't close their letters perfectly.


Figure 3.5: Distance matrix for letters triangulated with the 2D witness complex. The triangulated letters are shown in figure 3.6.

Figure 3.6: Letters triangulated with the 2D witness complex. The triangulation contains few apparent errors, but the resulting descriptor is useless for classification.


Figure 3.7: Distance matrix for letters triangulated with the α-complex on the 4D representation. The triangulated letters are shown in figure 3.8.

Figure 3.8: Letters triangulated with the α-complex on the 4D representation. The triangulation contains numerous obvious dissimilarities between instances of the same letter.


Chapter 4

Discussion

Character recognition for handwritten letters generally consists of three steps:

1. Pre-processing: Scaling, cropping, deskewing, despeckling, etc.

2. Recognition: Feature detection into a low-dimensional representation with a defined metric, for comparison to training data

3. Post-processing: Apply word and sentence structure statistics to further increase accuracy

In this thesis, we did not have much time to explore different types of pre-processing. It is not immediately clear what type of pre-processing could remedy the problems we encountered during the triangulation step, which propagated to the final result, but experiments with common pre-processing methods may very well yield improvements.

Compared to leading methods for feature detection and low-dimensional representation of characters, our method is very sensitive to noise, not very granular, and extremely expensive in terms of computing power, time and memory. Working with point cloud data often results in quite heavy computations, as exemplified by our tangent and curvature estimation, which performs a least squares fit over a selected number of neighbors for every vertex, not to mention triangulation.

As shown in section 3.5, we have not been able to achieve any effective classification, or even clustering, using the methods investigated in this thesis. We thus don't see a reason to try and combine these methods with other approaches to classification; firstly, it is unlikely that persistent homology can contribute significantly to final accuracy, and secondly, the algorithm will likely be too computationally expensive to be practical.

It is clear from our results that the problem lies in creating a filtered simplicial complex representation that is both sufficiently descriptive and predictable enough to make classification possible. From there on, the barcode representation and its quasi-metric work as intended. Borrowing findings from OCR research in other fields, it might be possible to find parameterizations that accomplish this, and thus make for accurate classification, even if the computational cost is likely to remain high.

It could also be interesting to use a less descriptive but computationally cheaper representation to get a coarse clustering of letters that, combined with another method, yields a granular and reliable classification at low computational cost. However, the representation would need to be far more predictable than our approach.


Bibliography

[1] Mark Anthony Armstrong. Basic Topology. Springer Science & Business Media, 2013.

[2] T. E. de Campos, B. R. Babu, and M. Varma. "Character recognition in natural images". In: Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal. Feb. 2009.

[3] T. E. de Campos, B. R. Babu, and M. Varma. The Chars74K image dataset - Character Recognition in Natural Images. 2009. url: http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/ (visited on 07/10/2017).

[4] Anne Collins et al. "A barcode shape descriptor for curve point cloud data". In: Computers & Graphics 28.6 (2004), pp. 881–894.

[5] Vin De Silva and Gunnar E. Carlsson. "Topological estimation using witness complexes". In: SPBG 4 (2004), pp. 157–166.

[6] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. "Topological persistence and simplification". In: Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on. IEEE, 2000, pp. 454–463.

[7] Harold W. Kuhn. "The Hungarian method for the assignment problem". In: Naval Research Logistics (NRL) 2.1-2 (1955), pp. 83–97.

[8] Martin Strandgren. mstrandgren/homology_ocr: Code for bachelor's thesis on character recognition with persistent homology. 2017. url: https://github.com/mstrandgren/homology_ocr (visited on 08/04/2017).

[9] T. Y. Zhang and Ching Y. Suen. "A fast parallel algorithm for thinning digital patterns". In: Communications of the ACM 27.3 (1984), pp. 236–239.

[10] Afra Zomorodian and Gunnar Carlsson. "Computing persistent homology". In: Proceedings of the Twentieth Annual Symposium on Computational Geometry. ACM, 2004, pp. 347–356.
