Distribution of Critical Points of Polynomials


Fördelning av kritiska punkter för polynom

Ted Forkéus

Faculty of Health, Science and Technology

Mathematics

15 hp

Supervisor: Martin Lind

Examiner: Niclas Bernhoff

January 2021


Abstract

This thesis studies the relationship between the zeroes of complex polynomials in one variable and the critical points of those polynomials. Our methods are both analytical and statistical in nature, using techniques from both complex analysis and probability theory. We present an alternative proof of the famous Gauss-Lucas theorem and prove that the distribution of the critical points of a random polynomial with real zeroes converges in probability to the distribution of the zeroes. A simulation of the case with complex zeroes is also presented, which gives statistical support that this holds for random polynomials with complex zeroes as well. Lastly, the previous results are applied to Sendov's conjecture, where we take a probabilistic approach to the problem.

Summary

This thesis studies the relationship between the zeroes of complex polynomials in one variable and the critical points of those polynomials. We use both analytical and statistical methods, drawing on techniques from both complex analysis and probability theory. We first present an alternative proof of the Gauss-Lucas theorem. We then prove that the distribution of the critical points of a random polynomial with real zeroes converges in probability to the distribution of the zeroes. Next we present a simulation of the case with complex zeroes, which gives statistical support that this also holds for polynomials with complex zeroes. Finally, we apply the previous results to Sendov's conjecture, where we put forward a probabilistic model of this problem.


1 Introduction

The topic of this thesis is classical: the relationship between the zeros of a monic polynomial in one complex variable and the zeros of its derivative (i.e. its critical points).

On the one hand, we are dealing with functions that have very simple structure. On the other hand, if P(z) = (z − z₁)(z − z₂)...(z − z_n) and P′(z) = n(z − ζ₁)(z − ζ₂)...(z − ζ_{n−1}), then the critical points {ζ_i} depend on the zeros {z_i} in a highly nonlinear manner. The aim of this thesis is to present a number of results connecting the distribution of the critical points {ζ_i} to the distribution of {z_i}. Some of these results will be deterministic and others probabilistic. We shall use analytical arguments as well as numerical and statistical methods.

In Section 2, we discuss an elegant result from the geometry of polynomials called the Gauss-Lucas theorem. To motivate it, recall that if a polynomial in one real variable with real coefficients has n real zeros, then it also has n − 1 critical points that interlace the zeros (this is an immediate consequence of Rolle's theorem). The Gauss-Lucas theorem provides a complex counterpart: the critical points of an arbitrary polynomial are all contained in the convex hull of its zeros. In Section 2, we present a proof of an inequality that immediately implies the Gauss-Lucas theorem. The inequality appears to be due to Wilf [8], but we give a more detailed proof.

In the paper [5], a statistical study of the distribution of critical points of polynomials was initiated. Section 3 is devoted to discussing results of this nature. In particular, we shall consider random polynomials of the form

\[ P(z) = \prod_{j=1}^{n} (z - X_j) \tag{1} \]

where each X_j is a random variable uniformly distributed on the unit disc D = {z ∈ C : |z| ≤ 1}.

It was observed in [5] that the critical points {Y_j}_{j=1}^{n−1} of a polynomial (1) will "mimic" the statistical behaviour of the zeros of the polynomial (i.e. appear to also be uniformly distributed on D in our case). See Figures 4-5 below for the outcome of one of our simulations of this phenomenon.

The aim of Section 3 is to describe this interesting result in more detail.¹

¹ We emphasize that [5] deals with a more general situation (not restricted to uniformly distributed zeros).


We consider first the real case, i.e. when the zeros of (1) are drawn from the uniform distribution on [−1, 1]. This case was not analyzed in [5] but is interesting in its own right in our opinion. We prove that in the real case, the empirical cumulative distribution function of the critical points converges in probability to the cumulative distribution function of a random variable that is uniformly distributed on [−1, 1].

One attractive feature of the real case is that one can derive the convergence result mentioned above using elementary methods from probability theory. For the complex case, one needs more sophisticated tools in order to prove that the distribution of the critical points converges in some sense to the uniform distribution on D. On the other hand, the result is statistical in nature, so we decided to illustrate it using hypothesis testing. We present a χ²-test that can be applied to test the following null hypothesis:

H₀: the critical points of (1) are uniformly distributed on D.

Using Python and Mathematica, we perform several simulations with polynomials of the form (1) with moderately high degree (n ≈ 100; higher degrees lead to numerical instability when solving P′(z) = 0). The result is that H₀ can almost never be rejected at significance level 0.05. Of course, this does not constitute a proof of convergence, but it makes the result plausible in our opinion.

In Section 4, we discuss a celebrated open problem in the geometry of polynomials: Sendov's conjecture. We shall not formulate the conjecture here (see instead Section 4 below); it is sufficient to say that it is a statement concerning the location of the critical points of a polynomial with all its zeros in the unit disc D. Sendov's conjecture has been verified in a number of cases but remains open in general. We formulate a probabilistic conjecture (see Conjecture 4.3 below) which essentially claims the following: if the zeros of a polynomial are drawn uniformly in D, then the probability that Sendov's conjecture is false is zero. We provide some (admittedly very non-rigorous) justification based on the asymptotic statistical properties of critical points discussed in Section 3.


2 The Gauss-Lucas Theorem

Before we explain the main results of this section, we need to define the concept of a convex hull:

Definition 2.1. Consider a finite set of points S = {z₁, z₂, . . . , z_n}, S ⊂ C. The convex hull of S is denoted K(S) and defined as the following set:

\[ K(S) := \left\{ z \in \mathbb{C} : z = \lambda_1 z_1 + \lambda_2 z_2 + \dots + \lambda_n z_n, \ \sum_{i=1}^{n} \lambda_i = 1 \right\} \]

where {λ_i}_{i=1}^{n} are non-negative real numbers. If S is the set of zeroes of a polynomial P, then we use the notation K(P) = K(S) as a shorthand instead of defining S.

Using this definition, we can formulate the following theorem:

Theorem 2.2. For any complex polynomial P , all the critical points of P must belong to K(P ), i.e. the convex hull of the zeroes of P .

This is called the Gauss-Lucas theorem. An old and important result, this theorem describes where the critical points of a polynomial are in relation to its zeroes. One could describe it as a generalization of Rolle’s Theorem, although Rolle’s theorem describes the location of the critical points with more precision. The main aim of this section is to prove a sharper theorem (Theorem 2.8 below) which has the Gauss-Lucas theorem as a direct consequence. Theorem 2.8 is apparently due to Wilf [8]. Our proof uses a similar argument but is presented with more details.
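As a small numerical illustration of Theorem 2.2, the following Python sketch checks the statement for one cubic polynomial. For degree 3 the derivative is quadratic, so the critical points can be computed with the quadratic formula and the convex hull of the zeroes is a triangle. The three zeroes and the helper names (`critical_points_of_cubic`, `cross`, `in_triangle`) are illustrative assumptions and not part of the thesis; the check is an example, not a proof.

```python
import cmath

def critical_points_of_cubic(z1, z2, z3):
    """Critical points of P(z) = (z - z1)(z - z2)(z - z3).

    P'(z) = 3z^2 - 2*s1*z + s2 with s1 = z1 + z2 + z3 and
    s2 = z1*z2 + z1*z3 + z2*z3, solved by the quadratic formula.
    """
    s1 = z1 + z2 + z3
    s2 = z1 * z2 + z1 * z3 + z2 * z3
    root = cmath.sqrt(4 * s1 * s1 - 12 * s2)
    return [(2 * s1 + root) / 6, (2 * s1 - root) / 6]

def cross(o, a, b):
    """z-component of the planar cross product (a - o) x (b - o)."""
    return ((a - o).conjugate() * (b - o)).imag

def in_triangle(p, a, b, c, tol=1e-12):
    """True if p lies in the closed triangle with corners a, b, c."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = min(d1, d2, d3) < -tol
    has_pos = max(d1, d2, d3) > tol
    return not (has_neg and has_pos)

# Hypothetical example zeroes in the unit disc (not taken from the thesis).
zeros = (0.9 + 0.1j, -0.5 + 0.7j, -0.2 - 0.8j)
crit = critical_points_of_cubic(*zeros)
inside = [in_triangle(zeta, *zeros) for zeta in crit]
print(crit, inside)
```

For polynomials of higher degree one would need a numerical root finder for P′ and a general convex hull test, which this sketch deliberately avoids.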

2.1 Preparation

There are several things we need to note, some of which might seem trivial, and others that are well known theorems. First we note two things about complex numbers:

\[ z = a + bi = |z| (\cos\alpha + i \sin\alpha) \]

\[ |z| = \sqrt{a^2 + b^2} \ge |a| = |\operatorname{Re}(z)| \]

\[ \operatorname{Re}(z) = a = |z| \cos\alpha \]


While these results are trivial, they are important to our work in this section, and one of our goals is to write this paper so that any mathematics student can read and understand it without much previous knowledge. Next we need to mention the AM-GM inequality. It is widely known and its proof is standard, so we do not prove it here. It is however a key part of the proof of Lemma 2.5.

Theorem 2.3. For any non-negative real numbers x₁, x₂, . . . , x_n we have that:

\[ \frac{1}{n} \sum_{j=1}^{n} x_j \ge \left( \prod_{j=1}^{n} x_j \right)^{1/n} \tag{2} \]

Another important theorem is the Hyperplane Separation Theorem. This theorem may be formulated to fit different situations (e.g. for R^d, or even for arbitrary normed spaces); here we shall only need it for C.

Theorem 2.4. Let K ⊂ C be a closed convex set and let z ∈ C with z ∉ K. Then there exists a half-plane that contains K but not z.

Proof. The proof of the hyperplane separation theorem is not very easy (see e.g. [4]), so instead we present a sketch of the argument. In this paper we talk mostly about convex hulls of zero sets of complex polynomials. These sets are not only convex and closed, but they also form a polygon (the argument is exactly the same for a general closed convex set, but it is easier to visualize with polygons).

Figure 1: Outside vertex

Figure 2: Between vertices

In these figures we have the convex hull of the zeroes of some polynomial P and a point z outside of K(P). Let z_k ∈ K(P) be a point closest to z, i.e. |z − z_k| = inf_{w∈K(P)} |z − w|. There are two cases to discuss. In the first case (left figure), z_k lies in the interior of an edge of the polygon, and the line segment between z and z_k is orthogonal to that edge. Here we can simply draw a line parallel to the edge, between z and the edge; since the convex hull is a convex polygon, it cannot cross this line. In the second case (right figure), z_k is a corner of the polygon, so the line segment between z and z_k is not orthogonal to either of the adjacent edges. Then we can construct a line that crosses this segment at a right angle, and again, since the convex hull is a convex polygon, this line separates the point z from the convex hull.

Now that we have all the base theorems and definitions we need, we can proceed.

2.2 Proof of the Gauss-Lucas Theorem

We shall need the following complex variant of the AM-GM inequality (2) above.

Lemma 2.5. Assume that {z₁, . . . , z_n} ⊂ C is a set of points such that 0 ≤ |arg(z_j)| ≤ γ < π/2 for 1 ≤ j ≤ n. Then

\[ \cos(\gamma) \left( \prod_{j=1}^{n} |z_j| \right)^{1/n} \le \frac{1}{n} \left| \sum_{j=1}^{n} z_j \right|. \]

Proof. Let θ_j = arg(z_j) for 1 ≤ j ≤ n. Note that 0 ≤ |θ_j| < π/2 implies that Re(z_j) > 0. Using that fact together with |z| ≥ |Re(z)| for any z ∈ C, we get

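\[ \left| \sum_{j=1}^{n} z_j \right| \ge \left| \operatorname{Re}\left( \sum_{j=1}^{n} z_j \right) \right| = \left| \sum_{j=1}^{n} \operatorname{Re}(z_j) \right| = \sum_{j=1}^{n} \operatorname{Re}(z_j) = \sum_{j=1}^{n} |z_j| \cos(\theta_j) \ge \cos(\gamma) \sum_{j=1}^{n} |z_j| \]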

where we used that cos(θ_j) ≥ cos(γ). We can now apply (2) to the right-hand side of the above inequality to obtain

\[ \left| \sum_{j=1}^{n} z_j \right| \ge \cos(\gamma) \sum_{j=1}^{n} |z_j| \ge n \cos(\gamma) \left( \prod_{j=1}^{n} |z_j| \right)^{1/n} \]

which proves Lemma 2.5.
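The inequality of Lemma 2.5 can be checked numerically. The Python sketch below draws points in the sector |arg(z)| ≤ γ and compares the two sides of the inequality. The seed, the value of γ, the number of points and the helper name `complex_am_gm_sides` are illustrative assumptions; the geometric mean is computed through logarithms only to avoid overflow or underflow in the product.

```python
import cmath
import math
import random

def complex_am_gm_sides(points, gamma):
    """Return (lhs, rhs) of cos(gamma) * (prod |z_j|)^(1/n) <= |sum z_j| / n."""
    n = len(points)
    log_geo_mean = sum(math.log(abs(z)) for z in points) / n
    lhs = math.cos(gamma) * math.exp(log_geo_mean)
    rhs = abs(sum(points)) / n
    return lhs, rhs

rng = random.Random(0)      # fixed seed, illustrative only
gamma = math.pi / 3         # half-opening of the sector, must be < pi/2
# Hypothetical sample: moduli in [0.1, 2], arguments in [-gamma, gamma].
points = [cmath.rect(rng.uniform(0.1, 2.0), rng.uniform(-gamma, gamma))
          for _ in range(50)]
lhs, rhs = complex_am_gm_sides(points, gamma)
print(lhs, rhs, lhs <= rhs)
```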

Definition 2.6. Let S = {z₁, z₂, ..., z_n} be an arbitrary set of complex numbers and let z ∉ K(S). Denote by

\[ \gamma = \gamma(z, S) = \frac{\max_{1 \le j \le n} \arg(z - z_j) - \min_{1 \le j \le n} \arg(z - z_j)}{2}. \]

We call the angle 2γ the viewing angle of K(S) at z (see Figure 3).

Figure 3: The viewing angle 2γ

Lemma 2.7. For any S and z ∉ K(S) we have that 0 ≤ γ < π/2.

Proof. This is geometrically clear by Theorem 2.4.

Theorem 2.8. Let P be a monic polynomial in one complex variable of degree n. Furthermore, let z ∉ K(P). Then

\[ |P'(z)| \ge n \cos(\gamma) \, |P(z)|^{1 - 1/n} \tag{3} \]

where 2γ is the viewing angle of K(P) at z.


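Proof. By rotating the complex plane around z, we may assume that |arg(z − z_j)| ≤ γ < π/2 for 1 ≤ j ≤ n, where z₁, . . . , z_n are the zeroes of P. Note that

\[ \frac{P'(z)}{P(z)} = \sum_{j=1}^{n} \frac{1}{z - z_j}. \]

Since |arg(1/(z − z_j))| = |arg(z − z_j)| ≤ γ < π/2, we may apply Lemma 2.5 to get

\[ \left| \frac{P'(z)}{P(z)} \right| = \left| \sum_{j=1}^{n} \frac{1}{z - z_j} \right| \ge n \cos(\gamma) \left( \prod_{j=1}^{n} \left| \frac{1}{z - z_j} \right| \right)^{1/n} = n \cos(\gamma) \left| \prod_{j=1}^{n} \frac{1}{z - z_j} \right|^{1/n} = n \cos(\gamma) \left| \frac{1}{\prod_{j=1}^{n} (z - z_j)} \right|^{1/n} = n \cos(\gamma) \left| \frac{1}{P(z)} \right|^{1/n}. \]

Multiplying both sides by |P(z)|, (3) follows.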
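Inequality (3) can also be tested numerically. The Python sketch below evaluates both sides for a fixed set of zeroes in the unit disc and a point z to the right of all of them. It uses the identity |P′(z)| = |P(z)| · |Σ 1/(z − z_j)| from the proof. The restriction Re(z − z_j) > 0, the example zeroes and the helper name `wilf_sides` are assumptions made here so that all arguments lie in (−π/2, π/2) and no rotation or branch-cut handling is needed.

```python
import cmath
import math

def wilf_sides(zeros, z):
    """Return (|P'(z)|, n cos(gamma) |P(z)|^(1 - 1/n)) for monic P with the given zeros.

    Assumes Re(z - z_j) > 0 for every zero, so that all arguments lie in
    (-pi/2, pi/2) and no wrap-around of the argument can occur.
    """
    n = len(zeros)
    diffs = [z - zj for zj in zeros]
    if any(d.real <= 0 for d in diffs):
        raise ValueError("this sketch needs Re(z - z_j) > 0 for all j")
    args = [cmath.phase(d) for d in diffs]
    gamma = (max(args) - min(args)) / 2          # half the viewing angle
    p = math.prod(abs(d) for d in diffs)         # |P(z)|
    dp = p * abs(sum(1 / d for d in diffs))      # |P'(z)| via the logarithmic derivative
    return dp, n * math.cos(gamma) * p ** (1 - 1 / n)

# Hypothetical zeroes in the closed unit disc and a point outside their convex hull.
zeros = [0.5 + 0.5j, -0.8j, -0.3 + 0.2j, 0.9, -1.0]
lhs, rhs = wilf_sides(zeros, z=2.0 + 0.5j)
print(lhs, rhs, lhs >= rhs)
```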

Proof of the Gauss-Lucas theorem. Take any point z₀ ∉ K(P). Then, by Theorem 2.8,

\[ |P'(z_0)| \ge n \cos(\gamma) \, |P(z_0)|^{1 - 1/n}. \]

Since cos(γ) > 0 and |P(z₀)| > 0 (z₀ ∉ K(P) of course implies P(z₀) ≠ 0), we get P′(z₀) ≠ 0 for any z₀ ∉ K(P). Hence every critical point of P must belong to K(P).


3 Statistical Theory of Critical Point Distribution

3.1 Notation and Overview

The main topic of this chapter, and indeed the main topic of this thesis, is the statistical distribution of critical points of random polynomials. The systematic study of such questions was initiated in [5].

Assume that X_j, j ∈ N, are independent, identically distributed (IID) complex random variables and that X_j ∼ U(D) (i.e. uniformly distributed on the unit disc in C). Form from {X_j} the random polynomial

\[ P(z) = \prod_{j=1}^{n} (z - X_j). \tag{4} \]

In this case the zeroes will be drawn uniformly on D. However, other distributions may be used. Let {Y_j}_{j=1}^{n−1} be the critical points of P(z); note that each Y_j is a complex random variable. It follows from the Gauss-Lucas theorem proved above that Y_j ∈ D for all j. Even though the Gauss-Lucas theorem is a major result, natural curiosity leads us to ask how the critical points are distributed in D. The following figures show the results of a simulation in which 100 zeroes (left figure) were drawn uniformly on D. From these zeroes the polynomial P(z) = ∏_{j=1}^{100} (z − X_j) was formed. The critical points of this polynomial were then calculated (right figure).

Figure 4: Zeros

Figure 5: Critical points


The following (albeit vaguely formulated) result was proven in [5].

Theorem 3.1. As n → ∞, the distribution of the critical points {Y_j}_{j=1}^{n−1} will converge to the common distribution of the zeroes {X_j}_{j=1}^{n} (i.e. U(D) in this case).

Making precise the metric in which this convergence is measured is hard, and understanding the proof even more so; both are well beyond the scope of this thesis. However, if we restrict to the real case we can derive this result using elementary methods, as we do in Section 3.2. We will then motivate this result for the complex case using statistical methods (i.e. hypothesis testing) in Section 3.3.

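Before we proceed we need to introduce some notation. For a ∈ R, we call the function

\[ \mathbf{1}_{[a,\infty)} : \mathbb{R} \to \{0, 1\}, \qquad \mathbf{1}_{[a,\infty)}(x) := \begin{cases} 0 & \text{if } x < a \\ 1 & \text{if } x \ge a \end{cases} \]

the indicator function of [a, ∞). Then we introduce what is called the empirical distribution function

\[ F_n : \mathbb{R} \to [0, 1], \qquad F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{[X_i,\infty)}(x). \]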

Here {X_j}_{j=1}^{n} are IID random variables and 1_{[X_i,∞)} is the indicator function as previously defined. F_n then becomes a step function, where each indicator function marks a point where F_n(x) increases by 1/n. This becomes clear if we rewrite F_n(x) as

\[ F_n(x) = \frac{\mathbf{1}_{[X_1,\infty)}(x)}{n} + \frac{\mathbf{1}_{[X_2,\infty)}(x)}{n} + \dots + \frac{\mathbf{1}_{[X_n,\infty)}(x)}{n} \]

and assume that X₁ ≤ X₂ ≤ · · · ≤ X_n. Note that for a fixed x this function is still a random variable. This means that we can calculate the expected value and variance of this function while x is fixed.
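The empirical distribution function is straightforward to compute. The following Python sketch builds F_n from a sample by sorting it once and counting, for each x, how many sample points are ≤ x. The helper name `empirical_cdf` and the four sample values are illustrative assumptions.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F_n as a function: F_n(x) = (number of X_i <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right counts the sample points that are <= x,
        # i.e. the indicators 1_[X_i, oo)(x) that are equal to 1.
        return bisect_right(ordered, x) / n

    return F_n

# Hypothetical sample of size n = 4.
F4 = empirical_cdf([0.3, -0.5, 0.8, 0.1])
print([F4(x) for x in (-1.0, -0.5, 0.0, 0.2, 1.0)])   # [0.0, 0.25, 0.25, 0.5, 1.0]
```

The printed values show the steps of height 1/4 at each sample point, including the jump exactly at x = −0.5, which reflects that the indicator of [X_i, ∞) equals 1 at X_i itself.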

3.2 Case of Real Zeroes

In this section we will prove that the zeroes of polynomials and their critical points have at least some connection in terms of distributions; this will be in the form of convergence in probability. To build up to this we first define the real-valued random polynomial

\[ P(x) = \prod_{j=1}^{n} (x - X_j) \tag{5} \]

where {X_j}_{j=1}^{n} are IID random variables and X_j ∼ U[−1, 1]. After relabelling, we can also assume that X₁ ≤ X₂ ≤ · · · ≤ X_n. Note that according to Rolle's theorem, the critical points {Y_j}_{j=1}^{n−1} must satisfy X_j ≤ Y_j ≤ X_{j+1}, even though they are also random variables. Next we define two empirical distribution functions

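\[ F_n : \mathbb{R} \to [0, 1], \qquad F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{[X_i,\infty)}(x) \]

\[ \hat{F}_n : \mathbb{R} \to [0, 1], \qquad \hat{F}_n(x) = \frac{1}{n-1} \sum_{j=1}^{n-1} \mathbf{1}_{[Y_j,\infty)}(x). \]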

Here F_n(x) is the empirical distribution function for the zeroes of the polynomial in (5) and F̂_n(x) is the empirical distribution function for the critical points of (5). We denote by F(x) the common distribution function of the X_j. By showing that F̂_n(x) converges in probability to F(x) we have, in some sense, shown that {X_j}_{j=1}^{n} and {Y_j}_{j=1}^{n−1} have similar distributions. The first step is to define convergence in probability.

Definition 3.2. A sequence of random variables {X_n} converges in probability to a random variable X if for every ε > 0

\[ \lim_{n \to \infty} \mathbb{P}(|X_n - X| > \varepsilon) = 0. \]

Lemma 3.3. For any fixed but arbitrary x ∈ R we have

\[ E(F_n(x)) = F(x) \tag{6} \]

and

\[ V(F_n(x)) = \frac{F(x)(1 - F(x))}{n}. \tag{7} \]

Proof. We observe that for each fixed x ∈ R, the indicator function 1_{[X_j,∞)}(x) is a random variable that attains values in {0, 1}. Furthermore,

\[ \mathbb{P}(\mathbf{1}_{[X_j,\infty)}(x) = 1) = \mathbb{P}(X_j \le x) = F(x). \]

Let p = F(x). From the above observation, it follows that 1_{[X_j,∞)}(x) ∼ Be(p) and therefore, since the X_j are independent,

\[ n F_n(x) = \sum_{j=1}^{n} \mathbf{1}_{[X_j,\infty)}(x) \sim \mathrm{Bin}(n, p). \]

By standard formulas for the expected value and variance of binomially distributed random variables, we have E(nF_n(x)) = np = nF(x), whence E(F_n(x)) = F(x). Similarly, V(nF_n(x)) = np(1 − p), whence V(F_n(x)) = F(x)(1 − F(x))/n.
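Formulas (6) and (7) can be illustrated by simulation. The Python sketch below repeatedly draws n zeroes from U[−1, 1], evaluates F_n at a fixed x, and compares the sample mean and variance of these values with F(x) = (x + 1)/2 and F(x)(1 − F(x))/n. The values n = 50, x = 0.2, the number of trials, the seed and the helper name `simulate_edf_values` are illustrative assumptions; agreement is only approximate, as with any Monte Carlo estimate.

```python
import random
from statistics import fmean, pvariance

def simulate_edf_values(n, x, trials, seed=0):
    """Simulate F_n(x) `trials` times for n IID U[-1, 1] zeros."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        count = sum(1 for _ in range(n) if rng.uniform(-1.0, 1.0) <= x)
        values.append(count / n)
    return values

n, x = 50, 0.2                          # hypothetical degree and evaluation point
F_x = (x + 1) / 2                       # distribution function of U[-1, 1] at x
values = simulate_edf_values(n, x, trials=20000)
print(fmean(values), F_x)                        # sample mean vs. (6)
print(pvariance(values), F_x * (1 - F_x) / n)    # sample variance vs. (7)
```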

The next theorem states that the empirical distribution function F̂_n(x) of the critical points {Y_j}_{j=1}^{n−1} converges in probability to the distribution function F(x) of the zeros. This makes precise in which way the distribution of the critical points {Y_j}_{j=1}^{n−1} is "asymptotically close" to the distribution function of the zeroes {X_j}_{j=1}^{n}. We mention here that this result was not presented in [5].

Theorem 3.4. For any ε > 0 and fixed but arbitrary x ∈ R, it holds that

\[ \mathbb{P}\left( |\hat{F}_n(x) - F(x)| > \varepsilon \right) \le \frac{1}{\varepsilon^2 n} \]

for all n > 2/ε. In other words, as n → ∞, the empirical distribution function of the critical points {Y_j}_{j=1}^{n−1} converges in probability to the distribution function F of the zeroes.

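Proof. We begin by introducing three events:

\[ A = \left\{ |\hat{F}_n(x) - F(x)| \ge \varepsilon \right\}, \qquad B = \left\{ |\hat{F}_n(x) - F_n(x)| \ge \varepsilon/2 \right\}, \qquad C = \left\{ |F_n(x) - F(x)| \ge \varepsilon/2 \right\}. \]

We prove that A ⊆ (B ∪ C). Assume that A occurs. Then, by the triangle inequality,

\[ \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \le |\hat{F}_n(x) - F(x)| \le |\hat{F}_n(x) - F_n(x)| + |F_n(x) - F(x)|, \]

so at least one of the two terms on the right-hand side is at least ε/2, i.e. B or C occurs. Hence, A ⊆ (B ∪ C) is proven. By using basic properties for the probability of unions, we have:

\[ \mathbb{P}(A) \le \mathbb{P}(B \cup C) \le \mathbb{P}(B) + \mathbb{P}(C) \tag{8} \]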


Since {Y_j}_{j=1}^{n−1} are the critical points of a polynomial with the points {X_j}_{j=1}^{n} as zeroes, Rolle's Theorem guarantees that the graphs of F̂_n and F_n are interlacing step functions (F_n jumps at X₁, then F̂_n at Y₁, then F_n at X₂, and so on), barring any multiple roots.

Hence F_n and F̂_n will alternate between which function has the greatest value (except for when they are both equal to 0 or 1). Then |F̂_n − F_n| has two cases

\[ \frac{k}{n} - \frac{k-1}{n-1}, \qquad 1 \le k \le n \tag{9} \]

or

\[ \frac{k}{n-1} - \frac{k}{n}, \qquad 1 \le k \le n-1 \tag{10} \]

where k is the number of steps F_n has taken. Using these equations it is easy to see that

\[ \mathbb{P}\left( \sup_{x \in \mathbb{R}} |\hat{F}_n(x) - F_n(x)| = \frac{1}{n} \right) = 1, \tag{11} \]

which any multiple roots will not affect. The supremum is unaffected since the expression in (9) is strictly decreasing in k and the expression in (10) is strictly increasing in k; hence the maximum is attained at the boundary values k = 1 in (9) and k = n − 1 in (10), where both expressions equal 1/n, and so any "skipped steps" inside the interval will not affect the supremum. In fact, multiple roots at the boundaries of the interval may decrease the supremum.
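The claim that the supremum equals 1/n can be illustrated numerically. In the real case no general root finder is needed: on each interval (X_j, X_{j+1}) the logarithmic derivative Σ 1/(x − X_i) decreases strictly from +∞ to −∞, so the unique critical point there can be located by bisection. The Python sketch below does this and then evaluates the supremum of |F̂_n − F_n|. The seed, the degree n = 30 and the helper names `critical_points_real` and `sup_distance` are illustrative assumptions; this is not the program used for the thesis.

```python
import random
from bisect import bisect_right

def critical_points_real(zeros, iterations=200):
    """Critical points of prod (x - X_j) for distinct real zeros, by bisection.

    On each interval (X_j, X_{j+1}) the logarithmic derivative
    g(x) = sum 1 / (x - X_i) decreases from +oo to -oo, so it has one root there.
    """
    xs = sorted(zeros)
    crit = []
    for lo, hi in zip(xs, xs[1:]):
        for _ in range(iterations):
            mid = (lo + hi) / 2
            if mid <= lo or mid >= hi:      # interval exhausted in floating point
                break
            if sum(1 / (mid - t) for t in xs) > 0:
                lo = mid
            else:
                hi = mid
        crit.append((lo + hi) / 2)
    return crit

def sup_distance(xs, ys):
    """sup over x of |F_hat_n(x) - F_n(x)| for the two step functions."""
    xs, ys = sorted(xs), sorted(ys)
    # Both functions are right-continuous and constant between jump points,
    # so it suffices to evaluate the difference at every jump point.
    return max(abs(bisect_right(ys, t) / len(ys) - bisect_right(xs, t) / len(xs))
               for t in xs + ys)

rng = random.Random(1)                      # fixed seed, illustrative only
n = 30                                      # hypothetical degree
zeros = [rng.uniform(-1.0, 1.0) for _ in range(n)]
crit = critical_points_real(zeros)
print(sup_distance(zeros, crit), 1 / n)
```

The two printed numbers should agree up to rounding, in line with (11).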

By (11) we clearly have ℙ(B) = 0 for sufficiently large n (namely when 1/n < ε/2, i.e. n > 2/ε).

We proceed with estimating ℙ(C). By (6) and (7), we have E(F_n(x)) = F(x) and V(F_n(x)) = F(x)(1 − F(x))/n. By applying Chebyshev's inequality (see e.g. [1]) and using F(x)(1 − F(x)) ≤ 1/4, we have:

\[ \mathbb{P}(C) = \mathbb{P}(|F_n(x) - F(x)| \ge \varepsilon/2) \le \frac{4 F(x)(1 - F(x))}{\varepsilon^2 n} \le \frac{1}{\varepsilon^2 n} \tag{12} \]

By (8) and (12) the proof is completed.

3.3 Case of Complex Zeroes

The case when X_j ∼ U(D) is much more difficult than when the roots are real. The two main arguments we relied on were that we could construct the step functions F̂_n and F_n and that we could use Rolle's Theorem to predict how these step functions would behave. Both of these arguments start to fall apart in C. We cannot construct the step functions in the same fashion and we also cannot use Rolle's Theorem to tell us how they would behave. While we do have the Gauss-Lucas Theorem in C, it does not tell us nearly enough to be used in the same way as in the one-dimensional case. Theorem 3.1 still holds, but as stated above, it is not so easy to make precise in which sense we have convergence. In order to have some justification for Theorem 3.1 in the complex case, we devised a way to test the conclusion of this theorem statistically. The theorem is, after all, a statement about the statistical behaviour of the critical points.

3.3.1 Hypothesis Testing

In theory it is not difficult to devise a test for Theorem 3.1 from a statistical point of view. We wish to test the null hypothesis

H₀: Y_j are uniformly distributed on D.

Since the hypothesis concerns the distribution itself, and since our data is complex (i.e. two-dimensional), commonly used one-dimensional non-parametric tests (e.g. the Kolmogorov-Smirnov test) are not applicable.

As such, a χ²-test is used. Choose N to be a sufficiently large natural number and generate N random complex numbers {X_j}_{j=1}^{N} in D, X_j ∼ U(D). Form the random polynomial P(z) of the form (4), differentiate P(z) symbolically and solve P′(z) = 0 numerically to obtain the critical points {Y_j}_{j=1}^{N−1}. Take m, n ∈ N and form

\[ A_{i,j} = \left\{ z \in D : \frac{i}{m} \le |z| < \frac{i+1}{m}, \ \frac{2\pi j}{n} \le \arg(z) < \frac{2\pi (j+1)}{n} \right\} \tag{13} \]


for 0 ≤ i ≤ m − 1 and 0 ≤ j ≤ n − 1. If we identify D with [0, 1] × [0, 2π) via polar coordinates, then m is the number of sub-intervals [0, 1] is divided into and n is the number of sub-intervals [0, 2π) is divided into. Hence, m · n is the total number of groups. Denote by e_{i,j} the expected number of points in the group A_{i,j}; then under H₀

\[ e_{i,j} = \frac{(N - 1) \, |A_{i,j}|}{\pi} \]

where |A_{i,j}| denotes the area of the group A_{i,j}. In each simulation we can count the number of observed points, o_{i,j}, in each group to form the "goodness of fit" statistic

\[ Q = \sum_{i,j} \frac{(e_{i,j} - o_{i,j})^2}{e_{i,j}} \tag{14} \]

which is approximately χ² distributed with mn − 1 degrees of freedom. We may then accept or reject H₀ by comparing Q to the appropriate quantile of this distribution, based on what significance level α is acceptable.

For our test, we chose N = 100, m = 5, n = 4 and α = 0.05.
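The binning and the statistic Q of (14) can be sketched in Python as follows. This is a minimal illustration and not the program used for the thesis. The group A_{i,j} is an annular sector with area |A_{i,j}| = π(2i + 1)/(n m²), so e_{i,j} = (N − 1)(2i + 1)/(n m²). Since computing critical points requires a numerical root finder, the sketch applies the test to 99 points sampled directly from U(D) as a stand-in for the critical points. The function names, the seed and the rounded critical value 30.14 (the 0.95 quantile of the χ² distribution with 19 degrees of freedom) are assumptions introduced here.

```python
import cmath
import math
import random

def expected_counts(num_points, m, n):
    """Expected counts e_{i,j} under H0, using |A_{i,j}| = pi * (2i + 1) / (n * m^2)."""
    return [[num_points * (2 * i + 1) / (n * m * m) for _ in range(n)]
            for i in range(m)]

def observed_counts(points, m, n):
    """Count the points in each annular sector A_{i,j}."""
    counts = [[0] * n for _ in range(m)]
    for z in points:
        i = min(int(abs(z) * m), m - 1)            # radial index; |z| = 1 goes to the last ring
        angle = cmath.phase(z) % (2 * math.pi)     # argument mapped to [0, 2*pi)
        j = min(int(angle * n / (2 * math.pi)), n - 1)
        counts[i][j] += 1
    return counts

def goodness_of_fit(points, m, n):
    """The statistic Q of (14)."""
    e = expected_counts(len(points), m, n)
    o = observed_counts(points, m, n)
    return sum((e[i][j] - o[i][j]) ** 2 / e[i][j]
               for i in range(m) for j in range(n))

rng = random.Random(2)                             # fixed seed, illustrative only
# Stand-in data: the square root of a uniform radius gives points uniform on the disc.
sample = [cmath.rect(math.sqrt(rng.random()), rng.uniform(0, 2 * math.pi))
          for _ in range(99)]
Q = goodness_of_fit(sample, m=5, n=4)
CRITICAL_VALUE = 30.14   # approx. 0.95 quantile of chi-square, 19 degrees of freedom
print(Q, Q > CRITICAL_VALUE)
```

To test actual critical points, `sample` would be replaced by the list of the N − 1 computed roots of P′.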

3.3.2 Testing Program

In reality the hypothesis test outlined above was not trivial to implement.

Firstly we needed to find a suitable program which could handle generating N = 100 random complex numbers and forming a polynomial P(z) of the form (4). The prime candidates were MATLAB, R and Mathematica. All three programs exhibited problems when choosing a large N: both the function for constructing P(z) and the numerical solver for P′(z) = 0 would become numerically unstable. Mathematica was the least problematic and could handle the largest N, and so it was chosen. The SetPrecision function was applied to every number to ensure that all numbers carried a sufficiently large number of digits to minimize rounding errors. Following the previously constructed hypothesis test, P(z) was differentiated and P′(z) = 0 was solved numerically to find all N − 1 critical points. Up to this point everything in the simulation had been done in Mathematica. After the critical points were found they were exported into a .csv file to be imported by a Python program. This program then sorted all critical points into their respective groups A_{i,j} of (13), and the statistic Q of (14) was calculated. The result was saved and the simulation was repeated 49 more times. The outcome was that H₀ was rejected only 2 out of the 50 times this simulation was run (at a significance level of α = 0.05). This is in line with the roughly 5% rejection rate one expects when H₀ is true, and we see it as a convincing result.


4 Sendov’s Conjecture

The aim of this section is to suggest an application of the statistical theory of critical points discussed in Section 3 above to an outstanding problem in the geometry of polynomials: Sendov's conjecture.² Sendov's conjecture was proposed in 1959 and remains unsolved to this day. We formulate a probabilistic analogue of Sendov's conjecture and provide some justification as to why such a result might hold, based on the material presented in Section 3.

4.1 Background

Assume that the complex polynomial P(z) has all zeros in the unit disc D. By the Gauss-Lucas theorem, any critical point of P(z) must also belong to D. Sendov's conjecture makes a stronger statement than that.

Conjecture 4.1 (Sendov's conjecture). Assume that

\[ P(z) = \prod_{j=1}^{n} (z - z_j) \]

with z_j ∈ D (1 ≤ j ≤ n). Then there is a critical point in the disc D(z_j, 1) for each j ∈ {1, 2, ..., n}.

In other words, each zero of P (z) has a critical point of P (z) within unit distance. We note that it follows from Gauss-Lucas theorem that each zero has a critical point within distance 2.

In Figures 6-7 below, we illustrate the conjecture for polynomials of degree 3. Figure 6 shows the unit disc with the zeros (dots) and critical points (crosses) of a third degree polynomial. Figure 7 shows the unit circle together with circles of radius 1 centered at the zeroes of P(z) (the coloured circles). Note that within each coloured circle there is a critical point of P(z).

² Bl. H. Sendov (1932-2020). In addition to being a renowned mathematician, Sendov also served as rector of Sofia University (1973-1979), President of the Bulgarian parliament (1995-1997), and Bulgarian ambassador to Japan (2003-2009) (see [3]).


Figure 6

Figure 7
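The degree 3 case illustrated in Figures 6-7 is easy to reproduce numerically, since the critical points of a cubic follow from the quadratic formula. The Python sketch below computes, for each zero, the distance to the nearest critical point. The three zeroes on the unit circle and the helper names are illustrative assumptions, not the polynomial shown in the figures.

```python
import cmath

def critical_points_of_cubic(z1, z2, z3):
    """Roots of P'(z) = 3z^2 - 2(z1 + z2 + z3)z + (z1 z2 + z1 z3 + z2 z3)."""
    s1 = z1 + z2 + z3
    s2 = z1 * z2 + z1 * z3 + z2 * z3
    root = cmath.sqrt(4 * s1 * s1 - 12 * s2)
    return [(2 * s1 + root) / 6, (2 * s1 - root) / 6]

def sendov_distances(zeros):
    """For each zero, the distance to the nearest critical point."""
    crit = critical_points_of_cubic(*zeros)
    return [min(abs(z - zeta) for zeta in crit) for z in zeros]

# Hypothetical example: three zeroes on the unit circle.
zeros = (1.0, -0.6 + 0.8j, -0.6 - 0.8j)
distances = sendov_distances(zeros)
print(distances, all(d <= 1 for d in distances))
```

For these zeroes the critical points are approximately 0.2 and −1/3, and every zero has a critical point within unit distance.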

In the literature, one usually reformulates Conjecture 4.1. Consider polynomials of the following form:

\[ P(z) = (z - a) \prod_{j=1}^{n-1} (z - z_j), \qquad a \in [0, 1], \tag{15} \]

and z_j ∈ D. The next proposition is well-known (see e.g. [6]).

Proposition 4.2. If each polynomial of the form (15) has a critical point in C(a) = D(a, 1) ∩ D,

then Conjecture 4.1 is true.

The proof of Proposition 4.2 is not difficult but we have not seen it presented in the literature. Thus, it seems reasonable to include our own derivation of the result below.

Proof. Assume that each polynomial of the form (15) has a critical point in C(a). Let P(z) = ∏_{j=1}^{n} (z − z_j) with z_j ∈ D for 1 ≤ j ≤ n. Fix k ∈ {1, 2, ..., n}; we shall prove that P has a critical point in D(z_k, 1) ∩ D. By re-labeling the zeros, we may assume that k = n. Write z_n = ae^{iθ}, where a ∈ [0, 1]. Define w = T(z) = e^{−iθ}z, w_j = T(z_j) and

\[ Q(w) = \prod_{j=1}^{n} (w - w_j) = (w - a) \prod_{j=1}^{n-1} (w - w_j). \]


By assumption, Q′(w) = 0 has a root η ∈ C(a). We now claim that ξ = T⁻¹(η) is a root of P′(z) = 0 in D(z_n, 1) ∩ D. For this, we first note that C(a) = T(D(z_n, 1) ∩ D). Furthermore, by considering the logarithmic derivative, we have

\[ \frac{P'(z)}{P(z)} = \sum_{j=1}^{n} \frac{1}{z - z_j} = \sum_{j=1}^{n} \frac{1}{T^{-1}(w) - T^{-1}(w_j)} = \sum_{j=1}^{n} \frac{1}{e^{i\theta} w - e^{i\theta} w_j} = e^{-i\theta} \sum_{j=1}^{n} \frac{1}{w - w_j} = e^{-i\theta} \frac{Q'(w)}{Q(w)}. \]

Since e^{−iθ} ≠ 0 and w ∈ C(a) is equivalent to T⁻¹(w) ∈ D(z_n, 1) ∩ D, it follows that if there is η ∈ C(a) such that Q′(η) = 0, then there is ξ = T⁻¹(η) ∈ D(z_n, 1) ∩ D such that P′(ξ) = 0.

4.1.1 Progress on Conjecture 4.1

Sendov's conjecture has been proved in general for all polynomials of degree ≤ 8 by Brown and Xiang [2]. It has also been proved for all degrees in some special cases, e.g. when P(z) has a zero at z = 0 (see e.g. [6]). While this thesis was being completed, T. Tao announced [7] that there exists a constant n₀ such that Sendov's conjecture is true for all n ≥ n₀. Tao's proof is an existence proof and does not provide any quantitative information on n₀. He believes that it can be modified to yield an explicit value for n₀, but that this n₀ will be "probably extremely large (certainly much larger than 9)".

4.2 A Probabilistic Analogue of Sendov’s Conjecture

In this section we suggest a probabilistic analogue of Conjecture 4.1. Let X_j ∼ U (D) and consider the random polynomial

\[ P(z) = (z - a) \prod_{j=1}^{n-1} (z - X_j). \tag{16} \]

Then P(z) will have n − 1 critical points {Y_j}_{j=1}^{n−1}. Furthermore, by the Gauss-Lucas theorem, these complex random variables will attain values within D. The probabilistic analogue of Conjecture 4.1 is, vaguely, that almost surely some Y_j will belong to C(a).

Conjecture 4.3. Let X_j ∼ U(D) for 1 ≤ j ≤ n − 1 and let P be the random polynomial (16). Then

\[ \mathbb{P}\left( P'(z) \ne 0 \text{ for all } z \in C(a) \right) = 0. \]

Here, ℙ is the joint probability function of the X_j (1 ≤ j ≤ n − 1).

Note that this is a weaker version of Conjecture 4.1, since an event having probability zero does not mean that it is impossible, just improbable. A good example to illustrate this phenomenon is the following proposition:

Proposition 4.4. A real number u is drawn from U[0, 1]. The probability that the number drawn, u, is a rational number is zero. (As we all know, there are rational numbers in [0, 1], and the proposition does not imply that there are none.)

Proof. We define the event A = {u ∈ Q} (i.e. the number drawn is a rational number) and fix ε > 0. Let {r_k}_{k=1}^{∞} be an enumeration of Q ∩ [0, 1] and define the events

\[ J_k = \{ u \in (r_k - \varepsilon 2^{-k}, \ r_k + \varepsilon 2^{-k}) \}, \qquad k \in \mathbb{N}. \]

Clearly,

\[ A \subset \bigcup_{k=1}^{\infty} J_k \]

and since J_k is defined by an interval, ℙ(J_k) is at most the length of the interval divided by the length of [0, 1], so:

\[ \mathbb{P}(J_k) \le \frac{(r_k + \varepsilon 2^{-k}) - (r_k - \varepsilon 2^{-k})}{1 - 0} = \varepsilon 2^{1-k}. \]

Now we can assert that:

\[ \mathbb{P}(A) \le \sum_{k=1}^{\infty} \mathbb{P}(J_k) \le \sum_{k=1}^{\infty} \varepsilon 2^{1-k} = 2\varepsilon \sum_{k=1}^{\infty} 2^{-k} = 2\varepsilon. \]

Equality in the union bound would hold if the intervals were pairwise disjoint. Since ε was arbitrary we can let it go to zero and obtain ℙ(A) = 0. As stated earlier this does not imply that A = ∅, just that there are very few rational numbers in [0, 1] compared to all numbers in [0, 1].


Denote by A the event that the random polynomial (16) satisfies P′(z) ≠ 0 for all z ∈ C(a). Conjecture 4.3 simply states that ℙ(A) = 0, while the original Sendov's conjecture states that A = ∅.

While there is no analytical proof for this conjecture, there are supporting factors which make this conjecture reasonable and we shall present a "plausibility argument" for Conjecture 4.3.

Let {Y_j}_{j=1}^{n−1} be the critical points of a random polynomial of the form (16). Assume that the random variables Y_j (1 ≤ j ≤ n − 1) are independent and uniformly distributed on D, then

\[ \mathbb{P}(A) = \prod_{j=1}^{n-1} \mathbb{P}(Y_j \notin C(a)) = \left( 1 - \frac{|C(a)|}{\pi} \right)^{n-1}, \tag{17} \]

where |C(a)| denotes the area of the set C(a). The right-hand side of (17) tends to zero as n → ∞.

By the results presented in Section 3, we know that the Y_j will be asymptotically uniformly distributed on D as n → ∞, which gives some (non-rigorous) justification for (17) (although we have no real reason to believe that the critical points will be independent random variables). We note however that if the random polynomial has additional structure, we can prove estimates of the type (17).
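To get a feeling for the size of the right-hand side of (17), one can evaluate it numerically. The set C(a) is the intersection of two unit discs whose centres are at distance a, and by the standard formula for such a lens its area is 2 arccos(a/2) − (a/2)√(4 − a²). The Python sketch below uses this formula; the function names and the chosen values of n are illustrative assumptions, and the numbers inherit the non-rigorous independence assumption behind (17).

```python
import math

def lens_area(a):
    """Area of C(a): intersection of the unit discs centred at 0 and at a, 0 <= a <= 1."""
    return 2 * math.acos(a / 2) - (a / 2) * math.sqrt(4 - a * a)

def heuristic_bound(a, n):
    """Right-hand side of (17): (1 - |C(a)| / pi)^(n - 1)."""
    return (1 - lens_area(a) / math.pi) ** (n - 1)

# a = 1 gives the smallest set C(a) and hence the slowest decay.
for n in (10, 50, 100):
    print(n, heuristic_bound(1.0, n))
```

For a = 0 the set C(a) is the whole disc D and the expression is zero for every n ≥ 2; for a = 1 it still decays geometrically in n.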

Proposition 4.5. Let P be a polynomial of the form (16) and assume that there exists λ ∈ (0, 1) such that at least λn of the n roots have multiplicity ≥ 2. Then

\[ \mathbb{P}(A) = O(r^n) \]

where r < 1.

Proof. Let S = {X_j} be the set of random variables that are zeros of multiplicity ≥ 2. Since any such zero is also a zero of the random polynomial P′, we have that S ⊂ {Y_j}_{j=1}^{n−1}. Hence,

\[ A = \bigcap_{j=1}^{n-1} \{ Y_j \notin C(a) \} \subseteq \bigcap_{X_j \in S} \{ X_j \notin C(a) \}. \]

Using the inclusion above together with independence of {X_j} and the fact #(S) ≥ λn (where #(E) denotes the cardinality of E), we obtain

\[ \mathbb{P}(A) \le \mathbb{P}\left( \bigcap_{X_j \in S} \{ X_j \notin C(a) \} \right) = \prod_{X_j \in S} \mathbb{P}(X_j \notin C(a)) \le \left( 1 - \frac{|C(a)|}{\pi} \right)^{\lambda n} = r^n, \]

where r = (1 − |C(a)|/π)^λ.

References

[1] S.E. Alm and T. Britton, Stokastik: Sannolikhetsteori och statistikteori med tillämpningar, Liber, Stockholm, 2014.

[2] J.E. Brown and G. Xiang, "Proof of the Sendov conjecture for polynomials of degree at most eight", J. Math. Anal. Appl. 232 (1999), 272-292.

[3] K. Ivanov and P. Petrushev, "In Memoriam: Blagovest Sendov February 8, 1932 - January 19, 2020", J. Approx. Theory 254 (2020), 105406.

[4] D.C. Lay, Linear Algebra and Its Applications (4e), Addison-Wesley, Boston, 2011.

[5] R. Pemantle and I. Rivin, "The distribution of zeros of the derivative of a random polynomial", Advances in Combinatorics, 259-273, Springer, Heidelberg, 2013.

[6] Q.I. Rahman and G. Schmeisser, Analytic Theory of Polynomials, London Mathematical Society Monographs, no. 26, Oxford University Press, Oxford, 2002.

[7] T. Tao, "Sendov's conjecture for sufficiently high degree polynomials", arXiv:2012.04125, December 2020.

[8] H.S. Wilf, "Some Applications of the Inequality of Arithmetic and Geometric Means to Polynomial Equations", Proc. Amer. Math. Soc. 14 (1963), 263-265.


A Mathematica Code

(*Empties any existing variables*)
Remove["Global`*"]

(*SetPrecision[x, n] yields a version of x with n digits of precision*)

(*Generates 100 random radii between 0 and 1*)
r = SetPrecision[RandomReal[{0, 1}, 100], 53]

(*Generates 100 random angles between 0 and 2pi*)
theta = SetPrecision[RandomReal[{0, 2 Pi}, 100], 53]

(*Calculates the Cartesian coordinates from the previously generated radii and angles*)
(*Remark: we take sqrt(r) instead of just r in order to achieve a uniform distribution on the disc*)
Xcoord = SetPrecision[Sqrt[r] Cos[theta], 53]
Ycoord = SetPrecision[Sqrt[r] Sin[theta], 53]

(*Uses the Xcoords and Ycoords to make a single complex variable*)
Comp = Xcoord + I*Ycoord

(*Generates a polynomial f with the previously generated points as zeroes*)
f[x_] := SetPrecision[Times @@ (x - Comp), 53]

(*Calculates the critical points of f*)
CP = SetPrecision[NSolve[f'[x] == 0, x], 53]

(*NSolve returns rules of the form {x -> value}; g extracts the value from such a rule*)
g[t_] := ReplaceAll[x, t]
CP2 = Map[g, CP]

(*Generates plots of the zeroes and critical points*)
ComplexListPlot[{CP2}]


ComplexListPlot[{Comp}]

ComplexListPlot[{Comp, CP2}]

(*Takes the real values from the critical points*)
CP2X = SetPrecision[Re[CP2], 53]

(*Takes the imaginary values from the critical points*)
CP2Y = SetPrecision[Im[CP2], 53]

(*Joins the real and imaginary values into one list*)
(*The point of this is to avoid exporting the i's and instead export only the real and imaginary parts*)
CP2Values = SetPrecision[Join[CP2X, CP2Y], 53]

(*Exports the real and imaginary values to a file called Values.csv*)
Export["Values.csv", CP2Values]
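For readers without Mathematica, the same pipeline can be sketched in Python with NumPy. This is an illustrative port, not the code used for the thesis. It works in double precision rather than with 53 digits, so the degree is kept small (n = 10 is an arbitrary choice), and the function names `sample_uniform_disc` and `critical_points` are hypothetical.

```python
import numpy as np


def sample_uniform_disc(n, rng):
    """n points uniform on the unit disc; sqrt makes the radius density 2r."""
    radius = np.sqrt(rng.uniform(0.0, 1.0, n))
    angle = rng.uniform(0.0, 2.0 * np.pi, n)
    return radius * np.exp(1j * angle)


def critical_points(zeros):
    """Zeros of the derivative of the monic polynomial with the given zeros."""
    coeffs = np.poly(zeros)              # coefficients, highest degree first
    return np.roots(np.polyder(coeffs))  # roots of the derivative


rng = np.random.default_rng(1)
zeros = sample_uniform_disc(10, rng)
crit = critical_points(zeros)

# By the Gauss-Lucas theorem the critical points lie in the convex hull of
# the zeros, hence inside the unit disc.
print(len(crit), np.abs(crit).max())
```

For degrees close to 100, double-precision root finding from expanded coefficients becomes unreliable, which is presumably why the Mathematica code works at higher precision.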

B Python Code

#######Packages######
import os
import math
import numpy as np
###################

####Functions####
#Converts Cartesian coordinates (x, y) to polar coordinates (rho, phi)
def cart2pol(x, y):
    rho = np.sqrt(x**2 + y**2)
    phi = np.arctan2(y, x)
    return (rho, phi)
#################

#Changes the directory to gain access to relevant files
os.chdir('C:\\....')

#Opens the Values.csv file that contains the exported values
#from Mathematica, reads the values into a list and
#closes the file when the with-block ends
with open("Values.csv", "r") as f:
    f_import = f.readlines()

#Defines a 99x2 vector
Values = np.zeros((99,2))

#Takes the values from f_import and puts them in the 99x2
#vector, where the x-values are in the first
#column and y-values in the second column
i=j=0
while i<198:
    k = i % 99
    #Explicit conversion of the text line to a float
    Values[(k,j)] = float(f_import[i])
    i += 1
    if i == 99:
        j += 1

#Defines a new vector that will contain the polar coordinates
#of the cartesian points in the Values vector
ValuesPolar = np.zeros((99,2))

#Loops through the vector Values and, using the function
#cart2pol, converts the cartesian coordinates in
#Values to polar coordinates and puts them in ValuesPolar
i=0
while i<99:
    ValuesPolar[(i,0)], ValuesPolar[(i,1)] = cart2pol(Values[i,0], Values[i,1])
    i += 1

#Defines a 4x5 matrix representing all the groups the
#points are divided into (4 angular sectors, 5 radial rings)
Groups = np.zeros((4,5))

#Loops through all the points to check which group to
#place each point in
i=j=k=0
for i in range(0,4):
    for j in range(0,5):
        for k in range(0,99):
            if ((1-0.2*(j+1))<=ValuesPolar[(k,0)]<=(1-0.2*(j)) and
                    (-math.pi+i*math.pi/2)<=ValuesPolar[(k,1)]<=(-math.pi+(i+1)*math.pi/2)):
                Groups[(i,j)] += 1
print(Groups)

#Define a variable for the Chi2 statistic
Chi2Sum = 0

#Loops through all the groups to calculate the values
#for the chi2 variable; 9-2*i is the expected count in ring i
for i in range(0,5):
    for j in range(0,4):
        Chi2Sum = Chi2Sum + ((Groups[(j,i)] - (9-2*i)) ** 2)/(9-2*i)
print(Chi2Sum)

#Writes the calculated chi2 value to the file Resultat.csv
g = open("Resultat.csv", "a")
g.write(str(Chi2Sum) + '\n')
g.close()
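The expected cell counts 9 − 2i used above follow from the ring areas. Under the uniform distribution on D, the ring between radii 1 − 0.2(i+1) and 1 − 0.2i carries the fraction (1 − 0.2i)² − (1 − 0.2(i+1))² of the points, and each ring is split into four equal sectors. The sketch below derives these counts and computes the same kind of statistic with `np.histogram2d`. It is an illustrative alternative, not the thesis code. The function names are hypothetical, the rings are ordered from the innermost outwards, and the expected counts are scaled by the actual number of points rather than by 100.

```python
import numpy as np


def expected_counts(n_points, n_rings=5, n_sectors=4):
    """Expected points per cell under the uniform law, innermost ring first."""
    edges = np.linspace(0.0, 1.0, n_rings + 1)
    ring_fraction = edges[1:] ** 2 - edges[:-1] ** 2  # area share of each ring
    return n_points * ring_fraction / n_sectors


def chi2_statistic(z, n_rings=5, n_sectors=4):
    """Pearson chi-square statistic of complex points z against uniformity on D."""
    sector_edges = np.linspace(-np.pi, np.pi, n_sectors + 1)
    ring_edges = np.linspace(0.0, 1.0, n_rings + 1)
    counts, _, _ = np.histogram2d(np.angle(z), np.abs(z),
                                  bins=[sector_edges, ring_edges])
    expected = expected_counts(len(z), n_rings, n_sectors)  # broadcast over sectors
    return float(((counts - expected) ** 2 / expected).sum())


print(expected_counts(100))  # innermost to outermost ring

# Demo on 99 points sampled uniformly on the disc (seed chosen arbitrarily).
rng = np.random.default_rng(3)
sample = np.sqrt(rng.uniform(0, 1, 99)) * np.exp(1j * rng.uniform(0, 2 * np.pi, 99))
print(chi2_statistic(sample))
```

Read from the outermost ring inwards, `expected_counts(100)` reproduces the values 9, 7, 5, 3, 1 that appear as 9 − 2i in the loop above.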
