Predicting the life cycle of technologies from patent data


Master Thesis in Statistics and Data Mining


Merhawi Tewolde Gebremariam

Division of Statistics

Department of Computer and Information Science

Linköping University

Supervisor: Mattias Villani

Examiner: Bertil Wegmann

Copyright (translated from the Swedish version)

This document is kept available on the Internet (or its future replacement) from the date of publication, provided that no extraordinary circumstances arise. Access to the document implies permission for anyone to read, download, and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A later transfer of the copyright cannot revoke this permission. All other use of the document requires the consent of the copyright holder. Solutions of a technical and administrative nature exist to guarantee authenticity, security and accessibility. The author's moral rights include the right to be named as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character. For further information about Linköping University Electronic Press, see the publisher's website http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet – or its possible replacement – from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Analysis of patent documents is one way to learn about trends in the evolution of technologies. In this thesis, we propose a mixture of life cycle Poisson models for predicting the life cycle of technologies from patent count data. The aim is to predict the life cycle of a technology and to determine its current stage on the development S-curve. The model is constructed from historical data on patent publications and from experts' beliefs about the life cycle of technologies. The model is estimated with Bayesian methods; in particular, we use a combination of Gibbs sampling and slice sampling to simulate from the posterior distribution of the model parameters. We apply the model to a dataset of 123 technologies from the electricity sector. As a preliminary exploratory step, cluster analysis is also applied to the dataset. Finally, we evaluate how well the model predicts the life cycle trend of technologies from different base years. The results show that the model is capable of predicting the life cycle of a technology from different stages of its development. However, the predictions of expected behavior become more accurate when more data is used to construct the prediction.


Acknowledgments

Firstly, I would like to express my sincere gratitude to my advisor Prof. Mattias Villani for his continuous support of my thesis, and for his patience, motivation, and immense knowledge. His guidance helped me throughout the writing of this thesis. I could not have imagined having a better advisor for my master's studies.

I would also like to thank Falah Huseni, co-founder and R&D manager of the company IAMIP, for giving me the chance to write my thesis with the company and for his help in understanding the data.

Finally, I would like to thank my lovely wife who stays beside me not only in the good days but also in times where everything seemed hopeless for me.


1 Introduction

1.1 Background

Technological change has been found to have a decisive impact on the competitive structure in many industries (Ernst, 2003). In the present era, technologies are changing very rapidly. Hence, technology intensive companies need to be alert to technological updates every day, in order to stay competitive. Monitoring research activities of a given technological area is a good way to define its current trend. Research activities are usually conducted confidentially and therefore it is difficult to get information from which insights of current research activities in a specific technological area can be extracted. However, previous studies show that there are many factors that help to infer research activities in a given technological area (Diam et al., 2012). Analysis of publicly available intellectual property documents is one way to learn about trends in research activities.

A creation of the mind is referred to as the intellectual property (IP) of its creator or inventor. Individuals or companies need to follow certain procedures to be granted legal ownership of their inventions. Proof of legal ownership of an IP provides exclusive rights to the property. IP is protected in law by, for example, patents, copyrights, and trademarks, which enable people to earn recognition or financial benefit from what they invent or create (WIPO, 2018). Inventors need to apply to one of the national IP offices for approval of a new invention, and thereby for protection in the form of a patent, trademark or copyright. Ernst (2003) suggests that the information in patent data can be used for strategic planning purposes. Though the observation of patenting activities is not equivalent to a full-scale review of technological trends, it is one of its strongest indicators (Brockhoff, 1991). The process of patent publication starts with an application to one of the patent offices. Once an application is made, it stays under examination for a maximum of 18 months. Approved patents are then stored as published patents in the patent databases. The evolution of a technology is often described by the cumulative number of its patent publications, which resembles an S-shaped curve. Companies use the S-curve as the main indicator of a technology's development stage, i.e. it shows whether the technology is emerging, growing, maturing or saturating.

There have been numerous studies on monitoring patent documents for both specific technologies and larger technological areas, see e.g. Diam et al. (2012), Dereli and Durmusoglu (2009), Chen et al. (2011) and McAleer et al. (2007). Most research on patent analysis has applied time series models to specific technologies over a particular period of time. The structural and asymptotic properties of the time-varying AR(1)-GARCH(1,1), AR(1)-GJR(1) and AR(1)-EGARCH(1,1) models are discussed in McAleer et al. (2007). A linear regression based algorithm has been developed for the extraction of patent-based trends (Dereli and Durmusoglu, 2009). Chen et al. (2011) present technological S-curves that integrate bibliometric and patent analysis into the logistic growth curve model for hydrogen energy and fuel cell technologies.

The quality of the proposed models and techniques depends on the technology, i.e. an appropriate model must be selected to analyze a given technology. Mishra et al. (2002) suggest a methodology that matches the technique to a technology by mapping both technology and technique characteristics onto a common scale. Meade (1998) reviewed several existing models and suggested that it is easier to identify a class of possible models than the 'best' model, which leads to the combining of model forecasts.

To the best of my knowledge, a data-driven model supported by prior knowledge of technological life cycles has not previously been analyzed. In this thesis, a mixture of life cycle Poisson models for analyzing patent count data observed over time is proposed. The model is estimated using Bayesian methods. A combination of Gibbs sampling and slice sampling is used to simulate from the posterior distribution of the model parameters. We apply the model and posterior sampling algorithm to a dataset of 123 technologies from the electricity sector. Mixture models (Gelman et al., 2013) are flexible yet robust, since they allow data-driven clustering of technologies with similar evolution of patents.


1.2 Objective

With the increase in technological advancement, an extensive number of patent documents are being published. According to records from patent databases, around 100,000 patents are published every week. With this bulk of information, it is very difficult for companies to keep track of trends in their respective technological areas of interest. The lack of such information causes problems in companies' strategic planning. Automatic data-driven trend predictions are therefore of great interest to innovative companies.

This thesis therefore aims to develop a mixture model that helps to predict the life cycle of a technology and to determine the stage of a technology on the development S-curve. The model is constructed from historical data on patent publications and from experts' beliefs about the life cycle of technologies.


2 Data

2.1 Data sources

Patent databases store a very large amount of data, which is updated rapidly as new patent applications are filed. Patent databases contain a wealth of information about technology, and with 90 million patent documents the EPO's Espacenet worldwide database is one of the largest collections of technical information in the world (Espacenet, 2018). To make searching easier, patent applications are classified by technical area. The patent classification system is a hierarchical system where the top nodes are broad technological areas (coded with letters) such as:

A = Human necessities, B = Performing operations, C = Chemistry, D = Textile, E = Fixed construction, F= Mechanical engineering, G = Physics, H = Electricity, Y = General tagging of new technologies

Each of the above technological areas is divided into several subclasses, which are in turn divided into further subclasses, and so on until a very specific technology is reached. In this thesis, 123 technologies from 9 subclasses of the class Electricity (H) are selected over the time period 1970-2016 (47 years). The data is provided by the company IAMIP Sverige AB, located in Stockholm, Sweden.

2.2 Raw data

The dataset contains the annual number of patent applications for each of the selected 123 technologies over 47 years. Table 2.1 shows the filtered data from the database in tabular form. Each column refers to a technology which is coded with a combination of letters and numbers, for example, H01M 4/00 represents technologies related to electrodes.

Table 2.1: Annual number of patent publications of technologies (1970-2016).

Year   H01M 2/00   H01M 4/00   H01M 6/00   ...
1970   275         308         110         ...
1971   292         307         125         ...
...    ...         ...         ...         ...
2016   4687        4919        2671        ...

Most patent applications follow similar trends over time, i.e. the number of applications increases very slowly at the emerging stage of the technology, then increases very rapidly at its growing stage, and finally declines at its maturation stage. As an example, Figure 2.1 shows the trend of two technologies (Secondary cells and Electrochemical generators) over the time period 1970-2016.

Figure 2.1: Trend for patent applications of Electrochemical generators and Secondary cells (1970-2016).

3 Methods

This section describes the proposed models and the estimation methodology used in this thesis.

3.1 The life cycle Poisson model

As a starting point, consider the following model for the number of patent applications for technology $i$ at time $t$,

$$Y_{it} \overset{ind}{\sim} \mathrm{Pois}(\lambda_{it}), \tag{3.1}$$

where $\lambda_{it}$ is the expected number of patent applications for technology $i$ at time $t$, which is given as:

$$\lambda_{it} = C_i \cdot h_{v_i}(t \mid \mu_i, \sigma_i^2), \tag{3.2}$$

where $h_{v_i}(t \mid \mu_i, \sigma_i^2)$ is the density function of a t distribution with $v_i$ degrees of freedom, location $\mu_i$ and scale $\sigma_i^2$. The t density gives the mean a shape that follows the typical life cycle of technologies, and we therefore call the model in Equations 3.1 and 3.2 the life cycle Poisson model. The location of the t distribution in the life cycle Poisson model is the point where the number of patent applications of a technology is at its maximum. We refer to this point as the peak value in the coming sections of the thesis.

The parameter $C_i$ in Equation 3.2 is a positive constant which multiplies the density function values for technology $i$ in order to model the mean of the number of patent applications in technology $i$.
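As a minimal sketch of the mean function in Equation 3.2, the location-scale t density can be evaluated with the standard library alone. The function names and the parameter values below are illustrative assumptions and are not taken from the thesis.

```python
import math

def t_density(t, mu, sigma2, v):
    """Location-scale Student t density h_v(t | mu, sigma2)."""
    sigma = math.sqrt(sigma2)
    z = (t - mu) / sigma
    # Log of the normalizing constant Gamma((v+1)/2) / (Gamma(v/2) sqrt(v pi) sigma).
    log_norm = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
                - 0.5 * math.log(v * math.pi) - math.log(sigma))
    return math.exp(log_norm - (v + 1) / 2 * math.log1p(z * z / v))

def life_cycle_mean(t, C, mu, sigma2, v):
    """Expected patent count lambda_it = C_i * h_{v_i}(t | mu_i, sigma_i^2)."""
    return C * t_density(t, mu, sigma2, v)

# Hypothetical parameter values, chosen only to illustrate the shape.
years = range(1, 48)  # 47 yearly time points, as in the 1970-2016 data
curve = [life_cycle_mean(t, C=50000.0, mu=30.0, sigma2=64.0, v=5.0) for t in years]
peak_year = max(years, key=lambda t: curve[t - 1])
```

The curve rises, peaks at the location parameter and then declines symmetrically, which is the life cycle shape the model relies on; $C$ only rescales the height.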

Figure 3.1 illustrates the mean function of the model with simulated data for different combinations of the parameters.

Figure 3.1: Illustration of the mean of the life cycle Poisson model.

As stated in Equation 3.1, the numbers of patent applications are assumed to be independent. If the counts are instead assumed to depend on earlier counts and on other factors (e.g. the number of companies applying for a patent), the model can easily be extended as follows,

$$\lambda_{it} = C_i \cdot \exp(Z_{it}^{T}\beta) \cdot h_{v_i}(t \mid \mu_i, \sigma_i^2),$$

where $Z_{it}$ possibly contains $y_{i,t-1}$ and other covariates. We will not use this extension in this thesis, but note that Algorithm 3.2 is straightforwardly extended by adding a standard Poisson regression updating step for $\beta$.

Many technologies follow similar trends even though the mean level of the number of applications can differ widely. In other words, several technologies may have similar location ($\mu$), scale ($\sigma^2$) and degrees of freedom ($v$), but possibly different $C$. The similar trend patterns will be exploited in the mixture model presented in Section 3.3. As a preliminary exploratory step, the next section pre-clusters the data and estimates the trend pattern for each cluster.

3.2 The life cycle Poisson model on clustered data

Clustering is a division of data into groups of similar objects (Ria and Singh, 2010). Applying cluster analysis to static or dynamic datasets helps to uncover hidden structure in the data. Performing cluster analysis on time series data is often a very helpful way to explore the features of a dataset. Finding stocks that behave in a similar way, determining products with similar selling patterns, and identifying countries with similar population growth or regions with similar temperature are some typical applications where similarity searching between time series is clearly motivated (Montero and Vilar, 2014).

The life cycles of several technologies seem to have similar shapes, and clustering technologies into homogeneous groups makes it possible to estimate a single model for each cluster. This borrowing of information from similar time series is often very useful when data are limited on each time series, which is the case with patents. Hence, time series cluster analysis is applied to the given technologies. The first step in cluster analysis is to define the concept of similarity between units: a crucial question is establishing what we mean by “similar” data objects, i.e. determining a suitable similarity/dissimilarity measure between two objects (Montero and Vilar, 2014). Computing a dissimilarity measure for time series data is complex because of its dynamic nature and high dimensionality. One of the simplest ways of calculating the distance between two time series is to consider them as univariate time series and then calculate the distance across all time points (Aghabozorgi et al., 2015). Based on these distances, an agglomerative hierarchical clustering algorithm is applied. We determine the number of clusters from the dendrogram produced by the algorithm using the elbow method. The elbow method looks at the total within-cluster sum of squares (WSS) as a function of the number of clusters: one should choose the number of clusters so that adding another cluster does not reduce the total WSS much further (datanovia, 2018).

Since the interest of clustering in this case is to group technologies according to their similarity in shape and not magnitude, the data must be scaled to a common range before the distances are computed. Below is a simple approach which normalizes the data to the range [0,1].

Let $Y_{it}$ be the number of patents of technology $i$ at time $t$. The scaled value is

$$y_{it} = \frac{Y_{it} - \min_t(Y_{it})}{\max_t(Y_{it}) - \min_t(Y_{it})}. \tag{3.3}$$

The distance between two technologies is computed as follows. Let $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$ be the vector of scaled patent counts for technology $i$ over the time interval $1, 2, \ldots, T$. Then

$$\mathrm{distance}(y_i, y_j) = \left( \sum_{t=1}^{T} (y_{it} - y_{jt})^2 \right)^{1/2}. \tag{3.4}$$
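The scaling in Equation 3.3 and the distance in Equation 3.4 can be sketched as follows. The function names, the toy series and the handling of a constant series (mapped to all zeros) are assumptions made for illustration; the thesis itself computes the dissimilarity matrix with the R package TSclust.

```python
import math

def min_max_scale(counts):
    """Equation 3.3: rescale one technology's counts to the range [0, 1]."""
    lo, hi = min(counts), max(counts)
    if hi == lo:
        # Assumption: a constant series carries no shape information.
        return [0.0 for _ in counts]
    return [(y - lo) / (hi - lo) for y in counts]

def euclidean_distance(a, b):
    """Equation 3.4: Euclidean distance across all time points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(series):
    """Pairwise distances between the scaled series."""
    scaled = [min_max_scale(s) for s in series]
    n = len(scaled)
    return [[euclidean_distance(scaled[i], scaled[j]) for j in range(n)]
            for i in range(n)]

# Toy data: the first two series share a shape but differ in magnitude.
toy = [[10, 20, 40, 30], [100, 200, 400, 300], [40, 30, 20, 10]]
D = distance_matrix(toy)
```

After scaling, the first two toy series are at distance zero even though their magnitudes differ by a factor of ten, which is the behavior the scaling is meant to produce.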

3.3 Mixture of life cycle Poisson models

A model-based alternative to pre-clustering the data is to use a mixture model. As noted in Barber (2011), a mixture model is one in which a set of simpler component models is combined to produce a richer model. It is a very attractive and flexible model, and in the past 20 years there has been a dramatic increase of interest in Bayesian analysis of finite mixture models (Jasra et al., 2005). This section presents the proposed mixture model; the inference algorithm for the model parameters, which uses a Gibbs sampler, is discussed in Section 3.6.

Model:

$$y_{it} \mid \{S_i = k\} \sim \mathrm{Pois}(\lambda_{it}^{(k)}) \tag{3.5}$$

$$\lambda_{it}^{(k)} = C_i \cdot h_{v_k}(t \mid \mu_k, \sigma_k^2)$$

$$\Pr(S_i = k) = \pi_k$$

where $k = 1, \ldots, K$ indexes the components in the mixture model, $\pi_1, \ldots, \pi_K$ are the mixing coefficients, and $S_i$ indicates the component membership of observation (technology) $i$.
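As a worked step, summing the indicator $S_i$ out of the model in Equation 3.5 gives the likelihood contribution of a single technology. The expression below follows from the three model lines, assuming (as in Equation 3.1) that the counts are independent over time given the parameters; it is added here for illustration and is not numbered in the thesis.

```latex
p(y_{i1}, \ldots, y_{iT} \mid C_i, \pi, \mu, \sigma^2, v)
  = \sum_{k=1}^{K} \pi_k \prod_{t=1}^{T}
    \mathrm{Pois}\!\left( y_{it} \mid C_i \cdot h_{v_k}(t \mid \mu_k, \sigma_k^2) \right)
```

Each term of the sum is the weight of component $k$ times the likelihood of the whole series under that component's life cycle shape, which is why technologies with similar shapes but different levels $C_i$ can share a component.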

3.4 Prior distributions

The prior distributions of the model parameters are taken to be independent, and each parameter's prior distribution is modeled as follows:

• The indicators ($S$) are modeled as multinomial with parameter $\pi$, which has the conjugate prior distribution $\pi \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_K)$.

The probability density function of a Dirichlet distribution is given by:

$$P(\theta) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}, \quad \text{where } B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{K} \alpha_i\right)},$$

$\alpha = (\alpha_1, \ldots, \alpha_K)$ and $\theta = (\theta_1, \ldots, \theta_K)' \sim \mathrm{Dirichlet}(\alpha)$.

• The prior distribution of the location parameter, $P(\mu_k)$, is modeled differently for different technological groups, based on expert opinion:

$$\mu_k \sim N(\mu_{0k}, \tau_{0k}^2),$$

where $\mu_{0k}$ and $\tau_{0k}^2$ are determined from the expert's opinion regarding the 95% probability interval for $\mu_k$.

• The prior distributions of $\sigma_k^2$ and $v_k$ are assumed to be uniform. The degrees of freedom are assumed to be uniform over the range [1,20], because a t distribution with more than 20 degrees of freedom is essentially indistinguishable from one with 20 and is already close to the normal distribution, which corresponds to infinite degrees of freedom.

3.5 Maximum Likelihood Estimation

Maximum likelihood estimation identifies the population parameter values most likely to have produced a particular sample of data (Scholz, 2006). In this thesis, the method of maximum likelihood estimation is applied for estimating parameters of the model for individual technologies in order to get a quick view into the fitting capability of the models. ML estimates from pre-clustered data will also be discussed to verify dissimilarity between the clusters.

The likelihood function for the model given in Equation 3.1 is as follows:

$$L(\theta_i; Y_{i1}, Y_{i2}, \ldots, Y_{iT}) = \prod_{t=1}^{T} P(Y_{it}; \lambda_{it}(\theta_i)), \tag{3.6}$$

where $\theta_i = (C_i, v_i, \mu_i, \sigma_i^2)$.

For computational simplicity we compute the ML estimate from the log-likelihood function,

$$l(\theta_i; Y_{i1}, Y_{i2}, \ldots, Y_{iT}) = \sum_{t=1}^{T} \log P(Y_{it}; \lambda_{it}(\theta_i)), \tag{3.7}$$

which for the Poisson model in Equation 3.1 is:

$$l(\theta_i; Y_{i1}, Y_{i2}, \ldots, Y_{iT}) = \sum_{t=1}^{T} Y_{it} \log(\lambda_{it}(\theta_i)) - \sum_{t=1}^{T} \log(Y_{it}!) - \sum_{t=1}^{T} \lambda_{it}(\theta_i), \tag{3.8}$$

where $\lambda_{it}(\theta_i) = C_i \cdot h_{v_i}(t \mid \mu_i, \sigma_i^2)$. Setting the partial derivatives of the log-likelihood with respect to the parameters to zero gives the parameter values that maximize the function. However, the resulting system of equations is complex to solve analytically, so the ML estimates are obtained using numerical optimization.
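The log-likelihood in Equation 3.8 is easy to evaluate directly, and one of the score equations does have a simple solution: for fixed $(v_i, \mu_i, \sigma_i^2)$, setting $\partial l / \partial C_i = \sum_t Y_{it} / C_i - \sum_t h_{v_i}(t \mid \mu_i, \sigma_i^2) = 0$ gives $\hat{C}_i = \sum_t Y_{it} / \sum_t h_{v_i}(t \mid \mu_i, \sigma_i^2)$. The sketch below illustrates both. It is not the optimization used in the thesis, which relies on numerical optimization in R; the function names and toy counts are assumptions.

```python
import math

def t_density(t, mu, sigma2, v):
    """Location-scale Student t density h_v(t | mu, sigma2)."""
    sigma = math.sqrt(sigma2)
    z = (t - mu) / sigma
    log_norm = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
                - 0.5 * math.log(v * math.pi) - math.log(sigma))
    return math.exp(log_norm - (v + 1) / 2 * math.log1p(z * z / v))

def poisson_loglik(counts, C, mu, sigma2, v):
    """Equation 3.8 for one technology; counts[t-1] is Y_it for t = 1..T."""
    total = 0.0
    for t, y in enumerate(counts, start=1):
        lam = C * t_density(t, mu, sigma2, v)
        total += y * math.log(lam) - math.lgamma(y + 1) - lam  # lgamma(y+1) = log(y!)
    return total

def profile_C(counts, mu, sigma2, v):
    """Maximizer of Eq. 3.8 in C for fixed (mu, sigma2, v): sum(Y) / sum(h)."""
    h_sum = sum(t_density(t, mu, sigma2, v) for t in range(1, len(counts) + 1))
    return sum(counts) / h_sum

# Toy counts and shape parameters, chosen only for illustration.
counts = [3, 8, 20, 41, 60, 43, 19, 9, 2]
C_hat = profile_C(counts, mu=5.0, sigma2=2.0, v=4.0)
best = poisson_loglik(counts, C_hat, 5.0, 2.0, 4.0)
```

With $C_i$ profiled out in this way, a numerical optimizer would only need to search over the three shape parameters, which is one possible way to organize the computation.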

Maximum Likelihood of a Cluster

Once the cluster analysis has been performed, the parameters of a cluster (see Equation 3.10) that maximize the probability of the observed data can be obtained from the following likelihood function:

$$L(\theta_c) = \prod_{j=1}^{n_c} \prod_{t=1}^{T} p(Y_{jt}^{(c)}; \lambda_{jt}^{(c)}) \tag{3.9}$$

$$\lambda_{jt}^{(c)} = C_j \cdot h_{v_c}(t \mid \mu_c, \sigma_c^2) \tag{3.10}$$

where $\theta_c = (\mu_c, v_c, \sigma_c^2, C_1, C_2, \ldots, C_{n_c})'$ are the parameters of cluster $c$ and $n_c$ is the number of technologies within cluster $c$.

The equivalent log-likelihood of Equation 3.9 is written as follows:

$$l(\theta_c) = \sum_{j=1}^{n_c} \sum_{t=1}^{T} Y_{jt}^{(c)} \log(\lambda_{jt}^{(c)}) - \sum_{j=1}^{n_c} \sum_{t=1}^{T} \log(Y_{jt}^{(c)}!) - \sum_{j=1}^{n_c} \sum_{t=1}^{T} \lambda_{jt}^{(c)} \tag{3.11}$$

3.6 Markov Chain Monte Carlo

Bayesian inference is another way of estimating model parameters given the data. This subsection briefly discusses how the posterior distribution of the cluster parameters is obtained using appropriate Markov Chain Monte Carlo (MCMC) methods.

Model:

$$y_{jt}^{(c)} \overset{ind}{\sim} \mathrm{Pois}(\lambda_{jt}^{(c)}), \qquad \lambda_{jt}^{(c)} = C_j \cdot h_{v_c}(t \mid \mu_c, \sigma_c^2)$$

The posterior distribution is written as:

$$\Pr(\theta_c \mid y_{1:T}^{(c)}) \propto p(y_{1:T}^{(c)} \mid \theta_c) \cdot \Pr(\theta_c), \tag{3.12}$$

where $\theta_c = (\mu_c, v_c, \sigma_c^2, C_1, C_2, \ldots, C_{n_c})'$ and $y_{1:T}^{(c)}$ are the patent counts over the time period 1 to $T$ of all the technologies in cluster $c$.

Computationally, it would be very complex to sample all the parameters directly from the above joint posterior distribution. Successive random sampling with MCMC is a solution for such complex models. Gibbs sampling (Gelman et al., 2013; Casella and George, 2016) generates draws from the joint posterior distribution by iteratively sampling from the full conditional posteriors, which at iteration $q$ are:

$$\Pr(\theta_j \mid \theta_{-j}^{q-1}, y), \tag{3.13}$$

where $\theta$ is decomposed into $d$ blocks, $\theta = (\theta_1, \ldots, \theta_d)$, and $\theta_{-j}^{q-1} = (\theta_1^{q}, \ldots, \theta_{j-1}^{q}, \theta_{j+1}^{q-1}, \ldots, \theta_d^{q-1})$.

Simulation of the parameters from the posterior distribution given in Eq. 3.12 using Gibbs sampling is performed by the following procedure.

1. Simulate $C_1, C_2, \ldots, C_{n_c}$ given $v_c$, $\mu_c$, $\sigma_c^2$ and the data $y_{jt}^{(c)}$ from the posterior distribution computed from its conjugate prior:

$$\Pr(C_1, \ldots, C_{n_c} \mid y_{jt}^{(c)}, v_c, \mu_c, \sigma_c^2) \propto \Pr(y_{jt}^{(c)} \mid v_c, \mu_c, \sigma_c^2, C_1, \ldots, C_{n_c}) \cdot \Pr(C_1, \ldots, C_{n_c})$$

$$C_j \mid y_{jt}^{(c)}, \mu_c, v_c, \sigma_c^2 \sim \mathrm{Gamma}\Big(A_j + \sum_{t=1}^{T} y_{jt}^{(c)},\; B + \sum_{t=1}^{T} a_t\Big), \tag{3.14}$$

where $a_t = h_{v_c}(t \mid \mu_c, \sigma_c^2)$ and the prior is $C_j \sim \mathrm{Gamma}(A_j, B)$ for $j = 1, 2, \ldots, n_c$.

2. Simulate the degrees of freedom $v_c$ given all other parameters and the data from its posterior distribution, using the inverse CDF method over a fine grid of values on the interval [1,20].

The posterior distribution of $v_c$ is derived with the following steps:

• Multiply the likelihood and the prior over a fine grid of $v_c$ values (unnormalized posterior).

• Calculate the normalizing constant and divide the unnormalized posterior by it, so that the posterior is a proper probability distribution.

3. Update $\mu_c$ given all other parameters using slice sampling.

$$\Pr(\mu_c \mid y_{jt}^{(c)}, C_1, \ldots, C_{n_c}, v_c, \sigma_c^2) \propto \Pr(y_{jt}^{(c)} \mid v_c, \mu_c, \sigma_c^2, C_1, \ldots, C_{n_c}) \cdot \Pr(\mu_c)$$

4. Update $\sigma_c^2$ given all other parameters using slice sampling.

$$\Pr(\sigma_c^2 \mid y_{jt}^{(c)}, C_1, \ldots, C_{n_c}, v_c, \mu_c) \propto \Pr(y_{jt}^{(c)} \mid v_c, \mu_c, \sigma_c^2, C_1, \ldots, C_{n_c}) \cdot \Pr(\sigma_c^2)$$

Slice Sampling

Gibbs sampling is used in problems where it is known how to sample from the conditional distributions. In the simulation steps above, $\mu$ and $\sigma^2$ do not have conjugate priors, so it is difficult to sample directly from their respective conditional distributions. $\mu_c$ and $\sigma_c^2$ are therefore updated using slice sampling in each iteration of the Gibbs sampler.

Slice sampling originates with the observation that to sample from a univariate distribution, we can sample points uniformly from the region under the curve of its density function and then look only at the horizontal coordinates of the sample points (Neal, 2003). The iterative procedure of the slice sampler discussed by Jian (2007) is briefly stated below.

Recall that sampling from a density $f(x)$ is equivalent to sampling uniformly (with respect to area) from the region under the graph of $f(x)$:

$$A = \{(x, u) : 0 \le u \le f(x)\}.$$

Algorithm 3.1 Slice sampler: at iteration $t$ with $(x^{(t)}, u^{(t)})$,

1. sample $u^{(t+1)} \sim \mathrm{Unif}[0, f(x^{(t)})]$;

2. sample $x^{(t+1)} \sim \mathrm{Unif}(A^{(t+1)})$, where $A^{(t+1)} = \{x : f(x) \ge u^{(t+1)}\}$.
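Step 2 of Algorithm 3.1 requires the set $A^{(t+1)}$, which is rarely available in closed form. One common way to carry it out is the stepping-out and shrinkage procedure described by Neal (2003); the sketch below uses that procedure on the log scale. Whether the thesis used this exact variant is not stated, so the function name, the `width` and `max_steps` arguments and the standard normal test target are all assumptions for illustration.

```python
import math
import random

def slice_sample(log_f, x0, width=1.0, max_steps=100, rng=random):
    """One slice-sampling update for a univariate density known up to a constant."""
    # Step 1: draw the height u ~ Unif(0, f(x0)), represented as log(u).
    log_u = log_f(x0) + math.log(1.0 - rng.random())
    # Step 2a: place an interval of size `width` at random around x0, then
    # step out, splitting at most max_steps - 1 expansions between the sides.
    left = x0 - width * rng.random()
    right = left + width
    j = int(max_steps * rng.random())
    k = max_steps - 1 - j
    while j > 0 and log_f(left) > log_u:
        left -= width
        j -= 1
    while k > 0 and log_f(right) > log_u:
        right += width
        k -= 1
    # Step 2b: sample uniformly in the interval, shrinking it toward x0
    # whenever the proposal falls outside the slice.
    while True:
        x1 = left + (right - left) * rng.random()
        if log_f(x1) >= log_u:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

def log_std_normal(z):
    """Unnormalized log density of a standard normal, used as a test target."""
    return -0.5 * z * z

rng = random.Random(1)
x, draws = 0.0, []
for _ in range(5000):
    x = slice_sample(log_std_normal, x, width=2.0, rng=rng)
    draws.append(x)
```

In the Gibbs sampler, `log_f` would be the log of the unnormalized conditional posterior of $\mu_c$ or $\sigma_c^2$, with all other parameters held at their current values. For $\sigma_c^2$ the function should return `float("-inf")` outside the support so that such proposals are rejected.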

Gibbs Sampler for the mixture model

The Gibbs sampler outlined above is straightforwardly extended to the mixture model by adding an updating step for the membership indicators $S_1, \ldots, S_N$ and an updating step for the mixture weights $\pi_k$.

The Gibbs sampler for the mixture model iterates two steps:

1. Simulate the indicators given all other model parameters.

2. Simulate all parameters given the indicators.

The second step consists of several sub-steps, since each parameter is simulated conditional on all the others. In our case there are five parameters: $C_i$, $\mu_k$, $\sigma_k^2$, $v_k$ and the mixing coefficient $\pi_k$. The first four can be updated using the procedure described above. The mixing coefficients are updated in each iteration from their conditional posterior distribution, which is Dirichlet because the Dirichlet prior is conjugate.

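In summary, one Gibbs iteration for the mixture model is given by the following algorithm:

Algorithm 3.2 Gibbs sampler for the mixture model.

1. $(\pi_1, \ldots, \pi_K) \mid C_i, \mu_k, \sigma_k^2, v_k, S \sim \mathrm{Dirichlet}(\alpha_1 + n_1, \ldots, \alpha_K + n_K)$, where $n_k$ is the number of technologies currently assigned to component $k$.

2. $C_i \mid \pi, \mu_k, \sigma_k^2, v_k, S \sim \mathrm{Gamma}\big(A_i + \sum_{t=1}^{T} y_{it}^{(k)},\; B + \sum_{t=1}^{T} a_t\big)$, $i = 1, 2, \ldots, N$, where $y_{it}^{(k)}$ is the count of technology $i$, which is assigned to component $k$, and $a_t = h_{v_k}(t \mid \mu_k, \sigma_k^2)$.

3. $v_k \mid \pi, \mu_k, \sigma_k^2, C_i, S$: simulate from the posterior over a grid of values on [1,20].

4. $\mu_k \mid \pi, v_k, \sigma_k^2, C_i, S$: simulate using slice sampling.

5. $\sigma_k^2 \mid \pi, v_k, \mu_k, C_i, S$: simulate using slice sampling.

6. $S_i \mid \pi, v_k, \mu_k, \sigma_k^2, C_i \sim \mathrm{Multinomial}(\theta_i^{(1)}, \ldots, \theta_i^{(K)})$, $i = 1, 2, \ldots, N$, where

$$\theta_i^{(k)} = \frac{\pi_k \, \phi(y_i; \lambda_i^{(k)})}{\sum_{r=1}^{K} \pi_r \, \phi(y_i; \lambda_i^{(r)})} \tag{3.15}$$

where $i = 1, 2, \ldots, N$, $k = 1, 2, \ldots, K$ and $\phi(y_i; \lambda_i^{(r)})$ denotes the probability mass function of the Poisson distribution.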
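The conjugate and grid-based steps of Algorithm 3.2 (steps 1, 2, 3 and 6) can be sketched with the standard library as below. This is an illustrative sketch rather than the thesis implementation: the function names and argument layout are assumptions, the Gamma prior is taken to use the shape and rate parameterization implied by Equation 3.14, and the Dirichlet draw uses the standard construction from normalized Gamma variables.

```python
import math
import random

def sample_weights(alpha, n, rng=random):
    """Step 1: (pi_1..pi_K) ~ Dirichlet(alpha_k + n_k) via normalized Gamma draws."""
    g = [rng.gammavariate(a + c, 1.0) for a, c in zip(alpha, n)]
    total = sum(g)
    return [x / total for x in g]

def sample_C(counts, h_values, A, B, rng=random):
    """Step 2: C_i ~ Gamma(A + sum_t y_it, B + sum_t a_t), with B a rate."""
    shape = A + sum(counts)
    rate = B + sum(h_values)
    return rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes (shape, scale)

def sample_from_grid(grid, log_post, rng=random):
    """Step 3: inverse-CDF draw from an unnormalized log posterior on a grid."""
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]  # subtract max for stability
    target = rng.random() * sum(weights)
    cumulative = 0.0
    for value, w in zip(grid, weights):
        cumulative += w
        if target < cumulative:
            return value
    return grid[-1]

def membership_probs(log_pi, log_lik):
    """Step 6, Eq. 3.15: theta_i^(k) from log pi_k and log phi(y_i; lambda_i^(k))."""
    scores = [lp + ll for lp, ll in zip(log_pi, log_lik)]
    m = max(scores)
    unnorm = [math.exp(s - m) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

Working with log-likelihoods in `membership_probs` matters in practice: the product of 47 Poisson probabilities with counts in the thousands underflows in double precision, while the differences between the log values remain well behaved.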

3.7 Evaluation

This section describes the method used to evaluate the prediction performance of the given model. The main application of this model is to predict life cycle of an emerging technology.

Given the number of patent applications of a technology in its emerging stage, the predictive distribution of the model is computed as:

$$p(y_{i,t+1}, \ldots, y_{i,t+M} \mid y_{i,1}, \ldots, y_{i,t}, Y) = \sum_{k=1}^{K} \int p(S_i = k \mid y_{i,1}, \ldots, y_{i,t}, \theta) \, p(y_{i,t+1}, \ldots, y_{i,t+M} \mid y_{i,1}, \ldots, y_{i,t}, \theta, S_i = k) \, p(\theta \mid y_{i,1}, \ldots, y_{i,t}, Y) \, d\theta$$

where $y_{i,1}, \ldots, y_{i,t}$ are the numbers of patent applications of an emerging technology $i$ from year 1 up to year $t$, $Y$ is the training data, and $y_{i,t+1}, \ldots, y_{i,t+M}$ are the predicted numbers of patent applications of technology $i$ from year $t+1$ up to year $t+M$.

Because the emerging technology contributes only a small amount of data, its effect on the posterior distribution of the model parameters is very small. We therefore make the following simplifying assumption:

$$p(\theta \mid y_{i,1}, \ldots, y_{i,t}, Y) \approx p(\theta \mid Y).$$

With this, simulation of data from the predictive distribution is performed using the following algorithm.

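Algorithm 3.3 Simulation from the posterior predictive distribution.

Repeat N times:

• Draw $\theta$ from the posterior $p(\theta \mid Y)$.

• Compute $h_k$ for all $k$.

• Draw $C_i \mid \{y_{i,1}, \ldots, y_{i,t}\}, \{S_i = k\}, \theta \sim \mathrm{Gamma}\big(A_i + \sum_{j=1}^{t} y_{ij},\; B + \sum_{j=1}^{t} h_{v_k}(j \mid \mu_k, \sigma_k^2)\big)$.

• Draw $S_i = k$ from $p(S_i = k \mid y_{i,1}, \ldots, y_{i,t}, \theta) \propto p(y_{i,1}, \ldots, y_{i,t} \mid \{S_i = k\}, \theta) \, p(\{S_i = k\} \mid \theta)$.

• Draw $y_{i,t+m} \mid y_{i,1}, \ldots, y_{i,t}, \theta, S_i = k \overset{ind}{\sim} \mathrm{Pois}(y_{i,t+m} \mid \lambda_{i,t+m}^{(k)})$.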

Measure of model performance

We use the symmetric mean absolute percentage error (SMAPE) to measure the accuracy of the model's predictions of the expected behavior. SMAPE is an accuracy measure based on percentage (or relative) errors (wikipedia). It is defined by:

$$\mathrm{SMAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \frac{|\hat{y}_t - y_t|}{|\hat{y}_t| + |y_t|},$$

where $\hat{y}_t$ is the predicted value and $y_t$ the observed value at time $t$.
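The measure can be sketched in a few lines, using the form with the sum of absolute values in the denominator, which keeps each term between 0 and 1. The function name and the convention of scoring a term as zero when both the prediction and the observation are zero are assumptions made for this illustration.

```python
def smape(predicted, observed):
    """Symmetric mean absolute percentage error, in percent."""
    terms = []
    for y_hat, y in zip(predicted, observed):
        denom = abs(y_hat) + abs(y)
        # Assumption: a term with zero prediction and zero observation counts as no error.
        terms.append(0.0 if denom == 0 else abs(y_hat - y) / denom)
    return 100.0 * sum(terms) / len(terms)

# A prediction of 110 against an observation of 90 gives 100 * 20 / 200.
example = smape([110.0], [90.0])
```

Because each term is bounded by 1, the measure lies between 0% and 100%, and a single year with a very small observed count cannot dominate the average.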


4 Results

This chapter presents the most important results of the study. Section 4.1 reports parameter estimates for individual technologies using maximum likelihood and Bayesian methods, and also describes how the prior distributions are computed. Section 4.2 reports results from the cluster analysis and the estimates of the cluster parameters using maximum likelihood and Bayesian analysis. Section 4.3 presents the results of the mixture model for different numbers of components. Finally, in Section 4.4, we evaluate our model on the test data.

4.1 Parameter estimates of individual technologies

4.1.1 Maximum likelihood estimates

Maximum likelihood estimates of the parameters for individual technologies are obtained by numerical optimization (see Section 3.5) using the R function optim. Figure 4.1 (left) shows a scatter plot of the maximum likelihood estimates of mean and variance for all individual technologies. The ML estimates are scattered over a wide range of means and variances.

Since the study covers many technologies, we randomly select four of them to visualize the fit of the model to the data. The plots in Figure 4.2 show how the model fits the data using both the ML estimates and the Bayesian estimates. The two plots at the top of the figure show that the model fits the data better using the Bayesian estimates than the ML estimates.

Figure 4.1: ML estimates (left) and Bayesian estimates (right) of mean and variance of all individual technologies.

Figure 4.2: Fitted values of different individual technologies using MLE and Bayesian estimates. Panels: Constructional details, Electrodes, Fuel cells, Secondary cells.

4.1.2 Prior Distributions

As mentioned in the methods chapter, the prior distribution of the peak value (defined in Section 3.1) is modeled based on an expert's opinion. The expert's 95% belief intervals for the peak values of the technologies included in this study are divided into three categories:

• Batteries and battery-related technologies have 30-35 years of life.
• Technologies related to chemical content have 10-20 years of life.
• Technologies related to capacitors have 35-40 years of life.

The mean $\mu_{0k}$ and variance $\tau_{0k}^2$ of the prior distribution are obtained by solving the following equations for $\mu_{0k}$ and $\tau_{0k}$:

$$\mu_{0k} + 1.96 \cdot \tau_{0k} = \mathrm{max}_k$$

$$\mu_{0k} - 1.96 \cdot \tau_{0k} = \mathrm{min}_k,$$

where $\mathrm{max}_k$ is the latest year, and $\mathrm{min}_k$ the earliest year, in which the maximum number of patent documents is expected to be published.
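Solving the two equations gives $\mu_{0k} = (\mathrm{max}_k + \mathrm{min}_k)/2$ and $\tau_{0k} = (\mathrm{max}_k - \mathrm{min}_k)/(2 \cdot 1.96)$. The short sketch below applies this to the three expert intervals listed above; the function name and the dictionary keys are illustrative assumptions.

```python
def normal_prior_from_interval(lower, upper, z=1.96):
    """Mean and standard deviation of a normal prior whose central 95%
    interval is (lower, upper)."""
    mu0 = (upper + lower) / 2.0
    tau0 = (upper - lower) / (2.0 * z)
    return mu0, tau0

# Expert intervals for the peak value, in years, as listed above.
priors = {
    "batteries": normal_prior_from_interval(30, 35),
    "chemical content": normal_prior_from_interval(10, 20),
    "capacitors": normal_prior_from_interval(35, 40),
}
```

For the battery category this gives a prior mean of 32.5 years and a prior standard deviation of about 1.28 years, so the prior variance $\tau_{0k}^2$ is about 1.63.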

The prior distribution of the variance and degrees of freedom are non-informative. We assume a uniform prior distribution on both the parameters in the analysis.

4.1.3 Bayesian estimates

As explained in the methods chapter, we implement a Gibbs sampling algorithm to simulate from the posterior distribution of the model parameters. The posterior draws of all parameters of the individual technologies are obtained from 2000 iterations of the algorithm. The prior distribution of the parameters of each technology is modeled according to the category it belongs to (see Section 4.1.2). Figure 4.1 (right) shows a scatter plot of the posterior estimates of mean and variance. Compared to the ML estimates in Figure 4.1 (left), the Bayesian estimates are clearly shrunk toward their prior means.


4.2 Estimation of Cluster parameters

4.2.1 Clustering analysis

As discussed in Section 3.2, the aim of the clustering analysis in this study is to verify that groups of technologies follow similar trends. The clustering analysis is performed in R using the package TSclust (Montero and Vilar, 2014), which computes the dissimilarity matrix, and the function hclust, which implements hierarchical clustering. Using the elbow method described in Section 3.2, we found six clusters. Table 4.1 shows the number of technologies assigned to each cluster.

Table 4.1: Number of technologies in each cluster.

Cluster      Number of technologies
Cluster 1    45
Cluster 2    53
Cluster 3    14
Cluster 4    1
Cluster 5    6
Cluster 6    4
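To make the clustering step concrete, the sketch below runs a small agglomerative (hierarchical) clustering on toy bell-shaped series. It is not the TSclust/hclust pipeline used in the analysis: the Euclidean dissimilarity, the average-linkage rule, the helper names and the toy data are all assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(series, n_clusters):
    """Repeatedly merge the two clusters with the smallest average
    pairwise distance until n_clusters remain. Returns index lists."""
    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(
                    euclidean(series[a], series[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ) / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def bell(peak, width, n=20):
    """Toy life-cycle-shaped series (hypothetical data)."""
    return [math.exp(-0.5 * ((t - peak) / width) ** 2) for t in range(n)]


toy = [bell(5, 2), bell(6, 2), bell(14, 3), bell(15, 3)]
groups = average_linkage(toy, n_clusters=2)
print(sorted(sorted(g) for g in groups))  # → [[0, 1], [2, 3]]
```

The early-peaking and late-peaking toy series end up in separate groups, which is the kind of shape-based grouping the clustering analysis is meant to verify.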

4.2.2 Estimation of cluster parameters

Parameter estimates of all six clusters using maximum likelihood and Bayesian analysis are presented in Table 4.2. The posterior draws of each parameter are obtained from 5000 iterations. Figures 4.3 to 4.5 show the posterior distributions of µ, σ² and ν for each of the six clusters.

Table 4.2: Parameter estimates of each cluster using the MLE and Bayesian methods, where the MCMC columns show the posterior mean estimates.

             µ                σ²               df
             MLE     MCMC     MLE     MCMC     MLE     MCMC
Cluster 1    41.50   41.75    7.49    8.92     1.48    2.40
Cluster 2    39.66   48.90    11.01   17.45    2.05    3.70
Cluster 3    45.24   45.29    6.93    20.45    1.00    5.10
Cluster 4    26.79   25.90    10.24   11.03    20.00   10.30
Cluster 5    30.85   30.27    14.57   18.40    19.29   10.48
Cluster 6    31.61   40.25    16.23   34.51    20.00   12.60


Figure 4.3: Posterior distribution of the six clusters parameters.

Figure 4.4: Posterior distribution of the 6 clusters variances.

The MCMC convergence of the posterior draws of µ, σ² and ν for the first two clusters is presented in Figure 4.6. We plot the cumulative means of the posterior draws to verify convergence. The cumulative posterior estimates for the parameters in the other four clusters follow a similar pattern.
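The cumulative-mean diagnostic is simple to compute: after each draw, average all draws so far and check that the running average flattens out. The sketch below shows the computation on synthetic draws; the Gaussian draws are a stand-in for real MCMC output and are not taken from the thesis.

```python
import random
from itertools import accumulate


def cumulative_means(draws):
    """Running mean of the draws after 1, 2, ..., n iterations."""
    return [total / (i + 1) for i, total in enumerate(accumulate(draws))]


# Synthetic stand-in for 5000 posterior draws of one parameter.
random.seed(0)
draws = [random.gauss(41.75, 1.0) for _ in range(5000)]

running = cumulative_means(draws)
print(f"after 100 draws: {running[99]:.3f}, after 5000 draws: {running[-1]:.3f}")
```

A running mean that is still drifting at the last iteration would suggest that more iterations, or a longer burn-in, are needed.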

Figure 4.6: Convergence of parameters of the first two clusters. Panels: posterior mean, posterior variance and posterior degrees of freedom for cluster 1 and cluster 2.

The plots in Figure 4.7 show the fitted models of the clusters using maximum likelihood. The plots show that the shapes differ among the clusters. For example, cluster 1 has a narrow shape around its mean, while cluster 6 has a wider shape around its mean. Model fits of each of the clusters using the Bayesian estimates are presented in Figure 4.8. As with the ML estimates, the shapes differ among the clusters when the Bayesian estimates are applied.

Figure 4.7: Fitted values using ML estimates of each of the six clusters (panels: Cluster 1 to Cluster 6).

4.3 Mixture model

This section reports results of the proposed mixture model. Posterior inference about the model parameters is obtained from 5000 iterations of the MCMC algorithm described in Section 3. Tables 4.3 to 4.6 report posterior inference for µ, σ², ν and π when three, four, five and six components are assumed, respectively.

Figure 4.8: Fitted values using Bayesian estimates of each of the six clusters (panels: Cluster 1 to Cluster 6).

Table 4.3: Parameter estimates of a mixture model with three components.

               µ       σ²      df      π
Component 1    38.03   29.95   1.00    0.44
Component 2    45.19   9.39    1.28    0.36
Component 3    40.23   10.55   1.80    0.19

Table 4.4: Parameter estimates of a mixture model with four components.

               µ       σ²      df      π
Component 1    43.25   11.14   1.00    0.27
Component 2    44.01   6.02    1.60    0.18
Component 3    35.07   30.14   2.80    0.13
Component 4    42.66   14.30   19.90   0.14

Table 4.5: Parameter estimates of a mixture model with five components.

               µ       σ²      df      π
Component 1    39.61   7.51    2.80    0.10
Component 2    46.09   10.65   1.00    0.22
Component 3    36.85   29.76   1.50    0.33
Component 4    40.64   11.04   20.00   0.20
Component 5    43.67   5.32    11.05   0.15

Table 4.6: Parameter estimates of a mixture model with six components.

               µ       σ²      df      π
Component 1    30.84   11.08   20.00   0.12
Component 2    44.66   6.61    1.92    0.14
Component 3    40.57   11.54   19.66   0.22
Component 4    46.81   33.61   1.00    0.23
Component 5    35.52   8.12    1.40    0.18
Component 6    48.53   20.23   1.60    0.09

To show the shapes of the clusters estimated by the mixture model with six components, we plot the model fit for each component in Figure 4.9. The plots show that the six components have differently shaped trends. Notice that the model fits of the six components resemble the Bayesian model fits of the six clusters. A detailed comparison between the components and the clusters is given in the discussion section.

Figure 4.9: Fitted values of the mixture model with six components.

4.4 Model evaluation

All algorithms in the analysis are run on 80% of the technologies, leaving the remaining 20% for evaluation. As discussed in the methods section, the model is checked on how well it predicts the trend of patent documents of a technology based on some observed data. We evaluate the model on predicting the number of patent publications at time steps t + 1, ..., t + m given the patent publications observed up to time step t. Figure 4.10 shows predictions of the number of patent publications of the Electrically-conductive technology based on different base years. The plots and the SMAPE values show that the predictions of the expected behavior become more accurate when more data is used to construct the prediction.
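For reference, SMAPE can be computed as in the sketch below. Several variants of SMAPE are in use; this one divides each absolute error by the average of the absolute actual and forecast values, which is an assumption about the variant rather than a statement of the exact formula used in the evaluation. The example numbers are made up.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Uses |F - A| / ((|A| + |F|) / 2), averaged over all time steps.
    Time steps where both values are zero contribute zero error.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        if denom > 0:
            total += abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical observed and predicted patent counts.
observed = [100, 200]
predicted = [110, 180]
print(f"SMAPE = {smape(observed, predicted):.3f}%")
```

Lower values indicate forecasts that are closer to the observed counts, and a perfect forecast gives a SMAPE of zero.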

We also evaluate how well the model predicts the peak of a technology’s life cycle, i.e. the mean of the Student-t distribution in Eq. 3.2. It is important for most innovative companies to know how long an emerging technology will stay in the market. Table 4.7 shows the 95% interval for the predicted year in which the maximum number of patents of an emerging technology would be published. Descriptions of the CPC codes of all technologies can be found in Espacenet (2018). Since our interest here is in emerging technologies, all predictions are based on 10 years of data. To help interpret the accuracy of the predictions at a glance, Figure 4.11 shows a scatter plot of the mean predictions against the observed values.

The results in Table 4.7 and Figure 4.11 show that most of the model’s predictions of the peak of the life cycle are close to the red line, which indicates that the model predicts well. However, for some technologies the predictions are far from the true values, e.g. the technology Two-part coupling devices, which is represented by its CPC code H01R24 in Table 4.7. To understand the reason, we make predictions for Two-part coupling devices based on 10 years of data and then on 30 years of data (the year where the number of patent applications stays constant for a few years). Figure 4.12 shows that the prediction performance improves as more data is used.


Figure 4.10: Trend prediction of patent documents of a technology (Electrically-conductive). Predictions are made from four different base years. Top left: 10 years of data (blue line) and mean predicted values for the remaining 37 years (green line) with 95% interval (green dotted) and 95% prediction interval (red dotted); SMAPE: 1.2%. Top right: 15 years of data; SMAPE: 0.9%. Bottom left: 20 years of data; SMAPE: 0.3%. Bottom right: 30 years of data; SMAPE: 0.06%. Blue points are observed data in all plots.


Table 4.7: Prediction of peak values of technologies.

CPC code     Mean prediction (95% interval)    Data
H01M8        [35,38]                           37
H01G5        [39,45]                           43
H01G11       [43,46]                           43
H01G13       [42,47]                           44
H01L41       [38,43]                           43
H01L2221     [37,44]                           45
H01B11       [39,44]                           43
H01R3        [36,44]                           44
H01R23       [43,46]                           43
H01R24       [37,39]                           46
H01R35       [44,47]                           45
H01R43       [41,43]                           44
H01R2101     [41,46]                           44
H01R2201     [39,46]                           43
H01R2205     [39,42]                           44
H01H3        [39,45]                           44
H01H13       [41,44]                           46
H01H23       [44,47]                           45
H01H33       [39,43]                           44
H01H69       [37,45]                           43
H01H73       [40,46]                           45
H01H85       [39,46]                           45
H01H30       [42,45]                           46


Figure 4.11: Scatter plot of the means of the peak value predictions for the technologies against the observed values, with a 45 degree line (red line).

Figure 4.12: Trend prediction of patent documents of a technology (Two-part coupling devices). Predictions are made from two different base years. The left plot uses 10 years of data and the right plot uses 30 years of data.


5 Discussion

This thesis presents a new mixture of life cycle Poisson model for predicting the life cycle of technologies from patent count data. We use a Bayesian approach to inference and develop a Gibbs sampling algorithm to simulate the model parameters. Historical patent documents of technologies from the electricity sector are analyzed using the model, and we evaluate the model’s predictive performance. An expert’s opinion on the life cycle of the technologies is also considered in this study. Before building the mixture model, we verify that groups of technologies follow similar trends using time series clustering analysis.

Initially, we model the number of patent documents of a single technology with a Poisson model. The average number of patent documents in a given year is modeled by the probability density function of a Student-t distribution multiplied by a constant term that captures the mean level of the number of patents. The fit of the model to patent documents of individual technologies is illustrated in Figure 4.2 using maximum likelihood and Bayesian estimates. The model based on the Bayesian estimates fits well for all four technologies, whereas some of the fits based on the MLE are relatively poor. The scatter plots in Figure 4.1 give a general picture of all parameter estimates under MLE and Bayesian analysis. The Bayesian estimates of the means are concentrated in a small interval, while the maximum likelihood estimates of the means are scattered over a wider range. The expert’s opinion, entered through the prior, shrinks the parameters, and the fit of the model thereby improves when the Bayesian analysis is applied.
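The life cycle Poisson mean described above can be sketched in a few lines: a location-scale Student-t density evaluated at each year and multiplied by a constant. In the sketch below, σ² is treated as the squared scale of the t distribution, the constant c = 1000 is a hypothetical value, and the function names are illustrative; µ, σ² and the degrees of freedom are the MCMC estimates of Cluster 1 from Table 4.2.

```python
import math


def student_t_pdf(t, mu, sigma2, nu):
    """Density of a location-scale Student-t with location mu,
    squared scale sigma2 and nu degrees of freedom."""
    sigma = math.sqrt(sigma2)
    z = (t - mu) / sigma
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
    )
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(z * z / nu))


def expected_counts(years, c, mu, sigma2, nu):
    """Poisson mean per year: constant c times the t density."""
    return [c * student_t_pdf(t, mu, sigma2, nu) for t in years]


def poisson_loglik(counts, rates):
    """Poisson log-likelihood of observed counts given yearly rates."""
    return sum(
        y * math.log(r) - r - math.lgamma(y + 1) for y, r in zip(counts, rates)
    )


years = list(range(1, 48))
rates = expected_counts(years, c=1000.0, mu=41.75, sigma2=8.92, nu=2.40)
peak_year = years[rates.index(max(rates))]
print(f"expected count peaks in year {peak_year}")
```

Because the t density is symmetric and unimodal, the expected count peaks at the integer year closest to µ, which is why µ is interpreted as the peak of the life cycle.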

Section 4.2 presents results from the clustering analysis and the parameter estimates of each cluster. Time series clustering analysis is carried out on 123 technologies from nine different subsections of the electricity section. Table 4.1 shows the number of technologies grouped in each of the six clusters. Cluster 4 contains one technology from the subclass Semiconductor devices, and cluster 6 contains four technologies, all of which are from the subclass Electric Switches. The other four clusters contain technologies from different subclasses. This shows that technologies classified in different subclasses according to the CPC (Cooperative Patent Classification) can have similar trends.

The parameter estimates of the clusters under both MLE and Bayesian estimation differ clearly among clusters (see Table 4.2). This supports the dissimilarity of the cluster distributions. The plots in Figure 4.7 and Figure 4.8 show the probability distribution function of each cluster under the MLE and the Bayesian estimates, respectively. Notice that, under both methods, the shapes of the fitted models of the clusters differ from one another. We conclude that the trends of patent documents of technologies within a cluster have similar shapes, while those in different clusters have different shapes.

Since we do not know the optimal number of clusters in our data, the parameters of the mixture model are estimated under different numbers of components. The tables in Section 4.3 report parameter estimates of the model with three, four, five and six components. The Bayesian information criterion (BIC) is one technique for determining the optimal number of components in a mixture model. BIC measures how well a model fits the data while penalizing the number of parameters (the lower the BIC, the better the model). In this thesis, however, we fit the mixture model with six components based on the output of the clustering analysis. Table 4.6 presents the model parameters when the number of components is six. Notice from the table that the parameter estimates of the components differ from one another, which suggests that the model generates different clusters. The differences among the components are easier to see in their probability distribution function plots in Figure 4.9.
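As an illustration of how BIC could be used to choose the number of components, the sketch below applies the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of observations and L the maximized likelihood. The parameter count of 4K - 1 (three parameters per component plus K - 1 free mixing weights) is an assumption about the model, and the log-likelihood values are made-up placeholders, not results from this thesis.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def mixture_param_count(n_components):
    """Assumed count: (mu, sigma^2, df) per component plus K - 1 weights."""
    return 4 * n_components - 1


# Placeholder maximized log-likelihoods per number of components
# (hypothetical numbers for illustration only).
placeholder_loglik = {3: -1250.0, 4: -1235.0, 5: -1231.0, 6: -1229.5}
n_obs = 200  # hypothetical number of observations

scores = {
    k: bic(ll, mixture_param_count(k), n_obs)
    for k, ll in placeholder_loglik.items()
}
best = min(scores, key=scores.get)
for k in sorted(scores):
    print(f"K = {k}: BIC = {scores[k]:.2f}")
print(f"lowest BIC at K = {best}")
```

With these placeholder values, adding components beyond four improves the likelihood too little to offset the penalty, which is the trade-off BIC is designed to capture.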

Model performance in terms of data clustering is examined by comparing the parameter estimates of the clusters in Section 4.2.2 with the parameter estimates of the components of the mixture model in Section 4.3. For convenience, we compare the shapes of the plots in Figure 4.8 and Figure 4.9. The probability distribution functions of the following clusters and components are clearly similar: Cluster 1 and Component 2, Cluster 2 and Component 6, Cluster 3 and Component 5, Cluster 4 and Component 1. The two remaining components, whose probability distribution functions do not match any of the cluster plots, are Component 3 and Component 4. However, the parameter estimates in Table 4.2 and Table 4.6 suggest that some parameter estimates of Cluster 5 are similar to the corresponding estimates of Component 3, and likewise for some parameters of Cluster 6 and Component 4.

The prediction performance of the model is evaluated in two ways. The first evaluation checks how well the model predicts the trend of patent applications from different base years. This prediction helps companies to determine the current stage of a technology in its S-curve development. The plots in Figure 4.10 show the model predictions of the trend of the Electrically-conductive technology from different base years. In general, all the plots show that the model predicts the shape of the trend well. However, based on the SMAPE values and visual inspection, the precision improves as the base year increases.

The second part of the evaluation checks how well the model predicts the year in which the maximum number of patent documents would be published. The evaluation is carried out on the 23 technologies (approximately 20%) that were not included in the model estimation. Table 4.7 shows the predictions of the year in which the maximum number of patents of each of the 23 technologies would be published. Predictions were made from the early stages of the technologies. The scatter plot in Figure 4.11 shows the relationship between the predictions and the observed values: the closer a point is to the 45 degree line, the better the prediction. The plot reveals that many of the points fall very close to the 45 degree line, which indicates that the model performs well.
However, in some cases the model fails to predict the peak of the life cycle of a technology. One reason for the poor prediction performance could be the limited amount of data for emerging technologies, as illustrated in Figure 4.12. Nevertheless, it is important to mention that the evolution of a technology could be affected by many factors. In this thesis, we built a model that takes the number of patent documents as its only input. However, other factors could potentially affect the evolution of a technology, for example the number of companies applying for a patent in a technology or the number of individual inventors applying for patents. Our suggestion is to extend the mixture model to a mixture-of-experts type of model. The mixture-of-experts model (Jordan and Jacobs, 1993) extends the mixing coefficient parameter of the model to a generalized linear model (GLM). In other words, the mixing coefficients become functions of all the covariates.
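A common way to let mixing coefficients depend on covariates is a softmax (multinomial logit) gating function, where each component has its own coefficient vector. The sketch below illustrates that idea only; the covariates, coefficient values and function name are hypothetical and are not part of the model fitted in this thesis.

```python
import math


def gating_weights(x, betas):
    """Softmax gating: mixing weight of component k is proportional
    to exp(x . beta_k). Returns one weight per component."""
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical covariates: intercept, number of applicant companies
# (scaled), number of individual inventors (scaled).
x = [1.0, 0.8, 0.3]

# Hypothetical coefficient vectors for three components.
betas = [
    [0.0, 0.0, 0.0],
    [0.2, 1.0, -0.5],
    [-0.1, -0.4, 0.9],
]

weights = gating_weights(x, betas)
print([round(w, 3) for w in weights])
```

When all coefficient vectors are zero the weights reduce to equal constants, so a mixture with fixed mixing coefficients is a special case in which only the intercepts are non-zero.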


6 Conclusions

In this thesis, we discuss how to predict the life cycle of technologies based on the life cycle of patent applications. We propose a mixture of life cycle Poisson model for predicting the life cycle of technologies from patent count data. The evaluation of the model shows that the prediction of the expected behavior becomes more accurate when more data is used to construct the prediction. Including the expert’s opinion is one of the factors we found to improve the estimation of the life cycle Poisson model. The clustering analysis shows that groups of technologies follow similar trends even though they come from different subclasses of a technological area. We compare the parameters of the pre-computed clusters with the parameters of the components found by the mixture model. Most of the clusters modeled using Bayesian estimates are very similar to the components of the mixture model. We conclude that the proposed mixture model performs well in terms of clustering by shape, and that it is a good option for forecasting patent publications even when the amount of observed data is limited.

For future work, we encourage modeling the number of patent documents of a single technology with a negative binomial model. The Poisson model requires the mean and the variance to be equal. However, patent application data are commonly overdispersed. The negative binomial model is an alternative for overdispersed data, since it allows the mean and the variance to be fitted separately.
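Overdispersion relative to a fitted Poisson model can be checked with the Pearson dispersion statistic, which is close to 1 when the Poisson assumption holds and clearly above 1 when the data are overdispersed. The sketch below also shows the negative binomial variance under the common mean and size parameterization, variance = µ + µ²/r. The function names and example numbers are illustrative assumptions.

```python
def pearson_dispersion(counts, fitted, n_params=0):
    """Sum of squared Pearson residuals divided by the residual
    degrees of freedom (number of observations minus n_params)."""
    dof = len(counts) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((y - m) ** 2 / m for y, m in zip(counts, fitted))
    return chi2 / dof


def negative_binomial_variance(mu, r):
    """Variance of a negative binomial with mean mu and size r."""
    return mu + mu * mu / r


# Hypothetical observed counts and fitted Poisson means.
counts = [10, 20]
fitted = [8.0, 25.0]
print(pearson_dispersion(counts, fitted))   # (4/8 + 25/25) / 2
print(negative_binomial_variance(10.0, 5.0))
```

As the size parameter r grows, the extra term µ²/r vanishes and the negative binomial variance approaches the Poisson variance µ.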


Bibliography

Aghabozorgi, S., S. A. Seyed, and W. T. Ying (2015). Time-series clustering - a decade review. Information Systems 53, 16–38.

Barber, D. (2011). Bayesian reasoning and machine learning.

Brockhoff, K. (1991). Competitor technology intelligence in German companies. Industrial Marketing Management.

Casella, G. and E. George (2016). Explaining the Gibbs sampler. 3 (3), 167–174. http://www.jstor.org/stable/2685208.

Chen, Y., C. Chen, and S. Lee (2011). Technology forecasting and patent strategy of hydrogen energy and fuel cell technologies. International Journal of Hydrogen Energy 36 (12), 6957–6969.

datanovia (2018). Determining the optimal number of clusters: 3 must know methods. https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/ (20181226).

Dereli, T. and A. Durmusoglu (2009). A trend-based patent alert system for technology watch. 68 (AUGUST), 674–679.

Diam, T., I. Iskin, and L. X. (2012). Patent analysis of wind energy technology using the patent alert system. World Patent Information.

Ernst, H. (2003). Patent information for strategic technology management. World Patent Information 25 (3), 233–242.

Espacenet (2018). https://worldwide.espacenet.com/classification?locale=enEP (20180430).

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis (third ed.). Number 519-542. CRC Press.

Jasra, A., C. Holmes, and D. A. Stephens (2005). Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science 20, 50–67.

Jian, Z. (2007). Slice sampler and Gibbs sampler.

Jordan, M. I. and R. A. Jacobs (1993). Hierarchical mixtures of experts and the EM algorithm.

M. Neal, R. (2003). Slice sampling: Rejoinder. Annals of Statistics 31, 758–767.

McAleer, M., F. Chan, and D. Marinova (2007). An econometric analysis of asymmetric volatility: Theory and application to patents. Journal of Econometrics 139, 259–284.

Mishra, S., G. Deshmukh, and P.vart (2002). Matching technological forecasting techniques to a technology. pp. 1–27.

Montero, P. and J. Vilar (2014). TSclust: An R package for time series clustering. Journal of Statistical Software 63, 1–43.

N. Meade, T. I. (1998). Technological forecasting, model selection, model stability, and combining models. Management Science 44, 1115–1130.

Ria, P. and S. Singh (2010). A survey of clustering techniques. International Journal of Computer Applications 7, 1–5.

Scholz, F. (2006). Maximum likelihood estimation. Encyclopedia of Statistical Sciences.

wikipedia. Symmetric mean absolute percentage error. https://en.wikipedia.org/wiki/Symmetricmeanabsolutepercentageerror (20181222).

WIPO (2018). http://www.wipo.int/services/en/ (20180209).
