http://www.diva-portal.org

Postprint

This is the accepted version of a paper published in Automatica. This paper has been peer-reviewed but does not include the final publisher proof-corrections or journal pagination.

Citation for the original published paper (version of record):

Abdalmoaty, M., Hjalmarsson, H. (2019)

Linear Prediction Error Methods for Stochastic Nonlinear Models. Automatica, 105: 49-63

https://doi.org/10.1016/j.automatica.2019.03.006

Access to the published version may require subscription.

N.B. When citing this work, cite the original published paper.

Permanent link to this version:

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235340


Linear Prediction Error Methods for Stochastic Nonlinear Models ⋆

Mohamed Rasheed-Hilmy Abdalmoaty, Håkan Hjalmarsson

Division of Decision and Control Systems, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Malvinas väg 10, floor 6, SE-10044 Stockholm, Sweden

⋆ This work was supported by the Swedish Research Council via the projects NewLEADS (contract number: 2016-06079) and System identification: Unleashing the algorithms (contract number: 2015-05285). Parts of this paper have appeared in [1] and [2]. Corresponding author M. Abdalmoaty.

Email addresses: abda@kth.se (Mohamed Rasheed-Hilmy Abdalmoaty), hjalmars@kth.se (Håkan Hjalmarsson).

Abstract

The estimation problem for stochastic parametric nonlinear dynamical models is recognized to be challenging. The main difficulty is the intractability of the likelihood function and the optimal one-step ahead predictor. In this paper, we present relatively simple prediction error methods based on non-stationary predictors that are linear in the outputs. They can be seen as extensions of the linear prediction error methods for the case where the hypothesized model is stochastic and nonlinear. The resulting estimators are defined by analytically tractable objective functions in several common cases. It is shown that, under certain identifiability and standard regularity conditions, the estimators are consistent and asymptotically normal. We discuss the relationship between the suggested estimators and those based on second-order equivalent models as well as the maximum likelihood method. The paper is concluded with a numerical simulation example as well as a real-data benchmark problem.

Key words: Parameter estimation; System identification; Stochastic systems; Nonlinear models; Prediction error methods.

1 Introduction

System identification of linear dynamical systems is a well-developed and well-understood subject. During the last five decades, methods and algorithms based on stochastic as well as deterministic frameworks have been developed and used. The availability of many devoted monographs [22,27,64,44,40,54] as well as software packages [36,48,31,52] is a clear indication of the maturity of the subject. In principle, linear system identification may be used to construct linear models even when the underlying system is nonlinear [41,17,57,58]; however, when the results are not satisfactory, nonlinear models have to be identified.

Unfortunately, the estimation problem for stochastic nonlinear models can be quite challenging. General nonlinear transformations of unobserved disturbances render commonly used estimation methods—such as the Maximum Likelihood (ML) method—analytically intractable. Until recently, the main body of work on system identification of nonlinear models considered model structures with an explicit correspondence between observations and innovations such that predictors and likelihood functions are (relatively) easy to compute; for example, NARX and NARMAX models belong to this type of model structure. Several methods have been developed under that assumption to address problems such as model structure selection, parameterization, and initialization; see the surveys [7,24,63,42,60], the articles [29,61,53,62] and the books [47,20,8,45]. It was not until the last decade that such a restrictive assumption on the model was relaxed.

It has been shown in [25] that estimators obtained by ignoring disturbances passing through the nonlinear system may not be consistent. Since then, there has been a growing interest in the estimation problem for stochastic nonlinear models. An ML method and a Prediction Error Method (PEM) for stochastic Wiener models, when the unobserved disturbance is independent over time, have been developed in [26] and [67], respectively.

In [66] a performance analysis and approximate estimation methods based on Taylor approximations were considered; however, due to this approximation, the obtained estimators may not be consistent. A solution to the ML problem for general stochastic nonlinear state-space models was proposed in [56]. It relied on a Monte Carlo Expectation-Maximization (MCEM) algorithm [68] where the E-step was approximated by a Sequential Monte Carlo (SMC) smoother [16] (also known as particle smoother). A PEM estimator based on the optimal Mean-Square Error (MSE) one-step ahead predictor was suggested in [49]. In [70], an MCEM algorithm, in the same spirit as [56], was used; but this time a rejection sampling based particle smoother [14] was employed.

These methods, however, can be computationally expensive: to be convergent, they require the number of particles used in the SMC smoother to increase with the iterations of the optimization algorithm [18].

The current state-of-the-art algorithm for off-line ML estimation of general nonlinear state-space models was outlined in [34]. It is based on a combination of a stochastic approximation Expectation-Maximization algorithm [13] and an SMC smoother known as the conditional particle filter with ancestor sampling (CPF-AS) [35]. The CPF-AS is an SMC sampler similar to a standard auxiliary particle filter [16], with the difference that one particle at each time step is set deterministically. The resulting method is asymptotically efficient and convergence to an ML estimate can be established [32]. More recently, an algorithm for on-line ML estimation has been proposed in [51]. It employs a recently developed on-line SMC smoother [50] to approximate the gradient of the predictive densities of the state (also known as tangent filters/filter sensitivity [11, Section 10.2.4]) which are then used to update the parameter estimate.

These methods have been shown to provide interesting results on several benchmark problems. However, their application is so far limited to cases where fundamental limitations of SMC algorithms—such as particle degeneracy (see [15,16])—can be avoided. For example, they are not directly applicable when the measurement noise variance is small; in this case a modified algorithm has to be used [65]. Furthermore, the convergence of the Expectation-Maximization algorithm may be very slow if the variance of the latent process is small. Moreover, the estimation of high-dimensional models is still out of reach. These limitations are currently the topic of active research within different communities including system identification; see for example [46] and [69].

1.1 Contributions

In this paper, we introduce and analyze a PEM based on predictors that are linear in the past outputs. The use of these predictors can be motivated by Wold's decomposition of general second-order non-stationary processes (see Appendix A). It has been noticed in [1] that their use corresponds to a partial probabilistic model. They rely on the second-order properties of the model, and the computations of the exact likelihood function are not required. Therefore, they are relatively easy to compute, and can be highly competitive in this respect compared to estimators based on SMC smoothing algorithms. We show that they may be given in terms of closed-form expressions for several common cases, and Monte Carlo approximations are not necessarily required. The difference between the proposed predictors and linear predictors based on second-order equivalent models [41,17] is described. Furthermore, the convergence and consistency of the resulting PEM estimators is established under standard regularity and certain identifiability conditions. The price paid for bypassing the computations of the likelihood function is a loss of statistical asymptotic efficiency. Nevertheless, it is possible to improve the asymptotic properties of the resulting estimator by one iteration of a Newton-Raphson scheme. This requires the evaluation of the gradient vector and the Hessian matrix of the log-likelihood function, and may be achieved by a single run of a particle smoothing algorithm, e.g., a conditional particle filter [35]. As is well known, this refined estimator is asymptotically first-order equivalent to the maximum likelihood estimator [33, Chapter 6].

1.2 Paper outline

We start in Section 2 by introducing a stochastic framework and formulating the main problem. In Section 3, we introduce one-step ahead optimal and suboptimal linear predictors for a general class of nonlinear stochastic models. The relationship between these predictors and predictors obtained using second-order equivalent models is discussed. In Section 4, linear PEM estimators are defined; their consistency and asymptotic normality are established under standard conditions in Section 5 and Section 6. A maximum likelihood interpretation is given in Section 7. In Section 8, a numerical simulation example as well as a recent real-data benchmark problem are used to demonstrate the performance of the proposed estimators. The paper is concluded in Section 9. Finally, Appendix A gives a brief overview of Wold's decomposition of second-order non-stationary processes.

1.3 Notations

Bold font is used to denote random quantities and regular font is used to denote realizations thereof. The triplet (Ω, F, P_θ) denotes a generic underlying probability space on which the output process y is defined; here, Ω is the sample space, F is the basic σ-algebra, and P_θ is a probability measure parameterized by a finite-dimensional real vector θ and an a priori known input signal u. The symbols E[·; θ], var(·; θ) and cov(·, ·; θ) denote the mathematical expectation, variance and covariance operators with respect to P_θ. The space L_2^n(Ω, F, P_θ) is the Hilbert space of R^n-valued random vectors with finite second moments [9, Chapter 2]; for brevity, we simply use L_2^n. The notation x ∼ p(x) is used to mean that the random variable x is distributed according to the probability density function p(x). For a matrix M, the notation [M]_{ij} denotes the ij-th entry of M, and when M is real and symmetric, M ≻ 0 means that M is positive definite. Finally, for any vector v, ‖v‖²_M := v^T M v.

2 Problem Formulation

In this section, we define the used stochastic framework and formulate the main problem of the paper.

2.1 Signals

The outputs and disturbances are all modeled using discrete-time stochastic processes. In other words, we assume that the observed data is embedded in an infinite sequence of potential observations. The output signal y := {y_t : t ∈ Z} is modeled as an R^{d_y}-valued discrete-time stochastic process defined over (Ω, F, P_θ), d_y ∈ N. The probability measure P_θ is parameterized by a finite-dimensional parameter θ, assuming values in a compact subset Θ ⊂ R^d, and an a priori known d_u-dimensional input signal u := {u_t : t ∈ Z}, d_u ∈ N. Hence, the underlying dynamical system is necessarily operating in open loop, and all unknown disturbances are stochastic processes. The models to be developed are deterministic functions that define a mapping between these processes such that they completely specify P_θ. We will only consider purely non-deterministic processes that have finite second-order moments: y ⊂ L_2^{d_y}; see Appendix A.

One of the simplest second-order stochastic processes is white noise. In this paper, white noise is defined as a sequence of uncorrelated random variables with zero mean and finite variance. This definition is quite weak: it does not specify the distribution of the process, and neither stationarity nor independence is assumed. However, it is sufficient for our purposes since the proposed methods do not require the use of a full probabilistic model.

2.2 Mathematical Models

We consider the class of discrete-time causal dynamical models given by the following definition.

Definition 1 (Stochastic parametric nonlinear model) A stochastic parametric nonlinear model is defined by the relations

y_t = f_t({u_k}_{k=1}^{t−1}, {ζ_k}_{k=1}^{t}; θ),    (1)

in which t = 1, . . . , N, θ ∈ Θ is a parameter to be identified, and {ζ_k}_{k=1}^{t} is a subsequence of an unobserved R^{d_ζ}-valued stochastic process, whose distribution may be parameterized by θ, such that {y_k}_{k=1}^{N} is a subsequence of a second-order stochastic process y.

This definition emphasizes the input-output nature of our approach. The resulting model class is fairly general: it covers a wide range of static models as well as most of the commonly used dynamic model structures. Consider, for example, a stochastic nonlinear time-varying state-space model [40, Section 5.3] defined by the relations

x_{t+1} = h_t(x_t, u_t, w_t; θ),   x_1 ∼ p_{x_1}(θ),   w_t ∼ p_{w_t}(θ),
y_t = g_t(x_t, v_t; θ),   v_t ∼ p_{v_t}(θ),   t ∈ N,

in which x is the state process, and w and v are unobserved disturbances and noise. Define the process ζ as

ζ_1 := [x_1^T  v_1^T]^T,   ζ_t := [w_{t−1}^T  v_t^T]^T   ∀t > 1.

Then the functions {h_t} and {g_t} determine a model of the form in (1) as follows:

f_1(ζ_1; θ) := g_1(x_1, v_1; θ),
f_2(u_1, {ζ_k}_{k=1}^{2}; θ) := g_2(h_1(x_1, u_1, w_1; θ), v_2; θ),
...
f_t({u_k}_{k=1}^{t−1}, {ζ_k}_{k=1}^{t}; θ) := g_t(h_{t−1}(· · · h_1(x_1, u_1, w_1; θ) . . . , u_{t−1}, w_{t−1}; θ), v_t; θ).
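To make the composition above concrete, the following sketch evaluates the maps f_t simply by iterating the state recursion and reading out the measurement equation. It is an illustration only, not the paper's code: the names simulate_output, h and g are hypothetical, h and g are user-supplied (here time-invariant) functions, and the triple (x1, w, v) plays the role of the unobserved process ζ in Definition 1.

import numpy as np

def simulate_output(h, g, x1, u, w, v, theta):
    """Evaluate y_t = f_t({u_k}, {zeta_k}; theta) for a state-space model by
    iterating x_{t+1} = h(x_t, u_t, w_t; theta) and y_t = g(x_t, v_t; theta)."""
    N = len(v)                          # number of outputs to generate
    x = x1                              # initial state x_1 ~ p_x1(theta)
    y = []
    for t in range(N):
        y.append(g(x, v[t], theta))     # y_t = g_t(x_t, v_t; theta)
        if t < N - 1:
            x = h(x, u[t], w[t], theta) # x_{t+1} = h_t(x_t, u_t, w_t; theta)
    return np.asarray(y)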

2.3 The problem

Define the data set

D_t := {(y_k, u_k) : k = 1, . . . , t},    (2)

that contains pairs of inputs and outputs up to time t ∈ N. We will assume that the data is generated by a known model structure, i.e., known functions {f_t} in (1) parameterized by an unknown true parameter θ◦ ∈ Θ. Thus, we will not be concerned here with the important problem of model structure selection.¹

¹ The convergence of the proposed estimators can be established even when θ◦ does not exist or θ◦ ∉ Θ; see Section 5.

Assumption 2 (True system) The sequence of data sets {D_t}_{t=1}^{∞} follows a known model structure (1) with an unknown parameter θ◦ ∈ Θ.

The problem studied in the paper is the construction of a point estimate θ̂ of the parameter vector θ◦ based on a given realization of the data set D_N.

One of the favored and commonly used point estimators is the ML Estimator (MLE), whose computation requires the evaluation of the likelihood function of θ at the observed data. While this can be done efficiently for Gaussian linear models, see for example [4], the likelihood function for the model in (1) is, in general, analytically intractable. Let us define the vectors

Y := [y_1^T, . . . , y_N^T]^T,   Z := [ζ_1^T, . . . , ζ_N^T]^T,

and assume the existence of a known joint probability density function p(Y, Z; θ). Then the likelihood function of θ is given by the high-dimensional marginalization integral

p(Y; θ) = ∫_{R^{d_z}} p(Y, Z; θ) dZ,    (3)

in which d_z = d_ζ N is the dimension of Z. An alternative to the ML method is a PEM based on the optimal MSE one-step ahead predictor. Unfortunately, for the stochastic nonlinear model in (1), such a predictor is in general analytically intractable. It is given by

ŷ_{t|t−1}(θ) = ∫_{R^{d_y}} y_t p(y_t | Y_{t−1}; θ) dy_t   ∀t ∈ Z,    (4)

where p(y_t | Y_{t−1}; θ) is the predictive density of y_t, Y_{t−1} := [y_1^T, . . . , y_{t−1}^T]^T, and Y_0 is defined as the empty set. Observe that by Bayes' theorem (see [28, page 39]),

p(y_t | Y_{t−1}; θ) = p(Y_t; θ) / ∫_{R^{d_y}} p(Y_t; θ) dy_t = ∫_{R^{d_t}} p(Y_t, Z_t; θ) dZ_t / ∫_{R^{d_y}} ∫_{R^{d_t}} p(Y_t, Z_t; θ) dZ_t dy_t,    (5)

where d_t = d_ζ t; thus, except in very few cases, these quantities are analytically intractable. Hence, it seems that a PEM based on the optimal MSE one-step ahead predictor does not have any computational advantage over the asymptotically efficient ML method. Both the MLE and the conditional mean of the output require the solution of similar intractable marginalization integrals. While ignoring the unobserved disturbance may lead to closed-form predictors, it is well known that the resulting PEM estimator is not guaranteed to be consistent [26]. For this reason, most of the recent research efforts found in the system identification literature target the MLE.

In this contribution, we consider PEMs based on relatively simple suboptimal predictors; they are used to construct consistent and asymptotically normally distributed estimators. The obtained results can be seen as extensions of the linear case and can be motivated by Wold's decomposition (see Appendix A).

3 Linear Predictors for Nonlinear Models

The general prediction problem can be described as follows: at time t − 1, we have observed the outputs y_1, . . . , y_{t−1} for some t ∈ N and wish to estimate a value for the next output, y_t. In general, for a known input u and a given θ, a one-step ahead predictor may be defined as a measurable function of Y_{t−1}, usually chosen to minimize some criterion. As pointed out above, the optimal MSE predictor is a common choice; however, in general, it is given by the intractable integral (4).

Instead, in this paper, we consider a class of predictors that are linear in the past outputs and have the form

ŷ_{t|t−1}(θ) = μ̃_t(U_{t−1}; θ) + Σ_{k=1}^{t−1} l̃_{t−k}(t, U_{t−1}; θ) y_k,   t ∈ N,

where μ̃_t and l̃_{t−k} are, possibly nonlinear, functions of θ and the known vector of inputs U_{t−1} := [u_1^T, . . . , u_{t−1}^T]^T.

Observe that the dependence of the predictor on u is implicit in the notation.

Linear predictors are much easier to work with; a unique linear Minimum MSE (MMSE) predictor for any second-order process always exists among the set of linear predictors (see Lemma 4 below). The computations are also straightforward, and closed-form expressions for the predictors may be available in several common cases.

3.1 The Optimal Linear Predictor (OL-predictor)

By considering the outputs of the model in (1) as elements of the Hilbert space L_2^{d_y}, the projection theorem (see [72] or [3]) can be used to define the linear MMSE one-step ahead predictor. This is a standard result of Hilbert spaces; the key idea is that such a predictor can be thought of as the unique orthogonal projection of y_t onto the closed subspace spanned by the entries of Y_{t−1} when the MSE is used as an optimality criterion.

Definition 3 (Linear MMSE one-step ahead predictor) Let S ⊂ L_2^{d_y} be the closed subspace spanned by the entries of Y_{t−1}. Then, a linear Minimum MSE (MMSE) predictor of y_t in S is defined as a vector ŷ_{t|t−1} ∈ S such that

E[‖y_t − ŷ_{t|t−1}‖²_2; θ] ≤ E[‖y_t − ỹ‖²_2; θ]   ∀ỹ ∈ S.

A characterization of such a predictor is given in the following classical lemma. Note that all the expectations are functions of the input which is assumed to be known and deterministic.

Lemma 4 (Existence and uniqueness) The linear MMSE one-step ahead predictor defined in Definition 3 exists and is unique. It is given by

ŷ_{t|t−1}(θ) = E[y_t; θ] + Ψ_t(U_{t−1}; θ)(Y_{t−1} − μ_{t−1}(U_{t−1}; θ)),    (6)

for 1 < t ≤ N, where μ_{t−1}(U_{t−1}; θ) := E[Y_{t−1}; θ], Y_{t−1} = [y_1^T, . . . , y_{t−1}^T]^T, and Ψ_t(U_{t−1}; θ) is given by any solution to the normal equations

Ψ_t(U_{t−1}; θ)[cov(Y_{t−1}, Y_{t−1}; θ)] = cov(y_t, Y_{t−1}; θ).    (7)

Furthermore, ŷ_{1|0}(θ) = E[y_1; θ].

PROOF. See [72].

For brevity, we will refer to the linear MMSE predictor in (6) as the Optimal Linear predictor (OL-predictor).

Remark 5 Observe that the coefficients in (7), which are used in the expression of the OL-predictor, depend only on the unconditional first and second moments of y up to time t. Therefore, the computations of the OL-predictor can be simpler than those of the unrestricted optimal predictor (the conditional mean) which, as shown in Section 2.3, requires computing the integrals (4) and (5).

To connect Lemma 4 to Wold's decomposition of y, note that the predictor in (6) would be easy to compute if the matrices cov(Y_{t−1}, Y_{t−1}; θ) were diagonal. This holds only if the output vectors y_1, . . . , y_{t−1} are orthogonal (uncorrelated), which is rarely the case in most applications. Nevertheless, the Gram-Schmidt procedure (see [30]) can be used to (causally) transform the output vectors into a set of orthogonal vectors {ε_k} such that

ε_t(θ) := y_t − ŷ_{t|t−1}(θ),   1 ≤ t ≤ N,
        = y_t − E[y_t; θ] − Σ_{k=1}^{t−1} cov(y_t, ε_k; θ) λ_{ε_k}^{−1} ε_k(θ),    (8)

with ε_1(θ) = y_1 − E[y_1; θ], and λ_{ε_k} = cov(ε_k, ε_k; θ).

Let E_{t−1} := [ε_1^T . . . ε_{t−1}^T]^T. Then, for linear prediction, the vectors E_{t−1} and Y_{t−1} are equivalent in the sense that they span the same subspaces. Thus, under the assumption that all signals are known to be zero for t ≤ 0, the above construction is identical to Wold's decomposition (see the third row of (A.2) and compare to (8)).

The vector ε_t is known as the innovation in y_t (see [12]).

Definition 6 (The (linear) innovation process) The linear innovation process of y is defined as

ε_t(θ) := y_t − ŷ_{t|t−1}(θ),   t ∈ Z,

where ŷ_{t|t−1}(θ) is the OL-predictor defined in (6).

The next lemma concerns the computations of the OL-predictor. It shows that finding the predictors and the innovations corresponds to a (block) LDL^T factorization (see [21, Chapter 4]) of the covariance matrix of Y := [y_1^T, . . . , y_N^T]^T. We will use the notation U := U_N.

Lemma 7 (Computations of the OL-predictor) Consider the general nonlinear model in (1) such that y_t = 0 ∀t ≤ 0. Suppose that

μ(U; θ) := E[Y; θ],   Σ(U; θ) := cov(Y, Y; θ) ≻ 0    (9)

are given. Then the unique OL-predictor of y_t, t = 1, . . . , N, is given by

ŷ_{t|t−1}(θ) = E[y_t; θ] + Σ_{k=1}^{t−1} l̃_{t−k}(t, U_{t−1}; θ)(y_k − E[y_k; θ]),    (10)

in which l̃_{t−k}(t, U_{t−1}; θ) := −[L^{−1}(U; θ)]_{tk}, where the matrix L(U; θ) is the unique (block) lower unitriangular matrix² given by the (block) LDL^T factorization of Σ; that is,

Σ(U; θ) =: L(U; θ) Λ(U; θ) L^T(U; θ).    (11)

Moreover, ŷ_{1|0}(θ) = E[y_1; θ], and the vector of OL-predictors is given by

Ŷ(θ) := [ŷ_{1|0}^T(θ) . . . ŷ_{N|N−1}^T(θ)]^T = Y − L^{−1}(U; θ)(Y − μ(U; θ)).    (12)

² A lower unitriangular matrix is a lower triangular matrix whose main diagonal entries are equal to the identity matrix.

PROOF. To establish (10), first recall that whenever the covariance matrix Σ is positive definite, the factorization in (11) is unique (see [21, Theorem 4.1.3]). Then observe that, using Wold's decomposition or (8), we may write

Y = μ(U; θ) + L̃(U; θ)E,    (13)

where

L̃(U; θ) =
[ I                              0                              . . .   0
  cov(y_2, ε_1; θ) λ_{ε_1}^{−1}    I                              . . .   0
  ⋮                              ⋮                              ⋱      ⋮
  cov(y_N, ε_1; θ) λ_{ε_1}^{−1}    cov(y_N, ε_2; θ) λ_{ε_2}^{−1}    . . .   I ].

From (13), due to the linearity of the expectation operator, cov(Y, Y; θ) = L̃(U; θ) Λ̃(U; θ) L̃^T(U; θ). Consequently, the uniqueness of the factorization in (11) implies that L̃(U; θ) = L(U; θ), and Λ̃(U; θ) = Λ(U; θ) is a block diagonal matrix of innovation covariances. Now, observe that it is possible to compute the innovations vector by inverting the unitriangular matrix L (which is always invertible for any finite N) to get

E(θ) = L^{−1}(U; θ)(Y − μ(U; θ)),    (14)

and by definition (see (8)) we have

E(θ) = Y − Ŷ(θ).    (15)

Therefore the vector of OL-predictors is given by

Ŷ(θ) = Y − L^{−1}(U; θ)(Y − μ(U; θ)) = (I − L^{−1}(U; θ))Y + L^{−1}(U; θ) μ(U; θ),

from which (10) follows after making use of the unitriangular form of L^{−1}(U; θ).

The computations of the innovations in Lemma 7 are similar to those of the standard Kalman filter in the linear case (see [30]); however, an important difference here is the dependence of L and Λ on the used input.
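As a concrete illustration of Lemma 7, the following sketch computes the vector of OL-predictors (12) and the linear innovations (14) from given first and second moments, for the scalar-output case d_y = 1. It is a minimal sketch, not the paper's code (the helper name ol_predictors is hypothetical), assuming Σ(U; θ) ≻ 0 so that the LDL^T factor can be obtained from a Cholesky factor.

import numpy as np

def ol_predictors(y, mu, Sigma):
    """OL-predictors (12) and linear innovations (14) for scalar outputs,
    given the output mean mu and covariance Sigma at a fixed theta and input."""
    C = np.linalg.cholesky(Sigma)              # Sigma = C C', C lower triangular
    d = np.diag(C)                             # square roots of the innovation variances
    L = C / d                                  # unit lower triangular factor: Sigma = L D L'
    innovations = np.linalg.solve(L, y - mu)   # E(theta) = L^{-1}(Y - mu), eq. (14)
    y_hat = y - innovations                    # Yhat(theta) = Y - E(theta), eq. (15)
    return y_hat, innovations, d ** 2          # predictors, innovations, variances (Lambda)

For d_y > 1 the same idea applies with a block LDL^T factorization in place of the scalar one used here.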

Remark 8 Wold's decomposition implies that ŷ_{t|t−1} is well defined in terms of the innovation process as t → ∞ (Theorem 28). However, an invertibility condition on y with respect to the linear innovations needs to be imposed in order to be able to compute the OL-predictor in terms of the data as N → ∞ (see Section 5.1.1).

3.2 The Output-Error predictor (OE-predictor)

In order to define a sensible linear predictor without using an optimality criterion, we first recall how a suboptimal predictor may be defined in the linear case. Suppose that y_t = G(q; θ)u_t + H(q)ε_t, where G(q; θ) is a stable transfer operator, ε is white noise, and q is the forward-shift operator (see [5]). Then, it is well known that if the data is collected in open loop, and when standard regularity and identifiability conditions hold, a PEM estimator based on the Output-Error (OE) predictor, ŷ_t(θ) = G(q; θ)u_t, is consistent [40, Theorem 8.4]. Notice that this predictor requires neither the specification of the exact noise model H(q) nor the distribution of ε. The only information used regarding the probabilistic structure of the model is the mean of its output. It is thus possible to generalize the above observation to a large class of stochastic nonlinear models whose output has a finite mean, such as the model in (1). This leads us to the following definition.

Definition 9 (The OE-predictor) Consider the general model in (1). The Output-Error predictor (OE-predictor) of y_t is defined as the deterministic quantity

ŷ_t(θ) := E[y_t; θ],   t ∈ N.    (16)

The predictor in (16), although deterministic and independent of Y_{t−1}, is different from the "nonlinear simulation predictor" [40, Section 5.3, page 147], which is defined by fixing {ζ_k}_{k=1}^{t} in (1) to zero and taking ŷ_t(θ) = f_t({u_k}_{k=1}^{t−1}, 0; θ). Instead, the OE-predictor (16) averages the output over all possible values of the unobserved disturbances.
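When no closed-form expression for E[y_t; θ] is available, this averaging can in principle be approximated by plain Monte Carlo simulation over the unobserved disturbances, as in the following sketch. The helper names oe_predictor_mc, simulate and sample_zeta are hypothetical and not part of the paper; the sketch only illustrates the definition (16).

import numpy as np

def oe_predictor_mc(simulate, u, sample_zeta, theta, n_mc=1000, rng=None):
    """Monte Carlo approximation of the OE-predictor (16), E[y_t; theta]:
    average simulated model outputs over n_mc draws of zeta.
    `simulate(u, zeta, theta)` evaluates the model map (1) for one full
    realization of zeta; `sample_zeta(rng)` draws such a realization."""
    rng = np.random.default_rng() if rng is None else rng
    sims = np.stack([simulate(u, sample_zeta(rng), theta) for _ in range(n_mc)])
    return sims.mean(axis=0)    # entrywise approximation of E[Y; theta]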

Both the OL-predictor and the OE-predictor may be computed in terms of closed-form expressions in several common cases—which are usually considered challenging—as illustrated in the following example.

Example 10 (Linear predictors for a scalar stochastic Wiener model) Consider a stochastic Wiener model defined by the relations

y_t = β(u_t + w_t)² + 1/(1 − a q^{−1}) v_t − 2β,    (17)

t = 1, . . . , N, in which β ∈ R, u is a known input signal, and w and v are unobserved independent white noises with time-independent variances denoted by λ_w and λ_v, respectively. Let θ := [β λ_w λ_v]^T and suppose that a is known such that |a| < 1 and that all signals are scalar.

Observe that the full distribution of v is not specified in the model. However, for the clarity of the exposition, assume that w is a Gaussian process and let w and v be mutually independent. Moreover, note that even when a full probabilistic model is hypothesized, both the likelihood function of θ and the optimal MSE predictor of y are analytically intractable (see e.g. [16]). However, as we now show, the mean and the covariance of the model's output can be computed in terms of closed-form expressions.

The model in (17) may be written in vector form as

Y = β(U + W)² + HV − 2β1,

in which 1 denotes a vector of ones,

W := [w_1, . . . , w_N]^T,   V := [v_1, . . . , v_N]^T,   H :=
[ 1         0         . . .  0
  a         1         . . .  0
  ⋮         ⋮         ⋱     ⋮
  a^{N−1}   a^{N−2}   . . .  1 ],

and the exponent is applied entry-wise; i.e., for a vector X we define X² := [x_1², . . . , x_N²]^T. Then, it is straightforward to see that the mean of Y is given by

μ(U; θ) = E[Y; θ] = E[β(U + W)²; θ] − 2β1 = β(U² + λ_w 1) − 2β1.    (18)

Because W and V are independent, the covariance matrix of Y is given by

Σ(U; θ) = cov(Y, Y; θ) = β² cov((U + W)², (U + W)²) + cov(HV, HV) = D(U; θ) + λ_v H H^T,    (19)

where D(U; θ) is a diagonal matrix with entries

[D(U; θ)]_{tt} = 2β² λ_w (2u_t² + λ_w),   t = 1, . . . , N,    (20)

because of the assumption that W ∼ N(0, λ_w I_N). Therefore, the vector of OE-predictors is given by

Ŷ(θ) := β(U² + (λ_w − 2)1),    (21)

and, by Lemma 7, the vector of OL-predictors is given by (12) which, using (18)-(20), is equal to

Ŷ(θ) = Y − L^{−1}(U; θ)(Y − β(U² + (λ_w − 2)1)),

where L(U; θ) is given by the LDL^T factorization of Σ(U; θ).

Observe that due to the nonlinearity of the model, the covariance matrix of Y depends on the input (unlike the case of linear models). Moreover, note that the predictors are parameterized by β as well as λ_w and λ_v.

A straightforward extension of the model in (17)—that does not affect the discussion—is to let w be a linearly filtered Gaussian white noise and assume a parameterized input; for example, u_t(θ) := G(q; θ)ũ_t for some transfer operator G and known signal ũ.
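For illustration, the closed-form moments (18)-(20) of Example 10 can be evaluated numerically as in the following sketch. It is a hypothetical helper written under the example's assumptions (Gaussian w, mutually independent white noises, known |a| < 1), not the paper's code.

import numpy as np
from scipy.linalg import toeplitz

def wiener_moments(u, theta, a):
    """Mean (18) and covariance (19)-(20) of the output of the stochastic
    Wiener model (17) for a known scalar input sequence u; theta = (beta, lam_w, lam_v)."""
    beta, lam_w, lam_v = theta
    u = np.asarray(u, dtype=float)
    N = len(u)
    mu = beta * (u ** 2 + lam_w) - 2.0 * beta                       # eq. (18)
    D = np.diag(2.0 * beta ** 2 * lam_w * (2.0 * u ** 2 + lam_w))   # eq. (20)
    H = np.tril(toeplitz(a ** np.arange(N)))                        # AR(1) filtering matrix
    Sigma = D + lam_v * H @ H.T                                     # eq. (19)
    return mu, Sigma

Combining these moments with the LDL^T-based routine sketched after Lemma 7 yields both the OE-predictor (21) and the OL-predictor of the example.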

In the following section, we discuss the relationship between the proposed predictors and linear predictors obtained based on LTI second-order equivalent models.

3.3 Relation to LTI Second-Order Equivalent Models

Linear time-invariant approximations of nonlinear systems are usually considered under different sets of assumptions and objectives [41,17,58]. They are generally studied in an MSE framework where assumptions and restrictions on the systems to be approximated are implicitly given as assumptions on the input and output signals. It is commonly assumed that the inputs and the outputs are zero-mean stationary stochastic processes, such that the input belongs to a certain class; for example, a class of periodic processes, or processes that have a specific spectrum. In such a framework, explicit assumptions on the underlying data-generating mechanism (such as a parametric nonlinear model) are not necessarily used or required. The goal there is to use spectral assumptions on the data to obtain an LTI model—linear in both y and u—that approximates the behavior of the underlying nonlinear system. Once a model is computed, it might be used to construct a predictor of y_t that is linear in the past inputs and outputs.

An Output-Error LTI Second-Order Equivalent (OE-LTI-SOE) model is defined in [17, Section 4.2] as

G_OE(q) := arg min_{G∈G} E[‖y_t − G(q)u_t‖²],    (22)

where G is the set of stable and causal LTI models and the expectation is with respect to the joint distribution of y and u. When the stability and causality constraints are dropped, the minimizer is called the best linear approximation (BLA) [54]. Note that an OE-LTI-SOE model only captures the causal part of the cross-covariance function between y and u. A better approximation is obtained by a General-Error LTI-SOE (GE-LTI-SOE) model, defined as (see [17, Section 4.4] or [41])

(G_GE, H_GE) := arg min_{G,H} E[‖H^{−1}(q)(y_t − G(q)u_t)‖²]   such that H^{−1}(q), H^{−1}(q)G(q) ∈ G.    (23)

It captures the second-order properties in terms of the covariance function of y and the cross-covariance function between y and u. In other words, the process

ỹ_t := G_GE(q)u_t + H_GE(q)ε̃_t    (24)

has exactly the same spectrum as y, where ε̃ is a stationary white noise with variance

λ_0 := E[‖H_GE^{−1}(q)(y_t − G_GE(q)u_t)‖²].

By definition, LTI-SOE models depend on the assumed distribution of the input process. Notice that the models in (22) and (23) are defined by averaging, not only over y, but also over all realizations of the input u. Therefore, one has to speak of an LTI-SOE model "with respect to a certain class of input signals". In this contribution, by contrast, the inputs are assumed fixed and known. They are used to describe the mean and covariance functions of y, which is not necessarily stationary, and therefore all the computations are conditioned on the given input.

To further clarify these important remarks, we have the following example.

Example 11 (LTI-SOE predictor models) Consider the model of Example 10 and let a = 0 so that

y_t = β(u_t + w_t)² + v_t − 2β.    (25)

Suppose that u, w and v are independent and mutually independent zero-mean stationary Gaussian processes with unit variances. Then y is a zero-mean stationary process. Now observe that due to the independence assumptions

E[y_t u_{t−τ}] = 0   ∀τ ≥ 1,
E[y_t u_t] = E[β u_t³ + β u_t w_t² + 2β u_t² w_t + u_t v_t − 2β u_t] = 0,

and therefore the cross-spectrum between y and u is Φ_{yu}(z) = 0. Moreover, straightforward calculations show that the spectra of u and y are given by Φ_u(z) = 1 and Φ_y(z) = 8β² + 1. Consequently, the OE-LTI-SOE model in (22) is given by [17, Corollary 4.1]

G_OE(q) = Φ_{yu}(q) / Φ_u(q) = 0,

which is independent of β and Φ_u(q). Similarly, the GE-LTI-SOE model in (23) is given by [17, Theorem 4.5]

G_GE(q) = 0,   H_GE(q) = 1,

and λ_0 = 8β² + 1. Therefore, in this case, ỹ_t = H_GE(q)ε̃_t has exactly the same spectrum as y, and the optimal linear predictor constructed based on either LTI-SOE model is independent of β and u; it is given by

ŷ_{t|t−1} = (1 − H_GE^{−1}(q))y_t + H_GE^{−1}(q)G_GE(q)u_t = 0.

On the other hand, the OL-predictor and the OE-predictor suggested in this paper are defined by conditioning on the assumed known (realization of the) input. As shown in Example 10, assuming that u is known, the mean and the covariance of the model's output are given by (18) and (19). When a = 0 and λ_w = λ_v = 1, the mean of y_t becomes E[y_t; θ] = β(u_t² − 1), and its variance becomes var(y_t) = 2β²(2u_t² + 1) + 1. Hence, Wold's decomposition of y (given u) is

y_t = β(u_t² − 1) + H_t(q; β)ε_t,   var(ε_t) = 1,    (26)

in which H_t(q; β) is a time-varying filter with impulse response coefficients

h_k(t) = 0   ∀k ≥ 1,   h_0(t) = √(2β²(2u_t² + 1) + 1)   ∀t ∈ Z.

Note that here, because a = 0, y is an independent process and the OL-predictor coincides with the unrestricted optimal MSE predictor as well as the unconditional mean:

ŷ_{t|t−1}(β) = E[y_t; β] = E[y_t | Y_{t−1}; β] = β(u_t² − 1),

which is nonlinear in u_t.


Thus, the main difference between the models in (24) and (26) is how the input is handled. While LTI-SOE models are defined by averaging over a stationary input, the model in (26) is obtained by conditioning on a given realization, typically leading to a non-stationary model.

4 Linear PEM Estimators

We now define four PEM estimators based on the predictors defined in the previous section. Their asymptotic analysis is given in Section 5 and Section 6.

The first estimator is based on the OL-predictor and the squared Euclidean norm; we will refer to it as the OL- QPEM (OL-predictor Quadratic PEM) estimator.

Definition 12 (The OL-QPEM estimator) The OL-QPEM estimator is defined as

θ̂(D_N) = arg min_{θ∈Θ} ‖Y − Ŷ(θ)‖²_2,   where Ŷ(θ) = Y − L^{−1}(U; θ)(Y − μ(U; θ)),    (27)

in which μ and L are defined in (9) and (11).

Note that, by the definition of the OL-predictor, the expectation of the criterion function in (27) is minimized at θ◦, i.e.,

E[‖Y − Ŷ(θ)‖²_2; θ◦] ≥ E[‖Y − Ŷ(θ◦)‖²_2; θ◦]   ∀θ ∈ Θ,

and whenever U and the parameterization of μ and L are such that Ŷ(θ◦) = Ŷ(θ) ⟹ θ◦ = θ, the true parameter θ◦ is a unique minimizer.
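As an illustration of Definition 12, the following sketch computes an OL-QPEM estimate for the Wiener model of Example 10 by numerical minimization. It is only one possible implementation, reusing the hypothetical helpers wiener_moments and ol_predictors sketched earlier; theta0 is a user-chosen initial guess, and nonpositive variances are rejected by a large penalty.

import numpy as np
from scipy.optimize import minimize

def ol_qpem(y, u, a, theta0):
    """OL-QPEM estimate (27) for the Wiener model (17): minimize
    ||Y - Yhat(theta)||^2 over theta = (beta, lam_w, lam_v)."""
    y = np.asarray(y, dtype=float)

    def criterion(theta):
        beta, lam_w, lam_v = theta
        if lam_w <= 0.0 or lam_v <= 0.0:
            return 1e12                        # keep the search in the feasible set
        mu, Sigma = wiener_moments(u, theta, a)
        y_hat, _, _ = ol_predictors(y, mu, Sigma)
        return np.sum((y - y_hat) ** 2)        # ||Y - Yhat(theta)||^2, eq. (27)

    return minimize(criterion, theta0, method="Nelder-Mead").x

Replacing the criterion by the Λ^{−1}(U; θ)-weighted norm plus the log det Λ(U; θ) term gives the OL-GPEM of Definition 13 below, and replacing Ŷ(θ) by μ(U; θ) gives the OE-QPEM of Definition 14.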

Observe that in the classical case of LTI models, the OL-QPEM estimator is nothing more than the commonly used PEM estimator defined by the Euclidean norm and the optimal linear one-step ahead predictor

ŷ_{t|t−1}(θ) = y_t − H^{−1}(q; θ)(y_t − G(q; θ)u_t)
             = G(q; θ)u_t + Σ_{k=1}^{t−1} h̃_{t−k}(θ)(y_k − G(q; θ)u_k),    (28)

where G(q; θ) is the plant model and {h̃_k(θ)} is the impulse response of the inverted noise model H^{−1}(q; θ) [40, Chapter 3]. In that case, [μ(U; θ)]_t = G(q; θ)u_t and [L^{−1}(U; θ)]_{ij} = h̃_{|i−j|}(θ). Furthermore, conditions on u and the parameterization are given by the concepts of informative experiment and identifiability [40,64].

The second estimator is based on the OL-predictor and a weighted time- and θ-dependent criterion function; we will refer to it as the OL-GPEM (OL-predictor Gaussian PEM) estimator because the criterion function has the form of a Gaussian log-likelihood function.

Definition 13 (The OL-GPEM estimator) The OL-GPEM estimator is defined as

θ̂(D_N) = arg min_{θ∈Θ} ‖Y − Ŷ(θ)‖²_{Λ^{−1}(U;θ)} + log det Λ(U; θ),   where Ŷ(θ) = Y − L^{−1}(U; θ)(Y − μ(U; θ)),    (29)

in which μ, L and Λ are defined in (9) and (11).

Observe that the used criterion function is both input- and θ-dependent via the linear innovation covariance matrices. The log det term is important for the consistency of the estimator due to the dependence of the weighting matrix Λ(U; θ) on θ. As with the OL-QPEM problem, the properties of the OL-predictor imply that the expected value of the criterion function in (29) is minimized at θ◦ (see, for example, [10, (3.1) and (3.2)]).

The third estimator is based on the OE-predictor and the squared Euclidean norm; we will refer to it as the OE-QPEM (OE-predictor Quadratic PEM) estimator.

Definition 14 (The OE-QPEM estimator) The OE-QPEM estimator is defined as

θ̂(D_N) = arg min_{θ∈Θ} ‖Y − Ŷ(θ)‖²_2,   where Ŷ(θ) = μ(U; θ),    (30)

in which μ is defined in (9).

Once more, the expected value of the criterion function in (30) is minimized at θ◦, since μ(U; θ◦) is the optimal MSE predictor of Y (given zero initial conditions).

Note that the criterion function in (30) can be weighted using a θ-independent positive definite matrix to potentially improve the asymptotic properties of the estimator. In that case we refer to it as the OE-WQPEM (OE-predictor Weighted Quadratic PEM) estimator.

Definition 15 (The OE-WQPEM estimator) Let M be a given θ-independent bounded positive definite matrix. The OE-WQPEM estimator corresponding to M is defined as

θ̂(D_N) = arg min_{θ∈Θ} ‖Y − Ŷ(θ)‖²_M,   where Ŷ(θ) = μ(U; θ),    (31)

in which μ is defined in (9).

In the next two sections, we show that the general asymptotic theory of PEMs is applicable to the proposed estimators. The asymptotic results are based on the original work of Ljung in [39] and Ljung and Caines in [43], where the dependence structure of the processes is specified in a generic form in terms of an "exponential forgetting" hypothesis [38].


5 Convergence and Consistency

Let us denote the normalized PEM criterion function by

V_N(θ) := (1/N) Σ_{t=1}^{N} ℓ(e_t(θ), t; θ),    (32)

in which e(θ) is the Prediction Error (PE) process (the difference between the observed and predicted y), and

ℓ(e_t(θ), t; θ) = ‖e_t(θ)‖²_2 for the OL-QPEM and the OE-QPEM,

ℓ(e_t(θ), t; θ) = e_t^T(θ) Λ_t^{−1}(U_t; θ) e_t(θ) + log det Λ_t(U_t; θ) for the OL-GPEM (where e = ε, the linear innovations), and

ℓ(e_t(θ), t; θ) = e_t^T(θ) M_t e_t(θ) for the OE-WQPEM, where M_t is a known θ-independent bounded positive definite matrix.³

³ Note that for the OE-WQPEM problem e_t(θ) = y_t − ŷ_t(θ), where ŷ_t(θ) is the OE-predictor, only when M in Definition 15 is block diagonal. Otherwise, e_t(θ) = [L^{−T}(Y − Ŷ(θ))]_t, where L is given by the LDL^T factorization of M^{−1}.

The corresponding PEM estimators, defined in Section 4, are then given by

θ̂_N = arg min_{θ∈Θ} V_N(θ),   N ∈ N.

The classical asymptotic analysis usually involves the study of the asymptotic behavior of the sequence of criterion functions {V_N(θ) : N ∈ N, θ ∈ Θ} and the use of a compactness assumption on the parameter set Θ to control the corresponding sequence of global minimizers {θ̂_N : N ∈ N}. As far as the prediction error framework is concerned, the simplest cases are those involving (quasi-)stationary ergodic processes such that the sequence of criterion functions converges uniformly over Θ to a well-defined deterministic limit; namely,

sup_{θ∈Θ} |V_N(θ) − V̄(θ)| → 0 a.s. as N → ∞,    (33)

such that the limit V̄(θ) is continuous over Θ and has a unique global minimizer θ⋆; here, "a.s." denotes almost sure convergence [3]. In general, the limit in (33) depends on the system and the input properties.

Under identifiability conditions and a compactness assumption on Θ, it is straightforward to conclude, when (33) holds, that θ̂_N → θ⋆ = θ◦ a.s. as N → ∞. These are essentially the arguments used in the convergence and consistency proofs in an ergodic environment (see [40, Chapter 8] for the LTI case).

In a general non-stationary environment, however, the sequence of criterion functions does not necessarily converge to any limit and may very well be divergent. These cases are of interest particularly when the predictors are non-stationary or when the user cannot control the identification experiment to ensure their convergence. Nevertheless, it is possible to establish the convergence of the minimizers by showing that V_N(θ) asymptotically behaves like the averaged criterion E[V_N(θ)] uniformly in θ. This is the main idea of the convergence and consistency analysis developed in [37,39]. Below, we discuss sufficient regularity conditions regarding the data, the predictor and the used criterion, given in [39], when applied to the linear PEMs proposed in this paper.

5.1 Conditions on the data generating mechanism

For the convergence of PE methods, it is sufficient that the dependence of the moments of y upon the history of the process decays at an exponential rate. It will be assumed that Assumption 2 holds, and therefore the terms "model" and "system" are used interchangeably.

Definition 16 (r-stability) A discrete-time causal dynamical model of y is said to be r-stable with some r > 1, if for all s, t ∈ Z such that s ≤ t there exist doubly-indexed random variables {y_{t,s} : y_{t,t} = 0} such that

(1) y_{t,s} is a (measurable) function of {y_k}_{k=s+1}^{t} and independent of {y_k}_{k=−∞}^{s},
(2) for some positive real numbers c < ∞ and λ < 1, it holds that

E[‖y_t − y_{t,s}‖^r] < c λ^{t−s}.    (34)

The outputs of r-stable models form a class of stochastic processes known as r-mean exponentially stable processes or exponentially forgetting processes of order r [38]. Observe that the definition implies that E[‖y_t‖^r] < c ∀t ∈ Z, r > 1, and therefore the output of an r-stable model must have a bounded mean.

Generally speaking, the random variables y_{t,s} can be interpreted as the outputs of the system when the underlying basic stochastic process ζ is replaced by {ζ_{t,s}}_{t∈Z} such that {ζ_{t,s}}_{t<s} are given by a value independent of {ζ_t}_{t<s}, say zero, but ζ_{t,s} := ζ_t ∀t > s. Note that the above definition of stability includes the conventional stability definition of dynamical systems. For example, in the case of LTI rational models, the output process is exponentially stable when all the poles of the model transfer functions are strictly inside the unit circle.

Models of Definition 1 are quite general and need to be restricted for the results to hold. Observe that not every second-order process is exponentially forgetting, even if the associated linear innovation process is independent.

The next proposition clarifies this point.

Proposition 17 Assume that y is a second-order discrete-time stochastic process with independent linear innovations {ε_t} and no linearly deterministic part; then y is not necessarily exponentially forgetting.

On the other hand, if sup_t E[‖ε_t‖⁴] < ∞ and the sequences {h_k(t) : k ∈ N_0, t ∈ Z} in Wold's decomposition (A.1) are uniformly exponentially decaying, i.e., there exist positive constants c < ∞ and 0 < λ < 1 such that |h_k(t)| < c λ^k for every k ∈ N_0 and t ∈ Z, then y is an exponentially forgetting process⁴ of order 4.

⁴ r = 4 is sufficient for the analysis of PEMs; see Lemma 21.


PROOF. The first assertion is straightforward; we only need to find an example of a second-order discrete-time stochastic process whose innovations are independent but which is not exponentially stable. Consider for example the process y_t := Σ_{k=1}^{∞} k^{−1} ε_{t−k}, where {ε_k} are independent innovations. This is clearly a second-order process that also forgets the remote past, however only linearly. To prove the second part, we use Wold's decomposition of y assuming zero mean, namely y_t = Σ_{k=0}^{∞} h_k(t) ε_{t−k}, with the hypothesis that the sequence {h_k(t) : k ∈ N_0, t ∈ Z} is uniformly exponentially decaying. Using the triangle inequality, it holds that for every t ∈ Z and every n ∈ N

‖ Σ_{k=1}^{n} h_k(t) ε_{t−k} ‖⁴ ≤ ( Σ_{k=1}^{n} |h_k(t)| ‖ε_{t−k}‖ )⁴ ≤ c⁴ ( Σ_{k=1}^{n} λ^k ‖ε_{t−k}‖ )⁴ ≤ c⁴ ( Σ_{k=1}^{n} λ^k )³ Σ_{k=1}^{n} λ^k ‖ε_{t−k}‖⁴,

in which we used Hölder's inequality ([3, page 85], applied to (λ^{k/p})(λ^{k/q} ‖ε_{t−k}‖) with p = 4/3, q = 4) for the last inequality. By applying the expectation operator to both sides and letting n → ∞, we get the inequality

E[‖y_t‖⁴] ≤ c⁴ (1 − λ)^{−3} Σ_{k=1}^{∞} λ^k E[‖ε_{t−k}‖⁴].    (35)

Finally, by defining y_{t,s} := Σ_{k=0}^{∞} h_k(t) ε_{t−k,s}, such that ε_{t,s} = ε_t for t > s and zero otherwise, (35) and the assumption sup_t E[‖ε_t‖⁴] < ∞ imply that

E[‖y_t − y_{t,s}‖⁴] ≤ c⁴ (1 − λ)^{−3} Σ_{k=t−s}^{∞} λ^k E[‖ε_{t−k}‖⁴] ≤ c̃ λ^{t−s}   ∀t > s,

which proves the statement.

More explicit conditions can be given for specific model sets. The next example considers the class of stochastic Wiener models.

Example 18 (Exponentially stable data) Suppose the system is described by the stochastic Wiener model

x_t = G(q; θ◦)u_t + H(q; θ◦)w_t,
y_t = f(x_t; θ◦) + v_t,   t ∈ Z,    (36)

and suppose that the LTI part of the system is rational and stable; i.e., the poles of G(z; θ◦) and H(z; θ◦) are strictly inside the unit circle. Furthermore, suppose that w and v are independent and mutually independent white noises with bounded moments of all orders.

Then, in the light of Proposition 17, we see that x is an exponentially forgetting process. Because the nonlinearity f is static, we only need to guarantee that the moments of y are bounded and that f(x; θ◦) is exponentially decaying whenever x is exponentially decaying. This is the case when f is a polynomial, or bounded, in x, for example.

5.1.1 Conditions on the predictor

Conditions on the predictors are mainly used to guarantee that the PE process is exponentially forgetting uniformly in θ. Apart from a differentiability condition with respect to the parameter and a compactness condition on the parameter set, it is required that the remote past observations have little effect on the current output of the predictor and its derivative. From the point of view of asymptotic analysis, this means that all the observed outputs, regardless of their order in time, may have a comparable contribution to the choice of the parameter. From the practical point of view, this is required for the numerical stability of the minimization procedure. This reasonable condition means that the used predictors should have a stability property.

Definition 19 (Uniformly stable predictors) The predictors {ŷ_{t|t−1}(θ) = ψ(D_{t−1}, t; θ), θ ∈ Θ}, where Θ is compact, are said to be uniformly stable if there exist positive real numbers c < ∞ and λ ∈ (0, 1) such that the following conditions hold:

(1) θ ↦ ψ(D_{t−1}, t; θ) is continuously differentiable over an open neighborhood of Θ, ∀t and for every data set D_{t−1}.
(2) ‖ξ(0, t; θ)‖ ≤ c ∀t, ∀θ in an open neighborhood of Θ, where ξ is used to denote both the predictor function ψ and its derivative with respect to θ, and 0 represents a data set of arbitrary inputs and zero outputs of length t − 1.
(3) ‖ξ(D_{t−1}, t; θ) − ξ(D̄_{t−1}, t; θ)‖ ≤ c Σ_{k=0}^{t−1} λ^{t−k} ‖y_k − ȳ_k‖, where θ is in an open neighborhood of Θ, and D_{t−1}, D̄_{t−1} are data sets corresponding to arbitrary realizations, y and ȳ, of the output, and a fixed arbitrary input u.

First, observe that the OE-predictor is deterministic and depends only on u; therefore, it always satisfies the third condition of the above definition. Moreover, note that the compactness of Θ is part of the definition.

For the OE-predictor and the OL-predictors to be uniformly stable, it is clear that the parameterization of μ(U; θ) and Σ(U; θ) is required to be continuously differentiable over Θ; this translates into smoothness conditions on the parameterization of the assumed nonlinear model. To check the remaining conditions, we first recall that the predictors have the form (see (10) and (16))

ψ(D_{t−1}, t; θ) = E[y_t; θ] + Σ_{k=1}^{t−1} l̃_{t−k}(t, U_{t−1}; θ)(y_k − E[y_k; θ]),
