Speech and music performance. Parallels and contrasts


Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report


Carlson, R., Friberg, A., Frydén, L., Granström, B., and Sundberg, J.

journal: STL-QPSR volume: 28

number: 4 year: 1987 pages: 007-023

http://www.speech.kth.se/qpsr


STL-QPSR 4/1987

stress, emphasis, etc. can all be signaled by one single parameter, such as the voice fundamental frequency. In the same way, the duration of speech sounds is affected by a variety of conditions, including stress, position in the utterance, and local phonetic context. An extensive review of the factors that have been found to influence the duration of speech sounds can be found in a paper by Klatt (1976) and in special issues of Phonetica (1981; 1986). All of this is taken into account by the listener in the perceptual decoding process.

The same applies to music. There are many different reasons to lengthen or shorten a note beyond its nominal duration as specified in the score. Apparently, such perturbations of the nominal duration serve different purposes, e.g., emphasis, marking of phrase endings, and sharpening the contrast between categories. This results in rather complicated interactions, making it hard to use conventional analytic methods. As a consequence, analysis-by-synthesis is a powerful tool in both speech research and music performance research.

In recent years, the use of computers has led to great advances in both speech and music sciences. With the help of fast analysis tools, new knowledge has been gained and new models simulated. It is now possible to study the acoustic behavior of articulatory models or to compare duration models to natural recorded speech in data banks.

Speech recognition systems have attracted a lot of research money, and some of it has been fruitfully invested in basic speech research. Text-to-speech programs have been productive in increasing the understanding of the speech communication process. With the help of models formulated as transformation rules, we have been able to test our current knowledge and to reject or accept ideas.

Similarly, our work with developing computer-generated music performances by means of note-to-tone programs has been revealing as to basic aspects of music communication. In this article, we will analyze some similarities that we have observed in our parallel work with text-to-speech and note-to-tone programs.

Text-to-speech and Note-to-tone Programs

We have earlier reported on the long-term effort to develop high-quality text-to-speech systems for several languages (Carlson & Granström, 1975a; 1986; Carlson, Granström, & Hunnicutt, 1982). The approach taken has been to formulate the process in a coherent framework.

One criterion was that linguists involved in creating, refining, and maintaining the text-to-speech software should be able to work with constructs and conventions familiar to them without necessarily mastering conventional computer programming. Consequently, distinctive features and phonemes are primes in our system. Also, the rule notation borrows heavily from that used in generative phonology, although it is expanded to easily handle continuous variables such as synthesizer


straightforward, making the intermediate level almost unnecessary. In other languages, like English, the relation is quite complicated and does not entirely follow a set of conventions/rules. In this case, the speaker or the synthesis program needs to rely on lexical information.

In a sense, an equivalent intermediate level of description also exists in music, viz., when the player or the composer complements the score with a great number of additional signs, such as dots, dashes, wedges, slurs, etc. This type of score has not been formally distinguished from the orthographic representation as clearly as in the case of speech. Also, this intermediate level seems more needed in speech.

This can be concluded from the fact that speech produced by the concatenation of sounds directly corresponding to the letters not only sounds extremely unnatural, but is also practically impossible to understand; music produced from a nominal realization of the note signs, on the other hand, is still recognizable, even though very boring to listen to.

Quantization

Although both music and language can be represented by graphical signs that in some way can be regarded as symbols for the corresponding acoustic signals, the relationship between this graphical representation and the sound signals differs in one important respect.

In the music score, pitch and duration are represented by symbols according to a system of quantized categories. For instance, a quarter note is nominally twice as long as an eighth note, and a C is almost 6% lower in fundamental frequency than a C sharp.
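The two quantized relations just mentioned can be checked with a few lines of arithmetic. The sketch below assumes twelve-tone equal temperament (the text does not say which tuning is meant) and represents note values as fractions of a whole note; the names `percent_lower` and `nominal_ratio` are illustrative only.

```python
# Quantized pitch and duration categories in the score (illustrative sketch).
# Assumption: twelve-tone equal temperament.

SEMITONE_RATIO = 2 ** (1 / 12)  # F0 ratio between adjacent semitones

# Nominal note values as fractions of a whole note.
NOTE_VALUES = {"whole": 1.0, "half": 0.5, "quarter": 0.25, "eighth": 0.125}


def percent_lower(semitones: int = 1) -> float:
    """Percent by which a tone lies below the tone `semitones` higher."""
    return (1 - SEMITONE_RATIO ** -semitones) * 100


def nominal_ratio(longer: str, shorter: str) -> float:
    """Nominal duration ratio between two note values."""
    return NOTE_VALUES[longer] / NOTE_VALUES[shorter]


if __name__ == "__main__":
    print(f"C is {percent_lower():.1f}% lower than C sharp")
    print(f"quarter/eighth = {nominal_ratio('quarter', 'eighth')}")
```

Under these assumptions, C comes out about 5.6% lower than C sharp, and a quarter note is nominally exactly twice as long as an eighth note.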

In orthography, on the other hand, neither pitch nor duration is specified. In most languages, it is rather the time derivative of pitch that is predictable from the orthography, such as in cases of question, quotation, accent, etc., or from the linguistic content, such as in cases of focus. In the so-called tone languages, pitch is predictable within non-quantized categories, such as high, middle, low, rising, and falling. Duration can sometimes be predicted qualitatively from orthography, as with "long" and "short" vowels and consonants. However, prediction of phonetic transcription sometimes fails, so that a lexicon is needed in order to arrive at correct word pronunciation.

Parallels

Role of the author/composer

The fact that we can formulate rule systems that generate intelligible speech and music performance of a decent musical quality apparently means that the acoustic realization is implied in the orthography and the music score. This suggests that the author limits the number of possible acoustic realizations of his text in a similar way as the composer limits the possible acoustic realizations of his score. This is probably an essential requirement on a useful symbol system, such as the orthography and the music score. This similarity indicates that, in this regard, similar processes underlie speech and music performance. However, the final acoustic realization is not of the same prime concern for authors as for composers. This is also why writing systems can have a more vague relation to the acoustic realization.

Stress and emphasis

An important premise of the comparisons carried out in the present article is the difference between emphasis and stress in speech. While emphasis is content-dependent, the distinction between stressed and unstressed is a word-level phenomenon. This means that emphasis and stress exist at different levels in speech, stress being at a lower level. It seems that an equivalent distinction can be made in music, in that stress is a property that is dependent on the position in the bar, while emphasis is rather dependent on higher-level aspects of the musical structure.

Communicative purposes

A: Predictability and emphasis

In both speech and music, most acoustic events are more or less predictable. For instance, we are very skilled in filling gaps in the acoustic information occurring because of interference with noise. Even if a door is banging in the middle of somebody's speech, we can mostly hear what the person is saying. It is often even hard to tell which speech sound was masked by the noise. Some classical perceptual studies have shown that the perceived location in time of a disturbing sound is judged to be a place in the conversation where it does as little harm as possible, i.e., close to a syntactic break.

However, the meaning of an utterance does not always survive a door bang. In some places, the sentence is vulnerable, and if information is lost in such places, the meaning of the sentence cannot be restored. From this, we can conclude that predictability varies along a spoken utterance, and where predictability is high, the information flow is low, and vice versa.
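The inverse relation between predictability and information flow can be made concrete with Shannon's measure, under which an event of probability p carries -log2 p bits. The sketch below is not a model used by the authors; it is a toy illustration assuming a word-bigram model with add-one smoothing, estimated from a tiny hypothetical corpus. Words that are highly predictable in their context receive low surprisal, i.e., carry little information.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def bigram_model(corpus: List[List[str]]) -> Dict[str, Counter]:
    """Count which words follow which (sentence start is marked "<s>")."""
    successors: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        prev = "<s>"
        for word in sentence:
            successors[prev][word] += 1
            prev = word
    return successors


def surprisal(model: Dict[str, Counter], prev: str, word: str) -> float:
    """Information of `word` after `prev` in bits, -log2 P(word | prev).

    Add-one smoothing over the observed vocabulary plus one "unknown" slot
    keeps unseen continuations finite.
    """
    vocab = {w for counts in model.values() for w in counts}
    counts = model.get(prev, Counter())
    p = (counts[word] + 1) / (sum(counts.values()) + len(vocab) + 1)
    return -math.log2(p)


# Hypothetical toy corpus.
corpus = [
    ["the", "door", "is", "open"],
    ["the", "door", "is", "closed"],
    ["the", "door", "is", "open"],
    ["the", "farm", "is", "sold"],
]
model = bigram_model(corpus)
for word in ("door", "farm"):
    print(word, round(surprisal(model, "the", word), 2))
```

After "the", the frequent continuation "door" carries about 1.58 bits and the rarer "farm" about 2.58 bits, which is the sense in which high predictability goes with a low information flow.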

A similar reasoning seems applicable to music. Mostly, we can complement a melodic line correctly even if one note is missing. For instance, deleting a passing note would be completely harmless. On the other hand, there are also more important notes in a melody. In the second theme from the first movement of Schubert's B minor Symphony, D 759, there is a modulation from D major to B major, see Fig. 1. The modulation is announced by a D sharp. If this note is cut out, the harmonic interpretation of the following notes will be affected. Thus, this D sharp seems to be an indispensable note for the melody. It seems obvious that such important notes are in a sense indispensable and have a low predictability, and it can be assumed that predictability varies inversely with the information rate.

Fig. 1. Second theme from the first movement of Schubert's Symphony in B minor, D 759. The top line of numbers symbolizes the harmonies in terms of the distance, in semitones, between the root of the chord and the root of the tonic.

The parallel occurrence of a time-varying predictability in both speech and music is by no means trivial. Its existence in both suggests that, possibly, it represents a way of meeting an essential limitation of the perceptive system. For instance, this system may be incapable of processing signals having an invariably high information rate.

Predictability of words has been studied by several authors. In a now classical study by Lieberman (1963), the relationship between context redundancy and keyword intelligibility was studied. It was shown that predictability had a strong effect on how clearly a word was pronounced. This experiment was later repeated by Hunnicutt (1985, 1987a).

In a text-to-speech system, Coker, Umeda, & Browman (1973) included factors such as word frequency and repetition of earlier mentioned words. These parameters added to the naturalness of the speech quality. Similar ideas about predictability are included in modern communication aids for the handicapped. It can be shown that a prediction program working only on the surface structure can predict at least 50% of the typed letters in a running text (Hunnicutt, 1987b).
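The kind of letter prediction mentioned above can be illustrated with a deliberately simple word-completion scheme. The sketch assumes, hypothetically, that after each typed letter the program displays the most frequent lexicon word beginning with the letters typed so far, and that the user accepts it as soon as it is correct; all remaining letters of that word are then counted as predicted. This is not Hunnicutt's program, and because the toy lexicon is built from the very text being typed, the result says nothing about the 50% figure reported for running text.

```python
from collections import Counter
from typing import Optional


def best_completion(prefix: str, lexicon: Counter) -> Optional[str]:
    """Most frequent lexicon word starting with `prefix` (ties: alphabetical)."""
    candidates = [w for w in lexicon if w.startswith(prefix)]
    if not candidates:
        return None
    return min(candidates, key=lambda w: (-lexicon[w], w))


def predicted_fraction(text: str, lexicon: Counter) -> float:
    """Fraction of letters the user need not type.

    Hypothetical protocol: after each typed letter the best completion is
    shown; once it equals the intended word, the rest of the word counts as
    predicted. Spaces and punctuation are ignored.
    """
    total = saved = 0
    for word in text.split():
        total += len(word)
        for typed in range(len(word)):
            if best_completion(word[:typed], lexicon) == word:
                saved += len(word) - typed
                break
    return saved / total if total else 0.0


text = "the door is open the door is closed"
lexicon = Counter(text.split())  # in-sample lexicon: optimistic by design
print(f"{predicted_fraction(text, lexicon):.0%} of letters predicted")
```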

Predictability in speech is present at many different levels, such as phoneme sequence, choice of word endings, and even syntactic constructs. At a structural level, we know that certain phrase or word combinations are very probable. It is even likely that certain word sequences are lexicalized just like certain single words, and that we perceive these phrases as single units.

The varying predictability in a sentence is significant to its acoustic realization, i.e., to speech. In order to make speech easy to understand and natural sounding, it is necessary to emphasize important, or less predictable, words and to deemphasize unimportant, predictable elements.

The varying predictability is significant also to the quality of music performance. In our rule system for music performance, we have introduced the notion of melodic charge in order to take into account


1970). In the music performance program, phrase and sub-phrase endings are marked in the input notation. Then, the final notes of these two constituents are marked in the performance. This marking seems to be an essential requirement on a music performance (Thompson et al., 1986). Measurements on music performance support the same assumption and also that constituent marking takes place at different levels in the hierarchy (Todd, 1985).

Phrase marking is an instance of the general principle of marking constituents in a structure. This principle is often referred to as grouping, which seems essential in any type of communication. It seems that constituents at many different levels are marked. For instance, word boundaries are marked not only in speech, as mentioned above, but also in orthography with a space. Also, in understanding speech of a foreign language, a major step is to detect the boundaries between words.

The fact that constituent marking appears not only in speech and music performance but also in the orthography and the music score suggests that the marking of constituents is a paramount demand in many kinds of inter-human communication.

Choice of acoustic code

As both speech and music performance use acoustic signals for communication, it is interesting to compare the codes used. If speech and music performance use similar codes, the understanding of music requires a competence which is partly the same as that required for understanding speech. Thus, comparing the codes will shed some light on the basic requirements for understanding music.

Emphasis

As mentioned earlier, the main correlates of emphasis in speech are relatively greater pitch changes, increased durations and, to some extent, greater vocal effort, as was illustrated in Figs. 3 and 4. By and large, as pitch and duration in music are decided upon by the composer, much less leeway is left to the performer than to a speaker.

However, while our sensitivity to differences in the duration of notes presented in isolation is modest, the just noticeable perturbation of the duration of a note appearing in a sequence of notes of similar duration is as small as 10 msec (van Noorden, 1975). Thus, by arranging sequences of notes of similar duration, the composer seems to offer the player the possibility to communicate emphasis in terms of lengthening and shortening of notes.
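One way of stating this constraint in a performance program is to flag only those deviations from nominal duration that reach the threshold and occur within an even sequence. The sketch assumes the 10 msec figure quoted above and a hypothetical 10% tolerance for what counts as "similar duration"; `in_even_context` and `audible_deviations` are illustrative names, not part of the authors' program.

```python
from typing import List

JND_IN_SEQUENCE_MS = 10.0  # threshold quoted in the text (van Noorden, 1975)
SIMILAR_TOLERANCE = 0.10   # hypothetical: neighbours within 10% are "similar"


def in_even_context(nominal: List[float], i: int) -> bool:
    """True if note i is surrounded by notes of similar nominal duration."""
    neighbours = [nominal[j] for j in (i - 1, i + 1) if 0 <= j < len(nominal)]
    return bool(neighbours) and all(
        abs(n - nominal[i]) <= SIMILAR_TOLERANCE * nominal[i] for n in neighbours
    )


def audible_deviations(nominal: List[float], performed: List[float],
                       jnd_ms: float = JND_IN_SEQUENCE_MS) -> List[int]:
    """Indices of notes whose lengthening or shortening should be noticeable.

    The small threshold is only assumed to hold inside an even sequence;
    deviations elsewhere are not flagged.
    """
    return [
        i for i, (n, p) in enumerate(zip(nominal, performed))
        if in_even_context(nominal, i) and abs(p - n) >= jnd_ms
    ]


nominal = [250.0] * 6
performed = [250.0, 250.0, 265.0, 250.0, 254.0, 250.0]
print(audible_deviations(nominal, performed))
```

Here the 15 msec lengthening of the third note is flagged, while the 4 msec deviation of the fifth note stays below the assumed threshold.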

Similar observations have been made with regard to speech. The lower boundary for perceiving durational differences has been found to be of the order of 10 msec (Huggins, 1972; Klatt & Cooper, 1975). The just noticeable difference has been shown to vary with the type of sound and its phonetic context (Fujisaki, Nakamura, & Imoto, 1975; Carlson & Granström, 1975b).

Fig. 3. Fundamental frequency differences between emphatic productions and the averaged neutral production of the Swedish sentence "Uno belånade gården i Boden" (Uno mortgaged the farm in Boden). The different curves pertain to emphasis on the four main words.

Fig. 4. Durational differences between segments in emphatic and neutral utterances. The solid curve indicates the difference between segments in nonemphasized words of emphatic utterances and neutral production. The other curves pertain to the increase in duration of the words when pronounced emphatically.

Musicians seem to take advantage of this opportunity. As was mentioned above, melodic charge is a property of a note that reflects the remarkableness of the note. Similarly, the harmonic charge seems to reflect the remarkableness of a chord. Remarkableness seems to call for emphasis in the performance.

According to the performance rules, a high melodic charge is marked by increases in sound level, vibrato extent, and duration. The method is straightforward; the increment of sound level, vibrato extent, and duration is calculated as a constant times the melodic charge of the individual note. The result is that notes with a high melodic charge sound emphasized. Similarly, increases in harmonic charge generate crescendos, and the associated increments in sound level are used for calculating the increases in duration and vibrato extent. According to formal listening experiments with musically trained subjects, the musical quality of a performance is raised if melodic charge is marked in this way (Thompson et al., 1986).
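The proportional rule described above can be sketched as follows. The melodic charge of each note is taken as given input, since its definition is not reproduced in this section, and the three proportionality constants as well as the overall strength `k` are hypothetical placeholders rather than the values used in the authors' program.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical proportionality constants: the text states only that each
# increment is "a constant times the melodic charge".
K_LEVEL_DB = 0.2      # dB per unit of melodic charge
K_VIBRATO = 0.05      # semitones of vibrato extent per unit of charge
K_DURATION_MS = 2.0   # msec per unit of charge


@dataclass(frozen=True)
class Note:
    duration_ms: float
    level_db: float
    vibrato_extent: float
    melodic_charge: float  # supplied by the caller; not computed here


def apply_melodic_charge(notes: List[Note], k: float = 1.0) -> List[Note]:
    """Add level, vibrato-extent and duration increments proportional to
    each note's melodic charge; k is a hypothetical overall rule strength."""
    return [
        replace(
            n,
            level_db=n.level_db + k * K_LEVEL_DB * n.melodic_charge,
            vibrato_extent=n.vibrato_extent + k * K_VIBRATO * n.melodic_charge,
            duration_ms=n.duration_ms + k * K_DURATION_MS * n.melodic_charge,
        )
        for n in notes
    ]


melody = [Note(250.0, 70.0, 0.5, 0.0), Note(250.0, 70.0, 0.5, 4.0)]
for note in apply_melodic_charge(melody):
    print(note)
```

A note with zero melodic charge is left untouched, while a note with an illustrative charge of 4 becomes 8 msec longer and slightly louder and richer in vibrato under these placeholder constants.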

These examples seem to indicate that emphasis is signaled by adding duration, resulting in a slowing of the tempo. The similarity with speech is obvious. Also, this slowing down of the tempo seems perceptually adequate; the listener is given more time to process the unexpected information.

Melodic charge and increases in harmonic charge are also reflected in the vibrato extent, as mentioned. Here, the parallel with speech is less obvious, but the following speculation is tempting. The perceptive system seems very sensitive to changes, e.g., in pitch, and one emphasis marker in speech is pitch change. Vibrato actually increases the rate of change of fundamental frequency, though without changing the mean perceived pitch. In this perspective, vibrato could be seen as an elegant way of exploiting pitch change for expressive purposes without changing the melodic patterns.
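The claim that vibrato raises the rate of F0 change while leaving the mean pitch unchanged can be checked numerically. The sketch assumes sinusoidal vibrato in the log-frequency domain, approximates mean perceived pitch by the mean of log F0 (an assumption, not a claim about the authors' model), and uses a hypothetical rate of 6 Hz and an extent of ±0.5 semitone.

```python
import math
from typing import List


def f0_contour(f0_hz: float, extent_semitones: float, rate_hz: float,
               seconds: float, fs: int = 1000) -> List[float]:
    """F0 samples (Hz) with sinusoidal vibrato in log frequency:
    f(t) = f0 * 2 ** (extent / 12 * sin(2 * pi * rate * t))."""
    n = int(seconds * fs)
    return [
        f0_hz * 2 ** (extent_semitones / 12 * math.sin(2 * math.pi * rate_hz * i / fs))
        for i in range(n)
    ]


def mean_pitch_semitones(contour: List[float], ref_hz: float) -> float:
    """Mean of log F0 relative to ref_hz, in semitones (assumed pitch proxy)."""
    return sum(12 * math.log2(f / ref_hz) for f in contour) / len(contour)


def mean_abs_slope(contour: List[float], fs: int = 1000) -> float:
    """Mean absolute rate of F0 change, in Hz per second."""
    steps = [abs(b - a) for a, b in zip(contour, contour[1:])]
    return sum(steps) / len(steps) * fs


flat = f0_contour(440.0, 0.0, 6.0, 1.0)
vib = f0_contour(440.0, 0.5, 6.0, 1.0)  # 6 whole cycles in 1 s
for name, c in (("flat", flat), ("vibrato", vib)):
    print(name, round(mean_pitch_semitones(c, 440.0), 6), round(mean_abs_slope(c)))
```

Both contours have the same mean log-pitch, but only the vibrato tone shows a substantial rate of F0 change, about 300 Hz per second for these settings.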

Constituent marking

In speech, as in music performance, it is necessary to mark structural constituents at different levels. In speech, the most apparent example is the phrase and clause ending, which is signaled by a lengthening of the last syllable or syllables. This way of announcing the ending of a constituent is common to most languages (Lindblom, 1979). Actually, final lengthening is so important for both Swedish and English that speech synthesized without such a rule is perceived as accelerating at the end of each clause.

In speech, constituents of many sizes, from paragraphs to words, are marked, not only with duration but also with other parameters like intonation and vocal source settings. Also, micro-pauses are introduced at major syntactic breaks even if there is no need for breathing, e.g., after words followed by a period, a comma, or a semicolon. Durational data on these effects have been reported for several languages and have also been formulated into coherent rule systems.

Prosodic models have an obvious importance in the general description of languages and find application in text-to-speech systems. Several of these models have been tested by Carlson, Granström, & Klatt (1979), and the more elaborate models give advantages in both naturalness and intelligibility.

One durational description of Swedish is historically based on a tree structure of a sentence with phrase boundaries and syllables on separate branches. Support for this model was found in reiterant speech (Lindblom & Rapp, 1973; Carlson & Granström, 1973). The model had a cyclic final lengthening rule whose effect was larger the higher the perceptual importance of the boundary. A similar model has been found extremely productive in describing timing data from piano music performance (Todd, 1983).
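The cyclic character of such a rule can be sketched on a nested constituent tree: the rule is applied once per constituent, innermost first, to that constituent's final syllable, so a syllable that ends several constituents at once is lengthened several times. The bracketing of the Swedish example sentence, the uniform 200 msec base duration, and the 10% lengthening per cycle are hypothetical choices for illustration, not the parameters of the model cited.

```python
from typing import List, Tuple, Union

# A constituent is a syllable (str) or a non-empty list of constituents.
Tree = Union[str, List["Tree"]]

BASE_MS = 200.0      # hypothetical uniform syllable duration
FINAL_FACTOR = 1.1   # hypothetical lengthening applied once per cycle


def cyclic_durations(tree: Tree) -> List[Tuple[str, float]]:
    """Apply final lengthening cyclically, innermost constituents first.

    Each constituent lengthens its own final syllable, so a syllable that
    ends several nested constituents is lengthened several times.
    """
    if isinstance(tree, str):
        return [(tree, BASE_MS)]
    out = [pair for child in tree for pair in cyclic_durations(child)]
    syllable, dur = out[-1]
    out[-1] = (syllable, dur * FINAL_FACTOR)
    return out


# Hypothetical bracketing: sentence > phrases > words > syllables.
sentence = [
    [["U", "no"]],
    [["be", "lå", "na", "de"], ["går", "den"]],
    [["i"], ["Bo", "den"]],
]
for syllable, dur in cyclic_durations(sentence):
    print(f"{syllable:>3} {dur:6.1f}")
```

The final syllable of "Boden", which closes a word, a phrase, and the sentence, ends up about 266 msec long, while a syllable closing only a word gets 220 msec.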

Another type of prosodic model for speech is based on a general structure proposed by Klatt (1979). The rules have as input the inherent duration, which is the typical duration of the phoneme in a word-initial position before a stressed vowel. The second parameter is the minimal duration, which is a measure of the phoneme's compressibility. Finally, a correction factor is used to calculate the duration. This factor is set depending on local and global parameters. This model has proven to be a good model for describing duration effects in running speech. The experiences from the music performance program reveal that a similar model, including restrictions on compressibility, would be productive also in music performance.
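In the form in which Klatt's duration rules are usually published, each applicable rule contributes a percentage, the percentages are multiplied into one correction factor PRCNT, and only the compressible part of the segment is scaled: DUR = (INHDUR - MINDUR) × PRCNT/100 + MINDUR. A minimal sketch follows; the inherent and minimal durations and the rule percentages in the usage example are hypothetical numbers, not values from Klatt's tables.

```python
from math import prod
from typing import Iterable


def klatt_duration(inherent_ms: float, minimal_ms: float,
                   percents: Iterable[float]) -> float:
    """DUR = (INHDUR - MINDUR) * PRCNT / 100 + MINDUR.

    `percents` are the correction percentages of the rules that apply
    (100 = no effect); they are combined multiplicatively into PRCNT.
    Only the compressible part INHDUR - MINDUR is scaled.
    """
    prcnt = 100.0 * prod(p / 100.0 for p in percents)
    return (inherent_ms - minimal_ms) * prcnt / 100.0 + minimal_ms


# Hypothetical segment: inherent 160 msec, minimal 60 msec,
# one shortening rule (60%) and one lengthening rule (140%).
print(round(klatt_duration(160.0, 60.0, [60.0, 140.0]), 1))
```

With these numbers the result is 144 msec. Since the percentages are non-negative, no combination of shortening rules can take the segment below its 60 msec minimum, which is the restriction on compressibility referred to above.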

There seems to be a contradiction between these two models for describing duration phenomena. The first model is probably more suited for well-prepared reading of text with a high amount of speech pre-planning, while the other is typical of less planned speech with rules of a more local nature.

According to Todd (1985), the ending of a phrase is often played with a small ritard, while the beginning is played with a small accelerando. It is well known that the last notes of a piece are often played with a final ritard (Sundberg & Verrillo, 1980). In the music performance program, the last note of a phrase is lengthened by 40 msec and terminated by a micro-pause. A sub-phrase termination is marked by a micro-pause only. In addition, there are a number of other rules that shorten and lengthen notes depending on the context, and these rules sometimes seem to serve the purpose of constituent marking. For instance, in combination with the marker of phrase ending, they actually sometimes generate small ritards at phrase endings.
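The two phrase-marking rules of the music performance program can be sketched as a pass over notes whose phrase and sub-phrase endings are marked in the input, as described earlier. The 40 msec lengthening is taken from the text; the micro-pause length is not given, so `MICRO_PAUSE_MS` is a hypothetical value, and modelling the micro-pause as silence added after the note is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

PHRASE_LENGTHENING_MS = 40.0  # value given in the text
MICRO_PAUSE_MS = 25.0         # hypothetical; no value is given in the text


@dataclass
class Note:
    duration_ms: float
    boundary: Optional[str] = None  # "phrase", "subphrase" or None (input notation)
    pause_after_ms: float = 0.0


def mark_phrases(notes: List[Note]) -> List[Note]:
    """Phrase end: lengthen by 40 msec and add a micro-pause.
    Sub-phrase end: micro-pause only. Other notes are copied unchanged."""
    out = []
    for n in notes:
        if n.boundary == "phrase":
            out.append(Note(n.duration_ms + PHRASE_LENGTHENING_MS, n.boundary, MICRO_PAUSE_MS))
        elif n.boundary == "subphrase":
            out.append(Note(n.duration_ms, n.boundary, MICRO_PAUSE_MS))
        else:
            out.append(Note(n.duration_ms, n.boundary, n.pause_after_ms))
    return out


melody = [Note(250.0), Note(250.0, "subphrase"), Note(250.0), Note(500.0, "phrase")]
for note in mark_phrases(melody):
    print(note)
```

The input list is left unchanged; a new list of notes is returned, so the other context-dependent rules mentioned above could be applied to the result in turn.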


Why is the code for marking structural constituents similar in speech and music performance? A tempting hypothesis is that the code in music is imported from speech in this regard; as all music listeners have acquired a competence in decoding speech, it would be safe to use the same code in music performance. However, some languages, e.g., Danish, do not use final lengthening, and yet musicians from these countries are obviously quite as competent as their colleagues from other countries. This suggests that the code used in music performance may not be borrowed from speech, but might lean on other kinds of common experience.

As far as the final ritard is concerned, there is a striking similarity with the decreasing rate of footsteps in a stopping runner who keeps the step length and the braking force constant throughout the stopping process (Kronman & Sundberg, 1987). Under these conditions, the slowing down of the footsteps follows the same curve as the average ritard in motor music from the baroque era. Thus, the final lengthening seems to allude to a well-known experience, namely that of stopping locomotion. We may speculate that the final lengthening at phrase endings is also a faint allusion to locomotion. If so, the code would be very robust in the sense that anybody acquainted with locomotion is likely to know the code.
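Under the stated conditions, elementary mechanics gives the shape of the curve: with constant deceleration a, the runner's speed v after covering distance s satisfies v² = v0² - 2as, and with constant step length the step rate is proportional to v. If each note is treated as one step, score position plays the role of distance. The sketch below derives inter-onset intervals for a final ritard from this relation; the final tempo ratio of 0.5 is a hypothetical default, and the code illustrates the analogy rather than reproducing the formula used by Kronman & Sundberg.

```python
import math
from typing import List


def stopping_tempo(x: float, final_ratio: float) -> float:
    """Relative tempo at normalized position x in [0, 1] for constant
    deceleration and constant step length: v(x)**2 = v0**2 + (v1**2 - v0**2) * x,
    with v0 = 1 and v1 = final_ratio."""
    return math.sqrt(1.0 + (final_ratio ** 2 - 1.0) * x)


def ritard_iois(nominal_ioi_ms: float, n_notes: int,
                final_ratio: float = 0.5) -> List[float]:
    """Inter-onset intervals for n_notes nominally equal notes, one note per
    'step', slowing from the nominal tempo to final_ratio times it.
    The tempo is evaluated at each note onset (a simplification)."""
    if n_notes < 2:
        return [nominal_ioi_ms] * n_notes
    return [
        nominal_ioi_ms / stopping_tempo(i / (n_notes - 1), final_ratio)
        for i in range(n_notes)
    ]


print([round(ioi) for ioi in ritard_iois(200.0, 8)])
```

The intervals grow slowly at first and faster toward the end, from 200 msec to 400 msec in this example, which is the characteristic square-root shape of a constant-deceleration stop.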

Outlook

We can discern two apparently very basic principles used in speech and music performance. One is the emphasis which is called for by the varying predictability. In speech, predictability would serve the purpose of making the message robust. In music performance, on the other hand, this may or may not be the purpose; while speech is often required to function also in noisy environments, music is likely to be performed in less disturbed situations. In any event, it seems likely that it is the cognitive system that asks for varying degrees of emphasis. Perhaps this system cannot digest long series of equally unexpected elements in communication.

Another basic principle common to speech and music performance is constituent marking. The parts that constitute blocks in the structure are marked in the acoustic realization, e.g., phrases and clauses in speech, and phrases and sub-phrases in music. This appears to reflect a requirement of the cognitive system, and is often referred to as the principle of grouping.
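As a hedged illustration of constituent marking, the Python sketch below applies a hypothetical two-level rule: the last note of each sub-phrase is lengthened by a small factor and the last note of each phrase by a larger one, so that a higher-level boundary receives a stronger acoustic mark. The function name and the factors 1.1 and 1.25 are assumptions chosen for illustration, not values reported in this paper.

```python
def mark_constituents(durations, subphrase_ends, phrase_ends,
                      sub_factor=1.1, phrase_factor=1.25):
    """Lengthen constituent-final notes (hypothetical two-level rule).

    durations:      nominal note durations, e.g. in seconds
    subphrase_ends: indices of notes that end a sub-phrase
    phrase_ends:    indices of notes that end a phrase; a phrase end is
                    also a sub-phrase end, so the larger factor is used
    """
    marked = list(durations)  # leave the nominal durations untouched
    for i in range(len(marked)):
        if i in phrase_ends:
            marked[i] *= phrase_factor
        elif i in subphrase_ends:
            marked[i] *= sub_factor
    return marked


if __name__ == "__main__":
    nominal = [0.5] * 8  # eight notes of equal nominal duration
    print(mark_constituents(nominal, subphrase_ends={1, 3, 5, 7}, phrase_ends={3, 7}))
```

Giving the phrase boundary precedence over the sub-phrase boundary is one simple way to let the size of the mark reflect the level of the grouping.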

Why all these numerous parallels, and what do they imply? The parallels are not astonishing. Both speech and music are examples of formalized inter-human communication by means of acoustic signals. Both must be devised for the same perceptive and cognitive systems. The limitations and capabilities of these systems must contribute importantly to the development of both speech and music.


Acknowledgments

The speech part of the work reported in this paper was supported by the Swedish Board for Technical Development (STU), Contract No. 84-3667, and the music performance part by the Bank of Sweden Tercentenary Foundation, Contract 84/171.

References

Bernstein, L. (1976): The Unanswered Question, The MIT Press, Cambridge MA.

Carlson, R., Erikson, Y., Granström, B., Lindblom, B., & Rapp, K. (1975): "Neutral and emphatic stress patterns in Swedish", pp. 209-218 in (G. Fant, ed.) Speech Communication, Vol. 2, Almqvist & Wiksell Int., Stockholm.

Carlson, R. & Granström, B. (1973): "Word accent, emphatic stress, and syntax in a synthesis by rule scheme for Swedish", STL-QPSR 2-3/1973, pp. 31-36.

Carlson, R. & Granström, B. (1975a): "A phonetically oriented programming language for rule description of speech", pp. 245-253 in (G. Fant, ed.) Speech Communication, Vol. 2, Almqvist & Wiksell, Stockholm.

Carlson, R. & Granström, B. (1975b): "Perception of segment duration", pp. 90-106 in (A. Cohen & S. Nooteboom, eds.) Structure and Process in Speech Perception, Springer Verlag, Heidelberg.

Carlson, R. & Granström, B. (1986): "Linguistic processing in the KTH multi-lingual text-to-speech system", pp. 2403-2406 in Conference Record, IEEE-ICASSP, Tokyo.

Carlson, R., Granström, B., & Hunnicutt, S. (1982): "A multi-language text-to-speech module", pp. 1604-1607 in Proc. ICASSP, Paris, Vol. 3.

Carlson, R., Granström, B., & Klatt, D.H. (1979): "Some notes on the perception of temporal patterns in speech", pp. 233-244 in (B. Lindblom & S. Öhman, eds.) Frontiers in Speech Communication Research, Academic Press, London.

Coker, C.H., Umeda, N., & Browman, C.P. (1973): "Automatic synthesis from text", IEEE Trans. Audio Electroacoust. AU-21, pp. 293-297.

Friberg, A., Sundberg, J., & Frydén, L. (1987): "How to terminate a phrase. An analysis-by-synthesis experiment on a perceptual aspect of music performance", pp. 49-55 in (A. Gabrielsson, ed.) Action and Perception in Rhythm and Music, Publ. issued by the Royal Swedish Academy of Music No. 55, Stockholm.

Fujisaki, H., Nakamura, K., & Imoto, T. (1975): "Auditory perception of duration of speech and non-speech stimuli", pp. 197-200 in (G. Fant & M. Tatham, eds.) Auditory Analysis and Perception of Speech, Academic Press, London.


van Noorden, L.P.A.S. (1975): Temporal Coherence in the Perception of Tone Sequences, diss., Techn. Univ. Eindhoven.

Sundberg, J. (1978): "Synthesis of singing", Svensk Tidskrift för Musikforskning (Swed. J. Musicology) 60:1, pp. 107-112.

Sundberg, J. (forthcoming): "Synthesis of singing using a computer controlled formant synthesizer", manuscript to be published by Music Department, Stanford University.

Sundberg, J. & Lindblom, B. (1976): "Generative theories in language and music descriptions", Cognition 4, pp. 99-122.

Sundberg, J. & Verrillo, V. (1980): "On the anatomy of the retard: A study of timing in music", J. Acoust. Soc. Am. 68, pp. 772-779.

Sundberg, J., Askenfelt, A., & Frydén, L. (1983): "Musical performance: A synthesis-by-rule approach", Computer Music J. 7, pp. 37-43.

Thompson, W.F., Friberg, A., Frydén, L., & Sundberg, J. (1986): "Evaluating rules for the synthetic performance of melodies", STL-QPSR 2-3/1986, pp. 27-44.

Todd, N. (1985): "A model of expressive timing in tonal music", Music Perception 3, pp. 33-57.

Winograd, T. (1968): "Linguistics and the computer analysis of tonal harmony", J. Music Theory 12, pp. 2-49.