Speech and music performance. Parallels and contrasts


Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report


Carlson, R. and Friberg, A. and Frydén, L. and Granström, B. and Sundberg, J.

journal: STL-QPSR volume: 28

number: 4 year: 1987 pages: 007-023

http://www.speech.kth.se/qpsr


STL-QPSR 4/1987

stress, emphasis etc. can all be signaled by one single parameter, such as the voice fundamental frequency. In the same way, the duration of speech sounds is affected by a variety of conditions including stress, position in the utterance, and local phonetic context. An extensive review of the factors that have been found to influence the duration of speech sounds can be found in a paper by Klatt (1976) and in special issues of Phonetica (1981; 1986). All of this is taken into account by the listener in the perceptual decoding process.

The same applies to music. There are many different reasons to lengthen or shorten a note beyond its nominal duration as specified in the score. Apparently, such perturbations of the nominal duration serve different purposes, e.g., emphasis, marking of phrase endings, and sharpening the contrast between categories. This results in rather complicated interactions, making it hard to use conventional analytic methods. As a consequence, analysis-by-synthesis is a powerful tool in both speech research and music performance research.

In recent years, the use of computers has led to great advances in both speech and music sciences. With the help of fast analysis tools, new knowledge has been gained and new models simulated. It is now possible to study the acoustic behavior of articulatory models or to compare duration models to natural recorded speech in data banks.

Speech recognition systems have attracted a lot of research money and some of it has been fruitfully invested in basic speech research. Text-to-speech programs have been productive in increasing the understanding of the speech communication process. With the help of models formulated as transformation rules, we have been able to test our current knowledge and to reject or accept ideas.

Similarly, our work with developing computer generated music performances, by means of note-to-tone programs, has been revealing as to basic aspects of music communication. In this article, we will analyze some similarities that we have observed in our parallel working with text-to-speech and note-to-tone programs.

Text-to-speech and Note-to-tone Programs

We have earlier reported on the long-term effort to develop high quality text-to-speech systems for several languages (Carlson & Granström, 1975a; 1986; Carlson, Granström, & Hunnicutt, 1982). The approach taken has been to formulate the process in a coherent framework. The criterion was that linguists involved in creating, refining, and maintaining the text-to-speech software should be able to work with constructs and conventions familiar to them without necessarily mastering conventional computer programming. Consequently, distinctive features and phonemes are primes in our system. Also, the rule notation borrows heavily on that used in generative phonology, although it is expanded to easily handle continuous variables such as synthesizer


straightforward making the intermediate level almost unnecessary. In other languages, like English, the relation is quite complicated and not entirely according to a set of conventions/rules. In this case, the speaker or synthesis program needs to rely on lexical information.

In a sense, an equivalent intermediate level of description exists also in music, viz., when the player or the composer complements the score by a great number of additional signs, such as dots, dashes, wedges, slurs, etc. This type of score has not formally been distinguished from the orthographic representation as clearly as in the case of speech. Also, this intermediate level seems more needed in speech. This can be concluded from the fact that speech produced by the concatenation of sounds directly corresponding to the letters not only sounds extremely unnatural, but is also even practically impossible to understand; music produced from a nominal realization of the note signs, on the other hand, is still recognizable, even though very boring to listen to.

Quantization

Although both music and language can be represented by graphical signs that in some way can be regarded as symbols for the corresponding acoustic signals, the relationship between this graphical representation and the sound signals differs in one important respect.

In the music score, pitch and duration are represented by symbols according to a system of quantized categories. For instance, a quarter note is nominally twice as long as an eighth note, and a C is almost 6% lower in fundamental frequency than a C sharp.
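The two quantized relations in this example can be checked with a little arithmetic. The Python sketch below assumes twelve-tone equal temperament, where one semitone corresponds to a frequency ratio of 2^(1/12); the text does not name a tuning system, so this is an assumption, and the names `NOMINAL`, `duration_ratio`, and `percent_lower` are introduced here only for illustration.

```python
from fractions import Fraction

# Nominal note values as fractions of a whole note (score-level categories).
NOMINAL = {
    "whole": Fraction(1),
    "half": Fraction(1, 2),
    "quarter": Fraction(1, 4),
    "eighth": Fraction(1, 8),
}

# Assumption: equal temperament, one semitone = frequency ratio 2**(1/12).
SEMITONE_RATIO = 2 ** (1 / 12)


def duration_ratio(longer, shorter):
    """Ratio between the nominal durations of two note values."""
    return NOMINAL[longer] / NOMINAL[shorter]


def percent_lower(semitones):
    """Percent by which a tone lies below the tone `semitones` above it."""
    return (1 - SEMITONE_RATIO ** (-semitones)) * 100


print(duration_ratio("quarter", "eighth"))   # 2
print(round(percent_lower(1), 2))            # 5.61
```

Under this assumption a C lies about 5.6% below C sharp, which agrees with the "almost 6%" stated in the text.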

In orthography, on the other hand, neither pitch nor duration is specified. In most languages it is rather the time derivative of pitch that is predictable from the orthography, such as in cases of question, quotation, accent, etc., or from the linguistic content, such as in cases of focus. In the so-called tone languages, pitch is predictable within non-quantized categories, such as high, middle, low, rising, and falling. Duration can sometimes be predicted qualitatively from orthography, such as "long" and "short" vowels and consonants. However, prediction of phonetic transcription sometimes fails so that a lexicon is needed in order to arrive at correct word pronunciation.

Parallels

Role of the author/composer

The fact that we can formulate rule systems that generate intelligible speech and music performance of a decent musical quality apparently means that the acoustic realization is implied in the orthography and the music score. This suggests that the author limits the number of possible acoustic realizations of his text in a similar way as the composer limits the possible acoustic realizations of his score. This is probably an essential requirement on a useful symbol system, such as the orthography and the music score. This similarity indicates that, in this regard, similar processes underlie speech and music performance. However, the final acoustic realization is not of the same prime concern for authors as for composers. This is also why writing systems can have a more vague relation to the acoustic realization.

Stress and emphasis

An important premise of the comparisons carried out in the present article is the difference between emphasis and stress in speech. While emphasis is content-dependent, the distinction between stressed and unstressed is a word level phenomenon. This means that emphasis and stress exist at different levels in speech, stress being at a lower level. It seems that an equivalent distinction can be made in music, in that stress is a property that is dependent on the position in the bar, while emphasis is rather dependent on higher level aspects of the musical structure.

Communicative purposes

A: Predictability and emphasis

In both speech and music, most acoustic events are more or less predictable. For instance, we are very skilled in filling gaps in the acoustic information occurring because of interference with noise. Even if a door is banging in the middle of somebody's speech, we can mostly hear what the person is saying. It is often even hard to tell which speech sound was masked by the noise. Some classical perceptual studies have shown that the perceived location in time of a disturbing sound is judged to be a place in the conversation where it does as little harm as possible, i.e., close to a syntactic break.

However, the meaning of an utterance does not always survive a door bang. In some places, the sentence is vulnerable, and if information is lost in such places, the meaning of the sentence cannot be restored. From this, we can conclude that predictability varies along a spoken utterance, and where predictability is high, the information flow is low, and vice versa.
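The inverse relation between predictability and information flow can be illustrated with Shannon's information content, -log2(p). This measure is our choice for the sketch and is not named in the text; the word probabilities below are hypothetical and serve only to show how a vulnerable place in an utterance would stand out.

```python
import math


def information_bits(probability):
    """Shannon information content, in bits, of an event with this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


# Hypothetical predictabilities of four successive words in an utterance.
word_probabilities = [0.9, 0.5, 0.05, 0.8]

# High predictability gives low information, and vice versa.
information_profile = [information_bits(p) for p in word_probabilities]

# The least predictable word is where a door bang would do most harm.
most_vulnerable = information_profile.index(max(information_profile))
print(most_vulnerable)  # 2
```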

A similar reasoning seems applicable to music. Mostly we can complement a melodic line correctly, even if one note is missing. For instance, deleting a passing-note would be completely harmless. On the other hand, there are also more important notes in a melody. In the second theme from the first movement of Schubert's B minor Symphony, D 759, there is a modulation from D major to B major, see Fig. 1. The modulation is announced by a D sharp. If this note is cut out, the harmonic interpretation of the following notes will be affected. Thus, this D sharp seems to be an indispensable note for the melody. It seems obvious that such important notes are in a sense indispensable and have a low predictability, and it can be assumed that predictability is dependent on the inverse of the information rate.

Fig. 1. Second theme from first movement of Schubert's Symphony in B minor, D 759. The top line of numbers symbolizes the harmonics in terms of the distance, in semitones, between the root of the chord and the root of the tonic.

The parallel occurrence of a time-varying predictability in both speech and music is by no means trivial. Its existence in both suggests that, possibly, it represents a way of meeting an essential limitation of the perceptive system. For instance, this system may be incapable of processing signals having an invariably high information rate.

Predictability of words has been studied by several authors. In a now classical study by Lieberman (1963), the relationship between context redundancy and keyword intelligibility was studied. It was shown that predictability had a strong effect on how clearly a word was pronounced. This experiment was later repeated by Hunnicutt (1985, 1987a). In a text-to-speech system, Coker, Umeda, & Browman (1973) included factors such as word frequency and repetition of earlier mentioned words. These parameters added to the naturalness of the speech quality. Similar ideas about predictability are included in modern communication aids for the handicapped. It can be shown that a prediction program working only on the surface structure can predict at least 50% of the typed letters in a running text (Hunnicutt, 1987b).

Predictability in speech is present at many different levels such as phoneme sequence, choice of word endings, and even syntactic constructs. At a structural level, we know that certain phrase or word combinations are very probable. It is even likely that certain word sequences are lexicalized just like certain single words, and that we perceive these phrases as single units.

The varying predictability in a sentence is significant to its acoustical realization, i.e., to speech. In order to make speech easy to understand and natural sounding, it is necessary to emphasize important, or less predictable, words and to deemphasize unimportant, predictable elements.

The varying predictability is significant also to the quality of music performance. In our rule system for music performance, we have introduced the notion of melodic charge in order to take into account


1970). In the music performance program, phrase and sub-phrase endings are marked in the input notation. Then, the final notes of these two constituents are marked in the performance. This marking seems to be an essential requirement on a music performance (Thompson & al., 1986). Measurements on music performance support the same assumption and also that constituent marking takes place at different levels in the hierarchy (Todd, 1985).

Phrase marking is an instance of the general principle of marking constituents in a structure. This principle is often referred to as grouping, which seems essential in any type of communication. It seems that constituents at many different levels are marked. For instance, word boundaries are marked not only in speech, as mentioned above, but also in orthography with a space. Also, in understanding speech of a foreign language, a major step is to detect the boundaries between words.

The fact that constituent marking appears not only in speech and music performance but also in the orthography and the music score suggests that the marking of constituents is a paramount demand in many kinds of inter-human communication.

Choice of acoustic code

As both speech and music performance use acoustic signals for communication, it is interesting to compare the codes used. If speech and music performance use similar codes, the understanding of music requires a competence which partly is the same as that required for understanding speech. Thus, comparing the codes will shed some light on the basic requirements for understanding music.

Emphasis

As mentioned earlier, the main correlates of emphasis in speech are relatively greater pitch changes, increased durations and, to some extent, greater vocal effort, as was illustrated in Figs. 3 and 4. By and large, as pitch and duration in music are decided upon by the composer, much less leeway is left to the performer than to a speaker. However, while our sensitivity to differences in the duration of notes presented in isolation is modest, we can detect perturbations as small as 10 msec in the duration of a note appearing in a sequence of notes of similar duration (van Noorden, 1975). Thus, by arranging sequences of notes of similar duration, the composer seems to offer the player the possibility to communicate emphasis in terms of lengthening and shortening of notes.

Similar observations have been made with regard to speech. The lower boundary for perceiving durational differences has been found to be of the order of 10 msec (Huggins, 1972; Klatt & Cooper, 1975). The just noticeable difference has been shown to vary with the type of sound and its phonetic context (Fujisaki, Nakamura, & Imoto, 1975; Carlson & Granström 1975b).

Fig. 3. Fundamental frequency differences between emphatic productions and the averaged neutral production of the Swedish sentence "Uno belånade gården i Boden" (Uno mortgaged the farm in Boden). The different curves pertain to emphasis on the four main words.

Fig. 4. Durational differences between segments in emphatic and neutral utterances. The solid curve indicates the difference between segments in nonemphasized words of emphatic utterances and neutral production. The other curves pertain to the increase in duration of the words when pronounced emphatically.

Musicians seem to take advantage of this opportunity. As was mentioned above, melodic charge is a property of a note that reflects the remarkableness of the note. Similarly, the harmonic charge seems to reflect the remarkableness of a chord. Remarkableness seems to call for emphasis in the performance.

According to the performance rules, a high melodic charge is marked by increases in sound level, vibrato extent, and duration. The method is straightforward; the increment of sound level, vibrato extent, and duration is calculated as a constant times the melodic charge of the individual note. The result is that notes with a high melodic charge sound emphasized. Similarly, increases in harmonic charge generate crescendos, and the associated increments in sound level are used for calculating the increases in duration and vibrato extent. According to formal listening experiments with musically trained subjects, the musical quality of a performance is raised if melodic charge is marked in this way (Thompson & al., 1986).
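The melodic charge rule, an increment equal to a constant times the charge, can be sketched in a few lines of Python. The text gives neither the constants nor the units, so the values of `K_DURATION_MS`, `K_LEVEL_DB`, and `K_VIBRATO_CENTS` below are hypothetical, as are the `Note` type and its field names; this is an illustration of the proportional form of the rule, not the program's actual implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    duration_ms: float      # nominal duration
    level_db: float         # sound level
    vibrato_cents: float    # vibrato extent
    melodic_charge: float   # "remarkableness" of the note


# Hypothetical scaling constants; the source does not state their values.
K_DURATION_MS = 2.0
K_LEVEL_DB = 0.2
K_VIBRATO_CENTS = 1.0


def apply_melodic_charge(note):
    """Return a copy of the note with increments proportional to its charge."""
    charge = note.melodic_charge
    return replace(
        note,
        duration_ms=note.duration_ms + K_DURATION_MS * charge,
        level_db=note.level_db + K_LEVEL_DB * charge,
        vibrato_cents=note.vibrato_cents + K_VIBRATO_CENTS * charge,
    )


plain = Note(duration_ms=500.0, level_db=60.0, vibrato_cents=10.0, melodic_charge=5.0)
emphasized = apply_melodic_charge(plain)
```

A note with zero melodic charge passes through unchanged, so the rule leaves highly predictable notes at their nominal values.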

These examples seem to indicate that emphasis is signaled by adding duration resulting in a slowing of the tempo. The similarity with speech is obvious. Also, this slowing down of the tempo seems perceptually adequate; the listener is given more time to process the unexpected information.

Melodic charge and increases in harmonic charge are also reflected in the vibrato extent, as mentioned. Here, the parallel with speech is less obvious, but the following speculation is tempting. The perceptive system seems very sensitive to changes, e.g., in pitch, and one emphasis marker in speech is pitch change. Vibrato actually increases the rate of change of fundamental frequency, though without changing the mean perceived pitch. In this perspective, vibrato could be seen as an elegant way of exploiting pitch change for expressive purposes without changing the melodic patterns.

Constituent marking

In speech as in music performance, it is necessary to mark structural constituents at different levels. In speech, the most apparent example is the phrase and clause ending, which is signaled by a lengthening of the last syllable or syllables. This way of announcing the ending of a constituent is common to most languages (Lindblom, 1979). Actually, final lengthening is so important for both Swedish and English that speech synthesized without such a rule is perceived as accelerating at the end of each clause.

In speech, constituents of many sizes, from paragraphs to words, are marked, not only with duration but also with other parameters like intonation and vocal source settings. Also, micro-pauses are introduced at major syntactic breaks even if there is no need for breathing, e.g., after words followed by a period, a comma, or a semicolon. Durational data on these effects have been reported for several languages and are also formulated into coherent rule systems.

Prosodic models have an obvious importance in the general description of languages and find application in text-to-speech systems. Several of these models have been tested by Carlson, Granström, & Klatt (1979), and the more elaborate models give advantages both in naturalness and intelligibility.

One durational description of Swedish is historically based on a tree structure of a sentence with phrase boundaries and syllables on separate branches. Support for this model was found in reiterant speech (Lindblom & Rapp, 1973; Carlson & Granström, 1973). The model had a cyclic final lengthening rule whose effect increased with the perceptual importance of the boundary. A similar model has been found extremely productive in describing timing data from piano music performance (Todd, 1983).

Another type of prosodic model for speech is based on a general structure proposed by Klatt (1979). The rules have as input the inherent duration, which is the typical duration of the phoneme in a word-initial position before a stressed vowel. The second parameter is the minimal duration, which is a measure of the phoneme's compressibility. Finally, a correction factor is used to calculate the duration. This factor is set depending on local and global parameters. This model has proven to be a good model to describe duration effects in running speech. The experiences from the music performance program reveal that a similar model, including restrictions on compressibility, would be productive also in music performance.
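The three quantities named above can be combined as in the following Python sketch. The text does not spell out the formula; the form used here, in which the correction factor scales only the compressible part of the segment (inherent minus minimal duration), is the one commonly associated with Klatt's model and should be read as an assumption. The function names and the example values are hypothetical.

```python
def klatt_duration(inherent_ms, minimal_ms, percent):
    """Segment duration: minimal part plus a scaled compressible part.

    `percent` is the correction factor expressed in percent; 100 yields the
    inherent duration and 0 yields the minimal duration.
    """
    if minimal_ms > inherent_ms:
        raise ValueError("minimal duration cannot exceed inherent duration")
    return minimal_ms + (inherent_ms - minimal_ms) * percent / 100.0


def combined_percent(factors):
    """Combine successive rule factors multiplicatively, starting at 100%."""
    percent = 100.0
    for factor in factors:
        percent *= factor
    return percent


# Hypothetical phoneme: 100 ms inherent, 40 ms minimal duration.
# One rule shortens (factor 0.5), another lengthens (factor 1.4).
duration = klatt_duration(100.0, 40.0, combined_percent([0.5, 1.4]))
```

Because the minimal duration is never scaled, no combination of shortening rules can compress the segment below it, which is the restriction on compressibility mentioned in the text.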

There seems to be a contradiction between these two models for describing duration phenomena. The first model is probably more suited for well prepared reading of text with a high amount of speech pre-planning while the other is typical of less planned speech with rules of a more local nature.

According to Todd (1985), the ending of a phrase is often played with a small ritard while the beginning is played with a small accelerando. It is well known that the last notes of a piece often are played with a final ritard (Sundberg & Verrillo, 1980). In the music performance program, the last note of a phrase is lengthened by 40 msec and terminated by a micro-pause. A sub-phrase termination is marked by a micro-pause only. In addition, there are a number of other rules that shorten and lengthen notes depending on the context, and these rules sometimes seem to serve the purpose of constituent marking. For instance, in combination with the marker of phrase ending, they actually sometimes generate small ritards at phrase endings.
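The phrase and sub-phrase markers described above can be sketched as a small Python function. The 40 msec lengthening is taken from the text; the micro-pause length `MICRO_PAUSE_MS` is a hypothetical value, and treating the pause as silence added after the note (rather than taken from the note's own duration) is likewise an assumption made only for this illustration.

```python
PHRASE_LENGTHENING_MS = 40.0   # stated in the text
MICRO_PAUSE_MS = 80.0          # hypothetical; the text gives no value


def mark_constituents(notes):
    """Apply phrase and sub-phrase markers to a list of notes.

    Each input note is (duration_ms, boundary), where boundary is None,
    "sub" (last note of a sub-phrase) or "phrase" (last note of a phrase).
    Returns a list of (sounding_ms, pause_ms) pairs.
    """
    performed = []
    for duration_ms, boundary in notes:
        pause_ms = 0.0
        if boundary == "phrase":
            duration_ms += PHRASE_LENGTHENING_MS  # lengthen the final note
            pause_ms = MICRO_PAUSE_MS             # and end with a micro-pause
        elif boundary == "sub":
            pause_ms = MICRO_PAUSE_MS             # micro-pause only
        performed.append((duration_ms, pause_ms))
    return performed


melody = [(400.0, None), (400.0, "sub"), (400.0, None), (400.0, "phrase")]
performance = mark_constituents(melody)
```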


Why is the code for marking structural constituents similar in speech and music performance? A tempting hypothesis is that the code in music is imported from speech in this regard; as all music listeners have acquired a competence in decoding speech, it would be safe to use the same code in music performance. However, some languages, e.g., Danish, do not use final lengthening, and, yet, musicians from these countries are obviously quite as competent musicians as their colleagues from other countries. This shows that the code used in music performance may not be borrowed from speech, but might lean on other kinds of common experience.

As far as the final ritard is concerned, there is a striking similarity with the decreasing rate of footsteps in a stopping runner who keeps the step length and the braking force constant throughout the stopping process (Kronman & Sundberg, 1987). Under these conditions, the slowing down of the footsteps follows the same curve as the average ritard in motor music from the baroque era. Thus, the final lengthening seems to allude to a well-known experience, namely that of stopping locomotion. We may speculate that also the final lengthenings in phrase endings are faint allusions to locomotion. If so, the code would be very robust in the sense that anybody acquainted with locomotion is likely to know the code.
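The stopping-runner comparison can be made concrete with elementary kinematics. A constant braking force implies a constant deceleration a, so the speed after a distance x is v(x) = sqrt(v0^2 - 2ax), and the time to reach x is (v0 - v(x)) / a. With a constant step length, footsteps land at equal distances but at growing time intervals. The Python sketch below only illustrates this reasoning; the parameter values are hypothetical and it does not reproduce the cited study's fit to performance data.

```python
import math


def footstep_times(initial_speed, deceleration, step_length):
    """Times (s) at which footsteps land while braking with constant force.

    Steps are taken every `step_length` metres until the runner has stopped.
    """
    stopping_distance = initial_speed ** 2 / (2 * deceleration)
    times = []
    step = 1
    while step * step_length <= stopping_distance:
        speed_sq = initial_speed ** 2 - 2 * deceleration * step * step_length
        speed = math.sqrt(max(speed_sq, 0.0))
        times.append((initial_speed - speed) / deceleration)
        step += 1
    return times


def inter_step_intervals(times):
    """Durations between successive footsteps, starting from time zero."""
    intervals = []
    previous = 0.0
    for t in times:
        intervals.append(t - previous)
        previous = t
    return intervals


# Hypothetical runner: 4 m/s initial speed, 2 m/s^2 deceleration, 1 m steps.
times = footstep_times(4.0, 2.0, 1.0)
intervals = inter_step_intervals(times)
```

For these values the intervals grow from about 0.27 s to 1.0 s over four steps, so the footstep rate falls ever faster toward the stop, in the manner of a final ritard.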

Outlook

We can discern two apparently very basic principles used in speech and music performance. One is the emphasis which is called for by the varying predictability. In speech, predictability would serve the purpose of making the message robust. In music performance, on the other hand, this may or may not be the purpose; while speech is often required to function also in noisy environments, music is likely to be performed in less disturbed situations. In any event, it seems likely that it is the cognitive system that asks for varying degrees of emphasis. Perhaps, this system cannot digest long series of equally unexpected elements in communication.

Another common basic principle in speech and music performance is constituent marking. The parts that constitute blocks in the structure are marked in the acoustic realization, e.g., phrases and clauses in speech and phrases and sub-phrases in music. This appears to reflect a requirement of the cognitive system, and is often referred to as the principle of grouping.

Why all these numerous parallels, what do they imply? The parallels are not astonishing. Both speech and music are examples of formalized inter-human communication by means of acoustic signals. Both must be devised for the same perceptive and cognitive systems. The limitations and capabilities of these systems must contribute importantly to the development of both speech and music.


Acknowledgments

The speech part of the work reported in this paper was supported by The Swedish Board for Technical Development (STU), Contract No. 84-3667, and the music performance part by the Bank of Sweden Tercentenary Foundation, Contract 84/171.

References

Bernstein, L. (1976): The Unanswered Question, The MIT Press, Cambridge, MA.

Carlson, R., Erikson, Y., Granström, B., Lindblom, B., & Rapp, K. (1975): "Neutral and emphatic stress patterns in Swedish", pp. 209-218 in (G. Fant, ed.) Speech Communication, Vol. 2, Almqvist & Wiksell Int., Stockholm.

Carlson, R. & Granström, B. (1973): "Word accent, emphatic stress, and syntax in a synthesis by rule scheme for Swedish", STL-QPSR 2-3/1973, pp. 31-36.

Carlson, R. & Granström, B. (1975a): "A phonetically oriented programming language for rule description of speech", pp. 245-253 in (G. Fant, ed.) Speech Communication, Vol. 2, Almqvist & Wiksell, Stockholm.

Carlson, R. & Granström, B. (1975b): "Perception of segment duration", pp. 90-106 in (A. Cohen & S. Nooteboom, eds.) Structure and Process in Speech Perception, Springer Verlag, Heidelberg.

Carlson, R. & Granström, B. (1986): "Linguistic processing in the KTH multi-lingual text-to-speech system", pp. 2403-2406 in Conference Record, IEEE-ICASSP, Tokyo.

Carlson, R., Granström, B., & Hunnicutt, S. (1982): "A multi-language text-to-speech module", pp. 1604-1607 in Proc. ICASSP, Paris, Vol. 3.

Carlson, R., Granström, B., & Klatt, D.H. (1979): "Some notes on the perception of temporal patterns in speech", pp. 233-244 in (B. Lindblom & S. Öhman, eds.) Frontiers in Speech Communication Research, Academic Press, London.

Coker, C.H., Umeda, N., & Browman, C.P. (1973): "Automatic synthesis from text", IEEE Trans. Audio Electroacoust. AU-21, pp. 293-297.

Friberg, A., Sundberg, J., & Frydén, L. (1987): "How to terminate a phrase. An analysis-by-synthesis experiment on a perceptual aspect of music performance", pp. 49-55 in (A. Gabrielsson, ed.) Action and Perception in Rhythm and Music, Publ. issued by the Royal Swedish Academy of Music No. 55, Stockholm.

Fujisaki, H., Nakamura, K., & Imoto, T. (1975): "Auditory perception of duration of speech and non-speech stimuli", pp. 197-200 in (G. Fant & M. Tatham, eds.) Auditory Analysis and Perception of Speech, Academic Press, London.


van Noorden, L.P.A.S. (1975): Temporal Coherence in the Perception of Tone Sequences, diss., Techn. Univ. Eindhoven.

Sundberg, J. (1978): "Synthesis of singing", Svensk Tidskrift för Musikforskning (Swed. J. Musicology) 60:1, pp. 107-112.

Sundberg, J. (forthcoming): "Synthesis of singing using a computer controlled formant synthesizer", manuscript to be published by Music Department, Stanford University.

Sundberg, J. & Lindblom, B. (1976): "Generative theories in language and music descriptions", Cognition 4, pp. 99-122.

Sundberg, J. & Verrillo, V. (1980): "On the anatomy of the retard: A study of timing in music", J.Acoust.Soc.Am. 68, pp. 772-779.

Sundberg, J., Askenfelt, A., & Frydén, L. (1983): "Musical performance: A synthesis-by-rule approach", Computer Music J. 7, pp. 37-43.

Thompson, W.F., Friberg, A., Frydén, L., & Sundberg, J. (1986): "Evaluating rules for the synthetic performance of melodies", STL-QPSR 2-3/1986, pp. 27-44.

Todd, N. (1985): "A model of expressive timing in tonal music", Music Perception 3, pp. 33-57.

Winograd, T. (1968): "Linguistics and the computer analysis of tonal harmony", J. Music Theory 12, pp. 2-49.
