Transfer Learning in Deep
Structured Semantic Models for
Information Retrieval
SAHAND ZARRINKOUB
KTH ROYAL INSTITUTE OF TECHNOLOGY
Rece a ache IR c de e a e ha ge e a e e a d d c e ec e e e a . The e e e a a e ed a he ba f d c e e e a a d a e ab e e c de e a c fea e f a ed a ge da a e , a ab ha e he a a f c a ca IR a ache ch a TF-IDF. H e e , he da a e ece a a he e e a e a a ab e he e f ea ch e ce ed da , ce he a e ed b e gh e . Th , e h d f e ab g he e f e a IR de da a- e e a e f e e . I h , a bag- f- g a e a IR a ch ec e ed a a fe ea g ced e a a e c ea e e f a ce a a ge da a e b e-a g e e a da a e . The a ge da a e ed W QA, a d he e e a da a e a e Q a Q e Pa , Re e RCV1 a d SQ AD. Whe c de g d d a de e f a ce, e- a g Q e Pa a d f e- g W QA g e he be d d a de . H e e , he c de g a e age e f a ce, e- a g he ch e e e a da a e e e e f a ce he a ge da a e , b h he a da a e a e ed ge he a d he he a e ed d d a , h d ffe e a e age e f a ce de e d g he e e a da a e ed. O a e age, e- a g RCV1 a d Q e Pa g e he e a d h ghe a e age e f a ce e ec e , he c de g he e- a ed e . S g , he e f a ce f a a ed, a d ge e a ed e h gh, a d bea he e f a ce f a e- a ed e a e age. The be e f g de a e age a e a IR de a ed he a ge da a e h
e- a g.
K
D g, I f a g, Ma g, S e g , Ne a a
I d e e e d dee e g a de he e e a B a c G bH, f g g e he e he a he c a a d e ab g a
M ch h gh he f a c a . I a a e g a ef f he g da ce f e , Ma N dah , h ded a ab e feedbac a d
I
1.1 B
I he de d, e ha e acce a a f f a . The e e , b a e a d h e b , a e a ace he e a ge a f da a a e a a ab e f b c e. H e e , ha g acce f a d e g a a ee he ab f d ha e g f . I fac , he a ge a f a a ab e f a ca a e e d ff c f d ha e eed . I f a Re e a (IR) he f e d f c e ce c ce ed h g h b e . I IR e ee a f a f a eed h gh he e f a da aba e f f a . Of e h d e b e g he e de a e , h ch ce ed b a IR e h ch he e e e d c e dee ed e e a .e he a d e ab e a be e ea ch e e e ce f he e .
1.6 S
S be e a . [37] ea e he e e g c a d ca b f ge e a ed b a g a be f a NLP de . The a h ha ea NLP de c d be a ed e a c e , b h he c ea g c e f he de ed h he f e d, he e e e f c a a e ce ha e a e c ea ed. I e ed ha g Ne a A ch ec e Sea ch [36] f d a a ch ec e ha e f he e a ge T a f e a ch ec e ed [40] d ce a ca b f e a e he fe e f 5 ca , c d g f e . O he NLP de e a ed [37] a ha e g f ca ca b f . The b e h a ge a f ca b e a be dec Dica be def ed a Di = [di1, di2, ..., dit] (2.1) he e t he be f e he d c a a d dij he e gh f he j- h e f d c e i, de e ed b he e gh g che e ed. T ge he a be ee d c e , a a ea e, def ed he t-d e a ec ace, ca be ed be ee he c e d g ec . A c ed a ea e he c e a . A e - e, he e e f ca be ec ed h ec ace a d ha e a a d c e he c ec ea ed. The a be ee he e a d a g e d c e he e e a ce c e f ha d c e . W h a g d e gh g che e, d c e ha a e e be e e a f he a e e e d c e each he he ec ace. D ffe e e gh g che e a e a a ab e, b a e c a d e - e f g e he TF-IDF e gh g che e. TF-IDF I he TF-IDF e gh g che e [34], he e gh g e b a e f e e c (TF) e a d a e e d c e f e e c (IDF) e . The TF e c ea e h he be f e he e a ea he g e d c e , h e he IDF e dec ea e h he be f d c e he c ec ha he e a ea . Th , f a e k h ch e e fe d c e cc a e d c e i, be g e a h gh e gh he d c e ec (2.1). Whe g a e ha c a he e k, he c e a be h gh be ee he e a d d c e i, ce a h gh e gh a a g ed he k- h e e e f he d c e ec e e e a Di. The TF-IDF c e S f a e k a d c e i ca be def ed a
de - ed e ce ea ch e g e [10].
2.1.3 I
I
A a a e ab e c B ea a d a ed e e a b d a e ed de [3]. A e ed de c f a d c a f e h ch a ea he c ec ( efe ed a he cab a ), a d a f d c e ( efe ed a a g ) f each d he cab a . A g c a efe e ce he d c e ha c a a g e d. B ee g a e ed de f he d c e F g e 2.1.1: A e ed de c ec , B ea e e ca be a e ed ch fa e ha b e f g a e g a ch g. U ece g a ea ch e , he e ca e f fa ea ch a ge d c e c ec h gh he e f ha h ab e - . N B ea ea ch be ef f de g he d c e c ec a a e ed de . F a ed e e a de ch a TF-IDF a d BM25, a e ed de a e ab e fa ea ch, e ec a he e ched h e f e e c f a f each d c e he g . D g de g, c d h ch a ea a d c e b a e e ab he c e f a d c e a e f e g ed. The e d a e efe ed a stop words a d a c de he c d he ed a g age. E a e f c d he E g h a g age a e , he a d
a d .
2.2 C
S
M
a d c e B ea ea ch, he TF-IDF ec de a d BM25, a e ha e c e e he e h d e ed ab e c d a ch a e a d d c e ha ha e e a c ea g h ha g a e d . F e a e, a d c e H e he S ed h Ca a ca be e ed f a e a S c h . A c h b e he c ea f , h ch a e ed e a d e e h f he e d [41]. H e e , he e be a a c ea ed a d a a ed b ed f a e f a ce, ce g a he a ha a da ag g effec a e age [12]. Th b e ca f e h d ha e ab e he e e a f d c e ba ed he cc e ce f e a c e e ead f e e e d . D ffe e e h d ha e bee ed f h
F g e 2.2.1: The a ch ec e f he Se a c Ha h g de ed b [33] a cha ge he ac a a d ead e e e d c e h e a ge e a a e , ea g ha he a e f he de g a a e c e e he 0 1. Af e f e- g he e , he de f he 128- de a e a e h e h ded d ce he b a ha he f each d c e . S ce he c de a e a ead c e b a beca e f he Ga a e ed d g f e- g, e f a d g h e h d g. Sa a h d & H [33] e a a e he SH de b e g he d c e c ec h d c e . T de e e h ch d c e a e e e a a g e e d c e , he d c e c abe , h ch a e e e f a d c e he c ec , a e ed. A d c e ha ha e e gh c abe h he e d c e a e c de ed e e a . T e he c ec h a d c e g he 128-b c de d ced b he e , Sa a h d & H [33] e he be f d ffe g b , a he Ha g d a ce, a a e e ea e f a be ee he e d c e a d he he d c e . D c e h a a Ha g d a ce be ee he
F g e 2.2.2: P e- a g he Se a c Ha h g e a a Re c ed B a Mach e. T a g d e f he b a e (2000 de ) .
he SH de a d he g TF-IDF e- a he .
2.3 N
I
R
F g e 2.2.3: The a e c de a ch ec e ed f e e he e gh f he SH de .
Se a c M de (DSSM), a dee de a ch ec e h ch a ed e d- -e d a e ed fa h .
I eb ea ch, he be f e d ac a d c e e a ge d e he a a f da a a a ab e he eb. Th e a a ge cab a . Sa a h d & H [33] e he cab a agg e e a e a g fea b e, ee g he 2000 c d he da a e . H e e , g he cab a ead b- a e f a ce d e g f ca d be g e ed a d h e e e g he ab f he e e e e he
We h a a e (2.4) ac a e - e e a d c e a . Th e a e g he ega e g- e h d: L(⇤) = g Y (Q,D+) P (D+|Q) (2.5) he e D+ he c c ed d c e f e Q. The ega e g- e h d ed g cha c g ad e de ce , he e he g ad e f he g- e h d ge e a ed f a a d e ec ed ba ch f he da a ca c a ed h e ec he e gh {Wi : i 2 [2...N]} a d a cha ge he e gh a e ade he d ec f he ega e g ad e . Th gh g (2.5), he de a ed ge e a e e a d d c e e e e a yQa d yD h a h gh c e a f e e a d d c e ha a e e e a each he . The a ch ec e f he DSSM ed b H a g e a . [14] a ed F g e 2.3.1. F g e 2.3.1: The a ch ec e f he DSSM ed b H a g e a . [14].
2.4 T
L
T a fe ea g [29] c ce ed h a fe g ea ed edge be ee d ffe e d a . The e e he edge f e d a dec e c a g. A e- - a a ch ec e ed, he e e e c de ed a d a ed a d ffe e a h d ffe e dec de a ached , bef e be g a ached a f a dec de f he a ge d a a . The e- - a a ch ec e de c bed F g e 2.4.1. D g e- a g, he e ec f he a
e f a a a e e da e f a each e a d e h gh f a g be ee he e- a g a . The a g ced e de c bed A g h 1. The e c de de ed b S b a a a e a . [38] a b -d ec a Ga ed Rec e U [5], h ch a a a f he LSTM-ba ed [13] e a e ha e e fe e
c a .
F g e 2.4.1: The - a e c de -dec de a ch ec e ed b S b a a a e a . [38]. D efe dec de . A a d e ec ed dec de a ed ge he h he e c de E a each a g e a .
A 1 M - a a g f e e ce e bedd g
R : k a h he a e ce a g age, a e c de E ha ha ed ac a a a d k a - ec f c dec de D1, ..., Dk. Le ✓ de e he a a e e f a
e c de a d dec de , ↵ a bab ec [p1, ..., pk] def g he bab f
a g each a ch ha Pk
i pi = 1, (k, ↵) he ca eg ca d b h
k ca eg e h a g bab e def ed b ↵, P1, ...Pk da a e f each a ,
2.5 E
Whe e a a g IR e , ef ha e a e c dge e f a ce aga . Be f a b ef de c f e e a a e c ha ca be ed f h e.
2.5.1 A
A e e e c f e a a g IR e he acc ac , h ch e de e he e a . F a e f ea ch e e Q, acc ac def ed a he be f ea ch e e f h ch he e e e a . F a , = P|Q| j=1relj(dj) |Q| (2.6) he e relj(d) = 1 f d e e a e j a d 0 he e, a d dj he e fM
T a e he e ea ch e e ad he DSSM f H a g e a . [14] a d a d ffe e a g e a de a d g f he effec f a fe ea g e f a ce. The DSSM a ed acc d g e d ffe e ced e ha e d ffe e da a e , d c g e d ffe e a ed DSSM . The
h he h g fac (2.4) e 1. The a ch ec e a ed F g e 3.1.1.
F g e 3.1.1: The e a ch ec e f he DSSM ed.
The ada ed DSSM e e ed Te f , g a e e a e b Baha e Fa e b hed a h ://g h b.c /baha efa e /DSSM h ch a ada ed a d d f ed f he e f h . The c de f ed h c ed P h da a ade f he d ffe e da a e ed h . The da a ade a e e b e f ead g he e- ce ed da a e a d feed g he he de . The da a ade a e f he a g f e e a d c e d g he a g f
he DSSM.
3.2 T
L
Th e e a a eg f ch g h ch f he da a e e e -d c e a f a each e a f a g. S b a a a e a . [38] a e f be ee
he ce d a da a e a each e a , de e e h ch f he e. I h , a d ffe e a ach ed g a a ee ha each - a
ee ce d g each e ch, a d ha a ce da a e a e ed a e a a f e . S e e ha e h ee ce da a e . A he a f e- a g, a L c a g H + I + J e e e c ea ed. H f he e e e ha e he a e D1, I f he e e e ha e he a e D2 a d J f he e e e ha e he a e D3. H, I a d J a e he be f ba che da a e D1, D2a d D3. L he a d h ff ed a d ha f e e e e a ed a he a f a g. The a e f he e e e de e e h ch da a e ed f he e a . F e a e, f he f a e D2, e a b a g a ba ch f ha da a e . Af e he e a , he f e e e f L e ed a d he e e e e f L e a ed dec de h ch da a e e a ba ch f he e e a . Th ced e e ea ed L e a d a ba che f a da a e ha e bee ed. A a f g he ab e a ach e- a a ce da a e , each ce da a e a ed d d a f e- a g. F h a ach, he e
e e a a fe ea g ced e de c bed Sec 2.4.1 ed. I h ced e, he DSSM e- a ed he e ec ed ce da a e bef e be g f e-ed he a ge da a e .
3.3 D
S ce he a ch ec e e e ed b H a g e a . [14] a e a g a c ec f a d c e a , e e a a bag- f- g a a ach e -d c e a ch g. The de ea a ch he g a ec f a e he g a ec f a e e a d c e . The e b ha he e e c de g a a ca ac c fea e ce beca e f he bag- f- g a a ach,he e fea e a e e ed f he e a d d c e he c ea g he g a e e e a . The e f e a e ca ab e f e c d g ac c a d g a a ca fea e , h ch d e bag- f- g a e e e a f he da a,
ch a LSTM-ba ed e , e e a ge a f da a f a g, a d a e e ab e a da a- ca ce e e .
e- a g he DSSM. F e a e, f he de e- a ed he e e a a a , e ha he bag- f- g a a ch ec e ca e he de ba e dec he e e ce f g a ha a e e c e e ega e d . The de h ea a e e e a ha ef f IR. Ne a ach e a a a b e a c, ce a a de e d g a a ca c e, h ch he DSSM ca e c de. F he e ea , he g e-a a fe ea g a ach f M e a . [26] efe ed e he - a a fe ea g a ach f S b a a a e a . [38]. I ead f a g he DSSM e a , he ce da a e a e ada ed a he be ed f e-a g he a ge a . I fac , e f he da a e ed e e ea be ed a IR da a e g a . The ea f h ha IR da a e h ch a e a ge e gh a a e a IR de a e b c a a ab e. Af e e- ce g, a da a e c f ea ch e e a d c e d g e e a d c e . The da a e ed he e e e ca be ca eg ed acc d g he he he be g he ce d a a ge d a . Whe a g he DSSM a a fe ea g ced e, he ce d a da a e a e ed f e- a g a d he a ge d a da a e ed f f e- g.
3.3.1 S
D
D
Th ee d ffe e ce d a da a e a e ed e- a he DSSM: Re e RCV1, Q a Q e Pa a d SQ AD. The Q e Pa da a e [35] c f e a f he Q a e -a d--a e (QA) eb e. The a a e abe ed a e he d ca e d c . Te h da a e a he DSSM, a d c e a a e d ca ded. The e a g d ca e e a ca be ed a ea ch e - d c e a , ce he a ed e ha e he a e e a c ea g. The da a e c a a a e 150 000 d ca e e a af e e- ce g. RCV1 [22] c f Re e e a e a c e e be ee A g 20, 1996 a d A g 19, 1997. Each a c e ha a e f c ag ca g he a b ec f he . The ag ca be ga ed a ee- e c e, h ag bec g e ec f c a dee e e e f he ee. I de e RCV1 a he DSSM, he c ag f a d c e a e ed a e a he , e e a d c e f each d c e
3.3.2 T
D
T da a e h a e efe ed a a ge da a e : M c f W QA a d G g e Na a Q e . A g e a ge da a e a a ed f e a a a g e e e e . The a a ge da a e ed W QA [43], a QA da a e c g f e ed he B g ea ch e g e b ea e , h c e d g W ed a a c e c a g a c ec a e . The e W ed a a c e a e a c e ha ha e bee c c ed b he ea e g he e e . The a c e b d e a e g ed a d he a c e e a e ed ead. We h ge a a da a e ha f H a g e a . [14], h e e -d c e e a ge e a ed b e f a c e c a ea ch e g e, a h gh he c c ed d c e a e W ed a a c e a d c de d c e f he eb e . Whe g a e e -d c e a W QA a d e g a a he e he e e a d/ e e a e a d , 2949 a e a . Th a ea c a f da a de a da a-e e ch a a c a a e . The e -d c e e a a e a g, a da a d e e h 2049, 300 a d 600 a he e ec e . L e W QA, Na a Q e [18] a QA da a e c g f e f ea e , h c e d g W ed a a c e c a g a c ec a e . A e ed ha e a c e d g d c e , h ch a W ed a a c e, c a g a c ec a e . Na e ec ha c a a c ec a e a e g e f he e a c ec a e e . Each e a e a c a he e f he d c e . Na a Q e g f ca a ge ha W QA. A a f he da a e ed, gh 60 000 e a e , c a e he e f a ce he a ch g e e a d d c e b d e ha f a ch g e e a d d c e e . The 60 000 e a e a e a g, a da a d e e , h 75%, 5% a d 20% f he da a e ec e . I he W QA a d Na a Q e da a e , he e a e e e ed a a a g age. A ea ab e e h h d ffe e a f a eed e e ed a a e e e e ed a a eg a ea ch e . If he d ffe e cebe ee e a d eg a ea ch e e be e a ed. F he e, e be f e ed f d bef e be g e e e ed a a bag- f- d e ec a d he e . F e a e, a e a g h he e de f he U ed S a e gh e U ed S a e e de a ea ch e g e. The a e f a eed c d be d ec ed he ea ch e g e a a e , ch a Wh he e de f he U ed S a e ? . Af e e g he d h , , he a d f , e ee ha he e e c a he a e e d . Th , he a be ee a e- ce ed QA a d a e- ce ed -QA IR da a e h d be a ge e gh f he e h d f he ea ch e e e ec ed a a ea ch a ca .
3.4 P
-A da a e a e b ec e- ce g. The e e a d d c e a e f e ed. The e a e he b ec d f e g. The d ed ca be f d A e d A. I de ed ce he e f he cab a (a d hhe e f he a e f he DSSM), d ha h g [14] ed, e g a bag- f- g a ec e e e a f each d c e . The e f he bag-
f-g a ec he a e a he be f e g a he h ee da a e ed. A a e a e d ha h g ed b Sa a h d & H [33] e d d ce ead f g a d ce af e g he cab a , ha he c d a e e e e ed he bag- f- d ec . Sa a h d & H e he 10 000 c d e e e each d c e a a bag- f-d ec . H e e , h ead he f e g f a - d a f he c a a e h ea ch e e ha cc e f he da a e ed h , e g a e ea ch e e a d d c e . Th , d ha h g efe ed e agg e e cab a - g.
E e h he d ha h g a ach, a fe d c e he da a e be e af e d e a . The e d c e a e d ca ded.
3.4.1
H
f a d 60 000 e a e . The a g Na a Q e e f ed g he a g e h e g he acc ac he a da e . The de a he e ch h he h ghe a da acc ac e ec ed a d e a a ed he e e de e e he e f a ce f he h e - a a e e a . The h ee be e f g
a a e e a a e a ed a d ed def e a e eg f he a a e e ace be ea ched he ec d age. Af e he ec d age, he be e f g a a e e
a e ec ed be ed a he e e e f h .
3.6 T
The DSSM a ed h ee a a . The f a a he ce d a da a e , .e. e- a g. The ec d a a he a ge d a da a e , .e. f e- g he DSSM. The h d a a b h he ce a d a ge d a , .e. b h e- a g a d f e- g. Whe e- a g he DSSM, he e a e h ee da a e ch e be ee , ce he e a e h ee d ffe e ce da a e . P e- a g d e g a h ee ce da a eWhe e a a g he e- a ed de ( h f e- g) he a ge e e , he de a he e ch h he h ghe a da ASA ( he ce da a e ) e ec ed. A a g , he f e- g a de ( h a d h e- a g) he de a
he e ch h he h ghe a da acc ac he a ge da a e e ec ed. The DSSM ha a e gh a ce a d b a ec a ed f Te f
ca ed a d b [8], h a a da d de a f 0.1 a d ea 0. A ca ed a d b efe a a d b he e each
a e h a ag de ha d ffe f he ea b e ha a da d de a e- a ed. Th a a ced e ed f a d ffe e a g ced e f he DSSM, a e a f he a ed DSSM, h ch a ed af e a a . We a a d e a a e 10 de f each a g ced e e e a ca acc ac .
3.7 E
N a ed D c ed C a e Ga (NDCG) [15] ed e a a e a ea ch de , h ca e e 1, 3, 10 a d 20. I he a ge da a e (W QA), each e a c a ed h e e e a d c e . The b ha a g e e e e a e d c e ha he e a c a ed h he da a e ca be e c ded. O he d c e ha he a c a ed e a e e h d e e e a ce he e . H e e , ce he a c a ed d c e e a g he most e e a d c e he c ec f e e , a ed ha NDCG be a ea gf ea e f e f a ce. T ea e e f a ce he Na a Q e da a e , he Average SampleAccuracy (ASA) ea e ed. A e age Sa e Acc ac e a a ed he W QA
e e b g g h gh he e -d c e a a d a g 4 e e a d c e f each f he a . The de he ha 5 d c e c e f
e e a ce. ASA def ed a he a e age be f e ha he de a g he h ghe e e a ce he e e a d c e , e a e a f he da a e . Th
he ea e ed f he a da e d g a g f he DSSM.
h he defa a a e e a e . I de ge a de a d g f he effec f e- a g he DSSM h f e- g, a e a he effec f f e- g, he e f a ce f a a ed DSSM a ea ed.
3.8 S
T
I de ea e he a ca g f ca ce f he e , each de a g ced e e ea ed 10 e . The a ca g f ca ce f he e ea ed b g he W c S g ed Ra Te [42]. S ce e e a e c d c ed he e f a ce da a f he de e a a ed, he B fe c ec [9] ed de c he fa e e a e. The a ca g f ca ce e e ed 5%.3.9 C
Each da a e ed h ca be e e e ed a a bab d b e he g a ha cc b c g he cc e ce f each g a a d a g g he a f cc e ce . I de ea e he d ffe e ce be ee he d b f he ce a d a ge da a e , he K bac -Le b e (KL) d e ge ce [17] ca c a ed f each ce da a e he a ge da a e . F d c e e bab d b P a d Q, he KL d e ge ce f Q P def ed [24] a X x2X P (x) gP (x) Q(x). The KL d e ge ce def ed f f a x he e Q(x) = 0, P (x) = 0. T a e e he KL d e ge ce def ed, e e add e h g [23], he e e add 1 a f e e c e , e a e e bab e f g a ha d cc a a c a da a e . Th ea e add 1 he c f each g a af e c g aR
4.1 DSSM P
The DSSM a a ed acc d g a be f d ffe e a g ced e :
I Tab e 4.1.1, he a e age NDCG e f a ce e f each f he a g ced e a e e e ed, ge he h he BM25 e f a ce. The a e a e ge e a ed b e a a g each de W QA e e a d ca c a g he a e age
a e f each a g ced e. T ee he e f a ce f a he d d a de ha c e he a e age a e Tab e 4.1.1, ee A e d B. Whe e a a g
M de (P ced e .) NDCG@1 NDCG@3 NDCG@10 NDCG@20 BM25 0.686 (0) 0.734 (0) 0.742 (0) 0.743 (0) _ a (6) 0.678 (0.003) 0.732 (0.003) 0.751 (0.003) 0.757 (0.002) a (5) 0.690 (0.003) 0.744 (0.004) 0.761 (0.003) 0.766 (0.003) a (1) 0.421 (0.005) 0.481 (0.006) 0.516 (0.005) 0.530 (0.004) (2.1) 0.615 (0.006) 0.671 (0.006) 0.694 (0.006) 0.703 (0.005) c 1 (2.2) 0.170 (0.004) 0.212 (0.007) 0.235 (0.005) 0.254 (0.005) ad (2.3) 0.417 (0.004) 0.472 (0.004) 0.505 (0.003) 0.519 (0.003) a + a (3) 0.456 (0.006) 0.515 (0.005) 0.549 (0.005) 0.562 (0.005) + a (4.1) 0.669 (0.011) 0.721 (0.011) 0.740 (0.010) 0.747 (0.010) c 1+ a (4.2) 0.225 (0.004) 0.270 (0.005) 0.309 (0.004) 0.328 (0.004) ad+ a (4.3) 0.577 (0.036) 0.631 (0.035) 0.657 (0.033) 0.665 (0.032)
Tab e 4.1.1: : The a e age NDCG c e a d ffe e c ff f he d ffe e de a d a g ced e , e a a ed W QA. The S a da d E f he Mea h h a e he e . BM25 efe he BM25 ba e e, _ a efe a a ed DSSM, a efe a DSSM ha a a ed a ce da a e , efe a DSSM ha a a ed he Q a Q e Pa da a e , a efe a DSSM ha a a ed W QA a d + a efe a de ha a e- a ed Q e Pa a d f e- ed W QA. he e f a ce f he a g ced e , e ca e he e a a e he a e age e f a ce f each ced e, he e f a ce f he d d a de ge e a ed b he ced e. O a e age, he be e f g ced e (5.), he e e a
he de W QA h e- a g. Th e f a c ff , he e a g he a ge da a e de e a e age e f a ce ac e a g . The e f a ce a c ff 1, 3, 10 a d 20 a e 0.690, 0.744, 0.761 a d 0.766 f h ced e. H e e , he g a he d d a e f a ce f each de , e ee ha e- a g he de Q a Q e Pa a d f e- g W QA ( ced e (4.)) de he be d d a de , e e h gh he a e age e f a ce f h ced e g f ca e ha ha
f a g W QA. T f he 10 de ge e a ed b h ced e ded he be e f a ce a a c ff . O e f he de ded he be e f a ce a c ff 1 a d 3, h ch a 0.715 a d 0.768, h e a he de ded he be e f a ce a c ff 10 a d 20, h ch a 0.785 a d 0.790. The
Tab e 4.1.2. The a a ce he e f a ce f he DSSM he e- a g NDCG@1 NDCG@3 NDCG@10 NDCG@20
0.715 (3) 0.768 (3) 0.785 (8) 0.790 (8)
Tab e 4.1.2: : The be e f g de a each c ff , e a a ed he W QA e e . The d g h a e he e efe h ch f he de f he 10 de a ed acc ed f a g e e f a ce a e. M de 3 ge e a ed he be e f a ce a c ff 1 a d 3. M de 8 ge e a ed he be e f a ce a c ff 10 a d 20. B h de e e b a ed h gh e- a g Q a Q e Pa a d f e- g W QA. SQ AD a d f e- g W QA e ce a h gh he c a g he he ced e , h a S a da d E f he Mea f be ee 0.032 a d 0.036, de e d g he c ff . P e- a g Q a a d f e- g W QA de he ec d h ghe a a ce, h a S a da d E f he Mea be ee 0.010 a d 0.011. Th ea ha f b h f he e a g ced e , e b a ed e de ha
e f ed e ce a , a d e ha e f ed e ce a e .
O ba e e BM25 de e f e a e e c a ed he a e age e f a ce f be DSSM a g ced e . BM25 de h d be a e age e f a ce f a de a c ff 1 a d 3, a d ec d be e f a ce a c ff 10 a d 20. The d ffe e ce a e age e f a ce be ee BM25 a d he DSSM ha a a ed W QA ( ced e (5.)) c ea e h each c ff . The a e age e f a ce f he a ed DSSM g a g he be f a ced e . The a ed DSSM ha he h d be e f a ce a c ff 1 a d 3, a d he ec d be e f a ce a c ff 10 a d 20, he e bea BM25 ba e e. A ced e ha e e- a g he ce da a e ( ced e (1.), (2.), (3.) a d (4.)) e e a e age e f a ce ha a g W QA h e- a g ( ced e (5.)) a d a g he e ( ced e (6.)). Th ea ha he c a g he a e age e f a ce ac e a g , e- a g he ce da a e e e e f a ce he a ge da a e . P e- a g d ffe e da a e e d d ffe e a e age e f a ce , h
he d ffe e ce be g e e ed af e f e- g he a ge da a e . The e a e age e f a ce g e b e- a g Re e RCV1, f ed b
da a e .
4.2 C
I de ea e he d ffe e ce be ee ed da a e af e a f g he g a - e e e a , he K bac -Le b e (KL) d e ge ce a ca c a ed f each ce da a e W QA. Each f he da a e ed h ca be
e e e ed a a f e e c ec h a e g h e a he be f e g a ac a da a e , a d each e e e c a g he f e e c f a g e g a he e e e ed da a e . I de e he KL d e ge ce ea e ea e h d ffe e da a e a e f each he , he f e e c ec f he c a ed da a e be c e ed bab d b . T a d a def ed a e f he KL d e ge ce, add e h g a ed he f e e c ec , ea g each ec e e e c e e ed b 1. We he d de each ec b L1 a e a he bab d b e g a f he da a e . Af e d g h f he Q e Pa , RCV1, SQ AD a d W QA da a e he KL d e ge ce ca be ca c a ed. The e a e e e ed Tab e 4.2.1. The h ghe KL d e ge ce f Re e RCV1 da a e W QA, ea g ha he d b f g a W QA d ffe e RCV1 f a ce da a e . Th ef ec ed he fac ha e- a g RCV1 ca e he e e f a ce W QA. The KL d e ge ce a f Q e Pa W QA a d f SQ AD W QA. B h he e a e a e g f ca e ha he d e ge ce f RCV1 W QA. Q e Pa RCV1 SQ AD W QA 0.53 0.87 0.48
Tab e 4.2.1: The K bac -Le b e d e ge ce f each ce da a e W QA.
A he e DSSM e e a ed a ch e e d c e e . The a e age A e age Sa e Acc ac (ASA) e f a ce a ea ed f each f he e de f each ced e. A e ed ec 3.7, ASA def ed a he be f
e he e e ec he e e a d c e f a g f 5 d c e , h e f he d c e be g he e e a d c e a d he 4 he be g a d e ec ed e e a d c e . A g f ca c ea e e f a ce f he DSSM b e ed he g he d c e e ead f he d c e b d e . The BM25 e f a ce a e he g he d c e e , a h gh ea a ch a he DSSM. Tab e 4.3.1 de c be he e f a ce f he DSSM he g d c e e a d d c e b d e . Beca e f he e e f a ce he g d c e e , d c e e e e ed e c e a he e e e . M de ASA BM25_ _ e 2b d 0.788 DSSM_ _ e 2b d 0.671 BM25_ _ e 2 e 0.816 DSSM_ _ e 2 e 0.868
de a e a a ed he SQ AD e e . The e f a ce f a DSSM a ed SQ AD a c a ed ha f a a ed e . The e e a e h Tab e 4.4.1. Whe a g SQ AD, he a e age e f a ce d ffe e ce be ee
he a ed a d a ed e c ea e g f ca . The e f a ce c ea e b 44% a c ff 1. The e f a ce c ea e c ea e f each c ff , a d ab 50% a c ff 20. Th fa e ha he e f a ce c ea e he a g a d e a a g W QA, h ch ea e be ee 1% a d 2% f he d ffe e c ff . M de NDCG@1 NDCG@3 NDCG@10 NDCG@20 U a ed 0.0976 0.182 0.240 0.251 T a ed 0.141 0.267 0.359 0.377
NDCG c ff (a de ha be e a c ff 10 e e a be be e a c ff 20), e ha h ca e a e c e a e B fe c ec . We ead e he B fe c ec f he 2 e c d c ed a each c ff . The e a each c ff a e W c S g ed Ra e ha c a e he DSSM ha a a ed W QA he a ed DSSM a d he BM25 de . The e e a e e de e de ha he e f a ce f he a e de ac c ff , h ch a e he B fe c ec e c e a e. We h d de g f ca ce e e b 2 a d e he 8 e a f de e de a . The e ed - a e de e e g f ca ce each d d a e h a g f ca ce e e f ↵ = 0.05 h bec e ↵/2 = 0.025. Tab e 4.5.1 h he -a e f each de a d c ff . We b e e g f ca e f he DSSM a ed W QA a c ff 3, 10 a d 20 he c a g he BM25 de a d a c ff 1 a d 3 he c a g he a ed de . NDCG@1 NDCG@3 NDCG@10 NDCG@20 BM25/DSSM a 0.0462 0.0244 0.000977 0.000977 _ a /DSSM a 0.0142 0.0244 0.0527 0.0527
Tab e 4.5.1: : The - a e a d ffe e c ff he c a g he DSSM a ed
F g e 4.6.1: The f f he h e - a a e e ea ch. Red d ca e h ghe e f a ce, he ea e d ca e e e f a ce.
D
5.1 D
Whe c a g he e f a ce f he a g ced e , e- a g he ce da a e e e a e age e f a ce ac a g . Whe a g he e a h ee e- a g ce da a e acc d g he a fe ea g ced e de c bed 3.2, e a e age e f a ce b e ed a c ff he c a g a a ed e . Th a e f a ce da a e he ed d d a . The ea f he e a e age e f a ce c d be ha he cab a e he ce da a e d ffe f he cab a
he a ge da a e . S ce he DSSM e a bag- f- d a ach, ha g f a d ffe e cab a e a e age e f a ce he a ge da a e . A f , h gh be g ce [26], M e a . h ed ha e- a g he ce da a e SQ AD g f ca b ed e f a ce W QA. H e e , a g f ca d ffe e ce be ee h a d he f M e a . ha h , a e age e f a ce ac e e a de a ed acc d g he a e ced e c a ed ead f he e f a ce f d d a de . Whe c de g d d a de , e ee ha he a fe ea g ced e ed h
a e . I he IR a , he g a e e e e e a d c e ac he e e d c e c ec e e a e . I e ha he e d ffe e ce ca e he a fe ea g ced e be e [26] ha h .
Whe e- a g each ce da a e d d a , he effec f he e- a g de e d h ch da a e ed. The e a e age e f a ce ac a
ced e g e b e- a g he e Re e RCV1 h b e e f e- g. P e- a g he he QA da a e SQ AD g e h ghe a e age e f a ce ha he e c RCV1, h ch e ec ed ce he a be ee SQ AD a d W QA h ghe ha he a be ee RCV1 a d W QA, h b h SQ AD a d W QA c g f e e a ed h e e a W ed a a c e , a d RCV1 c g f e a e a c e a ed h a e a e a c e . O f he h ee ce da a e , e- a g Q a Q e Pa he be ef c a e f a ce W QA. The a e age e f a ce h ghe ha he
e- a g he he ce da a e , a h gh e ha he a e age e f a ce f a a ed e . Whe g a d d a de ead f a e age , e- a g Q a a d f e- g W QA g e he be e , d g he be e f g de a e e c ff . S ce he Q e Pa da a e c f d ca e e h ch a e ea ed a e e a d d c e , be e ec ed ha he a e age a f ha ed d be ee a e a d a d c e h ghe ha RCV1 a d SQ AD. A h ghe deg ee f a be ee
e a d d c e d a d h ec gh be be ef c a he DSSM a fe ea g e f a ce, ce e c age he e e f g a
a ch g, a d ea a a g ec f c he ce da a e .
Whe c a g he KL d e ge ce f each ce da a e W QA, a h Tab e 4.2.1, he h ghe d e ge ce b fa f RCV1. Th ea ha he RCV1 d b he d ffe e f he W QA da a e f a ce da a e . The h ghe d ffe e ce be ee he RCV1 a d W QA d b c d e a h
e- a g RCV1 ead he e e f a ce f he DSSM W QA. The a ed, a d ge e a ed DSSM e f a ce a g he be a g a
a c ff 10 a d 20. S ce BM25, h ch e e ce a e d- a ch g a g h , e he W QA da a e , he e be, a e age, a h gh a f ha ed d be ee e e a d he e e a d c e . S ce he DSSM a g e e ha a b h e e a d d c e he e ec e ec e e e a , b e ha a a ed DSSM, h a , a d ge e a ed e gh a e e a d d c e h a h gh a f ha ed d (a d h a g a c ec af e d ha h g) a eg he e bedd g ace, h ch g e a a ed e g d e f a ce. De Pa a e a . [6] e ha f a a d ge e a ed c a f e h he ReLU ac a f c a d a b
g a , a a ge a f cha ged b ece a cha ge he c a f e . Wh e he a d ge e a ed, a ed DSSM d ffe f he e ed [6] e f ac a f c a d be g a c a f e , b e ha beha e a , h a a ge d ffe e ce e ed ace de e a
g f ca d a ce e bedd g ace. Th e a e he a ed DSSM
a g a ec ace a eg he e bedd g ace,
d ffe e d f . The e e a e h , a fe d a e age, a d he d c e a e e e W ed a a c e , h a ch a ge a f e
C
I h , he DSSM a e- a ed g h ee d ffe e ce da a e ge he , a d g each ce da a e e a a e . A f e- a g c f g a ca ed a e a e age a ge da a e e f a ce, a be a g deg ee . Whe
c de g a e age e f a ce, he be e f g de a b a ed b a g he a ge da a e h e- a g. H e e , he c de g he d d a DSSM , he be e f g de b a a ge a g e e b a ed b e- a g Q a Q e Pa a d f e- g he a ge da a e , W QA. We ec ha h beca e f Q a ca e h gh a f ha ed d be ee e e a d d c e . I e ha he de a ch ec e f he DSSM h de a fe ea g e f a ce. The DSSM a bag- f- g a de , he e he c f he a ea g g a e e a d d c e a e c de ed, a d ac c fea e a e g ed. The d ffe e ce be ee he cab a e he d ffe e da a e ca e he DSSM ea a d ffe e cab a ha he cab a f he a ge da a e ,
f e ead g e e f a ce ha ha f a cab a - e a a ed DSSM.
6.1 F
he da a. Beca e f he he e d ffe e ce be ee he DSSM a d B DAF de , a e e g e h he be ef f he a fe ea g ced e a de e d he de a ch ec e. LSTM-ba ed e a IR e , ch a he e e e ed [28], a e h e f e c a ed he a e- f- he-a IR de . F e c d e a e h he e de be ef f he a fe ea g ced e e e ed he e. De Pa a e a . [6] e ha a a ge cha ge ace e ed f he c a f ca cha ge a dee c a f e g he ReLU ac a f c . The c a f e a e b g a . Th ea ha ha d ffe b a a
be f b a e e d ce he a e c a . I b e ha a a d ge e a ed DSSM beha e a , h ha a e a ace be g
[1] Ba ba , M chae , Ze e , T , a d Ha e , Sa . A Face I E ed f AOL Sea che N . 4417749 . I : New York Times (A g. 2006). URL:
.
[2] Be g a, Ja e a d Be g , Y h a. Ra d Sea ch f H e -Pa a e e O a . I : J. Mach. Learn. Res. 13 (Feb. 2012), . 281 305. ISSN: 1532-4435.
[3] B d, R. M., Ne ba , J. B., a d T eff , J. L. Te F e I e : A E a a . I : Proceedings of the Fourth Workshop on Computer
Architecture for Non-Numeric Processing. CAW 78. B e M a La e, Ne Y , USA: A c a f C g Mach e , 1978, . 42 50. ISBN: 9781450374330. DOI: . URL:
.
[4] B , D a . Rank-BM25: A two line search engine. J e 2020. URL: .
[5] Ch , K gh , Me b e , Ba a , Bahda a , D , a d Be g , Y h a. O he P e e f Ne a Mach e T a a : E c de Dec de A ache . I : Proceedings of SSST-8, Eighth Workshop on Syntax,
Semantics and Structure in Statistical Translation. D ha, Qa a : A c a f C a a L g c , Oc . 2014, . 103 111. DOI:
. URL: .
[6] De Pa a, G ac , K a , B ba , a d L d, Se h. Ra d dee e a e a e b a ed a d e f c . I : Advances in Neural
Information Processing Systems 32. Ed. b H. Wa ach, H. La che e, A.
2019, . 1964 1976. URL:
.
[7] Dee e e , Sc . I de g b a e e a c de g . I : Journal of the
American Society for Information Science 41.6 (1990), . 391 407. URL: .
[8] D c e a , Te F . tf.random.truncated_normal.
. Acce ed: 2020-06-23.
[9] D , Jea a d D , O e Jea . M e C a A g Mea . I :
Journal of the American Statistical Association 56.293 (1961), . 52 64. [10] F da , A ache S f a e. Class Similarity.
. 2019.
[11] G g e. Google Environmental Report 2019.
h :// a ab .g g e/ e /e e a - e -2019/. 2019. [12] He h, W a , P ce, S, a d D h e, L. A e g he a -ba ed e
e a g he UMLS Me a he a . I : Proceedings / AMIA ... Annual
Symposium. AMIA Symposium (Feb. 2000), . 344 348.
[13] H ch e e , Se a d Sch dh be , J ge . L g Sh -Te Me . I :
Neural Computation 9.8 (1997), . 1735 1780. DOI:
. e : . URL:
.
[14] H a g, P -Se , He, X a d g, Ga , J a fe g, De g, L , Ace , A e , a d Hec , La . Lea g Dee S c ed Se a c M de f Web Sea ch U g C c h gh Da a . I : Proceedings of the 22nd ACM International
Conference on Information & Knowledge Management. CIKM 13. Sa
F a c c , Ca f a, USA: A c a f C g Mach e , 2013, . 2333 2338. ISBN: 9781450322638. DOI: . URL:
.
[15] J e , Ka e a d Ke e , Jaa a. C a ed Ga -Ba ed E a a f IR Tech e . I : ACM Trans. Inf. Syst. 20.4 (Oc . 2002), . 422 446. ISSN: 1046-8188. DOI: . URL:
[16] K g a, D ede P. a d Ba, J . Ada : A Me h d f S cha c O a . I : CoRR ab /1412.6980 (2015).
[17] K bac , S. a d Le b e , R. A. O I f a a d S ff c e c . I : Ann. Math.
Statist. 22.1 (Ma . 1951), . 79 86. DOI: . URL:
.
[18] K a , T , Pa a , Je a a, Redf e d, O a, C , M chae , Pa h, A , A be , Ch , E e , Da e e, P h , I a, Ke ce , Ma he , De , Jac b, Lee, Ke , T a a, K a N., J e , L , Cha g, M g-We , Da , A d e , U e , Ja b, Le, Q c, a d Pe , S a . Na a Q e : a Be ch a f Q e A e g Re ea ch . I :
Transactions of the Association of Computational Linguistics (2019).
[19] La ca e , F. W. a d Fa e , F. G. I f a Re e a O -L e . I : The
Library Quarterly 46.1 (1976), . 79 81. DOI: . e :
. URL: .
[20] La ca e , F. W f d a d Fa e , E Ga . Information retrieval: on-line. Me e P b. C L A ge e , 1973. ISBN: 0471512354.
[21] Le, Q c a d M , T a . D b ed Re e e a f Se e ce a d D c e . I : Proceedings of the 31st International Conference on
International Conference on Machine Learning - Volume 32. ICML 14. Be g, Ch a: JMLR. g, 2014, . II 1188 II 1196.
[22] Lewis, David D., Yang, Yiming, Rose, Tony G., and Li, Fan. “RCV1: A New Benchmark Collection for Text Categorization Research”. In: J. Mach. Learn. Res. 5 (Dec. 2004), pp. 361–397. ISSN: 1532-4435.
[23] Lidstone, G. J. “Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities”. In: Transactions of the Faculty of Actuaries (1920).
[24] MacKay, David J. C. Information Theory, Inference, and Learning Algorithms. 1st ed. Cambridge, England: Cambridge University Press, 2003, p. 34. ISBN: 0521642981.
[25] Manning, Christopher D., Raghavan, Prabhakar, and Schütze, Hinrich. An Introduction to Information Retrieval. Cambridge University Press, Cambridge, England, 2009, pp. 3–6. ISBN: 9781139472104.
[26] Min, Sewon, Seo, Minjoon, and Hajishirzi, Hannaneh. “Question Answering through Transfer Learning from Large Fine-grained Supervision Data”. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Vancouver, Canada: Association for Computational Linguistics, July 2017, pp. 510–517.
[27] Nakov, Preslav, Màrquez, Lluís, Moschitti, Alessandro, Magdy, Walid, Mubarak, Hamdy, Freihat, Abed Alhakim, Glass, Jim, and Randeree, Bilal. “SemEval-2016 Task 3: Community Question Answering”. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). San Diego, CA, USA: Association for Computational Linguistics, June 2016, pp. 525–545.
[28] Palangi, Hamid, Deng, Li, Shen, Yelong, Gao, Jianfeng, He, Xiaodong, Chen, Jianshu, Song, Xinying, and Ward, Rabab. “Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval”. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 (Jan. 2016), p. 694.
[29] Pratt, Lorien Y, Mostow, Jack, Kamm, Candace A, and Kamm, Ace A. “Direct Transfer of Learned Information Among Neural Networks”. In: Association for the Advancement of Artificial Intelligence. Vol. 91. 1991, pp. 584–589.
[30] Rajpurkar, Pranav, Zhang, Jian, Lopyrev, Konstantin, and Liang, Percy. “SQuAD: 100,000+ Questions for Machine Comprehension of Text”. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas: Association for Computational Linguistics, Nov. 2016, pp. 2383–2392.
[31] Rijsbergen, C. J. van. Information Retrieval. 2nd ed. London, England: Butterworths, 1979, pp. 114–115.
[32] Robertson, Stephen, Walker, S., Jones, S., Hancock-Beaulieu, M. M., and Gatford, M. “Okapi at TREC-3”. In: Overview of the Third Text REtrieval Conference (TREC-3). Gaithersburg, MD, USA: NIST, Jan. 1995, pp. 109–126.
[33] Salakhutdinov, Ruslan and Hinton, Geoffrey E. “Semantic hashing”. In: Int. J. Approx. Reason. 50 (2009), pp. 969–978.
[34] Salton, G., Wong, A., and Yang, C. S. “A Vector Space Model for Automatic Indexing”. In: Commun. ACM 18.11 (Nov. 1975), pp. 613–620. ISSN: 0001-0782.
[35] Sharma, Lakshay, Graesser, Laura, Nangia, Nikita, and Evci, Utku. “Natural Language Understanding with the Quora Question Pairs Dataset”. In: CoRR abs/1907.01041 (2019).
[36] So, David, Le, Quoc, and Liang, Chen. “The Evolved Transformer”. In: Proceedings of the 36th International Conference on Machine Learning. Vol. 97. Long Beach, California, USA: PMLR, June 2019, pp. 5877–5886.
[37] Strubell, Emma, Ganesh, Ananya, and McCallum, Andrew. “Energy and Policy Considerations for Deep Learning in NLP”. In: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, Volume 1: Long Papers. Association for Computational Linguistics, July 2019, pp. 3645–3650.
[38] Subramanian, Sandeep, Trischler, Adam, Bengio, Yoshua, and Pal, Christopher J. “Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning”. In: International Conference on Learning Representations. 2018.
[39] S dhee , A a. “Factbox: Big Tech and the carbon pledge”. In: Sustainable Business (Jan. 2020).
[40] Vaswani, Ashish, Shazeer, Noam, Parmar, Niki, Uszkoreit, Jakob, Jones, Llion, Gomez, Aidan N, Kaiser, Łukasz, and Polosukhin, Illia. “Attention is All you Need”. In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Curran Associates, Inc., 2017, pp. 5998–6008.
[41] Voorhees, Ellen M. and Rijsbergen, C. J. van. “Query Expansion Using Lexical-Semantic Relations”. In: SIGIR ’94. London: Springer, 1994, pp. 61–69. ISBN: 978-1-4471-2099-5.
[42] Wilcoxon, Frank. “Individual Comparisons by Ranking Methods”. In: Biometrics Bulletin 1.6 (1945), pp. 80–83. ISSN: 00994987.
[43] Yang, Yi, Yih, Wen-tau, and Meek, Christopher. “WikiQA: A Challenge Dataset for Open-Domain Question Answering”. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics, Sept. 2015, pp. 2013–2018.