DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020
Faster Unsupervised Object Detection For Symbolic Representation
PEIYANG SHI
KTH
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
$\beta$, $z^{\text{where}}$
$280 \times 210 \times 3$ ($H \times W \times C$)
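A Markov decision process is defined by the tuple $(S, A, R, T, \gamma)$, where $S$ is the state space with states $s \in S$, $A$ is the action space with actions $a \in A$, $T$ is the transition function, $R$ is the reward function and $\gamma$ is the discount factor.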
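The optimal action value $Q^*(s, a)$ of taking action $a$ in state $s$ satisfies the Bellman optimality equation
$$Q^*(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q^*(s', a'),$$
where $s'$ and $a'$ are the state and action at time $t + 1$.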
The optimal state value $V^*$ follows from $Q^*$ by choosing the best action:
$$V^*(s) = \max_a Q^*(s, a).$$
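A minimal sketch of how the Bellman optimality equation can be solved by applying it repeatedly (Q-value iteration) on a small tabular MDP. The array layout `P[s, a, s']`, the function name `q_value_iteration` and the two-state example MDP are illustrative assumptions, not part of the original work.

```python
import numpy as np


def q_value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Repeat Q(s, a) <- R(s, a) + gamma * sum_s' P(s'|s, a) * max_a' Q(s', a') until convergence.

    P has shape (S, A, S) with P[s, a, s'] = P(s'|s, a); R has shape (S, A).
    """
    Q = np.zeros_like(R, dtype=float)
    for _ in range(max_iter):
        V = Q.max(axis=1)            # V(s') = max_a' Q(s', a')
        Q_new = R + gamma * (P @ V)  # P @ V sums P(s'|s, a) * V(s') over s'
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q


# Hypothetical toy MDP: in state 0, action 0 stays put (reward 0) and action 1
# moves to the absorbing state 1 (reward 1); state 1 loops on itself with reward 0.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0
R = np.array([[0.0, 1.0], [0.0, 0.0]])

Q_star = q_value_iteration(P, R, gamma=0.9)
V_star = Q_star.max(axis=1)  # V*(s) = max_a Q*(s, a)
```

Here $Q^*(0, 1) = 1$ and $Q^*(0, 0) = 0.9 \cdot 1$: waiting one step costs a factor $\gamma$.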
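Let $p$ and $q$ denote probability distributions, $x$ the observed data and $z$ the latent variables, with joint distribution $p(x, z)$. The joint factorizes as
$$p(x, z) = p(x \mid z)\, p(z),$$
and the posterior follows from Bayes' rule:
$$p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)} = \frac{p(x \mid z)\, p(z)}{\int_z p(x \mid z)\, p(z)\, dz}.$$
The integral over $z$ is generally intractable, so the posterior $p(z \mid x)$ is approximated by a distribution $q(z \mid x)$.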
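The quality of the approximation is measured by the Kullback-Leibler divergence
$$D_{KL}\big(q(z) \,\|\, p(z \mid x)\big) = \mathbb{E}_{q(z)}\big[\log q(z) - \log p(z \mid x)\big].$$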
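Substituting $p(z \mid x) = p(x \mid z)\, p(z) / p(x)$ gives
$$D_{KL}\big(q(z) \,\|\, p(z \mid x)\big) = \mathbb{E}_{q(z)}\big[\log q(z) - \log p(x \mid z) - \log p(z)\big] + \mathbb{E}_{q(z)}\big[\log p(x)\big].$$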
Since $\log p(x)$ does not depend on $z$, $\mathbb{E}_{q(z)}[\log p(x)] = \log p(x)$, and rearranging gives
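$$\log p(x) - D_{KL}\big(q(z) \,\|\, p(z \mid x)\big) = \mathbb{E}_{q(z)}\big[\log p(x \mid z)\big] + \mathbb{E}_{q(z)}\big[\log p(z)\big] - \mathbb{E}_{q(z)}\big[\log q(z)\big],$$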
that is,
$$\log p(x) - D_{KL}\big(q(z) \,\|\, p(z \mid x)\big) = \mathbb{E}_{q(z)}\big[\log p(x \mid z)\big] - D_{KL}\big(q(z) \,\|\, p(z)\big).$$
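Because $D_{KL} \geq 0$, the right-hand side is a lower bound on $\log p(x)$, the evidence lower bound (ELBO). For fixed $p$, $\log p(x)$ is constant, so maximizing the bound with respect to $q$ is equivalent to minimizing $D_{KL}\big(q(z) \,\|\, p(z \mid x)\big)$.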
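The identity can be checked numerically on a toy model with a discrete latent variable, where every expectation is an exact sum. The three-state prior, likelihood and approximate posterior below are made-up numbers chosen only for illustration.

```python
import numpy as np

# Made-up toy model with a latent z in {0, 1, 2} and one fixed observation x
p_z = np.array([0.5, 0.3, 0.2])          # prior p(z)
p_x_given_z = np.array([0.1, 0.6, 0.3])  # likelihood p(x|z) of the observed x
q_z = np.array([0.2, 0.5, 0.3])          # an arbitrary approximate posterior q(z)


def kl(q, p):
    """D_KL(q || p) for strictly positive discrete distributions."""
    return float(np.sum(q * (np.log(q) - np.log(p))))


p_x = float(np.sum(p_x_given_z * p_z))  # evidence p(x) = sum_z p(x|z) p(z)
p_z_given_x = p_x_given_z * p_z / p_x   # exact posterior from Bayes' rule

lhs = np.log(p_x) - kl(q_z, p_z_given_x)                        # log p(x) - KL(q || p(z|x))
elbo = float(np.sum(q_z * np.log(p_x_given_z))) - kl(q_z, p_z)  # E_q[log p(x|z)] - KL(q || p(z))
```

Both sides agree for any choice of `q_z`, and `elbo` never exceeds `np.log(p_x)`.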
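Parameterizing $p$ by $\theta$ and $q$ by $\phi$ gives the objective
$$\mathcal{L}(\theta, \phi, x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big).$$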
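The first term, $\mathcal{L}_{\text{reconst}}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)]$, is the reconstruction term; the second is the KL divergence between the approximate posterior $q_\phi(z \mid x)$ and the prior $p_\theta(z)$. Here $q(z \mid x)$ acts as the encoder and $p(x \mid z)$ as the decoder of the latent variable $z$.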
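In practice $q_\phi(z \mid x)$ is often a diagonal Gaussian and $p_\theta(z) = \mathcal{N}(0, I)$; both are assumptions in this sketch. It draws samples with the reparameterization trick, $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, and checks that a Monte Carlo estimate of the KL term matches its closed form. The function names and parameter values are hypothetical.

```python
import numpy as np


def kl_to_std_normal(mu, logvar):
    """Closed-form D_KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)


def reparameterize(mu, logvar, rng, n):
    """Draw n samples z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal((n, mu.shape[0]))
    return mu + np.exp(0.5 * logvar) * eps


def log_normal(z, mu, logvar):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(np.log(2 * np.pi) + logvar + (z - mu) ** 2 / np.exp(logvar), axis=-1)


rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])      # hypothetical encoder outputs for one x
logvar = np.array([-0.5, 0.3])
z = reparameterize(mu, logvar, rng, n=200_000)

# Monte Carlo estimate of E_q[log q(z|x) - log p(z)] versus the closed form
kl_mc = float(np.mean(log_normal(z, mu, logvar) - log_normal(z, np.zeros(2), np.zeros(2))))
kl_exact = float(kl_to_std_normal(mu, logvar))
```

Because the randomness sits in `eps`, the sample is a deterministic function of `mu` and `logvar`, which is what lets gradients reach the encoder parameters.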
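In the $\beta$-VAE, the KL term is weighted by a hyperparameter $\beta$:
$$\mathcal{L}(\theta, \phi, x, z, \beta) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \beta\, D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big).$$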
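A larger $\beta$ pushes $q_\phi(z \mid x)$ closer to the prior $p(z)$, limiting how much information $z$ carries about $x$. This connects to the information bottleneck objective
$$\max \big[I(Z; Y) - \beta I(X; Z)\big],$$
where $I(\cdot\,;\cdot)$ is the mutual information: $Z$ should be a compressed representation of $X$ that remains informative about $Y$, with $\beta$ controlling the trade-off.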
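A sketch of the $\beta$-weighted objective written as a loss to minimize, assuming a Bernoulli decoder over binary pixels, a diagonal Gaussian $q_\phi(z \mid x)$, a standard normal prior and a single sample $z \sim q_\phi(z \mid x)$. The name `beta_vae_loss` and its arguments are hypothetical; `beta=1` recovers the plain VAE objective.

```python
import numpy as np


def beta_vae_loss(x, x_logits, mu, logvar, beta):
    """Negative beta-VAE objective for one example.

    x: binary pixels in {0, 1}; x_logits: decoder logits of p_theta(x|z) for one z ~ q_phi(z|x);
    mu, logvar: parameters of the diagonal Gaussian q_phi(z|x); the prior is assumed N(0, I).
    """
    # log p_theta(x|z) for a Bernoulli decoder: x * l - log(1 + exp(l)), computed stably
    log_px = np.sum(x * x_logits - np.logaddexp(0.0, x_logits))
    # Closed-form D_KL(q_phi(z|x) || N(0, I))
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    return -log_px + beta * kl
```

With `mu = 0` and `logvar = 0` the KL term vanishes and `beta` has no effect; as soon as the posterior departs from the prior, a larger `beta` raises the penalty.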
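Averaged over the data, the KL term between $q_\phi(z \mid x)$ and the prior decomposes as
$$\mathbb{E}_{\hat p(x)}\big[D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)\big] = I_{q_\phi}(x; z) + D_{KL}\big(q_\phi(z) \,\|\, p(z)\big),$$
where $\hat p(x)$ is the empirical data distribution, $I_{q_\phi}(x; z)$ is the mutual information between $x$ and $z$ under $\hat p(x)\, q_\phi(z \mid x)$, and $q_\phi(z)$ is the aggregate posterior.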
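The decomposition holds for any $q_\phi$, so it can be verified exactly with discrete distributions. The two-point empirical distribution, the posterior table and the uniform prior below are invented for illustration; $q_\phi(z)$ is computed as the aggregate posterior $\sum_x \hat p(x)\, q_\phi(z \mid x)$.

```python
import numpy as np


def kl(q, p):
    """D_KL(q || p) for strictly positive discrete distributions."""
    return float(np.sum(q * (np.log(q) - np.log(p))))


p_hat_x = np.array([0.5, 0.5])              # empirical distribution over two data points
q_z_given_x = np.array([[0.7, 0.2, 0.1],    # q_phi(z | x = 0)
                        [0.1, 0.3, 0.6]])   # q_phi(z | x = 1)
p_z = np.full(3, 1.0 / 3.0)                 # prior p(z)

# Left-hand side: expected KL between each posterior and the prior
lhs = sum(p_hat_x[i] * kl(q_z_given_x[i], p_z) for i in range(len(p_hat_x)))

q_z = p_hat_x @ q_z_given_x                 # aggregate posterior q_phi(z)
mutual_info = sum(p_hat_x[i] * kl(q_z_given_x[i], q_z) for i in range(len(p_hat_x)))
rhs = mutual_info + kl(q_z, p_z)            # I_q(x; z) + KL(q_phi(z) || p(z))
```

The split shows that penalizing the whole averaged KL term also penalizes $I_{q_\phi}(x; z)$, the information the code keeps about the input.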
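With $z_j$ denoting the $j$-th dimension of $z$, the $\beta$-TCVAE objective splits the KL term further and weights each part separately:
$$\mathcal{L}_{\beta\text{-TC}} = \mathcal{L}(\theta, \phi) + \alpha I_{q_\phi}(z; x) + \lambda\, \mathrm{TC}\big(q_\phi(z)\big) + \gamma \sum_j D_{KL}\big(q_\phi(z_j) \,\|\, p(z_j)\big),$$
where $\mathrm{TC}\big(q_\phi(z)\big) = D_{KL}\big(q_\phi(z) \,\|\, \prod_j q_\phi(z_j)\big)$ is the total correlation and $\alpha$, $\lambda$ and $\gamma$ are weights.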
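The total correlation is zero exactly when the latent dimensions are independent under $q_\phi(z)$. The sketch computes it, together with the dimension-wise KL term, for a made-up joint distribution over two binary latent dimensions; the uniform prior $p(z_j)$ is an assumption.

```python
import numpy as np


def kl(q, p):
    """D_KL(q || p) for discrete distributions, skipping zero-probability entries of q."""
    q, p = np.ravel(q), np.ravel(p)
    mask = q > 0
    return float(np.sum(q[mask] * (np.log(q[mask]) - np.log(p[mask]))))


# Made-up aggregate posterior q(z1, z2) over two binary latent dimensions
q_joint = np.array([[0.4, 0.1],
                    [0.1, 0.4]])
q_z1 = q_joint.sum(axis=1)   # marginal q(z1)
q_z2 = q_joint.sum(axis=0)   # marginal q(z2)

# TC(q(z)) = KL(q(z) || prod_j q(z_j))
total_correlation = kl(q_joint, np.outer(q_z1, q_z2))

# Dimension-wise KL term with an assumed uniform prior p(z_j)
prior = np.array([0.5, 0.5])
dimwise_kl = kl(q_z1, prior) + kl(q_z2, prior)

# A factorised joint has zero total correlation
q_indep = np.outer(q_z1, q_z2)
tc_indep = kl(q_indep, np.outer(q_indep.sum(axis=1), q_indep.sum(axis=0)))
```

Here the marginals already match the prior, so only the total correlation is non-zero: the two dimensions are correlated even though each one looks like the prior on its own.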
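Each object is described by a latent $z = \{z^{\text{attr}}, z^{\text{where}}, z^{\text{pres}}\}$, where $z^{\text{pres}}$ indicates whether the object is present, $z^{\text{where}}$ gives its location and $z^{\text{attr}}$ its appearance.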
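The generative model is
$$p_\theta(x) = \sum_{n=1}^{N} p_N(n) \int p^z_\theta(z \mid n)\, p^x_\theta(x \mid z)\, dz,$$
where $N$ is the maximum number of objects and $z = (z^1, z^2, \ldots, z^N)$. Generation proceeds as $n \sim p_N$, $z \sim p^z_\theta(\cdot \mid n)$ and $x \sim p^x_\theta(\cdot \mid z)$.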
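A minimal sketch of the first two steps of this generative process, assuming $p_N$ is a categorical distribution over $\{1, \ldots, N\}$ and each $z^i$ is an independent standard normal vector. The final step $x \sim p^x_\theta(\cdot \mid z)$ needs a decoder that renders the objects and is left out; `sample_air_latents` is a hypothetical name.

```python
import numpy as np


def sample_air_latents(p_n, latent_dim, rng):
    """Draw n ~ p_N and z = (z^1, ..., z^n) ~ p_theta^z(.|n).

    Assumptions for illustration: p_n[k] = p_N(n = k + 1) for n in {1, ..., N},
    and every z^i is an independent standard-normal vector of size latent_dim.
    """
    n = 1 + int(rng.choice(len(p_n), p=p_n))    # number of objects in the scene
    z = rng.standard_normal((n, latent_dim))    # one latent vector per object
    return n, z
```

For example, `sample_air_latents(np.array([0.2, 0.3, 0.5]), 4, np.random.default_rng(0))` returns a count between 1 and 3 and a matching `(n, 4)` array of latents.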
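The inference network $q_\phi$ factorizes sequentially:
$$q_\phi(z, z^{\text{pres}} \mid x) = q_\phi(z^{\text{pres}}_{n+1} = 0 \mid z^{1:n}, x) \prod_{i=1}^{n} q_\phi(z^i, z^{\text{pres}}_i = 1 \mid x, z^{1:i-1}),$$
so $z$ is inferred one object at a time, and inference stops after $n$ objects, when $z^{\text{pres}} = 0$.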
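The presence part of this factorization reduces to a product of Bernoulli probabilities: $n$ "continue" decisions followed by one "stop". The sketch assumes a hypothetical array `pres_probs` whose $i$-th entry is the inference network output $q_\phi(z^{\text{pres}}_{i+1} = 1 \mid x, z^{1:i})$, and it omits the terms for the $z^i$ themselves.

```python
import numpy as np


def air_log_q(pres_probs, n):
    """Log-probability of the presence sequence (1, ..., 1, 0) with n ones.

    pres_probs must have at least n + 1 entries (hypothetical network outputs).
    """
    log_q = np.sum(np.log(pres_probs[:n]))   # steps 1..n: an object is present
    log_q += np.log1p(-pres_probs[n])        # step n+1: stop, z_pres = 0
    return float(log_q)
```

For `pres_probs = [0.9, 0.8, 0.3]` and `n = 2` this gives $\log(0.9 \cdot 0.8 \cdot 0.7)$.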
Adding a depth variable $z^{\text{depth}}$ gives $z = \{z^{\text{attr}}, z^{\text{where}}, z^{\text{pres}}, z^{\text{depth}}\}$.
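The prior on $z^{\text{pres}}$ marginalizes over the total object count $C$:
$$P(z^{\text{pres}}_k \mid \hat z^{\text{pres}}_{1:k-1}) = \sum_{c=0}^{HW} P(z^{\text{pres}}_k \mid \hat z^{\text{pres}}_{1:k-1}, C = c)\, p(C = c \mid \hat z^{\text{pres}}_{1:k-1}),$$
where $\hat z^{\text{pres}}_{1:k-1}$ are the presence values already sampled for the previous $k - 1$ cells and $k$ is the current cell.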
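The marginalization over $C$ can be sketched in a few lines. The conditional used below is an assumption for illustration: given $C = c$, the $c$ objects are spread uniformly over the cells, so the chance that cell $k$ is occupied is the number of objects still to place divided by the number of cells still free, and $p(C = c \mid \hat z^{\text{pres}}_{1:k-1})$ follows from Bayes' rule. `next_presence_prob` is a hypothetical name.

```python
import numpy as np


def next_presence_prob(prev_pres, num_cells, prior_count):
    """P(z_pres_k = 1 | z_pres_{1:k-1}) obtained by summing over the object count C.

    prev_pres: 0/1 values already sampled for cells 1..k-1 (requires k - 1 < num_cells).
    prior_count: p(C = c) for c = 0..num_cells.
    Assumed conditional: P(z_pres_k = 1 | prefix, C = c) = (objects left) / (cells left).
    """
    prior_count = np.asarray(prior_count, dtype=float)
    posterior = np.empty_like(prior_count)
    for c, p_c in enumerate(prior_count):
        likelihood, placed = 1.0, 0
        for t, z in enumerate(prev_pres):        # probability of the observed prefix given C = c
            p_one = np.clip((c - placed) / (num_cells - t), 0.0, 1.0)
            likelihood *= p_one if z == 1 else 1.0 - p_one
            placed += z
        posterior[c] = p_c * likelihood
    posterior /= posterior.sum()                 # p(C = c | z_pres_{1:k-1})

    placed = sum(prev_pres)
    free_cells = num_cells - len(prev_pres)
    p_given_c = np.clip((np.arange(len(prior_count)) - placed) / free_cells, 0.0, 1.0)
    # sum over c of P(z_k = 1 | prefix, C = c) * p(C = c | prefix)
    return float(np.sum(p_given_c * posterior))
```

With a uniform prior over 0 to 4 objects in 4 cells and no cells visited yet, the first cell is occupied with probability 0.5; if the prior says exactly one object and it has already been placed, every later cell gets probability 0.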
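Let $X$ be the input image and $Z$ its latent representation, organized as an $H \times W$ grid of cells:
$$Z \in \mathbb{R}^{H \times W \times (\text{loc}, \text{depth}, \text{pres}, M)}, \qquad Z = \{Z_{11}, Z_{12}, \ldots, Z_{1W}, Z_{21}, \ldots, Z_{HW}\}.$$
Each cell holds
$$Z_{i,j} = \{Z^{\text{where}}_{i,j}, Z^{\text{attr}}_{i,j}, Z^{\text{depth}}_{i,j}, Z^{\text{pres}}_{i,j}\},$$
where $Z^{\text{where}}_{i,j} \in \mathbb{R}^4$ holds the box position $x, y$ and size $h, w$, $Z^{\text{attr}}_{i,j} \in \mathbb{R}^M$ is an $M$-dimensional appearance code, $Z^{\text{depth}}_{i,j} \in \mathbb{R}^1$ and $Z^{\text{pres}}_{i,j} \in \mathbb{R}^1$.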
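One way to store $Z$ is a single array of shape $(H, W, 4 + 1 + 1 + M)$ that is sliced into its four groups. The channel order and the grid and attribute sizes below are hypothetical; flattening in row-major order yields the cells in the sequence $Z_{11}, Z_{12}, \ldots, Z_{HW}$.

```python
import numpy as np

H, W, M = 8, 8, 16   # hypothetical grid size and attribute dimension
Z = np.random.default_rng(0).standard_normal((H, W, 4 + 1 + 1 + M))

# Hypothetical channel order: [where (4) | depth (1) | pres (1) | attr (M)]
z_where = Z[..., 0:4]   # box parameters (x, y, h, w) of every cell
z_depth = Z[..., 4:5]
z_pres = Z[..., 5:6]
z_attr = Z[..., 6:]

# Row-major flattening lists the cells as Z_11, Z_12, ..., Z_1W, Z_21, ..., Z_HW
Z_flat = Z.reshape(H * W, -1)
```

Slicing returns views, so the four groups share memory with `Z` and no copy is made.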
The encoder is $q_\phi(Z \mid X)$ and the decoder is $p_\theta(X \mid Z)$.
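A backbone network $f_b(\cdot)$ extracts a feature map with one feature vector per cell of $Z$:
$$E^{\text{feat}} = f_b(X), \qquad E^{\text{feat}} \in \mathbb{R}^{C_f \times H \times W},$$
where $H, W$ are the grid dimensions and $C_f$ is the number of feature channels.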
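The location variables $Z^{\text{loc}} = \{Z^{\text{where}}, Z^{\text{depth}}, Z^{\text{pres}}\}$, with $Z^{\text{loc}}_{i,j} = \{Z^{\text{where}}_{i,j}, Z^{\text{pres}}_{i,j}, Z^{\text{depth}}_{i,j}\}$ for each cell $Z_{i,j}$, are inferred by $q_{\phi^{\text{loc}}}$ from the backbone features:
$$q(Z^{\text{loc}} \mid X) = q_\phi(Z^{\text{where}}, Z^{\text{depth}}, Z^{\text{pres}} \mid f_b(X)) = q_\phi(Z^{\text{where}}, Z^{\text{depth}}, Z^{\text{pres}} \mid E^{\text{feat}}).$$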
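For a single cell at grid position $[i, j]$, write $z = Z_{i,j} = \{z^{\text{where}}, z^{\text{depth}}, z^{\text{pres}}, z^{\text{attr}}\}$. An autoregressive prior over the cells, indexed in sequence by $i$, has the form
$$p(Z) = \prod_{i}^{HW} p(z_i \mid z_{i-1}, z_{i-2}, \ldots).$$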
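The location posterior factorizes over cells, each cell conditioning only on a local context taken from $E^{\text{feat}}$:
$$q_{\phi^{\text{loc}}}(Z^{\text{loc}} \mid E^{\text{feat}}) = \prod_{i,j}^{H,W} q_{\phi^{\text{loc}}}(Z^{\text{loc}}_{i,j} \mid E^{\text{context}}_{i,j}),$$
$$E^{\text{context}}_{i,j} = \{E^{\text{feat}}_{N-i,N-j}, \ldots, E^{\text{feat}}_{N,N}, \ldots, E^{\text{feat}}_{N+i,N+j}\}.$$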
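The context $E^{\text{context}}_{i,j}$ can be gathered as a square window of the feature map around cell $(i, j)$. The window radius `r`, the zero padding at the border and the name `context_window` are assumptions made for this sketch. Because each window depends only on $E^{\text{feat}}$, all $HW$ cells can be processed in parallel.

```python
import numpy as np


def context_window(E_feat, i, j, r):
    """Return the (2r+1) x (2r+1) neighbourhood of cell (i, j) from E_feat of shape (C_f, H, W).

    Assumption for illustration: a square window centred on the cell, zero-padded at the border.
    """
    padded = np.pad(E_feat, ((0, 0), (r, r), (r, r)))   # constant (zero) padding by default
    return padded[:, i:i + 2 * r + 1, j:j + 2 * r + 1]  # cell (i, j) sits at (i + r, j + r)
```

For an interior cell the window is simply `E_feat[:, i - r:i + r + 1, j - r:j + r + 1]`; the padding only matters at the edges of the grid.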
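The variable $z^{\text{where}}$ at position $[i, j]$ of $E^{\text{feat}}$ has four components, $z^{\text{where}} = \{z^{\text{where}}_x, z^{\text{where}}_y, z^{\text{where}}_w, z^{\text{where}}_h\}$, expressed relative to its cell in the $H \times W$ grid.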
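They are decoded into a bounding box $b$:
$$b_x = \big(\sigma(z^{\text{where}}_x)(B^{\max}_x - B^{\min}_x) + B^{\min}_x + i\big)\, c_w,$$
$$b_y = \big(\sigma(z^{\text{where}}_y)(B^{\max}_y - B^{\min}_y) + B^{\min}_y + j\big)\, c_h,$$
where $\sigma$ is the sigmoid function, $B^{*}$ bound the offset of the box centre, $(i, j)$ is the cell index and $c_{*}$ is the cell size.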
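The box size is defined relative to an anchor box $A$:
$$b_w = \big(\sigma(z^{\text{where}}_w)(A^{\max}_w - A^{\min}_w) + A^{\min}_w\big)\, A_w,$$
$$b_h = \big(\sigma(z^{\text{where}}_h)(A^{\max}_h - A^{\min}_h) + A^{\min}_h\big)\, A_h,$$
where $A_w$ and $A_h$ are the width and height of the anchor box $A$; $h_{\text{obj}} \times w_{\text{obj}}$ denotes the object size.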
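A direct transcription of the box equations, assuming $(i, j)$ are the column and row index of the cell and that the same bounds are shared by the $x$ and $y$ offsets and by width and height. The default values of `b_min`, `b_max`, `a_min` and `a_max` are hypothetical.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def decode_box(z_where, i, j, cell_w, cell_h, anchor_w, anchor_h,
               b_min=-0.5, b_max=1.5, a_min=0.0, a_max=1.0):
    """Map z_where = (z_x, z_y, z_w, z_h) of cell (i, j) to a box (b_x, b_y, b_w, b_h).

    b_min/b_max bound the centre offset (in cells) and a_min/a_max bound the size
    relative to the anchor; all four defaults are hypothetical.
    """
    zx, zy, zw, zh = z_where
    bx = (sigmoid(zx) * (b_max - b_min) + b_min + i) * cell_w   # centre x, in units of cell_w
    by = (sigmoid(zy) * (b_max - b_min) + b_min + j) * cell_h   # centre y, in units of cell_h
    bw = (sigmoid(zw) * (a_max - a_min) + a_min) * anchor_w     # width
    bh = (sigmoid(zh) * (a_max - a_min) + a_min) * anchor_h     # height
    return bx, by, bw, bh
```

With `z_where = 0` every sigmoid equals 0.5, so the box is centred in its cell (for these bounds) and is half the anchor size; the sigmoid keeps every prediction inside the chosen bounds.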
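The attribute encoder $q_{\phi^{\text{attr}}}$ operates on glimpses cropped with a spatial transformer network (STN):
$$x' = \mathrm{STN}(X, T),$$
where $x'$ is the glimpse extracted from $X$ and $T$ is the transformation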
$$T(z^{\text{where}}) = \begin{bmatrix} b_w & 0 & b_x \\ 0 & b_h & b_y \\ 0 & 0 & 1 \end{bmatrix}$$
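A minimal NumPy sketch of the glimpse extraction $x' = \mathrm{STN}(X, T)$: $T$ maps a regular grid of glimpse coordinates in $[-1, 1]$ to image coordinates, and the image is sampled there with bilinear interpolation. It assumes $b_x, b_y, b_w, b_h$ have already been converted to normalized image coordinates, with $\pm 1$ on the centres of the border pixels; the function names are hypothetical. Deep-learning frameworks provide differentiable versions of the same operation (for example PyTorch's `affine_grid` and `grid_sample`).

```python
import numpy as np


def transform_matrix(bx, by, bw, bh):
    """T(z_where) with scale (bw, bh) and translation (bx, by)."""
    return np.array([[bw, 0.0, bx],
                     [0.0, bh, by],
                     [0.0, 0.0, 1.0]])


def extract_glimpse(image, T, out_h, out_w):
    """Bilinearly sample an out_h x out_w glimpse from an image of shape (H, W) or (H, W, C)."""
    img = np.atleast_3d(image).astype(float)
    H, W = img.shape[:2]
    gx, gy = np.meshgrid(np.linspace(-1.0, 1.0, out_w), np.linspace(-1.0, 1.0, out_h))
    grid = np.stack([gx.ravel(), gy.ravel(), np.ones(gx.size)])  # homogeneous glimpse coords
    src = T @ grid                                               # glimpse -> image coords
    px = (src[0] + 1.0) * 0.5 * (W - 1)                          # normalised -> pixel units
    py = (src[1] + 1.0) * 0.5 * (H - 1)
    x0 = np.clip(np.floor(px).astype(int), 0, W - 1)
    y0 = np.clip(np.floor(py).astype(int), 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    wx = np.clip(px - x0, 0.0, 1.0)[:, None]                     # outside the image, border
    wy = np.clip(py - y0, 0.0, 1.0)[:, None]                     # pixels are repeated
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).reshape(out_h, out_w, -1)
```

With `transform_matrix(0, 0, 1, 1)` and an output of the same size the glimpse reproduces the image; with `bw = bh = 0.5` it crops the central half of the image in each direction.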
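The glimpse has size $w_{\text{obj}} \times h_{\text{obj}}$. For an input image of size $H_{\text{img}} \times W_{\text{img}} \times C_{\text{img}}$, two coordinate channels $C$ are appended:
$$X' = \{X, C\}, \qquad X' \in \mathbb{R}^{H_{\text{img}} \times W_{\text{img}} \times (C_{\text{img}} + 2)}, \qquad C \in \mathbb{R}^{H_{\text{img}} \times W_{\text{img}} \times 2},$$
with $C_{i,j} = \{i, j\}$.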
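Appending the coordinate channels is a single concatenation. The sketch uses raw integer indices, exactly as $C_{i,j} = \{i, j\}$ states; rescaling them (for example to $[-1, 1]$) is a common variant that is not assumed here. `add_coord_channels` is a hypothetical name.

```python
import numpy as np


def add_coord_channels(X):
    """Return X' = {X, C} with C[i, j] = (i, j), for X of shape (H_img, W_img, C_img)."""
    H, W, _ = X.shape
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")  # row and column index maps
    C = np.stack([ii, jj], axis=-1).astype(X.dtype)
    return np.concatenate([X, C], axis=-1)
```

The extra channels let a convolution, which is otherwise translation-equivariant, see where in the image a feature is located.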
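The appearance latent $z^{\text{attr}}$ is inferred by $q_{\phi^{\text{attr}}}(z^{\text{attr}} \mid x')$ and decoded by $p^{\text{attr}}_\theta(x' \mid z^{\text{attr}})$, i.e. a VAE over the glimpse $x'$ whose KL term is weighted by $\beta$.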
$z^{\text{where}}$ is predicted for every cell of the $H \times W$ grid, giving $HW$ boxes per image.
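Each cell also has a presence variable $z^{\text{pres}} \in \{0, 1\}$, where $z^{\text{pres}} = 1$ indicates that the cell contains an object, and a depth variable $z^{\text{depth}}$ that resolves overlaps between objects, with $z^{\text{depth}} \in [0, \infty)$ and $\sigma(z^{\text{depth}}) \in [0, 1]$.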
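Each decoded glimpse $\hat x_{i,j}$ is weighted by its depth $z^{\text{depth}}$ relative to the other cells:
$$\mathrm{blend}(\hat X_{i,j}) = \hat X_{i,j} \cdot \frac{z^{\text{depth}}_{i,j}}{\sum_{k,t}^{HW} Z^{\text{depth}}_{k,t} \cdot p_\theta(z^{\text{pres}}_{k,t} \mid X)}$$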
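A sketch of the depth weighting that reads the denominator as a presence-weighted sum of depths over all $HW$ cells (an interpretation of the formula, not a confirmed implementation). Under this reading the weights satisfy $\sum_{i,j} w_{i,j}\, p_\theta(z^{\text{pres}}_{i,j} \mid X) = 1$, and a cell with a larger $z^{\text{depth}}$ receives proportionally more weight. The array shapes and function names are hypothetical.

```python
import numpy as np


def blend_weights(z_depth, pres_prob):
    """w_ij = z_depth_ij / sum_kt (z_depth_kt * p(z_pres_kt | X)), for arrays of shape (H, W).

    Assumes non-negative depths and at least one cell with z_depth * pres_prob > 0.
    """
    return z_depth / np.sum(z_depth * pres_prob)


def blend(glimpses, z_depth, pres_prob):
    """blend(X_hat_ij) = X_hat_ij * w_ij for decoded glimpses of shape (H, W, h, w, C)."""
    w = blend_weights(z_depth, pres_prob)
    return glimpses * w[:, :, None, None, None]
```

Normalizing by a single scalar keeps the operation cheap and fully parallel over cells, at the cost of not modelling occlusion pixel by pixel.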
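The priors are standard normal, $\mathcal{N}(0, 1)$, and the approximate posteriors are Gaussian:
$$z^{\text{attr}} \sim \mathcal{N}(\mu^{\text{attr}}, \sigma^{\text{attr}}), \qquad z^{\text{where}} \sim \mathcal{N}(\mu^{\text{where}}, \sigma^{\text{where}}), \qquad z^{\text{depth}} \sim \mathcal{N}(\mu^{\text{depth}}, \sigma^{\text{depth}}).$$
For $z^{\text{pres}}$, the prior is $p_\theta(z^{\text{pres}}) = \frac{1}{HW}$ and the posterior is $q_\phi(z^{\text{pres}} \mid X)$.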
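Reading $p_\theta(z^{\text{pres}}) = 1/HW$ as the prior probability that a cell contains an object, the prior expects on average $HW \cdot \frac{1}{HW} = 1$ object per image. If $q_\phi(z^{\text{pres}} \mid X)$ outputs a Bernoulli probability per cell, the KL term can be sketched as below; the Bernoulli form, the clipping constant and the example values are assumptions.

```python
import numpy as np


def kl_bernoulli(q, p, eps=1e-8):
    """Elementwise D_KL(Bernoulli(q) || Bernoulli(p)); q is clipped away from 0 and 1."""
    q = np.clip(q, eps, 1.0 - eps)
    return q * np.log(q / p) + (1.0 - q) * np.log((1.0 - q) / (1.0 - p))


H, W = 8, 8                        # hypothetical grid size
prior = 1.0 / (H * W)              # p_theta(z_pres = 1) for every cell
q_pres = np.full((H, W), 0.01)     # hypothetical encoder outputs q_phi(z_pres = 1 | X)
kl_pres = float(kl_bernoulli(q_pres, prior).sum())
```

The KL term is zero only when the encoder predicts the prior probability everywhere, so it pulls the predicted number of objects towards one per image unless the reconstruction term pays for more.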
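The decoder, $\mathrm{Decoder}(z)$, reconstructs a glimpse $\hat x_{ij}$ from $z^{\text{attr}}$ for each of the $H \times W$ cells, and $z^{\text{where}}$ places it back into the image through the inverse transformation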
$$T^{-1}(z^{\text{where}}) = \begin{bmatrix} b_w & 0 & b_x \\ 0 & b_h & b_y \\ 0 & 0 & 1 \end{bmatrix}^{-1}$$
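Because $T$ only scales and translates, its inverse has a closed form, $T^{-1} = \begin{bmatrix} 1/b_w & 0 & -b_x/b_w \\ 0 & 1/b_h & -b_y/b_h \\ 0 & 0 & 1 \end{bmatrix}$, obtained by solving $x' = b_w x + b_x$ for $x$ (and likewise for $y$). The sketch checks this against `numpy.linalg.inv` for arbitrary example values.

```python
import numpy as np


def transform_matrix(bx, by, bw, bh):
    return np.array([[bw, 0.0, bx],
                     [0.0, bh, by],
                     [0.0, 0.0, 1.0]])


bx, by, bw, bh = 0.2, -0.4, 0.25, 0.5   # arbitrary example box
T = transform_matrix(bx, by, bw, bh)
T_inv = np.linalg.inv(T)
T_inv_closed = np.array([[1.0 / bw, 0.0, -bx / bw],   # closed-form inverse of scale + translate
                         [0.0, 1.0 / bh, -by / bh],
                         [0.0, 0.0, 1.0]])
```

Using the closed form avoids a general matrix inversion per cell and is well defined whenever $b_w, b_h > 0$.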
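The $HW$ blended glimpses are placed back into the image with $T^{-1}$ and weighted by the presence probability:
$$\hat X' = \sum_{i,j}^{H,W} \mathrm{STN}\big(\mathrm{blend}(x'_{i,j}),\, T^{-1}(z^{\text{where}}_{i,j})\big) \cdot p_\theta(z^{\text{pres}}_{i,j} \mid X).$$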
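Of the inferred variables $Z^{\text{where}}$, $Z^{\text{pres}}$, $Z^{\text{attr}}$ and $z^{\text{depth}}$, the symbolic representation $\hat S$ combines the attributes $z^{\text{attr}}$ with the presence $z^{\text{pres}}$:
$$\hat S = \hat Z^{\text{pres}} \odot Z^{\text{attr}},$$
where $\odot$ denotes the element-wise product.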
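A sketch of $\hat S = \hat Z^{\text{pres}} \odot Z^{\text{attr}}$, assuming $\hat Z^{\text{pres}}$ is obtained by thresholding presence probabilities at 0.5 (the threshold and the function name are hypothetical). Cells without an object end up with all-zero attributes, so the output keeps a fixed $H \times W \times M$ shape.

```python
import numpy as np


def symbolic_representation(pres_prob, z_attr, threshold=0.5):
    """S_hat = Z_hat_pres * Z_attr (element-wise) for pres_prob (H, W) and z_attr (H, W, M)."""
    z_pres_hat = (pres_prob > threshold).astype(z_attr.dtype)  # hard 0/1 presence per cell
    return z_pres_hat[..., None] * z_attr                      # broadcast over the M attributes
```

A fixed-size output is convenient as input to a downstream agent, which can then treat the grid as a set of slots.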
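$z^{\text{where}}$
$E^{\text{feat}}$, $Z$, $X$, $p(z^{\text{pres}})$
$10^{-4}$, $28 \times 28$, $48 \times 48$, $z^{\text{attr}}$
$z^{\text{where}}_h$, $z^{\text{where}}_w$, $\mathcal{N}(5.013100487582577, 0.5)$, $\mathcal{N}(0, 1)$
$z^{\text{where}}$, $z^{\text{pres}}$, $z^{\text{depth}}$, $z^{\text{attr}}$
$z^{\text{where}}$, $z^{\text{pres}}$, $z^{\text{depth}}$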
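The absolute count error (ACE) compares the number of ground-truth objects with the number of predicted objects:
$$\mathrm{ACE} = |N_{\text{groundtruth}} - N_{\text{predicted}}|$$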
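A sketch of the metric over a batch of images, assuming $N_{\text{predicted}}$ is the number of cells with $\hat z^{\text{pres}} = 1$; averaging the per-image values (`ace.mean()`) gives a single score. The function name and the example batch are hypothetical.

```python
import numpy as np


def absolute_count_error(n_groundtruth, z_pres_hat):
    """Per-image ACE = |N_groundtruth - N_predicted| for a batch of binary presence maps.

    n_groundtruth: (B,) true object counts; z_pres_hat: (B, H, W) with values in {0, 1}.
    """
    n_true = np.asarray(n_groundtruth)
    n_pred = np.asarray(z_pres_hat).reshape(len(n_true), -1).sum(axis=1)  # objects per image
    return np.abs(n_true - n_pred)


ace = absolute_count_error([2, 0], np.array([[[1, 0], [1, 1]],
                                             [[0, 0], [0, 0]]]))
```

Here the first image has one false detection and the second is counted correctly, so `ace` is `[1, 0]`.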
$z^{\text{pres}}$
$z^{\text{attr}}$
$z^{\text{where}}$, $z^{\text{pres}}$, $z^{\text{depth}}$
$z^{\text{where}}_h$, $z^{\text{where}}_w$
$z^{\text{where}}$
$\beta$
TRITA-EECS-EX:194
www.kth.se