
Faster Unsupervised Object Detection For Symbolic Representation


Academic year: 2021


DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2020

Faster Unsupervised Object Detection For Symbolic Representation

PEIYANG SHI

KTH

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


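$(S, A, R, T, \gamma)$

$S$: set of states, $s \in S$

$A$: set of actions, $a \in A$

$T$: transition function

$R$: reward function

$\gamma$: discount factor

$$Q(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q(s', a')$$

$$V(s) = \max_{a} Q(s, a)$$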
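The recursion $Q(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q(s', a')$ can be iterated to a fixed point. The following is a minimal sketch in plain Python, not the thesis's implementation; the two-state MDP, the dictionary layout of `R` and `P`, and the function name `q_value_iteration` are all assumptions made for illustration.

```python
def q_value_iteration(states, actions, R, P, gamma, sweeps=200):
    """Repeatedly apply Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(sweeps):
        # V(s) = max_a Q(s, a), computed from the previous sweep
        V = {s: max(Q[(s, a)] for a in actions) for s in states}
        Q = {
            (s, a): R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
            for s in states
            for a in actions
        }
    return Q


# Hypothetical two-state MDP: moving from s0 to s1 pays a reward of 1.
states = ["s0", "s1"]
actions = ["stay", "move"]
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 0.0, ("s1", "move"): 0.0}
P = {("s0", "stay"): {"s0": 1.0}, ("s0", "move"): {"s1": 1.0},
     ("s1", "stay"): {"s1": 1.0}, ("s1", "move"): {"s0": 1.0}}

Q = q_value_iteration(states, actions, R, P, gamma=0.5)
V = {s: max(Q[(s, a)] for a in actions) for s in states}
```

For this toy problem the fixed point can be checked by hand: $V(s_0) = 1 + 0.5\,V(s_1)$ and $V(s_1) = 0.5\,V(s_0)$, which gives $V(s_0) = 4/3$ and $V(s_1) = 2/3$.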

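$p(x, z)$ is the joint distribution of the observation $x$ and the latent variable $z$:

$$p(x, z) = p(x \mid z)\, p(z)$$

$$p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)} = \frac{p(x \mid z)\, p(z)}{\int_z p(x \mid z)\, p(z)\, dz}$$

The posterior $p(z \mid x)$ is approximated by $q(z \mid x)$:

$$D_{KL}\big(q(z) \,\|\, p(z \mid x)\big) = \mathbb{E}_{q(z)}[\log q(z) - \log p(z \mid x)]$$

$$D_{KL}\big(q(z) \,\|\, p(z \mid x)\big) = \mathbb{E}_{q(z)}[\log q(z) - \log p(x \mid z) - \log p(z)] + \mathbb{E}_{q(z)}[\log p(x)]$$

Since $p(x)$ does not depend on $q(z)$, $\mathbb{E}_{q(z)}[\log p(x)] = \log p(x)$, and therefore

$$\log p(x) - D_{KL}\big(q(z) \,\|\, p(z \mid x)\big) = \mathbb{E}_{q(z)}[\log p(x \mid z)] + \mathbb{E}_{q(z)}[\log p(z)] - \mathbb{E}_{q(z)}[\log q(z)]$$

$$\log p(x) - D_{KL}\big(q(z) \,\|\, p(z \mid x)\big) = \mathbb{E}_{q(z)}[\log p(x \mid z)] - D_{KL}\big(q(z) \,\|\, p(z)\big)$$

Because $D_{KL} \geq 0$, the right-hand side is a lower bound on $\log p(x)$.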

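With $p$ parameterized by $\theta$ and $q$ by $\phi$:

$$\mathcal{L}(\theta, \phi, x) = \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - D_{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big)$$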
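The identity $\log p(x) - D_{KL}(q(z) \,\|\, p(z \mid x)) = \mathbb{E}_{q(z)}[\log p(x \mid z)] - D_{KL}(q(z) \,\|\, p(z))$ can be verified numerically on a model small enough to enumerate. The sketch below assumes a hypothetical binary latent $z$ and a single observed $x$; all probabilities are made-up illustrative values, and sums replace the integrals.

```python
import math

# Hypothetical toy model: binary latent z, one observed outcome x.
p_z = [0.3, 0.7]          # prior p(z)
p_x_given_z = [0.9, 0.2]  # likelihood p(x|z) of the observed x
q_z = [0.6, 0.4]          # an arbitrary approximate posterior q(z)

# Evidence p(x) = sum_z p(x|z) p(z), and exact posterior p(z|x) by Bayes' rule.
p_x = sum(pz * px for pz, px in zip(p_z, p_x_given_z))
posterior = [pz * px / p_x for pz, px in zip(p_z, p_x_given_z)]


def kl(q, p):
    """KL divergence between two discrete distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


# Left-hand side: log p(x) - KL(q(z) || p(z|x))
lhs = math.log(p_x) - kl(q_z, posterior)

# Right-hand side (the lower bound): E_q[log p(x|z)] - KL(q(z) || p(z))
elbo = sum(q * math.log(px) for q, px in zip(q_z, p_x_given_z)) - kl(q_z, p_z)
```

Both sides agree up to floating-point error, and the bound never exceeds $\log p(x)$ because the KL term on the left is non-negative.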

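$\mathbb{E}_{q(z \mid x)}[\log p_\theta(x \mid z)]$ is the reconstruction term, $\mathcal{L}_{reconst}(\theta, \phi) = \mathbb{E}_{q(z \mid x)}[\log p_\theta(x \mid z)]$. The second term of $\mathcal{L}$ is the KL divergence between $q_\phi(z \mid x)$ and the prior $p_\theta(z)$. $q(z \mid x)$ encodes $x$ into $z$, and $p(x \mid z)$ decodes $z$ back to $x$.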

$$\mathcal{L}(\theta, \phi, x, z, \beta) = \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \beta\, D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$$
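When $q_\phi(z \mid x)$ is a diagonal Gaussian and $p(z) = N(0, I)$, the KL term of the $\beta$-weighted objective has a closed form. The sketch below assumes exactly that setting; the function names are hypothetical, and `recon_log_lik` stands for an already computed estimate of $\mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)]$.

```python
import math


def gaussian_kl_to_standard_normal(mu, sigma):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return sum(0.5 * (m * m + s * s - 1.0) - math.log(s) for m, s in zip(mu, sigma))


def beta_vae_objective(recon_log_lik, mu, sigma, beta):
    """Reconstruction term minus the beta-weighted KL term (to be maximized)."""
    return recon_log_lik - beta * gaussian_kl_to_standard_normal(mu, sigma)


# Illustrative values only: one latent dimension with mean 1 and unit variance.
kl_value = gaussian_kl_to_standard_normal([1.0], [1.0])
objective = beta_vae_objective(-10.0, [1.0], [1.0], beta=4.0)
```

With $\beta = 1$ this reduces to the unweighted objective $\mathcal{L}(\theta, \phi, x)$; larger $\beta$ penalizes deviation from the prior more strongly.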

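$$\max\, [I(Z; Y) - \beta I(X; Z)]$$

$I(\cdot)$ denotes mutual information, here between $Z$ and $X$ and between $Z$ and $Y$, weighted by $\beta$.

$$\mathbb{E}_{\hat{p}(x)}\big[D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)\big] = I_{q_\phi}(x; z) + D_{KL}\big(q_\phi(z) \,\|\, p(z)\big)$$

$$\mathcal{L}_{\beta-TC} = \mathcal{L}(\theta, \phi) + \alpha\, I_{q_\phi}(z; x) + \lambda\, \mathrm{TC}(q_\phi(z)) + \gamma \sum_{j} D_{KL}\big(q_\phi(z_j) \,\|\, p(z_j)\big)$$

with weights $\alpha$, $\lambda$, $\gamma$, where $z_j$ is the $j$-th dimension of $z$.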

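$$z = \{z_{attr}, z_{where}, z_{pres}\}$$

$$p_\theta(x) = \sum_{n=1}^{N} p_N(n) \int p_\theta(z \mid n)\, p_\theta(x \mid z)\, dz$$

$z = (z^1, z^2, \ldots, z^N)$, $z \sim p^z_\theta(\cdot \mid n)$, $x \sim p^x_\theta(\cdot \mid z)$

$$q_\phi(z, z_{pres} \mid x) = q_\phi(z^{n+1}_{pres} = 0 \mid z^{1:n}, x) \prod_{i=1}^{n} q_\phi(z^i, z^i_{pres} = 1 \mid x, z^{1:i-1})$$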

$$z = \{z_{attr}, z_{where}, z_{pres}, z_{depth}\}$$

$$P(z^k_{pres} \mid \hat{z}^{1:k-1}_{pres}, C = 1) = \sum_{c=0}^{HW} P(z^k_{pres} \mid \hat{z}^{1:k-1}_{pres}, C = c)\, p(C = c \mid \hat{z}^{1:k-1}_{pres})$$

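$X$: input image; $Z$: $H \times W$ grid of latent variables.

$$Z \in \mathbb{R}^{H \times W \times (loc,\, depth,\, pres,\, M)}$$

$$Z = \{Z^{11}, Z^{12}, \ldots, Z^{H1}, Z^{21}, \ldots, Z^{HW}\}$$

$$Z^{i,j} = \{Z^{i,j}_{where}, Z^{i,j}_{attr}, Z^{i,j}_{depth}, Z^{i,j}_{pres}\}$$

$Z^{i,j}_{where} \in \mathbb{R}^4$ ($x, y, h, w$), $Z^{i,j}_{attr} \in \mathbb{R}^M$, $Z^{i,j}_{depth} \in \mathbb{R}^1$, $Z^{i,j}_{pres} \in \mathbb{R}^1$

$q_\phi(Z \mid X)$, $p_\phi(X \mid Z)$

$$E_{feat} = f_b(X)$$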

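$E_{feat} \in \mathbb{R}^{C_f \times W \times H}$

$Z_{loc}$: $Z_{where}, Z_{depth}, Z_{pres}$

$$Z^{i,j}_{loc} = \{Z^{i,j}_{where}, Z^{i,j}_{pres}, Z^{i,j}_{depth}\}$$

$$q(Z_{loc} \mid X) = q_\phi(Z_{where}, Z_{depth}, Z_{pres} \mid f_b(X)) = q_\phi(Z_{where}, Z_{depth}, Z_{pres} \mid E_{feat})$$

$$z = Z^{i,j}, \qquad z = \{z_{where}, z_{depth}, z_{pres}, z_{attr}\}$$

$$p(Z) = \prod_{i}^{HW} p(z_i \mid z_{i-1}, z_{i-2}, \ldots)$$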

$$q^{loc}_\phi(Z_{loc} \mid E_{feat}) = \prod_{i,j}^{H,W} q^{loc}_\phi(Z^{i,j}_{loc} \mid E^{i,j}_{context})$$

$$E^{i,j}_{context} = \{E^{N-i,N-j}_{feat}, \ldots, E^{N,N}_{feat}, \ldots, E^{N+i,N+j}_{feat}\}$$

$$z_{where} = \{z^x_{where}, z^y_{where}, z^w_{where}, z^h_{where}\}$$

$$b_x = \big(\sigma(z^x_{where})(B^x_{max} - B^x_{min}) + B^x_{min} + i\big)\, c_w$$

$$b_y = \big(\sigma(z^y_{where})(B^y_{max} - B^y_{min}) + B^y_{min} + i\big)\, c_h$$

$\sigma$ is the sigmoid function, $B$ bounds the offset, $(i, j)$ indexes the grid cell, and $c$ is the cell size.

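$$b_w = \big(\sigma(z^w_{where})(A^w_{max} - A^w_{min}) + A^w_{min}\big)\, A_w$$

$$b_h = \big(\sigma(z^h_{where})(A^h_{max} - A^h_{min}) + A^h_{min}\big)\, A_h$$

$A_w$, $A_h$: width and height of $A$; $h_{obj} \times w_{obj}$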

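$q^{attr}_\phi$

$$x = \mathrm{STN}(X, T)$$

where $x$ is the crop extracted from the image $X$ by a spatial transformer network (STN) with transformation $T$:

$$T(z_{where}) = \begin{bmatrix} b_w & 0 & b_x \\ 0 & b_h & b_y \\ 0 & 0 & 1 \end{bmatrix}$$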
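The matrix $T(z_{where})$ scales by $(b_w, b_h)$ and translates by $(b_x, b_y)$, and its inverse $T^{-1}$ undoes that mapping. The sketch below applies both to a single point in homogeneous coordinates $(u, v, 1)$. It is an illustration only: the helper names are hypothetical, the coordinate convention is an assumption, and an actual spatial transformer would apply the map to a whole sampling grid and interpolate pixel values.

```python
def affine_matrix(bw, bh, bx, by):
    """Build T = [[bw, 0, bx], [0, bh, by], [0, 0, 1]]."""
    return [[bw, 0.0, bx], [0.0, bh, by], [0.0, 0.0, 1.0]]


def apply_affine(T, u, v):
    """Map the homogeneous point (u, v, 1) through T and return (x, y)."""
    x = T[0][0] * u + T[0][1] * v + T[0][2]
    y = T[1][0] * u + T[1][1] * v + T[1][2]
    return x, y


def invert_affine(T):
    """Closed-form inverse of a scale-and-translate matrix (no rotation or shear)."""
    bw, bh, bx, by = T[0][0], T[1][1], T[0][2], T[1][2]
    return [[1.0 / bw, 0.0, -bx / bw], [0.0, 1.0 / bh, -by / bh], [0.0, 0.0, 1.0]]


# Illustrative box: half the width, a quarter of the height, shifted by (0.1, -0.2).
T = affine_matrix(0.5, 0.25, 0.1, -0.2)
corner = apply_affine(T, 1.0, 1.0)                  # a corner of the unit glimpse grid
restored = apply_affine(invert_affine(T), *corner)  # mapped back through T^{-1}
```

The closed-form inverse is valid only because $T$ has no rotation or shear terms, which matches the matrix given above.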

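$w_{obj} \times h_{obj}$; $H_{img}, W_{img}, C_{img}$

$$X = \{X, C\}, \quad X \in \mathbb{R}^{H_{img} \times W_{img} \times (C_{img}+2)}, \quad C \in \mathbb{R}^{H_{img} \times W_{img} \times 2}, \quad C_{i,j} = \{i, j\}$$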
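Concatenating the coordinate map $C$, with $C_{i,j} = \{i, j\}$, to the image adds two channels per pixel, giving $C_{img} + 2$ channels in total. A minimal sketch with nested Python lists follows; the function name is hypothetical, and raw integer indices are used because the source does not say whether the coordinates are normalized.

```python
def add_coordinate_channels(image):
    """Append the pixel indices (i, j) to every pixel's channel list.

    `image` is a nested list of shape H x W x C; the result is H x W x (C + 2).
    """
    return [
        [list(pixel) + [i, j] for j, pixel in enumerate(row)]
        for i, row in enumerate(image)
    ]


# Illustrative 2 x 3 single-channel image.
image = [[[0.1], [0.2], [0.3]],
         [[0.4], [0.5], [0.6]]]
augmented = add_coordinate_channels(image)
```

Each pixel now carries its own position, so a crop taken from the augmented image still records where in the full image it came from.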

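$q^{attr}_\phi(z_{attr} \mid x)$, $p^{attr}_\theta(x \mid z_{attr})$

$z_{pres} \in \{0, 1\}$, where $z_{pres} = 1$ indicates that an object is present.

$z_{depth} \in [0, \infty)$, $\sigma(z_{depth}) \in [0, 1]$

$$\mathrm{blend}(\hat{X}^{i,j}) = \hat{X}^{i,j} * z_{depth} \sum_{k,t}^{HW} Z^{k,t}_{depth} * p_\theta(z_{pres} \mid X)$$

$N(0, 1)$

$z_{attr} \sim N(\mu_{attr}, \sigma_{attr})$, $z_{where} \sim N(\mu_{where}, \sigma_{where})$, $z_{depth} \sim N(\mu_{depth}, \sigma_{depth})$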

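$$p_\theta(z_{pres}) = \frac{1}{HW}$$

$q_\phi(z_{pres} \mid X)$

$\mathrm{Decoder}(z)$: $z_{attr}$, $H \times W$, $\hat{x}^{ij}$

$$T^{-1}(z_{where}) = \begin{bmatrix} b_w & 0 & b_x \\ 0 & b_h & b_y \\ 0 & 0 & 1 \end{bmatrix}^{-1}$$

$$\sum_{i,j}^{H,W} \mathrm{STN}\big(\mathrm{blend}(\hat{x}^{i,j}),\, T^{-1}(z_{where})\big) * p_\theta(z_{pres} \mid X)$$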

$Z_{where}, Z_{pres}, Z_{attr}$; $z_{depth}$

$$S = \hat{Z}_{pres} \odot Z_{attr}$$

$E_{feat}$, $Z$, $X$, $p(z_{pres})$

1e-4; $28 \times 28$; $48 \times 48$; $z_{attr}$

$z^h_{where}$, $z^w_{where}$: $N(5.013100487582577, 0.5)$; $N(0, 1)$

$z_{where}, z_{pres}, z_{depth}$; $z_{attr}$

$$ACE = |N_{groundtruth} - N_{predicted}|$$
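ACE is the absolute difference between the number of ground-truth objects and the number of predicted objects in an image. A minimal sketch follows; averaging over a set of images is an assumption added for illustration and is not stated in the source, and both function names are hypothetical.

```python
def absolute_count_error(n_groundtruth, n_predicted):
    """ACE = |N_groundtruth - N_predicted| for a single image."""
    return abs(n_groundtruth - n_predicted)


def mean_absolute_count_error(count_pairs):
    """Average ACE over (n_groundtruth, n_predicted) pairs (assumed aggregation)."""
    errors = [absolute_count_error(gt, pred) for gt, pred in count_pairs]
    return sum(errors) / len(errors)


# Illustrative counts for three images.
pairs = [(5, 3), (2, 2), (0, 1)]
per_image = [absolute_count_error(gt, pred) for gt, pred in pairs]
average = mean_absolute_count_error(pairs)
```

Because the error is absolute, over-counting and under-counting are penalized equally.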


TRITA-EECS-EX:194

www.kth.se
