6
FPGAworld CONFERENCE
Book
2009 SEPTEMBER
EDITORS
Lennart Lindh, David Källberg, Santiago de Pablo and Vincent J. Mooney III
The FPGAworld Conference addresses aspects of digital and hardware/software system engineering
on FPGA technology. It is a discussion and network forum for students, researchers and engineers
working on industrial and research projects, state-of-the-art investigations, development and
applications. The book contains some presentations; for more information see
(www.fpgaworld.com/conference).
ISBN
978-91-977667-2-2
SPONSORS
Copyright and Reprint Permission for personal or classroom use are allowed with credit to
FPGAworld.com. For commercial or other for-profit/for-commercial-advantage uses, prior
2009 PROGRAM COMMITTEE
General Chair
Lennart Lindh, FPGAworld, Sweden
Publicity Chair
David Kallberg, FPGAworld, Sweden
Academic Programme Chair
Vincent J. Mooney III, Georgia Institute of Technology, USA, and Nanyang
Technological University, Singapore
Academic Publicity Chair
Santiago de Pablo, University of Valladolid, Spain
Academic Programme Committee Members
Ketil Roed, Bergen University College, Norway
Lennart Lindh, Jönköping University, Sweden
Adam Postula, University of Queensland, Australia
Pramote Kuacharoen, National Institute of Development Administration, Thailand
Santiago de Pablo, University of Valladolid, Spain
Industrial Programme Chair
Lennart Lindh, Jönköping University, Sweden
Industrial Programme Committee Members
Solfrid Hasund, Bergen University College
Kim Petersén, HDC, Sweden
Mickael Unnebäck, ORSoC, Sweden
Fredrik Lång, EBV, Sweden
Niclas Jansson, BitSim, Sweden
Göran Bilski, Xilinx, Sweden
Adam Edström, Elektroniktidningen, Sweden
Espen Tallaksen, Digitas, Norway
Göran Rosén, Actel, Sweden
Tommy Klevin, ÅF, Sweden
Tryggve Mathiesen, BitSim, Sweden
Fredrik Kjellberg, Net Insight, Sweden
Daniel Stackenäs, Altera, Sweden
Martin Olsson, Synective Labs, Sweden
Stefan Sjöholm, Prevas, Sweden
Ola Wall, Synplicity, Sweden
Torbjorn Soderlund, Xilinx, Sweden
Anders Enggaard, Axcon, Denmark
Doug Amos, Synplicity, UK
Guido Schreiner, The Mathworks, Germany
Stig Kalmo, Engineering College of Aarhus, Denmark
Hichem Belhadj, Actel, USA
Rolf Sylvester-Hvid, Aktuell Elektronik
This year’s conference is held in Stockholm (Sweden) and Copenhagen
(Denmark).
We try to balance student, academic and industrial presentations, exhibits
and tutorials to give our attendees a unique chance to obtain
knowledge from different perspectives.
Track A - Industrial
Track A features presentations with a focus on industrial applications. The
presenters were selected by the Industrial Programme Committee. A total of
11 papers were presented (30-minute slots).
Track B - Academic
Track B features presentations with a focus on academic papers and
industrial applications. The presenters were selected by the Academic
Programme Committee. 7 out of the 12 submitted papers were
presented (30-minute slots).
Track C - Product presentations
Track C features product presentations from our exhibitors and sponsors
(30-minute slots).
Exhibitors:
A total of 27 unique exhibitors in Stockholm and Copenhagen.
Sponsors:
6 Sponsors of lunch, coffee and snacks.
Student projects:
4-5 Master's thesis projects were presented.
Please check out the website (http://fpgaworld.com) for more information
about FPGAworld. In addition, you may contact David Källberg
(david@fpgaworld.com) for more information.
We would like to thank all of the authors for submitting their papers. We
hope that the attendees enjoyed the FPGAworld conference and that they will
come to next year's conference.
Andrew Dauman, Synopsys
10:00 - 10:30
Coffee Break, Sponsored by Synopsys
10:30 - 11:30
Exhibitors Presentation
11:30 - 12:30
Lunch Break, Sponsored by Abound Logic
12:30 - 14:30
Session Chair
Anders Enggaard
Session Chair
Session A1
Making a simple VHDL testbench -
step-by-step
Session A2
Prototyping and Verifying HDL Code
with Graphical Development Tool
Session A3
A shortcut to hardware using C - a case
study from the real world...
Session A4
FPGA Development with Altium
Designer
Session C1
The MAGIC of acquisition and generators
Bitsim
Session C2
Implementing PCI Express® In High Performance Or
Low Cost FPGAs
Silica
Session C3
A general testbench infrastructure for simple
verification
Digitas
Session C4
Bugs & Problems; - Worst Disasters through many
interesting years
Digitas
14:30 - 15:00
Coffee Break
15:00 - 17:00
Session Chair
Tryggve Mathiesen
Session Chair
Session A5
Breaking through FPGA performance
barriers
Session A6
FPGA at 40nm: A great leap
forwards...or a leap in the dark?
Session A7
Ultra-low Power FPGAs for 'Cool'
Portable Applications
Session A8
Large scale real-time data acquisition
and signal processing in SARUS
Session C5
Designing a simple OVM Testbench
Dyrberg Trading
Session C6
Save time and money by reducing FPGA-PCB revisions
- and ensure correct FPGA IO pin assignment
Nordcad
Session C7
FPGA Raptor
Abound Logic
Session C8
Products and Roadmap
09:15 - 10:00
The Impact of Reconfigurable Computing on Manycore Programming Trends
Dr. Reiner Hartenstein, professor of Computer Science at the University of Kaiserslautern
10:00 - 10:30
Sponsored by Synopsys
Coffee Break
10:30 - 11:30
Session Chair
Tryggve Mathiesen
Vincent J. Mooney III
Session Chair
Kristina Kristoffersson
Session Chair
Session A1
Breaking through FPGA Performance
Barriers
Session A2
Making a simple VHDL testbench -
step-by-step
Session B1
Design of BBN-based Framework for
Adaptive IP-reuse
Session B2
Camera and LCM IP-Cores for NIOS
SOPC System
Session C1
The MAGIC of acquisition and
generators
Bitsim
Session C2
Products and Roadmap
DINI Group
11:30 - 12:30
Lunch Break, Sponsored by Mentor Graphics
12:30 - 14:30
Session Chair
Tryggve Mathiesen
Session Chair
Johnny Öberg
Session Chair
Doug Amos
Session A3
Milkymist™
Session A4
Prototyping and Verifying HDL Code with
Graphical Development Tool
Session A5
FPGA: the Verification Platform of the
future?
Session B3
Implementing True Random Number
Generators by Overfilling the FPGA
Chip
Session B4
Combined simulation and emulation
setup for complex image processing
algorithms in VHDL
Session B5
On-Chip Transactional Memory
System for FPGAs using TCC model
Session C3
A shortcut to hardware using C - a case
study from the real world...
Bitsim
Session C4
A general testbench infrastructure for
simple verification
Digitas
Session C5
Bugs & Problems; - Worst Disasters
through many interesting years
Digitas
14:30 - 15:00
Coffee Break
15:00 - 16:30
Session Chair
Fredrik Lang
Santiago de Pablo
Session Chair
David Kallberg
Session Chair
Session A6
CASE STUDY: FPGA technology in
robotics
Session A7
FPGA at 40nm: A great leap forwards...or
a leap in the dark?
Session A8
Ultra-low Power FPGAs for 'Cool' Portable
Applications
Session B6
Power and Energy Efficiency
Evaluation for HW and SW
Implementation of nxn Matrix
Multiplication on Altera FPGAs
Session B7
Design and Implementation of a
Plesiochronous Multi-Core 4x4
Network-on-Chip FPGA Platform with
MPI HAL Support
Session C6
Live demo of an OpenRISC
processor SoC, running Linux
and showing the great
possibilities of an Open-source
system
Session C7
Designing a simple OVM Testbench
Mentor Graphics
Session C8
FPGA Raptor
Key Note Session
The Impact of Reconfigurable Computing on Manycore Programming Trends
Reiner Hartenstein
10 Sep 2009, Stockholm, Sweden
reiner@hartenstein.de
9:15 – 10:00
Teaching for Change: an early martyr
„Turing is irrelevant“
The von Neumann model
is the emulation of a tape machine
http://www.sigsoft.org/SEN/parnas.html
D. L. Parnas (keynote): „Teaching for Change“;
10th Conf. Softw. Engineering Education and Training (CSEET '97)
„The von Neumann syndrome“:
coined ~ a decade later
Prof. C.V. Ramamoorthy (UC Berkeley), SDPS 2006, San Diego, CA
Critique of von Neumann is not new:
punished for blasphemy?
(mimicking tape
on RAM)
Peter G.
Neumann
http://hartenstein.de
©2009, reiner@hartenstein.de
Outline
(1)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
Impact of the
von Neumann
Syndrome
http://www.forbes.com/forbes/1999/0531/6311070a.html
Dig more coal
--the PCs are coming
Peter W. Huber,
Mark P. Mills,
05.31.99
never run out of energy?
typical oil field operation
coal
hydro
nuclear
gas
oil
[Fatih Birol, Chief Economist IEA]. https://www.theoildrum.com/
2007:
80% crude oil coming from decline fields
natural gas: similar situation
> 30 %
~ 55 %
[Figure: production (%), axis from 0 to 100]
„6 more Saudi Arabias needed
for demand predicted for 2030“
Server
Farms
the electricity bill is a key issue
at banks of the Columbia river:
[Randy Katz: IEEE Spectrum, Febr. 2009]
at Quincy, size: 10 Am. football fields
Power consumption by internet:
x30 until 2030 if trends continue
G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8 –11 Sep 2008
Quincy Dalles Boardman
WASHINGTON
OREGON
48 MW
48 MW
power for
40,000 homes
each 6500 m² (at Dalles)
Power
Consumption
of Computers
Energy cost may overtake
IT equipment cost
in the near future
... has become an industry-wide issue:
incremental improvements are on track, but
„we may ultimately need revolutionary new solutions“
[Horst Simon, LBNL, Berkeley]
[Albert
Zomaya]
Current trends will lead to
unaffordable future operation
cost of our cyber infrastructure
(subject
of my talk)
Outline
(2)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
Single-core approach: Software Performance
[Figure: relative performance (log scale, 10^3 to 10^9) vs. year, 1970 to 2008]
free ride on Moore's Law
the burden of software performance is the task of chip designers*
*) M-&-C-created population
Rapid VLSI Design Education Revolution
1980 - 1983
E.I.S.
project
The incubator
of the free ride
on Moore‘s law
DARPA;
NSF; many
national governments;
European Union …
massive
funding:
Created the missing
designer population
(Heinz
Riesenhuber)
The End of Moore's Law
[Figure: relative performance (log scale, 10^3 to 10^9) vs. year, 1970 to 2008]
the end of the single-core era
The end of Moore's Law soon: the 20 nm wall
2005
traditional instruction-based computing
is running out of steam
[DAC'09 special session: Computation in the Post-Turing Era]
Growth beyond Moore's Law?
[Figure: relative performance (log scale, 10^3 to 10^13) vs. year, 1970 to 2030]
the end of the single-core era
number of transistors doubles every 2 years
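As a rough illustration of the doubling rule quoted on this slide, the following sketch computes the growth factor implied by "doubles every 2 years". The function name and the sample time spans are illustrative assumptions and do not come from the talk.

```python
def growth_factor(years: float, doubling_period: float = 2.0) -> float:
    """Growth factor after `years` if a quantity doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


if __name__ == "__main__":
    # Sample spans chosen only for illustration.
    for span in (2, 10, 20, 40):
        print(f"{span:2d} years -> x{growth_factor(span):,.0f}")
```

Under this rule, two decades correspond to ten doublings, which is why the slide's vertical axes are logarithmic.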
Multimedia in the Multicore Era
[Figure: relative performance vs. year, 1994 to 2030; marker: begin of the multicore era]
Multimedia Performance Needs
application      performance needs up to
Audio            800 MIPS
Graphics         11 GOPS
Video            160 GOPS
Digital TV       900 GOPS
[Pierre Paulin, MPSoC'09]
Broadband in the Multicore Era
[Figure: relative performance (MIPS) vs. year, 1994 to 2030; markers: begin of the multicore era, GSM, GPRS, EDGE, UMTS, next standard]
needed performance growing faster than Moore's law
[courtesy E. Sanchez]
ICT is at an inflection point
Senior Counselor to the U.S. Trade Representative (USTR) on strategy and negotiations.
Cheap Revolution:
„Broadband is significant at the inflection point,
prompting major market governance changes“
massive funding needed
Cowhey's & Aronson's Law:
affordable broadband
& software performance
„Future prosperity depends on network capacity,
..., efficient pricing, and flexible platforms“
handheld & living room commercially more important
than the comparatively small PC market.
requirement (MIPS) growing faster than Moore's law
[courtesy E. Sanchez]
Funding market governance changes
RUS Broadband Initiatives Program (BIP)
http://www.broadbandusa.gov/
NTIA Broadband Technology Opportunities Program (BTOP).
ARPA-E ?
EU-FP7 ?
DARPA ?
other
sources
EFRCs ? (Energy Frontier Research Centers): 777 bio $
The Recovery Act: $7.2 billion
Outline
(3)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
Multicore has been around for decades
•ACRI
•Alliant
•American Supercomputer
•Ametek
•Applied Dynamics
•Astronautics
•BBN
•CDC
•Convex
•Cray Computer
•Cray Research
•Culler-Harris
•Culler Scientific
•DAPP
•Denelcor
•Elexsi
•ETA Systems
•Evans and Sutherland
Computer
•Floating Point Systems
•Galaxy YH-1
•Goodyear Aerospace MPP
•Gould NPL
•Guiltech
•ICL
•Intel Scientific Computers
•International Parallel
Machines
Dead (Super)Computer Society
[Gordon Bell, keynote, ISCA 2000]
•MasPar
•Meiko
•Multiflow
•Myrias
•Numerix
•Prisma
•Tera
•Thinking Machines
•Saxpy
•Scientific Computer
•Systems (SCS)
•Soviet Supercomputers
•Supertek
•Supercomputer Systems
only 2 or 3 successes most in 1985-1995 - mainly research
Speed-up factors by GPGPUs (1)
The power efficiency is disputable
(up to ~150 x)
[Michael Garland, NVIDIA Research: Parallel Computing
on Manycore GPUs; IPDPS, Rome, Italy, June 25-29, 2009]
this hardware can be
used only in certain ways.
[Figure: speed-up factor (log scale, 10^0 to 10^3) by date, Jan 2007 to Jan 2010, for Imaging, Video, Bioinformatics and Numerics applications; plotted values: 146, 20, 130, 30, 100, 47, 50, 149, 18, 36]
effective only at problems
that can be solved using
stream processing.
streams provide data parallelism
*) migration from x86 singlecore
such speed-ups by GPGPUs
only for embarrassingly
parallel applications
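To make the notion of stream processing and data parallelism on this slide concrete, here is a minimal sketch in which the same kernel is applied independently to every element of a stream. The kernel, the function names and the worker count are assumptions made for illustration; a thread pool is used only to show the structure, and it is not a model of GPU performance.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(sample: float) -> float:
    """Per-element operation; it depends on no other element of the stream."""
    return 0.5 * sample + 1.0


def process_stream(samples, workers: int = 4):
    """Apply `kernel` to each sample; elements may be processed in any order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns the results in input order.
        return list(pool.map(kernel, samples))


if __name__ == "__main__":
    data = [0.0, 2.0, 4.0, 6.0]
    print(process_stream(data))
```

Because no element depends on another, the parallel result matches a plain sequential loop. Problems with dependencies between elements do not decompose this way, which is the restriction the slide points at.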
Speed-up factors by GPGPUs
(2)
CUDA ZONE pages [NVIDIA Corp.]:
non-reviewed
CUDA user submissions
http://www.nvidia.co.uk/object/cuda_home_uk.html#state=home
[Figure: speed-up factors (log scale, 10^1 to 10^3) of CUDA user submissions, grouped by application area: Cryptography, Imaging, Bioinformatics, CFD (Computational Fluid Dynamics), DCC (Digital Content Creation), Graphics, Astrophysics, DSP (Digital Signal Processing), EDA, oil & gas]
Compute Unified Device
Architecture (CUDA),
accelerates BLAS
libraries (Basic Linear
Algebra Subroutines)
Less flexible
(GPGPU tool development years earlier than for x86)
NVIDIA GeForce GTX   stream processor cores   minimum power supply recommended
275                  240                      650–680 watt
295                  480                      650–680 watt
Intel Xeon "Nehalem-EX" for servers: 8 cores
Intel Core™2 Quad (desktop PCs): 4 cores
(up to ~600 x)
Growth by Multicore
[Figure: relative performance vs. year, 1994 to 2030; marker: begin of the multicore era]
John
Hennessy:
Hastily knitted
compilers for the
heavy lifting?
e. g. automatically
parallelizing
compilation via
multi-threading,
and many other
ad-hoc solutions
“wait for current
generation of
programmers to die
off and be replaced
new types
of bugs
introduced
Outline
(4)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
[Figure: relative performance vs. year, 1994 to 2030; marker: begin of the multicore era]
Speed-up factors obtained by Software to Configware migration (up to ~30,000x)
[Figure: bar chart, speed-up factor on a log scale, per application]
Application                                 Speed-up factor
DSP and wireless
  FFT                                       100
  Reed-Solomon Decoding                     2400
  Viterbi Decoding                          400
  MAC                                       1000
Bioinformatics
  molecular dynamics simulation             88
  BLAST                                     52
  protein identification                    40
  Smith-Waterman pattern matching           288
Astrophysics
  GRAPE                                     20
Image processing, Pattern matching, Multimedia
  SPIHT wavelet-based image compression     457
  real-time face detection                  6000
  video-rate stereo vision                  900
  pattern recognition                       730
  CT imaging                                3000
DNA & protein sequencing                    8723
crypto                                      1000
DES breaking                                28514
vs. GPU: almost 50x (CUDA ZONE, Garland IPDPS'09: 200x, ~50x)
by FPGA:
intel supports direct front
side bus access by FPGAs
“... design techniques will evolve, by necessity, to satisfy the demands of
reconfigurable hardware and software programmability”. J. R. Rattner, DAC 2008
2 orders of magnitude
[Figure: the speed-up chart above, redrawn on a 10^3 to 10^6 speed-up axis, with pre-FPGA solutions added]
+ Pre-FPGA solutions (speed-up factors shown: 2000, 39.4, 160, 15000):
2-D FIR filter (no FPGA: DPLA by TU-KL*)
Lee Routing (DPLA by TU-KL*)
Grid-based DRC (no FPGA: DPLA on MoM by TU-KL*)
Grid-based DRC* („fair comparison“)
fabricated by E.I.S.
Software vs. FPGA
[Figure: relative performance (log scale, 10^0 to 10^5) vs. year, 1996 to 2010. Inset, 1990 to 2000, scale 0 to 200: SPECfp2000/MHz/Billion Transistors and 0.5 x MOPS/MHz/Billion Transistors (HP); values shown: 420, 1996, 46] [BWRC, UC Berkeley, 2004]
Benchmarks:
Moore‘s
law
does not indicate
microprocessor MIPS
?
!
Moore's law not applicable to all aspects
For multicore*:
the Law of More …
…with drastically declining
programmer productivity
*) number of cores doubles every 2 years
Massive
Energy Saving
factors: ~10%
of speedup factor
[Figure: speed-up factor chart for software to configware migration, repeated from the earlier slide]
Software
vs. FPGA
(2)
[Tarek El-Ghazawi et al.: IEEE COMPUTER, Febr. 2008]
Application                   Speed-up factor   Savings: Power   Cost   Size
DNA and Protein sequencing    8723              779              22     253
DES breaking                  28514             3439             96     1116
much less
equipment
needed
much less memory and bandwidth needed massively
saving energy
RC*: Demonstrating the intensive Impact
SGI Altix 4700 with RC 100 RASC compared to Beowulf cluster
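The earlier rule of thumb that energy saving factors are about 10% of the speed-up factor can be checked against the two rows of the table above. The sketch below only divides the reported power saving by the reported speed-up; the dictionary layout and function name are illustrative assumptions, and the numbers are those quoted on the slide.

```python
# Figures as quoted on the slide (El-Ghazawi et al., IEEE Computer, Feb. 2008).
RESULTS = {
    "DNA and protein sequencing": {"speedup": 8723, "power": 779, "cost": 22, "size": 253},
    "DES breaking": {"speedup": 28514, "power": 3439, "cost": 96, "size": 1116},
}


def power_to_speedup_ratio(entry: dict) -> float:
    """Power saving factor expressed as a fraction of the speed-up factor."""
    return entry["power"] / entry["speedup"]


if __name__ == "__main__":
    for name, entry in RESULTS.items():
        print(f"{name}: {power_to_speedup_ratio(entry):.1%} of the speed-up factor")
```

Both rows land close to the 10% mark (roughly 9% and 12%), which is consistent with the rule of thumb for these two applications only.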
Tarek
El-Ghazawi
Why such Speed-up Factors ...
... with FPGAs: a much worse technology!
massive wiring overhead
+ routing congestion growing with FPGA size
+ massive reconfigurability overhead
main reason: no von Neumann Syndrome!
more recently also:
more „platform FPGAs“
The „Reconfigurable Computing Paradox“
RC versus Multicore
RC:
speed-up often higher
by orders of magnitude
RC:
energy-efficiency often higher:
very much, or, by orders of magnitude ?
Sure !
Sure !
We need both: Multicore and RC
this is the silver bullet
Multicore:
legacy software,
control-intensive
applications, etc.
„RC“ = Reconfigurable Computing
[Figure: relative performance vs. year, 1994 to 2030; marker: end of the single-core era]
Reconfigurable Computing is indispensable!
For a Booming Multicore Era
von-Neumann-only is not the silver bullet
Outline
(5)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
CPU-centric
flat world
sequential-only
mind set –
(Aristotelian model)
typical programmer
qualification:
This
Software-centric
world model
is obsolete
CPU-“centric“ but no
hardware know-how
(kind of tunnel view)
Machine Model of the Mainframe Era
Machine
model
resources
sequencer
property
programming
source
property
programming source
register
state
40 years Software Crisis
Nathan's Law: Software is a gas.
It expands to fill its container ...
Nathan Myhrvold
… until being limited by Moore’s Law
[& Kryder’s Law]
Wirth's Law
“software is slowing faster
than hardware is accelerating“
Oct 1957
The Economist: Nov 19th 1955
In 1955, Parkinson could not have
foreseen the impact of software.
formula: bureaucracy growth independent of actual work to be done
[Niklaus Wirth]
[Cyril Northcote Parkinson]
Software criticism is not new:
F. L. Bauer 1968,
coined the term „Software Crisis“
N. N. 1995: THE STANDISH GROUP REPORT
Robert N. Charette 2005:
Why Software Fails; IEEE Spectrum, Sep 2005
Anthony Berglas 2008:
Why it is Important that Software Projects Fail
L. Savain 2006:
Why Software is bad
Peter G. Neumann 1985-2003:
216x “Inside Risks” (18 years inside back
cover of Comm_ACM)
“Software”
overhead piles up to code sizes
of astronomic dimensions
The von Neumann
Syndrome:
stands for extremely
memory-cycle-hungry instruction streams
from earlier talks:
datastream
parallelism
instruction stream
parallelism
C.V.
Ramamoorthy
“The Memory Wall”
coined by Sally McKee (& co-author)
Patterson's Law [Dave Patterson]:
bandwidth gap grows 50% / year
has reached >1000x
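As a back-of-the-envelope check of the two figures on this slide, the sketch below asks how many years of 50% annual growth are needed for a gap to exceed a given factor. The function names are assumptions for illustration, and the model assumes uniform compound growth, which the slide does not state.

```python
import math


def gap_after(years: int, annual_growth: float = 0.5) -> float:
    """Size of the gap after `years` of compound growth at `annual_growth` per year."""
    return (1.0 + annual_growth) ** years


def years_to_reach(factor: float, annual_growth: float = 0.5) -> int:
    """Smallest whole number of years after which the gap is at least `factor`."""
    return math.ceil(math.log(factor) / math.log(1.0 + annual_growth))


if __name__ == "__main__":
    n = years_to_reach(1000)
    print(f"A gap growing 50% per year passes 1000x after {n} years")
```

Under this simple model a gap of more than 1000x corresponds to somewhat less than two decades of growth at 50% per year.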
the ugliness
of this term
Machine Model of the PC Era
Machine model     resources                                      sequencer                                     state register
                  property      programming source               property      programming source
ASIC accelerator  hardwired     -                                hardwired     -
CPU               hardwired     -                                programmable  Software (instruction streams)  program counter
Application-Specific Integrated Circuit &
other accelerators: e.g. graphics processor
“the tail is wagging the dog”
[ISIS 1997, Austin, TX]
Science does not
progress continuously,
Thomas S. Kuhn 1969:
The Structure of
Scientific Revolutions
…in which the established paradigm
is overthrown and replaced.
?
?
.
?
Thomas S. Kuhn
The von Neumann paradigm?
… shortcomings in an established
paradigm produces
a crisis
Outline
(6)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
From CPU to
RPU
right now
machine model     resources                                      sequencer                                     state register
                  property      programming source               property      programming source
ASIC accelerator  hardwired     -                                hardwired     -
CPU               hardwired     -                                programmable  Software (instruction streams)  program counter
RPU accelerator   programmable  Configware (configuration code)  programmable  Flowware (data streams)         data counters
we need 2 more program sources
RPU = Reconfigurable Processing Unit
non-von-Neumann-now accelerators
are programmable!
[Thomas S. Kuhn 1969: The Structure of Scientific Revolutions]
“… in which the established paradigm is
overthrown and replaced.”
However,
not
the von Neumann paradigm
will be overthrown and replaced.
The CPU-centric world model
of Software Engineering
will be replaced
by
removing the tunnel view perspective
Thomas
Kuhn
is right !
What Revolution?
RC* outside a
CPU-centric
flat world?
For the
Multicore era
we need
a new model
(Copernican)
*) RC = Reconfigurable Computing
Program
Program Performance
„Multicore computers shift the burden of software
performance from chip designers to programmers.“
we anyway need a Software Education Revolution ...
Since People have to write code differently,
[J. Larus: Spending Moore's
Dividend; C_ACM, May 2009]
... performance drops & other problems
in moving single-core to multicore ...
... the chance to move RC* from niche to mainstream
a scenario like before the Mead-&-Conway revolution
Missing programmer population and methodology:
*) RC = Reconfigurable Computing
Embedded syst. & hdw scene have the right background
to reform the parallelism education of SW programmers
A Heliocentric CS Model
FE: Flowware Engineering
PE: Program Engineering
The Generalization of
Software Engineering —
A Twin Paradigm Dual
Dichotomy Approach.
time to space
mapping
issue
SE: Software Engineering
RPU
RPU
special
*) do not confuse
with „dataflow“!
A Multicore Submarine Model?
C is not the silver bullet: it’s inherently serial
mapping parallelism just into the time domain:
“abstracting” away the space domain is fatal
But nobody wants to
learn a new language.
There is no easy way to program in parallel
The programmer needs to understand how data flows
through cores, accelerators, interconnect and peripherals
The programmer* needs system visualization
in the space
domain, to understand performance under parallelism
The datastream model of the twin-paradigm approach
helps to understand the space domain and parallelism
*) and, especially the student
Our Contemporary Computer Machine Model
Machine model     resources                                      sequencer                                     state register
                  property      programming source               property      programming source
ASIC accelerator  hardwired     -                                hardwired     -
CPU               hardwired     -                                programmable  Software (instruction streams)  program counter
RPU accelerator   programmable  Configware (configuration code)  programmable  Flowware (data streams)         data counters
twin Paradigm Dichotomy
in CPU
in RAM
data counters of reconfigurable
address generators in
asM
(auto-sequencing) data memory blocks
Time to Space Mapping
Machine model     resources                                      sequencer                                     state register
                  property      programming source               property      programming source
ASIC accelerator  hardwired     -                                hardwired     -
CPU               hardwired     -                                programmable  Software (instruction streams)  program counter
RPU accelerator   programmable  Configware (configuration code)  programmable  Flowware (data streams)         data counters
Relativity Dichotomy
„The biggest payoff will come from Putting Old ideas into Practice and teaching people how to apply them properly.“
David Parnas
loop turns
2 pipeline
C
C1967
How to achieve acceptance
Hardware description
languages hidden
Courses tailored for
students not being
hardware-savvy
Tools usable by users
not being hardware
designers
[Courtesy Richard Newton]
„How to hide the ugliness
from the user“
[Herman Schmit]
traditional qualification in the time domain
Software Education (R)evolution:
+ lean qualification in the space domain
= lean hardware modeling qualification
at a higher level of abstraction
by simultaneous dual domain co-education:
viable methodology for dual rail education
(only a few % curricula need to be changed)
step by step, not overthrowing the SE scene
We need a Software Education Revolution
2010 - ....
The incubator
of the free ride
on Cowhey‘s &
Aronson‘s law
massive
funding
required
partially
re-write
the code
Create the
missing
programmer
population
next most
effective project in
DOS to Windows
took 10 years
Community Building Function
of the DATE Friday Workshop
Friday, March 12, 2010, 08:30 – 17:00
Friday Workshop
reiner@hartenstein.de
Software Education Revolution
for using Multicore - and RC* (SERUM-RC*)
http://www.date-conference.com
DATE-Conference, Dresden, DE:
CfP:
http://fpl.org/cfp/
*) Reconfigurable Computing
RAW 2010
17th Reconfigurable Architectures Workshop
April 19-20, 2010, Atlanta (Georgia), USA
http://www.ece.lsu.edu/vaidy/raw/
Run-Time Reconfiguration & Adaptive Computing:
Architectures, Algorithms, Technologies
http://www.ipdps.org/
24th IEEE International
Parallel and Distributed
Processing Symposium
April 19-23, 2010,
Atlanta (Georgia) USA
in conjunction with:
Manuscript due:
October 18, 2009
Notification of acceptance: December 14, 2009
Camera-ready Papers Due:
February 1, 2010
Outline
(7)
• The Power Consumption of Computing
• The Single-Core Approach
• The Multicore Scenario
• The Silver Bullet?
• A CPU-centric Flat World
• The Generalisation of Software Engineering
• Conclusions
To maintain a Booming Multicore Era:
Not without Reconfigurable Computing!
Conclusions (1)
relative
performance
possible for 2 or 3 more decades?
the end of the single-core era
additional Flowware / Configware skills are
essential qualifications for programmers.
Mead-&-Conway-style
SE Revolution toward
dual-rail education
is urgently needed
key motivation: performance and
energy consumption of programs
we need to master hetero of
all 3: Singlecore, Multicore,
& Reconfigurable Computing
massive long term
R&D funding required
like known from DARPA
A main problem: selecting (or
creating)
tools for lab courses
SERUM-RC
the key issue:
ease of use!
Conclusions
We need „une' Levée en Masses“
thank you for your
patience
Credited to be „The father of Reconfigurable Computing“ (also pre-FPGA era) [1],
EU grant (1980s), 85 million ECU (pre-€): complete EDA framework [4,5] around KARL
1981: visiting professor at UC Berkeley (& coop. w. Xerox PARC)
1983: founder of the German contribution to the Mead-&-Conway VLSI design revolution:
the multi-university „E.I.S. project“ (gov. grant: 38 million Deutschmark)
IEEE fellow, SDPS fellow, FPL fellow, best paper awards, other awards
Professor (ordinarius emeritus), TU Kaiserslautern
CV of Reiner Hartenstein
All acad. degrees from KIT, Karlsruhe Institute of Technology (his mentor: Karl Steinbuch)
Creator of KARL [2], most successful [3] trailblazer HDL before VHDL came up
[1] qu. Viktor Prasanna (with Gerald Estrin as the grandfather of Reconfigurable Computing, who proposed it in 1960 WJCC)
[4] R. Hartenstein: The History of KARL and ABL; in: J. Mermet (editor): Fundamentals and Standards in
Hardware Description Languages; ISBN 0-7923-2513-4, Kluwer (now Springer), September 1993.
also see:
http://xputers.informatik.uni-kl.de/karl/karl_history_fbi.html
[5] format-checking functional floorplan graphic editor, and textual editors, calculus-based term rewriting floorplan generator,
embedded router, automatic test generation, testability analysis, structured logic synthesis, simulator, et al. -- also see [
4]
[2] R. Hartenstein: Fundamentals of Structured Hardware Design; American Elsevier,
1977 -- Bestseller
Founder / co-founder of several international annual conference series
reiner@hartenstein.de
61
1977 & later used as a textbook at UC Berkeley (not only here)KARL: a Pascalish hardware language
[3] for users, usage details, quotations,etc.see:
http://www.fpl.uni-kl.de/staff/hartenstein/KARLUsers.html
his hobby: giving keynotes
http://hartenstein.de/keynotes.htm
Double Dichotomy
1) Paradigm Dichotomy
instruction stream: von Neumann Machine (Software-Domain)
data stream: Datastream Machine (Flowware-Domain)
2) Relativity Dichotomy
time: Procedure (Software-Domain)
space: Structure (Configware-Domain)
Relativity Dichotomy
time domain (procedure domain): von Neumann Machine
2 phases:
1) programming instruction streams
2) run time
space domain (structure domain): Datastream Machine
3 phases:
1) reconfiguration of structures
2) programming data streams
3) run time
time-iterative to space-iterative
loop transformation methodology: 70ies and later
a time to space mapping: from n time steps on 1 CPU to 1 time step on n DPUs
Often the space dimension is limited
a time to space/time mapping: from n*k time steps on 1 CPU to n time steps on k DPUs
Strip mining [D. Loveman, J-ACM, 1977]
example: bubble sort migration

time to space mapping
time domain (procedure domain): program loop; n time steps, 1 CPU
space domain (structure domain): pipeline; 1 time step, n DPUs
Bubble Sort (time algorithm): n x k time steps, 1 „conditional swap“ unit
Shuffle Sort (space algorithm): k time steps, n „conditional swap“ units
[Figure: a row of conditional swap units]
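The migration from a time algorithm to a space algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and odd-even transposition sort stands in for the slide's shuffle sort, whose exact wiring is not given here. The first routine reuses a single conditional swap unit over many time steps; the second fires a whole row of independent units in each time step.

```python
def conditional_swap(a, b):
    """One 'conditional swap' unit: pass the pair through in ascending order."""
    return (a, b) if a <= b else (b, a)


def bubble_sort_time(data):
    """Time algorithm: a single unit is reused, one compare per time step."""
    v = list(data)
    time_steps = 0
    for _ in range(len(v) - 1):
        for i in range(len(v) - 1):
            v[i], v[i + 1] = conditional_swap(v[i], v[i + 1])
            time_steps += 1
    return v, time_steps


def transposition_sort_space(data):
    """Space algorithm (assumed stand-in): a row of units fires per time step."""
    v = list(data)
    time_steps = 0
    for t in range(len(v)):
        # All units of this row touch disjoint pairs, so in hardware they
        # would operate at the same time; the inner loop only simulates that.
        for i in range(t % 2, len(v) - 1, 2):
            v[i], v[i + 1] = conditional_swap(v[i], v[i + 1])
        time_steps += 1
    return v, time_steps
```

For eight input values the sequential version spends 49 time steps on its one unit, while the row-parallel version needs 8 time steps and pays for that with more units in space.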
Flowware language example (MoPL): programming the datastream
[Figure: JPEG zigzag scan pattern over an 8 x 8 pixel map, axes x and y numbered 1 to 8, with data counters driving the HalfZigZag scan (an animation)]

*> Declarations
EastScan is
  step by [1,0]
end EastScan;
SouthScan is
  step by [0,1]
end SouthScan;
NorthEastScan is
  loop 6 times until [*,1]
    step by [1,-1]
  endloop
end NorthEastScan;
SouthWestScan is
  loop 7 times until [1,*]
    step by [-1,1]
  endloop
end SouthWestScan;
HalfZigZag is
  EastScan
  loop 3 times
    SouthWestScan
    SouthScan
    NorthEastScan
    EastScan
  endloop
end HalfZigZag;

goto PixMap[1,1]
HalfZigZag;
SouthWestScan
uturn (HalfZigZag)
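To see which address sequence such a scan description produces, the MoPL listing can be mimicked in Python. This is a rough sketch, not a MoPL implementation: the function names are hypothetical, `until` is assumed to be checked before each step, and `uturn` is assumed to replay the recorded HalfZigZag steps in reverse order.

```python
# Step vectors [dx, dy], named after the MoPL scan declarations.
EAST, SOUTH, NORTH_EAST, SOUTH_WEST = (1, 0), (0, 1), (1, -1), (-1, 1)


def scan(pos, step, times=1, until=None):
    """Take `step` up to `times` times; stop early once `until(pos)` holds."""
    taken = []
    for _ in range(times):
        if until is not None and until(pos):
            break
        pos = (pos[0] + step[0], pos[1] + step[1])
        taken.append(step)
    return pos, taken


def half_zigzag(pos):
    """Counterpart of HalfZigZag; returns the end position and the steps taken."""
    at_left = lambda p: p[0] == 1   # until [1,*]
    at_top = lambda p: p[1] == 1    # until [*,1]
    pos, steps = scan(pos, EAST)
    for _ in range(3):
        for step, times, until in (
            (SOUTH_WEST, 7, at_left),
            (SOUTH, 1, None),
            (NORTH_EAST, 6, at_top),
            (EAST, 1, None),
        ):
            pos, taken = scan(pos, step, times, until)
            steps += taken
    return pos, steps


def zigzag_addresses(start=(1, 1)):
    """Address sequence of the full scan, starting at PixMap[1,1]."""
    pos, first_half = half_zigzag(start)
    _, diagonal = scan(pos, SOUTH_WEST, 7, lambda p: p[0] == 1)
    # Assumption: uturn replays the recorded steps in reverse order.
    steps = first_half + diagonal + first_half[::-1]
    path = [start]
    for dx, dy in steps:
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path
```

Under these assumptions `zigzag_addresses()` visits each of the 64 pixel positions exactly once, from (1, 1) to (8, 8). The whole sequence is known before any data moves, which is the point of programming the datastream rather than computing addresses at run time.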
Programming model: Flowware
[Figure: stream graph with the nodes FMDemod, Split, LPF1 to LPF3, HPF1 to HPF3, Gather, Adder and Speaker. Source: MIT StreamIt]
• Pros for streaming
– Streamlined, low-overhead communication
– (More) deterministic behaviour
– Good match for many simple media-rich applications
[Pierre Paulin]
We have to find out which application types and programming models students should exercise for the flowware approach
• Cons
– control-dominated applications
– shunt yard problem
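A split and gather stream graph of this kind can be imitated with Python generators. The sketch below is not StreamIt code and does not reproduce the FM radio graph; the filter definitions (a moving average as low-pass, the residual as high-pass) and all names are assumptions chosen to keep the example small.

```python
from collections import deque
from itertools import tee


def low_pass(samples, width):
    """Moving average over the last `width` samples (a crude low-pass filter)."""
    window = deque(maxlen=width)
    for s in samples:
        window.append(s)
        yield sum(window) / len(window)


def high_pass(samples, width):
    """Sample minus its moving average (a crude high-pass filter)."""
    window = deque(maxlen=width)
    for s in samples:
        window.append(s)
        yield s - sum(window) / len(window)


def split_gather_add(samples, branches):
    """Split: copy the stream to every branch. Gather + Adder: sum the outputs."""
    copies = tee(samples, len(branches))
    outputs = [branch(copy) for branch, copy in zip(branches, copies)]
    for values in zip(*outputs):
        yield sum(values)
```

Each stage consumes and produces one sample at a time, so communication between stages is just a handoff of values, with no shared state and no control flow between filters. With `branches = [lambda s: low_pass(s, 2), lambda s: high_pass(s, 2)]` the two branch outputs add up to the input stream again.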
Flowware
derives from a generalization of systolic arrays
supports any free form of pipe network: spiral, zigzag, fork and join, and even wilder shapes, unidirectional and fully or partially bidirectional
Flowware: scheduling data streams
- FIFOs, stacks, registers, register files, RAM blocks...
Flowware means parallelism resulting from time to space migration
Ways to implement an Algorithm
• Hardware
• Software
• Configware
• mixed
[Diagram labels: von Neumann machine, datastream machine, singlecore, multicore, manycore, manycore per se, RAM-based]
Acceleration Mechanisms
• parallelism by multi bank memory architecture
• auxiliary hardware for address calculation
• address calculation before run time
• avoiding multiple accesses to the same data
• avoiding memory cycles for address computation
• optimization by storage scheme transformations
• optimization by memory architecture transformations
New boundary constraints are the limiting factor
Legacy scientific applications: predominantly sequential
The entire software ecosystem will need to evolve
(including curricula): O/S, libraries, software
development environments, compilers and languages
additional levels of parallelism: chaining, pipelining,
systolic, super-systolic, wavefront arrays
additional data structures and storage organization:
the new distributed memory discipline
old Paradigms and Methodologies
1946: Machine Paradigm (von Neumann)
1980: Datastreams (Kung, Leiserson)
1989: Anti Machine** Paradigm (TU-KL)
1990: first rDPA* (Rabaey)
1994: higher Anti Machine** Programming Language (Flowware: TU-KL)
1995: super systolic array: rDPA (Kress)
1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
1997+: Discipline of Distributed Memory Architectures (IMEC …)
1997: first automatically partitioning Configware/Software Co-Compiler (TU-KL)
*) rDPA = reconfigurable Data Path Array
**) datastream machine (flowware machine)