Research Challenges Ahead: The HiPEAC Vision 2011/2012
M. Duranton, D. Black-Schaffer, S. Yehia, K. De Bosschere
Executive Summary
1. Computing Systems: The Cornerstone of Our Civilization
1.1. Technology Push
1.2. Application Pull
1.3. Business Trends
2. The HiPEAC Core Computing Systems Challenges
2.1. Efficiency
2.2. Complexity
2.3. Dependability
3. Impact of Computing Systems on Society
4. HiPEAC Research Objectives in the European Context
4.1. Efficiency
4.1.1. Heterogeneous computing systems
4.1.2. Locality and communications management
4.2. System complexity
4.2.1. Cost-effective software for heterogeneous multi-cores
4.2.2. Cross-component/cross-layer optimization for design integration
4.2.3. Next-generation processing cores
4.3. Applications
4.3.1. Architectures for the Data Deluge
4.3.2. Reliable systems for Ubiquitous Computing
5. Conclusion
Appendices: The Roadmap In-Depth
A. Key Trends and Challenges in Computing Systems
A.1. Societal Trends and Challenges for Computing Systems
A.2. Business Trends and Challenges for Computing
A.3. Application and System Trends and Challenges for Computing Systems
A.4. New technological challenges and opportunities
B. SWOT Analysis of Computing Systems in Europe
B.1. Strengths
B.2. Weaknesses
B.3. Opportunities
B.4. Threats
C. The HiPEAC Core Computing Systems Challenges
C.1. Improving efficiency
C.2. Managing complexity
C.3. Improving dependability
D. HiPEAC Research Areas in Architecture, Compilers, and Systems
D.1. Parallelism and Programming Models
D.1.1. Locality Management
D.1.2. Compile and runtime optimizations, programmer hints, tuning
D.1.3. Runtime Systems and Adaptivity
D.2. Architecture
D.2.1. Processors, Accelerators, Heterogeneity
D.2.2. Memory Architectures
D.2.3. Interconnection Architectures
D.2.4. Reconfigurability
D.3. Compilers
D.3.1. Automatic Parallelization
D.3.2. Adaptive Compilation
D.3.3. Intelligent Optimization
D.4. Systems Software and Tools
D.4.1. Virtualization
D.4.2. Input, Output, Storage, and Networking
D.4.3. Simulation and Design Automation Tools
D.4.4. Deterministic Performance Tools
Executive Summary

Computing systems have a tremendous impact on everyday life in all domains, from the Internet to consumer electronics, transportation to manufacturing, medicine, energy, and scientific computing. In the future, computing systems will continue to be one of our most powerful tools for taking on the societal challenges shaping Europe, its values, and its global competitiveness.
The FP7 HiPEAC network of excellence is Europe’s premier organization for coordinating research, improving mobility, and enhancing visibility in the computing systems field. HiPEAC covers all computing market segments: embedded systems, general purpose computing systems, data centers and high performance computing. Created in 2004, HiPEAC today gathers over 250 leading European academic and industrial computing system researchers from about 100 universities and 50 companies in one virtual centre of excellence. To encourage computing systems innovation in Europe, HiPEAC provides collaboration grants, internships, and sabbaticals, and improves networking through the yearly HiPEAC conference, ACACES summer school, and the semiannual computing systems week.
In this roadmap document, HiPEAC leverages the broad expertise of its members to identify and analyze the key challenges for computing systems in Europe over the next decade. While advances in computing systems have been consistent and dramatic over the past fifty years, the field’s future today is less certain.
To continue to be a tool for providing new and innovative solutions, the computing systems community must face serious challenges in efficiency, complexity, and dependability.
Definite trends are emerging from upcoming societal challenges and the evolution of computing systems. First, our society is clearly experiencing a new era of data explosion in all domains. This explosion of data is particularly characterized by the variety of formats data can take (text, documents, video, photos, environment observations, etc.). Second, while connectivity during the last decade was mainly limited to wired computers and servers, we are now witnessing an explosion in connectivity. Critically, this connectivity now comprises a large variety of devices, ranging from warehouse-sized data centers for cloud and high performance computing, to mobile devices (phones, cars, planes, etc.), and all the way down to embedded sensors in the physical world and in the human body. Third, the computing domain is facing an increased demand for dependability and reliability across all fields. Many emerging applications require high levels of safety and security (healthcare, automotive, etc.) and new technologies are introducing new challenges in reliability (ubiquitous connectivity, unreliable devices, etc.).
On the market side, the leadership of the PC as driver for hardware and software development is fading, and being replaced by more mobile and consumer-oriented devices. Accordingly, the focus of development is shifting to embedded/mobile systems and cloud services. This transition is leading to major de-verticalization of the market players and a convergence of technology platforms. As a result, we are experiencing an increased diversification of the value chain and more emphasis on integration. While this encourages entry to the market, it makes product differentiation more difficult.
From a technology point of view, the “Moore’s law” of ever-increasing levels of integration fuelled performance over the past five decades. Each new technology generation doubled transistor density and increased frequency, while simultaneously reducing the power per transistor. Ever more demanding applications directly exploited these growing resources with minimal changes to the software.
However, a major paradigm shift is now taking place:
1) “Moore’s law”, while keeping pace in terms of transistor density, is now enabling only minor frequency increases and minor decreases in power dissipation per transistor. To keep increasing raw performance, the current approach is to add more processing units (multi-core processing). Unfortunately, this is far from transparent to most applications: existing software now has to be re-engineered to execute efficiently on parallel architectures. The complexity of this task is one of today’s main challenges.
2) Another important limitation is power efficiency: even if more devices can be packed on a chip, the power used by each device is no longer dropping accordingly (end of Dennard scaling). Since we are already at the power limit, it will no longer be possible to use all devices on a chip simultaneously. The resulting need to turn off functionality to meet power constraints results in “Dark Silicon”.
3) The explosion of data (the “Data Deluge”) and the increase in natural (unstructured) data from the real world (“cyber-physical systems”) is increasing computation requirements, and demanding new computing methods and storage faster than technology can keep up. Increasingly complex algorithms and systems are required to efficiently handle this new era of data.
4) As devices become smaller with each generation, the variability between devices (in terms of performance and power) increases and their reliability decreases. To continue to leverage ever-smaller devices, we must learn how to build reliable systems from unreliable, and highly variable, components.
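The arithmetic behind points 1) and 2) can be made concrete with a small sketch. The scaling factors below are purely illustrative assumptions, not measured process data; the point is only that when transistor density grows faster than per-transistor power falls, the fraction of a chip that can be active within a fixed power budget shrinks each generation:

```python
def active_fraction(generations, density_growth=2.0, power_shrink=1.4):
    """Fraction of transistors usable within a fixed chip power budget
    after `generations` technology steps (illustrative scaling factors:
    density doubles per step, per-transistor power falls only 1.4x)."""
    usable = 1.0
    for _ in range(generations):
        usable *= power_shrink / density_growth  # power supply vs. demand per step
    return usable

for g in range(5):
    print(f"after {g} generations: {active_fraction(g):.0%} of the chip active")
```

Under these assumed factors, only about a third of the chip remains usable after three generations; the rest is “dark silicon”.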
For the short and medium term, HiPEAC believes that specializing computing devices is the most promising path for dramatically improving power efficiency. This improved efficiency is needed to meet the data deluge of the 21st century.
Unfortunately, this trend will only worsen the complexity and cost of developing software for these systems. Further, the increasing need for reliability forces us to consider variability, security, and safety at all levels of the system and development cycle. In this light, HiPEAC has identified seven specific research objectives for the computing systems community:
Efficiency (with a focus on energy efficiency)
1) Heterogeneous computing systems: how can we design computer systems to maximize power efficiency and performance?
2) Locality and communications management: how do we intelligently minimize or control the movement of data to maximize power efficiency and performance?
3) Cost-effective software for heterogeneous multi-cores: how do we build tools and systems to enable developers to efficiently write software for future heterogeneous and parallel systems?
4) Cross-component/cross-layer optimization for design integration: how do we take advantage of the trend towards component-based design without losing the benefits of cross-component optimization?
5) Next-generation processor cores: how do we design processor cores for energy-efficiency, reliability, and predictability?
Dependability and applications (with a focus on their non-functional requirements)
6) Architectures for the Data Deluge: how can we tackle the growing gap between the growth of data and processing power?
7) Reliable systems for Ubiquitous Computing: how do we guarantee safety, predictability, availability, and privacy for ubiquitous systems?
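To see why locality and communications management (objective 2) ranks alongside processor design, consider the energy of moving data versus computing on it. The per-access energies below are rough, illustrative ballpark figures, not measurements of any particular chip, but the conclusion is robust: an off-chip access can cost orders of magnitude more energy than an arithmetic operation, so small changes in locality dominate the energy bill:

```python
# Illustrative per-operation energies in picojoules (assumed, not measured).
ENERGY_PJ = {
    "alu_op": 1.0,            # simple arithmetic operation
    "onchip_access": 10.0,    # on-chip SRAM/cache access
    "offchip_access": 1000.0, # off-chip DRAM access
}

def kernel_energy_pj(ops, accesses, onchip_rate):
    """Energy of a kernel doing `ops` ALU operations and `accesses`
    loads/stores, with fraction `onchip_rate` served on-chip."""
    onchip = accesses * onchip_rate
    offchip = accesses - onchip
    return (ops * ENERGY_PJ["alu_op"]
            + onchip * ENERGY_PJ["onchip_access"]
            + offchip * ENERGY_PJ["offchip_access"])

# Identical computation, different locality:
low  = kernel_energy_pj(ops=1_000_000, accesses=1_000_000, onchip_rate=0.90)
high = kernel_energy_pj(ops=1_000_000, accesses=1_000_000, onchip_rate=0.99)
print(f"90% vs 99% on-chip accesses: {low / high:.1f}x more energy")
```

With these assumed costs, raising the on-chip fraction from 90% to 99% cuts the kernel’s energy by more than 5x without touching the computation itself.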
In the longer term, it will become critical to investigate research directions breaking with the line of classical Von Neumann systems and the traditional hardware/software boundary. This includes new devices, such as dense non-volatile memories, optical interconnect, spintronics, memristors, etc., and new computing paradigms, such as bio-inspired systems, stochastic computing, swarm computing, etc. These directions all offer the promise of performing particular tasks at high efficiency levels while decreasing the impact of the constraints of the new technology nodes.
By addressing the seven specific research objectives and investigating emerging technologies, we will be able to ensure that Europe can continue to benefit from the promised growth of computing systems technology. Failure to address these challenges will significantly reduce our ability to leverage computing systems’ potential to improve global competitiveness and tackle society’s challenges.
Computing devices are universal today. All facets of public, private, and commercial life are impacted both directly and indirectly by them. Advances in computing systems are the key to the development of new domains and revolutionary technologies, such as personalized medicine, online social interaction, and immersive entertainment experiences. Indeed, computing systems are so valuable that people demand constant access and have an insatiable appetite for new devices and capabilities. In addition to creating new paradigms, computing capabilities revolutionize existing technologies. Across all of modern society, from manufacturing to agriculture, communications to energy, and social interaction to advanced science, computing systems are our primary tool for improving productivity, safety, well-being, and health. Investing in computing systems strengthens our most powerful tool for tackling the problems of today and tomorrow.
Yet today computing systems are experiencing several dramatic shifts. Technological limitations are pushing computing systems away from the ever-increasing performance of the past, while applications are pulling computing systems towards ever larger, more intensive, and more critical roles. At the same time business trends are causing widespread convergence of platforms, decoupling of design and production, and a rapid switch towards mobile embedded systems over desktop computers.
1.1. Technology Push
A decade into the 21st century, computing systems are facing a once-in-a-lifetime technical challenge: the relentless increases in raw processor speed and decreases in energy consumption of the past 50 years have come to an end. As a result, all of computing systems are being forced to switch from a focus on performance-centric serial computation to energy-efficient parallel computation. This switch is driven by the higher energy-efficiency of using many slower parallel processors instead of a single high-speed one. However, existing software is not written to take advantage of parallel processors. To benefit from new processor developments, developers must re-design and re-write large parts of their applications at astronomical cost.
Yet even the shift to universal parallelism is not enough. The increasing number of components on a chip, combined with decreasing energy scaling, is leading to the phenomenon of “dark silicon”, whereby chips have too high a power density to use all components at once. This puts an even greater emphasis on efficiency, and is driving chips to use multiple different components, each carefully optimized to efficiently execute a particular type of task. This era of heterogeneous parallel computing presents an even greater challenge for developers: now they must not only develop parallel applications, but also decide which types of processors to use for which calculations.
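The decision being pushed onto developers can be sketched in a few lines. The processing units, task kinds, and efficiency numbers below are entirely hypothetical; the sketch only shows the shape of the problem, placing each task on the unit where it runs most efficiently:

```python
# Hypothetical relative energy efficiency (ops/Joule) per (unit, task kind).
EFFICIENCY = {
    ("cpu", "control"): 1.0, ("cpu", "dense_math"): 1.0, ("cpu", "signal"): 1.0,
    ("gpu", "control"): 0.2, ("gpu", "dense_math"): 8.0, ("gpu", "signal"): 3.0,
    ("dsp", "control"): 0.3, ("dsp", "dense_math"): 2.0, ("dsp", "signal"): 10.0,
}
UNITS = ("cpu", "gpu", "dsp")

def place(task_kind):
    """Pick the unit with the best assumed efficiency for this task kind."""
    return max(UNITS, key=lambda unit: EFFICIENCY[(unit, task_kind)])

workload = ["control", "dense_math", "signal", "dense_math"]
print([(task, place(task)) for task in workload])
```

A real runtime must additionally weigh data-transfer costs, unit occupancy, and whether a unit can even be powered on within the chip’s power budget, which is precisely why this burden is too heavy to leave to application developers alone.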
1. Computing Systems: The Cornerstone of Our Civilization
Tackling these challenges requires addressing both the hardware and software sides. We must design energy-efficient systems with the right mix of heterogeneous parallel components and provide developers with the tools to effectively leverage them. Without either development, we will be unable to continue the computing growth that has so changed our society over the past 50 years. Accomplishing this will require a global reassessment of how hardware and software interact.
1.2. Application Pull
While technology is pushing computing systems towards heterogeneous parallelism for energy efficiency, applications are pulling them towards ever increasing levels of performance, connectivity, and dependability. Individuals, businesses, governments, scientists and societies alike are relying on cost-effective, robust, and ubiquitous storage, communication and processing of unprecedented volumes of data. At the same time, the demand for more intelligent processing is growing, largely due to the increasingly unstructured nature of the data now arriving from the physical world. The resulting “Data Deluge” is far out-pacing any projected advances in storage and processing capacity.
While we are struggling to cope with storing and processing the on-going data deluge, the modalities for using the information have changed dramatically. Users are exploiting ubiquitous communications to change where and how computing is done. As a result, backend processing is moving from fixed-purpose servers to general-purpose, commodity, cloud systems, and user interaction is shifting from the desktop to embedded devices, such as smartphones and tablets. This transition enables more flexible and scalable computing, but puts a much heavier emphasis on dependability and security.
As cloud and communications systems become integral to all aspects of daily life, we become dependent on them for safety-critical functions and we rely on them to protect our privacy and security. To survive this transition we need to develop techniques for building large distributed systems that can meet society’s dependability, security, and privacy requirements.
The combination of massive amounts of data, demand for intelligent processing, ubiquitous communication, and constrained system energy efficiency leads us to summarize the trend in applications as: “Data Deluge meets the Energy Wall in a Connected World.” To meet the challenges posed by these trends we need to enable storage, communications, and processing with orders of magnitude less energy than we can today, while ensuring functional dependability and information security. Accomplishing these goals requires revisiting the design of applications and the systems upon which they are built.

Critical Trends Influencing Computing Systems Today
• Data Deluge
• Intelligent Processing
• Ubiquitous Communication
• Frequency Limits
• Power Limits
• Dark Silicon
• Post-PC Devices
1.3. Business Trends
The computing systems business is likewise experiencing a range of disruptive trends. Convergence, both in hardware and software platforms, is rampant throughout the industry, with desktop processors and embedded processors merging and applications moving from local systems to commodity cloud platforms and the web. Consumers are discovering that “less is more” and are seeking improved mobility and experience over raw performance and features. Companies are de-verticalizing and spinning off parts of the value chain to improve competitiveness by increasing specialization and productivity at each level. And at the same time, the move towards open source software has opened up new collaborations across companies and nations, and ushered in a vast range of robust, low-cost tools and technologies.
These trends are putting increased pressure on companies to effectively integrate software and hardware components. De-verticalization means designers no longer control the whole value chain, and must combine components from a range of suppliers. Convergence and the “less is more” trend are forcing companies to compete on the whole product package and ecosystem, rather than the raw performance and feature list. And the availability of open source software has both lowered the barrier to entry and increased the need for product differentiation. These trends are all shifting the market from the historic leadership of computing on the desktop to a new focus on mobile devices accessing commodity cloud systems. To be competitive in this market, companies must either excel at integrating components and systems from diverse manufacturers and delivering an optimized end-user experience, or take the opposite approach and control every level of the value chain (e.g. Apple and Google).
2. The HiPEAC Core Computing Systems Challenges

Several aspects of the future of computing systems over the next several years are clear:
• Energy efficiency will force hardware to move to heterogeneous parallel systems
• The Data Deluge will drive applications towards increasing levels of real time processing of increasingly sophisticated data
• Ubiquitous computing and “less is more” will force a business focus away from the desktop towards the cloud and mobile devices
Yet these same trends lead to significant challenges:
• Heterogeneous systems are prohibitively difficult (and hence costly) to program with today’s tools
• Existing infrastructures for data processing will not scale up to meet the expected increase in data
• The focus on mobile devices and cloud processing will result in significant challenges for providing reliable services

Based on these trends and challenges, HiPEAC has identified three Core Computing Systems Challenges:
The HiPEAC Core Computing Systems Challenges
• Efficiency: Efficiency focuses on maximizing the amount of computation we can accomplish per unit of energy and for a minimum cost (both development and production), and is the key for sustaining growth in our computational capabilities.
• Complexity: Complexity identifies the need to provide tools and techniques for enabling developers of software and new hardware to leverage increasingly complex systems for increasingly complex applications.
• Dependability: Dependability encompasses the reliability and predictability needed for safety-critical systems and the security and privacy demanded for ubiquitous computing.
Each of these challenges plays an integral role for the future growth of our computing capabilities and the societal benefits we derive from them.
2.1. Efficiency

Power defines performance for all modern and future computing systems. From battery life in mobile devices to cooling capacity in large-scale data centers, the key metrics of computing systems are now Operations/Watt and Operations/Watt/Euro. Performance at any cost is no longer tenable. The future is in efficiency first, and as a result, it is essential to optimize energy usage throughout the system.

The solution to improved energy efficiency is to leverage parallel heterogeneous architectures of task-optimized processors and accelerators. By optimizing these components for specific tasks, their energy efficiency can be increased by orders of magnitude. However, specialization comes with a loss of generality. As a result, there will be a significant burden on system designers and application developers to choose the right combination of heterogeneous processors and accelerators, and to leverage them optimally in the applications.

2.2. Complexity

Complexity has a strong impact on the cost of developing computing systems. As systems and applications become more complex and distributed, the difficulties of design, implementation, verification, and maintenance are rising. The issue of complexity has come to the forefront with the move to universal parallelism, which is widely acknowledged as being too complex to expose to developers. Add to this the further complication of heterogeneity, and the resulting complexity becomes fatal for innovation and advancement. As a result, it is no longer practical to write software that fully leverages modern and future systems.

The solution to this increased complexity is to develop tools and techniques that handle the complexity and simplify development for system designers and application developers. These must span the full range from design space exploration for hardware and performance modeling for software, to runtime analysis, virtualization, optimization, debugging, and high-level programming systems. The goal is to provide a simplified interface for developing and understanding applications, a guarantee of performance portability across current and future systems, and a path for integrating legacy code. Without these capabilities, the costs of leveraging modern and future hardware will be too high, and the societal advances enabled by computing systems will stall.

2.3. Dependability

Dependability defines the safety, security, and reliability of computing systems. All safety-critical systems today are based on computing systems technology, and with the promise of increased performance, connectivity, and reduced size, such systems will play an increasing role in the future. In addition to safety-critical systems, the global accessibility of data is bringing issues of data privacy and security to the forefront. For society to benefit from the massive amounts of data available we need to ensure that individual privacy and data ownership can be respected and enforced.
But dependability is not just about the design and construction of secure and reliable systems. As technology advances, the individual devices from which systems are built are becoming less and less reliable themselves, and systems must adapt. To counter this, we must develop techniques for building systems from unreliable components without unduly sacrificing performance, efficiency, or cost.
The solution to handling increased demands for dependability must be built into all layers of the system. At the hardware layer improved predictability and security must be part of the basic architecture, while the software stack must include time and latency as first-class requirements. Tools must provide analysis and verification to guarantee correctness, and improved statistical timing models to predict system behavior. In addition, systems must work together with their hardware to adapt to failing and unreliable components, while still maintaining the required level of dependability.
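One classic pattern for building reliable systems from unreliable components, in the spirit of the adaptation described above, is triple modular redundancy: run the same computation on three units and take a majority vote. The sketch below is a minimal illustration; the fault model (at most one silently wrong replica) is an assumption:

```python
from collections import Counter

def tmr(replicas, *args):
    """Run three replicas of the same computation and majority-vote the result."""
    results = [f(*args) for f in replicas]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: more than one replica failed")
    return value

def good(x):   return x * x
def faulty(x): return x * x + 1  # models a unit with a silent error

print(tmr([good, good, faulty], 7))  # the faulty replica is outvoted
```

The price is threefold hardware and energy, which is why such redundancy must be traded off against the efficiency challenge rather than applied indiscriminately.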
3. Impact of Computing Systems on Society

We must make significant advances in all three Core Challenges to maintain the fantastic growth rates that have made computing the cornerstone of our modern civilization. If we fail in any one of them, we will risk the future advances promised by more powerful, ubiquitous, and efficient computation. To highlight the importance of these challenges for society, the table below identifies key applications and how they relate to the nine societal grand challenges as identified by the commission [ISTAG].
Energy
• Efficiency: Reduce the direct energy consumption of computing systems. High-performance for optimizing energy usage, generation, and distribution.
• Complexity: Large-scale distributed power monitoring and generation networks (e.g., smart meters).
• Dependability: Safety and reliability for generation and distribution. Privacy for personal energy consumption information while enabling aggregate analysis.

Transportation and Mobility
• Efficiency: Reduced power consumption for smarter vehicles and sensors. High-performance for design, optimization of routing and planning.
• Complexity: Large-scale networks of cars and smart roads. Optimization of goods delivery and transportation.
• Dependability: Safety of embedded vehicle systems. Reliability of global transportation optimizations. Privacy for personal location data, while enabling aggregate analysis.

Health
• Efficiency: Reduced power consumption for smarter and smaller sensors and diagnostic tools. High-performance for drug design and population analysis.
• Complexity: Large-scale systems for medical record analysis and patient monitoring.
• Dependability: Safety of embedded medical devices. Privacy for personal medical data while enabling aggregate analysis.

Home care and ageing
• Efficiency: Reduced power consumption for smarter home sensors and household robotics.
• Complexity: Large-scale systems for home monitoring. Complex robotics for human interaction.
• Dependability: Safety of embedded medical devices and household services. Privacy for personal data and monitoring.

Environment
• Efficiency: Reduced power consumption for smarter and smaller sensors. High-performance for global-scale simulation, analysis and visualization of data.
• Complexity: Large-scale systems for integrating data from networks of sensors.
• Dependability: Reliable monitoring of critical environmental markers. Privacy for personal data while enabling aggregate analysis.

Manufacturing
• Efficiency: Reduced power for portable embedded systems. High-performance for product optimization, forecasting, and efficient manufacturing.
• Complexity: Large-scale, real-time integration of data from manufacturing, distribution, and sales to enable optimized production.
• Dependability: Optimizing high-reliability with low cost, particularly in the presence of unreliable components.

Safety-critical systems
• Efficiency: Reduced power for smaller, more intelligent embedded systems. High-performance for more intelligent analysis of complex situations.
• Complexity: Verifying integration of components from multiple vendors.
• Dependability: Safety and reliability of embedded devices and safety-critical systems.

Education
• Efficiency: Reduced power for smaller systems to provide ubiquitous access to information. High-performance for more powerful and intuitive learning tools.
• Complexity: Enabling non-computing professionals to leverage computing advances through higher-level tools.
• Dependability: Protection from inappropriate content. Guarantees for secure and safe operation.

Security and defense
• Efficiency: Smaller, higher-performance tools for law-enforcement and defense.
• Complexity: Large-scale, real-time data analysis for detecting threats and patterns.
• Dependability: Security and privacy guarantees for individuals and data.
4. HiPEAC Research Objectives in the European Context

To address these challenges, HiPEAC has identified three key areas for research (efficiency, system complexity, and applications) and seven specific research objectives:

• Efficiency (with a focus on energy efficiency)
- Heterogeneous computing systems: how can we design computer systems to maximize power efficiency and performance?
- Locality and communications management: how do we intelligently minimize or control the movement of data to maximize power efficiency and performance?
• System Complexity
- Cost-effective software for heterogeneous multi-cores: how do we build tools and systems to enable developers to efficiently write software for future heterogeneous systems?
- Cross-component/cross-layer optimization for design integration: how do we take advantage of the trend towards component-based design without losing the benefits of cross-component optimization?
- Next-generation processor cores: how do we design processor cores for energy-efficiency, reliability, and predictability?
• Applications (with a focus on their non-functional requirements)
- Architectures for the Data Deluge: how can we tackle the growing gap between the growth of data and processing power?
- Reliable systems for Ubiquitous Computing: how do we guarantee safety, availability, and privacy for ubiquitous systems?

By focusing on these areas, the HiPEAC community will be able to make significant high-impact contributions to computing in Europe and in the world. These advances are necessary to enable our society in the 21st century to continue to reap the benefits of the computing systems that so revolutionized the 20th century.

4.1. Efficiency

4.1.1. Heterogeneous computing systems

The end of power scaling combined with continued increases in transistor density has put computing systems in the difficult position of having more transistors than can be turned on at once. This era of “dark silicon” leads to a focus on making the most efficient use of the transistors that are turned on at any given time. As a result, processor design is becoming heterogeneous, with large numbers of specialized cores, ASIPs (Application-Specific Instruction-set Processors) and accelerators, each optimized for energy efficiency on specific tasks.

However, this trend poses significant challenges for system design: conception and design of the accelerators, finding the right mixture of cores for current and future workloads, providing the right interconnections between different cores, managing power budgets when only a fraction of cores can be turned on at any given time, and validating that the wide variety of cores are correct.

Heterogeneity is the most viable way forward to ensure continued growth in computing systems performance without a miraculous improvement in device-level energy efficiency. Failure to enable this path will severely limit our ability to leverage future device scaling to improve performance.

• Industry: Hardware developers and integrators need to determine the right mix of processors, accelerators, and interconnect, and need to define standards for interoperability at the software and data levels. CAD tool manufacturers should deliver tools that help the developers of new accelerators.
• Academia: Explore the mix of processors, accelerators, and interconnect for future application domains and future technology nodes.

Finding the right degree of specialization/flexibility for specialization to be affordable. System-level integration of heterogeneous cores. Integration of heterogeneous IP.

• Efficiency: Choosing the right mix of processors, accelerators, and interconnect. Efficient data movement support. Efficient system integration: SIP, 3D-stacking. Tools for design and validation of domain-specific accelerators. Automated design space exploration to select the optimum hardware structures. Standardization of hardware and software interfaces. Reconfigurable cores.
• Complexity: Models for reducing/hiding heterogeneity. New approaches to reduce simulation and validation time. Standard interfaces. Virtualization support for accelerators. Hardware support for software. Shared/coherent or virtual address spaces across cores and accelerators.
• Dependability: Redundant cores for reliability. Secure cores for security. Predictable cores and memory systems for safety-critical systems. Verification of interconnects and combined functionality.

Maximum efficiency (operations/Watt) for key applications. Ease of compilation. Scalability of interconnects.

4.1.2. Locality and communications management
• Industry: Hardware vendors must provide support for ex- plicit data movement. Compiler manufacturers must ex- pose this to the application and runtime, but not require it.
• Academia: Optimizations for data movement spanning embedded to HPC. Runtime systems and compilers for intelligent data placement and movement. New concepts, architectures, and devices to enable co-location of computation and storage.
Automatic design of the optimal memory hierarchy for heterogeneous computing systems. Design of simple but effective performance models that help hardware designers and programmers to make the right decisions. Static and runtime systems for automatic intelligent data movement. Revisit the “best effort” paradigm and the memory hierarchy scheme vs. explicit scheduling. Leverage advances in new storage devices. Develop new storage devices allowing co-location of processing and storage.
• Efficiency: Optimizing data movement. Controlling hardware prefetchers/DMA engines. Intelligent runtime systems for data movement. Memory hierarchy design for both explicit and implicit data movement. Minimizing coherency overhead with explicit data movement. Cache management. New architectures for co-located computation and storage.
• Complexity: Modeling performance/energy costs of data movement. Tools to automate runtime and static data movement decisions. Tools for PGAS system data movement. Debugging and performance analysis support. Legacy code migration support. Mitigating NUMA effects and variable latency accesses.
• Dependability: Correctness guarantees for data movement with concurrency. Memory models for coherency and message passing. Handling device failures. Quality of service in virtualized/shared resource environments. Predictable latency for safety-critical systems. Impacts of shared memory resources on multi-core performance. Revisiting the “best-effort” paradigms towards a more “on-demand” model.
Percentage of data from explicitly-managed transactions.
Speedup from explicit communications. Portability of explicitly-managed memory code. Power reduction from simplifying the memory hierarchy.
• Short: Development of efficient accelerators.
• Medium: Tools for improving productivity during development.
• Long: Automatic porting of legacy applications on parallel and heterogeneous systems.
Opportunities and Potential Disruptive Technologies
New forms of computing elements (PCMOS, 3D, neuromorphic elements, etc.) and the integration of more traditional ones (FPGAs, GPUs, CGRAs, etc.) will be more common. New memory and interconnect technologies (photonics on silicon, 3D stacking, non-volatile memories, etc.) will alter the data/compute balance. Industry convergence on low-level programming systems (e.g., OpenCL) and virtualization for accelerators will increase adoption.
Lack of programming models for heterogeneous systems. Difficulty of large-scale simulations. Lack of standard benchmarks.
4.1.2. Locality and communications management
As computing systems become increasingly complex, the “distance” between processors, storage, and data is increasing.
This is not only a performance issue, as it takes time to move data, but more critically a power problem, as communication accounts for the majority of the total power in modern systems. However, experience has shown that while explicit data movement can give tremendous efficiency and performance benefits, the difficulty of manually managing data movement is prohibitively high for most developers. To effectively address these problems we must develop intelligent techniques for managing data placement and movement.
Such techniques must be designed together with hardware resources to efficiently store and transport data. This will also shift the current thinking from “best effort”, “as fast as possible” processing toward a more “on-time” model of “processing only when required”. Ultimately, the memory hierarchy should be revisited to enable co-location of computing and storage. New storage elements, if technically successful, will be a major enabler of this evolution.
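As a small illustration of the kind of locality management discussed here, the following C sketch (a hypothetical example, not drawn from the roadmap; the function names and tile size are ours) contrasts a naive matrix transpose with a cache-blocked one. Both compute the same result; the blocked version restructures the loops so that data is reused while it is still resident in cache, reducing the off-chip data movement that dominates power cost.

```c
#define N    512
#define TILE 32   /* illustrative tile size, chosen to fit in L1 cache */

/* Naive transpose: the column-strided writes to dst touch a new
 * cache line on nearly every access, maximizing data movement. */
static void transpose_naive(const double *src, double *dst, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Tiled transpose: working on TILE x TILE blocks keeps both the
 * source and destination tiles cache-resident, so each cache line
 * is moved from memory only once. */
static void transpose_tiled(const double *src, double *dst, int n) {
    for (int ii = 0; ii < n; ii += TILE)
        for (int jj = 0; jj < n; jj += TILE)
            for (int i = ii; i < ii + TILE && i < n; i++)
                for (int j = jj; j < jj + TILE && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```

Even this manual form of locality management requires the programmer to know cache sizes and layouts, which is exactly the burden the intelligent techniques above aim to remove.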
Ability to obtain high efficiency and performance from future systems. Ability to cost-effectively develop efficient software for large and complex systems.
• Short: Simple models for manual locality management.
• Long: Automatic and dynamic locality management. Use of new storage devices for co-locating storage and processing.
Opportunities and Potential Disruptive Technologies
New memory and interconnect technologies (photonic interconnect, stacked die, etc.) will alter the optimal design point for memory systems. Non-volatile memories might eventually blur the line between primary and secondary storage. Higher-level domain-specific programming systems will enable easier runtime/static analysis.
Complex access patterns remain difficult to optimize. Hardware programmability for explicitly managed communications. Lack of integrated hardware/compiler design research. Changing the mindset from “as fast as possible” to “only when necessary”.
4.2. System complexity
4.2.1. Cost-effective software for heterogeneous multi-cores
The transition to ubiquitous heterogeneous parallel processing is the path forward to tackle computing power efficiency.
However, this hardware solution comes at an enormous cost in program complexity: parallelizing applications, mapping computation to heterogeneous processors, and adapting to new systems and architectures. Today, the cost of these development activities is prohibitively high for virtually all developers, and it will only increase as systems become more parallel and more heterogeneous. To enable companies to leverage the potential of these future systems, we must develop tools that manage this complexity for the programmer. Such tools must provide simplified interfaces for writing software, guarantees of performance portability across systems, and a path for integrating legacy code.
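One existing example of such a simplified interface is directive-based parallelization, as in OpenMP. The sketch below (illustrative; the function is ours) marks a loop as parallel with a single pragma: a directive-aware compiler can distribute the iterations across cores, while a compiler without OpenMP support simply ignores the pragma and produces correct sequential code, so the same source runs on both.

```c
#include <stddef.h>

/* SAXPY: y[i] = a * x[i] + y[i]. The pragma expresses the available
 * parallelism declaratively; the compiler and runtime decide how to
 * map iterations to the underlying (possibly heterogeneous) cores. */
void saxpy(size_t n, float a, const float *x, float *y) {
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Directive-based approaches are only a first step: they still leave mapping to heterogeneous accelerators and performance portability largely to the programmer, which is the gap the tools described above must close.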
Ability to cost-effectively leverage future performance growth in computing systems for new and existing applications.
• Industry: Software developers need tools to leverage new hardware. Hardware designers need tools to make new hardware usable. Compiler developers need to standardize interfaces and extensions to make code portable and debuggable. Everyone needs to address the issue of moving legacy code to new systems.
• Academia: New programming systems and approaches. Runtime and static optimization strategies for complex architectures. Scaling from embedded SoCs to HPC. Interoperability with legacy code and systems.
Performance portability. Running code on heterogeneous devices. Co-designed virtual machines. Data movement. Runtime/static optimization and load balancing. Programmer feedback. Debugging. Correctness. Legacy code on new systems. Programming models for specialized architectures.
• Efficiency: Performance portability across different systems. Runtime/static optimization. Runtime performance monitoring and analysis. Profile-guided JIT compilation and optimization for managed languages. Runtime timing analysis for latency requirements.
• Complexity: High-level software development with performance portability across different systems. Auto-analysis and parallelization of complex loop nests. Programmer feedback for understanding performance and power. Debugging support. Modeling and predicting power and performance. Design space exploration tools to help select the best architecture and compilation options. Shared resource modeling and analysis. Integrating with legacy code and workflows.
• Dependability: Formal correctness of runtime systems and user code in the presence of concurrency. Handling device failures. Quality of service in shared/virtualized environments. Programming models for ensuring timing for safety-critical systems on heterogeneous systems and accelerators.
Percentage of peak performance/efficiency automatically achieved for a given application across multiple systems. Ease of obtaining high efficiency on different systems. Ability to integrate and accelerate legacy code.
• Short: Systems to enable understanding of existing code and assist with parallelizing. Directive-based compiler tools.
• Medium: Programming systems with a more integrated runtime and language. Providing object-oriented paradigms across accelerators. Integrated performance/power/timing modeling and optimization.
• Long: Full performance portability. Self-adapting software. New programming paradigms.
Opportunities and Potential Disruptive Technologies
Standard access layers to diverse devices (e.g., OpenCL) provide a good low-level platform for research and tools. The LLVM compiler tool chain provides a modern, accessible base for new compiler development and research. Polyhedral loop transformation frameworks are becoming mature. Heterogeneous systems are becoming standard, with CPU+GPU+video codec in nearly every device today. Domain-specific languages have the potential to accelerate adoption. New programming paradigms or self-adapting software will hide the complexity from humans, but their final behavior should remain under control (e.g., by meta-rules).
Legacy code still dominates and is hard to understand. Hard to get realistic problems into academia. No representative heterogeneous benchmark suites. Performance prediction is becoming harder with new technology.
4.2.2. Cross-component/cross-layer optimization for design integration
The decoupling of design and production, combined with increased levels of on-chip integration, is leading to system-on-chip designs with increasing numbers of components from wider varieties of vendors. In addition to the hardware components, larger portions of the software stack are being provided as components from companies or open-source projects. This complex integration leads to significant inefficiencies across the component (block-to-block) and layer (hardware/software) boundaries. To produce efficient products with this approach, we must develop tools, standards, and methodologies that enable optimization across these boundaries. Such tools must be able to understand and manipulate the interaction of hardware components, software systems, and design constraints such as timing, power, and performance, across components from different vendors.
Ability to cost-effectively design efficient products with multiple vendors’ IP. Ability to optimize across complex systems, in particular with shared resources.
• Industry: EDA tool manufacturers and IP vendors need to standardize interfaces for optimization. Software vendors need to develop systems for optimizing across library boundaries.
• Academia: New runtime and static optimization strategies for complex architectures and software.
Components are provided as black boxes, but optimization must cross the boundaries. Black boxes obfuscate the high-level behavior that is often critical for efficient optimization.
Specification of the non-functional properties of the components (e.g., temporal behavior, data pattern scheme, power profile). Multi-criteria optimization (e.g., latency plus energy).
Multi-modality optimization (e.g., software plus hardware).
Opportunities and Potential Disruptive Technologies
ARM’s European presence and customer knowledge could be a large benefit if information can be shared with researchers in a non-restrictive manner. Multiple SoC designs are ongoing in Europe. Advances in convex optimization need to be more heavily leveraged by the computing systems community.
Dramatic increase in ASIC cost reduces the number of customers for such tools. Post-place-and-route power/performance analysis is essential for accurate evaluation, but is very difficult and expensive for academic teams to accomplish.
Virtualization layers will be difficult to analyze and optimize.
4.2.3. Next-generation processing cores
Processing cores form the heart of all computing systems.
Efficiency constraints are forcing us to design systems with large numbers of task-specific (heterogeneous) cores. This future demands three things of next-generation processing cores: lower power, lower verification cost, and more intelligent reliability. For overall system efficiency, we must design efficient cores. This trend makes the complex structures of the past less attractive, and encourages a move towards simpler designs. As systems will be built of a variety of task-specific cores, we need to reduce the per-core design and verification costs. And since there will be hundreds or thousands of cores per chip, the ability to ensure reliability in the face of manufacturing variability and unreliable components becomes critical.
Ability to produce the energy-efficient systems needed to leverage the increasing numbers of available transistors. Ability to provide predictable behavior for safety-critical systems. Ability to provide reliability in the face of ever-increasing variability and failure rates in newer technologies.
• Industry: Chip and IP block designers (ARM, STMicroelectronics, ST-Ericsson, etc.) need to push efficiency and reliability while minimizing development cost. EDA tool vendors need early and accurate power/performance modeling and higher-level functional verification and design methodology.
• Academia: New efficient architectures. Verification techniques for hardware. Power/performance modeling.
Minimizing the cost of data movement within chips (register-register/cache-register/memory-cache). Optimizing computational resources for applications. Determining how much to specialize. Handling process variation. Handling hard/soft errors at smaller feature sizes. Better energy management than DVFS (Dynamic Voltage and Frequency Scaling). Providing usable architectures for compilers. Common generation of hardware and its programming stack. Higher-level hardware design tools (e.g., C++ based).
• Efficiency: Minimizing data movement within cores (register-functional unit, register-register, register-cache). Co-locating processing and storage. Optimizing computation unit design and selection. Optimizing data path widths. Accuracy/power tradeoffs at the architectural level. Custom and reconfigurable data paths/functional units.
• Complexity: Improving hardware verifiability. Enabling programmability through compiler-targetable designs. Enabling virtualization for accelerator cores. Advanced hardware performance monitoring. Enabling concurrent debugging. Supporting legacy applications and binaries. Higher-level hardware design tools (e.g., C++ based).
• Dependability: Process variability and unreliable components. Providing predictability for time-critical applications. Providing security for secure applications. Correct-by-construction design methodology. Formal proof of correct behavior for hardware and software.
Performance per Joule; performance per byte from main memory; average percent of maximum performance obtained automatically by compilers; quality degradation under process variability.
• Short: Energy-efficient data movement within cores; tools to assist in generating the hardware of computing cores and their compilers.
• Medium: Automated design space exploration tools to propose efficient architectures and compilers.
• Long: New compute engines with minimized data movement.
Opportunities and Potential Disruptive Technologies
New forms of computing elements (PCMOS, 3D, bio-inspired computing elements, etc.) and the integration of more traditional ones (FPGAs, GPUs, etc.). New memory and interconnect technologies (optical, stacked die, non-volatile, etc.) will alter the data/compute balance. New application demands will shift the focus of the cores.
Core power may be small compared to the surrounding infrastructure and data movement. Current processors are becoming more complex and harder to understand and to work with. Very few teams can design processors through place-and-route to get credible performance results. Adoption of new processors is very slow. Testing processor designs requires full system and compiler infrastructure.
4.3.1. Architectures for the Data Deluge
The world creates, stores, and processes a staggering (and increasing) amount of data today. In addition, the complexity of the data is increasing, as is the sophistication of the required processing. Yet buried within this data are key insights into business, society, health, and science. To transform this deluge of data into value requires computing infrastructures that can process it in real time. Today’s systems struggle to keep up, and projected increases in data far outstrip projected growth in processing power and storage. Addressing this divergence requires developing systems and techniques that enable us to store and process data with orders of magnitude more efficiency, and methodologies to program them to ensure real-time response.
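One well-known technique for processing data streams with orders of magnitude less storage is to summarize them in a single pass rather than retaining the raw data for batch analysis. As a hypothetical sketch (the struct and function names are ours), the following C code maintains a running mean and variance of an unbounded stream using Welford's online algorithm in constant memory:

```c
#include <stddef.h>

/* Welford's online algorithm: count, mean, and M2 (the sum of squared
 * deviations from the current mean) fully summarize the stream, so
 * statistics are available at any time in O(1) memory -- the raw data
 * never needs to be stored. */
typedef struct {
    size_t count;
    double mean;
    double m2;
} stream_stats;

void stats_update(stream_stats *s, double x) {
    s->count++;
    double delta = x - s->mean;
    s->mean += delta / (double)s->count;
    s->m2 += delta * (x - s->mean);
}

/* Population variance of everything seen so far. */
double stats_variance(const stream_stats *s) {
    return s->count > 1 ? s->m2 / (double)s->count : 0.0;
}
```

Simple summaries like this only scratch the surface; the recognition and mining workloads described in this section need far richer one-pass and approximate algorithms, but the same principle of trading stored data for compact state applies.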
Ability to extract value from the massive streams of digital data in today’s society. Ability to handle the ever-increasing data volumes of the future.
• Industry: Data centers need to improve the scalability, capacity, and performance of their processing and storage systems.
• Academia: Develop new algorithms to efficiently extract information from the data streams. Real-time, best-effort processing methodologies with statistical – or formal – guarantees of correctness and latency.
Volume of data (storage, retrieval, transportation). Processing requirements (throughput, performance). Latency requirements (guaranteeing latency, best-effort calculations, ensuring uniform latency). Scalability (power, latency).
• Efficiency: Energy-efficient processing of streaming data sets. Cost of moving data. Processing in place. Choosing the right location for processing based on current constraints (battery, communications cost). Finding new computing paradigms better suited to natural data processing (Recognition, Data-Mining, and Synthesis).
• Dependability: Enabling aggregate analysis while maintaining privacy. Reliability in the face of best-effort calculations. Enabling commodity use of cloud services to provide vendor diversity.
Volume of data processed per Joule. False positive and false negative rates for advanced recognition algorithms. Energy/quality tradeoff for best-effort calculations.
• Short: Energy-efficient architectures for data processing. Latency analysis tools.
• Medium: System development tools for minimizing latency and energy. New concepts and processing paradigms for natural data processing.
• Long: Real-time analysis of data. Accelerators for natural data processing using new computing paradigms.
Opportunities and Potential Disruptive Technologies
New memory and interconnect technologies (silicon photonics, stacked die, non-volatile memories, etc.) will alter the data/compute balance. Computation embedded in the storage system and non-volatile storage embedded in the processors. Interoperability between cloud providers may spur innovation on the backend to differentiate. New computing paradigms that are efficient for non-exact data processing, such as bio-inspired, stochastic, or probabilistic computing, will require different architectures and programming models for efficient implementation.
Large-scale data applications are not open to academics. Data is often proprietary. Evaluating real-time behavior at scale requires complex testing setups and infrastructure. Inertia to move away from classical processing approaches.
4.3.2. Reliable systems for Ubiquitous Computing
As computing systems become smaller, more powerful, and universally networked, they permeate ever deeper into all aspects of society. These systems are now essential for safety, efficiency, and social interaction, and must meet demands for higher levels of reliability. This encompasses everything from correctness and dependability of safety-critical systems, to availability of social networking services and power distribution networks, to privacy and security for personal and corporate data.
Ability to provide reliable systems in the presence of unreliable technology. Ability to continue growth in the mobile sector. Ability to ensure reliability and privacy for mobile applications and infrastructure. Ability to ensure safety for embedded infrastructures.
• Industry: Needs to develop ubiquitous computing standards. Needs to improve the reliability and the security of the software and the hardware components.
• Academia: Needs to work on techniques to automatically verify and design security and safety properties of whole systems. Needs to ensure the transfer of these techniques to real-world systems and problems.
Complexity of the systems. Perseverance of hackers. Increasing reliability problems with smaller feature sizes. Coping with the dispersion of characteristics of basic components. Moving away from the “worst case design” methodology.
• Efficiency: Load balancing over the complete system. Energy scavenging for sensor networks. Extreme low power for implanted systems. New approaches at the architecture level to dynamically detect errors of components. Moving away from the “worst case design” methodology, allowing more efficient designs while ensuring predictability.
• Complexity: Large-scale distributed systems. Correctness (timing, testability, composability) guarantees. Interoperability. Ensuring quality of service across integrated components.
• Dependability: Graceful degradation in the presence of failing components. Security and safety guarantees. Isolation of software domains.
Number of security fixes and hacks. Tolerating device variability. Tolerating device faults. Achieved utilization under safety-critical constraints.
• Short: Manually secured and verified systems.
• Medium: Semi-automatically secured and verified systems. First designs with variability and fault tolerance.
• Long: Fully automatically secured and verified systems, or correct-by-design tool chains. Self-reconfiguring systems to mitigate variability and errors.
Opportunities and Potential Disruptive Technologies
Quantum computing for security applications. Leveraging parallelism to improve deterministic execution.
Fundamental security mechanisms broken (e.g., crypto made worthless by quantum computing). Difficulty of achieving predictability on commodity processors with shared resources. Need to work at higher levels of abstraction for efficiency while still ensuring low-level reliability. Gap between theoretical work on timing properties and industrial practice.
Several aspects of the future of computing systems for the next several years are clear:
• Energy efficiency will force hardware to move to heterogeneous parallel systems
• The Data Deluge will drive applications
• Ubiquitous computing will force a shift in business towards clouds and mobile devices
Yet these same trends lead to significant challenges:
• Heterogeneous systems are prohibitively difficult to program
• Existing infrastructures for data processing will not scale up
• The focus on mobile and cloud will result in significant reliability challenges
Based on these trends and challenges, HiPEAC has identified three Core Computing Challenges:
• Efficiency: Maximizing the computation per unit of energy
• Complexity: Providing tools to enable software development for new systems
• Dependability: Ensuring reliability and predictability for ubiquitous computing.
Each of these challenges plays an integral role in the future growth of our computing capabilities and the societal benefits we derive from them. To address these challenges, HiPEAC has identified three key areas for research and seven specific research objectives:
• Efficiency (with a focus on energy efficiency)
- Heterogeneous computing systems
- Locality and communications management
• System Complexity
- Cost-effective software for heterogeneous multi-cores
- Cross-component/cross-layer optimization for design integration
- Next-generation processing cores
• Dependability and applications (with a focus on their non-functional requirements)
- Architectures for the Data Deluge
- Reliable systems for Ubiquitous Computing
It will also become increasingly critical to investigate research directions that break with classical Von Neumann systems and the hardware/software boundary in order to address these challenges.
By focusing on these areas, the HiPEAC community will be able to make significant high-impact contributions to computing in Europe. These advances are necessary to enable our society in the 21st century to continue to reap the benefits of computing systems that have so revolutionized the 20th century.
To analyze the trends and challenges facing computing systems in the beginning of the 21st century, we have considered four key stakeholders: society, business, applications, and systems technology.
A.1. Societal Trends and Challenges for Computing Systems
Computing Systems R&D helps address Europe’s key socio-economic challenges, from a lower carbon economy, to health and well-being in an ageing society, competitive businesses and manufacturing for a sustainable recovery, and learning and sharing of cultural resources [ICTWORK]. For decades to come, we consider the following nine essential societal grand challenges [ISTAG], which have deep implications for computing, and vice versa.
Energy: computing systems are both part of the growing energy problem (consuming about as much energy as civil aviation) and our single most effective tool towards its solution.
To reduce the energy consumption of computing systems we must improve their efficiency. At the same time, the use of computing systems to model, analyze, and optimize our existing and future energy production and consumption infrastructures and technologies will have an even bigger impact. To enable new advances in energy efficiency and production we must continue to improve our computational capabilities.
Computing Systems Challenges: improve energy efficiency to reduce Computing’s energy footprint; increase computational capabilities to enable better tools for modeling and design.
Transportation and mobility: Modern society depends on inexpensive, safe, and fast modes of transportation. However, transportation is an environmental hazard, average speeds are low, and tens of thousands die every year in transportation accidents. Computing is a key enabler for improving mobility by providing the technology to optimize and control traffic flows, monitor and optimize fuel usage, and provide advanced active safety features. Besides improving transportation, computing systems also help us avoid it by providing virtual interaction through email, instant messaging, and video conferencing, all of which reduce the need for physical travel.
Computing Systems Challenges: provide efficient computation to enable sophisticated processing and control; ensure ubiquitous communication to enable large-scale optimization; guarantee dependability for safety-critical operation.
Healthcare: The use of computing systems technology is essential to improve healthcare. There is a great need for devices that monitor health, assist healing processes, and identify early-stage diseases. These devices can both improve the quality of care and reduce cost. Further, as more health