Software Engineering Best Practices for
Parallel Computing Development
Vikas Patney
MASTER THESIS 2010
Postal address: Box 1026, 551 11 Jönköping
Visiting address: Gjuterigatan 5
Telephone: 036-10 10 00 (vx)
This thesis has been carried out at the School of Engineering in Jönköping within the subject area of Information Technology and Management. The work is part of the Master's programme in Information Technology.
Supervisor: Vladimir Tarasov
Examiner: (examiner's name)
Extent: 30 credits (second-cycle)
Date:
Abstract
In today's computer age, numerical simulations are replacing traditional laboratory experiments. Researchers around the world use advanced software and multiprocessor computer technology to perform experiments and analyse the simulation results to advance their respective endeavours. With the wide variety of tools and technologies available, choosing appropriate methodologies for developing simulation software can be a tedious and time-consuming task for a researcher without a computer science background. The research in this thesis addresses the use of the Message Passing Interface (MPI) with object-oriented programming techniques, discusses methodologies suitable for scientific computing, and proposes a customized software engineering development model.
Sammanfattning
In today's computer age, numerical simulations are replacing traditional laboratory experiments. Researchers around the world use advanced software and multiprocessor computer technology to perform experiments and analyse the simulation results to advance their respective endeavours. With the wide variety of tools and technologies available, choosing appropriate methodologies for developing simulation software can be a laborious and time-consuming task for a researcher without a computer science background. The research in this thesis addresses the use of MPI with object-oriented programming techniques, discusses the methods suitable for scientific computing, and proposes a customized software engineering development model.
Acknowledgements
I would like to give my special thanks to M. Yannick Ponty and M. Alain Miniussi for their kind support and guidance during my internship at Observatoire Cote d'azur.
I would like to thank M. Vladimir Tarasov for being a constant source of guidance and valuable inputs throughout my thesis work.
Also, I would like to thank my mother for being a source of inspiration and her efforts to help me attain high quality education.
Key words
Parallel Programming, MPI, Software Engineering, Multi cores, C++, Design patterns, Boost Library, Simulation, STL, Generic Programming, Templates, Partial Differential Equations
Contents
1 Introduction ... 8
1.1 BACKGROUND ... 10
1.2 PURPOSE/OBJECTIVES ... 11
1.3 LIMITATIONS ... 11
1.4 THESIS OUTLINE ... 12
2 Theoretical Background ... 13
2.1 INTRODUCTION TO SOFTWARE LIFE-CYCLE MODELS ... 13
2.1.1 Waterfall Life-Cycle Model ... 14
2.1.2 Evolutionary development ... 15
2.1.3 Component-based software engineering ... 16
2.1.4 Computer-Aided Software Engineering (CASE) ... 17
2.2 WHAT IS SIMULATION? ... 17
2.3 C++ AND ITS FEATURES ... 19
2.4 GENERIC PROGRAMMING AND ITS BENEFITS ... 19
2.5 STANDARD TEMPLATE LIBRARY (STL) ... 20
2.6 MPI (MESSAGE PASSING INTERFACE) ... 21
2.7 BOOST LIBRARIES ... 23
2.8 SOFTWARE CONFIGURATION MANAGEMENT ... 24
2.8.1 Subversion Control (SVN) ... 25
2.8.2 Mercurial ... 27
2.9 DESIGN PATTERNS ... 28
2.10 FAST FOURIER TRANSFORM (FFTW) ... 29
3 Research Methods ... 31
3.1 DEVELOPING THE STRATEGY ... 32
4 Results ... 34
4.1 CUSTOMIZED SOFTWARE MODEL ... 34
4.1.1 PSC Development Model ... 34
4.2 PROPOSED BEST PRACTICES FOR PSC DEVELOPMENT MODEL ... 37
4.2.1 Guidelines for System Development and Design ... 37
4.2.2 Guidelines for Maintenance ... 44
5 Evaluation of results ... 46
6 Conclusion and Recommendation ... 50
6.1 SUMMARY OF THE RESULTS ... 50
6.1.1 Customized Software Process Model ... 50
6.1.2 Guidelines for system development and design ... 51
6.1.3 Limitations ... 51
6.1.4 Evaluation and Recommendation for further studies ... 52
List of Figures
FIGURE 2-1: THE WATERFALL MODEL (SOMMERVILLE, 2009) ... 14
FIGURE 2-2: EVOLUTIONARY SOFTWARE DEVELOPMENT (SOMMERVILLE, 2009) ... 15
FIGURE 2-3: COMPONENT-BASED SOFTWARE ENGINEERING (SOMMERVILLE, 2009) ... 16
FIGURE 2-4: GRAPHICAL REPRESENTATION OF SCIENTIFIC COMPUTING CONCEPT (KARNIADAKIS & KIRBY II, 2003) ... 18
FIGURE 2-5: REPOSITORIES SVN [17] ... 26
FIGURE 3-1: STEPS IN CONSTRUCTIVE RESEARCH (LASSENIUS, 2001) ... 33
FIGURE 4-1: PSC DEVELOPMENT METHODOLOGY ... 36
FIGURE 4-2: DIAGRAMMATIC VIEW OF SINGLETON PATTERN ... 39
FIGURE 4-3: STANDARD VIEW OF TEMPLATE METHOD PATTERN (SHALLOWAY AND TROTT, 2004) ... 41
FIGURE 4-4: SCHEMATIC DIAGRAM FOR SPMD COMPUTATION (KARNIADAKIS AND M. KIRBY II, 2003) ... 42
FIGURE 5-1: THEORETICAL EVALUATION PROCESS UNDERTAKEN IN THIS STUDY ... 46
List of Abbreviations
Generic Programming (GP)
Software Engineering (SE)
Standard Template Library (STL)
Message Passing Interface (MPI)
Computational Fluid Dynamics (CFD)
Partial Differential Equations (PDE)
Component-based Software Engineering (CBSE)
Observatoire Cote d'azur (OCA)
1 Introduction
Advanced computing has made it possible to obtain highly accurate computational results, allowing emerging industries to cut costs effectively while pursuing different research topics to advance technologically and financially. With the popularization of parallel computing technologies, there is great demand for supercomputers enabling high computation speeds. To attain such computing goals, we need intelligent software that permits effective implementation and achieves maximum efficiency. The costs involved in deploying supercomputers are high, but the costs of preparing, deploying and implementing the software for such machines are even higher. Hence, we need to follow software engineering procedures and methodologies that enable a consistent software development process. Software engineering aims at producing high-performance, fault-free software that is delivered within the specified time period and budget and satisfies client requirements. The software should also be easy to maintain and modify when requirements change (Schach, 2008).
The need for powerful computers that can provide users with almost a trillion operations per second has become widespread. The increasing use of parallel microprocessors has brought with it the trend of parallel programming. Building such complex hardware is a challenge in itself, but an even bigger challenge is the efficient use of multi-core platforms. MPI can be used to coordinate multiple parallel processors, assigning the correct task to each one and resulting in faster computation.
The use of parallel supercomputers is now common in science and engineering. The traditional approach of "cut-and-try" is being replaced by "simulate-and-analyse", changing the methods of scientific analysis and engineering design (Karniadakis & Kirby II, 2003). Modern scientists and engineers prefer numerical simulations to physical laboratory experiments. Numerical simulations are performed using a single computer or a cluster, which provides experimental settings for solving and analysing complex Partial Differential Equations (PDEs). In classical science we dealt with matter and atoms; in simulations we deal with information in bits (Karniadakis & Kirby II, 2003). The combination of advanced scientific concepts and engineering methodologies can be considered simulation science. Researchers implementing numerical simulations without a computer science background may be unaware of the current software methodologies and technologies in this field. This study aims to provide software engineering methodologies specific to numerical simulations and to discuss the usage of parallel programming in the aforementioned context.
The author participated in Project Cubby (Cubby, 2010) at the OCA labs, where the main idea of this research originated. Project Cubby is one of the main projects currently under development at Observatoire Cote d'azur, focusing on numerical simulation and interpretation of the resulting data. The project was initiated in 2004; its aim is to capture the behaviour of a fluid in a cubic periodic domain using pseudo-spectral methods. The Cubby code was initially written in C, but over the past years it has been redeveloped in C++ using advanced programming concepts, and it is well documented, which facilitates understanding of the code.
Cubby uses MPI for achieving parallelism. MPI is a de facto standard library that allows effective communication between the nodes operating on the cubic domain. The Boost libraries further assist in achieving high-performance MPI communication and enable data transfer between processes within the same or different groups (Cubby, 2010). The author participated in the development of the pseudo-spectral code for solving Partial Differential Equations (PDEs). The goal of Project Cubby is the development of a toolbox able to solve PDEs with pseudo-spectral methods in various geometries using parallel computing (Cubby, 2010).
Parallel programming is a comparatively young topic that is gaining popularity with emerging hardware technologies. Due to its performance-boosting features, there are numerous practical implementations of parallel programming. This study, however, covers simulation and the solution of complex mathematical problems such as Partial Differential Equations (PDEs) using parallel programming. It is very important that the right strategy is applied in the design and implementation of parallel algorithms.
1.1 Background
In the past, there have been multiple research studies focusing on parallel programming for producing simulation software. Some of the previous related works that inspired this study include an object-oriented methodology for massively parallel programming (Kilian, 1992), concurrency and parallel methods for multi-core platforms (Jansson & Hellberg, 2010), the parallelism shift and C++'s memory model (Torp, 2008), and real-time particle-based fluid simulation (Auer, 2008). These studies contain detailed technical knowledge on parallel programming and fluid dynamics. However, a detailed treatment of software engineering methodologies and of how to effectively manage and execute a complex simulation solving Partial Differential Equations was found to be missing. Hence, this study addresses that issue by proposing a customized software engineering methodology and listing best practices from the software industry to give a non-computer scientist a head start in planning and executing simulation projects.
Object-oriented programming allows us to treat concepts as objects that receive data, perform operations on it and interact with other objects. These objects usually model real-world entities containing data, and this data can be manipulated through the underlying functions. This makes complex code easier to manage and reuse, compared to the classical approach of long sequences of instructions. For our purpose of solving complex numerical problems, we use the object-oriented paradigm, with its concepts of classes and objects, to perform data manipulation. Object-oriented paradigms find application in designing complex numerical simulations that utilize multi-core processors to generate accurate results (Garrido, 2009). In this study, we discuss object-oriented programming techniques and recommend best practices for producing parallel computing simulation software. A strong emphasis has been placed on supporting experts from non-computer science backgrounds, discussing the necessity of software engineering methods and their application to extract maximum efficiency. There is a scientific need for a study that fills this knowledge gap and effectively minimizes time, effort and expenditure in the context of producing complex simulation software. Software engineering is a vast subject, and covering the whole of it is not the aim of this study. Instead, we discuss some of the best practices that the author finds highly relevant based on project experience at the Observatoire Cote d'azur (OCA) labs. Following the guidelines in this study will allow non-computer scientists, such as researchers from physics, to effectively manage and execute numerical simulations using parallelization techniques.
1.2 Purpose/Objectives
The aim of this study is to identify and elaborate software engineering methodologies and techniques for numerical simulation software using MPI. The numerical simulations in question are focused on solving Partial Differential Equations in the field of Computational Fluid Dynamics (CFD). The research questions investigated in this study are:
1. Is it appropriate to choose MPI and object-oriented programming techniques for developing numerical simulations, compared to traditional methods used on legacy systems?
2. Which software engineering methodologies, tools and software industry’s best practices could be used specific to these numerical simulation projects and why?
In order to achieve the purpose of the research topic, the following objectives must be fulfilled:
• Propose a customized software development model
• Provide software engineering best practices for developing efficient parallel computing simulation software
• Evaluate the approach used for developing the simulation software in Project Cubby
1.3 Limitations
Computations for numerical simulations have been prevalent for many years, but their practical implementation is mostly limited to research studies. The main input for this study is an analysis of the role of software engineering in the numerical simulation used in the Cubby project at Observatoire Cote d'azur. We focus on synchronizing the theoretical views (literature review) with the actual implementation (empirical study). This study proposes a customized software process model and tools and suggests best practices. Therefore, the propositions and results of this study should not be treated as an absolute solution for numerical simulations. There may be alternative implementations that fit better depending on the available hardware, the equations, and other factors. The proposed solutions do not provide a step-by-step guide for implementing numerical simulation software; rather, they assist developers in better managing software projects to achieve efficacy.
1.4 Thesis outline
The first chapter gives an overview of the research work, detailing the introduction and the objectives behind it. The second chapter explains the theoretical aspects of the field of study, presenting concepts in a short and concise manner. The third chapter explains the research methodology used and how the study was conducted. The fourth chapter presents the results achieved during the study. The fifth chapter evaluates the results. Finally, the sixth chapter gives the conclusion and recommendations for future work, followed by the list of reference sources used in preparing the study.
2 Theoretical Background
2.1 Introduction to Software Life-Cycle Models
A successful software development process focuses not only on code development, but also on activities such as budget, resources and client requirements that concern the overall evolution of the software project. All these activities must be carefully examined in order to deliver a final software product that is within the specified budget and time period and fulfils the client requirements. Software projects are complex in nature and require managers and developers to take important engineering decisions from time to time. In reality there can be no ideal process; depending on the nature of the software project and the organization's goals, many organizations have developed their own approach to software development (Sommerville, 2009). Some of the fundamental activities of a software process are:
• Software specification: the functionality of software and constraints on its operations must be defined
• Software design and implementation: software that meets the specification must be produced
• Software validation: the software must be validated ensuring it meets the requirement of the customer
• Software evolution: the software must evolve to meet the changing customer requirements (Sommerville, 2009)
Once the purpose and objectives of the software have been decided, the next step is the careful planning and execution of the actual development of the project. The underlying dependencies and complexities generate problems and pitfalls that are difficult to predict in advance. Software development models are tried and tested methodologies that may ensure a stable software development evolution. None of the models guarantees a successful final implementation, due to the complexities involved; projects can be poorly or overly estimated, leading to unwanted results. Below, we briefly discuss the different types of software process models (Sommerville, 2009). In this chapter the author briefly introduces the existing software process models, namely waterfall, evolutionary development, component-based software engineering and CASE, as they form the basis for the model proposed in this study. Understanding the various software life-cycle models is highly recommended, as they are the building blocks of the software development process. The different stages involved will give the reader an insight into the activities of a successful software project.
2.1.1 Waterfall Life-Cycle Model
The waterfall model is considered the first software model; it was derived from general systems engineering processes. The model suggests a step-by-step strategy in which we move from one phase to the next. Some of the waterfall development life cycle stages are:
• Requirement analysis and definition: understanding the goals and objectives of the software by consulting the system users
• System and software design: involves providing an overall design of the software architecture
• Implementation and unit testing: ensuring the units meet specifications
• Integration and system testing: a complete check to ensure the system meets the requirements and can be delivered to the customer
• Operation and maintenance: to ensure the deployed system is working efficiently and provide maintenance (Sommerville, 2009)
Figure 2-1: The Waterfall model (Sommerville, 2009)
Figure 2-1 above illustrates the cascade from one stage to another in the waterfall model. The stages are performed one by one, and after validation of each stage, documentation is produced that must be approved by the manager. Every stage is inter-related during the project's evolution, since the development process is non-linear in nature. A slight change in any one stage can impact the overall estimation and might in some cases result in complete rework, increasing the associated costs. The waterfall model is therefore appropriate only when the requirements are known and well understood from the beginning and do not change significantly during the course of project development (Sommerville, 2009).
2.1.2 Evolutionary development
The evolutionary development process is based on the idea of developing an initial implementation from the user requirements and deploying it. After receiving user reviews, improvements can be implemented in subsequent versions until the final product is validated. Tasks such as specification, development and validation are interconnected and change with the different updates throughout the software development life cycle. Evolutionary development can be classified into two types:
• Exploratory development: this involves working with the clients to understand the requirements and producing the deliverables
• Throwaway prototyping: this involves developing a prototype that focuses on experimenting with customer requirements that may be poorly understood. By doing this, one can arrive at well-understood requirements and produce a better final product (Sommerville, 2009)
Figure 2-2: Evolutionary Software Development (Sommerville, 2009)
Figure 2-2 above shows the process of developing an initial implementation and refining it through various versions based on user feedback. This process can be repeated until an adequate system has been developed. Evolutionary development is best suited when the user requirements are not clear, and for medium-sized systems. Its main advantage is that the specification can be developed incrementally (Sommerville, 2009).
2.1.3 Component-based software engineering
This method is best suited when we want to reuse existing software; in fact, the majority of software projects involve some software reuse. A solution to the problem may already exist, in which case an existing piece of code can be reused with the necessary modifications. After the initial phase of setting the requirements, we can validate and compare the candidate components (Sommerville, 2009). The intermediate stages involved in a reuse-oriented process are:
• Component analysis: based on the requirement the search is made for an existing solution, but generally there are few exact matches available
• Requirement modification: in this phase, the component available is studied and its requirements are checked for compatibility. If on application of certain modification the component can be reused, then it is kept or else a new search is made
• System design with reuse: in this phase, the framework of the system is designed or reused
• Development and integration: the developed software and COTS systems are integrated to build a new system (Sommerville, 2009)
Figure 2-3: Component-based Software Engineering (Sommerville, 2009)
Figure 2-3 above shows the advantage of component-based software engineering: software can be developed quickly and at much lower cost by reusing existing software. However, there can be situations where the requirements are not fully met (Sommerville, 2009).
2.1.4 Computer-Aided Software Engineering (CASE)
CASE provides support for software process activities such as requirements engineering, design, program development and testing. Hence, design editors, data dictionaries, compilers, debuggers, system building tools, etc., can be considered CASE tools. CASE technology allows us to automate software processes and obtain information about the software being developed. With advancements in CASE technology, it is now possible to automate the regular software process activities; an estimated improvement of 40% has been achieved this way (Sommerville, 2009).
2.2 What is Simulation?
Simulation forms the basis of this study, and it is imperative to understand what simulations do and why we use them. In the recent decade, technology has been a driving force in our lives and has changed the way we live and work, whether it is reading news online instead of in the traditional paper media, or communicating with co-workers or family over email more often than in personal meetings. Science and engineering form a combination that has been evolving on a grand scale, replacing traditional methods of experimentation with numerical simulations on parallel supercomputers (Karniadakis & Kirby II, 2003).
There are numerous aspects that need to be taken into consideration when performing any laboratory experiment. Similarly, one needs to be extra cautious when designing simulation software for a real-world task involving mathematical computations and factors such as entropy and viscosity, to name a few. We need to find an efficient way to implement the existing algorithms as software, using an appropriate programming language that makes efficient use of parallel computers. We can also utilize visualization tools to verify the results of the experiments and make simulations interactive, allowing real-time observation of the results produced (Karniadakis & Kirby II, 2003).
Simulation combines different criteria and requires following a set of procedures in order to reproduce the expected or actual experimental behaviour. We will run through the different phases of the simulation process. The first phase involves choosing the right representation for the physical system under scrutiny. To do this, one must carefully select consistent assumptions associated with the governing equations and their limitations; other governing laws, entropy conditions and the uncertainty principle must also be taken into account. The second phase involves choosing and implementing the best possible algorithmic approach, which allows the developer to arrive at a correct representation of the atomistic model. Selecting the best-fit algorithm is never easy: there are many ways to solve a problem, and constraints such as choosing the fastest, simplest or most efficient algorithm add to the difficulty. The third phase involves following the latest trends in parallel computing paradigms and applying successfully produced real-time results to the application being developed. The fourth phase involves comparing the results produced by the application with previous results to check the reliability of the application. The last phase involves visualization of the produced simulation, mostly in 3D space and time (Karniadakis & Kirby II, 2003).
Figure 2-4: Graphical representation of scientific computing concept (Karniadakis & Kirby II, 2003)
Figure 2-4 above shows scientific computing as the intersection of numerical mathematics, computer science and modelling. Simulation can certainly be considered a science in its own right, just as computer science was separated into its own branch of study in due course; hence the simulation scientist can be distinguished from the computer engineer or the physicist (Karniadakis & Kirby II, 2003).
2.3 C++ and its features
For the past decades, most programmers have used FORTRAN as the programming language for numerical simulations. With the transition of computer hardware to multiple cores, it was observed that parallel processing is not easy to achieve. Software projects are becoming increasingly advanced and complex as new functionality is added. The lack of features such as abstract data structures and the limited support for modular design make FORTRAN a less favoured choice in today's programming world.
This study focuses on C++ as the preferred programming language and lists the reasons supporting this choice. C++ is an object-oriented programming language that allows us to logically divide the problem at hand into underlying data structures and to specify the corresponding operations that take place on them. It has also been observed that C++ tends to be more efficient and adapts better when applied to different algorithms. Concepts such as encapsulation, dynamic memory allocation and recursive function calls are some of the basic advantages and features that C++ offers as a robust and efficient programming language (Karniadakis & Kirby II, 2003).
2.4 Generic Programming and its Benefits
Generic programming is a programming methodology that allows us to use abstract requirement specifications for the implementation of data structures and algorithms. This study promotes the use of generic programming, as it can be highly useful for implementing complex algorithms that accept type-independent input parameters. Generic programming in C++ is implemented with templates, which allow us to make use of parametric polymorphism. The abstract requirement specification offered by generic programming is related to traditional Abstract Data Types (ADTs), but generic programming in C++ takes the ideology behind ADTs to the next level (Alexandrescu, 2001). To do this, we specify a group of types that share a similar interface and semantic behaviour; such a set of requirements expressing common behaviour is also referred to as a concept. This allows the programmer to write an algorithm in a completely generic, type-independent manner based on the parameters defined for the algorithm. By being able to use different types for the same variable we achieve polymorphism. In object-oriented C++ we achieve polymorphism through inheritance and virtual functions; in generic programming we use templates and classes to achieve static, compile-time polymorphism.
A concept in generic programming is a set of requirements formulated such that the class or function templates satisfying it can be compiled and executed. Generic programming is implemented in the following steps:
• Identify useful and efficient algorithms
• Develop a generic representation for the algorithm in such a way that there exists minimum possible requirement on the data on which it operates
• Derive a set of minimal requirements that allows these algorithms to be run efficiently
• Finally construct a framework which corresponds to these requirements (Siek and others, 2002)
2.5 Standard Template Library (STL)
As the name suggests, STL or standard template library is a library consisting different container classes, algorithms and different data structures. STL is a generic library and built using templates, making the library strictly parameterized. This study promotes the use of generic programming and STL library provides numerous basic algorithms and data structures that can be very useful in solving complex PDE’s. The core of STL includes set of container classes that perform the task of containing other object. Some of the classes are (STL, 2010):
• Vector: one of the simplest STL container classes; the number of elements can vary dynamically and memory management is automatic. A vector is a sequence allowing random access to its elements and constant-time insertion and removal of elements at the end; insertion and removal at the beginning or in the middle take linear time
• List: a doubly linked list. It can be traversed in the forward or backward direction, and constant-time insertion and removal of elements is possible at the beginning, end or in the middle
• Deque: a sequence that supports random access to elements and constant-time insertion and removal of elements at the beginning and the end; insertion and removal in the middle take linear time
• Set: stores objects whose key type and value type are the same, so each element is its own key. Every element in a set is unique
• Multiset: stores objects in exactly the same way as a set, but two or more identical elements are allowed
• Map: associates objects of type Key with objects of type Data. Hence, the value type is pair<const Key, Data> and no two elements can have the same key
• Multimap: stores data in the same manner as a map, but there is no limit on the number of elements with the same key
• Hash_set: highly useful when an element must be searched for quickly
• Hash_map: allows faster search when looking up an element by its key
All classes mentioned above are templates, and an instantiation allows them to contain any type of object. There is also a wide collection of algorithms for manipulating the data stored in the container classes. In general, the algorithms are written independently of the data types they process, which makes them generic in nature. The algorithms must be able to traverse and access the elements within a data structure, and this functionality is achieved by the use of iterators. Iterators are grouped into categories such as InputIterator, OutputIterator, ForwardIterator, BidirectionalIterator and RandomAccessIterator. The following figure describes the relationship that exists between containers, algorithms and iterators. Iterators make it possible to separate the algorithms from the existing containers: the algorithms in the Standard Template Library are written using templates, and iterators allow us to parameterize them, so a single algorithm can be used with different types of containers (STL, 2010).
2.6 MPI (Message Passing Interface)
As discussed in the introduction, supercomputers with multiple processors are now widely available and increasingly popular. In order to use the hardware offered by such machines effectively, this study proposes the use of MPI. MPI is a standardized, language-independent message-passing interface: a library specification that gives developers the capability to pass data between processes residing in separate address spaces using its predefined functions. MPI offers high-level portability and connects processes across distributed machines or within a supercomputer. One can use different programming languages such as FORTRAN, C or C++ depending on the requirements of the software project. Due to its versatile nature, the MPI library can be used efficiently on a wide variety of computers where it is installed, although in some cases this may require minor modifications on different machines (MPI, 2010).
Scalability is another key feature of MPI: it allows efficient collective communication over a group of processes, that is, processes executing tasks of a similar nature. MPI enables communication over a virtual topology such as a hypercube or a grid. MPI is highly relevant for developing software applications that run on parallel computers to gain high performance and efficiency. MPI is platform independent, and an MPI program should compile and execute on any platform with an MPI implementation installed. The MPI standard offers the following functionalities (MPI, 2010):
• Point-to-point communication: allows two processes to communicate with each other; for example, MPI_Send lets one process send a message to another, which receives it with MPI_Recv
• Datatypes: the type of the data being sent must be explicitly defined. One can use predefined datatypes such as MPI_INT, MPI_CHAR and MPI_DOUBLE, or derive new datatypes from them using MPI_Type_create_struct
• Collective operations: It is possible to have collective operations not only on intracommunicators, but also on intercommunicators. Hence, we can have an All-‐to-‐All, All-‐to-‐One and One-‐to-‐All categories of collective operations
• Process groups: a process in MPI is uniquely identified by its group and its rank within that group, and a process can belong to several groups
• Process topologies: virtual topologies allow the processes to be mapped onto a geometric arrangement, most commonly a Cartesian grid or a graph
• Process creation and management: covers the creation of processes, communication with them, and the addition and deletion of processes even after the MPI application has started
• One-sided communication: provides an interface for Remote Memory Access (RMA) communication, with methods allowing a single MPI process to specify the parameters for both the sending and the receiving side
• Language bindings: available for programming languages such as FORTRAN, C and C++, with third-party bindings for languages such as Python
Each of the above functionalities has its own specification of how data is transferred between two communicating processes or between groups of processes. The communication between processes can be either blocking or non-blocking. A blocking call returns only once the requested operation has completed, i.e. the message has actually been sent or received. A non-blocking call returns immediately and the data transfer proceeds in the background, which allows the processor to perform other required tasks while the communication carries on unhampered (Snir and others, 1995).
In order to group processes and let them communicate with one another, we make use of a communicator. MPI provides the predefined communicator MPI_COMM_WORLD, which contains all processes; from it we can define groups of processes and address the correct process within different groups. Each process in a group is identified by a unique rank, and using this rank number data can be sent or received through the communicator. Collective communication allows groups of processes to communicate as a whole, i.e. one-to-many or many-to-one. One thing that must be noted is that all collective communications are blocking in nature. Collective communication is generally preferred, as it avoids the errors that come with writing numerous individual calls and improves code readability, making debugging less cumbersome (Snir and others, 1995).
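The mechanisms described above can be sketched as follows. This is a minimal, illustrative example using the C API (also callable from C++); it assumes an MPI implementation is installed and that the program is launched under an MPI launcher, e.g. mpirun -np 2:

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // unique rank in the group
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // number of processes

    int value = 0;
    if (rank == 0) {
        value = 42;
        // Blocking point-to-point send: returns once the buffer
        // may safely be reused.
        for (int dest = 1; dest < size; ++dest)
            MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    // The same one-to-all distribution expressed as a single
    // (blocking) collective call:
    int bvalue = (rank == 0) ? 42 : 0;
    MPI_Bcast(&bvalue, 1, MPI_INT, 0, MPI_COMM_WORLD);

    std::printf("rank %d of %d received %d\n", rank, size, bvalue);
    MPI_Finalize();
    return 0;
}
```

Note how the collective MPI_Bcast replaces the hand-written loop of sends, which is exactly why collective communication is preferred for readability and robustness.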
The MPI C++ library consists of a range of classes that provide access to the various MPI subroutines such as send, receive, probe and broadcast. To use them, we call the member functions of the underlying classes through the dot notation. C++ is a popular language, especially due to its object-oriented nature, which makes the source code more reliable and easier to use compared with the bindings defined for C or FORTRAN (Snir and others, 1995).
2.7 Boost Libraries
The Boost libraries are a set of libraries that extend the C++ Standard Template Library and can be used in a wide variety of applications. Moreover, the forthcoming C++0x standard is expected to adopt further Boost libraries beyond the roughly ten that have already been incorporated into C++ (Boost, 2010).
This study proposes the use of the Boost.MPI library. Boost.MPI can be used in parallel programming with the purpose of achieving greater efficiency in computation. The Boost C++ libraries enable programmers to achieve high performance levels through new features that anticipate the yet-to-be-released C++0x standard (Boost, 2010).
Boost is platform independent, and Boost.MPI mainly provides C++ bindings with strong interface support for user-defined and Standard Template Library types. Boost.MPI programs can also interoperate with MPI code written in other languages such as C, and Python bindings are available (Boost, 2010).
When we begin to write a program using Boost.MPI, we first need an MPI environment; this is set up by constructing an mpi::environment object at the beginning of the program. To perform communication between processes we use a communicator, which gives the number of processes residing within a group along with their respective ranks. At present, the following functionalities are available in Boost.MPI (Boost, 2010):
• Communicators: Boost.MPI supports the creation, destruction, cloning and splitting of communicators, as well as the management of process groups
• Point to Point communication: is possible for primitive and user defined data types allowing sending and receiving data using either blocking or non-‐blocking interfaces
• Collective communication: we can apply operations such as reduce and gather over both built-‐in and user-‐defined data types
• MPI datatypes: User-‐defined datatypes are supported using Boost.Serialization library (Boost, 2010)
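The same message exchange sketched earlier can be written more idiomatically with Boost.MPI. This is a hedged sketch: it assumes Boost.MPI and an underlying MPI implementation are installed and that the program is linked against the Boost.MPI library and run under an MPI launcher with at least two processes:

```cpp
#include <boost/mpi.hpp>
#include <iostream>
#include <string>
namespace mpi = boost::mpi;

int main(int argc, char* argv[]) {
    mpi::environment env(argc, argv);   // initializes and finalizes MPI
    mpi::communicator world;            // wraps MPI_COMM_WORLD

    if (world.rank() == 0) {
        // std::string is serialized automatically by Boost.Serialization,
        // so no explicit MPI datatype needs to be declared.
        std::string msg = "hello";
        world.send(1, 0, msg);          // destination rank 1, tag 0
    } else if (world.rank() == 1) {
        std::string msg;
        world.recv(0, 0, msg);          // source rank 0, tag 0
        std::cout << "rank 1 received: " << msg << std::endl;
    }
    return 0;
}
```

Compared with the raw C API, the communicator is an ordinary object, user-defined and STL types travel through the serialization layer, and no manual buffer-size or datatype bookkeeping is required.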
2.8 Software Configuration Management
Software development is a complex activity that requires careful planning and a thorough check on all activities related to the development of the software, and these activities and processes need to be monitored. Hence, software configuration management is how you control the evolution of a software project (Bellagio and Milligan, 2005). Many tools and techniques are associated with software configuration management (SCM) and are widely used in industry to make sure the software development process is well managed. The IEEE Standard for Software Configuration Management Plans [IEEE 828-1998] states the following (Bellagio and Milligan, 2005):
SCM constitutes good engineering practice for all software projects, whether in the development phase, rapid prototyping or on-going maintenance. It enhances the reliability and quality of software by:
• Providing structure for identifying and controlling documentation, code, interfaces, and databases to support all life-cycle phases
• Supporting a chosen development/maintenance methodology that fits the requirements, standards, policies, organization, and management philosophy
• Producing management and product information concerning the status of baselines, change control, tests, releases, audits, etc.
The software development process is dynamic in nature and is usually associated with constant change. Such changes are an inherent part of any software development activity, and they can radically improve or degrade it; it is therefore very important to monitor the changes being made to a software project. SCM provides tools and methodologies that allow us to manage these changes and avoid certain pitfalls. This study recommends the use of Subversion (SVN), and proposes Mercurial as a tool for future enhancements (Bellagio and Milligan, 2005).
2.8.1 Subversion (SVN)
Also known as a “time machine”, Subversion (SVN) is an open-source software configuration management (SCM) tool that allows us to manage our files and directories, along with the changes made to them, over a period of time. Each set of changes committed to the project is tagged as a revision with a unique number, and these revisions can be retrieved later, which is very useful when analysing the changes made since the start of the project (Subversion, 2010).
Subversion 1.1 was released in September 2004, and over the following four years several further versions appeared, with Subversion 1.5, released in June 2008, the latest at the time of writing (Subversion, 2010).
Subversion has the ability to operate across various machines connected in a network and handled by different users. SVN empowers users to make changes to data and source code on their local machines and then commit those changes (Subversion, 2010).
Subversion can be understood as a file-sharing client/server system at whose heart lies the repository, where the main data is stored. This data is mostly in the form of a filesystem tree arranged according to a defined hierarchy of directories and files. SVN allows as many users as desired to connect to the main repository and simultaneously perform actions such as reading or writing the data. The figure below shows an example of such a repository (Subversion, 2010).
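A typical day-to-day Subversion workflow can be sketched as follows; the repository URL and file names are hypothetical, and the commands assume the svn client is installed:

```shell
# Obtain a working copy of the project from the central repository.
svn checkout http://example.org/svn/simproject/trunk simproject
cd simproject

# ... edit source files locally ...

svn status                    # list locally modified files
svn diff solver.cpp           # inspect changes before committing
svn add newmodule.cpp         # schedule a new file for versioning
svn commit -m "Add new solver module"   # creates a new revision
svn log -l 5                  # show the last five revisions
svn update                    # merge changes committed by others
```

Each commit produces a new globally numbered revision, which is what makes it possible to retrieve and compare any earlier state of the project.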