Degree project


Design Metrics on Prediction of Open Source Software Complexity

Author: Milkias Tadesse
Date: 2013-03-04

Subject: Software Technology
Level: Master

Abstract

The growth of open source software (OSS) is playing a big role in the software industry. The important contributions made over the past years have proven useful to software developers. The development method used to produce open source software is different in that it relies on programmers who take part out of their own interest in coding. Software complexity, on the other hand, is an aspect that should be considered during software development, since it is linked with several quality characteristics of software. Because multiple developers located in different places commit their code to shared repositories, there is a need to understand the complexity of OSS before using it.

A systematic use of object oriented design metrics can help to address this. In this paper, the complexity of the most popular open source software is investigated through a statistical assessment of the metrics. To facilitate this, the paper includes a case study investigating the complexity of ten popular projects available on SourceForge.

The case study has shown that applying software metrics that measure different aspects of software is useful in analyzing, studying and improving the complexity of open source software.

Acknowledgement

First and foremost, I would like to thank our thesis coordinator, Mathias Hedenborg, for the cooperation he has provided, particularly in finding me a supervisor for this paper. Secondly, I am grateful to my supervisor, Tobias Andersson-Gidlund, for the important guidelines and feedback he has given me while preparing this paper. Moreover, I would like to thank my family for their huge support. In addition, I am thankful to Yared Yohannes, a good friend of mine, who offered his opinions in selecting a suitable topic of interest to me.

Table of Contents

1. Introduction

1.1 Background

1.2 Statement of the Problem

1.3 Review of Related Literature and Research Question

1.4 Purpose of the Study

1.5 Scope and Limitations

1.6 Structure of the Report

2. Software Complexity and its Impacts

2.1 Complexity Defined

2.2 Impacts of Software Complexity

2.3 Monitoring and Predicting Software Defects

3. Software Metrics

3.1 Overview

3.2 Classification of Software Metrics

3.2.1 Complexity metric

3.2.2 Size Metrics

3.2.3 Dependency Metrics

4. Methodology of Case Study

4.1 Overview

4.2 Analytical Framework

4.3 Type of Research

4.4 Sampling Method

4.5 Procedure and Time Frame

4.6 Analysis Plan

4.7 Validity and Reliability

4.8 Assumptions

5. Results

5.1 Presentation and Analysis of Data

5.1.1 Coupling Between Objects

5.1.2 Weighted Method Count

5.1.3 Depth of Inheritance Tree

5.1.4 Number of Children

5.1.5 Lack of Cohesion Metric

5.1.6 Response For Class

5.1.7 McCabe Cyclomatic Complexity

5.2 Correlation Tests

5.2.1 Correlated Metrics

5.2.2 Metric Pairs Analysis

6. Conclusion

6.1 Summary of Results and Findings

6.2 Overall Conclusion

6.3 Recommendation and Future Work

References

List of Figures

Figure 3.1 Sample McCabe Cyclomatic Complexity graph

Figure 5.1 Measurement result of CBO for the OSS

Figure 5.2 Measurement result of WMC for the OSS

Figure 5.3 Measurement result of DIT for the OSS

Figure 5.4 Measurement result of NOC for the OSS

Figure 5.5 Measurement result of LCOM for the OSS

Figure 5.6 Measurement result of RFC for the OSS

Figure 5.7 Measurement result of MCC for the OSS

List of Tables

1. Introduction

This chapter presents an introduction to the thesis. It is divided into six subsections that discuss the background, the problem statement, the formulation of the research question, the purpose of the study, the limitations and the structure of the report.

1.1 Background

In this era of computing, where the practice of software development has proliferated, there is high competition for better products in the market. Today the object oriented programming paradigm is used by many commercial and non-commercial software developers in preference to the procedural programming paradigm. It has gained wide acceptance because it has features that benefit developers, one of which is the reusability of components from one system to another.

Many commercial software companies have developed an interest in integrating useful open source software into their own products in order to save the time and effort required by the software development process. The open source community is playing an important role in the continuous growth and improvement of software development (Iulian-Ionut, 2009). Research in the area of software complexity is based on the assumption that complexity is an important indicator for predicting software development effort (De Tran-Cao, 2001). This has increased interest in studying the complexity of open source software before adopting it. Object oriented design metrics are beneficial in that code-level analysis is useful for understanding the complexity of a design (Iulian-Ionut, 2009).


1.2 Statement of the Problem

Software complexity is one of the challenges of software engineering. Researchers in the software industry have given it attention, and it is still a promising area for developing new methods (Honglei, 2009). Unlike commercial software, most open source software (OSS) projects are produced in a distributed development environment where individual programmers write code and commit it to centralized repositories (Tibor, 2005). Companies that develop software have an interest in incorporating these open source programs into their own, because the widespread practice of the object oriented programming paradigm has encouraged the reuse of components. However, since the development methodology followed by commercial software developers is different from that of OSS, there is a need to investigate the complexity of open source software before adopting it.

1.3 Review of Related Literature and Research Question

Before formulating the research question, a literature review in the areas of open source software complexity and object oriented design metrics was conducted using the digital libraries of the Institute of Electrical and Electronics Engineers (IEEE) and the Association for Computing Machinery (ACM). The search strings used were "open source software", "software complexity" and "software metrics". A number of research papers and articles were found and studied. For instance:

● (Iulian-Ionut, 2009) presents an assessment of project size to estimate complexity. The paper considered the correlation among external software characteristics such as reliability and maintainability rather than the software metrics.


In general, the number of open source projects analyzed for complexity was small in most of the articles. One of the reasons for this was that the papers focused on validating the metrics rather than assessing complexity. In addition, few took into account the relationships or interactions among the metrics. This paper investigates how object oriented design metrics predict the complexity of open source software. It includes an analysis of selected metric suites and the implications of the correlations among them.

1.4 Purpose of the Research

Software metrics play an important role because they provide a quantitative basis for determining the complexity of a system. Measuring complexity in this way is a convenient means of understanding and analyzing design flaws. In addition, understanding software complexity provides a convenient way to estimate software performance, cost and fault-prone classes (Rashidi, 2010). The goal of this paper is to investigate the capability of object oriented design metrics for predicting software complexity in popular open source software.

1.5 Limitations and Scope

The software used to assess the open source projects and to collect data is a 30-day trial version with full functionality. The target projects for analysis were selected from the most popular projects available in the Concurrent Versions System (CVS) and Subversion (SVN) repositories on SourceForge. However, not all of the top 10 most popular projects could be imported into the workspace of the tool, so in some cases projects further down the list that satisfied the criteria were selected instead. The main reason is that some were developed in programming languages other than the one used here. In this study, open source projects developed in Java were chosen because the software metric analysis tool JBuilder is designed to analyze Java programs. In addition, most of the software available in the repositories is written in Java.

1.6 Structure of the Report

The upcoming sections discuss the notion of software complexity applied in this research, and the subsequent sections introduce and study different software metrics. To be more specific, Chapter 2 will assess the meaning and context of software


2. Software Complexity and its Impacts

This chapter describes software complexity and addresses its impacts. A discussion of the prediction and monitoring of software defects is also included.

2.1 Software Complexity Defined

Over the past years, there has been growing interest in defining appropriate ways to measure software complexity. One of the challenges in assessing complexity measures is that it is not always clear what a measure is supposed to be measuring. Commonly described characteristics include the difficulty of implementing, testing, understanding, modifying or maintaining a program (Kearney, 1986). In relation to this, an understanding of complexity could also help in estimating the number of people required to achieve a given task (Alain, 2001).

Software complexity is considered a multidimensional construct (Henry, 1988). One of the challenges in discussing the measurement of software complexity is defining what complexity means, and there are different definitions of it. The first is the one given by the Institute of Electrical and Electronics Engineers (IEEE), which defines complexity as “the degree to which a system or component has a design or implementation that is difficult to understand or verify”.

(Basili, 1980), on the other hand, defines complexity as a measure of the resources expended by a system while interacting with a piece of software to perform a given task. If the interacting system is a programmer, then complexity is defined by the difficulty of performing tasks such as debugging, modifying, testing or coding the software. If the interacting system is a computer, then complexity is defined by the execution time and storage required to perform the computation. (Zuse, 1991) has also suggested that the term software complexity measure is unsuitable as a common description, and stated that complexity can be described as the difficulty of changing, understanding or maintaining software. As can be seen, the definitions listed here are described with respect to interaction with the software. Complexity can also be associated with other quantities; for example, it can be used to predict the number of people required to achieve a certain task (Alain, 2001).

The term software complexity in this paper corresponds to the IEEE definition, which relates complexity to how difficult software is to comprehend or to verify for implementation flaws.

2.2 Impacts of Software Complexity

2.3 Monitoring and Predicting Software Defects

Software development activities such as documentation, design, programming, testing and maintenance can be measured statistically, which makes it possible to observe software complexity efficiently. Software metrics is a significant research field in software engineering that has developed gradually (Tu Honglei, 2009).

Over time, software projects have become more complex because of increased lines of code, added features and issues related to bug fixes. Moreover, tasks are required to be completed in less time and with fewer people. This complexity tends to reduce test coverage over time and eventually affects the quality of the product. Other factors that compete with it over time are the overall cost of the product and the time to deliver the software (Thom Garrett, 2011). That is one of the reasons why it is important to monitor and control the level of software complexity. This is done mainly by regularly measuring different indicators that describe it and then taking corrective and adaptive measures to improve the affected areas of code (Iulian-Ionut, 2009).


3. Software Metrics

This chapter presents an overview of software metrics, followed by a discussion of different object oriented design metrics classified into different categories. Most of these metrics are used in the case study that follows.

3.1 Overview

Relying on testing alone is not sufficient from a quality assurance point of view. Since the software being developed is becoming larger and more complex, quality must be assured from the early stages, such as when specifying requirements, designing and coding. As described by (J.E.Gaffney, 1981), a software metric may be understood as "an objective, mathematical measure of software that is sensitive to differences in software characteristics. It provides a quantitative measure of an attribute which the body of software exhibits." Software metrics play a role in quantifying some aspect of a product generated during a software project. This includes not only the program code of a system but also, more importantly, documents such as the functional specification and the system and detailed design. Reviewing code is an effective way to understand the complexity of software during the coding phase. However, in large-scale software development there are limits to how much of the code can be reviewed. Therefore, analysis tools provide the capability to detect defects in programs.

Software metrics are of interest for several reasons. Numerical measures of the software product can be transformed into indicators, such as "reliability" and "maintainability", that are of interest to both users and software development management (J.E.Gaffney, 1981).

In general, software metrics can be grouped into three categories: product metrics, process metrics and project metrics (H. Kan, 2002).

● Product metrics describe the characteristics of the software product, such as its size, complexity, performance and level of quality.

● Process metrics can be applied to improve the software development and maintenance process. Examples include the effectiveness of defect removal during development and the response time of the fix process (H. Kan, 2002).

● Project metrics describe the project characteristics and execution. Examples include the number of software developers, the staffing pattern over the life cycle of the software, cost, schedule and productivity (H. Kan, 2002).

In general, software quality metrics are more closely associated with process and product metrics than with project metrics. Nonetheless, project parameters such as the number of developers and their skill levels, the schedule and the size can be useful factors for analyzing complexity (Alain, 2001).

3.2 Classification of Software Metrics

The software metrics presented here are grouped into complexity, size and dependency metrics. The metrics are classified into these categories to identify the attributes they provide insight into. These categories are discussed in the following sections.

3.2.1 Complexity metrics


Banker, 1989). In this section, a few metrics that can help in evaluating the complexity of a particular software project are discussed.

McCabe Cyclomatic Complexity (MCC)

Cyclomatic complexity reflects the number of paths that may be taken when a program is executed. Methods with high cyclomatic complexity tend to be more difficult to understand and maintain. The cyclomatic complexity metric measures the complexity of a module's decision structure and is calculated by counting the number of linearly independent paths through a function or set of functions (McCabe, 1976). It is useful because higher cyclomatic complexity is associated with greater testing and maintenance requirements, and higher complexity values commonly correspond to higher error rates.

The case study conducted here applies this metric to determine the complexity of open source software developed in the Java programming language. Some of the tokens responsible for the program taking different paths during execution are:

● while and do-while statements

● if statements

● for statements

● ternary operators and logical operators

● switch-case statements

● return, throw and try-catch statements (Pradeep, S. 2005).
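The token list above suggests a quick way to approximate MCC: count the decision points in a method and add one. The Java sketch below is a hypothetical, regex-based illustration; the class `DecisionPointCounter` and its method `estimate` are not part of JBuilder or any library. It assumes comments and string literals have already been removed, it counts `?` in generic wildcards as if it were a ternary operator, and, unlike the list above, it does not count return or throw statements, so its values can differ from those reported by a metrics tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: approximates McCabe complexity as (decision points + 1). */
final class DecisionPointCounter {

    // Decision tokens: branching keywords matched as whole words, plus the
    // short-circuit operators && and ||, and '?' (ternary operator).
    // Assumption: return and throw are NOT counted in this sketch.
    private static final Pattern DECISION =
            Pattern.compile("\\b(?:if|while|for|case|catch)\\b|&&|\\|\\||\\?");

    /** Returns the number of decision tokens in the method body plus one. */
    static int estimate(String methodBody) {
        Matcher m = DECISION.matcher(methodBody);
        int decisions = 0;
        while (m.find()) {
            decisions++;
        }
        return decisions + 1;
    }

    public static void main(String[] args) {
        // Hypothetical method body: one if, one &&, one while.
        String body = "if (a && b) { x++; } while (c) { c--; }";
        System.out.println(estimate(body)); // prints 4
    }
}
```

Word boundaries keep identifiers such as `format` or `foreach` from being counted as `for`.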

Let $B_m = (V, E)$ be the basic block graph of a method $m$. Then the McCabe cyclomatic complexity of $m$ can be calculated as $MCC(m) = |E| - |V| + 2$, where $|E|$ is the number of edges and $|V|$ is the number of vertices in the graph (McCabe, 1976). Fig 3.1 depicts a sample cyclomatic complexity calculation.
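The formula can be checked directly on a graph. The sketch below assumes a single connected control-flow graph given as a list of directed edges between numbered basic blocks; it collects the vertices from the edge endpoints and computes MCC as |E| - |V| + 2. The graph in `main` is a hypothetical method with one if statement and one while statement, not the graph of Fig 3.1; it has 8 edges and 7 vertices, giving 8 - 7 + 2 = 3. The names `CyclomaticFromGraph`, `Edge` and `mcc` are illustrative, and records require Java 16 or later.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: MCC = |E| - |V| + 2 for one connected control-flow graph. */
final class CyclomaticFromGraph {

    /** A directed edge between two basic blocks, identified by integer ids. */
    record Edge(int from, int to) { }

    static int mcc(List<Edge> edges) {
        // Assumption: every vertex appears as an endpoint of at least one edge.
        Set<Integer> vertices = new HashSet<>();
        for (Edge e : edges) {
            vertices.add(e.from());
            vertices.add(e.to());
        }
        return edges.size() - vertices.size() + 2;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 entry, 1 if-condition, 2 then-branch, 3 join,
        // 4 while-condition, 5 loop body, 6 exit.
        List<Edge> cfg = List.of(
                new Edge(0, 1), new Edge(1, 2), new Edge(1, 3), new Edge(2, 3),
                new Edge(3, 4), new Edge(4, 5), new Edge(5, 4), new Edge(4, 6));
        System.out.println(mcc(cfg)); // 8 - 7 + 2, prints 3
    }
}
```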

Fig 3.1 Sample McCabe Cyclomatic Complexity graph (graph of a method containing an if statement and a while statement; MCC = |E| - |V| + 2 = 10 - 9 + 2 = 3)

Weighted Methods per Class (WMC)

The theoretical basis behind the Weighted Methods per Class metric is related to the complexity of an object. In object oriented programming languages, methods are used to describe the properties of a particular object. Hence, the complexity of an object can be assessed by the number of methods it contains or can use. Therefore, the number of methods can be a measure of complexity (Chidamber, 1991).

WMC measures the scope of the methods that make up a class. A weight is computed for each method, and the WMC value is obtained by summing the weights of all methods of the class. Once obtained, the WMC value can be used to measure the complexity of the decision structure within the methods. It is helpful in circumstances where higher WMC values are associated with increased development, testing and maintenance efforts. Because of inheritance, the testing and maintenance efforts for derived classes could also increase as a result of a higher WMC in a parent class (Kemerer, 1994).

Mathematically, WMC can be presented as

$$WMC = \sum_{i=1}^{n} c_i$$

where $C$ is a class with methods $M_1, \ldots, M_n$, $c_i$ is the complexity (weight) of method $M_i$ and $n$ is the number of methods (Chidamber, 1991). The case study in this paper used WMC for evaluating the complexity of the individual candidate software projects.
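As a minimal sketch of the formula, the code below sums the weights c_i of the methods of one class. It assumes each weight is the method's cyclomatic complexity, which is one common choice; Chidamber and Kemerer deliberately leave the complexity measure open, and with every c_i set to 1, WMC reduces to the number of methods. The class name `WeightedMethodsPerClass`, its method `wmc` and the example data are hypothetical.

```java
import java.util.Map;

/** Sketch of WMC: the sum of per-method weights c_i for one class. */
final class WeightedMethodsPerClass {

    /** weights maps each method name to its complexity c_i (assumed: its MCC). */
    static int wmc(Map<String, Integer> weights) {
        int sum = 0;
        for (int c : weights.values()) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical class with three methods and their cyclomatic complexities.
        Map<String, Integer> weights = Map.of("parse", 4, "validate", 2, "toString", 1);
        System.out.println(wmc(weights)); // prints 7
    }
}
```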

3.2.2 Size metric

Size metrics are the most common measures used to assess memory requirements, effort and the necessary development time. It has been argued that poor size prediction has been a major cause of software failures. Size metrics are very important in determining development cost, and they are also useful in preparing schedules and estimating the effort required. Complexity is a function of size, and it can greatly affect design flaws and hidden defects, resulting in quality problems, cost overruns and schedule changes. Complexity should therefore be constantly monitored, measured and controlled. Any impact on size metrics can be seen in the effort performance criterion; the effort metric predicts the effort needed to maintain a project (S.Malathi, 2012).

Lines of Code (LOC)

As its name indicates, the notion behind Lines of Code is to count the number of lines of source code of a software project. Even though it is simple, it is a strong metric for assessing the complexity of different software entities. Because the count depends on the coding and formatting conventions in use, these should be taken into account when applying it, including to generated code. Additionally, it can only be measured on the source code itself, by the front-end (ARiSA AB, 2008-2009).
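To illustrate why the count depends on conventions, the sketch below applies two simple, hypothetical conventions to the same source text: counting every physical line, and counting only lines that contain a non-whitespace character. Neither convention is claimed to be the one JBuilder uses, and comment lines are counted by both. `String.lines()` and `String.isBlank()` require Java 11 or later.

```java
/** Sketch: two simple LOC conventions applied to the same source text. */
final class LinesOfCode {

    /** Physical lines, as split by String.lines() (a trailing newline adds no line). */
    static long physical(String source) {
        return source.lines().count();
    }

    /** Lines that contain at least one non-whitespace character. */
    static long nonBlank(String source) {
        return source.lines().filter(line -> !line.isBlank()).count();
    }

    public static void main(String[] args) {
        // Hypothetical source text with one empty line.
        String src = "class A {\n\n    int x;\n}\n";
        System.out.println(physical(src) + " " + nonBlank(src)); // prints "4 3"
    }
}
```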

3.2.3 Dependency Metrics

This subsection presents a set of metrics that can be used to measure the complexity of an object oriented design in terms of the dependencies between subsystems. For instance, dependencies between subsystems can exist at method level. Designs that are highly interdependent tend to be rigid and hard to maintain, and they restrict reusability (Martin, 1994).

Coupling Between Object Classes (CBO)

CBO for a class is a count of the number of other classes to which it is coupled. The theoretical basis behind this metric is to count the peripheral classes whose members are called or used as types by members of the current class. In other words, CBO refers to the number of couplings between classes: when a class, say class1, calls the member functions of another class, class2, coupling occurs. The smaller the CBO, the less the class affects other classes. This means that the more independent a class is, the lower the probability that a change will affect a depending class, and therefore the less maintenance effort may be needed. At the same time, the higher the coupling between objects, the less reusable the class becomes. This metric is useful in discovering situations where excessive coupling limits the availability of a class for reuse and also results in greater testing and maintenance efforts (Kemerer, 1994).
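The sketch below computes CBO as it is described above, as the number of distinct other classes whose members a class calls or uses as types. It assumes the dependency data has already been extracted (for example by a parser or a metrics tool) into a map from class name to referenced class names; the class `CouplingBetweenObjects` and the example class names are hypothetical. Some formulations of CBO also count the classes that use the given class, which this sketch does not.

```java
import java.util.Map;
import java.util.Set;

/** Sketch of CBO as described above: distinct other classes a class uses. */
final class CouplingBetweenObjects {

    /** references maps a class name to every class its members call or use as a type. */
    static long cbo(String className, Map<String, Set<String>> references) {
        return references.getOrDefault(className, Set.of()).stream()
                .filter(other -> !other.equals(className)) // self-references are not coupling
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical dependency data for three classes.
        Map<String, Set<String>> refs = Map.of(
                "Order", Set.of("Customer", "Invoice", "Order"),
                "Customer", Set.of("Address"),
                "Invoice", Set.of());
        System.out.println(cbo("Order", refs));   // prints 2
        System.out.println(cbo("Invoice", refs)); // prints 0
    }
}
```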

Response For a Class (RFC)

The RFC metric measures the general complexity of the calling hierarchy of the methods. The value of RFC can be calculated by counting the methods of a class and the methods that they directly call. Larger RFC counts are commonly correlated with increased testing requirements. Since it includes methods called from outside the class, it can also be a measure of the potential communication between the class and other classes. If the number of methods that can be invoked in response to a message is large, testing and debugging the class becomes more difficult and time consuming, since it requires very good knowledge of how the methods are interconnected (Chidamber, 1994).
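A minimal sketch of RFC as described above: the response set is the union of the class's own methods and the methods they call directly, and RFC is its size. It assumes call information is available as a map from each method of the class to the qualified names of its direct callees; the class `ResponseForClass` and all method names are hypothetical. Only one level of calls is followed, matching the "directly call" wording.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of RFC: size of (methods of the class union methods they call directly). */
final class ResponseForClass {

    /** calls maps each method of the class to the methods it invokes directly. */
    static int rfc(Map<String, List<String>> calls) {
        Set<String> responseSet = new HashSet<>(calls.keySet()); // the class's own methods
        for (List<String> callees : calls.values()) {
            responseSet.addAll(callees); // one level only, no transitive closure
        }
        return responseSet.size();
    }

    public static void main(String[] args) {
        // Hypothetical class A with methods a() and b(); callee names are qualified.
        Map<String, List<String>> calls = Map.of(
                "A.a", List.of("A.b", "Logger.log"),
                "A.b", List.of("Logger.log", "List.add"));
        System.out.println(rfc(calls)); // prints 4
    }
}
```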

Lack of Cohesion in Methods (LCOM)

LCOM measures how widely member variables are used for sharing data between member functions. It is calculated by counting the pairs of class methods that do not access any of the same class variables, reduced by the number of pairs that do. In other words, the “Lack of Cohesion in Methods metric is a measure for the number of not connected method pairs in a class representing independent parts having no cohesion. It represents the difference between the number of method pairs not having instance variables in common, and the number of method pairs having common instance variables.” A higher LCOM indicates lower cohesion. This is associated with weaker encapsulation and indicates that the class is a candidate for disaggregation into subclasses (Chidamber, 1994).
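The following sketch implements the pair-counting definition above. Each method is represented by the set of instance variables it accesses; P counts method pairs with no variable in common and Q counts pairs sharing at least one. Following Chidamber and Kemerer (1994), the result is set to zero when the difference is negative, and P is treated as empty when no method accesses any instance variable. The class `LackOfCohesion` and the input data are hypothetical, and metrics tools may report normalized variants of LCOM instead.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Sketch of Chidamber-Kemerer LCOM: max(|P| - |Q|, 0) over method pairs. */
final class LackOfCohesion {

    /** fieldsUsed.get(i) is the set of instance variables accessed by method i. */
    static int lcom(List<Set<String>> fieldsUsed) {
        // If no method accesses any instance variable, P is taken to be empty.
        if (fieldsUsed.stream().allMatch(Set::isEmpty)) {
            return 0;
        }
        int p = 0; // pairs sharing no instance variable
        int q = 0; // pairs sharing at least one instance variable
        for (int i = 0; i < fieldsUsed.size(); i++) {
            for (int j = i + 1; j < fieldsUsed.size(); j++) {
                if (Collections.disjoint(fieldsUsed.get(i), fieldsUsed.get(j))) {
                    p++;
                } else {
                    q++;
                }
            }
        }
        return Math.max(p - q, 0);
    }

    public static void main(String[] args) {
        // Hypothetical class: m1 and m2 use x, m3 uses y, so P = 2 and Q = 1.
        List<Set<String>> fieldsUsed = List.of(Set.of("x"), Set.of("x"), Set.of("y"));
        System.out.println(lcom(fieldsUsed)); // prints 1
    }
}
```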

Depth of Inheritance Tree (DIT)


4. Methodology of Case Study

This chapter presents a discussion of the method applied for the case study. It begins with an overview and a description of the analytical framework and continues by explaining the type of research adopted. Then, the selected open source projects are presented, followed by the procedure used and its time frame. After that, the analysis plan is discussed. Finally, the validity concerns and the assumptions made are addressed.

4.1 Overview

The aim of the case study is to investigate the ability of object oriented design metrics to analyze complexity in open source software. Software metrics collected directly from source code (internal metrics) are used to measure the complexity of different open source projects.

Software measurement serves a number of goals in relation to software processes and products. Some of these purposes, as presented in (S. Morasca, 1996), are listed below.

● Tracking: acquiring information on some characteristic of software processes to determine whether those characteristics are under control.

● Characterization: collecting information on characteristics of software processes and products, which helps to get a good idea of the present status of the software.

● Evaluation: judging some characteristic of a software process or product.

● Improvement: using a cause-effect relationship to identify elements of the process or product that can be changed to obtain better results on some characteristics.

● Prediction: finding a cause-effect relationship among different product and process characteristics (S. Morasca, 1996).

Software engineering is increasingly confronted with the challenge of growing complexity of software projects, large amounts of data and impediments in the software development process (V. Kamakshi, 2009). Implementing a successful control system requires some means of measurement. Hence, software metrics play a significant role in management aspects of the software development process such as enhanced planning, assessment of improvements and decreasing the level of unpredictability (R. Selvarani, 2009). Detecting faults early in software development, evaluating productivity and assessing factors such as reusability, defect proneness and complexity are important (R. Selvarani, 2009).

4.2 Analytical Framework

A method was devised prior to conducting the case study. It is composed of:

● choosing the set of open source software to analyze as a means of collecting data

● selecting a suitable metrics suite

● defining the level at which the metrics are applied and selecting the tools


of the projects had different releases, stable and recent versions of the projects were selected. The candidate projects were also the ones available in CVS and SVN without read-only access.

On the other hand, a particular set of design metrics was selected, since there are many metrics that could be applicable to the experiment. The choice was based on a study of articles on the most important metrics that have proven significant.

The case study used tools for obtaining metric values and for statistical assessment. The tools were chosen after comparing them with related software. Automated software complexity analysis tools such as Cyvis, Analyst4j and JBuilder were the candidates. The trial version of JBuilder was selected as the preferred tool for this experiment since the others had drawbacks. For instance, the trial version of Analyst4j allows only a total of 50 classes per project for analysis, and Cyvis produces values for only a limited number of metrics. JBuilder overcomes these drawbacks since it measures all classes in a project and produces values for many metrics. Winstat and Microsoft Excel, on the other hand, were chosen as statistical assessment tools because they were found suitable for the tasks performed in this case study. The analyses were performed at class level.

After these considerations, the data were analyzed and interpreted using statistical application software and an automated tool for extracting metric values.

4.3 Type of Research

The research type followed by the case study can be regarded as quantitative, since it involves collecting numeric data so that different statistical analyses can be carried out to reach conclusions. In addition, to address the research question, this study used different software applications to observe the possible relationships among the variables, in this case the metrics. The statistics applied can also be regarded as descriptive.

4.4 Sampling Method


Software name                        Number of classes   Lines of Code   Download rate per week
Azureus/Vuze                         451                 84530           70019
Sweethome3D                          444                 85670           42506
Lightweight Java Game Library        1098                66530           9374
Saros Distributed Pair Programming   195                 12339           8057
Jedit                                569                 70405           6693
Jstock                               232                 35970           6213
Jfreechart                           795                 92077           4361
JasperReports                        1093                103063          2992
Logisim                              953                 71699           2855
Hypersql database engine             687                 166468          2381

Table 4.1 List of most popular open source software

4.5 Procedure and Time Frame

A number of tasks were performed as the procedure for conducting this experiment. The first task was to select a set of open source software. The second involved extracting data (metric values) with the metric analysis tool. The third involved selecting a specific set of metrics, and the last step was to analyze the data. Each of these steps had a certain time frame.

The tools used for extracting and processing the collected data are JBuilder 2008 R2, Winstat 3.1 and Microsoft Excel 2007. Winstat is a statistical assessment tool produced in the analytical chemistry department of Stockholm University. Microsoft Excel was used to organize and interpret the data produced in JBuilder, which is one of the most complete and powerful Java IDEs available, with support for the leading commercial and open source Java EE 5 application servers. It is a useful tool for improving code quality and performance, increasing individual and team productivity, and improving understanding of new or existing code. In general, the built-in metrics in JBuilder make it possible to measure the overall quality of the object design, the complexity of and cohesion between objects, the degree of test coverage, and many other factors that help identify potential maintenance, quality and performance issues.

Metrics support the software development process with measures of the complexity of the project, since they quantify the code. Measuring metrics manually would take a lot of effort. Automated metric tools instead analyze the source code to collect the required measurements, giving analysts a consistent view of the measurements.


Collecting the list of open source software took 3 to 4 weeks. In addition, selecting the metric analyzer tool JBuilder took about 3 weeks, and choosing the statistical application software, Winstat and Excel, took two weeks.

With the metric analyzer software chosen and the metrics suite selected, the next step was to focus on understanding the implications of the data. This was conducted in a two-step process, which is discussed in the following section.

4.6 Analysis Plan

The analysis is performed in two steps. These are:

Step 1: analyzing how the projects react to a particular metric

Step 2: performing correlation tests and producing pairs of metrics that have high correlation

The first step involves analyzing each individual metric for the OSS. At this point, the results produced after the collection of data are discussed. The results are portrayed as charts that give an outlook of how the projects reacted to the specified metric. In addition, since the metric values vary from one open source project to another, intervals of values have been used to show where the data lie in the graph. The number of classes falling in each interval serves as the basis for plotting the graph. The interpretation of the data then follows.
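The interval-based grouping described in the first step can be sketched as follows. The code counts how many classes fall into each fixed-width interval of a metric, which is the kind of frequency table a chart could be plotted from. The class `MetricHistogram` and the interval width are hypothetical, since the width used in the study is not stated here, and the sketch assumes non-negative integer metric values.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: number of classes per metric-value interval, as input for a chart. */
final class MetricHistogram {

    /** Groups non-negative values into intervals [k*width, (k+1)*width - 1]. */
    static Map<Integer, Integer> frequencies(int[] values, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        Map<Integer, Integer> counts = new TreeMap<>(); // key = lower bound of the interval
        for (int v : values) {
            int lower = (v / width) * width; // valid for non-negative v only
            counts.merge(lower, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical CBO values, one per class of a project.
        int[] cbo = {0, 3, 5, 12, 4};
        System.out.println(frequencies(cbo, 5)); // prints {0=3, 5=1, 10=1}
    }
}
```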

In the second step, the correlation among the metrics of the open source projects is used to identify highly correlated metrics. In statistics, correlation is a measure of the relationship between different variables. The data used for measurement can be on a discrete or an interval scale. A correlation coefficient of 0 indicates that there is no linear relationship between the variables (the metrics in this case), while a coefficient close to 1 signifies a strong relationship. This is particularly useful because identifying highly correlated metrics in the open source software makes it possible to identify outliers, that is, classes that are likely to be complex.
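As an illustration of the coefficient described above, the following sketch computes Pearson's product-moment correlation coefficient for two metrics measured on the same classes. It assumes that Pearson's r is the coefficient Winstat reports, which the text does not state; the function `pearson` and the sample values are hypothetical. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient r, a value between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and spreads without the common 1/(n-1) factor (it cancels out)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a metric with constant values has no defined correlation")
    return cov / (spread_x * spread_y)


# Hypothetical LOC and MCC values for four classes
loc = [10, 20, 30, 40]
mcc = [1, 2, 3, 4]
print(round(pearson(loc, mcc), 4))
```

A value near -1 would also indicate a strong (inverse) relationship, which is why correlation strength is usually judged on the absolute value of r.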

4.7 Validity and Reliability

The validity concern is that there is no clear information on how JBuilder formulates or sources the thresholds it uses for the software metrics. To address this, the threshold method formulated in (R. Lincke, 2009) has been applied. This is particularly relevant for identifying outliers.

On the other hand, factors such as missing dependencies, incompletely compiled code or syntax errors could threaten the reliability of the experiment. Some effort has been made to fix such issues, for example by adjusting the package imports in classes and adding missing dependencies in some cases.

4.8 Assumptions


5. Results

This chapter presents the analysis of the data. The distribution of the data across the selected projects is shown in the first section, followed by a discussion of the chosen metrics. After that, the correlation tests are discussed.

5.1 Presentation and Analysis of Data

The following table shows the basic statistics gathered from measuring the open source software projects. It shows the general distribution of metric values from the data collection phase.

Measure    CBO     MCC     DIT     LCOM    LOC     NOC     RFC     WMC
Jfree
  Min      0       0       0       0       2       0       0       0
  Max      93      428     7       100     2041    66      670     428
  Median   8       7       1       61      63      0       25      7
  Average  9.08    18.29   1.21    49.20   115.82  1.4     67.02   18.14
  StdDev   9.87    33.95   1.49    35.80   179.38  6.05    110.59  33.71
Lightweight
  Min      0       0       0       0       2       1       0       0
  Max      61      249     5       100     1274    10      580     249
  Median   0       2       0       73      15      1       4       2
  Average  1.74    9.59    0.52    59.65   60.59   1.09    17.95   9.20
  StdDev   3.66    21.76   0.88    35.74   123.69  0.54    46.46   21.12
Jasper report
  Min      0       0       0       0       3       0       0       0

Measure    CBO     MCC     DIT     LCOM    LOC     RFC     WMC     NOC
Sql
  Average  10.72   42.88   1.50    62.99   242.31  73.74   41.19   0.73
  StdDev   13.23   79.74   1.16    31.90   475.24  86.92   77.31   2.76
Saros distributed party programming
  Min      0       0       0       0       2       0       0       0
  Max      88      163     6       100     977     296     161     17
  Median   5       7       1       72      30      25      7       0
  Average  9.51    13.26   1.42    56.98   63.27   47.27   12.24   0.68
  StdDev   12.60   19.69   1.37    35.03   105.70  48.38   17.81   1.90

Table 5.1 Descriptive statistics of the open source software projects
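The summary measures in Table 5.1 can be reproduced per metric along the following lines. This sketch assumes the sample standard deviation (dividing by n - 1); whether Winstat or Excel used the sample or the population formula is not stated. The function `describe` and the input values are hypothetical.

```python
import statistics


def describe(values):
    """Min, max, median, average and standard deviation of one metric.

    Assumes the sample standard deviation (n - 1 in the denominator).
    """
    return {
        "Min": min(values),
        "Max": max(values),
        "Median": statistics.median(values),
        "Average": round(statistics.mean(values), 2),
        "StdDev": round(statistics.stdev(values), 2),
    }


# Hypothetical DIT values for six classes
print(describe([0, 1, 1, 2, 0, 3]))
```

Replacing `statistics.stdev` with `statistics.pstdev` gives the population value instead (1.07 rather than 1.17 for this input).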

The following sections discuss the results obtained. As indicated in section 4.5, the focus of this discussion is the empirically evaluated metric suite by Chidamber, namely CBO, WMC, DIT, NOC, LCOM and RFC. In addition to these metrics, MCC has been included since it is a good metric for measuring complexity. For each metric, the results for the open source projects are analyzed, and the data is collected systematically. To deal with the large data set, the metric values have been divided into intervals, and the number of classes has been grouped according to the values that occur. This gives a good overview of the projects and shows how each metric behaves.

Lincke (R. Lincke, 2009) identifies three different ways of defining thresholds for software metrics. Defining a threshold makes it possible to find the outlier entities, such as classes or interfaces, in a package. The first method sorts the entities in decreasing or increasing order and takes the top K entities, i.e. the K entities with the highest or lowest values. The second method takes the top K percent of the sorted values relative to the number of entities. The third, recommended method computes the maximum and minimum values of the metric in question and then picks the entities in the range {min + (max - min) * k% ... max} if a low value is desirable, or the entities in the range {min ... min + (max - min) * k%} if a high value is desirable. Unlike the first two methods, this one depends on the range of the metric values rather than on the number of entities, and it has been chosen for this experiment.
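The three threshold definitions described above can be sketched as follows. The function names, the parameter `k_percent` and the sample values are hypothetical, and the value of k used in this experiment is not stated here. For the third method the code follows the two ranges exactly as given in the text, so the caller chooses k to match the direction.

```python
def top_k(values, k, high_is_bad=True):
    """Method 1: indices of the k entities with the highest (or lowest) values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=high_is_bad)
    return sorted(order[:k])


def top_k_percent(values, k_percent, high_is_bad=True):
    """Method 2: the top k percent of entities, relative to the number of entities."""
    k = round(len(values) * k_percent / 100)
    return top_k(values, k, high_is_bad)


def range_outliers(values, k_percent, high_is_bad=True):
    """Method 3: entities whose value lies in a range derived from min and max.

    Low value desirable:  [min + (max - min) * k%, max]
    High value desirable: [min, min + (max - min) * k%]
    """
    lo, hi = min(values), max(values)
    cut = lo + (hi - lo) * k_percent / 100
    if high_is_bad:
        return [i for i, v in enumerate(values) if v >= cut]
    return [i for i, v in enumerate(values) if v <= cut]


# Hypothetical metric values for five classes
values = [0, 10, 20, 50, 100]
print(range_outliers(values, 80))                     # high values are bad
print(range_outliers(values, 20, high_is_bad=False))  # low values are bad
```

With the third method the number of flagged classes is not fixed in advance; it depends on how the values are spread between the minimum and the maximum.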

5.1.1 Coupling Between Objects (CBO)


Fig 5.1 Measurement result of CBO for the OSS.

Based on the descriptions of the software metrics used in this case, classes with low coupling between objects are preferable since they signify reusability and modularity. CBO indicates the dependency between classes and helps determine how complex the design is likely to be. A high CBO value is undesirable since it makes modular design difficult and can cause problems when reusing components. Consequently, the less coupling there is among instances of a class, the lower the complexity. In addition, extremely high coupling increases the effort of understanding the code, which in turn can affect the maintenance of a given system: a change in one class can lead to changes in another, so it takes effort to keep track of the couplings.
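A minimal sketch of how CBO can be counted from class dependencies, assuming Chidamber and Kemerer's description in which two classes are coupled when either one uses the other. How JBuilder counts couplings is not documented here, and the function `cbo` and the class names are hypothetical.

```python
def cbo(uses):
    """CBO for every class, given a map 'class -> set of classes it uses'.

    A class counts as coupled to another if either one uses the other;
    each coupled class is counted once and self-references are ignored.
    """
    coupled = {name: set() for name in uses}
    for src, targets in uses.items():
        for dst in targets:
            if dst == src:
                continue
            coupled.setdefault(src, set()).add(dst)
            coupled.setdefault(dst, set()).add(src)
    return {name: len(others) for name, others in coupled.items()}


# Hypothetical dependencies between four classes
deps = {
    "Parser": {"Lexer", "Node"},
    "Lexer": {"Token"},
    "Token": set(),
    "Node": {"Node"},  # a self-reference, ignored
}
print(cbo(deps))
```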

As described above, a method has been defined for detecting extremely high or low values of the metrics. The method is useful for showing relative normative value ranges.


5.1.2 Weighted Method Count (WMC)

This is a measure of the complexity of the overall decision structure within the methods of a class. For WMC, it is not desirable to have a significantly large number of methods per class, since potential children of the class inherit those methods. This in turn can make changes hard to track and therefore increase the degree of complexity, which is undesirable in most object oriented programs. The following WMC graph was produced by applying the metric measurement.
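The following sketch computes WMC as the sum of the complexities of a class's methods. Chidamber and Kemerer leave the method weighting open; here each method is assumed to be weighted by its cyclomatic complexity, which fits the description above of measuring the decision structure within methods. The function `wmc` and the class and method names are hypothetical.

```python
def wmc(classes):
    """WMC per class: the sum of the cyclomatic complexities of its methods.

    `classes` maps a class name to {method name: cyclomatic complexity}.
    """
    return {name: sum(methods.values()) for name, methods in classes.items()}


# Hypothetical classes; the complexities would come from a metric tool
classes = {
    "Renderer": {"draw": 6, "init": 1, "reset": 2},
    "Point": {"getX": 1, "getY": 1},
}
print(wmc(classes))
```

With every weight set to 1, WMC reduces to the number of methods in the class.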

Fig 5.2 Measurement result of WMC for the OSS

Interpretation: In this case, the target classes to identify are those with high WMC values, since these are the classes that are difficult to comprehend and may need improvement. Applying the formula for the relative normative value range shows that classes above the interval of 80 to 89 have high WMC values and could therefore be regarded as outliers. Identifying these classes in the open source projects could enable us to recognize the causes of the


5.1.3 Depth of Inheritance Tree (DIT)

DIT can be described as the maximum length of a path from a class to a root class in the inheritance structure of a system. This metric measures how many super-classes can affect a class.
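DIT can be computed by following superclass links from a class up to the root, as in the sketch below. It assumes single inheritance (as for Java classes) and treats any class without a parent in the map as a root; whether JBuilder counts java.lang.Object is not stated. The function `dit` and the hierarchy are hypothetical.

```python
def dit(parent, cls):
    """Depth of inheritance: number of superclass links from cls up to its root.

    `parent` maps each class to its superclass (None for a root class).
    """
    depth = 0
    seen = set()
    while parent.get(cls) is not None:
        if cls in seen:
            raise ValueError("cycle in the inheritance map")
        seen.add(cls)
        cls = parent[cls]
        depth += 1
    return depth


# Hypothetical single-inheritance hierarchy
hierarchy = {"Object": None, "Shape": "Object", "Polygon": "Shape", "Square": "Polygon"}
print(dit(hierarchy, "Square"))
```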

Fig 5.3 Measurement result of DIT for the OSS.

Interpretation: Given the description of DIT, classes should preferably have a short path to the root class. In this case, the target classes to identify are those with high DIT values. Classes with high DIT values are likely to change often, since they depend on many other classes in the hierarchy.

As with WMC, this also enables us to observe the classes that are difficult to understand. Applying the formula for the relative normative value range shows that classes above the value of 2 have high DIT values. A close observation of those classes can therefore be useful in understanding the complexity of the software before adopting it.

5.1.4 Number of Children (NOC)

NOC can be defined as the number of immediate subclasses (children) subordinated to a class (parent) in the class hierarchy. NOC measures how many classes directly inherit methods or fields from a superclass.
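NOC counts only direct subclasses. The following sketch derives it from a hypothetical "class to superclass" map; grandchildren are deliberately not counted, and the function name `noc` is likewise hypothetical.

```python
from collections import Counter


def noc(parent):
    """Number of immediate subclasses for every class in a 'class -> superclass' map."""
    # Each class contributes one child to its direct superclass only
    counts = Counter(p for p in parent.values() if p is not None)
    return {cls: counts[cls] for cls in parent}


# Hypothetical hierarchy: Square, Triangle and Hexagon count for Polygon, not for Shape
hierarchy = {
    "Shape": None,
    "Circle": "Shape",
    "Polygon": "Shape",
    "Square": "Polygon",
    "Triangle": "Polygon",
    "Hexagon": "Polygon",
}
print(noc(hierarchy))
```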


Fig 5.4 Measurement result of NOC for the OSS.

Interpretation: Based on the description of NOC, classes with low NOC values are desirable. In this case, the target classes to identify are those with high NOC values. These classes cause more work when changed, since they are complex or have many dependent classes.

Classes above the interval of 10 to 19 are more likely to become complex. The graph above suggests that most of the open source projects have good NOC values.

5.1.5 Lack of Cohesion in Methods (LCOM)

The Lack of Cohesion in Methods metric measures the number of unconnected method pairs in a class, i.e. pairs representing independent parts that have no cohesion. It is the difference between the number of method pairs that have no instance variables in common and the number of method pairs that share instance variables. The following graph shows the results of the LCOM measurement for the OSS projects.
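The definition above corresponds to Chidamber and Kemerer's LCOM, which is set to zero when the difference is negative. The sketch below computes that raw count for one class. The values in Table 5.1 never exceed 100, which suggests that JBuilder reports a scaled variant, so this is an assumption rather than the tool's formula; the function `lcom` and the method and field names are hypothetical.

```python
from itertools import combinations


def lcom(method_fields):
    """Chidamber-Kemerer LCOM: P - Q, floored at zero.

    `method_fields` maps each method to the set of instance variables it uses.
    P = method pairs sharing no instance variable, Q = pairs sharing at least one.
    """
    p = q = 0
    for (_, a), (_, b) in combinations(method_fields.items(), 2):
        if a & b:
            q += 1
        else:
            p += 1
    return max(p - q, 0)


# Hypothetical class: only deposit/withdraw share an instance variable
account = {
    "deposit": {"balance"},
    "withdraw": {"balance"},
    "getOwner": {"owner"},
    "log": set(),
}
print(lcom(account))
```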


Fig 5.5 Measurement result of LCOM for the OSS.

Interpretation: Based on the description of LCOM, classes are preferred to have a high degree of cohesion. In this case, the target classes to identify are those with relatively low LCOM values. Classes below the interval of 20 to 29 are more likely to become complex since they have less cohesion. The graph above shows many classes with low cohesion values, so appropriate attention should be given to those classes to minimize complexity.

5.1.6 Response For Class (RFC)

RFC is the count of the (public) methods in a class plus the methods directly called by them.
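Following the definition above, RFC is the size of the response set: the class's own methods together with the distinct methods they call directly. A minimal sketch, with a hypothetical function `rfc` and hypothetical qualified method names so that methods of other classes stay distinct:

```python
def rfc(own_methods, calls):
    """RFC: the class's own methods plus the distinct methods they call directly.

    own_methods: qualified names of the class's (public) methods.
    calls: maps each own method to the set of methods it calls directly.
    """
    response_set = set(own_methods)
    for m in own_methods:
        response_set |= calls.get(m, set())
    return len(response_set)


# Hypothetical class Cart with two methods; Cart.total is counted only once
methods = ["Cart.add", "Cart.total"]
calls = {
    "Cart.add": {"List.append", "Cart.total"},
    "Cart.total": {"Item.price", "Math.round"},
}
print(rfc(methods, calls))
```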


Interpretation: Based on the description of RFC, classes are preferred to have low RFC values. In this case, the target classes to identify are those with high values. Classes above the interval of 120 to 149 are most likely to become complex because they have high coupling.

5.1.7 McCabe Cyclomatic Complexity (MCC)

Recalling the definition, MCC is a measure of the control structure complexity of software. It equals the number of linearly independent paths through the code, which is also the number of test paths needed to cover a basis set of those paths (Chidamber, 1994).
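McCabe (1976), listed in the references, computes cyclomatic complexity from a control-flow graph as V(G) = E - N + 2P, where E is the number of edges, N the number of nodes and P the number of connected components. The sketch below applies that formula; the function name and the graphs are hypothetical, and the connected components are counted with a simple union-find.

```python
def cyclomatic_complexity(nodes, edges):
    """McCabe's V(G) = E - N + 2P for a control-flow graph."""
    root_of = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving
        while root_of[n] != n:
            root_of[n] = root_of[root_of[n]]
            n = root_of[n]
        return n

    for a, b in edges:
        root_of[find(a)] = find(b)  # edge direction does not matter for components
    components = len({find(n) for n in nodes})
    return len(edges) - len(nodes) + 2 * components


# Hypothetical control-flow graph of a method containing a single if/else
nodes = ["entry", "cond", "then", "else", "exit"]
edges = [("entry", "cond"), ("cond", "then"), ("cond", "else"),
         ("then", "exit"), ("else", "exit")]
print(cyclomatic_complexity(nodes, edges))
```

For the if/else graph this gives 5 - 5 + 2 = 2, matching the two independent paths through the method.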

Fig 5.7 Measurement result of MCC for the OSS.

Interpretation: Based on the description of MCC, a high value for this metric is undesirable, since the more control-flow paths a method has, the more likely it is to add complexity within a class. Methods with high complexity tend to be more difficult to understand and maintain. In addition, the more complex the methods of an application are, the more difficult it becomes to test the application for errors.

In this case, classes are preferred to have low MCC values, so the target classes to identify in the chart above are those with high values. Applying the formula shows that classes above the interval of 30 to 39 are most likely to become complex because of their high MCC values.


5.2 Correlation Tests

The correlation among metric pairs is a good means of indicating classes above or below the normative range. The correlation analysis of the open source software is performed in two steps in the following sections: the first identifies the metric pairs that are highly correlated, and the second analyzes how each metric pair can provide insight into the projects.

5.2.1 Correlated Metrics

Identifying highly related metrics is useful for understanding fault-prone classes in the projects, since these classes lie outside the normal distribution in the graph. Winstat has been used to pick the highly correlated metrics, as it shows the correlation coefficient between each pair of metrics. From the results, the pairs with a correlation coefficient close to 1 are chosen for the individual OSS projects. The following table shows the correlation coefficients for each project. In this section the Lines of Code (LOC) metric is included, since it is useful to observe how the other metrics relate to size.

Lightweight
       CBO     CC      DIT     LCOM    LOC     RFC     WMC     NOC
CBO    1       0.6116  0.3790  0.2733  0.6162  0.5728  0.5877  0.0224
CC     -       1       0.2019  0.2388  0.8494  0.5438  0.9883  0.0324
DIT    -       -       1       0.0848  0.2451  0.6143  0.1882  0.0029
LCOM   -       -       -       1       0.2885  0.1035  0.2415  0.0340
LOC    -       -       -       -       1       0.5153  0.8281  0.0618
RFC    -       -       -       -       -       1       0.5326  0.0274
WMC    -       -       -       -       -       -       1       0.0299

Jasper report
       CBO     CC      DIT     LCOM    LOC     RFC     WMC     NOC
CBO    1       0.7603  0.1684  0.2120  0.8370  0.0626  0.6161  0.7462

Sql
       CBO     CC      DIT     LCOM    LOC     RFC     WMC     NOC
LCOM   -       -       -       1       0.1890  0.1924  0.2211  0.0928
LOC    -       -       -       -       1       0.6530  0.9269  0.0361
RFC    -       -       -       -       -       1       0.6702  0.0234
WMC    -       -       -       -       -       -       1       0.0790

Jedit
       CBO     CC      DIT     LCOM    LOC     RFC     WMC     NOC
CBO    1       0.4435  0.1611  0.3692  0.5449  0.3696  0.4129  0.0560
CC     -       1       0.0010  0.1804  0.9379  0.1938  0.9967  0.0103
DIT    -       -       1       0.2752  0.0337  0.8842  0.0194  0.0689
LCOM   -       -       -       1       0.2568  0.3591  0.1648  0.0617
LOC    -       -       -       -       1       0.2587  0.9207  0.0206
RFC    -       -       -       -       -       1       0.1649  0.0351
WMC    -       -       -       -       -       -       1       0.0094

Azureus
       CBO     CC      DIT     LCOM    LOC     RFC     WMC     NOC
CBO    1       0.7969  0.1513  0.1955  0.6803  0.7277  0.7940  0.0758
CC     -       1       0.1271  0.2224  0.8107  0.7276  0.9719  0.1292
DIT    -       -       1       0.0542  0.1014  0.5401  0.1168  0.0379
LCOM   -       -       -       1       0.1817  0.1261  0.2281  0.0055
LOC    -       -       -       -       1       0.5825  0.7718  0.0550
RFC    -       -       -       -       -       1       0.7239  0.1278
WMC    -       -       -       -       -       -       1       0.1490

Saros
       CBO     CC      DIT     LCOM    LOC     RFC     WMC     NOC
CBO    1       0.8714  0.1937  0.4000  0.9021  0.7779  0.8278  0.2073
CC     -       1       0.0365  0.4004  0.9647  0.6772  0.6772  0.9780
DIT    -       -       1       0.2649  0.0931  0.6550  0.0137  0.3529
LCOM   -       -       -       1       0.3929  0.1274  0.3766  0.0225
LOC    -       -       -       -       1       0.7246  0.9533  0.1884
RFC    -       -       -       -       -       1       0.6479  0.3069
WMC    -       -       -       -       -       -       1       0.1694

Jfree
       CBO     CC      DIT     LCOM    LOC     RFC     WMC     NOC
CBO    1       0.6032  0.1950  0.1803  0.7291  0.0384  0.5894  0.6041
CC     -       1       0.3544  0.2787  0.9163  0.1381  0.4996  0.9984
DIT    -       -       1       0.2442  0.3247  0.0823  0.6938  0.3544
LCOM   -       -       -       1       0.3543  0.1589  0.2870  0.3778
LOC    -       -       -       -       1       0.0604  0.5353  0.9140
RFC    -       -       -       -       -       1       0.0137  0.1400
WMC    -       -       -       -       -       -       1       0.4983

Sweethome3d
       CBO     CC      DIT     LCOM    LOC     RFC     WMC     NOC
CBO    1       0.6223  0.3545  0.2166  0.7164  0.0957  0.6897  0.7044
CC     -       1       0.0643  0.2204  0.9751  0.0316  0.4348  0.9377
DIT    -       -       1       0.1861  0.1457  0.2115  0.7490  0.0689
LCOM   -       -       -       1       0.2077  0.0458  0.0947  0.2645
LOC    -       -       -       -       1       0.0473  0.5265  0.9231
RFC    -       -       -       -       -       1       0.1091  0.0414
WMC    -       -       -       -       -       -       1       0.4549


Metric pairs with a correlation coefficient greater than 0.8 in the individual open source projects:

1 Lightweight: CC vs LOC, CC vs WMC, LOC vs WMC

2 Jasper report: CBO vs LOC, CC vs LOC, CC vs NOC, LOC vs NOC

3 Logisim: DIT vs RFC, CC vs WMC

4 Jstock: DIT vs RFC, CC vs WMC

5 Hypersql: CC vs LOC, CC vs WMC

6 Jedit: CC vs LOC, CC vs WMC, DIT vs RFC

7 Azureus: CC vs LOC, CC vs WMC

8 Saros: CBO vs CC, CBO vs LOC, CBO vs WMC, CC vs LOC, CC vs NOC, LOC vs WMC

9 Jfree: CC vs LOC, CC vs NOC, LOC vs NOC

10 Sweethome3d: CC vs LOC, CC vs NOC, LOC vs NOC
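The selection of pairs above 0.8 can be checked mechanically. The sketch below uses the Lightweight coefficients from the correlation table (upper triangle only, since the diagonal is 1 by definition) and a hypothetical helper `strong_pairs`; it reproduces the three pairs listed for Lightweight.

```python
METRICS = ["CBO", "CC", "DIT", "LCOM", "LOC", "RFC", "WMC", "NOC"]

# Lightweight coefficients: row i holds the values for METRICS[i] vs METRICS[i+1:]
LIGHTWEIGHT = [
    [0.6116, 0.3790, 0.2733, 0.6162, 0.5728, 0.5877, 0.0224],  # CBO
    [0.2019, 0.2388, 0.8494, 0.5438, 0.9883, 0.0324],          # CC
    [0.0848, 0.2451, 0.6143, 0.1882, 0.0029],                  # DIT
    [0.2885, 0.1035, 0.2415, 0.0340],                          # LCOM
    [0.5153, 0.8281, 0.0618],                                  # LOC
    [0.5326, 0.0274],                                          # RFC
    [0.0299],                                                  # WMC
]


def strong_pairs(metrics, upper, threshold=0.8):
    """Metric pairs whose |r| exceeds the threshold, from an upper-triangle matrix."""
    pairs = []
    for i, row in enumerate(upper):
        for offset, r in enumerate(row):
            j = i + 1 + offset  # column index of this coefficient
            if abs(r) > threshold:
                pairs.append((metrics[i], metrics[j], r))
    return pairs


for a, b, r in strong_pairs(METRICS, LIGHTWEIGHT):
    print(f"{a} vs {b}: {r}")
```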

5.2.2 Metric Pairs Analysis

Deriving the metric pairs that have a strong correlation is useful for analyzing the complexity of open source software, since highly correlated metrics for an individual OSS project are a good indicator of classes that lie outside the normal value range. As an example of how these metric pairs are useful, the following plot shows the relationship between MCC and LOC, a combination observed to be highly correlated in most of the OSS projects. The metric values for this example are taken from the Jfree project.

Fig 5.8 Correlation between MCC and LOC in Jfree


result, classes that are large and complex reside in the top right portion of the graph. A close look at these classes can therefore be useful for estimating the complexity of the project.
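Selecting the classes in the top right region of such a plot amounts to applying a threshold to both metrics at once. The following sketch uses the range-based threshold method chosen in this study for each axis; the function names, the class names, their (LOC, MCC) values and k = 70 are hypothetical.

```python
def range_cut(values, k_percent):
    """Lower bound min + (max - min) * k% of the range-based threshold."""
    lo, hi = min(values), max(values)
    return lo + (hi - lo) * k_percent / 100


def large_and_complex(classes, k_percent):
    """Classes above the threshold on both LOC and MCC (top right of the plot)."""
    loc_cut = range_cut([loc for loc, _ in classes.values()], k_percent)
    mcc_cut = range_cut([mcc for _, mcc in classes.values()], k_percent)
    return sorted(name for name, (loc, mcc) in classes.items()
                  if loc >= loc_cut and mcc >= mcc_cut)


# Hypothetical (LOC, MCC) values for five classes
classes = {"A": (120, 10), "B": (1500, 300), "C": (900, 40), "D": (2000, 420), "E": (60, 2)}
print(large_and_complex(classes, 70))
```

Here the cut-offs are 1418 lines and an MCC of about 294.6, so only the two classes that exceed both end up in the top right region.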


6. Conclusion

Software complexity is still a challenge in the software industry. The first challenge is to define what complexity means for a given software system, since it can mean different things. Software developers can benefit from open source software by reusing its components; however, these components need to be investigated for complexity. This paper has shown how complexity can be predicted. The IEEE description of complexity was used as a basis, and the applicability of software metrics has been demonstrated. In addition, statistical methods have been applied to measure and discuss the results.

6.1 Summary of Results and Findings

The case study demonstrated that using software metrics to predict complexity in popular open source software is very useful, because it allows us to quickly understand the code and its implications.

From the case study presented above, the following observations have been made:

• It has been shown how to measure the complexity of software using different software metrics. Using metrics, one can identify potentially detrimental classes in the projects that are most likely to cause errors, so that appropriate attention can be given to the areas needing improvement.

• The relationship between metrics in individual projects is also a factor. It is a very good indicator of outlier classes in the OSS. Metric pairs that have a high correlation can be derived for a specific project and analyzed for interpretation.

• The use of statistics for analyzing software metric values gives an objective view of the projects.

• Visualizing the classes in graphs can help the developer, or the software tester, get a good general overview of the classes by showing the trend of the projects for a specific metric.

• Apart from some classes, most of the popular open source software projects used in this paper had metric results that indicate characteristics of good design.

6.2 Overall Conclusion


6.3 Recommendation and Future Work

The assessment tool used to carry out this study produced values for the different software metrics. However, there is no information on how the thresholds for these metrics are derived. Thresholds are used to categorize classes and are important in this study, since the intention is to make decisions based on them.

In addition, a wide range of open source software is available, and increasing the number of projects analyzed would increase the statistical significance. Both of these tasks can therefore be noted as future work.


References

1. Lewerentz, Claus. Applying Design-Metrics to Object-Oriented Frameworks. IEEE Proceedings of METRICS, 1996, pp. 64-74.

2. Kan, Stephen H. Metrics and Models in Software Quality Engineering, Second Edition. Boston: Addison Wesley, 2002.

3. ARISA AB. ARISA Web Site. Applied Research in Systems Analysis. [Online] http://www.arisa.se.

4. Lincke, Rudiger. Validation of a Standard- and Metric-Based Software Quality Model. Acta Wexionensia, 2009. ISSN: 1404-4307.

5. Honglei, Tu. The Research on Software Metrics and Software Complexity Metrics. Beijing: IEEE Computer Society, 2009. 978-0-7695-3930-0/09.

6. Abreu, Fernando Brito e. Evaluating the Impact of Object-Oriented Design on Software Quality. Lisbon: IEEE Proceedings of METRICS ’96, 1996. 0-8186-7364-8196.

7. Chidamber, Shyam R. Managerial Use of Metrics for Object-Oriented Software: An Exploratory Analysis. Washington, DC: IEEE Transactions on Software Engineering, Vol. 24, No. 8, 1998. 0098-5589/98.

8. Schneidewind, Norman F. Methodology for Validating Software Metrics. Monterey: IEEE Transactions on Software Engineering, Vol. 18, No. 5, 1992. 0098-5589/92.

9. Chidamber, Shyam R. and Kemerer, Chris F. A Metrics Suite for Object Oriented Design. Cambridge: IEEE Transactions on Software Engineering, Vol. 20, No. 6, 1994. 0098-5589/94.

10. Kafura, Dennis. The Use of Software Complexity Metrics in Software Maintenance. Virginia: IEEE Transactions on Software Engineering, Vol. SE-13, No. 3, 1987. 0098-5589/87/0300-0335.

11. Li, H. F. and Cheung, W. K. An Empirical Study of Software Metrics. Montreal: IEEE Transactions on Software Engineering, Vol. SE-13, No. 6, 1987. 0098-5589/87/0600-0697.

12. McCabe, Thomas J. A Complexity Measure. Ft. Meade: IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, 1976.

13. Chidamber, Shyam R. and Kemerer, Chris F. A Metrics Suite for Object Oriented Design. Cambridge: IEEE Transactions on Software Engineering, Vol. 20, No. 6, 1994. IEEE Log Number 940 1707.

14. Chidamber, Shyam R. and Kemerer, Chris F. Towards a Metrics Suite for Object Oriented Design. Cambridge: OOPSLA, 1991. ACM 89791-446-5/91/0010/0197.


17. Empirical Validation of Object-Oriented Metrics of Open Source Software for Fault Prediction. Szeged: IEEE Transactions on Software Engineering, Vol. 31, 2005.

18. Weyuker, Elaine J. Evaluating Software Complexity Measures. IEEE Transactions on Software Engineering, Vol. 14, No. 9, 1988.

19. Radulescu, Iulian-Ionut. Open Source Tools to Measure Software Complexity. Open Source Science Journal, Vol. 1, No. 2, 2009.

20. Kearney, Joseph K. Software Complexity Measurement. Communications of the ACM, Vol. 29, No. 11, 1986.

21. Rashidi, Hassan. A Novel Method to Measure Comprehensive Complexity of Software. IEEE Computer Society, 2010.

22. Kamiya, Toshihiro. Prediction of Fault-proneness at Early Phase in Object-Oriented Development. Osaka.

23. Barkmann, Henrike. Quantitative Evaluation of Software Quality Metrics in Open-Source Projects. IEEE Computer Society, 2009.

24. Garrett, Thom. Useful Automated Software Testing Metrics. 2011.

25. Martin, Robert. OO Design Quality Metrics: An Analysis of Dependencies. 1994.


SE-391 82 Kalmar / SE-351 95 Växjö Tel +46 (0)772-28 80 00