
Static code metrics vs. process metrics for software fault prediction using Bayesian network learners

Mälardalen University
School of Innovation, Design and Technology

Author: Biljana Stanić
Thesis for the Degree of Master of Science in Software Engineering (30.0 credits)
Date: 28th October, 2015
Supervisor: Wasif Afzal
Examiner: Antonio Cicchetti

 

   


       

 

 

               

"The real question is not whether machines think but whether men do. The mystery which surrounds a thinking machine already surrounds a thinking man."

Burrhus Frederic Skinner

                                         


Acknowledgments  

 

I would like to express my deep appreciation to my supervisor, Dr. Wasif Afzal, for his constructive and useful suggestions and guidance throughout my research work.

 

I would also like to thank:

the EUROWEB Project¹, funded by the Erasmus Mundus Action II programme of the European Commission;

Lech Madeyski and Marian Jureczko, for allowing me to use their metrics repository.

Finally, I wish to thank my dear ones for their support and for believing in me all this time.

¹ http://www.mrtc.mdh.se/euroweb/


Abstract  

   

Software fault prediction (SFP) plays an important role in improving software product quality by identifying fault-prone modules. Constructing quality models involves the use of metrics that describe real-world entities through numbers or attributes. Examining the nature of machine learning (ML), researchers have proposed its algorithms as suitable for fault prediction; the information that software metrics contain serves as the statistical data needed to build models for a given ML algorithm. One of the most widely used ML algorithms is the Bayesian network (BN), which is represented as a graph with a set of variables and the relations between them. This thesis focuses on the use of process and static code metrics with BN learners for SFP. First, we provide an informal review of non-static code metrics. We then create models containing different combinations of process and static code metrics and use them to conduct an experiment. The results of the experiment are statistically analyzed using a non-parametric test, the Kruskal-Wallis test.

The informal review found that non-static code metrics are beneficial for the prediction process and that their use is highly recommended for industrial projects. The experimental results, however, did not establish which process metric gives a statistically significant result; further investigation is therefore needed.

 

 

 

 

   

               


Contents

     

Abstract  ...  4  

List  of  Figures  ...  7  

List  of  Tables  ...  8  

Abbreviations  ...  9  

1.  Introduction  ...  11  

1.1   Motivation  ...  12  

1.2   Research  questions  ...  13  

2.  Background  ...  14  

2.1  Software  fault  prediction  ...  14  

2.2  Software  metrics  ...  15  

2.2.1  Static  code  metrics  ...  15  

2.2.2  Process  metrics  ...  15  

2.3  Machine  learning  ...  16  

2.3.1  Bayesian  network  ...  17  

2.3.1.1  The  Naive  Bayes  Classifier  ...  19  

2.3.1.2  Augmented  Naive  Bayes  Classifier  ...  19  

3.  Method  ...  21  

3.1  Methodology  for  RQ1  ...  21  

3.2  Methodology  for  RQ2  ...  23  

4.  Informal  review  ...  25  

4.1  Process  metrics  ...  25  

4.1.1  Code  churn  metrics  ...  26  

4.1.2  Developer  metrics  ...  26  

4.1.3  Other  process  metrics  ...  26  

5.  Design  ...  27  

5.1  Projects  ...  27  

5.2  Extracted  metrics  ...  27  

5.3  Evaluation  of  results  ...  29  

5.4  Experiment  ...  29  

6.  Results  ...  33  

6.1  Statistical  analysis  of  results  ...  33  

6.2.1  Results  of  the  experiment  for  NB  classifier  ...  33  

6.2.1.1  Models  with  combined,  static  code  and  process  metric  ...  34  

6.2.1.2  Models  with  static  code  and  1  process  metrics  ...  35  

6.2.1.3  Models  with  a  combination  of  2  process  and  static  code  metrics  ...  36  

6.2.1.4  Models  with  a  combination  of  3  process  and  static  code  metrics  ...  37  

6.2.2  Results  of  the  experiment  for  TAN  classifier  ...  38  

6.2.2.1  Models  with  combined,  static  code  and  process  metric  ...  38  

6.2.2.2  Models  with  static  code  and  1  process  metrics  ...  39  

6.2.2.3  Models  with  a  combination  of  2  process  and  static  code  metrics  ...  40  

6.2.2.4  Models  with  a  combination  of  3  process  and  static  code  metrics  ...  41  

7.  Result  discussion  ...  42  


8.  Validity  threats  ...  45  

8.1  Internal  validity  ...  45  

8.2  External  validity  ...  45  

8.3  Statistical  conclusion  validity  ...  45  

9. Conclusion ... 46

9.1 Future works ... 46

9.1.1 Model investigation ... 46

9.1.2 Industrial projects ... 46

9.1.3 Data extraction ... 47

Reference ... 48

A  Graphs  for  NB  classifier  ...  51  

B Graphs for TAN classifier ... 53


List  of  Figures  

   

Figure  1.  Example  of  NB  [6]  ...  19  

Figure  2.  Example  of  STAN  [6]  ...  20  

Figure  3.  Methodology  for  the  RQ1  ...  22  

Figure  4.  Methodology  for  the  RQ2  ...  24  

Figure  5.  Weka  Explorer  ...  30  

Figure  6.  Set  values  for  NB  classifier  ...  31  

Figure  7.  Classifier  output  with  results  ...  32  

Figure  8.  Graphs  with  comparison  results  for  NB  classifier  ...  42  

Figure  9.  Graphs  with  comparison  results  for  TAN  classifier  ...  43  

Figure  10.  Graphical  comparison  of  combined,  static  code  and  process  models  using  NB  ...  51  

Figure  11.  Graphical  comparison  of  models  containing  1  process  and  static  code  metrics   using  NB  ...  51  

Figure  12.  Graphical  comparison  of  models  containing  the  combination  of  2  process  and   static  code  metrics  using  NB  ...  52  

Figure  13.  Graphical  comparison  of  models  containing  the  combination  of  3  process  and   static  code  metrics  using  NB  ...  52  

Figure  14.  Graphical  comparison  of  combined,  static  code  and  process  models  using  TAN  ...  53  

Figure  15.  Graphical  comparison  of  models  containing  1  process  and  static  code  metrics   using  TAN  ...  53  

Figure  16.  Graphical  comparison  of  models  containing  the  combination  of  2  process  and   static  code  metrics  using  TAN  ...  54  

Figure 17. Graphical comparison of models containing the combination of 3 process and static code metrics using TAN ... 54


List  of  Tables

 

 

Table 1. Selected projects for the experiment ... 27

Table 2. Results for combined, static code and process metric using NB classifier ... 34

Table 3. Comparison results for combined, static code and process models for NB classifier ... 34

Table 4. Results for models containing 1 process and static code metric using NB classifier ... 35

Table 5. Comparison results for models containing static code and 1 process metrics using NB classifier ... 35

Table 6. Results for models containing combination of 2 process and static code metrics using NB classifier ... 36

Table 7. Comparison results for models containing combination of 2 process and static code metrics using NB classifier ... 36

Table 8. Results for models containing combination of 3 process and static code metrics using NB classifier ... 37

Table 9. Comparison results for models containing combination of 3 process and static code metrics using NB classifier ... 37

Table 10. Results for combined, static code and process models using TAN classifier ... 38

Table 11. Comparison results for combined, static code and process models using TAN classifier ... 38

Table 12. Results for models containing 1 process and static code metrics using TAN classifier ... 39

Table 13. Comparison results for models containing static code and 1 process metrics using TAN classifier ... 39

Table 14. Results for models containing combination of 2 process and static code metrics using TAN classifier ... 40

Table 15. Comparison results for models containing combination of 2 process and static code metrics using TAN classifier ... 40

Table 16. Results for models containing combination of 3 process and static code metrics using TAN classifier ... 41

Table 17. Comparison results for models containing combination of 3 process and static code metrics using TAN classifier ... 41

 

   

 

 

   


Abbreviations    

 

SFP  Software Fault Prediction
ML  Machine Learning
BN  Bayesian Network
DAG  Directed Acyclic Graph
NB  Naive Bayes
ANB  Augmented Naive Bayes
TAN  Tree Augmented Naive Bayes
FAN  Forest Augmented Naive Bayes
STAN  Selective Tree Augmented Bayes
STAND  Selective Tree Augmented Bayes with Discarding
SFAN  Selective Forest Augmented Bayes
SFAND  Selective Forest Augmented Bayes with Discarding
ANOVA  Analysis of Variance
WMC  Weighted Methods per Class
DIT  Depth of Inheritance Tree
NOC  Number Of Children
CBO  Coupling Between Object class
RFC  Response For a Class
LCOM  Lack of Cohesion in Methods
LCOM3  Lack of Cohesion in Methods (normalized version of LCOM)
Ca  Afferent Coupling
Ce  Efferent Coupling
LOC  Lines Of Code
DAM  Data Access Metric
MOA  Measure Of Aggregation
CAM  Cohesion Among Methods of class
IC  Inheritance Coupling
CBM  Coupling Between Methods
AMC  Average Method Complexity
CC  McCabe's Cyclomatic Complexity
NR  Number of Revisions
NDC  Number of Distinct Committers
NML  Number of Modified Lines
NDPV  Number of Defects in the Past Version
ROC  Receiver Operating Characteristic
AUC  Area Under the Curve
RQ  Research Question
RQ1  Research Question 1
RQ2  Research Question 2


1.  Introduction  

Introducing the term 'software engineering measurement' and its basic characteristics creates a logical step towards defining another one: software metrics. Software metrics are numbers or attributes, formed by definite rules, used to describe real-world entities. Moreover, certain software quality assurance activities use such metrics; one of them involves constructing quality models based on metrics. In that manner, quality metrics help determine whether the software system delivers the intended functionality [1].

 

A software fault prediction (SFP) model is one type of quality model that has attracted a lot of research in the past few decades [2]. The complexity of a system introduces mistakes in the code, labeled as faults [16], and discovering them as the system grows is not an easy task for developers [3]. The process of detecting faults can be long, and arguably never-ending, since it is hard to claim that a system is 100% fault-free. It also requires additional resources during testing, which increases the cost of software development. One solution is to create a prediction model that guides the development team towards quicker and more efficient fault detection [3]. With the help of metrics, it is possible to define a model responsible for predicting faults. Hall et al. [4] claim that there are many complex models that deal with the problem of fault prediction, but that despite this there is a visible lack of information about the actual state of the area.

 

Song et al. [2] list the three most researched problems in SFP:

o Determining the number of faults that were not identified during testing;

o Finding connections between the remaining faults; and

o Classifying components that are fault-prone.

To solve the above-mentioned problems, much attention in the past has gone to applying machine learning techniques. Machine learning (ML) is about programming machines to obtain optimized results using statistical data or previous experience. ML uses statistical rules to build the mathematical models necessary for drawing a conclusion from a sample [15]. The nature of ML, and of some of its algorithms, is suitable for creating fault prediction models, which has shown good results. ML algorithms use patterns to identify and classify features, e.g., whether a software component contains a fault or not. One of the most widely used ML algorithms is the Bayesian network (BN) [2]. A BN is represented by a directed acyclic graph (DAG) containing a set of variables, a structure defined as the relations between the variables, and a set of probability distributions. For cases such as software products, a BN can depict the relationship between software features and possible faults. The network can then be used to compute the probabilities of different faults being present for specified features. That way, we know which faults can occur and how we can control and isolate them [5].
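As a toy illustration of this idea, the probability that a module is faulty given some discretized metric values can be computed with Bayes' rule under a naive independence assumption, the simplest BN-style classifier. All data below is hypothetical and is not taken from the thesis's datasets:

```python
# Toy naive Bayes fault classifier over discretized metric values.
# Training rows (hypothetical): (loc_level, complexity_level, faulty).
training = [
    ("high", "high", True), ("high", "low", True), ("low", "low", False),
    ("low", "low", False), ("high", "high", True), ("low", "high", False),
]

def prob_faulty(loc, cplx):
    """P(faulty | loc, cplx) via Bayes' rule with a naive independence
    assumption and add-one (Laplace) smoothing to avoid zero counts."""
    def joint(label):
        rows = [r for r in training if r[2] == label]
        prior = len(rows) / len(training)
        p_loc = (sum(r[0] == loc for r in rows) + 1) / (len(rows) + 2)
        p_cplx = (sum(r[1] == cplx for r in rows) + 1) / (len(rows) + 2)
        return prior * p_loc * p_cplx
    f, nf = joint(True), joint(False)
    return f / (f + nf)  # normalize over both classes

print(round(prob_faulty("high", "high"), 2))  # → 0.86
```

A module with both metrics "high" comes out strongly fault-prone on this toy data, which is exactly the kind of ranking an SFP model exploits.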

   

                                                                                                               


1.1 Motivation  

The main arguments for this thesis are found in recent publications [6],[7], where the usage of BNs has been found to perform "surprisingly well" [6].

   

Dejaeger et al. [6] conducted experiments comparing different BN learners using static code metrics. They made this choice of metrics because static code metrics are easy to gather and widely used in practice. They recorded better predictions in cases where a smaller set of highly predictive features was used. For future work, they stated:

 

"Recently, several researchers turned their attention to another topic of interest, i.e., the inclusion of information other than static code features into fault prediction models such as information on intermodule relations and requirement metrics. The relation to the more commonly used static code features remains however unclear. Using, e.g., Bayesian network learners, important insights into these different information sources could be gained which is left as a topic for future research." [6]

 

This conclusion represents the starting point and the main motivation for the definition of this thesis. Dejaeger et al. [6] indicated that one interesting direction for investigating BN learners for SFP is to include non-static code metrics, proposing information on intermodule relations and requirement metrics. These two are just examples of the many non-static code metrics that can be used for SFP.

 

Another paper that deals with BNs for SFP, and provides strong motivation, is by Okutan et al. [7]. In their experiment, they reported the performance of several static code metrics; namely, they stated that metrics carrying information about the number of lines of code (LOC), low quality of coding style, and class response were the most effective. Moreover, they presented further conclusions from their experiment and possible steps for extending their research:

 

"As a future direction, we plan to refine our research to include other software and process metrics in our model to reveal the relationships among them and to determine the most useful ones in defect prediction. We believe that rather than dealing with a large set of software metrics, focusing on the most effective ones will improve the success rate in defect prediction studies." [7]

 

Following the statement in [6], Okutan et al. propose the use of process metrics.

Radjenović et al. [8] conducted a systematic literature review on SFP metrics and collected results supporting the statement in [7] about process metrics and their future use for fault prediction. They came to the following conclusions:

o Process metrics have shown better results than static code metrics in detecting faults at the post-release level;

o Process metrics contain more descriptive information related to fault distribution than static code metrics;

o Unlike object-oriented metrics, which are mostly used with small datasets, process metrics are applicable to larger datasets, an advantage for the validity and maturity of research results [4];

o Process metrics can be very beneficial, but they are mostly used in the industrial domain; it therefore remains a challenge for researchers to use process metrics in their future studies.

 

1.2 Research  questions  

Taking into account the collected statements about SFP, BNs and the use of non-static code metrics, the structure of the thesis is defined by the following research questions:

o RQ1: What is the current state-of-the-art with respect to the use of non-static code metrics for SFP?

o RQ2: What is the impact of process metrics combined with static code metrics, in terms of performance, using BN learners for SFP?

   

In RQ1, we discuss and explain in detail the current state-of-the-art of non-static code metrics, with a focus on process metrics. The research is driven by the conclusions of Radjenović et al. [8]. For RQ2, we conduct experiments in which several process metrics are combined with static code metrics and used with BN learners. The results of this thesis can give new insight into software metrics that can be beneficial for SFP.

 

The content of the thesis is organized as follows: Section 2 contains background information about SFP, software metrics, ML and BNs. Section 3 explains the method for collecting resources for the informal review and the experiment. Section 4 presents the informal review of non-static code metrics. Section 5 contains information about the design of the experiment, the datasets used and the response variable. Section 6 presents the results of the experiment and the statistical analysis. Section 7 provides a discussion comparing the experimental outcomes. Section 8 explains possible validity threats. Finally, Section 9 offers conclusions and lists areas of future work.


2.  Background    

In this section, we describe SFP, the purpose of software metrics, and how ML can be used for fault prediction (with a focus on BN learners).

 

2.1  Software  fault  prediction  

SFP, as a quality model, is recognized as an important tool for improving a software product by identifying possible faults that can occur in the system. Faults directly jeopardize the software, decreasing its performance and, consequently, its quality [16].

 

Software systems are designed to handle complex activities in different domains. Because of their critical nature, every system is supposed to provide quality of service for its users. As the system grows, the number of potential faults increases. Dejaeger et al. [6] found several cases where software faults were examined in terms of reliability and bug localization. To assess reliability, we need a stochastic model that outputs the probability that a fault manifests once a component is executed. Combining the remaining components, we can estimate the reliability of the whole system. Bug localization is based on the use of certain patterns associated with faulty components; using this approach, we can discover faults that were not previously detected [6].
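The system-level reliability estimate described above can be sketched in a few lines, assuming independent components and entirely made-up per-component fault probabilities:

```python
# Hypothetical per-component fault probabilities from a stochastic model.
component_fault_probs = [0.01, 0.05, 0.02]

# Assuming components fail independently, the reliability of a run that
# executes every component is the product of per-component survival rates.
reliability = 1.0
for p in component_fault_probs:
    reliability *= (1.0 - p)

print(round(reliability, 4))  # → 0.9217
```

Even small per-component fault probabilities compound, which is one reason locating the few fault-prone components matters.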

We have seen that it is crucial to classify and identify fault-prone modules of the software in time. Different studies in this area have shown that, in the majority of cases, faults occur in just a few modules, causing malfunctions in other parts of the system, especially those in direct relation with the faulty ones. This means that production and maintenance costs increase due to the effect of faults in these modules [9]. Therefore, much research has focused on making fault predictions for a system under development. Such predictions focus on locating modules that can be shown to be fault-prone [10]. Thus, accurate predictions are useful for increasing the quality of the system [9].

 

To achieve good results, software engineers have to find a suitable prediction technique that helps them detect faults and conduct experimental validation [10]. Since direct measurement is impossible in fault prediction, we need to use metrics for the necessary estimation. Reliable prediction results depend on the selection of software metrics: one needs to select appropriate metrics, because there exist datasets that can make prediction harder and might lead to unsatisfactory and misleading results.

 

Once the metrics are chosen, it is necessary to decide which technique will be used for making predictions. ML is a non-parametric³ technique that is frequently used for SFP. Datasets can sometimes be heavily skewed, which results in inaccurate predictions. ML techniques have a mechanism to overcome such problems, because they have the ability to "learn from imbalanced datasets" [11].

In Section 2.2, we describe software quality metrics and ML as key elements in the SFP process.

³ Distribution-free.


2.2  Software  metrics  

Software quality metrics deserve particular attention in SFP since they can be used to measure the quality of the system. They are further divided into in-process and end-process metrics. The first group is responsible for improving the development process, while end-process metrics focus on assessing the characteristics of the final product.

 

Based   on   measuring   specific   parts   of   the   system,   there   can   be   two   types   of   software   quality  metrics:  

o Static metrics;

o Dynamic metrics.

 

Static code metrics are suitable for checking attributes of the code, such as the complexity of the software and the length of the code. Dynamic code metrics, to a great extent, examine the behavior of the system: usability, reliability, maintainability and the evaluation of the program's efficiency [6].

 

In Sections 2.2.1 and 2.2.2, we briefly introduce the metrics relevant for the remainder of this thesis, namely static code metrics and process metrics.

 

2.2.1  Static  code  metrics  

Static code metrics are a type of quality metrics. Their usefulness is noticeable when it comes to measuring:

o Size (through lines of code (LOC) counts);

o Complexity (using linearly independent path counts);

o Readability (through counts of operators and distinct operands).

 

The calculation of static code metrics is based on parsing the source code; therefore, the metric collection process can be automated. Because of this, it is feasible to measure metrics for the whole system, regardless of its size. Moreover, it is possible to make predictions about the entire system based on the metrics: developers can more easily find faulty modules since they have a clear image of the system's vulnerabilities [12]. Static code metrics are easy to collect and widely used in practice; therefore, they represent a safe choice for predicting faulty software [6].
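As a small illustration of why this collection automates so well (the snippet and the metric definitions here are hypothetical, not the tooling used later in the thesis), once a parser has produced a syntax tree, metrics such as LOC and a rough branch-based complexity count fall out almost for free:

```python
import ast

# A hypothetical code snippet to measure.
source = """
def classify(x):
    if x > 10:
        return "high"
    for _ in range(3):
        x += 1
    return "low"
"""

tree = ast.parse(source)
# LOC: non-blank lines; complexity: 1 + number of branching constructs,
# a rough stand-in for a linearly independent path count.
loc = sum(1 for line in source.splitlines() if line.strip())
branches = sum(isinstance(n, (ast.If, ast.For, ast.While)) for n in ast.walk(tree))
complexity = 1 + branches
print(loc, complexity)  # → 6 3
```

Running the same walk over every file gives whole-system metrics with no manual effort, which is the property the paragraph above relies on.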

 

2.2.2  Process  metrics  

Process  metrics  are  also  used  for  measuring  the  quality  of  the  system  [8].  They  can  be   specified  from  various  sources:  

o Developer’s  experience  [10];  

o Software  change  history  [8],  etc.  

Developer's experience concentrates on activities that capture how a certain part of the code (or the whole system) was developed [10].

Metrics determined through the software change history are split into two groups:

o Delta metrics;

o Code churn metrics.

Delta metrics are defined as the difference between versions of the software [8]: the result is the change in a metric's value from one version to another. An illustrative example is adding new lines of code; when we save those changes, the delta value differs. However, when we add and at the same time remove the same number of lines of code, the delta value between the two versions stays the same, hiding the activity. To be able to track such changes, code churn metrics report every activity on the code.
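The distinction can be sketched with two made-up file versions, where one line is replaced, so the size delta is zero while churn still records the edits:

```python
import difflib

# Two hypothetical versions of a file: one line was replaced.
v1 = ["a = 1", "b = 2", "c = 3"]
v2 = ["a = 1", "b = 9", "c = 3"]

added = removed = 0
for line in difflib.ndiff(v1, v2):
    if line.startswith("+ "):
        added += 1
    elif line.startswith("- "):
        removed += 1

delta_loc = len(v2) - len(v1)  # delta metric: sees no change in size
churn = added + removed        # churn metric: records both edits
print(delta_loc, churn)  # → 0 2
```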

An advantage of process metrics is that they contain more descriptive information about the faulty parts of the code. They are also good at making fault predictions at the post-release level. Since process metrics are mostly used for industrial purposes, they are able to handle large datasets, which leads to better validity of the predictions [8].

 

2.3  Machine  learning  

ML is about programming machines to obtain optimized results using statistical data or previous experience. ML uses statistical rules to build the mathematical models necessary for drawing a conclusion from a sample [15]. Those models can predict future steps, form a description based on knowledge from different data, or combine both in particular cases [15]. The first step in building a model is to train it on data using certain algorithms in order to optimize for the specified problem. The learned model has to be efficient in terms of time and space complexity.

 

ML has several applications, including:

o Learning associations;

o Supervised learning;

o Unsupervised learning;

o Regression;

o Reinforcement learning.

We will use real-life examples to explain these types of ML.

Learning associations are suitable for "learning a conditional probability", presented in equation 1:

  P(Y | X)   (1)

where Y is a variable that is conditioned on X. Moreover, X can be a single variable or a set of variables of the same type as Y. Taking the example of a bookstore, Y can be a book whose purchase we condition on X. Based on customers' behavior, we know that when someone buys book X, there is a high probability that Y will be bought as well.
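The bookstore example can be made numerical with hypothetical purchase records: estimating the conditional probability of Y given X is just counting how often Y appears among the baskets that contain X.

```python
# Hypothetical purchase baskets; "X" and "Y" stand for two books.
purchases = [{"X", "Y"}, {"X", "Y"}, {"X"}, {"Y"}, {"X", "Y"}]

# Estimate P(Y | X): among baskets containing X, how often Y appears too.
with_x = [basket for basket in purchases if "X" in basket]
p_y_given_x = sum("Y" in basket for basket in with_x) / len(with_x)
print(p_y_given_x)  # → 0.75
```

A store could use such an estimate to recommend Y to anyone who just bought X.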

 

In supervised learning, input features are related to corresponding outputs, and the machine has to learn the rules mapping between the two. Supervised learning is applicable to prediction tasks, where it is necessary to identify connections between different measures. Unsupervised learning, on the other hand, does not require output data; instead, it examines the structure within an input dataset. Unsupervised learning is not suitable for predicting the existence of software faults in the system, because of its nature of producing results without previously specified output data.

 

Classification and regression are cases of supervised learning problems. In classification, we make predictions based on a rule learned from past data. Assuming that behavior is similar in the past and the future, we can produce predictions for every future case. The input contains the data we need to analyze, whereas the output is represented as classes that have a descriptive value.

Regression has the same approach to selecting the input data as the classification problem, but as output we get a numerical value. In classification and regression problems we create a model, shown in equation 2:

 

  𝑦 = 𝑔(𝑥|𝜃)   (2)  

 

where y is the class, in classification, or the numerical value, in regression. The model is represented as g(·) and the model’s parameters as 𝜃. The task of ML is to optimize the values of 𝜃 in order to minimize the approximation error.
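As a small illustration of fitting such a model, the sketch below finds the 𝜃 that minimizes the squared approximation error for a linear g(x|𝜃). The synthetic data, the linear form of g, and all parameter values are assumptions made for the example:

```python
import numpy as np

# Synthetic data: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

# Model g(x | theta) = theta[0] * x + theta[1]; least squares finds the
# theta that minimizes the squared approximation error.
A = np.column_stack([x, np.ones_like(x)])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(theta)  # close to the true parameters [2.0, 1.0]
```

The optimizer recovers parameters close to the true slope and intercept, which is exactly the "minimized approximation error" criterion stated above, specialized to a linear model.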

Reinforcement Learning takes a set of actions as input. In such a system, it is important that all actions are part of a “good policy” [15], which will lead to obtaining correct results. In this particular problem, we are not observing a single action; the task of the program is to learn which characteristics form the good policy, based on a past set of actions [15].

 

Finally, ML has mechanisms to overcome the problems of datasets that are heavily skewed and can cause inaccurate results [11]. This basically means that it is possible to create predictions even if some data are missing, there is a large number of variables, etc. [17].

The BN belongs to the supervised learning paradigm and is one type of algorithm suitable for SFP (see Section 2.3.1).

 

2.3.1  Bayesian  network  

The structure of a BN is presented as a directed acyclic graph (DAG) consisting of variables, a structure defined as the relations between variables, and a set of probability distributions [5]. Considering the nature of SFP, a BN can be used for calculating probabilities regarding the presence of faults in software features.

 

The BN, graphically presented as a graph, contains the following elements:

o Variables, presented as vertices or nodes;

o Conditional dependencies, presented as edges or arcs.

This graph cannot contain cycles and all of its edges have to be directed. BNs provide a theoretical framework that, combined with statistical data, can give good fault predictions [5]. A model for SFP can be defined as (see equation 3):

 

  $D = \{(x_n, y_n)\}_{n=1}^{N}$   (3)


where D is a set of N observations, $x_n \in \mathbb{R}^d$ represents all static and non-static code features, and $y_n \in \{0, 1\}$ indicates the presence of a fault.

Using Bayes’ theorem to calculate the probability of fault presence with the BN, we obtain equation 4:

 

  $P(y_n = 1 \mid x_n) = \dfrac{P(x_n \mid y_n = 1)\, P(y_n = 1)}{P(x_n)}$   (4)
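Plugging numbers into equation 4 shows how the posterior fault probability is obtained. All probability values below are invented for illustration, not taken from the thesis datasets:

```python
# Illustrative probabilities (assumed values, not from the thesis datasets):
p_fault = 0.1                 # P(y_n = 1): prior probability of a faulty module
p_x_given_fault = 0.7         # P(x_n | y_n = 1): likelihood of the observed features
p_x_given_clean = 0.2         # P(x_n | y_n = 0)

# P(x_n) by the law of total probability.
p_x = p_x_given_fault * p_fault + p_x_given_clean * (1 - p_fault)

# Bayes' theorem, as in equation 4.
p_fault_given_x = p_x_given_fault * p_fault / p_x
print(round(p_fault_given_x, 3))  # 0.28
```

Even with a low prior fault rate of 10%, observing features that are much more likely under the faulty class raises the posterior to 28%.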

 

The concept of the BN is based on probability distribution using stochastic variables that can be both continuous and discrete. Individual variables $x^{(i)}$ are used to construct the graph, and dependencies between those variables are presented with directed arcs. In the same way, independence between different nodes $x^{(i)}$ and $x^{(i')}$ is shown by the absence of an arc [6].

     

Defining  and  building  the  BN  consists  of  three  steps  [14]:  

o “Set”  and  “field”  variables  have  to  be  defined;  

o Network  topology  has  to  be  constructed;  

o Probability  distribution  on  the  local  level  has  to  be  identified.  

 

Set  and  field  variables  have  to  be  defined  

The “set” variable inspects possible factors that cause software faults, while the “field” variable maps the range of all variables to the degree of software faults. The value of the “set” variable can vary depending on the system, organization or environment where it will be used. The “field” variable can be stated as “high”, “middle” or “low” (or more precisely if required).

 

Network  topology  has  to  be  constructed  

The network topology is defined using event relations. The construction of a certain topology depends on studies or literature, and it can be changed if experiments demand it.

 

Probability  distribution  on  the  local  level  has  to  be  identified  

In   order   to   identify   probability   distribution   on   the   local   level,   it   is   required   to   find   marginal   and   conditional   probabilities.   Probability   distribution   can   be   used   to   emphasize  “affect  degree  of  causality”  [14].  

 

BN classifiers are used for problems that require classification. Other attractive features of these classifiers are presented in the following list:

o Models can be created even if the knowledge is uncertain;

o The probabilistic model can be used for cost-sensitive problems;

o The nature of BN classifiers can deal with issues related to missing data;

o Classifiers can solve complex classification problems;

o Future work on BNs includes presenting models in a hierarchical manner based on their complexity;

o There is a possibility of using BN classifiers with algorithms that have linear time complexity.


 

In the following sections, we will present the two types of BN learners (classifiers) that will be used for the experiment.

 

2.3.1.1  The  Naive  Bayes  Classifier    

The Naive Bayes (NB) classifier is based on “conditional independence between attributes given the class label” [6]; this is represented with nodes in a directed acyclic graph, where one node is the parent and the rest are children. Each node corresponds to a certain value in the dataset. Results from different studies have shown that NB classifiers give good results in fault prediction.

 

In order to calculate the fault probability for the classes, each of them will have a vector of input variables for every new code segment. The resulting probabilities are obtained using “frequency counts for the discrete variables and a normal or kernel density-based method for continuous variables” [6]. Because of their simplified nature, NB classifiers can be constructed very easily and are computationally efficient. The structure of the NB is shown in Figure 1 [6].

 

  Figure  1.  Example  of  NB  [6]  

As shown in Figure 1, the NB consists of one unobserved parent variable (node y) and a number of observed children variables (the x nodes).
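The frequency-count scheme mentioned above can be sketched for purely discrete variables as follows. The toy training data and the add-one smoothing are assumptions made for this example, not part of the cited classifier description:

```python
from collections import Counter, defaultdict

# Toy training data: each row is (discrete feature vector, faulty label).
train = [
    (("high", "yes"), 1),
    (("high", "no"), 1),
    (("low", "no"), 0),
    (("low", "yes"), 0),
    (("high", "yes"), 1),
]

# Frequency counts, as the text describes for discrete variables.
class_counts = Counter(label for _, label in train)
feature_counts = defaultdict(Counter)   # (feature index, label) -> value counts
for features, label in train:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def predict(features):
    """Return the label maximizing P(label) * prod P(feature | label),
    with add-one smoothing for unseen feature values."""
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / len(train)
        for i, value in enumerate(features):
            counts = feature_counts[(i, label)]
            score *= (counts[value] + 1) / (count + len(counts) + 1)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict(("high", "yes")))  # → 1 (predicted faulty)
```

Each child variable contributes one conditional factor, multiplied under the conditional independence assumption that defines the NB.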

 

2.3.1.2  Augmented  Naive  Bayes  Classifier    

Augmented Naive Bayes (ANB) classifiers were created as a modification of the basic conditional independence assumption of the NB. Changes are made in the graph by adding new arcs and removing unnecessary variables. One example of an ANB classifier is the Tree Augmented Naive Bayes (TAN), where each variable has one more parent. On the other hand, the Semi-Naive Bayesian classifier “partitions the variables into pairwise disjoint groups” [6]. Finally, the Selective Naive Bayes omits some variables in order to address correlation between attributes.

 

There  are  several  ANB  classifiers:  

o Tree  Augmented  Naive  Bayes  (TAN);  

o Forest  Augmented  Naive  Bayes  (FAN);  

o Selective  Tree  Augmented  Naive  Bayes  (STAN);  

o Selective  Tree  Augmented  Naive  Bayes    with  Discarding  (STAND);  

o Selective  Forest  Augmented  Naive  Bayes  (SFAN);  

o Selective  Forest  Augmented  Naive  Bayes    with  Discarding  (SFAND).  

 

A graphical representation of the STAN is shown in Figure 2.

  Figure  2.  Example  of  STAN  [6]  

In Figure 2, it is shown that each child (x) node is allowed to have one additional parent besides the class node. We will use the NB to balance the BN classifiers and show a clear image of all dependencies between attributes [6].

     


3.  Method    

This thesis consists of two parts, corresponding to the two RQs. We will answer RQ1 by conducting an informal review of the field, and RQ2 by conducting an experiment.

3.1  Methodology  for  RQ1  

In  order  to  answer  RQ1,  we  have  to  collect  material  related  to:    

o SFP;  

o Metrics;  

o ML;

o BN.

 

Our goal is not to conduct a systematic literature review on the topics of SFP, software metrics or ML and its algorithms, since this has already been done in several studies. Instead, we will use the collected sources to present the state-of-the-art for non-static code metrics in the process of SFP.

 

Furthermore,   in   this   section,   we   will   describe   the   procedure   that   was   followed   in   the   process  of  selecting  papers  relevant  for  the  thesis  topic:  

 

Step  1:  Define  keywords  relevant  for  the  RQ1:  

o SFP;  

o Software  defect  prediction;  

o Code  metrics;  

o ML;  

o BN.  

 

Step  2:  Define  filters  in  terms  of  publication  years  and  type  of  the  papers  

Selected papers were published between 2010 and 2015, and each of them had to be a journal and/or conference paper. The 5-year range was set in order to review only recently published results, which also include some recent literature reviews on the topic.

 

Step  3:  Use  keywords  and  filters  in  several  databases:  

o IEEExplore4;  

o ACM  Digital  Library5;  

o Scopus6 (used for validation of papers that had been found in the first two databases).

Step  4:  Collect  and  select  results  

To get suitable results from the databases, the search string often contained a combination of terms using Boolean AND and OR operators, such as:

(TITLE-ABS-KEY AND TITLE-ABS-KEY) AND DOCTYPE(ar OR re) AND (LIMIT-TO(SUBJAREA) AND RECENT())7.

                                                                                                               

4  http://ieeexplore.ieee.org/Xplore/home.jsp   5  http://dl.acm.org/  

6  http://www.scopus.com/  


Papers with SFP and process metrics keywords, found in the Scopus database, were used for snowball sampling and for collecting other papers relevant to this topic within the earlier mentioned 5-year range.

During the selection process, papers that were irrelevant but had been displayed in the search results were rejected, and other new papers were taken into consideration. The majority of papers were found in the IEEExplore database.

 

Step  5:  Present  the  informal  review.  

Present  all  relevant  information  regarding  non-­‐static  code  metrics.    

In Figure 3, we illustrate the activities that form the methodology for answering RQ1.

  Figure  3.  Methodology  for  the  RQ1  

       


3.2  Methodology  for  RQ2  

We  identified  following  activities,  suggested  in  [15],  for  finalizing  RQ2:  

o Find suitable projects with datasets containing extracted static and process metrics;

o Decide upon a response variable;

o Choose the design of the experiment;

o Conduct the experiment for the projects, using previously selected BN classifiers;

o Compare results using a statistical analysis;

o Publish conclusions about the results of the statistical analysis.

 

Step   1:   Find   suitable   projects   with   datasets   containing   extracted   static   and   process   metrics  

We will analyze datasets available in the Metric Repository8, collected from different open-source projects. The datasets were used in the study of Madeyski et al. [18].

Step  2:  Decide  upon  a  response  variable  

As the quality measure, we will use the area under the ROC curve (AUC), proposed in [6]. The AUC summarizes a classifier’s performance as an average over all classification thresholds.
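As a minimal sketch of this measure, the AUC can be computed directly as the probability that a randomly chosen faulty module receives a higher score than a randomly chosen clean one. The labels and scores below are invented for illustration:

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive (faulty)
    example receives a higher score than a randomly chosen negative
    (clean) one; ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    pairs = len(pos) * len(neg)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / pairs

labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.6, 0.8]
print(auc(labels, scores))  # 5.5 of 6 pairs are ranked correctly
```

This pairwise definition is equivalent to integrating the ROC curve, which is why the AUC is threshold-independent.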

 

Step  3:  Choose  the  design  of  the  experiment  

Our experimental design is based on the principle of replication [15], which means that the experiment will be run a defined number of times in order to find an average value of the specific quality measure. More specifically, we will use cross-validation.
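The cross-validation splitting step can be sketched as follows; this is a generic k-fold index partition, not the exact procedure Weka applies internally:

```python
def k_fold_indices(n_samples, k):
    """Split sample indices into k roughly equal folds; each fold serves
    once as the validation set while the remaining indices form the
    training set."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:end])
        start = end
    return [(sorted(set(indices) - set(f)), f) for f in folds]

# 10 samples, 5 folds: each split yields 8 training and 2 validation indices.
for train_idx, val_idx in k_fold_indices(10, 5):
    print(len(train_idx), len(val_idx))
```

Running the learner once per fold and averaging the per-fold AUC values gives the replicated measurement described above.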

Step 4: Conduct the experiment for the projects, using previously selected BN classifiers As presented in the Background section, for the purposes of the experiment we will use the NB and TAN classifiers. Using the specified classifiers with the datasets, we will create a learner. By training multiple times and then testing with the validation sets, we will get results for the desired measure [15]. An implementation of the classifiers is available through Weka9, suggested in [6], which will speed up the process of collecting prediction results for the chosen datasets.

Step  5:  Compare  results  using  a  statistical  analysis    

The statistical analysis of the data is conducted in order to provide objective and unbiased results regarding the comparison of the different datasets. Since we are operating with 17 datasets and 2 classifiers, we will decide between a one-way analysis of variance (ANOVA) and the Kruskal-Wallis test once we have collected the results from the experiment.
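Should the Kruskal-Wallis test be chosen, its H statistic can be sketched as below, ignoring the tie correction; the AUC values are invented for illustration:

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction): rank all observations
    jointly, then compare the rank sums of the groups."""
    pooled = sorted((value, gi) for gi, group in enumerate(groups)
                    for value in group)
    rank_sums = [0.0] * len(groups)
    for rank, (_, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank
    n = len(pooled)
    return 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)) - 3 * (n + 1)

# Hypothetical AUC values from two classifiers (distinct values, so no ties).
nb_auc = [0.71, 0.74, 0.78, 0.80]
tan_auc = [0.62, 0.65, 0.69, 0.72]
print(round(kruskal_wallis_h(nb_auc, tan_auc), 3))
```

Because the test operates on ranks rather than raw values, it does not assume normally distributed samples, which is what makes it non-parametric and suitable when ANOVA's assumptions are doubtful.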

Step  6:  Publish  conclusions  about  results  of  the  statistical  analysis  

The statistical analysis will answer whether the samples of the datasets are significantly different or not. In case there are differences, we will discuss which datasets are superior to others; otherwise, we will give an explanation and possible improvements for future experiments.

                                                                                                               

8  http://purl.org/MarianJureczko/MetricsRepo  


 

In Figure 4, we illustrate the activities that form the methodology for answering RQ2.

  Figure  4.  Methodology  for  the  RQ2  

 

 We  will  describe  the  experiment  with  detailed  requirements  in  Section  5:  Design.    


4.  Informal  review  

We collected recent papers dealing with SFP using non-static code metrics, in order to answer RQ1. The best sources of information were a systematic literature review and studies whose objectives were to investigate the performance of process metrics for various open-source and industrial projects and software systems. We classified the relevant findings into several groups based on their occurrence in the papers. The groups are the following:

o Process  metrics  in  general;  

o Code  churn  metrics;  

o Developer  metrics;  

o Other  process  metrics.  

 

4.1  Process  metrics  

In their systematic literature review, Radjenović et al. [8] analyzed 106 papers, concluding that process metrics represent only 24% of the metrics used for fault prediction purposes. Moreover, they identified that process metrics, such as the number of different developers that worked on the same file, the number of changes made to a file, the age of a module, etc., are better for fault detection in the post-release phase of software development than some static code metrics. These metrics are mainly produced by extracting the source code and its history from the repository. Results gained from different studies have pointed out several advantages of process metrics over static code metrics:

o Process metrics provide a better description regarding the distribution of faults in the software;

o Used for Java-based applications, process metrics can provide better models in terms of cost-effectiveness;

o Process metrics performed best in predicting faulty classes using the ROC area.

However, some studies have shown that process metrics did not perform well when they were used in a pre-release phase of software development. Results of the conducted experiments have confirmed that process metrics perform better and have to be used in the post-release phase, which explains the previously mentioned issues in prediction [8]. In [4], a combination of code, process and static metrics is suggested for better prediction.

Xia et al. [20] analyzed the performance of code and process metrics for TT&C software. They emphasize the benefits of process metrics during different development phases, such as requirements analysis, design and coding. Sixteen process metrics, each suitable for a specific phase, have been listed. In the early stages of software development, the analysis of requirements and their maturity are metrics related to the detection of faulty software modules. Therefore, it is important to eliminate errors that occur in the requirements phase, as they can be transferred to other stages of development, i.e. the design and coding phases. Another important feature of process metrics is tracking historical

