
Information and Software Technology 112 (2019) 48–50

Contents lists available at ScienceDirect

Information and Software Technology

journal homepage: www.elsevier.com/locate/infsof

A critical appraisal tool for systematic literature reviews in software engineering

Nauman bin Ali, Muhammad Usman

Blekinge Institute of Technology, Karlskrona, Sweden

Keywords: Systematic literature reviews, Quality assessment, Software engineering, Critical appraisal tools, AMSTAR

Abstract

Context: Methodological research on systematic literature reviews (SLRs) in Software Engineering (SE) has so far focused on developing and evaluating guidelines for conducting systematic reviews. However, the support for quality assessment of completed SLRs has not received the same level of attention.

Objective: To raise awareness of the need for a critical appraisal tool (CAT) for assessing the quality of SLRs in SE. To initiate a community-based effort towards the development of such a tool.

Method: We reviewed the literature on the quality assessment of SLRs to identify the frequently used CATs in SE and other fields.

Results: We identified that the CATs currently used in SE were borrowed from medicine, but have not kept pace with substantial advancements in the field of medicine.

Conclusion: In this paper, we have argued the need for a CAT for quality appraisal of SLRs in SE. We have also identified a tool that has the potential for application in SE. Furthermore, we have presented our approach for adapting this state-of-the-art CAT for assessing SLRs in SE.

1. Introduction

Inspired by medicine, evidence-based software engineering (EBSE) promotes the use of systematic literature reviews (SLRs) to systematically identify, evaluate and synthesize research on a topic of interest [1]. Since the introduction of SLRs in Software Engineering (SE), the rate of papers reporting SLRs in SE¹ has been continually increasing (see Fig. 1).

However, several recent in-depth evaluations of published SLRs have identified serious flaws regarding their quality. For example, issues related to: (a) the reporting quality of procedures and outcomes [2], (b) the reliability of search [3], and (c) lack of synthesis or the use of inappropriate synthesis methods [4] in SLRs. Such issues raise questions about the credibility of SLRs.

Most researchers in SE have used the four questions adopted by Kitchenham et al. [5] (items a to d in Table 1) for quality assessment of SLRs. These questions are insufficient to reveal important limitations in an SLR, as demonstrated by the above-listed studies [2–4].

Fig. 2 helps to understand the role of guidelines and distinguish the purpose of critical appraisal tools (CAT). The guidelines for planning and conducting an SLR enable a research team to plan and execute a review that follows a rigorous process [1,5]. Similarly, the reporting guidelines help the researchers to communicate the design and execution of an SLR to the readers [1]. More recently, there are new reporting guidelines that are intended to improve the usefulness of the results of an SLR for education and practice [6].

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, including for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/.

Corresponding author. E-mail addresses: nauman.ali@bth.se (N.b. Ali), muu@bth.se (M. Usman).

¹ Search string used in Scopus to identify SLRs published in computing: TITLE-ABS-KEY("systematic review" OR "systematic literature review") AND PUBYEAR < 2019 AND (LIMIT-TO(SUBJAREA, "COMP")).

On the other hand, the role of critical appraisal tools is to enable a reader to analytically assess the credibility of a completed SLR. Such an assessment considers both the reporting quality, e.g., "Are the review's inclusion and exclusion criteria described?", and the risk of bias assessment in the design and execution of the SLR, e.g., "Are the review's inclusion and exclusion criteria appropriate?".

As the number of SLRs is increasing, the need for tools to assess the quality of an SLR without having to replicate the study is becoming more evident. Such a CAT will help to sustain and improve the credibility of SLRs as an effective means for decision-support in SE. It will enable the readers of SLRs to differentiate good quality SLRs from the ones that did not follow a rigorous and comprehensive approach.

In this paper, we seek to raise awareness of the need for a critical appraisal instrument and have introduced a candidate solution for this task. The work presented in this paper has the potential to have a profound impact on SE research, since it is useful for two very common scenarios: (1) to assess the quality of an SLR as a referee/reader, and (2) to synthesize the results of several SLRs on the same topic and to understand the reasons for any differences between their results.

https://doi.org/10.1016/j.infsof.2019.04.006

Received 5 November 2018; Received in revised form 5 April 2019; Accepted 12 April 2019; Available online 15 April 2019

Table 1. AMSTAR-2 and DARE quality criteria used to appraise SLRs.

DARE. Note: Fulfilling items a, b and e, and either c or d, is mandatory for an SLR to be included in the DARE database of SLRs.

a. Were inclusion/exclusion criteria reported?
b. Was the search adequate?
c. Was the quality of the included studies assessed?
d. Are sufficient details about the individual included studies presented?
e. Were the included studies synthesised?

AMSTAR-2. Note: Items marked with an asterisk (∗) are not applicable for the appraisal of SMSs.

1. "Did the research questions and inclusion criteria for the review include the components of PICO?"
2. "Did the report of the review contain an explicit statement that the review methods were established prior to the conduct of the review and did the report justify any significant deviations from the protocol?"
3. "Did the review authors explain their selection of the study designs for inclusion in the review?"
4. "Did the review authors use a comprehensive literature search strategy?"
5. "Did the review authors perform study selection in duplicate?"
6. "Did the review authors perform data extraction in duplicate?"
7. "Did the review authors provide a list of excluded studies and justify the exclusions?"
8. "Did the review authors describe the included studies in adequate detail?"
9. ∗ "Did the review authors use a satisfactory technique for assessing the risk of bias (RoB) in individual studies that were included in the review?"
10. "Did the review authors report on the sources of funding for the studies included in the review?"
11. ∗ "If meta-analysis was performed, did the review authors use appropriate methods for statistical combination of results?"
12. ∗ "If meta-analysis was performed, did the review authors assess the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis?"
13. ∗ "Did the review authors account for RoB in individual studies when interpreting/discussing the results of the review?"
14. ∗ "Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review?"
15. ∗ "If they performed quantitative synthesis did the review authors carry out an adequate investigation of publication bias (small study bias) and discuss its likely impact on the results of the review?"
16. "Did the review authors report any potential sources of conflict of interest, including any funding they received for conducting the review?"

Fig. 1. The increasing number of SLRs in computing since 2004.

Fig. 2. A view of quality on the various stages of an SLR.

The remainder of the paper is structured as follows: Section 2 explains the need for a CAT for SLRs in SE. Section 3 presents a state-of-the-art CAT. In Sections 4 and 5, we briefly propose an approach to customize and validate the tool for SE. Section 6 concludes the paper.

2. Need for a CAT for the quality assessment of SLRs in SE

Since 2004, when the first guidelines for SLRs in SE were introduced, several improvements have been made to the guidelines for conducting and reporting SLRs in SE [1]. However, the appraisal tools for SLRs in SE have not received much attention. Researchers in the SE field have continued to rely on a subset of questions identified by Kitchenham et al. [5] from the field of evidence-based medicine in the year 2004. The commonly-used interpretation of the DARE² criteria in SE does not even consider if there is a synthesis performed in a review. This explains to some extent why some of the limitations in the quality of SLRs, e.g., poor reporting quality [2], lack of an adequate search strategy [3] and the lack of synthesis [4], cannot be sufficiently revealed with the CATs currently used in SE.

² The CRD's Database of Abstracts of Reviews of Effects (DARE): https://www.crd.york.ac.uk/CRDWeb/AboutPage.asp.

In the meantime, realizing the importance of CATs to assess the quality of completed systematic reviews, researchers in other disciplines have further developed these tools. A review of evidence-based medicine literature reveals that one tool that stands out for the degree of validation and application is AMSTAR (A MeaSurement Tool to Assess systematic Reviews) [7]. AMSTAR was developed based on a scoping review of the then available rating instruments. The review identified several overlapping appraisal items, which were combined into 11 AMSTAR appraisal items using factor analysis [7]. After pilot testing, the original AMSTAR was validated externally as well [8]. AMSTAR³ has since then been used and validated extensively [8,9].

3. Candidate CAT for quality assessment of SLRs in SE

Recently, the designers of AMSTAR have proposed a revision of the tool (AMSTAR-2 [8]). The revision is based on community feedback collected through different channels such as published reports of its application, the AMSTAR website,⁴ surveys of AMSTAR users, and the experience of participants in AMSTAR workshops. The team that has revised the tool includes designers of the original instrument and two designers of another instrument, ROBIS (Risk Of Bias In Systematic reviews). ROBIS⁵ is a relatively new instrument and is designed to support reviewers in assessing the risk of bias in completed SLRs.

AMSTAR-2 can be used to appraise SLRs that may include both randomized or non-randomized studies. AMSTAR-2 has a more detailed assessment of the risk of bias in SLRs due to the primary studies included, and how the review authors have dealt with such bias when interpreting review results. AMSTAR-2 consists of 16 items (see Table 1), and each item has detailed response options to guide users to make the appropriate judgement (see the complete AMSTAR-2⁴ for details). The initial evaluation of AMSTAR-2, by having multiple raters use the tool, has shown moderate to good agreement for most items in the tool [8].

³ The AMSTAR paper [7] had 2958 citations on February 13, 2018.
⁴ AMSTAR: https://amstar.ca/.
⁵ ROBIS: https://www.bristol.ac.uk/population-health-sciences/projects/robis/robis-tool/.

Fig. 3. Approach for adapting AMSTAR-2 for SE: (1) Adapt AMSTAR-2 for SE; (2) Obtain community feedback on the first version of CATSER; (3) Revise CATSER by synthesizing community feedback; (4) Elicit more feedback until sufficient consensus is reached; (5) Prepare and distribute CATSER for validation; (6) Revise (if required) CATSER.

AMSTAR-2 has several advantages over the DARE criteria commonly used in SE. DARE is not a CAT per se; it is intended to provide the criteria that SLRs should meet to be included in the CRD's database of SLRs. In SE, only four of the five items of DARE (items a to d in Table 1) have often been used [5]. Apart from item b, the formulation of DARE items only captures the reporting quality in SLRs (cf. [5]), e.g., see item a about reporting of the selection criteria. Furthermore, many of the items in AMSTAR-2 (e.g., items 1, 5, 6, 7, 10, 14, 15, and 16) which capture the quality of an SLR are not covered by the DARE criteria.
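The DARE inclusion rule quoted in Table 1 (items a, b and e are mandatory, plus either c or d) is simple enough to state as a predicate. The following Python sketch is purely illustrative; the function name and the dict-based answer encoding are our own, not part of DARE:

```python
# Illustrative encoding of the DARE inclusion rule from Table 1:
# items a, b and e must be fulfilled, plus at least one of c and d.
def meets_dare_inclusion(answers: dict) -> bool:
    """`answers` maps DARE item letters ('a'..'e') to booleans."""
    mandatory = all(answers[item] for item in ("a", "b", "e"))
    either = answers["c"] or answers["d"]
    return mandatory and either
```

Note how the rule treats c (quality assessment of included studies) and d (details of included studies) as interchangeable, which is one reason the criteria can admit reviews without a quality assessment.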

In this study, we have identified AMSTAR-2 as a candidate CAT that can be adapted for SE. The approach we will use in developing and validating CATSER (a Critical Appraisal Tool for SE systematic Reviews based on AMSTAR-2) is described in the following sections and depicted in Fig. 3.

4. A proposed approach for adapting AMSTAR-2 for SE

We propose to first adapt AMSTAR-2 for SE by reviewing its items and response options for their relevance to SE using the recommendations in the EBSE literature (e.g., [1,10]). In the next phase, we will involve the SE research community for the further evolution of CATSER. We will organize workshops at the prominent SE venues (e.g., the international symposium on empirical software engineering and measurement (ESEM)⁶). Furthermore, a web-based forum will be set up to collect feedback from the wider community.

We have reviewed the relevance of AMSTAR-2 items for SE systematic secondary studies (systematic mapping studies (SMS) [1] and SLRs). Out of the 16 items in AMSTAR-2, we consider 10 items (see Table 1) relevant for the critical appraisal of both SLRs and SMSs. These 10 items cover the fundamental aspects (e.g., protocol development, systematic search, study selection, and data extraction processes) necessary for the reliability of both SLRs and SMSs.

SMSs do not include a thorough synthesis and detailed quality assessment of the included primary studies [1]. Therefore, we consider the remaining six items regarding synthesis and meta-analysis as only relevant for SLRs.
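The applicability split described above (ten items relevant to both SLRs and SMSs, six asterisked synthesis-related items relevant to SLRs only) can be captured directly as data. The item numbers follow the asterisks in Table 1; the function and variable names below are our own illustration:

```python
# AMSTAR-2 item numbers marked with an asterisk in Table 1, i.e. the six
# synthesis/risk-of-bias items considered relevant only for SLRs.
SLR_ONLY_ITEMS = {9, 11, 12, 13, 14, 15}
ALL_ITEMS = set(range(1, 17))  # AMSTAR-2 has 16 items

def applicable_items(study_type: str) -> set:
    """Return the AMSTAR-2 item numbers applicable to a secondary study.
    `study_type` is 'SLR' or 'SMS' (systematic mapping study)."""
    if study_type == "SLR":
        return ALL_ITEMS
    if study_type == "SMS":
        return ALL_ITEMS - SLR_ONLY_ITEMS
    raise ValueError(f"unknown study type: {study_type}")
```

An appraisal workflow built this way would simply skip the non-applicable items for an SMS rather than score them as failures.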

The response options in AMSTAR-2 are formulated for the medical discipline, and these will require adaptation for SE. For this purpose, we will use the latest guidelines for designing, reporting, conducting and validating systematic secondary studies in SE [1–3,10].

⁶ ESEM: http://www.esem-conferences.org/.

5. A proposed approach for validating CATSER

We plan to validate CATSER by using it to appraise a set of SLRs using reviewers beyond those who will be involved in the adaptation of AMSTAR-2 for SE. We will allocate a small sample of randomly selected SLRs to the reviewers. Using the results of individually appraised SLRs with CATSER, we plan to compute the inter-rater reliability of CATSER. Another aspect of the evaluation of CATSER will focus on its usefulness to identify significant flaws in an SLR. In the future, we will compare the assessment of SLRs using CATSER and the commonly-used interpretation of DARE in SE.
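The paper does not fix a particular reliability statistic. For a pair of raters applying a checklist such as CATSER to the same SLRs, Cohen's kappa is one common choice; the following minimal, self-contained sketch assumes each rater produces one categorical judgement per appraised item:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical ratings, e.g. per-item
    judgements such as 'yes', 'partial yes', 'no'."""
    assert rater_a and len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters are constant and identical
    return (p_o - p_e) / (1 - p_e)
```

With more than two raters, a generalization such as Fleiss' kappa would be needed instead.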

The long-term validation of such instruments depends on how widely they are accepted and used by the community. We hope to initiate a community effort in SE to adapt, validate and mature CATSER (which will leverage the strengths of AMSTAR-2).

6. Conclusion

By comparing the state-of-the-art tools in medicine with the frequently used CATs in SE, and based on the recent evaluations of the quality of SLRs, we identified and emphasized the need for further research on CATs for SLRs in SE. We have also identified a candidate CAT and proposed an approach to adapt it for the needs of SE with the involvement of the SE research community.

This approach will not only improve the quality of the tool, but also ensure community buy-in and thus increase the likelihood of adoption of the tool. Given the continued interest in SLRs in SE, we contend that this work has a potentially significant impact on research. It will help to improve and sustain the credibility of SLRs in SE.

Acknowledgment

The authors would like to thank Prof. Claes Wohlin for providing feedback on the paper. This work has been supported by a research grant for the VITS project (reference number 20180127) by the Knowledge Foundation in Sweden and by ELLIIT, a Strategic Area within IT and Mobile Communications, funded by the Swedish Government.

Conflict of Interest

The authors declare no conflict of interest.

References

[1] B.A. Kitchenham, D. Budgen, P. Brereton, Evidence-Based Software Engineering and Systematic Reviews, Chapman & Hall/CRC, 2015.
[2] D. Budgen, P. Brereton, S. Drummond, N. Williams, Reporting systematic reviews: some lessons from a tertiary study, Inf. Softw. Technol. 95 (2018) 62–74.
[3] N.B. Ali, M. Usman, Reliability of search in systematic reviews: towards a quality assessment framework for the automated-search strategy, Inf. Softw. Technol. 99 (2018) 133–147.
[4] D.S. Cruzes, T. Dybå, Research synthesis in software engineering: a tertiary study, Inf. Softw. Technol. 53 (5) (2011) 440–455.
[5] B. Kitchenham, R. Pretorius, D. Budgen, O. Pearl Brereton, M. Turner, M. Niazi, S. Linkman, Systematic literature reviews in software engineering - a tertiary study, Inf. Softw. Technol. 52 (8) (2010) 792–805.
[6] B. Cartaxo, G. Pinto, S. Soares, Towards a model to transfer knowledge from software engineering research to practice, Inf. Softw. Technol. 97 (2018) 80–82.
[7] B.J. Shea, J.M. Grimshaw, G.A. Wells, M. Boers, N. Andersson, C. Hamel, A.C. Porter, P. Tugwell, D. Moher, L.M. Bouter, Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews, BMC Med. Res. Methodol. 7 (1) (2007) 10.
[8] B.J. Shea, B.C. Reeves, G. Wells, M. Thuku, C. Hamel, J. Moran, D. Moher, P. Tugwell, V. Welch, E. Kristjansson, et al., AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both, BMJ 358 (2017) j4008.
[9] B.J. Shea, L.M. Bouter, J. Peterson, M. Boers, N. Andersson, Z. Ortiz, T. Ramsay, A. Bai, V.K. Shukla, J.M. Grimshaw, External validation of a measurement tool to assess systematic reviews (AMSTAR), PLoS One 2 (12) (2007) e1350.
[10] A. Ampatzoglou, S. Bibi, P. Avgeriou, M. Verbeek, A. Chatzigeorgiou, Identifying, categorizing and mitigating threats to validity in software engineering secondary studies, Inf. Softw. Technol. 106 (2019) 201–230.
