Quantifying the need for supervised machine learning in conducting live forensic analysis of emergent configurations (ECO) in IoT environments
Victor R. Kebande a,*, Richard A. Ikuesan b, Nickson M. Karie c, Sadi Alawadi a, Kim-Kwang Raymond Choo d, Arafat Al-Dhaqm e

a Department of Computer Science, Malmö University, Sweden
b Cyber and Network Security Department, Science and Technology Division, Community College of Qatar, Qatar
c School of Science, Edith Cowan University, Australia
d Department of Information Systems and Cyber Security, University of Texas at San Antonio, San Antonio, TX 78249-0631, USA
e School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Johor, Malaysia
ABSTRACT
Machine learning has been shown to be a promising approach for mining large datasets, such as those comprising data from a broad range of Internet of Things devices across complex environments, to solve different problems. This paper surveys existing literature on the potential of using supervised classical machine learning techniques, such as K-Nearest Neighbour, Support Vector Machines, Naive Bayes and Random Forest algorithms, in performing live digital forensics for different IoT configurations. There are also a number of challenges associated with the use of machine learning techniques, as discussed in this paper.
ARTICLE INFO
Keywords: Supervised machine learning; Live forensics; Emergent configurations; IoT

1. Introduction
As Internet of Things (IoT) devices become the norm, so does the need for IoT forensics. The latter is a branch of digital forensics, which involves the investigation of IoT devices as well as the supporting infrastructure. Unlike conventional digital forensics, collecting or acquiring evidence from IoT devices can be challenging due to the diversity of IoT devices and the underpinning operating and file systems.
It is also noted that in an IoT system, especially in the case of emergent configurations (ECOs), data can be dynamic, and it is consequently challenging to label datasets during live forensics. Live forensics in this context refers to a forensic investigation conducted in near real-time. ECOs, as defined by existing studies [1–4], are systems formed by a set of things, with their services, functionalities, and applications, that cooperate temporarily to achieve some user goals. ECOs adapt in response to (unforeseen) contextual changes, such as changes in available things or user goals. Given the heterogeneity and increased connectivity of emerging configurations, performing live forensics on ECO platforms can be challenging, since such systems may comprise one or more dynamic and heterogeneous (IoT) systems, which may also be distributed [5].
In recent times, there have been attempts to utilize machine learning (ML) techniques to facilitate digital forensics, including IoT forensics. However, this inclusion has largely been within the scope of static IoT platforms such as smart homes, where the 'context of things' is largely unchanged. Hence, in this manuscript, the authors survey existing literature on the use of supervised ML techniques (e.g., K-Nearest Neighbour, Support Vector Machines (SVM), Naive Bayes and Random Forest) in conducting live forensics across dynamic and context-changing IoT systems, typical of ECOs. At the time of our study, this was the first study to explore the feasibility of integrating ML into an ECO platform to facilitate digital forensics. Therefore, the contributions of this paper are as follows:
• explore the feasibility of integrating supervised ML techniques to perform live forensic analysis in a dynamic (ECO) IoT platform;
• demonstrate how forensic activities could dynamically be conducted in an ECO environment; and
• provide a contextual evaluation that shows the forensic challenges in an IoT environment and how automation for incident identification may occur.
In Section 2, a review of the related literature and the research gaps from existing studies are presented. Then, in Sections 3 and 4, we present our proposed conceptual framework and how it can be deployed. Discussions and conclusion are presented in the last two sections of this manuscript.
* Corresponding author.
E-mail addresses: victor.kebande@mau.se (V.R. Kebande), richard.ikuesan@ccq.edu.qa (R.A. Ikuesan), n.karie@ecu.edu.au (N.M. Karie), sadi.alawadi@mau.se (S. Alawadi), raymond.choo@fulbrightmail.org (K.-K.R. Choo), mrarafat@utm.my (A. Al-Dhaqm).
http://doi.org/10.1016/j.fsir.2020.100122
Received 21 April 2020; Received in revised form 12 June 2020; Accepted 8 July 2020
Available online 15 July 2020
2665-9107/© 2020 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Forensic Science International: Reports 2 (2020) 100122
2. Related literature

2.1. Existing literature
Machine learning (and deep learning) approaches have gained renewed interest in recent years; example approaches are presented in Table 1.
There have also been attempts to utilize ML and deep learning for digital (forensic) investigations. For example, a generic framework that allows the application of deep learning cognitive computing techniques in cyber forensics (CF) was presented in [16]. However, this framework is not designed to facilitate live forensics in IoT environments.
Another study [17] explored the effectiveness of employing machine learning methodologies for computer forensic analysis by tracing past file system activities and preparing a timeline to facilitate the identification of incriminating evidence. Their approach is, however, not designed to facilitate IoT forensics. Costantini et al. [18] explored the applicability of artificial intelligence (AI) along with computational logic tools to automate evidence analysis, while Mitchell [19] discussed the potential usefulness of AI in digital forensics.
However, we observe that the potential role of supervised ML techniques in live forensics on ECOs across IoT ecosystems is not well understood or fully explored in the literature. The concept of ECO is not new, and has been extensively studied [1–4]. Generally, ECOs are formed by a set of things, with their services, functionalities, and applications, which cooperate on an ad-hoc basis to achieve some user goal [2,4]. ECOs adapt in response to (unforeseen) context changes, such as changes in the resources available or changing/evolving user goals. Given the heterogeneity and increased connectivity of emerging configurations, it can be challenging to identify malicious activities in ECOs.
The connection between IoT and ECOs can be broadly explained by the widespread adoption of IoT in different sectors (e.g., smart health, smart transport, smart cities, automation, agriculture, and manufacturing). IoT is also regarded as a disruptive technology, including by the US National Intelligence Council [20,21]. Therefore, in the context of IoT ECOs, we need to consider emergent behavior, connectivity (exchange of information), localization and tracking, how distributed components are, ubiquity and device heterogeneity. The implications for forensic investigation are the existence of interaction, coordination and interoperability, which mainly encompass events, context, environment and actions [22].
2.2. Research gaps
Based on a review of the existing literature, we identify the following research gaps.
• The shift from conventional digital forensics to cloud forensics, network forensics, device-level forensics and live forensics across the IoT ecosystems has compounded the challenges in performing digital investigations, for example in terms of data size and the rapidly changing technological landscape [23–26]. Hence, there is a need to ensure that digital forensic capabilities keep pace with emerging technologies [27], as well as to design AI-based approaches to facilitate digital forensics and real-time incident detection and incident response for ECOs [28,29]. This necessitates an understanding of the composition of ECOs, for example in terms of process and architecture [30].
• Conventional labeled datasets and extracted features may not necessarily be useful to facilitate live forensics across emerging IoT configurations, due to the dynamic nature of the system interactions and threat landscape [31,32].
3. Proposed framework for adopting supervised machine learning approaches
We will now present our proposed conceptual framework, as shown in Fig. 1. The three key building blocks are discussed next.

3.1. Emerging IoT configurations
ECOs can be broadly defined as a dynamic collection of 'things' with functionalities seeking to achieve a given goal [1], and a concrete implementation scenario will be presented in Section 4. ECOs are designed to achieve their goals over heterogeneous environments and to facilitate real-time interactions of scenarios through successful executions while also ensuring interoperability. In IoT settings, such interactions normally require a number of actions to be executed, which implies a massive amount of data that can be exploited by cyber attackers (e.g., as in the proverbial 'needle in a haystack'). Hence, an in-depth understanding of the configurations and the potential data types and sources will significantly reduce the amount of time required in forensic investigations.

Table 1
Snapshot of existing ML approaches in security incidents.

| Reference | Objective | Machine learning approach | Algorithm used | Application |
|---|---|---|---|---|
| [6] | Bot detection using unsupervised learning | Unsupervised machine learning | Flow clustering and simple K-means clustering | Based on the flows generated by bots, using the destination port number, largest packet size, smallest packet size, and the time the packet is flagged. |
| [7] | Digital forensic text string searching | Unsupervised machine learning | Clustering digital forensic text strings | Uses self-organizing maps to test the feasibility and utility of post-retrieval clustering of digital forensic text string search results. |
| [8] | Classification model for anomaly-based intrusion detection | Supervised machine learning | Naïve Bayes classification, K-nearest neighbor | Used the NSL-KDD dataset to detect User to Root (U2R) and Remote to Local (R2L) attacks. |
| [9] | Forensics data task for multi-class classification | Supervised and neural networks | Decision trees, Bayes classifiers, ANN and nearest neighbor | Classifiers were evaluated based on performance measures and Cohen's kappa. A statistical analysis was conducted to assess each of the algorithms based on accuracy. |
| [10] | Digital forensic readiness | Supervised learning approach | Bayes, Neural Net, SVM, C4.5, HMM, Nearest Neighbor, Logistic Model Tree | Implemented a C4.5 decision tree on a keystroke dataset for live user identification. |
| [11] | User identification | Supervised learning approach | Rule-based machine learning, decision tree classifier | Used labeled data to perform user identification. |
| [12] | Network forensic analysis | Feature engineering at the analysis layer | Analysed the KDD Cup 99 dataset by applying a reputation value in the data analysis method | The author used the KDD Cup '99 collection of 9-week TCP dump datasets, which showed the real-time performance of the network based on the reputation value. |
| [13] | Passive audio bootleg detector | Deep learning and supervised | Deep learning, Deep Belief Network (DBN), SVM classification | Implemented a three-class SVM and applied feature learning to detect whether a music audio track relates to an unauthorized recording. |
| [14] | Intelligent self-learning system for home automation in IoT | Guided learning classification | Naive Bayes algorithm | Automatic fault detection in connected devices. |
| [15] | SVM-based malware detection for IoT services | Guided learning classification | | |
3.2. NIST digital forensic process
While there are a number of existing digital forensic processes, we use NIST Special Publication 800-86 as the guiding process due to its widespread adoption and because it allows the integration of forensic techniques into incident response. Similar to other digital forensic domains, IoT forensics may cross jurisdictions and hence involve different laws and requirements, for example in terms of evidence collection and admissibility. As IoT systems may be deployed in critical infrastructure sectors, where taking them offline for forensic investigations is impractical, we also posit the importance of live forensic readiness. The role of each of these processes is outlined below in the context of IoT.
Collection: Timely identification of potential evidence sources in (interconnected) IoT ecosystems is crucial, particularly to live forensics. However, it can be challenging to do so manually due to the dynamic nature of data interactions in IoT systems. Hence, we could explore using ML techniques, such as classification algorithms (e.g., Naive Bayes Classifier, Nearest Neighbor, and Support Vector Machines), to automate the collection process. Care should, however, be taken to ensure that one strikes a balance between false negatives and false positives.
Examination: This process may include pre-processing of digital data collected from emerging configuration devices/applications, the selection of suitable tools (e.g., the encryption algorithm and hashing algorithm to be used), and the selection of appropriate techniques (e.g., logistic regression) to statistically analyze data collected in the previous process and identify any information useful to the investigation, such as existing relationships between objects of interest as well as variables.
Analysis: Successful completion of examination will help us to make an informed decision on the tools and approaches to be adopted. For example, should we use K-Nearest Neighbors or Decision Trees? Using the geometric distance, it may be possible to use k-nearest neighbors to decide which is the nearest object in the ecosystem. On the other hand, decision trees may be used to break down any collected dataset into smaller subsets while at the same time incrementally developing an associated decision tree. Then, live forensics and/or in-depth analysis of the data will be undertaken.
Reporting: Findings from the analysis process will then be included in the report, which should also include the tools, techniques and approaches used, their rationale and the limitations (if any). For example, when using classification algorithms such as neural networks, what are the limitations? Will any data be missed during live forensics due to the use of such classification algorithms?
3.3. Supervised machine learning approaches
One of the benefits of using supervised ML approaches in live forensics is the potential for such techniques to give a prediction on possible events based on past occurrences. We will now discuss a few potential supervised ML algorithms that can be used in this context: Support Vector Machines (SVM), k-Nearest Neighbors (kNN), Naive Bayes and Random Forests.
kNN: kNN can facilitate the identification of existing relationships based on the forensically acquired digital data. Specifically, as a non-parametric learning technique, it can be used to classify samples from a dataset on the principle of similarity. Generally, kNN's output primarily depends on the instances that are stored in memory. Also, a majority of the k nearest neighbors are tasked with giving a decision on the continuous variables that are used [33]. kNN adopts three distinct distance metrics, namely the Euclidean, Manhattan and Minkowski distance functions. The algorithm in this context sets K equal to the square root of the number of tuples and then calculates the distance between the samples. The distances are then sorted in ascending order, after which the nearest neighbors are easily selected. The distance metrics are represented as follows:
$$\sqrt{\sum_{i=1}^{k} (x_i - y_i)^2} \tag{1}$$
which is the Euclidean distance function from the nearest existing points,
$$\sum_{i=1}^{k} |x_i - y_i| \tag{2}$$
which is the Manhattan distance function, and
$$\left( \sum_{i=1}^{k} \left( |x_i - y_i| \right)^q \right)^{1/q} \tag{3}$$
which is the Minkowski distance function.
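The kNN procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the two-feature samples and the "benign"/"attack" labels are hypothetical, and ties in the vote are resolved arbitrarily.

```python
import math
from collections import Counter


def minkowski(x, y, q):
    """Minkowski distance of order q, as in Eq. (3)."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def manhattan(x, y):
    """Manhattan distance, Eq. (2): the Minkowski distance with q = 1."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Euclidean distance, Eq. (1): the Minkowski distance with q = 2."""
    return minkowski(x, y, 2)


def knn_predict(samples, labels, query, distance=euclidean, k=None):
    """Classify `query` by majority vote among its k nearest samples."""
    if k is None:
        # K is set to the square root of the number of tuples, as in the text.
        k = max(1, round(math.sqrt(len(samples))))
    # Sort by distance in ascending order and keep the k nearest neighbors.
    ranked = sorted(zip(samples, labels), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples (assumed for illustration only).
samples = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (7, 7)]
labels = ["benign"] * 4 + ["attack"] * 5
```

With nine samples, K defaults to 3, so `knn_predict(samples, labels, (0.5, 0.5))` votes among the three closest points only.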
Support Vector Machine (SVM): A Support Vector Machine (SVM) learns by assigning labels to objects. SVM, which is based on statistical learning, can readily be applied in forensic analysis of the collected digital data because it is able to generate a hyper-plane that maximizes the margin that exists between classes [34]. SVM adopts a technique that allows a whole training set to be considered as the main root node of a given tree [35], which thereafter may be split into various sub-nodes based on the existing useful information. It is represented as follows. A training set S may be represented as:
$$S = [(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n)] \tag{4}$$
A hyper-plane for the training set is represented as F(x) = 0, where $a_i \in \mathbb{R}$ and $b_i \in \{-1, 1\}$; the SVM then attempts to find the weight vector and the bias [36,37]. This makes it more suitable for categorizing different aspects and dimensions of data that is collected for purposes of forensic analysis.
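To make the roles of the weight vector and the bias concrete, the sketch below evaluates a linear hyper-plane F(x) = w·x + bias on a training set shaped like Eq. (4) and computes its geometric margin. It does not train an SVM; the two candidate hyper-planes and the sample points are hypothetical values chosen for illustration, and a real SVM solver would search for the hyper-plane with the largest margin.

```python
import math


def decision(weights, bias, x):
    """Value of F(x) = w . x + bias; F(x) = 0 is the separating hyper-plane."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(weights, bias, x):
    """Assign the label +1 or -1 depending on the side of the hyper-plane."""
    return 1 if decision(weights, bias, x) >= 0 else -1


def geometric_margin(weights, bias, training_set):
    """Smallest signed distance of any (a_i, b_i) pair to the hyper-plane.

    A positive result means every training point is on its correct side.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    return min(b * decision(weights, bias, a) / norm for a, b in training_set)


# Hypothetical training set S = [(a_1, b_1), ..., (a_n, b_n)].
S = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((1.0, 0.0), -1)]

# Two hand-picked candidate hyper-planes (assumptions, not learned values).
margin_a = geometric_margin((1.0, 1.0), -2.5, S)
margin_b = geometric_margin((1.0, 0.0), -1.5, S)
```

Both candidates separate the two classes, but the first leaves a wider margin, which is the criterion an SVM optimizes.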
Naive Bayes Algorithm: Naive Bayes, which is also a classification algorithm, could be employed to predict the probability of occurrence of events from a given class. The authors emphasize that the Naive Bayes technique treats each attribute as independent, so that it does not need to depend on other existing attributes. Generally, Naive Bayes classification is based on extracting the standard deviation and the mean during classification [38,39]. Furthermore, it allows input data to be grouped based on the training and test data. This allows Naive Bayes to work on isolated data with outlier characteristics while facing irrelevant attributes, as shown in Eq. (5):
$$g(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{5}$$
where $\mu$ is the mean and $\sigma$ is the standard deviation.
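A Gaussian Naive Bayes classifier built on the density in Eq. (5) can be sketched as follows. This is an illustrative sketch, assuming continuous features and per-class means and standard deviations estimated from training data; the feature values, the "normal"/"attack" labels and the small smoothing constants are assumptions introduced here.

```python
import math
from collections import defaultdict


def gaussian(x, mean, std):
    """Gaussian density g(x; mean, std), as in Eq. (5)."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)


def fit(samples, labels):
    """Estimate the class prior and the per-feature (mean, std) for each class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column)
            # A tiny constant avoids a zero standard deviation (assumption).
            stats.append((mean, math.sqrt(variance) + 1e-9))
        model[label] = (len(rows) / len(samples), stats)
    return model


def predict(model, features):
    """Return the class with the highest log-posterior under independence."""
    def log_posterior(label):
        prior, stats = model[label]
        # Attributes are treated as independent, so log-densities are summed.
        return math.log(prior) + sum(
            math.log(gaussian(x, m, s) + 1e-300) for x, (m, s) in zip(features, stats)
        )
    return max(model, key=log_posterior)


# Hypothetical features: (packets per second, fraction of failed logins).
train_x = [(10, 0.10), (12, 0.20), (11, 0.15), (100, 0.90), (120, 0.80), (110, 0.95)]
train_y = ["normal"] * 3 + ["attack"] * 3
model = fit(train_x, train_y)
```

Working in log space keeps the product of many small densities from underflowing, which matters once the number of features grows.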
Random Forests: The voluminous nature of the data acquired from connected environments makes the random forest classifier a significant supervised learning technique for conducting live forensic analysis. Basically, random forest allows a single classifier to provide a machine learning model that addresses different concerns such as parameterization and over-fitting. It is based on an ensemble of decision trees, where each tree is tested independently [40]. This allows a dataset to be split into random samples. For example, given a class c, the random forest can be used to estimate the probability that predicts c for a sample X as follows:
$$P(c \mid X) = \frac{1}{N} \sum_{n=1}^{N} p_n(c \mid X) \tag{6}$$
where P(c|X) becomes the estimated density of the class labels, N is the number of trees and $p_n(c \mid X)$ is the estimate given by the n-th tree.
Given that live forensics consists of data with continuous features, the learning methods would be more suitable in this context.
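The ensemble estimate for random forests, in which per-tree class probabilities are averaged over the trees, can be illustrated with a small sketch. The trees here are hand-written one-split stumps standing in for trained decision trees; the feature names, thresholds and probabilities are hypothetical, and a real forest would learn each tree from a random sample of the data.

```python
def stump(feature, threshold, below, above):
    """Build a one-split 'tree' that returns a class distribution for a sample."""
    def tree(sample):
        return below if sample[feature] <= threshold else above
    return tree


def forest_probability(trees, sample, label):
    """Average the per-tree estimates p_n(c | X) over all N trees."""
    return sum(tree(sample).get(label, 0.0) for tree in trees) / len(trees)


def forest_predict(trees, sample, classes):
    """Pick the class with the highest averaged probability."""
    return max(classes, key=lambda c: forest_probability(trees, sample, c))


# Hypothetical stumps over assumed traffic features.
trees = [
    stump("packets", 100, {"benign": 0.9, "attack": 0.1}, {"benign": 0.2, "attack": 0.8}),
    stump("ports", 5, {"benign": 0.8, "attack": 0.2}, {"benign": 0.1, "attack": 0.9}),
    stump("bytes", 1000, {"benign": 0.7, "attack": 0.3}, {"benign": 0.4, "attack": 0.6}),
]
sample = {"packets": 500, "ports": 2, "bytes": 5000}
p_attack = forest_probability(trees, sample, "attack")
```

Because each tree only contributes one vote to the average, a single tree that over-fits has limited influence on the final estimate.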
3.4. Stages of preparing live forensic data
This section gives general insights into preparing live forensic data for investigation using machine learning approaches.
Feature Engineering: Feature engineering is a process in this framework that distinctively allows for the selection of various subsets of exclusive features from a set of collected live data coming from an ECO. This allows one to obtain important or selective features that allow proper classification. Using feature engineering also allows for the identification of a multitude of features that can assist during classification [41]. The suggested framework, when fully implemented in an IoT environment to realise a digital forensic tool, would aim to utilize machine learning algorithms such as those discussed earlier to automate feature identification in order to avoid unnecessary redundancies during feature selection and elimination. While this study does not employ a direct scenario with a respective implementation, it highlights the need for employing supervised techniques during live forensics.
Feature Selection: Feature selection basically attempts to select, from a set of features X, a subset of Y features, and thereby identifies the minimal set of features sufficient to improve how accurately a given classification model can predict the outcome of the live analysis process [42]. Consequently, the ultimate goal of employing feature selection in this context is to allow the learning model, during forensic analysis, to identify useful and key subsets from a distribution of features and to map them to the original class distribution based on the identified features [43].
Feature Elimination: Feature elimination emphasizes that a given live forensic dataset can forensically be used to judge the exclusive features present in the dataset as being useful/relevant or not. In this context, relevant/useful is used to indicate whether those features are candidates for elimination or not. It is imperative to note that a number of factors may contribute to this elimination; for example, in an ECO there is rampant dynamic configuration and reconfiguration of emerging devices that provide massive data. Also, over time, some of the changes in the technical and technological aspects may hinder the identification of features that need to be eliminated, and this may be a bottleneck when it comes to digital forensics. In this process, care should be taken because it is possible to remove what may be useful [44,45]. Also, [44,45] have identified different strategies for forensically profiling adversaries in the wake of a forensic investigation while doing feature elimination. This is owing to the fact that behavioral change is a common aspect.
Feature Normalization: The importance of normalizing the features of given data during live forensics is that each feature can be normalized independently to some given range, given that data extracted from different sources normally consists of a variety of features [46]. Also, using different feature distance measures, such as the Euclidean and Manhattan distances, may assign different weights to these extracted data. Feature normalization therefore becomes important because it can balance the range of these features when computing similarity. Generally, the procedure involves transforming the feature components statistically such that the values are able to give correct or better estimates of the features. Based on the collected data from IoT environments, the features of the collected data could be transformed based on a uniform random variable, based on ranks or based on some scaling approaches [46].
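Two of the transformations mentioned for feature normalization, scaling to a given range and a rank-based transform, can be sketched as below. The function names and the example readings are assumptions for illustration; in the rank transform, tied values receive distinct consecutive ranks in their input order, which is a simplification.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Scale one feature independently into the range [low, high]."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # A constant feature carries no range information.
        return [low for _ in values]
    span = largest - smallest
    return [low + (v - smallest) * (high - low) / span for v in values]


def rank_transform(values):
    """Replace each value by its rank, rescaled to [0, 1]."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank / (len(values) - 1) if len(values) > 1 else 0.0
    return ranks


# Hypothetical readings of one feature from three different sources.
sound_levels = [10, 20, 30]
packet_sizes = [300, 5, 40]
scaled = min_max_scale(sound_levels)
ranked = rank_transform(packet_sizes)
```

After either transform, features measured on very different scales contribute comparably to a Euclidean or Manhattan distance.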
Feature Representation: One core characteristic of a dynamic environment, such as the ECO, is the integration of multiple sources of information into a centralized process. Therefore, when several features from multiple sources are aggregated over a given spectrum of analysis, there will be a need to define a unique format for the instances in each feature vector. Furthermore, a feature space data format can be defined to accommodate the potential heterogeneity of data. To support such a process, the feature representation phase will define the data format.
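One possible way to define a unique format for heterogeneous sources is a fixed schema into which every reading is mapped. The sketch below is an assumption about how such a format could look, not a format prescribed by the framework: the schema fields, the `FeatureRecord` type and the default for missing values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical feature space shared by all sources in the ECO.
SCHEMA = ("sound_level_db", "motion_score", "packet_rate")


@dataclass(frozen=True)
class FeatureRecord:
    """One instance in the unified format: who sent it, when, and its vector."""
    source_id: str
    timestamp: float
    values: tuple


def to_record(source_id, timestamp, reading, schema=SCHEMA, missing=0.0):
    """Map a source-specific reading (a dict) onto the fixed-order feature vector."""
    values = tuple(float(reading.get(name, missing)) for name in schema)
    return FeatureRecord(source_id, timestamp, values)


# A sound sensor and a camera report different fields (assumed examples).
mic = to_record("mic-1", 12.5, {"sound_level_db": 95})
cam = to_record("cam-1", 12.6, {"motion_score": 0.7, "packet_rate": 120})
```

Every record then has the same length and field order, so vectors from different 'things' can be fed to the same classifier.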
3.5. Implementation feasibility of machine learning in ECO in IoT platform
The generic framework given in Fig. 1 is further designed using the architectural model for ECO in IoT platforms developed by [1], as shown in Fig. 2. This consideration is then used to develop a hypothetical investigative scenario, through which the implementation feasibility of the supervised machine learning approach and digital forensic readiness (DFR) can be evaluated. The integrated emergent configuration manager (ECM) proposed in this scenario consists of a goal manager, adaptation manager, context manager, enactment engine, knowledge base, and digital forensic readiness engine. By function, the goal manager interprets the goal of the user (a forensic investigator in our case) to coordinate ECOs that can be used to achieve the goal. The adaptation manager attempts to align the ECOs to the dynamism of the goal and the environment. The context manager, on the other hand, attempts to maintain the contextual dynamism of the ECOs, while the enactment engine is responsible for enacting ECOs by ensuring that ECO constituents perform functionalities in a specific sequence. The knowledge base serves as the system's container for the ECO. Refer to [1] for details of these components. The DFR engine is a mechanism that identifies, captures and stores potential digital content from the IoT platform based on pre-defined rules (adaptive rule table). Such pre-defined rules align with the context maintained by the context manager, and the specific sequence of functionalities ensured by the enactment engine.
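The behavior of the DFR engine, identifying, capturing and storing content according to a rule table tied to the current context, can be sketched as follows. This is a hypothetical illustration only: the `Rule` and `DFREngine` classes, the "harbor" context and the 90 dB threshold are assumptions introduced here and are not part of the architecture in [1].

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    """One row of a hypothetical adaptive rule table."""
    name: str
    context: str                     # context label the rule applies to
    matches: Callable[[dict], bool]  # predicate over a single event


@dataclass
class DFREngine:
    """Captures and stores events that match a rule for the current context."""
    rules: list
    store: list = field(default_factory=list)

    def observe(self, context, event):
        # Identify: find every rule aligned with the context that matches.
        hits = [r.name for r in self.rules if r.context == context and r.matches(event)]
        if hits:
            # Capture and store a copy together with the rules that fired.
            self.store.append({"context": context, "event": dict(event), "rules": hits})
        return hits


engine = DFREngine(rules=[
    Rule("loud_sound", "harbor", lambda e: e.get("db", 0) >= 90),
])
first = engine.observe("harbor", {"sensor": "mic-1", "db": 110})
second = engine.observe("harbor", {"sensor": "mic-1", "db": 40})
third = engine.observe("office", {"sensor": "mic-7", "db": 110})
```

Only the first event is stored: the second does not satisfy the rule and the third arrives under a context to which the rule does not apply.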
Thus, the DFR engine provides a preemptive and proactive approach for IoT information collection, in a manner that can be used during a forensic investigation. Furthermore, the notion of DFR posits that the forensic soundness of the information collected is ensured, making it suitable for litigation. Input from the machine learning process plays a critical role in this regard. To correctly identify the composition of potentially viable digital evidence, rules based on decision trees (Random Forest and C4.5 decision trees, for instance), and even the Naive Bayes algorithm, can be leveraged to identify and extract potential digital evidence from a given context within a given sequence of functionalities of each ECO. However, to ensure the degree of accuracy of such rules, distance measures such as the Manhattan and Minkowski distance functions, and other dissimilarity metrics as depicted in [47], can be leveraged. Furthermore, the process of classifying potential digital information would require an algorithm that is robust to noise. Also, such an algorithm would need to be fairly robust to the dimensionality challenges often associated with such an exploratory process. In this regard, classifiers such as multi-class support vector machines and the neural network families can be considered.
4. Hypothetical investigative scenario
We present a hypothetical scenario that dynamically conducts forensic activities that aid in potential incident identification by focusing on three main aspects: collecting streamed sensor data, analysing the state of the collected evidence through dynamic discovery, and reporting the findings. In response to reports about a potential incident in the harbor area (that produced a loud bang), a Law Enforcement Agent (LEA) requests a dynamically Automated Forensic Incident Management System (AFIMS) to "analyze the potential incident". An ECO is dynamically formed from the dynamically discovered "things" located around the crime scene, e.g., sound sensors and a camera controlled by the CMAs. The sensors and the camera stream sound and video, respectively, to the law enforcement server. Consequently, the camera is also able to track suspects' activities spontaneously. The server can process the streamed data and classify the incident (e.g., shooting, accident, etc.), and this can be used to draw conclusions that help in the formation of an objective forensic hypothesis. As shown in Fig. 2, the hypothetical scenario comprises an emergent configuration manager and an investigation invocation module that play a significant role in potential incident identification. The concept behind this approach is that, like a broker, a requester can query for the discovery of things, and once the things are discovered the response is relayed to the requester in order to draw a conclusion on the potential incident, as illustrated by Fig. 3.
4.1. Investigation invocation
From the aforementioned scenario, the investigation is invoked by querying the AFIMS, which has a knowledge base to assist in forensic incident identification approaches in IoT environments. This allows analysis to be conducted effectively based on rapid dynamic discovery of things that surround the crime scene. Consequently, among the important aspects of the investigation invocation is the activation of the ECO by the AFIMS, which makes it possible for streamed sensor data to be analysed using supervised machine learning approaches.
4.2. DFR emergent configuration manager
Leveraging the architecture of the emergent configuration manager (ECM) developed in [1] to formulate the scenario, an integrated digital forensic readiness (DFR) engine is further introduced as a major component to address forensic investigation process requirements. As stated in [1,3], an ECM is responsible for the management of the emergent configuration in ways that address emergent user needs, adaptive capabilities included. Furthermore, the emergent configuration is referred to as a collaborative environment of diverse 'Things' designed towards a common goal as well as to address potentially unforeseen contextual dynamism. The integrated DFR emergent configuration manager is therefore a mechanism that is capable of identifying and storing potential digital evidence that would otherwise not be available when the investigation process is invoked. The adapted approach further combines the DFR engine and the supervised machine learning components into the existing ECO architecture to form an intelligent engine which can be leveraged for investigation. This idea is further depicted in the example scenario presented in Part 4 of Fig. 2. Specifically, the intelligent unit is responsible for the extraction of context, incident identification, and incident analysis. Whilst the machine learning component of the intelligent forensic platform can be used to extract meaningful patterns from the context extracted from the ECO streamed data, the DFR engine can be used to ascertain and document the potential incident, which the investigator would then use to conduct the investigation. Suffice it to note that the DFR engine presents a proactive mechanism for a potential investigation. This notion is based on the assertion that a properly developed DFR engine will have the capacity to query and be queried, as well as a storage potential. However, this could further open up a potential incident categorization and identification challenge, as extensively highlighted in [48].
5. Discussions
The heterogeneity and the dynamic composition of an ECO represent a classical feature engineering problem which undermines the reliability (particularly the area under the receiver operating characteristic curve, AUC) of any machine learning (supervised, reinforced, semi-supervised, or unsupervised) approach. Fundamental to this problem is the probability of extracting relevant and useful configuration data that can be leveraged to conduct a live forensic analysis. The proposed ECO framework (as depicted in the high-level approach in Fig. 1) presents a baseline for the realization of a live forensic analysis in any given IoT environment. However, forensic analysis, specifically in a dynamic environment, involves a myriad of challenges which should be addressed going forward. One such challenge is the potential of a large feature space, which is typically termed the "curse of dimensionality" in the soft computing discipline. Additionally, the dynamic discovery of things within the proximity of the crime scene, as shown in the hypothetical scenario (see Section 4), shows that fundamentally streamed sensor data could easily be used to conduct live forensic analysis for purposes of incident identification, given that the AFIMS would only be triggered when a potential incident is thought to have occurred. Based on this, the authors have been able to put across far-reaching propositions that focus on how ECO can be utilised in IoT environments to achieve this objective. Consequently, several feature engineering approaches and supervised machine learning approaches have been developed in the soft computing domain to attempt to address such a challenge.
However, such a solution would further require a context-dependent approach to better engineer and contextualize the solution to achieve a reliable outcome. Expectedly, the introduction of dimension reduction algorithms would generate a context-dependent weight for features within the feature space. Consequently, the weight of a given feature within the feature space can be used to redesign the forensic analysis process. Studies in [49,50] have explored diverse supervised learning algorithms that can be applied to augment such a live (near-real-time) analysis process. Besides, another potentially fundamental challenge is the process of ascertaining the relevance of each feature in the feature space, beyond the semantic weight of the feature. Whilst dimensionality reduction algorithms, such as principal component analysis, are suitable and fundamentally required in any dynamic data classification task, the degree of (forensic) evidential usefulness of the feature presents a logical challenge to the reliability of the forensic process. This is especially important in a live forensic analysis process where a low computational time is required. In addition, the degree of accuracy is required to be very high, as the false error rate is expected to be minimized to 0.001 [51].
The definition of appropriate metrics of evaluation would be another area of interest in this proposed approach. Existing soft computing metrics such as accuracy, specificity, equal error rate, AUC, and F-measure are often suggested to be effective. However, given the contextual and live nature of the proposed analysis approach, the need to develop context-based evaluation metrics could arise. This could be essential when no a priori information or database is available. The lack of a priori information would evidently suggest that an unsupervised machine learning approach, or a reinforcement learning approach, would be considered. This can, however, be extensively explored in the experimentation phase of the proposed approach.
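Several of the metrics named above follow directly from the counts of true and false positives and negatives. The sketch below computes accuracy, specificity, the false positive and false negative rates, and the F-measure for a binary incident classifier; the label encoding (1 for an incident, 0 otherwise) and the example predictions are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, true negatives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred, positive=1):
    """Return a dictionary of common evaluation metrics."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "specificity": _ratio(tn, tn + fp),
        "false_positive_rate": _ratio(fp, fp + tn),
        "false_negative_rate": _ratio(fn, fn + tp),
        "f_measure": _ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical ground truth and predictions for eight events.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
report = evaluate(y_true, y_pred)
```

Reporting the false positive and false negative rates separately makes the trade-off between missed incidents and false alarms explicit.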
Peculiar to the proposed analysis process is the potential of a supervised ML approach to data analysis. Whilst the unsupervised approach could provide a direct approach to analysis through clustering, the introduction of a supervised approach can be used to fine-tune the degree of accuracy of the analysis process. A supervised approach posits that the input data stream is parsed into classes which are then fed into the learning algorithm(s). The class formation would, therefore, be a potential challenge in an ECO in IoT, which is characterized by heterogeneous streams of data sources. However, given that ECOs are systems formed by a set of things, with their services, functionalities, and applications, that cooperate temporarily to achieve some user goal, an investigator could leverage the commonality to define classes. For instance, input streams from applications and services from different 'Things' can be classified distinctly using identifiers from such sources. Consequently, this can provide a baseline for extracting classes for the supervised machine learning process. However, there exists the potential for misclassification unless a fundamental framework is defined as a baseline for class formation. Therefore, a forensic analysis process in an emergent configuration in an IoT environment would require the definition of such a class identification and extraction process.
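Class formation from source identifiers, as suggested above, can be sketched with a simple lookup table. The mapping from identifiers to class names and the record layout are hypothetical; records from sources that are not in the table fall into an "unknown" class, which illustrates where misclassification could arise without an agreed baseline.

```python
from collections import defaultdict


def form_classes(stream, class_of_source, default="unknown"):
    """Group feature vectors by the class assigned to their source identifier."""
    classes = defaultdict(list)
    for record in stream:
        label = class_of_source.get(record["source_id"], default)
        classes[label].append(record["features"])
    return dict(classes)


# Hypothetical baseline: which 'Thing' identifiers belong to which class.
class_of_source = {"mic-1": "sound", "mic-2": "sound", "cam-1": "video"}

stream = [
    {"source_id": "mic-1", "features": (95.0,)},
    {"source_id": "cam-1", "features": (0.7,)},
    {"source_id": "mic-2", "features": (88.0,)},
    {"source_id": "x-9", "features": (1.0,)},
]
classes = form_classes(stream, class_of_source)
```

The resulting groups can serve as labeled training data for any of the supervised algorithms discussed in Section 3.3.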
6. Conclusion and future works
We explained the importance of a context-dependent, on-the-fly forensic analysis process to facilitate live forensic analysis on emergent configurations in IoT environments. Specifically, our conceptual framework leverages the NIST SP 800-86 standard and supervised ML approaches.
Such a proposed approach has the potential to be a game changer in IoT forensics, although extensive evaluations on different datasets from a broad range of applications are required. However, careful planning of the evaluation scenarios is required. Hence, one potential research agenda is to collaborate closely with relevant stakeholder groups to design and develop different evaluation scenarios.
Once these evaluation scenarios have been developed, we will also evaluate a prototype of our proposed framework in the different scenarios. This will allow us to identify any limitations, for example in the ML techniques, scenarios, or configurations.
Declaration of Competing Interests
The authors have no competing interests to declare.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.fsir.2020.100122.