
Quantifying the need for supervised machine learning in conducting live forensic analysis of emergent configurations (ECO) in IoT environments

Victor R. Kebande a,*, Richard A. Ikuesan b, Nickson M. Karie c, Sadi Alawadi a, Kim-Kwang Raymond Choo d, Arafat Al-Dhaqm e

a Department of Computer Science, Malmö University, Sweden

b Cyber and Network Security Department, Science and Technology Division, Community College of Qatar, Qatar

c School of Science, Edith Cowan University, Australia

d Department of Information Systems and Cyber Security, University of Texas at San Antonio, San Antonio, TX 78249-0631, USA

e School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Johor, Malaysia

ABSTRACT

Machine learning has been shown to be a promising approach to mining larger datasets, such as those that comprise data from a broad range of Internet of Things devices, across complex environment(s) to solve different problems. This paper surveys existing literature on the potential of using supervised classical machine learning techniques, such as K-Nearest Neighbour, Support Vector Machines, Naive Bayes and Random Forest algorithms, in performing live digital forensics for different IoT configurations. A number of challenges associated with the use of machine learning techniques are also discussed in this paper.

ARTICLE INFO

Keywords: Supervised machine learning; Live forensics; Emergent configurations; IoT

1. Introduction

As Internet of Things (IoT) devices become the norm, so does the need for IoT forensics. The latter is a branch of digital forensics, which involves the investigation of IoT devices as well as the supporting infrastructure. Unlike conventional digital forensics, collecting or acquiring evidence from IoT devices can be challenging due to the diversity of IoT devices and their underpinning operating and file systems.

It is also noted that in an IoT system, especially in the case of emergent configurations (ECOs), data can be dynamic, which makes it challenging to label datasets during live forensics. Live forensics in this context refers to a forensic investigation conducted in near real-time. ECOs, as defined by existing studies [1–4], are systems formed by a set of things, with their services, functionalities, and applications, that cooperate temporarily to achieve some user goals. ECOs adapt in response to (unforeseen) contextual changes, such as changes in available things or user goals. Given the heterogeneity and increased connectivity of emerging configurations, performing live forensics on ECO platforms can be challenging, since such systems may comprise one or more dynamic and heterogeneous (IoT) systems, which may also be distributed [5].

In recent times, there have been attempts to utilize machine learning (ML) techniques to facilitate digital forensics, including IoT forensics. However, this inclusion has largely been within the scope of static IoT platforms such as Smart Homes, where the ‘context of things’ is largely unchanged. Hence, in this manuscript, the authors survey existing literature on the use of supervised ML techniques (e.g., K-Nearest Neighbour, Support Vector Machines (SVM), Naive Bayes and Random Forest) in conducting live forensics across dynamic and context-changing IoT systems, typical of ECOs. At the time of our study, this is the first study to explore the feasibility of integrating ML into an ECO platform to facilitate digital forensics. The contributions of this paper are as follows:

• explore the feasibility of integrating supervised ML techniques to perform live forensic analysis in a dynamic (ECO) IoT platform;

• demonstrate how forensic activities could dynamically be conducted in an ECO environment; and

• provide a contextual evaluation that highlights the forensic challenges in an IoT environment and shows how automation of incident identification may occur.

In Section 2, a review of the related literature and the research gaps in existing studies is presented. Then, in Sections 3 and 4, we present our proposed conceptual framework and how it can be deployed. Discussions and conclusions are presented in the last two sections of this manuscript.

* Corresponding author.

E-mail addresses: victor.kebande@mau.se (V.R. Kebande), richard.ikuesan@ccq.edu.qa (R.A. Ikuesan), n.karie@ecu.edu.au (N.M. Karie), sadi.alawadi@mau.se (S. Alawadi), raymond.choo@fulbrightmail.org (K.-K.R. Choo), mrarafat@utm.my (A. Al-Dhaqm).

http://doi.org/10.1016/j.fsir.2020.100122

Received 21 April 2020; Received in revised form 12 June 2020; Accepted 8 July 2020

Available online 15 July 2020

2665-9107/© 2020 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Forensic Science International: Reports 2 (2020) 100122

Contents lists available at ScienceDirect


2. Related literature

2.1. Existing literature

Machine learning (and deep learning) approaches have gained renewed interest in recent years; example approaches applied to security incidents are presented in Table 1.

There have also been attempts to utilize ML and deep learning for digital (forensic) investigations. For example, a generic framework that allows the application of deep learning cognitive computing techniques in cyber forensics (CF) was presented in [16]. However, this framework is not designed to facilitate live forensics in IoT environments.

Another study [17] explored the effectiveness of employing machine learning methodologies for computer forensic analysis by tracing past file system activities and preparing a timeline to facilitate the identification of incriminating evidence. Their approach is, however, not designed to facilitate IoT forensics. Costantini et al. [18] explored the applicability of artificial intelligence (AI) along with computational logic tools to automate evidence analysis, while Mitchell [19] discussed the potential usefulness of AI in digital forensics.

However, we observe that the potential role of supervised ML techniques in live forensics on ECOs across IoT ecosystems is not well understood or fully explored in the literature. The concept of ECO is not new, and has been extensively studied [1–4]. Generally, ECOs are formed by a set of things, with their services, functionalities, and applications, which cooperate on an ad-hoc basis to achieve some user goal [2,4]. ECOs are adapted in response to (unforeseen) context changes, such as changes in the available resources or changing/evolving user goals. Given the heterogeneity and increased connectivity of emerging configurations, it can be challenging to identify malicious activities in ECOs.

The connection between IoT and ECOs can be broadly explained by the widespread adoption of IoT in different sectors (e.g., smart health, smart transport, smart cities, automation, agriculture, and manufacturing). IoT is also regarded as a disruptive technology, including by the US National Intelligence Council [20,21]. Therefore, in the context of IoT ECOs, we need to consider emergent behavior, connectivity (exchange of information), localization and tracking, the degree to which components are distributed, ubiquity and device heterogeneity. The implications for forensic investigation include the existence of interaction, coordination and interoperability, which mainly encompass events, context, environment and actions [22].

2.2. Research gaps

Based on a review of the existing literature, we identify the following research gaps.

• The shift from conventional digital forensics to cloud forensics, network forensics, device-level forensics and live forensics across IoT ecosystems has compounded the challenges in performing digital investigations, for example in terms of data size and the rapidly changing technological landscape [23–26]. Hence, there is a need to ensure that digital forensic capabilities keep pace with emerging technologies [27], and to design AI-based approaches that facilitate digital forensics as well as real-time incident detection and incident response for ECOs [28,29]. This necessitates an understanding of the composition of ECOs, for example in terms of process and architecture [30].

• Conventional labeled datasets and extracted features may not necessarily be useful to facilitate live forensics across emerging IoT configurations, due to the dynamic nature of the system interactions and threat landscape [31,32].

3. Proposed framework for adopting supervised machine learning approaches

We will now present our proposed conceptual framework, as shown in Fig. 1. The three key building blocks are discussed next.

3.1. Emerging IoT configurations

ECOs can be broadly defined as a dynamic collection of ‘things’ with functionalities seeking to achieve a given goal [1]; a concrete implementation scenario will be presented in Section 4. ECOs are designed to achieve their goals over heterogeneous environments and to facilitate real-time interactions of scenarios through successful executions, while also ensuring interoperability. In IoT settings, such interactions normally require a number of actions to be executed, which produces a massive amount of data that can be exploited by cyber attackers and within which relevant evidence becomes the proverbial ‘needle in a haystack’. Hence, an in-depth understanding of the configurations and of the potential data types and sources will significantly reduce the amount of time required in forensic investigations.

Table 1
Snapshot of existing ML approaches in security incidents.

| Reference | Objective | Machine learning approach | Algorithm used | Application |
|---|---|---|---|---|
| [6] | Bot detection using unsupervised learning | Unsupervised machine learning | Flow clustering and simple K-means clustering | Based on the flows generated by bots: destination port number, largest packet size, smallest packet size, and the time the packet is flagged |
| [7] | Digital forensic text string searching | Unsupervised machine learning | Clustering of digital forensic text strings | Uses self-organizing maps to test the feasibility and utility of post-retrieval clustering of digital forensic text string search results |
| [8] | Classification model for anomaly-based intrusion detection | Supervised machine learning | Naïve Bayes classification, K-nearest neighbor | Used the NSL-KDD dataset to detect User to Root (U2R) and Remote to Local (R2L) attacks |
| [9] | Forensic data task for multi-class classification | Supervised learning and neural networks | Decision trees, Bayes classifiers, ANN and nearest neighbor | Classifiers were evaluated based on performance measures and Cohen's kappa; a statistical analysis was conducted to assess each algorithm based on accuracy |
| [10] | Digital forensic readiness | Supervised learning approach | Bayes, Neural Net, SVM, C4.5, HMM, Nearest Neighbor, Logistic Model Tree | Implemented a C4.5 decision tree on a keystroke dataset for live user identification |
| [11] | User identification | Supervised learning approach | Rule-based machine learning, decision tree classifier | Used labeled data to perform user identification |
| [12] | Network forensic analysis | Feature engineering at the analysis layer | Analysed the KDD Cup 99 dataset by applying a reputation value in the data analysis method | The author used the KDD Cup '99 collection of 9-week TCPdump datasets, which showed real-time performance of the network based on the reputation value |
| [13] | Passive audio bootleg detector | Deep learning and supervised learning | Deep learning, Deep Belief Network (DBN), SVM classification | Implemented a three-class SVM and applied feature learning to detect whether a music audio track relates to an unauthorized recording |
| [14] | Intelligent self-learning system for home automation in IoT | Guided learning classification | Naive Bayes algorithm | Automatic fault detection in connected devices |
| [15] | SVM-based malware detection for IoT services | Guided learning classification | | |

3.2. NIST digital forensic process

While there are a number of existing digital forensic processes, we use NIST Special Publication 800-86 as the guiding process due to its widespread adoption and because it allows the integration of forensic techniques into incident response. Similar to other digital forensic domains, IoT forensics may cross jurisdictions and hence involve different laws and requirements, for example in terms of evidence collection and admissibility. As IoT systems may be deployed in critical infrastructure sectors, where taking them offline for forensic investigations is impractical, we also posit the importance of live forensic readiness. The role of each phase of the process is outlined below in the context of IoT.

• Collection: Timely identification of potential evidence sources in (interconnected) IoT ecosystems is crucial, particularly for live forensics. However, it can be challenging to do so manually due to the dynamic nature of data interactions in IoT systems. Hence, we could explore using ML techniques, such as classification algorithms (e.g., Naive Bayes classifier, Nearest Neighbor, and Support Vector Machines), to automate the collection process. Care should, however, be taken to strike a balance between false negatives and false positives.

• Examination: This process may include pre-processing of digital data collected from emerging configuration devices/applications, the selection of suitable tools (e.g., the encryption and hashing algorithms to be used), and the selection of appropriate techniques (e.g., logistic regression) to statistically analyze the data collected in the previous process and identify any information useful to the investigation, such as existing relationships between objects of interest as well as variables.

• Analysis: Successful completion of examination will help us make an informed decision on the tools and approaches to be adopted. For example, should we use k-nearest neighbors or decision trees? Using the geometric distance, k-nearest neighbors may be used to decide which is the nearest object in the ecosystem. On the other hand, decision trees may be used to break down any collected dataset into smaller subsets while incrementally developing an associated decision tree. Then, live forensics and/or in-depth analysis of the data will be undertaken.

• Reporting: Findings from the analysis process will then be included in the report, which should also include the tools, techniques and approaches used, their rationale and their limitations (if any). For example, if classification algorithms such as neural networks are used, what are their limitations? Will any data be missed during live forensics due to the use of such classification algorithms?
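The Collection phase above calls for a balance between false negatives and false positives when a classifier decides which evidence sources to collect. The following minimal Python sketch illustrates that trade-off under stated assumptions: a classifier (not shown) has already scored each candidate source in [0, 1], labels from past cases are available, and the scores, labels, cost weights and function names (`error_counts`, `pick_threshold`) are hypothetical illustrations, not part of NIST SP 800-86 or of the proposed framework.

```python
from typing import List, Sequence, Tuple


def error_counts(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int]:
    """(false positives, false negatives) if every source scoring >= threshold is collected."""
    fp = fn = 0
    for score, label in zip(scores, labels):
        collected = score >= threshold
        if collected and label == 0:
            fp += 1   # irrelevant source collected: extra triage effort
        elif not collected and label == 1:
            fn += 1   # relevant source missed: potential live evidence lost
    return fp, fn


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   candidates: List[float], fn_weight: float = 5.0,
                   fp_weight: float = 1.0) -> float:
    """Candidate threshold with the lowest weighted error cost.

    fn_weight > fp_weight encodes an assumed policy that missing volatile
    evidence is costlier than collecting an irrelevant source.
    """
    def cost(t: float) -> float:
        fp, fn = error_counts(scores, labels, t)
        return fn_weight * fn + fp_weight * fp
    return min(candidates, key=cost)


# Hypothetical classifier scores for eight candidate IoT evidence sources
# (label 1 = the source actually held relevant evidence in a past case).
scores = [0.95, 0.80, 0.65, 0.40, 0.35, 0.20, 0.10, 0.55]
labels = [1, 1, 0, 1, 0, 0, 0, 1]
for t in (0.3, 0.5, 0.7):
    print(t, error_counts(scores, labels, t))   # (fp, fn)
print("chosen:", pick_threshold(scores, labels, [0.3, 0.5, 0.7]))  # -> chosen: 0.3
```

With these toy values, thresholds of 0.3, 0.5 and 0.7 give (FP, FN) counts of (2, 0), (1, 1) and (0, 2); because a missed source is weighted five times more heavily here, the lowest threshold is chosen.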

3.3. Supervised machine learning approaches

One of the benefits of using supervised ML approaches in live forensics is the potential for such techniques to predict possible events based on past occurrences. We will now discuss a few potential supervised ML algorithms that can be used in this context: Support Vector Machines (SVM), k-Nearest Neighbors (kNN), Naive Bayes and Random Forests.

• kNN: kNN can facilitate the identification of existing relationships based on the forensically acquired digital data. Specifically, as a non-parametric learning technique, it can be used to classify samples from a dataset on the principle of similarity. Generally, kNN's output depends primarily on the instances stored in memory, and the decision is given by the majority of the K nearest neighbors, with distances computed over the continuous variables that are used [33]. kNN commonly adopts one of three distance metrics, namely the Euclidean, Manhattan and Minkowski distance functions. The algorithm in this context sets K equal to the square root of the number of tuples and then calculates the distance between the samples. The distances are then sorted in ascending order, and the nearest neighbors are selected. The distance metrics are represented as follows:

$$ d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2} \tag{1} $$

which is the Euclidean distance function, where x_i and y_i are the i-th of the k features of the two samples being compared,

$$ d(x, y) = \sum_{i=1}^{k} \lvert x_i - y_i \rvert \tag{2} $$

which is the Manhattan distance function, and

$$ d(x, y) = \left( \sum_{i=1}^{k} \lvert x_i - y_i \rvert^{q} \right)^{1/q} \tag{3} $$

which is the Minkowski distance function of order q.
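To make the kNN procedure above concrete (K set to the square root of the number of training tuples, distances sorted in ascending order, majority vote), here is a minimal sketch in plain Python. It assumes numeric feature vectors; the two features (packet rate and mean payload size), the labels, and the function names are hypothetical. A single Minkowski function covers Eqs. (1)–(3): q = 2 gives the Euclidean distance and q = 1 the Manhattan distance.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def minkowski(x: Sequence[float], y: Sequence[float], q: float = 2.0) -> float:
    """Eq. (3). q = 1 reduces to Eq. (2) (Manhattan), q = 2 to Eq. (1) (Euclidean)."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def knn_predict(train_x: List[Sequence[float]], train_y: List[Hashable],
                sample: Sequence[float], q: float = 2.0) -> Hashable:
    # K = square root of the number of training tuples (rounded, at least 1).
    k = max(1, round(math.sqrt(len(train_x))))
    # Distance from the sample to every stored instance, sorted in ascending order.
    ranked = sorted(
        ((minkowski(x, sample, q), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    # Majority vote among the K nearest neighbours.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (packets per second, mean payload bytes).
train_x = [(10, 200), (12, 180), (11, 210), (9, 190),
           (80, 60), (95, 40), (85, 55), (90, 50), (100, 45)]
train_y = ["benign"] * 4 + ["suspicious"] * 5

print(knn_predict(train_x, train_y, (88, 52)))        # Euclidean -> suspicious
print(knn_predict(train_x, train_y, (11, 195), q=1))  # Manhattan -> benign
```

Because the two toy features have very different scales, real data would normally be normalized before distances are computed, as discussed in Section 3.4.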

• Support Vector Machine (SVM): An SVM learns to assign labels to objects from labeled examples. Because it is based on statistical learning, SVM can readily be applied in the forensic analysis of collected digital data: it generates a hyperplane that maximizes the margin between classes [34]. SVM can also be combined with a tree structure in which the whole training set is considered the root node of a given tree [35], which thereafter may be split into various subnodes based on the existing useful information.

A training set S may be represented as:

$$ S = \{(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n)\} \tag{4} $$

A hyperplane for the training set is represented as $F(x) = 0$, where $a_i \in \mathbb{R}^d$ is a feature vector and $b_i \in \{-1, 1\}$ is its label; training then attempts to find the weight vector and the bias that define $F$ [36,37]. This makes SVM suitable for categorizing different aspects and dimensions of the data collected for forensic analysis.
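The hyperplane formulation above can be illustrated with a minimal linear soft-margin SVM trained by plain subgradient descent on the hinge loss. This is one simple way to find the weight vector and bias, chosen for readability; it is not the solver used in [36,37], and practical work would normally rely on an established library. The toy data, the learning rate `eta`, the regularization strength `lam`, and the function names are assumptions. Labels follow Eq. (4): each b_i is -1 or +1.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(samples: List[Sequence[float]], labels: List[int],
                     lam: float = 0.01, eta: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Return (w, bias) for the hyperplane F(x) = w.x + bias = 0.

    samples are the a_i of Eq. (4) and labels the b_i in {-1, +1}.
    """
    w = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for a, b in zip(samples, labels):
            margin = b * (dot(w, a) + bias)
            # L2 regularisation (keeping w small widens the margin) shrinks w every step.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Hinge loss is active: nudge the hyperplane towards the correct side.
                w = [wi + eta * b * ai for wi, ai in zip(w, a)]
                bias += eta * b
    return w, bias


def svm_predict(w: Sequence[float], bias: float, x: Sequence[float]) -> int:
    return 1 if dot(w, x) + bias >= 0 else -1


# Hypothetical, linearly separable two-feature data.
samples = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, bias = train_linear_svm(samples, labels)
print([svm_predict(w, bias, s) for s in samples])  # -> [1, 1, 1, -1, -1, -1]
```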

• Naive Bayes Algorithm: Naive Bayes, which is also a classification algorithm, could be employed to predict the probability of occurrence of events from a given class. The authors emphasize that the Naive Bayes technique treats attributes as independent, so that each attribute does not need to depend on the other existing attributes. Generally, (Gaussian) Naive Bayes classification is based on extracting the mean and the standard deviation of each attribute during classification [38,39]. Furthermore, it allows input data to be grouped based on the training and test data. This allows Naive Bayes to work on isolated data with outlier characteristics, even in the presence of irrelevant attributes, using the Gaussian density shown in Eq. (5).

$$ g(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \tag{5} $$
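A minimal Gaussian Naive Bayes sketch, assuming numeric attributes and using Eq. (5) as the per-attribute likelihood: training stores each class's prior plus the mean and standard deviation of every attribute, and prediction combines the per-attribute likelihoods under the naive independence assumption (as a sum of logarithms, to avoid numerical underflow). The event features, class names and function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def gaussian(x: float, mu: float, sigma: float) -> float:
    """Eq. (5): Gaussian density g(x; mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def log_gaussian(x: float, mu: float, sigma: float) -> float:
    # Logarithm of Eq. (5), computed directly so tiny densities do not underflow to 0.
    return -((x - mu) ** 2) / (2 * sigma ** 2) - math.log(math.sqrt(2 * math.pi) * sigma)


def nb_fit(samples, labels, min_sigma=1e-6):
    by_class = defaultdict(list)
    for x, c in zip(samples, labels):
        by_class[c].append(x)
    model = {}
    for c, rows in by_class.items():
        # Per-attribute mean and (floored) standard deviation for this class.
        stats = [(mean(col), max(pstdev(col), min_sigma)) for col in zip(*rows)]
        model[c] = (len(rows) / len(samples), stats)
    return model


def nb_predict(model, x):
    def score(c):
        prior, stats = model[c]
        # Independence assumption: attribute log-likelihoods simply add up.
        return math.log(prior) + sum(log_gaussian(v, mu, s) for v, (mu, s) in zip(x, stats))
    return max(model, key=score)


# Hypothetical events: (duration in seconds, bytes transferred).
samples = [(1.0, 100), (1.2, 110), (0.9, 95), (1.1, 105), (5.0, 900), (6.0, 1000), (5.5, 950)]
labels = ["normal"] * 4 + ["anomalous"] * 3
model = nb_fit(samples, labels)
print(nb_predict(model, (1.05, 102)))  # -> normal
print(nb_predict(model, (5.8, 980)))   # -> anomalous
```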

• Random Forests: The voluminous nature of the data acquired from connected environments makes the random forest classifier a significant supervised learning technique for conducting live forensic analysis. Basically, a random forest combines many decision trees into a single classifier, which helps to address concerns such as parameterization and over-fitting. It is based on an ensemble of decision trees, where each tree is trained independently on a random sample of the dataset [40]. For example, given a class c, the random forest can be used to estimate the probability that c is predicted for a sample X as follows:

$$ P(c \mid X) = \frac{1}{N} \sum_{n=1}^{N} p_n(c \mid X) \tag{6} $$

where $p_n(c \mid X)$ is the estimate given by the n-th of the N trees and $P(c \mid X)$ is the estimated density of the class labels. Given that live forensics involves data with continuous features, these learning methods would be well suited to this context.
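Eq. (6) can be illustrated with a deliberately small forest. In this sketch each "tree" is a depth-one decision stump trained on a bootstrap sample using one randomly chosen feature; each leaf stores class counts whose proportions serve as p_n(c | X), and the forest averages them over the N trees. Full random forests grow deeper trees and sample features at every split, so this is a simplification for illustration only; the data, seed, and function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, rng: random.Random):
    """Depth-1 tree on one random feature; leaves keep class counts."""
    feature = rng.randrange(len(rows[0]))
    values = sorted({r[feature] for r in rows})
    # Fallback when no split is possible: a single leaf holding every label.
    best, best_errors = (None, None, Counter(labels), Counter()), None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate split between neighbouring values
        left = Counter(y for r, y in zip(rows, labels) if r[feature] <= threshold)
        right = Counter(y for r, y in zip(rows, labels) if r[feature] > threshold)
        # Misclassifications if each leaf predicted its majority class.
        errors = sum(sum(side.values()) - side.most_common(1)[0][1] for side in (left, right))
        if best_errors is None or errors < best_errors:
            best, best_errors = (feature, threshold, left, right), errors
    return best


def tree_proba(stump, x, c) -> float:
    """p_n(c | X): share of class c in the leaf that x falls into."""
    feature, threshold, left, right = stump
    leaf = left if feature is None or x[feature] <= threshold else right
    return leaf[c] / sum(leaf.values())


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) training rows with replacement for each tree.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def forest_proba(forest, x, c) -> float:
    """Eq. (6): P(c | X) = (1/N) * sum_n p_n(c | X)."""
    return sum(tree_proba(t, x, c) for t in forest) / len(forest)


rows = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
labels = ["benign"] * 4 + ["malicious"] * 4
forest = fit_forest(rows, labels)
print(forest_proba(forest, (8.5, 8.5), "malicious"))
```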

3.4. Stages of preparing live forensic data

This section gives general insights that are used while preparing live forensic data for investigation using machine learning approaches.

• Feature Engineering: Feature engineering is a process in this framework that allows for the selection of various subsets of distinctive features from a set of live data collected from the ECO. This allows one to obtain the important or selective features that enable proper classification. Feature engineering also allows for the identification of a multitude of features that can assist during classification [41]. When fully implemented in an IoT environment to realize a digital forensic tool, the suggested framework would aim to utilize machine learning algorithms, such as those discussed earlier, to automate feature identification in order to avoid unnecessary redundancies during feature selection and elimination. While this study does not employ a direct scenario with a respective implementation, it highlights the need for employing supervised techniques during live forensics.

• Feature Selection: Feature selection attempts to select, from a set of features X, a subset Y of features; through this, it identifies the minimal set of features necessary to improve how accurately a given classification model can predict the outcome of the live analysis process [42]. Consequently, the ultimate goal of employing feature selection in this context is to allow the learning model, during forensic analysis, to identify useful and key subsets from a distribution of features and to map them to the original class distribution based on the identified features [43].

• Feature Elimination: Feature elimination uses a given live forensic dataset to judge whether each feature present in the dataset is useful/relevant or not. In this context, relevant/useful indicates whether or not those features should be eliminated. It is imperative to note that a number of factors may contribute to this elimination; for example, in ECOs there is rampant dynamic configuration and reconfiguration of emerging devices that produce massive data. Also, over time, changes in technical and technological aspects may hinder the identification of features that need to be eliminated, and this may be a bottleneck when it comes to digital forensics. Care should be taken in this process, because it is possible to remove features that may be useful [44,45]. Also, [44,45] have identified different strategies for forensically profiling adversaries in the wake of a forensic investigation while performing feature elimination. This is owing to the fact that behavioral change is common.

• Feature Normalization: During live forensics, normalization allows each feature of a given dataset to be scaled independently to a given range, which matters because data extracted from different sources normally consists of a variety of features [46]. Also, distance measures such as the Euclidean and Manhattan distances may implicitly assign different weights to these extracted features depending on their scales. Feature normalization therefore becomes important because it balances the ranges of these features before similarity is computed. Generally, the procedure involves statistically transforming the feature components such that the values give correct or better estimates of the features. Based on the data collected from IoT environments, the features could be transformed based on a uniform random variable, based on ranks, or based on some scaling approach [46].

• Feature Representation: One core characteristic of a dynamic environment, such as the ECO, is the integration of multiple sources of information into a centralized process. Therefore, when several features from multiple sources are aggregated over a given spectrum of analysis, there will be a need to define a unique format for the instances in each feature vector. Furthermore, a feature space data format can be defined to accommodate the potential heterogeneity of the data. To ensure such a process, the feature representation phase defines the data format.
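The preparation stages above can be chained in a minimal pure-Python sketch, assuming each ECO source emits flat records (dictionaries) of numeric readings. It applies feature representation (a fixed, hypothetical schema with missing values set to 0.0), feature elimination (dropping features with near-zero variance), min-max feature normalization, and a simple filter-style feature selection that keeps the features most correlated with the label. The schema, field names, thresholds and choice of techniques are illustrative assumptions, not methods prescribed by the framework.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical shared schema for records coming from heterogeneous ECO sources.
SCHEMA = ["packet_rate", "payload_bytes", "sensor_db", "uptime_s"]


def represent(record: dict) -> list:
    """Feature representation: one fixed-order vector per record (missing -> 0.0)."""
    return [float(record.get(name, 0.0)) for name in SCHEMA]


def eliminate_constant(rows, names, min_variance=1e-9):
    """Feature elimination: drop features that barely vary and so cannot discriminate."""
    keep = [j for j in range(len(names)) if pvariance([r[j] for r in rows]) > min_variance]
    return [[r[j] for j in keep] for r in rows], [names[j] for j in keep]


def min_max(rows):
    """Feature normalization: rescale every column independently to [0, 1]."""
    cols = list(zip(*rows))
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(r, lows, highs)] for r in rows]


def abs_correlation(col, labels):
    sx, sy = pstdev(col), pstdev(labels)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(col), mean(labels)
    cov = sum((a - mx) * (b - my) for a, b in zip(col, labels)) / len(col)
    return abs(cov / (sx * sy))


def select_top(rows, labels, names, y_count):
    """Feature selection: keep the y_count features most correlated with the label."""
    scores = [abs_correlation([r[j] for r in rows], labels) for j in range(len(names))]
    chosen = sorted(sorted(range(len(names)), key=lambda j: scores[j], reverse=True)[:y_count])
    return [[r[j] for j in chosen] for r in rows], [names[j] for j in chosen]


records = [
    {"packet_rate": 10, "payload_bytes": 200, "uptime_s": 500},
    {"packet_rate": 12, "payload_bytes": 220, "uptime_s": 500},
    {"packet_rate": 90, "payload_bytes": 210, "uptime_s": 500},
    {"packet_rate": 95, "payload_bytes": 205, "uptime_s": 500},
]
labels = [0, 0, 1, 1]  # hypothetical ground truth: 1 = suspicious

rows = [represent(r) for r in records]
rows, names = eliminate_constant(rows, SCHEMA)
rows = min_max(rows)
rows, names = select_top(rows, labels, names, y_count=1)
print(names)  # -> ['packet_rate']
```

Here `sensor_db` (absent from every record) and `uptime_s` (constant) are eliminated, and of the two remaining features only `packet_rate` tracks the label closely enough to be selected.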

3.5. Implementation Feasibility of Machine Learning in ECO in IoT Platform

The generic framework given in Fig. 1 is further designed using the architectural model for ECO in IoT platforms developed by [1], as shown in Fig. 2. This consideration is then used to develop a hypothetical investigative scenario, through which the implementation feasibility of the supervised machine learning approach and digital forensic readiness (DFR) can be evaluated. The integrated ECM proposed in this scenario consists of a goal manager, adaptation manager, context manager, enactment engine, knowledge base, and digital forensic readiness engine. By function, the goal manager interprets the goal of the user (a forensic investigator in our case) to coordinate ECOs that can be used to achieve the goal. The adaptation manager attempts to align the ECOs to the dynamism of the goal and the environment. The context manager, on the other hand, attempts to maintain the contextual dynamism of the ECOs, while the enactment engine is responsible for enacting ECOs by ensuring that ECO constituents perform functionalities in a specific sequence. The knowledge base serves as the system's container for the ECO. Refer to [1] for details of these components. The DFR engine is a mechanism that identifies, captures and stores potential digital content from the IoT platform based on pre-defined rules (an adaptive rule table). Such pre-defined rules are aligned with the context maintained by the context manager and with the specific sequence of functionalities ensured by the enactment engine.

Thus, the DFR engine provides a preemptive and proactive approach to IoT information collection, in a manner that can be used during a forensic investigation. Furthermore, the notion of DFR posits that the forensic soundness of the information collected is ensured, making it suitable for litigation. Input from the machine learning process plays a critical role in this regard. To correctly identify the composition of potentially viable digital evidence, rules based on decision trees (Random Forest and C4.5 decision trees, for instance), and even the Naive Bayes algorithm, can be leveraged to identify and extract potential digital evidence from a given context within the given sequence of functionalities of each ECO. However, to ensure the accuracy of such rules, distance measures such as the Manhattan and Minkowski distance functions, and other dissimilarity metrics as depicted in [47], can be leveraged. Furthermore, the process of classifying potential digital information would require an algorithm that is robust to noise. Such an algorithm would also need to be fairly robust to the dimensionality challenges often associated with such an exploratory process. In this regard, classifiers such as the multi-class support vector machine and the neural network families can be considered.
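As a rough illustration of the DFR engine described above, the following sketch matches incoming ECO events against a pre-defined rule table whose entries pair a context label (as maintained by the context manager), a functionality step (as sequenced by the enactment engine), and a predicate that could wrap a trained classifier. To support forensic soundness, each captured record is hashed with SHA-256 and chained to the previous record so that later tampering is detectable. The hash chaining is our illustrative assumption, and `Rule`, `DFREngine` and all field names are hypothetical rather than part of the architecture in [1].

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List

GENESIS = "0" * 64  # placeholder "previous digest" for the first record


@dataclass
class Rule:
    name: str
    context: str                          # context label from the context manager
    step: str                             # functionality in the enacted sequence
    predicate: Callable[[Dict], bool]     # e.g. a wrapper around a trained classifier


def _digest(rule: str, event: Dict, prev: str) -> str:
    body = json.dumps({"rule": rule, "event": event, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class DFREngine:
    rules: List[Rule]
    store: List[Dict] = field(default_factory=list)

    def observe(self, event: Dict) -> bool:
        """Capture the event if any rule matches its context, step and predicate."""
        for rule in self.rules:
            if (event.get("context") == rule.context and event.get("step") == rule.step
                    and rule.predicate(event)):
                prev = self.store[-1]["digest"] if self.store else GENESIS
                copy = dict(event)  # later edits to the caller's dict do not alter the record
                self.store.append({"rule": rule.name, "event": copy, "prev": prev,
                                   "digest": _digest(rule.name, copy, prev)})
                return True
        return False

    def verify(self) -> bool:
        """Recompute the hash chain; any modified or reordered record breaks it."""
        prev = GENESIS
        for rec in self.store:
            if rec["prev"] != prev or _digest(rec["rule"], rec["event"], prev) != rec["digest"]:
                return False
            prev = rec["digest"]
        return True


engine = DFREngine([Rule("loud-bang", "harbor", "stream_audio", lambda e: e.get("db", 0) > 120)])
print(engine.observe({"context": "harbor", "step": "stream_audio", "db": 140}))  # -> True
print(engine.observe({"context": "harbor", "step": "stream_audio", "db": 60}))   # -> False
print(engine.verify())                                                          # -> True
```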

4. Hypothetical investigative scenario

We present a hypothetical scenario that dynamically conducts forensic activities that aid in potential incident identification by focusing on three main aspects: collecting streamed sensor data, analyzing the state of the collected evidence through dynamic discovery, and reporting the findings. In response to reports about a potential incident in the harbor area (that produced a large bang), a Law Enforcement Agent (LEA) requests an Automated Forensic Incident Management System (AFIMS) to “analyze the potential incident”. An ECO is dynamically formed from the dynamically discovered “things” located around the crime scene, e.g., sound sensors and a camera controlled by the CMAs. The sensors and the camera stream sound and video, respectively, to the law enforcement server. The camera is also able to track suspects' activities spontaneously. The server can process the streamed data and classify the incident (e.g., shooting, accident), and this can be used to draw conclusions that help in the formation of an objective forensic hypothesis. As shown in Fig. 2, the hypothetical scenario comprises an emergent configuration manager and an investigation invocation module, which play a significant role in potential incident identification. The concept behind this approach is that, like a broker, a requester can query for the discovery of things; once the things are discovered, the response is relayed to the requester in order to draw conclusions on the potential incident, as illustrated in Fig. 3.
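The broker-style discovery in this scenario can be sketched as a registry query: given the incident location, return the registered things within a radius that offer the capabilities the investigation needs, and refuse to form an ECO if a required capability is missing. This is a minimal sketch assuming planar coordinates in metres, a static registry and Python 3.8+ (for `math.dist`); `Thing`, `Broker`, `form_eco`, the radius and the capability names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Thing:
    thing_id: str
    capability: str                 # e.g. "sound" or "video"
    position: Tuple[float, float]   # local planar coordinates in metres (assumed)


class Broker:
    def __init__(self, registry: Iterable[Thing]):
        self.registry = list(registry)

    def discover(self, scene: Tuple[float, float], radius_m: float,
                 needed: Set[str]) -> List[Thing]:
        """Things with a needed capability inside the radius, nearest first."""
        found = [t for t in self.registry
                 if t.capability in needed and math.dist(t.position, scene) <= radius_m]
        return sorted(found, key=lambda t: math.dist(t.position, scene))


def form_eco(broker: Broker, scene, radius_m=200.0, needed=("sound", "video")) -> List[Thing]:
    things = broker.discover(scene, radius_m, set(needed))
    missing = set(needed) - {t.capability for t in things}
    if missing:
        raise LookupError(f"no things near the scene offer: {sorted(missing)}")
    return things


registry = [
    Thing("mic-1", "sound", (10.0, 0.0)),
    Thing("cam-1", "video", (50.0, 50.0)),
    Thing("mic-2", "sound", (1000.0, 0.0)),        # too far from the scene
    Thing("thermo-1", "temperature", (5.0, 5.0)),  # capability not needed
]
eco = form_eco(Broker(registry), scene=(0.0, 0.0))
print([t.thing_id for t in eco])  # -> ['mic-1', 'cam-1']
```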

4.1. Investigation invocation

From the aforementioned scenario, the investigation is invoked by querying the AFIMS, which has a knowledge base to assist forensic incident identification approaches in IoT environments. This allows analysis to be conducted effectively based on rapid dynamic discovery of the things that surround the crime scene. Consequently, among the important aspects of the investigation invocation is the activation of the ECO by the AFIMS, which makes it possible for streamed sensor data to be analysed using supervised machine learning approaches.

4.2. DFR emergent configuration manager

Leveraging the architecture of the emergent configuration manager (ECM) developed in [1] to formulate the scenario, an integrated digital forensic readiness (DFR) engine is further introduced as a major component to address forensic investigation process requirements. As stated in [1,3], an ECM is responsible for the management of the emergent configuration in ways that address emergent user needs, including adaptive capabilities. Furthermore, the emergent configuration is referred to as a collaborative environment of diverse ‘Things’ designed towards a common goal as well as to address potentially unforeseen contextual dynamism. The integrated DFR emergent configuration manager is therefore a mechanism that is capable of identifying and storing potential digital evidence that would otherwise not be available when the investigation process is invoked. The adapted approach further combines the DFR engine and the supervised machine learning components into the existing ECO architecture to form an intelligent engine that can be leveraged for investigation. This idea is further depicted in the example scenario presented in Part 4 of Fig. 2. Specifically, the intelligent unit is responsible for the extraction of context, incident identification, and incident analysis. Whilst the machine learning component of the intelligent forensic platform can be used to extract meaningful patterns from the context extracted from the ECO streamed data, the DFR engine can be used to ascertain and document the potential incident, which the investigator would then use to conduct the investigation. Suffice it to note that the DFR engine presents a proactive mechanism for a potential investigation. This notion is based on the assertion that a properly developed DFR engine will have the capacity to query and be queried, as well as a storage capability. However, this could further open up a potential incident categorization and identification challenge, as extensively highlighted in [48].

5. Discussions

The heterogeneity and the dynamic composition of an ECO represent a classical feature engineering problem, which undermines the reliability (particularly the area under the receiver operating characteristic curve, AUC) of any machine learning approach (supervised, reinforcement, semi-supervised, or unsupervised). Fundamental to this problem is the probability of extracting relevant and useful configuration data that can be leveraged to conduct a live forensic analysis. The proposed ECO framework (as depicted in the high-level approach in Fig. 1) presents a baseline for the realization of a live forensic analysis in any given IoT environment. However, forensic analysis, specifically in a dynamic environment, poses myriad challenges that should be addressed going forward. One such challenge is the potentially large feature space, which is typically termed the “curse of dimensionality” in the soft computing discipline. Additionally, the dynamic discovery of things within the proximity of the crime scene, as shown in the hypothetical scenario (see Section 4), indicates that streamed sensor data could be used to conduct live forensic analysis for purposes of incident identification, given that the AFIMS is only triggered when a potential incident is thought to have occurred. Based on this, the authors have put forward propositions that focus on how ECOs can be utilised in IoT environments to achieve this objective. Several feature engineering approaches and supervised machine learning approaches have been developed in the soft computing domain in an attempt to address such challenges.
However, such solutions would further require a context-dependent approach to better engineer and contextualize the solution to achieve a reliable outcome. It is expected that introducing dimension reduction algorithms would generate a context-dependent weight for features within the feature space. Consequently, the weight of a given feature within the feature space can be used to redesign the forensic analysis process. Studies in [49,50] have explored diverse supervised learning algorithms that can be applied to augment such a live (near-real-time) analysis process. Besides, another potentially fundamental challenge is the process of ascertaining the relevance of each feature in the feature space, beyond the semantic weight of the feature. Whilst dimensionality reduction algorithms, such as principal component analysis, are suitable and fundamentally required in any dynamic data classification task, the degree of (forensic) evidential usefulness of a feature presents a logical challenge to the reliability of the forensic process. This is particularly important in a live forensic analysis process, where a low computational time is required. In addition, the degree of accuracy is required to be very high, as the false error rate is expected to be minimized to 0.001 [51].

The definition of appropriate evaluation metrics would be another area of interest in this proposed approach. Existing soft computing metrics such as accuracy, specificity, equal error rate, AUC, and F-measure are often suggested to be effective. However, given the contextual and live nature of the proposed analysis approach, the need to develop context-based evaluation metrics could arise. This could be essential when no a priori information or database is available. The lack of a priori information would suggest that an unsupervised machine learning approach, or a reinforcement learning approach, should be considered. This can, however, be extensively explored in the experimentation phase of the proposed approach.
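The metrics listed above can all be computed from classifier scores and ground-truth labels. The sketch below, assuming binary labels (1 = incident-related) and that both classes are present, computes accuracy, specificity and F-measure at a chosen threshold, AUC as the probability that a randomly chosen positive outranks a randomly chosen negative, and an approximate equal error rate found by sweeping the observed scores as thresholds. The scores, labels and function names are hypothetical; a context-based metric would still need to be defined on top of such building blocks.

```python
def confusion(scores, labels, threshold):
    """Counts (tp, fp, tn, fn) when score >= threshold is treated as 'incident'."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def threshold_metrics(scores, labels, threshold):
    tp, fp, tn, fn = confusion(scores, labels, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f_measure": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def equal_error_rate(scores, labels):
    """Approximate EER: mean of FPR and FNR where the two rates are closest."""
    best = None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion(scores, labels, t)
        fpr, fnr = fp / (fp + tn), fn / (fn + tp)
        if best is None or abs(fpr - fnr) < best[0]:
            best = (abs(fpr - fnr), (fpr + fnr) / 2)
    return best[1]


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(threshold_metrics(scores, labels, 0.5))  # accuracy, specificity, F-measure all 0.75
print(auc(scores, labels))                     # -> 0.8125
print(equal_error_rate(scores, labels))        # -> 0.25
```

A target such as the 0.001 error rate mentioned earlier can only be assessed meaningfully on evaluation sets with several thousand samples or more; in this toy set, with four samples per class, a single error moves a rate by 0.25.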

Peculiar to the proposed analysis process is the potential of a supervised ML approach to data analysis. Whilst the unsupervised approach could provide a direct approach to analysis through clustering, the introduction of a supervised approach can be used to fine-tune the degree of accuracy of the analysis process. A supervised approach posits that the input data stream is parsed into classes, which are then fed into the learning algorithm(s). Class formation would, therefore, be a potential challenge in an ECO in IoT, which is characterized by heterogeneous streams of data sources. However, given that ECOs are systems formed by a set of things, with their services, functionalities, and applications, that cooperate temporarily to achieve some user goal, an investigator could leverage this commonality to define classes. For instance, input streams from applications and services of different ‘Things’ can be classified distinctly using identifiers from such sources. Consequently, this can provide a baseline for extracting classes for the supervised machine learning process. However, there exists the potential for misclassification unless a fundamental framework is defined as a baseline for class formation. Therefore, a forensic analysis process in an emergent configuration in an IoT environment would require the definition of such a class identification and extraction process.
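The identifier-based class formation described above could start from an explicit baseline table that maps source identifiers to classes, routing anything the baseline does not cover to an "unassigned" bucket for investigator review instead of guessing; this is one way to limit the misclassification risk noted above. This is a minimal sketch, and the (thing type, service) identifiers and class names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical baseline mapping (thing type, service) identifiers to class labels.
BASELINE: Dict[Tuple[str, str], str] = {
    ("sound_sensor", "audio_stream"): "acoustic",
    ("camera", "video_stream"): "visual",
    ("camera", "motion_tracking"): "tracking",
}


def form_classes(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group stream records into classes using their source identifiers."""
    classes: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        key = (rec.get("thing_type"), rec.get("service"))
        # Unknown sources are kept apart for review rather than forced into a class.
        classes[BASELINE.get(key, "unassigned")].append(rec)
    return dict(classes)


stream = [
    {"thing_type": "sound_sensor", "service": "audio_stream", "id": "mic-1"},
    {"thing_type": "camera", "service": "video_stream", "id": "cam-1"},
    {"thing_type": "camera", "service": "motion_tracking", "id": "cam-1"},
    {"thing_type": "drone", "service": "video_stream", "id": "uav-7"},
]
classes = form_classes(stream)
print({label: len(recs) for label, recs in classes.items()})
# -> {'acoustic': 1, 'visual': 1, 'tracking': 1, 'unassigned': 1}
```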

6. Conclusion and future work

We explained the importance of a context-dependent, on-the-fly forensic analysis process to facilitate live forensic analysis of emergent configurations in IoT environments. Specifically, our conceptual framework leverages the NIST SP 800-86 standard and supervised ML approaches. Such an approach has the potential to be a game changer in IoT forensics, although extensive evaluations on different datasets from a broad range of applications are required. Such evaluations, however, require careful planning of the evaluation scenarios. Hence, one potential research agenda is to collaborate closely with relevant stakeholder groups to design and develop different evaluation scenarios.

Once these evaluation scenarios have been developed, we will also evaluate a prototype of our proposed framework in the different scenarios. This will allow us to identify any limitations, for example in the ML techniques, scenarios, or configurations.

Declaration of Competing Interests

The authors have no competing interests to declare.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.fsir.2020.100122.

References

[1] F. Alkhabbas, R. Spalazzese, P. Davidsson, Eco-IoT: An architectural approach for realizing emergent configurations in the Internet of Things, European Conference on Software Architecture (2018) 86–102.

[2] F. Alkhabbas, R. Spalazzese, P. Davidsson, Architecting emergent configurations in the Internet of Things, 2017 IEEE International Conference on Software Architecture (ICSA) (2017) 221–224.

[3] F. Alkhabbas, R. Spalazzese, P. Davidsson, Emergent configurations in the Internet of Things as system of systems, 2017 IEEE/ACM Joint 5th International Workshop on Software Engineering for Systems-of-Systems and 11th Workshop on Distributed Software Development, Software Ecosystems and Systems-of-Systems (JSOS) (2017) 70–71.

[4] F. Alkhabbas, M. Ayyad, R.-C. Mihailescu, P. Davidsson, A commitment-based approach to realize emergent configurations in the Internet of Things, 2017 IEEE International Conference on Software Architecture Workshops (ICSAW) (2017) 88–91.

[5] F. Alkhabbas, R. Spalazzese, P. Davidsson, IoT-based systems of systems, Proceedings of the 2nd edition of Swedish Workshop on the Engineering of Systems of Systems (SWESOS 2016) (2016).

[6] W. Wu, J. Alvarez, C. Liu, H.-M. Sun, Bot detection using unsupervised machine learning, Microsystem Technologies 24 (2018) 209.

[7] N.L. Beebe, J.G. Clark, Digital forensic text string searching: Improving information retrieval effectiveness by thematically clustering search results, Digital Invest. 4 (2007) 49.

[8] H.H. Pajouh, R. Javidan, R. Khayami, D. Ali, K.-K.R. Choo, A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks, IEEE Transactions on Emerging Topics in Computing (2016).

[9] A.J. Tallón-Ballesteros, J.C. Riquelme, Data mining methods applied to a digital forensics task for supervised machine learning, Computational Intelligence in Digital Forensics: Forensic Investigation and Applications (2014) 413–428.

[10] M. Mohlala, A.R. Ikuesan, H.S. Venter, User attribution based on keystroke dynamics in digital forensic readiness process, 2017 IEEE Conference on Application Information Network Security (AINS) (2017) 124–129.

[11] I.R. Adeyemi, S. Abd Razak, M. Salleh, Understanding online behavior: exploring the probability of online personality trait using supervised machine-learning approach, Front. ICT 3 (2016) 8.

[12] N. Huang, J. He, B. Zhao, G. Liu, Forensic analysis of distributed computing network based on decision values, 2016 International Symposium on Computer Consumer Control (IS3C) (2016) 423–427.

[13] M. Buccoli, P. Bestagini, M. Zanoni, A. Sarti, S. Tubaro, Unsupervised feature learning for bootleg detection using deep learning architectures, 2014 IEEE International Workshop on Information Forensics Security (WIFS) (2014) 131–136.

[14] V.H. Bhide, S. Wagh, i-learning IoT: An intelligent self learning system for home automation using IoT, 2015 International Conference on Communications and Signal Processing (ICCSP) (2015) 1763–1767.

[15] H.-S. Ham, H.-H. Kim, M.-S. Kim, M.-J. Choi, Linear SVM-based Android malware detection for reliable IoT services, J. Appl. Math. 2014 (2014).

[16] N.M. Karie, V.R. Kebande, H. Venter, Diverging deep learning cognitive computing techniques into cyber forensics, Forensic Sci. Int.: Synergy 1 (2019) 61.

[17] M.N.A. Khan, Digital Forensics using Machine Learning Methods, Ph.D. thesis, University of Sussex, 2008.

[18] S. Costantini, G. De Gasperis, R. Olivieri, Digital forensics and investigations meet artificial intelligence, Ann. Math. Artif. Intel. (2019).

[19] F. Mitchell, The use of artificial intelligence in digital forensics: An introduction, Digital Evid. Elec. Signature L. Rev. (2010).

[20] P.P. Ray, A survey on Internet of Things architectures, J. King Saud Univ.-Comput. Inform. Sci. (2018).

[21] S. Khorashadizadeh, A.R. Ikuesan, V.R. Kebande, Generic 5G infrastructure for IoT ecosystem, International Conference of Reliable Information and Communication Technology (2019) 451–462.

[22] R.-C. Mihailescu, R. Spalazzese, C. Heyer, P. Davidsson, A role-based approach for orchestrating emergent configurations in the Internet of Things, arXiv preprint arXiv:1809.09870 (2018).

[23] V.R. Kebande, I. Ray, A generic digital forensic investigation framework for Internet of Things (IoT), 2016 IEEE 4th International Conference on Future Internet of Things and Cloud (FiCloud) (2016) 356–362.

[24] S. Li, K.-K.R. Choo, Q. Sun, W.J. Buchanan, J. Cao, IoT forensics: Amazon Echo as a use case, IEEE Internet Things J. 6 (2019) 6487.

[25] X. Zhang, O. Upton, N.L. Beebe, K.-K.R. Choo, IoT botnet forensics: A comprehensive digital forensic case study on Mirai botnet servers, Forensic Sci. Int.: Digital Invest. 32 (2020) 300926.

[26] X. Zhang, K.-K.R. Choo, Digital Forensic Education: An Experiential Learning Approach, Vol. 61, Springer, 2019.

[27] X. Zhang, K.-K.R. Choo, N.L. Beebe, How do I share my IoT forensic experience with the broader community? An automated knowledge sharing IoT forensic platform, IEEE Internet of Things J. 6 (2019) 6850.

[28] O. Alkadi, N. Moustafa, B. Turnbull, K.-K.R. Choo, A deep blockchain framework-enabled collaborative intrusion detection for protecting IoT and cloud networks, IEEE Internet Things J. (2020).

[29] M. Saharkhizan, A. Azmoodeh, A. Dehghantanha, K.-K.R. Choo, R.M. Parizi, An ensemble of deep recurrent neural networks for detecting IoT cyber attacks using network traffic, IEEE Internet Things J. (2020).

[30] F. Alkhabbas, M. De Sanctis, R. Spalazzese, A. Bucchiarone, P. Davidsson, A. Marconi, Enacting emergent configurations in the IoT through domain objects, International Conference on Service-Oriented Computing (2018) 279–294.

[31] R.-C. Mihailescu, J. Persson, P. Davidsson, U. Eklund, Towards collaborative sensing using dynamic intelligent virtual sensors, International Symposium on Intelligent and Distributed Computing, Springer, 2016, pp. 217–226.

[32] A. Tegen, P. Davidsson, R.-C. Mihailescu, J.A. Persson, Collaborative sensing with interactive learning using dynamic intelligent virtual sensors, Sensors 19 (2019) 477.

[33] A. Keramati, R. Jafari-Marandi, M. Aliannejadi, I. Ahmadian, M. Mozaffari, U. Abbasi, Improved churn prediction in telecommunication industry using data mining techniques, Appl. Soft Comput. 24 (2014) 994.

[34] C. Cortes, V. Vapnik, Support-vector networks, Machine Learn. 20 (1995) 273.

[35] Q. He, J.-F. Chen, The inverse problem of support vector machines and its solution, 2005 International Conference on Machine Learning and Cybernetics, Vol. 7, IEEE, 2005, pp. 4322–4327.

[36] Z. Liu, L. Bai, Evaluating the supplier cooperative design ability using a novel support vector machine algorithm, 2008 12th International Conference on Computer Supported Cooperative Work in Design, IEEE, 2008, pp. 986–989.

[37] L.-m. He, X.-b. Yang, F.-s. Kong, Support vector machines ensemble with optimizing weights by genetic algorithm, 2006 International Conference on Machine Learning Cybernetics, IEEE, 2006, pp. 3503–3507.

[38] Y.N. Dewi, D. Riana, T. Mantoro, Improving naïve Bayes performance in single image pap smear using weighted principal component analysis (WPCA), 2017 International Conference on Computing, Engineering, and Design (ICCED), 1 (2017).

[39] S.N.N. Alfisahrin, T. Mantoro, Data mining techniques for optimization of liver disease classification, 2013 International Conference on Advanced Computer Science Applications Technologies, IEEE, 2013, pp. 379–384.

[40] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5.

[41] V.N. Garla, C. Brandt, Ontology-guided feature engineering for clinical text classification, J. Biomed. Inform. 45 (2012) 992.

[42] M. Dash, H. Liu, Feature selection for classification, Intelligent Data Anal. 1 (1997) 131.

[43] P.M. Narendra, K. Fukunaga, A branch and bound algorithm for feature subset selection, IEEE Trans. Comput. 917 (1977).

[44] Z.M. Hira, D.F. Gillies, A review of feature selection and feature extraction methods applied on microarray data, Adv. Bioinform. (2015).

[45] K. Kira, L.A. Rendell, et al., The feature selection problem: Traditional methods and a new algorithm, AAAI, Vol. 2 (1992) 129–134.


[46] S. Aksoy, R.M. Haralick, Feature normalization and likelihood-based similarity measures for image retrieval, Pattern Recognit. Lett. 22 (2001) 563.

[47] A.R. Ikuesan, M. Salleh, H.S. Venter, S.A. Razak, S.M. Furnell, A heuristics for HTTP traffic identification in measuring user dissimilarity, Human-Intelligent Syst. Integration 1 (2020).

[48] A. Al-Dhaqm, S. Razak, D.A. Dampier, K.R. Choo, K. Siddique, R.A. Ikuesan, A. Alqarni, V.R. Kebande, Categorization and organization of database forensic investigation processes, IEEE Access 1 (2020).

[49] A.R. Ikuesan, S.A. Razak, H.S. Venter, M. Salleh, Polychronicity tendency-based online behavioral signature, Int. J. Machine Learn. Cybernet. 10 (2019) 2103.

[50] I.R. Adeyemi, S.A. Razak, M. Salleh, H.S. Venter, Observing consistency in online communication patterns for user re-identification, PLOS ONE 11 (2016) e0166930.

[51] A.R. Ikuesan, H.S. Venter, Digital behavioral-fingerprint for user attribution in digital forensics: Are we there yet? Digital Invest. 30 (2019) 73.

Fig. 2. Scenario representation based on ECO (adapted from [1]).
