Latent space manipulation for high-resolution medical image synthesis via the StyleGAN
Lukas Fetty a,∗, Mikael Bylund b, Peter Kuess a, Gerd Heilemann a, Tufve Nyholm b, Dietmar Georg a, Tommy Löfstedt b
a Department of Radiation Oncology, Medical University of Vienna, Vienna, Austria
b Department of Radiation Sciences, Umeå University, Umeå, Sweden
Received 20 December 2019; accepted 1 May 2020
Abstract
Introduction: This paper explores the potential of the StyleGAN model as a high-resolution image generator for synthetic medical images. The possibility to generate sample patient images of different modalities can be helpful for training deep learning algorithms, e.g. as a data augmentation technique.
Methods: The StyleGAN model was trained on Computed Tomography (CT) and T2-weighted Magnetic Resonance (MR) images from 100 patients with pelvic malignancies. The resulting model was investigated with regard to three features: Image Modality, Sex, and Longitudinal Slice Position. Further, the style transfer feature of the StyleGAN was used to move images between the modalities. The root-mean-squared error (RMSE) and the mean absolute error (MAE) were used to quantify errors for MR and CT, respectively.
Results: We demonstrate how these features can be transformed by manipulating the latent style vectors, and attempt to quantify how the errors change as we move through the latent style space. The best results were achieved by using the style transfer feature of the StyleGAN (58.7 HU MAE for MR to CT and 0.339 RMSE for CT to MR). Slices below and above an initial central slice can be predicted with an error below 75 HU MAE and 0.3 RMSE within 4 cm for CT and MR, respectively.
Discussion: The StyleGAN is a promising model to use for generating synthetic medical images for MR and CT modalities, as well as for 3D volumes.
Keywords: StyleGAN, Image synthesis, Latent space
Introduction
Medical imaging has been highlighted as one of the areas where deep learning has the largest implications and greatest potential [1–3]. For instance, decision support systems for radiological evaluations of images have been described in several recent reviews and publications as an area where deep learning can lead to significant benefits for the patients [4]. There are many other potential applications, such as segmentation and delineation of organs or even tumour regions, which are important for radiological and radiotherapy applications [5,6], image improvement and super-resolution [7], creation of attenuation maps for attenuation correction of PET/MR data or for novel individualised radiotherapy treatment planning concepts [8–10], etc.

∗ Corresponding author: Lukas Fetty, Department of Radiation Oncology, Medical University of Vienna, Vienna, Austria. E-mail: lukas.fetty@meduniwien.ac.at (L. Fetty).

These applications all require, and will continue to need, large sets of training data that span the population variability well in order to avoid overfitting and to produce reliable results. This is problematic for medical applications since medical data is usually scarce, and it is challenging to share
medical data with third parties or even between different hospitals because of patient integrity concerns [1].

Z Med Phys 30 (2020) 305–314 · https://doi.org/10.1016/j.zemedi.2020.05.001 · www.elsevier.com/locate/zemedi
Since generative adversarial networks (GANs) were introduced in 2014 [11], they have been used for image augmentation in numerous medical applications [12,13]. GANs are also used in many other applications [14], such as in image registration, image segmentation, and image-to-image translation tasks, just to name a few.
One of the main reasons behind the development of the GAN was the possibility to synthesise artificial data from completely unlabelled training data. This application has also been investigated by several research groups [15,16]. However, the methods that have been available for generating synthetic images have not scaled to high-resolution images. Because of this, the publications that describe the use of synthetic images have mostly dealt with the low-resolution case, and have thus been severely limited.
Recently, Karras et al. [17] proposed a novel GAN model that includes a progressive increase of the output image resolution during training, and the end result is the successful synthesis of realistic high-resolution images. They then further improved the model and also made it possible to include stochasticity and style transfer in the generation process; the improved progressive GAN was called StyleGAN [18]. The output images are now of sufficiently high resolution and quality that the generated images could potentially be used to augment a medical image dataset. The ability to train on synthetic images would thus alleviate the small dataset problem, allowing deep learning models to be trained on large amounts of synthetic data.
Since the latent space becomes a high-level representation of the images, certain known attributes of the images and the corresponding latent vectors can be used to learn a function or direction that describes the attributes [19,20]. In order to use the generated images for training e.g. deep learning models, it is imperative to understand the latent space, what it can encode, and how it is organised.
In the present study, the StyleGAN model's latent style space was investigated, providing a deeper understanding of the internal structure of the network. Further, the possibilities to manipulate the latent space were examined in order to generate customised high-dimensional medical images. The StyleGAN model was investigated after being trained on images of two different modalities: T2-weighted magnetic resonance (MR) images and computed tomography (CT) images, captured in the pelvic regions of both male and female patients with various cancer diagnoses. Three image attributes (Image Modality, Sex, and Longitudinal Slice Position) were selected, and methods were developed to manipulate them such that images with custom representation of these attributes could be generated in a controlled manner.
Material and Methods
Data
The data used in this study were collected from 117 patients undergoing treatment at Umeå University Hospital.
The patients were mainly diagnosed with either prostate, rectal, or gynaecological cancer, and were imaged using both MR and CT as part of the routine clinical workflow. The MR images were acquired with a GE 3T SIGNA PET/MR, and the CT images with a Philips Brilliance Big Bore. See the supplementary material for acquisition details. The data collection was performed according to an existing ethical permit (number 2018-234-31M), and informed consent was obtained from each patient. Patients with metal hip implants were excluded, leaving 15 female (mean age 68 years) and 85 male patients (mean age 70 years). The MR images were T2-weighted and were bias corrected using N4ITK; the CT images were rigidly registered to the MR images using Elastix with standard settings and were further resampled to the same matrix dimension of 512 × 512 pixels. The preprocessing was performed using MICE Toolkit (NONPI Medical AB, Umeå, Sweden; www.micetoolkit.com). The MR images were normalised by scaling the range [0, 2500] to [−1, 1], and the CT images by scaling [−1024, 1500] to [−1, 1], without truncation. In total, 17,542 images were used for training, which included on average 88 images per patient and modality.
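The intensity scaling described above is a linear map, sketched below for reference; `normalize` and `denormalize` are hypothetical helper names, not from the paper's code.

```python
import numpy as np

def normalize(img, lo, hi):
    """Linearly rescale intensities so that [lo, hi] maps to [-1, 1],
    without truncation (values outside [lo, hi] fall outside [-1, 1])."""
    return 2.0 * (img - lo) / (hi - lo) - 1.0

def denormalize(img, lo, hi):
    """Invert the scaling, e.g. to recover Hounsfield units from CT output."""
    return (img + 1.0) / 2.0 * (hi - lo) + lo

# MR: [0, 2500] -> [-1, 1];  CT: [-1024, 1500] -> [-1, 1]
ct = np.array([-1024.0, 238.0, 1500.0])
ct_norm = normalize(ct, -1024.0, 1500.0)
```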
Network architecture
The StyleGAN architecture differs from the original GAN model in several ways. As presented in Karras et al., the StyleGAN model differs in three major ways:
• The main network is a progressively growing GAN where the generator network first learns to generate low-resolution images, and then progressively generates larger and larger images as the training progresses. This makes the network converge even for high-dimensional outputs.
• The input to the network is a d-dimensional independent Gaussian random vector with zero mean and variance one. This input space, denoted Z, is mapped to an intermediate latent space, denoted W, by a dense neural network. This step transforms the input space to a style space that is assumed to be more disentangled, having different image features encoded along different (approximately) orthogonal dimensions in the style space. Smooth interpolations where individual features are controlled should therefore be possible in the W space. The W space is then normalised (adaptive instance normalisation, AdaIN) [21], and fed to the different layers of the generator network.
• Finally, the StyleGAN model also accepts noise injections at each resolution in order to introduce stochastic details in the images.
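A minimal sketch of the AdaIN step that the style vectors feed into: each feature map is normalised to zero mean and unit variance, then rescaled with a per-channel scale and bias derived from the style vector (in StyleGAN via a learned affine map; here they are simply passed in directly, which is an assumption of this sketch).

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-8):
    """Adaptive instance normalisation: normalise each feature map of x
    to zero mean and unit variance, then apply a style-specific scale
    and bias per channel.  x has shape (C, H, W)."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / (sigma + eps)
    return style_scale[:, None, None] * x_norm + style_bias[:, None, None]
```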
Training
The StyleGAN model¹ was implemented in PyTorch v1.0 [22] in such a way that it takes a random latent vector and outputs a 2D image. The Adam optimiser [23] was used with momentum parameters β1 = 0 and β2 = 0.99. The weights were exponentially averaged with a decay rate of 0.999, as in [17].
The initial learning rate was set to 0.001 for both the generator and the discriminator, and to 0.0001 for the mapping network that transforms the latent vectors from the Z to the W space. Mixing regularisation was also included, which injects a second latent vector at a random resolution during training. As in the original StyleGAN paper, two loss functions were included, namely the Wasserstein loss with a gradient penalty (WGAN-GP) [24] and a non-saturating loss [11] with R1 regularisation [25]. The mini-batch size was progressively decreased from 256 to 8 images per batch, and was decreased when the resolution was increased.
The StyleGAN model was trained on two NVIDIA RTX 2080 Ti GPUs for up to a total of 2,400,000 gradient updates.
Image quality
The Fréchet inception distance (FID) was used to evaluate the quality of the generated images. The FID was computed using all the training images and using 10,000 random images generated with a fixed truncation level of 0.7 [26].
The FID was computed at every 100,000 updates from 700,000 to 2,400,000 and used to select the final model.
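The FID compares the Gaussian statistics of feature vectors (in practice, Inception-network features) of real and generated images. A sketch of the computation, with the matrix square root handled via an eigendecomposition of a symmetrised product:

```python
import numpy as np

def _sqrtm_psd(m):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets,
    each of shape (n_samples, n_features)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    # tr((C1 C2)^(1/2)) = tr((C1^(1/2) C2 C1^(1/2))^(1/2)), and the
    # right-hand argument is symmetric PSD, so eigh applies.
    s1 = _sqrtm_psd(c1)
    tr_covmean = np.trace(_sqrtm_psd(s1 @ c2 @ s1))
    return float(((mu1 - mu2) ** 2).sum() + np.trace(c1 + c2) - 2.0 * tr_covmean)
```

In practice the features come from a pretrained Inception network; the statistics shown here are what remains once those features are extracted.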
Latent space manipulation
The manipulation of the modality (MR to CT and the reverse), the sex of the patient, and how to change the longitudinal slice position of the patient (the slice index in the image volume) was investigated.
In order to find latent directions that encode the features, an image encoder-decoder was constructed to generate style vectors w from images and vice versa. It was built using the StyleGAN generator as a (fixed) decoder. The encoder had the same structure as the StyleGAN generator, but in reverse, and with instance normalisation [27] instead of AdaIN. The convolutional part of the encoder had an output resolution of 4 × 4 × 512, and this hidden representation was fed to a four-layer dense neural network with LeakyReLU activations with a negative slope of 0.2, resulting in a final 512-dimensional output in the W space of the StyleGAN.
¹ https://github.com/rosinality/style-based-gan-pytorch; this repository includes an implementation of the StyleGAN model which was used and adapted for our experiments.
The encoder network was trained using the RAdam optimiser [28] with β1 = 0 and β2 = 0.99, as for the StyleGAN training. The weights were exponentially averaged with a decay rate of 0.999 [17]. The loss function was the sum of the Euclidean distance between the input random W space vectors and the reconstructed latent style vectors output from the encoder, and an ℓ1 loss between the input image and a StyleGAN-regenerated image. The encoder network was trained on 224,000 randomly generated images and corresponding style vectors with a mini-batch size of 8. The random images were generated using a truncation level of 0.7.
The encoder was then used in a refinement process to embed the training images into the W space. The generated style vector was applied as an initial guess and was further refined by minimising the loss between the input image and the regenerated StyleGAN image. Since the initial latent vector output of the encoder can be unstable, the refinement was introduced to reduce differences between the target image and the generated image. The loss function for the refinement was a combination of a feature loss and an ℓ2 loss,
L_refine(w) = L_features(G(w), x) + ‖G(w) − x‖₂    (1)

where x is the input image and G(w) is the image that the StyleGAN regenerated from the style vector w. The feature loss is defined as

L_features(w) = Σ_{r∈R} ‖D_r(G(w)) − D_r(x)‖₂    (2)

where D_r is the output of the first convolution layer of the discriminator at resolutions r ∈ R = {64, 128, 256, 512}.
RAdam was used with β1 = 0.9 and β2 = 0.999, with an initial learning rate of 0.001. The number of iterations was limited to 350 to constrain the refined latent vectors to lie near the initial guesses.
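The refinement objective above can be sketched as follows; `G` and `d_feats` stand in for the (fixed) StyleGAN generator and the discriminator's first convolution outputs, and are hypothetical callables supplied by the caller.

```python
import numpy as np

def refine_loss(w, x, G, d_feats, resolutions=(64, 128, 256, 512)):
    """Combined feature loss (Eq. 2) and Euclidean image loss (Eq. 1)
    for refining a latent style vector w against a target image x."""
    gen = G(w)
    feature_loss = sum(
        np.linalg.norm(d_feats(gen, r) - d_feats(x, r)) for r in resolutions
    )
    image_loss = np.linalg.norm(gen - x)
    return feature_loss + image_loss
```

In the paper this loss is minimised with RAdam for at most 350 iterations, starting from the encoder's initial guess.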
The refined style vectors from the training images were finally used to model the latent space for feature manipulation.
The overall workflow can be seen in Figure 1.
Latent space models, and model selection
The manipulations were performed by creating prediction models for the features of interest, for which the ground truth was known from the training data. For this, logistic regression was used as a baseline model, and hyper-parameter searches were performed in order to attempt to find suitable deep dense neural networks for predicting the features better than the baseline logistic regression models.
In order to manipulate the latent style vectors, the representation of each style vector in the last hidden layer of the found prediction model had to be transferred back to the latent style space of the StyleGAN. Therefore, another hyper-parameter search over dense neural networks was performed to predict the corresponding latent style vectors from a vector in the last hidden layer of the prediction model (denoted the reverse model).

Figure 1. An illustration of the network architectures of the models used in this study. A) An illustration of the architecture of the StyleGAN model, and B) an illustration of the encoder structure used to transform an image to the latent style space. C) An illustration of the training procedure of the encoder and the refinement procedure. The four different images in the training process shown in C) symbolise the batch size. The switch in the testing process in C) defines the injection of the initial guess of the encoder, which is only used in the first iteration.
Details on these hyper-parameter searches can be found in the supplementary material.
To determine a direction in the latent space encoding the longitudinal direction, regression networks were trained to predict the normalised slice index (the most inferior slice was set to zero, the most superior slice was set to one, and the other slices were linearly assigned a value between zero and one), but this approach had problems converging, and generally did not perform well. Therefore, slices with a normalised index below 0.4 were instead assigned to class zero (inferior), and slices with a normalised index above 0.6 to class one (superior), and classification networks were trained. Hence, the procedure here was the same as that for modality and sex.
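The class assignment can be sketched as below; `slice_labels` is a hypothetical helper, and slices in the ambiguous middle band are marked `None` here, on the assumption that they were excluded from classifier training (the paper does not state how they were handled).

```python
def slice_labels(slice_indices, n_slices, lower=0.4, upper=0.6):
    """Binary inferior/superior labels from normalised slice positions.
    Slices in the ambiguous band [lower, upper] get None (assumed excluded)."""
    labels = []
    for i in slice_indices:
        t = i / (n_slices - 1)  # normalised: 0 = most inferior, 1 = most superior
        if t < lower:
            labels.append(0)
        elif t > upper:
            labels.append(1)
        else:
            labels.append(None)
    return labels
```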
In case the validation R² (coefficient of determination) for the reverse model times the validation accuracy of the prediction model was lower than the logistic regression baseline validation accuracy, the logistic regression models were selected instead of the dense networks.
Interpolating the latent space
The found prediction models were used to manipulate a latent style vector by the transformation
w̃ = Reverse((w̄* + τ(Forward(w) − w̄*)) + α·w_LR)    (3)

where Forward transforms the style vector to the last hidden layer of the prediction model, and Reverse transforms a vector in the last hidden layer of the prediction model back to the style space, W. Here, w is a latent style vector, τ ∈ (0, 1) the truncation level, w̄* = Forward(w̄) is the mean latent vector (the mean of 1,000 randomly generated latent vectors) transformed to the last hidden layer of the prediction model, α ∈ ℝ is a weight coefficient for the attribute manipulation, and w_LR is the parameter vector in the last hidden layer of the prediction model, i.e. in all cases a logistic regression coefficient vector in the last hidden layer of the prediction model (note that the deep dense networks also perform logistic regression in the last hidden layer). The coefficient vector w_LR describes the direction encoding a particular feature, and hence the direction in which the style vector should be moved in order to change the corresponding feature.
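Equation 3 translates directly into code; `forward` and `reverse` are stand-ins for the trained prediction and reverse models (for the logistic regression baseline, both reduce to the identity on the style space).

```python
import numpy as np

def manipulate(w, forward, reverse, w_mean_fwd, w_lr, tau=0.7, alpha=1.0):
    """Equation 3: truncate the forward-mapped style vector towards the
    mean w_mean_fwd, step by alpha along the logistic-regression
    direction w_lr, then map back to the style space W."""
    h = w_mean_fwd + tau * (forward(w) - w_mean_fwd)
    return reverse(h + alpha * w_lr)
```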
The style space manipulation was investigated with regard to the modality and the longitudinal slice position, since the ground truths were available in those cases from the patient dataset.
Modality. For the modality, 1,000 slices were randomly selected from the initial patient training set and transformed using Equation 3 by moving the corresponding style vector in the direction of the other modality, scaling the decision plane normal by a factor in the range [1, 100] in 31 steps on a log scale. Images generated at each of the 31 points were compared to the ground truth image of the other modality. For the generated CT images, the mean absolute errors (MAEs) were computed between the true images and the generated images, and for the generated MR images, the root mean squared errors (RMSEs) were computed.
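The error metrics can be computed as below; the HU rescaling assumes the CT normalisation range [−1024, 1500] stated in the Data section, and both helper names are illustrative.

```python
import numpy as np

def mae_hu(ct_pred, ct_true):
    """MAE in Hounsfield units for CT images still in [-1, 1], assuming
    the [-1024, 1500] -> [-1, 1] scaling from preprocessing."""
    hu_per_unit = (1500.0 - (-1024.0)) / 2.0  # 1262 HU per normalised unit
    return float(np.abs(ct_pred - ct_true).mean() * hu_per_unit)

def rmse(mr_pred, mr_true):
    """RMSE for MR images, computed directly in the normalised [-1, 1] range."""
    return float(np.sqrt(((mr_pred - mr_true) ** 2).mean()))
```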
Longitudinal Slice Position. For the longitudinal slice position, all slices from all 100 patients were used. One of the most central 51 slices was selected to be the main slice, and another of the central 51 slices was selected as the sought slice. The main slice was transformed by moving the corresponding style vector in the inferior-superior direction, as identified by the prediction model, by a factor in the range [−0.8, 0.8], and the sought slice was compared to the decoded image corresponding to the transformed main slice by the MAE for generated CT images, and the RMSE for generated MR images.
Style transfer
The StyleGAN also allows the generator output to be manipulated by using the style transfer capability of the network, where two latent vectors are included in the generation process (corresponding to source and target images). Style vectors can be injected into the AdaIN to give the network the ability to fuse different representations of the image, such as e.g. an MR and a CT image.
The degree of mixing changes as a function of the injection location, i.e. into which resolution layer the second style vector is injected. If the vector is injected into low-resolution layers (e.g. 4–64 pixels), this results in strong mixing of the image characteristics. If the vector is instead injected into high-resolution layers (e.g. 64–512 pixels), it is mainly colour adaptation of the images that is achieved. This is also seen in the original StyleGAN paper [18].
Style transfer was performed by injecting 1,000 random style vectors from the training data into all seven layers (resolutions of 4–512 pixels), comparing to the paired image from the other modality. The generated CT images were compared using MAE, and the generated MR images using RMSE. Qualitative tests were also performed by visually evaluating the network's style transfer ability.
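The per-layer injection can be sketched as choosing, for each resolution layer, which of the two style vectors feeds that layer's AdaIN. Whether the target style replaces the source only at the injection layer or from it onwards is an implementation choice; this sketch (with the hypothetical helper `mix_styles`) assumes "from it onwards", as in the reference StyleGAN implementation's style mixing.

```python
def mix_styles(w_source, w_target, inject_at, n_layers=8):
    """Per-layer style assignment for style transfer: the target style
    vector replaces the source from layer `inject_at` onwards
    (0 = the 4x4 layer, n_layers-1 = the 512x512 layer)."""
    return [w_target if i >= inject_at else w_source for i in range(n_layers)]
```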
Analysis of the decision surface
In order to improve the understanding of the decision surfaces, and thus the disentanglement of the W space, the average curvatures of the prediction models' decision surfaces were computed as a means to understand the feature separation in the latent style space. This was done by randomly selecting points on the decision surface, randomly selecting tangent directions, and computing the curvature of the graph induced by the intersection of the tangent-normal plane and the decision surface. For each point, the mean curvature at the point was estimated as the mean over 512 random tangent directions, and the mean and standard deviation were computed from 1,000 random points on the decision surface. See the supplementary material for details on this procedure.
Results
The network training took over one month using all the 17,542 images. The image quality increased progressively during training when using the WGAN-GP loss function. Initial tests also included the R1 loss function, but it led to poor image quality and a failure to converge to meaningful outputs. After 1.2M updates, the network performance was still increasing, and so the training was continued for another 1.2M updates.
Image quality
The FID score decreased over the course of training. After about 1M updates the FID score was about 20, and it decreased to just above 12 at 2.2M updates. A plateau was then observed, with FID scores around 12–13, until all 2.4M updates had been made.
The model at 2.2M updates had the lowest FID score (a score of about 12.3). The 2.2M model was therefore the model that was used throughout this work.
Figure 2 illustrates sixteen random example images generated using the 2.2M network. See the supplementary material for more random example images.
Model selection

Modality
For separating style vectors into those encoding MR and those encoding CT images, the best model ended up being a network with no hidden layers, i.e. a logistic regression model. The validation set accuracy was about 1.0.
Since the best model was already a logistic regression model, no reverse models were evaluated for the modality.
Longitudinal slice position
To determine a direction in the latent space that encodes the longitudinal direction, the best model had one hidden layer, with 128 neurons in the hidden layer. The dropout rate was about 0.095, and the initial learning rate was about 0.0044.
Figure 2. Sixteen random samples from the StyleGAN model trained on 100 patient volume images of paired MR and CT images.
The network was trained for 150 epochs with a mini-batch size of 32. The validation set accuracy was about 1.00.
The best reverse model was a network with three hidden layers, with 300, 152, and 370 neurons in the hidden layers, respectively. The dropout rate was 0.0 (i.e., no dropout was used), and the initial learning rate was about 6.0·10⁻⁵. The network was trained for 82 epochs using mini-batches of twelve hidden layer vectors. The validation set R² was about 0.98. The baseline logistic regression model had a validation set accuracy of about 0.99. Hence, since 0.99 ≥ 1.00·0.98, the baseline logistic regression model was used for manipulating the patient longitudinal direction feature.
Sex
For the model to classify the patient's sex, the best model was a network with two hidden layers, with 128 neurons in the first hidden layer and 54 neurons in the second hidden layer. The dropout rate was zero, i.e. no dropout, and the initial learning rate was 0.0013. The network was trained for 150 epochs using mini-batches of 45 style vectors. The validation set accuracy was about 1.00.
The best reverse model was a network with four hidden layers, with 512, 500, 341, and 263 neurons in the first through fourth hidden layers, respectively. The dropout rate was 0.0, i.e. no dropout was used, and the initial learning rate was about 0.00026. The network was trained for 89 epochs using mini-batches of eleven hidden layer vectors. The validation set R² was about 0.98. The baseline logistic regression model had a validation set accuracy of about 0.98. Since 0.98 ≥ 1.00·0.98, the baseline logistic regression model was thus selected for manipulating the patient sex feature.
Latent space manipulation
Manipulating the modality
The least mean MAE over 1,000 random patient slices was about 73.6 HU when transforming from MR to CT. The least RMSE over the same 1,000 random slices was about 0.35 when transforming from CT to MR (computed on images still in the range [−1, 1]). Figure 3 illustrates the errors over the 1,000 random slices when transforming CT to MR images (left) and correspondingly MR images to CT images (right) by moving from random points from one modality into the domain of the other modality. The error bars correspond to approximate 95% confidence intervals of the means. The least errors thus best correspond to the associated ground truth images.
Manipulating the longitudinal slice position
Figure 4 illustrates the manipulation of the longitudinal slice position when moving along the decision surface normal, "searching" for a slice at different offsets from the main slice. The top row contains the results for MR images and the bottom row the results for the CT images. The left part illustrates the average errors (averaged over the 100 patients) when interpolating between pairwise slices. The right part illustrates the average errors when interpolating between the centre slice and slices offset from the centre slice, together with the errors at translations using factors in the range [−0.8, 0.8]. The average distance between slices in the W space was about 0.031 for the MR images and about 0.029 for the CT images (averaged over all patients), in normalised units along the decision surface normal. I.e., to obtain the next inferior slice given a particular slice, we move in the negative direction of the normal a distance of about 0.031 or 0.029, and to obtain the next superior slice given the same slice, we move in the direction of the normal a distance of about 0.031 or 0.029. This worked well starting from any slice, but the errors clearly increase with the distance between the slice positions.
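Stepping between neighbouring slices in W then amounts to moving along the decision-surface normal by the per-slice distance reported above; `step_slices` is an illustrative helper, and `normal` is assumed to be a unit vector.

```python
import numpy as np

def step_slices(w, normal, n_steps, step_size=0.031):
    """Move a style vector n_steps slices along the (unit) decision-surface
    normal; negative n_steps moves inferior, positive superior.  The
    step_size is the per-slice W-space distance (about 0.031 for MR,
    0.029 for CT in the paper)."""
    return w + n_steps * step_size * np.asarray(normal)
```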
Style transfer
The mean errors over the 1,000 random slices were 68.6, 58.6, 58.7, 58.9, 60.4, 459.1, 461.4, and 460.6, for the MR to CT transformation when injecting in the zeroth through seventh locations (4×4 through 512×512), respectively. For the CT to MR transformation, the corresponding RMSE values were 0.344, 0.340, 0.340, 0.340, 0.339, 0.380, 0.377, and 0.377. Figure 5 illustrates an image transfer when the target information is injected in different locations of the network.
Injections in the first layer (4×4) resulted in general feature mixing, like modality and longitudinal slice position. Injecting the images in the middle layers (8×8 through 64×64) changed the modality information of the source to that of the target. No changes were observed when the target vector was injected in later layers (128×128 through 512×512).

Figure 3. An illustration of moving from the CT domain to the MR domain (left) and vice versa (right). The points are the RMSE and MAE, respectively, over 1,000 random slices, and the error bars are approximate 95% confidence intervals of the means. The images in the lower part of the figure are from a random patient (one random in the left part and another random in the right part), decoded at their corresponding positions on the first axis of the plots. Note the direction of the distance axis in the left plot. The numbers in the image indicate the loss of the images in the lower parts. The numbering counts from one to five, where one is closest to the decision boundary and increases together with the distance to the decision boundary.
Analysis of the decision surface
See the supplementary material for illustrations of the decision surfaces of the different models. The curvatures across the surface for the sex had a mean of about −0.0013 with a standard deviation of about 0.0053; the curvatures for the longitudinal slice position had a mean of about −0.00039 with a standard deviation of about 0.00077. Hence, the curvatures were mostly zero, and did not significantly deviate from zero. However, the distribution of average curvatures appeared not to be normally distributed (p < 4·10⁻⁸⁰ for sex and p < 1·10⁻²⁸ for longitudinal slice position, using the D'Agostino and Pearson normality test), which would be the expected outcome had the average curvatures come from the same distribution.
Discussion
The StyleGAN model was trained on multimodal images containing CT and MR images of the pelvic region from male and female patients with different pathologies. The FID score was about 12.3, which is similar to that for synthetically generated faces that Karras et al. achieved (they got 4.4). The data in this study differ from Karras et al. as the network had to learn two separate image distributions (i.e. MR and CT) that are very different in intensity and texture, and images at the boundary between the two modalities did not resemble images from either modality; these are likely two of the main factors contributing to a higher FID score. Further, comparing this metric for images of two different domains is challenging and has to be considered with caution. Nevertheless, the network generates images that are of a high visual quality, and of a high resolution (512 × 512 pixels). Given the high visual quality of the images, it is likely that they can be used for training deep learning models, e.g. either for pre-training or as a form of data augmentation.
Affine transformations of latent style vectors appear to work well, and during this project we never found any "pockets" in the latent space where the model failed to generate realistic images. The StyleGAN model allows manipulation of individual features in the latent style space, W, and in particular it appears that the modality transfer, moving from MR to CT or the other way around, works satisfactorily in practice. In fact, the reported MAE for transferring from the MR to the CT domain (73.6 HU) is not far from that reported in the literature on synthetic CT (sCT) generation, where the errors from using deep convolutional neural networks usually are in the range 40–50 HU [29–35].
The StyleGAN model further learned meaningful latent style space representations of the longitudinal direction of the patients. The errors grew with the distance from the initial slice, but were less than 75 HU MAE and 0.3 RMSE within 4 cm for CT and MR, respectively, for the range of factors that were tested. These are fairly small errors, as seen when comparing to the modality transfer, for instance.
The study was inconclusive about whether some principal directions have a strong curvature on the decision boundary, or if what we obtained is a result of biases in the reverse models. We can conclude that the hyper-surfaces of the dense neural networks appear to be mostly without curvature, which would explain why the logistic regression models performed almost as well as the dense neural networks. Our results do point towards the conclusion that the latent style space of the StyleGAN model is mostly disentangled, in the feature dimensions we have investigated, even if we cannot rule out entanglements.
Figure 4. An illustration of moving from inferior to superior slices in the patient volume images. The top row corresponds to generated MR images, and the bottom row corresponds to generated CT images. The left plots illustrate the errors (the RMSEs and MAEs over the 100 patient volumes, respectively) as we move from one slice (indices relative to the centre slice) to another slice (also with indices relative to the centre slice). The diagonal "valleys" imply that moving to nearby slices results in smaller errors compared to moving to distant slices. The right side of the plot (the line plots) illustrates the centre row from the corresponding left side of the plot (in the 2D plots). Each line illustrates the errors obtained as we move along the normal vector by a distance on the first axis. For each line, the errors are minimal when the sought slice is reached. Hence, the curve induced by the minima (highlighted with the black dotted line) of all the lines corresponds to the errors achieved as we move from the centre slice to farther away slices. The highlighted minima correspond to the central row of the 2D plots. We note that the errors increase with the distance of the sought slice relative to the central slice.
Using the StyleGAN for style transfer appears to be another viable option for manipulating features. However, strong style transfers such as those demonstrated in the original paper were not observed during this study. This can be explained by the data arrangement of the original StyleGAN model, where the output was three colour channels instead of the one channel used in our work. Considering this architecture difference, injecting latent vectors in the last layer cannot change the colour representation. The modality transfer had the smallest errors, and gave the qualitatively best results when the target latent vector was injected into the 8×8 through 64×64 resolution layers. The least MAE when generating CT images (58.6 HU) is close to that reported in the literature for sCT generation, and lower than the corresponding value from latent space manipulation. On the other hand, these generated images are likely biased towards the paired image, since those were used in this evaluation. Style mixing, where features of both style vectors (MR and CT) are strongly mixed, was only observed in the first resolution layer, where both modality and longitudinal slice position changed simultaneously.
Conclusion
The feature manipulation and style transfer capabilities of the StyleGAN make it an attractive model to study the latent style space. The model that was presented in this study could be used to generate realistic slices, and possibly even volumes, of MR or CT images from synthetic patients of both sexes for training other deep learning models. The feasibility and benefits/drawbacks of using synthetic images such as those generated in this work for this purpose will be investigated in our future work.

Figure 5. An illustration of style transfer with the StyleGAN, where features from MR (or respectively CT) were translated to CT (or MR, correspondingly), and where the injection of the target style vector was in different locations of the StyleGAN (layers 4×4, 8×8–64×64, and 128×128–512×512), which are encoded in the columns. The green and orange boxes include two different examples, where the green box includes the example of translating MR to CT and the orange box includes the example of translating CT to MR.
Future work could also include automatic identification of feature directions, and means to further disentangle the latent style space, for instance by regularising it. Such work and improvements could lead to better means to manipulate the images, and to generate images with entirely custom features.
Further, the StyleGAN could be used to similarly investigate other body regions, such as the head, for instance.
Acknowledgement
Tufve Nyholm and Tommy Löfstedt are co-owners of NONPI Medical AB, the developer of MICE Toolkit, the software used in this work to prepare the training data.
This research was in part funded by the Austrian Science Fund (FWF, project number P30065-B27). Some of the GPUs used in this research were funded by a grant from the Cancer Research Fund of Northern Sweden. We gratefully acknowledge the support of Nvidia Corporation in their donation of a Titan Xp GPU used in this research.
Appendix A. Supplementary data

Supplementary data associated with this article can be found in the online version at:
https://doi.org/10.1016/j.zemedi.2020.05.001.
References
[1] Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, et al. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface 2018;15(141).
[2] Lundervold AS, Lundervold A. An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik 2019;29(2):102–27.
[3] Maier A, Syben C, Lasser T, Riess C. A gentle introduction to deep learning in medical image processing. Zeitschrift für Medizinische Physik 2019;29(2):86–101.
[4] Sahiner B, Pezeshk A, Hadjiiski LM, Wang X, Drukker K, Cha KH, et al. Deep learning in medical imaging and radiation therapy. Med Phys 2019;46(1):e1–36.
[5] Feng Z, Nie D, Wang L, Shen D. Semi-supervised learning for pelvic MR image segmentation based on multi-task residual fully convolutional networks. ISBI 2018:885–8.
[6] Jacobsen N, Deistung A, Timmann D, Goericke SL, Reichenbach JR, Güllmar D. Analysis of intensity normalization for optimal segmentation performance of a fully convolutional neural network. Zeitschrift für Medizinische Physik 2019;29(2):128–38.
[7] Mahapatra D, Bozorgtabar B, Garnavi R. Image super-resolution using progressive generative adversarial networks for medical image analysis. Computerized Medical Imaging and Graphics 2019;71:30–9.
[8] Leynes AP, Yang J, Wiesinger F, Kaushik SS, Shanbhag DD, Seo Y, et al. Direct pseudoCT generation for pelvis PET/MRI attenuation correction using deep convolutional neural networks with multi-parametric MRI: zero echo-time and Dixon deep pseudoCT (ZeDD-CT). J Nucl Med 2017.
[9] Schnurr AK, Chung K, Russ T, Schad LR, Zöllner FG. Simulation-based deep artifact correction with convolutional neural networks for limited angle artifacts. Zeitschrift für Medizinische Physik 2019;29(2):150–61.
[10] Russ T, Goerttler S, Schnurr AK, Bauer DF, Hatamikia S, Schad LR, Zöllner FG, Chung K. Synthesis of CT images from digital body phantoms using CycleGAN. International Journal of Computer Assisted Radiology and Surgery 2019;14(10):1741–50.
[11] Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative Adversarial Networks. NIPS 2014:2672–80.
[12] Burlina PM, Joshi N, Pacheco KD, Liu TY, Bressler NM. Assessment of deep generative models for high-resolution synthetic retinal image generation of age-related macular degeneration. JAMA Ophthalmology 2019;137(3):258–64.
[13] Frid-Adar M, Klang E, Amitai M, Goldberger J, Greenspan H. Synthetic data augmentation using GAN for improved liver lesion classification. ISBI 2018.
[14] Kazeminia S, Baur C, Kuijper A, van Ginneken B, Navab N, Albarqouni S, et al. GANs for medical image analysis. Preprint arXiv:1809.06222v2, 2018.
[15] Diaz-Pinto A, Colomer A, Naranjo V, Morales S, Xu Y, Frangi AF. Retinal image synthesis and semi-supervised learning for glaucoma assessment. IEEE Transactions on Medical Imaging 2019;38(9):2211–8.
[16] Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J, Greenspan H. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 2018;321:321–31.
[17] Karras T, Aila T, Laine S, Lehtinen J. Progressive growing of GANs for improved quality, stability, and variation. ICLR 2018.
[18] Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. CVPR 2019.
[19] Shen Y, Gu J, Tang X, Zhou B. Interpreting the latent space of GANs for semantic face editing. Preprint arXiv:1907.10786, 2019.
[20] Abdal R, Qin Y, Wonka P. Image2StyleGAN: how to embed images into the StyleGAN latent space? Preprint arXiv:1904.03189, 2019.
[21] Huang X, Belongie S. Arbitrary style transfer in real-time with adaptive instance normalization. ICCV 2017.
[22] Paszke A, Chanan G, Lin Z, Gross S, Yang E, Antiga L, et al. Automatic differentiation in PyTorch. NIPS 2017.
[23] Kingma DP, Ba JL. Adam: a method for stochastic optimization. ICLR 2015.
[24] Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A. Improved training of Wasserstein GANs. NIPS 2017.
[25] Mescheder L, Geiger A, Nowozin S. Which training methods for GANs do actually converge? ICML 2018.
[26] Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NIPS 2017.
[27] Ulyanov D, Vedaldi A, Lempitsky V. Instance normalization: the missing ingredient for fast stylization. CVPR 2017.
[28] Liu L, Jiang H, He P, Chen W, Liu X, Gao J, et al. On the variance of the adaptive learning rate and beyond. Preprint arXiv:1908.03265, 2019.
[29] Korhonen J, Kapanen M, Keyriläinen J, Seppälä T, Tenhunen M. A dual model HU conversion from MRI intensity values within and outside of bone segment for MRI-based radiotherapy treatment planning of prostate cancer. Medical Physics 2013;41(1):011704.
[30] Edmund JM, Nyholm T. A review of substitute CT generation for MRI-only radiation therapy. Radiation Oncology 2017;12(1):28.
[31] Wolterink JM, Dinkla AM, Savenije MH, Seevinck PR, van den Berg CA, Išgum I. Deep MR to CT synthesis using unpaired data. MICCAI 2017:14–23.
[32] Nie D, Cao X, Gao Y, Wang L, Shen D. Estimating CT image from MRI data using 3D fully convolutional networks. Deep Learn Data Label Med Appl 2016;2016:170–8.
[33] Maspero M, Savenije MHF, Dinkla AM, Seevinck PR, Intven MPW, Jurgenliemk-Schulz IM, et al. Fast synthetic CT generation with deep learning for general pelvis MR-only radiotherapy. Phys Med Biol 2018:1–14.
[34] Emami H, Dong M, Nejad-Davarani SP, Glide-Hurst C. Generating synthetic CTs from magnetic resonance images using generative adversarial networks. Med Phys 2018.
[35] Xiang L, Wang Q, Nie D, Zhang L, Jin X, Qiao Y, Shen D. Deep embedding convolutional neural network for synthesizing CT image from T1-weighted MR image. Med Image Anal 2018;47:31–44.