On Multi-level Exclusive Caching Offline Optimality and

来源：个人技术集锦

OnMulti-levelExclusiveCaching:OfﬂineOptimalityand

Whypromotionsarebetterthandemotions.

BinnyS.Gill

IBMAlmadenResearchCenter

binnyg@us.ibm.com

Abstract—Multi-levelcachehierarchieshavebecomeverycommon;however,mostcachemanagementpoliciesresultinduplicatingthesamedataredundantlyonmultiplelevels.Thestate-of-the-artexclusivecachingtechniquesusedtomitigatethiswastageinmulti-levelcachehierarchiesaretheDEMOTEtechniqueanditsvariants.Whiletheseachievegoodhitratios,theysufferfromsigniﬁcantI/Oandcomputationaloverheads,makingthemunsuitablefordeploymentinreal-lifesystems.WeproposeadramaticallybetterperformingalternativecalledPROMOTE,whichprovidesexclusivecachinginmulti-levelcachehierarchieswithoutdemotionsoranyoftheoverheadsinherentinDEMOTE.PROMOTEusesanadaptiveprobabilisticﬁlteringtechniquetodecidewhichpagesto“promote”tocachesclosertotheapplication.WhilebothDEMOTEandPROMOTEprovidethesameaggregatehitratios,PROMOTEachievesmorehitsinthehighestcachelevelsleadingtobetterresponsetimes.Wheninter-cachebandwidthislimited,PROMOTEconvincinglyoutperformsDEMOTEbybeing2xmoreefﬁcientinbandwidthusage.Forexample,inatracefromareal-lifescenario,PROMOTEprovidedanaverageresponsetimeof3.42msascomparedto5.05msforDEMOTEonatwo-levelhierarchyofLRUcaches,and5.93msascomparedto7.57msonathree-levelcachehierarchy.Wealsodiscovertheoreticalboundsforoptimalmulti-levelcacheperformance.Wedevisetwoofﬂinepolicies,calledOPT-UBandOPT-LB,thatprovablyserveasupperandlowerboundsonthetheoreticallyoptimalperformanceofmulti-levelcachehierarchies.ForaseriesofexperimentsonawidegamutoftracesandcachesizesOPT-UBandOPT-LBranwithin2.18%and2.83%ofeachotherfortwo-cacheandthree-cachehierarchies,respectively.Thesecloseboundswillhelpevaluatealgorithmsandguidefutureimprovementsintheﬁeldofmulti-levelcaching.

throughmultiplelayersofcachebeforereachinganapplication.Ithasbeenobservedthatsingle-levelcachereplacementpoliciesperformverypoorlywhenusedinmulti-levelcaches[26].

Weproposeasimpleanduniversalprobabilistictechnique,calledPROMOTE,thatadaptsanysingle-levelreadcachingalgorithmintoanalgorithmthatallowsamulti-levelreadcachehierarchytoperformaseffectivelyasasinglecacheoftheaggregatesize.Un-likepreviousalgorithms,thisnovelalgorithmimposesnegligiblecomputationalandI/Ooverheads.Asanotherkeycontributiontotheﬁeldofmulti-levelcaching,weprovide,fortheﬁrsttime,techniquestocomputeverycloseboundsforthenotoriouslyelusiveofﬂineoptimalperformanceofmulti-levelcaches.A.TheProblemwithInclusion

Oneoftheearliestexamplesofmulti-levelcachesaroseinthecontextofaprocessor[30],[28].ThemultiplelayersofcachewerenamedL1,L2,L3,etc.,withL1(highestcache)beingtheclosesttothepro-cessor,thesmallestinsize,andthefastestinresponsetime.Forefﬁcientcachecoherency,systemsenforcedtheinclusionproperty[1],whichmandatedthatthehighercachesbeasubsetofthelowercaches.OtherthanincaseswhereL2wasonlymarginallylargerthanL1[22],theperformancepenaltyofthisredundancyofcontentbetweenthelayerswasnotofmuchconcern.Instarkcontrast,inthehierarchyofcachesformedbymultiplecomputingsystemsconnectedtoeachother,inclusion,sometimespartial,isnotbydesignandhasbeenfoundtobedetrimentaltoperformance[26],[13].Theredundancyofdatabetweenthecachelevelsismostseverewhenthecachesareofcomparablesizes.Arequestisalwaysservicedfromtheclosestcachetotheclientthathasthedata,whilefurthercopiesofthedatainlowercachesarenotuseful.B.WorkingtowardsExclusion

Incachehierarchiesthroughwhichpagestraverseﬁxedpathsfromthedatasourcetotheapplication,exclusivityofallcachesishighlydesirable.Inmulti-pathcachehierarchies,wherepagescangetaccessedvia

I.INTRODUCTION

Veryrarelydoesdatareachitsconsumerwithouttravelingthroughacache.Theperformancebeneﬁtofacacheissigniﬁcant,andeventhoughcachesaremuchmoreexpensivethanmassstoragemedialikedisks,nearlyalldataservers,webservers,databases,andinfactmostcomputingdevicesareequippedwithacache.Inthelastseveraldecadesnumerousreadcachingalgorithmshavebeendevised(forexample,LRU[10],CLOCK[8],FBR[29],2Q[20],LRFU[21],LIRS[18],MQ[36],ARC[25]andSARC[14]).Mostofthework,however,hasfocusedonthecasewhenasinglesig-niﬁcantlayerofcacheseparatesthedataproducerandthedataconsumer.Inpractice,however,datatravels

USENIX AssociationFAST ’08: 6th USENIX Conference on File and Storage Technologies

however,aredomain-speciﬁcandnoteasilyapplicabletomorethantwolevelsofcache.

AsimilarapproachistouseaClientCacheTracking(CCT)table[6]inthelowercachetosimulatethecontentsofthehighercache.Thisallowsthelowercachetoproactivelyreloadtheevictedpagesfromthestoragemedia.Theextracostoftheserequests,however,mayoverburdenthestoragemediaresultinginhighreadresponsetimes.

D.ExclusivityviaSmartHigherCaches

Fig.1.

Cachehierarchiesthatbeneﬁtfromexclusivecaching.

multiplepaths,exclusivecachingshouldbeappliedtothoseportionsofthecachehierarchywhichdonothavealargeworkloadoverlapbetweenthevariouspaths.Figure1showsacommonscenarioforwhichvariousexclusivecachingalgorithmshavebeenproposed([36],[33],[19],[12]).Itisfairlycommontoﬁndcachehierarchiesformedbyaﬁrstlevelofapplicationservers(databaseservers,webproxyservers,webcontentde-liveryservers,storagevirtualizationservers)whichactasclientsforbackendstorageservers,whilebothareequippedwithsigniﬁcantandcomparablecaches[33].Insomecases,itmayalsobepossibletoincludetheendclientstoformanexclusivecachehierarchyofthreeormorelevelsofcache.Ana¨ıvewayofenforcingtheexclusionpropertywouldbetoassociatedifferentcacheswithdifferentlogicaladdresses.Obviously,thiswillnotbeabletogatherfrequentlyhitpagesinthetopmostcachesandtheaverageresponsetimeforsomeworkloadsmightendupworsethanthecasewhenthecachesarenotexclusiveatall.Succinctly,thechallengeofmulti-levelexclusivecachingis:“Exclusivityachievedefﬁcientlywithmosthitsatthehighestcachelevels”.C.ExclusivityviaSmartLowerCaches

IthasbeenknownthattheLeastFrequentlyUsed([13],[32])algorithmperformsbetteratsecond-levelcachesthatthetraditionalLRUalgorithm.Amoresophisticatedsecond-levelcachemanagementalgorithmistheMQalgorithm[36]whichmaintainsmultiplelistsgearedtocapturefrequentlyaccessedpageswithlongreuseintervals.However,itisnotstudiedformorethantwolevelsofcacheandalsocannotachievecompleteexclusivityofthecacheswheredesirable.

Amorerecentalgorithm,X-RAY[2],constructsintheRAIDcontrollercacheanapproximateimageofthecontentsoftheﬁlesystemcachebymonitoringthemeta-dataupdates,whichallowsforbettercachereplacementdecisionsandexclusivity.Suchgray-boxapproaches,

Chenetal.[7]proposeanalgorithmcalledACCAinwhichaclient(highercache)simulatesthecontentsoftheserver(lowercache),andusesthatknowledgetopreferablyevictthosepageswhicharealsopresentintheserver.Thisisdifﬁculttodoinmulti-clientscenariosorwheretheserverbehavioriscomplexorproprietary.E.ExclusivityviaMulti-levelCollaboration

Whenmultiplelayersofcachecanbemodiﬁedtocommunicatewitheachother,mostoftheguessworkofsimulatingcachecontentscanbeavoided.Eventhoughextensionsarerequiredtothecommunicationprotocol,thisclasshasproventobethemostpromisingintermsofperformanceandeaseofimplementation.

1)Applicationcontrolled:TheUniﬁedandLevel-awareCaching(ULC)algorithm[19]controlsacachehierarchyfromthehighestlevelbyissuingRETRIEVEandDEMOTEcommandstolowercachescausingthemtomoveblocksupanddownthehierarchy,respectively.Thehighestcachelevel(applicationclient)hastokeeptrackofthecontentsofallthecachesbelow,whichentailscomplexityintheclient’simplementation.

2)Applicationhints:KARMA[12]isaimedatap-plicationssuchasdatabasesthatcanprovidehintsforplacementdecisionsinallcachelevels.Suchsolutionsareapplication-speciﬁcanddonotlendthemselvestogeneralapplicabilityinmulti-levelcaches.ThisbringsustoDEMOTE,whichisthemostpopulargeneral-purposecollaborativemulti-levelcachingtechnique.F.TheDEMOTETechnique

TheDEMOTEtechnique[33],orequivalentlytheGlobaltechnique[35],showninFigure2,canbeappliedtomanygeneralpurposesingle-levelcachingpolicies(like,LRU,MQ,ARC,etc)tocreatemulti-levelversionsthatachieveexclusivityofcachecontents.Aswithanyexclusivecachingscheme,DEMOTEshouldonlybeusedinscenariosthatbeneﬁtfromexclusivecaching[33],[27].

G.TheProblemswithDEMOTE

WhiletheDEMOTEtechniquestrivestoimprovetheaggregatehitratiooverthenon-exclusivevariant,theoverallperformancemightinfactsufferbecause

FAST ’08: 6th USENIX Conference on File and Storage TechnologiesUSENIX Association

HITMRULRUDEMOTEMRUDEMOTELRUHITH.OurContributionsWepresenttwomajorresults:BoundsforOptimalPerformance:Inthestudyofcachingalgorithms,itisinvaluabletoknowtheofﬂineoptimalperformancethatamulti-levelcachecandeliver.WhileBelady’sMIN[4](anofﬂinealgorithm)isconsideredoptimalforsingle-levelcaches,itcannotbeappliedtomulti-levelcachescenarios,where,apartfromthehitratio,thelocationofhitsisalsoextremelyimportant.Ourﬁrstcontributionprovidesinsightintotheoptimalofﬂineperformanceofmulti-levelcaches.WeprovidepoliciescalledOPT-UBandOPT-LBthatprovablyserveasupperandlowerboundsfortheoptimalofﬂineperformanceformulti-levelcachesalongbothaverageresponsetimeandinter-cachebandwidthusagemetrics.Throughaseriesofexperimentsonawidegamutoftraces,cachesizesandconﬁgurations,wedemonstratethatOPT-UBandOPT-LBareverycloseboundsontheoptimalaverageresponsetime,runningonanaverage,within2.18%and2.83%ofeachotherforallthetestedtwo-cacheandthree-cachehierarchies,respectively.Evenformorecomplexhierarchies,theboundsremaincloseatabout10%ofeachother.Thisnovelresultenablesustoestimatefortheﬁrsttime,theperformancegapbetweenthecurrentstate-of-the-artalgorithmsandtheofﬂineoptimalformulti-levelcaches.PROMOTETechnique:Asanotherfundamentalcon-tributiontotheﬁeldofcaching,weproposeasim-pleandsigniﬁcantlybetteralternativetotheDEMOTEtechnique,calledPROMOTE,whichprovidesexclusivecachingwithoutdemotions,applicationhints,oranyoftheoverheadsinherentinDEMOTE.PROMOTEusesaprobabilisticﬁlteringtechniqueto“promote”pagestohighercachesonaread.NotonlydoweshowthatPROMOTEisapplicabletoawiderrangeofalgorithmsandcachehierarchies,itisonanaverage,2xmoreefﬁcientthanDEMOTErequiringonlyhalftheinter-cachebandwidthbetweenthevariouscachelevels.Inawidevarietyofexperiments,whilebothtech-niquesachievedthesameaggregatehitratio,PROMOTEprovided13.0%and37.5%morehitsinthehighestcachethanDEMOTEwhenthetechniqueswereappliedtoLRUandARC[25]algorithms,respectively,leadingtobetteraverageresponsetimesevenwhenweal-lowDEMOTEunlimitedinter-cachebandwidthandfreedemotions.Inlimitedbandwidthscenarios,PROMOTEconvincinglyoutperformsDEMOTE.Forexample,inatracefromareal-lifescenario,PROMOTEprovidedanaverageresponsetimeof3.21msascomparedto5.43msforDEMOTEonatwo-levelhierarchyofARCcaches,and5.61msascomparedto8.04msonathree-levelcachehierarchy.DEMOTEMRULRUMRULRUHITFig.2.TheDEMOTEtechnique.Whenaclientisabouttoejectacleanblockfromitscache,itsendsthecleanblocktothelowercacheusingtheDEMOTEoperation.Thelowercacheputsthedemotedblockintoitscache,ejectinganotherblockifnecessarytomakespace.HitsinanylistaresenttotheMRUendoftheappropriatehighestcache.ofthecostoftheDEMOTEoperation,including:(i)networktrafﬁctosendevictedpagestolowercaches,and(ii)processorcyclesconsumedtoprepare,sendandreceivedemotedpages.ThishasthwartedthepracticaldeploymentofDEMOTEinrealsystems.Incaseswhereinter-cachebandwidthistight,ortheworkloadthroughputishigh,theaverageresponsetimesuffersinspiteofahigherhitratio.Thishappensasreadsstalluntildemotions(sometimesinmultiplelevels)cancreatespaceforthenewpage.Further,aneviction,whichusedtobeatrivialoperation,nowbecomesanexpensiveoperationalmostequivalenttoawriterequesttothelowercache.Thisleadstofurtherdegradationofperformance.Addtothistheconcernthatforworkloadsthatdonotexhibittemporallocality,likepurelysequential(orpurelyrandom),alldemotionsarefutileandweendupwastingcriticalnetworkresourceswhenmostneeded.Eviction-basedcachereplacement[6]wasproposedtoalleviatethenetworkbandwidthwastage.Alistofevictedpagesissentperiodicallytothelowercachewhichreloadsthemfromthedisksintoitscache.Wehaveobservedformanyrealsystemsandbenchmarkstheseextrareadsincreasetheaveragemisspenalty,diminishingordefeatingthebeneﬁtsofanyincreaseinhitratiothatsuchexclusivecachingcanprovide.Therehavebeenotherattemptstopartiallymitigatethecostofdemotions[34]orthecostoftheextraI/Osonthedisks[6]bygroupingthemandusing‘idletime’.Inouropinion,idletimeshouldneverbeconsideredfreeasitcanbeusedbyotheralgorithmslikeprefetchingtoimprovereadperformance.Sophisticatedenterprise-classsystemswhichrunroundtheclockstrivehardtominimizeidle-time.ULCandKARMAdoreducethenumberoflowreusepagesthatenterthehighercachesandtherebyminimizingthebandwidthwastage.How-ever,ULC’scomplexityandKARMA’sdependenceonapplicationhintsmakethemlessgenerallyapplicable.USENIX AssociationFAST ’08: 6th USENIX Conference on File and Storage Technologies

II.OFFLINEOPTIMALt1t2t3andtheinter-cachetrafﬁcbetweencachesCiandCi+1istminterCacheTrafci→ci+1=|σi+1|+|Dci→ci+1|(2)WedeﬁnehitOPT(σ,S)tobethenumberofhitsproducedbyBelady’sofﬂineOPTpolicyontherequestsequenceσandasingle-levelcacheofsizeS.DisksClientsσ1size = S1hits = h1C1Dσ2C1C2size = Sσ32size = S3hits = h3σmhits = h2C2DC2C3(Demotions)(Demotions)(Highest Cache)C3(Lowest Cache)B.OptimalUpperBound:OPT-UBWedeﬁneaconceptualpolicyOPT-UBthatservesasanupperboundonperformance,implyingthatnopolicycanperformbetterthanthisboundintermsofeitheraverageresponsetimeorinter-cachebandwidthusagebyachievingbetter(lower)valuesforeithermetric.OPT-UBisabound,notontheperformanceofaparticularcache,butontheaggregateperformanceofthecachehierarchy.LetOPT-UBbeapolicythatforarequestsequenceσ,exhibitsforeachcacheCi,hinumberofhits,whilerequiringnodemotionsorreloadsfromdisks,where,hi=hitOPT(σ,i󰀂j=1Fig.3.Amulti-levelsingle-pathcachehierarchy.A.QuestforOfﬂineOptimalityTheofﬂineoptimalperformance,whichisthebestpossibleperformancegivenfullknowledgeofthefuturerequests,isacriticalguideforcachealgorithmdevel-opment.Theofﬂineoptimalalgorithmforsingle-levelcachesistheclassicBelady’sMINorOPT[4]whichsimplyevictsthepagewiththefurthestre-referencetime.Forfourdecades,thequestforasimilaralgorithmformulti-levelcacheshasremainedunfulﬁlled.Hitratio,theperformancemetricthatservedussowellinsingle-levelcaches,losesitsﬁdelitytoperformanceinmulti-levelcaches,wherethebeneﬁtofahitdependsonwhichlevelitoccurredin(e.g.twohitsonahighercachecouldbebetterthanthreehitsonalowercache.)Theaveragereadresponsetimeisamuchbetterper-formancemetric,butiscomplicatedinthepresenceofdemotionsand/oreviction-basedreloadingfromdisks.Theinter-cachebandwidthusageisanotherimportantmetricsincemanyscenariosarebottle-neckedbynet-workresources[33].Itdoesappearthatdifferentpoli-cieswouldbeoptimaldependingonwhichperformancemetricweuse.Wenowproveuniversalboundsforthesingle-pathscenarioasdepictedinFigure3thatsimultaneouslyapplytoboththeaverageresponsetimeandinter-cachebandwidthmetrics.Theboundsapplytoallalgorithmsirrespectiveofwhetherdemotions,inclusion,orhintsareused.Later,weexplainextensionstocomputetheboundsforthemulti-pathscenariosaswell.CiSihititmσiσmCacheatleveli,i=1,2,...,nSizeofthecacheCiNumberofhitsatCiAvg.round-tripresponsetimefromCiAvg.round-tripresponsetimefromdisksSequenceofreadsseenbyCiSequenceofreads(misses)seenbydisksNumberofdemotionsfromCitoCi+1Sj)−hitOPT(σ,i−1󰀂j=1Sj)(3)Notethatweintendtocomputeatheoreticalupperboundwhichisnecessarilyachievable.LemmaII.1Nopolicycanhavemorehits,uptoanycachelevel,thanOPT-UB.Moreprecisely,OPT-UB󰀁kmaximizesi=1hk,∀k≤n.Proof:ByEqn.3theaggregatehitsforthesetofcachesC1,..,CkisaggrHitsk=k󰀂i=1hi=hitOPT(σ,k󰀂i=1Si)(4)BythedeﬁnitionofhitOPT,thisisthesameasthatobtainedbyBelady’sOPTonacacheoftheaggregatesize.SinceBelady’sOPTisknowntodeliverthemaximumpossiblehitratio,aggrHitskismaximizedforallk≤n.TheoremII.1Nopolicycanhavealowertotalinter-cachetrafﬁcthanOPT-UB.Proof:Inter-cachetrafﬁcisthesumofmissesanddemotionsbetweentwoadjacentcaches(byEqn.2).Since,OPT-UBisdeﬁnedtohavenodemotionsandmaximizestheaggregatehitsatalllevelsofthecache(LemmaII.1),nootherpolicycanhavelowerinter-cachetrafﬁcthanOPT-UB.TheoremII.2NopolicycanhavealoweraverageresponsetimethanOPT-UB.Dci→ci+1FromFigure3anddeﬁnitionsabove,thetotalre-sponsetimeistotalRespTime=n󰀂i=1hi·ti+|σm|·tm(1)52

FAST ’08: 6th USENIX Conference on File and Storage TechnologiesUSENIX Association

Proof:Weprovebycontradiction.Letabetterperformingpolicyachievealoweraverageresponsetime(equivalently,lowertotalresponsetime)foraworkloadthanOPT-UB,byprovidingh󰀁ihitsateach

󰀁

correspondingcacheCi,andmoverallmisses,ascomparedtohihitsandmmissesforOPT-UB.

nn󰀂󰀂

󰀁󰀁

∴hi·ti+m·tm⇒

i=1

n󰀂i=1

󰀁

h󰀁i·(ti−tm)+(m+n󰀂i=1

i=1

n󰀂i=1

h󰀁i)·tm

n󰀂i=1

⇒(a)⇒

n󰀂i=1

hi·(ti−tm)+(m+

n󰀂i=1

hi)·tm

h󰀁i·(tm−ti)>

hi·(tm−ti)

h󰀁i·(tn−ti+tm−tn)>

n󰀂i=1

hi·(tn−ti+tm−tn)

n󰀂

⇒

(b)

n󰀂i=1

h󰀁i

·(tn−ti)>+(

n󰀂i=1

hi·(tn−ti)

hi−

i=1

n󰀂

h󰀁i)·(tm−tn)hi·(tn−ti)hi·(tn−ti)

⇒(c)⇒

(d)

n󰀂i=1

n−1󰀂i=1

h󰀁i·(tn−ti)>h󰀁i

i=1

n󰀂i=1n−1󰀂i=1

lowerboundonoptimalmulti-levelcachingperfor-mance.Abetterperformingpolicywillhavetodemon-strateeitherloweraverageresponsetimeorlowerinter-cachebandwidthusage.OPT-LBisthebestknownoff-linemulti-levelcachingpolicythatweareawareof.

ThebasicideaistoapplyBelady’sOPTalgorithminacascadedfashion.Westartwiththehighestcachelevel,C1,andapplyOPTtotherequestsequenceσ1,assumingasinglecachescenario.WenotethemissesfromC1withtheirtimestampsandprepareanotherrequestsequence,σ2,forthelowercacheC2.WerepeatthesameprocessatC2,inturngeneratingσ3.Thisisperformedforeachlevelandwekeepacountofhitsobtainedateverylevel.

Inotherwords,hi=hitOPT(σi,Si)andσi+1=traceOPT(σi,Si),wheretraceOPTisatraceofthemisseswhenOPTisappliedtothegivenrequeststreamandcachesize.

Onceeachlevelhaslearneditsσi,allcachelevelscanoperatetogetherreplicatingtheresultinreal-time.Sincethiscanbedonepractically,thispolicybydeﬁnitionservesasthelowerboundfortheofﬂineoptimalalonganyperformancemetric.NotethatOPT-LBdoesnotre-quireanydemotionsorreloadsfromdisks.EventhoughOPT-LBdoesnotguaranteeexclusivityofcaches,weexperimentallyconﬁrmthatOPT-LBisindeedaverycloselowerboundforofﬂineoptimalsforbothaverageresponsetimeandinter-cachebandwidthusageinmulti-levelcaches.

D.BoundsforMulti-pathHierarchies

ItissimpletoextendOPT-UBandOPT-LBforanycomplexcachehierarchy.Whileitisfairlycommontoaccepttheuseoftracesformulti-clientcachingexperiments([33],[19]),theresultsareaccurateonlyincaseswheretherelativeorderofrequestsfromvariousclientscanbeassumedﬁxed.ThesameholdstrueforOPT-UBandOPT-LB,whicharevalidboundsformulti-pathhierarchiesifwecanassumethattherelativeorderofrequestsfromvariousclientsisﬁxed.Notethatthereisnosuchcaveatinsingleclientscenarios,wheretrace-basedanalysisisaccurate.

WeextendOPT-UBasfollows:westartwithde-terminingthemaximumhitratioobtainableateachcacheatthehighestlevelbyapplyingBelady’sOPT.Similarly,wedeterminethemaximumaggregatehitratioobtainableineachtwo-levelsubtreestartingatthehighestlevels.Wesubtractthehitratioofthehighestlevelcachestoobtainthehitratioforthesecond-levelcaches.Wedothisuntilwehavehitratiovaluesforallcachelevels,usingwhich,wearriveattheOPT-UBaverageresponsetimevalue.Thisisasimplegeneralizationofthesingle-pathapproach.

·(tn−ti)>

(a)followsasthesumofallhitsandmissesisthesameforbothpolicies(|σ1|)andthesecondtermonbothsidesoftheinequalitycanberemoved.Thesecondtermontherighthandsideof(b)isnon-negativebecausetm>tnandbyLemmaII.1,nopolicycanhavemoreaggregatehits(uptoanycachelevel)thanOPT-UB.(c)followsbyremovingthenon-negativesecondtermininequality(b).(d)followsasthenthterminthesummationiszero.Notethatbetweenstep(a)andstep(d),thesuperscriptofthesummationhasdroppedfromnton−1.Steps(a)through(d)canberepeateduntiln=2(as,foralli,ti>t(i−1)).Wewillthenarriveath󰀁1·(t2−t1)>h1·(t2−t1).Ast2>t1,itimpliesthath󰀁LemmaII.1,whichstates1>h1,whichcontradicts󰀁k

thatOPT-UBmaximizesi=1hk,∀k≤n(includingk=1).

C.OptimalLowerBound:OPT-LB

WenowintroduceaverysimpleandpracticalofﬂinealgorithmcalledOPT-LB,whichprovidesaveryclose

USENIX AssociationFAST ’08: 6th USENIX Conference on File and Storage Technologies

ADAPTINGPROBPROMOTE:

ﬂoathintFreq=0.05;

P−1Pk

ﬂoatsizeRatio=kS/ii=1i=1Si;(atlevelk)ﬂoatprobPromote=sizeRatio;structadaptHint{

//lifeofthecachetimetlife;

ﬂoatoccupancy;//fractionofcacheoccupiedbyT2

//-onlyrequiredbyPROMOTE-ARC

}higherHint;//hintfromhighercacheEverycache.life∗hintFreq

1:PrepareandsendadaptHinttolowercacheOnreceivingadaptHinth2:higherHint=h;

3:Everyalternatetime(toobservetheimpact

ofthepreviousadaptation):adjustIfNeeded();adjustIfNeeded()

4:staticﬂoatprev=0;

5:ﬂoatcurr=adaptRatio();/*algo-speciﬁc*/

6:ﬂoatf=(2∗curr−1);/*f=0isthetarget*/7:if((f>0&&8:prev−currsizeRatio)14:probPromote=sizeRatio;15:endif16:endif

17:prev=curr;

shouldPromoteUpwards()18:if(HighestCache||19:probPromote22:returntrue;

Fig.4.ThecommonmethodsusedbyPROMOTEtoadaptprobPromotewithineverycachebyleveragingthehintprotocol.

PROMOTE-LRU:

adaptRatio()/*returnavaluebetween0and1*/

23:returnhigherHint.life/(cache.life+higherHint.life)Onreceivingfromhighercache(readReqaddr)24:pagep=lookupCache(addr);25:if(p)/*hit*/26:promoteHint=shouldPromoteUpwards();27:if(promoteHint)28:removepagepfromcache29:endif30:sendtohighercache(p,promoteHint)31:else/*miss*/32:sendtolowercache(addr)33:endif

Onreceivingfromlowercache(pagep,boolpromoteHint)34:if(promoteHint)35:promoteHint=shouldPromoteUpwards();36:if(!promoteHint)37:createpagepincache38:endif39:endif

40:sendtohighercache(p,promoteHint)PROMOTE-ARC:

adaptRatio()/*returnsavaluebetween0and1*/

41:ﬂoathigher=higherHint.occupancy/higherHint.life;42:ﬂoatself=T2.occupancy/T2.life;43:returnself/(self+higher);

Onreceivingfromhighercache(readReqaddr,boolT2hint)44:pagep=lookupCache(addr);45:if(p||addrfoundinhistory)46:T2hint=true;47:removeaddrfromhistory(ifpresent)48:endif

49:if(p)/*hit*/50:promoteHint=shouldPromoteUpwards();51:if(promoteHint)52:removepagepfromcache53:endif54:sendtohighercache(p,promoteHint,T2hint)55:else/*miss*/56:sendtolowercache(addr,T2hint)57:endif

Onreceivingfromlowercache(pagep,

boolpromoteHint,boolT2hint)

58:if(promoteHint)59:promoteHint=shouldPromoteUpwards();60:if(!promoteHint)61:if(T2hint)62:createpagepinT263:else64:createpagepinT165:endif66:endif67:endif

68:sendtohighercache(p,promoteHint,T2hint)

Fig.5.ThePROMOTEaugmentationstoARCandLRUalgorithms.Theseenhancementsareapartfromtheregularmechanisms(notshown)inherentinthesealgorithms.

WeextendOPT-LBformulti-pathhierarchiesbymergingtraces(accordingtotimestamps)emergingfromhighercachesbeforeapplyingthemtothelowercache.Wepresentacoupleofillustrativeexamplestowardstheendofthepaper(Figures14and15).

III.THEPROMOTETECHNIQUE

ThegoalofthePROMOTEtechniqueistoprovideexclusivecachingthatperformsbetterthanDEMOTE,whileatthesametime,requiresnodemotions,reloadsfromdisks,oranycomputationallyintenseoperation.Eachcacheindependentlyandprobabilisticallydecideswhethertokeepthedataexclusivelyasitispassedalongtotheapplication.Theprobabilityofholdingthedataorpromotingthedataupwardsisadaptivelydetermined.

FAST ’08: 6th USENIX Conference on File and Storage TechnologiesUSENIX Association

WepresentthePROMOTEalgorithminFigures4and5.A.AchievingExclusivity

ThePROMOTEtechniqueensuresthatonlyonecacheclaimsownershipofapageonitswaytotheclient.Onsubsequentaccessestothepage,thepagecaneitherstayinthesamecacheorbepromotedupwards.AsinDEMOTE,onlywhenapageisaccessedviadifferentpathsinamulti-pathhierarchycanapageappearinmultiplelocations(whichisbetterthanenforcingabsoluteexclusivity).

B.UsingpromoteHintwithaREADReply

WhileDEMOTEusesanadditionalbandwidth-intensiveoperation(DEMOTE:similartoapageWRITE),PROMOTEusesabooleanbit,calledpromoteHint,thatispassedalongwitheveryREADreply,andhelpstodecidewhichcacheshouldownthepage.ThepromoteHintissettotruewhenaREADreplyisﬁrstformed(cachehitordiskread)andsettofalseintheREADreplywhensomecachedecidestokeep(own)it.AREADreplywithafalsepromoteHintimpliesthatalowercachehasalreadydecidedtoownthepageandthepageshouldnotbecachedatthehigherlevels.WhenacachereceivesaREADreplywithatruepromoteHint,thecachegetstodecidewhethertokeep(own)thepagelocallyor“promote”itupwards.AteachcachelevelPROMOTEmaintainsaseparateprobPromotevalue,whichistheprobabilitywithwhichthatcachewillpromoteapageupwards.Ifacachedecidestoownthepage(Lines18-22),itchangesthepromoteHinttofalseinthereplyandmaintainsacopyofthepagelocally.Inallothercases,aREADreplyissimplyforwardedtothehighercachewiththepromoteHintvalueunchanged,andanylocalcopyofthedataisdeletedfromthecache.ThehighestcachehasprobPromote=0,implyingthatitalwaysownsapagethatitreceiveswithatruepromoteHint.ApagereceivedwithafalsepromoteHint,implyingthatthepageisalreadyownedbyalowercache,ismerelydiscardeduponreturningittotheapplication.C.PromotingHitsUpwards

PagesthatincurrepeatedhitsinPROMOTEarehighlylikelytomigrateupwardsinthehierarchyastheyaresubjectedrepeatedlytotheprobPromoteprobabilityofmigratingupwards.Themorethenumberofhitsin-curredbyapagethemoreitcanclimbinthehierarchy.Themosthitpagessoonbegintoaccumulateinthetopmostlevel.

NotethatwhileDEMOTEusesinter-cachebandwidthforpagesontheirwayuptowardstheapplicationandalsotheirwaydown,PROMOTEsavesbandwidthbymovingpagesonlyinonedirection.

D.AdaptingprobPromote

EachcachemaintainsandadaptsaseparateprobPromotevalue.ThereaderwillappreciatethatthelowertheprobPromotevalueatacache,thelesseristherateatwhichnewpageswillenterthecachesaboveit.Thus,bychangingthevalueofprobPromote,alowercachecaninﬂuencetheinﬂuxrateofnewpagesatthehighercaches.ThefundamentalideainthePROMOTEtechniqueistousethisleveragetocreateasituationwherethe“usefulness”ofpagesevictedfromthevariouscachesareroughlythesame(ifpossible).ThisisdifferentfromDEMOTE,wherepagesevictedfromthehighercachearemoreuseful(andhenceneedtobedemoted)thanthepagesevictedfromthelowercache.

Tofacilitatetheadaptation,PROMOTErequiresaverysimpleinterfacetoperiodicallyreceiveadaptationinformationlikethecachelife(thetimestampdifferencebetweentheMRUandLRUpagesinthecache)ofthenexthighercache(s).Atregularintervals,eachcache,otherthanthelowest,sendsthehinttothenextlowercache(Lines1-3),usingwhich,theadaptRatioiscomputed(Lines23,41-43).ThegoalistoadaptprobPromoteinsuchawaythattheadaptRatioapproaches0.5(avaluethatimpliesthattheusefulnessofpagesevictedfromthehighercacheisthesameasofthosefromthelowercache).Ifthehighercachehasalargerlife(adaptRatio>0.5),probPromoteisincreased,elseitisdecreased.SincethereisalagbetweentheadaptationinprobPromoteanditsimpactontheadaptRatio,therecomputationofprobPromote(Lines4-17)isdoneonlyonalternatetimesthehintisreceived(Line3).Further,ifthepreviousadaptationofprobPromoteisfoundtohavereducedtheseparationofadaptRatioand0.5byareasonablefraction(Lines7-10),thennofurtheradaptationisdone.Toavoidanyrunawayadaptation,probPromoteneedstobecarefullyadaptedsothattheadaptationisproportionaltothedifferencebetweenadaptRatioand0.5andalsoisslowerwhenclosetoextremevaluesof0and1(Lines11-12).Tostartoffinafairfashion,probPromoteisinitializedaccordingtothesizesofthecaches.Sincethehighercachesusuallydemonstratehigherhitratesthanthelowercaches,weforbidprobPromotetogobeyondtheratiothusdetermined(Lines13-14).

LetusexaminesomeexamplesofPROMOTEinexistingcachemanagementpolicies:

1)PROMOTE-LRU:AsshowninFigure6,LRUisimplementedateachcachelevel,augmentedbythePROMOTEprotocol.ThedynamicadaptationofprobPromoteateachlevel,resultsinequalizingthecachelivesanditcanbeshownthatthecachehierarchyachievesahitratioequaltothatofasinglecacheoftheaggregatesize.Thesameistrue,ifinsteadofthe

USENIX AssociationFAST ’08: 6th USENIX Conference on File and Storage Technologies

Applications/TracesReplyReadApplications/TracesReplyReadT1T2T1 MRUT2 MRUC1B1LRUB2C1Reply +promoteHintorARCLRUorARCT1T2T1 MIDT2 MIDC2B1LRUB2C2Reply +promoteHintorARCLRUorARCT1T2C3B1LRUB2C3orARCLRUT1 LRUT2 LRU󰀁k|C|/ii=1i=1|Ci|.Insteadofthecachelife,thelife/occupancyoftheT2listispassedtolowercaches,whereoccupancyisthefractionofthecacheoccupiedbyT2.MerelyusingthecachelifeforT2listdidnotfarewell,comparedtoDEMOTE-ARC,inunlimitedbandwidthcases.TheprobPromotefortheT2listsisdynamicallyadaptedateachlevelsoastoequalizelife/occupancyacrossthehierarchy(Lines41-43).AnotherhintcalledtheT2hintisusedalongwithreadrequestsandrepliestoindicatethatthepageshouldbestoredinaT2listasithasbeenseenearlier(Line46).Ifanycachedecidestokeepthepage(Line59),itcreatesthepageinT2ifT2hintistrue;elseitcreatesitinT1.E.HandlingMulti-pathHierarchiesSincePROMOTEdoesnotsigniﬁcantlyalterthelocalcachingpolicy,extensionstomulti-pathhierarchiesisassimpleasrequiringthatthecacheswithmultipledirectlyconnectedhighercachesmaintainaseparateprobPromotevaluecorrespondingtoeachsuchhighercache,andthathintsbesenttoalldirectlyconnectedlowercaches.Itmaynotalwaysbepossibletoequalizethecachelivesormarginalutilities,however,merelyallowingtheadaptationtoattempttheequalizationresultsinbetterperformance.Ontheotherhand,itisdifﬁculttoconceiveamulti-pathversionofDEMOTEinmanycases(e.g.ARChierarchiesinFigures14and15).Hence,PROMOTEisnotonlyeasiertoapplytomulti-levelcachesthanDEMOTE,itisalsomorebroadlyapplicable.IV.EXPERIMENTALSET-UPA.TracesWeusebothsyntheticandreal-lifescenariotracesthathavebeenwidelyusedforevaluatingcachingalgorithms.P1-P14:Thesetraces[25],[17]werecollectedoverseveralmonthsfromworkstationsrunningWindowsNTbyusingVTrace[23].Financial1andFinancial2:Thesetraces[16]werecollectedbymonitoringrequeststodisksofOLTPapplicationsattwolargeﬁnancialinstitutions.SPC1:Weuseatrace(asseenbyasubsetofdisks)whenservicingtheSPC-1benchmark[24].Itcombinesthreeworkloadsthatfollowpurelyrandom,sequential,andhierarchicalreuseaccessmodels.Thissyntheticworkloadhasbeenwidelyusedforevaluatingcachealgorithms[25],[14],[16],[3].ZipfLike:WeuseasynthetictracethatfollowsaZipf-like[37]distribution,wheretheprobabilityoftheithpagebeingreferencedisproportionalto1/iα(α=0.75,over400Kblocks).Thisapproximatescommon󰀁k−1DemotedpagesReadReadReadReadDemotedpagesReadorARCB1B2ReadReplyDisksPROMOTE LRU or ARCDisksDEMOTE LRU or ARCFig.6.CommunicationprotocolsforthePROMOTEtechnique(leftpanel)andtheDEMOTEtechnique(rightpanel).Althoughshowninthesamediagram,eitherLRUorARCdata-structuresareusedconsistentlyacrossalllevels.cacheliveswechoosetoequalizethemarginalutilityofthecaches.ThemarginalutilitycanbecomputedbymeasuringthehitrateonaﬁxednumberofpagesintheLRUendsofthecaches[14].Toavoidtheextracomplexity,wedonotusethemarginalutilityapproachinthispaper.2)PROMOTE-ARC:First,wesummarizetheARCalgorithm[25]:ARCmaintainscpagesinthecacheandchistorypages.LRUlistT1containspagesthathavebeenseenonlyoncerecently,andT2containsthepagesthathavebeenseenatleasttwicerecently.TheaddressesofthepagesrecentlyevictedfromT1andT2arestoredinB1andB2,respectively.|T1|+|T2|≤cisenforced,whiletherelativesizesofthelistsareadaptedaccordingtotherelativeratesofhitsinthecorrespondinghistorylistsB1andB2.Forsimplicity,considerasingle-pathhierarchyofARCcaches(Figure6).Theoretically,thecachescouldpassthemarginalutilityoftheT1andT2liststolowercacheswhichcoulddynamicallyadaptprobPromoteateachcacheleveltoequalizetheutilityacrossthehierarchy.However,itcanbechallengingtoadaptprobPromoteinastablewayforbothT1andT2listsateachlevel.WefoundasimplepolicythatworksverywellforARC.FortrafﬁcdestinedforT1lists,probPromoteateachcacheCkissettoaﬁxedvalue56

FAST ’08: 6th USENIX Conference on File and Storage Technologies

ReplyReplyHintReplyHintUSENIX Association

accesspatterns,suchasinwebtrafﬁc[9],[5].Multi-levelcachingalgorithms[33],[12]haveemployedthistraceforevaluations.Sincewritecachemanagementpoliciesneedtolever-agebothtemporalandspatiallocality(see[15]),thewritecacheistypicallymanagedusingapolicydistinctfromthereadcache.Followingpreviouswork[33],[12],wefocusonthereadcomponentofcachesandchoosetoignorethewritesforsimplicity.IncludingthewriteswouldonlyturnthecomparisonsmoreinfavorofPROMOTEastheywouldincreasecontentionforthediskandnetworkresources,ascenarioinwhichPROMOTEeasilyoutshinesDEMOTE.Eachtraceislimitedtotheﬁrsttwomillionreadstoshortentheexperimentdurations.B.SoftwareSetupDemotedPagesReadRequestRead Reply+ T2hint (T)+ promoteHint bit (H)DEMOTEinterface:UsedonlybytheDEMOTEtechniquetotransferevictedpagestothenextlowercache.•Hintinterface:UsedbythePROMOTEtechniquetoperiodicallytransferlifeandoccupancyinfor-mationtothenextlowercache.CacheSimsimulatesthefollowingrealisticroundtripresponsetimesforstoragesystemhierarchies(basedonourknowledge):Fortwo-levelscenarios:t1=0.5ms,t2=1.0ms,tm=5.0.Forthree-levelscenarios:t1=0.5ms,t2=1.0ms,t3=2.0ms,tm=10.0ms.Theresultsinthispaperareapplicableforanysetofvalueswhereti57

TwoCacheHierarchyontraceP1P1: C1-C2 Traffic 4e+06OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 90 80OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRUP1: Hit RatioTotal Inter-cache Traffic (512 byte blocks) 3.5e+06 3e+06 2.5e+06 2e+06 1.5e+06 1e+06 5000000KAggregate Hit Ratio (%)90K100K 70 60 50 40 30 20 100K10K20K30K40K50K60K70K80K10K20K30K40K50K60K70K80K90K100KSize of Each Cache (512 byte blocks)P1: Avg. Response Time (Unlimited Bandwidth) 4.5 4OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 6.5 6Size of Each Cache (512 byte blocks)P1 Avg. Response Time with Limited BandwidthOPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRUAverage Response Time (ms)Average Response Time (ms)90K100K 5.5 5 4.5 4 3.5 3 2.5 2 1.5 3.5 3 2.5 2 1.5 10K10K20K30K40K50K60K70K80KSize of Each Cache (512 byte blocks) 10K10K20K30K40K50K60K70K80KSize of Each Cache (512 byte blocks)90K100KFig.8.Onthex-axisisthesizeoftheC1andC2caches.Weplottheinter-cachetrafﬁc(top-left),theaggregatehitratio(top-right),andtheaverageresponsetimeallowingunlimitedbandwidthandfreedemotions(bottom-left),asafunctionofthecachingalgorithmandcachesize.Inalimitedbandwidthscenario(300blockspersecond:1.5timesthatrequirediftherewerenohitsordemotions),PROMOTEoutperformsDEMOTEsigniﬁcantly(bottom-right).bandwidthusagethanonthenumberorlocationofhits.Hence,wemeasureboththeinter-cachebandwidthusageandtheaverageresponsetime(assumingunlim-itedbandwidth).Theactualbandwidthlimitdependsonthehardwareandthenumberandintensityofotherapplicationsusingthesamenetworkfabric.Weprovidemeasurementsforsomelimitedbandwidthcasesaswell.Inourexperimentalresults,errorbarsarenotshownsincetheaverageresponsetimesoverseparaterunswaswithin1%,evenfortheadaptivealgorithms.V.RESULTSA.VeryCloseBoundsonOfﬂineOptimalPerformance:OPT-UBandOPT-LBWecomputedtheaverageresponsetimesforOPT-UBandOPT-LBforawiderangeoftraces(SectionIV-A)andcachesizes(Figures8through13),andfoundthatonanaveragetheboundsranwithin2.18%and2.83%ofeachotherfortwoandthreelevelsingle-pathhierarchies,respectively.Themaximumseparationbetweenboundsforanytraceandcachecombinationwasonly8.6%and10.0%forthetwoandthreelevelcaches,respectively.Intermsofinter-cachebandwidthusage,OPT-LBisoptimalandcoincideswithOPT-UBfortheC1-C2trafﬁc.ThisisbecauseOPT-LBdoesnotuseanydemotionsandachievesthemaximumpossiblehitsintheC1cache(asgivenbyBelady’sOPT).ForC2-C3trafﬁc,theboundsrun,onanaverage,within3.4%ofeachother.OPT-LBisnotoptimalfortheC2-C3trafﬁcbecauseitdoesnotusedemotionsbetweentheC1andC2whichcouldhavepotentiallyreducedthenumberofmissesﬂowingoutofC2.Inamorecomplexmulti-pathscenarioshowninFigure14(andFigure15),theboundsranabout8.5%(and10.8%)ofeachotherintermsoftheaverageresponsetime,andcoincidedintermsofinter-cachetrafﬁc.Webelievethattheclosenessoftheseboundsinpracticeandthefactthattheyaresigniﬁcantlysuperiortothecurrentstate-of-the-artmulti-levelcachingalgo-rithms(Figures8through13)constituteanextremelysigniﬁcantresult,andprovideanimportantmissingperspectivetotheﬁeldofmulti-levelcaching.B.TwoCacheHierarchyInFigure8,wepresentdetailedresultsforaﬁrsttrace,P1,inatwocache(samesize)hierarchy.Weob-servedsimilarresultswithalltracesgiveninSectionIV-58

FAST ’08: 6th USENIX Conference on File and Storage TechnologiesUSENIX Association

TwoCacheHierarchyontracesP3,P5,andP7P3: C1-C2 Traffic 4e+06OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 5 4.5P3: Avg. Response Time (Unlimited Bandwidth)OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRUTotal Inter-cache Traffic (512 byte blocks) 3.5e+06 3e+06 2.5e+06 2e+06 1.5e+06 1e+06 5000000KAverage Response Time (ms)90K100K 4 3.5 3 2.5 2 1.50K10K20K30K40K50K60K70K80K10K20K30K40K50K60K70K80K90K100KSize of Each Cache (512 byte blocks)P5: C1-C2 Traffic 4e+06OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 5 4.5Size of Each Cache (512 byte blocks)P5: Avg. Response Time (Unlimited Bandwidth)OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRUTotal Inter-cache Traffic (512 byte blocks) 3.5e+06 3e+06 2.5e+06 2e+06 1.5e+06 1e+06 5000000KAverage Response Time (ms)90K100K 4 3.5 3 2.5 2 1.50K10K20K30K40K50K60K70K80KSize of Each Cache (512 byte blocks)P7: C1-C2 Traffic10K20K30K40K50K60K70K80KSize of Each Cache (512 byte blocks)90K100KP7: Avg. Response Time (Unlimited Bandwidth)OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 5 4.5OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 4e+06Total Inter-cache Traffic (512 byte blocks) 3.5e+06 3e+06 2.5e+06 2e+06 1.5e+06 1e+06 5000000KAverage Response Time (ms)90K100K 4 3.5 3 2.5 2 1.50K10K20K30K40K50K60K70K80K10K20K30K40K50K60K70K80K90K100KSize of Each Cache (512 byte blocks)Size of Each Cache (512 byte blocks)Fig.9.Onthex-axisisthesizeoftheC1andC2caches.Weplottheinter-cachetrafﬁc(left),andtheaverageresponsetimeallowingunlimitedbandwidthandfreedemotions(right),asafunctionofthecachingalgorithmandcachesize.A,theresultsforsomeofwhichareshowninFigures9and10.Inter-cacheBandwidthUsage:PROMOTE2xmoreefﬁcientthanDEMOTE.InFigure8-10,weplotthetotaltrafﬁcbetweentheC1andC2caches(demotions+C1misses).Observethatasthecachesizesgrow,theinter-cachetrafﬁcdecreasesasC1producesmorehits.ForbothLRUandARCvariants,DEMOTEgeneratesmorethandoublethetrafﬁcgeneratedbyPROMOTE.ThisisbecauseDEMOTEcausesademotionforeveryC1miss(afterC1isfull),andalsoincursmoremissesinC1thanPROMOTE.Thisistrueforalltracesandcachesizes,where,onanaverage,DEMOTErequires101%moreinter-cachebandwidththanPROMOTEfortheLRUvariant,andabout121%morefortheARCvariant.AggregateHitRatio:PROMOTEsameasDEMOTE.InFigure8,weobservethatbothPROMOTE-LRUandPROMOTE-ARCachievealmostthesameaggregatehitratioastheirDEMOTEcounterparts.Thiswasobservedforalltracesandcachesizes.WealsoconﬁrmthatplainLRUachievesthelowestaggregatehitratioastheinclusivenatureofthelowercacheresultsinveryfewhits.Pleasenote,however,thattheaggregatehitratioisnotareliableperformancemetric.HitsintheHighestCache:PROMOTEbeatsDEMOTE.Forthesameaggregatehitratio,ahighernumberUSENIX AssociationFAST ’08: 6th USENIX Conference on File and Storage Technologies

TwoCacheHierarchyontracesFinancial2,SPC1,andZipf-likeFinancial2: C1-C2 Traffic 3e+06OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 3.5Financial2: Avg. Response Time (Unlimited Bandwidth)OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRUTotal Inter-cache Traffic (512 byte blocks) 2.5e+06 2e+06Average Response Time (ms)90K100K 3 2.5 1.5e+06 2 1e+06 500000 1.5 00K10K20K30K40K50K60K70K80KSize of Each Cache (512 byte blocks)spc1: C1-C2 Traffic 10K10K20K30K40K50K60K70K80KSize of Each Cache (512 byte blocks)90K100Kspc1: Avg. Response Time (Unlimited Bandwidth)OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 5 4.5OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 4e+06Total Inter-cache Traffic (512 byte blocks) 3.5e+06 3e+06 2.5e+06 2e+06 1.5e+06 1e+06 5000000KAverage Response Time (ms)90K100K 4 3.5 3 2.5 2 1.5 10K10K20K30K40K50K60K70K80K10K20K30K40K50K60K70K80K90K100KSize of Each Cache (512 byte blocks)zipf: C1-C2 Traffic 4e+06OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 4.5Size of Each Cache (512 byte blocks)zipf: Avg. Response Time (Unlimited Bandwidth)OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRUTotal Inter-cache Traffic (512 byte blocks) 3.5e+06 3e+06 2.5e+06 2e+06 1.5e+06 1e+06 5000000K 4Average Response Time (ms)90K100K 3.5 3 2.5 210K20K30K40K50K60K70K80K 1.50K10K20K30K40K50K60K70K80K90K100KSize of Each Cache (512 byte blocks)Size of Each Cache (512 byte blocks)Fig.10.Onthex-axisisthesizeoftheC1andC2caches.Weplottheinter-cachetrafﬁc(left),andtheaverageresponsetimeallowingunlimitedbandwidthandfreedemotions(right),asafunctionofthecachingalgorithmandcachesize.PROMOTE-ARCDEMOTE-ARCPROMOTE-LRUDEMOTE-LRUP1709476408174653880590276P3546440333751446803140384P5377332280481210473172625P7456983263244222156125960P9570250469771511271396135P11692042540978639430667633Financial1532270506503644893688946Financial2120424394700914101011488752spc1533414390810291183155197zipf845104365909791667760877TABLEI.NumberofC1hits(outof2000000requests),atacachesizeof50Kblocks.WhileaggregatehitswerealmostthesameforbothPROMOTEandDEMOTE,weobservedthatPROMOTEaccumulatesmorehitsinC1.60

FAST ’08: 6th USENIX Conference on File and Storage TechnologiesUSENIX Association

VaryingRelativeCacheSizesinaTwoCacheHierarchyP1: C1-C2 Traffic 4e+06OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 3.6 3.4P1: Avg. Response Time (Unlimited Bandwidth)OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRUTotal Inter-cache Traffic (512 byte blocks) 3.5e+06 3e+06 2.5e+06 2e+06 1.5e+06 1e+06 50000010KAverage Response Time (ms)90K 3.2 3 2.8 2.6 2.4 2.2 220K30K40K50K60K70K80K 1.810K20K30K40K50K60K70K80K90KSize of C1 (Higher) Cache (512 byte blocks)Size of C1 (Higher) Cache (512 byte blocks)Fig.11.Onthex-axisisthesizeoftheC1(higher)cache.|C1|+|C2|=100Kblocks.ofhitsinthehighestcacheleadstobetteraverageresponsetimes.InTableI,wecomparethenumberofhitsinC1forawiderangeoftraceswithtwolevelsofcacheof50Kblockseach.Whileallthetracesexhibitedsimilarbehavior,weskipsometracesinthetabletokeepitsmall.WeobservethatPROMOTE-LRUbeatsDEMOTE-LRUby13.0%onanaverage,andPROMOTE-ARCbeatsitsDEMOTEcontenderby37.5%.ThisisprimarilybecausePROMOTEprobabilis-ticallypromoteshighreusepagestothehighercache,whileDEMOTEforcesallpagestoﬂowthroughthehighercache,pushingoutagreatnumberofhitstothelowercachelevels.AverageResponseTime:PROMOTEbeatsDEMOTE.InFigure8weplottheaverageresponsetimeforthetraceP1atvariouscachesizes.Whenweassumeanunlimitedinter-cachebandwidthandfreedemotions(althoughDEMOTEhasaroughmodeltoattributefordemotioncosts,wecreatethecasemostfavorabletoDEMOTE),PROMOTE-LRUstillbeatsDEMOTE-LRUbyupto4%andPROMOTE-ARCbeatsDEMOTE-ARCbyupto5%acrossallcachesizes.Forallothertraces(someshowninFigures9and10)PROMOTEachieves0.3%(LRU)and1.5%(ARC)betterresponsetimesonanaverage.InthelowerrightpanelofFigure8,weexaminealimitedbandwidthcase,whichismorerealistic.Weallow300blockspersecond,whichis1.5xthatrequirediftherewerenohitsordemotions(1/tm=200).Whenweaveragetheresponsetimeacrossallcachesizes,weobservethatPROMOTEsubstantiallyoutperformsDEMOTEbyachievinglowerresponsetimes,3.21ms(forARC)and3.42ms(forLRU),ascomparedtoDEMOTEwith5.43ms(forARC)and5.05ms(forLRU),respectively.Infact,bothDEMOTE-LRUandDEMOTE-ARCconsistentlyperformworsethanevenplainLRU.Surprisingly,forsmallercachesizes,DE-MOTEvariantsperformevenworsethannocachingatall(i.e.worsethantm=5ms)!Thishappensbecause,whenbandwidthisthebottleneckandweuseDEMOTE,thebandwidthsavedduetoonehitincacheC1isconsumedbyonedemotionduetoamiss.WhenthenumberofmissesisgreaterthanthenumberofhitsinthecacheC1(<50%hitratio),ano-cachingpolicywillactuallyperformbetter.SinceDEMOTEisclearlyworseinthelimitedband-widthcase,weconsistentlyassumeunlimitedinter-cachebandwidthandfreedemotionsfortheremainingtracesshowninFigures9and10.C.DifferingCacheSizesInFigure11,wevarytherelativesizeoftheC1andC2cachesfrom1:9to9:1whilekeepingtheaggregatecachesizeequalto100Kblocks.Wepresentaverageresponsetimeandinter-cachetrafﬁc(assumingunlimitedbandwidthandfreedemotions)forthetraceP1(othertraceshavesimilarresults).WeobservethatPROMOTEvariantshaveconsistentlybetterresponsetimesthantheDEMOTEvariantsacrosstheentirespectrumofrelativesizes.TheaverageresponsetimeforplainLRUpeaks(implyingthatthehitratioisthelowest)whenthetwocachesareofthesamesize.Thisconﬁrmsthatthemostduplicationofdatahappenswhenthecachesareofcomparablesizes.Foralltheotheralgorithms,theaverageresponsetimedecreasesasthesizeoftheC1cacheincreasesasmorehitsoccuratthehighestcache.Asbefore,weobservethattheDEMOTEvariantsinvariablyconsume2xbandwidthwhencomparedtothePROMOTEvariants.D.VaryingInter-cacheBandwidthWeconsideraclientusing50Kblockseachintwolevelsofcache,andhavinganetworkimpactofupto500KBps.InatypicalenterpriseSANenviron-ment,therearethousandsofsuchconcurrentappli-cationthreads,scalingtheneedforbothcacheandUSENIX AssociationFAST ’08: 6th USENIX Conference on File and Storage Technologies

P1 Avg. Response Time with Varying Available Bandwidth 10OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRUAverage Response Time (ms) 8 6 4 2 0 0 50 100 150 200 250 300 350 400 450 500Available inter-cache bandwidth (KBps)Fig.12.Inatwo-cachehierarchywith50Kblockseach,wevarytheinter-cachebandwidthavailabletoaclientonthex-axis.Onthey-axisweplottheaverageresponsetimeinmilliseconds.networkresourcesbymanyordersofmagnitude.Infact,evenwithoutdemotions,manySANenvironmentsareregularlyfoundbandwidth-bound(particularlywithsequentialreads),implying,asweobservebelow,thatimplementingDEMOTEmightbedetrimentaltoperfor-mance.InFigure12,weplottheaverageresponsetimeasafunctionoftheinter-cachebandwidthavailabletotheclient.Foreachalgorithm,wenoticetworegimes,abandwidth-sensitiveregimeontheleftsidewheredecreasingthebandwidthavailableincreasestheav-erageresponsetime,andabandwidth-insensitive,ﬂatregime,ontheright.Asexpected,OPT-UB,closelyfollowedbyOPT-LB,performsthebest,withthelowestaverageresponsetimeandtheleastinter-cachebandwidthrequirement(asindicatedbythelongﬂatportionontheright).Notethat,theDEMOTEvariantsperformevenworsethanplainLRUwhenbandwidthisnotabundant.SANenvironmentswhichcannotaccom-modatethe2xnetworkcostofDEMOTEoverLRUwillseethisbehavior.ThisfundamentalconcernhaslimitedthedeploymentofDEMOTEalgorithmsincommercialsystems.ThePROMOTEvariantsaresigniﬁcantlybetterthantheDEMOTEvariantswhenbandwidthislimited,whiletheyoutperformLRUhandsomelywhenbandwidthisabundant.Asbandwidthisreduced,LRUbecomesonlymarginallyworsethanPROMOTEbecausethebeneﬁtofmorehitsinthelowercache,C2,isnolongerfeltasthebandwidthbetweenthecachesbecomesthebottleneck.Overall,PROMOTEperformsthebestinallbandwidthregimes.E.ThreeCacheHierarchyIncreasingthecomplexityofthehierarchieswestudy,wenowturntoathree-level(threeequalsizecaches)hierarchy.Asinthetwo-levelcase,wepresentdetailedresultsfortheﬁrsttraceP1inFigure13.Theothertraceshadsimilarresultsbutwedonotpresentplotsforlackofspace.Weobservethatforthewidevarietyoftracesandcachesizes,PROMOTEoutperformsDEMOTEinthree-levelcachesaswell:Inter-cacheBandwidthUsage:PROMOTEis2xmoreefﬁcientthanDEMOTEwhichuses105%(111%resp.)morebandwidthbetweenC1andC2and98%(113%resp.)morebandwidthbetweenC2andC3,whencomparedtoPROMOTE-LRU(PROMOTE-ARC,respectively).AggregateHitRatio:PROMOTEsameasDEMOTE.HitsintheHighestCache:PROMOTEachieves1.5%and10%morehitsinthetoptwocachesthanDEMOTEfortheLRUandARCvariants,respectively.AverageResponseTime:Whenbandwidthisnotlimitedanddemotionsarefree,PROMOTEbeatsDE-MOTEby0.2%and1.3%ontheaverageresponsetimeforLRUandARCvariants,respectively.Foralimitedbandwidthcase,whereweallow200blockspersecond(2xtimes1/tm=100),Whenweaveragetheresponsetimeacrossallcachesizes,weobservethatPROMOTEsubstantiallyoutperformsDEMOTEbyachievinglowerresponsetimes,5.61ms(forARC)and5.93ms(forLRU),ascomparedtoDEMOTEwith8.04ms(forARC)and7.57ms(forLRU),respectively.F.MoreComplexCacheHierarchiesPROMOTEcanbeappliedtocomplexhierarchies.Asthepossibleconﬁgurationsareendless,wepicktwosimpleandyetinterestingconﬁgurationsforourexperiments.1)Tree-likeHierarchy:Weuseahierarchyofthreecaches,C1a(40Kblocks)andC1b(30Kblocks)attheﬁrstlevel,andasharedcacheC2(50Kblocks)atthesecond-level(seeFigure14).WhileC1aservesoneofP2,P3,P4orP5traces,C1bservestheP1trace.WeimposenobandwidthrestrictionsandassumefreedemotionsfortheDEMOTEalgorithm.Weobservethatforallfourcombinations,thePROMOTE-LRUhasequalorbetteraverageresponsetimewhilegeneratingonlyhalftheinter-cachetrafﬁcthanDEMOTE-LRU.DE-MOTEcannotbeappliedtotree-likeARChierarchies,allowingPROMOTE-ARCtowinbydefault.ThisisbecauseDEMOTEsimulatesaglobalARCalgorithmwhichadaptstheratiooftheglobalARClists,T1andT2.Insingle-pathscenarios,thesameratioofthetwoARClists,|T1|:|T2|,canbeenforcedatalllevels.However,inmulti-pathscenarios,thisisnotalwayspossible,astheamountofT2pagesinacachedependsontheworkloaditsees,andtheratiodeterminedbyARCmaynotbeenforceable.62

FAST ’08: 6th USENIX Conference on File and Storage TechnologiesUSENIX Association

P1: C1-C2 Traffic 4e+06OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 90 80OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRUP1: Hit RatioTotal Inter-cache Traffic (512 byte blocks) 3.5e+06 3e+06 2.5e+06 2e+06 1.5e+06 1e+06 5000000KAggregate Hit Ratio (%)90K100K 70 60 50 40 30 20 100K10K20K30K40K50K60K70K80KSize of Each Cache (512 byte blocks)10K20K30K40K50K60K70K80KSize of Each Cache (512 byte blocks)90K100KP1: Avg. Response Time (Unlimited Bandwidth) 9 8OPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRU 9 8P1 Avg. Response Time with Limited BandwidthOPT-UBOPT-LBPromote-ARCDemote-ARCPromote-LRUDemote-LRULRUAverage Response Time (ms) 7 6 5 4 3 20KAverage Response Time (ms)90K100K 7 6 5 4 3 20K10K20K30K40K50K60K70K80K10K20K30K40K50K60K70K80K90K100KSize of Each Cache (512 byte blocks)Size of Each Cache (512 byte blocks)Fig.13.Onthex-axisisthesizeofeachcache:C1,C2,andC3.WepresentthetotaltrafﬁcbetweenC1andC2(top-left),theaggregatehitratio(top-right),andtheaverageresponsetimeallowingunlimitedbandwidthandfreedemotions(bottom-left)forarangeofper-levelcachesizes.Inalimitedbandwidthscenario(200blockspersecond:2timesthatrequirediftherewerenohitsordemotions),PROMOTEoutperformsDEMOTEsigniﬁcantly(bottom-right).P2 (or P3,P4,P5) P1C1a40K blockst(c1a) = 0.5ms P2’C1b30K blocksP1’t(c1b) = 0.5msP1+P2DEMOTE-LRUPROMOTE-LRUPROMOTE-ARCOPT-LBOPT-UBDEMOTE-LRUPROMOTE-LRUPROMOTE-ARCOPT-LBOPT-UBDEMOTE-LRUPROMOTE-LRUPROMOTE-ARCOPT-LBOPT-UBDEMOTE-LRUPROMOTE-LRUPROMOTE-ARCOPT-LBOPT-UBHitRatioContribution=hits/4000000C1aC1bC2Aggr0.140.100.120.360.140.120.100.360.160.140.130.430.270.240.100.610.270.240.120.630.020.040.100.200.200.020.010.030.090.090.030.040.090.170.170.100.110.140.240.240.100.110.140.240.240.100.130.140.240.240.100.080.100.120.140.060.060.100.080.110.080.050.090.100.120.230.230.340.560.590.180.180.270.410.440.210.220.320.510.53Traf.(MBlocks)C1a-C1b-C2C22.833.161.461.541.361.430.901.040.901.043.771.831.591.211.213.821.941.871.661.663.681.861.661.331.333.161.551.451.041.043.161.551.451.041.043.161.551.431.041.04Avg.Resp.Time3.453.453.132.292.124.014.013.512.532.294.234.223.853.202.984.084.063.602.742.53t(c2) = 1.0msC250K blocksP1+P3t(m) = 5.0msP1+P4h1a (OPTUB) = h1a (OPTLB) = hitOpt(P2, 40K)h1b (OPTUB) = h1b (OPTLB) = hitOpt(P1, 30K)h2 (OPTLB) = hitOpt(P1’+P2’, 50K)h2 (OPTUB) = hitOpt(P1+P2, 120K) − (h1a + h1b)P1+P5Fig.14.Atree-likemulti-pathcachehierarchy.Tracesdonotoverlap(2millionreadseach).Foreaseofcomparisonbetweencaches,theindividualhitratiocontributionsarenormalizedbasedonthetotalnumberofreadsinthecachehierarchy.USENIX AssociationFAST ’08: 6th USENIX Conference on File and Storage Technologies

P1 + P2 (or P3,P4,P5)t(c1) = 0.5ms P2’t(c2a) = 1.0msC2a40K blocksC150K blocksP1’t(c2b) = 1.0msC2b30K blocksP1+P2DEMOTE-LRUPROMOTE-LRUPROMOTE-ARCOPT-LBOPT-UBDEMOTE-LRUPROMOTE-LRUPROMOTE-ARCOPT-LBOPT-UBDEMOTE-LRUPROMOTE-LRUPROMOTE-ARCOPT-LBOPT-UBDEMOTE-LRUPROMOTE-LRUPROMOTE-ARCOPT-LBOPT-UBHitRatioContribution=hits/4000000C1C2aC2bAggr0.190.100.060.350.180.110.060.350.230.130.070.430.460.060.080.600.460.000.000.630.110.150.200.400.400.130.110.170.320.320.120.130.180.360.360.080.040.080.060.000.010.020.030.050.000.040.030.070.060.000.070.050.060.100.000.060.060.060.030.000.070.060.070.090.000.250.240.350.560.590.200.190.260.410.440.220.220.310.510.53Traf.(MBlocks)C1-C1-C2aC2b3.173.271.731.541.621.461.051.091.051.093.831.871.781.331.333.871.961.961.751.753.751.911.861.491.493.251.551.401.081.083.041.581.370.950.953.271.581.441.061.06Avg.Resp.Time3.503.493.182.352.103.933.973.522.572.274.154.203.883.212.984.074.073.662.772.51P1+P3t(m) = 5.0msP1+P4h1 (OPTUB) = h1 (OPTLB) = hitOpt(P1+P2, 50K)h2a (OPTLB) = hitOpt(P2’, 40K)h2b (OPTLB) = hitOpt(P1’, 30K)h2a+h2b (OPTUB) = hitOpt(P1+P2, 120K) − h1P1+P5Fig.15.Aninvertedtree-likesingle-pathcachehierarchy.TwotracesaremergedbeforepresentingtocacheC1.Foreaseofcomparisonbetweencaches,theindividualhitratiocontributionsarenormalizedbasedonthetotalnumberofreadsinthecachehierarchy.InFigure14,wealsoshowthestepsusedtocomputeOPT-UBandOPT-LBinaccordancetoSectionII-D.Asusual,weobservethatOPT-LBandOPT-UBprovideclosebounds(8.5%apart)ontheoptimalperformanceforthegivenhierarchy.2)InvertedTree-likeHierarchy:WeinvertthecachehierarchyusedaboveasshowninFigure15.AttheﬁrstlevelwehaveasinglecacheC1a(50Kblocks)andatthesecondlevelwehaveC2a(40Kblocks)andC2b(30Kblocks).ThetraceP1accessesdatathroughtheC1,C2ahierarchy,whilethetracesP2,P3,P4orP5areservedthroughtheC1,C2bhierarchy.WeagainnoticethatPROMOTE-LRUperformswithin1%oftheDEMOTEvariantintermsofresponsetime,andistwiceasefﬁcientintermsofbandwidthusage.PROMOTE-ARCperformsmuchbetterthantheLRUbasedalgo-rithmsasexpected.DEMOTEcannotgeneralizeARCforthishierarchyandthusisnotacontender.AgainweobservethatOPT-LBandOPT-UBprovideclosebounds(10.8%apart)ontheoptimalperformanceforthegivenhierarchy.VI.CONCLUSIONSAslargecachesarebecomingubiquitous,multi-levelcachingisemergingasanimportantﬁeldforinnovation.Inthispaperwehavemadetwomajorcontributions.Wehavedemonstratedasimpleandpowerfultech-nique,calledPROMOTE,whichissigniﬁcantlysupe-riortothepopularDEMOTEtechnique.Forhalfthebandwidth,PROMOTEprovidessimilaraggregatehitratiosforavarietyofworkloads,withmorehitsinthetopmostcache.Thereductioninbandwidthusageprovideshugeimprovementsinaverageresponsetimeswhennetworkresourcesarenotabundant.Eveninconstrainedbandwidthcases,unlikeDEMOTE,PRO-MOTEalwaysperformsbetterthananon-exclusivehierarchyofcaches.Thischaracteristicisessentialforimplementationincommercialsystemswherenetworkusagebehaviorcannotbepredicted.WeanticipatetheprinciplesinthePROMOTEtechniquetoengendermoresophisticatedmulti-levelcachingpoliciesinthefuture.Whileimprovingcachingalgorithmsisimportant,knowingthetheoreticalboundsonperformanceisex-tremelyinvaluable.Wehaveprovidedthismuchneededknowledgeintheformoftwoverycloseboundsontheoptimalperformance.OPT-UBdelimitsthebestpossibleresponsetimeandbandwidthusageforanymulti-levelcachingpolicy,while,OPT-LBservesasthebestknownoff-linemulti-levelcachingpolicythatweareawareof.Wehopethatthesenewboundswillspurandguidefutureresearchinthisﬁeld.VII.ACKNOWLEDGMENTSWeareindebtedtotheanonymousreviewersofthepaperfortheirinsightfulcomments.Wearealsograte-fultoDr.TheodoreWong,ourshepherd,fordetailedcommentsandsuggestionsthatgreatlyimprovedthereadabilityofthepaper.REFERENCES[1]J.-L.BaerandW.-H.Wang.Ontheinclusionpropertiesformulti-levelcachehierarchies.InISCA’88:Proceedingsofthe15thAnnualInternationalSymposiumonComputerarchitecture,pages73–80,LosAlamitos,CA,USA,1988.IEEEComputerSocietyPress.[2]LakshmiN.Bairavasundaram,MuthianSivathanu,AndreaC.Arpaci-Dusseau,andRemziH.Arpaci-Dusseau.X-RAY:ANon-InvasiveExclusiveCachingMechanismforRAIDs.InProceedingsofthe31stAnnualInternationalSymposiumonComputerArchitecture(ISCA’04),Munich,Germany,June2004.64

FAST ’08: 6th USENIX Conference on File and Storage TechnologiesUSENIX Association

[3]S.BansalandD.Modha.Car:Clockwithadaptivereplacement,

2004.

[4]L.A.Belady.Astudyofreplacementalgorithmsforvirtual

storagecomputers.IBMSys.J.,5(2):78–101,1966.

[5]LeeBreslau,PeiCao,LiFan,GrahamPhillips,andScott

Shenker.WebcachingandZipf-likedistributions:Evidenceandimplications.InINFOCOM(1),pages126–134,1999.

[6]Z.Chen,Y.Zhou,andK.Li.Eviction-basedcacheplacement

forstoragecaches.InProceedingsofthe2003USENIXAnnualTechnicalConference,pages269–282,2003.

[7]ZhifengChen,YanZhang,YuanyuanZhou,HeidiScott,and

BerniSchiefer.Empiricalevaluationofmulti-levelbuffercachecollaborationforstoragesystems.InSIGMETRICS’05:Proceedingsofthe2005ACMSIGMETRICSinternationalconferenceonMeasurementandmodelingofcomputersystems,pages145–156,NewYork,NY,USA,2005.ACMPress.[8]F.J.Corbat´o.Apagingexperimentwiththemulticssystem.

InInHonorofP.M.Morse,pages217–228.MITPress,1969.AlsoasMITProjectMACReportMAC-M-384,May1968.[9]MarkE.CrovellaandAzerBestavros.Self-similarityinWorld

WideWebtrafﬁc:evidenceandpossiblecauses.IEEE/ACMTransactionsonNetworking,5(6):835–846,1997.

[10]PeterJ.Denning.Theworkingsetmodelforprogrambehavior.

Commun.ACM,11(5):323–333,1968.[11]EMC.EMCsymmetricdmxarchitectureguide.

http://www.emc.com/products/systems/pdf/C1011emcsymmdmxpdgldv.pdf,March2004.

[12]MichaelFactor,AssafSchuster,andGalaYadgar.Karma:

Know-it-allreplacementforamultilevelcache.InProc.oftheUSENIXConferenceonFileandStorageTechnologies,2007,2007.

[13]KevinW.FroeseandRichardB.Bunt.Theeffectofclient

cachingonﬁleserverworkloads.InHICSS(1),pages150–159,1996.

[14]BinnyS.GillandDharmendraS.Modha.SARC:Sequential

prefetchinginadaptivereplacementcache.InProceedingsoftheUSENIX2005AnnualTechnicalConference,pages293–308,2005.

[15]BinnyS.GillandDharmendraS.Modha.WOW:Wideordering

ofwrites-combiningspatialandtemporallocalityinnon-volatilecaches.InProceedingsofthe4thUSENIXConferenceonFileandStorageTechnologies(FAST),pages129–142,2005.[16]PawanGoyal,DivyeshJadav,DharmendraS.Modha,andRenu

Tewari.CacheCOW:providingQoSforstoragesystemcaches.InSIGMETRICS,pages306–307,2003.

[17]WindsorW.Hsu,AlanJaySmith,andHonestyC.Young.The

automaticimprovementoflocalityinstoragesystems.ACMTrans.Comput.Syst.,23(4):424–473,2005.

[18]S.JiangandX.Zhang.LIRS:Anefﬁcientlowinter-reference

recencysetreplacementpolicytoimprovebuffercacheperfor-mance.InProc.ACMSIGMETRICSConf.,2002.

[19]SongJiangandXiaodongZhang.ULC:Aﬁleblockplacement

andreplacementprotocoltoeffectivelyexploithierarchicallocalityinmulti-levelbuffercaches.ICDCS,00:168–177,2004.[20]T.JohnsonandD.Shasha.2Q:Alowoverheadhighper-formancebuffermanagementreplacementalgorithm.InProc.VLDBConf.,pages297–306,1994.

[21]D.Lee,J.Choi,J.-H.Kim,S.H.Noh,S.L.Min,Y.Cho,

andC.S.Kim.LRFU:Aspectrumofpoliciesthatsubsumestheleastrecentlyusedandleastfrequentlyusedpolicies.IEEETrans.Computers,50(12):1352–1360,2001.

[22]LishingLiu.Issuesinmulti-levelcachedesigns.InICCS

’94:Proceedingsofthe1994IEEEInternationalConferenceonComputerDesign:VLSIinComputer&Processors,pages46–52,Washington,DC,USA,1994.IEEEComputerSociety.[23]J.R.LorchandA.J.Smith.TheVTracetool:Buildingasystem

tracesforWindowsNTandWindows2000.MSDNMagazine,15(10):86–102,Oct2000.

[24]BrueMcNuttandStevenJohnson.AstandardtestofI/Ocache.

InProc.Comput.MeasurementsGroup’s2001Int.Conf.,2001.

[25]NimrodMegiddoandD.S.Modha.ARC:Aself-tuning,low

overheadreplacementcache.InFAST,pages115–130,2003.[26]D.MuntzandP.Honeyman.Multi-levelcachingindistributed

ﬁlesystems-or-yourcacheain’tnuthin’buttrash.ProceedingsoftheUSENIXWinterConference,pages305–313,January1992.

[27]LiOu,XubinHe,MarthaJ.Kosa,andStephenL.Scott.

Auniﬁedmultiple-levelcacheforhighperformancestoragesystems.InMASCOTS’05:Proceedingsofthe13thIEEEInternationalSymposiumonModeling,Analysis,andSimulationofComputerandTelecommunicationSystems,pages143–152,Washington,DC,USA,2005.IEEEComputerSociety.

[28]S.Przybylski,M.Horowitz,andJ.Hennessy.Characteristics

ofperformance-optimalmulti-levelcachehierarchies.InISCA’89:Proceedingsofthe16thannualinternationalsymposiumonComputerarchitecture,pages114–121,NewYork,NY,USA,1989.ACMPress.

[29]J.T.RobinsonandM.V.Devarakonda.Datacachemanagement

usingfrequency-basedreplacement.InProc.ACMSIGMET-RICSConf.,pages134–142,1990.

[30]R.T.ShortandH.M.Levy.Asimulationstudyoftwo-levelcaches.InISCA’88:Proceedingsofthe15thAnnualInternationalSymposiumonComputerarchitecture,pages81–88,LosAlamitos,CA,USA,1988.IEEEComputerSocietyPress.

[31]J.Z.TengandR.A.Gumaer.ManagingIBMdatabase2

bufferstheeffectofclientcachingonﬁleserverworkloads.IBMSystemsJournal,23(2):211–218,1984.

[32]DarrylL.Willick,DerekL.Eager,andRichardB.Bunt.

Diskcachereplacementpoliciesfornetworkﬁleservers.InInternationalConferenceonDistributedComputingSystems,pages2–11,1993.

[33]TheodoreM.WongandJohnWilkes.Mycacheoryours?

makingstoragemoreexclusive.InProc.oftheUSENIXAnnualTechnicalConference,pages161–175,2002.

[34]JieshengWu,PeteWyckoff,andDhabaleswarK.Panda.

Demotion-basedexclusivecachingthroughdemotebuffering:Designandevaluationsoverdifferentnetworks.InWorkshoponStorageNetworkArchitectureandParallelI/O(SNAPI),2003.[35]YuanyuanZhou,ZhifengChen,andKaiLi.Second-level

buffercachemanagement.IEEETrans.ParallelDistrib.Syst.,15(6):505–519,2004.

[36]YuanyuanZhou,JamesPhilbin,andKaiLi.TheMulti-Queue

replacementalgorithmforsecondlevelbuffercaches.InProceedingsofthe2001USENIXAnnualTechnicalConference,pages91–104,2001.

[37]G.K.Zipf.HumanBehaviorandthePrincipleofLeastEffort.

AddisonWesley,Reading,MA,1949.

USENIX AssociationFAST ’08: 6th USENIX Conference on File and Storage Technologies

因篇幅问题不能全部显示，请点此查看更多更全内容

查看全文

全部栏目

On Multi-level Exclusive Caching Offline Optimality and