IBM University Faculty Award Program Proposal
Prof. Stephen W. Keckler
Department of Computer Sciences
The University of Texas at Austin
IBM Technical Sponsor: Ron Kalla (Systems Group)
March 10, 2005
1 Technical Areas of Research
VLSI design, power management, continuous optimization
2 Project Description
As computer chip designers have pushed aggressively for higher performance processes, circuits, and systems, design margins have shrunk dramatically. For example, in the relatively recent past, peak power consumption was well below packaging limits for high-performance systems. However, today's package limitations on both power consumption and heat dissipation have risen to first-order design constraints. While most chips are designed for a particular maximum thermal operating point, they typically either operate far from that point or are throttled in a coarse-grain fashion to prevent thermal violations.
A typical workload for a computer system, and the way that workload uses the system components, changes drastically over time. Web and e-commerce servers see varying load depending on the time of day. The ambient temperature seen by a computer in a machine room may vary not just with load, but also when other systems are added to or removed from the room. Applications with large data sets and irregular access patterns often exhibit poor cache behavior, leaving the processor stalled for extended periods of time, while other applications may be more processor or disk intensive. Even a single application goes through phases which place varying burdens on the system [1]. A system designed for maximum activity rates of all its components would certainly not exceed its power limits, but would typically operate far from its true capabilities. The key challenge is not solely to reduce power consumption, but instead to deliver energy to where it is most useful in the system at any given time.
We propose to use on-line continuous optimization to dynamically tune the system to meet power, temperature, and energy constraints. While substantial opportunities for reducing power consumption are available, extending the current strategy of localized control of individual power management techniques is not viable. Without coordinated control, a collection of individual techniques may be enabled in destructive or ineffective combinations. Existing open-loop control techniques do not guarantee effective operation throughout the wide range of process variability, application space, and operating conditions. A simple reactive approach of enabling a power (or other metric) saving technique based on a pre-defined set of events such as "after 1000 cycles of inactivity, transition to sleep mode" does not take into account the effectiveness of the action at runtime. Techniques that are applied globally, such as Intel's recently announced "Demand Based Switching," which allows fine-grain adjustments to chip-wide voltage and frequency, lack the ability to channel the energy to different parts of the chip at different times. We propose to examine and evaluate closed-loop power management mechanisms that monitor and measure their effectiveness over time and across applications, as well as allocate energy to the most critical resources at any given time.
Opportunities for power management: In our prior work, we investigated opportunities for reducing both dynamic power [2] and static power [3] in the context of an out-of-order microprocessor. In our studies of dynamic power we tracked dynamic power consumption throughout a pipeline model of the Alpha 21264 processor, noting the power tax of mis-prediction and over-provisioning. We found that mis-prediction accounted for approximately 6%. Over-provisioned structures that are designed for maximum throughput but not fully used by typical programs accounted for about 17% of pipeline energy. In our study of static power, we compared the effectiveness (from the microarchitectural perspective) of different mechanisms that reduce static power consumption in caches, including power gating and dynamic threshold voltage modulation. We found that these varying techniques provided different benefits to different caches, and could improve the energy-delay product of these caches by a factor of 20-50, depending on the cache and the technique.
The literature on microarchitectural mechanisms is already large and continues to grow. Dynamic management techniques include clock gating, dynamic voltage/frequency scaling, pipeline gating, pipeline throttling, and dynamic microarchitectural structure size modulation. Additional strategies to dynamically manage leakage energy include instruction-cache resizing and drowsy caches, each of which places a portion of a cache into a low-power state.
Challenges for combining techniques: Simply extending the existing class of microarchitectural management techniques to encompass power, energy, and temperature constraints falls short of a robust management system. Employing multiple simultaneous power management techniques poses two main concerns. First, power management parameters are typically determined with incomplete knowledge of the physical environment, operating conditions, and application characteristics. If code profiling and pre-fabrication processor simulations do not accurately match actual runtime conditions, the mismatch can lead to ineffective management. For example, changing the frequency and voltage settings based on recent program behavior via a performance monitor may provide excellent control for the test benchmark suite yet result in a pathological case for a customer's proprietary software.
Second, runtime events could repeatedly trigger conflicts between management policies. For example, an energy-saving policy might set a high frequency for a program so that it can complete the task quickly and then power down to conserve static energy. A separate temperature policy might set a lower frequency to cool the chip in the event of excessive heat dissipation. During program execution, the chip could breach a temperature threshold, causing oscillations between management mechanisms that trigger a slower frequency for cooling and a faster frequency to optimize leakage. Avoiding such conflicts requires testing each combination of techniques, adding to the cost and complexity of processor verification.
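The oscillation described above can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real processor: the first-order thermal model, the two frequency levels, and every constant are hypothetical values chosen only to make the conflict visible.

```python
def simulate_uncoordinated(steps=200, t_ambient_c=45.0, t_limit_c=85.0):
    """Two independent policies fight over one frequency knob.

    All constants are illustrative assumptions, not measured values.
    """
    f_low_ghz, f_high_ghz = 1.0, 3.0
    temp_c = t_ambient_c
    freq_ghz = f_high_ghz
    switches = 0
    trace = []
    for _ in range(steps):
        # Energy policy: always ask for the fast setting (finish, then sleep).
        wanted = f_high_ghz
        # Thermal policy: override with the slow setting above the threshold.
        if temp_c > t_limit_c:
            wanted = f_low_ghz
        if wanted != freq_ghz:
            switches += 1
        freq_ghz = wanted
        # First-order thermal model: temperature relaxes toward a
        # frequency-dependent steady state.
        steady_state_c = t_ambient_c + 20.0 * freq_ghz
        temp_c += 0.2 * (steady_state_c - temp_c)
        trace.append((freq_ghz, temp_c))
    return switches, trace


if __name__ == "__main__":
    switches, trace = simulate_uncoordinated()
    print("frequency switches:", switches)
    print("peak temperature (C):", round(max(t for _, t in trace), 2))
```

In this toy model the frequency flips on nearly every step once the threshold is first crossed, and the temperature still overshoots the limit repeatedly, which is the kind of destructive interaction a coordinated manager is meant to avoid.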
Coordinated Power Management: We propose to control the power management mechanisms in a coordinated fashion, adjusting them in concert to achieve the desired performance goals within the constraints of limited power, energy, and temperature levels. The infrastructure for coordinated power management includes a collection of sensors (which could include temperature sensors as well as activity counters), a set of actuators for adjusting the various power management parameters, and a controller that makes policy decisions. We expect that the algorithms and changing policies may require programmability in the form of a simple embedded processor. While we will initially focus on a single-chip microprocessor with an embedded power manager, we also foresee this approach complementing a system-level strategy (such as that across an SMP) in which the different nodes in the system are run at varying frequencies and power consumption according to load [4].
Current power managers react to specific events with pre-determined responses, such as the Pentium 4 thermal control policy paraphrased as "if temperature exceeds the threshold, then enable intermittent clock gating." A goal-driven management approach adapts to a wider range of operating conditions and resource use, allowing processors to run closer to the edge of power, temperature, and energy limits. A goal-seeking approach is flexible, unlike trigger-driven decisions that react to specific events with pre-determined responses. For example, a goal-driven controller facing an impending thermal emergency selects the most effective choice for the situation, choosing the best combination of clock gating, thread migration, voltage and frequency scaling, or other options. It can provide safer operating conditions for run-time environments and configurations not expected during design and validation phases.
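One way to picture how a goal-driven controller might choose among options is a greedy selection by estimated cost-effectiveness. The sketch below is only an illustration under strong assumptions: the `Action` type, the per-action estimates, and the idea that temperature reductions add linearly are all hypothetical and are not taken from any measured system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    temp_drop_c: float     # estimated cooling if enabled (assumed value)
    perf_loss_pct: float   # estimated slowdown if enabled (assumed value)


def choose_actions(actions, needed_drop_c):
    """Pick actions in order of least performance lost per degree of cooling.

    Assumes the estimated temperature drops are additive, which a real
    manager would need to verify through feedback.
    """
    ranked = sorted(actions, key=lambda a: a.perf_loss_pct / a.temp_drop_c)
    chosen, total_drop_c = [], 0.0
    for action in ranked:
        if total_drop_c >= needed_drop_c:
            break
        chosen.append(action)
        total_drop_c += action.temp_drop_c
    return chosen, total_drop_c


if __name__ == "__main__":
    options = [
        Action("clock_gating", temp_drop_c=3.0, perf_loss_pct=6.0),
        Action("thread_migration", temp_drop_c=4.0, perf_loss_pct=2.0),
        Action("dvfs_step_down", temp_drop_c=5.0, perf_loss_pct=5.0),
    ]
    chosen, drop = choose_actions(options, needed_drop_c=8.0)
    print([a.name for a in chosen], drop)
```

A trigger-driven policy would always enable the same mechanism. Here the choice depends on the current estimates, so updating those estimates from sensor feedback changes the response.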
Our coordinated approach would supply a goal to the power manager such as "maximum performance within set temperature and energy limits," which would then select the appropriate mechanisms to achieve the goal. The manager maintains a model of the system and understands the first-order sensitivity of performance, temperature, and power to the management actuators at its disposal. We will explore a family of algorithms, including constrained-optimization approaches, which can use gradient descent techniques to drive the configuration toward the desired goal using feedback from the sensors. Consequently, the manager can track system behavior and shift goal objectives in synchrony with changing application demands and energy resources. This closed-loop feedback system is a very powerful paradigm for empirically finding good configurations, but good control systems engineering methods must be applied.
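A gradient-style update driven by first-order sensitivities can be sketched as follows. This is a hedged illustration with a single knob, not the proposed algorithm: `read_sensors` is a hypothetical plant model standing in for real sensor data, and the guard band, step sizes, and constants are assumptions.

```python
def read_sensors(freq_ghz):
    """Hypothetical plant model standing in for real sensor readings."""
    perf = freq_ghz                        # throughput grows with frequency
    temp_c = 45.0 + 5.0 * freq_ghz ** 2    # temperature grows faster than linearly
    return perf, temp_c


def tune_frequency(freq_ghz=1.0, temp_limit_c=85.0, guard_band_c=0.5,
                   learning_rate=0.1, probe_ghz=0.01, iterations=100):
    """Maximize performance subject to a temperature bound using feedback."""
    target_c = temp_limit_c - guard_band_c
    for _ in range(iterations):
        perf, temp_c = read_sensors(freq_ghz)
        # Estimate first-order sensitivities with a small perturbation.
        perf_probe, temp_probe = read_sensors(freq_ghz + probe_ghz)
        dperf = (perf_probe - perf) / probe_ghz
        dtemp = (temp_probe - temp_c) / probe_ghz
        # Frequency change predicted (to first order) to reach the target.
        headroom_ghz = (target_c - temp_c) / dtemp
        if temp_c > target_c:
            freq_ghz += headroom_ghz       # negative: back off toward target
        else:
            # Gradient step on performance, clipped to the predicted headroom.
            freq_ghz += min(learning_rate * dperf, headroom_ghz)
    perf, temp_c = read_sensors(freq_ghz)
    return freq_ghz, perf, temp_c


if __name__ == "__main__":
    freq, perf, temp = tune_frequency()
    print(f"freq={freq:.3f} GHz perf={perf:.3f} temp={temp:.2f} C")
```

The guard band absorbs the small overshoot that comes from treating a nonlinear temperature response as linear. How large it needs to be in practice is exactly the kind of control-engineering question raised above.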
As an example of continuous optimization for power, consider the following scenario. The operating system notifies the coordinated manager to seek the goal of high throughput within limits of a strict upper bound on temperature with moderate power and energy thresholds. The processor is currently operating with a mid-range voltage level; sensor data indicate that the temperature is within an acceptable range and that the performance is less than the goal. The manager directs the voltage regulator to step up the supply voltage and monitors the temperature rise and performance counters, and continues to raise the frequency and voltage until achieving the desired performance target. If a running application causes a thermal spike, the manager takes immediate action to coordinate a response between the voltage, frequency, and activity migration controls, while postponing a cache leakage policy that would have created a temporary increase in write-back traffic at an inopportune moment. With coordinated information from multiple sources and a goal-driven algorithm, a hierarchical power/energy/temperature manager can adapt to the system environment and push the operating conditions to the edge of acceptable limits.
The coordinated manager design integrates the fundamental principles of closed-loop and goal-driven control through the following basic mechanisms:
(1) Sensors: The manager requires access to temperature sensors and event counters (collectively referred to as sensors) throughout the chip at appropriate sampling intervals. The manager can also use activity counter data to track decision effectiveness and determine cost functions for knob settings.
(2) Actuators: A coordinated manager requires useful knobs to turn, such as DVFS, pipeline width modulation, and sleep mode techniques in our experiments. A selection of knobs that encompasses a range of options from coarse-grain global control to fine-grain localized control provides resolution for tuning the processor's operation to its goal state.
(3) Feedback algorithms: A robust algorithm directs knob settings by synthesizing information from sensors and counters. The algorithm must be stable over a wide range of input and goal functions in order to prevent system failure from errant control decisions.
(4) Hierarchy: The manager will span hardware and software for a combination of immediate control and flexibility. A hierarchy within the manager distributes decisions according to required response time: quick response in hardware for phenomena with short time constants, such as a jump in leakage power when a unit exits sleep mode; and software to handle longer intervals between decisions for slow-moving trends like gradual chip warming.
(5) Granularity: Some responses, such as universal clock reduction, are applied at a global level, while others, such as cache sleep modes, target only a localized area. The advent of techniques such as voltage islands and globally asynchronous, locally synchronous (GALS) designs will enable techniques such as DVFS to be applied non-uniformly across the chip. A coordinated manager can tune a wide range of coarse- and fine-grain management techniques to efficiently manage resources.
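The hierarchy in item (4) can be sketched as two layers sharing one sensor stream. This is a minimal illustration only: the class name, the sampling period, the thresholds, and the frequency steps are hypothetical, and a real fast layer would be implemented in hardware rather than in software.

```python
class HierarchicalManager:
    """Toy two-level manager: a fast reflex layer plus a slow policy layer."""

    def __init__(self, slow_period=100, temp_limit_c=85.0):
        self.slow_period = slow_period    # samples between slow decisions
        self.temp_limit_c = temp_limit_c
        self.freq_cap_ghz = 3.0           # owned by the slow (software) layer
        self.throttled = False            # owned by the fast (hardware) layer
        self._window = []

    def step(self, temp_c):
        """Consume one temperature sample and return the current settings."""
        self._window.append(temp_c)
        # Fast layer: react on every sample to short-time-constant events.
        self.throttled = temp_c > self.temp_limit_c
        # Slow layer: act only once per period, on the average trend.
        if len(self._window) == self.slow_period:
            average_c = sum(self._window) / len(self._window)
            if average_c > self.temp_limit_c - 5.0:
                self.freq_cap_ghz = max(1.0, self.freq_cap_ghz - 0.2)
            elif average_c < self.temp_limit_c - 15.0:
                self.freq_cap_ghz = min(3.0, self.freq_cap_ghz + 0.2)
            self._window.clear()
        return self.freq_cap_ghz, self.throttled


if __name__ == "__main__":
    manager = HierarchicalManager()
    for _ in range(100):
        settings = manager.step(82.0)     # gradual warming, no emergency
    print(settings)
    print(manager.step(90.0))             # sudden spike: fast layer reacts
```

Splitting the decisions this way keeps the fast path trivial while leaving room for a more elaborate, reprogrammable policy in the slow path.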
Evaluation: We have completed the development of an architectural simulation infrastructure to quantify the effect of power, temperature, and energy management decisions. Our infrastructure combines our detailed and validated microarchitectural simulator (sim-alpha) with the Wattch power model and the HotSpot temperature model. We have already extended the simulator to include power management techniques such as dynamic frequency and voltage scaling, pipeline throttling, and cache leakage control. Our initial experiments measured a system with no power management, uncoordinated power management, and fixed power management (trying all possible power management parameter settings and picking the best one, a technique not feasible in reality) [5]. The results show that the best power management settings outperform both the uncoordinated manager and no power management by a wide margin on a subset of the SPEC2000 benchmarks.
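The fixed power management baseline amounts to an exhaustive search over static settings. The sketch below shows the idea under stated assumptions: `evaluate` is a hypothetical stand-in for a full simulator run, and the knob values and cost model are invented for illustration rather than drawn from our experiments.

```python
import itertools


def evaluate(freq_ghz, pipeline_width, cache_sleep):
    """Toy stand-in for one simulator run; returns (perf, peak_temp_c)."""
    perf = freq_ghz * (0.6 + 0.1 * pipeline_width)
    temp_c = 45.0 + 8.0 * freq_ghz + 2.0 * pipeline_width
    if cache_sleep:
        perf *= 0.97      # assumed small slowdown from wake-up latency
        temp_c -= 3.0     # assumed cooling from reduced leakage
    return perf, temp_c


def best_fixed_setting(temp_limit_c=75.0):
    """Try every static setting and keep the fastest one within the limit."""
    search_space = itertools.product([1.0, 2.0, 3.0], [2, 4], [False, True])
    best = None
    for setting in search_space:
        perf, temp_c = evaluate(*setting)
        if temp_c <= temp_limit_c and (best is None or perf > best[0]):
            best = (perf, setting)
    return best


if __name__ == "__main__":
    perf, (freq_ghz, width, sleep) = best_fixed_setting()
    print(freq_ghz, width, sleep, round(perf, 3))
```

Even in this toy model the best setting combines knobs: enabling cache sleep frees enough thermal headroom to run the widest pipeline at the highest frequency. The search cost grows multiplicatively with each added knob, which is why this approach serves only as an off-line reference point.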
In the coming year, we will evaluate algorithms for dynamically managing and allocating energy subject to temperature, power, and performance constraints. We hope to surpass the performance of the optimal off-line algorithm with a good dynamic on-line algorithm that operates in conjunction with application execution. We will extend our simulation infrastructure to include performance counters and a sensor network to provide data for the coordinated manager's on-line algorithm and measure the system response at realistic sampling intervals. Future work may examine the viability of using finer-grained voltage/frequency modulation, as afforded through fabrication and circuit techniques such as voltage/frequency islands.
References:
[1] "Discovering and Exploiting Program Phases," T. Sherwood, E. Perelman, G. Hamerly, S. Sair, and B. Calder, IEEE Micro, 23(6), pp. 84-93, November/December 2003.
[2] "Microprocessor Pipeline Energy Analysis," R. Natarajan, H. Hanson, S. W. Keckler, C. R. Moore, and D. Burger, IEEE International Symposium on Low Power Electronics and Design (ISLPED), pp. 282-287, August 2003.
[3] "Static Energy Reduction Techniques for Microprocessor Caches," H. Hanson, M. S. Hrishikesh, V. Agarwal, S. W. Keckler, and D. Burger, IEEE Transactions on VLSI Systems, 11(3), pp. 303-313, June 2003.
[4] "Scheduling for Heterogeneous Processors in Server Systems," S. Ghiasi, T. Keller, and F. Rawson (IBM Austin Research Laboratory), Computing Frontiers Conference, May 2005.
[5] "A Case for Coordinated Management of Performance, Power, Energy, and Temperature," H. Hanson and S. Keckler, submitted to the IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2005.
3 Project Objectives and Goals
Our primary goals are to answer the following research questions:
What are the limits of individual power management techniques applied in isolation?
How do these different power management techniques interact when applied simultaneously, but controlled independently? Are the interactions complementary or confrontational?
What are the appropriate metrics for power/thermal optimization (temperature, power consumption?), and what are the most appropriate means of measuring them on-line?
What are the natural time constants of the power management techniques? How long does it take to invoke each technique, what is the overhead, and how long does it take for the technique to take effect?
What are the limits of a coordinated approach to power management, in which all of the power management techniques are controlled in a cooperative fashion?
How closely can real control algorithms approach the optimal limits of power management? What are the benefits of feedback control algorithms over open-loop algorithms?
What is the right balance between hardware and software in implementing an embedded power manager?
How do the best power management policies on an aggressive conventional architecture compare to a more conservative, simpler (and perhaps more inherently power-efficient) architecture with less extensive power management, in terms of power and performance?
In addition, we expect that the insights developed during this study will be of interest to the IBM Systems Group. We expect to interact with Ron Kalla and Carl Anderson (among others at IBM) to ensure relevance of the work to IBM and to provide a conduit for the insights back into IBM.
4 Long Term Impact
Aggressive power management is necessary to limit the packaging and system costs for power delivery and cooling in both high-end and low-end systems. On-line continuous optimization represents a departure from the conventional approach of designing a chip/system for the worst case. Such optimization will allow designs to potentially exceed their power and temperature limits, but will rely on on-line mechanisms to ensure safe operating conditions. This approach will allow the system to run closer to the edge of the power/performance envelope than current strategies that overly restrict the system at design time. IBM has recognized the need for power management and has established a corporate-wide low power initiative centered at the IBM Austin Research Laboratory (ARL). In addition, new IBM initiatives in autonomic computing are well-matched with the notion of continuous optimization described in this proposal. Combining the circuits and systems work from the ARL with the expected microarchitectural results from this research will likely prove beneficial to future designs within the IBM Systems Group. We are uniquely positioned to investigate this area because of our strong ties to IBM in both the TRIPS and PERCS projects.