Version 1.0.7
Mathematics and Computer Science Division
Argonne National Laboratory
William Gropp, Ewing Lusk, David Ashton, Pavan Balaji, Darius Buntinas, Ralph Butler, Anthony Chan, David Goodell, Jayesh Krishna, Guillaume Mercier, Rob Ross, Rajeev Thakur, Brian Toonen
April 2, 2008
This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, SciDAC Program, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357.
Contents
1 Introduction
2 Migrating to MPICH2 from MPICH1
    2.1 Default Runtime Environment
    2.2 Starting Parallel Jobs
    2.3 Command-Line Arguments in Fortran
3 Quick Start
4 Compiling and Linking
    4.1 Specifying Compilers
    4.2 Shared Libraries
    4.3 Special Issues for C++
    4.4 Special Issues for Fortran
5 Running Programs with mpiexec
    5.1 Standard mpiexec
    5.2 Extensions for All Process Management Environments
    5.3 Extensions for the MPD Process Management Environment
        5.3.1 Basic mpiexec arguments for MPD
        5.3.2 Other Command-Line Arguments to mpiexec for MPD
        5.3.3 Environment Variables Affecting mpiexec for MPD
    5.4 Extensions for SMPD Process Management Environment
        5.4.1 mpiexec arguments for SMPD
    5.5 Extensions for the gforker Process Management Environment
        5.5.1 mpiexec arguments for gforker
    5.6 Restrictions of the remshell Process Management Environment
    5.7 Using MPICH2 with SLURM and PBS
        5.7.1 MPD in the PBS environment
        5.7.2 OSC mpiexec
6 Managing the Process Management Environment
    6.1 MPD
7 Debugging
    7.1 gdb via mpiexec
    7.2 TotalView
8 MPE
    8.1 MPI Logging
    8.2 User-defined logging
    8.3 MPI Checking
    8.4 MPE options
9 Other Tools Provided with MPICH2
10 MPICH2 under Windows
    10.1 Directories
    10.2 Compiling
    10.3 Running
A Frequently Asked Questions
    A.1 General Information
        A.1.1 Q: What is MPICH2?
        A.1.2 Q: What does MPICH stand for?
        A.1.3 Q: Can MPI be used to program multicore systems?
    A.2 Building MPICH2
        A.2.1 Q: What is the difference between the MPD and SMPD process managers?
        A.2.2 Q: Do I have to configure/make/install MPICH2 each time for each compiler I use?
        A.2.3 Q: How do I configure to use the Absoft Fortran compilers?
        A.2.4 Q: When I configure MPICH2, I get a message about FD_ZERO and the configure aborts
        A.2.5 Q: When I use the g95 Fortran compiler on a 64-bit platform, some of the tests fail
        A.2.6 Q: When I run make, it fails immediately with many errors beginning with “sock.c:8:24: mpidu_sock.h: No such file or directory In file included from sock.c:9: ../../../../include/mpiimpl.h:91:21: mpidpre.h: No such file or directory In file included from sock.c:9: ../../../../include/mpiimpl.h:1150: error: syntax error before ”MPID_VCRT” ../../../../include/mpiimpl.h:1150: warning: no semicolon at end of struct or union”
        A.2.7 Q: When building the ssm or sshm channel, I get the error “mpidu_process_locks.h:234:2: error: #error *** No atomic memory operation specified to implement busy locks ***”
        A.2.8 Q: When using the Intel Fortran 90 compiler (version 9), the make fails with errors in compiling statement that reference MPI_ADDRESS_KIND
        A.2.9 Q: The build fails when I use parallel make
    A.3 Windows version of MPICH2
        A.3.1 I am having trouble installing and using the Windows version of MPICH2
    A.4 Compiling MPI Programs
        A.4.1 C++ and SEEK_SET
        A.4.2 C++ and Errors in Nullcomm::Clone
    A.5 Running MPI Programs
        A.5.1 Q: How do I pass environment variables to the processes of my parallel program
        A.5.2 Q: How do I pass environment variables to the processes of my parallel program when using the mpd process manager?
        A.5.3 Q: What determines the hosts on which my MPI processes run?
        A.5.4 Q: On Windows, I get an error when I attempt to call MPI_Comm_spawn
        A.5.5 Q: My output does not appear until the program exits
        A.5.6 Q: Fortran programs using stdio fail when using g95
        A.5.7 Q: How do I run MPI programs in the background when using the default MPD process manager?
1 Introduction
This manual assumes that MPICH2 has already been installed. For instructions on how to install MPICH2, see the MPICH2 Installer’s Guide, or the README in the top-level MPICH2 directory. This manual explains how to compile, link, and run MPI applications, and use certain tools that come with MPICH2. This is a preliminary version and some sections are not complete yet. However, there should be enough here to get you started with MPICH2.
2 Migrating to MPICH2 from MPICH1
If you have been using MPICH 1.2.x (1.2.7p1 is the latest version), you will find a number of things about MPICH2 that are different (and hopefully better in every case). Your MPI application programs need not change, of course, but a number of things about how you run them will be different.
MPICH2 is an all-new implementation of the MPI Standard, designed to implement all of the MPI-2 additions to MPI (dynamic process management, one-sided operations, parallel I/O, and other extensions) and to apply the lessons learned in implementing MPICH1 to make MPICH2 more robust, efficient, and convenient to use. The MPICH2 Installer’s Guide provides some information on changes between MPICH1 and MPICH2 to the process of configuring and installing MPICH. Changes to compiling, linking, and running MPI programs between MPICH1 and MPICH2 are described below.
2.1 Default Runtime Environment
In MPICH1, the default configuration used the now-old p4 portable programming environment. Processes were started via remote shell commands (rsh or ssh) and the information necessary for processes to find and connect with one another over sockets was collected and then distributed at startup time in a non-scalable fashion. Furthermore, the entanglement of process management functionality with the communication mechanism led to confusing behavior of the system when things went wrong.
MPICH2 provides a separation of process management and communication. The default runtime environment consists of a set of daemons, called mpd’s, that establish communication among the machines to be used before application process startup, thus providing a clearer picture of what is wrong when communication cannot be established and providing a fast and scalable startup mechanism when parallel jobs are started. Section 6.1 describes the MPD process management system in more detail. Other process managers are also available.
2.2 Starting Parallel Jobs
MPICH1 provided the mpirun command to start MPICH1 jobs. The MPI-2 Forum recommended a standard, portable command, called mpiexec, for this purpose. MPICH2 implements mpiexec and all of its standard arguments, together with some extensions. See Section 5.1 for standard arguments to mpiexec and various subsections of Section 5 for extensions particular to various process management systems.
MPICH2 also provides an mpirun command for simple backward compatibility, but MPICH2’s mpirun does not provide all the options of mpiexec or all of the options of MPICH1’s mpirun.
2.3 Command-Line Arguments in Fortran
MPICH1 (more precisely MPICH1’s mpirun) required access to command line arguments in all application programs, including Fortran ones, and MPICH1’s configure devoted some effort to finding the libraries that contained the right versions of iargc and getarg and to adding those libraries to the ones with which the mpif77 script linked MPI programs. Since MPICH2 does not require access to command line arguments to applications, these functions are optional, and configure does nothing special with them. If you need them in your applications, you will have to ensure that they are available in the Fortran environment you are using.
3 Quick Start
To use MPICH2, you will have to know the directory where MPICH2 has been installed. (Either you installed it there yourself, or your systems administrator has installed it. One place to look in this case might be /usr/local. If MPICH2 has not yet been installed, see the MPICH2 Installer’s Guide.) We suggest that you put the bin subdirectory of that directory into your path. This will give you access to assorted MPICH2 commands to compile, link, and run your programs conveniently. Other commands in this directory manage parts of the run-time environment and execute tools.
One of the first commands you might run is mpich2version to find out the exact version and configuration of MPICH2 you are working with. Some of the material in this manual depends on just what version of MPICH2 you are using and how it was configured at installation time.
You should now be able to run an MPI program. Let us assume that the directory where MPICH2 has been installed is /home/you/mpich2-installed, and that you have added its bin subdirectory to your path, using
    setenv PATH /home/you/mpich2-installed/bin:$PATH
for tcsh and csh, or
    export PATH=/home/you/mpich2-installed/bin:$PATH
for bash or sh. Then to run an MPI program, albeit only on one machine, you can do:
    mpd &
    cd /home/you/mpich2-installed/examples
    mpiexec -n 3 cpi
    mpdallexit
Details for these commands are provided below, but if you can successfully execute them here, then you have a correctly installed MPICH2 and have run an MPI program.
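Before running the sequence above, it can help to confirm that the MPICH2 bin directory really is on your path. The following is a minimal sketch for sh or bash; the helper name check_cmds is made up for this illustration and is not part of MPICH2.

```shell
# check_cmds: hypothetical helper (not an MPICH2 command).
# Prints each named command that cannot be found on PATH and
# returns nonzero if any of them is missing.
check_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "not found on PATH: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# The commands used in the quick-start sequence above.
check_cmds mpich2version mpd mpiexec mpdallexit ||
    echo "add the MPICH2 bin directory to PATH first"
```

If any command is reported as missing, revisit the setenv or export line for your shell.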
4 Compiling and Linking
A convenient way to compile and link your program is by using scripts that use the same compiler that MPICH2 was built with. These are mpicc, mpicxx, mpif77, and mpif90, for C, C++, Fortran 77, and Fortran 90 programs, respectively. If any of these commands are missing, it means that MPICH2 was configured without support for that particular language.
4.1 Specifying Compilers
You need not use the same compiler that MPICH2 was built with, but not all compilers are compatible. To use the compiler with which MPICH2 itself was built, as reported by mpich2version, simply use the compiling and linking commands from the previous section. The environment variables MPICH_CC, MPICH_CXX, MPICH_F77, and MPICH_F90 may be used to specify alternate C, C++, Fortran 77, and Fortran 90 compilers, respectively.
4.2 Shared Libraries
Currently shared libraries are only tested on Linux and Mac OS X, and there are restrictions. See the Installer’s Guide for how to build MPICH2 as a shared library. If shared libraries have been built, you will get them automatically when you link your program with any of the MPICH2 compilation scripts.
4.3 Special Issues for C++
Some users may get error messages such as
    SEEK_SET is #defined but must not be for the C++ binding of MPI
The problem is that both stdio.h and the MPI C++ interface use SEEK_SET, SEEK_CUR, and SEEK_END. This is really a bug in the MPI-2 standard. You can try adding
    #undef SEEK_SET
    #undef SEEK_END
    #undef SEEK_CUR
before mpi.h is included, or add the definition
    -DMPICH_IGNORE_CXX_SEEK
to the command line (this will cause the MPI versions of SEEK_SET etc. to be skipped).
4.4 Special Issues for Fortran
MPICH2 provides two kinds of support for Fortran programs. For Fortran 77 programmers, the file mpif.h provides the definitions of the MPI constants such as MPI_COMM_WORLD. Fortran 90 programmers should use the MPI module instead; this provides all of the definitions as well as interface definitions for many of the MPI functions. However, this MPI module does not provide full Fortran 90 support; in particular, interfaces for the routines, such as MPI_Send, that take “choice” arguments are not provided.
5 Running Programs with mpiexec
If you have been using the original MPICH, or any of a number of other MPI implementations, then you have probably been using mpirun as a way to start your MPI programs. The MPI-2 Standard describes mpiexec as a suggested way to run MPI programs. MPICH2 implements the mpiexec standard, and also provides some extensions. MPICH2 provides mpirun for backward compatibility with existing scripts, but it does not support the same or as many options as mpiexec or all of the options of MPICH1’s mpirun.
5.1 Standard mpiexec
Here we describe the standard mpiexec arguments from the MPI-2 Standard [1]. The simplest form of a command to start an MPI job is
    mpiexec -n 32 a.out
to start the executable a.out with 32 processes (providing an MPI_COMM_WORLD of size 32 inside the MPI application). Other options are supported, for specifying hosts to run on, search paths for executables, working directories, and even a more general way of specifying a number of processes. Multiple sets of processes can be run with different executables and different values for their arguments, with “:” separating the sets of processes, as in:
    mpiexec -n 1 -host loginnode master : -n 32 -host smp slave
The -configfile argument allows one to specify a file containing the specifications for process sets on separate lines in the file. This makes it unnecessary to have long command lines for mpiexec. (See pg. 353 of [2].)
It is also possible to start a one-process MPI job (with an MPI_COMM_WORLD whose size is equal to 1) without using mpiexec. This process will become an MPI process when it calls MPI_Init, and it may then call other MPI functions. Currently, MPICH2 does not fully support calling the dynamic process routines from MPI-2 (e.g., MPI_Comm_spawn or MPI_Comm_accept) from processes that are not started with mpiexec.
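As an illustration of -configfile, the two process sets from the colon-separated command in this section could be placed on separate lines of a file. This is a sketch only: the file name procs.cfg is arbitrary, and the host and executable names are the ones from the example in the text.

```shell
# Write one process-set specification per line (the file name is arbitrary).
cfg=${TMPDIR:-/tmp}/procs.cfg
cat > "$cfg" <<'EOF'
-n 1 -host loginnode master
-n 32 -host smp slave
EOF

# Show what mpiexec would be asked to read.
cat "$cfg"

# The job would then be started with something like the following,
# which needs a working MPICH2 runtime environment:
#   mpiexec -configfile "$cfg"
```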
5.2 Extensions for All Process Management Environments
Some mpiexec arguments are specific to particular communication subsystems (“devices”) or process management environments (“process managers”). Our intention is to make all arguments as uniform as possible across devices and process managers. For the time being we will document these separately.
5.3 Extensions for the MPD Process Management Environment
MPICH2 provides a number of process management systems. The default is called MPD. MPD provides a number of extensions to the standard form of mpiexec.
5.3.1 Basic mpiexec arguments for MPD
The default configuration of MPICH2 chooses the MPD process manager and the “simple” implementation of the Process Management Interface. MPD provides a version of mpiexec that supports both the standard arguments described in Section 5.1 and other arguments described in this section. MPD also provides a number of commands for querying the MPD process management environment and interacting with jobs it has started.
Before running mpiexec, the runtime environment must be established. In the case of MPD, the daemons must be running. See Section 6.1 for how to run and manage the MPD daemons.
We assume that the MPD ring is up and the installation’s bin directory is in your path; that is, you can do:
    mpdtrace
and it will output a list of nodes on which you can run MPI programs. Now you are ready to run a program with mpiexec. Let us assume that you have compiled and linked the program cpi (in the installdir/examples directory) and that this directory is in your PATH, or that it is your current working directory and ‘.’ (“dot”) is in your PATH. The simplest thing to do is
    mpiexec -n 5 cpi
to run five cpi processes. The process management system (such as MPD) will choose machines to run them on, and cpi will tell you where each is running.
You can use mpiexec to run non-MPI programs as well. This is sometimes useful in making sure all the machines are up and ready for use. Useful examples include
    mpiexec -n 10 hostname
and
    mpiexec -n 10 printenv
5.3.2 Other Command-Line Arguments to mpiexec for MPD
The MPI-2 standard specifies the syntax and semantics of the arguments -n, -path, -wdir, -host, -file, -configfile, and -soft. All of these are currently implemented for MPD’s mpiexec. Each of these is what we call a “local” option, since its scope is the processes in the set of processes described between colons, or on separate lines of the file specified by -configfile. We add some extensions that are local in this way and some that are “global” in the sense that they apply to all the processes being started by the invocation of mpiexec.
The MPI-2 Standard provides a way to pass different arguments to different application processes, but does not provide a way to pass environment variables. MPICH2 provides an extension that supports environment variables. The local parameter -env does this for one set of processes. That is,
    mpiexec -n 1 -env FOO BAR a.out : -n 2 -env BAZZ FAZZ b.out
makes BAR the value of environment variable FOO on the first process, running the executable a.out, and gives the environment variable BAZZ the value FAZZ on the second two processes, running the executable b.out. To set an environment variable without giving it a value, use '' as the value in the above command line.
The global parameter -genv can be used to pass the same environment variables to all processes. That is,
    mpiexec -genv FOO BAR -n 2 a.out : -n 4 b.out
makes BAR the value of the environment variable FOO on all six processes. If -genv appears, it must appear in the first group. If both -genv and -env are used, the -env’s add to the environment specified or added to by the -genv variables. If there is only one set of processes (no “:”), the -genv and -env are equivalent.
The local parameter -envall is an abbreviation for passing the entire environment in which mpiexec is executed. The global version of it is -genvall. This global version is implicitly present. To pass no environment variables, use -envnone and -genvnone. So, for example, to set only the environment variable FOO and no others, regardless of the current environment, you would use
    mpiexec -genvnone -env FOO BAR -n 50 a.out
In the case of MPD, we currently make an exception for the PATH environment variable, which is always passed through. This exception was added to make it unnecessary to explicitly pass this variable in the default case.
A list of environment variable names whose values are to be copied from the current environment can be given with the -envlist (respectively, -genvlist) parameter; for example,
    mpiexec -genvnone -envlist HOME,LD_LIBRARY_PATH -n 50 a.out
sets the HOME and LD_LIBRARY_PATH in the environment of the a.out processes to their values in the environment where mpiexec is being run. In this situation you can’t have commas in the environment variable names, although of course they are permitted in values.
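The effect of -genvnone combined with -env or -envlist can be previewed on a single machine with the standard env utility, which also starts a command with an explicitly constructed environment. The sketch below is only a local analogy and does not involve mpiexec or MPD; the variable FOO and value BAR are the ones used in the examples in this section.

```shell
# Analogous to: mpiexec -genvnone -env FOO BAR -n 1 printenv
# Start printenv with an empty environment plus FOO. PATH is kept as well,
# mirroring the exception MPD makes for PATH.
env -i PATH="$PATH" FOO=BAR printenv FOO

# Analogous to: mpiexec -genvnone -envlist HOME -n 1 printenv
# Copy only HOME (and PATH) from the current environment.
env -i PATH="$PATH" HOME="$HOME" printenv HOME
```

Running printenv without an argument in either command lists everything the started process would see, which is a quick way to check that nothing else leaks through.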
Some extension parameters have only global versions. They are
-l provides rank labels for lines of stdout and stderr. These are a bit obscure for processes that have been explicitly spawned, but are still useful.
-usize sets the “universe size” that is retrieved by the MPI attribute MPI_UNIVERSE_SIZE on MPI_COMM_WORLD.
-bnr is used when one wants to run executables that have been compiled and linked using the ch_p4mpd or myrinet device in MPICH1. The MPD process manager provides backward compatibility in this case.
-machinefile can be used to specify information about each of a set of machines. This information may include the number of processes to run on each host when executing user programs. For example, assume that a machinefile named mf contains:
    # comment line
    hosta
    hostb:2
    hostc ifhn=hostc-gige
    hostd:4 ifhn=hostd-gige
In addition to specifying hosts and number of processes to run on each, this machinefile indicates that processes running on hostc and hostd should use the gige interface on hostc and hostd respectively for MPI communications. (ifhn stands for “interface host name” and should be set to an alternate host name for the machine that is used to designate an alternate communication interface.) This interface information causes the MPI implementation to choose the alternate host name when making connections. When the alternate host name specifies a particular interface, MPICH communication will then travel over that interface.
You might use this machinefile in the following way:
    mpiexec -machinefile mf -n 7 p0
Process rank 0 is to run on hosta, ranks 1 and 2 on hostb, rank 3 on hostc, and ranks 4-6 on hostd. Note that the file specifies information for up to 8 ranks and we only used 7. That is OK. But, if we had used “-n 9”, an error would be raised. The file is not used as a pool of machines that are cycled through; the processes are mapped to the hosts in the order specified in the file.
A more complex command-line example might be:
    mpiexec -l -machinefile mf -n 3 p1 : -n 2 p2 : -n 2 p3
Here, ranks 0-2 all run program p1 and are executed placing rank 0 on hosta and ranks 1-2 on hostb. Similarly, ranks 3-4 run p2 and are executed on hostc and hostd, respectively. Ranks 5-6 run on hostd and execute p3.
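The rank-to-host mapping described for -machinefile can be reproduced with a few lines of awk. This is an illustrative sketch of the rule as stated in the text (hosts are filled in file order, an entry host:n contributes n ranks, and asking for more ranks than the file provides is an error). It is not MPD's own code, and the function name map_ranks is hypothetical.

```shell
# map_ranks MACHINEFILE NPROCS   (hypothetical helper, illustration only)
# Prints "rank R -> host" for each rank, in the order given by the file.
map_ranks() {
    awk -v n="$2" '
        /^[ \t]*#/ || NF == 0 { next }     # skip comment and blank lines
        {
            split($1, parts, ":")          # first field is host[:ncpus]
            slots = (parts[2] == "") ? 1 : parts[2] + 0
            for (i = 0; i < slots && rank < n + 0; i++)
                printf "rank %d -> %s\n", rank++, parts[1]
        }
        END {
            if (rank + 0 < n + 0) {        # file exhausted before all ranks placed
                print "error: machinefile has too few entries" > "/dev/stderr"
                exit 1
            }
        }
    ' "$1"
}

# The machinefile mf from the text, written to a temporary location.
mf=${TMPDIR:-/tmp}/mf.example
cat > "$mf" <<'EOF'
# comment line
hosta
hostb:2
hostc ifhn=hostc-gige
hostd:4 ifhn=hostd-gige
EOF

map_ranks "$mf" 7
```

The ifhn= field is ignored here because it selects a network interface rather than a placement.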
-s can be used to direct the stdin of mpiexec to specific processes in a parallel job. For example:
    mpiexec -s all -n 5 a.out
directs the stdin of mpiexec to all five processes.
    mpiexec -s 4 -n 5 a.out
directs it to just the process with rank 4, and
    mpiexec -s 1,3 -n 5 a.out
sends it to processes 1 and 3, while
    mpiexec -s 0-3 -n 5 a.out
sends stdin to processes 0, 1, 2, and 3.
The default, if -s is not specified, is to send mpiexec’s stdin to process 0 only.
The redirection of stdin through mpiexec to various MPI processes is intended primarily for interactive use. Because of the complexity of buffering large amounts of data at various processes that may not have read it yet, the redirection of large amounts of data to mpiexec’s stdin is discouraged, and may cause unexpected results. That is,
    mpiexec -s all -n 5 a.out < bigfile
should not be used if bigfile is more than a few lines long. Have one of the processes open the file and read it instead. The functions in MPI-IO may be useful for this purpose.
A “:” can optionally be used between global args and normal argument sets, e.g.:
    mpiexec -l -n 1 -host host1 pgm1 : -n 4 -host host2 pgm2
is equivalent to:
    mpiexec -l : -n 1 -host host1 pgm1 : -n 4 -host host2 pgm2
This option implies that the global arguments can occur on a separate line in the file specified by -configfile when it is used to replace a long command line.
5.3.3 Environment Variables Affecting mpiexec for MPD
A small number of environment variables affect the behavior of mpiexec.
MPIEXEC_TIMEOUT The value of this environment variable is the maximum number of seconds this job will be permitted to run. When time is up, the job is aborted.
MPIEXEC_PORT_RANGE If this environment variable is defined then the MPD system will restrict its usage of ports for connecting its various processes to ports in this range. If this variable is not assigned, but MPICH_PORT_RANGE is assigned, then it will use the range specified by MPICH_PORT_RANGE for its ports. Otherwise, it will use whatever ports are assigned to it by the system. Port ranges are given as a pair of integers separated by a colon.
MPIEXEC_BNR If this environment variable is defined (its value, if any, is currently insignificant), then MPD will act in backward-compatibility mode, supporting the BNR interface from the original MPICH (e.g. versions 1.2.0 – 1.2.7p1) instead of its native PMI interface, as a way for application processes to interact with the process management system.
MPD_CON_EXT Adds a string to the default Unix socket name used by mpiexec to find the local mpd. This allows one to run multiple mpd rings at the same time.
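Since a port range is a pair of integers separated by a colon, a job script might check the value before exporting it. The sketch below assumes a POSIX shell. The helper valid_port_range is hypothetical, the numbers 10000:10100 and the 300-second timeout are arbitrary examples, and the requirement that the first number not exceed the second is an assumption of this sketch rather than something the text states.

```shell
# valid_port_range LOW:HIGH   (hypothetical helper, illustration only)
# Succeeds if the argument is two integers separated by one colon,
# with LOW <= HIGH <= 65535 (the ordering check is an assumption).
valid_port_range() {
    case "$1" in
        *[!0-9:]* | *:*:* | :* | *:) return 1 ;;  # bad characters, extra colon, empty side
        *:*) ;;                                   # exactly one colon: keep checking
        *) return 1 ;;                            # no colon at all
    esac
    low=${1%%:*}
    high=${1##*:}
    [ "$low" -le "$high" ] && [ "$high" -le 65535 ]
}

range=10000:10100
if valid_port_range "$range"; then
    export MPIEXEC_PORT_RANGE="$range"
    export MPIEXEC_TIMEOUT=300    # abort the job after 300 seconds
fi
```

Any mpiexec started afterwards from the same shell would inherit both variables.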
5.4 Extensions for SMPD Process Management Environment
SMPD is an alternate process manager that runs on both Unix and Windows. It can launch jobs across both platforms if the binary formats match (big/little endianness and size of C types: int, long, void*, etc).
5.4.1 mpiexec arguments for SMPD
mpiexec for smpd accepts the standard MPI-2 mpiexec options. Execute
    mpiexec
or
    mpiexec -help2
to print the usage options. Typical usage:
    mpiexec -n 10 myapp.exe
All options to mpiexec:
-n x
-np x
    launch x processes
-localonly x
-np x -localonly
    launch x processes on the local machine
-machinefile filename
    use a file to list the names of machines to launch on
-host hostname
    launch on the specified host.
-hosts n host1 host2 ... hostn
-hosts n host1 m1 host2 m2 ... hostn mn
    launch on the specified hosts. In the second version the number of processes = m1 + m2 + ... + mn
-dir drive:\my\working\directory
-wdir /my/working/directory
    launch processes with the specified working directory. (-dir and -wdir are equivalent)
-env var val
    set environment variable before launching the processes
-exitcodes
    print the process exit codes when each process exits.
-noprompt
    prevent mpiexec from prompting for user credentials. Instead errors will be printed and mpiexec will exit.
-localroot
    launch the root process directly from mpiexec if the host is local. (This allows the root process to create windows and be debugged.)
-port port
-p port
    specify the port that smpd is listening on.
-phrase passphrase
    specify the passphrase to authenticate connections to smpd with.
-smpdfile filename
    specify the file where the smpd options are stored including the passphrase. (unix only option)
-path search path
    search path for executable, ; separated
-timeout seconds
    timeout for the job.
Windows specific options:
-map drive:\\host\share
    map a drive on all the nodes; this mapping will be removed when the processes exit
-logon
    prompt for user account and password
-pwdfile filename
    read the account and password from the file specified.
    put the account on the first line and the password on the second
-nopopup_debug
    disable the system popup dialog if the process crashes
-priority class[:level]
    set the process startup priority class and optionally level.
    class = 0,1,2,3,4 = idle, below, normal, above, high
    level = 0,1,2,3,4,5 = idle, lowest, below, normal, above, highest
    the default is -priority 2:3
-register
    encrypt a user name and password to the Windows registry.
-remove
    delete the encrypted credentials from the Windows registry.
-validate [-host hostname]
    validate the encrypted credentials for the current or specified host.
-delegate
    use passwordless delegation to launch processes.
-impersonate
    use passwordless authentication to launch processes.
-plaintext
    don’t encrypt the data on the wire.
5.5 Extensions for the gforker Process Management Environment
gforker is a process management system for starting processes on a single machine, so called because the MPI processes are simply forked from the mpiexec process. This process manager supports programs that use MPI_Comm_spawn and the other dynamic process routines, but does not support the use of the dynamic process routines from programs that are not started with mpiexec. The gforker process manager is primarily intended as a debugging aid as it simplifies development and testing of MPI programs on a single node or processor.
5.5.1 mpiexec arguments for gforker
In addition to the standard mpiexec command-line arguments, the gforker mpiexec supports the following options:
-np
-env ... the processes being run by mpiexec.
-envnone Pass no environment variables (other than ones specified with other -env or -genv arguments) to the processes being run by mpiexec. By default, all environment variables are provided to each MPI process (rationale: principle of least surprise for the user)
-envlist ... by commas), with their current values, to the processes being run by mpiexec.
-genv -genv options have the same meaning as their corresponding -env version, except they apply to all executables, not just the current executable (in the case that the colon syntax is used to specify multiple executables).
-genvnone Like -envnone, but for all executables
-genvlist
-usize
-l Label standard out and standard error (stdout and stderr) with the rank of the process
-maxtime
-exitinfo Provide more information on the reason each process exited if there is an abnormal exit
In addition to the command line arguments, the gforker mpiexec recognizes a number of environment variables that can be used to control the behavior of mpiexec:
MPIEXEC_TIMEOUT Maximum running time in seconds. mpiexec will terminate MPI programs that take longer than the value specified by MPIEXEC_TIMEOUT.
MPIEXEC_UNIVERSE_SIZE Set the universe size
MPIEXEC_PORTRANGE Set the range of ports that mpiexec will use in communicating with the processes that it starts. The format of this is ... used if MPIEXEC_PORTRANGE is not set.
MPIEXEC_PREFIX_DEFAULT If this environment variable is set, output to standard output is prefixed by the rank in MPI_COMM_WORLD of the process and output to standard error is prefixed by the rank and the text (err); both are followed by an angle bracket (>). If this variable is not set, there is no prefix.
MPIEXEC_PREFIX_STDOUT Set the prefix used for lines sent to standard output. A %d is replaced with the rank in MPI_COMM_WORLD; a %w is replaced with an indication of which MPI_COMM_WORLD in MPI jobs that involve multiple MPI_COMM_WORLDs (e.g., ones that use MPI_Comm_spawn or MPI_Comm_connect).
MPIEXEC_PREFIX_STDERR Like MPIEXEC_PREFIX_STDOUT, but for standard error.
MPIEXEC_STDOUTBUF Sets the buffering mode for standard output. Valid values are NONE (no buffering), LINE (buffering by lines), and BLOCK (buffering by blocks of characters; the size of the block is implementation defined). The default is NONE.
MPIEXEC_STDERRBUF Like MPIEXEC_STDOUTBUF, but for standard error.
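The prefix templates described for the gforker mpiexec can be tried out without MPI by emulating the substitution in the shell. The sketch below is an emulation for illustration only: expand_prefix and label_lines are hypothetical helpers, and because the text does not say how the %w indication is formatted, a plain number is assumed here.

```shell
# expand_prefix TEMPLATE RANK WORLD   (hypothetical helper)
# Replaces %d with RANK and %w with WORLD in TEMPLATE.
expand_prefix() {
    printf '%s' "$1" | sed -e "s/%d/$2/g" -e "s/%w/$3/g"
}

# label_lines TEMPLATE RANK WORLD   (hypothetical helper)
# Prefixes every line read from stdin with the expanded template.
label_lines() {
    prefix=$(expand_prefix "$1" "$2" "$3")
    while IFS= read -r line; do
        printf '%s%s\n' "$prefix" "$line"
    done
}

# Emulate what rank 3 of the first MPI_COMM_WORLD might print.
echo "hello" | label_lines "[%w:%d] " 3 0

# With a real gforker mpiexec the template would be supplied through the
# environment instead, for example:
#   MPIEXEC_PREFIX_STDOUT="[%w:%d] " mpiexec -n 4 ./a.out
```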
5.6 Restrictions of the remshell Process Management Environment
The remshell “process manager” provides a very simple version of mpiexec that makes use of the secure shell command (ssh) to start processes on a collection of machines. As this is intended primarily as an illustration of how to build a version of mpiexec that works with other process managers, it does not implement all of the features of the other mpiexec programs described in this document. In particular, it ignores the command line options that control the environment variables given to the MPI programs. It does support the same output labeling features provided by the gforker version of mpiexec. However, this version of mpiexec can be used much like the mpirun for the ch_p4 device in MPICH-1 to run programs on a collection of machines that allow remote shells. A file by the name of machines should contain the names of machines on which processes can be run, one machine name per line. There must be enough machines listed to satisfy the requested number of processes; you can list the same machine name multiple times if necessary.
For more complex needs or for faster startup, we recommend the use of the mpd process manager.
5.7 Using MPICH2 with SLURM and PBS
MPICH2 can be used in both SLURM and PBS environments. If configured with SLURM, use the srun job launching utility provided by SLURM. For PBS, MPICH2 jobs can be launched in two ways: (i) using MPD or (ii) using the OSC mpiexec.
5.7.1 MPD in the PBS environment
PBS specifies the machines allocated to a particular job in the file $PBS_NODEFILE. But the format used by PBS is different from that of MPD. Specifically, PBS lists each node on a single line; if a node (n0) has two processors, it is listed twice. MPD on the other hand uses an identifier (ncpus) to describe how many processors a node has. So, if n0 has two processors, it is listed as n0:2.
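To make the difference between the two formats concrete, the sketch below builds a small PBS-style node list and prints the equivalent MPD-style entries by counting how often each node appears. The node names n0 and n1, the file name, and the helper pbs_to_mpd are made up for this illustration.

```shell
# A PBS-style node file: one line per processor, so n0 appears twice.
pbs_nodes=${TMPDIR:-/tmp}/pbs_nodes.example
printf '%s\n' n0 n0 n1 > "$pbs_nodes"

# pbs_to_mpd FILE   (hypothetical helper)
# Counts the occurrences of each node and prints host:ncpus lines,
# sorted by host name so the output order is stable.
pbs_to_mpd() {
    awk '{ count[$1]++ } END { for (h in count) print h ":" count[h] }' "$1" | sort
}

pbs_to_mpd "$pbs_nodes"
```

For this input the helper prints one line for n0 with a count of 2 and one line for n1 with a count of 1.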
One way to convert the node file to the MPD format is as follows:
    sort $PBS_NODEFILE | uniq -c | awk '{ printf("%s:%s\n", $2, $1); }' > mpd.nodes
Once the PBS node file is converted, MPD can be started within the PBS job script using mpdboot and torn down using mpdallexit.
    mpdboot -f mpd.nodes -n [NUM NODES REQUESTED]
    mpiexec -n [NUM PROCESSES] ./mytestprogram
    mpdallexit
5.7.2 OSC mpiexec
Pete Wyckoff from the Ohio Supercomputer Center provides an alternate utility called OSC mpiexec to launch MPICH2 jobs on PBS systems without using MPD. More information about this can be found here: http://www.osc.edu/pw/mpiexec
6 Managing the Process Management Environment
Some of the process managers supply user commands that can be used to interact with the process manager and to control jobs. In this section we describe user commands that may be useful.
6.1 MPD
mpd starts an mpd daemon.
mpdboot starts a set of mpd’s on a list of machines.
mpdtrace lists all the MPD daemons that are running. The -l option lists full hostnames and the port where the mpd is listening.
mpdlistjobs lists the jobs that the mpd’s are running. Jobs are identified by the name of the mpd where they were submitted and a number.
mpdkilljob kills a job specified by the name returned by mpdlistjobs.
mpdsigjob delivers a signal to the named job. Signals are specified by name or number.
You can use keystrokes to provide signals in the usual way, where mpiexec stands in for the entire parallel application. That is, if mpiexec is being run in a Unix shell in the foreground, you can use ^C (control-C) to send a SIGINT to the processes, or ^Z (control-Z) to suspend all of them. A suspended job can be continued in the usual way.
Precise argument formats can be obtained by passing any MPD command the --help or -h argument. More details can be found in the README in the mpich2 top-level directory or the README file in the MPD directory mpich2/src/pm/mpd.
7 Debugging
Debugging parallel programs is notoriously difficult. Here we describe a number of approaches, some of which depend on the exact version of MPICH2 you are using.
7.1 gdb via mpiexec
If you are using the MPD process manager, you can use the -gdb argument to mpiexec to execute a program with each process running under the control of the gdb sequential debugger. The -gdb option helps control the multiple instances of gdb by sending stdin either to all processes or to a selected process and by labeling and merging output. The current implementation has some minor limitations. For example, we do not support setting your own prompt. This is because we capture the gdb output and examine it before processing it, e.g. merging identical lines. Also, we set a breakpoint at the beginning of main to get all processes synchronized at the beginning. Thus, the user will have a duplicate, unusable breakpoint if one is set at the very first executable line of main. Otherwise, to the extent possible, we try to simply pass user input through to gdb and let things progress normally.
The following script of a -gdb session gives an idea of how this works. Input keystrokes are sent to all processes unless specifically directed by the “z” command.
    ksl2% mpiexec -gdb -n 10 cpi
    0-9:  (gdb) l
    0-9:  5       double f(double);
    0-9:  6
    0-9:  7       double f(double a)
    0-9:  8       {
    0-9:  9           return (4.0 / (1.0 + a*a));
    0-9:  10      }
    0-9:  11
    0-9:  12      int main(int argc,char *argv[])
    0-9:  13      {
    0-9:  14          int done = 0, n, myid, numprocs, i;
    0-9:  (gdb)
    0-9:  15          double PI25DT = 3.141592653589793238462643;
    0-9:  16          double mypi, pi, h, sum, x;
    0-9:  17          double startwtime = 0.0, endwtime;
    0-9:  18          int namelen;
    0-9:  19          char processor_name[MPI_MAX_PROCESSOR_NAME];
    0-9:  20
    0-9:  21          MPI_Init(&argc,&argv);
    0-9:  22          MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    0-9:  23          MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    0-9:  24          MPI_Get_processor_name(processor_name,&namelen);
    0-9:  (gdb)
    0-9:  25
    0-9:  26          fprintf(stdout,"Process %d of %d is on %s\n",
    0-9:  27                  myid, numprocs, processor_name);
    0-9:  28          fflush(stdout);
    0-9:  29
    0-9:  30          n = 10000;      /* default # of rectangles */
    0-9:  31          if (myid == 0)
    0-9:  32              startwtime = MPI_Wtime();
    0-9:  33
    0-9:  34          MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    0-9:  (gdb) b 30
    0-9:  Breakpoint 2 at 0x4000000000002541: file /home/lusk/mpich2/examples/cpi.c, line 30.
    0-9:  (gdb) r
    0-9:  Continuing.
    0:  Process 0 of 10 is on ksl2
    1:  Process 1 of 10 is on ksl2
    2:  Process 2 of 10 is on ksl2
    3:  Process 3 of 10 is on ksl2
    4:  Process 4 of 10 is on ksl2
    5:  Process 5 of 10 is on ksl2
    6:  Process 6 of 10 is on ksl2
    7:  Process 7 of 10 is on ksl2
    8:  Process 8 of 10 is on ksl2
    9:  Process 9 of 10 is on ksl2
    0-9:
    0-9:  Breakpoint 2, main (argc=1, argv=0x60000fffffffb4b8)
    0-9:      at /home/lusk/mpich2/examples/cpi.c:30
    0-9:  30          n = 10000;      /* default # of rectangles */
    0-9:  (gdb) n
    0-9:  31          if (myid == 0)
    0-9:  (gdb) n
    0:  32              startwtime = MPI_Wtime();
    1-9:  34          MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    0-9:  (gdb) z 0
    0:  (gdb) n
    0:  34          MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    0:  (gdb) z
    0-9:  (gdb) where
    0-9:  #0  main (argc=1, argv=0x60000fffffffb4b8)
    0-9:      at /home/lusk/mpich2/examples/cpi.c:34
    0-9:  (gdb) n
    0-9:  36          h = 1.0 / (double) n;
    0-9:  (gdb)
    0-9:  37          sum = 0.0;
    0-9:  (gdb)
    0-9:  39          for (i = myid + 1; i <= n; i += numprocs)
    0-9:  (gdb)
    0-9:  41              x = h * ((double)i - 0.5);
    0-9:  (gdb)
    0-9:  42              sum += f(x);
    0-9:  (gdb)
    0-9:  39          for (i = myid + 1; i <= n; i += numprocs)
    0-9:  (gdb)
    0-9:  41              x = h * ((double)i - 0.5);
    0-9:  (gdb)
    0-9:  42              sum += f(x);
    0-9:  (gdb)
    0-9:  39          for (i = myid + 1; i <= n; i += numprocs)
    0-9:  (gdb)
    0-9:  41              x = h * ((double)i - 0.5);
    0-9:  (gdb)
    0-9:  42              sum += f(x);
    0-9:  (gdb)
    0-9:  39          for (i = myid + 1; i <= n; i += numprocs)
    0-9:  (gdb)
    0-9:  41              x = h * ((double)i - 0.5);
    0-9:  (gdb)
    0-9:  42              sum += f(x);
    0-9:  (gdb)
    0-9:  39          for (i = myid + 1; i <= n; i += numprocs)
    0-9:  (gdb)
    0-9:  41              x = h * ((double)i - 0.5);
    0-9:  (gdb)
    0-9:  42              sum += f(x);
    0-9:  (gdb)
    0-9:  39          for (i = myid + 1; i <= n; i += numprocs)
    0-9:  (gdb)
    0-9:  41              x = h * ((double)i - 0.5);
    0-9:  (gdb)
    0-9:  42              sum += f(x);
    0-9:  (gdb) p sum
    0:  $1 = 19.999875951497799
    1:  $1 = 19.999867551672725
    2:  $1 = 19.999858751863549
    3:  $1 = 19.999849552071328
    4:  $1 = 19.999839952297158
    5:  $1 = 19.999829952542203
    6:  $1 = 19.999819552807658
    7:  $1 = 19.999808753094769
    8:  $1 = 19.999797553404832
    9:  $1 = 19.999785953739192
    0-9:  (gdb) c
    0-9:  Continuing.
    0:  pi is approximately 3.1415926544231256, Error is 0.0000000008333325
    1-9:
    1-9:  Program exited normally.
    1-9: (gdb) 0: wall clock time = 44.909412
    0:
    0: Program exited normally.
    0: (gdb) q
    0-9: MPIGDB ENDING
    ksl2%

You can attach to a running job with

    mpiexec -gdba

where

7.2 TotalView

MPICH2 supports use of the TotalView debugger from Etnus, through the MPD process manager only. If MPICH2 has been configured to enable debugging with TotalView (see the section on configuration of the MPD process manager in the Installer's Guide), then one can debug an MPI program started with MPD by adding -tv to the global mpiexec arguments, as in

    mpiexec -tv -n 3 cpi

You will get a popup window from TotalView asking whether you want to start the job in a stopped state. If so, when the TotalView window appears, you may see assembly code in the source window. Click on main in the stack window (upper left) to see the source of the main function. TotalView will show that the program (all processes) is stopped in the call to MPI_Init.

When debugging with TotalView using the above startup sequence, MPICH2 jobs cannot be restarted without exiting TotalView. In MPICH2 version 1.0.6 or later, TotalView can be invoked on an MPICH2 job as follows:

    totalview python -a `which mpiexec` -tvsu \

If you have MPICH2 version 1.0.6 or later and TotalView 8.1.0 or later, you can use a TotalView feature called indirect launch with MPICH2. Invoke TotalView as:

    totalview

Then select the Process/Startup Parameters command. Choose the Parallel tab in the resulting dialog box and choose MPICH2 as the parallel system. Then set the number of tasks using the Tasks field and enter other needed mpiexec arguments into the Additional Starter Arguments field.

If you want to be able to attach to a running MPICH2 job using TotalView, you must use the -tvsu option to mpiexec when starting the job. Using this option will add a barrier inside MPI_Init and hence may affect startup performance slightly. It will have no effect on the running of the job once all tasks have returned from MPI_Init. In order to debug a running MPICH2 job, you must attach TotalView to the instance of Python that is running the mpiexec script. If you have just one task running on the node where you invoked mpiexec, and no other Python scripts running, there will be three instances of Python running on the node. One of these is the parent of the MPICH2 task on that node, and one is the parent of that Python process. Neither of those is the instance of Python you want to attach to; they are both running the MPD script. The third instance of Python has no children and is not the child of a Python process. That is the one that is running mpiexec and is the one you want to attach to.

8 MPE

MPICH2 comes with the same MPE (Multi-Processing Environment) tools that are included with MPICH1. These include several trace libraries for recording the execution of MPI programs, the Jumpshot and SLOG tools for performance visualization, and an MPI collective and datatype checking library. The MPE tools are built and installed by default and should be available without requiring any additional steps. The easiest way to use the MPE profiling libraries is through the -mpe= switch provided by MPICH2's compiler wrappers, mpicc, mpicxx, mpif77, and mpif90.

8.1 MPI Logging

MPE provides automatic MPI logging. For instance, to view the MPI communication pattern of a program, fpilog.f, one can simply link the source file as follows:

    mpif90 -mpe=mpilog -o fpilog fpilog.f

The -mpe=mpilog option will link with the appropriate MPE profiling libraries. Running the program through mpiexec then produces a log file, Unknown.clog2, in the working directory. The final step is to convert and view the log file through Jumpshot:

    jumpshot Unknown.clog2

8.2 User-defined logging

In addition to using the predefined MPE logging to log MPI calls, MPE logging calls can be inserted into a user's MPI program to define and log states. These states are called User-Defined states. States may be nested, allowing one to define a state describing a user routine that contains several MPI calls, and display both the user-defined state and the MPI operations contained within it.

The typical way to insert user-defined states is as follows:

• Get handles from the MPE logging library: MPE_Log_get_state_eventIDs() has to be used to get unique event IDs (MPE logging handles). (1) This is important if you are writing a library that uses the MPE logging routines from the MPE system. Hardwiring the event IDs is considered a bad idea since it may cause event ID conflicts, and so the practice isn't supported.

• Set the logged state's characteristics: MPE_Describe_state() sets the name and color of the states.

• Log the events of the logged states: MPE_Log_event() is called twice to log the user-defined states.

(1) Older MPE libraries provide MPE_Log_get_event_number(), which is still supported but has been deprecated. Users are strongly urged to use MPE_Log_get_state_eventIDs() instead.

Below is a simple example that uses the 3 steps outlined above.

    int eventID_begin, eventID_end;
    ...
    MPE_Log_get_state_eventIDs( &eventID_begin, &eventID_end );
    ...
    MPE_Describe_state( eventID_begin, eventID_end,
                        "Multiplication", "red" );
    ...
    MyAmult( Matrix m, Vector v )
    {
        /* Log the start event of the red "Multiplication" state */
        MPE_Log_event( eventID_begin, 0, NULL );
        ... Amult code, including MPI calls ...
        /* Log the end event of the red "Multiplication" state */
        MPE_Log_event( eventID_end, 0, NULL );
    }

The log file generated by this code will have the MPI routines nested within the routine MyAmult().

Besides user-defined states, MPE2 also provides support for user-defined events, which can be defined through use of MPE_Log_get_solo_eventID() and MPE_Describe_event(). For more details, see, e.g., cpilog.c.

8.3 MPI Checking

To validate all the MPI collective calls in a program, link the source file as follows:

    mpif90 -mpe=mpicheck -o wrong_reals wrong_reals.f

Running the program produces the following output:

    > mpiexec -n 4 wrong_reals
    Starting MPI Collective and Datatype Checking!
    Process 3 of 4 is alive
    Backtrace of the callstack at rank 3:
        At [0]: wrong_reals(CollChk_err_han+0xb9)[0x8055a09]
        At [1]: wrong_reals(CollChk_dtype_scatter+0xbf)[0x8057bff]
        At [2]: wrong_reals(CollChk_dtype_bcast+0x3d)[0x8057ccd]
        At [3]: wrong_reals(MPI_Bcast+0x6c)[0x80554bc]
        At [4]: wrong_reals(mpi_bcast_+0x35)[0x80529b5]
        At [5]: wrong_reals(MAIN__+0x17b)[0x805264f]
        At [6]: wrong_reals(main+0x27)[0x80dd187]
        At [7]: /lib/libc.so.6(__libc_start_main+0xdc)[0x9a34e4]
        At [8]: wrong_reals[0x8052451]
    [cli_3]: aborting job:
    Fatal error in MPI_Comm_call_errhandler:

    Collective Checking: BCAST (Rank 3) --> Inconsistent datatype signatures
    detected between rank 3 and rank 0.

The error message here shows that MPI_Bcast has been used with inconsistent datatypes in the program wrong_reals.f.

8.4 MPE options

Other MPE profiling options that are available through the MPICH2 compiler wrappers are

    -mpe=mpilog   : Automatic MPI and MPE user-defined states logging.
                    This links against -llmpe -lmpe.
    -mpe=mpitrace : Trace MPI program with printf.
                    This links against -ltmpe.
    -mpe=mpianim  : Animate MPI program in real-time.
                    This links against -lampe -lmpe.
    -mpe=mpicheck : Check MPI Program with the Collective & Datatype
                    Checking library. This links against -lmpe_collchk.
    -mpe=graphics : Use MPE graphics routines with X11 library.
                    This links against -lmpe
    -mpe=log
    -mpe=nolog    : Nullify MPE user-defined states logging.
                    This links against -lmpe_null.
    -mpe=help     : Print the help page.

For more details of how to use the MPE profiling tools, see mpich2/src/mpe2/README.

9 Other Tools Provided with MPICH2

MPICH2 also includes a test suite for MPI-1 and MPI-2 functionality; this suite may be found in the mpich2/test/mpi source directory and can be run with the command make testing. This test suite should work with any MPI implementation, not just MPICH2.

10 MPICH2 under Windows

10.1 Directories

The default installation of MPICH2 is in C:\Program Files\MPICH2. Under the installation directory are three sub-directories: include, bin, and lib. The include and lib directories contain the header files and libraries necessary to compile MPI applications. The bin directory contains the process manager, smpd.exe, and the MPI job launcher, mpiexec.exe. The dlls that implement MPICH2 are copied to the Windows system32 directory.

10.2 Compiling

The libraries in the lib directory were compiled with MS Visual C++ .NET 2003 and Intel Fortran 8.1. These compilers and any others that can link with the MS .lib files can be used to create user applications. gcc and g77 for cygwin can be used with the libmpich*.a libraries.

For MS Developer Studio users: Create a project and add

    C:\Program Files\MPICH2\include

to the include path and

    C:\Program Files\MPICH2\lib

to the library path. Add mpi.lib and cxx.lib to the link command. Add cxxd.lib to the Debug target link instead of cxx.lib.

Intel Fortran 8 users should add fmpich2.lib to the link command. Cygwin users should use libmpich2.a libfmpich2g.a.

10.3 Running

MPI jobs are run from a command prompt using mpiexec.exe. See Section 5.4 on mpiexec for smpd for a description of the options to mpiexec.

A Frequently Asked Questions

This is the content of the online FAQ, as of June 23, 2006.

A.1 General Information

A.1.1 Q: What is MPICH2?

MPICH2 is a freely available, portable implementation of MPI, the Standard for message-passing libraries. It implements both MPI-1 and MPI-2.

A.1.2 Q: What does MPICH stand for?

A: MPI stands for Message Passing Interface. The CH comes from Chameleon, the portability layer used in the original MPICH to provide portability to the existing message-passing systems.

A.1.3 Q: Can MPI be used to program multicore systems?

A: There are two common ways to use MPI with multicore processors or multiprocessor nodes:

• Use one MPI process per core (here, a core is defined as a program counter and some set of arithmetic, logic, and load/store units).

• Use one MPI process per node (here, a node is defined as a collection of cores that share a single address space). Use threads or compiler-provided parallelism to exploit the multiple cores. OpenMP may be used with MPI; the loop-level parallelism of OpenMP may be used with any implementation of MPI (you do not need an MPI that supports MPI_THREAD_MULTIPLE when threads are used only for computational tasks). This is sometimes called the hybrid programming model.

A.2 Building MPICH2

A.2.1 Q: What is the difference between the MPD and SMPD process managers?

MPD is the default process manager for MPICH2 on Unix platforms. It is written in Python. SMPD is the primary process manager for MPICH2 on Windows. It is also used for running on a combination of Windows and Linux machines. It is written in C.

A.2.2 Q: Do I have to configure/make/install MPICH2 each time for each compiler I use?

No, in many cases you can build MPICH2 using one set of compilers and then use the libraries (and compilation scripts) with other compilers. However, this depends on the compilers producing compatible object files. Specifically, the compilers must

• Support the same basic datatypes with the same sizes. For example, the C compilers should use the same sizes for long long and long double.

• Map the names of routines in the source code to names in the object files in the same way. This can be a problem for Fortran and C++ compilers, though you can often force the Fortran compilers to use the same name mapping. More specifically, most Fortran compilers map names in the source code into all lower-case with one or two underscores appended to the name. To use the same MPICH2 library with all Fortran compilers, those compilers must make the same name mapping. There is one exception to this that is described below.

• Perform the same layout for C structures. The C language does not specify how structures are laid out in memory. For 100% compatibility, all compilers must follow the same rules. However, if you do not use any of the MPI_MINLOC or MPI_MAXLOC datatypes, and you do not rely on the MPICH2 library to set the extent of a type created with MPI_Type_struct or MPI_Type_create_struct, you can often ignore this requirement.

• Require the same additional runtime libraries. Not all compilers will implement the same version of Unix, and some routines that MPICH2 uses may be present in only some of the runtime libraries associated with specific compilers.

The above may seem like a stringent set of requirements, but in practice, many systems and compiler sets meet these needs, if for no other reason than that any software built with multiple libraries will have requirements similar to those of MPICH2 for compatibility.

If your compilers are completely compatible, down to the runtime libraries, you may use the compilation scripts (mpicc etc.) by either specifying the compiler on the command line, e.g.

    mpicc -cc=icc -c foo.c

or with the environment variables MPICH_CC etc. (this example assumes c-shell syntax):

    setenv MPICH_CC icc
    mpicc -c foo.c

If the compiler is compatible except for the runtime libraries, then this same format works as long as a configuration file that describes the necessary runtime libraries is created and placed into the appropriate directory (the "sysconfdir" directory in configure terms). See the installation manual for more details.
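
The first and third requirements above (matching type sizes and C structure layout) can be spot-checked by building a small probe with each compiler and comparing the results. The following is a minimal sketch and is not part of MPICH2; the names abi_probe_pair and write_abi_summary are hypothetical, and the double/int pair is chosen only because it resembles the value/index pairs used with MPI_MINLOC and MPI_MAXLOC.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical probe type: a value/index pair similar in shape to the
 * pairs reduced with MPI_MINLOC and MPI_MAXLOC. */
struct abi_probe_pair {
    double value;
    int    index;
};

/* Write a summary of type sizes and structure layout into buf.
 * Returns the snprintf result, i.e. the length the summary needs. */
int write_abi_summary(char *buf, size_t len)
{
    return snprintf(buf, len,
                    "sizeof(long long)   = %zu\n"
                    "sizeof(long double) = %zu\n"
                    "sizeof(pair)        = %zu\n"
                    "offsetof(index)     = %zu\n",
                    sizeof(long long),
                    sizeof(long double),
                    sizeof(struct abi_probe_pair),
                    offsetof(struct abi_probe_pair, index));
}
```

Printing the summary from a small main built once per compiler and comparing the outputs gives a quick indication of whether two compilers agree on these points. It is not exhaustive: it says nothing about name mapping or runtime libraries.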

In some cases, MPICH2 is able to build the Fortran interfaces in a way that supports multiple mappings of names from the Fortran source code to the object file. This is done by using the "multiple weak symbol" support in some environments. For example, when using gcc under Linux, this is the default.

A.2.3 Q: How do I configure to use the Absoft Fortran compilers?

A: You have several options. One is to use the Fortran 90 compiler for both F77 and F90. Another (if you do not need Fortran 90) is to use --disable-f90 when configuring. The options with which we test MPICH2 and the Absoft compilers are the following:

    setenv FFLAGS "-f -B108"
    setenv F90FLAGS "-YALL_NAMES=LCS -B108"
    setenv F77 f77
    setenv F90 f90

A.2.4 Q: When I configure MPICH2, I get a message about FD_ZERO and the configure aborts

A: FD_ZERO is part of the support for the select calls (see "man select" or "man 2 select" on Linux and many other Unix systems). What this means is that your system (probably a Mac) has a broken version of the select call and related data types. This is an OS bug; the only repair is to update the OS to get past this bug. This test was added specifically to detect this error; if there was an easy way to work around it, we would have included it (we don't just implement FD_ZERO ourselves because we don't know what else is broken in this implementation of select).
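
For readers unfamiliar with FD_ZERO, it is one of the standard macros used to prepare descriptor sets for select. The sketch below exercises FD_ZERO, FD_SET, FD_CLR, and FD_ISSET on a POSIX system. It is an illustration only and is not the test that MPICH2's configure runs; the function name fd_set_macros_work is hypothetical.

```c
#include <sys/select.h>

/* Returns 1 if the fd_set macros behave as POSIX describes for
 * descriptor 0, and 0 otherwise. */
int fd_set_macros_work(void)
{
    fd_set readfds;

    FD_ZERO(&readfds);              /* start from an empty set */
    if (FD_ISSET(0, &readfds))
        return 0;                   /* an empty set must not contain 0 */

    FD_SET(0, &readfds);            /* add descriptor 0 (stdin) */
    if (!FD_ISSET(0, &readfds))
        return 0;

    FD_CLR(0, &readfds);            /* remove it again */
    return FD_ISSET(0, &readfds) ? 0 : 1;
}
```

If a probe like this fails to compile with one compiler on the system, that points at the system headers in use rather than at MPICH2.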

If this configure works with gcc but not with xlc, then the problem is with the include files that xlc is using; since this is an OS call (even if emulated), all compilers should be using consistent if not identical include files. In this case, you may need to update xlc.

A.2.5 Q: When I use the g95 Fortran compiler on a 64-bit platform, some of the tests fail

A: The g95 compiler incorrectly defines the default Fortran integer as a 64-bit integer while defining Fortran reals as 32-bit values (the Fortran standard requires that INTEGER and REAL be the same size). This was apparently done to allow a Fortran INTEGER to hold the value of a pointer, rather than requiring the programmer to select an INTEGER of a suitable KIND. To force the g95 compiler to correctly implement the Fortran standard, use the -i4 flag. For example, set the environment variable F90FLAGS before configuring MPICH2:

    setenv F90FLAGS "-i4"

G95 users should note that there are (at this writing) two distributions of g95 for 64-bit Linux platforms. One uses 32-bit integers and reals (and conforms to the Fortran standard) and one uses 32-bit integers and 64-bit reals. We recommend using the one that conforms to the standard (note that the standard specifies the ratio of sizes, not the absolute sizes, so a Fortran 95 compiler that used 64 bits for both INTEGER and REAL would also conform to the Fortran standard. However, such a compiler would need to use 128 bits for DOUBLE PRECISION quantities).

A.2.6 Q: When I run make, it fails immediately with many errors beginning with

    sock.c:8:24: mpidu_sock.h: No such file or directory
    In file included from sock.c:9:
    ../../../../include/mpiimpl.h:91:21: mpidpre.h: No such file or directory
    In file included from sock.c:9:
    ../../../../include/mpiimpl.h:1150: error: syntax error before "MPID_VCRT"
    ../../../../include/mpiimpl.h:1150: warning: no semicolon at end of struct or union

Check if you have set the environment variable CPPFLAGS. If so, unset it and use CXXFLAGS instead. Then rerun configure and make.

A.2.7 Q: When building the ssm or sshm channel, I get the error "mpidu_process_locks.h:234:2: error: #error *** No atomic memory operation specified to implement busy locks ***"

The ssm and sshm channels do not work on all platforms because they use special interprocess locks (often assembly) that may not work with some compilers or machine architectures. They work on Linux with gcc, Intel, and Pathscale compilers on various Intel architectures. They also work in Windows and Solaris environments.

A.2.8 Q: When using the Intel Fortran 90 compiler (version 9), the make fails with errors in compiling statements that reference MPI_ADDRESS_KIND.

Check the output of the configure step. If configure claims that ifort is a cross compiler, the likely problem is that programs compiled and linked with ifort cannot be run because of a missing shared library. Try to compile and run the following program (named conftest.f90):

    program conftest
    integer, dimension(10) :: n
    end

If this program fails to run, then the problem is that your installation of ifort either has an error or you need to add additional values to your environment variables (such as LD_LIBRARY_PATH). Check your installation documentation for the ifort compiler. See http://softwareforums.intel.com/ISN/Community/en-US/search/SearchResults.aspx?q=libimf.so for an example of problems of this kind that users are having with version 9 of ifort.

If you do not need Fortran 90, you can configure with --disable-f90.

A.2.9 Q: The build fails when I use parallel make

Parallel make (often invoked with make -j4) will cause several job steps in the build process to update the same library file (libmpich.a) concurrently. Unfortunately, neither the ar nor the ranlib programs correctly handle this case, and the result is a corrupted library. For now, the solution is to not use a parallel make when building MPICH2.

A.3 Windows version of MPICH2

A.3.1 I am having trouble installing and using the Windows version of MPICH2

See the tips for installing and running MPICH2 on Windows provided by a user, Brent Paul. Or see the MPICH2 Windows Development Guide.

A.4 Compiling MPI Programs

A.4.1 C++ and SEEK_SET

Some users may get error messages such as

    SEEK_SET is #defined but must not be for the C++ binding of MPI

The problem is that both stdio.h and the MPI C++ interface use SEEK_SET, SEEK_CUR, and SEEK_END. This is really a bug in the MPI-2 standard. You can try adding

    #undef SEEK_SET
    #undef SEEK_END
    #undef SEEK_CUR

before mpi.h is included, or add the definition

    -DMPICH_IGNORE_CXX_SEEK

to the command line (this will cause the MPI versions of SEEK_SET etc. to be skipped).

A.4.2 C++ and Errors in Nullcomm::Clone

Some users, particularly with older C++ compilers, may see error messages of the form

    "error C2555: 'MPI::Nullcomm::Clone' : overriding virtual function
    differs from 'MPI::Comm::Clone' only by return type or calling
    convention".

This is caused by the compiler not implementing part of the C++ standard. To work around this problem, add the definition

    -DHAVE_NO_VARIABLE_RETURN_TYPE_SUPPORT

to the CXXFLAGS variable or add a

    #define HAVE_NO_VARIABLE_RETURN_TYPE_SUPPORT 1

before including mpi.h.

A.5 Running MPI Programs

A.5.1 Q: How do I pass environment variables to the processes of my parallel program

A: The specific method depends on the process manager and version of mpiexec that you are using. See the appropriate specific section.

A.5.2 Q: How do I pass environment variables to the processes of my parallel program when using the mpd process manager?

A: By default, all the environment variables in the shell where mpiexec is run are passed to all processes of the application program. (The one exception is LD_LIBRARY_PATH when the mpd's are being run as root.) This default can be overridden in many ways, and individual environment variables can be passed to specific processes using arguments to mpiexec. A synopsis of the possible arguments can be listed by typing

    mpiexec -help

and further details are available in the Users Guide.

A.5.3 Q: What determines the hosts on which my MPI processes run?

A: Where processes run, whether by default or by specifying them yourself, depends on the process manager being used.

If you are using the gforker process manager, then all MPI processes run on the same host where you are running mpiexec.

If you are using the mpd process manager, which is the default, then many options are available. If you are using mpd, then before you run mpiexec, you will have started, or will have had started for you, a ring of processes called mpd's (multi-purpose daemons), each running on its own host. It is likely, but not necessary, that each mpd will be running on a separate host. You can find out what this ring of hosts consists of by running the program mpdtrace. One of the mpd's will be running on the "local" machine, the one where you will run mpiexec. The default placement of MPI processes, if one runs

    mpiexec -n 10 a.out

is to start the first MPI process (rank 0) on the local machine and then to distribute the rest around the mpd ring one at a time. If there are more processes than mpd's, then wraparound occurs. If there are more mpd's than MPI processes, then some mpd's will not run MPI processes. Thus any number of processes can be run on a ring of any size. While one is doing development, it is handy to run only one mpd, on the local machine. Then all the MPI processes will run locally as well.

The first modification to this default behavior is the -1 option to mpiexec (not a great argument name). If -1 is specified, as in

    mpiexec -1 -n 10 a.out

then the first application process will be started by the first mpd in the ring after the local host. (If there is only one mpd in the ring, then this will be on the local host.) This option is for use when a cluster of compute nodes has a "head node" where commands like mpiexec are run but not application processes.

If an mpd is started with the --ncpus option, then when it is its turn to start a process, it will start several application processes rather than just one before handing off the task of starting more processes to the next mpd in the ring. For example, if the mpd is started with

    mpd --ncpus=4

then it will start as many as four application processes, with consecutive ranks, when it is its turn to start processes. This option is for use in clusters of SMP's, when the user would like consecutive ranks to appear on the same machine. (In the default case, the same number of processes might well run on the machine, but their ranks would be different.)
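
The placement rules described above can be modeled in a few lines. The following is a minimal sketch of that description, not MPD's actual code: place_ranks and its parameters are hypothetical names, mpd's are identified by their position in the ring (0 is the local machine), and a --ncpus value is applied only on the first pass around the ring. Passing first_mpd = 1 approximates the -1 option under the assumption that the ring is then walked in the usual order.

```c
#include <stddef.h>

/* Fill host_of_rank[r] with the ring position of the mpd that starts
 * rank r. ncpus may be NULL (every mpd starts one process per turn);
 * otherwise ncpus[i] is the --ncpus value given to mpd i. */
void place_ranks(int nprocs, int ring_size, const int *ncpus,
                 int first_mpd, int *host_of_rank)
{
    int rank = 0;
    int turns = 0;                      /* mpd turns taken so far */
    int mpd;

    if (ring_size <= 0)
        return;
    mpd = first_mpd % ring_size;

    while (rank < nprocs) {
        /* On the first lap an mpd may start ncpus[mpd] consecutive
         * ranks; on later laps each mpd starts one at a time. */
        int batch = 1;
        if (turns < ring_size && ncpus != NULL && ncpus[mpd] > 1)
            batch = ncpus[mpd];

        for (int k = 0; k < batch && rank < nprocs; k++)
            host_of_rank[rank++] = mpd;

        mpd = (mpd + 1) % ring_size;    /* wraparound */
        turns++;
    }
}
```

With 10 processes on a ring of 4 mpd's and no --ncpus values, this model places ranks 0, 4, and 8 on the local mpd, which matches the wraparound behavior described above.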

(A feature of the --ncpus=[n] argument is that it has the above effect only until all of the mpd's have started n processes at a time once; afterwards each mpd starts one process at a time. This is in order to balance the number of processes per machine to the extent possible.)

Other ways to control the placement of processes are by direct use of arguments to mpiexec. See the Users Guide.

A.5.4 Q: On Windows, I get an error when I attempt to call MPI_Comm_spawn.

A: On Windows, you need to start the program with mpiexec for any of the MPI-2 dynamic process functions to work.

A.5.5 Q: My output does not appear until the program exits

A: Output to stdout and stderr may not be written from your process immediately after a printf or fprintf (or PRINT in Fortran) because, under Unix, such output is buffered unless the program believes that the output is to a terminal. When the program is run by mpiexec, the C standard I/O library (and normally the Fortran runtime library) will buffer the output. For C programmers, you can either use a call to fflush(stdout) to force the output to be written or you can set no buffering by calling

    #include <stdio.h>
    setvbuf( stdout, NULL, _IONBF, 0 );

on each file descriptor (stdout in this example) whose output you want sent immediately to your terminal or file.

There is no standard way to either change the buffering mode or to flush the output in Fortran. However, many Fortrans include an extension to provide this function. For example, in g77,

    call flush()

can be used. The xlf compiler supports

    call flush_(6)

where the argument is the Fortran logical unit number (here 6, which is often the unit number associated with PRINT). With the G95 Fortran 95 compiler, set the environment variable G95_UNBUFFERED_6 to cause output to unit 6 to be unbuffered.

A.5.6 Q: Fortran programs using stdio fail when using g95

A: By default, g95 does not flush output to stdout. This also appears to cause problems for standard input. If you are using the Fortran logical units 5 and 6 (or the * unit) for standard input and output, set the environment variable G95_UNBUFFERED_6 to yes.

A.5.7 Q: How do I run MPI programs in the background when using the default MPD process manager?

A: To run MPI programs in the background when using MPD, you need to redirect stdin from /dev/null. For example,

    mpiexec -n 4 a.out < /dev/null &