Ab-initio quantum chemistry with neural-network wavefunctions
Abstract
Machine learning and specifically deep-learning methods have outperformed human capabilities in many pattern recognition and data processing problems, in game playing, and now also play an increasingly important role in scientific discovery. A key application of machine learning in the molecular sciences is to learn potential energy surfaces or force fields from ab-initio solutions of the electronic Schrödinger equation using datasets obtained with density functional theory, coupled cluster, or other quantum chemistry methods. Here we review a recent and complementary approach: using machine learning to aid the direct solution of quantum chemistry problems from first principles. Specifically, we focus on quantum Monte Carlo (QMC) methods that use neural network ansatz functions in order to solve the electronic Schrödinger equation, both in first and second quantization, computing ground and excited states, and generalizing over multiple nuclear configurations. Compared to existing quantum chemistry methods, these new deep QMC methods have the potential to generate highly accurate solutions of the Schrödinger equation at relatively modest computational cost.
These authors contributed equally. Emails: frank.noe@fu-berlin.de, giuseppe.carleo@epfl.ch, pfau@google.com
1 Introduction
In the past decade, machine learning (ML) has made inroads into many areas of the physical sciences \citep{CarleoRMP19}, often outperforming more traditional computational methods \citep{jumper2021highly,DeringerN21} or offering entirely new approaches to solve scientific problems \citep{NoeS19,HuangNC20}. Quantum chemistry (QC) has been among the first fields to be affected by this revolution \citep{TkatchenkoNC20,vonLilienfeldNC20,NoeARPC20}. Most applications of ML in QC have been concerned with supervised learning of molecular properties from molecular structure \citep{DralJPCL20}, either across conformational \citep{UnkeCR21} or chemical space \citep{vonLilienfeldNRC20}, as well as with unsupervised learning for the generation of novel molecules \citep{BianJMM21}. These methods all require a pre-existing dataset of molecules and their properties as an input, typically obtained with standard methods of QC such as density functional theory \citep{JonesDFTReview20215} or coupled cluster theory \citep{Bartlett2007}. In these scenarios, ML accurately approximates a given method of QC at vastly increased computational efficiency. This approach has already been reviewed in the works cited above. In contrast, the current review focuses on the complementary use of ML as an ab-initio technique in QC, which requires no external data and instead recovers molecular properties from first principles. Here, ML is “integrated” into QC, with the goal of arriving at ab-initio methods with a more favourable accuracy–efficiency trade-off than traditional QC methods.
The goal of computational chemistry is to predict properties of known molecules and to design molecules with desired properties. Most molecular properties are determined by the behaviour of the electrons, so QC methods attempt to approximately solve the Schrödinger equation for electrons in molecules. Traditionally, QC methods are divided into ab-initio and semi-empirical methods, where the former have no fitted parameters determined from external data, whereas the latter do. Methods that do not use quantum mechanics at all (such as force fields) are called empirical and are typically not considered part of QC, although this view may be changing with the advent of principled and accurate ML-based empirical methods.

It is useful to cast these three categories of methods in the light of ML terminology (Fig. 1a). ML can be roughly divided into supervised, unsupervised, and reinforcement learning. In supervised learning, the ML model learns to predict the labels (outputs) of the data (inputs) from a given dataset so as to minimize the difference between the predicted and reference labels. By identifying the inputs with molecular structures and the outputs with molecular properties, all semi-empirical and empirical methods of QC fit into supervised learning, but they mostly use relatively simple and physically motivated functional forms rather than the more general and highly flexible functions typical for ML. Vice versa, the many recent successful supervised ML models that predict energies or other molecular properties based on QC training data can be classified as empirical methods \citep{DeringerCR21,BehlerCR21,UnkeCR21,MusilCR21}. Unsupervised learning is concerned with unlabelled data, and the general task is to learn the underlying probability distribution that would generate a given dataset. Examples in chemistry include generative models for structural formulas \citep{Gomez-BombarelliACS18} as well as full 3D structures of molecules \citep{NoeS19,Hoogeboom22}, and in physics the estimation of quantum states from measurements, known as quantum tomography \citep{TorlaiNP18}. Finally, in reinforcement learning, the ML model (also referred to as an agent) is able to interact directly with its environment, rather than just passively receive data. Here, the aim is for the agent to learn a policy for how to interact with the environment so as to maximize a long-term reward \citep{sutton2018reinforcement}. Reinforcement learning is behind some of the most prominent successes of ML, such as playing games at a superhuman level \citep{tesauro1994td,mnih2015human,silver2016mastering} or the control of plasma in tokamaks \citep{DegraveN22}. In certain settings the agent can self-generate data by treating its own policy as the environment. This is known as self-play, and has been the basis for many advances in symmetric games \citep{heinrich2015fictitious,SilverS18}. Although there are many key differences, this is the branch of ML conceptually most similar to ab-initio QC, in the sense that neither requires any external data beyond the rules of the system or game.

In the traditional picture, one moves from empirical to ab-initio methods by retaining more of the first-principles physics. Similarly, there is a general trend for ML models in chemistry to encode an increasing amount of molecular physics. This includes physical constraints such as energy conservation \citep{ChmielaSA17}, invariance and equivariance of molecular properties with respect to rotation, translation, or exchange of indistinguishable particles \citep{BehlerPRL07,SchuttNC17}, as well as other physical concepts such as many-body expansions \citep{DrautzPRB19} or even surrogate quantum-mechanical models \citep{LiJCTC18a,SchuttNC19,KirkpatrickS21}. Similar considerations apply to the problem of ab-initio learning of solutions to the electronic Schrödinger equation introduced here, and we will discuss different strategies throughout the review.

The Schrödinger equation is an eigenvalue problem that can be equivalently formulated via several variational principles: its solutions, the eigenstate wavefunctions and energies, can be found by searching for stationary points of certain functionals over the space of all physically admissible wavefunctions. Importantly, the ground state of a molecule can be found by minimizing the energy expectation value of a wavefunction. This principle underlies many ab-initio QC methods, and also the methods in this review, as such a variational principle naturally defines an ML problem: the eigenstates (such as the ground state) are represented as a neural network, and the parameters of that network are obtained by minimizing the variational electronic energy. The reviewed methods differ in the particular form of the neural-network ansatz used, as described below.
Section 2 briefly reviews the components of electronic structure theory necessary for the development of the ML methods discussed later on. The electronic structure problem is mapped to ML in Section 3, which is followed by a review of the ab-initio ML methods for QC formulated in real space and in a discrete basis in Sections 4 and 5, respectively. The review is concluded in Section 6.
2 Electronic structure
2.1 Schrödinger equation
QC aims at finding approximate solutions of the electronic Schrödinger equation that strike a good balance between accuracy and efficiency \citep{Piela20} (Fig. 1b). The non-relativistic electronic Schrödinger equation within the Born–Oppenheimer approximation for a given molecule, specified by the charges, $Z_I$, and coordinates, $\mathbf R_I$, of the nuclei, is a second-order differential equation for the wavefunction, $\psi$, which is a function of the coordinates $\mathbf r_1,\dots,\mathbf r_N$ of $N$ electrons (Fig. 2a; atomic units):
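$$\hat H\,\psi(\mathbf r_1,\dots,\mathbf r_N) = E\,\psi(\mathbf r_1,\dots,\mathbf r_N) \tag{1}$$
$$\hat H = -\frac{1}{2}\sum_{i=1}^{N}\nabla_i^2 + \sum_{i<j}\frac{1}{|\mathbf r_i-\mathbf r_j|} - \sum_{i,I}\frac{Z_I}{|\mathbf r_i-\mathbf R_I|} + \sum_{I<J}\frac{Z_I Z_J}{|\mathbf R_I-\mathbf R_J|} \tag{2}$$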
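An alternative formulation of the Schrödinger equation uses the notion of an expectation value,
$$E[\psi] = \frac{\langle\psi|\hat H|\psi\rangle}{\langle\psi|\psi\rangle} = \frac{\int\psi^*(\mathbf r_1,\dots,\mathbf r_N)\,\hat H\psi(\mathbf r_1,\dots,\mathbf r_N)\,\mathrm d\mathbf r_1\cdots\mathrm d\mathbf r_N}{\int|\psi(\mathbf r_1,\dots,\mathbf r_N)|^2\,\mathrm d\mathbf r_1\cdots\mathrm d\mathbf r_N} \tag{3}$$
Instead of solving Eq. 1 directly, the ground-state (lowest-energy) solution can be found by minimizing this energy expectation value with respect to all possible wavefunctions (variational principle),
$$E_0 = \min_{\psi} E[\psi],\qquad \psi_0 = \arg\min_{\psi} E[\psi] \tag{4}$$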
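As a small worked illustration of Eq. 4 (our own toy example, not taken from the review), take a 1D harmonic oscillator in units $\hbar = m = \omega = 1$, $\hat H = -\tfrac12\,\mathrm d^2/\mathrm dx^2 + \tfrac12 x^2$, and the one-parameter trial function $\psi_a(x) = e^{-ax^2}$. Analytically $E[\psi_a] = a/2 + 1/(8a) \ge 1/2$, with equality only at $a = 1/2$, where $\psi_a$ is the exact ground state. The sketch evaluates Eq. 3 by simple quadrature on an assumed grid and scans $a$.

```python
import numpy as np

GRID = np.linspace(-10.0, 10.0, 20001)  # quadrature grid (assumed wide and fine enough)

def energy(a, x=GRID):
    """Variational energy E[psi_a] (Eq. 3) for psi_a(x) = exp(-a x^2)
    and H = -1/2 d^2/dx^2 + x^2/2."""
    psi = np.exp(-a * x**2)
    # H psi computed analytically from psi'' = (4 a^2 x^2 - 2 a) psi
    h_psi = (a - 2.0 * a**2 * x**2 + 0.5 * x**2) * psi
    # The grid spacing cancels in the ratio of the two sums
    return np.sum(psi * h_psi) / np.sum(psi**2)

alphas = np.linspace(0.1, 2.0, 191)          # trial parameters, step 0.01
energies = np.array([energy(a) for a in alphas])
best = alphas[np.argmin(energies)]
print(best, energies.min())                  # minimum near a = 0.5 with E close to 0.5
```

Every trial energy lies above the exact value 1/2; the scan recovers it only because this family happens to contain the exact ground state. For a less flexible ansatz, the minimum would merely be an upper bound.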
2.2 Antisymmetric wavefunctions
Electrons are fermions, and as such their wavefunction must be antisymmetric with respect to exchange of any two electrons. This cardinal feature of electronic wavefunctions permeates the whole of QC. In general, electrons also possess spin coordinates, $\sigma_i$, but the nonrelativistic Hamiltonian does not operate on spin, so the spin coordinate of each electron can be considered fixed. To simplify the presentation here \parencite[for full treatment, see][Sec. IV.E]{FoulkesRMP01}, we take advantage of the fixed spin coordinates, so the spatial wavefunction must be antisymmetric only with respect to the exchange of same-spin electrons, i.e., when $\sigma_i = \sigma_j$ (Fig. 2b),
$$\psi(\dots,\mathbf r_i,\dots,\mathbf r_j,\dots) = -\psi(\dots,\mathbf r_j,\dots,\mathbf r_i,\dots) \tag{5}$$
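By far the most common way to form antisymmetric wavefunctions in QC is as antisymmetrized products of single-electron functions (orbitals), $\phi_j(\mathbf r)$. These products can be written as determinants of an $N\times N$ matrix, $\phi_j(\mathbf r_i)$, formed by putting $N$ electrons into $N$ orbitals, and are referred to as Slater determinants (Fig. 2c):
$$\psi(\mathbf r_1,\dots,\mathbf r_N) = \det\big[\phi_j(\mathbf r_i)\big] = \begin{vmatrix}\phi_1(\mathbf r_1) & \cdots & \phi_N(\mathbf r_1)\\ \vdots & \ddots & \vdots\\ \phi_1(\mathbf r_N) & \cdots & \phi_N(\mathbf r_N)\end{vmatrix} \tag{6}$$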
When interpreting $\phi_j(\mathbf r_i)$ as the $j$-th component of an $N$-dimensional feature vector for the $i$-th electron (using ML parlance), $\boldsymbol\phi(\mathbf r_i)$, a Slater determinant is in fact the only antisymmetric function of $N$ feature vectors that is linear in every one of them (up to a constant prefactor), making it a natural choice. Alternative antisymmetric forms exist, such as the Pfaffian \citep{BajdichPRL06} or the Vandermonde determinant and its generalizations \citep{HanJCP19,AcevedoDLS20}, but these are far less common and we will not discuss them here. Slater determinants formed from different orbitals can be further mixed in a linear combination without breaking the antisymmetry (Fig. 2c). In fact, this simple technique is the powerhouse behind all the high-accuracy methods of QC, yet it is also its bane, because the number of Slater determinants required to achieve a given accuracy rises exponentially with the number of atoms in most cases. For fermionic wavefunctions there is no known general approach to effectively reduce the search space from this exponential regime without sacrificing accuracy. However, QC has produced many methods that achieve excellent approximations for specific molecules and materials of practical interest. The cost of these highly accurate methods is generally less than exponential, but nevertheless increases rapidly with system size (Fig. 1b).
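A minimal numerical check of the antisymmetry of Eq. 6, assuming hypothetical 1D orbitals $\phi_j(r) = r^{j}e^{-r^2/2}$ (any set of linearly independent functions would do): exchanging two electrons swaps two rows of the matrix and flips the sign of the determinant, and placing two electrons at the same point makes it vanish (the Pauli principle).

```python
import numpy as np

def orbitals(r):
    """Matrix with element (i, j) = phi_j(r_i) for hypothetical 1D orbitals
    phi_j(r) = r**j * exp(-r**2 / 2), j = 0..N-1."""
    j = np.arange(len(r))
    return r[:, None] ** j[None, :] * np.exp(-r[:, None] ** 2 / 2)

def slater(r):
    # Eq. 6: determinant of the N x N orbital matrix
    return np.linalg.det(orbitals(r))

r = np.array([0.3, -1.1, 0.7])
r_swapped = r[[1, 0, 2]]           # exchange electrons 1 and 2
print(slater(r), slater(r_swapped))  # equal magnitude, opposite sign
print(slater(np.array([0.3, 0.3, 0.7])))  # two electrons at the same point: (numerically) zero
```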
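2.3 Variational wavefunction methods
An important class of QC methods derives directly from the variational principle (Eq. 4) by assuming a certain wavefunction ansatz, $\psi_{\boldsymbol\theta}$, parametrized by $\boldsymbol\theta$. Minimizing the energy of this ansatz with respect to $\boldsymbol\theta$ then always yields an upper bound for the exact ground-state energy,
$$E_0 \le \min_{\boldsymbol\theta} E[\psi_{\boldsymbol\theta}] \tag{7}$$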
The bound becomes tighter as the expressiveness of the ansatz is improved. One can distinguish two strategies to construct the ansatzes. First, traditional QC uses relatively simple forms such that the integral of Eq. 3 can be evaluated analytically, which drastically simplifies the minimization problem \citep{Szabo96,Piela20}. Second, quantum Monte Carlo (QMC) enables the use of arbitrarily complex ansatzes at the cost of having to do the integral evaluation and minimization stochastically \citep{Becca17}. The latter is a natural framework to incorporate neural networks, and we introduce it in more detail in Section 3.1. Here we introduce three ansatzes for electronic wavefunctions of the first (traditional) kind, since they serve as scaffolding for the neural-network ansatzes of Sections 4 and 5. We also briefly discuss how they relate to other popular QC methods.

Box 1: First and second quantization

Computational methods for the electronic Schrödinger equation can be divided into first-quantized approaches in real space and second-quantized approaches in a discrete basis. In first quantization, one works with the individual electrons and their coordinates directly in real space ($\mathbf r_i$, $i = 1,\dots,N$) as in Eq. 1,
$$|\psi\rangle = \int\psi(\mathbf r_1,\dots,\mathbf r_N)\,|\mathbf r_1\rangle\otimes\cdots\otimes|\mathbf r_N\rangle\,\mathrm d\mathbf r_1\cdots\mathrm d\mathbf r_N.$$
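Here, $\psi$ must be an antisymmetric function, which specifies which electrons occupy which coordinates, while the many-electron basis states ($|\mathbf r_1\rangle\otimes\cdots\otimes|\mathbf r_N\rangle$) are ordinary non-symmetric (Cartesian) product states. In second quantization, one first has to introduce a discrete (in practice finite) one-electron basis, labelled by $k = 1,\dots,M$, which then enables one to work with preformed antisymmetric many-electron basis states (Slater determinants). Rather than specifying which electrons occupy which one-electron states, the occupation numbers ($n_k\in\{0,1\}$, $k = 1,\dots,M$) specify which one-electron states are occupied, without any reference to a particular electron,
$$|\psi\rangle = \sum_{n_1,\dots,n_M}\psi(n_1,\dots,n_M)\,|n_1,\dots,n_M\rangle.$$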
Here, $\psi$ can be an arbitrary tensor without antisymmetry, which is instead encoded in the many-electron basis states. This ability to push the antisymmetry from the wavefunction object to the many-electron basis is the main advantage of second quantization, at the cost of having to commit to a particular discrete basis. But regardless of the computational framework, either the wavefunction object itself (in first quantization) or the many-electron basis (in second quantization) consists of Slater determinants, and in high-accuracy methods their number grows rapidly with system size.
First and second quantization. Illustration for electrons in 1D and a finite basis of size 5.
Hartree–Fock
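Perhaps the simplest nontrivial ansatz in QC is the single Slater determinant of Eq. 6, where the orbitals are considered as free parameters. Optimized variationally, this ansatz leads to the so-called Hartree–Fock (HF) method. In practice the orbitals are linearly expanded in a fixed finite one-electron basis, $\chi_k(\mathbf r)$, $k = 1,\dots,M$, with $M > N$ in most cases:
$$\phi_j(\mathbf r) = \sum_{k=1}^{M} c_{jk}\,\chi_k(\mathbf r) \tag{8}$$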
The use of a finite basis set turns the functional optimization over the orbitals into a computational problem whose cost scales with the fourth power of the number of basis functions, $M^4$, assuming a naive implementation. On its own, the HF ansatz is expressive enough to describe much of chemistry qualitatively, but not always, and certainly not quantitatively. However, it can be considered a starting point for most wavefunction-based QC methods. Density functional theory (DFT) is not such a method, relying instead on an in-principle exact mapping of the ab-initio Hamiltonian (Eq. 2) to a mean-field-like problem, which can be solved exactly with a single Slater determinant \citep{JonesDFTReview20215,TealePCCP22}. However, the variational principle does not hold in DFT because the exchange-correlation contributions to the energy functional are not known exactly and must be approximated in practice. From here on, we will stay within the variational principle and instead focus on increasing the expressiveness of the HF ansatz.
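The sketch below illustrates Eq. 8 under simple assumptions: a hypothetical 1D Gaussian basis with fixed centers, and random coefficients standing in for optimized HF orbitals. Evaluating the basis at the electron positions gives an $N\times M$ matrix; multiplying by the $M\times N$ coefficient matrix gives the orbital matrix of Eq. 6, whose coefficients are the only variational parameters. It also counts the entries of the naive two-electron-integral tensor, which is where the $M^4$ cost comes from.

```python
import numpy as np

def basis(r, centers, width=1.0):
    """Hypothetical 1D Gaussian basis chi_k(r) = exp(-(r - c_k)^2 / width^2); shape (N, M)."""
    return np.exp(-((r[:, None] - centers[None, :]) ** 2) / width**2)

def hf_wavefunction(r, coeffs, centers):
    # Eq. 8 with coeffs[k, j] = c_jk: orbital matrix phi_j(r_i) = sum_k chi_k(r_i) c_jk
    phi = basis(r, centers) @ coeffs        # shape (N electrons, N orbitals)
    return np.linalg.det(phi)               # single Slater determinant, Eq. 6

rng = np.random.default_rng(0)
M, N = 6, 3                                 # basis functions, electrons
centers = np.linspace(-2.5, 2.5, M)
coeffs = rng.normal(size=(M, N))            # stand-in for optimized HF coefficients
r = np.array([-1.0, 0.2, 1.3])
print(hf_wavefunction(r, coeffs, centers))
print(M**4)                                 # entries of the naive two-electron integral tensor
```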
Configurationinteraction
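The HF ansatz can be straightforwardly extended by forming multiple Slater determinants from different sets of orbitals and considering their linear combination (Fig. 2c),
$$\psi(\mathbf r_1,\dots,\mathbf r_N) = \sum_p c_p\det\big[\phi_{pj}(\mathbf r_i)\big] \tag{9}$$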
When the orbitals of each determinant are pooled from a larger superset of $M$ (mutually orthogonal) fixed orbitals, and the only free parameters are the linear coefficients $c_p$ of the determinants, the ansatz is called configuration interaction (CI). One of the appeals of the CI ansatz is that its Slater determinants can be considered a many-electron antisymmetric basis and labelled using the occupation numbers of the one-electron states. This so-called second-quantized formalism has many convenient properties for computation (see Box 1). The simplest version of CI, called full CI (FCI), considers all possible Slater determinants and is exact within the chosen finite one-electron basis. In the usual case when $M$ grows proportionally with $N$, however, the computational effort scales exponentially with $N$, which makes FCI applicable only to the smallest molecules. Ways to tackle the exponential scaling include fixed truncation of the CI expansion; its “compression” through analytical means, such as coupled cluster theory \citep{Bartlett2007} or matrix product states \citep{ChanDMRG2011}; deterministic pruning, as in selected CI \citep{HuronJCP73}; or stochastic sampling, as in FCI-QMC \citep{BoothJCP09}. Section 5 explores a novel way of “compressing” the CI expansion through neural networks.
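A sketch of the CI ansatz of Eq. 9 for spinless electrons, assuming a hypothetical superset of $M$ fixed 1D orbitals. Each determinant is labelled by which $N$ of the $M$ orbitals it occupies, i.e., by an occupation-number pattern as in Box 1, and only the coefficients $c_p$ are free. Enumerating all $\binom{M}{N}$ patterns gives full CI; since every determinant is antisymmetric, so is their linear combination.

```python
import itertools
import math
import numpy as np

def fixed_orbitals(r, M):
    """Hypothetical superset of M fixed 1D orbitals phi_k(r) = r**k exp(-r**2/2); shape (N, M)."""
    return r[:, None] ** np.arange(M)[None, :] * np.exp(-r[:, None] ** 2 / 2)

def ci_wavefunction(r, coeffs, M):
    """Eq. 9 restricted to CI: every determinant occupies N of the M fixed orbitals,
    and only the linear coefficients c_p are free parameters."""
    N = len(r)
    phi = fixed_orbitals(r, M)
    patterns = itertools.combinations(range(M), N)   # all occupation patterns: full CI
    return sum(c * np.linalg.det(phi[:, list(occ)]) for c, occ in zip(coeffs, patterns))

M, N = 6, 3
n_det = math.comb(M, N)                  # number of determinants in spinless FCI
coeffs = np.random.default_rng(1).normal(size=n_det)
r = np.array([0.4, -0.9, 1.2])
print(n_det)                             # 20
print(ci_wavefunction(r, coeffs, M), ci_wavefunction(r[[2, 1, 0]], coeffs, M))  # opposite signs
```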
Beyondfixedbases
The effectiveness of the CI ansatz depends on the choice of the fixed molecular orbitals from which the Slater determinants are built. A natural extension of CI allows both the orbitals and the CI expansion coefficients to vary during the variational minimization. Such an ansatz of two stacked linear combinations (Eqs. 8 and 9) is harder to optimize but much more expressive. The most common variant is to consider all Slater determinants formed by letting $n$ electrons occupy a space of $m$ orbitals, while the remaining $N - n$ electrons occupy a fixed set of inactive orbitals. This is called the complete active space self-consistent field (CASSCF) method \citep{OlsenIJQC11}. Due to the larger variational freedom, a CASSCF ansatz typically requires far fewer determinants than a CI ansatz of comparable accuracy. But CASSCF and even FCI are still limited by the fixed one-electron basis used to form the molecular orbitals (Eq. 8): FCI is only exact in the complete-basis-set limit, which in practice cannot be reached for any but the smallest molecular systems. An extension of the CASSCF ansatz would allow not only the one-electron orbitals but also the one-electron basis functions to vary. The stacked structure of such an ansatz would be reminiscent of deep neural networks, and Section 4 explores the culmination of this line of thought by incorporating actual deep neural networks into the ansatz. This removes the a priori limitation on expressiveness imposed by a fixed basis. By making each individual determinant maximally expressive, such ansatzes further reduce the number of determinants required to reach a given accuracy.
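To make the size argument concrete, the following sketch counts determinants for an assumed, hypothetical molecule with $N = 14$ electrons in $M = 60$ spatial orbitals, splitting the electrons evenly between spin up and spin down, and compares full CI with a CAS(6, 6) active space (6 electrons in 6 active orbitals). The count ignores that CASSCF additionally optimizes the orbitals, which is where its extra flexibility comes from.

```python
import math

def n_determinants(n_electrons, n_orbitals):
    """Number of Slater determinants with n_electrons (split as evenly as possible
    between spin up and spin down) distributed over n_orbitals spatial orbitals."""
    n_up = n_electrons // 2
    n_down = n_electrons - n_up
    return math.comb(n_orbitals, n_up) * math.comb(n_orbitals, n_down)

fci = n_determinants(14, 60)   # full CI over all 60 orbitals (hypothetical molecule)
cas = n_determinants(6, 6)     # CAS(6, 6); the other 8 electrons sit in fixed inactive orbitals
print(f"FCI: {fci:.3e} determinants, CAS(6,6): {cas} determinants")

# With M proportional to N (here M = 5N), the FCI space grows exponentially with N
for n in (4, 8, 16, 32):
    print(n, n_determinants(n, 5 * n))
```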
3 Machine learning for the electronic Schrödinger equation
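Box 2: Variational Monte Carlo

Optimization of wavefunctions with neural networks naturally leads to the variational Monte Carlo (VMC) framework. First, Monte Carlo integration of Eq. 3 can handle arbitrarily complicated ansatzes for which analytical integrals are not available. Second, VMC samples these integrals stochastically, which naturally combines with the stochastic gradient descent used for optimizing neural networks. In traditional QC, VMC has been used extensively with real-space first-quantized approaches \citep{FoulkesRMP01} and more recently in the discrete-basis second-quantized setting \citep{NeuscammanJAGPHilbert2013,SabzevariJCTC18}. The expectation value of any operator, such as the Hamiltonian (Eq. 3), can be written as a Monte Carlo integral over a continuous or discrete basis, $\mathbf x$ (for a discrete basis, the integrals become sums),
$$\langle\hat H\rangle = \frac{\int|\psi(\mathbf x)|^2\,E_{\mathrm{loc}}(\mathbf x)\,\mathrm d\mathbf x}{\int|\psi(\mathbf x)|^2\,\mathrm d\mathbf x} = \mathbb E_{\mathbf x\sim|\psi|^2}\big[E_{\mathrm{loc}}(\mathbf x)\big] \approx \frac{1}{N_{\mathrm s}}\sum_{s=1}^{N_{\mathrm s}}E_{\mathrm{loc}}(\mathbf x_s),\qquad E_{\mathrm{loc}}(\mathbf x) = \frac{(\hat H\psi)(\mathbf x)}{\psi(\mathbf x)}.$$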
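Here, the expectation value is obtained as an expected value of a “local” energy, local in the sense that it is defined for every basis element $\mathbf x$. A straightforward and generally applicable way to obtain the samples $\mathbf x_s$ is Markov-chain Monte Carlo (MCMC). MCMC is an iterative procedure in which a new sample point, $\mathbf x'$, is produced from a current one, $\mathbf x$, by making a proposal step with probability $q(\mathbf x'\mid\mathbf x)$, and then accepting or rejecting the proposal with probability
$$A(\mathbf x'\mid\mathbf x) = \min\!\left(1,\ \frac{|\psi(\mathbf x')|^2\,q(\mathbf x\mid\mathbf x')}{|\psi(\mathbf x)|^2\,q(\mathbf x'\mid\mathbf x)}\right).$$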
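The resulting Markov chain then samples $|\psi(\mathbf x)|^2$; the normalization of $\psi$ cancels in the acceptance ratio and never needs to be computed. Variants of MCMC differ in the construction of the proposal steps and $q$, and include the simplest Metropolis algorithm (symmetric $q$, $q(\mathbf x'\mid\mathbf x) = q(\mathbf x\mid\mathbf x')$) as well as more sophisticated flavours such as Langevin Monte Carlo. The VMC formula for the expectation value is exact in the limit of infinite sample size, $N_{\mathrm s}\to\infty$, but in practice it incurs a statistical error proportional to $\sigma(E_{\mathrm{loc}})/\sqrt{N_{\mathrm s}}$ for uncorrelated samples, where $\sigma(E_{\mathrm{loc}})$ is the standard deviation of the local energy. While this error decreases slowly with sample size, VMC has the great benefit that, as the ansatz converges to an exact eigenstate, the local energy converges to a constant (the exact energy), and as such its variance vanishes and so does the statistical sampling error.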
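A minimal VMC sketch for the hydrogen atom, assuming the textbook trial function $\psi_\alpha(\mathbf r) = e^{-\alpha|\mathbf r|}$ in atomic units, for which $E_{\mathrm{loc}}(\mathbf r) = -\alpha^2/2 + (\alpha-1)/|\mathbf r|$ and $\langle\hat H\rangle = \alpha^2/2 - \alpha$. It uses random-walk Metropolis with a symmetric Gaussian proposal; the step size, number of walkers, and burn-in length are our assumptions. At $\alpha = 1$ the ansatz is exact and every local energy equals $-1/2$, illustrating the zero-variance property described above.

```python
import numpy as np

def log_psi(x, alpha):
    # Hydrogen-like trial wavefunction psi_alpha(r) = exp(-alpha |r|), unnormalized
    return -alpha * np.linalg.norm(x, axis=-1)

def local_energy(x, alpha):
    # E_loc = (H psi) / psi for H = -1/2 nabla^2 - 1/|r| (atomic units)
    r = np.linalg.norm(x, axis=-1)
    return -0.5 * alpha**2 + (alpha - 1.0) / r

def metropolis(alpha, n_walkers=2000, n_steps=500, step=0.5, seed=0):
    """Random-walk Metropolis targeting |psi|^2. The Gaussian proposal is symmetric,
    so the acceptance probability reduces to min(1, |psi(x')|^2 / |psi(x)|^2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_walkers, 3))          # independent walkers, one sample each at the end
    for _ in range(n_steps):
        x_new = x + step * rng.normal(size=x.shape)
        log_ratio = 2.0 * (log_psi(x_new, alpha) - log_psi(x, alpha))
        accept = np.log(rng.uniform(size=n_walkers)) < log_ratio
        x = np.where(accept[:, None], x_new, x)
    return x

for alpha in (0.8, 1.0):
    e_loc = local_energy(metropolis(alpha), alpha)
    print(alpha, e_loc.mean(), e_loc.std())
# alpha = 0.8: mean near the analytic value -0.48 with a nonzero spread
# alpha = 1.0: every local energy equals -0.5, so the spread is zero
```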
3.1 Mapping quantum mechanics to machine learning
Electronic structure | Machine learning |
---|---|
Wavefunction | Probability distribution |
Natural orbital | Marginal distribution |
Stochastic reconfiguration | Natural gradient descent |
Hartree–Fock | Mean-field variational Bayes |
Diffusion Monte Carlo | Particle filtering; Sequential Monte Carlo |
An ML problem and its solution are specified by the model, its inputs and outputs, the data, and the optimization criterion (loss function). In this regard, solving the Schrödinger equation with the variational principle amounts to the following ML problem (Fig. 3). The neural network (Section 3.2) represents a wavefunction, which accepts electron coordinates (first quantization) or occupation numbers (second quantization) as input and outputs the wavefunction value. The loss function is the energy expectation value corresponding to this wavefunction. The inputs are sampled from the probability distribution given by the squared magnitude of the wavefunction represented by the current neural network, and the Hamiltonian operator is used to obtain an estimate of the loss function from the samples. The parameters of the network, and thus the wavefunction, are then modified to minimize the loss function. Except for the representation of the wavefunction as a network, this is the regular variational Monte Carlo (VMC) framework (Box 2). The optimization methods used (Box 3) are also fairly conventional, although adapted to a neural-network context. This straightforward correspondence between the Schrödinger equation and ML has led to the introduction of similar concepts on both sides, albeit known under different names (Table 1). The applicability of deep learning to quantum-mechanical calculations was first realized and exploited by \citet{CarleoS17} for the case of spin lattices in one and two dimensions. Their approach, known as Neural Quantum States (NQS), has since been applied to many different quantum systems \citep{saito2017solving,nomura2017restricted,corey2021variational,nikita2021broken}. In essence, this review is concerned with the extension of this approach to electrons in molecules.

Box 3: Optimizing neural-network ansatzes

Up to the statistical error, the VMC expectation value for the energy (Box 2) obeys the variational principle (Eq. 4). VMC exploits this by varying a parametric wavefunction ansatz so as to minimize the energy. For a sufficiently expressive ansatz, the variational energy will eventually approximate the ground-state energy of Eq. 1, and the ansatz will approximate the ground-state wavefunction. The most straightforward optimization method is gradient descent, where the parameters are iteratively updated as
$$\boldsymbol\theta \leftarrow \boldsymbol\theta - \eta\,\nabla_{\boldsymbol\theta}E$$
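with learning rate $\eta$. The energy gradient is given by
$$\frac{\partial E}{\partial\theta_k} = 2\,\mathrm{Re}\,\mathbb E_{\mathbf x\sim|\psi_{\boldsymbol\theta}|^2}\Big[\big(E_{\mathrm{loc}}(\mathbf x) - E\big)\,O_k^*(\mathbf x)\Big],$$
where
$$O_k(\mathbf x) = \frac{\partial\ln\psi_{\boldsymbol\theta}(\mathbf x)}{\partial\theta_k}$$
is an operator representing the logarithmic derivatives of the wavefunction. This gradient can be efficiently estimated using Monte Carlo integration (Box 2). In some cases the optimization can be sped up and made more stable with higher-order methods, such as the stochastic reconfiguration (SR) scheme \citep{SorellaPRL98}. SR takes the correlation between individual variational parameters into account by introducing the quantum geometric tensor:
$$S_{kl} = \mathbb E\big[O_k^*O_l\big] - \mathbb E\big[O_k^*\big]\,\mathbb E\big[O_l\big].$$
The update rule is then modified to
$$\boldsymbol\theta \leftarrow \boldsymbol\theta - \eta\,\mathbf S^{-1}\nabla_{\boldsymbol\theta}E.$$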
The SR scheme approximates an imaginary-time evolution, where each iteration tries to best approximate the state $(1 - \eta\hat H)\,\psi_{\boldsymbol\theta}$ within the variational manifold. SR is similar to the natural gradient descent algorithm \citep{amari_natural_1998} that is well known in the ML community, and the quantum geometric tensor can be interpreted as a quantum generalization of the Fisher information matrix \citep{Ay17}. In some cases, it is convenient to approximate the quantum geometric tensor using the Kronecker-factored approximate curvature (KFAC) approach \citep{martens2015optimizing}.
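Continuing the hydrogen toy problem of Box 2 (one parameter, $O(\mathbf r) = \partial_\alpha\ln\psi_\alpha = -|\mathbf r|$, so $\mathbf S$ is a $1\times1$ matrix), the sketch below estimates the gradient from samples and applies the SR update. The learning rate, the small diagonal shift `eps` added to $\mathbf S$ for stability, and the sampler settings are our assumptions; for a real wavefunction with real parameters, the complex conjugates in the formulas can be dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(x, alpha, n_steps=50, step=0.5):
    """Random-walk Metropolis moves targeting |psi_alpha|^2 = exp(-2 alpha |r|)."""
    for _ in range(n_steps):
        x_new = x + step * rng.normal(size=x.shape)
        log_ratio = -2.0 * alpha * (np.linalg.norm(x_new, axis=1) - np.linalg.norm(x, axis=1))
        accept = np.log(rng.uniform(size=len(x))) < log_ratio
        x = np.where(accept[:, None], x_new, x)
    return x

def sr_update(x, alpha, lr=0.5, eps=1e-4):
    """One stochastic-reconfiguration step for the single parameter alpha."""
    r = np.linalg.norm(x, axis=1)
    e_loc = -0.5 * alpha**2 + (alpha - 1.0) / r   # local energy, H = -1/2 nabla^2 - 1/r
    o = -r                                        # O = d ln(psi) / d alpha
    energy = e_loc.mean()
    grad = 2.0 * np.mean((e_loc - energy) * o)    # Monte Carlo gradient estimate (real case)
    s = np.mean(o * o) - np.mean(o) ** 2          # 1x1 quantum geometric tensor
    return alpha - lr * grad / (s + eps), energy

alpha = 0.5
x = rng.normal(size=(1000, 3))                    # walkers are reused between iterations
for _ in range(30):
    x = sample(x, alpha)
    alpha, energy = sr_update(x, alpha)
print(alpha, energy)   # alpha approaches 1 and the energy approaches -0.5 hartree
```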
3.2 Deep learning
The standard practice in ab-initio QC today is in some ways analogous to the state of computer vision before the rise of deep learning. Prior to 2012, the best pipelines for large-scale image recognition consisted of a combination of hand-designed features and simple ML models \citep{perronnin2010large}. A single deep convolutional neural network trained end-to-end was able to cut the recognition error in half relative to these systems \citep{krizhevsky2012imagenet}, and since then deep neural networks have dominated computer vision research. In ab-initio QC, ground-state solutions to the Schrödinger equation are usually represented by a wavefunction ansatz with a relatively simple functional form, and parameters are usually fit through a mix of procedures (fixed-point iteration, variational optimization) rather than a unified end-to-end estimation of all parameters simultaneously. The development of deep QMC methods is driven by the hope that the use of neural networks will significantly increase the expressiveness of wavefunction ansatzes, enabling large leaps in accuracy as in image recognition.

To appreciate how and why deep neural networks can be usefully applied in QC, a brief review of their application in artificial intelligence is necessary. For a thorough review of the history of deep learning, see \citet{schmidhuber2015deep}, and for a review of the fundamental concepts in deep learning, see \citet{lecun2015deep}. Neural networks date back to the very beginning of computer science \citep{mcculloch1943logical}, and their modern form originates with the single perceptron “unit” \citep{rosenblatt1958perceptron}, which produces as output a non-linear function of the sum of a constant, known as the bias, and a linear combination of its inputs. The non-linear function rises from zero to one as its input increases, mimicking the activation function of a biological neuron. When many such units are assembled in parallel to form a “layer,” and several layers are computed serially, taking the output from one layer as the input to the next, the resulting multi-layer perceptron (MLP) can, in theory, approximate any continuous function on a bounded domain to arbitrary accuracy given enough units \citep{hornik1989multilayer}. However, actually fitting or learning a set of parameters that matches a given function is a different matter. A form of gradient descent utilizing derivatives computed using backpropagation, or reverse-mode automatic differentiation \citep{werbos1974beyond,linnainmaa1970representation,linnainmaa1976taylor}, was found to be effective for training neural networks \citep{rumelhart1986learning}. This led to a wave of enthusiasm for neural networks, which eventually faded as several issues were discovered, such as the infamous “vanishing gradients” and getting stuck in local minima.

Several factors were instrumental in rehabilitating neural networks under the banner of “deep learning”: a combination of algorithmic advances \citep{glorot2010understanding} and the use of modern GPU hardware \citep{hooker2020hardware} made the computations much faster, and the resulting ability to train larger networks made issues with local minima less severe \citep{dauphin2014identifying,choromanska2015loss}. Furthermore, deep neural networks trained with stochastic gradient descent can be applied straightforwardly and efficiently to large datasets, unlike other ML models \citep{bottou2008learning,bottou2011tradeoffs}. Finally, empirical successes like winning the ImageNet Large Scale Visual Recognition Challenge \citep{russakovsky2015imagenet} helped legitimize deep learning research and generate excitement among researchers. Today, the barrier to entry for developing and training deep neural networks is quite low, thanks to a mature ecosystem of software libraries for numerical computing with automatic differentiation and hardware accelerators \citep{AbadiOSDI16,paszke2017automatic,bradbury2018jax}. However, actually achieving good performance from a deep learning model still requires some finesse and the application of various heuristics. It is safe to say that a significant amount of the practice of deep learning remains more art than science. The good news is that once effective heuristics for a particular problem domain have been developed, these same heuristics can often be applied with little modification to other problems in the same domain.
3.3 Neural network architectures
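The starting point for most neural networks is the multi-layer perceptron (MLP), formed as a composition of layers,
$$\mathbf h^{(0)} = \mathbf x,\qquad \mathbf h^{(l+1)} = \sigma\big(\mathbf W^{(l)}\mathbf h^{(l)} + \mathbf b^{(l)}\big) \tag{10}$$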
where $\sigma$ is some non-linear activation function, and $\mathbf W^{(l)}$ and $\mathbf b^{(l)}$ are the matrices of weights and vectors of biases to learn. While a vanilla MLP is capable of approximating arbitrary functions, the real power of neural networks comes from more sophisticated architectures. Many of these architectures are designed to encode some particular invariance or equivariance: when the input to the network is transformed in a particular way, the output should either be unchanged or transform in a corresponding way. For instance, the weights in a layer of a convolutional neural network (ConvNet) \citep{lecun1998gradient} are restricted to be a discrete convolution operator, which constrains each layer to be translation-equivariant, a natural constraint for image recognition, and also dramatically reduces the number of free parameters in a layer. Equivariance to permutation is another frequently useful property, and one that is especially important in real-space approaches to representing electronic wavefunctions (see Section 4). A simple permutation-equivariant layer, first proposed by \citet{shawe1989building}, can be constructed by applying the same transformation to each input and summing the results. More sophisticated permutation-equivariant layers are used by models like the Transformer \citep{vaswani2017attention} or SchNet \citep{schutt2018schnet}. Many of these equivariant layers can be unified in a conceptual framework based on the language of geometry and group theory, wherein the choice of transformation to be equivariant to leads naturally to recipes for constructing the appropriate neural network layers \citep{bronstein2021geometric}. Another class of neural network architectures, which have been influential as wavefunction ansatzes, are restricted Boltzmann machines (RBMs) \citep{hinton2006reducing}. These were originally developed for unsupervised learning, but in the VMC setting considered here they lead to a simple deterministic expression for the log probability that closely resembles a one-layer MLP. Despite their early popularity, RBMs have been largely eclipsed in the AI community by other methods for unsupervised learning, such as variational autoencoders \citep{kingma2013auto}, generative adversarial networks \citep{goodfellow2014generative}, normalizing flows \citep{rezende2015variational}, autoregressive models \citep{oord2016wavenet,oord2016conditional}, and diffusion models \citep{sohl2015deep}. In fact, some of these newer models have started to have an impact as neural-network wavefunction ansatzes for spin systems. Examples are deep autoregressive quantum states \citep{sharir2020deep}, convolutional neural networks \citep{choo2019two}, recurrent neural networks \citep{hibat-allah_recurrent_2020}, and normalizing flows \citep{xie_ab-initio_2021}.
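The following NumPy sketch shows the MLP of Eq. 10 (with a final linear read-out) and one simple way to build a permutation-equivariant layer. The layer is our DeepSets-style choice, not necessarily the construction of the works cited above: the same map acts on every particle's features and is combined with a sum over all particles, so permuting the input rows permutes the output rows. All weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, params):
    """Eq. 10: alternate affine maps and a tanh nonlinearity; the last layer is linear."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

def equivariant_layer(h, W_self, W_pool, b):
    """Rows of h are particles. Every particle gets the same map, plus a term that
    depends on all particles only through their sum, so the layer is permutation-equivariant."""
    pooled = h.sum(axis=0, keepdims=True)
    return np.tanh(h @ W_self + pooled @ W_pool + b)

d_in, d_hidden = 3, 8
params = [(rng.normal(size=(d_in, d_hidden)), rng.normal(size=d_hidden)),
          (rng.normal(size=(d_hidden, 1)), rng.normal(size=1))]
W_self = rng.normal(size=(d_in, d_hidden))
W_pool = rng.normal(size=(d_in, d_hidden))
b = rng.normal(size=d_hidden)

h = rng.normal(size=(5, d_in))           # five particles with three features each
perm = np.array([3, 0, 4, 1, 2])
out = equivariant_layer(h, W_self, W_pool, b)
print(np.allclose(equivariant_layer(h[perm], W_self, W_pool, b), out[perm]))  # True
print(mlp(h, params).shape)              # (5, 1): the plain MLP acts on each row separately
```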
4 Electrons in first quantization
One approach to studying the electronic problem with deep learning is to work with parameterized many-body wavefunctions in first quantization, $\psi_{\boldsymbol\theta}(\mathbf x)$. Here $\mathbf x$ stands for the $N$-tuple of electron coordinates, $\mathbf x = (\mathbf r_1,\dots,\mathbf r_N)$, and sampling is realized over electronic positions (Box 2). The antisymmetry constraint (Eq. 5) must be imposed in order to avoid collapsing onto a lower-energy bosonic state. A commonly adopted form is $\psi(\mathbf x) = \psi_S(\mathbf x)\,\psi_A(\mathbf x)$, where the first factor is symmetric (or “bosonic”) under exchange of electron coordinates and the second factor carries the necessary antisymmetry. The simplest and most common approach is to build the antisymmetric part of the wavefunction using Slater determinants (Eq. 6). As discussed in Section 2, single Slater determinants with fixed orbitals have limited expressiveness, and many such determinants need to be combined to achieve high accuracy. A natural generalization of a sum of fixed-orbital Slater determinants is the commonly used Slater–Jastrow wavefunction
$$\psi(\mathbf x) = e^{J(\{\mathbf x\})}\sum_p c_p\det\big[\phi_{pj}(\mathbf r_i)\big] \tag{11}$$
where the Jastrow factor, $e^{J(\{\mathbf x\})}$, constitutes the symmetric (“bosonic”) part of the state and typically contains one- and two-body (and in many cases higher-order) parameterized correlations. The set notation, $\{\mathbf x\}$, indicates that $J$ does not depend on the order of the electron coordinates. The determinants in Eq. 11 are typically replaced with the product of spin-up and spin-down determinants \citep{FoulkesRMP01}. Separating the up- and down-spin determinants improves computational efficiency, simplifies the implementation, and makes it easier to handle the electron-electron cusps, while leaving expectation values of spin-independent operators unchanged. More flexible parametric forms can be obtained by leveraging the approximation power of artificial neural networks. In the following, we discuss neural-network-based strategies to parameterize these forms.
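A sketch of a single-determinant Slater–Jastrow wavefunction with separate spin-up and spin-down determinants, under assumptions that are ours: hypothetical orbitals $(1, x, y, z)_k\,e^{-|\mathbf r|}$ and a simple Padé-type two-body Jastrow exponent $u(r) = r/[2(1+br)]$ applied to all pairs. A real implementation would use different cusp coefficients for same- and opposite-spin pairs. Exchanging two same-spin electrons flips the sign, while the Jastrow factor is unchanged by any permutation.

```python
import numpy as np

def orbitals(r, n):
    """Hypothetical orbitals phi_k(r) = (1, x, y, z)[k] * exp(-|r|), k < n <= 4; shape (len(r), n)."""
    feats = np.concatenate([np.ones((len(r), 1)), r], axis=1)[:, :n]
    return feats * np.exp(-np.linalg.norm(r, axis=1))[:, None]

def jastrow(r, b=1.0):
    """Symmetric exponent J = sum_{i<j} u(r_ij), u(r) = r / (2 (1 + b r)); order-independent."""
    diff = r[:, None, :] - r[None, :, :]
    rij = np.linalg.norm(diff, axis=-1)[np.triu_indices(len(r), k=1)]
    return np.sum(rij / (2.0 * (1.0 + b * rij)))

def slater_jastrow(r_up, r_down):
    # Eq. 11 with one determinant, split into spin-up and spin-down blocks
    r_all = np.concatenate([r_up, r_down])
    det_up = np.linalg.det(orbitals(r_up, len(r_up)))
    det_down = np.linalg.det(orbitals(r_down, len(r_down)))
    return np.exp(jastrow(r_all)) * det_up * det_down

rng = np.random.default_rng(0)
r_up, r_down = rng.normal(size=(2, 3)), rng.normal(size=(2, 3))  # two electrons per spin
print(slater_jastrow(r_up, r_down), slater_jastrow(r_up[[1, 0]], r_down))  # opposite signs
```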
4.1 Discrete space
The first applications of neural networks to electronic systems were for electrons moving in discretized space, as realized, for example, in the 2D Hubbard model of strongly interacting electrons. In the following, for simplicity, we discuss the case of $N$ spinless electrons on $L$ lattice sites, and denote with $x_i\in\{1,\dots,L\}$ the discrete lattice index corresponding to the position of electron $i$. The extension to the spinful case will be considered in more detail when discussing continuous space later on. The symmetric part can be readily parameterized with a strategy closely related to NQS for spins:
$$\psi_S(\mathbf x) = F_{\boldsymbol\theta}\big(\mathbf n(\mathbf x)\big) \tag{12}$$
where $\mathbf n(\mathbf x)\in\{0,1\}^L$ is the unique occupation-number representation corresponding to the electronic positions $\mathbf x$, and $F_{\boldsymbol\theta}$ is a generic function that could be represented by a neural network. Since the occupation numbers are invariant under permutation of the electron positions, $\psi_S$ is also symmetric under exchange. Any of the NN architectures also adopted for spin systems \citep{CarleoS17} or lattice bosons \citep{saito2017solving} can be used to represent the symmetric part. Early works on the Hubbard model adopted positive-definite RBM-based parameterizations of $\psi_S$ \citep{nomura2017restricted}, while more recent works have adopted deep-network parameterizations allowing for sign changes \citep{stokes_quantum_2020}. The simplest parameterization for the antisymmetric part, $\psi_A$, is again a Slater determinant
$$\psi_A(\mathbf x) = \det\big[\phi_j(x_i)\big]_{i,j=1}^{N} \tag{13}$$
where the $L\times N$ matrix of discrete orbitals $\phi_j(x)$ holds the variational parameters to be optimized. This approach, however, has the important drawback of not providing enough variational flexibility, since it effectively fixes the antisymmetric part to a mean-field reference solution.
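A discrete-space sketch combining Eqs. 12 and 13 for $N = 3$ spinless electrons on $L = 6$ sites, assuming a hypothetical RBM-style symmetric factor $\log\psi_S = \mathbf a\cdot\mathbf n + \sum_h\log[2\cosh(b_h + \mathbf W_h\cdot\mathbf n)]$ with four hidden units and random parameters. Because $\psi_S$ only sees the occupation vector, reordering the electron positions leaves it unchanged, while the determinant supplies the sign.

```python
import numpy as np

L, N = 6, 3                                   # lattice sites, spinless electrons (toy sizes)
rng = np.random.default_rng(0)
phi = rng.normal(size=(L, N))                 # discrete orbitals phi_j(x): variational parameters
a, b, W = rng.normal(size=L), rng.normal(size=4), rng.normal(size=(4, L))  # 4 hidden units (assumed)

def occupations(x):
    """Occupation vector n(x): n[l] = 1 if site l is occupied; independent of electron order."""
    n = np.zeros(L)
    n[list(x)] = 1.0
    return n

def log_psi_s(n):
    """Hypothetical RBM-style symmetric part; real parameters make psi_S strictly positive."""
    return a @ n + np.sum(np.log(2.0 * np.cosh(b + W @ n)))

def psi_a(x):
    # Eq. 13: determinant of the rows of phi selected by the occupied sites x_1..x_N
    return np.linalg.det(phi[list(x), :])

def psi(x):
    return np.exp(log_psi_s(occupations(x))) * psi_a(x)

print(psi((0, 2, 5)), psi((2, 0, 5)))         # same magnitude, opposite sign
```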
Neural backflow
A significant improvement is obtained by considering a many-body backflow transformation of the orbitals \citep{feynman_energy_1956,kwon_effects_1993}. In this variational form, the matrix of one-electron orbitals is promoted to a parameterized many-electron function depending on all the occupation numbers:
$$\phi_j(x_i)\;\to\;\phi_j(x_i;\mathbf n) = \phi_j(x_i) + \Delta\phi_j(x_i;\mathbf n) \tag{14}$$
where $\Delta\phi_j(x_i;\mathbf n)$ is a correction to the single-particle orbitals. In physics-inspired parameterizations, $\Delta\phi$ is typically taken to be a simple function of the electronic occupation numbers \citep{tocchio_role_2008}. The neural backflow method \citep{LuoPRL19} instead introduced a flexible parameterization of the backflow orbitals based on artificial neural networks. In this case, $\Delta\phi$ is parameterized with an MLP taking as input the electronic occupation numbers and outputting a many-body correction to the $L\times N$ orbital matrix. This approach allows the orbitals to change dynamically depending on the positions of all the electrons, thus allowing one to include genuinely many-body correlations in the antisymmetric part of the wavefunction.
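A minimal sketch of Eq. 14, assuming a one-hidden-layer MLP with tanh activation and random weights (all sizes are ours): the network reads the occupation vector and returns an $L\times N$ correction that is added to the mean-field orbitals before the occupied rows are selected. Since the network input is permutation-invariant, exchanging two electrons still only swaps two rows of the matrix, so antisymmetry is preserved while the orbitals now depend on the whole configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, H = 6, 3, 16                            # sites, spinless electrons, hidden units (assumed)
phi0 = rng.normal(size=(L, N))                # mean-field orbitals phi_j(x)
W1, b1 = 0.3 * rng.normal(size=(H, L)), np.zeros(H)
W2 = 0.1 * rng.normal(size=(L * N, H))        # output layer produces an L x N correction

def occupations(x):
    n = np.zeros(L)
    n[list(x)] = 1.0
    return n

def backflow_orbitals(n):
    """Eq. 14: phi(n) = phi0 + Delta phi(n), with Delta phi from a one-hidden-layer MLP of n."""
    delta = (W2 @ np.tanh(W1 @ n + b1)).reshape(L, N)
    return phi0 + delta

def psi_backflow(x):
    phi = backflow_orbitals(occupations(x))   # orbitals depend on the full configuration
    return np.linalg.det(phi[list(x), :])     # rows selected by the occupied sites, as in Eq. 13

print(psi_backflow((0, 2, 5)), psi_backflow((2, 0, 5)))  # opposite signs
```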
Constrainedhiddenfermions
Neuralbackflowtransformationsarenottheonlywaytointroduceflexibleparameterizationsoftheantisymmetricpartofthewavefunction.Theconstrainedhiddenfermionformalismbuildsontheideaofintroducingasetofauxiliaryfermionicparticles,withpositions,andlivingonlatticesites.Theseauxiliaryparticlesareusedtoeffectivelymediatecorrelationsamongthephysicaldegreesoffreedom\citeprobledo˙moreno˙fermionic˙2022.CallingaSlaterdeterminantfortheextended(physical+hidden)system,theresultingantisymmetricformforthephysicalsystemisgivenby
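\begin{equation}
\psi_A(\mathbf{r}) = \Phi\big(\mathbf{r}, \tilde{\mathbf{r}}(\mathbf{r})\big). \tag{15}
\end{equation}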
In this expression, $\tilde{\mathbf{r}}(\mathbf{r})$ is a function, parameterized by a neural network, that maps the physical positions to the hidden ones. This approach has been shown to improve systematically over the neural backflow form for the 2D Hubbard model \citep{robledo_moreno_fermionic_2022}.
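The extended-determinant idea can be sketched under a simplifying assumption: instead of evaluating $\Phi$ at discrete hidden positions produced by a network, a small (random, untrained) network of the occupation numbers outputs the hidden-particle rows of the extended $(N+\tilde N) \times (N+\tilde N)$ matrix directly, while the physical rows come from ordinary orbitals evaluated at the electron positions. Sizes and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
L, N, NH, H = 6, 3, 2, 16       # sites, physical fermions, hidden fermions, MLP width

phi_ext = rng.normal(size=(L, N + NH))       # N + NH orbitals evaluated on physical sites
W1 = rng.normal(scale=0.3, size=(H, L))      # toy network weights (random, untrained)
W2 = rng.normal(scale=0.3, size=(NH * (N + NH), H))


def occupations(positions):
    n = np.zeros(L)
    n[list(positions)] = 1.0
    return n


def hidden_rows(n):
    """NH x (N + NH) hidden-particle block; depends on positions only through n."""
    return (W2 @ np.tanh(W1 @ n)).reshape(NH, N + NH)


def psi_hidden(positions):
    visible = phi_ext[list(positions), :]              # N x (N + NH) physical rows
    hidden = hidden_rows(occupations(positions))       # NH x (N + NH) hidden rows
    return np.linalg.det(np.vstack([visible, hidden]))
```

Exchanging two physical electrons swaps two physical rows while leaving the hidden rows untouched, so the extended determinant changes sign as required.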
4.2 Continuous space
We now focus on the important case of first-quantized electrons in continuous space, which corresponds directly to the electronic Schrödinger equation. As in the discrete-space case, the Slater–Jastrow form may be improved in a manner suitable for use with neural quantum states by adding a backflow transformation, in which the one-electron orbitals are replaced by many-electron functions. The backflow transformation can either modify the orbitals directly via a multiplicative and/or additive term:
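\begin{equation}
\phi_i(\mathbf{r}_j) \rightarrow \phi_i(\mathbf{r}_j)\, f_i\big(\mathbf{r}_j; \{\mathbf{r}_{/j}\}\big) + g_i\big(\mathbf{r}_j; \{\mathbf{r}_{/j}\}\big), \tag{16}
\end{equation}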
or act as a quasiparticle transformation of the electron coordinates:
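\begin{equation}
\mathbf{r}_j \rightarrow \mathbf{r}_j + \boldsymbol{\xi}_j\big(\mathbf{r}_j; \{\mathbf{r}_{/j}\}\big), \tag{17}
\end{equation}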
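where the parameterized functions $f_i$, $g_i$, and $\boldsymbol{\xi}_j$ are invariant to permutations of $\{\mathbf{r}_{/j}\}$, the set of all electron positions except $\mathbf{r}_j$, and $\boldsymbol{\xi}_j$ is a three-component vector that modifies $\mathbf{r}_j$. If we consider a determinant of orbitals of this form,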
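\begin{equation}
\Psi(\mathbf{r}_1, \ldots, \mathbf{r}_N) = \det\big[\phi_i\big(\mathbf{r}_j; \{\mathbf{r}_{/j}\}\big)\big], \tag{18}
\end{equation}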
then we see that orbitals with backflow transformations are just one example of a broader class of functions: for the determinant to be antisymmetric, the matrix with elements $\phi_i(\mathbf{r}_j; \{\mathbf{r}_{/j}\})$ must be permutation-equivariant; that is, exchanging electrons $j$ and $k$ also exchanges columns $j$ and $k$. While traditional Slater–Jastrow–backflow wavefunctions have had considerable success, they are also limited by their fixed functional forms. The goal, therefore, is to come up with more flexible permutation-equivariant functions. Here we highlight several approaches that share this common theme.
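To make the equivariance condition concrete, the following sketch builds the orbital matrix of Eq. 18 from a quasiparticle transformation of the form of Eq. 17, with $\boldsymbol{\xi}_j = \sum_{k \neq j} \eta(|\mathbf{r}_j - \mathbf{r}_k|)(\mathbf{r}_j - \mathbf{r}_k)$. The Gaussian orbitals, the choice $\eta(s) = a e^{-s^2}$, and all parameter values are illustrative assumptions. The last lines check that permuting electrons permutes the columns of the matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 4
centers = rng.normal(size=(N, 3))      # hypothetical orbital centres
alphas = 0.5 + rng.random(N)           # hypothetical Gaussian exponents


def orbital(i, x):
    """Toy one-electron orbital phi_i(x): a Gaussian with a small linear factor."""
    d = x - centers[i]
    return (1.0 + 0.3 * d[0]) * np.exp(-alphas[i] * (d @ d))


def quasiparticles(r, a=0.2):
    """Eq. 17 with xi_j = sum_{k != j} a exp(-|r_j - r_k|^2) (r_j - r_k)."""
    diff = r[:, None, :] - r[None, :, :]             # diff[j, k] = r_j - r_k
    eta = a * np.exp(-np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(eta, 0.0)                       # exclude k == j
    return r + np.sum(eta[:, :, None] * diff, axis=1)


def orbital_matrix(r):
    """phi_i evaluated at the quasiparticle coordinate x_j: rows i, columns j."""
    x = quasiparticles(r)
    return np.array([[orbital(i, x[j]) for j in range(N)] for i in range(N)])


def psi(r):
    return np.linalg.det(orbital_matrix(r))


r = np.random.default_rng(10).normal(size=(N, 3))
perm = [1, 0, 2, 3]
print(np.allclose(orbital_matrix(r[perm]), orbital_matrix(r)[:, perm]))   # True
```

Any construction that passes this column-permutation check yields an antisymmetric determinant, which is the property the neural architectures below exploit.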
Iterative backflow
\citet{TaddeiPRB15} introduced a form of backflow that applied Eq. 17 repeatedly in an iterative fashion. Such an ansatz is formally equivalent to expressing the backflow as a deep neural network \citep{Ruggeri2018-ql}, albeit with an artificial restriction on the dimensionality of the hidden layers. The iterative backflow was used to study the $^3$He and $^4$He liquids.
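A sketch of iterated backflow, assuming a single pairwise quasiparticle step of the form of Eq. 17 with a separate amplitude and width per iteration (hypothetical parameters, not those of the cited works). Each iteration acts like a network layer whose hidden state is only a three-component vector per electron, which is the dimensionality restriction mentioned above.

```python
import numpy as np


def backflow_step(r, a, width):
    """One step of Eq. 17: r_j + sum_{k != j} a exp(-|r_j - r_k|^2 / width) (r_j - r_k)."""
    diff = r[:, None, :] - r[None, :, :]
    eta = a * np.exp(-np.sum(diff ** 2, axis=-1) / width)
    np.fill_diagonal(eta, 0.0)
    return r + np.einsum("jk,jkd->jd", eta, diff)


def iterative_backflow(r, params):
    """Apply the step repeatedly, each iteration with its own (a, width) pair."""
    x = r
    for a, width in params:
        x = backflow_step(x, a, width)
    return x


params = [(0.1, 1.0), (0.05, 2.0), (0.2, 0.5)]   # hypothetical per-iteration parameters
r = np.random.default_rng(0).normal(size=(5, 3))
perm = [2, 0, 1, 4, 3]
print(np.allclose(iterative_backflow(r[perm], params), iterative_backflow(r, params)[perm]))  # True
```

Since each step is permutation-equivariant, so is their composition, and the resulting coordinates can be fed into any set of orbitals as in Eq. 18.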
DeepWF
The DeepWF approach \citep{HanJCP19} uses an ansatz similar to a Slater–Jastrow wavefunction but with a simpler antisymmetric term:
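\begin{equation}
\psi(\mathbf{r}) = e^{J(\mathbf{r})} \prod_{i<j\,:\,\sigma_i = \sigma_j} A(\mathbf{r}_i, \mathbf{r}_j), \tag{19}
\end{equation}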
The learned symmetric function $J$ is similar to a Jastrow factor and ensures that the wavefunction captures the electron-nuclear and electron-electron cusp conditions. The antisymmetric factors are constructed from the Vandermonde-like determinant of an explicitly antisymmetric two-body function, $A(\mathbf{r}_i, \mathbf{r}_j) = -A(\mathbf{r}_j, \mathbf{r}_i)$, which is entirely learned. Such a functional form can be evaluated in $\mathcal{O}(N^2)$ operations, compared to $\mathcal{O}(N^3)$ for a determinant. However, the simplified antisymmetric function is also likely to limit the accuracy achieved: DeepWF obtains only 43.6% of the correlation energy for the beryllium atom and does not even reach HF accuracy for the boron atom. The PauliNet and FermiNet approaches described below do much better: vanilla PauliNet obtained 99.94% and 97.3% of the correlation energies for the beryllium and boron atoms, respectively, and FermiNet 99.97% and 99.83%. Furthermore, FermiNet and PauliNet both substantially surpass conventional Slater–Jastrow–backflow (SJB) wavefunctions on first-row atoms, for which nearly exact benchmark values exist.
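The structure of Eq. 19 can be sketched as follows. The toy two-body network `g`, the distance-based symmetric factor, and the split into spin-up and spin-down coordinates are assumptions for illustration, not the networks of the cited work; the antisymmetrization $A(\mathbf{r}_i, \mathbf{r}_j) = g(\mathbf{r}_i, \mathbf{r}_j) - g(\mathbf{r}_j, \mathbf{r}_i)$ is simply one way to make a learned pair function explicitly antisymmetric. The double loop makes the $\mathcal{O}(N^2)$ cost visible.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(scale=0.5, size=(8, 6))   # toy two-body network weights (hypothetical)
v = rng.normal(size=8)


def g(ri, rj):
    """Unconstrained two-body function g(r_i, r_j): a one-layer toy network."""
    return v @ np.tanh(W @ np.concatenate([ri, rj]))


def A(ri, rj):
    """Explicitly antisymmetric pair function: A(r_i, r_j) = -A(r_j, r_i)."""
    return g(ri, rj) - g(rj, ri)


def jastrow(r):
    """Toy symmetric factor J(r): a sum of a pair function of inter-electron distances."""
    d = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    iu = np.triu_indices(len(r), k=1)
    return np.sum(0.5 * d[iu] / (1.0 + d[iu]))


def psi_deepwf(r_up, r_dn):
    """exp(J) times the product of A over same-spin pairs: O(N^2) pair evaluations."""
    value = np.exp(jastrow(np.vstack([r_up, r_dn])))
    for block in (r_up, r_dn):
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                value *= A(block[i], block[j])
    return value
```

Swapping two same-spin electrons flips the sign of the pair that contains both and merely reorders the other factors, so the product changes sign just as a Vandermonde determinant does.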
PauliNet
PauliNet \citep{HermannNC20} builds upon HF or CASSCF orbitals as a physically meaningful baseline and takes a neural-network approach to the SJB wavefunction in order to correct this baseline towards a high-accuracy solution (Fig. 4a). Cusp conditions are met explicitly via the inclusion of cusp-correction terms in the wavefunction \citep{Ma2005-cusps}. A graph-convolutional block based on SchNet \citep{schutt2018schnet} is used to create a permutation-equivariant latent-space representation that depends on the many-electron configuration. This embedding is then passed into separate deep neural networks that learn the Jastrow factor and a (cuspless) backflow transformation. \citet{HermannNC20} introduced PauliNet with a purely multiplicative backflow, as shown in Fig. 4a; \citet{SchatzleJCP21} generalized this to a multiplicative and additive backflow, as shown in Eq. 16. PauliNet is optimized with a fixed number of Slater determinants; most of the results reported in \citet{HermannNC20,SchatzleJCP21} were obtained with around 10 determinants.
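The following sketch mimics the PauliNet structure described above in a heavily simplified, single-spin form: fixed baseline orbitals (Gaussians standing in for HF or CASSCF orbitals), a permutation-equivariant embedding built from pairwise distances (a crude stand-in for the SchNet-like block), a multiplicative backflow and a Jastrow factor computed from that embedding, and a short expansion in determinants. All sizes, weights, occupation lists, and coefficients are hypothetical, and the cusp corrections are omitted.

```python
import numpy as np

rng = np.random.default_rng(3)
N, N_ORB, F = 3, 5, 8                          # electrons (one spin), baseline orbitals, features
centers = rng.normal(size=(N_ORB, 3))          # stand-in for HF/CASSCF orbital centres
dets = [(0, 1, 2), (0, 1, 3), (0, 2, 4)]       # hypothetical occupied-orbital lists
coeffs = np.array([0.9, -0.3, 0.2])            # hypothetical determinant coefficients
W_pos = rng.normal(scale=0.5, size=(3, F))
W_msg = rng.normal(scale=0.5, size=(1, F))
W_bf = rng.normal(scale=0.1, size=(F, len(dets) * N))
w_jas = rng.normal(scale=0.1, size=F)


def baseline_orbitals(r):
    """Gaussian baseline orbitals, shape (N_ORB, N): rows orbitals, columns electrons."""
    d2 = np.sum((r[None, :, :] - centers[:, None, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2)


def embedding(r):
    """Permutation-equivariant per-electron features from pairwise distances."""
    d = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    pair = np.exp(-d)
    np.fill_diagonal(pair, 0.0)                  # no self-interaction
    message = pair.sum(axis=1, keepdims=True)    # (N, 1), summed over the other electrons
    return np.tanh(r @ W_pos + message @ W_msg)  # (N, F)


def psi(r):
    e = embedding(r)
    phi = baseline_orbitals(r)
    f = (e @ W_bf).reshape(N, len(dets), N)      # f[j, d, i]: electron j, determinant d, orbital i
    total = 0.0
    for d_idx, (occ, c) in enumerate(zip(dets, coeffs)):
        backflow = 1.0 + f[:, d_idx, :].T        # multiplicative backflow, (orbital, electron)
        total += c * np.linalg.det(phi[list(occ), :] * backflow)
    jastrow = np.sum(e @ w_jas)                  # symmetric: summed over electrons
    return np.exp(jastrow) * total
```

Permuting electrons permutes the columns of every baseline and backflow matrix consistently while leaving the Jastrow sum unchanged, so the ansatz is antisymmetric without any explicit symmetrization.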
FermiNet
FermiNet \citep{PfauPRR20} takes a more minimalist (or machine-learning maximalist) approach and attempts to train a neural network to represent the entire wavefunction (Fig. 4b). FermiNet uses two parallel networks, describing one- and two-electron features, respectively. The inputs to each layer in the one-electron stream are permutation-equivariant functions of the activations from the previous layers of the one- and two-electron streams. The final layer projects the latent space into the required number of orbitals, from which determinants can be formed and evaluated. As with PauliNet, the final wavefunction is a sum over a number of determinants; for most of the results reported in \citet{PfauPRR20}, 16 determinants were used. FermiNet builds up a rich description of electron-electron interactions from the permutation-equivariant mixing of information describing one- and two-electron features. In particular, the electron-nuclear and electron-electron cusps in the wavefunction are represented accurately, despite not being encoded explicitly. Whereas PauliNet is usually trained with the Adam optimizer, FermiNet training was found to be substantially improved by employing the KFAC optimizer. While both PauliNet and FermiNet exceed the accuracy of conventional SJB wavefunctions on small systems, there are important trade-offs between the two models. Results from both for the automerization of cyclobutadiene are shown in Fig. 5. FermiNet is typically trained with a larger number of parameters than PauliNet, requiring more iterations and more computation per iteration to converge, but it typically converges to a lower absolute energy. Recently, \citet{gerard2022gold} proposed a hybrid ansatz that uses neural-network layers similar to those of SchNet and PauliNet in a FermiNet-like architecture. This hybrid ansatz was found to reach even lower absolute energies than FermiNet on systems such as benzene and the potassium atom.
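A compact NumPy sketch of the FermiNet idea, under simplifying assumptions: a single nucleus at the origin, two layers with random untrained weights, no residual connections, and a single exponential envelope. Each one-electron update concatenates an electron's own features with spin-resolved means over the one-electron stream and over that electron's row of the two-electron stream, which makes the layer permutation-equivariant within each spin channel. Orbitals are projected from the final features, and the wavefunction is a sum of products of spin-up and spin-down determinants.

```python
import numpy as np

rng = np.random.default_rng(6)
N_UP, N_DN = 2, 2
N = N_UP + N_DN
D1, D2, N_DET = 16, 8, 2          # stream widths and number of determinants (illustrative)


def init_layer(d1, d2):
    # The one-electron update sees h_i, the spin-up and spin-down means of h,
    # and the spin-up and spin-down means of g_ij over j.
    V = rng.normal(scale=0.3, size=(3 * d1 + 2 * d2, D1))
    W = rng.normal(scale=0.3, size=(d2, D2))
    return V, W


def layer(h, g, V, W):
    """One permutation-equivariant update of the one- and two-electron streams."""
    up, dn = slice(0, N_UP), slice(N_UP, N)
    mean_up = np.broadcast_to(h[up].mean(axis=0), h.shape)
    mean_dn = np.broadcast_to(h[dn].mean(axis=0), h.shape)
    g_up = g[:, up].mean(axis=1)      # (N, d2): average of g_ij over spin-up j
    g_dn = g[:, dn].mean(axis=1)
    f = np.concatenate([h, mean_up, mean_dn, g_up, g_dn], axis=1)
    return np.tanh(f @ V), np.tanh(g @ W)


layers = [init_layer(4, 4), init_layer(D1, D2)]
W_up = rng.normal(size=(N_DET, D1, N_UP))   # projections to spin-up orbitals
W_dn = rng.normal(size=(N_DET, D1, N_DN))


def psi(r):
    """Sum over determinants of det(up block) * det(down block); nucleus at the origin."""
    h = np.concatenate([r, np.linalg.norm(r, axis=1, keepdims=True)], axis=1)
    diff = r[:, None, :] - r[None, :, :]
    g = np.concatenate([diff, np.linalg.norm(diff, axis=-1, keepdims=True)], axis=-1)
    for V, W in layers:
        h, g = layer(h, g, V, W)
    env = np.exp(-np.linalg.norm(r, axis=1, keepdims=True))   # decaying envelope
    total = 0.0
    for k in range(N_DET):
        orb_up = (h[:N_UP] @ W_up[k]) * env[:N_UP]     # rows: electrons, cols: orbitals
        orb_dn = (h[N_UP:] @ W_dn[k]) * env[N_UP:]
        total += np.linalg.det(orb_up) * np.linalg.det(orb_dn)
    return total
```

Nothing in this sketch encodes cusps or a Hartree-Fock reference; all structure beyond equivariance and the envelope has to be learned, which is the "machine-learning maximalist" design choice described above.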
Potential energy surfaces
Typically, one optimises a wavefunction at a specific geometry, but this quickly becomes prohibitively expensive for exploring the high-dimensional potential energy surface of even relatively small molecules. \citet{ScherbelaNCS22} developed a training methodology that allows weight sharing between (simplified) PauliNet architectures targeting different geometries. By switching the geometry being trained at each epoch, they showed that the computational cost of training across a set of geometries can be reduced by an order of magnitude without affecting the accuracy of the final energies, with 95% of network parameters shared across all geometries. This implies that the network is learning features of electron correlation in general rather than fitting to a specific geometry. They also demonstrated that a wavefunction for a larger molecule could be initialised from a wavefunction for a smaller molecule and then fine-tuned in a relatively short optimization stage. Pretraining neural-network wavefunctions on smaller systems has also been shown to dramatically accelerate convergence for Kagome lattice models \citep{Yang2020-bk}. In a similar vein, \citet{Gao2021-cg,gao2022sampling} demonstrated that a meta-learning approach, in which a graph neural network is used to parameterize a wavefunction model, can accurately represent the wavefunctions for multiple geometries, enabling a fully quantum-mechanical potential energy surface to be represented in a single model. Their approach used a FermiNet-like wavefunction model, but the meta-learning concept is directly applicable to other wavefunction representations, provided the wavefunction form is sufficiently flexible.
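The geometry-switching schedule can be illustrated with a toy problem in which quadratic losses stand in for the variational energies at three geometries, 19 of 20 parameters are shared, and one parameter is specific to each geometry. This is purely schematic: the actual method trains (simplified) PauliNet wavefunctions by VMC, and the loss, targets, and learning rate here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N_GEOM, N_SHARED, N_OWN = 3, 19, 1          # 19 of 20 parameters shared (95%)

# Toy optima: identical in the shared block, different in the geometry-specific entry.
common = rng.normal(size=N_SHARED)
targets = [np.concatenate([common, rng.normal(size=N_OWN)]) for _ in range(N_GEOM)]


def loss(g, shared, own):
    """Quadratic stand-in for the variational energy at geometry g."""
    return np.sum((np.concatenate([shared, own[g]]) - targets[g]) ** 2)


def train(n_epochs=300, lr=0.1):
    shared = np.zeros(N_SHARED)
    own = np.zeros((N_GEOM, N_OWN))
    for epoch in range(n_epochs):
        g = epoch % N_GEOM                          # switch geometry every epoch
        resid = np.concatenate([shared, own[g]]) - targets[g]
        shared -= lr * 2 * resid[:N_SHARED]         # gradient step on the shared weights
        own[g] -= lr * 2 * resid[N_SHARED:]         # and on this geometry's own weights
    return shared, own


shared, own = train()
print([loss(g, shared, own) for g in range(N_GEOM)])   # all close to zero
```

The shared block is updated at every epoch, whichever geometry is active, while each geometry-specific parameter is only touched when its own geometry comes round; this is the sense in which weight sharing amortizes training across geometries.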
Periodic systems
There has also been progress in using first-quantized neural-network architectures for periodic systems, such as interacting quantum gases in low dimensions \citep{pescia_neural-network_2022}, the electron gas \citep{wilson2022-ueg,cassella2022-ueg,Li2022-abinitio}, and small cells of solids such as lithium hydride and graphene \citep{Li2022-abinitio}. Again, sufficiently expressive networks at the VMC level have been found to rival or surpass the accuracy of fixed-node diffusion Monte Carlo calculations using conventional Slater–Jastrow–backflow trial wavefunctions.
4.3 Extensions
Pseudopotentials
The electronic structure of heavy atoms, especially transition metals, is complicated and challenging for all QC methods. The difficulty is compounded by the high computational cost of variational Monte Carlo methods, which scale roughly as a high power of $Z$ \citep{Hammond1987}, where $Z$ is the nuclear charge. Whilst the core electrons contribute heavily to the total energy, energy differences are largely determined by the behaviour of the valence electrons. The core electrons can therefore be removed, and the effective nuclear charge reduced, by the use of pseudopotentials. Pseudopotentials are common in many methods, including density functional theory and conventional variational Monte Carlo. \citet{Li2022-pseudo} demonstrated that effective core potentials can be readily combined with FermiNet and achieve accuracy comparable to CCSDT(Q) extrapolated to the complete-basis-set limit for first-row transition-metal atoms. The computational time per iteration was reduced by 43% (17%) for the scandium (zinc) atom using an argon core. Again, this approach is not restricted to FermiNet: pseudopotentials can be used with any first-quantized neural-network wavefunction.
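The numbers above are easy to put in context. The short sketch below counts the electrons that remain explicit when an [Ar] core (18 electrons) is replaced by an effective core potential, and converts the quoted reductions in time per iteration into speed-up factors. The atomic numbers are standard; the percentages are the ones quoted in the text.

```python
# Nuclear charges of the two atoms discussed above and the size of an [Ar] core.
Z = {"Sc": 21, "Zn": 30}
AR_CORE = 18


def valence_electrons(symbol, core=AR_CORE):
    """Electrons still treated explicitly once the [Ar] core is replaced by an ECP."""
    return Z[symbol] - core


def speedup(fractional_reduction):
    """Per-iteration speed-up implied by a fractional reduction in time per iteration."""
    return 1.0 / (1.0 - fractional_reduction)


for symbol, reduction in [("Sc", 0.43), ("Zn", 0.17)]:
    print(f"{symbol}: {Z[symbol]} -> {valence_electrons(symbol)} electrons, "
          f"{speedup(reduction):.2f}x faster per iteration")
```

It reports 3 explicit electrons for scandium and 12 for zinc, with per-iteration speed-ups of about 1.75 and 1.20, respectively.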
Diffusion Monte Carlo (DMC)
Projector methods such as DMC \citep{needs2020variational} and auxiliary-field Monte Carlo \citep{ShiJCP21} go beyond VMC by using stochastic algorithms to sample the ground state without requiring its wavefunction to be represented as a known function or network. DMC is in principle exact but, for many-fermion systems, relies in practice on the fixed-node approximation, in which collapse to the bosonic ground state is avoided by imposing the sign structure of the trial wavefunction on the DMC wavefunction. A DMC simulation therefore samples (stochastically) the lowest-energy state with the same sign structure as the trial wavefunction. The improvements that result from applying DMC to conventional Slater–Jastrow–backflow trial functions optimized using VMC methods are substantial, which explains why DMC is so often used to provide improved estimates of the ground-state wavefunction and energy. \citet{Wilson21} combined DMC with a FermiNet trial wavefunction. For first-row atoms, DMC captured much of the remaining correlation energy (94% of the difference between the VMC energy and the exact energy in the case of the nitrogen atom). However, \citet{Wilson21} used a simplified FermiNet that gave VMC energies higher than those reported by \citet{PfauPRR20}, which were already within 1 mH of exact results for all first-row atoms. Given evidence that the mean-field equivalent of PauliNet can essentially match HF in the complete-basis-set limit \citep{SchatzleJCP21}, it is possible that the remaining error in PauliNet and FermiNet wavefunctions is dominated by errors in the nodal surface, which lies in regions rarely sampled during optimisation. If this is the case, diffusion Monte Carlo with the fixed-node approximation may not produce substantially lower energies. On the other hand, since neural-network wavefunctions routinely capture over 90% of the correlation energy at the VMC level, the need to perform expensive diffusion Monte Carlo calculations is greatly reduced. More recently, \citet{Ren22} showed that DMC can capture roughly half of the remaining correlation energy for the atoms Li–Ar when using a very small FermiNet-based architecture. Whilst it is possible to achieve energies within chemical accuracy using FermiNet at the VMC level, such small-network calculations serve as a model for larger systems, where converging the energy with respect to network size might not be feasible. \citet{Ren22} went on to demonstrate that DMC using FermiNet trial wavefunctions noticeably reduces the energy for larger systems; in the case of the benzene dimer, the reduction was 50 mH.
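The percentages quoted in this section measure the fraction of an energy gap that a method closes relative to a reference. For the correlation energy the reference is the HF energy; for the DMC improvement quoted above it is the VMC energy. A one-line helper makes the definition explicit; the energies in the example are hypothetical values chosen only to illustrate the two conventions.

```python
def fraction_recovered(e_method, e_reference, e_exact):
    """Fraction of the gap between e_reference and e_exact closed by e_method."""
    return (e_reference - e_method) / (e_reference - e_exact)


# Hypothetical energies in hartree, chosen only to illustrate the two conventions.
e_hf, e_vmc, e_dmc, e_exact = -1.00, -1.18, -1.19, -1.20
print(fraction_recovered(e_vmc, e_hf, e_exact))    # share of correlation energy captured by VMC
print(fraction_recovered(e_dmc, e_vmc, e_exact))   # share of the remaining VMC error removed by DMC
```

The same 10 mH improvement can thus look small as a fraction of the correlation energy but large as a fraction of the remaining VMC error, which is worth keeping in mind when comparing the figures above.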
Excited states
Our discussion so far, like most VMC calculations, has focused on ground-state properties. However, excited states are of critical importance for understanding the behaviour of materials. Fortunately, recent algorithmic developments by multiple groups have demonstrated that the calculation of excited states using VMC methods is feasible and can achieve an acceptable trade-off between accuracy and cost. Here we highlight three such approaches utilizing conventional VMC wavefunctions. One approach is the state-averaged VMC method \citep{Schautz2004,Dash2019}, in which the average energy over multiple states is minimised and individual states are projected out via diagonalization within the basis of excited states. Similar techniques are used with other quantum chemistry methods. \citet{Zhao2016} instead minimized a different objective function, such that the state with energy closest to a desired energy target is obtained. \citet{Pathak2021} suggested a simple alternative, in which a state is forced to be (approximately) orthogonal to all lower-energy states via a penalty term. These techniques can be readily applied to VMC using neural-network wavefunctions and, in particular, penalty-function approaches have recently been explored. As with ground-state calculations, the flexibility of the wavefunction ansatz to represent the desired state is critical. \citet{Entwistle22} demonstrated that the PauliNet architecture combined with a penalty function can represent the lowest few excited states of molecules up to the size of benzene (Fig. 4c). Relatedly, \citet{ChooPRL2020} demonstrated that NQS on lattice models can obtain the lowest-energy state of any given Abelian symmetry by performing what is essentially a ground-state simulation in that symmetry sector, and multiple states of the same symmetry using a penalty function. However, the most accurate and efficient way to obtain excited states within VMC, irrespective of wavefunction ansatz, remains an open question \citep{Cuzzocrea2020}.
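The principle behind the penalty approach can be checked with a small dense matrix standing in for the Hamiltonian. Adding $\lambda|\psi_0\rangle\langle\psi_0|$ to $H$ raises the ground state by $\lambda$ and leaves the other eigenstates unchanged, so for $\lambda > E_1 - E_0$ the minimiser of the penalised energy is the first excited state. In VMC the energy and overlap would be estimated stochastically for a neural-network ansatz; here, as a simplifying assumption, everything is exact linear algebra on a random symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(6, 6))
H = 0.5 * (X + X.T)                          # toy real symmetric "Hamiltonian"
evals, evecs = np.linalg.eigh(H)
psi0 = evecs[:, 0]                           # ground state, assumed already converged


def penalized_energy(psi, lam):
    """(<psi|H|psi> + lam |<psi0|psi>|^2) / <psi|psi>: energy plus an overlap penalty."""
    return (psi @ H @ psi + lam * (psi0 @ psi) ** 2) / (psi @ psi)


lam = 2.0 * (evals[-1] - evals[0])           # comfortably larger than E1 - E0
# The penalised energy is the Rayleigh quotient of H + lam |psi0><psi0|,
# so its minimum is that matrix's lowest eigenvalue.
pen_vals, pen_vecs = np.linalg.eigh(H + lam * np.outer(psi0, psi0))
print(pen_vals[0], evals[1])                 # both equal the first excited-state energy
```

If $\lambda$ is chosen too small, the penalised ground state stays lowest and the optimisation simply returns $\psi_0$ again, which is why the penalty strength has to exceed the excitation gap.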
5 Electrons in second quantization
Instead of working directly with the infinite-dimensional Hilbert space corresponding to the real-space Hamiltonian of Eq. 2, it is common practice in QC to use a finite basis set. By choosing a set of electronic basis functions $\{\phi_i\}$, we can define a set of second-quantised operators $\hat{a}_i^\dagger$ ($\hat{a}_i$) that create (annihilate) an electron in the $i$-th basis function and satisfy the canonical anticommutation relations. These operators then act on the second-quantized wavefunction, which encodes amplitudes for different occupations of the orbitals (Box~\ref{box:first-second-quant}). Projecting the real-space Hamiltonian onto this set of orbitals then yields the corresponding discretized Hamiltonian,
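\begin{equation}
\hat{H} = \sum_{ij} h_{ij}\, \hat{a}_i^\dagger \hat{a}_j + \frac{1}{2} \sum_{ijkl} V_{ijkl}\, \hat{a}_i^\dagger \hat{a}_j^\dagger \hat{a}_k \hat{a}_l, \tag{20}
\end{equation}
where
\begin{align}
h_{ij} &= \int \phi_i^*(\mathbf{r}) \left( -\frac{1}{2}\nabla^2 - \sum_I \frac{Z_I}{|\mathbf{r} - \mathbf{R}_I|} \right) \phi_j(\mathbf{r})\, \mathrm{d}\mathbf{r}, \tag{21} \\
V_{ijkl} &= \iint \frac{\phi_i^*(\mathbf{r})\, \phi_j^*(\mathbf{r}')\, \phi_l(\mathbf{r})\, \phi_k(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, \mathrm{d}\mathbf{r}\, \mathrm{d}\mathbf{r}' \tag{22}
\end{align}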
are matrix elements of the one- and two-electron terms in the real-space Hamiltonian of Eq. (2). For simple basis functions such as Gaussians or plane waves, the matrix elements can be evaluated analytically. This Hamiltonian serves as the starting point for the methods described in this section.
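For a handful of spin-orbitals, the operators in Eq. 20 can be written out as explicit matrices on the $2^M$-dimensional Fock space. The sketch below does this with the Jordan–Wigner construction discussed in Section 5.1 (parity factors on earlier modes followed by a lowering operator), and fills the integrals with random numbers that have only the symmetries needed for a Hermitian $\hat{H}$; they are not the integrals of any real molecule. Such a brute-force construction is handy as a reference for small test cases, but its cost grows exponentially with $M$.

```python
import numpy as np
from functools import reduce

M = 3                                       # spin-orbitals; Fock space has 2**M states
I2 = np.eye(2)
PARITY = np.diag([1.0, -1.0])               # (-1)^n on one mode, basis (empty, occupied)
LOWER = np.array([[0.0, 1.0], [0.0, 0.0]])  # |empty><occupied| on one mode


def annihilation(j, m=M):
    """Matrix of a_j: parity factors on modes before j, lowering operator on mode j."""
    return reduce(np.kron, [PARITY] * j + [LOWER] + [I2] * (m - j - 1))


a = [annihilation(j) for j in range(M)]
adag = [op.T for op in a]                   # real matrices: adjoint = transpose


def hamiltonian(h, V):
    """H = sum_ij h_ij a+_i a_j + 1/2 sum_ijkl V_ijkl a+_i a+_j a_k a_l."""
    H = np.zeros((2 ** M, 2 ** M))
    for i in range(M):
        for j in range(M):
            H += h[i, j] * adag[i] @ a[j]
            for k in range(M):
                for l in range(M):
                    H += 0.5 * V[i, j, k, l] * adag[i] @ adag[j] @ a[k] @ a[l]
    return H


rng = np.random.default_rng(8)
h = rng.normal(size=(M, M))
h = 0.5 * (h + h.T)                          # Hermitian one-body integrals
T = rng.normal(size=(M, M, M, M))
V = T + T.transpose(3, 2, 1, 0)              # enforces V_ijkl = V_lkji, so H is Hermitian
H = hamiltonian(h, V)
print(np.allclose(H, H.T))                   # True
```

Every term contains as many creation as annihilation operators, so the resulting matrix commutes with the total number operator, as it should.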
5.1 Fermionic neural quantum states
Instead of working directly with the occupation-number representation of the wavefunction (Box~\ref{box:first-second-quant}), it is also possible to map occupation numbers onto degrees of freedom of spin-1/2 particles, such that empty orbitals map to down spins and occupied orbitals to up spins. This mapping makes it possible to leverage NQS and other methods for solving quantum spin systems. The same duality allows the creation and annihilation operators appearing in the electronic Hamiltonian (Eq. 20) to be written in terms of spin operators. This can be achieved, for example, with the Jordan–Wigner mapping \citep{Wigner1928}, which transforms annihilation and creation operators into, respectively, lowering and raising spin operators, accompanied by strings of Pauli $Z$ operators that encode the fermionic sign. This mapping is not unique, however, and there exist more recent alternatives, such as the parity or Bravyi–Kitaev encodings \citep{BK2002}, both of which have been developed in the context of quantum simulation. Regardless of the choice of spin encoding, the final outcome is a spin Hamiltonian with the general form
\begin{equation}
\hat{H} = \sum_{k} c_k \hat{P}_k, \tag{23}
\end{equation}
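defined as a linear combination, with real coefficients $c_k$, of operators $\hat{P}_k$, which are $N$-fold tensor products of single-qubit Pauli operators and the identity: $\hat{P}_k = \sigma_1 \otimes \sigma_2 \otimes \cdots \otimes \sigma_N$ with $\sigma_i \in \{I, X, Y, Z\}$. The ground state of the spin Hamiltonian in Eq. 23 can be approximated using a spin-based NQS representation based on complex-valued RBMs \citep{CarleoS17}. For a system of $N$ spins, the many-body amplitude corresponding to a state in the $\sigma^z$ basis, i.e., $\psi(\boldsymbol{\sigma}) = \langle \sigma_1^z, \ldots, \sigma_N^z | \psi \rangle$, takes the compact form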
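\begin{equation}
\psi(\boldsymbol{\sigma}) = \exp\Big(\sum_{i=1}^{N} a_i \sigma_i^z\Big) \prod_{j=1}^{M} 2\cosh\Big(b_j + \sum_{i=1}^{N} W_{ji} \sigma_i^z\Big), \tag{24}
\end{equation}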
with parameters $\{a_i, b_j, W_{ji}\}$, where $M$ is the number of hidden units. This ansatz can be optimised with VMC techniques (Box~\ref{sec:optimization}), typically relying on the stochastic reconfiguration approach \citep{SorellaPRL98}. A number of works have adopted this approach and achieved competitive variational results for small basis sets \citep{ChooNC20,YangJCTC20}, even in conjunction with quantum computers \citep{torlai2020precise,iouchtchenko2022neural}. In Fig. 6(a), we show a dissociation curve in the STO-3G basis computed with the RBM described above \citep{ChooNC20}.
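The RBM amplitude of Eq. 24 has a closed form because the hidden units can be summed out analytically. The sketch below, with small random complex parameters (illustrative and untrained), evaluates $\log\psi$ in closed form and compares it with a brute-force sum over all hidden configurations; $\sigma_i^z = +1$ is taken to mean an occupied spin-orbital. The stochastic-reconfiguration optimisation itself is not shown.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(9)
N_VIS, N_HID = 4, 6              # visible spins and hidden units (illustrative sizes)


def small_complex(*shape):
    return 0.1 * (rng.normal(size=shape) + 1j * rng.normal(size=shape))


a, b, W = small_complex(N_VIS), small_complex(N_HID), small_complex(N_HID, N_VIS)


def log_psi(sigma):
    """Closed form of Eq. 24 in log space: a.sigma + sum_j log(2 cosh(b_j + (W sigma)_j))."""
    theta = b + W @ sigma
    return a @ sigma + np.sum(np.log(2.0 * np.cosh(theta)))


def psi_bruteforce(sigma):
    """Same amplitude by summing exp(a.sigma + b.h + h.W.sigma) over all h in {-1, +1}^N_HID."""
    total = 0.0
    for hidden in product([-1.0, 1.0], repeat=N_HID):
        hidden = np.array(hidden)
        total += np.exp(a @ sigma + b @ hidden + hidden @ W @ sigma)
    return total


sigma = np.array([1.0, -1.0, -1.0, 1.0])     # +1: occupied spin-orbital, -1: empty
print(np.isclose(np.exp(log_psi(sigma)), psi_bruteforce(sigma)))   # True
```

Working with $\log\psi$ avoids overflow for larger systems and is the quantity that VMC estimators and gradients are typically built from.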
Solids
The second-quantization framework also allows one to treat solids, using as a basis the Bloch orbitals obtained by solving the crystalline HF equations \citep{del_re_self-consistent-field_1967}. Creation and annihilation operators, $\hat{a}^\dagger_{n\mathbf{k}}$ and $\hat{a}_{n\mathbf{k}}$, for electrons in band $n$ with crystal momentum $\mathbf{k}$ are introduced, and the resulting Hamiltonian is similar to Eq. 20, with the notable difference that the one- and two-body matrix elements now depend on the crystal momenta, $h_{nm}(\mathbf{k})$ and $V_{nmpq}(\mathbf{k}_1, \mathbf{k}_2, \mathbf{k}_3, \mathbf{k}_4)$, with the four momenta appearing in the two-body integrals satisfying conservation of the total crystal momentum ($\mathbf{k}_1 + \mathbf{k}_2 - \mathbf{k}_3 - \mathbf{k}_4$ equal to a reciprocal lattice vector). Using Gaussian-based atomic functions as the single-particle basis and RBM wavefunctions to represent the many-body state, \citet{yoshioka2021solving} applied this approach to study the electronic structure of solids. In Fig. 6(b), we show the computed ground-state energies for graphene crystals as a function of the lattice constant.
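Momentum conservation sharply prunes the two-body terms. On a one-dimensional grid of $n_k$ crystal momenta $k_m = 2\pi m/(n_k a)$, the condition that $k_1 + k_2 - k_3 - k_4$ be a reciprocal lattice vector becomes equality of the integer labels modulo $n_k$, so only a fraction $1/n_k$ of momentum quartets survive. The sketch below enumerates them; band indices are omitted for brevity, and the 1D grid is an assumption made for illustration.

```python
from itertools import product


def allowed_quartets(n_k):
    """Momentum labels (k1, k2, k3, k4) for a+_{k1} a+_{k2} a_{k3} a_{k4} on a 1D grid
    k_m = 2*pi*m / (n_k * a_lat), kept only if k1 + k2 - k3 - k4 is a reciprocal
    lattice vector, i.e. the integer labels agree modulo n_k."""
    return [q for q in product(range(n_k), repeat=4)
            if (q[0] + q[1] - q[2] - q[3]) % n_k == 0]


quartets = allowed_quartets(4)
print(len(quartets), "of", 4 ** 4)      # k4 is fixed once k1, k2 and k3 are chosen
```

Since the fourth momentum is determined by the other three, the number of independent two-body blocks grows as $n_k^3$ rather than $n_k^4$.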
Exact sampling
Fermionic NQS are typically sampled using the MCMC approach commonly adopted in VMC (Box~\ref{sec:QMC}). However, the mixing rate of the MCMC algorithm is known to be slow in some cases, such as close to phase transitions, where MCMC simulations can suffer from critical slowing down. One way to circumvent this limitation is to introduce model wavefunctions explicitly designed to allow exact sampling of their squared modulus, thus avoiding the need for MCMC. One such family is autoregressive neural-network wavefunctions \citep{sharir2020deep}, a complex-valued generalization of the autoregressive models commonly adopted in deep learning. Such networks represent normalized wavefunctions and allow one to obtain perfectly uncorrelated samples directly; this is useful because the wavefunction distribution for many QC problems can be highly multi-modal. The exact-sampling approach was applied to QC Hamiltonians in recent work by \citet{BarrettNMI22}. Optimizations in the way Hamiltonian matrix elements and the corresponding Monte Carlo estimators are computed have made it possible to treat much larger systems than were accessible in the early applications of \citet{ChooNC20}. Specifically, \citet{zhao_scalable_2022} obtained competitive variational energies, improving on the CCSD energies of molecules in minimal basis sets. Results for up to around 50 electrons in 80 orbitals (\ce{Na2CO3} at equilibrium) have been obtained at relatively modest computational cost.
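The sketch below shows why autoregressive models can be sampled exactly. Each occupation is drawn from a conditional distribution that depends only on earlier orbitals, so $|\psi|^2$ is a product of normalised conditionals and ancestral sampling draws independent configurations with no Markov chain. A logistic model with random weights stands in for the neural network and the phase is a simple linear function; both are assumptions, and constraints such as a fixed electron number are not enforced here.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(11)
N = 5                                        # orbitals, each empty (0) or occupied (1)
W = rng.normal(size=(N, N))                  # only W[i, :i] is used: site i sees sites < i
bias = rng.normal(size=N)
phase_w = rng.normal(size=N)


def conditional_p1(n, i):
    """p(n_i = 1 | n_<i): a logistic model standing in for a neural network."""
    return 1.0 / (1.0 + np.exp(-(W[i, :i] @ n[:i] + bias[i])))


def log_prob(n):
    """log |psi(n)|^2 as a sum of log-conditionals."""
    total = 0.0
    for i in range(N):
        p1 = conditional_p1(n, i)
        total += np.log(p1 if n[i] == 1 else 1.0 - p1)
    return total


def amplitude(n):
    """psi(n) = sqrt(p(n)) exp(i phase(n)); normalised by construction."""
    return np.exp(0.5 * log_prob(n) + 1j * (phase_w @ n))


def sample(n_samples, rng):
    """Exact ancestral sampling from |psi|^2: draw n_1, then n_2 given n_1, and so on."""
    out = np.zeros((n_samples, N))
    for s in range(n_samples):
        for i in range(N):
            out[s, i] = float(rng.random() < conditional_p1(out[s], i))
    return out


norm = sum(abs(amplitude(np.array(c, dtype=float))) ** 2 for c in product([0, 1], repeat=N))
print(norm)                                  # 1 up to floating-point rounding
print(sample(3, np.random.default_rng(0)))   # three independent configurations
```

Because every sample is drawn from scratch, there is no burn-in and no autocorrelation time, which is the advantage over MCMC described above.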
5.2 ML-assisted selected CI
For many QC problems, although the dimension of the Hilbert space grows exponentially with system size, the ground state typically remains sparse: only a relatively small number of configurations are relevant. This suggests that by efficiently selecting the relevant configurations and then diagonalising the Hamiltonian in the reduced subspace, one can achieve highly accurate results. This set of approaches is known as selected CI \citep{HuronJCP73,giner2013using,holmes2016heat,sharma2017semistochastic}. Different flavours of selected CI vary in the way the relevant configurations are selected. One well-known approach is Monte Carlo CI (MCCI) \citep{greer1998monte}, which can be briefly summarised as follows:
- Start from a finite set of configurations $\mathcal{S}$.
- By considering single or double excitations starting from configurations in $\mathcal{S}$, construct an expanded set $\mathcal{S}'$.
- Construct the Hamiltonian for the expanded set and diagonalise it to obtain the wavefunction coefficients for the configurations in $\mathcal{S}'$.
- Discard the configurations whose coefficients are smaller in magnitude than a given threshold. The remaining configurations then form a new set of configurations $\mathcal{S}$.
- Repeat until convergence.
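The MCCI loop above can be sketched on a toy problem. Configurations are four electrons in eight orbitals, and the Hamiltonian is a made-up matrix with orbital-energy diagonal elements and random couplings between configurations that differ by single or double excitations; it is not derived from any molecule. Because every diagonalisation is performed in a subspace, the resulting energy can never lie below the exact lowest eigenvalue.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(10)
M, NE = 8, 4                                          # orbitals and electrons (toy sizes)
configs = [frozenset(c) for c in combinations(range(M), NE)]
index = {c: k for k, c in enumerate(configs)}
eps = np.linspace(-2.0, 2.0, M)                       # toy orbital energies


def excitation_level(c1, c2):
    """Number of electrons moved between two configurations of equal electron number."""
    return len(c1 - c2)


# Made-up Hamiltonian: orbital-energy diagonal, random couplings only for singles/doubles.
H = np.diag([sum(eps[o] for o in c) for c in configs])
for p, q in combinations(range(len(configs)), 2):
    if excitation_level(configs[p], configs[q]) <= 2:
        H[p, q] = H[q, p] = 0.1 * rng.normal()


def mcci(threshold=1e-3, max_iter=20):
    current = {configs[0]}                            # step 1: start from the lowest determinant
    energy = None
    for _ in range(max_iter):
        expanded = set(current)                       # step 2: add all singles and doubles
        for c in current:
            expanded |= {d for d in configs if 1 <= excitation_level(c, d) <= 2}
        ids = sorted(index[c] for c in expanded)      # step 3: diagonalise in the subspace
        evals, evecs = np.linalg.eigh(H[np.ix_(ids, ids)])
        energy = evals[0]
        new = {configs[k] for k, coeff in zip(ids, evecs[:, 0])
               if abs(coeff) >= threshold}            # step 4: prune small coefficients
        if new == current:                            # step 5: stop once the set is stable
            break
        current = new
    return energy, current


energy, selected = mcci()
print(energy, np.linalg.eigvalsh(H)[0], len(selected))
```

In a realistic calculation the matrix elements would be generated on the fly from the integrals of Eq. 20, since the full Hamiltonian could never be stored.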
ML techniques can be used to improve the selection of the configuration set. One such approach is to perform supervised learning \citep{Coe2018Machine,GlielmoPRX20}, in which a neural network is trained to predict the wavefunction coefficients using data from the MCCI method, i.e., the wavefunction coefficients of the configurations in the set. After training, the network can be queried or sampled to select the configurations with the largest coefficients. In other words, the network is used to bootstrap and predict the coefficients of configurations not yet seen in the dataset. \citet{Coe2018Machine} showed that such an approach converges faster than the vanilla MCCI method. The task of selecting configurations for selected CI can also be cast as a reinforcement-learning task, in which the state is the current set of configurations and an agent is trained to perform actions that iteratively modify the set, with the aim of minimising the variational energy. This approach was applied by \citet{Goings2021Reinforcement} to achieve near-FCI accuracy for small molecules in a small basis set.
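The supervised selection step can be caricatured as follows. Synthetic log-coefficients play the role of MCCI output for the configurations seen so far, a linear least-squares model on occupation-number features stands in for the neural network, and the configurations with the largest predicted coefficients among those not yet seen are proposed for the next iteration. All data and model choices here are assumptions for illustration and are not those of the cited works.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(12)
M, NE = 8, 4
X = np.array([np.isin(np.arange(M), c).astype(float) for c in combinations(range(M), NE)])

# Synthetic stand-in for MCCI output: log|c| falls as higher orbitals are occupied.
log_c = X @ -np.linspace(0.0, 3.0, M) + 0.1 * rng.normal(size=len(X))

seen = rng.choice(len(X), size=25, replace=False)     # configurations already in the set
unseen = np.setdiff1d(np.arange(len(X)), seen)


def features(rows):
    return np.hstack([X[rows], np.ones((len(rows), 1))])   # occupations plus a bias term


# Supervised step: least-squares fit of log|c| on occupations for the seen configurations.
w, *_ = np.linalg.lstsq(features(seen), log_c[seen], rcond=None)


def select_new(k):
    """Indices of the k unseen configurations with the largest predicted |c|."""
    predicted = features(unseen) @ w
    return unseen[np.argsort(predicted)[::-1][:k]]


print(select_new(5))
```

The proposed configurations would then be added to the set and the MCCI diagonalisation repeated, with the model retrained on the enlarged dataset.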
6 Challenges and outlook
Ab-initio QC with neural-network wavefunctions has only just emerged as a viable path to highly accurate electronic-structure methods, yet it already competes with established approaches that have been developed for decades. We imagine that it may become the methodology with the best trade-off between efficiency and accuracy for systems with up to one to two hundred electrons and a nontrivial electronic structure. Before that can happen, however, several challenges must be addressed. All the methods are currently at a development stage, and only limited benchmarking is available. As such, it is not yet clear whether the excellent accuracy seen so far will be maintained across a broader range of chemical systems, or how rapidly the accuracy will degrade with system size. Related to this is our incomplete understanding of what limits the accuracy of neural-network ansatzes, and how their success or failure is related to physical phenomena such as strong correlation. Since the underlying electronic problem is exponentially hard but the algorithms are polynomial, they must be limited in accuracy in some ways. It is not currently clear, however, whether the limitations seen to date are caused by the restricted expressiveness of the neural networks, by difficulties in optimization, or by both. For instance, while it has been proven that a single generalized Slater determinant is in principle sufficient to represent any antisymmetric function \citep{Hutter20}, it might not be possible to parametrize it with a polynomially scaling neural network or to train it in polynomially scaling time. Apart from these fundamental issues, there are many practical challenges. While the scaling of variational QMC with system size is favourable, the prefactor due to the neural networks is large. Until very recently, this limited applications to systems no larger than the benzene molecule (42 electrons), which is three to four times below our envisaged applicability range, although results for a 108-electron simulation cell of solid LiH have now been reported \citep{Li2022-abinitio}. The prefactor can be reduced by integrating traditional QC techniques such as pseudopotentials \citep{Li2022-pseudo}, by developing more efficient neural-network architectures, or by using ML techniques such as pre-training and transfer learning. Specific to the discrete-basis second-quantized approaches is the issue of basis-set convergence: sufficiently large basis sets may increase the prefactor by up to three orders of magnitude compared to minimal basis sets. Another challenge is related to the stochastic optimization, which produces noise in the converged energies that is especially amplified when calculating small energy differences.
We are, however, optimistic that many of these challenges can be addressed, and addressed quickly, thanks to the relative simplicity of the framework based on variational QMC and of neural networks compared to traditional QC approaches. Indeed, this simplicity has already enabled rapid development of multiple extensions to the first single-point ground-state calculations on molecules, including transferable wavefunctions, excited states, and formulations for periodic systems, all originating from multiple independent research groups. First-quantized approaches such as FermiNet, PauliNet, and their successor architectures already match essentially exact benchmark results to within chemical accuracy for small systems. Yet these networks are just a small subset of possible architectures for representing antisymmetric wavefunctions, and it is unlikely that the optimal ones were found on the first attempt, so we expect that significant innovation lies ahead. We believe that ab-initio methods based on neural-network wavefunctions will become an integral part of the QC toolbox, enabling straightforward electronic-structure calculations of complex molecular systems.
\printbibliography
Acknowledgements
We acknowledge funding from the German Ministry for Education and Research (Berlin Institute for the Foundations of Learning and Data, BIFOLD), the Berlin Mathematics Research Center MATH+ (AA1-6, AA2-8), and the European Commission (ERC CoG 772230 ScaleCell).