Ab-initio quantum chemistry with neural-network wavefunctions

Jan Hermann James Spencer Kenny Choo Antonio Mezzacapo IBM Quantum, Thomas J. Watson Research Center, Yorktown Heights, New York 10598, USA W. M. C. Foulkes Imperial College London, Department of Physics, South Kensington Campus, London SW7 2AZ, United Kingdom David Pfau Giuseppe Carleo Frank Noé

Abstract

Machine learning and specifically deep-learning methods have outperformed human capabilities in many pattern recognition and data processing problems, in game playing, and now also play an increasingly important role in scientific discovery. A key application of machine learning in the molecular sciences is to learn potential energy surfaces or force fields from ab-initio solutions of the electronic Schrödinger equation using datasets obtained with density functional theory, coupled cluster, or other quantum chemistry methods. Here we review a recent and complementary approach: using machine learning to aid the direct solution of quantum chemistry problems from first principles. Specifically, we focus on quantum Monte Carlo (QMC) methods that use neural network ansatz functions in order to solve the electronic Schrödinger equation, both in first and second quantization, computing ground and excited states, and generalizing over multiple nuclear configurations. Compared to existing quantum chemistry methods, these new deep QMC methods have the potential to generate highly accurate solutions of the Schrödinger equation at relatively modest computational cost.

These authors contributed equally.

Emails: frank.noe@fu-berlin.de, giuseppe.carleo@epfl.ch, pfau@google.com

1 Introduction

In the past decade, machine learning (ML) has made inroads into many areas of the physical sciences \citep{CarleoRMP19}, often outperforming more traditional computational methods \citep{jumper2021highly,DeringerN21} or offering entirely new approaches to solve scientific problems \citep{NoeS19,HuangNC20}. Quantum chemistry (QC) has been among the first fields to have been affected by this revolution \citep{TkatchenkoNC20,vonLilienfeldNC20,NoeARPC20}. Most applications of ML in QC have been concerned with supervised learning of molecular properties from molecular structure \citep{DralJPCL20}, either across conformational \citep{UnkeCR21} or chemical space \citep{vonLilienfeldNRC20}, as well as with unsupervised learning for the generation of novel molecules \citep{BianJMM21}. These methods all require a pre-existing dataset of molecules and their properties as an input, typically obtained with standard methods of QC such as density functional theory \citep{JonesDFTReview20215} or coupled cluster theory \citep{Bartlett2007}. In these scenarios, ML accurately approximates a given method of QC at vastly increased computational efficiency. This approach has already been reviewed in the other works cited above. In contrast, the current review focuses on the complementary use of ML as an ab-initio technique in QC, which requires no external data and instead recovers molecular properties from first principles. Here, ML is “integrated” into QC, with the goal of arriving at ab-initio methods with a more favourable accuracy–efficiency trade-off than traditional QC methods.

Figure 1: Quantum chemistry and machine learning. (a) Machine learning disciplines and their dependence on data can be mapped to disciplines in quantum chemistry. This work reviews the use of machine learning in ab-initio quantum chemistry, where the only input to machine learning is the Schrödinger equation itself. This approach uses self-generated data, rather than relying on external data. The closest analogue in machine learning is reinforcement learning with self-play, which substitutes data from an external environment with data generated by the agent, though in many other respects the two approaches are distinct. (b) Trade-off between computational efficiency and accuracy in quantum chemistry methods. Accuracy of electronic structure methods against the asymptotic scaling of their computational cost with system size, $N$. Popular methods, such as density functional theory, are outliers from the general trend.

The goal of computational chemistry is to predict properties of known molecules and to design molecules with desired properties. Most molecular properties are determined by the behaviour of the electrons, so QC methods attempt to approximately solve the Schrödinger equation for electrons in molecules. Traditionally, QC methods are divided into ab-initio and semi-empirical methods, where the former have no fitted parameters determined from external data, whereas the latter do. Methods that do not use quantum mechanics at all (such as force fields) are called empirical and are typically not considered part of QC, although this view may be changing with the advent of principled and accurate ML-based empirical methods.

It is useful to cast these three categories of methods in the light of ML terminology (Fig. 1a). ML can be roughly divided into supervised, unsupervised, and reinforcement learning. In supervised learning the ML model learns to predict the labels (outputs) of the data (inputs) from a given dataset so as to minimize the difference between the predicted and reference labels. By identifying the inputs with molecular structures and the outputs with molecular properties, all semi-empirical and empirical methods of QC fit into supervised learning, but using mostly relatively simple and physically motivated functional forms rather than the more general and highly flexible functions typical for ML. Vice versa, the many recent successful supervised ML models that predict energies or other molecular properties based on QC training data can be classified as empirical methods \citep{DeringerCR21,BehlerCR21,UnkeCR21,MusilCR21}. Unsupervised learning is concerned with unlabelled data, and the general task is to learn the underlying probability distribution that would generate a given dataset. Examples in chemistry include generative models for structural formulas \citep{Gomez-BombarelliACS18} as well as full 3D structures of molecules \citep{NoeS19,Hoogeboom22}, and in physics the estimation of quantum states from measurements, known as quantum tomography \citep{TorlaiNP18}. Finally, in reinforcement learning, the ML model (also referred to as an agent) is able to interact directly with its environment, rather than to just passively receive data. Here, the aim is for the agent to learn a policy for how to interact with the environment so as to maximize a long-term reward \citep{sutton2018reinforcement}. Reinforcement learning is behind some of the most prominent successes of ML such as playing games at a superhuman level \citep{tesauro1994td,mnih2015human,silver2016mastering} or the control of plasma in tokamaks \citep{DegraveN22}. In certain settings the agent can self-generate data by treating its own policy as the environment. This is known as self-play, and has been the basis for many advances in symmetric games \citep{heinrich2015fictitious,SilverS18}. Although there are many key differences, this is the branch of ML conceptually most similar to ab-initio QC, in the sense that no external data other than the rules of the system or game are required for either.

In the traditional picture, one moves from empirical to ab-initio methods by retaining more of the first-principles physics. Similarly, there is a general trend for ML models in chemistry to encode an increasing amount of molecular physics. This includes physical constraints such as energy conservation \citep{ChmielaSA17}, invariance and equivariance of molecular properties with respect to rotation, translation, or exchange of indistinguishable particles \citep{BehlerPRL07,SchuttNC17}, as well as other physical concepts such as many-body expansions \citep{DrautzPRB19} or even surrogate quantum-mechanical models \citep{LiJCTC18a,SchuttNC19,KirkpatrickS21}. Similar considerations can be made for the problem of ab-initio learning of solutions to the electronic Schrödinger equation introduced here, and we will discuss different strategies throughout the review.

The Schrödinger equation is an eigenvalue problem that can be equivalently formulated via several variational principles: its solutions, the eigenstate wavefunctions and energies, can be found by searching for stationary points of certain functionals over the space of all physically admissible wavefunctions. Importantly, the ground state of a molecule can be found by minimizing the energy expectation value of a wavefunction. This principle underlies many ab-initio QC methods, and also the methods in this review, as such a variational principle naturally defines an ML problem: the eigenstates (such as the ground state) are represented as a neural network and the parameters of that network are obtained by minimizing the variational electronic energy. The reviewed methods differ in the particular form of the neural-network ansatz used, as described below.

Figure 2: Electronic structure problem and its neural-network solutions. (a) The problem is fully specified by the geometry of a molecule and the electronic Schrödinger equation. (b) Only fully antisymmetric wavefunctions are admissible as solutions due to the Pauli exclusion principle and (c) these are often represented with Slater determinants. (d, e) Solutions formulated in first quantization use antisymmetric neural networks to represent the wavefunction directly in real space. (f) Second quantization transfers the antisymmetry to a fixed finite basis, enabling the use of vanilla neural networks.

Section 2 briefly reviews the components of electronic structure theory necessary for the development of the ML methods to be discussed later on. The electronic structure problem is mapped to ML in Section 3, which is followed by a review of the ab-initio ML methods for QC formulated in real space and in a discrete basis in Sections 4 and 5, respectively. The review is concluded in Section 6.

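2 Electronic structure

2.1 Schrödinger equation

QC aims at finding approximate solutions of the electronic Schrödinger equation that strike a good balance between accuracy and efficiency \citep{Piela20} (Fig. 1b). The non-relativistic electronic Schrödinger equation within the Born–Oppenheimer approximation for a given molecule specified by the charges and coordinates of the nuclei, $Z_I$, $\mathbf{R}_I$, is a second-order differential equation for the wavefunction, $\psi$, which is a function of the coordinates of $N$ electrons (Fig. 2a):

\begin{equation}
\hat{H}\,\psi(\mathbf{r}_1,\ldots,\mathbf{r}_N) = E\,\psi(\mathbf{r}_1,\ldots,\mathbf{r}_N) \tag{1}
\end{equation}
\begin{equation}
\hat{H} = -\sum_{i}\tfrac{1}{2}\nabla^2_{\mathbf{r}_i} + \sum_{i<j}\frac{1}{|\mathbf{r}_i-\mathbf{r}_j|} - \sum_{i,I}\frac{Z_I}{|\mathbf{r}_i-\mathbf{R}_I|} \tag{2}
\end{equation}

An alternative formulation of the Schrödinger equation uses the notion of an expectation value,

\begin{equation}
E[\psi] = \frac{\langle\psi|\hat{H}|\psi\rangle}{\langle\psi|\psi\rangle} = \frac{\int \mathrm{d}\mathbf{r}\,\psi^*(\mathbf{r})\,\hat{H}\psi(\mathbf{r})}{\int \mathrm{d}\mathbf{r}\,|\psi(\mathbf{r})|^2}, \qquad \mathbf{r}\equiv(\mathbf{r}_1,\ldots,\mathbf{r}_N) \tag{3}
\end{equation}

Instead of solving Eq. 1, the ground-state (lowest-energy) solution can be found by minimizing this energy expectation value with respect to all possible wavefunctions (variational principle),

\begin{equation}
E_0 = \min_{\psi} E[\psi] \tag{4}
\end{equation}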

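2.2 Antisymmetric wavefunctions

Electrons are fermions, and as such their wavefunction must be antisymmetric with respect to exchange of any two electrons. This cardinal feature of electronic wavefunctions permeates the whole of QC. In general, electrons also possess spin coordinates, $\sigma_i\in\{\uparrow,\downarrow\}$, but the nonrelativistic Hamiltonian does not operate on spin, so the spin coordinate of each electron can be considered fixed. To simplify the presentation here \parencite[for full treatment, see][Sec. IV.E]{FoulkesRMP01}, we take advantage of the fixed spin coordinates, so the spatial wavefunction must be antisymmetric only with respect to the exchange of same-spin electrons, i.e., when $\sigma_i=\sigma_j$ (Fig. 2b),

\begin{equation}
\psi(\ldots,\mathbf{r}_i,\ldots,\mathbf{r}_j,\ldots) = -\psi(\ldots,\mathbf{r}_j,\ldots,\mathbf{r}_i,\ldots) \tag{5}
\end{equation}

By far the most common way to form antisymmetric wavefunctions in QC is as antisymmetrized products of single-electron functions (orbitals), $\phi_j(\mathbf{r})$. These products can be written as determinants of an $N\times N$ matrix, $[\phi_j(\mathbf{r}_i)]$, formed by putting $N$ electrons into $N$ orbitals, and are referred to as Slater determinants (Fig. 2c):

\begin{equation}
\psi(\mathbf{r}_1,\ldots,\mathbf{r}_N) = \det[\phi_j(\mathbf{r}_i)] =
\begin{vmatrix}
\phi_1(\mathbf{r}_1) & \cdots & \phi_N(\mathbf{r}_1) \\
\vdots & \ddots & \vdots \\
\phi_1(\mathbf{r}_N) & \cdots & \phi_N(\mathbf{r}_N)
\end{vmatrix} \tag{6}
\end{equation}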
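The antisymmetry of a Slater determinant (Eqs. 5 and 6) can be checked numerically. The following is a minimal sketch and not taken from the reviewed works: the three one-dimensional orbitals and the helper names `det` and `slater` are hypothetical, chosen only to make the example self-contained.

```python
import math


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    sign, result = 1.0, 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]   # a row swap flips the sign
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * result


# Hypothetical 1D orbitals (Gaussian times low-order polynomials), for illustration only.
ORBITALS = [
    lambda x: math.exp(-x * x / 2),
    lambda x: x * math.exp(-x * x / 2),
    lambda x: (2 * x * x - 1) * math.exp(-x * x / 2),
]


def slater(positions):
    """psi(r_1..r_N) = det[phi_j(r_i)]: row i is an electron, column j an orbital."""
    return det([[phi(x) for phi in ORBITALS] for x in positions])


psi = slater([0.3, -0.7, 1.1])
psi_swapped = slater([-0.7, 0.3, 1.1])   # electrons 1 and 2 exchanged
psi_pauli = slater([0.3, 0.3, 1.1])      # two electrons at the same point
```

Exchanging two electrons swaps two rows of the matrix, so `psi_swapped` equals `-psi`, and placing two electrons at the same point makes two rows identical, so `psi_pauli` vanishes.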

When interpreting $\phi_j(\mathbf{r}_i)$ as the $j$-th component of an $N$-dimensional feature vector for the $i$-th electron (using ML parlance), $\boldsymbol{\phi}(\mathbf{r}_i)$, a Slater determinant is in fact the only antisymmetric function of $N$ feature vectors that is linear in every one of them, making it a natural choice. Alternative antisymmetric forms exist, such as the Pfaffian \citep{BajdichPRL06} or the Vandermonde determinant and its generalizations \citep{HanJCP19,AcevedoDLS20}, but these are far less common and we will not discuss them here.

Slater determinants formed from different orbitals can be further mixed in a linear combination without breaking the antisymmetry (Fig. 2c). In fact, this simple technique is the powerhouse behind all the high-accuracy methods of QC, yet it is also its bane, because the number of Slater determinants required to achieve a given accuracy rises exponentially with the number of atoms in most cases. For fermionic wavefunctions there is no known general approach to effectively reduce the search space from this exponential regime without sacrificing accuracy. However, QC has produced many methods that achieve excellent approximations for specific molecules and materials of practical interest. The cost of these highly accurate methods is generally less than exponential, but nevertheless increases rapidly with system size (Fig. 1b).

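2.3 Variational wavefunction methods

An important class of QC methods derives directly from the variational principle (Eq. 4), by assuming a certain wavefunction ansatz, $\psi_\theta$, parametrized by $\theta$. Minimizing the energy of this ansatz with respect to $\theta$ then always yields an upper bound for the exact ground-state energy,

\begin{equation}
E_0 \le \min_{\theta} E[\psi_\theta] \tag{7}
\end{equation}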
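The upper-bound property of Eq. 7 can be illustrated on the hydrogen atom, whose exact ground-state energy is $-0.5$ Hartree. The sketch below is an illustration under stated assumptions, not a method from the reviewed works: it uses the textbook closed-form energies of two one-parameter trial functions, $e^{-\alpha r}$ and $e^{-\alpha r^2}$, and a hypothetical helper `grid_minimum` that scans the parameter on a grid.

```python
import math

E_EXACT = -0.5  # exact hydrogen ground-state energy in Hartree


def energy_exponential(alpha):
    """E[psi] for psi(r) = exp(-alpha * r): kinetic alpha^2/2, potential -alpha."""
    return 0.5 * alpha**2 - alpha


def energy_gaussian(alpha):
    """E[psi] for psi(r) = exp(-alpha * r^2): kinetic 3*alpha/2, potential -2*sqrt(2*alpha/pi)."""
    return 1.5 * alpha - 2.0 * math.sqrt(2.0 * alpha / math.pi)


def grid_minimum(energy, lo, hi, n=2001):
    """Return (lowest energy, best alpha) on a uniform grid of n points."""
    alphas = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    return min((energy(a), a) for a in alphas)


e_exp, a_exp = grid_minimum(energy_exponential, 0.1, 2.0)
e_gauss, a_gauss = grid_minimum(energy_gaussian, 0.05, 1.0)
```

Both minima lie at or above the exact energy. The exponential form contains the exact ground state (at $\alpha=1$) and reaches the bound, whereas the Gaussian form cannot and stops at $-4/(3\pi)\approx-0.424$ Hartree, showing how the bound tightens as the ansatz becomes more suitable.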

The bound becomes tighter as the expressiveness of the ansatz is improved. One can distinguish two strategies to construct the ansatzes. First, traditional QC uses relatively simple forms, such that the integral of Eq. 3 can be evaluated analytically, which drastically simplifies the minimization problem \citep{Szabo96,Piela20}. Second, quantum Monte Carlo (QMC) enables the use of arbitrarily complex ansatzes at the cost of having to do the integral evaluation and minimization stochastically \citep{Becca17}. The latter is a natural framework to incorporate neural networks, and we introduce it in more detail in Section 3.1. Here we introduce three ansatzes for electronic wavefunctions of the first (traditional) kind, since they serve as scaffolding for the neural-network ansatzes of Sections 4 and 5. We also briefly discuss how they relate to other popular QC methods.

Box: First and second quantization

Computational methods for the electronic Schrödinger equation can be divided into first-quantized approaches in real space and second-quantized approaches in a discrete basis. In first quantization, one works with the individual electrons and their coordinates directly in real space ($\mathbf{r}_i$, $i=1,\ldots,N$) as in Eq. 1,

\begin{equation*}
|\psi\rangle = \int \mathrm{d}\mathbf{r}_1\cdots\mathrm{d}\mathbf{r}_N\,\psi(\mathbf{r}_1,\ldots,\mathbf{r}_N)\,|\mathbf{r}_1,\ldots,\mathbf{r}_N\rangle
\end{equation*}

Here, $\psi(\mathbf{r}_1,\ldots,\mathbf{r}_N)$ must be an antisymmetric function, which specifies which electrons occupy which coordinates, while the many-electron basis states ($|\mathbf{r}_1,\ldots,\mathbf{r}_N\rangle$) are ordinary non-symmetric (Cartesian) product states. In second quantization, one has to first introduce a discrete basis (in practice finite), labelled by $k=1,\ldots,M$, which then enables one to work with preformed antisymmetric many-electron basis states (Slater determinants). Rather than specifying which electrons occupy which one-electron states, the occupation numbers ($n_k\in\{0,1\}$, $k=1,\ldots,M$) specify which one-electron states are occupied without any reference to a particular electron,

\begin{equation*}
|\psi\rangle = \sum_{n_1,\ldots,n_M}\psi(n_1,\ldots,n_M)\,|n_1,\ldots,n_M\rangle
\end{equation*}

Here, $\psi(n_1,\ldots,n_M)$ can be an arbitrary tensor without antisymmetry, which is instead encoded in the many-electron basis states. This ability to push the antisymmetry from the wavefunction object to the many-electron basis is the main advantage of second quantization, at the cost of having to commit to a particular discrete basis. But regardless of the computational framework, either the wavefunction object itself (in first quantization) or the many-electron basis (in second quantization) consists of Slater determinants, and in high-accuracy methods their number grows rapidly with system size.

Box figure: First and second quantization. Illustration on electrons in 1D and a finite basis of size 5.

Hartree–Fock

Perhaps the simplest nontrivial ansatz in QC is the single Slater determinant of Eq. 6, where the orbitals are considered as free parameters. Optimized variationally, this ansatz leads to the so-called Hartree–Fock (HF) method. In practice the orbitals are linearly expanded in a fixed finite one-electron basis, $\chi_k(\mathbf{r})$, $k=1,\ldots,M$, with $M\gg N$ in most cases:

\begin{equation}
\phi_j(\mathbf{r}) = \sum_{k=1}^{M} C_{kj}\,\chi_k(\mathbf{r}) \tag{8}
\end{equation}

The use of a finite basis set turns the functional optimization problem of Eq. 8 into a computational problem whose cost scales with the fourth power of the number of basis functions, $\mathcal{O}(M^4)$, assuming a naive implementation. On its own, the HF ansatz is expressive enough to describe much of chemistry qualitatively, but not always and certainly not quantitatively. However, it can be considered a starting point for most wavefunction-based QC methods. Density functional theory (DFT) is not such a method, relying instead on an in-principle exact mapping of the ab-initio Hamiltonian (Eq. 2) to a mean-field-like problem, which can be solved exactly with a single Slater determinant \citep{JonesDFTReview20215,TealePCCP22}. However, the variational principle does not hold in DFT because the exchange-correlation contributions to the energy functional are not known exactly and must be approximated in practice. From here on, we will stay within the variational principle and instead focus on increasing the expressiveness of the HF ansatz.

Configuration interaction

The HF ansatz can be straightforwardly extended by forming multiple Slater determinants from different sets of orbitals and considering their linear combination (Fig. 2c),

\begin{equation}
\psi(\mathbf{r}_1,\ldots,\mathbf{r}_N) = \sum_{p} c_p \det[\phi^{p}_j(\mathbf{r}_i)] \tag{9}
\end{equation}

When the orbitals of each determinant are pooled from a larger superset of (mutually orthogonal) fixed orbitals of size $M$, and the only free parameters are the linear coefficients of the determinants, the ansatz is called configuration interaction (CI). One of the appeals of the CI ansatz is that its Slater determinants can be considered a many-electron antisymmetric basis and labelled using the occupation numbers of the one-electron states. This so-called second quantized formalism has many convenient properties for computation (see the box “First and second quantization”). The simplest version of CI, called full CI (FCI), considers all possible Slater determinants and is exact within the chosen finite one-electron basis. In the usual case when $M\gg N$, however, the computational effort scales exponentially with $N$, which makes FCI applicable only to the smallest molecules. Ways to tackle the exponential scaling include fixed truncation of the CI expansion or its “compression” through analytical means (coupled cluster theory \citep{Bartlett2007}; matrix product states \citep{ChanDMRG2011}), deterministic pruning (selected CI \citep{HuronJCP73}), or stochastic sampling (FCI-QMC \citep{BoothJCP09}). Section 5 explores a novel way of “compressing” the CI expansion through neural networks.
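The growth of the full CI expansion can be made concrete by counting determinants. The sketch below is an illustration only: it assumes a spin-resolved count, in which spin-up and spin-down electrons are distributed independently over the same set of spatial orbitals, and a hypothetical basis with twice as many orbitals as electrons. The function name `fci_dimension` is likewise an assumption.

```python
from math import comb


def fci_dimension(n_orbitals, n_up, n_down):
    """Number of Slater determinants with n_up and n_down electrons in n_orbitals spatial orbitals."""
    return comb(n_orbitals, n_up) * comb(n_orbitals, n_down)


# Hypothetical series: N electrons (half up, half down) in 2N orbitals.
sizes = {n: fci_dimension(2 * n, n // 2, n // 2) for n in (2, 4, 8, 16)}
for n, size in sizes.items():
    print(f"N = {n:2d} electrons, M = {2 * n:2d} orbitals: {size} determinants")
```

Each doubling of the system multiplies the number of determinants by several orders of magnitude, which is why FCI is limited to the smallest molecules.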

Beyond fixed bases

The effectiveness of the CI ansatz depends on the choice of the fixed molecular orbitals from which the Slater determinants are built. A natural extension of CI allows both the orbitals and the CI expansion coefficients to vary during the variational minimization. Such an ansatz of two stacked linear combinations (Eqs. 8 and 9) is harder to optimize but much more expressive. The most common variant is to consider all Slater determinants formed by letting a subset of the electrons occupy a space of active orbitals, while the remaining electrons occupy a fixed set of inactive orbitals. This is called the complete active space self-consistent field (CASSCF) method \citep{OlsenIJQC11}. Due to the larger variational freedom, a CASSCF ansatz typically requires many fewer determinants than a CI ansatz of comparable accuracy. But CASSCF and even FCI are still limited by the fixed one-electron basis used to form the molecular orbitals (Eq. 8): FCI is only exact in the complete basis set limit, which in practice cannot be reached for any but the smallest molecular systems.

An extension of the CASSCF ansatz would allow not only the one-electron orbitals but also the one-electron basis functions to vary. The stacked structure of such an ansatz would be reminiscent of deep neural networks, and Section 4 explores the culmination of this line of thought by incorporating actual deep neural networks into the ansatz. This removes any a priori limitations on the expressiveness. By making each individual determinant maximally expressive, such ansatzes further reduce the number of determinants required to reach a given accuracy.

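3 Machine learning for electronic Schrödinger equation

Box: Variational Monte Carlo

Optimization of wavefunctions with neural networks naturally leads to the variational Monte Carlo (VMC) framework. First, Monte Carlo integration of Eq. 3 can handle arbitrarily complicated ansatzes for which analytical integrals are not available. Second, VMC samples these integrals stochastically, which naturally combines with the stochastic gradient descent used for optimizing neural networks. In traditional QC, VMC has been used extensively with real-space first-quantized approaches \citep{FoulkesRMP01} and more recently in the discrete-basis second-quantized setting \citep{NeuscammanJAGPHilbert2013,SabzevariJCTC18}. The expectation value of any operator, such as the Hamiltonian (Eq. 3), can be written as a Monte Carlo integral over a continuous or discrete basis, $|x\rangle$,

\begin{equation*}
\frac{\langle\psi|\hat{H}|\psi\rangle}{\langle\psi|\psi\rangle} = \mathbb{E}_{x\sim|\psi(x)|^2}\big[E_\mathrm{loc}(x)\big] \approx \frac{1}{N_\mathrm{s}}\sum_{s=1}^{N_\mathrm{s}} E_\mathrm{loc}(x_s), \qquad E_\mathrm{loc}(x) = \frac{\langle x|\hat{H}|\psi\rangle}{\langle x|\psi\rangle}
\end{equation*}

Here, the expectation value is obtained as an expected value of a “local” energy, local in the sense that it is defined for every basis element. A straightforward and generally applicable way to obtain the samples is Markov-chain Monte Carlo (MCMC). MCMC is an iterative procedure, in which a new sample point, $x'$, is produced from a current one, $x$, by making a proposal step with probability $T(x'|x)$, and then accepting or rejecting the proposal with probability

\begin{equation*}
P_\mathrm{acc}(x'|x) = \min\!\left(1, \frac{|\psi(x')|^2\,T(x|x')}{|\psi(x)|^2\,T(x'|x)}\right)
\end{equation*}

The resulting Markov chain then samples $|\psi(x)|^2$. Variants of MCMC differ in the construction of the proposal steps and $T(x'|x)$, and include the simplest Metropolis algorithm ($T(x'|x)=T(x|x')$) as well as more sophisticated flavours such as Langevin Monte Carlo. The VMC formula for the expectation value is exact in the limit of infinite sample size, $N_\mathrm{s}\to\infty$, but in practice it incurs a statistical error proportional to $1/\sqrt{N_\mathrm{s}}$. While $1/\sqrt{N_\mathrm{s}}$ converges slowly with sample size, VMC has the great benefit that as the ansatz converges to the exact eigenstates, the local energy converges to a constant (the exact energy), and as such its variance vanishes and so does the statistical sampling error.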
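The VMC procedure described in this box can be sketched for the hydrogen atom. This is a minimal illustration under stated assumptions rather than an implementation from the reviewed works: the trial wavefunction $e^{-\alpha r}$, its closed-form local energy $-\alpha^2/2+(\alpha-1)/r$, the uniform proposal step, and the names `local_energy` and `vmc_energy` are all chosen for this example. The reported standard error is a naive estimate that ignores autocorrelation in the Markov chain.

```python
import math
import random


def local_energy(r, alpha):
    """E_loc = (H psi)/psi for psi = exp(-alpha * r) in the hydrogen atom (Hartree)."""
    return -0.5 * alpha**2 + (alpha - 1.0) / r


def vmc_energy(alpha, n_samples=40000, n_burn=2000, step=1.0, seed=0):
    """Metropolis sampling of |psi|^2 and Monte Carlo average of the local energy."""
    rng = random.Random(seed)
    pos = [0.5, 0.5, 0.5]
    r = math.sqrt(sum(c * c for c in pos))
    total = total_sq = 0.0
    for it in range(n_burn + n_samples):
        prop = [c + rng.uniform(-step, step) for c in pos]
        r_new = math.sqrt(sum(c * c for c in prop))
        # Symmetric proposal: accept with probability min(1, |psi(x')|^2 / |psi(x)|^2).
        if rng.random() < math.exp(min(0.0, -2.0 * alpha * (r_new - r))):
            pos, r = prop, r_new
        if it >= n_burn:  # discard the burn-in part of the chain
            e = local_energy(r, alpha)
            total += e
            total_sq += e * e
    mean = total / n_samples
    var = max(total_sq / n_samples - mean * mean, 0.0)
    return mean, math.sqrt(var / n_samples)


e_exact, err_exact = vmc_energy(alpha=1.0)      # ansatz equals the exact ground state
e_approx, err_approx = vmc_energy(alpha=0.8)    # analytic value: 0.8**2 / 2 - 0.8 = -0.48
```

For $\alpha=1$ the local energy is constant, so the estimate equals $-0.5$ Hartree with zero statistical error, which is the zero-variance property noted above. For $\alpha=0.8$ the estimate fluctuates around the analytic value of $-0.48$ Hartree.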

3.1 Mapping quantum mechanics to machine learning

Figure 3: Variational Monte Carlo with neural networks. Electron positions, $\mathbf{r}$, or orbital occupation numbers, $\mathbf{n}$, describe an electron configuration which is an input to the wavefunction, $\psi_\theta$, represented by a neural network parametrized with $\theta$. The wavefunction is used in two ways: first, to sample new electron configurations which provide new input to the neural network (yellow), and second, to evaluate the electronic energy, which is minimized by varying the network parameters (blue).

\begin{tabular}{ll}
Electronic structure & Machine learning \\
\hline
Wavefunction & Probability distribution \\
Natural orbital & Marginal distribution \\
Stochastic reconfiguration & Natural gradient descent \\
Hartree–Fock & Mean-field variational Bayes \\
Diffusion Monte Carlo & Particle filtering; Sequential Monte Carlo \\
\end{tabular}
Table 1: Dictionary of electronic structure and machine learning.

An ML problem and its solution are specified by the model, its inputs and outputs, the data, and the optimization criterion (loss function). In this regard, solving the Schrödinger equation with the variational principle amounts to the following ML problem (Fig. 3). The neural network (Section 3.2) represents a wavefunction, which accepts electron coordinates (first quantization) or occupation numbers (second quantization) as input and outputs the wavefunction value. The loss function is the energy expectation value corresponding to this wavefunction. The inputs are sampled from the probability distribution given by the square of the wavefunction represented by the current neural network, and the Hamiltonian operator is used to obtain an estimate of the loss function from the samples. The parameters of the network, and thus the wavefunction, are then modified to minimize the loss function. Except for the representation of the wavefunction as a network, this is the regular variational Monte Carlo (VMC) framework (see the box “Variational Monte Carlo”). The optimization methods used (see the box “Optimizing neural-network ansatzes”) are also fairly conventional, although adapted to a neural network context. This straightforward correspondence between the Schrödinger equation and ML led to the introduction of similar concepts on both sides, albeit known under different names (Table 1).

The applicability of deep learning for quantum-mechanical calculations was first realized and exploited by \citet{CarleoS17} for the case of spin lattices in one and two dimensions. Their approach, known as Neural Quantum States (NQS), has since been applied to many different quantum systems \citep{saito2017solving,nomura2017restricted,corey2021variational,nikita2021broken}. In essence, this review is concerned with the extension of this approach to electrons in molecules.

Box: Optimizing neural-network ansatzes

Up to the statistical error, the VMC expectation value for the energy (see the box “Variational Monte Carlo”) obeys the variational principle (Eq. 4). VMC exploits this by varying a parametric wavefunction ansatz so as to minimize the energy. For a sufficiently expressive ansatz, the variational energy will eventually approximate the ground state energy of Eq. 1 and the ansatz will approximate the ground state wavefunction. The most straightforward optimization method is gradient descent, where the parameters are iteratively updated as

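\begin{equation*}
\theta \leftarrow \theta - \eta\,\nabla_\theta E
\end{equation*}

with learning rate $\eta$. The energy gradient is given by

\begin{equation*}
\frac{\partial E}{\partial\theta_k} = 2\,\mathrm{Re}\!\left[\langle\hat{O}_k^\dagger\hat{H}\rangle - \langle\hat{O}_k^\dagger\rangle\langle\hat{H}\rangle\right]
\end{equation*}

where

\begin{equation*}
O_k(x) = \frac{\partial\log\psi_\theta(x)}{\partial\theta_k}
\end{equation*}

is an operator representing the logarithmic derivatives of the wavefunction. This gradient can be efficiently estimated using Monte Carlo integration (see the box “Variational Monte Carlo”). In some cases the optimization can be sped up and made more stable with higher-order methods, such as the stochastic reconfiguration (SR) scheme \citep{SorellaPRL98}. SR takes the correlation between individual variational parameters into account by introducing the quantum geometric tensor:

\begin{equation*}
S_{kl} = \langle\hat{O}_k^\dagger\hat{O}_l\rangle - \langle\hat{O}_k^\dagger\rangle\langle\hat{O}_l\rangle
\end{equation*}

The update rule is then modified to

\begin{equation*}
\theta \leftarrow \theta - \eta\,S^{-1}\nabla_\theta E
\end{equation*}

The SR scheme approximates an imaginary-time evolution where each iteration tries to best approximate the state $e^{-\eta\hat{H}}|\psi_\theta\rangle$. SR is similar to the natural gradient descent algorithm \citep{amari_natural_1998} that is well known in the ML community, and the quantum geometric tensor can be interpreted as a quantum generalization of the Fisher information matrix \citep{Ay17}. In some cases, it is convenient to approximate the quantum geometric tensor using the Kronecker-factored approximate curvature (KFAC) approach \citep{martens2015optimizing}.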
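The two update rules in this box can be compared on a one-parameter toy problem. The sketch below is an illustration under stated assumptions and not an implementation from the reviewed works. It uses the hydrogen-atom ansatz $e^{-\alpha r}$, for which the logarithmic derivative is $O=-r$ and the local energy is $-\alpha^2/2+(\alpha-1)/r$. Instead of MCMC it draws the radius exactly from a Gamma distribution, which is possible only because this toy density is known in closed form. The names `estimate` and `optimize`, the learning rate, and the small regularization added to $S$ are all assumptions.

```python
import random


def estimate(alpha, n, rng):
    """Monte Carlo estimates of the energy, its gradient, and the 1x1 tensor S."""
    # |psi|^2 r^2 dr is proportional to r^2 exp(-2 alpha r): Gamma(shape=3, scale=1/(2 alpha)).
    rs = [rng.gammavariate(3.0, 1.0 / (2.0 * alpha)) for _ in range(n)]
    e_loc = [-0.5 * alpha**2 + (alpha - 1.0) / r for r in rs]
    o = [-r for r in rs]                       # O = d log(psi) / d alpha
    e_mean = sum(e_loc) / n
    o_mean = sum(o) / n
    grad = 2.0 * (sum(e * x for e, x in zip(e_loc, o)) / n - e_mean * o_mean)
    s = sum(x * x for x in o) / n - o_mean**2  # variance of O
    return e_mean, grad, s


def optimize(alpha0, use_sr, lr=0.2, n_steps=100, n=2000, seed=0):
    """Plain gradient descent, or stochastic reconfiguration if use_sr is True."""
    rng = random.Random(seed)
    alpha = alpha0
    for _ in range(n_steps):
        _, grad, s = estimate(alpha, n, rng)
        step = grad / (s + 1e-6) if use_sr else grad  # 1e-6 regularizes the inverse
        alpha -= lr * step
    return alpha


alpha_gd = optimize(0.5, use_sr=False)
alpha_sr = optimize(0.5, use_sr=True)
```

Both runs approach the exact value $\alpha=1$. Because the gradient estimate is proportional to $\alpha-1$ in this example, its statistical noise vanishes at the optimum together with the variance of the local energy.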

3.2 Deep learning

The standard practice in ab-initio QC today is in some ways analogous to the state of computer vision before the rise of deep learning. Prior to 2012, the best pipelines for large-scale image recognition consisted of a combination of hand-designed features and simple ML models \citep{perronnin2010large}. A single deep convolutional neural network trained end-to-end was able to cut the recognition error in half relative to these systems \citep{krizhevsky2012imagenet}, and since then deep neural networks have dominated computer vision research. In ab-initio QC, ground-state solutions to the Schrödinger equation are usually represented by a wavefunction ansatz with a relatively simple functional form, and parameters are usually fit through a mix of procedures (fixed-point iteration, variational optimization) rather than a unified end-to-end estimation of all parameters simultaneously. The development of deep QMC methods is driven by the hope that the use of neural networks will significantly increase the expressiveness of wavefunction ansatzes, enabling large leaps in accuracy as in image recognition.

To appreciate how and why deep neural networks can be usefully applied in QC, a brief review of their application in artificial intelligence is necessary. For a thorough review of the history of deep learning, see \citet{schmidhuber2015deep}, and for a review of the fundamental concepts in deep learning, see \citet{lecun2015deep}. Neural networks date back to the very beginning of computer science \citep{mcculloch1943logical}, and their modern form originates with the single perceptron “unit” \citep{rosenblatt1958perceptron}, which produces as output a non-linear function of the sum of a constant, known as the bias, and a linear combination of its inputs. The non-linear function rises from zero to one as its input increases, mimicking the activation function of a biological neuron. When many such units are assembled in parallel to form a “layer,” and several layers are computed serially, taking the output from one layer as the input to the next, the resulting multi-layer perceptron (MLP) can, in theory, represent any smooth function to arbitrary accuracy given enough units \citep{hornik1989multilayer}. However, actually fitting or learning a set of parameters that matches any given function is a different matter.

A form of gradient descent utilizing derivatives computed using backpropagation, or reverse-mode automatic differentiation \citep{werbos1974beyond,linnainmaa1970representation,linnainmaa1976taylor}, was found to be effective for training neural networks \citep{rumelhart1986learning}. This led to a wave of enthusiasm for neural networks, which eventually faded as several issues were discovered, such as the infamous “vanishing gradients” and getting stuck in local minima. Several factors were instrumental in rehabilitating neural networks under the banner of “deep learning”: a combination of algorithmic advances \citep{glorot2010understanding} and the use of modern GPU hardware \citep{hooker2020hardware} made the computations much faster, and the resulting ability to train larger networks made issues with local minima less severe \citep{dauphin2014identifying,choromanska2015loss}. Furthermore, deep neural networks with the help of stochastic gradient descent can be applied straightforwardly and efficiently to large datasets, unlike other ML models \citep{bottou2008learning,bottou2011tradeoffs}. Finally, empirical successes like winning the ImageNet Large Scale Visual Recognition Challenge \citep{russakovsky2015imagenet} helped legitimize deep learning research and generate excitement among researchers.

Today, the barrier to entry for developing and training deep neural networks is quite low, thanks to a mature ecosystem of software libraries for numerical computing with automatic differentiation and hardware accelerators \citep{AbadiOSDI16,paszke2017automatic,bradbury2018jax}. However, actually achieving good performance from a deep learning model still requires some finesse and application of various heuristics. It is safe to say that a significant amount of the practice of deep learning remains more art than science. The good news is that once effective heuristics for a particular problem domain have been developed, these same heuristics can often be applied with little modification to other problems in the same domain.

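3.3 Neural network architectures

The starting point for most neural networks is the multi-layer perceptron (MLP), formed as a composition of layers,

\begin{equation}
f(\mathbf{x}) = (f_L\circ\cdots\circ f_1)(\mathbf{x}), \qquad f_l(\mathbf{h}) = \sigma(\mathbf{W}_l\mathbf{h} + \mathbf{b}_l) \tag{10}
\end{equation}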
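A forward pass through an MLP can be written in a few lines. The sketch below is a minimal illustration, assuming that every layer has the form $\sigma(\mathbf{W}\mathbf{h}+\mathbf{b})$ with $\sigma=\tanh$; the function names `dense` and `mlp` and the tiny hand-picked weights are hypothetical.

```python
import math


def dense(weights, bias, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp(layers, x, activation=math.tanh):
    """Compose the layers: each one applies the activation to an affine map of its input."""
    h = x
    for weights, bias in layers:
        h = [activation(v) for v in dense(weights, bias, h)]
    return h


# Two layers: 2 inputs -> 2 hidden units -> 1 output (weights chosen by hand).
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
output = mlp(LAYERS, [1.0, 1.0])
```

For the input `[1.0, 1.0]` the first layer gives `[tanh(0), tanh(1)]`, so the single output is `tanh(tanh(1))`. In practice the weights and biases would be learned rather than fixed by hand.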

where $\sigma$ is some non-linear activation function, and $\mathbf{W}_l$ and $\mathbf{b}_l$ are the matrices of weights and vectors of biases to learn. While a vanilla MLP is capable of representing arbitrary functions, the real power of neural networks comes from more sophisticated architectures. Many of these architectures are designed to encode some particular invariance or equivariance, that is, when the input to the network is transformed in a particular way, the output should either be unchanged or should transform in a corresponding way. For instance, the weights in a layer of a convolutional neural network (ConvNet) \citep{lecun1998gradient} are restricted to be a discrete convolution operator, which constrains each layer to be translation-equivariant, a natural constraint for image recognition, and also dramatically reduces the number of possible weights in a layer. Equivariance to permutation is another frequently useful property, and one that is especially important in real-space approaches to representing electronic wavefunctions (see Section 4). A simple permutation-equivariant layer first proposed by \citet{shawe1989building} can be constructed by applying the same transformation to each input and summing the results. More sophisticated permutation-equivariant layers are used by models like the Transformer \citep{vaswani2017attention} or SchNet \citep{schutt2018schnet}. Many of these equivariant layers can be unified in a conceptual framework based around the language of geometry and group theory, wherein the choice of transformation to be equivariant to leads naturally to recipes for constructing the appropriate neural network layers \citep{bronstein2021geometric}.

Another class of neural network architectures, which have been influential as wavefunction ansatzes, are restricted Boltzmann machines (RBMs) \citep{hinton2006reducing}. These were originally developed for unsupervised learning, but in the VMC setting considered here they lead to a simple deterministic expression for the log probability that closely resembles a one-layer MLP. Despite their early popularity, RBMs have been largely eclipsed in the AI community by other methods for unsupervised learning, such as variational autoencoders \citep{kingma2013auto}, generative adversarial networks \citep{goodfellow2014generative}, normalizing flows \citep{rezende2015variational}, autoregressive models \citep{oord2016wavenet,oord2016conditional}, and diffusion models \citep{sohl2015deep}. In fact, some of these newer models have started to have an impact as neural network wavefunction ansatzes for spin systems. Examples are deep autoregressive quantum states \citep{sharir2020deep}, convolutional neural networks \citep{choo2019two}, recurrent neural networks \citep{hibat-allah_recurrent_2020}, and normalizing flows \citep{xie_ab-initio_2021}.
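Permutation equivariance can be verified directly on a small layer. The sketch below is one possible construction, given as an assumption rather than as the specific layer of any cited work: each particle's output combines a shared transformation of its own features with a shared transformation of the mean over all particles. The names `matvec` and `equivariant_layer` and the weights are hypothetical.

```python
import math


def matvec(m, v):
    """Matrix-vector product with the matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def equivariant_layer(xs, w_self, w_pool, bias):
    """y_i = tanh(W_self x_i + W_pool mean_j(x_j) + b), applied identically to every particle i."""
    n = len(xs)
    pooled = [sum(x[k] for x in xs) / n for k in range(len(xs[0]))]
    shared = matvec(w_pool, pooled)           # same contribution for all particles
    out = []
    for x in xs:
        own = matvec(w_self, x)
        out.append([math.tanh(o + s + b) for o, s, b in zip(own, shared, bias)])
    return out


W_SELF = [[0.7, -0.2], [0.1, 0.5]]
W_POOL = [[0.3, 0.3], [-0.4, 0.2]]
BIAS = [0.05, -0.05]

xs = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]]
perm = [2, 0, 1]
ys = equivariant_layer(xs, W_SELF, W_POOL, BIAS)
ys_from_permuted_input = equivariant_layer([xs[p] for p in perm], W_SELF, W_POOL, BIAS)
```

Permuting the input particles permutes the outputs in the same way, so `ys_from_permuted_input` matches `[ys[p] for p in perm]` up to rounding. Summing the outputs over particles would in turn give a permutation-invariant quantity.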

4 Electrons in first quantization

Figure 4: Neural-network architectures for selected real-space wavefunctions. (a) Original PauliNet architecture from \citep{HermannNC20}. (b) Original FermiNet architecture from \citep{PfauPRR20}. Both architectures have been modified and extended by various contributions mentioned in this review. (c) Approach to computing excited states in \citep{Entwistle22}.

One approach to studying the electronic problem with deep learning is to work with parameterized many-body wavefunctions in first quantization, $\psi_\theta(\mathbf{r})$. Here $\mathbf{r}$ stands for the $N$-tuple of electron coordinates, $(\mathbf{r}_1,\ldots,\mathbf{r}_N)$, and sampling is realized over electronic positions (see the box “Variational Monte Carlo”). The antisymmetry constraint (Eq. 5) must be imposed in $\psi_\theta$ to avoid collapsing onto a lower-energy bosonic state. A commonly adopted form is $\psi_\theta(\mathbf{r})=\psi_\mathrm{S}(\mathbf{r})\,\psi_\mathrm{A}(\mathbf{r})$, where the first factor is symmetric (or “bosonic”) under exchange of electron coordinates and the second factor carries the necessary antisymmetry. The simplest and most common approach is to build the antisymmetric part of the wavefunction using Slater determinants (Eq. 6). As discussed in Section 2, single Slater determinants with fixed orbitals have limited expressiveness and many such determinants need to be combined to achieve high accuracy. A natural generalization of a sum of fixed-orbital Slater determinants is the commonly used Slater–Jastrow wavefunction

\begin{equation}
\psi(\mathbf{r}) = e^{J(\{\mathbf{r}_i\})}\sum_{p} c_p \det[\phi^{p}_j(\mathbf{r}_i)] \tag{11}
\end{equation}

where the Jastrow factor, $e^{J}$, constitutes the symmetric (“bosonic”) part of the state and typically contains one- and two-body (and in many cases higher-order) parameterized correlations. The set notation, $\{\mathbf{r}_i\}$, indicates that $J$ does not depend on the order of the electron coordinates. The determinants in Eq. 11 are typically replaced with the product of spin-up and spin-down determinants \citep{FoulkesRMP01}. Separating the up- and down-spin determinants improves computational efficiency, simplifies the implementation, and makes it easier to handle the electron-electron cusps, while leaving expectation values of spin-independent operators unchanged. More flexible parametric forms can be obtained by leveraging the approximation power of artificial neural networks. In the following, we discuss neural-network-based strategies to parameterize these forms.

4.1 Discretespace

Thefirstapplicationsofneuralnetworkstoelectronicsystemswereforelectronsmovingindiscretizedspace,asrealized,forexample,inthe2DHubbardmodelofstrongly-interactingelectrons.Inthefollowing,forsimplicity,wediscussthecaseofspinlesselectronsinlatticesites,anddenotewiththediscretelatticeindexcorrespondingtoelectronposition.Theextensiontothespinfulcasewillbeconsideredmoreindetailwhendiscussingcontinuousspacelateron.ThesymmetricpartcanbereadilyparameterizedwithastrategycloselyrelatedtoNQSforspins:

\begin{equation}
\psi_S(\mathbf{x}) = F\big(\mathbf{n}(\mathbf{x})\big) \tag{12}
\end{equation}

where $\mathbf{n}(\mathbf{x})$ is the unique occupation number representation corresponding to the electronic positions and $F$ represents a generic function which could be represented by a neural network. Since the occupation numbers are invariant under permutation of the electron positions, $\psi_S$ is also symmetric under exchange. Any of the NN architectures also adopted for spin systems \citep{CarleoS17} or lattice bosons \citep{saito2017solving} can be used to represent the symmetric part. Early works on the Hubbard model adopted positive-definite RBM-based parameterizations of $\psi_S$ \citep{nomura2017restricted}, while more recent works have adopted deep-network parameterizations allowing for sign changes \citep{stokes_quantum_2020}. The simplest parameterization for the antisymmetric part, $\psi_A$, is again a Slater determinant

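\begin{equation}
\psi_A(\mathbf{x}) = \det\left[\phi_i(x_j)\right] \tag{13}
\end{equation}

where the matrix of discrete orbitals $\phi_i(x)$ holds the variational parameters to be optimized. This approach, however, has the important drawback of not providing enough variational flexibility, since it effectively fixes the antisymmetric part to a mean-field reference solution.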
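The factorization into a symmetric and an antisymmetric part can be checked numerically on a small lattice. The following is a minimal sketch, not the implementation of any cited work: the lattice size, the random `orbitals` matrix and the linear `site_weights` (a toy stand-in for a neural network acting on occupation numbers) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_elec = 6, 3

# Hypothetical variational parameters (random here, optimized in practice).
orbitals = rng.normal(size=(n_sites, n_elec))  # row = lattice site, column = orbital
site_weights = rng.normal(size=n_sites)        # toy stand-in for a neural network


def occupations(x):
    """Occupation numbers n(x): 1 on occupied sites, 0 elsewhere."""
    n = np.zeros(n_sites)
    n[list(x)] = 1.0
    return n


def psi_symmetric(x):
    """Symmetric factor: depends on x only through the occupation numbers."""
    return np.exp(site_weights @ occupations(x))


def psi_antisymmetric(x):
    """Slater determinant built from the orbital rows selected by the electrons."""
    return np.linalg.det(orbitals[list(x), :])


def psi(x):
    return psi_symmetric(x) * psi_antisymmetric(x)


x = (0, 2, 5)          # lattice sites of electrons 1, 2, 3
x_swapped = (2, 0, 5)  # electrons 1 and 2 exchanged
print(psi(x), psi(x_swapped))
```

Exchanging two electrons swaps two rows of the selected orbital matrix, so the determinant changes sign while the occupation-number factor is unchanged; placing two electrons on the same site produces two identical rows and hence a vanishing amplitude.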

Neural backflow

A significant improvement is obtained by considering a many-body backflow transformation of the orbitals \citep{feynman_energy_1956,kwon_effects_1993}. In this variational form, the matrix of one-electron orbitals is promoted to a parameterized many-electron function depending on all the occupation numbers:

\begin{equation}
\phi_i(x_j) \rightarrow \phi_i(x_j;\mathbf{n}) = \phi_i(x_j) + \Delta\phi_i(x_j;\mathbf{n}) \tag{14}
\end{equation}

where $\Delta\phi_i$ is a correction to the single-particle orbitals. In physics-inspired parameterizations, $\Delta\phi_i$ is typically taken to be a simple function of the electronic occupation numbers \citep{tocchio_role_2008}. The neural backflow method \citep{LuoPRL19} instead introduced a flexible parameterization of the backflow orbitals based on artificial neural networks. In this case, $\Delta\phi_i$ is parameterized with an MLP taking as inputs the electronic occupation numbers and outputting a many-body correction to the orbital matrix. This approach allows the orbitals to change dynamically depending on the positions of the electrons, thus allowing one to include genuinely many-body correlations in the antisymmetric part of the wavefunction.
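A minimal sketch of the idea behind Eq. 14 follows. It assumes a single-hidden-layer MLP with random weights `W1`, `b1`, `W2` (all hypothetical names and sizes) that maps the occupation numbers to an additive correction of the full orbital matrix; it is meant only to illustrate why the determinant stays antisymmetric, not to reproduce the architecture of \citet{LuoPRL19}.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sites, n_elec, n_hidden = 6, 3, 8

# Hypothetical parameters: mean-field orbitals plus a small MLP.
orbitals = rng.normal(size=(n_sites, n_elec))
W1 = 0.5 * rng.normal(size=(n_hidden, n_sites))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.normal(size=(n_sites * n_elec, n_hidden))


def occupations(x):
    n = np.zeros(n_sites)
    n[list(x)] = 1.0
    return n


def backflow_orbitals(n):
    """Orbital matrix corrected by an MLP of the occupation numbers."""
    hidden = np.tanh(W1 @ n + b1)
    delta = (W2 @ hidden).reshape(n_sites, n_elec)
    return orbitals + delta


def psi_backflow(x):
    """Determinant of the configuration-dependent orbitals at the electron sites."""
    phi = backflow_orbitals(occupations(x))
    return np.linalg.det(phi[list(x), :])


print(psi_backflow((0, 2, 5)), psi_backflow((2, 0, 5)))
```

Because the correction depends on the electrons only through the permutation-invariant occupation numbers, exchanging two electrons still only swaps two rows of the matrix entering the determinant.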

Constrained hidden fermions

Neural backflow transformations are not the only way to introduce flexible parameterizations of the antisymmetric part of the wavefunction. The constrained hidden fermion formalism builds on the idea of introducing a set of $\tilde{N}$ auxiliary fermionic particles, with positions $\tilde{\mathbf{x}}$, living on $\tilde{M}$ lattice sites. These auxiliary particles are used to effectively mediate correlations among the physical degrees of freedom \citep{robledo_moreno_fermionic_2022}. Calling $\Phi(\mathbf{x},\tilde{\mathbf{x}})$ a Slater determinant for the extended (physical + hidden) system, the resulting antisymmetric form for the physical system is given by

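\begin{equation}
\psi_A(\mathbf{x}) = \Phi\big(\mathbf{x},\mathbf{f}(\mathbf{x})\big) \tag{15}
\end{equation}

In this expression, $\mathbf{f}$ is a function, parameterized by a neural network, mapping the physical positions to the hidden ones. This approach has been shown to improve systematically over the neural backflow form for the 2D Hubbard model \citep{robledo_moreno_fermionic_2022}.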

4.2 Continuous space

We now focus on describing the important case of first-quantized electrons in continuous space, directly corresponding to the electronic Schrödinger equation. As in the discrete-space case, the Slater–Jastrow form may be improved in a manner suitable for use with neural quantum states by adding a backflow transformation, in which the one-electron orbitals are replaced by many-electron functions. The backflow transformation can either modify the orbitals directly via a multiplicative and/or additive term:

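\begin{equation}
\phi_i(\mathbf{r}_j) \rightarrow \phi_i(\mathbf{r}_j)\,f_i(\mathbf{r}_j,\{\mathbf{r}_{/j}\}) + g_i(\mathbf{r}_j,\{\mathbf{r}_{/j}\}) \tag{16}
\end{equation}

or act as a quasiparticle transformation of the electron coordinates:

\begin{equation}
\mathbf{r}_j \rightarrow \mathbf{r}_j + \boldsymbol{\xi}(\mathbf{r}_j,\{\mathbf{r}_{/j}\}) \tag{17}
\end{equation}

where $\{\mathbf{r}_{/j}\}$ denotes the set of all electron positions other than $\mathbf{r}_j$, the parameterized functions, $f_i$, $g_i$ and $\boldsymbol{\xi}$, are invariant to permutations of $\{\mathbf{r}_{/j}\}$, and $\boldsymbol{\xi}$ is a three-component vector that modifies $\mathbf{r}_j$. If we consider a determinant of orbitals of this form,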

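\begin{equation}
\psi_A(\mathbf{r}) = \det\left[\phi_i(\mathbf{r}_j,\{\mathbf{r}_{/j}\})\right] \tag{18}
\end{equation}

then we see that orbitals with backflow transformations are just one example of a broader class of functions: in order for the determinant to be antisymmetric, the matrix with elements $\Phi_{ij}=\phi_i(\mathbf{r}_j,\{\mathbf{r}_{/j}\})$ must be permutation-equivariant; that is, exchanging electrons $j$ and $k$ also exchanges columns $j$ and $k$. While traditional Slater–Jastrow–backflow wavefunctions have had considerable success, they also have limitations due to the choice of fixed functional forms. The goal, therefore, is to come up with more flexible permutation-equivariant functions. Here we highlight several approaches that share this common theme.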
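The equivariance requirement can be illustrated with a deliberately small example. The sketch below assumes a single toy layer in which each electron's features combine its own position with a mean-pooled (and therefore permutation-invariant) summary of all electrons; the weight matrices `W_single`, `W_pool`, `W_orb` and all sizes are hypothetical, and the layer is far simpler than the architectures discussed in this section.

```python
import numpy as np

rng = np.random.default_rng(2)
n_elec, n_feat = 4, 5

# Hypothetical random weights for a single toy layer.
W_single = rng.normal(size=(n_feat, 3))
W_pool = rng.normal(size=(n_feat, 3))
W_orb = rng.normal(size=(n_elec, n_feat))


def orbital_matrix(r):
    """Return the matrix with element [i, j] = orbital i evaluated for electron j.

    r has shape (n_elec, 3). Row j of `h` depends on r[j] and on a pooled
    summary of all electrons that is invariant under permutations.
    """
    pooled = np.tanh(r).mean(axis=0)
    h = np.tanh(r @ W_single.T + pooled @ W_pool.T)  # shape (n_elec, n_feat)
    return W_orb @ h.T                               # shape (n_orbitals, n_elec)


def psi(r):
    return np.linalg.det(orbital_matrix(r))


r = rng.normal(size=(n_elec, 3))
perm = [1, 0, 2, 3]  # exchange electrons 0 and 1
print(psi(r), psi(r[perm]))
```

Permuting the electrons permutes the columns of the orbital matrix in the same way, which is exactly the equivariance property that makes the determinant antisymmetric.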

Figure 5: Automerization of cyclobutadiene with neural-network ansatzes. Both PauliNet and FermiNet predict relative energies within the range of experimental values and agree with multireference coupled cluster. The PauliNet converges more quickly, while the FermiNet reaches lower total energy. Figure modified from \citet{Spencer20}.

Iterative backflow

\citet{TaddeiPRB15} introduced a form of backflow that applies Eq. 17 repeatedly in an iterative fashion. Such an ansatz is formally equivalent to expressing the backflow as a deep neural network \citep{Ruggeri2018-ql}, albeit with an artificial restriction on the dimensionality of the hidden layers. The iterative backflow was used for studying the $^3$He and $^4$He liquids.

DeepWF

The DeepWF \citep{HanJCP19} approach uses an ansatz similar to a Slater–Jastrow wavefunction but with a simpler antisymmetric term:

\begin{equation}
\psi(\mathbf{r}) = S(\mathbf{r})\,A^{\uparrow}(\mathbf{r}^{\uparrow})\,A^{\downarrow}(\mathbf{r}^{\downarrow}), \qquad A(\mathbf{r}) = \prod_{i<j} a(\mathbf{r}_i,\mathbf{r}_j) \tag{19}
\end{equation}

The learned symmetric function $S$ is similar to a Jastrow factor and ensures that the wavefunction captures the electron-nuclear and electron-electron cusp conditions. The antisymmetric factors are constructed from the Vandermonde-like determinant of an explicitly antisymmetric two-body function, $a(\mathbf{r}_i,\mathbf{r}_j)=-a(\mathbf{r}_j,\mathbf{r}_i)$. The two-body antisymmetric function is entirely learned. Such a functional form can be evaluated in $\mathcal{O}(N^2)$ operations, compared to $\mathcal{O}(N^3)$ for a determinant. However, the use of a simplified antisymmetric function is also likely to limit the accuracy achieved: DeepWF obtains only 43.6% of the correlation energy for the beryllium atom and does not even reach HF accuracy for the boron atom. The PauliNet and FermiNet approaches described below do much better. Vanilla PauliNet obtained 99.94% and 97.3% of the correlation energies for the beryllium and boron atoms, and FermiNet 99.97% and 99.83%, respectively. Furthermore, FermiNet and PauliNet both substantially surpass conventional Slater–Jastrow–backflow (SJB) wavefunctions on first-row atoms, for which nearly exact benchmark values exist.
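The pairwise construction can be sketched as follows for a single spin block. The two-body function `g`, its random weights and the system size are hypothetical choices for illustration only; antisymmetrizing `g` explicitly gives a pair function whose product over all pairs changes sign under any exchange of two electrons.

```python
import numpy as np

rng = np.random.default_rng(3)
w_first = rng.normal(size=3)
w_second = rng.normal(size=3)


def g(u, v):
    """Generic, non-symmetric two-body function (toy stand-in for a network)."""
    return np.tanh(u @ w_first + 0.5 * (v @ w_second))


def pair(u, v):
    """Explicitly antisymmetric two-body function: pair(u, v) = -pair(v, u)."""
    return g(u, v) - g(v, u)


def antisymmetric_product(r):
    """Product over all pairs i < j, costing O(N^2) evaluations of `pair`."""
    value = 1.0
    n = len(r)
    for i in range(n):
        for j in range(i + 1, n):
            value *= pair(r[i], r[j])
    return value


r = rng.normal(size=(4, 3))
r_swapped = r[[2, 1, 0, 3]]  # exchange electrons 0 and 2
print(antisymmetric_product(r), antisymmetric_product(r_swapped))
```

Exchanging two electrons flips the sign of their own pair term, while the terms involving any electron indexed between them flip in pairs, so the overall product acquires a single factor of $-1$.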

PauliNet

PauliNet \citep{HermannNC20} builds upon HF or CASSCF orbitals as a physically meaningful baseline and takes a neural network approach to the SJB wavefunction in order to correct this baseline towards a high-accuracy solution (Fig. 4a). Cusp conditions are explicitly met via the inclusion of cusp correction terms in the wavefunction \citep{Ma2005-cusps}. A graph-convolutional block based on SchNet \citep{schutt2018schnet} is used to create a permutation-equivariant latent space representation depending on the many-electron configuration. This embedding is then passed into separate deep neural networks that learn the Jastrow factor and a (cuspless) backflow transformation. \citet{HermannNC20} introduced PauliNet with a purely multiplicative backflow as shown in Fig. 4a; \citet{SchatzleJCP21} generalized this to a multiplicative and additive backflow as shown in Eq. 16. PauliNet is optimized with a fixed number of Slater determinants. Most of the results reported in \citet{HermannNC20,SchatzleJCP21} were obtained with around 10 determinants.

FermiNet

FermiNet \citep{PfauPRR20} takes a more minimalist (or machine-learning maximalist) approach and attempts to train a neural network to represent the entire wavefunction (Fig. 4b). FermiNet uses two parallel networks, describing one- and two-electron features respectively. The inputs to each layer in the one-electron stream are permutation-equivariant functions of the activations from the previous layers of the one- and two-electron streams. The final layer projects the latent space into the required number of orbitals, from which determinants can be formed and evaluated. As with PauliNet, the final wavefunction is a sum over a number of determinants. For most of the results reported in \citet{PfauPRR20}, 16 determinants were used. FermiNet builds up a rich description of electron-electron interactions from the permutation-equivariant mixing of information describing one- and two-electron features. In particular, the electron-nuclear and electron-electron cusps in the wavefunction are represented accurately, despite not being encoded explicitly. Whereas PauliNet is usually trained with the ADAM optimizer, FermiNet training was found to be substantially improved when employing the KFAC optimizer.

While both PauliNet and FermiNet exceed the accuracy of conventional SJB wavefunctions on small systems, there are important trade-offs between the two models. Results from both on the automerization of cyclobutadiene can be seen in Fig. 5. The FermiNet is typically trained with a larger number of parameters than the PauliNet, requiring more iterations and more computation per iteration to converge, but it typically converges to a lower absolute energy. Recently, \citet{gerard2022gold} proposed a hybrid ansatz which uses neural network layers similar to the SchNet and PauliNet in a FermiNet-like architecture. This hybrid ansatz was found to reach even lower absolute energies than the FermiNet on systems like benzene and the potassium atom.

Potential energy surfaces

Typically one optimises a wavefunction at a specific geometry, but this quickly becomes prohibitively expensive for exploring the high-dimensional potential energy surface of even relatively small molecules. \citet{ScherbelaNCS22} developed a training methodology that allows weight sharing between (simplified) PauliNet architectures targeting different geometries. By switching the geometry being trained at each epoch, they showed that the computational cost for training across a set of geometries can be reduced by an order of magnitude without affecting the accuracy of the final energies, with 95% of network parameters shared across all geometries. This suggests that the network is learning features of electron correlation in general rather than fitting to a specific geometry. They also demonstrated that a wavefunction for a larger molecule could be initialised from a wavefunction for a smaller molecule and could then be fine-tuned in a relatively short optimization stage. Pretraining neural network wavefunctions from smaller systems has also been shown to dramatically accelerate convergence for Kagome lattice models \citep{Yang2020-bk}. In a similar vein, \citet{Gao2021-cg,gao2022sampling} demonstrated that a meta-learning approach, where a graph neural network is used to parameterize a wavefunction model, can accurately represent the wavefunctions for multiple geometries, enabling a fully quantum-mechanical potential energy surface to be represented in a single model. Their approach used a FermiNet-like wavefunction model, but the meta-learning concept is directly applicable to other wavefunction representations, assuming the wavefunction form is sufficiently flexible.

Periodic systems

There has also been progress on using first-quantized neural network architectures in periodic systems, such as interacting quantum gases in low dimension \citep{pescia_neural-network_2022}, the electron gas \citep{wilson2022-ueg,cassella2022-ueg,Li2022-abinitio}, and small cells of solids such as lithium hydride and graphene \citep{Li2022-abinitio}. Again, sufficiently expressive networks at the VMC level have been found capable of rivalling or surpassing the accuracy of fixed-node diffusion Monte Carlo calculations using conventional Slater–Jastrow–backflow trial wavefunctions.

4.3 Extensions

Pseudopotentials

The electronic structure of heavy atoms, especially transition metals, is complicated and challenging for all QC methods. The difficulty is compounded by the high computational cost of variational Monte Carlo methods, which grows steeply, as a high power of the nuclear charge $Z$ \citep{Hammond1987}. Whilst the core electrons contribute heavily to the total energy, energy differences are largely determined by the behaviour of the valence electrons. The core electrons can therefore be removed and the effective nuclear charge reduced by the use of pseudopotentials. The use of pseudopotentials is common in many methods, including density functional theory and conventional variational Monte Carlo. \citet{Li2022-pseudo} demonstrated that effective core potentials can be readily combined with FermiNet and achieve accuracy comparable to CCSDT(Q) extrapolated to the complete basis set limit for first-row transition metal atoms. The computational time per iteration was reduced by 43% (17%) for the scandium (zinc) atom using an argon core. Again, this approach is not restricted to FermiNet. Pseudopotentials can be used with any first-quantized neural network wavefunction.

Diffusion Monte Carlo (DMC)

Projector methods such as DMC \citep{needs2020variational} and auxiliary-field Monte Carlo \citep{ShiJCP21} go beyond VMC by using stochastic algorithms to sample the ground state without requiring its wavefunction to be represented as a known function or network. DMC is in principle exact but, for many-fermion systems, relies in practice on the fixed-node approximation, in which collapse to the bosonic ground state is avoided by imposing the sign structure of the trial wavefunction on the DMC wavefunction. A DMC simulation therefore samples (stochastically) the lowest energy state with the same sign structure as the trial wavefunction. The improvements that result from applying DMC to conventional Slater–Jastrow–backflow trial functions optimized using VMC methods are substantial, explaining why DMC is so often used to provide improved estimates of the ground-state wavefunction and energy.

\citet{Wilson21} combined DMC with a FermiNet trial wavefunction. For first-row atoms, DMC captured much of the remaining correlation energy (94% of the difference between the VMC energy and the exact energy in the case of the nitrogen atom). However, \citet{Wilson21} used a simplified FermiNet that gave VMC energies higher than those reported by \citet{PfauPRR20}, which were already within 1 mHa of exact results for all first-row atoms. Given evidence that the mean-field equivalent of PauliNet can essentially match HF in the complete basis set limit \citep{SchatzleJCP21}, it is possible that the remaining error in PauliNet and FermiNet wavefunctions is dominated by errors in the nodal surface, which lies in regions that are rarely sampled during optimisation. If this is the case, diffusion Monte Carlo with the fixed-node approximation may not produce substantially lower energies. On the other hand, since neural network wavefunctions routinely capture over 90% of the correlation energy at the VMC level, the need to perform expensive diffusion Monte Carlo calculations is greatly reduced.

More recently, \citet{Ren22} showed that DMC can capture roughly half of the remaining correlation energy for the atoms Li–Ar when using a very small FermiNet-based architecture. Whilst it is possible to achieve energies within chemical accuracy using FermiNet at the VMC level, these calculations model the case for larger systems, where converging the energy with respect to network size might not be feasible. \citet{Ren22} went on to demonstrate that DMC using FermiNet trial wavefunctions noticeably reduces the energy for larger systems. In the case of the benzene dimer, the reduction was 50 mHa.

Excited states

Our discussion so far, and most VMC calculations, have focused on ground-state properties. However, excited states are of critical importance for understanding the behaviour of materials. Fortunately, recent algorithmic developments by multiple groups have demonstrated that the calculation of excited states using VMC methods is feasible and can achieve an acceptable trade-off between accuracy and cost. Here we highlight three such approaches utilizing conventional VMC wavefunctions. One approach is the state-averaged VMC method \citep{Schautz2004,Dash2019}, in which the average energy over multiple states is minimised and individual states are projected out via diagonalization within the basis of excited states. Similar techniques are used with other quantum chemistry methods. \citet{Zhao2016} instead minimized a different objective function, such that the state with energy closest to a desired energy target is obtained. \citet{Pathak2021} suggested a simple alternative, where a state is forced to be (approximately) orthogonal to all lower energy states via a penalty term.

These techniques can be readily applied to VMC using neural-network wavefunctions and, in particular, penalty function approaches have recently been explored. As with ground-state calculations, the flexibility of the wavefunction ansatz to represent the desired state is critical. \citet{Entwistle22} demonstrated that the PauliNet architecture combined with a penalty function can represent the lowest few excited states of molecules up to the size of benzene (Fig. 4c). Relatedly, \citet{ChooPRL2020} demonstrated that NQS on lattice models can obtain the lowest-energy state of any given Abelian symmetry by performing what is essentially a ground-state simulation in that symmetry sector, and multiple states of the same symmetry using a penalty function. However, the most accurate and efficient way to obtain excited states within VMC, irrespective of wavefunction ansatz, remains an open question \citep{Cuzzocrea2020}.
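The effect of an overlap penalty can be seen in a finite-dimensional toy problem, where minimization over normalized states reduces to diagonalization. The sketch below is an assumption-laden illustration and not the VMC procedure of the works cited above: the random symmetric matrix `H` plays the role of a Hamiltonian, and the objective $\langle\psi|H|\psi\rangle + \lambda|\langle\psi_0|\psi\rangle|^2$ is minimized exactly by diagonalizing $H + \lambda|\psi_0\rangle\langle\psi_0|$.

```python
import numpy as np

rng = np.random.default_rng(7)
dim = 8
A = rng.normal(size=(dim, dim))
H = 0.5 * (A + A.T)  # toy real-symmetric "Hamiltonian"

exact_energies, exact_states = np.linalg.eigh(H)
ground = exact_states[:, 0]

# Penalty strength; it must exceed the gap E1 - E0 so that the ground state
# is pushed above the first excited state.
penalty = 2.0 * (exact_energies[1] - exact_energies[0]) + 1.0

# For normalized psi, <psi|H|psi> + penalty * |<ground|psi>|^2 is the
# Rayleigh quotient of the shifted matrix below.
H_penalized = H + penalty * np.outer(ground, ground)
energies_pen, states_pen = np.linalg.eigh(H_penalized)
excited = states_pen[:, 0]

print("first excited energy (exact):  ", exact_energies[1])
print("minimum of penalized objective:", energies_pen[0])
```

In this linear setting the minimizer is exactly orthogonal to the ground state; with a nonlinear neural-network ansatz and a stochastic estimate of the overlap, orthogonality is only approximate and the penalty strength becomes a tunable hyperparameter.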

5 Electrons in second quantization

Figure 6: Electronic energies for molecules and solids in second quantization. (a) Dissociation curve for a molecule in the STO-3G basis. The green stars show results for a restricted Boltzmann machine which represents the electrons in discrete space. Figure taken from \citet{ChooNC20}. (b) Graphene on a honeycomb lattice solved using the cc-pVDZ basis set. Figure taken from \citet{yoshioka2021solving}.

Instead of working directly with the infinite-dimensional Hilbert space corresponding to the real-space Hamiltonian of Eq. 2, it is common practice in QC to use a finite basis set. By choosing a set of electronic basis functions, $\{\phi_p\}$, we can define a set of second-quantised operators $\hat{a}_p^\dagger$ ($\hat{a}_p$) which create (annihilate) an electron in the $p$-th basis function, and which satisfy the canonical anticommutation relations. These operators then act on the second-quantized wavefunction, $|\psi\rangle$, which encodes amplitudes for different occupations of the orbitals (Box~\ref{box:first-second-quant}). Projecting the real-space Hamiltonian onto this set of orbitals then yields the corresponding discretized Hamiltonian,

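\begin{equation}
\hat{H} = \sum_{pq} h_{pq}\,\hat{a}_p^\dagger \hat{a}_q + \frac{1}{2}\sum_{pqrs} V_{pqrs}\,\hat{a}_p^\dagger \hat{a}_q^\dagger \hat{a}_r \hat{a}_s \tag{20}
\end{equation}

where

\begin{align}
h_{pq} &= \int \mathrm{d}\mathbf{r}\;\phi_p^*(\mathbf{r})\left(-\frac{1}{2}\nabla^2 - \sum_I \frac{Z_I}{|\mathbf{r}-\mathbf{R}_I|}\right)\phi_q(\mathbf{r}) \tag{21}\\
V_{pqrs} &= \int \mathrm{d}\mathbf{r}_1\,\mathrm{d}\mathbf{r}_2\;\frac{\phi_p^*(\mathbf{r}_1)\,\phi_q^*(\mathbf{r}_2)\,\phi_r(\mathbf{r}_2)\,\phi_s(\mathbf{r}_1)}{|\mathbf{r}_1-\mathbf{r}_2|} \tag{22}
\end{align}

are matrix elements of the one- and two-electron terms in the real-space Hamiltonian of Eq. 2, written here in atomic units with $Z_I$ and $\mathbf{R}_I$ the nuclear charges and positions. For simple basis functions such as Gaussians or plane waves, the matrix elements can be evaluated analytically. This Hamiltonian serves as the starting point for the methods described in this section.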

5.1 Fermionic neural quantum states

Instead of working directly with the occupation-number representation of the wavefunction (Box~\ref{box:first-second-quant}), it is also possible to map occupation numbers onto degrees of freedom of spin-1/2 particles, such that empty orbitals map to down spins and occupied orbitals to up spins. This mapping makes it possible to leverage NQS and other methods for solving quantum spin systems. The same duality allows the creation and annihilation operators appearing in the electronic Hamiltonian (Eq. 20) to be written in terms of spin operators. This can be achieved, for example, with the Jordan–Wigner mapping \citep{Wigner1928}, which transforms annihilation and creation operators into, respectively, lowering and raising spin operators, each multiplied by a string of Pauli-$Z$ operators that preserves the fermionic anticommutation relations. This mapping is not unique, however, and there exist more recent alternatives, such as parity or Bravyi–Kitaev encodings \citep{BK2002}, both of which have been developed in the context of quantum simulations. Regardless of the choice of spin encoding, the final outcome is a spin Hamiltonian with the general form
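The Jordan–Wigner construction can be made concrete with explicit matrices. The sketch below builds the annihilation operator for orbital $j$ as a string of Pauli-$Z$ matrices on the preceding orbitals followed by a lowering operator, and exposes an anticommutator helper for checking the canonical relations. The basis ordering (index 0 for an empty orbital, index 1 for an occupied one) and the function names are conventions chosen here for illustration.

```python
import numpy as np
from functools import reduce

IDENTITY = np.eye(2)
PAULI_Z = np.diag([1.0, -1.0])
# Per-orbital basis ordering (an assumption): index 0 = empty, index 1 = occupied.
LOWER = np.array([[0.0, 1.0], [0.0, 0.0]])  # maps occupied -> empty


def annihilation(j, n_orbitals):
    """Jordan-Wigner image of the annihilation operator for orbital j."""
    factors = [PAULI_Z] * j + [LOWER] + [IDENTITY] * (n_orbitals - j - 1)
    return reduce(np.kron, factors)


def anticommutator(a, b):
    return a @ b + b @ a


n_orbitals = 3
a = [annihilation(j, n_orbitals) for j in range(n_orbitals)]

# The creation operator is the transpose, since all matrices are real.
number_op_0 = a[0].T @ a[0]
print(np.diag(number_op_0))
```

Without the $Z$ strings, operators acting on different orbitals would commute rather than anticommute; the strings supply the required sign.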

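\begin{equation}
\hat{H}_{\mathrm{spin}} = \sum_{j} h_j\,\hat{\boldsymbol{\sigma}}_j \tag{23}
\end{equation}

defined as a linear combination with real coefficients $h_j$ of $\hat{\boldsymbol{\sigma}}_j$, which are $M$-fold tensor products of single-qubit Pauli operators and the identity: $\hat{\boldsymbol{\sigma}}_j\in\{I,X,Y,Z\}^{\otimes M}$. The ground state of the spin Hamiltonian in Eq. 23 can be approximated using a spin-based NQS representation based on complex-valued RBMs \citep{CarleoS17}. For a system of $M$ spins, the many-body amplitude corresponding to a state $\mathbf{s}=(s_1,\ldots,s_M)$ in the $Z$ basis, i.e., with $s_i=\pm 1$, takes the compact form

\begin{equation}
\psi_\theta(\mathbf{s}) = e^{\sum_i a_i s_i}\prod_{j} 2\cosh\Big(b_j + \sum_i W_{ji}\,s_i\Big) \tag{24}
\end{equation}

with parameters $\theta=\{a_i,b_j,W_{ji}\}$. This ansatz can be optimised with VMC techniques (Box~\ref{sec:optimization}), typically relying on the stochastic reconfiguration approach \citep{SorellaPRL98}. A number of works have adopted this approach and achieved competitive variational results for small basis sets \citep{ChooNC20,YangJCTC20}, even in conjunction with quantum computers \citep{torlai2020precise,iouchtchenko2022neural}. In Fig. 6(a), we show the dissociation curve of a small molecule in the STO-3G basis, obtained using the RBM described above \citep{ChooNC20}.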
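The compact RBM amplitude arises from summing analytically over the hidden units. The sketch below assumes the standard complex-valued RBM form of \citet{CarleoS17}, with visible biases `a`, hidden biases `b` and couplings `W` drawn at random (sizes and scales are arbitrary choices), and compares the closed-form product of hyperbolic cosines with a brute-force sum over all hidden configurations.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n_vis, n_hid = 4, 3


def complex_normal(shape, scale=0.3):
    return scale * (rng.normal(size=shape) + 1j * rng.normal(size=shape))


# Hypothetical complex parameters of a small RBM.
a = complex_normal(n_vis)
b = complex_normal(n_hid)
W = complex_normal((n_hid, n_vis))


def rbm_amplitude(sigma):
    """Closed form: hidden units traced out analytically."""
    theta = b + W @ sigma
    return np.exp(a @ sigma) * np.prod(2.0 * np.cosh(theta))


def rbm_amplitude_bruteforce(sigma):
    """Explicit sum over all hidden configurations h in {-1, +1}^n_hid."""
    total = 0.0 + 0.0j
    for h in itertools.product([-1.0, 1.0], repeat=n_hid):
        h = np.array(h)
        total += np.exp(a @ sigma + b @ h + h @ W @ sigma)
    return total


sigma = np.array([1.0, -1.0, -1.0, 1.0])
print(rbm_amplitude(sigma), rbm_amplitude_bruteforce(sigma))
```

The closed form costs a number of operations proportional to the number of couplings, whereas the explicit sum grows exponentially with the number of hidden units.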

Solids

The second-quantization framework also allows one to treat solids, using as a basis the Bloch orbitals obtained by solving the crystalline HF equations \citep{del_re_self-consistent-field_1967}. Creation and annihilation operators, $\hat{a}^\dagger_{n\mathbf{k}}$ and $\hat{a}_{n\mathbf{k}}$, for electrons in band $n$ with crystal momentum $\mathbf{k}$ are introduced, and the resulting Hamiltonian is similar to Eq. 20, with the notable difference that the one- and two-body matrix elements now depend on the crystal momenta: $h_{pq}(\mathbf{k})$ and $V_{pqrs}(\mathbf{k}_1,\mathbf{k}_2,\mathbf{k}_3,\mathbf{k}_4)$, with the four momenta appearing in the two-body integrals satisfying the conservation of the total crystal momentum. Using Gaussian-based atomic functions as the single-particle basis and RBM wavefunctions to represent the many-body state, \citet{yoshioka2021solving} applied this approach to study the electronic structure of solids. In Fig. 6(b), we show the computed ground-state energies for graphene crystals as a function of the lattice constant.

Exact sampling

Fermionic NQS are typically sampled using the MCMC approach commonly adopted in VMC (Box~\ref{sec:QMC}). However, the mixing rate of the MCMC algorithm is known to be slow in some cases, such as close to phase transitions, and MCMC simulations can suffer from critical slowing down. A way to circumvent this limitation is to introduce model wavefunctions explicitly designed to allow exact sampling of their square modulus, thus avoiding the need to use MCMC. One such family is autoregressive neural network wavefunctions \citep{sharir2020deep}, a complex-valued generalization of the autoregressive models commonly adopted in deep learning. Such networks represent normalized wavefunctions and allow one to directly obtain perfectly uncorrelated samples; this is useful as the wavefunction distribution for many QC problems can be highly multi-modal. The exact sampling approach was applied to QC Hamiltonians in a recent work by \citet{BarrettNMI22}. Optimizations in the way Hamiltonian matrix elements and the corresponding Monte Carlo estimators are computed have made it possible to treat much larger systems than were accessible in the early applications of \citet{ChooNC20}. Specifically, \citet{zhao_scalable_2022} obtained competitive variational energies, improving on the CCSD energies of molecules in minimal basis sets. Results for up to around 50 electrons in 80 orbitals (\ce{Na2CO3} at equilibrium) have been obtained at relatively modest computational cost.
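The two properties that make autoregressive wavefunctions attractive, normalization by construction and direct sampling, can be seen in a very small model. The sketch below assumes a linear masked model with random weights (`W`, `bias`, `phase_w` are hypothetical) in which the occupation of each orbital is conditioned only on the preceding ones; it ignores the particle-number and spin constraints that a realistic QC application would impose.

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
n = 4

# Strictly lower-triangular weights: orbital i only sees orbitals 0..i-1.
W = np.tril(rng.normal(size=(n, n)), k=-1)
bias = rng.normal(size=n)
phase_w = rng.normal(size=n)  # toy parameters for the complex phase


def conditionals(s):
    """Probability that each orbital is occupied, given the earlier orbitals."""
    return 1.0 / (1.0 + np.exp(-(W @ s + bias)))


def amplitude(s):
    """psi(s) = sqrt(prod_i p(s_i | s_<i)) * exp(i * phase(s))."""
    p1 = conditionals(s)
    prob = np.prod(np.where(s == 1, p1, 1.0 - p1))
    return np.sqrt(prob) * np.exp(1j * phase_w @ s)


def sample(generator):
    """Ancestral sampling: draw orbitals one at a time, no Markov chain needed."""
    s = np.zeros(n)
    for i in range(n):
        p1 = conditionals(s)[i]  # depends only on the already sampled s[:i]
        s[i] = float(generator.random() < p1)
    return s


norm = sum(
    abs(amplitude(np.array(c, dtype=float))) ** 2
    for c in itertools.product([0, 1], repeat=n)
)
print(norm, sample(rng))
```

Each call to `sample` returns an independent draw from $|\psi|^2$, so there is no autocorrelation time to estimate.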

5.2 ML-assisted selected CI

For many QC problems, although the dimension of the Hilbert space grows exponentially with system size, the number of relevant configurations in the ground state typically remains small. This suggests that by efficiently selecting the relevant configurations and then diagonalising the Hamiltonian on the reduced subspace, one can achieve highly accurate results. This set of approaches is also known as selected CI \citep{HuronJCP73,giner2013using,holmes2016heat,sharma2017semistochastic}. Different flavours of selected CI vary in the way relevant configurations are selected. One well-known approach is called Monte Carlo CI (MCCI) \citep{greer1998monte} and can be briefly summarised as follows:

  1. Start from a finite set of configurations, $\mathcal{S}$.

  2. By considering single or double excitations starting from configurations in $\mathcal{S}$, construct an expanded set, $\mathcal{S}'$.

  3. Construct the Hamiltonian for the expanded set $\mathcal{S}'$ and diagonalise it to obtain the wavefunction coefficients for the configurations in the set.

  4. Discard the configurations whose coefficient magnitude is less than a given threshold. The remaining configurations then form a new set of configurations, $\mathcal{S}$.

  5. Repeat until convergence.
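The loop above can be sketched on a toy problem. The example below is a simplified, deterministic variant and not MCCI itself: configurations are plain integer indices, the Hamiltonian is a random sparse symmetric matrix, and the expansion step adds every configuration coupled to the current set by a nonzero matrix element, whereas MCCI draws excitations stochastically. The function name `selected_ci` and all numerical settings are assumptions made for illustration.

```python
import numpy as np


def selected_ci(H, start, threshold=1e-3, max_iter=20):
    """Toy selected-CI loop on an explicit Hamiltonian matrix H."""
    selected = set(start)
    energy = None
    for _ in range(max_iter):
        # Step 2: expand with all configurations coupled to the current set.
        expanded = set(selected)
        for c in selected:
            expanded.update(np.flatnonzero(H[c]).tolist())
        idx = np.array(sorted(expanded))
        # Step 3: diagonalise the Hamiltonian restricted to the expanded set.
        evals, evecs = np.linalg.eigh(H[np.ix_(idx, idx)])
        energy = evals[0]
        coeffs = evecs[:, 0]
        # Step 4: keep only configurations with a sufficiently large coefficient.
        kept = set(idx[np.abs(coeffs) >= threshold].tolist())
        # Step 5: stop once the selected set no longer changes.
        if kept == selected:
            break
        selected = kept
    return energy, sorted(selected)


rng = np.random.default_rng(6)
dim = 40
mask = rng.random((dim, dim)) < 0.1
coupling = np.triu(0.1 * rng.normal(size=(dim, dim)) * mask, k=1)
H = coupling + coupling.T + np.diag(1.0 * np.arange(dim))

energy, configs = selected_ci(H, start=[0])
exact = np.linalg.eigvalsh(H)[0]
print(energy, exact, len(configs))
```

Because the energy is obtained by diagonalizing within a subspace, it is an upper bound to the exact ground-state energy of the full matrix; the threshold controls the trade-off between the size of the retained set and the remaining error.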

ML techniques can be used to improve selection of the configuration set. One such approach is to perform supervised learning \citep{Coe2018Machine,GlielmoPRX20}, where a neural network is trained to predict the wavefunction coefficients using the data from the MCCI method, i.e., the wavefunction coefficients of the configurations in the expanded set. After training, the network can be queried or sampled to select the configurations with the largest coefficients. In other words, the network is used to bootstrap and predict the coefficients of configurations not yet seen in the dataset. \citet{Coe2018Machine} showed that such an approach converges faster than the vanilla MCCI method. The task of selecting configurations for selected CI can also be cast as a reinforcement-learning task where the state is the current set of configurations and an agent is trained to perform actions on the set to iteratively modify the configurations with the aim of minimising the variational energy. This approach was applied in \citet{Goings2021Reinforcement} to achieve near-FCI accuracy for small molecules in a small basis set.

6 Challenges and outlook

Ab-initio QC with neural-network wavefunctions has only just emerged as a viable path to highly accurate electronic-structure methods, yet it already competes with established approaches that have been developed for decades. We imagine that it may become the methodology with the best trade-off between efficiency and accuracy for systems with up to one to two hundred electrons and a nontrivial electronic structure. Before that can happen, however, several challenges must be addressed. All the methods are currently in a development stage and only limited benchmarking is available. As such, it is not yet clear whether the excellent accuracy seen so far will be maintained across a broader range of chemical systems, or how rapidly the accuracy will degrade with system size. Related to this is our incomplete understanding of what limits the accuracy of neural-network ansatzes, and how their success or failure is related to physical phenomena such as strong correlation. Since the underlying electronic problem is exponentially hard but the algorithms are polynomial, they must be limited in accuracy in some ways. It is not currently clear, however, whether the limitations seen to date are caused by the restricted expressiveness of the neural networks or by difficulties in optimization or both. For instance, while it has been proven that a single generalized Slater determinant is in principle sufficient to represent any antisymmetric function \citep{Hutter20}, it might not be possible to parametrize it with a polynomially scaling neural network or train it within a polynomially scaling time.

Apart from these fundamental issues, there are many practical challenges. While the scaling of variational QMC with system size is favourable, the prefactor due to the neural networks is large. Until very recently, this limited applications to systems no larger than the benzene molecule (42 electrons), which is three to four times below our envisaged applicability range, although results for a 108-electron simulation cell of solid LiH have now been reported \citep{Li2022-abinitio}. The prefactor can be reduced by integrating traditional QC techniques such as pseudopotentials \citep{Li2022-pseudo}, developing more efficient neural-network architectures, or using ML techniques such as pre-training and transfer learning. Specific to the discrete-basis second-quantized approaches is the issue of basis-set convergence, where sufficiently large basis sets may increase the prefactor by up to three orders of magnitude compared to minimal basis sets. Another challenge is related to the stochastic optimization, which produces noise in the converged energies that is especially amplified when calculating small energy differences.

We are, however, optimistic that many of these challenges can be addressed and can be addressed quickly, thanks to the relative simplicity of the framework based on variational QMC and of neural networks compared to traditional QC approaches. Indeed, this simplicity has already enabled rapid development of multiple extensions to the first single-point ground-state calculations on molecules, including transferable wavefunctions, excited states, and formulations for periodic systems, all originating from multiple independent research groups. First-quantized approaches such as FermiNet, PauliNet, and their successor architectures already match essentially exact benchmark results to within chemical accuracy for small systems. Yet these networks are just a small subset of possible architectures for representing antisymmetric wavefunctions, and it is unlikely that the optimal ones were found on the first attempt, so we expect that significant innovation lies ahead. We believe that ab-initio methods based on neural-network wavefunctions will become an integral part of the QC toolbox that enables straightforward electronic-structure calculations of complex molecular systems.

\printbibliography

Acknowledgements

We acknowledge funding from the German Ministry for Education and Research (Berlin Institute for the Foundations of Learning and Data, BIFOLD), the Berlin Mathematics Research Center MATH+ (AA1-6, AA2-8), and the European Commission (ERC CoG 772230 ScaleCell).