Ab-initio quantum chemistry with neural-network wavefunctions

Jan Hermann James Spencer Kenny Choo Antonio Mezzacapo IBM Quantum, Thomas J. Watson Research Center, Yorktown Heights, New York 10598, USA W. M. C. Foulkes Imperial College London, Department of Physics, South Kensington Campus, London SW7 2AZ, United Kingdom David Pfau Giuseppe Carleo Frank Noé


Abstract

Machine learning and specifically deep-learning methods have outperformed human capabilities in many pattern recognition and data processing problems, in game playing, and now also play an increasingly important role in scientific discovery. A key application of machine learning in the molecular sciences is to learn potential energy surfaces or force fields from ab-initio solutions of the electronic Schrödinger equation using datasets obtained with density functional theory, coupled cluster, or other quantum chemistry methods. Here we review a recent and complementary approach: using machine learning to aid the direct solution of quantum chemistry problems from first principles. Specifically, we focus on quantum Monte Carlo (QMC) methods that use neural network ansatz functions in order to solve the electronic Schrödinger equation, both in first and second quantization, computing ground and excited states, and generalizing over multiple nuclear configurations. Compared to existing quantum chemistry methods, these new deep QMC methods have the potential to generate highly accurate solutions of the Schrödinger equation at relatively modest computational cost.

$^{†}$ These authors contributed equally. $^{*}$ Emails: frank.noe@fu-berlin.de, giuseppe.carleo@epfl.ch, pfau@google.com

1 Introduction

In the past decade, machine learning (ML) has made inroads into many areas of the physical sciences \citep{CarleoRMP19}, often outperforming more traditional computational methods \citep{jumper2021highly,DeringerN21} or offering entirely new approaches to solve scientific problems \citep{NoeS19,HuangNC20}. Quantum chemistry (QC) has been among the first fields to have been affected by this revolution \citep{TkatchenkoNC20,vonLilienfeldNC20,NoeARPC20}. Most applications of ML in QC have been concerned with supervised learning of molecular properties from molecular structure \citep{DralJPCL20}, either across conformational \citep{UnkeCR21} or chemical space \citep{vonLilienfeldNRC20}, as well as with unsupervised learning for the generation of novel molecules \citep{BianJMM21}. These methods all require a pre-existing dataset of molecules and their properties as an input, typically obtained with standard methods of QC such as density functional theory \citep{JonesDFTReview20215} or coupled cluster theory \citep{Bartlett2007}. In these scenarios, ML accurately approximates a given method of QC at vastly increased computational efficiency. This approach has already been reviewed in the works cited above. In contrast, the current review focuses on the complementary use of ML as an ab-initio technique in QC, which requires no external data and instead recovers molecular properties from first principles. Here, ML is “integrated” into QC, with the goal of arriving at ab-initio methods with a more favourable accuracy–efficiency trade-off than traditional QC methods.

Figure 1: Quantum chemistry and machine learning. (a) Machine learning disciplines and their dependence on data can be mapped to disciplines in quantum chemistry. This work reviews the use of machine learning in ab-initio quantum chemistry, where the only input to machine learning is the Schrödinger equation itself. This approach uses self-generated data, rather than relying on external data. The closest analogue in machine learning is reinforcement learning with self-play, which substitutes data from an external environment with data generated by the agent, though in many other respects the two approaches are distinct. (b) Trade-off between computational efficiency and accuracy in quantum chemistry methods. Accuracy of electronic structure methods against the asymptotic scaling of their computational cost with system size, $N$. Popular methods, such as density functional theory, are outliers from the general trend.

The goal of computational chemistry is to predict properties of known molecules and to design molecules with desired properties. Most molecular properties are determined by the behaviour of the electrons, so QC methods attempt to approximate the Schrödinger equation for electrons in molecules. Traditionally, QC methods are divided into ab-initio and semi-empirical methods, where the former have no fitted parameters determined from external data, whereas the latter do. Methods that do not use quantum mechanics at all (such as force fields) are called empirical and are typically not considered part of QC, although this view may be changing with the advent of principled and accurate ML-based empirical methods.

It is useful to cast these three categories of methods in the light of ML terminology (Fig. 1a). ML can be roughly divided into supervised, unsupervised, and reinforcement learning. In supervised learning the ML model learns to predict the labels (outputs) of the data (inputs) from a given dataset so as to minimize the difference between the predicted and reference labels. By identifying the inputs with molecular structures and the outputs with molecular properties, all semi-empirical and empirical methods of QC fit into supervised learning, but using mostly relatively simple and physically motivated functional forms rather than the more general and highly flexible functions typical for ML. Vice versa, the many recent successful supervised ML models that predict energies or other molecular properties based on QC training data can be classified as empirical methods \citep{DeringerCR21,BehlerCR21,UnkeCR21,MusilCR21}. Unsupervised learning is concerned with unlabelled data, and the general task is to learn the underlying probability distribution that would generate a given dataset. Examples in chemistry include generative models for structural formulas \citep{Gomez-BombarelliACS18} as well as full 3D structures of molecules \citep{NoeS19,Hoogeboom22}, and in physics the estimation of quantum states from measurements, known as quantum tomography \citep{TorlaiNP18}. Finally, in reinforcement learning, the ML model (also referred to as an agent) is able to interact directly with its environment, rather than to just passively receive data. Here, the aim is for the agent to learn a policy for how to interact with the environment so as to maximize a long-term reward \citep{sutton2018reinforcement}. Reinforcement learning is behind some of the most prominent successes of ML such as playing games at a superhuman level \citep{tesauro1994td,mnih2015human,silver2016mastering} or the control of plasma in tokamaks \citep{DegraveN22}. In certain settings the agent can self-generate data by treating its own policy as the environment. This is known as self-play, and has been the basis for many advances in symmetric games \citep{heinrich2015fictitious,SilverS18}. Although there are many key differences, this is the branch of ML conceptually most similar to ab-initio QC, in the sense that no external data other than the rules of the system or game are required for either.

In the traditional picture, one moves from empirical to ab-initio methods by retaining more of the first-principles physics. Similarly, there is a general trend for ML models in chemistry to encode an increasing amount of molecular physics. This includes physical constraints such as energy conservation \citep{ChmielaSA17}, invariance and equivariance of molecular properties with respect to rotation, translation, or exchange of indistinguishable particles \citep{BehlerPRL07,SchuttNC17}, as well as other physical concepts such as many-body expansions \citep{DrautzPRB19} or even surrogate quantum-mechanical models \citep{LiJCTC18a,SchuttNC19,KirkpatrickS21}. Similar considerations can be made for the problem of ab-initio learning of solutions to the electronic Schrödinger equation introduced here and we will discuss different strategies throughout the review.

The Schrödinger equation is an eigenvalue problem that can be equivalently formulated via several variational principles: its solutions, the eigenstate wavefunctions and energies, can be found by searching for stationary points of certain functionals over the space of all physically admissible wavefunctions. Importantly, the ground state of a molecule can be found by minimizing the energy expectation value of a wavefunction. This principle underlies many ab-initio QC methods, and also the methods in this review, as such a variational principle naturally defines a ML problem: the eigenstates (such as the ground state) are represented as a neural network and the parameters of that network are obtained by minimizing the variational electronic energy. The reviewed methods differ in the particular form of the neural-network ansatz used, as described below.

Figure 2: Electronic structure problem and its neural-network solutions. (a) The problem is fully specified by the geometry of a molecule and the electronic Schrödinger equation. (b) Only fully antisymmetric wavefunctions are admissible as solutions due to the Pauli exclusion principle and (c) these are often represented with Slater determinants. (d, e) Solutions formulated in first quantization use antisymmetric neural networks to represent the wavefunction directly in real space. (f) Second quantization transfers the antisymmetry to a fixed finite basis, enabling the use of vanilla neural networks.

Section 2 briefly reviews the components of electronic structure theory necessary for the development of the ML methods to be discussed later on. The electronic structure problem is mapped to ML in Section 3, which is followed by a review of the ab-initio ML methods for QC formulated in real space and in a discrete basis in Sections 4 and 5, respectively. The review is concluded in Section 6.

2 Electronic structure

2.1 Schrödinger equation

QC aims at finding approximate solutions of the electronic Schrödinger equation that strike a good balance between accuracy and efficiency \citep{Piela20} (Fig. 1b). The non-relativistic electronic Schrödinger equation within the Born–Oppenheimer approximation for a given molecule specified by the charges and coordinates of the nuclei, $Z_{I}$, $R_{I}$, is a second-order differential equation for the wavefunction, $ψ (r_{1}, \dots, r_{N})$, which is a function of the coordinates of $N$ electrons (Fig. 2a):

	$\hat{H} ψ (r_{1}, \dots, r_{N}) = E ψ (r_{1}, \dots, r_{N}),$		(1)
	$\hat{H} := \sum_{i} \left( - \frac{1}{2} \nabla_{r_{i}}^{2} - \sum_{I} \frac{Z_{I}}{| r_{i} - R_{I} |} \right) + \sum_{i < j} \frac{1}{| r_{i} - r_{j} |} .$		(2)

An alternative formulation of the Schrödinger equation uses the notion of an expectation value,

	$⟨\hat{H}⟩_{ψ} \equiv \frac{⟨ψ | \hat{H} | ψ⟩}{⟨ψ | ψ⟩}$
	$= \frac{\int d r_{1} \dots d r_{N} \, ψ (r_{1}, \dots, r_{N}) \hat{H} ψ (r_{1}, \dots, r_{N})}{\int d r_{1} \dots d r_{N} \, | ψ (r_{1}, \dots, r_{N}) |^{2}} .$		(3)

Instead of solving Eq. 1, the ground-state (lowest-energy) solution can be found by minimizing this energy expectation value with respect to all possible wavefunctions (variational principle),

	$E = \min_{ψ} ⟨\hat{H}⟩_{ψ} .$		(4)

2.2 Antisymmetric wavefunctions

Electrons are fermions, and as such their wavefunction must be antisymmetric with respect to exchange of any two electrons. This cardinal feature of electronic wavefunctions permeates the whole of QC. In general, electrons also possess spin coordinates, $s_{i} \in \{↑, ↓\}$, but the nonrelativistic Hamiltonian does not operate on spin, so the spin coordinate of each electron can be considered fixed. To simplify the presentation here \parencite[for full treatment, see][Sec. IV.E]{FoulkesRMP01}, we take advantage of the fixed spin coordinates, so the spatial wavefunction must be antisymmetric only with respect to the exchange of same-spin electrons, i.e., when $s_{i} = s_{j}$ (Fig. 2b),

	$ψ (\dots, r_{i}, \dots, r_{j}, \dots) = - ψ (\dots, r_{j}, \dots, r_{i}, \dots) .$		(5)

By far the most common way to form antisymmetric wavefunctions in QC is as antisymmetrized products of single-electron functions (orbitals), $ϕ_{j} (r)$. These products can be written as determinants of an $N \times N$ matrix, $ϕ_{j} (r_{i})$, formed by putting $N$ electrons into $N$ orbitals, and are referred to as Slater determinants (Fig. 2c):

	$D_{ϕ} (r_{1}, \dots, r_{N}) := \det [ϕ_{j} (r_{i})]_{i, j = 1}^{N} .$		(6)

When interpreting $ϕ_{j} (r_{i})$ as the $j$-th component of an $N$-dimensional feature vector for the $i$-th electron (using ML parlance), $ϕ (r_{i})$, a Slater determinant is in fact the only antisymmetric function of $N$ feature vectors that is linear in every one of them, making it a natural choice. Alternative antisymmetric forms exist, such as the Pfaffian \citep{BajdichPRL06} or the Vandermonde determinant and its generalizations \citep{HanJCP19,AcevedoDLS20}, but these are far less common and we will not discuss them here. Slater determinants formed from different orbitals can be further mixed in a linear combination without breaking the antisymmetry (Fig. 2c). In fact, this simple technique is the powerhouse behind all the high-accuracy methods of QC, yet it is also its bane, because the number of Slater determinants required to achieve a given accuracy rises exponentially with the number of atoms in most cases. For fermionic wavefunctions there is no known general approach to effectively reduce the search space from this exponential regime without sacrificing accuracy. However, QC has produced many methods that achieve excellent approximations for specific molecules and materials of practical interest. The cost of these highly accurate methods is generally less than exponential, but nevertheless increases rapidly with system size (Fig. 1b).
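The antisymmetry of a Slater determinant can be checked numerically. The following is a minimal sketch in plain Python, assuming three electrons in one dimension and hypothetical orbitals $x^{j} e^{-x^{2}/2}$ chosen only for illustration; the helper names `det`, `orbital` and `slater_determinant` are not taken from any library.

```python
import math

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    sign, result = 1.0, 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # a row swap flips the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def orbital(j, x):
    """Hypothetical 1D orbital x^j exp(-x^2 / 2), for illustration only."""
    return x**j * math.exp(-0.5 * x * x)

def slater_determinant(positions):
    n = len(positions)
    # Row i is the feature vector of electron i: (phi_0(r_i), ..., phi_{n-1}(r_i)).
    return det([[orbital(j, x) for j in range(n)] for x in positions])

psi = slater_determinant([0.3, -1.1, 0.7])
psi_swapped = slater_determinant([-1.1, 0.3, 0.7])    # electrons 1 and 2 exchanged
psi_coincident = slater_determinant([0.3, 0.3, 0.7])  # two electrons at the same point
```

Exchanging two electrons swaps two rows of the matrix, so `psi_swapped` equals `-psi` up to rounding, and `psi_coincident` vanishes because two rows are identical.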

2.3 Variational wavefunction methods

An important class of QC methods derives directly from the variational principle (Eq. 4), by assuming a certain wavefunction ansatz, $ψ (\cdot; θ)$, parametrized by $θ$. Minimizing the energy of this ansatz with respect to $θ$ then always yields an upper bound for the exact ground-state energy,

	$E = \min_{ψ} ⟨\hat{H}⟩_{ψ} \leq \min_{θ} ⟨\hat{H}⟩_{ψ (\cdot; θ)} .$		(7)
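As a worked illustration of this bound, consider the hydrogen atom in Hartree atomic units, whose exact ground-state energy is $-1/2$. The sketch below assumes two one-parameter ansatzes that are not discussed in the text, $e^{-α r}$ and $e^{-α r^{2}}$, for which the energy expectation value is known in closed form. A simple scan over $α$ stands in for a proper optimizer.

```python
import math

EXACT_ENERGY = -0.5  # hydrogen ground state, Hartree atomic units

def energy_exponential(alpha):
    """<H> for psi(r) = exp(-alpha r): kinetic alpha^2/2, potential -alpha."""
    return 0.5 * alpha**2 - alpha

def energy_gaussian(alpha):
    """<H> for psi(r) = exp(-alpha r^2): kinetic 3 alpha/2, potential -2 sqrt(2 alpha/pi)."""
    return 1.5 * alpha - 2.0 * math.sqrt(2.0 * alpha / math.pi)

# Scan the single variational parameter on a grid (illustrative choice).
alphas = [0.01 * k for k in range(1, 301)]
best_exponential = min(energy_exponential(a) for a in alphas)
best_gaussian = min(energy_gaussian(a) for a in alphas)
```

The exponential ansatz contains the exact ground state (at $α = 1$), so its minimum reaches $-1/2$. The Gaussian ansatz does not, and its minimum, $-4/(3π) \approx -0.424$, stays above the exact energy.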

The bound becomes tighter as the expressiveness of the ansatz is improved. One can distinguish two strategies to construct the ansatzes. First, traditional QC uses relatively simple forms, such that the integral of Eq. 3 can be evaluated analytically, which drastically simplifies the minimization problem \citep{Szabo96,Piela20}. Second, quantum Monte Carlo (QMC) enables the use of arbitrarily complex ansatzes at the cost of having to do the integral evaluation and minimization stochastically \citep{Becca17}. The latter is a natural framework to incorporate neural networks, and we introduce it in more detail in Section 3.1. Here we introduce three ansatzes for electronic wavefunctions of the first (traditional) kind, since they serve as scaffolding for the neural-network ansatzes of Sections 4 and 5. We also briefly discuss how they relate to other popular QC methods.

{mybox}[label=box:first-second-quant] First and second quantization

Computational methods for the electronic Schrödinger equation can be divided into first-quantized approaches in real space and second-quantized approaches in a discrete basis. In first quantization, one works with the individual electrons and their coordinates directly in real space ($r_{i} \in \mathbb{R}^{3}$, $i = 1, \dots, N$) as in Eq. 1,

	$| ψ ⟩ = \int d r_{1} d r_{2} \dots \, ψ (r_{1}, r_{2}, \dots) | r_{1} r_{2} \dots ⟩ .$

Here, $ψ$ must be an antisymmetric function, which specifies which electrons occupy which coordinates, while the many-electron basis states ($| r_{1} r_{2} \dots ⟩$) are ordinary non-symmetric (Cartesian) product states. In second quantization, one has to first introduce a discrete basis (in practice finite), labelled by $k$, which then enables one to work with preformed antisymmetric many-electron basis states (Slater determinants), and rather than specifying which electrons occupy which one-electron states, the occupation numbers ($n_{k} \in \{0, 1\}$, $\sum_{k} n_{k} = N$) specify which one-electron states are occupied without any reference to a particular electron,

	$| ψ ⟩ = \sum_{n_{1} n_{2} \dots} ψ_{n_{1} n_{2} \dots} | n_{1} n_{2} \dots ⟩ .$

Here, $ψ_{n_{1} n_{2} \dots}$ can be an arbitrary tensor without antisymmetry, which is instead encoded in the many-electron basis states $| n_{1} n_{2} \dots ⟩$. This ability to push the antisymmetry from the wavefunction object to the many-electron basis is the main advantage of second quantization, at the cost of having to commit to a particular discrete basis. But regardless of the computational framework, either the wavefunction object itself (in first quantization) or the many-electron basis (in second quantization) consists of Slater determinants, and in high-accuracy methods their number grows rapidly with system size.

First and second quantization. Illustration on $N = 3$ electrons in 1D and a finite basis of size 5. $r = (r_{1}, r_{2}, r_{3})$.

Hartree–Fock

Perhaps the simplest nontrivial ansatz in QC is the single Slater determinant of Eq. 6, where the orbitals $ϕ_{j} (r)$ are considered as free parameters. Optimized variationally, this ansatz leads to the so-called Hartree–Fock (HF) method. In practice the orbitals are linearly expanded in a fixed finite one-electron basis, $φ_{k} (r)$, $k = 1, \dots, K$, with $K \sim N$ in most cases:

	$ϕ_{j} (r) = \sum_{k = 1}^{K} C_{j k} φ_{k} (r) .$		(8)

The use of a finite basis set turns the functional optimization problem of Eq. 8 into a computational problem whose cost scales with the fourth power of the number of basis functions, $O (K^{4})$, assuming a naive implementation. On its own, the HF ansatz is expressive enough to describe much of chemistry qualitatively, but not always and certainly not quantitatively. However, it can be considered a starting point for most wavefunction-based QC methods. Density functional theory (DFT) is not such a method, relying instead on an in-principle exact mapping of the ab-initio Hamiltonian (Eq. 2) to a mean-field-like problem, which can be solved exactly with a single Slater determinant \citep{JonesDFTReview20215,TealePCCP22}. However, the variational principle does not hold in DFT because the exchange-correlation contributions to the energy functional are not known exactly and must be approximated in practice. From here on, we will stay within the variational principle and instead focus on increasing the expressiveness of the HF ansatz.

Configuration interaction

The HF ansatz can be straightforwardly extended by forming multiple Slater determinants from different sets of orbitals and considering their linear combination (Fig. 2c),

	$ψ (r_{1}, \dots, r_{N}) = \sum_{p} c_{p} D_{ϕ_{p}} (r_{1}, \dots, r_{N}) .$		(9)

When the orbitals of each determinant are pooled from a larger superset of (mutually orthogonal) fixed orbitals of size $M > N$, and the only free parameters are the linear coefficients of the determinants, the ansatz is called configuration interaction (CI). One of the appeals of the CI ansatz is that its Slater determinants can be considered a many-electron antisymmetric basis and labelled using the occupation numbers of the one-electron states. This so-called second quantized formalism has many convenient properties for computation (see Box LABEL:box:first-second-quant). The simplest version of CI, called full CI (FCI), considers all $\binom{M}{N}$ possible Slater determinants and is exact within the chosen finite one-electron basis. In the usual case when $M \sim N$, however, the computational effort scales exponentially with $N$, which makes FCI applicable only to the smallest molecules. Ways to tackle the exponential scaling include fixed truncation of the CI expansion or its “compression” through analytical means (coupled cluster theory, [Bartlett2007]; matrix product states, [ChanDMRG2011]), deterministic pruning (selected CI, [HuronJCP73]), or stochastic sampling (FCI-QMC, [BoothJCP09]). Section 5 explores a novel way of “compressing” the CI expansion through neural networks.
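To get a feeling for this exponential growth, the number of FCI determinants can be tabulated directly from the binomial coefficient. The sketch below assumes, purely for illustration, that the number of orbitals is twice the number of electrons ($M = 2N$) and ignores the separation into spin sectors.

```python
import math

def n_determinants(n_orbitals, n_electrons):
    """Number of ways to occupy n_orbitals one-electron states with n_electrons."""
    return math.comb(n_orbitals, n_electrons)

# Assumed illustration: M = 2N orbitals for N electrons.
counts = {n: n_determinants(2 * n, n) for n in (2, 4, 8, 16, 32)}

for n, count in counts.items():
    print(f"N = {n:2d}, M = {2 * n:2d}: {count} determinants")
```

Under this assumption, 8 electrons already give 12870 determinants, and 32 electrons give more than $10^{18}$.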

Beyond fixed bases

The effectiveness of the CI ansatz depends on the choice of the fixed molecular orbitals $ϕ_{j} (r)$ from which the Slater determinants $D_{ϕ_{p}} (r_{1}, \dots, r_{N})$ are built. A natural extension of CI allows both the orbitals and the CI expansion coefficients $c_{p}$ to vary during the variational minimization. Such an ansatz of two stacked linear combinations (Eqs. 9 and 8) is harder to optimize but much more expressive. The most common variant is to consider all $\binom{M'}{N'}$ Slater determinants formed by letting $N' < N$ electrons occupy a space of $M' < M$ orbitals, while the remaining $N - N'$ electrons occupy a fixed set of inactive orbitals. This is called the complete active space self-consistent field (CASSCF) method \citep{OlsenIJQC11}. Due to the larger variational freedom, a CASSCF ansatz typically requires many fewer determinants than a CI ansatz of comparable accuracy. But CASSCF and even FCI are still limited by the fixed one-electron basis used to form the molecular orbitals (Eq. 8): FCI is only exact in the complete basis set limit, which in practice cannot be reached for any but the smallest molecular systems. An extension of the CASSCF ansatz would allow not only the one-electron orbitals but also the one-electron basis functions to vary. The stacked structure of such an ansatz would be reminiscent of deep neural networks, and Section 4 explores the culmination of this line of thought by incorporating actual deep neural networks into the ansatz. This removes any a priori limitations on the expressiveness. By making each individual determinant maximally expressive, such ansatzes further reduce the number of determinants required to reach a given accuracy.

3 Machine learning for electronic Schrödinger equation

{mybox}[label=sec:QMC] Variational Monte Carlo

Optimization of wavefunctions with neural networks naturally leads to the variational Monte Carlo (VMC) framework. First, Monte Carlo integration of Eq. 3 can handle arbitrarily complicated ansatzes for which analytical integrals are not available. Second, VMC samples these integrals stochastically, which naturally combines with the stochastic gradient descent used for optimizing neural networks. In traditional QC, VMC has been used extensively with real-space first-quantized approaches \citep{FoulkesRMP01} and more recently in the discrete-basis second-quantized setting \citep{NeuscammanJAGPHilbert2013,SabzevariJCTC18}. The expectation value of any operator, such as the Hamiltonian (Eq. 3), can be written as a Monte Carlo integral over a continuous or discrete basis, $\{| x ⟩\}$,

	$⟨\hat{H}⟩_{ψ} = \int_{x} \frac{| ⟨ x | ψ ⟩ |^{2}}{⟨ ψ | ψ ⟩} \frac{⟨ x | \hat{H} | ψ ⟩}{⟨ x | ψ ⟩} = \mathbb{E}_{x \sim | ⟨ x | ψ ⟩ |^{2}} [E_{loc} (x)] .$

Here, the expectation value is obtained as an expected value of a “local” energy $E_{loc}$, local in the sense that it is defined for every basis element $x$. A straightforward and generally applicable way to obtain the samples is Markov-chain Monte Carlo (MCMC). MCMC is an iterative procedure, in which a new sample point, $x'$, is produced from a current one, $x$, by making a proposal step with probability $g (x' | x)$, and then accepting or rejecting the proposal with probability

	$p = \min \left( 1, \frac{| ⟨ x' | ψ ⟩ |^{2} g (x | x')}{| ⟨ x | ψ ⟩ |^{2} g (x' | x)} \right) .$

The resulting Markov chain then samples $| ⟨ x | ψ ⟩ |^{2}$. Variants of MCMC differ in the construction of the proposal steps and $g$, and include the simplest Metropolis algorithm ($g (x' | x) = g (x | x')$) as well as more sophisticated flavours such as Langevin Monte Carlo. The VMC formula for the expectation value is exact in the limit of infinite sample size, $N \to \infty$, but in practice it incurs a statistical error proportional to $\sqrt{\mathrm{Var} [E_{loc}] / N}$. While $1 / \sqrt{N}$ converges slowly with sample size, VMC has the great benefit that as the ansatz converges to an exact eigenstate, the local energy converges to a constant (the exact energy), and as such its variance vanishes and so does the statistical sampling error.
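The VMC estimator and the Metropolis acceptance rule can be sketched on a toy problem. The example below assumes a one-dimensional harmonic oscillator, $\hat{H} = -\frac{1}{2} \frac{d^{2}}{dx^{2}} + \frac{1}{2} x^{2}$, with the Gaussian ansatz $ψ (x; α) = e^{-α x^{2}}$, neither of which appears in the text. For this pair the local energy is $E_{loc} (x) = α + x^{2} (\frac{1}{2} - 2 α^{2})$ and the exact ground state corresponds to $α = 1/2$. All function names and parameter values are illustrative.

```python
import math
import random

def log_psi(x, alpha):
    """Logarithm of the assumed ansatz psi(x) = exp(-alpha x^2)."""
    return -alpha * x * x

def local_energy(x, alpha):
    """E_loc = (H psi)(x) / psi(x) for H = -1/2 d^2/dx^2 + x^2/2."""
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)

def vmc_energy(alpha, n_samples=20000, n_burn=1000, step=1.5, seed=0):
    """Metropolis sampling of |psi|^2, returning mean and variance of E_loc."""
    rng = random.Random(seed)
    x = 0.0
    energies = []
    for it in range(n_burn + n_samples):
        x_new = x + rng.uniform(-step, step)  # symmetric proposal, g(x'|x) = g(x|x')
        # Accept with probability min(1, |psi(x')|^2 / |psi(x)|^2).
        log_ratio = 2.0 * (log_psi(x_new, alpha) - log_psi(x, alpha))
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = x_new
        if it >= n_burn:
            energies.append(local_energy(x, alpha))
    mean = sum(energies) / len(energies)
    variance = sum((e - mean) ** 2 for e in energies) / len(energies)
    return mean, variance

e_exact, var_exact = vmc_energy(0.5)  # ansatz equals the exact ground state
e_trial, var_trial = vmc_energy(0.3)  # imperfect ansatz
```

At $α = 1/2$ the local energy is the constant $1/2$ at every sample, so the variance is zero, as described above. At $α = 0.3$ the estimate fluctuates around the analytic value $α/2 + 1/(8α) \approx 0.567$, which lies above the exact energy.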

3.1 Mapping quantum mechanics to machine learning

Figure 3: Variational Monte Carlo with neural networks. Electron positions, $r_{j}$, or orbital occupation numbers, $n_{k}$, describe an electron configuration which is an input to the wavefunction, $ψ$, represented by a neural network parametrized with $θ$. The wavefunction is used in two ways: first, to sample new electron configurations which provide new input to the neural network (yellow), and second, to evaluate the electronic energy, which is minimized by varying the network parameters (blue).

Electronic structure	Machine learning
Wavefunction	Probability distribution
Natural orbital	Marginal distribution
Stochastic reconfiguration	Natural gradient descent
Hartree–Fock	Mean-field variational Bayes
Diffusion Monte Carlo	Particle filtering; sequential Monte Carlo

Table 1: Dictionary of electronic structure and machine learning.

A ML problem and its solution are specified by the model, its inputs and outputs, the data, and the optimization criterion (loss function). In this regard, solving the Schrödinger equation with the variational principle amounts to the following ML problem (Fig. 3). The neural network (Section 3.2) represents a wavefunction, which accepts electron coordinates (first quantization) or occupation numbers (second quantization) as input and outputs the wavefunction value. The loss function is the energy expectation value corresponding to this wavefunction. The inputs are sampled from the probability distribution given by the square of the wavefunction represented by the current neural network, and the Hamiltonian operator is used to obtain an estimate of the loss function from the samples. The parameters of the network, and thus the wavefunction, are then modified to minimize the loss function. Except for the representation of the wavefunction as a network, this is the regular variational Monte Carlo (VMC) framework (Box LABEL:sec:QMC). The optimization methods used (Box LABEL:sec:optimization) are also fairly conventional, although adapted to a neural network context.

This straightforward correspondence between the Schrödinger equation and ML led to the introduction of similar concepts on both sides, albeit known under different names (Table 1). The applicability of deep learning for quantum-mechanical calculations was first realized and exploited by \citet{CarleoS17} for the case of spin lattices in one and two dimensions. Their approach, known as Neural Quantum States (NQS), has since been applied to many different quantum systems \citep{saito2017solving,nomura2017restricted,corey2021variational,nikita2021broken}. In essence, this review is concerned with the extension of this approach to electrons in molecules.

{mybox}[label=sec:optimization] Optimizing neural-network ansatzes

Up to the statistical error, the VMC expectation value for the energy (Box LABEL:sec:QMC) obeys the variational principle (Eq. 4). VMC exploits this by varying a parametric wavefunction ansatz $ψ_{θ}$ so as to minimize the energy. For a sufficiently expressive ansatz, the variational energy will eventually approximate the ground state energy of Eq. 1 and the ansatz will approximate the ground state wavefunction $Ψ$. The most straightforward optimization method is gradient descent, where the parameters are iteratively updated as

	$θ \leftarrow θ - η \frac{\partial E (θ)}{\partial θ}$

with learning rate $η > 0$. The energy gradient is given by

	$\frac{\partial E}{\partial θ_{k}} = ⟨ \hat{O}_{k}^{*} \hat{H} ⟩ - ⟨ \hat{H} ⟩ ⟨ \hat{O}_{k}^{*} ⟩,$

where

	$\hat{O}_{k} (x) = \frac{\partial \ln ψ (x; θ)}{\partial θ_{k}}$

is an operator representing the logarithmic derivatives of the wavefunction. This gradient can be efficiently estimated using Monte Carlo integration (Box LABEL:sec:QMC). In some cases the optimization can be sped up and made more stable with higher-order methods, such as the stochastic reconfiguration (SR) scheme \citep{SorellaPRL98}. SR takes the correlation between individual variational parameters into account by introducing the quantum geometric tensor $S$:

	$S_{k k'} = ⟨ O_{k}^{*} O_{k'} ⟩ - ⟨ O_{k}^{*} ⟩ ⟨ O_{k'} ⟩ .$

The update rule is then modified to

	$θ \leftarrow θ - η S^{- 1} \frac{\partial E (θ)}{\partial θ} .$

The SR scheme approximates an imaginary-time evolution where each iteration tries to best approximate the state $e^{- η \hat{H}} | ψ ⟩$. SR is similar to the natural gradient descent algorithm \citep{amari_natural_1998} that is well-known in the ML community, and $S$ can be interpreted as a quantum generalization of the Fisher information matrix \citep{Ay17}. In some cases, it is convenient to approximate the quantum geometric tensor $S$ using the Kronecker-factored approximate curvature (KFAC) approach \citep{martens2015optimizing}.
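The gradient estimator and the SR update can be sketched for a single variational parameter, where $S$ reduces to a scalar. The example assumes the same toy problem as a one-dimensional harmonic oscillator with ansatz $ψ (x; α) = e^{-α x^{2}}$, so that $O (x) = \partial \ln ψ / \partial α = -x^{2}$ and $E_{loc} (x) = α + x^{2} (\frac{1}{2} - 2 α^{2})$. For brevity it draws samples from $| ψ |^{2}$ directly (a Gaussian with variance $1 / (4 α)$) instead of running MCMC. The learning rate, sample size and starting value are arbitrary choices.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def sr_step(alpha, eta, n_samples, rng):
    """One stochastic-reconfiguration update for psi(x) = exp(-alpha x^2)."""
    # Assumption: |psi|^2 is Gaussian with variance 1/(4 alpha), so it is
    # sampled directly here instead of with MCMC.
    sigma = math.sqrt(1.0 / (4.0 * alpha))
    xs = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    o = [-x * x for x in xs]  # O(x) = d ln psi / d alpha
    e_loc = [alpha + x * x * (0.5 - 2.0 * alpha**2) for x in xs]
    o_mean, e_mean = mean(o), mean(e_loc)
    # Gradient estimate <O E_loc> - <O><E_loc>; for a real parameter the full
    # derivative is twice this, a factor absorbed into the learning rate.
    grad = mean([oi * ei for oi, ei in zip(o, e_loc)]) - o_mean * e_mean
    # Quantum geometric tensor, a 1x1 "matrix" for a single parameter.
    s = mean([oi * oi for oi in o]) - o_mean**2
    return alpha - eta * grad / s, e_mean

rng = random.Random(0)
alpha, energy = 1.0, None
for _ in range(60):
    alpha, energy = sr_step(alpha, eta=0.2, n_samples=500, rng=rng)
```

In this toy setting the parameter converges to $α = 1/2$ and the energy to $1/2$, the exact ground state. With many parameters, `grad / s` becomes the solution of a linear system with the matrix $S$.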

3.2 Deep learning

The standard practice in ab-initio QC today is in some ways analogous to the state of computer vision before the rise of deep learning. Prior to 2012, the best pipelines for large-scale image recognition consisted of a combination of hand-designed features and simple ML models \citep{perronnin2010large}. A single deep convolutional neural network trained end-to-end was able to cut the recognition error in half relative to these systems \citep{krizhevsky2012imagenet}, and since then deep neural networks have dominated computer vision research. In ab-initio QC, ground-state solutions to the Schrödinger equation are usually represented by a wavefunction ansatz with a relatively simple functional form, and parameters are usually fit through a mix of procedures (fixed-point iteration, variational optimization) rather than a unified end-to-end estimation of all parameters simultaneously. The development of deep QMC methods is driven by the hope that the use of neural networks will significantly increase the expressiveness of wavefunction ansatzes, enabling large leaps in accuracy as in image recognition.

To appreciate how and why deep neural networks can be usefully applied in QC, a brief review of their application in artificial intelligence is necessary. For a thorough review of the history of deep learning, see \citet{schmidhuber2015deep}, and for a review of the fundamental concepts in deep learning, see \citet{lecun2015deep}. Neural networks date back to the very beginning of computer science \citep{mcculloch1943logical}, and their modern form originates with the single perceptron “unit” \citep{rosenblatt1958perceptron}, which produces as output a non-linear function of the sum of a constant, known as the bias, and a linear combination of its inputs. The non-linear function rises from zero to one as its input increases, mimicking the activation function of a biological neuron. When many such units are assembled in parallel to form a “layer,” and several layers are computed serially, taking the output from one layer as the input to the next, the resulting multi-layer perceptron (MLP) can, in theory, represent any smooth function to arbitrary accuracy given enough units \citep{hornik1989multilayer}. However, actually fitting or learning a set of parameters that matches any given function is a different matter. A form of gradient descent utilizing derivatives computed using backpropagation, or reverse-mode automatic differentiation \citep{werbos1974beyond,linnainmaa1970representation,linnainmaa1976taylor}, was found to be effective for training neural networks \citep{rumelhart1986learning}. This led to a wave of enthusiasm for neural networks, which eventually faded as several issues were discovered, such as the infamous “vanishing gradients” and getting stuck in local minima.

Several factors were instrumental in rehabilitating neural networks under the banner of “deep learning”: a combination of algorithmic advances \citep{glorot2010understanding} and the use of modern GPU hardware \citep{hooker2020hardware} made the computations much faster, and the resulting ability to train larger networks made issues with local minima less severe \citep{dauphin2014identifying,choromanska2015loss}. Furthermore, deep neural networks with the help of stochastic gradient descent can be applied straightforwardly and efficiently to large datasets, unlike other ML models \citep{bottou2008learning,bottou2011tradeoffs}. Finally, empirical successes like winning the ImageNet Large Scale Visual Recognition Challenge \citep{russakovsky2015imagenet} helped legitimize deep learning research and generate excitement among researchers.

Today, the barrier to entry for developing and training deep neural networks is quite low, thanks to a mature ecosystem of software libraries for numerical computing with automatic differentiation and hardware accelerators \citep{AbadiOSDI16,paszke2017automatic,bradbury2018jax}. However, actually achieving good performance from a deep learning model still requires some finesse and application of various heuristics. It is safe to say that a significant amount of the practice of deep learning remains more art than science. The good news is that once effective heuristics for a particular problem domain have been developed, these same heuristics can often be applied with little modification to other problems in the same domain.

3.3 Neural network architectures

The starting point for most neural networks is the multi-layer perceptron (MLP), formed as a composition of $L$ layers,

	$\mathrm{MLP} (x) = f^{L} \circ f^{L - 1} \circ \dots \circ f^{1} (x),$
	$f^{ℓ} (z) = f (W^{ℓ} z + b^{ℓ}),$		(10)

where $f$ is some non-linear activation function, and $W^{ℓ}$ and $b^{ℓ}$ are the matrices of weights and vectors of biases to learn. While a vanilla MLP is capable of representing arbitrary functions, the real power of neural networks comes from more sophisticated architectures. Many of these architectures are designed to encode some particular invariance or equivariance: when the input to the network is transformed in a particular way, the output should either be unchanged or should transform in a corresponding way. For instance, the weights in a layer of a convolutional neural network (ConvNet) \citep{lecun1998gradient} are restricted to be a discrete convolution operator, which constrains each layer to be translation-equivariant, a natural constraint for image recognition, and also dramatically reduces the number of possible weights in a layer. Equivariance to permutation is another frequently useful property, and one that is especially important in real-space approaches to representing electronic wavefunctions (see Section 4). A simple permutation-equivariant layer first proposed by \citet{shawe1989building} can be constructed by applying the same transformation to each input and summing the results. More sophisticated permutation-equivariant layers are used by models like the Transformer \citep{vaswani2017attention} or SchNet \citep{schutt2018schnet}. Many of these equivariant layers can be unified in a conceptual framework based around the language of geometry and group theory, wherein the choice of transformation to be equivariant to leads naturally to recipes for constructing the appropriate neural network layers \citep{bronstein2021geometric}.

Another class of neural network architectures, which have been influential as wavefunction ansatzes, are restricted Boltzmann machines (RBMs) \citep{hinton2006reducing}. These were originally developed for unsupervised learning, but in the VMC setting considered here they lead to a simple deterministic expression for the log probability that closely resembles a one-layer MLP. Despite their early popularity, RBMs have been largely eclipsed in the AI community by other methods for unsupervised learning, such as variational autoencoders \citep{kingma2013auto}, generative adversarial networks \citep{goodfellow2014generative}, normalizing flows \citep{rezende2015variational}, autoregressive models \citep{oord2016wavenet,oord2016conditional}, and diffusion models \citep{sohl2015deep}. In fact, some of these newer models have started to have an impact as neural network wavefunction ansatzes for spin systems. Examples are deep autoregressive quantum states \citep{sharir2020deep}, convolutional neural networks \citep{choo2019two}, recurrent neural networks \citep{hibat-allah_recurrent_2020}, and normalizing flows \citep{xie_ab-initio_2021}.
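The MLP of Eq. 10 and the simple shared-transformation construction mentioned above can be sketched in a few lines of plain Python. The layer sizes, the tanh activation, the random weights and the function names below are illustrative assumptions, not taken from any of the cited architectures.

```python
import math
import random

def dense(weights, bias, z):
    """One layer f(W z + b) with a tanh activation."""
    return [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
            for row, b in zip(weights, bias)]

def mlp(layers, x):
    """Composition f^L o ... o f^1 applied to the input vector x."""
    for weights, bias in layers:
        x = dense(weights, bias, x)
    return x

def random_layers(sizes, rng):
    """Random weights and biases; the sizes are hypothetical, for illustration."""
    return [([[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
             [rng.gauss(0.0, 1.0) for _ in range(n_out)])
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def shared_and_pooled(layers, particles):
    """Apply the same MLP to every particle, then sum the results."""
    per_particle = [mlp(layers, p) for p in particles]
    pooled = [sum(column) for column in zip(*per_particle)]
    return per_particle, pooled

rng = random.Random(0)
layers = random_layers([3, 8, 4], rng)
particles = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]
order = (2, 0, 4, 1, 3)
permuted = [particles[i] for i in order]

features, pooled = shared_and_pooled(layers, particles)
features_perm, pooled_perm = shared_and_pooled(layers, permuted)
```

Permuting the particles permutes the per-particle features in the same way (equivariance), while the summed features are unchanged up to floating-point rounding (invariance).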

4 Electrons in first quantization

Figure 4: Neural-network architectures for selected real-space wavefunctions. (a) Original PauliNet architecture from \citep{HermannNC20}. (b) Original FermiNet architecture from \citep{PfauPRR20}. Both architectures have been modified and extended by various contributions mentioned in this review. (c) Approach to computing excited states in \citep{Entwistle22}.

One approach to studying the electronic problem with deep learning is to work with parameterized many-body wavefunctions in first quantization, $ψ (r; θ)$. Here $r$ stands for the $N$-tuple of electron coordinates, $r_{1}, r_{2}, \dots, r_{N}$, and sampling is realized over electronic positions $r$ (Box LABEL:sec:QMC). The antisymmetry constraint (Eq. 5) must be imposed in $ψ$ to avoid collapsing onto a lower-energy bosonic state. A commonly adopted form is $ψ (r; θ) = S (r; θ) \times A (r; θ)$, where the first factor is symmetric (or “bosonic”) under exchange of electron coordinates and the second factor carries the necessary antisymmetry. The simplest and most common approach is to build the antisymmetric part of the wavefunction using Slater determinants (Eq. 6). As discussed in Section 2, single Slater determinants with fixed orbitals have limited expressiveness and many such determinants need to be combined to achieve high accuracy. A natural generalization of a sum of fixed-orbital Slater determinants is the commonly used Slater–Jastrow wavefunction

ψ (r; θ) = e^{J (\{r\}; θ)} \sum_{k} c_{k} \begin{vmatrix} ϕ_{1}^{k} (r_{1}; θ) & \dots & ϕ_{1}^{k} (r_{N}; θ) \\ \vdots & \ddots & \vdots \\ ϕ_{N}^{k} (r_{1}; θ) & \dots & ϕ_{N}^{k} (r_{N}; θ) \end{vmatrix},

(11)

where the Jastrow factor, $J (\{r\}; θ)$, constitutes the symmetric (“bosonic”) part of the state and typically contains one- and two-body (and in many cases higher-order) parameterized correlations. The set notation, $\{r\} \equiv \{r_{1}, \dots, r_{N}\}$, indicates that $J$ does not depend on the order of the electron coordinates. The determinants in Eq. 11 are typically replaced with the product of spin-up and spin-down determinants \citep{FoulkesRMP01}. Separating the up- and down-spin determinants improves computational efficiency, simplifies the implementation, and makes it easier to handle the electron-electron cusps, while leaving expectation values of spin-independent operators unchanged. More flexible parametric forms can be obtained by leveraging the approximation power of artificial neural networks. In the following, we discuss neural-network-based strategies to parameterize these forms.

4.1 Discrete space

The first applications of neural networks to electronic systems were for electrons moving in discretized space, as realized, for example, in the 2D Hubbard model of strongly interacting electrons. In the following, for simplicity, we discuss the case of $N$ spinless electrons on $M$ lattice sites, and denote by $l (r) \in [1, M]$ the discrete lattice index corresponding to electron position $r$. The extension to the spinful case will be considered in more detail when discussing continuous space later on. The symmetric part $S (r; θ)$ can be readily parameterized with a strategy closely related to NQS for spins:

S (r; θ) = g (n (r); θ),

(12)

where $n (r)$ is the unique occupation number representation corresponding to the electronic positions $r$ and $g$ is a generic function, which could be represented by a neural network. Since the occupation numbers $n (r)$ are invariant under permutation of the electron positions, $g (n (r))$ is also symmetric under exchange. Any of the NN architectures also adopted for spin systems \citep{CarleoS17} or lattice bosons \citep{saito2017solving} can be used to represent the symmetric part $S$. Early works on the Hubbard model adopted positive-definite RBM-based parameterizations of $S (r; θ)$ \citep{nomura2017restricted}, while more recent works have adopted deep-network parameterizations allowing for sign changes \citep{stokes_quantum_2020}. The simplest parameterization for the antisymmetric part, $A (r; θ)$, is again a Slater determinant

A (r; θ) = \begin{vmatrix} ϕ_{1} (r_{1}; θ) & \dots & ϕ_{1} (r_{N}; θ) \\ \vdots & \ddots & \vdots \\ ϕ_{N} (r_{1}; θ) & \dots & ϕ_{N} (r_{N}; θ) \end{vmatrix},

(13)

where the matrix $Φ \in \mathbb{C}^{N \times M}$ of discrete orbitals $ϕ_{i} (r_{j}) = Φ_{i, l (r_{j})}$ holds the variational parameters to be optimized. This approach, however, has the important drawback of not providing enough variational flexibility, since it effectively fixes the antisymmetric part to a mean-field reference solution.
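As a small illustration of Eq. 13, the following sketch evaluates the determinant of discrete orbitals for a given list of occupied sites and shows its sign change under exchange of two electrons. The orbital matrix `phi` and the helper names are hypothetical, the sites are indexed from 0 rather than 1, and the determinant is computed by cofactor expansion, which is only sensible for very small $N$.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small N only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def slater_amplitude(phi, sites):
    """A(r) = det[ phi[i][l(r_j)] ] for electrons sitting on lattice `sites`."""
    return det([[row[s] for s in sites] for row in phi])


# Hypothetical 2 x 4 orbital matrix: N = 2 electrons, M = 4 sites (0-based).
phi = [[1, 2, 0, -1],
       [0, 1, 3, 2]]

a_direct = slater_amplitude(phi, [0, 1])     # electron 1 on site 0, electron 2 on site 1
a_exchanged = slater_amplitude(phi, [1, 0])  # the two electrons exchanged
a_double = slater_amplitude(phi, [2, 2])     # both electrons on the same site
```

Exchanging the two electrons swaps two columns of the matrix and flips the sign of the amplitude, while double occupancy of a site gives two equal columns and a vanishing amplitude.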

Neural backflow

A significant improvement is obtained by considering a many-body backflow transformation of the orbitals \citep{feynman_energy_1956,kwon_effects_1993}. In this variational form, the matrix of one-electron orbitals $Φ$ is promoted to a parameterized many-electron function depending on all the occupation numbers:

(14)

where $Δ$ is a correction to the single-particle orbitals $Φ$. In physics-inspired parameterizations, $Δ$ is typically taken to be a simple function of the electronic occupation numbers \citep{tocchio_role_2008}. The neural backflow method \citep{LuoPRL19} instead introduced a flexible parameterization of the backflow orbitals based on artificial neural networks. In this case, $Δ$ is parameterized with an MLP taking as inputs the electronic occupation numbers and outputting a many-body correction to the matrix $Φ$. This approach allows the orbitals to change dynamically depending on the positions of the electrons, thus allowing one to include genuinely many-body correlations in the antisymmetric part of the wavefunction.

Constrained hidden fermions

Neural backflow transformations are not the only way to introduce flexible parameterizations of the antisymmetric part of the wavefunction. The constrained hidden fermion formalism builds on the idea of introducing a set of $\tilde{N}$ auxiliary fermionic particles, with positions $q$, living on $\tilde{M}$ lattice sites. These auxiliary particles are used to effectively mediate correlations among the physical degrees of freedom \citep{robledo_moreno_fermionic_2022}. Calling $\tilde{A} (r, q; θ)$ a Slater determinant for the extended (physical + hidden) system, the resulting antisymmetric form for the physical system is given by

A (r; θ) = \tilde{A} (r, F (r; θ)).

(15)

In this expression, $F$ is a function, parameterized by a neural network, mapping the physical positions to the hidden ones. This approach has been shown to improve systematically over the neural backflow form for the 2D Hubbard model \citep{robledo_moreno_fermionic_2022}.

4.2 Continuous space

We now focus on describing the important case of first-quantized electrons in continuous space, directly corresponding to the electronic Schrödinger equation. As in the discrete-space case, the Slater–Jastrow form may be improved in a manner suitable for use with neural quantum states by adding a backflow transformation, in which the one-electron orbitals $ϕ_{i} (r_{j}; θ)$ are replaced by many-electron functions. The backflow transformation can either modify the orbitals directly via a multiplicative and/or additive term:

(16)

or act as a quasiparticle transformation of the electron coordinates:

(17)

where the parameterized functions, $f_{i}^{\otimes}, f_{i}^{\oplus}, ξ$, are invariant to permutations of $\{r\}$, and $ξ (\{r\}; θ)$ is a three-component vector that modifies $r_{j}$. If we consider a determinant of orbitals of this form,

\begin{vmatrix} ϕ_{1} (r_{1}; \{r\}) & \dots & ϕ_{1} (r_{N}; \{r\}) \\ \vdots & \ddots & \vdots \\ ϕ_{N} (r_{1}; \{r\}) & \dots & ϕ_{N} (r_{N}; \{r\}) \end{vmatrix},

(18)

then we see that orbitals with backflow transformations are just one example of a broader class of functions: in order for the determinant to be antisymmetric, the matrix with elements $Φ_{i j} = ϕ_{i} (r_{j}; \{r\})$ must be permutation-equivariant; that is, exchanging electrons $k$ and $l$ also exchanges columns $k$ and $l$. While traditional Slater–Jastrow–backflow wavefunctions have had considerable success, they also have limitations due to the choice of fixed functional forms. The goal, therefore, is to come up with more flexible permutation-equivariant functions. Here we highlight several approaches that share this common theme.

Figure 5: Automerization of cyclobutadiene with neural-network ansatzes. Both PauliNet and FermiNet predict relative energies within the range of experimental values and agree with multireference coupled cluster. PauliNet converges more quickly, while FermiNet reaches a lower total energy. Figure modified from \citet{Spencer20}.

Iterative backflow

\citet{TaddeiPRB15} introduced a form of backflow that applied Eq. 17 repeatedly in an iterative fashion. Such an ansatz is formally equivalent to expressing the backflow as a deep neural network \citep{Ruggeri2018-ql}, albeit with an artificial restriction on the dimensionality of the hidden layers. The iterative backflow was used for studying the $^{3}$He and $^{4}$He liquids.

DeepWF

The DeepWF \citep{HanJCP19} approach uses an ansatz similar to a Slater–Jastrow wavefunction but with a simpler antisymmetric term:

ψ (r) = S (\{r\}, R) A^{↑} (r^{↑}) A^{↓} (r^{↓}).

(19)

The learned symmetric function $S$ is similar to a Jastrow factor and ensures that the wavefunction captures the electron-nuclear and electron-electron cusp conditions. The antisymmetric factors $A^{σ}$ are constructed from the Vandermonde-like determinant of an explicitly antisymmetric two-body function, $A^{σ} = \prod_{1 \leq i < j \leq N} (a (r_{i}, r_{j}, r_{i j}) - a (r_{j}, r_{i}, r_{i j}))$. The two-body antisymmetric function is entirely learned. Such a functional form can be evaluated in $O (N^{2})$ operations, compared to $O (N^{3})$ for a determinant. However, the use of a simplified antisymmetric function is also likely to limit the accuracy achieved: DeepWF obtains only 43.6% of the correlation energy for the beryllium atom and does not even reach HF accuracy for the boron atom. The PauliNet and FermiNet approaches described below do much better. Vanilla PauliNet obtained 99.94% and 97.3% of the correlation energies for the beryllium and boron atoms, and FermiNet 99.97% and 99.83%, respectively. Furthermore, FermiNet and PauliNet both substantially surpass conventional Slater–Jastrow–backflow (SJB) wavefunctions on first-row atoms, for which nearly exact benchmark values exist.
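The pairwise antisymmetric product used by DeepWF can be sketched in a few lines. The two-body function `pair_function` below is a hypothetical closed-form stand-in for the learned network $a (r_{i}, r_{j}, r_{i j})$, chosen only so that the example runs; the point is the $O (N^{2})$ double loop over pairs $i < j$ and the resulting sign change under exchange.

```python
import math


def pair_function(ri, rj, rij):
    """Hypothetical stand-in for the learned two-body function a(r_i, r_j, r_ij)."""
    return ri[0] * math.exp(-rij) + ri[1] * rj[2]


def antisymmetric_factor(positions):
    """Product over pairs i < j of a(r_i, r_j, r_ij) - a(r_j, r_i, r_ij)."""
    prod = 1.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            ri, rj = positions[i], positions[j]
            rij = math.dist(ri, rj)  # inter-electron distance
            prod *= pair_function(ri, rj, rij) - pair_function(rj, ri, rij)
    return prod


# Three same-spin electrons at illustrative positions.
r = [(0.1, 0.2, 0.3), (-0.5, 0.4, 1.0), (0.9, -0.7, 0.2)]
value = antisymmetric_factor(r)
exchanged = antisymmetric_factor([r[1], r[0], r[2]])  # electrons 0 and 1 swapped
```

Every pair contributes one factor that is odd under exchange of its two electrons, so swapping any two electrons flips the overall sign, just as for a Vandermonde product.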

PauliNet

PauliNet \citep{HermannNC20} builds upon HF or CASSCF orbitals as a physically meaningful baseline and takes a neural network approach to the SJB wavefunction in order to correct this baseline towards a high-accuracy solution (Fig. 4a). Cusp conditions are explicitly met via the inclusion of cusp correction terms in the wavefunction \citep{Ma2005-cusps}. A graph-convolutional block based on SchNet \citep{schutt2018schnet} is used to create a permutation-equivariant latent space representation depending on the many-electron configuration. This embedding is then passed into separate deep neural networks that learn the Jastrow factor and a (cuspless) backflow transformation. \citet{HermannNC20} introduced PauliNet with a purely multiplicative backflow as shown in Fig. 4a; \citet{SchatzleJCP21} generalized this to a multiplicative and additive backflow as shown in Eq. 16. PauliNet is optimized with a fixed number of Slater determinants. Most of the results reported in \citet{HermannNC20,SchatzleJCP21} were obtained with around 10 determinants.

FermiNet

FermiNet \citep{PfauPRR20} takes a more minimalist (or machine-learning maximalist) approach and attempts to train a neural network to represent the entire wavefunction (Fig. 4b). FermiNet uses two parallel networks, describing one- and two-electron features respectively. The inputs to each layer in the one-electron stream are permutation-equivariant functions of the activations from the previous layers of the one- and two-electron streams. The final layer projects the latent space into the required number of orbitals, from which determinants can be formed and evaluated. As with PauliNet, the final wavefunction is a sum over a number of determinants. For most of the results reported in \citet{PfauPRR20}, 16 determinants were used. FermiNet builds up a rich description of electron-electron interactions from the permutation-equivariant mixing of information describing one- and two-electron features. In particular, the electron-nuclear and electron-electron cusps in the wavefunction are represented accurately, despite not being encoded explicitly. Whereas PauliNet is usually trained with the ADAM optimizer, FermiNet training was found to be substantially improved when employing the KFAC optimizer. While both PauliNet and FermiNet exceed the accuracy of conventional SJB wavefunctions on small systems, there are important tradeoffs between the two models. Results from both on the automerization of cyclobutadiene can be seen in Fig. 5. FermiNet is typically trained with a larger number of parameters than PauliNet, requiring more iterations and more computation per iteration to converge, but it typically converges to a lower absolute energy. Recently, \citet{gerard2022gold} proposed a hybrid ansatz which uses neural network layers similar to those of SchNet and PauliNet in a FermiNet-like architecture. This hybrid ansatz was found to reach even lower absolute energies than FermiNet on systems like benzene and the potassium atom.

Potential energy surfaces

Typically one optimises a wavefunction at a specific geometry, but this quickly becomes prohibitively expensive for exploring the high-dimensional potential energy surface of even relatively small molecules. \citet{ScherbelaNCS22} developed a training methodology that allows weight sharing between (simplified) PauliNet architectures targeting different geometries. By switching the geometry being trained at each epoch, they showed that the computational cost for training across a set of geometries can be reduced by an order of magnitude without affecting the accuracy of the final energies, with 95% of network parameters shared across all geometries. This implies that the network is learning features of electron correlation in general rather than fitting to a specific geometry. They also demonstrated that a wavefunction for a larger molecule could be initialised from a wavefunction for a smaller molecule and could then be fine-tuned in a relatively short optimization stage. Pretraining neural network wavefunctions from smaller systems has also been shown to dramatically accelerate convergence for Kagome lattice models \citep{Yang2020-bk}. In a similar vein, \citet{Gao2021-cg,gao2022sampling} demonstrated that a meta-learning approach, where a graph neural network is used to parameterize a wavefunction model, can accurately represent the wavefunctions for multiple geometries, enabling a fully quantum-mechanical potential energy surface to be represented in a single model. Their approach used a FermiNet-like wavefunction model, but the meta-learning concept is directly applicable to other wavefunction representations, assuming the wavefunction form is sufficiently flexible.

Periodic systems

There has also been progress on using first-quantized neural network architectures in periodic systems, such as interacting quantum gases in low dimension \citep{pescia_neural-network_2022}, the electron gas \citep{wilson2022-ueg,cassella2022-ueg,Li2022-abinitio}, and small cells of solids such as lithium hydride and graphene \citep{Li2022-abinitio}. Again, sufficiently expressive networks at the VMC level have been found capable of rivalling or surpassing the accuracy of fixed-node diffusion Monte Carlo calculations using conventional Slater–Jastrow–backflow trial wavefunctions.

4.3 Extensions

Pseudopotentials

The electronic structure of heavy atoms, especially transition metals, is complicated and challenging for all QC methods. The difficulty is compounded by the high computational cost of variational Monte Carlo methods, which scale roughly as $O (Z^{5})$ \citep{Hammond1987}, where $Z$ is the nuclear charge. Whilst the core electrons contribute heavily to the total energy, energy differences are largely determined by the behaviour of the valence electrons. The core electrons can therefore be removed and the effective nuclear charge reduced by the use of pseudopotentials. The use of pseudopotentials is common in many methods, including density functional theory and conventional variational Monte Carlo. \citet{Li2022-pseudo} demonstrate that effective core potentials can be readily combined with FermiNet and achieve accuracy comparable to CCSDT(Q) extrapolated to the complete basis set limit for first-row transition metal atoms. The computational time per iteration was reduced by 43% (17%) for the scandium (zinc) atom using an argon core. Again, this approach is not restricted to FermiNet. Pseudopotentials can be used with any first-quantized neural network wavefunction.

Diffusion Monte Carlo (DMC)

Projector methods such as DMC \citep{needs2020variational} and auxiliary-field Monte Carlo \citep{ShiJCP21} go beyond VMC by using stochastic algorithms to sample the ground state without requiring its wavefunction to be represented as a known function or network. DMC is in principle exact but, for many-fermion systems, relies in practice on the fixed-node approximation, in which collapse to the bosonic ground state is avoided by imposing the sign structure of the trial wavefunction on the DMC wavefunction. A DMC simulation therefore samples (stochastically) the lowest-energy state with the same sign structure as the trial wavefunction. The improvements that result from applying DMC to conventional Slater–Jastrow–backflow trial functions optimized using VMC methods are substantial, explaining why DMC is so often used to provide improved estimates of the ground-state wavefunction and energy. \citet{Wilson21} combined DMC with a FermiNet trial wavefunction. For first-row atoms, DMC captured much of the remaining correlation energy (94% of the difference between the VMC energy and the exact energy in the case of the nitrogen atom). However, \citet{Wilson21} used a simplified FermiNet that gave VMC energies higher than those reported by \citet{PfauPRR20}, which were already within 1 mH of exact results for all first-row atoms. Given evidence that the mean-field equivalent of PauliNet can essentially match HF in the complete basis set limit \citep{SchatzleJCP21}, it is possible that the remaining error in PauliNet and FermiNet wavefunctions is dominated by errors in the nodal surface, which lies in regions that are rarely sampled during optimisation. If this is the case, diffusion Monte Carlo with the fixed-node approximation may not produce substantially lower energies. On the other hand, since neural network wavefunctions routinely capture over 90% of the correlation energy at the VMC level, the need to perform expensive diffusion Monte Carlo calculations is greatly reduced. More recently, \citet{Ren22} showed that DMC can capture roughly half of the remaining correlation energy for the atoms Li–Ar when using a very small FermiNet-based architecture. Whilst it is possible to achieve energies within chemical accuracy using FermiNet at the VMC level, these calculations model the case for larger systems where converging the energy with respect to network size might not be feasible. \citet{Ren22} went on to demonstrate that DMC using FermiNet trial wavefunctions noticeably reduces the energy for larger systems. In the case of the benzene dimer, the reduction was 50 mH.

Excited states

Our discussion so far, and most VMC calculations, have focused on ground-state properties. However, excited states are of critical importance for understanding the behaviour of materials. Fortunately, recent algorithmic developments by multiple groups have demonstrated that the calculation of excited states using VMC methods is feasible and can achieve an acceptable trade-off between accuracy and cost. Here we highlight three such approaches utilizing conventional VMC wavefunctions. One approach is the state-averaged VMC method \citep{Schautz2004,Dash2019}, in which the average energy over multiple states is minimised and individual states are projected out via diagonalization within the basis of excited states. Similar techniques are used with other quantum chemistry methods. \citet{Zhao2016} instead minimized a different objective function, such that the state with energy closest to a desired energy target is obtained. \citet{Pathak2021} suggested a simple alternative, where a state is forced to be (approximately) orthogonal to all lower-energy states via a penalty term. These techniques can be readily applied to VMC using neural-network wavefunctions and, in particular, penalty function approaches have recently been explored. As with ground-state calculations, the flexibility of the wavefunction ansatz to represent the desired state is critical. \citet{Entwistle22} demonstrated that the PauliNet architecture combined with a penalty function can represent the lowest few excited states of molecules up to the size of benzene (Fig. 4c). Relatedly, \citet{ChooPRL2020} demonstrated that NQS on lattice models can obtain the lowest-energy state of any given Abelian symmetry by performing what is essentially a ground-state simulation in that symmetry sector, and multiple states of the same symmetry using a penalty function. However, the most accurate and efficient way to obtain excited states within VMC, irrespective of wavefunction ansatz, remains an open question \citep{Cuzzocrea2020}.
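To make the penalty idea concrete, one generic way to write such an objective for the $n$-th state is sketched below. This is an illustrative form under stated assumptions, not necessarily the exact functional used in the works cited above: the penalty weights $\alpha_{m}$ are hypothetical hyperparameters, and the lower states $\psi_{m}$ are assumed to have been optimized beforehand and kept fixed.

```latex
\mathcal{L}[\psi_{n}] =
  \frac{\langle \psi_{n} | \hat{H} | \psi_{n} \rangle}{\langle \psi_{n} | \psi_{n} \rangle}
  + \sum_{m < n} \alpha_{m}
  \frac{| \langle \psi_{m} | \psi_{n} \rangle |^{2}}
       {\langle \psi_{m} | \psi_{m} \rangle \langle \psi_{n} | \psi_{n} \rangle}
```

The first term is the usual variational energy, while the second term vanishes only when $\psi_{n}$ is orthogonal to every lower state, so minimizing the sum drives the ansatz towards the lowest state in the orthogonal complement. Both terms are ratios of expectation values and can therefore be estimated by Monte Carlo sampling.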

5 Electrons in second quantization

Figure 6: Electronic energies for molecules and solids in second quantization. (a) Dissociation curve for the $C_{2}$ molecule in the STO-3G basis. The green stars show results for a restricted Boltzmann machine which represents the electrons in discrete space. Figure taken from \citep{ChooNC20}. (b) Graphene on a honeycomb lattice solved using the cc-pVDZ basis set. Figure taken from \citep{yoshioka2021solving}.

Instead of working directly with the infinite-dimensional Hilbert space corresponding to the real-space Hamiltonian of Eq. 2, it is common practice in QC to use a finite basis set. By choosing a set of electronic basis functions $\{φ_{1} (r), φ_{2} (r), \dots\}$, we can define a set of second-quantised operators $\hat{c}_{i}^{†}$ ($\hat{c}_{i}$) which create (annihilate) an electron in the $i$-th basis function, and which satisfy the canonical anticommutation relations $\{\hat{c}_{i}^{†}, \hat{c}_{j}\} = δ_{i j}$. These operators then act on the second-quantized wavefunction $ψ_{n_{1} n_{2} \dots}$, which encodes amplitudes for different occupations of the orbitals (Box LABEL:box:first-second-quant). Projecting the real-space Hamiltonian onto this set of orbitals then yields the corresponding discretized Hamiltonian,

(20)

where

$t_{i j} = \int φ_{i}^{*} (r) \left( - \frac{1}{2} \nabla^{2} - \sum_{I} \frac{Z_{I}}{\| r - R_{I} \|} \right) φ_{j} (r) \, d r,$ (21)
$u_{i j k l} = \iint φ_{i}^{*} (r) φ_{j}^{*} (r^{'}) \frac{1}{\| r - r^{'} \|} φ_{k} (r) φ_{l} (r^{'}) \, d r \, d r^{'},$ (22)

are matrix elements of the one- and two-electron terms in the real-space Hamiltonian of Eq. 2. For simple basis functions such as Gaussians or plane waves, the matrix elements can be evaluated analytically. This Hamiltonian serves as the starting point for the methods described in this section.
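The action of creation and annihilation operators on occupation-number basis states, and the canonical anticommutation relations they satisfy, can be checked with a short sketch. The representation of a basis state as a tuple of 0s and 1s, the helper names, and the sign convention (a factor of $-1$ for every occupied orbital to the left of the one acted on) are assumptions made for illustration; other orderings are equally valid.

```python
import itertools


def apply_op(i, occ, creation):
    """Apply c_i^dagger (creation=True) or c_i (creation=False) to a basis state.

    `occ` is a tuple of occupation numbers (0 or 1). Returns (sign, new_occ),
    or None if the operator annihilates the state. The sign is (-1) to the
    number of occupied orbitals with index smaller than i.
    """
    if occ[i] == (1 if creation else 0):
        return None
    sign = -1 if sum(occ[:i]) % 2 else 1
    new = list(occ)
    new[i] = 1 - occ[i]
    return sign, tuple(new)


def anticommutator(i, j, occ):
    """Return {c_i^dagger, c_j} |occ> as a dict mapping basis state -> coefficient."""
    result = {}
    # Each tuple lists the operators in the order they act on the state.
    orderings = (((j, False), (i, True)),   # c_i^dagger c_j
                 ((i, True), (j, False)))   # c_j c_i^dagger
    for first, second in orderings:
        step = apply_op(first[0], occ, first[1])
        if step is None:
            continue
        sign1, middle = step
        step = apply_op(second[0], middle, second[1])
        if step is None:
            continue
        sign2, final = step
        result[final] = result.get(final, 0) + sign1 * sign2
    return {state: c for state, c in result.items() if c != 0}


# Exhaustive check of {c_i^dagger, c_j} = delta_ij on three orbitals.
anticommutation_holds = all(
    anticommutator(i, j, occ) == ({occ: 1} if i == j else {})
    for occ in itertools.product((0, 1), repeat=3)
    for i in range(3)
    for j in range(3)
)
```

The ordering-dependent sign is what distinguishes fermionic operators from plain bit flips; it is the same sign that a spin encoding of the operators has to reproduce.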

5.1 Fermionic neural quantum states

Instead of working directly with the occupation-number representation of the wavefunction (Box LABEL:box:first-second-quant), it is also possible to map occupation numbers $n_{k} \in \{0, 1\}$ onto degrees of freedom $σ_{k}^{z} \in \{↓, ↑\}$ of spin-1/2 particles, such that empty orbitals map to down spins and occupied orbitals to up spins. This mapping makes it possible to leverage NQS and other methods for solving quantum spin systems. The same duality allows the creation and annihilation operators appearing in the electronic Hamiltonian (Eq. 20) to be written in terms of spin operators. This can be achieved, for example, with the Jordan–Wigner mapping \citep{Wigner1928}, which transforms annihilation and creation operators into, respectively, lowering and raising spin operators $\hat{σ}_{j}^{\pm} = (\hat{σ}_{j}^{x} \pm i \hat{σ}_{j}^{y}) / 2$, accompanied by a string of $\hat{σ}^{z}$ operators that preserves the fermionic anticommutation relations. This mapping is not unique, however, and there exist more recent alternatives, such as parity or Bravyi–Kitaev encodings \citep{BK2002}, both of which have been developed in the context of quantum simulations. Regardless of the choice of spin encoding, the final outcome is a spin Hamiltonian with the general form

\hat{H}_{q} = \sum_{p = 1}^{r} h_{p} \hat{Ξ}_{p},

(23)

defined as a linear combination with real coefficients $h_{p}$ of $\hat{Ξ}_{p}$, which are $N$-fold tensor products of single-qubit Pauli operators and the identity: $\hat{I}, \hat{σ}^{x}, \hat{σ}^{y}, \hat{σ}^{z}$. The ground state of the spin Hamiltonian in Eq. 23 can be approximated using a spin-based NQS representation based on complex-valued RBMs \citep{CarleoS17}. For a system of $N$ spins, the many-body amplitude corresponding to a state in the $σ^{z}$ basis, i.e., $σ = (σ_{1}^{z}, \dots, σ_{N}^{z})$, takes the compact form

ψ (σ; θ) = e^{\sum_{i} a_{i} σ_{i}^{z}} \prod_{j = 1}^{M} 2 \cosh \left( b_{j} + \sum_{i}^{N} W_{i j} σ_{i}^{z} \right),

(24)

with parameters $θ = (a_{i}, b_{j}, W_{i j})$. This ansatz can be optimised with VMC techniques (Box LABEL:sec:optimization), typically relying on the stochastic reconfiguration approach \citep{SorellaPRL98}. A number of works have adopted this approach and achieved competitive variational results for small basis sets \citep{ChooNC20,YangJCTC20}, even in conjunction with quantum computers \citep{torlai2020precise,iouchtchenko2022neural}. In Fig. 6(a), we show the dissociation curve of $C_{2}$ in the STO-3G basis, using the RBM as described above \citep{ChooNC20}.
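The RBM amplitude of Eq. 24 is simple enough to evaluate directly. The sketch below assumes spins encoded as $σ_{i}^{z} = \pm 1$ and parameters stored as plain Python lists (`w[i][j]` couples visible spin `i` to hidden unit `j`); the function name and the example parameter values are hypothetical.

```python
import cmath


def rbm_amplitude(sigma, a, b, w):
    """Evaluate psi(sigma) = exp(sum_i a_i s_i) * prod_j 2 cosh(b_j + sum_i W_ij s_i).

    sigma: sequence of N spins, each +1 or -1.
    a: N visible biases, b: M hidden biases, w: N x M weights (may be complex).
    """
    visible = cmath.exp(sum(a_i * s for a_i, s in zip(a, sigma)))
    hidden = 1
    for j, b_j in enumerate(b):
        theta_j = b_j + sum(w[i][j] * s for i, s in enumerate(sigma))
        hidden *= 2 * cmath.cosh(theta_j)
    return visible * hidden


# With all parameters zero, every configuration has amplitude 2**M.
zero_amp = rbm_amplitude([1, -1], [0, 0], [0, 0, 0], [[0, 0, 0], [0, 0, 0]])

# Complex weights give a complex amplitude, i.e. a sign or phase structure.
amp = rbm_amplitude([1, -1], [0.1, -0.2j], [0.3], [[0.5 + 0.1j], [-0.4]])
```

Because the hidden units have been summed out analytically, the cost of one amplitude evaluation is $O (N M)$, which is what makes the ansatz convenient inside a Monte Carlo loop.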

Solids

The second-quantization framework also allows one to treat solids, using as a basis the Bloch orbitals obtained by solving the crystalline HF equations \citep{del_re_self-consistent-field_1967}. Creation and annihilation operators, $\hat{c}_{i k}^{†}$ and $\hat{c}_{i k}$, for electrons in band $i$ with crystal momentum $k$ are introduced, and the resulting Hamiltonian is similar to Eq. 20, with the notable difference that the one- and two-body matrix elements now depend on the crystal momenta: $t_{i j} \to t_{i j}^{k}$ and $u_{i j k l} \to u_{i j k l}^{k_{1} k_{2} k_{3} k_{4}}$, with the four momenta appearing in the two-body integrals satisfying the conservation of the total crystal momentum. Using Gaussian-based atomic functions as the single-particle basis and RBM wavefunctions to represent the many-body state, \citet{yoshioka2021solving} applied this approach to study the electronic structure of solids. In Fig. 6(b), we show the computed ground-state energies for graphene crystals as a function of the lattice constant.

Exact sampling

Fermionic NQS are typically sampled using the MCMC approach commonly adopted in VMC (Box LABEL:sec:QMC). However, the mixing rate of the MCMC algorithm is known to be slow in some cases, such as close to phase transitions, and MCMC simulations can suffer from critical slowing down. A way to circumvent this limitation is to introduce model wavefunctions explicitly designed to allow exact sampling of their square modulus, thus avoiding the need to use MCMC. One such family is autoregressive neural network wavefunctions \citep{sharir2020deep}, a complex-valued generalization of the autoregressive models commonly adopted in deep learning. Such networks represent normalized wavefunctions and allow one to directly obtain perfectly uncorrelated samples; this is useful as the wavefunction distribution for many QC problems can be highly multi-modal. The exact sampling approach was applied to QC Hamiltonians in a recent work by \citet{BarrettNMI22}. Optimizations in the way Hamiltonian matrix elements and the corresponding Monte Carlo estimators are computed have made it possible to treat much larger systems than were accessible in the early applications of \citet{ChooNC20}. Specifically, \citet{zhao_scalable_2022} obtain competitive variational energies, improving on the CCSD energies of molecules in minimal basis sets. Results for up to around 50 electrons in 80 orbitals (\ce{Na2CO3} at equilibrium) have been obtained at relatively modest computational cost.

5.2 ML-assisted selected CI

For many QC problems, although the dimension of the Hilbert space grows exponentially with system size, the set of relevant configurations in the ground state typically remains sparse. This suggests that by efficiently selecting the relevant configurations and then diagonalising the Hamiltonian on the reduced subspace, one can achieve highly accurate results. This set of approaches is known as selected CI \citep{HuronJCP73,giner2013using,holmes2016heat,sharma2017semistochastic}. Different flavours of selected CI vary in the way relevant configurations are selected. One well-known approach is called Monte Carlo CI (MCCI) \citep{greer1998monte} and can be briefly summarised as follows:

1. Start from a finite set of configurations $S_{i} = \{| x ⟩\}$.
2. By considering single or double excitations starting from configurations in $S_{i}$, construct an expanded set $S_{i}^{'}$.
3. Construct the Hamiltonian $\hat{H}_{i}$ for the expanded set $S_{i}^{'}$ and diagonalise it to obtain the wavefunction coefficients for the configurations in the set.
4. Discard the configurations whose coefficient is smaller in magnitude than a given threshold $c_{min}$. The remaining configurations then form a new set of configurations $S_{i + 1}$.
5. Repeat until convergence.
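The expand, diagonalise, and prune loop summarised above can be sketched on a toy problem. Everything in this example is an assumption made for illustration: configurations are plain integer labels, the 5 x 5 matrix `h` is an invented Hamiltonian, "excitations" are taken to be all configurations connected by a non-zero matrix element (a real MCCI samples them randomly), and the lowest eigenpair is found with a simple shifted power iteration that is only adequate for tiny matrices.

```python
import math


def lowest_eigenpair(h, iters=2000):
    """Lowest eigenpair of a small real symmetric matrix by shifted power iteration."""
    n = len(h)
    # shift exceeds every eigenvalue, so (shift - H) is positive definite and
    # its dominant eigenvector is the ground state of H.
    shift = 1.0 + max(sum(abs(v) for v in row) for row in h)
    vec = [1.0] * n
    for _ in range(iters):
        new = [shift * vec[i] - sum(h[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    energy = sum(vec[i] * h[i][j] * vec[j] for i in range(n) for j in range(n))
    return energy, vec


def selected_ci(h, start, c_min=0.05, max_rounds=20):
    """Iterate expand -> diagonalise -> prune until the configuration set is stable."""
    selected = set(start)
    for _ in range(max_rounds):
        connected = {y for x in selected for y, v in enumerate(h[x]) if v != 0}
        expanded = sorted(selected | connected)
        sub = [[h[x][y] for y in expanded] for x in expanded]
        _, coeffs = lowest_eigenpair(sub)
        kept = {x for x, c in zip(expanded, coeffs) if abs(c) >= c_min}
        if kept == selected:
            break
        selected = kept
    final = sorted(selected)
    energy, _ = lowest_eigenpair([[h[x][y] for y in final] for x in final])
    return energy, final


# Invented toy Hamiltonian over five configurations (labels 0..4).
h = [[-1.0, -0.3, 0.0, 0.0, 0.0],
     [-0.3, 0.5, -0.2, 0.0, 0.0],
     [0.0, -0.2, 1.0, -0.01, 0.0],
     [0.0, 0.0, -0.01, 3.0, -0.1],
     [0.0, 0.0, 0.0, -0.1, 4.0]]

energy, configs = selected_ci(h, start=[0])
exact_energy, _ = lowest_eigenpair(h)
```

The selected-space energy is variational: it can never fall below the exact ground-state energy of the full matrix, and it improves on the single-configuration estimate whenever an important configuration is retained.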

ML techniques can be used to improve selection of the configuration set. One such approach is to perform supervised learning \citep{Coe2018Machine,GlielmoPRX20}, where a neural network is trained to predict the wavefunction coefficients using the data from the MCCI method, i.e., the wavefunction coefficients of the configurations in the set $S_{i}^{'}$. After training, the network can be queried or sampled to select the configurations with the largest coefficients. In other words, the network is used to bootstrap and predict the coefficients of configurations not yet seen in the dataset. It was shown in \citet{Coe2018Machine} that such an approach converges faster than the vanilla MCCI method. The task of selecting configurations for selected CI can also be cast as a reinforcement-learning task where the state is the current set of configurations and an agent is trained to perform actions on the set to iteratively modify the configurations with the aim of minimising the variational energy. This approach was applied in \citet{Goings2021Reinforcement} to achieve near-FCI accuracy for small molecules in a small basis set.

6 Challenges and outlook

Ab-initio QC with neural-network wavefunctions has only just emerged as a viable path to highly accurate electronic-structure methods, yet it already competes with established approaches that have been developed for decades. We imagine that it may become the methodology with the best trade-off between efficiency and accuracy for systems with up to one to two hundred electrons and a nontrivial electronic structure. Before that can happen, however, several challenges must be addressed. All the methods are currently in a development stage and only limited benchmarking is available. As such, it is not yet clear whether the excellent accuracy seen so far will be maintained across a broader range of chemical systems, or how rapidly the accuracy will degrade with system size. Related to this is our incomplete understanding of what limits the accuracy of neural-network ansatzes, and how their success or failure is related to physical phenomena such as strong correlation. Since the underlying electronic problem is exponentially hard but the algorithms are polynomial, they must be limited in accuracy in some ways. It is not currently clear, however, whether the limitations seen to date are caused by the restricted expressiveness of the neural networks or by difficulties in optimization or both. For instance, while it has been proven that a single generalized Slater determinant is in principle sufficient to represent any antisymmetric function \citep{Hutter20}, it might not be possible to parametrize it with a polynomially scaling neural network or train it within a polynomially scaling time. Apart from these fundamental issues, there are many practical challenges. While the scaling of variational QMC with system size is favourable, the prefactor due to the neural networks is large. Until very recently, this limited applications to systems no larger than the benzene molecule (42 electrons), which is three to four times below our envisaged applicability range, although results for a 108-electron simulation cell of solid LiH have now been reported \citep{Li2022-abinitio}. The prefactor can be reduced by integrating traditional QC techniques such as pseudopotentials \citep{Li2022-pseudo}, developing more efficient neural-network architectures, or using ML techniques such as pre-training and transfer learning. Specific to the discrete-basis second-quantized approaches is the issue of basis-set convergence, where sufficiently large basis sets may increase the prefactor by up to three orders of magnitude compared to minimal basis sets. Another challenge is related to the stochastic optimization, which produces noise in the converged energies that is especially amplified when calculating small energy differences. We are, however, optimistic that many of these challenges can be addressed, and addressed quickly, thanks to the relative simplicity of the framework based on variational QMC and of neural networks compared to traditional QC approaches. Indeed, this simplicity has already enabled rapid development of multiple extensions to the first single-point ground-state calculations on molecules, including transferable wavefunctions, excited states, and formulations for periodic systems, all originating from multiple independent research groups. First-quantized approaches such as FermiNet, PauliNet, and their successor architectures already match essentially exact benchmark results to within chemical accuracy for small systems. Yet these networks are just a small subset of possible architectures for representing antisymmetric wavefunctions, and it is unlikely that the optimal ones were found on the first attempt, so we expect that significant innovation lies ahead. We believe that ab-initio methods based on neural-network wavefunctions will become an integral part of the QC toolbox that enables straightforward electronic-structure calculations of complex molecular systems.

Acknowledgements

We acknowledge funding from the German Ministry for Education and Research (Berlin Institute for the Foundations of Learning and Data, BIFOLD), the Berlin Mathematics Research Center MATH+ (AA1-6, AA2-8), and the European Commission (ERC CoG 772230 ScaleCell).