\TitleFontLight curve completion and forecasting using fast and scalable Gaussian processes (MuyGPs)

Imène R. Goumiri
Physics Division Lawrence Livermore National Laboratory Alec M. Dunton
Center for Applied Scientific Computing Lawrence Livermore National Laboratory Amanda L. Muyskens
Engineering Division Lawrence Livermore National Laboratory Benjamin W. Priest
Center for Applied Scientific Computing Lawrence Livermore National Laboratory Robert E. Armstrong
Physics Division Lawrence Livermore National Laboratory

Abstract

Temporalvariationsofapparentmagnitude,calledlightcurves,areobservationalstatisticsofinterestcapturedbytelescopesoverlongperiodsoftime.LightcurvesaffordtheexplorationofSpaceDomainAwareness(SDA)objectivessuchasobjectidentificationorposeestimationaslatentvariableinferenceproblems.Ground-basedobservationsfromcommercialofftheshelf(COTS)camerasremaininexpensivecomparedtohigherprecisioninstruments,however,limitedsensoravailabilitycombinedwithnoisierobservationscanproducegappytime-seriesdatathatcanbedifficulttomodel.Theseexternalfactorsconfoundtheautomatedexploitationoflightcurves,whichmakeslightcurvepredictionandextrapolationacrucialproblemforapplications.Traditionally,imageortime-seriescompletionproblemshavebeenapproachedwithdiffusion-basedorexemplar-basedmethods.Morerecently,DeepNeuralNetworks(DNNs)havebecomethetoolofchoiceduetotheirempiricalsuccessatlearningcomplexnonlinearembeddings.However,DNNsoftenrequirelargetrainingdatathatarenotnecessarilyavailablewhenlookingatuniquefeaturesofalightcurveofasinglesatellite.Inthispaper,wepresentanovelapproachtopredictingmissingandfuturedatapointsoflightcurvesusingGaussianProcesses(GPs).GPsarenon-linearprobabilisticmodelsthatinferposteriordistributionsoverfunctionsandnaturallyquantifyuncertainty.However,thecubicscalingofGPinferenceandtrainingisamajorbarriertotheiradoptioninapplications.Inparticular,asinglelightcurvecanfeaturehundredsofthousandsofobservations,whichiswellbeyondthepracticalrealizationlimitsofaconventionalGPonasinglemachine.Consequently,weemployMuyGPs,ascalableframeworkforhyperparameterestimationofGPmodelsthatusesnearestneighborssparsificationandlocalcross-validation.MuyGPsallowsustoquicklytrainmodelsonlightcurvedatainsecondsonasingleworkstation.Inthismanuscriptweexplorelightcurvecompletionandforecasting,andcompareembeddingsofthelightcurvedataintomulti-dimensionalspacestotakeadvantageofdailyandyearlyperiodicity.WeshowthatourmethodoutperformsfeedforwardDNNsbothintermsofaccuracyandquantityoftrainingdatarequired.

LLNL-PROC-839253

1 Introduction

Photometriclightcurvesareaseriesofobservationsthattrackthebrightnessofanobjectoveraperiodoftime.Theycanbeusedtocharacterizethedynamicpropertiesofasystem.Astronomersfrequentlyanalyzelightcurvestounderstandawholerangeofastrophysicaltopicsincluding:detailedphysicswithinastar [gaia2019],thediscoveryofexoplanets [kepler2013],classifyingdistantsupernovae [des2022],andcharacterizingthepopulationofnearearthasteroids [atlas2018].SpaceDomainAwareness(SDA)involvesmonitoring,detectingandunderstandingthepopulationofearth-orbitingbodiesorresidentspaceobjects(RSOs).SDAisofincreasingimportanceduetotheincipientgrowthinthenumberofspaceobjectsanddebrisorbitingtheearth,dueinlargeparttotherecentcommercialdevelopmentofsatellites.ThenumbersosuchRSOsisexpectedtogrowbyordersofmagnitudeinthenextdecade.WhileSDAisconcernedwithmaintainingcustodyoforbitingobjects,italsoprioritizestherapididentificationofchangestoandanomalousbehaviorofthemanyvaryingorbitalsystems.Fortunately,lightcurvesarevaluableforinspectingRSOsinmuchthesamewayasastrophysicalphenomena,andcanenablecriticalinformationforSDA.ThebrightnessofanRSOdependsonstructuralfeatureslikesize,shape,andmaterialcompositionaswellthegeometrybetweentheobject,thesunandobserver.Theproliferationoflowcostcommercial-off-the-shelf(COTS)ground-basedtelescopeshasmadethegenerationoflightcurvesfromRSOseasiertoproduce.Furthermore,manyconstellationsofground-basedtelescopeshavebeentaskedwithtrackingRSOsforSDA.ThesefactorshaveenabledtherelativelycheapproductionoflargevolumesofRSOlightcurves,whichhaspromptedpractitionerstoapplyautomationtoanalyzethematscale.PractitionershaverecentlyappliedmachinelearningtolightcurvesofRSOstosolvevariousSDAtasks.Forexample,comparingthelightcurveofanunknownRSOtoacatalogofknownRSOsisusefulforpredictingfeatureslikeshape [linares2014space, furfaro2019shape],materialcomposition [dianetti2019space],andgeneralcategoriesorclasses [linares2016space, jia2018space, furfaro2018space].Furthermore,forecastingRSOlightcurvesintothefutureisusefulfordetectingdeviationsfromtheexpectedpatterns-of-lifeinnear-real-time,affordingthedetectionofanomalouseventssuchasmaneuvers [shabarekh2016novel, dupree2021time]orconfigurationchanges.Thegoalofmachinelearninginthiscontextistolearnafunctionmappinganinputspace(typicallythetimedomain)toaanobservationspace,e.g.magnitude,baseduponindependentandidenticallydistributed(i.i.d.)samplesfromadistributionofinput-observationpairs.Wewillrefertothisdatadistributionasthe“targetdistribution”.Amachinelearningmodelissuccessfulwhenitisabletoaccuratelypredicttheobservationofanunseeninputdrawnfromthetargetdistribution.Bothdiffusion-basedorexemplar-basedmethodshavebeentraditionallyusedinimageortime-seriescompletionproblems.Diffusion-basedmethods [sohl-dickstein2015deep]usethermaldiffusionequationstopropagateinformationfromsurroundingregionsintomissingregions,theyaremostlyeffectivewhenthegapsaresmallandtendtosmoothdetailsoutforlargerproblems.Exemplar-basedmethods [criminisi2004region]usegreedyalgorithmstoapplypatchesoftrainingdatatomissingregionswhichcanproduceimplausiblepatterns.DeepNeuralNetworks(DNNs)areanespeciallypopulartooltomodellightcurvesintheliteratureduetotheirexpressivenessandgeneralizationcapabilities.ADNNisatypeofrepresentationlearningmodel—amachinelearningmodelthatlearnsanappropriatefeaturerepresentationofthedatainadditiontoproducingpredictions.DNNsiterativelytransformtheinputspaceintolatentfeaturespacesusinglinearfunctionsthatare“activated”byelement-wisenonlinearfunctions.DNNsalsohappentobeuniversalfunctionapproximators—itispossibletoapproximateanycontinuousfunctiontoanarbitrarylevelofprecisionusingasufficientlylargeDNN.ThepersistentpopularityofDNNsderivesfromseveralsources,suchasthewidespreadavailabilityofhardwareacceleratorslikegraphicalprocessingunits(GPUs),advancedstochasticoptimizationtechniquesthataidintheirtraining,andthedevelopmentofuser-friendlysoftwarelibrariessuchasTensorflow [Abadi_TensorFlow_Large-scale_machine_2015]andPyTorch [NEURIPS2019_9015].AlthoughDNNshavemanypositivefeatures,theyalsohavesomedrawbacksthatareparticularlynotableinthelightcurveproblem.First,modernDNNarchitecturestypicallyconsistofaverylargenumberofparametersthatmustbetrained.Trainingsuchmodelsgenerallyconsumesalargeamountofcomputingresources,asthemodelanditsgradientareevaluatedovermanyiterationsinordertorefineparametervalues.Inadditiontocomputationalexpense,trainingalargemodeltypicallyrequiresalargeamountofdata.Theliteraturehasobservedaroughlylinearrelationshipbetweenmodelsizeandtheamountoflabelleddatarequiredtotrainit [tan2018survey].Thismeansthatlearninganaccuratelightcurvemodelcanrequirealargenumberofobservationsingeneral.Second,whileDNNmodelslendthemselvestohighdimensionalfeaturespacestheytendtostrugglewithverysmallnumbersofdimensions.Featureengineeringcanoftensolvethisproblem,especiallyintimeseriesscenarioswheresomenotionofamovingwindowusuallyservesasafeaturespace.However,thisstrategyismostsuccessfulwhentheobservationcadenceofthelightcurveishigh.Upcomingground-basedsurveysofdeepspace,suchastheLegacySurveyofSpaceandTime(LSST)areexpectedtoincidentallycapturemanyRSOimagesfromwhichlightcurvescanbeextracted.However,theselightcurveswillbeirregular,sparse,andhavelow-dimensionalfeaturespaces,increasingthechallengeofapplyingexistingtechniques.Third,effectivetrainingtypicallyrequiresalargevolumeoftrainingdatathatiscompleteandrepresentativeofthetargetdistributioni.e.alargemodelrequiresalargeamountoftrainingdatathatisi.i.d.accordingtothetargetdistribution.Inadditiontoinformingpredictionaccuracy,dataindependenceandsizehelpscomplexDNNmodelsavoidoverfittingandsimplymemorizingthetrainingdata.However,collectionsofphysicalmeasurementsareoftenlimitedbytherealitiesofsensoravailabilityandphysicalobstruction.Forexample,weatheraffectsopticalastronomicalmeasurementsbyintroducingcorrelatednoiseorblockingthedesiredobjectfromviewentirely.Furthermore,itisingeneraldesirabletodesignamodelthatcanbealternativelyappliedtoseveraldifferentRSOinferenceproblems,i.e.severaldistincttargetdistributions.Atransferlearningapproach-onewherethemodelisatleastpartiallytrainedondatafromapossiblydifferentdistributionthanthetargetdistribution [tan2018survey]-couldaddressthisneedforalargevolumeoftrainingdata.Indeed,transferlearninghasbeenappliedtosatelliteapertureradarimages [rostami2019deep],radiofrequencyinterferencecharacterization [lefcourt2022space],andclassifyingRSOsusinglightcurves [furfaro2018space].However,transferlearningultimatelyreliesonfeatureslearnedfromapotentiallyunrelateddataset,whichmayleadtoinefficientorinaccurateconclusions.GenerativemodelssuchasVariationalAutoencoders(VAEs)andGenerativeAdversarialNetworks(GANs)areadditionalalternativesfromthemachinelearningliteraturethataddressdataefficiency.VAEsareautoencodersthatimposestructureupontheirlearnedencoderanddecoderfunctionstoensurethatthelatentrepresentationofthedataencodesthetargetdistribution.ResearchershavesuccessfullyappliedVAEstolearnshapesfromthelightcurvesofRSOs [furfaro2019shape].GANsconverselysimultaneouslytrainmodelsthatgeneratesamplesmeanttomimicthetargetdistributionwhiledistinguishingbetweenrealandsyntheticsamples.PractitionershaverecentlyusedGANstoclassifystarsbasedupontheirlightcurves [garcia2022improving].However,GANscanbechallengingandexpensivetotrain.GANsinferadistributionfromaverysmalltrainingdatasetandcansufferfrommodecollapse,non-convergenceorgeneralinstability.Thisbringsustothefourthpoint—DNNshavedifficultyquantifyingtheuncertaintyintheirpredictions.ItisgenerallydifficulttodeterminewhenaDNNisextrapolating—predictingtheresponseondatafromadifferentdistributionorinadifferentregionofthetrainingdata.Thisisproblematicinscientificapplicationsbecauseoverconfidentpredictionscanleadtoincorrectinferenceandunnaturalresultsthataredifficulttointerpret.Furthermore,itisimportantfordecisionmakerstodistinguishbetweenlowandhigh-confidencepredictionswhendrawingconclusionsforSDA.Althoughuncertaintyquantificationmethodsareinresearch,mostpracticalmodelsemployedbypractitionersprovideonlypointpredictionswithnoobviousmechanismtomeasuremodelconfidence.RecentattemptstoprovideuncertaintyquantificationtoDNNsattempttolearnpredictionintervals(i.e.errorbars)inadditiontopointpredictions,butthisliteratureisstilldevelopingandthereisasyetnoconsensussolution [kabir2018neural].OthersattemptahybridapproachwhereaGaussianprocessisappendedtothelastlayerofaDNN [bradshaw2017adversarial].InthismanuscriptweproposeGaussianProcess(GP)modelsasalternativesforthelightcurvemodelingproblem.LikeDNNs,GPsarerepresentationlearningmethodsthataredata-drivenuniversalfunctionapproximators.AGPisatypeofkernelmethodthatimplicitlyandnon-linearlymapsinputfeaturesintoapossiblyinfinitelydimensionalinnerproductspace.Kernelmethodsuseaparameterizedkernelfunctiontocheaplycomputeinnerproductsinthisimplicitspacetopredictunknownresponses.GPscanalsobethoughtofasthegeneralizationofamultivariatenormaldistribution,wherealldatawithinadefineddomainisassumedtobejointlynormallydistributedwiththecovariancedefinedbythekernelfunction.Therefore,predictionsfromGPmodelsareprobabilisticallydefinedthroughconditionalprobability.GPsareattractiveforscientificapplicationsduetothisinherentlyBayesianinferencemodel,whichproducesexplicituncertaintyquantification(Gaussiandistributions)ofitspredictions.Furthermore,GPshavebeenshowntooutperformDNNsindata-starvedregimes [muyskens2022star]andareidealmodelsforlow-dimensionalfeaturespaces [muyskens2021muygps].However,conventionalGPshaveverypoorscalinginthenumberoftrainingobservations,whichhaspreviouslylimitedtheirapplicationtothelightcurvemodelingproblem.BothrealizingthepredictionsfromtheGPmodelandevaluatingthelikelihoodfunctionintrainingrequirecubiccomputationinthenumberofdatapoints,andrequirequadraticmemorytostorethekernelmatrix.WhilethispracticaldrawbackhastendedtolimittheapplicationofGPstosmalldataproblems,scalableapproximateGPmethodshaveproliferatedintheliterature [heaton2019case, liu2020gaussian].Thesemethodstypicaltradeoffaccuracyforspeed.However,theapproximateGPmethodMuyGPsoffersaselectionofsettingsthatdemonstratessuperioraccuracyorcomputationalscalinginseveraldatasets[muyskens2021muygps, muyskens2022star, buchanan2022gaussian].Therefore,inthispaperweusetheMuyGPsapproximateGPestimationmethod [muyskens2021muygps].MuyGPsusesnearestneighborssparsificationandabatchedleave-one-outcross-validationobjectivefunctiontotrainaGP-likemodelinnearlylineartimeinthenumberofdataobservations.InvestigatorshavesuccessfullyappliedMuyGPstocosmologyimageprocessingproblems [goumiri2020star, muyskens2022star, buchanan2022gaussian].Inthispaper,weproposeamethodoflightcurveinterpolationandpredictionusingMuyGPsthatgivesbothpredictionsanduncertaintyquantificationofthosepredictions.Weusethismethodintwomodes:

Interpolation,thepredictionofunobservedmagnitudesinthepast,and
Forecasting,thepredictionofunobservedfuturemagnitudes.

Interpolationisusefulasadatapreprocessingutilityforcatalogs.Interpolationallowsustofillinmagnitudeobservationsforsparse,irregularlightcurvestomakethemsuitablefordownstreamcomparisontaskssuchasshape,pose,size,type,ormaterialestimation.Forecastingisusefultodetectdeviationsfromexpectedfuturepatterns-of-lifethatcorrespondtoanomalies.TheposteriorvarianceprovidedbyGPsisespeciallyusefultodeterminewhatconstitutesasignificantdeviationfromexpectedbehavior.Insection 2,weprovidebackgroundonthelightcurvedataitself,dataprocessing,aswellasthemachinelearningmethodsweutilizeinourcomparativestudy.Theninsection 3wedescribeseveralnumericalstudiesthatdemonstratethecomparativeperformanceofseveralchoicesofourGPmethodtothatofaDNNandhowtheuncertaintyquantificationprovidedbyourmethodcanbeusedtodetectanomalies.Finally,insection 4,wediscussourconclusions,theimportanceofourfindings,limitations,andfutureworkassociatedwithourmethodology.

2 Methodology

2.1 Lightcurvesdefinition

Lightcurvesaretimeseriesofthebrightnessofresidentspaceobjectsoverlongperiods.Theyareobtainedbyobservingthesameobject,nightafternight,forseveraldaysorevenyears.Onewaytovisualizethemisintwodimensionswithoneaxisrepresentingthetimeofday(orsolarphaseangle)andtheotheraxisrepresentingthedayswhilethepseudo-colorindicatesthebrightness,asinFig. 1.

Figure 1: Thelightcurveofasingleobjectover4years.Eachcolumnrepresentsasingleday.Thecolorscalerepresentsthebrightness.Theblackstripesandholeshighlightthesparsityofobservations;theformerareduetoweather,thelattertoeclipse.Theseasonalityandyearlyperiodicityinducestheoutershape.Courtesyof [monetgrams].

LightcurvesofRSOscontainpotentiallydetailedinformation.Forexample,onecoulddetectdustaccumulatingonhighlyreflectivesolarpanelsanddeducetheageandlifeexpectancyofasatellite.Higherintensityreflectionsalsohavethepotentialtoinformabouttheshapeoftheobservedobjectsincedifferentfacetsofamulti-facetedobjectwouldreflectdifferently.Inaddition,satellitescanmaneuverorrotate,inwhichcase,thereflectedlightwilldeviatefromtheexpectedpattern.ThosedeviationscanbedetectedandincreaseSDAknowledge.Observationscanbelimitedbythetimeofday,weather,sensorsaturation,andeclipses(seeFig. 2).Dataisonlyavailableatnight,andthemeasurementsarelessaccuratenearduskanddawn.Theweathercanprecludeobservationsforhoursandsometimesdaysatatime.Whenthereflectedlightfromthesunisparticularlybright,sensorscansaturateleadingtomissingdata,thoughthisparticularcaseisusuallyfairlyobviouswhenlookingatthebrightnessofsurroundingavailabledata.Lastly,periodically,thelightfromasatellitewillbeeclipsedbytheearth.

a) Missingdataduetoa)timeofday,b)weather,c)sensorsaturation,andd)eclipse. b)c)d)

Figure 2: Missingdataduetoa)timeofday,b)weather,c)sensorsaturation,andd)eclipse.

TotestourcodesweusealightcurvedatasetprovidedbyDaveMonet[monetgrams]of43satellites.Allsatellitesinthedatasetarefromthepubliccatalog(www.space-track.org).Weselected13distinctRSOs,thosemanuallyflaggedbyDaveMonetasnominal,forouranalysis.Thereareapproximately500,000datapointsperobject.AllobservationsweretakenfromasinglecamerainFlagstaffAZbetweenSeptember2014andSeptember2018.Thedatasetcontainsthebrightness(magnitude;bandisapproximatelyJohnsonV)aswellasthemeasurementerrorinbrightness(uncertainty).

2.2 Processing

Totestourpredictioncapabilities,weselectaportionofeachlightcurveastestdata.Weusetimeperiodsrangingfromafewhours(3hours)tomultipleweeks(2weeks).Andweeitherselectthetestdataatarandomtimewithinthetimeseries(interpolation)orattheend(extrapolation).Toguaranteethattheselectedperioddoesnotfallatatimelackingobservations(e.g.duringthedayorduringacloudynight)werejectthosethathavefewerthan90%ofthedatawe’dexpecttofindinthattimeintervalifthedatawasuniformlydistributed.Inaddition,weapplythatsameconditiontotheprecedingintervaltoensurethatwe’renotaccidentallyinterpolatingforlongerthatwewouldexpect.Sincethelightcurveshavesomedailyandyearlyperiodicity,wecompareseveralembeddingsofthelightcurvedataintomulti-dimensionalspaces.Thereferenceisthe1Dmodelwhichisjusttheoriginaltimeseries.The2Dmodelhasonedimensionrepresentingtimesofdayasrealnumbersinthe $[0, 1)$ interval,andanotherdimensionrepresentingdaysasintegers,startingatzeroonthefirstdayofobservation.The3Dmodelislikethe2Dmodelbutwithanadditionaldimensionforyears,asintegersbetween0and4,andwithamodifieddaysdimensionmodulo365.Notethatalltheinputdimensionsareeventuallynormalizedtotheinterval $[0, 1]$ asiscustomary.Analternate2Dmodelwithyearandtimeofyearasopposedtodayandtimeofdaywasconsideredbutdismissedforsimplicityandfornotbeingasintuitiveandinteresting.Weobservedthatday-to-daycorrelationisstrongerthanyear-to-yearinourdataset.Notethatitismoretraditionaltomodelthistypeofdataperiodicityusingaperiodickernel,buttheMuyGPsestimationframeworkdependsonakernelsparsitythatisultimatelynotvalidatedwiththatkerneltype.Therefore,theseembeddingsareanovelwaytoreplacesuchaperiodickernelwhilemaintainingthecomputationallyefficientframework.

2.3 Gaussianprocesses

Wewillconsiderthroughoutalightcurvetobeaunivariateresponse $Y : X \to R$ ,where $X \subseteq R^{d}$ istheobservationspacealong $d$ timedimensions.Inpreprocessingwede-trendthedata,andso $Y$ thereforehaszeromean.Inmodeling $Y$ withaGP,weassumethatitisdrawnfromaGaussianprocessdistribution.Thismeansthat $Y$ followsamultivariateGaussiandistributionatanyfinitesetof $n$ points $X = (x_{1}, \dots, x_{n}) \in X^{n}$ .However,inrealitymeasurementnoiseperturbsourobservedvaluesof $Y$ atlocations $X$ .Weassumethateachmeasurementisperturbedbyhomoscedasticnoisethatarei.i.d. $N (0, ϵ)$ .Thatis,

	$Y (X)$	$= (Y (x_{1}), \dots, Y (x_{n}))^{⊤} \sim N (˜ 0, Q_{θ} (X, X)),$		(1)
	$Q_{θ} (X, X)$	$= σ^{2} (K_{θ} (X, X) + ϵ I_{n}),$		(2)

where $N$ isthemultivariateGaussiandistribution, $˜ 0$ isthe $n$ -dimensionalzerovector, $σ^{2}$ isavariancescalingterm, $I_{n}$ isthe $n \times n$ identitymatrix,and $K_{θ} (X, X)$ isan $n \times n$ positivedefinite,symmetriccovariancematrixbetweentheelementsof $X$ thatiscontrollednon-linearlythroughkernelfunction $K_{θ} (\cdot, \cdot)$ withhyperparameters $θ$ . $Q_{θ} (X, X)$ is $K_{θ} (X, X)$ thatisperturbedbythemeasurementnoiseprior $ϵ$ andscalingparameter $σ^{2}$ .Similarly,anyadditional(possiblyunobserved)datum $x^{*} \in X$ isalsojointlynormalwithobserveddata $X$ bytheGPassumption.Thus,theconditionaldistributionfortheresponseof $Y$ at $x^{*}$ givenresponsesobservedat $X$ and $θ$ isalsomultivariatenormalwithmeanandvariance

	$Y_{θ} (x^{*} ∣ X)$	$= K_{θ} (x^{*}, X) Q_{θ} (X, X)^{- 1} Y (X), and$		(3)
	$Var (Y_{θ} (x^{*} ∣ X))$	$= K_{θ} (x^{}, x^{}) - K_{θ} (x^{}, X) Q_{θ} (X, X)^{- 1} K_{θ} (X, x^{}),$		(4)

where $K_{θ} (x^{*}, X) = K_{θ} (X, x^{*})^{⊤}$ isthecross-covariancematrixbetween $x^{*}$ andtheelementsof $X$ .GPsaretypicallytrainedbymaximizingthelog-likelihoodoftheobservations $Y (X)$ withrespectto $θ$ .Thislog-likelihoodfunctionpossessesthefollowingform:

l o g (L (θ, Y (X))) = - \frac{p}{2} l o g (2 π) - \frac{1}{2} l o g (| Q_{θ} (X, X) |) - \frac{1}{2} Y (X)^{T} Q_{θ} (X, X)^{- 1} Y (X) .

(5)

However,evaluatingEquation 5is $O (n^{3})$ computationand $O (n^{2})$ inmemory,whichisintractableforallbutrelativelysmalldata.AMuyGPsmodelrewritesequations 3 and 4as

	${ˆ Y}_{θ} (x^{} ∣ X_{N^{}})$	$= K_{θ} (x^{}, X_{N^{}}) Q_{θ} (X_{N^{}}, X_{N^{}})^{- 1} Y (X_{N^{*}}), and$		(6)
	$Var ({ˆ Y}_{θ} (x^{} ∣ X_{N^{}}))$	$= K_{θ} (x^{}, x^{}) - K_{θ} (x^{}, X_{N^{}}) Q_{θ} (X_{N^{}}, X_{N^{}})^{- 1} K_{θ} (X_{N^{}}, x^{}),$		(7)

where $X_{N^{*}}$ arethenearestneighborsof $x^{*}$ in $X ∖ {x^{*}}$ .MuyGPstrains $θ$ intermsofsomelossfunction $ℓ (\cdot, \cdot)$ overasampledbatch $B = (x_{1}^{*}, \dots, x_{b}^{*}) \subseteq X$ ofsize $b$ byminimizinganobjectivefunction $Q (θ)$ .Minimizing $Q (θ)$ allowsustotrain $θ$ withoutevaluatingtheexpensivedeterminantintheGPlikelihood.Whenweset $Q (θ)$ toleave-one-outcross-validationand $ℓ_{θ}$ tomeansquarederror,training $θ$ reducestominimizingthefunction

Q (θ) = \frac{1}{b} \sum i \in B {(Y (x_{i}^{*}) - {ˆ Y}_{θ} (x_{i}^{*} | X_{N_{i}^{*}}))}_{θ}^{2} .

(8)

Noteotherlossfunctionscanbeemployedinthisframework,butthismeansquarederrorfunctionhasbeendemonstratedasperformant [muyskens2021muygps].WeuseBayesianoptimizationtooptimizeEquation 8totrain $θ$ inourexperiments.

2.4 Astandardneuralnetworkmodelforbenchmarking

Neuralnetworkshaveachievedstate-of-the-artresultsoncommonbenchmarkproblemsandarenowubiquitousinscientificapplications.AlthoughthefocalpointofthisworkistheapplicationofGaussianprocessestolightcurvemodeling,webenchmarkthepredictiveperformanceofMuyGPsagainstastandarddeepneuralnetworkmodeltoensureitspredictiveaccuracy.Weselectafully-connectedarchitecture,asthestructureofthefeaturespaceisnotsuitedtoconvolutionalmodels.Moreover,recurrentneuralnetworks,includingLSTMs,areanotherpopularclassofmodelsthatareusefulinforecastingsettings,butnotinterpolationproblems.Fully-connectedneuralnetworkscanberepresentedasthecompositionofaseriesofaffinetransformationsandnonlinearactivationfunctions.Thecompositionofanindividualaffinetransformationandnonlinearactivationfunctionconstitutesalayerofthefully-connectednetwork.Eachaffinetransformationinthenetworkisparameterizedbyweights $W_{i} \in R^{n_{i} \times n_{i + 1}}$ ,where $n_{i}$ isthedimensionoftheinputtothe $i^{t h}$ layerand $n_{i + 1}$ istheoutputdimension,andabiasvector $b_{i} \in R^{n_{i + 1}}$ .Welabeltheactivationfunctioninthe $i^{t h}$ layer $σ_{i}$ .Let $F_{N N} (x) : R^{n_{i n}} \to R^{n_{o u t}}$ beaneuralnetwork.Then,

F_{N N} (x) = σ_{n} (W_{n} σ_{n - 1} (W_{n - 1} σ_{n - 2} (W_{n - 2} (\dots) + b_{n - 2}) + b_{n - 1}) + b_{n}) .

(9)

Wetrainafully-connectedneuralnetworkusingthedayoftheyearandtimeofdayastheindependentvariables(features),withthetargetsettothemagnitude(normalizedtotakeonvaluesbetween0and1bydividingbythelargestmagnitudeobserved).ThemodelarchitecturefeaturesReLUactivationfunctionsand5layersofsizes $2 \times 200$ , $200 \times 200$ , $200 \times 100$ , $100 \times 20$ ,and $20 \times 1$ .Thereare $62, 420$ totalparametersinthemodel,roughly $1 / 8 - 1 / 5$ thenumberoftrainingsamplesdependingonthetestcase.Wetraintheneuralnetworkinbatchesofsize128usingtheAdamoptimizerfor100epochs.Weusethemean-squarederrorlossfunctionwith20percentofthetrainingdatawithheldforvalidation,thetrainingratesetto $10^{- 3}$ ,andalearningrateexponentialdecayfactorof0.5appliedevery30trainingepochs.Wesettheratioofthenumberofparametersinthemodeltothenumberofdatapointsontheorderof $1 / 10$ ,acommonlyacceptedratiointhedeeplearningcommunity.Weselectedthesetraininghyperparametersbasedoncommonvaluesusedindeeplearningtoachievethebestcombinationoftraininglossdecayandvalidationerrorminimization.

3 Results

FirstwecomparethepredictingpowerofGaussianprocessesforthedifferentembeddingsoftheinputdatadescribedinsubsection 2.2.Foreachofthe13lightcurvesinourdataset,werandomlyselectatotalof20timeintervals,5foreachtimedurationin{3hours,1day,1week,2weeks},andusethoseintervalsastestdatawhileusingtherestastrainingdata.TheaccuracyofthepredictionismeasuredastheRootMeanSquareError(RMSE)dividedbytheextent(max-min)ofthemagnitudeinthetestdata.Weusethecommonradialbasisfunction(orRBF)todefineourkernelinallofourexperiments.Thismeansthatweusethekernelfunction

K_{ℓ} (x, x^{'}) = exp (- \frac{∥ x - x^{'} ∥_{2}^{2}}{2 ℓ^{2}}) .

(10)

TheRBFkernelhaslengthscaleparameter $ℓ$ ,andourmodeladditionallyhasthemeasurementnoisevarianceprior $ϵ$ andvariancescalingparameter $σ^{2}$ .Wefix $ϵ = 10^{- 5}$ inourexperiments.Wetrain $ℓ$ byoptimizingequationEquation 8viaBayesianoptimization.However,noprediction-basedobjectivefunctionissensitiveto $σ^{2}$ ,andsoweoptimizeitaccordingtotheanalyticproceduredescribedinSection2of[muyskens2021muygps].Fig. 3showsthedistributionoftheresultingcomparisonofthepredictionstothetruthforeachembedding.Withamedianof0.066,the2Dembeddingyieldsmorethan3timesbetterpredictionsthantheoriginal1Dembedding(median:0.24).Thebetterpredictionispossiblebecausethedistancebetweensimilardatapointsacrossdaysisreduced,thankstotheextradimension,allowingtheGPtomorereadilyusethemduringinterpolation.Howeverweseethataddingathirddimensionencodingtheyearlyperiodicitydoesnotyieldsignificantimprovementsoverthe2Dembeddingsincetheresultsarequasi-identical(median:0.065).Althoughthedistributionsof2Dand3Dembeddingsaresimilar,fromthison,wefocusonthe2Dembeddinganduseitexclusively.

Figure 3: Accuracyofpredictionsfordifferentdataembeddingsasastandardboxplotrepresentingthedistributionofresultsofcompleting5randomgapsof4distinctdurationsrangingfrom3hoursto2weeksforeachofthe13lightcurvesinthedataset.1Distherawtimeseriesofmagnitudes,2Dtakesintoaccountdailyperiodicity,and3Dtakesintoaccountbothdailyandyearlyperiodicity.WhileGPsbenefitgreatlyfromthedailyperiodicityinformation,theeffectofaddinginformationaboutyearlyperiodicityiscomparativelynegligible.

Partofthemotivationforbeingabletopredictdatainlightcurvesistocompletegapsandmissingdatapoints.Afirstbenchmarkistobeabletocompleterandomlyselecteddatapointsthroughoutalightcurve.Werandomlyselecteither5%,10%,or20%ofthetotalnumberofdatapointavailableineachofthe13lightcurvesinthedataset,5timesforeachproportion.Then,formorerealisticinterpolation,weselectconsecutivedatapointsrepresentinggapsofeither3hours,1day,1week,or2weeks,with5differentrandomstartinginstantsforeach13lightcurve.LastlywerepeatthesameprocedurebutbyselectinggapsattheendofeachlightcurvetoevaluatehowGPsperformforextrapolation.Fig. 4showstheperformanceofGPpredictionsforalloftheseinterpolationandextrapolationtasks.Forrandominterpolation,forallpercentages(5%,10%,or20%),theperformanceissimilarandverygoodwithamedianrelativeRMSEofabout0.03.Forgaps,eitherinterpolationorextrapolation,thereismorevariation,bothforagivengapduration,andacrossdurations.

Figure 4: ComparisonofthepredictingpowerofGPs(with2Dembedding)forvariousinterpolationandextrapolationtasks.Theleftpaneisforinterpolatingrandomlyselecteddatapoint,representingfrom5%to20%oftheentirelightcurves.Themiddlepaneisforinterpolatinggapsofdurationsrangingfrom3hoursto2weeks.Therightpaneisforextrapolatinggapsofsimilardurationsattheendofeachlightcurves.Randomdatapointsareinterpolatedwithgreataccuracy,evenwhentheyrepresentasignificantportionofthedataavailable.Interpolatinggapsovershortperiodsoftimeismorechallenging.

TocomparetheperformanceofGPscomparedtoDNNs,weranthesametaskofpredictingonedayofmissingdatainasingle4year-longlightcurvewithtrainingdataexclusivelyfromthatsinglelightcurveonthesamemachine.AsshowninFig. 5,GPsachievebetteraccuracyinonlyafractionofthetime.

Figure 5: ComparisonoftheperformanceofourGaussianprocessmethodwithanequivalentneuralnetwork.TheaccuracyismeasuredbytheRMSEscaledbytheextentofthemagnitudeinthetestdataset.ThecomputingtimeisthetotalofthetrainingandpredictiontimeasthosecannotbeseparatedfortheGP,andthepredictiontimeisnegligiblecomparedtothetrainingtimefortheNN.Onthesamedata,theGaussianprocessachievesabetteraccuracyinafractionofthetime.

Fig. 6illustratesasingleexampleofthequalityoftheGPpredictioncomparedtorawmeasurementsforadayofmissingdatausingthe3Dembedding.Themeanpredictionfollowsthetrendofmagnitudeswellasexpected.AsistypicalinGPmodels,ourkernelmodelisassumedtobestationaryandhomoscedastic,meaningthatwelearnasinglesetofhyperparametersthatbestmodelsallthetrainingdata(acrossallmagnitudes).Inthisintervalpictured,weseethatthedataathighmagnitudesseemtohavemorevariancethanthedatacollectedatlowermagnitudes.Ourmodelisagnostictoanychangeinvarianceinthedataitself,butwillonaverageprovidedesireduncertaintyquantification.Notethisonetimegapisasingleintervalthatisinitselftime-correlatedinmagnitudeandthereforevarianceregime.Therefore,thecoverageofthe95%intervalfromthisonesamplecanbedifferentthanthedesiredoveralllevel,butwhenweconsidermanyindependenttimeintervalsofthistype,theuncertaintieswillaveragetothedesired95%confidencelevel.InfutureworkmoreflexibleandcomplexGPkernelfunctionscouldbedesignedwithinhomogeneousornon-stationarykernelstocorrectforthispotentialpatterninmagnitudetoimproveuncertaintiesineachindividualsampleregion.Ifthemagnitudedistributioninourpredictionintervaldiffersfromthatofourtrainingdatathenthereisamodeldiscrepancyinourapproachthatcouldexplainourobservedvariances.

Figure 6: GPprediction(orange)comparedtomeasurement(blue)for1dayofmissingdata.Theorangeshadedregioncorrespondstothe95%confidenceintervalofthepredictionwhiletheblueshadedregioncorrespondstothemeasurementstandarderror.ThispredictionwasobtainedusingaRBFkernelwiththreedimensionaltrainingdata.TheGPdoesaverygoodjobapredictingdetailswhileeffectivelyfilteringnoise.

ButthetruepowerofGPscomesfromtheirabilitytopredictafullposteriordistributionandnotjustamean.Thepredictedcovariancecanbeusedtodetectareaswherethepredictiondiffersfromthemeasurementbyasignificantamount.OneapplicationofparticularinterestforSDAisthedetectionofanomalies—potentialmaneuversorstatechanges—inlightcurvedata.Fig. 7showstwoexampleofsuchanomalies.Inbothcases,twofactorsmaketheanomaliesnoticeable.Firstly,thepredictiondiffersfromthemeasurementforasmallbutsignificantperiodoftime,andthedifferenceexceedstwotimesthepredictedstandarddeviation.Andsecondly,themeasurementsfromthesurroundingdayslooksimilartothoseofthefocalday—andtotheprediction—exceptforthatparticularperiod.Thecombinationofthosetwofactorsindicatethatitisindeedthemeasurementsthatareanomalousandnotthepredictions.

Figure 7: Exampleofdetectedanomalies.Thepredictionissmoothandmatchesthemeasurementwellexceptforashorttimeinterval(near647.39and678.25daysrespectively)wherethemeasurementsuddenlyexceedsitsexpectedvaluebasedonthepredictionandconfirmedbymeasurementsfromthesurroundingdays.

4 Conclusion

WehaveshownGaussianprocesses(GPs)usingMuyGPtobeacapableandpertinenttoolforanalyzinglightcurvedata.Unlikeothermachinelearningmethodsthatcanhavemillionsofparameters,Gaussianprocessesaresimpleinthattheydonothavealargearchitecturesearchandinthattheyonlyhaveafewparameterstobeestimated.UnliketraditionallyGPestimationmethods,theimplementationoftheMuyGPsmethodologyallowsastronomerstoapplyGPmethodstoverylargedatawithverylittlecomputationalormemoryburden.Inadditiontotheirimprovedaccessibility,wehaveshownthatinourdesignedexperimentsonlightcurves,Gaussianprocessesarebothmoreaccurateandfasterthananexampleneuralnetwork.Further,theuncertaintyquantificationwegetfromGaussianprocessesallowsustoidentifystatisticallydistinctdeviationsfromthepattern-of-life.Asseenabove,whileGaussianprocessesareabletoingestrawuni-dimensionaltimeseries,encodingtheperiodicityofthelightcurvedatathroughadditionaldimensionsyieldsmuchbetterpredictionswhenwepredictlargegapsofobservations.Thisislikelyduetothedistinctperiodictrendsinthelightcurves.However,weobservedthatencodingthedailyperiodicitywassufficientandthatencodingtheyearlyperiodicitydidnotfurtherimproveprediction.Afirstexplanationisthatsurroundingdaysaretypicallymoresimilarthansurroundingyears,sothemajorityof“nearestneighbors”usedintheGPmodelwouldnaturallybeselectedfromsurroundingdaysratherthanfromsurroundingyears.WehavealsoshownhowGaussianprocessescompareadvantageouslytoneuralnetworksbothintermsofcomputingtimeduringtrainingandpredictionaccuracy.Notethatourmetricforcomputingtimecombinesbothtrainingandpredictiontimefromthetwomodels.ThismetricmasksthetradeoffthatatrainedneuralnetworkcanmakeadditionalpredictionsefficientlywhereastrainingaGaussianprocessusingMuyGPsisveryfast,butthepredictiontimegivenatrainedmodelismuchslowercomparatively.[fastmuygps]demonstrateawaytoimprovepredictiontimeofasimlarkernelinterpolationwithanadditionalapproximation.However,intheonlinescenarioofthelightcurveproblemwhereonemaywanttoperformthisanalysisinnearrealtime,onewouldbothretrainandpredictsimultaneouslysotheGPmethodissignificantlypreferred.BothmethodscanbeparallelizedtotakeadvantageofHighPerformanceComputing(HPC).ForDNNs,parallelizationstechniquesarereadilyavailable,forinstanceinPyTorch,withtheadventofGPUsandTPUs.ForMuyGPs,parallelizationseffortsareinprogressandwillbereleasedinadifferentpublication.WeranalloftheexperimentsinthismanuscriptonasinglecoreofacommoditylaptopusingthesoftwarelibraryMuyGPyS [muygpys2021github].FutureapplicationswilluseMPI [dalcin2011parallel]andJAX [jax2018github]toscalemodeltrainingandevaluationtomultiplecomputenodesusingmanyCPUsandGPUsinparallel.ThesescalabilityfeatureswillenableapplicationsthatscaletotheobservationsizesofLSST,whichareanticipatedtoinvolvehundredsofmillionsofspaceobjects.PossiblefuturedirectionsofresearchinvolvecomparingGPtrainingfromonesetoflightcurvesandusingthesetrainedparameterstomakepredictionsforadifferentsetofunseenlightcurves.Thiswilldeterminewhetheritisnecessarytofitparameterstoeachlightcurve.Finally,sinceourmethodgivesfullyinterpolatedlightcurves,futureworkcouldusethesede-noisedfulltimeseriesforfurtherSDAstudies.Insummary,webelieveourGPmethodisavaluablemethodforlightcurveinterpolationandfuturepredictionforthreescientifictasksofparticularinterest.First,byde-noisingandinterpolatingmissingdata,GPscanbepartofapre-processingpipelinebeforefeedingthedatatofurtheralgorithmsorsoftwarethatcannotworkwithrawdata.Asshowninfigures 1and 2,rawlightcurvescanhavealotofgapsfrequentlyspanningmultipledayssobeingabletointerpolatethosegapsiscrucial.Second,bybeingabletoextrapolatedataoverseveralhours,days,orevenweeks,GPscanbeusedtopredicttheexpectedbehaviorofRSOs,whichcanserveinforecastingandcollisionavoidanceinSDA.Lastandnotleast,byquantifyingtheuncertaintyofpredictions,GPsallowforautomaticchangedetection.Indeed,beingabletopredictafullposteriordistributionenablesdiscerninganomaliesinmeasurementsfromanomaliesinprediction,sinceuncertainpredictionscaneasilyleadtofalsepositives.Infutureresearch,itshouldbepossibletousetrackingdataand/ordocumentedknownmaneuverstofine-tuneandbenchmarkourmaneuverdetectioncapabilities.

Acknowledgments

ThisworkwasperformedundertheauspicesoftheU.S.DepartmentofEnergybyLawrenceLivermoreNationalLaboratoryunderContractDE-AC52-07NA27344withIMreleasenumberLLNL-PROC-839253.FundingforthisworkwasprovidedbyLLNLLaboratoryDirectedResearchandDevelopmentgrant22-ERD-028.ThisdocumentwaspreparedasanaccountofworksponsoredbyanagencyoftheUnitedStatesgovernment.NeithertheUnitedStatesgovernmentnorLawrenceLivermoreNationalSecurity,LLC,noranyoftheiremployeesmakesanywarranty,expressedorimplied,orassumesanylegalliabilityorresponsibilityfortheaccuracy,completeness,orusefulnessofanyinformation,apparatus,product,orprocessdisclosed,orrepresentsthatitsusewouldnotinfringeprivatelyownedrights.Referencehereintoanyspecificcommercialproduct,process,orservicebytradename,trademark,manufacturer,orotherwisedoesnotnecessarilyconstituteorimplyitsendorsement,recommendation,orfavoringbytheUnitedStatesgovernmentorLawrenceLivermoreNationalSecurity,LLC.TheviewsandopinionsofauthorsexpressedhereindonotnecessarilystateorreflectthoseoftheUnitedStatesgovernmentorLawrenceLivermoreNationalSecurity,LLC,andshallnotbeusedforadvertisingorproductendorsementpurposes.

	${ˆ Y}_{θ} (x^{} ∣ X_{N^{}})$	$= K_{θ} (x^{}, X_{N^{}}) Q_{θ} (X_{N^{}}, X_{N^{}})^{- 1} Y (X_{N^{*}}), and$		(6)
	$Var ({ˆ Y}_{θ} (x^{} ∣ X_{N^{}}))$	$= K_{θ} (x^{}, x^{}) - K_{θ} (x^{}, X_{N^{}}) Q_{θ} (X_{N^{}}, X_{N^{}})^{- 1} K_{θ} (X_{N^{}}, x^{}),$		(7)