\TitleFontLight curve completion and forecasting using fast and scalable Gaussian processes (MuyGPs)

Imène R. Goumiri
Physics Division
Lawrence Livermore National Laboratory Alec M. Dunton
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory Amanda L. Muyskens
Engineering Division
Lawrence Livermore National Laboratory Benjamin W. Priest
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory Robert E. Armstrong
Physics Division
Lawrence Livermore National Laboratory
Abstract

Temporalvariationsofapparentmagnitude,calledlightcurves,areobservationalstatisticsofinterestcapturedbytelescopesoverlongperiodsoftime.LightcurvesaffordtheexplorationofSpaceDomainAwareness(SDA)objectivessuchasobjectidentificationorposeestimationaslatentvariableinferenceproblems.Ground-basedobservationsfromcommercialofftheshelf(COTS)camerasremaininexpensivecomparedtohigherprecisioninstruments,however,limitedsensoravailabilitycombinedwithnoisierobservationscanproducegappytime-seriesdatathatcanbedifficulttomodel.Theseexternalfactorsconfoundtheautomatedexploitationoflightcurves,whichmakeslightcurvepredictionandextrapolationacrucialproblemforapplications.Traditionally,imageortime-seriescompletionproblemshavebeenapproachedwithdiffusion-basedorexemplar-basedmethods.Morerecently,DeepNeuralNetworks(DNNs)havebecomethetoolofchoiceduetotheirempiricalsuccessatlearningcomplexnonlinearembeddings.However,DNNsoftenrequirelargetrainingdatathatarenotnecessarilyavailablewhenlookingatuniquefeaturesofalightcurveofasinglesatellite.Inthispaper,wepresentanovelapproachtopredictingmissingandfuturedatapointsoflightcurvesusingGaussianProcesses(GPs).GPsarenon-linearprobabilisticmodelsthatinferposteriordistributionsoverfunctionsandnaturallyquantifyuncertainty.However,thecubicscalingofGPinferenceandtrainingisamajorbarriertotheiradoptioninapplications.Inparticular,asinglelightcurvecanfeaturehundredsofthousandsofobservations,whichiswellbeyondthepracticalrealizationlimitsofaconventionalGPonasinglemachine.Consequently,weemployMuyGPs,ascalableframeworkforhyperparameterestimationofGPmodelsthatusesnearestneighborssparsificationandlocalcross-validation.MuyGPsallowsustoquicklytrainmodelsonlightcurvedatainsecondsonasingleworkstation.Inthismanuscriptweexplorelightcurvecompletionandforecasting,andcompareembeddingsofthelightcurvedataintomulti-dimensionalspacestotakeadvantageofdailyandyearlyperiodicity.WeshowthatourmethodoutperformsfeedforwardDNNsbothintermsofaccuracyandquantityoftrainingdatarequired.

LLNL-PROC-839253

1 Introduction

Photometriclightcurvesareaseriesofobservationsthattrackthebrightnessofanobjectoveraperiodoftime.Theycanbeusedtocharacterizethedynamicpropertiesofasystem.Astronomersfrequentlyanalyzelightcurvestounderstandawholerangeofastrophysicaltopicsincluding:detailedphysicswithinastar [gaia2019],thediscoveryofexoplanets [kepler2013],classifyingdistantsupernovae [des2022],andcharacterizingthepopulationofnearearthasteroids [atlas2018].SpaceDomainAwareness(SDA)involvesmonitoring,detectingandunderstandingthepopulationofearth-orbitingbodiesorresidentspaceobjects(RSOs).SDAisofincreasingimportanceduetotheincipientgrowthinthenumberofspaceobjectsanddebrisorbitingtheearth,dueinlargeparttotherecentcommercialdevelopmentofsatellites.ThenumbersosuchRSOsisexpectedtogrowbyordersofmagnitudeinthenextdecade.WhileSDAisconcernedwithmaintainingcustodyoforbitingobjects,italsoprioritizestherapididentificationofchangestoandanomalousbehaviorofthemanyvaryingorbitalsystems.Fortunately,lightcurvesarevaluableforinspectingRSOsinmuchthesamewayasastrophysicalphenomena,andcanenablecriticalinformationforSDA.ThebrightnessofanRSOdependsonstructuralfeatureslikesize,shape,andmaterialcompositionaswellthegeometrybetweentheobject,thesunandobserver.Theproliferationoflowcostcommercial-off-the-shelf(COTS)ground-basedtelescopeshasmadethegenerationoflightcurvesfromRSOseasiertoproduce.Furthermore,manyconstellationsofground-basedtelescopeshavebeentaskedwithtrackingRSOsforSDA.ThesefactorshaveenabledtherelativelycheapproductionoflargevolumesofRSOlightcurves,whichhaspromptedpractitionerstoapplyautomationtoanalyzethematscale.PractitionershaverecentlyappliedmachinelearningtolightcurvesofRSOstosolvevariousSDAtasks.Forexample,comparingthelightcurveofanunknownRSOtoacatalogofknownRSOsisusefulforpredictingfeatureslikeshape [linares2014space, furfaro2019shape],materialcomposition [dianetti2019space],andgeneralcategoriesorclasses [linares2016space, jia2018space, furfaro2018space].Furthermore,forecastingRSOlightcurvesintothefutureisusefulfordetectingdeviationsfromtheexpectedpatterns-of-lifeinnear-real-time,affordingthedetectionofanomalouseventssuchasmaneuvers [shabarekh2016novel, dupree2021time]orconfigurationchanges.Thegoalofmachinelearninginthiscontextistolearnafunctionmappinganinputspace(typicallythetimedomain)toaanobservationspace,e.g.magnitude,baseduponindependentandidenticallydistributed(i.i.d.)samplesfromadistributionofinput-observationpairs.Wewillrefertothisdatadistributionasthe“targetdistribution”.Amachinelearningmodelissuccessfulwhenitisabletoaccuratelypredicttheobservationofanunseeninputdrawnfromthetargetdistribution.Bothdiffusion-basedorexemplar-basedmethodshavebeentraditionallyusedinimageortime-seriescompletionproblems.Diffusion-basedmethods [sohl-dickstein2015deep]usethermaldiffusionequationstopropagateinformationfromsurroundingregionsintomissingregions,theyaremostlyeffectivewhenthegapsaresmallandtendtosmoothdetailsoutforlargerproblems.Exemplar-basedmethods [criminisi2004region]usegreedyalgorithmstoapplypatchesoftrainingdatatomissingregionswhichcanproduceimplausiblepatterns.DeepNeuralNetworks(DNNs)areanespeciallypopulartooltomodellightcurvesintheliteratureduetotheirexpressivenessandgeneralizationcapabilities.ADNNisatypeofrepresentationlearningmodel—amachinelearningmodelthatlearnsanappropriatefeaturerepresentationofthedatainadditiontoproducingpredictions.DNNsiterativelytransformtheinputspaceintolatentfeaturespacesusinglinearfunctionsthatare“activated”byelement-wisenonlinearfunctions.DNNsalsohappentobeuniversalfunctionapproximators—itispossibletoapproximateanycontinuousfunctiontoanarbitrarylevelofprecisionusingasufficientlylargeDNN.ThepersistentpopularityofDNNsderivesfromseveralsources,suchasthewidespreadavailabilityofhardwareacceleratorslikegraphicalprocessingunits(GPUs),advancedstochasticoptimizationtechniquesthataidintheirtraining,andthedevelopmentofuser-friendlysoftwarelibrariessuchasTensorflow [Abadi_TensorFlow_Large-scale_machine_2015]andPyTorch [NEURIPS2019_9015].AlthoughDNNshavemanypositivefeatures,theyalsohavesomedrawbacksthatareparticularlynotableinthelightcurveproblem.First,modernDNNarchitecturestypicallyconsistofaverylargenumberofparametersthatmustbetrained.Trainingsuchmodelsgenerallyconsumesalargeamountofcomputingresources,asthemodelanditsgradientareevaluatedovermanyiterationsinordertorefineparametervalues.Inadditiontocomputationalexpense,trainingalargemodeltypicallyrequiresalargeamountofdata.Theliteraturehasobservedaroughlylinearrelationshipbetweenmodelsizeandtheamountoflabelleddatarequiredtotrainit [tan2018survey].Thismeansthatlearninganaccuratelightcurvemodelcanrequirealargenumberofobservationsingeneral.Second,whileDNNmodelslendthemselvestohighdimensionalfeaturespacestheytendtostrugglewithverysmallnumbersofdimensions.Featureengineeringcanoftensolvethisproblem,especiallyintimeseriesscenarioswheresomenotionofamovingwindowusuallyservesasafeaturespace.However,thisstrategyismostsuccessfulwhentheobservationcadenceofthelightcurveishigh.Upcomingground-basedsurveysofdeepspace,suchastheLegacySurveyofSpaceandTime(LSST)areexpectedtoincidentallycapturemanyRSOimagesfromwhichlightcurvescanbeextracted.However,theselightcurveswillbeirregular,sparse,andhavelow-dimensionalfeaturespaces,increasingthechallengeofapplyingexistingtechniques.Third,effectivetrainingtypicallyrequiresalargevolumeoftrainingdatathatiscompleteandrepresentativeofthetargetdistributioni.e.alargemodelrequiresalargeamountoftrainingdatathatisi.i.d.accordingtothetargetdistribution.Inadditiontoinformingpredictionaccuracy,dataindependenceandsizehelpscomplexDNNmodelsavoidoverfittingandsimplymemorizingthetrainingdata.However,collectionsofphysicalmeasurementsareoftenlimitedbytherealitiesofsensoravailabilityandphysicalobstruction.Forexample,weatheraffectsopticalastronomicalmeasurementsbyintroducingcorrelatednoiseorblockingthedesiredobjectfromviewentirely.Furthermore,itisingeneraldesirabletodesignamodelthatcanbealternativelyappliedtoseveraldifferentRSOinferenceproblems,i.e.severaldistincttargetdistributions.Atransferlearningapproach-onewherethemodelisatleastpartiallytrainedondatafromapossiblydifferentdistributionthanthetargetdistribution [tan2018survey]-couldaddressthisneedforalargevolumeoftrainingdata.Indeed,transferlearninghasbeenappliedtosatelliteapertureradarimages [rostami2019deep],radiofrequencyinterferencecharacterization [lefcourt2022space],andclassifyingRSOsusinglightcurves [furfaro2018space].However,transferlearningultimatelyreliesonfeatureslearnedfromapotentiallyunrelateddataset,whichmayleadtoinefficientorinaccurateconclusions.GenerativemodelssuchasVariationalAutoencoders(VAEs)andGenerativeAdversarialNetworks(GANs)areadditionalalternativesfromthemachinelearningliteraturethataddressdataefficiency.VAEsareautoencodersthatimposestructureupontheirlearnedencoderanddecoderfunctionstoensurethatthelatentrepresentationofthedataencodesthetargetdistribution.ResearchershavesuccessfullyappliedVAEstolearnshapesfromthelightcurvesofRSOs [furfaro2019shape].GANsconverselysimultaneouslytrainmodelsthatgeneratesamplesmeanttomimicthetargetdistributionwhiledistinguishingbetweenrealandsyntheticsamples.PractitionershaverecentlyusedGANstoclassifystarsbasedupontheirlightcurves [garcia2022improving].However,GANscanbechallengingandexpensivetotrain.GANsinferadistributionfromaverysmalltrainingdatasetandcansufferfrommodecollapse,non-convergenceorgeneralinstability.Thisbringsustothefourthpoint—DNNshavedifficultyquantifyingtheuncertaintyintheirpredictions.ItisgenerallydifficulttodeterminewhenaDNNisextrapolating—predictingtheresponseondatafromadifferentdistributionorinadifferentregionofthetrainingdata.Thisisproblematicinscientificapplicationsbecauseoverconfidentpredictionscanleadtoincorrectinferenceandunnaturalresultsthataredifficulttointerpret.Furthermore,itisimportantfordecisionmakerstodistinguishbetweenlowandhigh-confidencepredictionswhendrawingconclusionsforSDA.Althoughuncertaintyquantificationmethodsareinresearch,mostpracticalmodelsemployedbypractitionersprovideonlypointpredictionswithnoobviousmechanismtomeasuremodelconfidence.RecentattemptstoprovideuncertaintyquantificationtoDNNsattempttolearnpredictionintervals(i.e.errorbars)inadditiontopointpredictions,butthisliteratureisstilldevelopingandthereisasyetnoconsensussolution [kabir2018neural].OthersattemptahybridapproachwhereaGaussianprocessisappendedtothelastlayerofaDNN [bradshaw2017adversarial].InthismanuscriptweproposeGaussianProcess(GP)modelsasalternativesforthelightcurvemodelingproblem.LikeDNNs,GPsarerepresentationlearningmethodsthataredata-drivenuniversalfunctionapproximators.AGPisatypeofkernelmethodthatimplicitlyandnon-linearlymapsinputfeaturesintoapossiblyinfinitelydimensionalinnerproductspace.Kernelmethodsuseaparameterizedkernelfunctiontocheaplycomputeinnerproductsinthisimplicitspacetopredictunknownresponses.GPscanalsobethoughtofasthegeneralizationofamultivariatenormaldistribution,wherealldatawithinadefineddomainisassumedtobejointlynormallydistributedwiththecovariancedefinedbythekernelfunction.Therefore,predictionsfromGPmodelsareprobabilisticallydefinedthroughconditionalprobability.GPsareattractiveforscientificapplicationsduetothisinherentlyBayesianinferencemodel,whichproducesexplicituncertaintyquantification(Gaussiandistributions)ofitspredictions.Furthermore,GPshavebeenshowntooutperformDNNsindata-starvedregimes [muyskens2022star]andareidealmodelsforlow-dimensionalfeaturespaces [muyskens2021muygps].However,conventionalGPshaveverypoorscalinginthenumberoftrainingobservations,whichhaspreviouslylimitedtheirapplicationtothelightcurvemodelingproblem.BothrealizingthepredictionsfromtheGPmodelandevaluatingthelikelihoodfunctionintrainingrequirecubiccomputationinthenumberofdatapoints,andrequirequadraticmemorytostorethekernelmatrix.WhilethispracticaldrawbackhastendedtolimittheapplicationofGPstosmalldataproblems,scalableapproximateGPmethodshaveproliferatedintheliterature [heaton2019case, liu2020gaussian].Thesemethodstypicaltradeoffaccuracyforspeed.However,theapproximateGPmethodMuyGPsoffersaselectionofsettingsthatdemonstratessuperioraccuracyorcomputationalscalinginseveraldatasets[muyskens2021muygps, muyskens2022star, buchanan2022gaussian].Therefore,inthispaperweusetheMuyGPsapproximateGPestimationmethod [muyskens2021muygps].MuyGPsusesnearestneighborssparsificationandabatchedleave-one-outcross-validationobjectivefunctiontotrainaGP-likemodelinnearlylineartimeinthenumberofdataobservations.InvestigatorshavesuccessfullyappliedMuyGPstocosmologyimageprocessingproblems [goumiri2020star, muyskens2022star, buchanan2022gaussian].Inthispaper,weproposeamethodoflightcurveinterpolationandpredictionusingMuyGPsthatgivesbothpredictionsanduncertaintyquantificationofthosepredictions.Weusethismethodintwomodes:

  1. Interpolation,thepredictionofunobservedmagnitudesinthepast,and

  2. Forecasting,thepredictionofunobservedfuturemagnitudes.

Interpolationisusefulasadatapreprocessingutilityforcatalogs.Interpolationallowsustofillinmagnitudeobservationsforsparse,irregularlightcurvestomakethemsuitablefordownstreamcomparisontaskssuchasshape,pose,size,type,ormaterialestimation.Forecastingisusefultodetectdeviationsfromexpectedfuturepatterns-of-lifethatcorrespondtoanomalies.TheposteriorvarianceprovidedbyGPsisespeciallyusefultodeterminewhatconstitutesasignificantdeviationfromexpectedbehavior.Insection 2,weprovidebackgroundonthelightcurvedataitself,dataprocessing,aswellasthemachinelearningmethodsweutilizeinourcomparativestudy.Theninsection 3wedescribeseveralnumericalstudiesthatdemonstratethecomparativeperformanceofseveralchoicesofourGPmethodtothatofaDNNandhowtheuncertaintyquantificationprovidedbyourmethodcanbeusedtodetectanomalies.Finally,insection 4,wediscussourconclusions,theimportanceofourfindings,limitations,andfutureworkassociatedwithourmethodology.

2 Methodology

2.1 Lightcurvesdefinition

Lightcurvesaretimeseriesofthebrightnessofresidentspaceobjectsoverlongperiods.Theyareobtainedbyobservingthesameobject,nightafternight,forseveraldaysorevenyears.Onewaytovisualizethemisintwodimensionswithoneaxisrepresentingthetimeofday(orsolarphaseangle)andtheotheraxisrepresentingthedayswhilethepseudo-colorindicatesthebrightness,asinFig. 1.

Thelightcurveofasingleobjectover4years.Eachcolumnrepresentsasingleday.Thecolorscalerepresentsthebrightness.Theblackstripesand
Figure 1: Thelightcurveofasingleobjectover4years.Eachcolumnrepresentsasingleday.Thecolorscalerepresentsthebrightness.Theblackstripesandholeshighlightthesparsityofobservations;theformerareduetoweather,thelattertoeclipse.Theseasonalityandyearlyperiodicityinducestheoutershape.Courtesyof [monetgrams].

LightcurvesofRSOscontainpotentiallydetailedinformation.Forexample,onecoulddetectdustaccumulatingonhighlyreflectivesolarpanelsanddeducetheageandlifeexpectancyofasatellite.Higherintensityreflectionsalsohavethepotentialtoinformabouttheshapeoftheobservedobjectsincedifferentfacetsofamulti-facetedobjectwouldreflectdifferently.Inaddition,satellitescanmaneuverorrotate,inwhichcase,thereflectedlightwilldeviatefromtheexpectedpattern.ThosedeviationscanbedetectedandincreaseSDAknowledge.Observationscanbelimitedbythetimeofday,weather,sensorsaturation,andeclipses(seeFig. 2).Dataisonlyavailableatnight,andthemeasurementsarelessaccuratenearduskanddawn.Theweathercanprecludeobservationsforhoursandsometimesdaysatatime.Whenthereflectedlightfromthesunisparticularlybright,sensorscansaturateleadingtomissingdata,thoughthisparticularcaseisusuallyfairlyobviouswhenlookingatthebrightnessofsurroundingavailabledata.Lastly,periodically,thelightfromasatellitewillbeeclipsedbytheearth.

a)Missingdataduetoa)timeofday,b)weather,c)sensorsaturation,andd)eclipse.b)Missingdataduetoa)timeofday,b)weather,c)sensorsaturation,andd)eclipse.c)Missingdataduetoa)timeofday,b)weather,c)sensorsaturation,andd)eclipse.d)Missingdataduetoa)timeofday,b)weather,c)sensorsaturation,andd)eclipse.

Figure 2: Missingdataduetoa)timeofday,b)weather,c)sensorsaturation,andd)eclipse.

TotestourcodesweusealightcurvedatasetprovidedbyDaveMonet[monetgrams]of43satellites.Allsatellitesinthedatasetarefromthepubliccatalog(www.space-track.org).Weselected13distinctRSOs,thosemanuallyflaggedbyDaveMonetasnominal,forouranalysis.Thereareapproximately500,000datapointsperobject.AllobservationsweretakenfromasinglecamerainFlagstaffAZbetweenSeptember2014andSeptember2018.Thedatasetcontainsthebrightness(magnitude;bandisapproximatelyJohnsonV)aswellasthemeasurementerrorinbrightness(uncertainty).

2.2 Processing

Totestourpredictioncapabilities,weselectaportionofeachlightcurveastestdata.Weusetimeperiodsrangingfromafewhours(3hours)tomultipleweeks(2weeks).Andweeitherselectthetestdataatarandomtimewithinthetimeseries(interpolation)orattheend(extrapolation).Toguaranteethattheselectedperioddoesnotfallatatimelackingobservations(e.g.duringthedayorduringacloudynight)werejectthosethathavefewerthan90%ofthedatawe’dexpecttofindinthattimeintervalifthedatawasuniformlydistributed.Inaddition,weapplythatsameconditiontotheprecedingintervaltoensurethatwe’renotaccidentallyinterpolatingforlongerthatwewouldexpect.Sincethelightcurveshavesomedailyandyearlyperiodicity,wecompareseveralembeddingsofthelightcurvedataintomulti-dimensionalspaces.Thereferenceisthe1Dmodelwhichisjusttheoriginaltimeseries.The2Dmodelhasonedimensionrepresentingtimesofdayasrealnumbersintheinterval,andanotherdimensionrepresentingdaysasintegers,startingatzeroonthefirstdayofobservation.The3Dmodelislikethe2Dmodelbutwithanadditionaldimensionforyears,asintegersbetween0and4,andwithamodifieddaysdimensionmodulo365.Notethatalltheinputdimensionsareeventuallynormalizedtotheintervalasiscustomary.Analternate2Dmodelwithyearandtimeofyearasopposedtodayandtimeofdaywasconsideredbutdismissedforsimplicityandfornotbeingasintuitiveandinteresting.Weobservedthatday-to-daycorrelationisstrongerthanyear-to-yearinourdataset.Notethatitismoretraditionaltomodelthistypeofdataperiodicityusingaperiodickernel,buttheMuyGPsestimationframeworkdependsonakernelsparsitythatisultimatelynotvalidatedwiththatkerneltype.Therefore,theseembeddingsareanovelwaytoreplacesuchaperiodickernelwhilemaintainingthecomputationallyefficientframework.

2.3 Gaussianprocesses

Wewillconsiderthroughoutalightcurvetobeaunivariateresponse,whereistheobservationspacealongtimedimensions.Inpreprocessingwede-trendthedata,andsothereforehaszeromean.InmodelingwithaGP,weassumethatitisdrawnfromaGaussianprocessdistribution.ThismeansthatfollowsamultivariateGaussiandistributionatanyfinitesetofpoints.However,inrealitymeasurementnoiseperturbsourobservedvaluesofatlocations.Weassumethateachmeasurementisperturbedbyhomoscedasticnoisethatarei.i.d..Thatis,

(1)
(2)

whereisthemultivariateGaussiandistribution,isthe-dimensionalzerovector,isavariancescalingterm,istheidentitymatrix,andisanpositivedefinite,symmetriccovariancematrixbetweentheelementsofthatiscontrollednon-linearlythroughkernelfunctionwithhyperparameters.isthatisperturbedbythemeasurementnoisepriorandscalingparameter.Similarly,anyadditional(possiblyunobserved)datumisalsojointlynormalwithobserveddatabytheGPassumption.Thus,theconditionaldistributionfortheresponseofatgivenresponsesobservedatandisalsomultivariatenormalwithmeanandvariance

(3)
(4)

whereisthecross-covariancematrixbetweenandtheelementsof.GPsaretypicallytrainedbymaximizingthelog-likelihoodoftheobservationswithrespectto.Thislog-likelihoodfunctionpossessesthefollowingform:

(5)

However,evaluatingEquation 5iscomputationandinmemory,whichisintractableforallbutrelativelysmalldata.AMuyGPsmodelrewritesequations 3 and 4as

(6)
(7)

wherearethenearestneighborsofin.MuyGPstrainsintermsofsomelossfunctionoverasampledbatchofsizebyminimizinganobjectivefunction.MinimizingallowsustotrainwithoutevaluatingtheexpensivedeterminantintheGPlikelihood.Whenwesettoleave-one-outcross-validationandtomeansquarederror,trainingreducestominimizingthefunction

(8)

Noteotherlossfunctionscanbeemployedinthisframework,butthismeansquarederrorfunctionhasbeendemonstratedasperformant [muyskens2021muygps].WeuseBayesianoptimizationtooptimizeEquation 8totraininourexperiments.

2.4 Astandardneuralnetworkmodelforbenchmarking

Neuralnetworkshaveachievedstate-of-the-artresultsoncommonbenchmarkproblemsandarenowubiquitousinscientificapplications.AlthoughthefocalpointofthisworkistheapplicationofGaussianprocessestolightcurvemodeling,webenchmarkthepredictiveperformanceofMuyGPsagainstastandarddeepneuralnetworkmodeltoensureitspredictiveaccuracy.Weselectafully-connectedarchitecture,asthestructureofthefeaturespaceisnotsuitedtoconvolutionalmodels.Moreover,recurrentneuralnetworks,includingLSTMs,areanotherpopularclassofmodelsthatareusefulinforecastingsettings,butnotinterpolationproblems.Fully-connectedneuralnetworkscanberepresentedasthecompositionofaseriesofaffinetransformationsandnonlinearactivationfunctions.Thecompositionofanindividualaffinetransformationandnonlinearactivationfunctionconstitutesalayerofthefully-connectednetwork.Eachaffinetransformationinthenetworkisparameterizedbyweights,whereisthedimensionoftheinputtothelayerandistheoutputdimension,andabiasvector.Welabeltheactivationfunctioninthelayer.Letbeaneuralnetwork.Then,

(9)

Wetrainafully-connectedneuralnetworkusingthedayoftheyearandtimeofdayastheindependentvariables(features),withthetargetsettothemagnitude(normalizedtotakeonvaluesbetween0and1bydividingbythelargestmagnitudeobserved).ThemodelarchitecturefeaturesReLUactivationfunctionsand5layersofsizes,,,,and.Therearetotalparametersinthemodel,roughlythenumberoftrainingsamplesdependingonthetestcase.Wetraintheneuralnetworkinbatchesofsize128usingtheAdamoptimizerfor100epochs.Weusethemean-squarederrorlossfunctionwith20percentofthetrainingdatawithheldforvalidation,thetrainingratesetto,andalearningrateexponentialdecayfactorof0.5appliedevery30trainingepochs.Wesettheratioofthenumberofparametersinthemodeltothenumberofdatapointsontheorderof,acommonlyacceptedratiointhedeeplearningcommunity.Weselectedthesetraininghyperparametersbasedoncommonvaluesusedindeeplearningtoachievethebestcombinationoftraininglossdecayandvalidationerrorminimization.

3 Results

FirstwecomparethepredictingpowerofGaussianprocessesforthedifferentembeddingsoftheinputdatadescribedinsubsection 2.2.Foreachofthe13lightcurvesinourdataset,werandomlyselectatotalof20timeintervals,5foreachtimedurationin{3hours,1day,1week,2weeks},andusethoseintervalsastestdatawhileusingtherestastrainingdata.TheaccuracyofthepredictionismeasuredastheRootMeanSquareError(RMSE)dividedbytheextent(max-min)ofthemagnitudeinthetestdata.Weusethecommonradialbasisfunction(orRBF)todefineourkernelinallofourexperiments.Thismeansthatweusethekernelfunction

(10)

TheRBFkernelhaslengthscaleparameter,andourmodeladditionallyhasthemeasurementnoisevariancepriorandvariancescalingparameter.Wefixinourexperiments.WetrainbyoptimizingequationEquation 8viaBayesianoptimization.However,noprediction-basedobjectivefunctionissensitiveto,andsoweoptimizeitaccordingtotheanalyticproceduredescribedinSection2of[muyskens2021muygps].Fig. 3showsthedistributionoftheresultingcomparisonofthepredictionstothetruthforeachembedding.Withamedianof0.066,the2Dembeddingyieldsmorethan3timesbetterpredictionsthantheoriginal1Dembedding(median:0.24).Thebetterpredictionispossiblebecausethedistancebetweensimilardatapointsacrossdaysisreduced,thankstotheextradimension,allowingtheGPtomorereadilyusethemduringinterpolation.Howeverweseethataddingathirddimensionencodingtheyearlyperiodicitydoesnotyieldsignificantimprovementsoverthe2Dembeddingsincetheresultsarequasi-identical(median:0.065).Althoughthedistributionsof2Dand3Dembeddingsaresimilar,fromthison,wefocusonthe2Dembeddinganduseitexclusively.

Accuracyofpredictionsfordifferentdataembeddingsasastandardboxplotrepresentingthedistributionofresultsofcompleting5randomgapsof4distinctdurationsrangingfrom3hoursto2weeksforeachofthe13lightcurvesinthedataset.1Distherawtimeseriesofmagnitudes,2Dtakesintoaccountdailyperiodicity,and3Dtakesintoaccountbothdailyandyearlyperiodicity.WhileGPsbenefitgreatlyfromthedailyperiodicityinformation,theeffectofaddinginformationaboutyearlyperiodicityiscomparativelynegligible.
Figure 3: Accuracyofpredictionsfordifferentdataembeddingsasastandardboxplotrepresentingthedistributionofresultsofcompleting5randomgapsof4distinctdurationsrangingfrom3hoursto2weeksforeachofthe13lightcurvesinthedataset.1Distherawtimeseriesofmagnitudes,2Dtakesintoaccountdailyperiodicity,and3Dtakesintoaccountbothdailyandyearlyperiodicity.WhileGPsbenefitgreatlyfromthedailyperiodicityinformation,theeffectofaddinginformationaboutyearlyperiodicityiscomparativelynegligible.

Partofthemotivationforbeingabletopredictdatainlightcurvesistocompletegapsandmissingdatapoints.Afirstbenchmarkistobeabletocompleterandomlyselecteddatapointsthroughoutalightcurve.Werandomlyselecteither5%,10%,or20%ofthetotalnumberofdatapointavailableineachofthe13lightcurvesinthedataset,5timesforeachproportion.Then,formorerealisticinterpolation,weselectconsecutivedatapointsrepresentinggapsofeither3hours,1day,1week,or2weeks,with5differentrandomstartinginstantsforeach13lightcurve.LastlywerepeatthesameprocedurebutbyselectinggapsattheendofeachlightcurvetoevaluatehowGPsperformforextrapolation.Fig. 4showstheperformanceofGPpredictionsforalloftheseinterpolationandextrapolationtasks.Forrandominterpolation,forallpercentages(5%,10%,or20%),theperformanceissimilarandverygoodwithamedianrelativeRMSEofabout0.03.Forgaps,eitherinterpolationorextrapolation,thereismorevariation,bothforagivengapduration,andacrossdurations.

ComparisonofthepredictingpowerofGPs(with2Dembedding)forvariousinterpolationandextrapolationtasks.Theleftpaneisforinterpolatingrandomlyselecteddatapoint,representingfrom5%to20%oftheentirelightcurves.Themiddlepaneisforinterpolatinggapsofdurationsrangingfrom3hoursto2weeks.Therightpaneisforextrapolatinggapsofsimilardurationsattheendofeachlightcurves.Randomdatapointsareinterpolatedwithgreataccuracy,evenwhentheyrepresentasignificantportionofthedataavailable.Interpolatinggapsovershortperiodsoftimeismorechallenging.
Figure 4: ComparisonofthepredictingpowerofGPs(with2Dembedding)forvariousinterpolationandextrapolationtasks.Theleftpaneisforinterpolatingrandomlyselecteddatapoint,representingfrom5%to20%oftheentirelightcurves.Themiddlepaneisforinterpolatinggapsofdurationsrangingfrom3hoursto2weeks.Therightpaneisforextrapolatinggapsofsimilardurationsattheendofeachlightcurves.Randomdatapointsareinterpolatedwithgreataccuracy,evenwhentheyrepresentasignificantportionofthedataavailable.Interpolatinggapsovershortperiodsoftimeismorechallenging.

TocomparetheperformanceofGPscomparedtoDNNs,weranthesametaskofpredictingonedayofmissingdatainasingle4year-longlightcurvewithtrainingdataexclusivelyfromthatsinglelightcurveonthesamemachine.AsshowninFig. 5,GPsachievebetteraccuracyinonlyafractionofthetime.

ComparisonoftheperformanceofourGaussianprocessmethodwithanequivalentneuralnetwork.TheaccuracyismeasuredbytheRMSEscaledbytheextentofthemagnitudeinthetestdataset.ThecomputingtimeisthetotalofthetrainingandpredictiontimeasthosecannotbeseparatedfortheGP,andthepredictiontimeisnegligiblecomparedtothetrainingtimefortheNN.Onthesamedata,theGaussianprocessachievesabetteraccuracyinafractionofthetime.
Figure 5: ComparisonoftheperformanceofourGaussianprocessmethodwithanequivalentneuralnetwork.TheaccuracyismeasuredbytheRMSEscaledbytheextentofthemagnitudeinthetestdataset.ThecomputingtimeisthetotalofthetrainingandpredictiontimeasthosecannotbeseparatedfortheGP,andthepredictiontimeisnegligiblecomparedtothetrainingtimefortheNN.Onthesamedata,theGaussianprocessachievesabetteraccuracyinafractionofthetime.

Fig. 6illustratesasingleexampleofthequalityoftheGPpredictioncomparedtorawmeasurementsforadayofmissingdatausingthe3Dembedding.Themeanpredictionfollowsthetrendofmagnitudeswellasexpected.AsistypicalinGPmodels,ourkernelmodelisassumedtobestationaryandhomoscedastic,meaningthatwelearnasinglesetofhyperparametersthatbestmodelsallthetrainingdata(acrossallmagnitudes).Inthisintervalpictured,weseethatthedataathighmagnitudesseemtohavemorevariancethanthedatacollectedatlowermagnitudes.Ourmodelisagnostictoanychangeinvarianceinthedataitself,butwillonaverageprovidedesireduncertaintyquantification.Notethisonetimegapisasingleintervalthatisinitselftime-correlatedinmagnitudeandthereforevarianceregime.Therefore,thecoverageofthe95%intervalfromthisonesamplecanbedifferentthanthedesiredoveralllevel,butwhenweconsidermanyindependenttimeintervalsofthistype,theuncertaintieswillaveragetothedesired95%confidencelevel.InfutureworkmoreflexibleandcomplexGPkernelfunctionscouldbedesignedwithinhomogeneousornon-stationarykernelstocorrectforthispotentialpatterninmagnitudetoimproveuncertaintiesineachindividualsampleregion.Ifthemagnitudedistributioninourpredictionintervaldiffersfromthatofourtrainingdatathenthereisamodeldiscrepancyinourapproachthatcouldexplainourobservedvariances.

GPprediction(orange)comparedtomeasurement(blue)for1dayofmissingdata.Theorangeshadedregioncorrespondstothe95%confidenceintervalofthepredictionwhiletheblueshadedregioncorrespondstothemeasurementstandarderror.ThispredictionwasobtainedusingaRBFkernelwiththreedimensionaltrainingdata.TheGPdoesaverygoodjobapredictingdetailswhileeffectivelyfilteringnoise.
Figure 6: GPprediction(orange)comparedtomeasurement(blue)for1dayofmissingdata.Theorangeshadedregioncorrespondstothe95%confidenceintervalofthepredictionwhiletheblueshadedregioncorrespondstothemeasurementstandarderror.ThispredictionwasobtainedusingaRBFkernelwiththreedimensionaltrainingdata.TheGPdoesaverygoodjobapredictingdetailswhileeffectivelyfilteringnoise.

ButthetruepowerofGPscomesfromtheirabilitytopredictafullposteriordistributionandnotjustamean.Thepredictedcovariancecanbeusedtodetectareaswherethepredictiondiffersfromthemeasurementbyasignificantamount.OneapplicationofparticularinterestforSDAisthedetectionofanomalies—potentialmaneuversorstatechanges—inlightcurvedata.Fig. 7showstwoexampleofsuchanomalies.Inbothcases,twofactorsmaketheanomaliesnoticeable.Firstly,thepredictiondiffersfromthemeasurementforasmallbutsignificantperiodoftime,andthedifferenceexceedstwotimesthepredictedstandarddeviation.Andsecondly,themeasurementsfromthesurroundingdayslooksimilartothoseofthefocalday—andtotheprediction—exceptforthatparticularperiod.Thecombinationofthosetwofactorsindicatethatitisindeedthemeasurementsthatareanomalousandnotthepredictions.

Exampleofdetectedanomalies.Thepredictionissmoothandmatchesthemeasurementwellexceptforashorttimeinterval(near647.39and678.25daysrespectively)wherethemeasurementsuddenlyexceedsitsexpectedvaluebasedonthepredictionandconfirmedbymeasurementsfromthesurroundingdays. Exampleofdetectedanomalies.Thepredictionissmoothandmatchesthemeasurementwellexceptforashorttimeinterval(near647.39and678.25daysrespectively)wherethemeasurementsuddenlyexceedsitsexpectedvaluebasedonthepredictionandconfirmedbymeasurementsfromthesurroundingdays.
Figure 7: Exampleofdetectedanomalies.Thepredictionissmoothandmatchesthemeasurementwellexceptforashorttimeinterval(near647.39and678.25daysrespectively)wherethemeasurementsuddenlyexceedsitsexpectedvaluebasedonthepredictionandconfirmedbymeasurementsfromthesurroundingdays.

4 Conclusion

WehaveshownGaussianprocesses(GPs)usingMuyGPtobeacapableandpertinenttoolforanalyzinglightcurvedata.Unlikeothermachinelearningmethodsthatcanhavemillionsofparameters,Gaussianprocessesaresimpleinthattheydonothavealargearchitecturesearchandinthattheyonlyhaveafewparameterstobeestimated.UnliketraditionallyGPestimationmethods,theimplementationoftheMuyGPsmethodologyallowsastronomerstoapplyGPmethodstoverylargedatawithverylittlecomputationalormemoryburden.Inadditiontotheirimprovedaccessibility,wehaveshownthatinourdesignedexperimentsonlightcurves,Gaussianprocessesarebothmoreaccurateandfasterthananexampleneuralnetwork.Further,theuncertaintyquantificationwegetfromGaussianprocessesallowsustoidentifystatisticallydistinctdeviationsfromthepattern-of-life.Asseenabove,whileGaussianprocessesareabletoingestrawuni-dimensionaltimeseries,encodingtheperiodicityofthelightcurvedatathroughadditionaldimensionsyieldsmuchbetterpredictionswhenwepredictlargegapsofobservations.Thisislikelyduetothedistinctperiodictrendsinthelightcurves.However,weobservedthatencodingthedailyperiodicitywassufficientandthatencodingtheyearlyperiodicitydidnotfurtherimproveprediction.Afirstexplanationisthatsurroundingdaysaretypicallymoresimilarthansurroundingyears,sothemajorityof“nearestneighbors”usedintheGPmodelwouldnaturallybeselectedfromsurroundingdaysratherthanfromsurroundingyears.WehavealsoshownhowGaussianprocessescompareadvantageouslytoneuralnetworksbothintermsofcomputingtimeduringtrainingandpredictionaccuracy.Notethatourmetricforcomputingtimecombinesbothtrainingandpredictiontimefromthetwomodels.ThismetricmasksthetradeoffthatatrainedneuralnetworkcanmakeadditionalpredictionsefficientlywhereastrainingaGaussianprocessusingMuyGPsisveryfast,butthepredictiontimegivenatrainedmodelismuchslowercomparatively.[fastmuygps]demonstrateawaytoimprovepredictiontimeofasimlarkernelinterpolationwithanadditionalapproximation.However,intheonlinescenarioofthelightcurveproblemwhereonemaywanttoperformthisanalysisinnearrealtime,onewouldbothretrainandpredictsimultaneouslysotheGPmethodissignificantlypreferred.BothmethodscanbeparallelizedtotakeadvantageofHighPerformanceComputing(HPC).ForDNNs,parallelizationstechniquesarereadilyavailable,forinstanceinPyTorch,withtheadventofGPUsandTPUs.ForMuyGPs,parallelizationseffortsareinprogressandwillbereleasedinadifferentpublication.WeranalloftheexperimentsinthismanuscriptonasinglecoreofacommoditylaptopusingthesoftwarelibraryMuyGPyS [muygpys2021github].FutureapplicationswilluseMPI [dalcin2011parallel]andJAX [jax2018github]toscalemodeltrainingandevaluationtomultiplecomputenodesusingmanyCPUsandGPUsinparallel.ThesescalabilityfeatureswillenableapplicationsthatscaletotheobservationsizesofLSST,whichareanticipatedtoinvolvehundredsofmillionsofspaceobjects.PossiblefuturedirectionsofresearchinvolvecomparingGPtrainingfromonesetoflightcurvesandusingthesetrainedparameterstomakepredictionsforadifferentsetofunseenlightcurves.Thiswilldeterminewhetheritisnecessarytofitparameterstoeachlightcurve.Finally,sinceourmethodgivesfullyinterpolatedlightcurves,futureworkcouldusethesede-noisedfulltimeseriesforfurtherSDAstudies.Insummary,webelieveourGPmethodisavaluablemethodforlightcurveinterpolationandfuturepredictionforthreescientifictasksofparticularinterest.First,byde-noisingandinterpolatingmissingdata,GPscanbepartofapre-processingpipelinebeforefeedingthedatatofurtheralgorithmsorsoftwarethatcannotworkwithrawdata.Asshowninfigures 1and 2,rawlightcurvescanhavealotofgapsfrequentlyspanningmultipledayssobeingabletointerpolatethosegapsiscrucial.Second,bybeingabletoextrapolatedataoverseveralhours,days,orevenweeks,GPscanbeusedtopredicttheexpectedbehaviorofRSOs,whichcanserveinforecastingandcollisionavoidanceinSDA.Lastandnotleast,byquantifyingtheuncertaintyofpredictions,GPsallowforautomaticchangedetection.Indeed,beingabletopredictafullposteriordistributionenablesdiscerninganomaliesinmeasurementsfromanomaliesinprediction,sinceuncertainpredictionscaneasilyleadtofalsepositives.Infutureresearch,itshouldbepossibletousetrackingdataand/ordocumentedknownmaneuverstofine-tuneandbenchmarkourmaneuverdetectioncapabilities.

Acknowledgments

ThisworkwasperformedundertheauspicesoftheU.S.DepartmentofEnergybyLawrenceLivermoreNationalLaboratoryunderContractDE-AC52-07NA27344withIMreleasenumberLLNL-PROC-839253.FundingforthisworkwasprovidedbyLLNLLaboratoryDirectedResearchandDevelopmentgrant22-ERD-028.ThisdocumentwaspreparedasanaccountofworksponsoredbyanagencyoftheUnitedStatesgovernment.NeithertheUnitedStatesgovernmentnorLawrenceLivermoreNationalSecurity,LLC,noranyoftheiremployeesmakesanywarranty,expressedorimplied,orassumesanylegalliabilityorresponsibilityfortheaccuracy,completeness,orusefulnessofanyinformation,apparatus,product,orprocessdisclosed,orrepresentsthatitsusewouldnotinfringeprivatelyownedrights.Referencehereintoanyspecificcommercialproduct,process,orservicebytradename,trademark,manufacturer,orotherwisedoesnotnecessarilyconstituteorimplyitsendorsement,recommendation,orfavoringbytheUnitedStatesgovernmentorLawrenceLivermoreNationalSecurity,LLC.TheviewsandopinionsofauthorsexpressedhereindonotnecessarilystateorreflectthoseoftheUnitedStatesgovernmentorLawrenceLivermoreNationalSecurity,LLC,andshallnotbeusedforadvertisingorproductendorsementpurposes.

References