Prediction-based One-shot Dynamic Parking Pricing

Seoyoung Hong Yonsei UniversitySeoulKorea seoyoungh@yonsei.ac.kr , Heejoo Shin University of California San DiegoSan DiegoUSA hes002@ucsd.edu , Jeongwhan Choi Yonsei UniversitySeoulKorea jeongwhan.choi@yonsei.ac.kr and Noseong Park Yonsei UniversitySeoulKorea noseong@yonsei.ac.kr

2022

Abstract.

Many U.S. metropolitan cities are notorious for their severe shortage of parking spots. To this end, we present a proactive prediction-driven optimization framework to dynamically adjust parking prices. We use state-of-the-art deep learning technologies such as neural ordinary differential equations (NODEs) to design our future parking occupancy rate prediction model given historical occupancy rates and price information. Owing to the continuous and bijective characteristics of NODEs, in addition, we design a “one-shot” price optimization method given a pre-trained prediction model, which requires only one iteration to find the optimal solution. In other words, we optimize the price input to the pre-trained prediction model to achieve targeted occupancy rates in the parking blocks. We conduct experiments with the data collected in San Francisco and Seattle for years. Our prediction model shows the best accuracy in comparison with various temporal or spatio-temporal forecasting models. Our “one-shot” optimization method greatly outperforms other black-box and white-box search methods in terms of the search time and always returns the optimal price solution.

dynamic pricing, parking occupancy prediction, price modeling, prediction-driven optimization

^†^†journalyear: 2022^†^†copyright: acmcopyright^†^†conference: Proceedings of the 31st ACM International Conference on Information and Knowledge Management; October 17–21, 2022; Atlanta, GA, USA^†^†booktitle: Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM ’22), October 17–21, 2022, Atlanta, GA, USA^†^†price: 15.00^†^†doi: 10.1145/3511808.3557421^†^†isbn: 978-1-4503-9236-5/22/10^†^†ccs: Theory of computation Mathematical optimization^†^†ccs: Computing methodologies Neural networks

1. Introduction

Maintaining an adequate parking occupancy rate is a highly essential but challenging problem in crowded metropolitan areas (Saharan et al., 2020; Shao et al., 2018; Camero et al., 2018). In 2011-2013, for instance, San Francisco piloted SFpark, a demand-responsive parking pricing program adjusts parking prices based on observed occupancy rates. (Pierce and Shoup, 2013) If an observed occupancy rate is higher than a target occupancy rate, its price is increased. By adjusting prices in this manner, the program aimed to control parking occupancy rates around a target (ideal) rate of 70%-80%. SFpark’s success has led to its adoption to other cities such as Los Angeles, Seattle, and Madrid (Ghent et al., 2012; Pierce and Shoup, 2013; Lin et al., 2017). They mainly focused on improving parking availability by keeping occupancy rates below the ideal rate, which is also of our utmost interest in this paper.

Comparison of two paradigms of prediction-driven optimization. (a) The black-box query-based method typically requires a large number of queries until convergence of the solution. (b) As the name “one-shot” suggests, our method requires only one query to find the optimal solution. We can easily find the optimal price by solving the reverse-mode integral problem of the NODE layer in our model. — (a) Black-box query-based prediction-driven optimization

In this work, we propose an advanced prediction-driven optimization method to optimally adjust prices and achieve target on-street parking occupancy rates — we conduct experiments for San Francisco and Seattle. Our approach differs from previous methods in the following sense: i) previous methods relied on observed parking demand to optimize price, i.e., reactive, but we implement a prediction-based optimization model which predicts occupancy rates and optimizes prices based on those predictions, i.e., proactive. ii) The price elasticity of demand varies depending on locations (Pierce and Shoup, 2013), which implies that each block’s demand may have a different degree of responsiveness to price changes. This is not considered in the existing approach, as it adjusts parking prices simultaneously for all blocks by fixed amounts. Our approach can change prices in a much more fine-grained manner, e.g., hourly and separately for each block. iii) We adjust the prices accurately, maximizing the influence of price adjustment.

The overall workflow of our prediction-driven dynamic price optimization. (a) Our predictive model consists of three parts: i) an initial prediction module with spatiotemporal processing to process short-term and long-term occupancy history, ii) a price reflection module to adjust the initial prediction with price information, and iii) a final prediction module to make the final predictions on the future occupancy rates of — (a) Overall workflow of how we predict the future occupancy rate

Given a set of parking blocks, denoted ${b_{i}}_{i = 1}^{N}$ , let $y^{*} \in [0, 1]^{N}$ be an $N$ -dimensional vector which contains target occupancy rates for $N$ parking blocks. Our method finds $p^{*} \in [p_{m i n}, p_{m a x}]^{N}$ , where $p_{m a x}$ / $p_{m i n}$ denotes the maximum/minimum price, which makes the pre-trained occupancy prediction model output $y^{*}$ (cf. Fig. 1 (b)). In this research, it is important to design a high-accuracy prediction model since it is imperative to guarantee the quality of price optimization. The most important technical concern is, however, how to integrate the prediction model and the optimization method because an ill-integrated prediction-driven optimization does not return a solution in time (An et al., 2016, 2017; Li et al., 2021; Jeon et al., 2021). The simplest method is to use black-box queries, where the optimization algorithm queries the prediction model about the quality of a solution (cf. Fig. 1 (a)). This black-box method typically requires a non-trivial number of queries. A better approach is the gradient-based white-box method. Gradients are created from the internals of the prediction model to update the input feature (i.e., the parking prices in our case). This approach achieves a shorter search time than that of the black-box method. However, this also requires multiple iterations to achieve a reliable solution.

In this work, we propose yet another “one-shot” white-box search method that can find the optimal price solution with only one query to the prediction model, owing to the continuous and bijective characteristic of neural ordinary differential equations (NODEs (Chen et al., 2018b)). Fig. 2 shows our model design. Since we solve two coupled problems in this paper, the occupancy rate prediction and the price optimization, Figs. 2 (a) and (b) show the two workflows of a) how we forecast the future occupancy rates and b) how we optimize the prices, respectively. The final prediction module in our model, which consists of NODEs, is continuous and bijective, and our one-shot prediction-driven optimization process in Fig. 2 (b) is possible.

Our prediction model consists of three modules as shown in Fig. 2 (a): i) the initial prediction module, ii) the price reflection module, and iii) the final prediction module. The initial prediction module is further decomposed into three layers, depending on the type of processed information. The short-term layer processes the occupancy rates during $K$ recent periods and the long-term layer processes the past occupancy rates older than the short-term information, e.g., a week ago. The concatenated processing layer combines the two hidden representations, one by the short-term layer and the other by the long-term layer, to produce the initial predictions on the future occupancy rates of all those $N$ parking blocks. Up to this moment, however, we do not consider the price information yet, and it is the second price reflection module that processes the price information to enhance the initial occupancy predictions. The final prediction module fine-tunes the predictions to increase the model accuracy. The continuous and bijective characteristics of the final prediction module enables the one-shot price optimization.

The detailed price optimization process is depicted in Fig. 2 (b). We first train the predictive model and then give the observed short/long-term occupancy information and the target occupancy rates $y^{*}$ , we decide the best prices $p^{*}$ which yield the target rates. The red box of Fig. 2 (b) is first processed with the observed short/long-term input to derive the initial prediction $z_{i n i t}$ and the blue box is processed to derive $z_{a d j u s t}^{*}$ . We note that at this moment, the input and output of the final prediction module are opposite in comparison with Fig. 2 (a), which is theoretically possible due to its continuous and bijective properties. Then, we calculate $p^{*}$ which matches $z_{i n i t}$ to $z_{a d j u s t}^{*}$ .

We conduct experiments on two datasets collected in San Francisco and Seattle for multiple years. Our prediction model outperforms various baselines, and our one-shot optimization can very quickly find the optimal prices for most of the parking blocks during the testing period, whereas other black-box and white-box methods fail to do so. In addition, our visualization results intuitively show that our proposed method is effective in adjusting the parking occupancy rates to on or below the ideal occupancy rate. Our contributions can be summarized as follows:

We design a sophisticated future parking occupancy rate prediction model based on NODEs.
We design a novel one-shot price optimization, owing to the continuous and bijective characteristics of NODEs.
In our experiments with two real-world datasets, our prediction model outperforms many existing temporal and spatiotemporal models, and our one-shot optimization finds better solutions in several orders of magnitude faster in comparison with other prediction-driven optimization paradigms.

2. Related Work and Preliminaries

2.1. Neural Ordinary Differential Equations

NODEs solve the following integral problem to calculate the last hidden vector $z (T)$ from the initial vector $z (0)$ (Chen et al., 2018b):

(1)

z (T) = z (0) + \int_{0}^{T} f (z (t); θ_{f}) d t,

where $f (z (t); θ_{f})$ , which we call ODE function, is a neural network to approximate $˙ z def = \frac{d z (t)}{d t}$ . To solve the integral problem, NODEs rely on ODE solvers, e.g., the explicit Euler method, the Dormand–Prince (DOPRI) method, and so forth (Dormand and Prince, 1980).

Let $ϕ_{t} : R^{dim (z (0))} \to R^{dim (z (T))}$ be a mapping from $t = 0$ to $t = T$ created by an ODE after solving the integral problem. It is well-known that $ϕ_{t}$ becomes a homeomorphic mapping (Dupont et al., 2019; Massaroli et al., 2020): $ϕ_{t}$ is continuous and bijective and $ϕ_{t}^{- 1}$ is also continuous for all $t \in [0, 1]$ . It was shown that the homeomorphic characteristic increases the robustness of NODEs to scarce input (Kim et al., 2021; Choi et al., 2021). We conjecture that NODEs are, for the same reason, also suitable for our occupancy prediction, since we cannot feed abundant information to our model due to the difficulty of collecting other auxiliary data. Occupancy prediction is an information-scarce task and therefore, the robustness of the model is crucial for our task.

Reverse-mode integral problem

In addition, the bijective characteristic makes our prediction optimization process easier than other methods. For instance, let $z^{*} (T)$ be the preferred output that we want to see. Our price optimization corresponds to finding the $z^{*} (0)$ that leads to $z^{*} (T)$ . In the case of NODEs, for finding $z^{*} (0)$ , we can solve the reverse-mode integral problem, i.e., $z^{*} (0) = z^{*} (T) - \int_{0}^{T} f (z (t); θ_{f}) d t$ , and $z^{*} (0)$ is unique.

Adjoint sensitivity method

Instead of the backpropagation method, the adjoint sensitivity method is used to train NODEs for its efficiency and theoretical correctness (Chen et al., 2018b). After letting $a_{z} (t) = \frac{d L}{d z (t)}$ for a task-specific loss $L$ , it calculates the gradient of loss w.r.t model parameters with another reverse-mode integral as follows:

\nabla_{θ_{f}} L = \frac{d L}{d θ_{f}} = - \int_{t_{m}}^{t_{0}} a_{z} (t)^{T} \frac{\partial f (z (t); θ_{f})}{\partial θ_{f}} d t .

$\nabla_{z (0)} L$ can also be calculated similarly and we can propagate the gradient backward to layers before the ODE if any. It is worth mentioning that the space complexity of the adjoint sensitivity method is $O (1)$ whereas using the backpropagation to train NODEs has a space complexity proportional to the number of DOPRI steps. The time complexity of the adjoint sensitivity method is similar, or slightly more efficient than that of backpropagation. Therefore, we can train NODEs efficiently.

2.2. Parking Occupancy Prediction

Parking management in metropolitan areas has long been a matter of interest and researched from diverse perspectives since the 1970s. In recent years, there have been many studies on predicting parking availability, of which several have focused on the SFpark data. Two studies compared the efficacy of regression trees and various regression methods for predicting San Francisco’s parking occupancy rates (Zheng et al., 2015; Simhon et al., 2017). Other studies have been conducted on other data, such as Santander, Spain (Vlahogianni et al., 2016); Birmingham, UK (Camero et al., 2018); Melbourne, Australia (Shao et al., 2018). Traditional machine learning algorithms have been extended for processing spatiotemporal data, e.g., traffic forecasting (Yu et al., 2018; Li et al., 2018; Bai et al., 2020; Choi et al., 2022). According to the survey (Jiang and Luo, 2021), two recent researches (Zhang et al., 2020; Jiang and Luo, 2021) used a spatiotemporal model to predict the parking occupancy rates of Beijing and Shenzhen in China. These models are fine-grained and use different features and datasets. Most importantly, they do not consider the price information and their models are designed without considering the price optimization. We instead compare our model with spatiotemporal models.

2.3. Parking Price Optimization

The parking price optimization problem can be formulated as adjusting prices to minimize the error between predicted and target occupancy rates. In (Fabusuyi and Hampshire, 2018), they introduced a predictive optimization strategy that enables proactive parking pricing rather than a reactive approach based on observed parking rates. By implementing a regression-based prediction model considering price elasticity, they optimized parking prices. However, their model is a simple regression for which prediction-driven optimization is technically straightforward. In (Saharan et al., 2020), they compared various machine learning prediction models and utilized them to produce occupancy-based parking prices for Seattle city’s on-street parking system.

These papers have adopted prediction-based optimization, formulating pricing schemes based on predicted occupancy rates (as in ours). However, our method is novel in the following aspects:

Most of the previous work implemented predictive models with simple machine learning approaches such as linear regression. Our model uses the latest and advanced deep learning approaches, such as NODEs.
Traditional optimization methods, such as greedy, gradient-based optimization, and so forth, were used for previous papers. We greatly reduce the running time of our method using the novel one-shot optimization which relies on the continuous and bijective nature of NODEs.

3. Occupancy Rate Prediction

3.1. Overall Architecture

As shown in Fig. 2, our proposed method consists of three modules: Firstly, the initial prediction module with spatiotemporal processing can be further decomposed into the following three layers:

The short-term layer processes the short-term occupancy rate information. This layer only considers $K$ recent occupancy rates and produces the hidden representation matrix $H_{s h o r t} \in R^{N \times dim (H_{s h o r t})}$ .
The long-term layer reads the long-term occupancy rate information (i.e., the mean occupancy rates on the same day of the last week/last two weeks/last month/last two months and so forth). Since the long-term occupancy rate has irregular time intervals, we found that the fully-connected layer is a reasonable design for this layer. This layer produces the hidden representation matrix $H_{l o n g} \in R^{N \times dim (H_{l o n g})}$ .
Given the concatenated hidden representation matrix $H_{s h o r t} \oplus H_{l o n g}$ , where $\oplus$ means the horizontal concatenation, the concatenated processing layer produces the initial future occupancy rate prediction $z_{i n i t} \in R^{N \times 1}$ .

Secondly, the price reflection module adjusts $z_{i n i t}$ created by the initial prediction module since it does not consider the price information yet. Given $z_{i n i t}$ , this module adjusts it to $z_{a d j u s t} \in R^{N \times 1}$ by considering the inputted price information.

Lastly, the final prediction module evolves $z_{a d j u s t}$ to the final prediction $^y \in [0, 1]^{N \times 1}$ . We use a series of NODEs since we found that a single NODE does not return reliable predictions.

3.2. Initial Prediction Module

We note that in this module, all parking blocks’ short-term and long-term occupancy information is processed altogether. Therefore, our model is able to consider, when predicting for a parking block, not only its own historical information but also other related parking blocks’ historical information, i.e., spatiotemporal processing.

Short-term history layer

Given a set of $K$ recent short-term occupancy rates, denoted ${s_{i}}_{i = 1}^{K}$ , where $s_{i} \in [0, 1]^{N}$ , $N$ refers to the number of parking blocks, and $s_{K}$ represents the most recent record, we extract the hidden representation of the short-term history. We note that ${s_{i}}_{i = 1}^{K}$ constitutes a time-series sample for which any type of time-series processing technique can be applied. This layer is to use the following NODE where $H (0)$ is created from ${s_{i}}_{i = 1}^{K}$ :

(2)

H_{s h o r t} = H (0) + \int_{0}^{1} f (H (t); θ_{f}) d t,

The ODE function $f : R^{N \times dim (H_{s h o r t})} \to R^{N \times dim (H_{s h o r t})}$ is defined as follows:

\begin{matrix} f (H (t); θ_{f}) = & ψ (F C_{N \times dim (H_{s h o r t}) \to N \times dim (H_{s h o r t})} (E_{1})), E_{1} = & σ (F C_{N \times dim (H_{s h o r t}) \to N \times dim (H_{s h o r t})} (E_{0})), E_{0} = & σ (F C_{N \times dim (H_{s h o r t}) \to N \times dim (H_{s h o r t})} (H (t))), \end{matrix}

where $σ$ is a rectified linear unit, $ψ$ is a hyperbolic tangent, $θ_{f}$ refers to the parameters of the three fully-connected layers. Note that this ODE layer’s hidden size is the same as the input size $dim (H_{s h o r t})$ .

Long-term history layer

Let ${l_{i}}_{i = 1}^{L}$ , where $l_{i} \in [0, 1]^{N}$ , be a set of the long-term occupancy rate information. We have 12 types of the long-term historical information for each parking block, i.e., $L = 12$ , which includes the past occupancy rates at different time points. We use the following fully-connected layer to process the long-term history ${l_{i}}_{i = 1}^{L}$ since they do not clearly constitute a time-series sample:

(3)

H_{l o n g} = {FC}_{N \times L \to N \times L} (\oplus_{i = 1}^{L} l_{i}^{⊺}),

where ${FC}_{N \times L \to N \times L}$ means the fully-connected layer with an input size (i.e., the dimensionality of input matrix) of $N \times L$ to an output size of $N \times L$ .

Concatenated processing layer

We horizontally combine $H_{s h o r t}$ and $H_{l o n g}$ to produce the initial prediction $z_{i n i t}$ . We evolve $z_{i n i t}$ from $H_{s h o r t}$ by referring to $H_{l o n g}$ . In this layer, we have one more augmented NODE layer as follows:

(4)

z_{i n i t} = c (1) = c (0) + \int_{0}^{1} m (c (t), H_{l o n g}; θ_{m}) d t,

where $c (0) = {FC}_{N \times dim (H_{s h o r t}) \to N \times 1} (H_{s h o r t})$ . We note that after this processing, $z_{i n i t} \in R^{N \times 1}$ . For reducing the overall computational overhead of our method, we early make the initial prediction.

The ODE function $m : R^{N \times (L + 1)} \to R^{N \times 1}$ is defined as follows:

\begin{matrix} m (c (t), h_{l o n g}; θ_{m}) = & ψ (F C_{N \times 1 \to N \times 1} (u_{1})), u_{1} = & σ (F C_{N \times 1 \to N \times 1} (u_{0})), u_{0} = & σ (F C_{N \times (L + 1) \to N \times 1} (c (t) \oplus h_{l o n g})), \end{matrix}

where $θ_{m}$ refers to the parameters of the three fully-connected layers. In particular, this type of NODEs is called as augmented NODEs, which can be written as follows for our case:

(5)

\frac{d}{d t} [\begin{matrix} c (t) h_{l o n g} \end{matrix}] = [\begin{matrix} m (c (t), h_{l o n g}; θ_{m}) 0 \end{matrix}] .

3.3. Price Reflection Module

The above initial prediction module does not consider one of the most important factors in forecasting the future occupancy, which is the price information. We reflect the price information $p \in [p_{m i n}, p_{m a x}]^{N}$ into $z_{i n i t}$ and create $z_{a d j u s t}$ in this module. We use the following regressor for this module:

(6)

z_{a d j u s t} = z_{i n i t} - ((c ⊙ p)^{⊺} + b),

where $c \in [0, \infty]^{N}$ is a coefficient vector that represents the demand elasticity on price changes, i.e. how much the occupancy rate reacts to the price. $⊙$ means the element-wise product. $b \in R^{N \times 1}$ is a bias (column) vector.

3.4. Final Prediction Module

Given the adjusted prediction $z_{a d j u s t} \in R^{N \times 1}$ , there are multiple NODE layers to evolve it to the future occupancy prediction $^y$ . Since we found that only one NODE layer is not enough for this purpose, we adopt the following $M$ NODE layers:

(7)

\begin{matrix} y_{1} (1) & = z_{a d j u s t} + \int_{0}^{1} j_{1} (y_{1} (t); θ_{j_{1}}) d t, y_{i} (1) & = y_{i - 1} (1) + \int_{0}^{1} j_{i} (y_{i} (t); θ_{j_{i}}) d t, 1 < i < M, y_{M} (1) & = y_{M - 1} (1) + \int_{0}^{1} j_{M} (y_{M} (t); θ_{j_{M}}) d t, \end{matrix}

where the future occupancy prediction $^y = y_{M} (1)$ .

We use the following identical architecture for $j_{i} : R^{N \times 1} \to R^{N \times 1}$ , for all $1 \leq i \leq M$ , but their parameters $θ_{j_{i}}$ are different:

\begin{matrix} j_{i} (y_{i} (t); θ_{j_{i}}) = & ψ (F C_{N \times 1 \to N \times 1} (o_{1})), o_{1} = & σ (F C_{N \times 1 \to N \times 1} (o_{0})), o_{0} = & σ (F C_{N \times 1 \to N \times 1} (y_{i} (t))) . \end{matrix}

Input: Training data

D_{t r a i n}

, Validating data

D_{v a l}

, Maximum iteration number

m a x_i t e r

1 Initialize all parameters, denoted

θ_{a l l}

;

i t e r \leftarrow 0

; while $i t e r < m a x_i t e r$ do

2 Train all parameters

θ_{a l l}

with the loss

L

; Validate and update the best parameters;

i t e r \leftarrow i t e r + 1

;

return the selected best parameters;

Algorithm 1 How to train our model

3.5. Training Algorithm

We use the adjoint method (Chen et al., 2018b) to train each NODE layer in our model, which requires a memory of $O (1)$ and a time of $O (δ)$ where $δ$ is the (average) step-size of an underlying ODE solver. In Alg. (1), we show our training algorithm. We use the following mean squared error loss with the training data $D_{t r a i n}$ :

L = \frac{\sum_{i = 1}^{| D_{t r a i n} |} ∥ y_{i} - {^y}_{i} ∥_{2}^{2}}{| D_{t r a i n} |} + w ∥ θ_{a l l} ∥_{2}^{2},

where $y_{i}$ and ${^y}_{i}$ mean the ground-truth and predicted occupancy rate of the $i$ -th training sample, respectively. $w$ is the coefficient of the $L^{2}$ regularization term. $θ_{a l l}$ denotes the super set of the all parameters in our model.

Well-posedness of Training

The well-posedness¹¹1A well-posed problem means i) its solution uniquely exists, and ii) its solution continuously changes as input data changes. of NODEs was already proved in (Lyons et al., 2004, Theorem 1.3) under the mild condition of the Lipschitz continuity. We show that training our NODE layers is also a well-posed problem. Almost all activations, such as ReLU, Leaky ReLU, SoftPlus, Tanh, Sigmoid, ArcTan, and Softsign, have a Lipschitz constant of 1. Other common neural network layers, such as dropout, batch normalization and other pooling methods, have explicit Lipschitz constant values. Therefore, the Lipschitz continuity of $f$ , $m$ and $j_{i}$ for all $i$ can be fulfilled in our case, making it is a well-posed training problem. Our training algorithm solves a well-posed problem so its training process is stable in practice.

4. One-shot Price Optimization

We describe our proposed dynamic pricing method. The key algorithm is the following “one-shot” price optimization method which finds a solution in one iteration. For this step, we pre-train the prediction model and all its parameters are considered as constants during this step.

Given the short-term history information ${s_{i}}_{i = 1}^{K}$ , the long-term history information ${l_{i}}_{i = 1}^{L}$ , and the target occupancy rates $y^{*}$ , we want to find the optimal prices $p^{*}$ that leads to $y^{*}$ as follows:

(8)

argminp∗∥^y−y∗∥1N,subject topmin≤p∗≤pmax,^y=ξ(p∗,{si}Ki=1,{li}Li=1;θξ),

where $ξ$ is our pre-trained occupancy prediction model, and $θ_{ξ}$ is its set of parameters. This becomes a complicated non-linear resource allocation problem, whose polynomial-time solver is unknown, if $ξ$ is a general deep neural network (1; O. Granmo and B. J. Oommen (2010)). For those reasons, people typically rely on the Karush–Kuhn–Tucker (KKT) condition, when it is possible to calculate the derivative of objective, which teaches us the necessary condition of optimality to find a reasonable solution, but this method sometimes converges to saddle points that are not good enough (Ruszczynski, 2006). In our case, however, we use the following one-shot method due to our special design suitable for solving the problem:

We feed the given short-term and the long-term history information and derive $z_{i n i t}$ from the initial prediction module;
Let $y^{*}$ is the target occupancy rates that we want to achieve. We then solve the following series of reverse-mode integral problems to derive $z_{a d j u s t}^{*}$ :

$\begin{matrix} y_{M - 1} (1) & = y^{*} - \int_{0}^{1} j_{M} (y_{M} (t); θ_{j_{M}}) d t, y_{i - 1} (1) & = y_{i} (1) - \int_{0}^{1} j_{i} (y_{i} (t); θ_{j_{i}}) d t, 1 < i < M, z_{a d j u s t}^{*} & = y_{1} (1) - \int_{0}^{1} j_{1} (y_{1} (t); θ_{j_{1}}) d t; \end{matrix}$
The optimal price vector $p^{*}$ can be calculated as follows:

(9) $p^{*} = \frac{z_{i n i t} - z_{a d j u s t}^{*} - b}{c} .$

We note that we query the initial prediction module and solve each of the integral problems exactly once, i.e., a constant complexity of $O (1)$ given a fixed number $M$ of NODE layers, which is theoretically the minimum complexity that we can achieve (since optimization with zero queries is infeasible). Note $M$ does not vary during operation but is fixed given a prediction model.

\toprule		Training Dataset	Test Dataset
\midrule San Francisco	$K = 1$	407,008	183,280
	$K = 2$	356,132	160,370
	$K = 3$	305,256	137,460
\midruleSeattle	$K = 1$	75,264	31,360
	$K = 2$	65,856	27,440
	$K = 3$	56,448	23,520
\bottomrule

Table 1. The size of the training/testing datasets

Figure 3. The visualization of parking operating hours and meter operating hours of San Francisco

5. Experiments on Prediction

Our software and hardware environments are as follows: Ubuntu 18.04 LTS, Python 3.7.10, PyTorch 1.6.0, TORCHDIFFEQ 0.2.2, TORCHDIFFEQPACK 0.1.0, CUDA 11.2, and NVIDIA Driver 460.91.03, and i7 CPU, and NVIDIA Quadro RTX 8000.

\toprule	Model		Batch size	Learning rate	Weight decay	#Final layers	Initial coefficient	#Params
\midrule San Francisco	Substitute Eq. (2) with	RNN	64	$1 \times 10^{- 4}$	$1 \times 10^{- 5}$	10	0.03	7,911,217
		LSTM	64	$1 \times 10^{- 4}$	$1 \times 10^{- 5}$	10	0.03	10,404,217
		GRU	64	$1 \times 10^{- 4}$	$1 \times 10^{- 5}$	10	0.03	9,573,217
		STGCN	64	$1 \times 10^{- 3}$	$1 \times 10^{- 5}$	10	0.03	7,700,854
		DCRNN	64	$1 \times 10^{- 3}$	$1 \times 10^{- 5}$	10	0.03	7,753,914
		AGCRN	64	$1 \times 10^{- 3}$	$1 \times 10^{- 5}$	10	0.03	13,594,000
\cmidrule2-9	Proposed (full model)		64	$1 \times 10^{- 4}$	$1 \times 10^{- 5}$	15	0.03	7,001,059
	Proposed (w/o final module)		64	$1 \times 10^{- 4}$	$1 \times 10^{- 5}$	15	0.03	8,131,549
\midrule Seattle	Substitute Eq. (2) with	RNN	64	$1 \times 10^{- 3}$	$1 \times 10^{- 5}$	10	0.5	3,544,657
		LSTM	64	$1 \times 10^{- 3}$	$1 \times 10^{- 5}$	10	0.5	5,947,657
		GRU	64	$1 \times 10^{- 3}$	$1 \times 10^{- 5}$	10	0.5	5,146,657
		STGCN	64	$1 \times 10^{- 3}$	$1 \times 10^{- 5}$	10	0.5	2,977,894
		DCRNN	64	$1 \times 10^{- 3}$	$1 \times 10^{- 5}$	10	0.5	3,031,074
		AGCRN	64	$1 \times 10^{- 3}$	$1 \times 10^{- 5}$	10	0.5	5,322,406
\cmidrule2-9	Proposed (full model)		64	$5 \times 10^{- 4}$	$5 \times 10^{- 5}$	10	0.5	2,985,619
	Proposed (w/o final module)		64	$5 \times 10^{- 4}$	$5 \times 10^{- 5}$	10	0.5	2,694,559
\bottomrule

Table 2. Best hyperparameters and the number of trainable parameters. The weight decay is the

L^{2}

regularization coefficient term

w

of (3.5). The number of final layers is

M

of (7). Considering the original price range of each dataset, we set the initial coefficient value of the price reflection module,

c

of (6).

5.1. Experimental Environments

Dataset

The San Francisco Municipal Transportation Agency (SFMTA) website ²²2https://www.sfmta.com/getting-around/drive-park/demand-responsive-pricing/sfpark-evaluation provides data collected during the SFpark pilot project. On-street occupancy rate data contain per-block hourly occupancy rate and meter prices for seven parking districts.

The Seattle Department of Transportation (SDOT) sets on-street parking rates based on data to obtain a target occupancy rate of 70% to 85%. The SDOT website ³³3https://www.seattle.gov/transportation/projects-and-programs/programs/parking-program/performance-based-parking-pricing-program provides a dataset of on-street occupancy rates from 2010. In contrast to San Francisco, Seattle has fixed annual pricing that is updated based on the previous year’s data. Prices vary depending on the time of day; 8 a.m. to 11 a.m., 11 a.m. to 5 p.m., and 5 p.m. to 8 p.m.

Since we optimize the parking price, we define parking occupancy as metered parking occupancy and only predict and optimize when the parking blocks are metered. For instance, as shown in Fig. 3, meters are only operational from 9 a.m. to 5 p.m in San Francisco. Many parking blocks in Seattle also are not metered after 5 p.m. In San Francisco, parking is free on all weekends, and in Seattle on Sundays. Therefore, only data from 9 a.m. to 5 p.m. of San Francisco and data from 8 a.m. to 5 p.m. of Seattle on weekdays are used in our experiments.

We use data from August 11, 2011, to July 31, 2013 for San Francisco. Finally, $N = 158$ blocks from 63 streets and 7 districts with no missing data in the aforementioned period are selected for prediction/optimization. For Seattle, we extract the most recent data from July 19, 2019 to March 23, 2020 and predict/optimize $N = 98$ parking blocks in 9 parking areas.

Historical data from various periods are derived via feature engineering. In the case of short-term history, the occupancy rate in the past $K$ hours is given as a feature. Different $K$ settings lead to different training/testing data as shown in Table 1. For example, San Francisco dataset consists of 407,008 (resp. 356,132) samples, when $K = 1$ (resp. $K = 2$ ). The ratio between a training set and a test set is 7:3. We only consider $K = {1, 2, 3}$ for short-term history length in our experiments because longer $K$ settings lead to a drastic reduction in the number of training samples (as daily occupancy rates are only available from 8 a.m./9 a.m. to 5 p.m.). However, long-term history contains much detailed information.

For the long-term historical features, we obtained historical occupancy records for various periods. The average occupancy rates for the last week, last two weeks, last month, and past two months are derived. Moreover, we separated the time zones into 9 a.m. to 11 a.m., 12 p.m. to 2 p.m., and 3 p.m. to 5 p.m. Adopting this strategy, we ended up with 12 long-term history features.

Baselines

We compare our model with the following baseline prediction models:

RNN, LSTM (Hochreiter and Schmidhuber, 1997) and GRU (Chung et al., 2014) are popular time-series processing models. We create two different ways of how to utilize them for experiments: i) These models are used instead of our short-term history layer and other implementations are the same. ii) We feed only the short-term historical information to them and predict (without any other layers) since they are designed to consider only one type of sequence.
DCRNN (Li et al., 2018), AGCRN (Bai et al., 2020), and STGCN (Yu et al., 2018) are popular baselines in the field of spatiotemporal machine learning. DCRNN combines graph convolution with recurrent neural networks in an encoder-decoder manner. AGCRN learns an adaptive adjacency matrix from data. STGCN combines graph convolution with gated temporal convolution. We also define two different versions for them in the same way above.

\toprule	Model			$K = 1$		$K = 2$		$K = 3$
\cmidrule5-10				MSE	$R^{2}$	MSE	$R^{2}$	MSE	$R^{2}$
\midrule San Francisco	Ours	Substitute Eq. (2) with	RNN	0.01374 $\pm$ 0.00027	0.60727 $\pm$ 0.00763	0.01296 $\pm$ 0.00223	0.62080 $\pm$ 0.00223	0.01373 $\pm$ 0.00016	0.59760 $\pm$ 0.00457
			LSTM	0.01701 $\pm$ 0.00027	0.51369 $\pm$ 0.01161	0.01443 $\pm$ 0.00321	0.57783 $\pm$ 0.00321	0.01517 $\pm$ 0.00009	0.55557 $\pm$ 0.00261
			GRU	0.01505 $\pm$ 0.00039	0.56983 $\pm$ 0.01113	0.01395 $\pm$ 0.00334	0.59187 $\pm$ 0.00334	0.01446 $\pm$ 0.00014	0.57618 $\pm$ 0.00406
			STGCN	0.01040 $\pm$ 0.00050	0.70287 $\pm$ 0.01419	0.01027 $\pm$ 0.00045	0.69969 $\pm$ 0.01319	0.01037 $\pm$ 0.00049	0.69619 $\pm$ 0.01433
			DCRNN	0.01022 $\pm$ 0.00030	0.70796 $\pm$ 0.00851	0.01012 $\pm$ 0.00041	0.70404 $\pm$ 0.01199	0.01020 $\pm$ 0.00053	0.70113 $\pm$ 0.01534
			AGCRN	0.01021 $\pm$ 0.00012	0.70821 $\pm$ 0.00329	0.01001 $\pm$ 0.00014	0.70727 $\pm$ 0.00417	0.01047 $\pm$ 0.00032	0.69326 $\pm$ 0.00938
\cmidrule3-10		Proposed (full model)		0.00980 $\pm$ 0.00002	0.71985 $\pm$ 0.00046	0.00975 $\pm$ 0.00001	0.71473 $\pm$ 0.00034	0.00999 $\pm$ 0.00003	0.70726 $\pm$ 0.00092
		Proposed (w/o final module)		0.01005 $\pm$ 0.00003	0.71278 $\pm$ 0.00084	0.00994 $\pm$ 0.00004	0.70915 $\pm$ 0.00109	0.01009 $\pm$ 0.00003	0.70435 $\pm$ 0.00090
\cmidrule2-10	Existing Method (only short-term)	RNN		0.01295 $\pm$ 0.00004	0.62994 $\pm$ 0.00120	0.01258 $\pm$ 0.00003	0.63189 $\pm$ 0.00102	0.01321 $\pm$ 0.01378	0.61225 $\pm$ 0.00145
		LSTM		0.01564 $\pm$ 0.00020	0.55307 $\pm$ 0.00562	0.01395 $\pm$ 0.00005	0.59198 $\pm$ 0.00137	0.01476 $\pm$ 0.01378	0.56762 $\pm$ 0.00296
		GRU		0.01374 $\pm$ 0.00006	0.60724 $\pm$ 0.00174	0.01280 $\pm$ 0.00006	0.62547 $\pm$ 0.00167	0.01319 $\pm$ 0.01378	0.61344 $\pm$ 0.00240
		STGCN		0.01119 $\pm$ 0.00054	0.68005 $\pm$ 0.01546	0.01100 $\pm$ 0.00047	0.67825 $\pm$ 0.01378	0.01112 $\pm$ 0.00077	0.67407 $\pm$ 0.02243
		DCRNN		0.01125 $\pm$ 0.00004	0.67850 $\pm$ 0.00127	0.01077 $\pm$ 0.00002	0.68493 $\pm$ 0.00046	0.01072 $\pm$ 0.00005	0.68578 $\pm$ 0.00136
		AGCRN		0.01092 $\pm$ 0.00010	0.68796 $\pm$ 0.00282	0.01059 $\pm$ 0.00012	0.69012 $\pm$ 0.00355	0.01095 $\pm$ 0.00066	0.67913 $\pm$ 0.01948
		NODE		0.01058 $\pm$ 0.00005	0.69753 $\pm$ 0.00146	0.01055 $\pm$ 0.00008	0.69123 $\pm$ 0.00238	0.01063 $\pm$ 0.00006	0.68840 $\pm$ 0.00170
\midrule Seattle	Ours	Substitute Eq. (2) with	RNN	0.02392 $\pm$ 0.00015	0.62621 $\pm$ 0.00225	0.02721 $\pm$ 0.00036	0.57511 $\pm$ 0.00567	0.02799 $\pm$ 0.00053	0.56240 $\pm$ 0.00828
			LSTM	0.02564 $\pm$ 0.00034	0.59918 $\pm$ 0.00529	0.02797 $\pm$ 0.00032	0.56323 $\pm$ 0.00500	0.03128 $\pm$ 0.00085	0.51106 $\pm$ 0.01323
			GRU	0.02466 $\pm$ 0.00010	0.61449 $\pm$ 0.00151	0.02660 $\pm$ 0.00058	0.58451 $\pm$ 0.00902	0.02917 $\pm$ 0.00149	0.54399 $\pm$ 0.01746
			STGCN	0.02125 $\pm$ 0.00029	0.66792 $\pm$ 0.00451	0.02154 $\pm$ 0.00103	0.66360 $\pm$ 0.01611	0.02183 $\pm$ 0.00112	0.65870 $\pm$ 0.01746
			DCRNN	0.02128 $\pm$ 0.00023	0.66740 $\pm$ 0.00352	0.02171 $\pm$ 0.00056	0.66094 $\pm$ 0.00873	0.02203 $\pm$ 0.00088	0.65556 $\pm$ 0.01376
			AGCRN	0.02165 $\pm$ 0.00009	0.66162 $\pm$ 0.00140	0.02170 $\pm$ 0.00038	0.66137 $\pm$ 0.00643	0.02359 $\pm$ 0.00038	0.63345 $\pm$ 0.00610
\cmidrule3-10		Proposed (full model)		0.02098 $\pm$ 0.00006	0.67204 $\pm$ 0.00088	0.02126 $\pm$ 0.00008	0.66803 $\pm$ 0.00122	0.02153 $\pm$ 0.0000	0.66339 $\pm$ 0.00029
		Proposed (w/o final module)		0.02266 $\pm$ 0.00009	0.64580 $\pm$ 0.00140	0.02331 $\pm$ 0.00017	0.63599 $\pm$ 0.00263	0.02386 $\pm$ 0.00021	0.62708 $\pm$ 0.00327
\cmidrule2-10	Existing Method (only short-term)	RNN		0.02428 $\pm$ 0.00002	0.62045 $\pm$ 0.00035	0.02674 $\pm$ 0.00010	0.58230 $\pm$ 0.00162	0.02712 $\pm$ 0.00010	0.57598 $\pm$ 0.00155
		LSTM		0.02490 $\pm$ 0.00008	0.61076 $\pm$ 0.00118	0.02529 $\pm$ 0.00013	0.60497 $\pm$ 0.00202	0.02720 $\pm$ 0.00004	0.57485 $\pm$ 0.00063
		GRU		0.02427 $\pm$ 0.00006	0.62067 $\pm$ 0.00088	0.02466 $\pm$ 0.00004	0.61491 $\pm$ 0.00064	0.02548 $\pm$ 0.00010	0.60162 $\pm$ 0.00156
		STGCN		0.02148 $\pm$ 0.00037	0.66423 $\pm$ 0.00574	0.02167 $\pm$ 0.00007	0.66158 $\pm$ 0.00111	0.02179 $\pm$ 0.00008	0.65931 $\pm$ 0.00131
		DCRNN		0.02416 $\pm$ 0.00022	0.62232 $\pm$ 0.00338	0.02323 $\pm$ 0.00047	0.63721 $\pm$ 0.00735	0.02250 $\pm$ 0.00036	0.64833 $\pm$ 0.00558
		AGCRN		0.02158 $\pm$ 0.00010	0.66263 $\pm$ 0.00158	0.02485 $\pm$ 0.00217	0.61193 $\pm$ 0.03388	0.02168 $\pm$ 0.00021	0.66108 $\pm$ 0.00321
		NODE		0.02138 $\pm$ 0.00006	0.66596 $\pm$ 0.00094	0.02163 $\pm$ 0.00005	0.66211 $\pm$ 0.00078	0.02182 $\pm$ 0.00005	0.65894 $\pm$ 0.00080
\bottomrule

Table 3. The results of parking occupancy prediction (mean

\pm

std.dev.)

Hyperparameters

We test the following hyperparameters for each model — we trained for 1,000 epochs with a batch size of 64 for all models and other detailed settings are in Table 2:

For RNN, LSTM, and GRU, we use one layer has the hidden size of 500.
For STGCN, we set the channels of three layers (two temporal gated convolution layers and one spatial graph convolution layer) in ST-Conv block to 64, and the Chebyshev polynomials to 1. To construct a parking block graph, we compute the pairwise network distances between blocks and build an adjacency matrix using thresholded Gaussian kernel.
For DCRNN, we set the number of GRU hidden units to 64 and the number of recurrent layers to 2. We use the weighted adjacency matrix constructed in the same way as STGCN.
For AGCRN, we set the hidden unit to 64 for all the AGCRN cells. The size of node embeddings is set to 2 and 5 for San Francisco and Seattle, respectively.
For ours, NODE, we use MALI (Zhuang et al., 2021), the training algorithm for NODEs and the ODE solver error tolerance is set to 1e-5.

5.2. Experimental Results

Experimental results are summarized in Table 3 — we report the mean and std. dev. performance with five different random seeds. In general, RNN-based models perform poorly compared to other spatiotemporal models, such as AGCRN and STGCN. Moreover, when spatiotemporal models are plugged into our model in the place of Eq. (2), they mostly perform better than when they are used solely with the short-term information. Other modules remain the same when we substitute Eq. (2) with other baselines, thus the long-term features and price are also used, which is reasonable. In all cases, the best outcomes are achieved by our full model.

Ablation Study

As an ablation study, we remove the final prediction module and let $z_{a d j u s t}$ be the final prediction. Since $z_{a d j u s t}$ is adjusted with the price information from the initial prediction, we can also use it for price optimization (or dynamic pricing). However, the proposed prediction model without the final prediction module does not perform as well as our full model as shown in Table 3.

6. Experiments on Optimization

We describe our optimization results. The prediction model is fixed during this process and we optimize an actionable input, i.e., price, to achieve target occupancy rates. Since our proposed full model produces the lowest prediction errors, we use it as an oracle. In this step, it is crucial to use the best predictive model as an oracle (although some sub-optimal models provide a lower complexity) (An et al., 2016, 2017; Chen et al., 2018a; Li et al., 2021; Jeon et al., 2021; Park and others, ). If not, the optimized solution (toward the sub-optimal model) does not yield expected outcomes when being applied to real-world environments.

6.1. Experimental Environments

\toprule	Optimization Performance			Runtime (seconds)
	$τ = 70 %$	$τ = 75 %$	$τ = 80 %$	Runtime (seconds)
\midruleObserved in Data	0.5475	0.4440	0.3450	N/A
Greedy	0.4685	0.2045	0.0553	446.6343
Gradient-based	0.3606	0.1609	0.0488	0.5471
\midruleOne-shot (Ours)	0.1938	0.0065	0.0033	0.0000007
\bottomrule

Table 4. The optimization performance (the ratio of failed test cases where the optimized occupancy rate exceeds the threshold

τ

) and the average runtime of San Francisco

\toprule	Optimization Performance			Runtime (seconds)
	$τ = 70 %$	$τ = 75 %$	$τ = 80 %$	Runtime (seconds)
\midruleObserved in Data	0.1307	0.1003	0.0756	N/A
Greedy	0.1533	0.0917	0.0495	0.5438
Gradient-based	0.1739	0.1171	0.0724	0.0042
\midruleOne-shot (Ours)	0.0997	0.0684	0.0440	0.0000007
\bottomrule

Table 5. The optimization performance and the average runtime of Seattle

Baselines

We compare our method with the following baseline methods (whose design concepts are similar to what used in (An et al., 2016, 2017; Li et al., 2021; Jeon et al., 2021)) — the target occupancy rate $y^{*}$ is set to 70% and the minimum price $p_{m i n}$ is set to $0.25 and $0.5 for San Francisco and Seattle, respectively; the maximum price $p_{m a x}$ is set to $34.5 for San Francisco and $3.0 for Seattle. Given the short-term, long-term occupancy rates, and the pre-trained prediction model’s parameters $θ_{ξ}$ , we optimize the price $p^{*}$ given the target occupancy rate $y^{*}$ :

We use the following greedy black-box search strategy: We set the initial price vector $p^{*}$ to $p_{m i n}$ for all parking blocks. For each iteration, we increase each block’s parking price by $0.25 (given that its price has not already reached $p_{m a x}$ ) and find the parking block which decreases the error term $∥ y^{*} -^y ∥_{2}^{2}$ the most. We then increase the price of this block by $0.25. The aforementioned strategy is repeated until the current iteration does not decrease the error term.
We test the following gradient-based white-box strategy: The gradient $\frac{\partial ∥ y^{*} -^y ∥_{2}^{2}}{\partial p^{*}}$ flows to $p^{*}$ with the Adam optimizer (Kingma and Ba, 2014) up to 1,000 iterations until the error term $∥ y^{*} -^y ∥_{2}^{2}$ stabilizes — $p^{*}$ is initialized to $0.25 for all parking blocks. Note that in this scheme, $p^{*}$ is optimized to minimize the error term, with $p_{m i n} \leq p^{*} \leq p_{m a x}$ . This method can find the optimal solution if the error term is a convex function, which is not the case in our setting, so it returns a sub-optimal solution.

The worst-case complexity of the greedy strategy is $O (N^{2} \frac{p_{m a x} - p_{m i n}}{0.25})$ whereas our method has a complexity of $O (1)$ . Depending on how quickly the solution converges, the gradient method’s performance varies substantially.

Research Questions

We answer the following research questions from our experiments:

(RQ1) What is the runtime of each method?
(RQ2) For how many test cases does each method successfully find (or fail to find) $p^{*}$ ?

6.2. Experimental Results

Rq1

In Tables 4 and 5, we compare the optimization results. In the case of San Francisco, our method is several orders of magnitude faster than the greedy black-box method and the gradient-based white box method. The greedy method, in particular, showed significant limitations in terms of runtime, with one test case lasting 447 seconds on average and taking up to 4,653 iterations. Considering that we optimized 183,280 cases in the test period, our one-shot method showed remarkable runtime. Seattle’s price range is not as broad as San Francisco’s, i.e., $0.5 to $3.0. As a result, the optimization of Seattle showed faster runtime in Table 5.

Rq2

We also show the percentage of the failure cases where the optimized occupancy rate is larger than the threshold $τ = {70 %, 75 %, 80 %}$ . These errors occur due to imperfections in the price optimization process. Our method greatly outperforms other methods by showing a much smaller failure ratio in comparison to others. Theoretically, our method should show zero error if we can exactly solve the reverse-mode integral, but due to the practical limitation of ODE solvers, it produces a small amount of error. However, 80.6% to 99.7% of the 183,280 cases in San Francisco, and 91% to 95.6% of the 31,360 cases in Seattle are still below the optimal occupancy rate $τ = {70 %, 75 %, 80 %}$ , which are acceptable.

Figure 4. San Francisco’s mean observed occupancy rate and the mean optimized parking price at 12:00 p.m. in the test period for seven blocks in seven different districts

7. Case Studies

We introduce selected examples of prediction and price optimization outcomes with visualization. Figure 4 shows the mean observed (or ground-truth in the dataset) occupancy rate and optimized parking price of San Francisco during the test period. As shown, the optimized parking price and the observed parking occupancy rate are highly correlated. The correlation between the mean hourly observed occupancy rate and the mean optimized price on each block is 0.724 in our San Francisco’s test set.

The interesting point is that the mean optimized price jumped when the observed occupancy exceeded the ideal rate of 80%. The parking block in downtown, 500 Battery St’s occupancy rate was almost always over 90% at noon, with a mean rate of 93%. The mean optimized price at noon is $15, a fairly high price. Since the parking block is almost always over-occupied, our framework accordingly recommends a reasonably high price to reduce the occupancy rate. The average parking price observed in the dataset was $5.9, which seems to be insufficiently high to achieve the target occupancy rate.

Our strategy also shows adaptability to sudden changes in demand. On July 4, 2013, the aforementioned block’s parking occupancy rate was observed 1% at noon. For this day, our method dynamically decreased the price to $0.25, the minimum price. However, the observed price was $6. Similarly, the block in the civic center, 500 Franklin St., showed a low occupancy rate on average. However, on April 12, the occupancy rate at noon was higher than usual at 81%. Our optimized price was $1.72, while the observed price was $0.75–even lower than the mean price of $0.86. These cases demonstrate the robustness of our proactive optimization.

Fig. 5 shows the effectiveness of our dynamic pricing with some selected examples in two parking blocks in Seattle. As shown, their occupancy rates by the oracle in red are well controlled by our dynamic pricing in blue. The observed ground-truth price of these blocks is set to the basic price, $0.5, and adjusted to $1 at 5 p.m.

In Fig. 6, we show other types of visualization. In the four parking blocks of Seattle, the ground-truth occupancy rates observed in the data are above the ideal range $[0.6, 0.85]$ in many cases, whereas our method successfully suppresses the occupancy rates on or below the range.

The correlation visualization between the occupancy rate and the optimized price. These figures include Seattle’s test data from January 31st. — (a) The block between E James St and E Cherry St, 12th Avenue

The comparison between the ground-truth occupancy rate and the predicted occupancy rate when our optimized pricing is applied. These figures show Seattle’s data. — (a) East Green Lake Dr N between NE 72nd St and NE Maple Leaf Pl

8. Conclusions & Limitations

In the U.S. metropolitan cities, dynamic pricing is necessary to keep parking occupancy rates within a tolerable range. To this end, we presented a one-shot prediction-driven optimization framework which is featured by i) an effective prediction model for the price-occupancy relation, and ii) a one-shot optimization method. Our prediction model is carefully tailored for the price-occupancy relation and therefore, it outperforms other general time series (or spatiotemporal) forecasting models. Our sophisticated prediction model, which includes many layers for processing the short-term and the long-term historical information, and the price information, is designed considering the optimization process, which is quite different from other predictive models. In general, predictive models are not designed considering optimization, and therefore, one should rely on a black-box optimization technique and so on. In our case, however, the price reflection module is deferred to after the initial prediction module, and the final prediction module relies on NODEs which are bijective and continuous. Owing to all these design points, the price optimization can be solved in $O (1)$ . Our experiments with the data collected in San Francisco and Seattle show that the presented dynamic pricing works well as intended, outperforming other optimization methods. Our method is several orders of magnitude faster than them and successfully suppresses too large occupancy rates.

One limitation in our work is that we should have relied on the oracle model to verify the efficacy of our method (instead of running real systems in San Francisco and Seattle). However, this is a common limitation of many prediction-driven optimization researches (An et al., 2016, 2017; Chen et al., 2018a; Li et al., 2021; Jeon et al., 2021; Park and others, ). We contribute a novel approach, which drastically enhances the complexity of solving the optimization problem, to the community of prediction-driven optimization.

Acknowledgement

Noseong Park is the corresponding author. This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)).

References

[1] (1995) A nonlinear knapsack problem. Operations Research Letters 17 (3), pp. 103–110. Cited by: §4.
B. An, H. Chen, N. Park, and V. S. Subrahmanian (2017) Data-driven frequency-based airline profit maximization. ACM Trans. Intell. Syst. Technol. 8 (4). Cited by: §1, §6.1, §6, §8.
B. An, H. Chen, N. Park, and V.S. Subrahmanian (2016) MAP: frequency-based maximization of airline profits based on an ensemble forecasting approach. In KDD, Cited by: §1, §6.1, §6, §8.
L. Bai, L. Yao, C. Li, X. Wang, and C. Wang (2020) Adaptive graph convolutional recurrent network for traffic forecasting. In NeurIPS, Cited by: §2.2, item 2.
A. Camero, J. Toutouh, D. H. Stolfi, and E. Alba (2018) Evolutionary deep learning for car park occupancy prediction in smart cities. In LION, pp. 386–401. Cited by: §1, §2.2.
H. Chen, B. An, G. Sharon, J. P. Hanna, P. Stone, C. Miao, and Y. C. Soh (2018a) DyETC: dynamic electronic toll collection for traffic congestion alleviation. In AAAI, Cited by: §6, §8.
R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud (2018b) Neural ordinary differential equations. In NeurIPS, Cited by: §1, §2.1, §2.1, §3.5.
J. Choi, H. Choi, J. Hwang, and N. Park (2022) Graph neural controlled differential equations for traffic forecasting. In AAAI, Cited by: §2.2.
J. Choi, J. Jeon, and N. Park (2021) LT-ocf: learnable-time ode-based collaborative filtering. In CIKM, pp. 1020–1031. Cited by: §2.1.
J. Chung, C. Gulcehre, K. Cho, and Y. Bengio (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555. Cited by: item 1.
J.R. Dormand and P.J. Prince (1980) A family of embedded runge-kutta formulae. Journal of Computational and Applied Mathematics 6 (1), pp. 19 – 26. Cited by: §2.1.
E. Dupont, A. Doucet, and Y. W. Teh (2019) Augmented neural odes. In NeurIPS, Cited by: §2.1.
T. Fabusuyi and R. C. Hampshire (2018) Rethinking performance based parking pricing: a case study of sfpark. Transportation Research Part A: Policy and Practice 115, pp. 90–101. Cited by: §2.3.
P. Ghent, D. Mitchell, and A. Sedadi (2012) LA express park™-curbing downtown congestion through intelligent parking management. In 19th ITS World CongressERTICO-ITS EuropeEuropean CommissionITS AmericaITS Asia-Pacific, Cited by: §1.
O. Granmo and B. J. Oommen (2010) Solving stochastic nonlinear resource allocation problems using a hierarchy of twofold resource allocation automata. IEEE Transactions on Computers 59 (4), pp. 545–560. Cited by: §4.
S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9, pp. 1735–80. External Links: Document Cited by: item 1.
J. Jeon, D. Lee, S. Hwang, S. Kang, N. Park, D. Li, K. Lee, and J. Liu (2021) Large-scale flight frequency optimization with global convergence in the us domestic air passenger markets. In SDM, Cited by: §1, §6.1, §6, §8.
W. Jiang and J. Luo (2021) Graph neural network for traffic forecasting: a survey. arXiv preprint arXiv:2101.11174. Cited by: §2.2.
J. Kim, J. Jeon, J. Lee, J. Hyeong, and N. Park (2021) OCT-gan: neural ode-based conditional tabular gans. In TheWebConf, Cited by: §2.1.
D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: item 2.
D. Li, J. Liu, J. Jeon, S. Hong, T. Le, D. Lee, and N. Park (2021) Large-scale data-driven airline market influence maximization. In KDD, Cited by: §1, §6.1, §6, §8.
Y. Li, R. Yu, C. Shahabi, and Y. Liu (2018) Diffusion convolutional recurrent neural network: data-driven traffic forecasting. In ICLR, Cited by: §2.2, item 2.
T. Lin, H. Rivano, and F. Le Mouël (2017) A survey of smart parking solutions. IEEE Transactions on Intelligent Transportation Systems 18 (12), pp. 3229–3253. Cited by: §1.
T. Lyons, M. Caruana, and T. Lévy (2004) Differential equations driven by rough paths. Note: École D’Eté de Probabilités de Saint-Flour XXXIV - 2004 Cited by: §3.5.
S. Massaroli, M. Poli, J. Park, A. Yamashita, and H. Asama (2020) Dissecting neural odes. External Links: 2002.08071 Cited by: §2.1.
[26] N. Park et al. APE: a data-driven, behavioral model based anti-poaching engine. to appear in IEEE Trans. Computational Social Systems. Cited by: §6, §8.
G. Pierce and D. Shoup (2013) Getting the prices right: an evaluation of pricing parking by demand in san francisco. Journal of the american planning association 79 (1), pp. 67–81. Cited by: §1, §1.
A. Ruszczynski (2006) Nonlinear optimization. Princeton University Press. Cited by: §4.
S. Saharan, N. Kumar, and S. Bawa (2020) An efficient smart parking pricing system for smart city environment: a machine-learning based approach. Future Generation Computer Systems 106. Cited by: §1, §2.3.
W. Shao, Y. Zhang, B. Guo, K. Qin, J. Chan, and F. D. Salim (2018) Parking availability prediction with long short term memory model. In International Conference on Green, Pervasive, and Cloud Computing, pp. 124–137. Cited by: §1, §2.2.
E. Simhon, C. Liao, and D. Starobinski (2017) Smart parking pricing: a machine learning approach. In 2017 IEEE Conference on Computer Communications Workshops, Cited by: §2.2.
E. I. Vlahogianni, K. Kepaptsoglou, V. Tsetsos, and M. G. Karlaftis (2016) A real-time parking prediction system for smart cities. Journal of Intelligent Transportation Systems 20 (2), pp. 192–204. Cited by: §2.2.
B. Yu, H. Yin, and Z. Zhu (2018) Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In IJCAI, Cited by: §2.2, item 2.
W. Zhang, H. Liu, Y. Liu, J. Zhou, and H. Xiong (2020) Semi-supervised hierarchical recurrent graph neural network for city-wide parking availability prediction. In AAAI, Cited by: §2.2.
Y. Zheng, S. Rajasegarar, and C. Leckie (2015) Parking availability prediction for sensor-enabled car parks in smart cities. In ISSNIP, Cited by: §2.2.
J. Zhuang, N. C. Dvornek, S. Tatikonda, and J. S. Duncan (2021) MALI: a memory efficient and reverse accurate integrator for neural odes. In ICLR, Cited by: item 5.