Federated Online Clustering of Bandits

Xutong Liu The Chinese University of Hong Kong
Hong Kong SAR, China
Haoru Zhao Shanghai Jiao Tong University
Shanghai, China
Tong Yu Adobe Research
San Jose, CA, USA
Shuai Li Correspondence to: Shuai Li <shuaili8@sjtu.edu.cn> Shanghai Jiao Tong University
Shanghai, China
John C.S. Lui The Chinese University of Hong Kong
Hong Kong SAR, China
Abstract

Contextual multi-armed bandit (MAB) is an important sequential decision-making problem in recommendation systems. A line of works, called the clustering of bandits (CLUB), utilize the collaborative effect over users and dramatically improve the recommendation quality. Owing to the increasing application scale and public concerns about privacy, there is a growing demand to keep user data decentralized and push bandit learning to the local server side. Existing CLUB algorithms, however, are designed under the centralized setting where data are available at a central server. We focus on studying the federated online clustering of bandit (FCLUB) problem, which aims to minimize the total regret while satisfying privacy and communication considerations. We design a new phase-based scheme for cluster detection and a novel asynchronous communication protocol for cooperative bandit learning for this problem. To protect users’ privacy, previous differential privacy (DP) definitions are not very suitable, and we propose a new DP notion that acts on the user cluster level. We provide rigorous proofs to show that our algorithm simultaneously achieves (clustered) DP, sublinear communication complexity and sublinear regret. Finally, experimental evaluations show our superior performance compared with benchmark algorithms.

1 Introduction

Stochastic multi-armed bandit (MAB) [auer2002finite] is a well-known sequential decision-making problem, where a learner sequentially selects actions so as to maximize the cumulative rewards (or minimize the cumulative regret). One fruitful application area of MAB is the online recommendation systems (RecSys) [chu2011contextual, abbasi2011improved, gentile2014online, li2019improved, zhang2020conversational, li2021unifying], where MAB algorithms provide a principled way to handle the challenge of exploration-exploitation trade-off [lattimore2020bandit].

To advance the bandit algorithm for large-scale applications, contextual linear bandits add the simple yet effective linear structure assumptions on actions and reward functions [chu2011contextual, li2010contextual, abbasi2011improved]. One limitation, however, is that such a model mainly works in a content-dependent manner, ignoring the often used tool of collaborative filtering. To address this issue, the clustering of bandits (CLUB) are proposed [gentile2014online, li2016collaborative, li2018online, li2019improved]. The CLUB algorithms adaptively cluster similar users and utilize the collaborative information given by the cluster structure, which dramatically improves the recommendation quality.

While most existing bandit algorithms are designed under a centralized setting, in response to the increasing application scale and public concerns about privacy, there is a growing demand to keep user data decentralized and push the learning of bandit models to the client or the local server side . This paradigm is now known as federated learning [kairouz2021advances]. Owing to its overall applicability, there has been a surge of interest in studying federated MAB [dubey2020differentially, zhu2021federated, shi2021federated], which promises cooperative bandit learning with larger amounts of data (across multiple local servers) while keeping the data decentralized. This motivates us to study the CLUB problem to its federated counterpart, i.e., the federated clustering of bandits (FCLUB).

In FCLUB, each local server can conduct its own local clustering of bandit algorithms. To enable the collaborative effects of users across different servers, the local server could also collaborate with other local servers under the coordination of a global server, whose communication needs to satisfy specific privacy and communication requirements. The goal of this work is to design an federated online clustering of bandit framework, so as to minimize the -round regret under the privacy protection requirements and communication cost considerations.

The key challenge of FCLUB is designing collaborative bandit learning procedures and cluster detection strategies to identify the overall cluster structures across different local servers, where each local server only holds part of the users with unknown interests. Such a problem is more challenging due to the following privacy and communication cost requirements, which are two first-order requirements for any federated applications [kairouz2021advances].

Privacy protection: To reduce the privacy leakage of each user, we expect local servers to only share user clusters’ data instead of individuals’ raw data. In addition, we still need a mechanism to protect the uploaded (cluster) information against possible adversaries outside the local server, for which we adopt the solution concept of differential privacy (DP). However, the off-the-shelf DP notion is defined on individual users, hence unsuitable for FCLUB. It is challenging and unclear what is a suitable notion of privacy over the clustering of users and how to devise algorithms to guarantee the corresponding privacy requirements.

Communication: Communication is critical for collaborative learning, but may also be expensive or time-consuming. For FCLUB, it is desired to minimize the total regret while keeping the communication costs (in terms of communication rounds between the global server and local servers) as low as possible. Another requirement is to design an asynchronous communication protocol incorporating the randomly arriving users and possibly lagging servers, preventing commonly used synchronous protocols [dubey2020differentially].

1.1 Our Contributions

To address the aforementioned challenges, this paper makes four contributions.

1. Problem Formulation: We propose the setting of online clustering of bandits to its federated counterpart, which considers the privacy protection and communication requirements. We also propose a novel cluster differential privacy (CDP) notion tailored for the FCLUB setting.

2. Algorithm Design: We propose a private and communication-efficient FCLUB-CDP algorithm. For privacy protection, a tree-based privatizer is designed to guarantee our proposed CDP. For communication efficiency, we follow the phase-based principle for cluster detection and propose the asynchronous communication protocol for delayed information sharing. In particular, each local server maintains upload/download buffers and occasionally uploads/downloads the buffered information to/from the global server only if it finds the latest information deviates too far from the last update.

3. Theoretical Analysis: We prove that FCLUB-CDP achieves the regret bounds, communication costs and -CDP privacy guarantee, respectively.

4. Experiments: We conduct extensive experiments over synthetic and real-world datasets to validate our theoretical analysis. Empirical results show the superior performance of our algorithm over existing algorithms.111Codes and datasets are available at GitHub.

1.2 Related Work

Online Clustering of Bandits. The online clustering bandits is first proposed by gentile2014online and shows its effeteness by accelerating the learning process of contextual bandits. The key idea is to use a graph representing the user similarity and adaptively refine the user clusters for information sharing. This work has been extended by a series of works considering the collaborative effects on both users and items [li2016collaborative], the context-aware settings [gentile2017context], the cascading bandit setting [li2018online] and the users with different user frequency [li2019improved]. However, none of these works consider the privacy constraints and communication cost requirements imposed by the FL paradigm like the current work, and therefore cannot give guarantees on these two critical criteria. korda2016distributed considers the peer-to-peer but non-private clustering of bandits, our work studies the private bandit setting under the orchestration of a global server, which requires different algorithms and analysis.

Federated and Distributed Bandits. There has been growing interest in bandit learning with multiple players. One line of research investigates the competitive agents with collisions [anandkumar2011distributed, rosenski2016multi, bistritz2018distributed, boursier2019sic], in which the reward for an arm is zero if it is chosen by more than one agent. The goal of these works is to minimize regret without communication, which is different from ours. The cooperative distributed bandits are most related to our work, in which multiple agents collaborate to solve a bandit problem over certain communication networks, e.g., peer-to-peer networks [korda2016distributed] or client-server networks [dubey2020differentially, li2021asynchronous]. Our work belongs to the client-server setting, but we differ from both dubey2020differentially and li2021asynchronous since neither of them considers the clustering effects of users.

Differential Privacy. Our work leverages on differential privacy, a rigorous mathematical framework of privacy first proposed by dwork2006calibrating. We utilize several useful techniques from the standard differential privacy to maintain our cluster differential privacy condition. Most notably, we use a tree-based algorithm which is introduced in chan2011private to realize differential privacy for the continual release of statistics. In the single-agent bandit setting, shariff2018differentially also utilizes this tree-based algorithm to achieve Joint DP, which is then extended by dubey2020differentially to the federated setting. The closest work to ours is dubey2020differentially and they study the simpler case where each local server only holds one user and all users are identical (with the same unknown preference vector), hence gives the different user-level DP definition with different privatizer and analysis.

To the best of our knowledge, this paper is the first to generalize the CLUB to its federated setting, which simultaneously achieves privacy protection and communication requirements.

2 Problem Settings

In this section, we formulate the setting of “Federated Clustering of Bandits” (FCLUB). We use to represent set . We use boldface lowercase letters and boldface capitalized letters for column vectors and matrices, respectively. For the norms, denotes the norm of vector . For any symmetric positive semi-definite (PSD) matrix (i.e., ), denotes the matrix norm of regarding matrix .

At the global level, there are users, denoted by the set . Each user has an unknown preference vector and for simplicity we assume . Since users may have the same/similar preference vector, we assume there exists (unknown) different preference vectors, i.e., . Users with the same preference vector form an underlying cluster and we denote these (unrevealed) clusters by . Different from CLUB, users in FCLUB are distributed in local servers denoted by . At the local level, the local server contains users (with ) and similarly, these users form local cluster where .

The learning agent interacts with the bandit game as the follows. At each time , a user randomly arrives with probability . Then items are generated to form a item set , where the feature of each item is drawn independently from a fixed but unknown distribution over . The learning agent identifies the local server that belongs to and the current user cluster (detected by our algorithm) which lies in. The local server then recommends an item to the user based on the aggregated information from cluster . After receives the recommendation, the learning agent receives a random reward . Let be the historical information before time . We assume the expectation of reward is linear in the feature vector and the unknown preference vector , i.e., , and have sub-Gaussian tails .

Now we give some assumptions on preference vectors and item feature vectors. Note that all the assumptions follow the previous works gentile2014online, gentile2017context, li2018online, li2019improved.

Assumption 1 (Gap between preference vectors).

For any two different preference vectors , there is a fixed but unknown gap so that .222As previous works, this assumption can be relaxed by assuming the existence of two thresholds, one for the between-cluster distance , the other for the within-cluster distance .

Assumption 2 (Item regularity).

For item distribution , there exists a known so that is full rank with minimal eigenvalue . Meanwhile, for all time , for any fixed unit vector , has sub-Gaussian tail with variance .

Learning Efficiency. The goal of the learning agent is to accumulate as much reward as possible. Let the optimal item for user at time be . The learning performance is measured by the regret, defined as

(1)

where is the regret at time and the expectation is taken over the randomness of the algorithm and the environment regarding the users and the item sets .

In addition to the regret, privacy protection and the communication cost are two important criterion in federated learning. In this work, we aim to ensure that the user data are protected under privacy constraints and the communication complexity is low.

Privacy Requirements. To protect the user data, we introduce two privacy requirements. First, we desire the local server only uploads the user clusters’ sufficient statistics (or clustered data) for the learning procedure, instead of individual users’ raw data. Second, to protect the clustered data against third-party adversaries outside the local server, we adopt the notion of DP, which encodes the intuition that any observable output changes very little (in probability) when any input datum changes. Since existing DP notions are defined on user-level data shariff2018differentially, dubey2020differentially, we introduce a new differential privacy requirement to protect the cluster-level data.

The contextual MAB problem involves two sets of variables against the adversaries outside the local server: the decision sets and the observed rewards . Since the users only receive and store observations regarding the chosen action and the observed reward , it suffices to protect to achieve DP requirements. Let be time slots when user in cluster appears at local server , we denote two sequences and as -neighboring if for .

Definition 1 (Cluster Differential Privacy).

In the FCLUB setting with servers and (at most) clusters, a federated contextual bandit algorithm is -CDP, if for any s.t. , any and the set of sequences and s.t. and are neighboring, and for any subset of actions of actions, it holds that

(2)

Note that our CDP notion formalizes the intuition that the action chosen by any local server (at the cluster level ) must be sufficiently indistinguishable (in probability) to any single pair from any other local cluster . Such a notion does not require each cluster is private to its own observations, i.e., each cluster of users can be trusted with its own data, which is different from the local DP [zheng2020locally] or Joint DP [shariff2018differentially] that assume even itself cannot be trusted.

Communication Complexity. To evaluate the communication complexity, we count one upload operation (or one download operation) between any local server and the global server as one communication round. Our communication complexity is the total number of communication rounds over the time horizon .

3 Algorithm

Figure 1: Illustration of how our algorithm detects clusters from phase to . Local server delete edge and split into . The global server merges local clusters as .

In this section, we introduce our phase-based federated clustering of bandit algorithms with CDP (FCLUB-CDP).

Identify the Underlying Cluster Structure. To correctly identify the cluster structure, we design a phase-based clustering detection algorithm in Algorithm 2. The high level idea is to first conduct local-level clustering of bandits on each local server and merge the local clusters on the global server.

At the local level, each server maintains a profile of for its own local users , where is Gramian matrix, is the moment vector of regressand by the regressors, and is the number of times that has appeared up to time . At the beginning of phase (or ), based on profiles , each server maintains an undirected graph structure , where nodes represent all local users and a pair of users are connected by an edge if they are similar. We initialize the graph by a complete graph and gradually delete edges at every beginning of each phase . Specifically, in line 3, we delete edge between any user and if the distance between their estimated preference vector are larger than the following threshold.

(3)

where and . After the deletion, users in connected components are grouped into local cluster . In line 5, server uploads the clustered information to the global server, which contains clustered information for each cluster . Note that are added with random perturbation to protect the users’ data, which will be introduced shortly after.

In line 7, when the global server receives the privatized clustered information from all servers, it performs a merge operation to merge clusters from different servers whose estimated clustered preference vectors are close into a global clusters according to the following inequality.

(4)

where and . denotes the set of global clusters. Note that the local clusters in the same global cluster indexed by will communicate and share protected clustered information with each other in an asynchronous manner. At the beginning of phase , if the new global cluster structure is different from at phase , we will renew the shared global information , , for . For local servers in the same global cluster , the local synchronized information , the upload buffers and download buffers are renewed. We also generate a new tree-based privatizer PVT for each cluster at server , which will be introduced later on.

1:  Input: Failure probability , deletion parameter , merge parameter , privacy parameters . 2:  User initialization: For , . 3:  Local server initialization: For , set graph , local information, upload buffers, download buffers: , perturbations . 4:  Global server initialization: create one global cluster and set the global information for , the . 5:  for  do 6:     Detect and adjust clusters (Algorithm 2). 7:     for  do 8:        Compute the total time step . 9:        Advance all parameter, e.g., . 10:        User at local server arrives and gets the local cluster that belongs to based on . 11:        Compute local according to Lemma 1. 12:        Local server receives feasible context set and recommends item . 13:        User receives feedback , update the user ’s information: and others unchanged. 14:        Check upload event (Algorithm 3). 15:        Check download event (Algorithm 4). 16:     end for 17:  end for
Algorithm 1 Phase-based FCLUB with CDP

Asynchronous Communication Protocol. In this work, we design a novel asynchronous communication protocol to incorporate the randomly arriving users. To reduce the communication cost, our high-level idea is to use the delayed communication, where the feedback are temporarily stored in buffers and only if the stored information exceeds a threshold, the upload/download events are triggered. Such a threshold will ensure that the local information will not diverge too far from the global information, which in turn will not diverge too far from the scenario when information are fully synchronized. Also note that our communication is conducted in the asynchronous manner at the local cluster level. In other words, all local clusters indexed by will establish connection with each other within the global cluster . For each local cluster , it stores a local copy of the sufficient statistics and a upload buffer . For the global server, it prepares for each local cluster a download buffer , which are used to send other local servers’ information to the local cluster. It also maintains the global statistics to save the data uploaded from local clusters in global cluster .

Our proposed communication framework consists of two components: the upload protocol (Algorithm 3) and the download protocol (Algorithm 4). For the upload protocol, at each time step , user visits the server and receives recommended item . After the user interacts with the environment and observes feedback , the local server updates the upload buffers in line 1 and checks the following condition to decide whether to upload the upload buffer:

(5)

where and are tentative and current perturbation for privacy protection, respectively. If the condition is satisfied, the local server sends to the global cluster . The global server then merges the uploaded information into the global information in line 4 and also sends it to download buffers for other local clusters in line 6. For local cluster itself, the local server updates the local statistics and initializes the upload buffer using the newly generated perturbation in lines 8 to 10. For the download protocol, at each time step , the global server will check the deviation between global statistics and the local statistics via following condition:

(6)

independently for local clusters in global cluster . If any cluster satisfies such condition, the global server sends the information from other clusters to , which is used to update ’s local statistics. Finally, the global server cleans the download buffer.

Tree-Based Privacy Protocol. To ensure the uploaded information are privatized, we adopt the tree-based privatizer to generate random perturbations and whenever an upload event happens. Note that the privatier subroutine is at the local cluster level and a new privatizer is created if the cluster structure changes at the start of any phase .

Let be a (matrix-valued) sequence of length , and be the partial sum of the first elements that will be realised privately. Generally speaking, the tree-based mechanism dwork2006calibrating maintains a binary tree of depth , where the leaf nodes contain the elements and the parent node maintains the sum of its children. For each node with value , the tree-base mechanism protects privacy by adding noise to each node and release if queried. The key advantage is that such a tree only accesses nodes to compute and release the partial sum , which means the perturbation is at most instead of .

Following this general idea, we implement the tree-based privatizer that satisfies the requirements of CDP. Recall that we only need to protect the information uploaded to the global server, it suffices to maintain a tree of depth for the upload event, where is the total number of uploads. To make the partial sums private, we insert a random noise matrix to each node in , similar to that of shariff2018differentially and dubey2020differentially,. Specifically, we sample a random matrix where each entry is drawn from i.i.d. Gaussian distribution and symmetrize it to get . It follows that in order to ensure the whole tree is -DP, each node should preserve -DP. In other words, it suffices to set the variance for each tree node. Note that at each upload round , the total noise added to the partial sum is the summation of at most random matrices with size , where the top-left -submatrix forms and the first elements from the right-most vector forms . By concentration of random matrices tao2011topics, we have with probability at least , the operator norm of is

(7)

for any .

Recommendation Procedure. At each time step , the recommended item for user is selected as follows. When the current cluster is correct (which is guaranteed after rounds and to be proved later), the estimated is computed using the local information and . Since by Lemma 1, , the confidence radius is , which characterizes the exploration bonus for item . Then the local server will recommend the item that maximizes the plus the above exploration bonus. Finally, the user will receive feedback and the system updates corresponding statistics for better decision in future rounds.

Lemma 1.

Under the setting of FCLUB and fix a local cluster located at the server which shares the information with clusters (including itself), let the true preference vector be and the true cluster be , let . When all (global) clusters are correctly identified and partitioned, it holds with probability at least ,

(8)

where .

1:  . 2:  for  do 3:     Set by deleting any edge if Equation 3 holds. 4:     For , generate new perturbation using in Algorithm 5 and set historical . 5:     For , upload the local clustered information to the global server, where . 6:  end for 7:  The global server does global merge based on and get global clusters , where the two local clusters , (with ) are merged together in if Equation 4 holds. 8:  if  or  then 9:     //Renew the cluster information. 10:     for  do 11:        Set global gram matrix . 12:        for  do 13:           Set . 14:           Create new perturbation using in Algorithm 5. 15:           Set new 16:           Set new . 17:        end for 18:     end for 19:  end if
Algorithm 2 Phase-based Cluster Detection and Adjustment
1:  Update upload buffer . 2:  if  then 3:     The global cluster finds so that . 4:     Update global information . 5:     for  do 6:        Global server updates other servers’ download buffer . 7:     end for 8:     Local server updates the local statistics: . 9:     Local server sets and creates new perturbation using the tree-based privatizer . 10:     Local server initializes the upload buffer using the new perturbation: . 11:  end if
Algorithm 3 Check Upload Event
1:  for  do 2:     if  then 3:        Local server receives . 4:        Global server cleans the download buffer: . 5:     end if 6:  end for
Algorithm 4 Check Download Event
1:  Input: Privacy budget , number of uploads . 2:  Create a binary tree of depth . 3:  For each node, we generate a perturbation matrix matrix , where and with . 4:  Calculate a queue of for partial sums . 5:  Sequentially pop one pair of if PVT() is called.
Algorithm 5 Privatizer PVT() for cluster at server

4 Results

Recall that perturbations are designed to satisfy the -CDP requirement. In particular, the privacy budget affects the regret and communication bounds via the following quantities , which can be treated as spectral bounds for . Let , where is the number of uploads for local server and cluster .

Definition 2 (Approximately-accurate and ).

The bounds and are -accurate for for any and :

(9)

with probability at least .

As will be shown later, our communication protocol ensures , so , , and , where is given by our privatizer.

In the following, we will give general regret and communication bounds using and replace them with their exact values.

4.1 Regret Bound

We give the following theorem as our main result for the regret bound.

Theorem 1.

Suppose the cluster structure over the users and items satisfy the assumptions in Section 2 with gap parameter and item regularity parameter . If the privatizer produces random perturbation that are -accurate as in Definition 2, with probability at least , the regret is upper bounded by

(10)

We will give the proof sketch for the above theorem 1.

Proof.

Our proof mainly consists of two parts. The first part bounds the number of exploration rounds after which the overall user clusters are correctly detected at the global server. The second part is to bound the regret for the asynchronous contextual linear bandits after the clusters are partitioned correctly.

Different from standard online clustering bandits, the key technical challenge is to take care of the additional random Gaussian noise produced by the privater, which perturbs the true observation that is needed for global cluster detection and the regret analysis for contextual linear bandits. Moreover, such perturbed observation are also lagged behind the instant observation, since FCLUB-CDP adopts the "delayed" asynchronous communication where upload and download are triggered occasionally. This makes standard contextual bandit analysis no longer works and requires new proof techniques to handle the gap between instant observation and the lagged (and perturbed) observation.

For the first cluster detection part, by the assumption of item regularity, we prove that after rounds, the local estimates are accurate enough so that the local clusters are correctly identified, similar to that of li2018online. Specifically, the 2-norm distance between local estimate and the truth for any user is less than . Thus the local clusters are split correctly for all local servers. Now for the global cluster detection, the global server receives the aggregated observation from correctly partitioned local clusters, in which random Gaussian noises are added. Based on spectra property of Gaussian noise matrices (definition 2), the global server will spend additional rounds so that the perturbed estimate are accurate enough at the beginning of phase , where . Therefore, after , the overall user clusters are partitioned correctly.

For the regret after , we use the delayed update technique from [abbasi2011improved, Section 5.1], which only recomputes the confidence radius only times and hence saves computation. The same strategy can also be applied for the delayed communication. The key analysis relies on using the upload and download condition in eq. 5 and eq. 6, so that the actually-used cluster confidence radius is at most times larger than that if all local servers upload their perturbed observations in a fully synchronized manner, where . This will give a regret for the second part, where is the private-version regret for the cluster if all observation are synchronized at each round. The full proof is put in Appendix B. ∎

4.2 Communication Cost

We give the following theorem to bound the total communication cost.

Theorem 2.

Under the CDP setting, the total communication cost satisfies:

(11)
Proof.

The total communication cost also has two parts: the upload at the beginning of each phase for global cluster detection and the asynchronous communication within each phase for information sharing. For the first part, the algorithm has at most phases and at each phase, there are total local clusters uploading the clustered information, hence the total communication cost is . For the second part, recall that we adopt the delayed asynchronous communication protocol and the total number of uploads and downloads can be bounded by . See Appendix C for the detailed proofs. ∎

4.3 Privacy Guarantee

Theorem 3.

Algorithm 1 preserves -CDP as defined in Definition 1.

Proof.

The CDP condition is satisfied by assigning the right amount of Gaussian noise in each tree node of our tree-based privacy protocol in Section 3. See Appendix D for details. ∎

4.4 Discussion and Comparison

Discussion on the Regret Bounds. For the regret bound, our result has two terms: the regret before the clusters are correctly partitioned and the regret after the clusters are correctly partition . We will compare our results with several degenerate cases, given that we are the first work to study the federated clustering of bandits setting. For these cases, the additional CDP causes at most factor and asynchronous communication protocol causes at most factor in general.

First, when , our setting degenerates to the linear bandits with DP where all users share the same underlying parameter. Compared to shariff2018differentially which gives a regret with communication, our bound has a additional factor (or more precisely factor) for the second term, which stems from the larger perturbation in order to protect total communication rounds.

Second, when , our setting reduces to the online clustering bandits with DP, li2018online gives a for the non-DP version. Since CDP mechanism requires random perturbation, the clustering process suffers an additional for the first term and the second regret term now has a new leading factor due to the CDP requirements.

Third, when and if we consider the special case when each local server only has one user and all users come in a round-robin manner, our setting reduces to the distributed linear bandits with DP. dubey2020differentially provides a synchronized algorithm that achieves , our second term has an additional factor because of different communication protocol, which enables asynchronous communication at the cost of the larger compared with communication rounds.

Finally, there is a lower bound , if we consider the case where the clustering structure is known, the communication and privacy budgets are unlimited and each cluster contains equal number of users. In this case, it is equivalent to learn independent linear bandits, each with expected rounds and according to dani2008stochastic, the lower bound is . In other cases, the regret lower bound will be greater and the lower bound still holds. Our regret bound matches the lower bound up to a factor of .

Discussion on the Communication Cost. Our communication cost also has two terms: the first term for identifying clusters at the beginning of each phase and the leading term for our asynchronous communication protocol. Compared with dubey2020differentially when and users come at the round-robin manner, our communication has an additional factor. Due to the specialty of the user arrival, the same paper can achieve communication cost independent of at the cost of additional factor in the regret. Though our total communication cost can not be reduced below due to the first term, it will be interesting to consider whether the similar trade-off works for our asynchronous protocol in the future work.

(a) Comparison with Baselines on the Synthetic Dataset
(b) Comparison with Baselines on the MovieLens Dataset
Figure 2: Comparative Experiments on the Cumulative Regret with Baseline Algorithms

5 Experiments

To validate our theoretical findings, we conduct experiments on a synthetic dataset and a real-world MovieLens dataset. Algorithm 1 is denoted as CDP-FCLUB-DC and we also present its non-private version FCLUB-DC and its non-private, synchronized version FCLUB (by instantly uploading the observations). The baselines include CLUB which uses a separate CLUB algorithm gentile2014online for each local server; SCLUB which uses a separate SCLUB algorithm li2019improved for each server; and LinUCB which uses a separate LinUCB algorithm abbasi2011improved for each user. We also consider the synchronized and asynchronous version of Algorithm 1 (denoted as Homo and Homo-DC, respectively) by treating users are identical with the same preference vector. Note that all results are averaged over ten random seeds, and we provide mean results with one unit of standard derivation for each curve. Due to the space limit, we provide the detailed experiment settings (including data generation and processing) in Section E.1, the parameter study Section E.2, the communication cost Section E.3 and running time results in Section E.4, respectively.

Synthetic Dataset. We first conduct experiments on a synthetic dataset. In Figure 1(a), we compare our algorithm CDP-FCLUB-DC with the baselines listed above. The vertical axis indicates the cumulative regret and the horizontal axis indicates the round . In general, our algorithm CDP-FCLUB-DC’s performance has a clear advantage over baseline SCLUB, CLUB, LinUCB, Homo and Homo-DC. Since Homo-DC and Homo assumes users are in the same cluster, they mistakenly merge different clusters and suffer linear regrets, indicating the correctness of cluster detection is essential to have small regrets. Compared with SCLUB and CLUB that only perform local clustering operations, we can verify the correctness of our algorithm’s clustering operations at the global level, which successfully leverages the collaborative effects across different local servers. As expected, CDP-FCLUB-DC performs a little worse than FCLUB and FCLUB-DC due to the delayed communication and cluster differential privacy requirements.

MovieLens Dataset. In this section, we also compare our algorithm CDP-FCLUB-DC with the baselines listed above on movie recommendations with the MovieLens dataset. The performances are shown in Figure 1(b). Our algorithm CDP-FCLUB-DC’s performance has an advantage over baseline SCLUB, CLUB, LinUCB, Homo and Homo-DC in general. Figure 1(b) shows CDP-FCLUB-DC performs worse than FCLUB and FCLUB-DC due to the delay communication and cluster differential privacy (CDP) as we have explained in synthetic dataset part. Different from the synthetic dataset, in the early stage, our algorithm needs more time to identify the underlying cluster structure. But after all user clusters are correctly detected at the global server, our algorithm performs better than Homo/Homo-DC that assume users are homogeneous, CLUB/SCLUB on each local server and LinUCB on each user.

6 Conclusion and Future Work

In this paper, we formulate the federated online clustering of bandits problem, which generalizes the clustering of bandits problem to its federated counterpart. To tackle this new problem, we propose a FCLUB-CDP algorithm, which simultaneously achieves sublinear regret, sublinear communication complexity and satisfies our newly-defined clustered differential privacy requirements. Compared with benchmark algorithms, we show that FCLUB-CDP achieves superior performance regarding regret and communication cost. There are many compelling directions for future study. For example, it would be interesting to study our problem where local differential privacy is considered. One could also study a more efficient protocol to further reduce the communication cost.

Acknowledgement

The corresponding author Shuai Li is supported by National Natural Science Foundation of China (62006151). This work is sponsored by Shanghai Sailing Program. The work of John C.S. Lui was supported in part by the RGC SRFS2122-4S02.

References

Supplementary Material

Appendix A Summary of Notations

Table 1 summarizes the notations used through the main paper and the appendix.

Symbol Meaning
Number of users.
Number of underlying clusters.
Dimension.
Number of local servers.
Number of candidate items to be recommended each round.
Number of rounds.
Privacy budgets.
Index of rounds.
Index of phases, each phase begins at .
(or ) Index of users (or index of users at round )
(or ) Index of local clusters (or index of local clusters at round )
(or ) Index of global clusters (or index of global clusters at round )
(or ) Index of local servers (or index of local servers at round )
Candidate item set at round .
Preference vector for user .
or ) Feature vector (or the chosen vector at time ).
Unknown distribution that generates .
Sub-Gaussian parameters for the reward.
Threshold for the between-cluster distance.
minimal eigenvalue for the .
The connection graph at local server at phase , the edge connecting two users means they belong to the same cluster.
The -th global cluster identified by the global server at phase .
Gram matrix, moment vector, number of arrival times for user at time , respectively.
Gram matrix, moment vector, total number of arrival times for all users in cluster at local server at the beginning of phase , respectively. Used for global cluster detection.
local synchronized Gram matrix, local synchronized moment vector, local synchronized total number of arrival times for all users in cluster at local server at , which are collected by asynchronous upload/download events, respectively. Used for recommendation.
global synchronized Gram matrix, global synchronized moment vector, global synchronized total number of arrival times for all users in global cluster at the global server in , respectively. Used for communication.
upload buffer for Gram matrix, upload buffer for moment vector, upload buffer for total number of arrival times for all users in cluster at local server at , respectively. Used for upload.
download buffer for Gram matrix, download buffer for moment vector, download buffer for total number of arrival times for all users in cluster at local server at , respectively. Used for download.
Table 1: Table of Notations
Symbol Meaning
The perturbation matrix for next upload, current upload and adding right amount of to make positive semi-definite, respectively. Used for privacy requirements.
The perturbation vector for next upload, current upload, respectively. Used for privacy requirements.
The total number of communications rounds for local cluster at server at before round .
The maximum number of communications for each local server.
Upload and Download threshold, respectively.
Number of rounds after which clusters are detected and partitioned correctly with high probability.
Spectral bounds for the perturbation matrices .
Confidence interval.
Expected cumulative regrets over time .
Expected communication costs over time .
The fully-synchronized gram matrix, moment vector and number of arrival times if all servers upload/download all information instantly, respectively. Used for analysis.
Table 2: Table of Notations (Continued)

Appendix B Regret Analysis

This section is organized as follows. In section B.1, we introduce several high probability events for the regret analysis. In section B.2, we bound the time horizon after which all clusters are detected and partitioned correctly with high probability. In section B.3, we bound the regret after the clusters are detected correctly. In section B.4, we put all things together to conclude theorem 1.

b.1 High Probability Events

We first define five events that will be helpful for the later analysis and bound the probability that each event happens.

Before we state the definition of the events, recall that is the number of local servers. is the number of times user comes to the system before time , is the number of times users belong to local cluster at server comes to the system before time . As defined in definition 2, , , and , where . We denote .

For the gram matrices, is the gram matrix for user at time , is the gram matrix for cluster detection at the beginning of each phase (Line 5 in algorithm 2), is the synchronized gram matrix which takes information from all clusters that are in the same global cluster.

For the confidence intervals, let a small failure probability to be tuned later. We denote intervals , , .

Now we are ready to define a series of events as follows.

.

.

.

.

.

For each of the event , it is noted that their probability , where is given by the definition in Equation 9 and the tree-based privacy protocol in section 3, is by [abbasi2011improved, Theorem 2], is by [gentile2014online, Claim 1] and [li2018online, Lemma 7], and are by Lemma 1 whose proofs are postponed to section B.3.

b.2 Correctness of the Cluster Detection

In this section, we show in lemma 2 when , then with high probability, all local and global clusters are correctly detected and partitioned. In lemma 3, we give the explicit formulation for .

Lemma 2.

Let , , , , , . When , then with probability , (a): All local clusters are partitioned correctly;

(b) All local clusters are correctly merged at the global server.

Proof.

For any user , it suffices to show that when is larger than some threshold (whose value will be settled later), then we have the following two results hold.

(a): Under event and , which happens at least ,

(12)

where the last inequality is valid when .

When Equation 12 holds, for any two user who belong to different clusters, i.e., , , which will trigger the condition in Line 3 in Algorithm 2. On the other hand, when the condition in Line 3 in Algorithm 2 holds, , which implies . In other words, all local clusters are correct after for all users.

(b): Moreover, under event and , for any correctly partitioned local cluster at server with true parameter , with probability at least ,

(13)

, where and .

Here is irrelevant to or (except probability because the merge operation only uses the the local observations from server with local noise added at the beginning of each phase, but doesn’t contain any information from other local servers. According to similar argument, we can show for any two cluster who belong to the same cluster, i.e., , we have , which will trigger the condition in Line 7 in Algorithm 2. On the other hand, if the above condition is triggered, then it implies . In other words, all global clusters are merged correctly.

Now for the last inequality of Equation 12 and Equation 13, we only need to prove the following inequality holds for any ,

(14)

, which uses the observation that and for Equation 12 and for Equation 13.

After we prove the sufficient as shown in Lemma 3 so that Equation 14 holds, now by [li2018online, Lemma 8], at global time , Equation 14 is correct with probability at least . Then at the next cluster detection (which occurs at most two times of ), the global server will partition the global clusters correctly with probability at least (by using union bounds of all corresponding events).

The following lemma states a threshold after which Equation 14 will hold.

Lemma 3.

When , it holds that

(15)

, where , , , , .

Proof.

We can divide this proof into three parts: , and .

The first part can be satisfied by (a) and (b) . And (a) can be satisfied by , (b) by [li2018online, Lemma 9] can be satisfied by .

Then consider the second part. Since , we can prove this part by (c) and (d) . By some math calculation, (c) can be satisfied by and (d) can be satisfied when .

Finally we prove the third part, since , the third part has a similar form to the second part, so their proof are similar. We can get the third part could be satisfied by and .

Considering the required by , and put all these together the lemma 3 is proved. ∎

b.3 Regret Bound After Clusters are Correctly Detected

In this section, we prove the regret upper bound after the clusters are correctly detected and partitioned. Such a bound relies on the confidence interval given by lemma 1.

Proposition 1.

After rounds, with probability at least , the clusters on both local servers and global servers are correctly partitioned, and the partition won’t change in later rounds. Then fix any true global cluster whose local clusters are scattered on servers, let users belong to appear total times, then with probability , the total regret for true cluster is bounded by

and the total regret is bounded by

(16)

where and .

Proof.

We consider the case when all true clusters are correctly detected and partitioned both locally and globally. Suppose the user belong to global cluster whose underlying preference vectors are comes to the server at time slots . For this true cluster , users belong to lies in servers to form local clusters be . With a little abuse of notation, we denote the perturbation at time and by and , respectively. At time , user in server and local cluster arrives, with candidate feature vector sets . With a little bit abuse of the notation, let Recall that is the action selected by our algorithm and as defined in lemma 1. Denote the full-synchronized gram matrix from all servers for cluster without any perturbation as (same for and ). Let . The instantaneous regret for cluster can be written as:

(17)
(18)
(19)

where Equation 17 is because the definition of the selected item , Equation 18 is by Cauchy Schwarz inequality and Equation 19 is by Lemma 4.

By summing over all instant regrets, we have

(20)
(21)
(22)
(23)

where Equation 21 is due to the Cauchy-Schwarz inequality, Equation 22 is by [shariff2018differentially, Lemma 22] and Equation 23 is by lemma 1.

The following lemma bounds the ratio caused by using the delayed information instead of the fully-synchronized information .

Lemma 4.

Let , then

(24)
Proof.

Without loss of generality, let the local cluster ids are with and the global gram matrix is , we have

(25)
(26)
(27)
(28)

where Equation 25 is due to , Equation 26 is because , Equation 27 is due to the fact that for any positive semi-definite matrices s.t. [abbasi2011improved, Lemma 12] and Equation 28 is because of the condition of upload Equation 5 and download Equation 6.

The following lemma states the confidence interval for each local cluster , which is used by line 12 in algorithm 1. See 1

Proof.

Let be the time slot when the last upload (or download) event happens before time and sequence be the time slots when user that belongs to cluster appears. Denote the clusters that belongs to cluster to be . With a little abuse of notation, we denote and by and , respectively. Recall that by definition, . Then we have

Then we multiply the both sides by ,

(29)
(30)

where Equation 29 uses so we have and under event we have , Equation 30 uses the Definition 2 and the self-normalized bound in [abbasi2011improved, Theorem 2].

b.4 Putting All Together

See 1

Proof.

Since , by lemma 2 the regret before all global clusters are correctly identified is , with probability at least . After , by eq. 16 the regret is upper bounded by , with probability . Finally, it suffices to set and by union bounds, the above theorem holds with probability at least . ∎

Appendix C Communication cost analysis

In this section, we show how to upper bound the number of communication rounds before and after the cluster structures are detected correctly.

Proposition 2.

Under the CDP setting, the total communication cost satisfies:

(31)
Proof.

We consider two cases, when and .

Case 1: .

Recall that when , the cluster structure are correct and there will be no changing of clusters afterwards. Fix any local cluster at local server , suppose it shares information with total clusters (including itself). Let be the sequence of time when upload event or download event happen and be the corresponding sequence of local Gram matrix after each event happens. Similar to the proof of

Then we have:

(32)

On one hand, by the definition of the upload and download event, we have for any . On the other hand, by [shariff2018differentially, Lemma 22], , where the last inequality uses the fact that . So . Since we at most have local clusters, the total communication is .

Case 2: . Equivalently, we are in phase , then each phase we will communicate using the similar argument. Then the total communication is .

For the communication during the beginning of each phase in order to detect the cluster structure, we know there will be at most communications, which concludes the lemma.

Appendix D Privacy Guarantee

To guarantee the algorithm preserves the CDP, we use a tree based algorithm where each tree node preserves -DP. For each local cluster , the local privatizer PVT() is -DP and together they preserve -CDP as in definition 1. Also note that our protocol only uploads the protected clustered data rather than each user’s individual data, which satisfies the privacy requirements in section 2.

Appendix E Experiment Settings and Supplemental Experiments

e.1 Experiment Settings

Synthetic Dataset. We consider a setting of users with global clusters and local servers. The weight vectors and item vectors are generated randomly with dimension . For each global cluster , we fix a weight vector and we guarantee that all weight vectors are orthogonal and the difference gap between them is . We stipulate that the default privacy budget is and communication upload/download rate . In each round, a random user comes and the algorithm selects one of items to recommend to the user. After we receive the feedback and compare it with the best action based on the true weight vectors, we evaluate the performance of the algorithm by calculating the cumulative regret.

MovieLens Dataset. We randomly draw movies and users as preparation for the experiment. Similar to li2018online, we generate the weight vectors on the basis of the users and the movies they rated. We randomly select users from these users for the experiment. In this experiment, we assume that there are users with global clusters and local servers and the weight vectors and item vectors are generated randomly with dimension . The users are selected randomly from these users for the experiment. Besides, we stipulate that the default privacy budget is and communication upload/download rate .

e.2 Parameter Study

We conduct the parameter study on a synthetic dataset. Our goal in this section is to numerically investigate the influence of parameters’ change on the performance of our CDP-FCLUB-DC algorithm.

(a) Vary Dimension and Fix Other Variables
(b) Vary Privacy Budget and Fix Other Variables
(c) Vary the Number of Local Servers and Fix Other Variables
(d) Vary the Number of Global Cluster and Fix Other Variables
(e) Vary the Number of users and Fix Other Variables
Figure 3: Parameter Study on the Synthetic Dataset

In Figure 2(a), we vary the dimension from 5 to 20 and fix other parameters. It can be seen that with the increase of dimension , the convergence speed of regret will slow down. In Figure 2(b), we change the privacy budget from 0.5 to 8. It is obvious that when is small, the cumulative regret is better because at this time the privacy will have less impact on the recommendation. In Figure 2(c) and Figure 2(d), we vary the number of local server from 2 to 8 and the number of global cluster from 2 to 8 respectively. The results show that the value of and are positively correlated with the cumulative regret, which is consistent with our conclusion in Theorem 1. For the number of users , Figure 2(e) shows that the empirical results deviate from Theorem 1 since the regret should be larger when there are more users. As we can see that the shaded area almost overlapped for , we conjecture the derivation comes from the randomness of our algorithm and these curves should behave normally when we conduct more independent experiments.

e.3 communication cost

(a) Communication Cost for the Synthetic Dataset
(b) Communication Cost for the MovieLens Dataset
Figure 4: Comparative Experiments on the Communication Costs

Communication cost is one of the most important factors we consider when designing the algorithm. In Figure 4, we compare the communication cost of FCLUB using full communication with that of FCLUB-DC and CDP-FCLUB-DC using the delayed communication. The results clearly indicate that our asynchronous communication protocol can reduce the communication cost effectively.

e.4 Running Time

CDP-FCLUB-DC FCLUB-DC FCLUB SCLUB CLUB LinUCB Homo-DC Homo
run time (ms) 1.707 1.814 58.870 3.805 0.779 0.647 1.277 29.265
run time (ms) 1.667 1.892 83.492 3.855 0.776 0.654 1.301 35.645
Table 3: Comparison of the Running Time

We also compare the average running time (ms) of each round between our algorithm CDP-FCLUB-DC and baselines on synthetic dataset and MovieLens dataset (the first line is synthetic dataset and the second line is MovieLens dataset). It can be seen from Table 3 that due to the use of delay communication, our algorithm has a great improvement in communication cost compared with FCLUB and our run time cost is even lower than SCLUB. However, because of the existence of communication, our run time cost is still higher than CLUB and LinUCB.