Leachable Component Clustering
^†^†thanks: ^*Corresponding author.

Miao Cheng^* School of Computer Science
and Engineering
Guangxi Normal University
Guilin, Guangxi, China
Email: mcheng@mailbox.gxnu.edu.cn Xinge You School of Electronic Information
and Communication
Huazhong University of Science and Technology
Wuhan, Hubei, China
Email: youxg@hust.edu.cn

Abstract

Clustering attempts to partition data instances into several distinctive groups, while the similarities among data belonging to the common partition can be principally reserved. Furthermore, incomplete data frequently occurs in many real-world applications, and brings perverse influence on pattern analysis. As a consequence, the specific solutions to data imputation and handling are developed to conduct the missing values of data, and independent stage of knowledge exploitation is absorbed for information understanding. In this work, a novel approach to clustering of incomplete data, termed leachable component clustering, is proposed. Rather than existing methods, the proposed method handles data imputation with Bayes alignment, and collects the lost patterns in theory. Due to the simple numeric computation of equations, the proposed method can learn optimized partitions while the calculation efficiency is held. Experiments on several artificial incomplete data sets demonstrate that, the proposed method is able to present superior performance compared with other state-of-the-art algorithms.

Incomplete data, leachable component clustering, Bayes alignment, calculation efficiency.

I Introduction

As an essential category of data analysis and handling methods, clustering aims to divide data instances into several separate data groups, by assuming the instances of each cluster share the common similarities of data patterns [20][3]. As a consequence, different data points are to be organized regularly in accordance to certain partition rules, and normally, further handling can be achieved conveniently [41]. Until now, k-means has been the most popular solution to most clustering problems, due to its stable and outstanding performance [28], which is derived from the previous contribution of the classic Lloyd’s method [31]. Furthermore, spectral clustering method [35][33] has received broad attentions in recent years, which seeks for the approximate ideal solutions via decomposition of the spectral Laplacian. Nevertheless, it always suffers from the complexities of decomposition calculation, if large data is absorbed into clustering.

On the other hand, some existing works consider the data analysis with missing values [30][18], which are in loss quite common during information convey and processing. The keypoints of such problem, have been reduced to supplement or fix the broken data with certain repatching approaches in theory, and the resulting data can be adopted to further handling [14]. Since data blocks are normally represented as numeric matrices, it is referred as a matrix completion problem, and solved with optimization tools [6][7]. Nevertheless, the shortcoming of high complexities prohibits it from calculation efficiency, while the concrete framework is hardly to be further developed. More frequently, incomplete information often occurs in big data handling [34][39], e.g., database and recommendation systems, another category of solutions are referred to fill the incomplete information with data imputation methods. Furthermore, data imputation is adopted to computer vision to enhance component analysis with missing values [10][38], and incremental learning of uncertain data has also been discussed [4]. As a consequence, some professional tools have been devised for data imputation, e.g., MICE [5], Amelia II [25]. In addition, multi-view clustering has attracted broad attentions in recent years [43][29], which aims to learn the ideal partitions based on multiple representation of the common instances. It is noticeable that, the proposed method is also feasible for multi-view incomplete data analysis, owing to benefits of imputation models.

As a common limitation, clustering analysis of missing data are hardly to be addressed efficiently, as the two issues are independent problems and solved separately. Till now, there are quite a few solutions available for such issue, and most of them handle the clustering via a pre-step of data imputation [32][9]. In order to improve the performance of data imputation approaches, there is a thirst for specific clustering solution to partitions of incomplete information [26][2]. Based on fuzzy c-means clustering [11], Hathaway and Bezdek proposed a generalized clustering framework for incomplete data [24]. By devising a tolerant subspace clustering method, it is able to handle data with missing values flexibly [22]. In the literature, it has been reduced to be explained as optimization problems [42][27] with specific objective functions that are defined accordingly [1].

In this work, a novel approach to clustering of incomplete data, termed leachable component clustering (LCC), is proposed. To distinguish from existing methods, the main contributions of this work are highlighted as below.

The existing methods focus on representative clustering, while statistical patterns of incomplete data have been ignored. The proposed method seeks for optimized imputation, by exploiting the imputation models with respect to preservation of intrinsic distributions.
The proposed imputation models aim to fulfill the lost elements with the defined objectives, while fixed solutions can be obtained with calculation of the equations. Benefit from the efficiency of optimization, the ideal approximation can be calculated stably, while convergences can be achieved.
The proposed framework can be considered as a unified solution to analysis of incomplete data, and leads to the possible extensions of further advances.

The rest of this paper is organized as follows. The background knowledge of self-expressive clustering method is introduced in section II, and the basic conception of latent component models are given in section III. Then, the main idea of the proposed LCC is given in section IV. The experimental results on several incomplete data sets are given in section V. Finally, the conclusion is draw in section VI.

Ii Self-Expressive Clustering

Given a set of data instances $X = [x_{1}, x_{2}, \dots, x_{n}] \in R^{d \times n}$ , and the amount of clustering $c > 0$ . The standard k-means clustering method attempts to assign each instance $\begin{matrix} x_{i}, & i = 1, 2, \dots, n \end{matrix}$ into $c$ separate partitions, while the reconstruction errors of data can be minimized corresponding to the learned data means. Actually, it has been demonstrated that, it is equivalent for k-means and line reconstruction, and can be summarized as a common objective with a perspective of quadratic optimization problem [3]. Afterward, such idea is summarized as a self-expressive property [42][27], while appended extensions can be developed conveniently. Without loss of generality, the self-expressive problem can be defined as

\begin{matrix} J_{S E C} & = arg min Z \frac{1}{2} {∥ X - X Z ∥}^{2} + γ {∥ Z ∥}_{F}^{2} s . t . & Z^{T} 1 = 1, 0 \leq Z \leq 1. \end{matrix}

(1)

Here, $Z \in R^{n \times n}$ indicates the coefficient matrix that is able to approximately reconstruct the original data instances, $γ$ denotes the balance parameter, while the sub-conditions can avoid the trivial solution and ensure the positive of $Z$ . By extending the objective, the k-means clustering is to optimize the following function,

J_{k} = arg min Z \frac{1}{2} {∥ X - M_{x} Z ∥}^{2} + γ {∥ Z ∥}_{F}^{2},

(2)

where $M_{x} \in R^{d \times c}$ denotes the matrix consisting of data means of each cluster as one of its columns, and $Z \in R^{c \times n}$ is the assigned indicator of partitions. Obviously, the k-means clustering aims to optimize the objective function by seeking for the ideal $Z$ and updating $M_{x}$ iteratively, which can approximate the original $X$ .

Iii Latent Component Models of Data Imputation

Data imputation aims to fullfil the missing values of data instances $X$ , and certain mechnism is adopted to approximately learn the predicated values that can approach to the ground truth $˜ X$ .

Iii-a Principle Component Imputation Model

The well-known principal component analysis (PCA) [40] has been widely applied to learn the orthogonal components of the centered data block, and the main idea can be also explained as the ideal approximation of the original data in theory [13][12][21]. According to the self-expressive property, the orthogonal components of $X$ and $˜ X$ nearly share the common bases. Furthermore, the learned orthogonal components are able to make ideal reconstruction of $X$ , e.g.,

\begin{matrix} J_{P} & = arg min P {∥ ∥ X - P P^{T} X ∥ ∥}^{2} s . t . & P^{T} P = I, \end{matrix}

(3)

where $P \in R^{d \times r}$ denotes the orthogonal bases of $X$ . As a consequence, the reconstructed data can be adopted to learn the following bases accordingly, and the stable approximation is able to be obtained. With the reconstructed patterns $P P^{T} X$ , it is believed that it holds the similar components with ground truth $˜ X$ , and the ideal imputation of missing values can be estimated.

Iii-B Self-Expressive Representation Learning

Though incomplete data brings difficulties in information analysis, it is still possible to be analyzed while the dilemma of handling of original data can be avoided. Without loss of generality, representation learning attempts to learn the representative patterns of the original data with respect to specific objectives [37][36], e.g.,

f : \begin{matrix} x_{i} \to y_{i}, & i = 1, 2, \dots, n . \end{matrix}

(4)

As a consequence, the learned $y_{i}$ are adopted to further handling, which are more optimistic for incomplete representation [19][8].

More specifically, the similarity objectives of instances are normally referred [15], and thus, the objective of pseudo data can be defined as

J_{S} = arg min y \frac{1}{2} n \sum i = 1 n \sum j = 1 {∥ ∥ S i m (ˆ x_{i, j}, ˆ x_{j, i}) - S i m (y_{i}, y_{j}) ∥ ∥}^{2} .

(5)

Here, $S i m (\cdot, \cdot)$ denotes the similarity function between two instances. The ${ˆ x}_{i, j}$ and ${ˆ x}_{j, i}$ indicate the labeled instances corresponding to the valid features between two instances, which are respectively defined as

ˆ x_{i, j} = x_{i} ⊙ l_{i} ⊙ l_{j}

(6)

and

ˆ x_{j, i} = x_{j} ⊙ l_{j} ⊙ l_{i},

(7)

where $⊙$ denotes the Hadamard product [21]. Furthermore, the $l_{i}$ denotes the labels that represent the validation and missing values of instance $x_{i}$ , e.g.,

\begin{matrix} l_{i q} = {\begin{matrix} 1, & i f x_{i q} \neq 0 0, & o t h e r w i s e . \end{matrix} q = 1, 2, \dots, d . \end{matrix}

(8)

Here, $x_{i q}$ denotes the $q$ -th element of the incomplete vector $x_{i}$ . As a consequence, the learned $y_{i}$ can approximately preserve the original similarities of data pairs.

Iii-C Information Volume Preservation Model

The main idea of information volume preservation (IVP) is based on the assumption that, the informative patterns of data actually hold a nearly constant volume corresponding to different partitions. Contrarily, IVP is able to contribute to pattern unfolding and distribution learning [17][16]. In terms of this idea, it is to reserve the information contained in ground truth to be approximate to incomplete data, while reconstruction is adopted to estimate $_{i}$ .

With respect to IVP model, the feature distances of incomplete data can be estimated and nearly approximate to the characteristics of ground truth, as the lost patterns can be weakened in the high-dimensional feature space. Particularly, the reconstruction of kernel components are adopted, and the information volumes $V_{f} (x_{i})$ of kernels $h (\cdot, \cdot)$ associated with incomplete data $x_{i}, i = 1, 2, \dots, n$ , can be defined as

(9)

Here, $x_{j}$ denotes the $k$ nearest neighbors of $x_{i}$ . As a consequence, the dilemma of incomplete patterns can be alleviated, and IVP aims to preserve the obtained volumes in the fulfilled data that are to be estimated. Furthermore, it is necessary to make the data patterns be the explicit characteristics of normal instances, which can be further depicted with data similarities, e.g., euclidean or cosine distances. Accordingly, the objective of IVP model can be defined as

\begin{matrix} J_{I V P} = arg min ˜ x_{i} ∣ ∣ V (_{i}) - V_{f} (x_{i}) ∣ ∣, & i = 1, 2, \dots, n . \end{matrix}

(10)

Thus, the ideal approximation can be learned, by solving the equations and repairing incomplete data alternately.

Iv Leachable Component Learning

Derived from information preservation model, a novel approach to data imputation is devised in this work. And the estimation of data distributions are exploited to predicate the lost patterns of incomplete data.

Iv-a Bayes Alignment Model

Assume that the distribution parameters of fulfilled data are equivalent to the ones of ground truth, then it is feasible to learn the imputation by solving the latent equations of associated probabilistic models [23]. More specifically, the basic idea of the imputation model is based on the assumption that, the difference of data distributions holds equivalence between the incomplete and fulfilled instances, which can approximate to the distribution of ground truth. Distinguishingly, the Bayes alignment in the context, actually stands for the affiliation probabilities of each instance associated with certain data partitions, and are approximately represented by distributions. Nevertheless, it is worthwhile to highlight several calculation issues firstly. The sample means are to be calculated associated with the available values of incomplete data, and formally, the valid values of each instance is referred, which is defined as

\begin{matrix} m_{i} = \frac{1}{δ_{i}} n \sum j = 1 x_{j i} \cdot l_{j i}, & i = 1, 2, \dots, d . δ_{i} = n \sum j = 1 l_{j i} \end{matrix}

(11)

Accordingly, the sample variance $σ$ is to be calculated associated with the valid labels of data features. Then, the distribution of valid features of each data can be calculated according to normal distribution of samples. As a consequence, the distribution of samples with respect to a specific data $x_{i}$ can be approximately calculated as

G_{m, σ^{2}} (x_{i}) = \frac{1}{\sqrt{2 π} σ} exp (- \frac{{(x_{i} - m ⊙ l_{i})}^{T} (x_{i} - m ⊙ l_{i})}{2 σ^{2}}) .

(12)

Note that, the obtained distribution $G_{m, σ^{2}} (x_{i})$ is an estimated value of incomplete data associated with the available characteristics of data patterns.

As a consequence, it attempts to learn the ideal approximation $_{i}$ that can reserve the statistical distributions of each incomplete data, such as

\begin{matrix} J_{B A} = arg min ˜ x_{i} ∣ ∣ G_{˜ m, ˜ σ^{2}} (˜ x_{i}) - G_{m, σ^{2}} (x_{i}) ∣ ∣, & i = 1, 2, \dots, n . \end{matrix}

(13)

Furthermore, the statistical parameters of fulfilled data can be calculated naturally, and obtain the approximate $˜ m$ and $^{2}$ . There are several available calculation solutions to obtain the latent values of incomplete data with respect to the objective of $J_{B A}$ . Nevertheless, the objective of Bayes alignment imputation model can be solved with the simple calculation of equations.

Iv-B Solution of Latent Leachable Learning

The leachable imputation models aim to learn the approximate incomplete data $x_{i}, i = 1, 2, \dots, n$ with objectives of latent components. For the Bayes alignment model, it is to solve the following equation for each incomplete data,

˜ x_{i} \cdot ˜ x_{i} - 2 ˜ m \cdot ˜ x_{i} + ˜ m \cdot ˜ m + 2 {˜ σ}^{2} log (\sqrt{} 2 π ˜ σ G_{m, σ^{2}} (x_{i})) = 0.

(14)

Note that, the objective is a standard quadratic equation, but the solution to the above equation normally exists in complex domain. Nevertheless, it can be approximated in real values with numeric rotations, and achievements are promised to be obtained with efficiency.

As a consequence, the ideal solution $_{i}$ to the proposed imputation model can be calculated as

(15)

Here, $τ (\cdot)$ denotes the absolute function, such as

τ (x) = {\begin{matrix} | x |, & i f x \neq 0 0, & o t h e r w i s e . \end{matrix}

(16)

In addition, there obtains two alternative $_{i}$ for approximation in either model. During the first iteration, $_{i}$ that is close to mean $˜ m$ is chosen as the solution of equation. Then, the ideal $_{i}$ is to be updated as the one that is close to the obtained $_{i}$ in the last iteration. Nevertheless, it has been shown that either $_{i}$ can be competent to learn the ideal results. After obtaining the approximate data, it is to update the parameters and repeatedly optimize the latent components during iterations.

Furthermore, the proposed method is able to be recycled, due to its intrinsic relationship with data means. More specifically, the learned partitions can be the supposed affiliation of each instance as learning models in the next cycle, and the leachable components can be achieved with the fresh data groups. In other words, the parameters of distributions can be estimated and updated with the learned groups during each iteration, with respect to the instances that belong to the common partitions, as well as the lost patterns of instances. As a consequence, the fulfilled patterns can be achieved with repeated cycles, benefiting from the initially leachable learning.

Iv-C Discussion

Compared with existing methods, the fixed solutions can be obtained with the proposed alignment model during each iteration, and the improved results are available in the further optimization steps. Nevertheless, the optimization convergence is hardly to be achieved, while the supplemented data can be approximated to the original data with respect to global distribution. As a consequence, the similarity function can be defined in accordance with Eq. (14), and the lost elements can be estimated by solving the approximate equation. Note that, the proposed supplementation models mainly rely on the solutions to line equations, which can be efficiently calculated with low computational complexities.

Similarly, the original IVP imputation model aims to solve the equation associated with the reservation of information volumes, e.g.,

\begin{matrix} k \sum j = 1 (˜ x_{i} \cdot ˜ x_{i} - 2 \cdot x_{j} \cdot ˜ x_{i} + x_{j} \cdot x_{j}) - k \sum j = 1 S_{f} (x_{i}, x_{j}) = k \cdot ˜ x_{i} \cdot ˜ x_{i} - 2 \cdot ˜ x_{i} \cdot k \sum j = 1 x_{j} + k \sum j = 1 x_{j} \cdot x_{j} - k \sum j = 1 S_{f} (x_{i}, x_{j}) = 0 \end{matrix}

(17)

where $S_{f}$ denotes the similarities of incomplete data between $x_{i}$ and $x_{j}$ in feature space, which is defined as

S_{f} (x_{i}, x_{j}) = S i m_{f} (ˆ x_{i}, ˆ x_{j}) .

(18)

Nevertheless, it is hardly to achieve convergence in a common step, owing to independent optimization of each instance, which is necessary to calculate the information volumes associated with each instance. In practice, it is alleviated by setting an upper bound of iterations, and sampling is adopted to reduce the complexities.

Fig. 1: The obtained Normalized Mutual Information (NMI) of different algorithms on four data sets. (a) Birds (b) Firewall (c) Flower (d) Monkey.

Fig. 2: The obtained F Score of different algorithms on four data sets. (a) Birds (b) Firewall (c) Flower (d) Monkey.

Fig. 3: The obtained Rand Index of different algorithms on four data sets. (a) Birds (b) Firewall (c) Flower (d) Monkey.

Data	Partition	Dimensionality	Instance
Birds	20	50	3625
Firewall	3	11	3000
Flower	5	50	4323
Monkey Species	10	50	1098

TABLE I: The details of data sets

V Experiments

In this section, the performance of the proposed methods are to be evaluated on several artificial data sets. Four data sets are employed in the experiments, including Internet Firewall¹¹1http://archive.ics.uci.edu/ml/datasets/Internet+Firewall+Data, 100 Birds²²2https://www.kaggle.com/gpiosenka/100-bird-species, Flower Recognition³³3https://www.kaggle.com/alxmamaev/flowers-recognition, 10 Monkey Species⁴⁴4https://www.kaggle.com/slothkong/10-monkey-species. All experiments are performed on the hardware of 2.9 GHz CPU with six cores and 16 GB RAM. During the experiments, partial instances of some data sets are employed, and the deep and reduced representations of image data sets are extracted, which consist of normalized patterns of 50 dimensions for each instance. For the Internet Firewall data set, the 1,000 instances of each category are randomly selected, and the image data of 20 categories in bird species are selected. In summary, the details of involved data sets are given in Tab. I. Several state-of-the-art methods associated with data imputation of latent component models are involved to make a comparison of clustering performance, which are given as follows.

Baseline k-means clustering [28]: The k-means clustering on raw incomplete data.
Nearest neighborhood fulfilled clustering (NNC) [3]: The NN imputation based clustering of incomplete data. To reduce complexity, the average fulfillment from three random neighbors are preferred in the experiments.
Principle components fulfilled clustering (PCC) [13][12][38]: The PC imputation based clustering of incomplete data.
Self-expressive clustering (SEC) with incomplete data [42][27]: The SEC method on raw incomplete data.
Information volume preservation fulfilled clustering, namely IVPC(1) and IVPC(2): The proposed IVP imputation followed by the standard self-expressive and k-means clustering respectively.
Bayes alignment fulfilled clustering, namely BAC(1) and BAC(2): The BA imputation followed by the self-expressive and k-means clustering respectively.

For each data set, thirty percent of whole elements are randomly selected and set to be null to make the incomplete data. Then, the data supplementation and standard SEC are performed by following each algorithm. Note that, the initial partitions of instances are quite important for optimized clustering, while random initialization is employed in the experiments. To alleviate the occasional sense, all experiments are repeated five times and the average results are calculated and recorded as the outputs. The obtained results of the first experiment are given in Fig. 1-3, corresponding to three clustering measure.

According to the experimental results, the best NMI results are obtained by BAC method on all four data sets, while better outputs are achieved with F measure. The obtained results of NNC and PCC are quite similar to each other, as well as the Baseline. In other words, these two methods produce the similar fulfillment for data imputation, and close performance are given for clustering. Furthermore, the outputs of IVPC are different from each other, corresponding to different clustering mechanism. And totally, k-means based IVPC is able to give better partitions compared with SEC in most cases, especially on the Firewall and Flower data sets. Though k-means based BAC is better with NMI and F measures on Firewall and Flower, the inferior results are obtained if the Rand Index measure is referred. Nevertheless, it is noticeable that, the obtained Rand Index of different methods are quite close to each other, and it is hardly to make conclusion.

Fig. 4: The obtained Normalized Mutual Information (NMI) of different algorithms associated with different percent of null elements on four data sets. (a) Birds (b) Firewall (c) Flower (d) Monkey.

Fig. 5: The obtained F Score of different algorithms associated with different percent of null elements on four data sets. (a) Birds (b) Firewall (c) Flower (d) Monkey.

Fig. 6: The time complexities (seconds) of different algorithms associated with different percent of null elements on four data sets. (a) Birds (b) Firewall (c) Flower (d) Monkey.

In the second experiment, the performance of different methods with variational null elements are evaluated. More specifically, 0.1-0.8 percent of elements are randomly set to be null on four data sets, then all algorithms are repeatedly performed, and the average outputs are recorded as the results. In the experiment, the SEC based IVP method is performed to learn the data imputation associated with different percent of incomplete data. The NMI and F Score results are shown in Fig. 4-5, while Rand Index results are ignored due to the limited space.

In terms of the results, the performance of most algorithms are quite similar to each other, and the fluctuating tendency of average results are reduced slowly as more unavailable elements are appended into experiments. Obviously, all methods are affected by the involved amount of null elements, and the results of NNC, PCC, and SEC are quite close, owing to the fact that the final outlets depend on either imputation and clustering, especially on Birds and Firewall. Compared with SEC based BAC, k-means BAC is superior than other methods, while IVPC is able to achieve comparable results.

In addition, the compuational costs associated with different percent of null elements are observed by recording the average time complexities of different methods, which is given in Fig. 6. Nevertheless, IVPC is ignored due to its high complexities of calculation of information volumes for each instance.

In terms of the observations, the proposed BAC can present the ideal performance if k-means is adopted to learn the partitions. Furthermore, stable time complexities are given by all methods as increasing parts of null elements, which discloses the efficiency of the involved methods. Since orthogonal decomposition is necessary for PCC, the most time complexities are required. Owing to random selection of neighbors, NNC method is able to present the optimistic efficiency on Birds and Flower data sets. Furthermore, SEC needs more computational times, as it is hardly to achieve stable convergence for self-expressive learning.

Vi Conclusion

Clustering attempts to partition data instances into several groups associated with the maximum similarities of common characteristics. Furthermore, incomplete data frequently occurs in many real-world applications, e.g, digital data conveying and information processing, and brings perverse influence on data handling with missing values. In this work, a novel approach to clustering of incomplete data is proposed, which normally fulfill the incomplete data by leachable component learning.

More specifically, the proposed model exploits the similarities of distributions between the repaired and incomplete patterns, and alignment framework is adopted to learn the fulfilled elements. Similar to predication of Bayes distribution, it is able to afford clustering of incomplete data with estimation of distribution parameters.

Experiments on diverse artificial data sets show that, the proposed method is able to give the outstanding performance compared with the state-of-the-art methods, while calculation efficiency is still held. The future works mainly focus on the advances of the proposed method to other topics of information handling with incomplete data, as well as the relative learning of pattern analysis.

Acknowledgement

The authors would like to thank anonymous reviewers for their constructive suggestions, and Firat University, General Dynamics, and Alexander Mamaev for providing data sets online. This work was partly supported by Innovation and Talent Foundation of Guangxi Province of China (RZ1900007485, AD19110154), and Natural Science Foundation of China (62172177). The corresponding author of this work is Miao Cheng.

References

[1] V. AUdigier, N. Niang, and M. R. Rigon (2014) Generative adversarial nets. In Proc. International Conference on Neural Information Processing, Cambridge, USA, pp. . Cited by: §I.
[2] V. AUdigier, N. Niang, and M. R. Rigon (2018) Optimal clustering with missing values. In Proc. International Conference on Bioinformatics, Computational Biology, and Health Informatics, Washington D. C., USA, pp. . Cited by: §I.
[3] C. M. Bishop (2006) Pattern recognition and machine learning. edition, Springer, . Cited by: §I, §II, 2nd item.
[4] M. Brand (2002) Incremental singular value decomposition of uncertain data with missing values. In Proc. European Conference on Computer Vision, Copenhagen, Denmark, pp. . Cited by: §I.
[5] S. V. Buuren and K. G. Oudshoorn (2011) MICE: multivariate imputation by chained equations in r. Journal of Statistical Software 45 (3), pp. 1–67. Cited by: §I.
[6] E. J. Candés and Y. Plan (2010) Matrix completion with noise. Proceedings of the IEEE 98 (6), pp. 925–936. Cited by: §I.
[7] E. J. Candés and B. Recht (2009) Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9 (6), pp. 717–772. Cited by: §I.
[8] E. J. Candes, J. Romberg, and T. Tao (2006) Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Information Theory 52 (2), pp. 489–509. Cited by: §III-B.
[9] Z. Charles, A. Jalali, and R. Willett (2018) Sparse subspace clustering with missing and corrupted data. In Proc. IEEE Data Science Workshop, Lausanne, Switzerland, pp. . Cited by: §I.
[10] P. Chen and D. Suter (2004) Recovering the missing components in a large noisy low-rank matrix: application to sfm. IEEE Trans. Pattern Analysis and Machine Intelligence 26 (8), pp. 1051–1063. Cited by: §I.
[11] M. Cheng, B. Fang, J. Wen, and Y. Y. Tang (2010) Marginal discriminant projections: an adaptable margin discriminant approach to feature reduction and extraction. Pattern Recognition Letters 31 (13), pp. 1965–1974. Cited by: §I.
[12] M. Cheng, Z. Liu, H. Zou, and A. C. Tsoi (2014) A family of maximum margin criterion for adaptive learning. In Proc. International Conference on Neural Information Processing, Siem Reap, Cambodia, pp. . Cited by: §III-A, 3rd item.
[13] M. Cheng, Y. Y. Tang, and C. M. Pun (2011) Nonparametric feature extraction via direct maximum margin. In Proc. International Conference on Machine Learning and Applications, Honolulu, USA, pp. . Cited by: §III-A, 3rd item.
[14] M. Cheng and A. C. Tsoi (2016) CRH: a simple benchmark approach to continuous hashing. In Proc. Global Conference on Signal and Information Processing, Orlando, USA, pp. . Cited by: §I.
[15] M. Cheng and X. You (2020) Adaptive matching of kernel means. In Proc. International Conference on Pattern Recognition, Milan, Italy, pp. . Cited by: §III-B.
[16] M. Cheng, F. Zhou, H. Zhang, H. Zou, and J. Wu (2021) Function approximation for adaptive learning of label distributions. In Proc. International Conference on Signal and Image Processing, Nanjing, China, pp. . Cited by: §III-C.
[17] M. Cheng (2020) IVP-ldl: label distribution learning via preservation of information volumes. In Proc. International Conference on Advanced Computational Intelligence, Dali, China, pp. . Cited by: §III-C.
[18] J. K. Dixon (1979) Pattern recognition with partly missing data. IEEE Trans. System, Man, and Cybernetics 9 (10), pp. 617–621. Cited by: §I.
[19] D. L. Donoho (2006) Compressive sensing. IEEE Trans. Information Theory 52 (4), pp. 1289–1306. Cited by: §III-B.
[20] R. O. Duda, P. E. Hart, and D. G. Stork (2000) Pattern classification. edition, Wiley, California, USA. Cited by: §I.
[21] C. F. V. L. G. H. Golub (2013) Matrix computations. Four Edition edition, John Hopkins University Press, Baltimore, USA. Cited by: §III-A, §III-B.
[22] S. Gunnemann, E. Muller, S. Raubach, and T. Seidl (2011) Flexible fault tolerant subspace clustering for data with missing values. In Proc. IEEE International Conference on Data Mining, Vancouver, Canada, pp. . Cited by: §I.
[23] T. Hastie, R. Tibshirani, and J. Friedman (2011) The elements of statistical learning: data mining, inference, and predication. 2nd Edition edition, Springer, . Cited by: §IV-A.
[24] R. J. Hathaway and J. C. Bezdek (2001) Fuzzy c-means clustering of incomplete data. IEEE Trans. System, Man, and Cybernetics, Part B: Cybernetics 31 (5), pp. 735–744. Cited by: §I.
[25] J. Honaker, G. King, and M. Blackwell (2011) Amelia ii: a program for missing data. Journal of Statistical Software 45 (7), pp. 1–47. Cited by: §I.
[26] E. R. Hruschka and N. F. F. E. E. R. Hruschka Jr (2004) Towards efficient imputation by nearest-neighbors: a clustering based approach. In Proc. Austrilian Joint Conference on Artificial Intelligence, Cairns, Australia, pp. . Cited by: §I.
[27] Z. Kang, C. Peng, and Q. Cheng (2017) Twin learning for similarity and clustering: a unified kernel approach. In Proc. AAAI Conference on Artificial Intelligence, San Francisco, USA, pp. . Cited by: §I, §II, 4th item.
[28] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu (2002) An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans. Pattern Analysis and Machine Intelligence 24 (7), pp. 881–892. Cited by: §I, 1st item.
[29] Y. Lin, Y. Gou, Z. Liu, B. Li, J. Lv, and X. Peng (2021) COMPLETER: incomplete multi-view clustering via contrastive prediction. In Proc. International Conference on Computer Vision and Pattern Recognition, Nashville, USA, pp. . Cited by: §I.
[30] R. J. A. Little and D. B. Rubin (2019) Statistical analysis with missing data. 3rd Edition edition, John Wiley & Sons, Cambridge, UK. Cited by: §I.
[31] S. Lloyd (1982) Least squares quantization in pcm. IEEE Trans. Information Theory 28 (2), pp. 129–137. Cited by: §I.
[32] Q. Ma, Y. Gu, W. C. Lee, and G. Yu (2019) Order-sensitive imputation for clustered missing values. In Proc. IEEE International Conference on Data Engineering, Macao, China, pp. . Cited by: §I.
[33] L. Z. Manor and P. Perona (2004) Self-tuning spectral clustering. In Proc. International Conference on Neural Information Processing Systems, Vancouver, Canada, pp. 849–856. Cited by: §I.
[34] H. Nagashima and Y. Kato (2020) Method for selecting a data imputation model based on programming by example for data analysis. In Proc. IEEE International Conference on Big Data, Atlanta, USA, pp. . Cited by: §I.
[35] A. Ng, M. I. Jordan, and Y. Weiss (2001) On spectral clustering: analysis and an algorithm. In Proc. International Conference on Neural Information Processing Systems, Vancouver, Canada, pp. . Cited by: §I.
[36] S. T. Roweis and L. K. Saul (2000) Nonlinear dimensionality reduction by locally linear embedding. Nature 290 (5500), pp. 2323–2326. Cited by: §III-B.
[37] N. Saeed, H. Nam, M. I. U. Haq, and D. B. M. Saqib (2019) A survey on multidimensional scaling. ACM Computing Surveys 51 (3), pp. 1–25. Cited by: §III-B.
[38] H. Y. Shum, K. Ikeuchi, and R. Reddy (1995) Principle component analysis with missing data and its application to polyhedral object modeling. IEEE Trans. Pattern Analysis and Machine Intelligence 17 (9), pp. 854–867. Cited by: §I, 3rd item.
[39] S. Song, Y. Sun, A. Zhang, L. Chen, and J. Wang (2018) Enriching data imputation under similarity rule constraints. IEEE Trans. Knowledge and Data Engineering 32 (2), pp. 275–287. Cited by: §I.
[40] M. Turk and A. Pentland (1991) Eigenfaces for recognition. Journal of Cognitive Neuroscience 3 (1), pp. 71–86. Cited by: §III-A.
[41] R. Vidal (2011) Subspace clustering. IEEE Signal Processing Magazine 28 (2), pp. 52–68. Cited by: §I.
[42] C. Zhang, Q. Hu, H. Fu, P. Zhu, and X. Cao (2015) Low-rank tensor constrained multiview subspace clustering. In Proc. IEEE International Conference on Computer Vision, Santiago, USA, pp. . Cited by: §I, §II, 4th item.
[43] W. Zhu, J. Lu, and J. Zhou (2019) Structured general and specific multi-view subspace clustering. Pattern Recognition 93 (), pp. 392–403. Cited by: §I.

Leachable Component Clustering ††thanks: *Corresponding author.