Binary Representation via Jointly Personalized Sparse Hashing
Abstract.
Unsupervised hashing has attracted much attention for binary representation learning due to the requirement of economical storage and efficiency of binary codes. It aims to encode high-dimensional features in the Hamming space with similarity preservation between instances. However, most existing methods learn hash functions in manifold-based approaches. Those methods capture the local geometric structures (i.e., pairwise relationships) of data, and lack satisfactory performance in dealing with real-world scenarios that produce similar features (e.g. color and shape) with different semantic information. To address this challenge, in this work, we propose an effective unsupervised method, namely Jointly Personalized Sparse Hashing (JPSH), for binary representation learning. To be specific, firstly, we propose a novel personalized hashing module, i.e., Personalized Sparse Hashing (PSH). Different personalized subspaces are constructed to reflect category-specific attributes for different clusters, adaptively mapping instances within the same cluster to the same Hamming space. In addition, we deploy sparse constraints for different personalized subspaces to select important features. We also collect the strengths of the other clusters to build the PSH module with avoiding over-fitting. Then, to simultaneously preserve semantic and pairwise similarities in our proposed JPSH, we incorporate the proposed PSH and manifold-based hash learning into the seamless formulation. As such, JPSH not only distinguishes the instances from different clusters, but also preserves local neighborhood structures within the cluster. Finally, an alternating optimization algorithm is adopted to iteratively capture analytical solutions of the JPSH model. We apply the proposed representation learning algorithm JPSH to the similarity search task. Extensive experiments on four benchmark datasets verify that the proposed JPSH outperforms several state-of-the-art unsupervised hashing algorithms.
1. Introduction
Representation learning has attracted extensive research attention in affective applications, such as affective image retrieval (Yao et al., 2019, 2020; Zhao et al., 2014), emotion classification (Chen et al., 2015; Rao et al., 2020, 2019) and facial recognition (Shi et al., 2020). Traditional representation learning usually bases on real-valued features, which will cause a waste of time and space. Recently, hashing representation also known as binary representation has been popular and achieved promising performances (Xiang et al., 2021; Jin et al., 2018; Tang et al., 2018). Furthermore, unsupervised hashing refers to the technology that maps high-dimensional features to compact binary codes without label information (Li et al., 2022; Liu et al., 2014; Hao et al., 2017; Hu et al., 2020; Zhang et al., 2020), and the similarity relationship between instances can be approximated by the Hamming distance. Therefore, unsupervised hashing with the low storage and efficient computation can be widely used in representation learning tasks, especially in the image similarity search tasks, which aims to retrieve some related samples from the dataset (Zhu et al., 2018; Jin et al., 2021).
Existing unsupervised hashing methods can be categorized into data-independent and learning-based hashing. Previous works mainly focus on finding suitable projections to produce optimal binary codes, e.g., Locality Sensitive Hashing (LSH) (Datar et al., 2004) and Min-wise Hashing (Min-Hash) (Indyk, 2001). Such methods model learning processes without using any data structure and distribution in the original space, and require long hash bits to achieve satisfactory results.
In contrast, learning-based hashing methods are gaining popularity in recent years. These methods preserve similarity relationships between instances via different viewpoints. For example, Iterative Quantization with Principal Component Analysis (PCA) (PCA-ITQ) (Gong et al., 2012) imposes a rotation matrix to iteratively reduce quantization errors between low-dimensional features and binary codes. Sparse Projections (SP) (Xia et al., 2015) incorporates a sparse regularizer to reduce the number of parameters and computational cost. Ordinal Embedding Hashing (OEH) (Liu et al., 2016) and Ordinal Constraint Hashing (OCH) (Liu et al., 2018) preserve the ranking information by embedding the ordinal relation among data points. Concatenation Hashing (CH) (Weng and Zhu, 2020) encourages that any two instances close to the same center point are close, whilst maintaining relative positions in the Hamming space. Recovery of Subspace Structures Hashing (RSSH) (Tian et al., 2020) preserves the semantic similarity in the Hamming space via an unsupervised multi-stage hashing model.
In addition, some manifold-based hashing methods have been proposed to capture the complex structures of data based on graph learning. Spectral Hashing (SH) (Weiss et al., 2008) converted the binary code learning to the graph partitioning. Zhu et al. proposed the Sparse Embedding and Least Variance Encoding (SELVE) (Zhu et al., 2014) to encode the sparse embedding vector over a learned dictionary. Anchor Graph Hashing (AGH) (Liu et al., 2011) built an anchor graph (Liu et al., 2010), and it obtained the tractable low-rank adjacency matrix. The matrix was used to measure the similarity between a pair of data points and a small number of anchor points, which were the K-means clustering centers because of the strong representation power of the category attributes. To preserve the underlying manifold structure with the t-SNE (Van der Maaten and Hinton, 2008) that is a modification of stochastic neighborhood embedding, Inductive Manifold Hashing (IMH) (Shen et al., 2013) was propsed. Some researchers developed a tractable alternating maximization algorithm to preserve the neighborhood structure inherent in the data, called Discrete Graph Hashing (DGH) (Liu et al., 2014). Jiang et al. implicitly computed the similarity graph matrix by feature transformation, and then proposed the Scalable Graph Hashing (SGH) (Jiang and Li, 2015) method. Graph PCA (gPCA) Hashing (Zhu et al., 2017) simultaneously preserved local structures via manifold learning and global structures via PCA. Locally Linear Hashing (LLH) (Irie et al., 2014) reconstructed the locally linear structures of manifolds in the binary Hamming space with locality-sensitive coding. Compared to LLH, Discrete Locality Linear embedding Hashing (DLLH) (Ji et al., 2017) directly reconstructed binary codes by maintaining the local linear relationship of data points. Considering the -norm term, Jointly Sparse Hashing (JSH) (Lai et al., 2018) could minimize the information loss. Unsupervised Discrete Hashing (UDH) (Jin et al., 2021) captured the semantic information by a balanced graph semantic loss for exploring both the similar and dissimilar relationships among data. By considering the semantic information, Li et al. proposed a weakly-supervised hashing method for image retrieval and achieved state-of-the-art performance (Li et al., 2020).
However, most of existing manifold-based hashing methods do not always produce satisfactory results in practice because instances with similar features (e.g. color and shape) may have different semantics. An example is shown in Figure 1 (a), where a cat (row 1, column 2) is visually similar to a dog (row 2, column 2) instead of the other cat (row 1, column 1). On the other hand, there is a large difference in color between the black horse (row 3, column 1) and the white horse (row 3, column 2), which may be mistakenly treated as two different animals. Existing methods model the neighboring graph in hash functions to reflect the pairwise relationship between two instances. For example, SH constructs the Laplacian graph to describe the relationship between two instances. AGH creates a modified version of SH by introducing the anchor graph (Liu et al., 2010). JSH uses a -regularized term to minimize the information loss of low-dimensional features and binary codes on the basis of AGH. Although the above methods can effectively preserve the local neighborhood structure from the high-dimensional feature space (shown in Figure 2 (a)) to a low-dimensional space (shown in Figure 2 (b)), such methods fail to consider semantics of instances that correspond to the intrinsic geometric structures of data (Zhang et al., 2019; Yang et al., 2018; Lin et al., 2015; Fernandez-Beltran et al., 2020).
To tackle the above problems, in this paper, we propose an effective unsupervised hashing method, namely Jointly Personalized Sparse Hashing (JPSH), to learn the binary representation for similarity search. Unlike existing manifold hashing methods that only preserve the pairwise relationship, the proposed JPSH encodes both semantic and pairwise similarities into the Hamming space, which can effectively distinguish different clusters while preserving local structures within clusters, as illustrated in Figure 2 (c). The proposed JPSH framework is shown in Figure 1, and the main contributions of our work can be summarized as follows:
-
A novel Personalized Sparse Hashing (PSH) module is proposed to preserve the semantics of instances that correspond to the intrinsic structures of data. It learns personalized subspaces that reflect the special categorical characteristics, and then defines sparse constraints for each personalized subspaces. Thus, the PSH can adaptively map instances within the same cluster to the same Hamming space and learn more discriminative features. To our knowledge, this is the first work that preserves semantic similarity in a personalized subspace.
-
We incorporate PSH and a standard manifold-based hashing method JSH in a seamless formulation to construct the proposed JPSH model, which simultaneously preserves semantic and pairwise similarities in the Hamming space.
The rest of this paper is organized as follows. Section 2 introduces some related works. Section 3 presents the details of our JPSH including the proposed personalized sparse hashing, the overall loss function, the discussions, the optimization algorithm, the out-of-sample extension. Section 4 shows experimental results and analyses, followed by the conclusion in Section 5.
2. Related Works
In this section, we first define some used notations, and then introduce several related works, including Spectral Hashing, Anchor Graph Hashing and Jointly Sparse Hashing.
2.1. Notations
In the following, we use bold uppercase letters (e.g. ) for matrices, bold lowercase letters (e.g. ) for vectors, and normal lowercase letters (e.g. z) for scalars. If there are no special instructions, we define the -th row of the matrix as , the -th column of the matrix as , the -th element of or the -th element of as . We also denote the transpose of the matrix as , and the trace of the matrix as if it is a square matrix. , , and represent the -norm, -norm, and Frobenius norm of the matrix , respectively. denotes the Kronecker product between matrices and . In this paper, the training set is denoted as , , where is the number of training samples, and is the dimension of features. The binary codes , , where , is the length of bits, and , is the number of anchor points. is the identity matrix with the size of -by-.
2.2. Spectral Hashing
SH (Weiss et al., 2008) aims to extract the discriminative information from the original space to learn binary codes based on the graph theory. With this objective, it minimizes the errors of binary codes between instances with Laplacian graph as follows:
(1) |
where is an all-ones vector, , is the affinity matrix, is a pre-defined parameter, and is binary codes of the -th training sample. It is time-consuming to compute for large-scale datasets. In addition, the information loss between low-dimensional features and binary codes will increase due to the binarization operation.
2.3. Anchor Graph Hashing
To reduce the computational complexity in Eq. (1), AGH (Liu et al., 2011) designs a truncated similarity matrix between all instances and anchor points. The -th element of is defined as:
(2) |
where is the -th anchor point; is the index set of neighboring anchor points of , and is a pre-defined parameter. Although AGH has the linear training time for the developed model, it still fails to control the information loss between low-dimensional features and binary codes in the binary representation learning.
2.4. Jointly Sparse Hashing
To minimize the information loss between low-dimensional feature vectors and binary codes, JSH (Lai et al., 2018) presents -regularized regression formulation, and the objective function can be defined as follows:
(3) |
where is the balance parameter, is the pairwise weight matrix; is the rotation matrix. Although JSH effectively handles the pairwise projection from a high-dimensional space to the Hamming space, the semantics of instances are unknown and motivate our work reported in this paper.
3. Jointly Personalized Sparse Hashing
In this section, we present the details of the proposed JPSH, including semantic-preserving Personalized Sparse Hashing (PSH), overall objective function of JPSH, discussions, optimization, the out-of-sample extension.
3.1. Personalized Sparse Hashing
The semantic structure can reflect the intrinsic geometric relationship across different data points within a specific category. The proposed PSH thinks that instances with similar semantic structures should be adaptively projected to the same Hamming space, and vice versa, as shown in Figure 1 (b). For this purpose, one simple yet effective way is to build different personalized subspaces (Weinberger et al., 2009) to capture different semantic structures. Retrieving label information is an expensive process in terms of time, labor and human expertise, and we define semantic structures in a high-dimensional space by pseudo labels produced by K-means. The instances with similar semantic information likely share the same anchor point. Therefore, we derive a set of anchor points as the training set of PSH and obtaining the following objective function:
(4) |
where is a personalized weight matrix for the anchor point and . is a rotation matrix to minimize quantization errors, driven by ITQ (Gong et al., 2012).
Since the binarization operation for low-dimensional features and binary codes helps increase the information loss, we consider using the feature sparsity for personalized weight to select important features. Exclusive group lasso (EGL) (Kong et al., 2014, 2016) encourages intra-cluster competition but discourages inter-cluster competition. Inspired by EGL we first impose a -norm regularization on personalized weight for pursuing the sparsity of intra-cluster features. Then, we create a -norm term to constrain the non-sparsity of inter-cluster features. This regularization term enables us to perform optimal feature selection for each cluster. However, moderating the personalized weight with a single anchor point may over-fit with a poor generalization ability. Therefore, we force the anchor point to lend certain strengths from its neighboring anchors to learn the personalized weight , motivated by the network lasso penalty (Hallac et al., 2015). Finally, we re-write the loss function of PSH as follows:
(5) |
where and are balance parameters, is the similarity matrix of anchors, and the -th element of is defined as follows:
(6) |
where is a given parameter, and is the index set of nearest neighbors of the anchor point .
3.2. Overall Objective Function of JPSH
As aforementioned, the PSH model shown in Eq. (5) can effectively preserve particular attributes of each cluster from the high-dimensional space into the Hamming space. Instances within the same cluster can be projected by the same personalized weight matrix, resulting in directly maintaining the semantics of instances that correspond to intrinsic structures of data into the Hamming space.
To further improve the representation capacity of binary codes, we also hope to preserve the local neighborhood structure within the cluster in the Hamming space. To this end, we maintain the pairwise similarity via a manifold-based hashing learning method. Since JSH (Lai et al., 2018) can quantify the similarity between each instance and its neighboring anchor points, and greatly reduce the time complexity, we thereby joint Eqs. (3) and (5) to construct the Jointly Personalized Sparse Hashing (JPSH) model. The overall objective function of JPSH is defined as:
(7) | ||||
where is the balance parameter for projected matrix .
3.3. Discussions
In the following, we will discuss how our proposed JPSH learns the binary representation and preserves discriminative information compared with other hashing methods from two aspects.
On the one hand, compared to existing manifold-based hashing methods, such as SH and JSH, our proposed JPSH not only considers local pairwise similarity in the original feature space , but also the intrinsic semantic similarity of the data. In particular, Eq. (3) shows that JSH can effectively reduce the computational complexity using truncated similarity matrix , and avoid information loss via . Therefore, we maintain local neighborhood structures within each cluster by JSH, as the local structure information in the two clusters (purple points and blue points shown in the Figure 2 (b)). However, JSH has difficulty distinguishing instances with similar features but different semantic information. Comparing Eq. (7) against Eq. (3), we learn a set of personalized weights for different anchors , and employ the -norm sparse constraint on each . So, instances with the same pseudo-label can be adaptively mapped to the personalized subspace through the unique projection learning of personalized weights. Therefore, the proposed JPSH can avoid misrepresenting instances of similar features with different semantics by directly retaining the semantic relationship of the original feature space.
On the other hand, compared against those multi-stage hashing methods, such as PCA-ITQ and RSSH, we construct the seamless hash objective function Eq. (7). It can avoid suboptimal solutions generated by using multiple independent learning strategies. For example, PCA-ITQ first applies PCA to perform linear dimensionality reduction, and then uses an alternating algorithm for refining the initial orthogonal transformation to reduce quantization errors. RSSH preserves semantic relationships on three subproblems: multi-subspace learning, similarity matrix construction, and semantics preserved hashing. Different from the above methods, we construct personalized weight matrix , pairwise weight matrix and two rotation matrices and in the seamless formulation Eq. (7). Thus, our proposed JPSH can significantly avoid the suboptimal solution, and obtain discriminative hash codes in binary representation learning.
3.4. Optimization
The optimization of Eq. (7) is intractable and non-convex w.r.t all parameters simultaneously. Therefore, we adopt an alternating optimization algorithm to iteratively update all parameters till the algorithm converges to an acceptable settlement.
Update Personalized Weight : First, we update the personalized weight when , , and are fixed. The subproblem of Eq. (7) w.r.t is:
(8) |
For the convenience, we devide Eq. (8) into three functions: , , . We denote , and , where is a diagonal function. Since , the derivative of w.r.t can be computed as follows:
(9) |
According to Refs. (Li et al., 2017a, 2018), we have the derivative of w.r.t as follows:
(10) |
where is a diagonal matrix, and the -th diagonal element can be defined as:
(11) |
where is a very small constant to ensure the derivative solvable, is an indicator function, and if belongs to , otherwise .
The derivative of w.r.t. is computed as follows:
(12) |
where is an identity matrix. is square matrix, and the -th element of can be expressed as:
(13) |
Combining , and together, and setting it to be zero, we have a closed-form solution of the personalized weight as follows:
(14) |
Update Pairwise Weight : We attempt to update the pairwise weight when other four parameters are fixed. Removing terms that are irrelevant to , we have the following function as:
(15) |
According to Ref. (Lai et al., 2018), since the and , the term between and is an identity matrix. So, the closed-form solution of is:
(16) |
where is a diagonal matrix, and the -th element of is:
(17) |
where is the -th row of the pairwise weight .
Update Rotation Matrix : To compute , we fix , , and , and then solve the following maximization problem:
(18) |
According to (Zou et al., 2006; Li et al., 2017b), we compute by Singular Value Decomposition (SVD), and the updating scheme of is described below:
(19) |
Update Rotation Matrix : To derive the solution of , we fix the other four parameters and re-form the maximization problem as follows:
(20) |
We use SVD to solve the problem , and then retain the solution of as follows:
(21) |
Update Binary Codes : Finally, we update the parameter when others have been fixed and solve the optimization problem as follows:
(22) |
Thus, the solution to is:
(23) |
where is a sign function.
With the above handling, the pseudo codes of our proposed JPSH method is summarized in Algorithm 1.
3.5. Out-of-Sample Extension
After having optimized the overall objective function of JPSH, we perform similarity prediction based on four parameters , , and . More specifically, for each query instance , we first find the personalized weight corresponding to the anchor point using the minimum Euclidean distance, for the query instance . Then, we joint the personalized weight , pairwise weight , and two rotation matrices and to learn discriminative binary codes , that is: .
Observing , we can find that each hash code is constrained by and . The first term transforms the anchor corresponding to by the personalized weight matrix . Since the personalized weight matrices are different for different clusters, the hash codes generated by the first trem have the semantic separability. From Eq. (7), we can observe that the pairwise weight matrix is produced under the constraints of pairwise similarity matrix . Thus, the second trem reflects the local neighborhood structures among data. Integrating the first term and the second term into the program of producing , we can obtain the binary codes fused the semantic and pairwise similarities.
4. Experimental WorkS
To verify the effectiveness of the proposed binary representation algorithm JPSH, we use it in the task of similarity search. We conduct extensive experiments on four widely-used benchmark image datasets, and compare with several state-of-the-art hash algorithms.
4.1. Datasets
The proposed method and the comparative ones are evaluated on the following four datasets, and some example images of each dataset are shown in Figure 3. MNIST (LeCun et al., 1998) contains 70,000 handwritten digit images from 10 classes in total. Each image is re-shaped to a 784-dimensional feature vector. We construct the testing set by randomly selecting 100 images per class, and the remaining images are composed as the training set (Lin et al., 2018). CIFAR-10 (Krizhevsky et al., 2009) consists of 60,000 tiny images of 10 classes. Following (Hu et al., 2018), 512-dimensional GIST features (Oliva and Torralba, 2001) are retrieved. We randomly select 100 images per class to form the testing set, and the remaining 5900 images per class form the training set. NUS-WIDE (Chua et al., 2009) is a real-world dataset containing 269,648 images related to 81 ground truth concepts. We pick the most 21 frequent concepts for evaluation. For each category, 100 images are randomly sampled to form the testing set, and 100,000 images from the remaining images form the training set. 4096-dimensional CNN features (Krizhevsky et al., 2017) are firstly extracted, and then 1000-dimensional PCA features are generated to represent each image. FLICKR25K (Huiskes and Lew, 2008) is also a real-world dataset including 25,000 images of 24 categories. We randomly select 1000 images to form the testing set, and 19,015 images for the training set (Dong et al., 2021). Following the setting of NUS-WIDE, we extract 1000-dimensional PCA features to represent each image.
4.2. Settings
For the proposed JPSH, the test range of parameters , and is , the test range of parameter is , and the test range of parameter is . Based on the test results, we empirically set , , and on MNIST and CIFAR-10, and set , and on NUS-WIDE and FLICKR25K, while for NUS-WIDE, and for FLICKR25K. In addition, we set in Eq. (6) for MNIST and CIFAR-10 datasets, and set for FLICKR25K and NUS-WIDE datasets. We test the system with different lengths of codes ranged in the set of . As for the evaluation, five standard evaluation metrics, i.e. mean Average Precision (mAP), the mean average precision of the top 100 testing samples (Pre@100), Precision-Recall curve, precision vs. top-N positions (Precision@N) curve, and recall vs. top-N positions (Recall@N) curve, are used to evaluate the performance. All results shown here are produced with Hamming radius 2, being the average of 10 run times. The baseline methods used in this evaluation are under no domain adaption assumption. The baseline methods consist of 10 state-of-the-art unsupervised hashing methods: LSH (Datar et al., 2004), SH (Weiss et al., 2008), PCA-ITQ (Gong et al., 2012), SP (Xia et al., 2015), SGH (Jiang and Li, 2015), OEH (Liu et al., 2016), OCH (Liu et al., 2018), JSH (Lai et al., 2018), CH (Weng and Zhu, 2020) and RSSH (Tian et al., 2020). All hashing methods are examined using Matlab on a PC with 3.6GHz and 64G RAM.
4.3. Ablation Studies
To analyze the roles of pairwise and semantic similarities played in JPSH, we perform ablation studies on the CIFAR-10 and FLICKR25K datasets. The experimental results are depicted in Table 1. From this table, we have three clear observations: Firstly, JPSH achieves better performance than JSH and PSH. The reason is that JPSH learns and creates a seamless hash function, jointly considering pairwise and semantic similarities. Compared to JSH, the proposed JPSH considers further the semantics of instances corresponding to the intrinsic geometric structures. In addition, the JPSH retains the superior performance of the traditional manifold hashing method JSH comparsed with the PSH. Secondly, closely looking at JSH and PSH, we observe that PSH performs worse than JSH. One possible reason is that pseudo labels determined by K-means cannot accurately represent the semantic information of the high-dimensional data, comparing to the ground truths. Third, only taking PSH into consideration, we discover that PSH can achieve decent performance, especially on real-world FLICKR25K dataset. This verifies that the proposed PSH can retain the semantic similarity of the high-dimensional space into the Hamming space in a certain extent, without any manually tagged labels. Overall, these well-designed studies exhibit the effectiveness of combining JSH-based pairwise and PSH-based semantic similarity to a seamless formulation.
Datasets | CIFAR-10 | FLICKR25K | ||
---|---|---|---|---|
mAP | Pre@100 | mAP | Pre@100 | |
JSH(Lai et al., 2018) | 0.1561 | 0.3001 | 0.6667 | 0.8909 |
PSH | 0.1133 | 0.1613 | 0.5632 | 0.8316 |
JPSH | 0.1606 | 0.3275 | 0.6936 | 0.9176 |
Datasets | Metrics | Bits | LSH(Datar et al., 2004) | SH(Weiss et al., 2008) | PCA-ITQ(Gong et al., 2012) | SP(Xia et al., 2015) | SGH(Jiang and Li, 2015) | OEH(Liu et al., 2016) | OCH(Liu et al., 2018) | JSH(Lai et al., 2018) | CH(Weng and Zhu, 2020) | RSSH(Tian et al., 2020) | JPSH |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MNIST |
mAP |
8 | 0.1808 | 0.2651 | 0.3589 | 0.3404 | 0.3111 | 0.2920 | 0.2376 | 0.3204 | 0.3030 | 0.3940 | 0.4332 |
16 | 0.2094 | 0.2661 | 0.4042 | 0.3780 | 0.3526 | 0.3100 | 0.3073 | 0.4066 | 0.3688 | 0.4128 | 0.5561 | ||
32 | 0.2633 | 0.2606 | 0.4378 | 0.4178 | 0.3898 | 0.3497 | 0.3727 | 0.5118 | 0.4317 | 0.4183 | 0.6184 | ||
64 | 0.3199 | 0.2400 | 0.4490 | 0.4401 | 0.4180 | 0.3727 | 0.4107 | 0.5613 | 0.4112 | 0.4204 | 0.6548 | ||
96 | 0.3592 | 0.2367 | 0.4596 | 0.4472 | 0.4282 | 0.3889 | 0.4253 | 0.5675 | 0.4318 | 0.4222 | 0.6666 | ||
128 | 0.3757 | 0.2330 | 0.4627 | 0.4510 | 0.4358 | 0.4014 | 0.4339 | 0.5758 | 0.4401 | 0.4245 | 0.6659 | ||
Pre@100 |
8 | 0.4372 | 0.4859 | 0.7165 | 0.7057 | 0.6835 | 0.6040 | 0.5581 | 0.7213 | 0.6644 | 0.7843 | 0.8635 | |
16 | 0.4871 | 0.4904 | 0.7999 | 0.7630 | 0.7300 | 0.6523 | 0.6786 | 0.8112 | 0.7424 | 0.7122 | 0.9396 | ||
32 | 0.6102 | 0.4732 | 0.8468 | 0.8169 | 0.7828 | 0.7233 | 0.7461 | 0.8954 | 0.7954 | 0.6795 | 0.9614 | ||
64 | 0.7010 | 0.4416 | 0.8533 | 0.8451 | 0.8004 | 0.7470 | 0.7794 | 0.9279 | 0.7641 | 0.6750 | 0.9619 | ||
96 | 0.7565 | 0.4379 | 0.8685 | 0.8524 | 0.8111 | 0.7915 | 0.7931 | 0.9370 | 0.7914 | 0.6737 | 0.9646 | ||
128 | 0.7832 | 0.4370 | 0.8656 | 0.8501 | 0.8139 | 0.7945 | 0.8026 | 0.9447 | 0.8072 | 0.6768 | 0.9656 | ||
CIFAR-10 |
mAP |
8 | 0.1102 | 0.1232 | 0.1421 | 0.1234 | 0.1382 | 0.1408 | 0.1314 | 0.1290 | 0.1413 | 0.1342 | 0.1380 |
16 | 0.1134 | 0.1266 | 0.1472 | 0.1265 | 0.1497 | 0.1489 | 0.1494 | 0.1561 | 0.1447 | 0.1428 | 0.1606 | ||
32 | 0.1230 | 0.1252 | 0.1561 | 0.1281 | 0.1592 | 0.1561 | 0.1618 | 0.1607 | 0.1560 | 0.1485 | 0.1812 | ||
64 | 0.1306 | 0.1254 | 0.1485 | 0.1268 | 0.1684 | 0.1646 | 0.1705 | 0.1785 | 0.1658 | 0.1533 | 0.1848 | ||
96 | 0.1334 | 0.1250 | 0.1547 | 0.1308 | 0.1713 | 0.1671 | 0.1750 | 0.1845 | 0.1713 | 0.1544 | 0.1925 | ||
128 | 0.1373 | 0.1249 | 0.1576 | 0.1333 | 0.1734 | 0.1694 | 0.1765 | 0.1865 | 0.1738 | 0.1574 | 0.1895 | ||
Pre@100 |
8 | 0.1517 | 0.1899 | 0.2612 | 0.2019 | 0.2382 | 0.2486 | 0.2157 | 0.2187 | 0.2601 | 0.2138 | 0.2413 | |
16 | 0.1583 | 0.1906 | 0.2730 | 0.1961 | 0.2539 | 0.2652 | 0.2661 | 0.3001 | 0.2740 | 0.2443 | 0.3275 | ||
32 | 0.1915 | 0.1760 | 0.3063 | 0.1979 | 0.2839 | 0.2876 | 0.2989 | 0.3313 | 0.2894 | 0.2626 | 0.3672 | ||
64 | 0.2103 | 0.1771 | 0.2720 | 0.1942 | 0.3171 | 0.3118 | 0.3244 | 0.3657 | 0.3154 | 0.2819 | 0.3910 | ||
96 | 0.2120 | 0.1748 | 0.2988 | 0.2047 | 0.3263 | 0.3177 | 0.3371 | 0.3765 | 0.3318 | 0.2828 | 0.4109 | ||
128 | 0.2235 | 0.1754 | 0.3083 | 0.2047 | 0.3319 | 0.3247 | 0.3424 | 0.3846 | 0.3351 | 0.2949 | 0.4015 | ||
NUS-WIDE |
mAP |
8 | 0.3427 | 0.4293 | 0.5260 | 0.5063 | 0.4725 | 0.4864 | 0.4152 | 0.4425 | 0.4321 | 0.3475 | 0.5521 |
16 | 0.3598 | 0.4067 | 0.5243 | 0.5032 | 0.4600 | 0.4780 | 0.4672 | 0.5511 | 0.4712 | 0.3614 | 0.6027 | ||
32 | 0.3873 | 0.3830 | 0.5230 | 0.5055 | 0.4601 | 0.4844 | 0.5046 | 0.5876 | 0.4783 | 0.3952 | 0.6200 | ||
64 | 0.4138 | 0.3689 | 0.5286 | 0.5133 | 0.4815 | 0.4903 | 0.5350 | 0.6138 | 0.4758 | 0.4126 | 0.6271 | ||
96 | 0.4346 | 0.3674 | 0.5344 | 0.5189 | 0.4971 | 0.4964 | 0.5431 | 0.6169 | 0.4855 | 0.4164 | 0.6293 | ||
128 | 0.4479 | 0.3769 | 0.5397 | 0.5224 | 0.5059 | 0.5019 | 0.5487 | 0.6224 | 0.4947 | 0.4211 | 0.6340 | ||
Pre@100 |
8 | 0.6660 | 0.7121 | 0.8258 | 0.8173 | 0.7485 | 0.8033 | 0.7479 | 0.7588 | 0.7529 | 0.6644 | 0.8313 | |
16 | 0.6875 | 0.6998 | 0.8243 | 0.8142 | 0.7408 | 0.7991 | 0.7805 | 0.8420 | 0.7796 | 0.6752 | 0.8636 | ||
32 | 0.7144 | 0.6805 | 0.8240 | 0.8105 | 0.7525 | 0.8026 | 0.8107 | 0.8695 | 0.7795 | 0.6935 | 0.8762 | ||
64 | 0.7357 | 0.6809 | 0.8265 | 0.8179 | 0.7729 | 0.8018 | 0.8342 | 0.8929 | 0.8024 | 0.6983 | 0.8841 | ||
96 | 0.7578 | 0.6865 | 0.8322 | 0.8225 | 0.7862 | 0.8001 | 0.8427 | 0.8877 | 0.8026 | 0.6947 | 0.8860 | ||
128 | 0.7638 | 0.6986 | 0.8362 | 0.8255 | 0.7935 | 0.8117 | 0.8439 | 0.8948 | 0.8060 | 0.6979 | 0.8885 | ||
FLICKR25K |
mAP |
8 | 0.5802 | 0.6321 | 0.6949 | 0.6828 | 0.6539 | 0.6653 | 0.6360 | 0.6206 | 0.6711 | 0.5674 | 0.6790 |
16 | 0.5927 | 0.6174 | 0.6911 | 0.6818 | 0.6476 | 0.6677 | 0.6606 | 0.6667 | 0.6736 | 0.5792 | 0.6936 | ||
32 | 0.6100 | 0.6023 | 0.6907 | 0.6819 | 0.6539 | 0.6698 | 0.6859 | 0.6875 | 0.6735 | 0.6028 | 0.7138 | ||
64 | 0.6246 | 0.5923 | 0.6909 | 0.6834 | 0.6653 | 0.6687 | 0.7017 | 0.7011 | 0.6819 | 0.6420 | 0.7253 | ||
96 | 0.6403 | 0.5928 | 0.6923 | 0.6864 | 0.6742 | 0.6759 | 0.7070 | 0.7042 | 0.6838 | 0.6521 | 0.7313 | ||
128 | 0.6443 | 0.5928 | 0.6950 | 0.6884 | 0.6805 | 0.6799 | 0.7115 | 0.7085 | 0.6866 | 0.6546 | 0.7331 | ||
Pre@100 |
8 | 0.8417 | 0.8701 | 0.9079 | 0.9027 | 0.8874 | 0.8991 | 0.8773 | 0.8603 | 0.8959 | 0.8361 | 0.9119 | |
16 | 0.8487 | 0.8633 | 0.9055 | 0.9004 | 0.8754 | 0.8926 | 0.8920 | 0.8909 | 0.8930 | 0.8398 | 0.9176 | ||
32 | 0.8587 | 0.8590 | 0.9039 | 0.8998 | 0.8803 | 0.8941 | 0.9057 | 0.9077 | 0.8938 | 0.8482 | 0.9269 | ||
64 | 0.8684 | 0.8561 | 0.9036 | 0.8992 | 0.8836 | 0.8930 | 0.9127 | 0.9142 | 0.9008 | 0.8652 | 0.9291 | ||
96 | 0.8745 | 0.8565 | 0.9049 | 0.9014 | 0.8892 | 0.8957 | 0.9151 | 0.9167 | 0.9029 | 0.8688 | 0.9307 | ||
128 | 0.8789 | 0.8597 | 0.9060 | 0.9022 | 0.8932 | 0.8978 | 0.9181 | 0.9187 | 0.9050 | 0.8710 | 0.9305 |
4.4. Results and Discussions
For the MNIST dataset, the first two rows in Table 1 illustrate the mAP and Pre@100 of our proposed JPSH and the baselines with different hash bits. Figure 4 (a, b, c) show the Precision-Recall curve, Precision@N and Recall@N curves on 64 bits, respectively. Going through these results, we have the following observations: (1) JPSH consistently outperforms all baselines in all cases on this dataset with large gaps, demonstrating its effectiveness on the binary representation in dealing with similarity search tasks. (2) Compared against the second-best method JSH, our proposed method improves and at least for mAP and Pre@100, respectively. This shows that JPSH outperforms the standard manifold-based hashing method JSH, and generates more discriminative binary codes by jointly learning semantic and pairwise similarities in the Hamming space. (3) RSSH shows better performance than manifold-based hashing methods in low hash bits but worse than JPSH. This indicates that semantic preservation helps to learn the discriminative information, especially that for our proposed method. (4) Figure 4 (a, b, c) show that JPSH achieves consistent performance improvement in terms of Precision-Recall curve, Precision@N and Recall curves on 64 bits, indicating the effectiveness of our JPSH.
For the CIFAR-10 dataset, mAP and Pre@100 are reported in the third and fourth rows of Table 1, and Figure 4 (d, e, f) show the Precision-Recall curve, Precision@N and Recall@N curves based on 64 bits, respectively. From these results, we have the observations similar to those of MNIST: (1) Compared with all baselines, JPSH achieves the best results of 16 to 128 bits in terms of mAP and Pre@100, demonstrating that JPSH can learn discriminative binary codes by maintaining semantic structures in different personalized subspaces. (2) Of 8 bits, PCA-ITQ achieves the best performance. However, the performance of JPSH and PCA-ITQ is very close, whilst JPSH is competitive to the other methods. (3) In Figure 4 (d, e, f), JPSH matches the best baseline in terms of Precision-Recall curve and Recall@N curve, but little low than RSSH on Precision@N. It is worth noting that JPSH ourperforms JSH in terms of Precision@N and Recall@N curves. This indicates that PSH can help to map instances within the same cluster to the same Hamming space in a certain extent.
For the NUS-WIDE dataset, we report the results of mAP and Pre@100 in the fifth and sixth rows of Table 1. The Precision-Recall, Precision@N, and Recall@N curves are shown in Figure 4 (g, h, i). Regarding those results, we have similar observations to the above cases: (1) JPSH also outperforms all baselines in all cases in terms of mAP on this real-world dataset. It significantly surpasses all baselines of low hash bits (8 to 32 bits) in terms of Pre@100, and little lower than JSH of high hash bits (64 to 128 bits). As a whole, these results demonstrate the effectiveness and robustness of the semantic- and pairwise-preserving scheme used in JPSH. (2) From Figure 4 (g, h, i), we witness that JPSH has a consistent improvement compared with all baselines. This justifies that our proposed JPSH discovers more discriminative structural information in binary representation learning.
For the FLICKR25K dataset, the last two rows of Table 1 show results of mAP and Pre@100, and curves of Precision-Recall, Precision@N and Recall@N are shown in Figure 4 (j, k, l). Again, we have observations similar to the above three datasets: (1) JPSH obtains the best results compared with all baselines in terms of mAP and Pre@100, demonstrating the effectiveness and robustness of our proposed JPSH in the binary representation learning process. (2) Figure 4 (j, k, l) show that our proposed JPSH outperforms all compared methods in terms of Precision-Recall and Recall@N curves, slightly lower than some compared methods in terms of Precision@N curve. Furthermore, JPSH is always better than JSH in Figure 4 (k). Therefore, our JPSH can learn better representative binary codes for similarity search.
4.5. Semantic Analysis
In JPSH, we use pseudo labels to distinguish different clusters, and then map instances within the same cluster to the same Hamming space. In order to verify that our JPSH indeed maintains a certain semantic similarity, the training set of PSH is replaced by random samples called JPSH. The results of mAP and Pre@100 between JPSH and JPSH are shown in Fig 5 and Fig 6, respectively. From those figures, we can observe that the mAP and Pre@100 of JPSH are obviously below that of JPSH, particularly in CIFAR-10 and NUS-WIDE datasets. These results indicate that K-means is good for extracting discriminative information with exploring the semantic structures. Moreover, our proposed JPSH successfully retains those structures into the Hamming space, and JPSH can not be.
4.6. Sparsity Analysis
As shown in Figure 7, to study the sparsity of JPSH, we depict the visualization of personalized weight matrix processed by function with size , and . It can be seen that each matrix contains some green columns that are equal to zero due to the sparsity constraint of . It means that those corresponding features are ignored in a projection operation. Thus, we can eliminate some redundant features for binary representation learning. For example, the first column of matrices in Figure 7 (a), (b) and (c) are all the zero vector, which means that the first feature of data is malfunction to distinguish the differences between data. Another benefit of sparsity lies in that it can increase the interpretability of the JPSH model. We can know that which features provide the discriminative information, while those features do not. Therefore, the sparsity of works in our JPSH model.
4.7. Sensitivity Analysis
We discuss the sensitivity of our JPSH over parameters , , , and . The setting ranges of the corresponding parameters are described in Section 4.2. As an example, Figure 8 (a, b, c) show the sensitivity of the , and on the FLICKR25K dataset with 64 bits. Figure 8 (a) and (b) show that the mAPs are over 0.71 within a large wide ranges w.r.t and . In addition, the mAPs fluctuated within a certain range indicate the terms of and can influence the performance of the JPSH. We set and as 10 for the FLICKR25K dataset because of the relatively higher and more stable performance. From Figure 8 (c), mAPs exceed 0.69 in the range of , and when equals to , the mAP surpasses 0.72. Therefore, we set the best for the FLICKR25K dataset. In addition, Figure 8 (d) shows mAPs versus and on the FLICKR25K dataset with 64 bits, suggesting that the JPSH is not sensitive to and , either. The sensitivity analysis of , , , and on MNIST, CIFAR-10 and NUS-WIDE datasets is similar with that on the FLICKR25K dataset.
4.8. Convergence Analysis
Figure 8 (e) and (f) show the convergence curves of our JPSH method on CIFAR-10 and FLICKR25K datasets with 32 bits. The objective function of the JPSH is well converging, and quickly converges to an optimal solution within 20 iterations in all datasets. From Figure 8 (e), we witness that the objective loss function values settle after 4 iterations. Thus, we set the time iteration of our JPSH for CIFAR-10 dataset as . Analogously, based on Figure 8 (f), we set for FLICKR25K dataset. For MNIST and NUS-WIDE datasets, we empirically set iteration times to be and , respectively.
4.9. Time Complexity
The overall training time complexity of the JPSH is mainly composed by three parts: (1) the building of matrix , which the computational complexity is ; (2) the time-consuming of computing matrix is ; (3) For each iteration, the computational complexity of Eq. (14) is ; of Eq. (16) is ; of performing SVD on rotation matrices and both are ; and of Eq. (23) is . Generally, due to the length of binary codes and anchor points are usually smaller, the total computational complexity of our JPSH is . It is linear with the number of training samples. Note that the whole computational process is mainly based on matrix multiplications, which can be computed in parallel for accelerations. It is reasonably efficient on the binary representation learning. To quantitatively compare the computational complexity, we evaluate the training time-consumation of OEH, RSSH and JPSH on CIFAR-10 dataset in terms of 32, 64 and 96 bits. Table 3 shows the time complexity and running times of OEH, RSSH and JPSH. Obviously, we could find that JPSH needs much less time than the multi-stage hashing method RSSH while little higher than OEH. It is noted that the JPSH yields higher mAP values than OEH, which are shown in Table 2. Therefore, the time cost of our proposed JPSH model is acceptable.
Methods | Time Complexity | Running Times (seconds) | ||
---|---|---|---|---|
32 bits | 64 bits | 96 bits | ||
OEH(Liu et al., 2016) | 13.3385 | 21.9931 | 43.9122 | |
RSSH(Tian et al., 2020) | 75.3624 | 80.4510 | 86.0708 | |
JPSH | 45.0945 | 53.9658 | 62.6788 |
5. Conclusion
In this paper, we proposed an effective unsupervised hashing method for binary representation learning, namely Jointly Personalized Sparse Hashing (JPSH), which jointly preserves semantic and pairwise similarities in the Hamming space. Firstly, we developed a novel Personalized Sparse Hashing (PSH) module that adaptively maps different clusters to corresponding personalized subspaces to maintain the identical semantics. Then, we jointed PSH and the manifold-based model JSH to construct JPSH, which can preserve semantic and pairwise similarities in a seamless formulation. Finally, an alternating optimization method was adopted to iteratively solve one variable by fixing the others. Extensive experiments on four benchmark image datasets had been conducted. The results verified that our proposed JPSH outperforms the other state-of-the-art unsupervised hashing methods in the similarity search task.
Acknowledgements.
This work was supported in part by the Guanxi Natural Science Foundation under grants (2019GXNSFFA245014, 2020GXNSFBA238014, 2020GXNSFAA297061), the National Natural Science Foundation of China under grants (62172120, 62002082, 62071174), the Natural Science Foundation of Hunan Province under Grant (2020JJ3014), and the Guangxi Key Research and Development Project (AB21220037).References
- Learning deep features for image emotion classification. In Proceedings of the IEEE International Conference on Image Processing, pp. 4491–4495. Cited by: §1.
- NUS-wide: a real-world web image database from national university of singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval, pp. 1–9. Cited by: §4.1.
- Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Annual Symposium on Computational Geometry, pp. 253–262. Cited by: §1, §4.2, Table 2.
- Unsupervised deep k-means hashing for efficient image retrieval and clustering. IEEE Transactions on Circuits and Systems for Video Technology 31 (8), pp. 3266–3277. Cited by: §4.1.
- Unsupervised remote sensing image retrieval using probabilistic latent semantic hashing. IEEE Geoscience and Remote Sensing Letters 18 (2), pp. 256–260. Cited by: §1.
- Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (12), pp. 2916–2929. Cited by: §1, §3.1, §4.2, Table 2.
- Network lasso: clustering and optimization in large graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 387–396. Cited by: §3.1.
- Unsupervised t-distributed video hashing and its deep hashing extension. IEEE Transactions on Image Processing 26 (11), pp. 5531–5544. Cited by: §1.
- Discrete spectral hashing for efficient similarity retrieval. IEEE Transactions on Image Processing 28 (3), pp. 1080–1091. Cited by: §4.1.
- Creating something from nothing: unsupervised knowledge distillation for cross-modal hashing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3123–3132. Cited by: §1.
- The mir flickr retrieval evaluation. In Proceedings of the ACM International Conference on Multimedia Information Retrieval, pp. 39–43. Cited by: §4.1.
- A small approximately min-wise independent family of hash functions. Journal of Algorithms 38 (1), pp. 84–90. Cited by: §1.
- Locally linear hashing for extracting non-linear manifolds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2115–2122. Cited by: §1.
- Toward optimal manifold hashing via discrete locally linear embedding. IEEE Transactions on Image Processing 26 (11), pp. 5411–5420. Cited by: §1.
- Scalable graph hashing with feature transformation. In Proceedings of the International Joint Conference on Artificial Intelligence, Vol. 15, pp. 2248–2254. Cited by: §1, §4.2, Table 2.
- Deep semantic-preserving ordinal hashing for cross-modal similarity search. IEEE Transactions on Neural Networks and Learning Systems 30 (5), pp. 1429–1440. Cited by: §1.
- Unsupervised discrete hashing with affinity similarity. IEEE Transactions on Image Processing 30, pp. 6130–6141. Cited by: §1, §1.
- Exclusive feature learning on arbitrary structures via l1,2-norm. Advances in Neural Information Processing Systems 27, pp. 1655–1663. Cited by: §3.1.
- Uncorrelated group lasso. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1765–1771. Cited by: §3.1.
- Learning multiple layers of features from tiny images. (7). Cited by: §4.1.
- Imagenet classification with deep convolutional neural networks. Communications of the ACM 60 (6), pp. 84–90. Cited by: §4.1.
- Jointly sparse hashing for image retrieval. IEEE Transactions on Image Processing 27 (12), pp. 6147–6158. Cited by: Figure 2, §1, §2.4, §3.2, §3.4, §4.2, Table 1, Table 2.
- Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §4.1.
- Unsupervised personalized feature selection. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3514–3521. Cited by: §3.4.
- Toward personalized relational learning. In Proceedings of the SIAM International Conference on Data Mining, pp. 444–452. Cited by: §3.4.
- Structure-adaptive neighborhood preserving hashing for scalable video search. IEEE Transactions on Circuits and Systems for Video Technology 32 (4), pp. 2441–2454. Cited by: §1.
- Large graph hashing with spectral rotation. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2203–2209. Cited by: §3.4.
- Weakly-supervised semantic guided hashing for social image retrieval. International Journal of Computer Vision 128 (8), pp. 2265–2278. Cited by: §1.
- Supervised online hashing via hadamard codebook learning. In Proceedings of the ACM International Conference on Multimedia, pp. 1635–1643. Cited by: §4.1.
- Semantics-preserving hashing for cross-view retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3864–3872. Cited by: §1.
- Ordinal constraint binary coding for approximate nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (4), pp. 941–955. Cited by: §1, §4.2, Table 2.
- Towards optimal binary code learning via ordinal embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1258–1265. Cited by: §1, §4.2, Table 2, Table 3.
- Large graph construction for scalable semi-supervised learning. In Proceedings of the International Conference on International Conference on Machine Learning, pp. 679–686. Cited by: §1, §1.
- Discrete graph hashing. Advances in Neural Information Processing Systems 27, pp. 3419–3427. Cited by: §1, §1.
- Hashing with graphs. In Proceedings of the International Conference on Machine Learning, pp. 1–8. Cited by: §1, §2.3.
- Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision 42 (3), pp. 145–175. Cited by: §4.1.
- Learning multi-level deep representations for image emotion classification. Neural Processing Letters 51 (3), pp. 2043–2061. Cited by: §1.
- Multi-level region-based convolutional neural network for image emotion classification. Neurocomputing 333, pp. 429–439. Cited by: §1.
- Inductive hashing on manifolds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1562–1569. Cited by: §1.
- Towards universal representation learning for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6817–6826. Cited by: §1.
- Discriminative deep quantization hashing for face image retrieval. IEEE Transactions on Neural Networks and Learning Systems 29 (12), pp. 6154–6162. Cited by: §1.
- Unsupervised hashing based on the recovery of subspace structures. Pattern Recognition 103, pp. 107261. Cited by: §1, §4.2, Table 2, Table 3.
- Visualizing data using t-sne.. Journal of Machine Learning Research 9 (11). Cited by: §1.
- Feature hashing for large scale multitask learning. In Proceedings of the Annual International Conference on Machine Learning, pp. 1113–1120. Cited by: §3.1.
- Spectral hashing. Advances in Neural Information Processing Systems 21, pp. 1753–1760. Cited by: §1, §2.2, §4.2, Table 2.
- Concatenation hashing: a relative position preserving method for learning binary codes. Pattern Recognition 100, pp. 107151. Cited by: §1, §4.2, Table 2.
- Sparse projections for high-dimensional binary codes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3332–3339. Cited by: §1, §4.2, Table 2.
- Sub-region localized hashing for fine-grained image retrieval. IEEE Transactions on Image Processing 31, pp. 314–326. Cited by: §1.
- Semantic structure-based unsupervised deep hashing. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 1064–1070. Cited by: §1.
- Adaptive deep metric learning for affective image retrieval and classification. IEEE Transactions on Multimedia 23, pp. 1640–1653. Cited by: §1.
- Attention-aware polarity sensitive embedding for affective image retrieval. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1140–1150. Cited by: §1.
- Deep unsupervised hybrid-similarity hadamard hashing. In Proceedings of the ACM International Conference on Multimedia, pp. 3274–3282. Cited by: §1.
- SADIH: semantic-aware discrete hashing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 5853–5860. Cited by: §1.
- Affective image retrieval via multi-graph learning. In Proceedings of the ACM International Conference on Multimedia, pp. 1025–1028. Cited by: §1.
- Exploring auxiliary context: discrete semantic transfer hashing for scalable image retrieval. IEEE Transactions on Neural Networks and Learning Systems 29 (11), pp. 5264–5276. Cited by: §1.
- Graph pca hashing for similarity search. IEEE Transactions on Multimedia 19 (9), pp. 2033–2044. Cited by: §1.
- A sparse embedding and least variance encoding approach to hashing. IEEE Transactions on Image Processing 23 (9), pp. 3737–3750. Cited by: §1.
- Sparse principal component analysis. Journal of Computational and Graphical Statistics 15 (2), pp. 265–286. Cited by: §3.4.