Application of Data Encryption in Chinese Named Entity Recognition

Kaifang Long, Jikun Dong, Shengyu Fan, Yanfang Geng,
Yang Cao, Han Zhao, Hui Yu, Weizhi Xu
School of Information Science and Engineering, Shandong Normal University,
Jinan 250358, China
School of Business, Shandong Normal University, Jinan 250358, China
1642445417@qq.com, xuweizhi@sdnu.edu.cn

Abstract

Recently, with the continuous development of deep learning, the performance of named entity recognition tasks has been dramatically improved. However, the privacy and the confidentiality of data in some specific fields, such as biomedical and military, cause insufficient data to support the training of deep neural networks. In this paper, we propose an encryption learning framework to address the problems of data leakage and inconvenient disclosure of sensitive data in certain domains. We introduce multiple encryption algorithms to encrypt training data in the named entity recognition task for the first time. In other words, we train the deep neural network using the encrypted data. We conduct experiments on six Chinese datasets, three of which are constructed by ourselves. The experimental results show that the encryption method achieves satisfactory results. The performance of some models trained with encrypted data even exceeds the performance of the unencrypted method, which verifies the effectiveness of the introduced encryption method and solves the problem of data leakage to a certain extent.

1 Introduction

Deep learning achieves state-of-the-art performance for many natural language processing (NLP) tasks such as named entity recognition (NER). However, many datasets used for training the deep learning model contain sensitive information. For example, datasets in the biomedical field usually consist of electronic medical records, which generally include identification information, disease information and treatment plans for patients. Many people or groups worry about the leak of their private data and are unwilling to disclose their data. Therefore, data absence is severe in many fields for model training. Most enterprises have problems such as a limited amount of data and poor data quality and cannot support the realization of artificial intelligence (AI) technology. In order to solve the above problem, federated learning [31, 19, 16, 28, 23, 14] came into being.

However, federated learning has some shortcomings. 1) The data is not centralized. 2) It is impossible to foresee and avoid unstable connections between networks. Once the network is disconnected, the learning process will time out or exit abnormally. 3) Since a federated learning system requires multi-party collaboration, users cannot change models at will, training is slow, and model training demands a high hardware configuration. 4) Although federated learning has been applied in some practical business scenarios, the technology is still far from large-scale deployment. To tackle the above problems, we propose a deep learning framework based on multiple data encryptions for the NER task. The proposed framework can overcome some shortcomings of federated learning and avoid privacy leakage to some degree.

NER is an important task in NLP, which identifies useful entities in the unstructured text, such as person name, place name, organization name, time, etc. The performance of NER seriously affects its downstream tasks, including relation extraction [34], knowledge graph [3], question answering [6], etc. With the emergence of dynamic word embeddings such as BERT [5], the application of recurrent neural network (RNN) suitable for time series modeling and conditional random field (CRF) with label constraints improves the performance of English NER [44, 48].

Compared with English NER, Chinese NER faces more difficulties. 1) There is no natural space as a character-to-character separator in Chinese sequences. 2) Because of the different lengths of entities in Chinese sentences, it is difficult to determine the boundaries of entities. Even if Chinese word segmentation tools can obtain the boundaries of some entities with different granularities, the error propagation caused by word segmentation cannot be avoided. In the wake of the development of pre-trained language models and character-word-based models like Lattice LSTM [49], the performance is satisfactory on resource-rich Chinese datasets [49, 45, 22, 35, 15] such as MSRA and Resume.

As an important international language, Chinese is unique in many ways. Chinese language, culture, and history are receiving more and more international attention and study, but there is a shortage of datasets about Chinese history in the Chinese NER field. The lack of data and applications poses challenges for scholars studying Chinese history for the first time and the construction of knowledge graphs on Chinese history. Based on the above, we construct a new Chinese historical dataset.

Meanwhile, we conduct experiments using multiple encryption methods on six datasets in biomedicine, news and history domains. The experiments show that the performance of encrypted data is satisfactory. This proves that our approach ensures the accuracy of the deep learning models and prevents data leakage to a certain extent. Our main contributions are as follows.

We introduce the hash algorithms and the ciphertext policy attribute-based encryption (CP-ABE) to NER for the first time. Experiments on six datasets show that our proposed multi-encryption strategy can ensure the performance of the model and protect the data to some degree.
We have an interesting finding that the performance does not degrade significantly when encrypting the training data for the NER task.
We release a new historical dataset for Chinese named entity recognition. It provides a foundation for recognizing entities in historical documents and building knowledge graphs.

2 Related Work

2.1 Named Entity Recognition

Named entity recognition aims to quickly extract entity information of specific types from complex natural language texts, which provides the foundation for information extraction to generate structured data. The earliest methods of NER are rule-based methods, lexicon-based methods and statistical machine learning-based methods, such as the Support Vector Machine Model (SVM) [13], Hidden Markov Model (HMM) [50], Conditional Random Field (CRF) [38], and so on. However, these methods suffer from feature engineering and fail to achieve the desired results for NER.

With the development of artificial intelligence, numerous neural network methods have been applied and achieved good results for named entity recognition. At that time, researchers mainly used LSTM-CRF or CNN-CRF models to encode and decode the input features [17, 21, 30, 4, 40, 46, 47]. But there are some issues with these methods. 1) The character-based methods do not use dictionary information. 2) The word-based method will cause error propagation due to word segmentation errors.

Accordingly, Zhang and Yang proposed a character-word-based hybrid model, Lattice LSTM [49], in 2018. This model integrates word information into the characters of the input sequence, which further improves the performance of the NER task. Many researchers have since derived models from this approach. For example, [27] proposed the WC-LSTM model, [11] proposed the LR-CNN model, and [29] proposed the SoftLexicon method. [7, 41, 12] used graph neural networks to improve NER performance. It is well known that NER tasks rely heavily on word embeddings, and the quality of the word embeddings can determine the performance of the model. The emergence of dynamic word embeddings such as BERT pushed the performance of NER to a new level. For instance, [25] and [32] proposed the FLAT model and the Porous Lattice Transformer Encoder, respectively, both based on Lattice LSTM.

2.2 Data protection

Data protection [37, 9, 33] is introduced into named entity recognition to keep the private information contained in a dataset from being widely read and leaked. Data leakage is the intentional or unintentional disclosure or loss of data to untrustworthy third parties. This paper focuses on protecting private data using data desensitization and access control strategies to prevent data leakage. Data desensitization is a crucial method to remove the sensitivity of personal privacy data and minimize the risk of data leakage. It mainly processes the original data by means of data replacement, data randomization, data encryption, and hash transformation. As our desensitization step, we apply a hash transformation to the NER datasets. Access control generally means that the data provider sets up security rules or policies, and users can obtain and decrypt the encrypted data according to their permissions or attributes.

Figure 1: The overall architecture of the proposed method.

3 Method

To avoid widespread reading and leakage of data while still ensuring that users can build machine learning models according to their needs, we propose an NER framework based on multi-encrypted data. As shown in Figure 1, the framework consists of three steps. In the first step, the data provider encrypts the original data using hash algorithms. In the second step, we apply ciphertext-policy attribute-based encryption (CP-ABE) on top of the hash encryption to achieve double data protection and prevent illegal users from accessing the data. In the third step, legitimate users obtain the data and use it for model training and prediction.

3.1 Data encryption

Hash functions are used to encrypt data in our model framework in Figure 1. To verify the authenticity and validity of encrypted data training and encrypt the same data using more than one encryption method, we also introduce Serial Cipher and Base64 methods besides hash functions.

Serial Cipher [39], also called Stream Cipher, is one of the symmetric cryptographic algorithms. It encrypts the plaintext by adding the key. Since serial encryption has the advantages of simple implementation and fast encryption speed, we take it as an optional method for data encryption. Meanwhile, the Caesar Cipher and Affine Cipher can be consistent with the Serial Cipher by adjusting the coefficients. We grouped the three encryption algorithms into one category.
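The relation between the three ciphers in this category can be sketched with a single affine map over symbol indices. This is only an illustrative sketch: the function names, the Latin alphabet, and the coefficients `a` and `b` are assumptions, since the keys and symbol set used in the experiments are not specified here. With `a = 1` the map reduces to adding a fixed key `b`, which is the Caesar shift.

```python
from math import gcd

def affine_encrypt(text, a, b, alphabet):
    """Map each symbol index x in `alphabet` to (a * x + b) mod m."""
    m = len(alphabet)
    if gcd(a, m) != 1:
        raise ValueError("a must be coprime with the alphabet size to be invertible")
    index = {ch: i for i, ch in enumerate(alphabet)}
    return "".join(alphabet[(a * index[ch] + b) % m] for ch in text)

def affine_decrypt(cipher, a, b, alphabet):
    """Invert affine_encrypt using the modular inverse of a."""
    m = len(alphabet)
    a_inv = pow(a, -1, m)  # modular inverse, available in Python 3.8+
    index = {ch: i for i, ch in enumerate(alphabet)}
    return "".join(alphabet[(a_inv * (index[ch] - b)) % m] for ch in cipher)

# Hypothetical alphabet and keys, chosen only for illustration.
alphabet = "abcdefghijklmnopqrstuvwxyz"
caesar = affine_encrypt("ner", a=1, b=3, alphabet=alphabet)  # a = 1: pure additive shift
affine = affine_encrypt("ner", a=5, b=8, alphabet=alphabet)  # general affine case
recovered = affine_decrypt(affine, a=5, b=8, alphabet=alphabet)
```

For Chinese text, the alphabet would be the character vocabulary of the dataset rather than the 26 Latin letters used above.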

As we all know, Base64 [18] is the most common encoding method for transmission in the network, which is used to transmit 8-bit byte codes. Although it violates the principle of non-disclosure of encryption keys, Base64 encoding can process text data due to its unreadable advantages. Therefore, the method provides a way for verifying the validity of the encryption method and performing multiple encryptions.

MD5 message-digest algorithm and Has256 algorithm [36] are hash functions widely used in computer security. They provide the ability to transform data with arbitrary length to a fixed value. At the same time, the MD5 algorithm and the Has256 algorithm are irreversible, which can effectively ensure the security of the dataset. In the experiment, to realize the feasibility of multi-encryption of the same data, we also use the Base64 algorithm on top of the Has256 algorithm for encryption.
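A minimal sketch of the hash-based encryption, using Python's standard `hashlib` and `base64` modules, is given below. Several points are assumptions rather than details stated in this paper: that each character is hashed independently, that "Has256" refers to SHA-256, and that Base64 is applied to the raw digest bytes. The example sentence and its labels are likewise illustrative only.

```python
import base64
import hashlib

def md5_token(ch):
    """Replace one character with the hex MD5 digest of its UTF-8 bytes."""
    return hashlib.md5(ch.encode("utf-8")).hexdigest()

def sha256_base64_token(ch):
    """Hash with SHA-256 first, then Base64-encode the digest (double encoding)."""
    digest = hashlib.sha256(ch.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

def encrypt_sentence(chars, token_fn):
    """Encrypt a sentence character by character; labels stay aligned by position."""
    return [token_fn(ch) for ch in chars]

# Illustrative sentence and labels (not taken from any of the datasets).
sentence = ["济", "南", "在", "山", "东"]
labels = ["B-LOC", "E-LOC", "O", "B-LOC", "E-LOC"]

md5_tokens = encrypt_sentence(sentence, md5_token)
hb_tokens = encrypt_sentence(sentence, sha256_base64_token)
```

Because the mapping is deterministic, identical characters always map to identical tokens, so the label sequence remains valid for the encrypted sentence and a model can still learn token-to-label regularities.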

3.2 CP-ABE

Traditional attribute-based encryption (ABE) [10] systems describe the ciphertext by attributes and embed the policy into the user’s key. Here, attributes are characteristics of things or information files, and a policy is a logical expression composed of attributes and the relationships between them. CP-ABE [1, 43] instead uses attributes to portray the user’s eligibility, and the data provider defines the ciphertext access policy that decides who can decrypt the ciphertext. In other words, the attributes are embedded in the key, and the policy is embedded in the ciphertext.

As shown in the upper right of Figure 1, the data provider first initializes a public key (PK) and a master key (MK). Second, a private key (SK) is generated from the public key, the master key, and the user’s attribute set (Au). Then, the data provider constructs an access control policy (Ac-cp) based on the user’s attributes. The ciphertext (C) is generated from the public key, the plaintext M (in the figure, M denotes the data encrypted with the hash algorithm), and the access control policy. Finally, the user decrypts the data with the public and private keys. If the user’s attributes satisfy the policy, the ciphertext can be decoded successfully; otherwise, it cannot.
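The decision rule at the end of this procedure, whether a user's attribute set satisfies the access control policy, can be illustrated with a small sketch. This is not an encryption scheme: real CP-ABE enforces the policy cryptographically, whereas the code below only evaluates the logical expression. The policy encoding as nested tuples and all attribute names are hypothetical.

```python
def satisfies(policy, attributes):
    """Evaluate a policy tree against a set of user attributes.

    A policy is either an attribute name (str) or a tuple
    ("AND" | "OR", sub_policy_1, sub_policy_2, ...).
    """
    if isinstance(policy, str):
        return policy in attributes
    op, *children = policy
    results = [satisfies(child, attributes) for child in children]
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical access control policy (Ac-cp) and attribute sets (Au).
policy = ("AND", "researcher", ("OR", "hospital_A", "hospital_B"))
legit = satisfies(policy, {"researcher", "hospital_B"})
illegit = satisfies(policy, {"student", "hospital_B"})
```

In an actual deployment the same logical structure would be embedded in the ciphertext, and decryption would fail mathematically for the second user rather than being blocked by a boolean check.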

3.3 Model training and prediction

As shown in the bottom right of Figure 1, we describe how the user trains the model and predicts with the model on the encrypted dataset. First, the legitimate user follows the CP-ABE rule and decodes the data to obtain three data files using the ciphertext, public key, and private key provided by the data provider. Text1 represents the ciphertext encrypted with the hash algorithm, Text2 represents the label text corresponding to the ciphertext, and Text3 is the length of each sequence in the ciphertext. According to the three texts, users can get the training data completely. If users need to test the model’s performance with their data, users could encrypt the data using the same hashing algorithm in Text1 before prediction.
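How the three files combine into training sequences can be sketched as follows. The file layout is an assumption: Text1 is taken to be a flat list of encrypted tokens, Text2 a flat list of labels in the same order, and Text3 a list of sequence lengths. The function name and the toy values are illustrative.

```python
def rebuild_sequences(tokens, labels, lengths):
    """Split flat token and label lists into per-sentence (token, label) pairs."""
    if len(tokens) != len(labels) or sum(lengths) != len(tokens):
        raise ValueError("Text1, Text2 and Text3 are inconsistent")
    sequences, start = [], 0
    for n in lengths:
        end = start + n
        sequences.append(list(zip(tokens[start:end], labels[start:end])))
        start = end
    return sequences

# Toy stand-ins for the decrypted files; "h1".."h5" represent hash tokens.
text1 = ["h1", "h2", "h3", "h4", "h5"]
text2 = ["B-L", "E-L", "O", "S-L", "O"]
text3 = [3, 2]
sequences = rebuild_sequences(text1, text2, text3)
```

The consistency check guards against a mismatch between the three files, which would otherwise silently misalign tokens and labels.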

With the continuous exploration of researchers, neural networks have developed rapidly. Li et al. significantly improved the performance of biomedical named entity recognition by using recurrent neural networks [24]. Meanwhile, applying deep neural networks such as convolutional neural networks, self-attention mechanisms, and transformers [42, 2] has effectively promoted the development of NER. Therefore, the user can build models based on the above deep neural networks after obtaining encrypted data. Here, we use the classical BiLSTM-CRF model as a benchmark. Each LSTM cell in the BiLSTM comprises a forget gate, an input gate, and an output gate. These three gates interact with each other and update the cell state. The specific formulas are as follows.

$$F^{T} = \sigma(W_{\alpha}^{F} X^{T} + W_{\beta}^{F} H^{T-1} + B_{F}) \tag{1}$$

$$I^{T} = \sigma(W_{\alpha}^{I} X^{T} + W_{\beta}^{I} H^{T-1} + B_{I}) \tag{2}$$

$$O^{T} = \sigma(W_{\alpha}^{O} X^{T} + W_{\beta}^{O} H^{T-1} + B_{O}) \tag{3}$$

$$\tilde{C}^{T} = \tanh(W_{\alpha}^{C} X^{T} + W_{\beta}^{C} H^{T-1} + B_{C}) \tag{4}$$

$$C^{T} = F^{T} \odot C^{T-1} + I^{T} \odot \tilde{C}^{T} \tag{5}$$

$$H^{T} = O^{T} \odot \tanh(C^{T}) \tag{6}$$

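Here, $F^{T}$ is the forget gate, which determines the information that the cell state will forget. $I^{T}$ and $O^{T}$ denote the input and output gates, respectively. $\tilde{C}^{T}$ denotes the candidate cell state and $C^{T}$ denotes the updated cell state. $W$ and $B$ denote the learnable weight matrices and bias vectors, $X^{T}$ is the input at step $T$, $H^{T}$ denotes the hidden state output, $\sigma$ is the sigmoid function, and $\odot$ is element-wise multiplication.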
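Equations (1) to (6) can be traced in a short pure-Python sketch of a single LSTM step. The parameter layout (a dictionary keyed by gate name, holding the two weight matrices and the bias) and the tiny dimensions are assumptions made for illustration. In practice a deep learning library would be used, and a BiLSTM runs such a cell once forward and once backward over the sequence and concatenates the two hidden states.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W_a, x, W_b, h, b):
    """Compute W_a x + W_b h + b for list-of-lists matrices and list vectors."""
    return [
        sum(wa * xi for wa, xi in zip(row_a, x))
        + sum(wb * hi for wb, hi in zip(row_b, h))
        + bias
        for row_a, row_b, bias in zip(W_a, W_b, b)
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params[g] = (W_alpha, W_beta, B) for g in F, I, O, C."""
    def gate(name, activation):
        W_a, W_b, b = params[name]
        return [activation(z) for z in affine(W_a, x, W_b, h_prev, b)]

    f = gate("F", sigmoid)          # Eq. (1): forget gate
    i = gate("I", sigmoid)          # Eq. (2): input gate
    o = gate("O", sigmoid)          # Eq. (3): output gate
    c_tilde = gate("C", math.tanh)  # Eq. (4): candidate cell state
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]  # Eq. (5)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                          # Eq. (6)
    return h, c

# Illustrative all-zero parameters: input size 2, hidden size 1.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {name: zero for name in "FIOC"}
h, c = lstm_step([1.0, -1.0], [0.0], [1.0], params)
```

With all-zero parameters every gate equals 0.5 and the candidate state is 0, so the cell state is simply halved, which gives a quick sanity check of the update rule.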

Figure 2: Label state transitions of conditional random fields in named entity recognition.

We use a conditional random field [8, 20, 26] to constrain the transitions between tags after the encoding layer. As shown in Figure 2, B-L denotes the start of a location, I-L denotes the middle of a location, and E-L denotes the end of a location. S-L indicates a location consisting of a single character, and O is a non-entity. According to the figure, we apply the following constraints. O cannot transition to I-L or E-L. B-L cannot transition to B-L, S-L, or O. I-L cannot transition to B-L, S-L, or O. E-L cannot transition to E-L or I-L.
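The constraints listed above can be written down as a transition mask. The sketch below is an assumption about how such constraints might be imposed explicitly: a CRF layer normally learns its transition scores from data, and the mask value and function names are hypothetical. Only the four constraints stated in the text are encoded, so S-L has no forbidden successors here.

```python
TAGS = ["O", "B-L", "I-L", "E-L", "S-L"]

# Forbidden transitions, source tag -> set of disallowed next tags.
FORBIDDEN = {
    "O": {"I-L", "E-L"},
    "B-L": {"B-L", "S-L", "O"},
    "I-L": {"B-L", "S-L", "O"},
    "E-L": {"E-L", "I-L"},
}

NEG_INF = -1e4  # large negative score that effectively rules a transition out

def transition_mask():
    """Matrix added to CRF transition scores: rows are source tags, columns are targets."""
    return [
        [NEG_INF if dst in FORBIDDEN.get(src, set()) else 0.0 for dst in TAGS]
        for src in TAGS
    ]

def is_valid(tag_sequence):
    """Check that no adjacent pair of tags is a forbidden transition."""
    return all(
        dst not in FORBIDDEN.get(src, set())
        for src, dst in zip(tag_sequence, tag_sequence[1:])
    )

mask = transition_mask()
ok = is_valid(["O", "B-L", "I-L", "E-L", "O"])
bad = is_valid(["O", "I-L", "E-L"])
```

Adding the mask to the transition scores before Viterbi decoding guarantees that the decoded path never contains a forbidden pair.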

4 Datasets

This paper focuses on comparing model performance before and after data encryption. We use six Chinese NER datasets to verify the authenticity and effectiveness of the experiments: CCKS2017, Resume, MSRA, and three subsets of our History dataset. The number of sentences and characters in each dataset is shown in Table 1.

Datasets	Type	Train	Dev	Test
Resume	Char	124.4K	13.9K	15.1K
Resume	Sentence	3.8K	0.46K	0.48K
MSRA	Char	2169.9K	–	172.6K
MSRA	Sentence	46.4K	–	4.4K
CCKS2017	Char	200.0K	31.8K	33.6K
CCKS2017	Sentence	5.9K	0.82K	1.09K
History	Char	289.1K	30.9K	0.97K
History	Sentence	8.9K	0.97K	0.81K

Table 1: Statistics of datasets

–	ORGARM	LOC	DAT	ORG	PER
Train	1202	4231	1510	3934	8618
Dev	215	578	179	426	829
Test	212	603	311	396	507
–	LOCPER	EVE	POS	APP	–
Train	231	383	3169	834	–
Dev	34	19	441	111	–
Test	32	64	216	118	–

Table 2: The number of nine entities on the train, dev, and test sets.

CCKS2017 is a clinical medicine NER dataset released by the China Conference on Knowledge Graph and Semantic Computing. Since CCKS2017 only provides a single relatively large-scale dataset, we divided it into a training set, a development set, and a test set. Resume and MSRA come from social media and news, and the History dataset comes from the field of Chinese history.

Model	CCKS2017			Resume			MSRA
Model	$P$	$R$	$F 1$	$P$	$R$	$F 1$	$P$	$R$	$F 1$
LSTM-CRF	88.45	87.35	87.90	93.73	93.44	93.58	89.52	87.41	88.45
+ Serial Cipher	89.25	87.13	88.18	93.53	93.13	93.33	89.10	87.25	88.16
+Base64	88.66	87.11	87.88	93.68	93.62	93.65	89.74	87.34	88.53
+MD5	89.21	86.96	88.07	93.46	93.80	93.63	89.68	87.18	88.41
+Has256-Base64	87.74	87.48	87.61	93.50	93.56	93.53	88.14	86.19	87.15
WC-LSTM	88.96	87.33	88.14	95.14	94.79	94.96	93.67	92.20	92.93
+ Serial Cipher	90.55	86.14	88.29	95.36	94.66	95.01	93.66	92.02	92.83
+Base64	89.59	87.38	88.47	93.64	93.87	93.75	89.33	87.58	88.45
+MD5	89.43	87.33	88.37	93.65	94.05	93.85	88.81	86.92	87.86
+Has256-Base64	88.96	87.21	88.08	93.57	93.74	93.66	90.23	87.34	88.76
Multi-digraph	89.50	88.40	88.94	94.62	94.97	94.79	90.82	91.20	91.01
+ Serial Cipher	88.43	86.01	87.20	95.04	95.28	95.16	88.21	85.73	86.95
+Base64	89.66	88.86	89.26	94.44	94.91	94.68	91.30	90.68	90.99
+MD5	89.22	88.94	89.08	94.37	94.66	94.52	90.54	87.19	88.83
+Has256-Base64	88.73	89.18	88.95	94.74	95.03	94.89	91.38	90.86	91.12
SoftLexicon	89.67	87.23	88.43	95.30	95.77	95.53	93.72	91.88	92.79
+ Serial Cipher	90.08	88.34	89.20	95.48	94.66	95.07	88.42	84.32	86.32
+Base64	89.74	86.89	88.29	94.25	93.50	93.87	88.44	84.63	86.49
+MD5	90.52	86.98	88.72	95.50	94.97	95.23	88.23	84.52	86.33
+Has256-Base64	90.46	86.42	88.39	95.26	94.97	95.12	87.83	84.68	86.23

Table 3: Performance on three public datasets

Model	History-9types			History-3types			History-2types
Model	$P$	$R$	$F 1$	$P$	$R$	$F 1$	$P$	$R$	$F 1$
LSTM-CRF	76.01	60.68	67.48	76.15	50.45	60.69	72.40	60.58	65.96
+ Serial Cipher	76.04	60.27	67.24	76.94	49.36	60.09	71.32	52.97	60.79
+Base64	76.06	60.59	67.45	73.60	51.85	60.84	72.14	60.17	65.61
+MD5	76.88	59.90	67.34	72.18	52.23	60.61	73.18	58.51	65.03
+Has256-Base64	77.20	59.90	67.46	74.25	50.32	59.98	75.00	57.68	65.21
WC-LSTM	82.43	68.48	74.81	83.19	61.15	70.48	82.45	62.38	71.02
+ Serial Cipher	82.30	68.08	74.52	81.99	62.04	70.63	84.73	62.93	72.22
+Base64	77.84	61.12	68.47	73.02	51.72	60.55	73.47	61.27	66.82
+MD5	76.27	62.10	68.46	72.99	51.94	60.71	74.15	60.30	66.51
+Has256-Base64	76.67	61.61	68.32	75.57	50.45	60.50	72.50	61.27	66.42
Multi-digraph	75.31	71.70	73.46	76.14	59.36	66.71	75.31	76.35	75.82
+ Serial Cipher	77.24	68.73	72.74	69.54	61.66	65.36	71.54	77.18	74.25
+Base64	78.42	71.37	74.73	69.06	62.55	65.64	74.90	75.52	75.21
+MD5	75.79	71.82	73.75	69.79	60.64	64.89	76.23	70.54	73.28
+Has256-Base64	78.66	63.42	70.22	73.87	60.51	66.53	75.46	74.00	74.72
SoftLexicon	82.65	70.11	75.86	82.47	67.13	74.02	85.71	73.03	78.86
+ Serial Cipher	80.31	66.33	72.65	82.23	63.06	71.38	82.30	71.37	76.44
+Base64	73.00	52.99	61.40	73.36	51.85	60.76	71.99	56.15	63.09
+MD5	75.26	58.28	65.69	73.01	54.87	62.59	79.62	63.21	70.47
+Has256-Base64	75.24	57.83	65.39	74.43	53.76	62.43	76.97	64.73	70.32

Table 4: Performance on History datasets

At present, Chinese has become an important international language. To address the lack of historical datasets in the Chinese NER domain, we constructed a brand-new dataset (History in Table 1). As shown in Table 2, our dataset consists of 9 types of labels, including organization name (ORG), place name (LOC), time (DAT), person name (PER), salutation (POS), official position (APP), book name (EVE), army name (ORGARM), and place of belonging (LOCPER). Furthermore, we divided the historical dataset into three categories to identify certain specific entities. The first category, History-9types, has all nine types of tags. The second category, History-3types, has LOC, APP, and EVE entities. The third category, History-2types, has only PER and POS.
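One way to derive the History-3types and History-2types subsets from the nine-type annotation is to keep the tags of the selected entity types and relabel everything else as O. This procedure is an assumption, as the construction is not spelled out here. The tag format (`B-LOC`, `E-PER`, and so on) and the example sequence are also illustrative.

```python
SUBSETS = {
    "History-3types": {"LOC", "APP", "EVE"},
    "History-2types": {"PER", "POS"},
}

def project_tags(tags, keep_types):
    """Keep tags whose entity type is in keep_types; map all others to O."""
    projected = []
    for tag in tags:
        # "B-LOC" -> "LOC"; the non-entity tag "O" has no type.
        entity_type = tag.split("-", 1)[1] if "-" in tag else None
        projected.append(tag if entity_type in keep_types else "O")
    return projected

# Illustrative nine-type tag sequence.
tags = ["B-PER", "E-PER", "O", "B-LOC", "E-LOC", "S-POS"]
three = project_tags(tags, SUBSETS["History-3types"])
two = project_tags(tags, SUBSETS["History-2types"])
```

Splitting on the first hyphen only keeps compound type names intact should the tag set contain any.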

5 Experiments

5.1 Baseline Methods

In this section, we use four models proposed in recent years to verify the effectiveness of encryption algorithms.

BiLSTM-CRF. BiLSTM-CRF was proposed by Lample et al. in 2016. The method is a classical model in named entity recognition. Compared with the traditional machine learning models, it shows a dramatic enhancement in performance.

WC-LSTM. WC-LSTM (2019) is a word-character-based model proposed by Liu et al. for addressing the shortcomings of Lattice LSTM. The model provides four methods that effectively integrate lexicon knowledge into characters.

Multi-digraph Model. Multi-digraph (2019) is a model proposed by modifying the gated graph neural network (GGNN), which can effectively integrate word information into characters.

SoftLexicon. SoftLexicon (2020) is a novel approach to utilizing dictionary information proposed by Ma et al. Its encoding framework is very flexible and can enormously improve the performance of entity recognition.

5.2 Overall Performances

Table 3 shows the results on three public datasets: CCKS2017, Resume, and MSRA. The results of four different models are given in Table 3. For each model, we run five experiments. In the first experiment, we train the model and predict with it using plaintext. In the other four experiments, we train the model and predict with it using ciphertext, encrypted by Serial Cipher, Base64, MD5, and Has256-Base64, respectively. Has256-Base64 means that the data is encrypted first by Has256 and then by Base64. For CCKS2017 and Resume, the four encryption methods show barely any change in performance on the four baseline models when the plaintext training method is compared with the ciphertext training methods. Some results obtained with the ciphertext training methods are even better than those of the plaintext training method. For the MSRA dataset, when training on Has256-Base64 encrypted data using Multi-digraph, Precision and F1 are 0.56 and 0.11 points higher than with the plaintext training method. When training LSTM-CRF, there is almost no degradation in performance. When training WC-LSTM and SoftLexicon, the performance degradation may be caused by the under-utilization of lexical knowledge after encryption and by the absence of a development set.
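The differences quoted for Multi-digraph on MSRA can be checked directly against Table 3. The short sketch below recomputes the gaps and also verifies that the reported F1 values agree with the harmonic mean of Precision and Recall. The variable names are illustrative, and the numbers are copied from the Multi-digraph and +Has256-Base64 rows.

```python
def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Multi-digraph on MSRA, values from Table 3.
plain = {"P": 90.82, "R": 91.20, "F1": 91.01}
enc = {"P": 91.38, "R": 90.86, "F1": 91.12}  # +Has256-Base64

delta_p = round(enc["P"] - plain["P"], 2)     # gain in Precision, in points
delta_f1 = round(enc["F1"] - plain["F1"], 2)  # gain in F1, in points
f1_plain = round(f1(plain["P"], plain["R"]), 2)
f1_enc = round(f1(enc["P"], enc["R"]), 2)
```

The gaps are absolute differences in percentage points, not relative improvements.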

Table 4 shows the results on the historical dataset. As shown in the table, the performance with the four encryption algorithms barely degrades relative to unencrypted data when training the LSTM-CRF model. Moreover, some encryption algorithms perform better than the unencrypted baseline. For example, with the MD5 encryption algorithm, Precision is 0.87 and 0.78 points higher than unencrypted on the History-9types and History-2types datasets, respectively. On the History-3types dataset, Precision improves by 0.79 points with the Serial Cipher, and Recall improves by 1.78 points with MD5. When the WC-LSTM model is trained using Serial Cipher encrypted data, Recall is 0.89 points higher than unencrypted on the History-3types dataset, and Precision, Recall, and F1 are 2.28, 0.55, and 1.20 points higher than unencrypted on the History-2types dataset. When trained with the Multi-digraph model, Precision and F1 are 3.11 and 1.27 points higher than unencrypted when using the Base64 algorithm on History-9types. When trained with the SoftLexicon model, the performance with encrypted data drops significantly. We attribute this to the fact that SoftLexicon breaks with the conventional way of using vocabulary knowledge.

5.3 Analysis

5.3.1 Performance analysis

Except for the LSTM-CRF model, the other models use vocabulary knowledge. However, the performance of models using vocabulary knowledge degraded on certain datasets. The possible reasons are as follows. 1) The vocabulary knowledge is underutilized after encrypting the data. 2) The embedding of characters and words is possibly affected by the encryption process. 3) Different model architectures also influence the performance. To solve the above problems, we can obtain the word embedding by retraining the model using large-scale encrypted data in the future. At the same time, because of the advantage of dynamic word embedding like BERT, we can also consider training a BERT using encrypted data to improve the performance of NER in the future.

5.3.2 Security analysis

In this paper, we use hash functions and CP-ABE to ensure data security. Hash functions offer weak collision resistance, strong collision resistance, and resistance to modification. Their most important feature is irreversibility: it is difficult for users to recover the plaintext, which protects the data to a great extent. At the same time, we can further encrypt the data on top of the hash encryption to strengthen security, as in Has256-Base64. Finally, we use CP-ABE to ensure the legitimacy of the user who obtains the ciphertext generated by the other encryption methods. Since CP-ABE is built on the difficulty of computing discrete logarithms, it is challenging for unauthorized users, proxy servers, etc. to obtain the data. Based on the above analysis, our framework is both feasible and secure.

5.4 Relations between entities

We train a NER model using the Chinese historical dataset to improve the performance of its downstream tasks, which can accurately identify entities in unstructured text. Figure 3 shows a simple knowledge graph constructed after we extracted partial entity types from the text by the model.

The red nodes indicate the Chinese dynasties, such as the Qing Dynasty, Ming Dynasty, and Tang Dynasty. The red edges indicate the substitution between dynasties, such as the Qing Dynasty overthrew the Ming Dynasty and the Tang Dynasty overthrew the Sui Dynasty. The yellow nodes express the capital of each dynasty. As a result of capital city migration, a red node may be connected to multiple yellow nodes. A black node describes the country’s leader, and a black edge indicates a father-son or brother relationship. Green nodes are emperors’ aliases.

Figure 3 reveals most of the relationships among the various dynasties, which supports the need for a Chinese historical dataset and paves the way for future research.

Figure 3: Diagram of the relationship of partial entities in the historical dataset.

6 Conclusion

In this paper, we introduce hash algorithms and CP-ABE to the named entity recognition task for the first time. This method can mitigate the problem of data leakage in some fields. We also propose a new dataset to address the lack of datasets in the historical domain. Our experiments on six datasets demonstrate that our method obtains satisfactory results. In the future, we plan to further improve performance by training BERT on encrypted data.

References

[1] J. Bethencourt, A. Sahai, and B. Waters (2007) Ciphertext-policy attribute-based encryption. In 2007 IEEE symposium on security and privacy (SP’07), pp. 321–334. Cited by: §3.2.
[2] P. Cao, Y. Chen, K. Liu, J. Zhao, and S. Liu (2018) Adversarial transfer learning for chinese named entity recognition with self-attention mechanism. In Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 182–192. Cited by: §3.3.
[3] M. Chen, W. Zhang, Z. Yuan, Y. Jia, and H. Chen (2021) Fede: embedding knowledge graphs in federated setting. In The 10th International Joint Conference on Knowledge Graphs, pp. 80–88. Cited by: §1.
[4] J. P. Chiu and E. Nichols (2016) Named entity recognition with bidirectional lstm-cnns. Transactions of the association for computational linguistics 4, pp. 357–370. Cited by: §2.1.
[5] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §1.
[6] D. Diefenbach, V. Lopez, K. Singh, and P. Maret (2018) Core techniques of question answering systems over knowledge bases: a survey. Knowledge and Information systems 55 (3), pp. 529–569. Cited by: §1.
[7] R. Ding, P. Xie, X. Zhang, W. Lu, L. Li, and L. Si (2019) A neural multi-digraph model for chinese ner with gazetteers. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1462–1467. Cited by: §2.1.
[8] G. D. Forney (1973) The viterbi algorithm. Proceedings of the IEEE 61 (3), pp. 268–278. Cited by: §3.3.
[9] K. Gai, M. Qiu, and H. Zhao (2017) Privacy-preserving data encryption strategy for big data in mobile cloud computing. IEEE Transactions on Big Data 7 (4), pp. 678–688. Cited by: §2.2.
[10] V. Goyal, O. Pandey, A. Sahai, and B. Waters (2006) Attribute-based encryption for fine-grained access control of encrypted data. In Proceedings of the 13th ACM conference on Computer and communications security, pp. 89–98. Cited by: §3.2.
[11] T. Gui, R. Ma, Q. Zhang, L. Zhao, Y. Jiang, and X. Huang (2019) CNN-based chinese ner with lexicon rethinking.. In ijcai, pp. 4982–4988. Cited by: §2.1.
[12] T. Gui, Y. Zou, Q. Zhang, M. Peng, J. Fu, Z. Wei, and X. Huang (2019) A lexicon-based graph neural network for chinese ner. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 1040–1050. Cited by: §2.1.
[13] M. S. Habib and J. Kalita (2007) Language and domain-independent named entity recognition: experiment using svm and high-dimensional features. In Proc. of the 4th Biotechnology and Bioinformatics Symposium (BIOT-2007), Colorado Springs, CO, Cited by: §2.1.
[14] C. He, M. Annavaram, and S. Avestimehr (2020) Group knowledge transfer: federated learning of large cnns at the edge. Advances in Neural Information Processing Systems 33, pp. 14068–14080. Cited by: §1.
[15] H. He and X. Sun (2017) F-score driven max margin neural network for named entity recognition in chinese social media. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 713–718. Cited by: §1.
[16] B. Hitaj, G. Ateniese, and F. Perez-Cruz (2017) Deep models under the gan: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pp. 603–618. Cited by: §1.
[17] Z. Huang, W. Xu, and K. Yu (2015) Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991. Cited by: §2.1.
[18] S. Josefsson et al. (2006) The base16, base32, and base64 data encodings. Technical report RFC 4648, October. Cited by: §3.1.
[19] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al. (2021) Advances and open problems in federated learning. Foundations and Trends® in Machine Learning 14 (1–2), pp. 1–210. Cited by: §1.
[20] J. Lafferty, A. McCallum, and F. C. Pereira (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. Cited by: §3.3.
[21] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer (2016) Neural architectures for named entity recognition. In Proceedings of NAACL-HLT, pp. 260–270. Cited by: §2.1.
[22] G. Levow (2006) The third international chinese language processing bakeoff: word segmentation and named entity recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp. 108–117. Cited by: §1.
[23] D. Li and J. Wang (2019) FedMD: heterogenous federated learning via model distillation. arXiv preprint arXiv:1910.03581.
[24] L. Li, W. Xu, and H. Yu (2020) Character-level neural network model based on Nadam optimization and its application in clinical concept extraction. Neurocomputing 414, pp. 182–190.
[25] X. Li, H. Yan, X. Qiu, and X. Huang (2020) FLAT: Chinese NER using flat-lattice transformer. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6836–6842.
[26] J. C. Lin, Y. Shao, J. Zhang, and U. Yun (2020) Enhanced sequence labeling based on latent variable conditional random fields. Neurocomputing 403, pp. 431–440.
[27] W. Liu, T. Xu, Q. Xu, J. Song, and Y. Zu (2019) An encoding strategy based word-character LSTM for Chinese NER. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 2379–2389.
[28] Y. Liu, Y. Kang, C. Xing, T. Chen, and Q. Yang (2020) A secure federated transfer learning framework. IEEE Intelligent Systems 35 (4), pp. 70–82.
[29] R. Ma, M. Peng, Q. Zhang, Z. Wei, and X. Huang (2020) Simplify the usage of lexicon in Chinese NER. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5951–5960.
[30] X. Ma and E. Hovy (2016) End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1064–1074.
[31] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282.
[32] X. Mengge, B. Yu, T. Liu, Y. Zhang, E. Meng, and B. Wang (2020) Porous lattice transformer encoder for Chinese NER. In Proceedings of the 28th International Conference on Computational Linguistics, pp. 3831–3841.
[33] K. Mivule (2017) Data swapping for private information sharing of web search logs. Procedia Computer Science 114, pp. 149–158.
[34] M. Miwa and M. Bansal (2016) End-to-end relation extraction using LSTMs on sequences and tree structures. arXiv preprint arXiv:1601.00770.
[35] N. Peng and M. Dredze (2015) Named entity recognition for Chinese social media with jointly trained embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 548–554.
[36] R. Rivest (1992) The MD5 message-digest algorithm. Technical report.
[37] T. Sivakumar, T. Sheela, R. Kumar, and K. Ganesan (2017) Enhanced secure data encryption standard (ES-DES) algorithm using extended substitution box (S-box). Int. J. Appl. Eng. Res. 12 (21), pp. 11365–11373.
[38] N. Sobhana, P. Mitra, and S. Ghosh (2010) Conditional random field based named entity recognition in geological text. International Journal of Computer Applications 1 (3), pp. 143–147.
[39] D. R. Stinson (2005) Cryptography: theory and practice. Chapman and Hall/CRC.
[40] E. Strubell, P. Verga, D. Belanger, and A. McCallum (2017) Fast and accurate entity recognition with iterated dilated convolutions. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2670–2680.
[41] D. Sui, Y. Chen, K. Liu, J. Zhao, and S. Liu (2019) Leverage lexical knowledge for Chinese named entity recognition via collaborative graph network. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3830–3840.
[42] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. Advances in Neural Information Processing Systems 30.
[43] Z. Wan, R. H. Deng, et al. (2011) HASBE: a hierarchical attribute-based solution for flexible and scalable access control in cloud computing. IEEE Transactions on Information Forensics and Security 7 (2), pp. 743–754.
[44] X. Wang, Y. Jiang, N. Bach, T. Wang, Z. Huang, F. Huang, and K. Tu (2020) Automated concatenation of embeddings for structured prediction. arXiv preprint arXiv:2010.05006.
[45] R. Weischedel, S. Pradhan, L. Ramshaw, M. Palmer, N. Xue, M. Marcus, A. Taylor, C. Greenberg, E. Hovy, R. Belvin, et al. (2011) OntoNotes release 4.0. LDC2011T03, Philadelphia, Penn.: Linguistic Data Consortium.
[46] F. Wu, J. Liu, C. Wu, Y. Huang, and X. Xie (2019) Neural Chinese named entity recognition via CNN-LSTM-CRF and joint training with word segmentation. In The World Wide Web Conference, pp. 3342–3348.
[47] J. Yang and Y. Zhang (2018) NCRF++: an open-source neural sequence labeling toolkit. In Proceedings of ACL 2018, System Demonstrations, pp. 74–79.
[48] J. Yu, B. Bohnet, and M. Poesio (2020) Named entity recognition as dependency parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6470–6476.
[49] Y. Zhang and J. Yang (2018) Chinese NER using lattice LSTM. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1554–1564.
[50] G. Zhou and J. Su (2002) Named entity recognition using an HMM-based chunk tagger. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 473–480.