Temporal Fuzzy Utility Maximization with Remaining Measure

Shicheng Wan, Zhenqiang Ye, Wensheng Gan, and Jiahui Chen
This research was supported in part by the National Natural Science Foundation of China (Grant Nos. 61902079 and 62002136), Guangzhou Basic and Applied Basic Research Foundation (Grant Nos. 202102020928 and 202102020277), National Key-Research and Development Program of China (Grant No. 2020YFB2104003), and the Young Scholar Program of Pazhou Lab (Grant No. PZL2021KF0023).Shicheng Wan, Zhenqiang Ye, and Jiahui Chen are with the Department of Computer Science, Guangdong University of Technology, Guangzhou 510006, China. (E-mail: scwan1998@gmail.com, yzq66f@gmail.com, csjhchen@gmail.com)Wensheng Gan is with the College of Cyber Security, Jinan University, Guangzhou 510632, China; and also with Pazhou Lab, Guangzhou 510330, China. (E-mail: wsgan001@gmain.com)Corresponding author: Wensheng Gan and Jiahui Chen

Abstract

High utility itemset mining approaches discover hidden patterns from large amounts of temporal data. However, an inescapable problem of high utility itemset mining is that its discovered results hide the quantities of patterns, which causes poor interpretability. The results only reflect the shopping trends of customers, which cannot help decision makers quantify collected information. In linguistic terms, computers use mathematical or programming languages that are precisely formalized, but the language used by humans is always ambiguous. In this paper, we propose a novel one-phase temporal fuzzy utility itemset mining approach called TFUM. It revises temporal fuzzy-lists to maintain less but major information about potential high temporal fuzzy utility itemsets in memory, and then discovers a complete set of real interesting patterns in a short time. In particular, the remaining measure is the first adopted in the temporal fuzzy utility itemset mining domain in this paper. The remaining maximal temporal fuzzy utility is a tighter and stronger upper bound than that of previous studies adopted. Hence, it plays an important role in pruning the search space in TFUM. Finally, we also evaluate the efficiency and effectiveness of TFUM on various datasets. Extensive experimental results indicate that TFUM outperforms the state-of-the-art algorithms in terms of runtime cost, memory usage, and scalability. In addition, experiments prove that the remaining measure can significantly prune unnecessary candidates during mining.

Impact Statement—This article proposes a method to find profitable fuzzy itemsets from temporal databases. The novel algorithm can play an important role in marketing and business, where these discovered patterns are paramount for strategic decision-making. The TFUM algorithm achieves this by integrating fuzzy set theory into itemset mining and then making it possible to transform linguistic terms into quantitative values and making binary machines find both interesting and explicable patterns, such as a young and tall man, beautiful flowers, and sauna days. The proposed remaining measure addresses the “combinatorial explosion” problem in a certain way and achieves state-of-the-art performance on various databases (including sparse and dense). TFUM can provide a contribution to the explainable artificial intelligence system, pattern recognition, communication of information applications, and so on.

artificial intelligence, remaining measure, temporal fuzzy-list, temporal mining, fuzzy set.

I Introduction

Data mining technology can be regarded as an algorithmic process that takes data as input and yields useful results as output. It is meaningful to identify and isolate relationships among the different items that may be hidden in massive data, since the first and well-known association rule mining (ARM) algorithm was proposed [2]. In market basket analysis, the database records items purchased by a customer at a single time as a transaction. The association rule mining task aims to find out the “association” between sets of items with some strong specified confidence. An example of such an association is the statement that 88% of transactions involving men from 30 to 39 years old will buy a car. The number 88% shows the confidence value of the rule. The antecedent of this rule consists of men aged 30 to 39 years old, and the consequent consists of the car alone. This rule means a man has a high probability of purchasing a car when he is between 30 and 39 years old. In the meantime, frequent itemset mining (FIM) [15, 14, 33] started as a phase in the discovery of association rules, but has been generalized independent of these to many other patterns. In recent years, Aggarwal et al. [1] proposed a spatio-temporal FIM algorithm on web data. However, a fundamental limitation of both FIM and ARM is that they both assume that each item cannot appear more than once in each transaction and that all items have the same importance (e.g., weight, unit profit, or risk). In other words, bread will be regarded as interesting, but diamonds will not, because the sale volume of the latter one is far less than that of the prior. The frequency of an itemset cannot be a sufficient indicator of interest in some cases. As a consequence, traditional high-utility itemset mining (HUIM) algorithms discover more realistic and important knowledge than FIM by taking into account non-binary occurrences of items within transactions and various interests in distinct items. The discovered results of traditional HUIM are more interpretable than those of FIM. After decades of development, plenty of algorithms [28, 12, 11, 37] were proposed and have been applied in many applications like cross-marketing [30], click stream analysis [7], and traffic management [13].

However, a critical drawback of previous studies is that few of them consider the temporal aspect of databases. If a customer visited the time of a retail last year, his/her transaction record may be invalid at present, and thus be useless for analysis nowadays [36]. More generally, consider that transactions in a retail store contain the following five items: {apple}, {ice-cream}, {neckerchief}, {sweater}, and {watermelon}. Items such as {watermelon} and {ice-cream} are typically best-selling during the summer, whereas items such as {sweater} and {neckerchief} sales better in the winter than in the summer, and {apple} will be needed all year round. Therefore, in this case, traditional HUIM approaches are unable to deal with temporal transaction databases. In recent years, Chen et al. [5] proposed a novel on-shelf utility mining algorithm and then discovered maximal profitable product combinations by considering on-shelf periods of items. However, all the above data mining algorithms simply reveal that discovered itemsets (i.e., product combinations) are profitable but not capable of reflecting quantization information. The Likert scale is a notable example of this issue. There are five different types of answers: totally disagree, disagree a little, neutral opinion, agree a little, and totally agree. The fact remains that such imprecisely defined ”extents/classes” play an important role in human thinking, particularly in the domains of pattern recognition, communication of information, and abstraction. Hence, how to make a machine learns linguistic representations (e.g., sweet, hot, and beautiful) is an interesting but challenging task. Huang et al. [19] figured out that fuzzy quantifiers can be a fuzzy approximation to linguistic representations. By adopting fuzzy set theory [43] and user-defined membership functions, users can easily comprehend how many products are sold at an expensive, moderate, or cheap price. The characteristic of fuzzy set is that determining whether an element belongs to a set cannot be answered simply by “yes” or “no”, whereas it is gradient. Huang et al. [19] first defined the temporal fuzzy utility itemset mining task. That is, given a predefined minimum threshold and user-defined membership functions, mining a complete set of high temporal fuzzy utility itemset from temporal quantitative transaction databases. The aim of the novel mining task is to find patterns that are easily interpretable by users, and thus can help in understanding the results. However, their proposed algorithm is a “generate-and-test/two-phase” approach and thus performs poorly. Then, Hong et al. continuously proposed several new studies [17, 18] to improve the performance of algorithms in the temporal fuzzy utility itemset mining domain.

Recently, Ye et al. [42] proposed an effective temporal fuzzy utility itemset mining algorithm, which performs better than the algorithm of Hong et al. [18]. Though their approach utilizes list structure to efficiently store heuristic information, the adopted upper-bounds are too loose to consume a large amount of runtime and memory. This issue motivates us to further develop the temporal fuzzy utility itemset mining. In this paper, we propose a new one-phase list-based algorithm called TFUM. The major contributions of this paper are summarized as follows.

A novel structure, called RTF-List, is proposed. We first integer maximal remaining fuzzy utility into temporal fuzzy-lists and propose a remaining measure to decide whether the fuzzy itemsets should be pruned or not.
Without checking transaction identifications one by one, we propose a TP-table which is capable of fast locating quantitative transactions by period identification.
The novel algorithm holds the downward closure property and adopts the temporal fuzzy utility upper-bound ratio to prune the search space as far as possible.
Extensive experiments have been conducted on four datasets (including sparse and dense). Under various periods and minimal thresholds, the experimental results reveal that the novel proposed algorithm performs far better than the state-of-the-art algorithms in terms of run time, memory consumption, and scalability.

The rest of the content in this paper is organized as follows. The related work will be briefly reviewed in Section II. Section III introduces some basic definitions and the problem statement. Section IV describes the proposed algorithm, and Section V illustrates a detailed example of our approach. Finally, extensive experiments and responding analysis are shown in Section VI, and future work is finally presented in Section VII.

Ii Related Work

Ii-a Itemset Mining Using List Structure

Based on the way of discovering interesting patterns from databases, many list-based algorithms in itemset mining domain actually belong to a one-phase approach. In high utility itemset mining (HUIM), HUI-Miner [28] is the earliest and most famous list-based algorithm. Compared to the two-phase HUIM algorithms [41, 29], HUI-Miner is a pattern-growth approach and does not need to scan the original database more than twice. It firstly finds promising low-level itemsets (which are potential high utility itemsets) and then constructs utility-lists for these itemsets. A utility-list consists of several tuples, and there are two basic elements (i.e., identifiers of the transaction involving the itemset and the utility of the itemset) in each tuple. Then, the algorithm joins different utility-lists to generate new utility-lists of high-level itemsets until there are no more itemsets generated. After that, numerous optimization algorithms [12, 10, 27, 20, 9, 21, 40] have been proposed in the last decades.

Besides, to meet the requirements of various domains, various list-based itemset mining extensions have been proposed, which take constraints and more complex data types into account. Wan et al. [38] first adopted fuzzy-lists to mine high-fuzzy itemsets from quantitative transaction databases. They proposed that the remaining fuzzy utility of each fuzzy itemset is a tighter upper bound than FUUB [23]. And the extensive experiments reveal the novel upper-bound plays an important role in their algorithm. Furthermore, Cui et al. [8] considered frequency constraint and utilized fuzzy-lists to find valuable and interesting fuzzy rare itemsets. In transaction databases, high utility itemsets do not provide information about the purchase quantities of items. Then, Nouioua et al. [31] designed the utility lists of $q$ -itemsets to solve the issue. Chen et al. [6] noticed most existing HUIM algorithms assume that itemsets always occur regardless of the period. However, this assumption is not realistic in most cases, such as the sold volume of ice cream. Hence, they considered the rich information (e.g., quantity and period) in databases and adopted utility-lists to discover on-shelf high-utility quantitative itemsets. In addition, recently, Chen et al. [4] mined high utility-occupancy patterns from uncertain data by probability-utility-occupancy list and probability-frequency-utility table. Considering the existence probability of items will be more realistic than assuming the items must occur in transactions.

Ii-B Itemset Mining Using Fuzzy Theory

Since Srikant and Agrawal [35] first proposed a fuzzy mining approach to discover quantitative association rules, they have opened a novel fuzzy pattern mining domain. Though their proposed approach utilizes a naive method (i.e., the generate-and-test mechanism), which leads to inefficiency, the discovered results of the new algorithm can be more convenient for helping decision makers to understand and use. Then, Chan and Au [3] figured out that the discovered quantitative association rules by discretizing the domains of quantitative attributes into intervals is not concise and meaningful enough. Therefore, they employed linguistic terms to reveal quantitative association rules by setting membership functions in advance. They called the mining rules as fuzzy association rules, because their proposed algorithm adopts the fuzzy set theory. Later, Kuok et al. [22] computed the membership value of a high-level fuzzy itemset by applying a minimum operation to get the overlap value of membership regions in its consisting fuzzy items. Since the user-predefined membership function may divide an item into several membership regions, Hong et al. [16] noticed that using the maximal membership value as the fuzzy value of the item allows the item to be generalized to a certain extent. This idea also affects many subsequent fuzzy pattern mining algorithms. The study [32] employed a heuristic method and tree structure to mine fuzzy association rules. The experimental results showed that their proposed algorithm performs better than previous Apriori-based fuzzy association rule mining algorithms. Then, there are also several variations on the original FP-Growth algorithm [15] proposed to discover fuzzy frequent itemsets from quantitative databases [24, 25, 26]. However, frequent itemsets often stand in front of the barrier of discovering frequent but low profit patterns.

As similar to traditional high utility itemset mining, Wang et al. [39] first proposed fuzzy utility mining (FUM) domain to solve the above issue. They had successfully extended HUIM with fuzzy sets to handle quantitative transaction databases. Subsequently, Lan et al. [23] proposed another different kind of FUM algorithm named TPFU. TPFU assesses the utility of an item based on both its linguistic terms defined by users and the membership values of terms scoped in a user-predefined membership function. Though TPFU belongs to two-phase approach, it has integrated minimum operation and maximal membership value into the fuzzy utility mining well. Recently, Wan et al. [38] proposed a list-based fuzzy utility mining approach called FUIM, and the novel algorithm outperforms TPFU in terms of runtime and memory usage since FUIM is a one-phase approach. However, the time attribute of a discovered pattern plays an important role in many real applications, like weblog analysis, business decisions, and so on. Huang et al. [19] therefore defined a novel task of discovering high temporal fuzzy utility itemsets from temporal quantitative databases (temporal fuzzy utility itemset mining, TFUIM). Then, Hong et al. [17, 18] have successively published two TFUIM algorithms (i.e., FHTFUP and ATTFUM) which are implemented by tree structure. In particular, ATTFUM is a one-phase algorithm and adopts an array-embedded tree structure. It utilizes an array-list to keep key information about interesting fuzzy itemset in each tree node. Recently, Ye et al. [42] adopted TF-Lists to compress the temporal quantitative databases. The experiments revealed that the novel algorithm performs better than previous algorithms on dense databases.

Iii Preliminaries and Problem Statement

This section firstly introduces some basic and commonly used notations and definitions in this paper. Most of them are proposed from previous studies [19, 18, 42]. A finite set $I$ = { $x_{1}$ , $x_{2}$ , $\dots$ , $x_{n}$ } is consisted of $n$ distinct items. Itemset and transaction are both subsets of $I$ . The difference between them is that a transaction can be regarded as an $k$ -itemset, which means that it contains $k$ different items (1 $\leq$ $k$ $\leq$ $n$ ). In addition, the length of $k$ -itemset $X$ is denoted as $| X |$ = $k$ . Each transaction is assigned a unique transaction identification (simplified as Tid). A period is a set of spatio-temporal data that may contain zero, one, or more transactions ( $P$ = { $T_{1}$ , $\dots$ , $T_{m}$ }). We also call a temporal quantitative transaction if it is contained in a period, and a transaction cannot belong to two or more different periods at the same time. The fuzzy set is defined by the membership function, and the membership value range is [0, 1]. Fuzzy itemsets that are contained in temporal quantitative transactions are called temporal fuzzy itemsets. Then, a temporal quantitative database TQD consists of one or more periods.

In this paper, we take a temporal quantitative database (Table I) and a predefined membership function (Fig. 1) as our running example. There are six distinct items (i.e., $A$ , $B$ , $C$ , $D$ , $E$ , and $F$ ), ten transactions, and five periods in the running example database (TQD)¹¹1In order to make the expression more concise, we will always use the symbol “TQD” instead of “temporal quantitative database” during discussion.. The number in each row of TQD represents the internal utility (e.g., quantity) of the corresponding item (which is listed in the first row). Furthermore, the external utility (e.g., profit) of each item is shown in Table II. Finally, we will formalize the problem definition for TFUIM.

Fig. 1: The predefined membership function.

Period	Tid	A	B	C	D	E	F
$P_{1}$	$T_{1}$	1	0	3	0	1	2
$P_{1}$	$T_{2}$	0	4	0	3	0	1
$P_{2}$	$T_{3}$	0	2	0	1	2	0
$P_{2}$	$T_{4}$	2	0	6	1	3	0
$P_{3}$	$T_{5}$	0	1	6	3	0	3
$P_{3}$	$T_{6}$	0	3	0	0	1	7
$P_{4}$	$T_{7}$	2	4	1	5	8	1
$P_{4}$	$T_{8}$	1	6	2	0	1	0
$P_{5}$	$T_{9}$	7	0	0	3	0	0
$P_{5}$	$T_{10}$	3	1	0	6	1	9

TABLE I: A sample temporal quantitative database

Item	A	B	C	D	E	F
Utility	9	5	4	2	1	7

TABLE II: The external utility of each item

Definition 1

As previous content introduced, the number of linguistic terms of an item is depended on the regions of the given membership function. The fuzzy set of an item $x_{i}$ in a temporal quantitative transaction $T_{j}$ is defined as $f_{i j}$ = { $\frac{f_{i j 1}}{R_{i 1}}$ + $\dots$ + $\frac{f_{i j l}}{R_{i l}}$ + $\dots$ + $\frac{f_{i j h}}{R_{i h}}$ }. Fuzzy membership degree $f_{i j l}$ is calculated from quantity $q (x_{i}, T_{j})$ and region $R_{i l}$ .

For example, consider the TQD in Table I and the membership function shown in Fig. 1, since the quantity of $A$ in transaction $T_{9}$ is 7 that the region of $A$ in $T_{9}$ are Middle and High. Based on the membership function, the region values of $R_{A, Middle}$ and $R_{A, High}$ are 0.67 and 0.33, respectively. Therefore, the fuzzy set of item $A$ in $T_{9}$ ( $f_{A, T_{9}}$ ) is { $\frac{0.67}{A . Middle}$ + $\frac{0.33}{A . High}$ }.

Definition 2

The fuzzy utility of $l$ -th fuzzy region of an item $x_{i}$ in a temporal quantitative transaction $T_{j}$ is defined as fu $_{i j l}$ ( $x_{i}$ , $T_{j}$ ) = $f_{i j l} \times q (x_{i}, T_{j}) \times p (x_{i})$ . And the fuzzy utility of $x_{i}$ in TQD is the summation of fu $_{i j l}$ ( $x_{i}$ , $T_{j}$ ). That is, fu $_{i l}$ ( $x_{i}$ ) = $\sum_{x_{i} \in T_{j} \land T_{j} \in TQD}$ fu $_{i j l}$ ( $x_{i}$ , $T_{j}$ ). Besides, the fuzzy utility of a fuzzy itemset $X$ in $T_{j}$ is denoted as fu $_{j X}$ ( $X$ , $T_{j}$ ) = $f_{j X}$ $\times$ $\sum_{x_{i} \in X}$ [ $q (x_{i}, T_{j}) \times p (x_{i})$ ], where $f_{j X}$ = Min{ $f_{ijl} | x_{i} \in X \land R_{i l} \in x_{i}$ }. Similarly, the fuzzy utility of $X$ in TQD is defined as fu $_{X}$ = $\sum_{X \subseteq T_{j} \land T_{j} \in TQD}$ fu $_{j X}$ ( $X$ , $T_{j}$ ). In addition, it should be pointed out that different fuzzy regions of an item cannot occur in a same fuzzy itemset in the meanwhile. In other words, an item cannot both Low and Middle in a fuzzy itemset together.

For instance, the item $C$ occurs in transactions $T_{1}, T_{4}, T_{5}, T_{7}$ and $T_{8}$ . The quantity of $C$ in these transactions are 3, 6, 6, 1 and 2, respectively. Based on the Fig. 1 and above definitions, $f_{C, T_{8}, Low}$ is 0.67, and then fu $_{C, T_{8}, Low}$ ( $C$ , $T_{8}$ ) is 0.67 $\times$ 2 $\times$ 4, which is 5.36. By the same way, fu $_{C, T_{1}, Low}$ ( $C$ , $T_{1}$ ) and fu $_{C, T_{7}, Low}$ ( $C$ , $T_{7}$ ) are 12 and 1.32, respectively. In TQD, fu $_{C.Low}$ is 5.36 + 12 + 1.32 = 18.68 and fu $_{C.Middle}$ is 48. Consider a fuzzy itemset {A.Low, C.Middle} in $T_{4}$ . Since the minimum fuzzy membership degree of {A.Low, C.Middle} is 0.67, and thus fu $_{A.Low, C.Middle}$ is 0.67 $\times$ (2 $\times$ 9 + 6 $\times$ 4) = 28.14.

Definition 3

The fuzzy utility of a $T_{j}$ in TQD is defined as tfu $_{j}$ = $\sum_{x_{i} \in T_{j} \land f_{i j l} \in f_{i j}}$ fu $_{i j l}$ ( $x_{i}$ , $T_{j}$ ). In addition, the start transaction period STP $_{i z}$ of $x_{i}$ is firstly occurring time period $P_{z}$ in TQD of the transaction ( $x_{i} \in T_{j}$ ). The last transaction periods (LTP) of an itemset $X$ is a set of periods from the latest start transaction periods of all items in $X$ to the end period of TQD. Then, the temporal fuzzy utility ratio of a fuzzy itemset $X$ is formulated as tfur $_{X}$ = $\frac{\sum_{X \subseteq T_{j} \land T_{j} \in {LTP}_{X}} {fu}_{j X}}{\sum_{T_{j} \in {LTP}_{X}} {tfu}_{j}}$ .

For example, STP $_{B.Low}$ is $P_{1}$ because B.Low first appears in $T_{2}$ , and $T_{2}$ belongs to $P_{1}$ . Take the fuzzy itemset {B.Low, C.Middle} as an example. STP $_{B.Low}$ and STP $_{C.Middle}$ are $P_{1}$ and $P_{6}$ , respectively. Therefore, the LTP of {B.Low, C.Middle} is a set of { $P_{2}$ , $P_{3}$ , $P_{4}$ , $P_{5}$ }.

Definition 4

Given a user-defined minimal utility threshold $γ$ , a fuzzy itemset $X$ is assumed to be a high temporal fuzzy utility itemset (abbreviated as HTFUI) if and only if its tfur $_{X}$ is higher than or equal to $γ$ . Otherwise, $X$ is a low temporal fuzzy utility itemset (LTFUI), which is useless for users.

For example, if $γ$ = 20%²²2Unless specifically specified, without loss of generality, we are going to default that $γ$ is 20% in this paper., the temporal fuzzy utility ratio of {A.Low, F.High} is 52.03% $>$ 20%, and thus it is an HTFUI. Table III lists other HTFUIs when $γ$ is 20%.

HTFUIs	tfur
${$ A.Low, F.High $}$	52.03%
${$ A.Middle $}$	24.4%
${$ D.Low, A.Middle $}$	26.73%
${$ D.Middle, A.Low $}$	22.53%
${$ D.Middle, A.Low, F.High $}$	58.97%
${$ D.Middle, F.High $}$	43.36%
${$ F.High $}$	36.42%

TABLE III: A set of high temporal fuzzy utility itemsets

Problem statement. Based on previous introduced definitions, given a user-specified minimal utility threshold and a user-defined membership function, we define the problem of temporal fuzzy utility itemset mining (TFUIM) task as discovering the set of high temporal fuzzy utility itemsets in a temporal quantitative database.

Iv The Proposed Flexible Algorithm

It can be found that {A.Middle} and {D.Low, A.Middle} are HTFUIs, but {D.Low} is not in Table III. Note that the temporal fuzzy utility ratio is neither anti-monotonic nor monotonic. We will discuss how to make our approach hold downward closure property in this section. We also use the temporal remaining fuzzy utility upper-bound to improve TFUM’s performance. The details will be introduced in the following.

Iv-a Downward Closure Property

Definition 5

The maximal fuzzy utility of a fuzzy item $x_{i}$ in a temporal quantitative transaction $T_{j}$ is denoted as mfu $_{i j}$ = Max{fu $_{i j 1}$ , fu $_{i j 2}$ , $\dots$ , fu $_{i j l}$ }. Then, the summation of the maximal fuzzy utility of all fuzzy items in $T_{j}$ can be a pre-evaluated value of $x_{i}$ . That is, mtfu $_{j}$ = $\sum_{x_{i} \in T_{j}}$ mfu $_{i j}$ . Obviously, the maximal fuzzy utility of any $x_{i}$ in $T_{j}$ is never higher than the maximal fuzzy utility of $T_{j}$ (i.e., mfu $_{i j}$ $\leq$ mtfu $_{j}$ ).³³3Specially, mtfu is the upper-bound of all fuzzy items in corresponding transactions, but tfu is not in this paper (as shown in Table IV). Please see the study [18] for more information on the proof.

For example, consider $T_{7}$ in Table I. The quantity of the item $D$ is 5, and then the values for $f_{D, T_{7}, Low}$ and $f_{D, T_{7}, Middle}$ are 0.33 and 0.67, respectively. Hence, mfu $_{D, T_{7}}$ is Max{fu $_{D, T_{7}, Low}$ , fu $_{D, T_{7}, Middle}$ } = Max{0.33 $\times$ 5 $\times$ 2, 0.67 $\times$ 5 $\times$ 2} = 6.7. The mfu of other items in $T_{7}$ can be calculated with the same method. Therefore, mtfu $_{T_{7}}$ = fu $_{A, T_{7}, Low}$ + fu $_{B, T_{7}, Low}$ + fu $_{C, T_{7}, Low}$ + fu $_{D, T_{7}, Middle}$ + fu $_{E, T_{7}, High}$ + fu $_{F, T_{7}, Low}$ = 12.06 + 13.4 + 1.32 + 6.7 + 5.36 + 2.31 = 41.15.

Definition 6

We defined the start transaction period (STP) of all fuzzy items as STP $_{all}$ = max $_{TP}$ {STP $_{1}$ , STP $_{2}$ , $\dots$ , STP $_{n}$ }, where $n$ is the number of fuzzy items, and max $_{TP}$ operation has the latest time period (LTP) of the attached parameters. In addition, LTP $_{all}$ means all the time periods from STP $_{all}$ to the last time period of TQD.

For example, the STP of F.High is $T_{10}$ and $T_{10}$ is the last transaction in TQD, that is STP $_{all} = T_{10}$ . Then LTP $_{all}$ contains only one period $P_{5}$ .

Definition 7

The temporal fuzzy utility upper-bound ratio of an itemset $X$ is defined as tfuubr $_{X}$ = $\frac{\sum_{X \subseteq T_{j} \land T_{j} \in {LTP}_{X}} {mtfu}_{j X}}{} \sum_{T_{j} \in {LTP}_{all}} {tfu}_{j}$ ⁴⁴4In this paper, we use itemset instead of fuzzy itemset will get a more loose upper-bound than before, but experiments show this case does not affect obtaining the right final results.. Furthermore, if tfuubr $_{X}$ is higher than or equal to the user-defined minimal threshold $γ$ , it will be assumed as a high temporal fuzzy utility upper-bound itemset (HTFUUI) and should be further checked for its temporal fuzzy utility ratio. Otherwise, it is a LTFUI. Hence, the set of HFUIs is actually a subset of HTFUUIs. The proof details can be referred to the study [19].

Tid	$T_{1}$	$T_{2}$	$T_{3}$	$T_{4}$	$T_{5}$
tfu	29.3	28.31	8.7	39.72	52.65
mtfu	24.68	21.71	8.7	39.72	52.65
Tid	$T_{6}$	$T_{7}$	$T_{8}$	$T_{9}$	$T_{10}$
tfu	64.33	53.69	38.66	69	103.98
mtfu	48.16	41.15	38.66	48.21	103.98

TABLE IV: A comparison table (tfu vs. mtfu)

Property 1

Considering the definition of the temporal fuzzy utility upper-bound ratio, the denominator always stays the same. Therefore, the value of molecular shows a large fraction of the formula. Given a fuzzy itemset $X$ , it is clear that the number of temporal quantitative transactions containing $X$ is always greater than the number of its superset $X^{'}$ . Then, the in-equation tfuubr $_{X}$ $\geq$ tfuubr $_{X^{'}}$ $\geq$ tfur $_{X^{'}}$ is always true.

Strategy 1

Based on Property 1, if a temporal fuzzy utility upper-bound ratio of a fuzzy itemset is less than the user-defined minimal fuzzy utility threshold $γ$ , we can directly ignore it and its supersets during the mining process.

Since tfuubr $_{X}$ is always higher than or equal to tfuubr $_{X^{'}}$ , we define the symbol $≺$ as a global descending order of all single items from $I$ to help prune the search space. Thus, a temporal quantitative transaction is “revised” when all the contained items in it are HTFUUIs and all remaining items are sorted by the global descending order (“ $≺$ ”). Besides, if a database consists of revised temporal quantitative transactions, the database is considered as TQD $^{'}$ .

Iv-B TP-table for Fast Locating

In previous sections, we introduced several basic concepts of temporal fuzzy utility and a revised temporal quantitative database. Since a period may contain zero or one or even more temporal quantitative transactions, it is foreseeable that calculating the temporal fuzzy utility of a fuzzy itemset by scanning over and over again is unacceptable. In this subsection, a transactions-period table (TP-table) structure is proposed to record the relationships between transactions and the periods that include them, which is defined as follows.

Definition 8

Given a temporal quantitative database TQD, a TP-table can be created by scanning the TQD once, where each temporal quantitative transaction $T_{j}$ is used as an index to indicate the period $P_{i}$ that contain it. That is, $P_{i}$ = { $T_{j}$ $|$ $T_{j} \in P_{i}$ $\land T_{j}$ $\in TQD$ }.

Considering Table I, a typical example of a TP-table is given in Table V, and a TP-table consists of two fields, including period and Tids. In general, a naive method adopts a matrix array (5 $\times$ 10), that the first column stores the period, and the rest columns are used to store Tids. Furthermore, bitmaps can also be integrated into the matrix to save memory. What’s more, we implement TP-table by hash-map in our work, which accelerates the locating process effectively. According to the utilization of the TP-table, it is clear that it can save memory and runtime consumption because massive unnecessary times of scanning databases are reduced. As databases tend to be large, the improved effect will be more significant.

Period	Tids
$P_{1}$	$T_{1}$ , $T_{2}$
$P_{2}$	$T_{3}$ , $T_{4}$
$P_{3}$	$T_{5}$ , $T_{6}$
$P_{4}$	$T_{7}$ , $T_{8}$
$P_{5}$	$T_{9}$ , $T_{10}$

TABLE V: An example of TP-table

Iv-C Tighter Remaining Measure

The mining mechanism of the proposed approach is a depth-first method. The search space is a set-enumeration tree [34] with an empty root. The first layer of the tree is filled with single items (1-itemsets) from $I$ . Then, 2-itemsets are the child nodes of these 1-itemsets in the second layer. And there are third, fourth layers, and so on in the same manner. Note that the total number of nodes in the search space exceeds $2^{| I |}$ . Hence, performing a downward traversal of every branch by a naive method is unrealistic. We next introduce several definitions related to the exploration of fuzzy itemsets in depth-first search space.

Definition 9

In the search space, the set of extension fuzzy items $x_{i}$ of a fuzzy itemset $X$ is sorted behind $X$ in $I$ . In this paper, we adopt the global descending order (“ $≺$ ”) to obtain large fuzzy itemsets and avoid generating the same fuzzy itemset repeatedly. That is, $E$ ( $X$ ) = { $x_{i}$ $|$ $x_{i} \in I$ $\land x_{c}$ $≺ x_{i}$ , $\forall x_{c}$ $\in X$ }. Then, consider $X$ in a revised temporal quantitative transaction $T_{j}$ , and the set of all fuzzy items after $X$ is named remaining fuzzy items of $X$ and denoted as $T_{j}$ / $X$ = { $x_{i}$ $|$ $x_{c}$ $≺ x_{i}$ , $\forall x_{c}$ $\in X$ $\land$ $x_{i}$ $\in$ $T_{j}$ }.

Since the maximal fuzzy utility of an item is the upper-bound in the fuzzy set of the item and a fuzzy itemset does not contain fuzzy items from the same fuzzy set, it is feasible to take the maximal fuzzy utility of an item as the estimation of the extended fuzzy utility of a fuzzy itemset. In other words, the remaining maximal fuzzy utility of a fuzzy itemset represents how much the fuzzy utility of it can increase in the subsequent exploring process.

Definition 10

Given a fuzzy itemset $X$ in a temporal quantitative transaction $T_{j}$ , the utility of remaining fuzzy items $x_{i}$ $\in$ $T_{j}$ / $X$ is named remaining maximal temporal fuzzy utility, and defined as rmtfu $_{j X}$ = $\sum_{x_{i} \in T_{j} / X}$ mfu $_{i j}$ .

For example, consider a fuzzy item B.Low in the revised temporal quantitative transaction $T_{7}$ . $T_{7}$ / B.Low = {B.Middle, E.Middle, E.High, A.Low, F.Low, C.Low}, since B.Low and B.Middle can not occur in a same fuzzy itemset, thus rmtfu $_{T_{7}, B.Low}$ = mtfu $_{E, T_{7}}$ + mtfu $_{A, T_{7}}$ + mtfu $_{F, T_{7}}$ + mtfu $_{C, T_{7}}$ = 5.36 + 12.06 + 2.31 + 1.32 = 21.05.

Iv-D Revised Temporal Fuzzy-List Structure

In the FUMT algorithm [42], the temporal fuzzy-list structure (TF-List) was proposed to store HTFUUIs information. The TF-List significantly reduces the database scanning times. However, we notice that FUMT takes the maximal fuzzy utility of fuzzy items in the quantitative transaction as the upper-bound, which is too loose to effectively prune the search space. Thus, we modify the TF-List based on the remaining measure in our work. We discuss and prove the superiority of the remaining measure in this subsection.

Fig. 2: The temporal fuzzy-lists of some HTFUUIs $_{1}$ .

Definition 11

A temporal fuzzy-list [42] of a fuzzy itemset $X$ is a set of tuples. As shown in Fig. 2, the tuple consists of five elements: 1) transaction identification (Tid) shows how many transactions contain $X$ ; 2) fu records the fuzzy utility of $X$ in the corresponding temporal quantitative transaction; 3) $R$ reveals the fuzzy region values of $X$ in different transactions; 4) the latest STP ( $L_{STP}$ ); and 5) the sum of maximal fuzzy utility of the temporal quantitative transaction (sumMtfu).

After FUMT discovers a complete set of high temporal fuzzy utility upper-bound 1-itemsets (HTFUUIs $_{1}$ ) by scanning the temporal quantitative database (TQD) for the first time, a revised temporal quantitative database (TQD $^{'}$ ) is obtained. FUMT then traverses TQD $^{'}$ to construct the temporal fuzzy-lists of HTFUUIs $_{1}$ (TF-List $_{1}$ ). Take fuzzy item {F.Low} in Table I as an example. Since the {F.Low} occurs in the transactions $T_{1}$ , $T_{2}$ , $T_{5}$ and $T_{7}$ , the STP of {F.Low} is $P_{1}$ . From the Table IV we can know the mtfus of {F.Low} are 24.68, 21.71, 52.65, and 41.15, respectively. Then the sumMtfu can be calculated as (24.68 + 21.71 + 52.65 + 41.15), which is 140.19. During the process of obtaining the mtfu and calculating the sumMtfu, it stores the Tid, fu and R in the list. By the same way, the FUMT algorithm constructs the TF-List $_{1}$ . In addition, sumMtfu is an upper-bound that keeps decreasing. Its value reflects how much the fuzzy utility of fuzzy itemset can increase. Considering the fuzzy 2-itemset {B.Low, A.Low}, the itemset {B.Low, A.Low} appears in $T_{7}$ and $T_{10}$ . The mtfu of these two transactions is 41.15 and 103.98, respectively. Therefore, the sumMtfu of the itemset {B.Low, A.Low} is 145.13. As for the latest STP, since the STP of B.Low and A.Low are both $P_{1}$ that the latest STP of {B.Low, A.Low} is $P_{1}$ . The information for Tid, fu and R is recorded in the same manner.

However, taking {B.Low, A.Low} as the same sample, the remaining maximal fuzzy utility of it is computed as {B.Low, A.Low}.rmtfu = 63, which is far less than its sumMtfu. Fig. 3 lists rmtfu of other HTFUUIs $_{1}$ .

Definition 12

Based on the remaining fuzzy utility definition, we modify the temporal fuzzy-list structure as the revised temporal fuzzy-list (RTF-List) structure. An RTF-List consists of a set of tuples, and each tuple contains four elements: 1) transaction identification (Tid); 2) utility of itemset ( $u$ ); 3) remaining maximal temporal fuzzy utility (rmtfu); and 4) fuzzy region value ( $R$ ).

Fig. 3: The revised temporal fuzzy-lists of several HTFUUIs $_{1}$ .

There is no need for a database scan because the proposed method generates fuzzy 2-itemsets by joining RTF-Lists of different low-level fuzzy itemsets. According to Definition 2, the fuzzy utility of an itemset is utility multiplied by region value. In order to reduce the amount of calculating, the RTF-List structure records the utility of an item. We also take {B.Low, A.Low} as illustrations. The common Tids in two RTF-Lists of B.Low and A.Low are $T_{7}$ and $T_{10}$ . In other words, the high-level fuzzy itemset {B.Low, A.Low} occurs in these temporal quantitative transactions. Thus, RTF-List $_{{B.Low, A.Low}}$ contains two tuples. The u of {B.Low, A.Low} in each tuple is the summation of the utility of B.Low and A.Low. And the {B.Low, A.Low}.rmtfu is equal to that of A.Low because A.Low is a remaining fuzzy item of B.Low. Finally, Fig. 4 lists the details of RTF-List $_{{B.Low, A.Low}}$ and others.

Fig. 4: The revised temporal fuzzy-lists of several fuzzy 2-itemsets.

Then we consider the RTF-List of fuzzy $k$ -itemset ( $k \geq 3$ ) and take the same manner as described previously. The difference between intersecting RTF-Lists of fuzzy ( $k$ -1)-itemsets and constructing RTF-List of fuzzy 2-itemset is that u of fuzzy $k$ -itemset has double-calculated the fuzzy utility of its prefix fuzzy itemset ({ $x_{1}$ , $\dots$ , $x_{k - 2}$ }). Note that this is a miscalculation. For example, in Fig. 4, the $u$ of the element associated with $T_{c}$ in RTF-List $_{{B.Low, A.Low, F.High}}$ is calculated as {B.Low, A.Low, F.High}.u = {B.Low, A.Low}.u + {B.Low, F.High}.u - {B.Low}.u.

Definition 13

In RTF-List of a fuzzy itemset $X$ , the sum of fu elements is denoted as $X$ .sumFu = $\sum_{T_{j} \in {RTF-List}_{X}}$ fu $_{j X}$ ( $X$ , $T_{j}$ ), and the sum of rmtfu elements is denoted as $X$ .sumRmtfu = $\sum_{T_{j} \in {RTF-List}_{X}}$ rmtfu $_{j X}$ ( $X$ , $T_{j}$ ).

Property 2

Since the RTF-List contains all rmtfu information of a fuzzy itemset $X$ , sumRmtfu can reflect how much the fuzzy utility of $X$ increases in TQD. Hence, if $X$ .sumFu + $X$ .sumRmtfu $\geq$ $γ$ $\times$ $\sum_{T_{j} \in {LTP}_{all}}$ tfu $_{j}$ is true, there must exist one or more fuzzy super-itemsets of $X$ are HTFUIs.

Strategy 2

Given a fuzzy itemset $X$ and its RTF-List $_{X}$ . If the condition $X$ .sumFu $+$ $X$ .sumRmtfu $<$ $γ$ $\times$ $\sum_{T_{j} \in {LTP}_{all}}$ tfu $_{j}$ is true, there is no need to explore the search space for $X$ .

Proof:

Given a fuzzy itemset $X$ and its extension $X^{'}$ ( $X \subset X^{'}$ ), and for $\forall X^{'} \subseteq T_{j}$ there is ( $X^{'} - X$ ) = ( $X^{'}$ / $X$ ).

∵

X \subset X^{'} \subseteq T_{j}

\Rightarrow

(

X^{'} / X

)

\subseteq

(

T_{j} / X

$∴$ fu $_{j X^{'}}$ ( $X^{'}$ , $T_{j}$ )	= fu $_{j X}$ ( $X$ , $T_{j}$ ) + fu $_{j (X^{'} - X)}$ (( $X^{'} - X$ ), $T_{j}$ )
	= fu $_{j X}$ ( $X$ , $T_{j}$ ) + fu $_{j (X^{'} / X)}$ (( $X^{'} / X$ ), $T_{j}$ )
	= fu $_{j X}$ ( $X$ , $T_{j}$ ) + $\sum_{x_{i} \in X^{'} / X}$ fu $_{j x_{i}}$ ( $x_{i}$ , $T_{j}$ )
	$\leq$ fu $_{j X}$ ( $X$ , $T_{j}$ ) + $\sum_{x_{i} \in T_{j} / X}$ fu $_{j x_{i}}$ ( $x_{i}$ , $T_{j}$ )
	$\leq$ fu $_{j X}$ ( $X$ , $T_{j}$ ) + $\sum_{x_{i} \in T_{j} / X}$ mfu $_{j x_{i}}$ ( $x_{i}$ , $T_{j}$ )
	= fu $_{j X}$ + rmtfu $_{j X}$ .

Suppose a Tid set of $X$ in list is denoted as $| {RTF-List}_{X} |$ , and a Tid set of $X^{'}$ in LTP $_{X^{'}}$ is defined as $| {LTP}_{X^{'}} |$ , then:

∵

| {RTF-List}_{X} |

\geq

| {LTP}_{X^{'}} |

, and LTP

_{all}

\subseteq

LTP

_{X^{'}}

$∴$ tfur $_{X^{'}}$	= $\sum_{X \in T_{j} \land T_{j} \in {LTP}_{X^{'}}} {fu}_{j X^{'}}$ $/$ $\sum_{T_{j} \in {LTP}_{X^{'}}} {tfu}_{j}$
	$\leq$ $\sum_{T_{j} \in {RTF-List}_{X}} ({fu}_{j X} + {rmtfu}_{j X})$ $/$ $\sum_{T_{j} \in {LTP}_{X^{'}}} {tfu}_{j}$
	$\leq$ $\sum_{T_{j} \in {RTF-List}_{X}} ({fu}_{j X} + {rmtfu}_{j X})$ $/$ $\sum_{T_{j} \in {LTP}_{all}} {tfu}_{j}$
	= $X . sumFu + X . sumRmtfu$ $/$ $\sum_{T_{j} \in {LTP}_{all}} {tfu}_{j}$ .

The proof is done. \qed

Property 3

Assume that the intersection of two distinct RTF-Lists (RTF-List $_{X}$ and RTF-LIst $_{Y}$ ) is not empty. In fact, Strategy 2 considers all elements of RTF-List $_{X}$ . Thus, we propose a tighter remaining upper-bound than the last one.

Strategy 3

Given two fuzzy itemsets $X$ and $Y$ , where $X \neq Y$ and RTF-List $_{X}$ $\cap$ RTF-List $_{Y}$ $\neq \emptyset$ . If $X$ .sumFu + $X$ .sumRmtfu $-$ $\sum_{Y ⊈ T_{j} \land T_{j} \in {RTF-List}_{X}}$ ( $X$ .fu + $X$ .rmtfu) $<$ $γ$ $\times$ $\sum_{T_{j} \in {LTP}_{all}} {tfu}_{j}$ , all supersets of XY can be pruned safely.

Proof:

Assume the extension of fuzzy itemsets $X$ and $Y$ are $X^{'}$ and $Y^{'}$ , respectively. Similarly, the extension of XY is X $^{'}$ Y $^{'}$ .

∵

\sum_{T_{j} \in {RTF-List}_{X}}

(fu

_{j X}

+

rmtfu

_{j X}

)

\sum_{T_{j} \in {RTF-List}_{X} \land Y \subseteq T_{j}}

(fu

_{j X}

+

rmtfu

_{j X}

)

+

\sum_{T_{j} \in {RTF-List}_{X} \land Y ⊈ T_{j}}

(fu

_{j X}

+

rmtfu

_{j X}

∴

\sum_{T_{j} \in {RTF-List}_{X} \land Y \subseteq T_{j}}

(fu

_{j X}

+

rmtfu

_{j X}

)

\sum_{T_{j} \in {RTF-List}_{X}}

(fu

_{j X}

+

rmtfu

_{j X}

)

-

\sum_{T_{j} \in {RTF-List}_{X} \land Y ⊈ T_{j}}

(fu

_{j X}

+

rmtfu

_{j X}

)

\sum_{T_{j} \in {RTF-List}_{XY}}

(fu

_{j XY}

+

rmtfu

_{j XY}

Assume that $T_{p} \in {RTF-List}_{XY}$ , $T_{q} \in {RTF-List}_{X^{'} Y^{'}}$ , and $T_{j} \in {RTF-List}_{X}$ , thus $T_{q} \subseteq T_{p} \subseteq T_{j}$ .

$∴$ fu $_{X^{'} Y^{'}}$	= $\sum_{T_{q} \in {RTF-List}_{X^{'} Y^{'}}}$ fu $_{q X^{'} Y^{'}}$
	$\leq$ $\sum_{T_{p} \in {RTF-List}_{XY}}$ (fu $_{p XY}$ $+$ rmtfu $_{p XY}$ )
	= $\sum_{T_{j} \in {RTF-List}_{X}}$ (fu $_{j X}$ $+$ rmtfu $_{j X}$ )
	$-$ $\sum_{T_{j} \in {RTF-List}_{X} \land Y ⊈ T_{j}}$ (fu $_{j X}$ $+$ rmtfu $_{j X}$ ).

Based on the proof details of Property 2, Property 3 is proved. \qed

Iv-E The TFUM Algorithm

The pseudocode of TFUM is presented in Algorithm 1. It takes three input parameters: 1) a quantitative transaction database that is temporal; 2) a user-specified minimum threshold; and 3) a pre-defined membership function. The TFUM algorithm firstly initializes both the TP-table and high temporal fuzzy utility of fuzzy 1-itemsets as empty (Line 1). In Lines 2–10, TFUM aims to prepare for the subsequent mining process, which includes updating the TP-table and HTFUUIs $_{1}$ , adding the start time period of each fuzzy item to STP-List, and STP $_{all}$ . In order to make the following discussion easier, we use the symbol $δ$ to represent $γ \times \sum_{T_{j} \in {LTP}_{all}} {tfu}_{j}$ (Line 11). Since a complete set of high temporal fuzzy utilities of fuzzy 1-itemsets is collected, TFUM will get a global order “ $≺$ ” and a revised temporal quantitative database in Lines 12 and 13. Subsequently, in Line 14, the revised temporal fuzzy-lists of HTFUUIs $_{1}$ will be generated, and then TFUM keeps relevant information about fuzzy itemsets from the revised database in memory. In Line 15, after TFUM has finished initializing the prefix fuzzy itemset $P$ and its list RTF-List $_{P}$ , TFUM calls a recursive method until no more high temporal fuzzy utility itemsets are generated (Line 16). Finally, a complete set of HTFUIs will be output when the TFUM algorithm terminates.

Input: TQD: a temporal quantitative transaction database;

γ

: a user-specified minimum temporal fuzzy utility threshold;

f (R)

: a pre-defined membership function.

Output: HTFUIs: a complete set of temporal fuzzy high utility itemsets.

1 initialize TP-table, STP-List and HTFUUIs

_{1}

as null; for temporal transaction $T_{j}$ $\in$ TQD do

2 update TP-table; for each item $x_{i} \in T_{j}$ do

3 calculate fu of fuzzy items

x_{i l}

f (R)

; STP-List

\leftarrow

the start time period (STP

_{i}

) of

x_{i l}

;

4 end for

5 compute tfu and mtfu of

T_{j}

;

6 end for

STP

_{all}

= max

_{T P}

{STP-List}; set

δ

γ

\times

\sum_{T_{j} \in {LTP}_{all}} {tfu}_{j}

; collect a complete set of HTFUUIs

_{1}

; get the revised database TQD

^{'}

by global order

≺

; scan TQD

^{'}

and construct RTF-Lists

_{1}

of HTFUUIs

_{1}

; initialize prefix fuzzy itemset

P

and RTF-List

_{P}

as null; call Miner(

P

, RTF-List

_{P}

, RTF-List

_{1}

γ

δ

); return HTFUIs

Algorithm 1 The TFUM algorithm

Input:

P

: the prefix fuzzy itemset; RTF-List

_{P}

: a revised temporal fuzzy-list of

P

; RTF-Lists: a set of revised temporal fuzzy-lists;

γ

: a minimum fuzzy utiltiy threshold;

δ

: a fuzzy utility upper-bound.

1 for each RTF-List $_{X}$ $\in$ RTF-Lists do

2 if tfur $_{X}$ $\geq$ $γ$ then

X

is a HTFUI;

4 end if

5 initialize exRTF-Lists as null; if $X$ .sumFu + $X$ .sumRmtfu $\geq$ $δ$ then

6 for each RTF-List $_{Y}$ $\in$ RTF-lists do

7 RTF-List

_{XY}

= Construct(RTF-List

_{P}

, RTF-List

_{X}

, RTF-List

_{Y}

); if XY.sumRmtfu $>$ $δ$ then

8 exRTF-Lists += RTF-List

_{XY}

;

9 end if

11 end for

P

\leftarrow

P

\cup

X

; call Miner(

P

, RTF-List

_{X}

, exRTF-Lists,

δ

γ

);

13 end if

15 end for

Algorithm 2 The Miner procedure

Input: RTF-List

_{P}

: a revised temporal fuzzy-list of prefix fuzzy itemset

P

; RTF-List

_{Px}

: a revised temporal fuzzy-list of fuzzy itemset Px; RTF-List

_{Py}

: a revised temporal fuzzy-list of fuzzy itemset Py.

Output: a revised temporal fuzzy-list of high-level fuzzy itemset Pxy.

1 initialize RTF-List

_{Pxy}

as null; Pxy.L

_{STP}

= Max{Px.L

_{STP}

, Py.L

_{STP}

}; for each element Px $_{e}$ $\in$ RTF-List $_{Px}$ do

2 if $\exists {Py}_{e} \in {RTF-List}_{Py}$ and ${Px}_{e}$ .Tid == ${Py}_{e}$ .Tid then

3 Pxy

_{e}

.R = Min{Px

_{e}

.R, Py

_{e}

.R}; if RTF-List $_{P}$ $\neq$ null then

4 find

P_{e}

\in

RTF-List

_{P}

where

P_{e}

.Tid ==

{Px}_{e}

.Tid;

5 Pxy

_{e}

.u = Px

_{e}

.u + Py

_{e}

.u - P

_{e}

.u;

6 else

7 Pxy

_{e}

.u =

{Px}_{e}

.u +

{Py}_{e}

.u;

8 end if

9 Pxy

_{e}

= (Px

_{e}

.Tid, Pxy

_{e}

.u, Py

_{e}

.rmtfu, Pxy

_{e}

.R); add Pxy

_{e}

into RTF-List

_{Pxy}

;

10 end if

12 end for

return RTF-List

_{Pxy}

Algorithm 3 The Construct procedure

The mining procedure of the TFUM algorithm is shown in Algorithm 2. The miner procedure takes a prefix fuzzy itemset $P$ , a set of revised temporal fuzzy-lists (including RTF-List $_{P}$ ), the minimum threshold $γ$ , and a fuzzy utility upper-bound $δ$ ⁵⁵5 $δ$ = $γ \times \sum_{T_{j} \in {LTP}_{all}} {tfu}_{j}$ , which is described in the last paragraph.. The procedure will traversal each RTF-List $_{X}$ in the input set RTF-Lists. It firstly will check the current fuzzy itemset $X$ is an HTFUI or not (Line 2). If yes, $X$ will be added into HTFUIs set (Line 3). Then, the procedure checks $X$ whether it can be extended or not in the following steps. The procedure initializes RTF-List of the extension fuzzy itemset of $X$ as null, and then assesses the summation of fuzzy utility and remaining maximal temporal fuzzy utility of $X$ is higher than $δ$ or not (Lines 5 and 6). If the condition is true, the miner procedure will examine RTF-List of other fuzzy itemsets $Y$ in RTF-Lists, where $X ≺ Y$ , to determine the super fuzzy itemset XY should be generated or not (Lines 7–12). While there is no more remaining fuzzy itemset of $X$ existing, the procedure will add $X$ into $P$ and recursively call Algorithm 2 itself.

Algorithm 3 shows the procedure of TF-List construction. The procedure joins RTF-Lists of two different fuzzy itemsets and then outputs a novel RTF-List of the fuzzy super-itemset. The construct procedure firstly initials RTF-List $_{xy}$ and Pxy.L $_{STP}$ in Lines 1 and 2. Then, it joins elements of RTF-List $_{Px}$ and RTF-List $_{Py}$ which contain the same Tids (Lines 3–15). If the prefix fuzzy itemset is null, then the procedure computes the sum of utility of two elements directly (Line 10); otherwise, the summation value should minus the utility of prefix fuzzy itemset (Line 8). In addition, binary search method can greatly accelerate the intersection process. In Line 12, a new element of RTF-List $_{Pxy}$ is obtained. And then, the procedure adds it into RTF-List $_{Pxy}$ (Line 13). When the construct procedure terminates, a novel revised temporal fuzzy utility of fuzzy super-itemset Pxy is collected (Line 16).

V An Illustrate Example

To more clearly introduce the TFUM algorithm, in this section, a simple example is given to illustrate how to find temporal fuzzy itemsets from a temporal quantitative database (TQD), as shown in Table I. There are ten transactions, and these ten transactions are attributed to five periods. The six items in TQD are denoted $A$ to $F$ , respectively. In addition, the external utility of all items is listed in Table II. The membership function is shown in Fig. 1. In this example, the minimum threshold $γ$ = 20%. The detailed process of the TFUM algorithm is as follows.

To discover the temporal fuzzy utility itemsets, TP-table (Table V) stores the period information of each transaction.
While traversing each transaction in TQD, the algorithm calculates the fuzzy utility of each item with the given membership function. Then mtfu is the sum of all maximum fuzzy utilities in this transaction, and tfu is the sum of all fuzzy utilities. For instance, according to the membership function, the transaction $T_{7}$ can be transformed to {A.Low, B.Low, B.Middle, C.Low, D.Low, D.Middle, E.Middle, E.High, F.Low}. Since the fuzzy utility of B.Low is larger than that of B.Middle, the algorithm takes B.Low but not B.Middle into account when calculating mtfu. The same manner can be done for all transactions in TQD and the results are shown in Table IV.
In addition, in order to find STP $_{a l l}$ of all fuzzy items, a map STP-List records STP of each fuzzy item. From the STP-List, TFUM finds out the latest STP (i.e., STP $_{a l l}$ ). Since STP $_{a l l}$ is $T_{10}$ which belongs to period $P_{5}$ , LTP $_{a l l}$ is the last period of TQD (i.e., $P_{5}$ ).
For computing the global order, TFUM utilizes a list structure to record all transactions where the item is located, then according to the tfuubr to confirm the order. For instance, item $A$ appears in transactions { $T_{1}$ , $T_{4}$ , $T_{7}$ , $T_{8}$ , $T_{9}$ , $T_{10}$ } and item $F$ appears in { $T_{1}$ , $T_{2}$ , $T_{5}$ , $T_{6}$ , $T_{7}$ , $T_{10}$ }. Then the tfuubr of $A$ and $F$ is 1.71 and 1.69, respectively. Therefore, $A$ $≺$ $F$ . The algorithm sorts the items in transactions using the global order. After revising all transaction in TQD, we will get a TQD $^{'}$ , as shown in Table VI.
TFUM scans TQD $^{'}$ to construct the RTF-List $_{1}$ for each fuzzy item. Take A.Low in Table VI as an example. The item $A$ appears in six transactions, but A.Low appears in only five transactions $T_{1}, T_{4}, T_{7}, T_{8}$ , and $T_{10}$ . According to the positions where A.Low is located, the rmtfu of A.Low can be calculated. Meanwhile, store the utility ( $u$ ) and region value ( $R$ ) of A.Low in each transaction, respectively. The RTF-List $_{A . L o w}$ is shown in Fig 3.

From now on, since the RTF-Lists of all fuzzy 1-itemsets have been constructed, thus invoking the Miner procedure to mine all temporal fuzzy utility itemsets. The items in TQD $^{'}$ have been sorted, and the mining order is consistent with this order. This will ensure the correctness and completeness of the mining results. Take the temporal fuzzy utility itemsets with the prefix D.Middle as an example. The details of the Miner procedure are described as below.

In the Miner procedure, the algorithm calculates the tfur $_{D.Middle}$ at first. For D.Middle, the tfur can be calculated as ( ${fu}_{T_{7}}$ + ${fu}_{T_{10}}$ )/( ${tfu}_{T_{7}}$ + ${tfu}_{T_{8}}$ + ${tfu}_{T_{9}}$ + ${tfu}_{T_{10}}$ ) = (6.7 + 3.96)/(53.69 + 38.66 + 69 + 103.98), which is 4.02%. Since 4.02% is less than 20%, D.Middle is not a HTFUI.
Next, it is easy to compute the tfuubr of D.Middle since the RTF-List $_{D.Middle}$ had already stored the needed information. The tfuubr of D.Middle can be calculated as ((6.7 + 34.45) + (12 + 91.98)) / (69 + 103.98) = 83.9%, which is larger than $γ$ . This means D.Middle can be extended.
Table VII shows the results of constructing the RTF-List $_{2}$ with D.Middle as prefix. From the value of tfuubr, we know that {D.Middle, B.Low}, {D.Middle, B.Middle}, {D.Middle, E.Low}, {D.Middle, A.Low} and {D.Middle, F.High} can be extended. Then, TFUM continues to call the Miner procedure for discovering high temporal fuzzy utility itemsets.
Finally, the whole procedure is completed and the seven high temporal fuzzy utility itemsets in Table III are output to users.

Period	Tid	Revised Transaction (item, quantity)
$P_{1}$	$T_{1}$	( $E$ , 1) ( $A$ , 1) ( $F$ , 2) ( $C$ , 3)
$P_{1}$	$T_{2}$	( $D$ , 3) ( $B$ , 4) ( $F$ , 1)
$P_{2}$	$T_{3}$	( $D$ , 1) ( $B$ , 2) ( $E$ , 2)
$P_{2}$	$T_{4}$	( $D$ , 1) ( $E$ , 3) ( $A$ , 2) ( $C$ , 6)
$P_{3}$	$T_{5}$	( $D$ , 3) ( $B$ , 1) ( $F$ , 3) ( $C$ , 6)
$P_{3}$	$T_{6}$	( $B$ , 3) ( $E$ , 1) ( $F$ , 7)
$P_{4}$	$T_{7}$	( $D$ , 5) ( $B$ , 4) ( $E$ , 8) ( $A$ , 2) ( $F$ , 1) ( $C$ , 1)
$P_{4}$	$T_{8}$	( $B$ , 6) ( $E$ , 1) ( $A$ , 1) ( $C$ , 2)
$P_{5}$	$T_{9}$	( $D$ , 3) ( $A$ , 7)
$P_{5}$	$T_{10}$	( $D$ , 6) ( $B$ , 1) ( $E$ , 1) ( $A$ , 3) ( $F$ , 9)

TABLE VI: A revised temporal quantitative database

Pattern	tfur	tfuubr
${$ D.Middle, B.Low $}$	10.03%	79.25%
${$ D.Middle, B.Middle $}$	6.05%	49.5%
${$ D.Middle, E.Low $}$	2.48%	54.51%
${$ D.Middle, E.Middle $}$	2.24%	12.50%
${$ D.Middle, E.High $}$	4.55%	12.5%
${$ D.Middle, A.Low $}$	21.77%	71.91%
${$ D.Middle, F.Low $}$	2.11%	4.01%
${$ D.Middle, F.High $}$	43.36%	43.36%
${$ D.Middle, C.Low $}$	1.74%	2.67%

TABLE VII: The results of 2-itemsets with D.Middle as prefix

Vi Experimental Results

The experimental algorithms (ATTFUM [18], FUMT [42], and TFUM) are all implemented in the Java language, and we conducted the experiments on a computer with an Intel Core 3.0 GHz processor with 16 GB of RAM and a Windows 10 64-bit operating system.

Dataset description. Four different datasets (BMSPOS, Retail, Mushroom, and T40I10D100k) were used to evaluate the performance of the tested algorithms. All the datasets can be easily downloaded from the open source library⁶⁶6SPMF: http://www.philippe-fournier-viger.com/spmf/index.php. The characteristics of databases are shown in Table VIII. However, two common features of four datasets are that 1) the quantity range of each item is [1, 6]; and 2) they cannot be used for mining high temporal fuzzy itemsets directly since they only offer utilities without periods. Therefore, in order to test the influence of the number of time periods, each dataset was randomly assigned to 1, 2, or 4 time periods, respectively. In addition, all experiments were executed three times independently. The “Runtime” and “Memory” respectively indicate the average running time and memory consumption of each experiment under the various thresholds ( $γ$ ). Finally, we suppose the algorithm is terminated if its running time exceeds over 3,600 seconds, and we use the symbol “-” to represent this in tables.

Dataset	#Trans	#Items	#AvgLen	Type
BMSPOS	515,366	1,656	6.51	sparse
Retail	88,162	16,470	10.3	sparse
Mushroom	8,124	119	23.0	dense
T40I10D100k	10,000	942	39.6	dense

TABLE VIII: Information of experimental datasets

Dataset	ATTFUM	FUMT	TFUM
BMSPOS $_{P4}$	1196	1015	945
Retail $_{P4}$	679	579	183
Mushroom $_{P4}$	203	198	174
T40I10D100k $_{P4}$	2348	1004	1214

TABLE IX: The peak memory of tested algorithms

Membership function. For convenience, as shown in Fig. 5, we supposed all items have the same membership function in our experiments. There are three fuzzy regions (Low, Middle, and High) taken into account.

Fig. 5: The experimental membership function.

Fig. 6: The runtime usage on different periods and thresholds.

Dataset	Notation	# Candidate under various thresholds
Dataset	Notation	test $_{1}$	test $_{2}$	test $_{3}$	test $_{4}$	test $_{5}$	test $_{6}$
	$γ$	0.6%	0.8%	1%	2%	4%	6%
	ATTFUM	-	-	-	-	627	242
BMSPOS $_{P2}$	FUMT	-	-	-	2,510	585	238
	TFUM	19,769	10,765	6,945	1,811	522	267
	$γ$	1%	2%	3%	4%	5%	6%
	ATTFUM	-	-	285,203	137,647	64,054	40,047
Mushroom $_{P2}$	FUMT	4,013,164	616,261	182,093	90,164	41,560	26,468
	TFUM	1,181,227	209,004	75,233	35,015	19,769	12,229

TABLE X: The number of candidates generated by three algorithms

Fig. 7: The memory consumption on different periods and thresholds.

Dataset	Notation	# Prune ratio under various thresholds
Dataset	Notation	test $_{1}$	test $_{2}$	test $_{3}$	test $_{4}$	test $_{5}$	test $_{6}$
	$γ$	0.6%	0.8%	1%	2%	4%	6%
BMSPOS $_{P2}$	FUMT	-	-	-	95.79%	96.38%	97.37%
	TFUM	98.03%	97.98%	97.9%	97.82%	97.1%	96.57%
	$γ$	1%	2%	3%	4%	5%	6%
Mushroom $_{P2}$	FUMT	72.19%	76.85%	78.16%	79.42%	80.35%	81.47%
	TFUM	91.82%	92.75%	93.09%	93.3%	93.38%	93.51%

TABLE XI: The prune ratio of FUMT and TFUM algorithms

Vi-a Runtime Analysis

From Fig. 6 we can clearly see that the mining performance of ATTFUM and FUMT is degraded consequently while $γ$ decreases or the number of time periods increases. For example, in Figs. 6 from (g) to (i), FUMT consumes nearly 350 seconds, 1100 seconds, and 3300 seconds in different periods, respectively, when $γ$ = 1%. The runtime cost of ATTFUM increases from 393 seconds to 3584 seconds, while $γ$ decreases from 3% to 1% in Fig. 6(l). Besides, ATTFUM is always timeout in most cases (especially in Fig. 6(c)). The reason we suppose is that ATTFUM consumes massive runtime in traversing subtrees. In addition, it can be observed that the proposed TFUM algorithm outperforms the other experimental algorithms. For instance, on the BMSPOS dataset, TFUM is up to two orders of magnitude faster than FUMT and ATTFUM. This is reasonable since the TFUM algorithm adopts the TP-table and lists to locate key information in a short time. In the same type of datasets (dense or sparse), the remaining measure is capable of further pruning and then fast discovering HTFUIs, when the average length of transactions are shorter. As shown in Fig. 6(h) (Mushroom dataset, two periods, and $γ$ = 1%), TFUM takes only 66 seconds, but FUMT requires 1140 seconds. In Fig. 6, there are many sub-graphs that reveal the much larger gap between the novel algorithm and other algorithms.

Vi-B Memory Usage Analysis

As illustrated in Fig. 7, in general, the memory consumption of the proposed algorithm (i.e., the red line) is usually below that of other experimental algorithms. There are a lot of cases where lines have intersected in Fig. 7(a) to (l). For example, on Retail dataset with four time periods, the memory cost of FUMT is higher than that of ATTFUM when $γ$ = 0.6%, but the opposite is true if $γ$ is 0.8%. However, it can be observed that the runtime consumption gap between two algorithms is becoming larger. We suppose this is a trade-off case. In particular, FUMT costs less memory than that of ATTFUM on dense datasets (Figs. 7(g)–(l)). In the meantime, the most interesting aspect of Fig. 7 is that the TFUM algorithm is stabler than others. The reason is that the adopted prune strategies by three algorithms cause a great difference in the number of visited nodes. Table IX lists the peak memory of three algorithms on four datasets, and we use bold to highlight the minimal values. As we described in previous content, the more periods, the higher the memory consumption. In fact, the peak memory of ATTFUM and FUMT on the same datasets is comparable, except for T40I10D100k $_{P4}$ . It is clear that the list-based algorithms perform better than the tree-based algorithms in terms of memory usage. The reason is that the lists compress major information instead of maintaining the dataset in memory.

Fig. 8: The scalability evaluation on different dataset sizes.

Fig. 9: The scalability evaluation on different periods.

Vi-C Candidate Comparison Analysis

In this subsection, we compare the amount of candidates generation of experimented algorithms on BMSPOS $_{P2}$ and Mushroom $_{P2}$ datasets (Table X). Due to the execution time of ATTFUM exceeds 4900 seconds when $γ$ is 2% on BMSPOS $_{P2}$ , we can infer that ATTFUM will be over time following $γ$ decreases. Considering the Mushroom $_{P2}$ dataset, the total number of candidates generation of FUMT is about three times more than that of TFUM. We suppose Properties 2 and 3 play an important role during searching high level fuzzy itemsets. Compared to the naive maximal value upper-bound, the remaining upper bounds are tighter and easily computed (i.e., intersection). Moreover, we also evaluate the prune effect of two list-based algorithms. The formulation is defined as prune ratio = $\frac{| visited nodes - candidates |}{| candidates |}$ . A fuzzy itemset is assumed as a candidate if it meets Properties 1, 2 and 3. Table XI illustrates that the average prune ratio of TFUM on BMSPOS $_{P2}$ and Mushroom $_{P2}$ are 97.6% and 92.9%, respectively. In conclusion, the novel algorithm has a higher prune ratio than the benchmark, especially on dense datasets.

Vi-D Scalability Analysis

We also conducted several experiments to evaluate the scalability of our proposed algorithm on the Retail dataset. In Fig. 8, $γ$ is assumed to be fixed and the time periods are 1, 2, and 4, respectively. Due to the tested results of 100% of datasets have already been illustrated in Figs. 6 and 7, we only conduct 10%, 20%, 40%, 60%, and 80% of Retail as experimental datasets. From Figs. 8(a) to (f), the runtime and memory consumption steadily increase as the tested data size increases. In addition, TFUM also performs better than FUMT under the same conditions. Then, we evaluate the performance of the TFUM algorithm in different time periods. In Fig. 9, we continue to use Retail as experimental dataset and $γ$ is set to 0.9%. Besides, each transaction in Retail will be randomly assigned a period within the given range. And it is possible that a period does not contain any transactions. As can be seen from Fig. 9, with the increment of the number of periods, the peak memory usage of TFUM never exceeds 250 MB, and the runtime consumption increases linearly. All in all, we can conclude that the novel algorithm has good scalability performance.

Vii Conclusions and Future Works

This paper provided a new fuzzy utility mining method called TFUM for efficiently mining HTFUIs from temporal quantitative databases. TFUM used a novel and compact fuzzy-list and TP-table for storing temporal fuzzy information. The novel fuzzy-lists utilize the remaining measure, which makes for tighter upper-bounds than before. Our experimental evaluation of the novel algorithm reveals promising results over the state-of-the-art algorithms on sparse and dense datasets. Extensive experiments show that the adopted remaining measure can efficiently limit the search space of mining and has a high prune ratio on dense datasets. The runtime performance improvements were observed to be much lower for almost all the baseline algorithms. As the threshold was raised, the runtime consumption decreased steadily. Meanwhile, the peak memory requirements in all experiments were also less than those of other tested algorithms. In addition, we also conducted some experiments in terms of scalability. The final results show the proposed algorithm has good scalability no matter the different data sizes or periods. As a part of future work, we intend to consider further reducing the number of visited nodes during the mining process. Other interesting ideas can also be applied to time series forecasting, spatio-temporal applications, and other interesting work.

References

[1] A. Aggarwal and D. Toshniwal (2018) Spatio-temporal frequent itemset mining on web data. In Proceedings of the IEEE International Conference on Data Mining Workshops, pp. 1160–1165. Cited by: §I.
[2] R. Agrawal and R. Srikant (1994) Fast algorithms for mining association rules. In Proceedings of the 20-th ACM International Conference on Very Large Data Bases, Vol. 1215, pp. 487–499. Cited by: §I.
[3] K. C. Chan and W. H. Au (1997) Mining fuzzy association rules. In Proceedings of the 6-th International Conference on Information and Knowledge Management, pp. 209–215. Cited by: §II-B.
[4] C. M. Chen, L. Chen, W. Gan, L. Qiu, and W. Ding (2021) Discovering high utility-occupancy patterns from uncertain data. Information Sciences 546, pp. 1208–1229. Cited by: §II-A.
[5] J. Chen, X. Guo, W. Gan, C. M. Chen, W. Ding, and G. Chen (2022) On-shelf utility mining from transaction database. Engineering Applications of Artificial Intelligence 107, pp. 104516. Cited by: §I.
[6] L. Chen, W. Gan, Q. Lin, J. Miao, and C. M. Chen (2021) Mining on-shelf high-utility quantitative itemsets. In Proceedings of the 9-th IEEE International Conference on Big Data, pp. 5491–5500. Cited by: §II-A.
[7] C. Chu, V. S. Tseng, and T. Liang (2008) An efficient algorithm for mining temporal high utility itemsets from data streams. Journal of Systems and Software 81 (7), pp. 1105–1117. Cited by: §I.
[8] Y. Cui, W. Gan, H. Lin, and W. Zheng (2022) FRI-Miner: fuzzy rare itemset mining. Applied Intelligence 52 (3), pp. 3387–3402. Cited by: §II-A.
[9] Q. H. Duong, P. Fournier Viger, H. Ramampiaro, K. Nørvåg, and T. L. Dam (2018) Efficient high utility itemset mining using buffered utility-lists. Applied Intelligence 48 (7), pp. 1859–1877. Cited by: §II-A.
[10] P. Fournier Viger, C. W. Wu, S. Zida, and V. S. Tseng (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In Proceedings of the International Symposium on Methodologies for Intelligent Systems, pp. 83–92. Cited by: §II-A.
[11] W. Gan, J. C. W. Lin, P. Fournier Viger, H. C. Chao, T. P. Hong, and H. Fujita (2018) A survey of incremental high-utility itemset mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8 (2), pp. e1242. Cited by: §I.
[12] W. Gan, S. Wan, J. Chen, C. M. Chen, and L. Qiu (2020) TopHUI: top- $k$ high-utility itemset mining with negative utility. In Proceedings of the 8-th IEEE International Conference on Big Data, pp. 5350–5359. Cited by: §I, §II-A.
[13] P. N. Hai, P. Poncelet, and M. Teisseire (2011) Moving objects: combining gradual rules and spatio-temporal patterns. In Proceedings of the IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services, pp. 131–136. Cited by: §I.
[14] J. Han, J. Pei, Y. Yin, and R. Mao (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining and Knowledge Discovery 8 (1), pp. 53–87. Cited by: §I.
[15] J. Han, J. Pei, and Y. Yin (2000) Mining frequent patterns without candidate generation. ACM SIGMOD Record 29 (2), pp. 1–12. Cited by: §I, §II-B.
[16] T. P. Hong, C. S. Kuo, and S. C. Chi (1999) Mining association rules from quantitative data. Intelligent Data Analysis 3 (5), pp. 363–376. Cited by: §II-B.
[17] T. P. Hong, C. Y. Lin, W. M. Huang, K. S. M. Li, L. S. L. Wang, and J. C. W. Lin (2020) Using tree structure to mine high temporal fuzzy utility itemsets. IEEE Access 8, pp. 153692–153706. Cited by: §I, §II-B.
[18] T. P. Hong, C. Y. Lin, W. M. Huang, S. M. Li, S. L. Wang, and J. C. W. Lin (2022) A one-phase tree-structure method to mine high temporal fuzzy utility itemsets. Applied Sciences 12 (6), pp. 2821. Cited by: §I, §I, §II-B, §III, §VI, footnote 3.
[19] W. M. Huang, T. P. Hong, G. C. Lan, M. C. Chiang, and J. C. W. Lin (2017) Temporal-based fuzzy utility mining. IEEE Access 5, pp. 26639–26652. Cited by: §I, §II-B, §III, Definition 7.
[20] S. Krishnamoorthy (2017) HMiner: efficiently mining high utility itemsets. Expert Systems with Applications 90, pp. 168–183. Cited by: §II-A.
[21] S. Krishnamoorthy (2019) Mining top- $k$ high utility itemsets with effective threshold raising strategies. Expert Systems with Applications 117, pp. 148–165. Cited by: §II-A.
[22] C. M. Kuok, A. Fu, and M. H. Wong (1998) Mining fuzzy association rules in databases. ACM Special Interest Group on Management of Data Record 27 (1), pp. 41–46. Cited by: §II-B.
[23] G. C. Lan, T. P. Hong, Y. H. Lin, and S. L. Wang (2015) Fuzzy utility mining with upper-bound measure. Applied Soft Computing 30, pp. 767–777. Cited by: §II-A, §II-B.
[24] C. W. Lin, T. P. Hong, and W. H. Lu (2010) Linguistic data mining with fuzzy fp-trees. Expert Systems with Applications 37 (6), pp. 4560–4567. Cited by: §II-B.
[25] C. W. Lin and T. P. Hong (2013) A survey of fuzzy web mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 3 (3), pp. 190–199. Cited by: §II-B.
[26] C. W. Lin and T. P. Hong (2014) Mining fuzzy frequent itemsets based on UBFFP trees. Journal of Intelligent & Fuzzy Systems 27 (1), pp. 535–548. Cited by: §II-B.
[27] J. Liu, K. Wang, and B. C. Fung (2015) Mining high utility patterns in one phase without generating candidates. IEEE Transactions on Knowledge and Data Engineering 28 (5), pp. 1245–1257. Cited by: §II-A.
[28] M. Liu and J. Qu (2012) Mining high utility itemsets without candidate generation. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 55–64. Cited by: §I, §II-A.
[29] Y. Liu, W. K. Liao, and A. Choudhary (2005) A two-phase algorithm for fast discovery of high utility itemsets. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 689–695. Cited by: §II-A.
[30] Y. C. Liu, C. P. Cheng, and V. S. Tseng (2013) Mining differential top- $k$ co-expression patterns from time course comparative gene expression datasets. BMC Bioinformatics 14 (1), pp. 1–13. Cited by: §I.
[31] M. Nouioua, P. Fournier Viger, C. W. Wu, J. C. W. Lin, and W. Gan (2021) FHUQI-Miner: fast high utility quantitative itemset mining. Applied Intelligence 51 (10), pp. 6785–6809. Cited by: §II-A.
[32] S. Papadimitriou and S. Mavroudi (2005) The fuzzy frequent pattern tree. In The WSEAS International Conference on Computers, pp. 1–7. Cited by: §II-B.
[33] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang (2007) H-Mine: fast and space-preserving frequent pattern mining in large databases. IIE Transactions 39 (6), pp. 593–605. Cited by: §I.
[34] R. Rymon (1992) Search through systematic set enumeration. In Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning, pp. 539–550. Cited by: §IV-C.
[35] R. Srikant and R. Agrawal (1996) Mining quantitative association rules in large relational tables. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1–12. Cited by: §II-B.
[36] S. Wan, J. Chen, Z. Qi, W. Gan, and L. Tang (2022) Fast RFM model for customer segmentation. In Companion Proceedings of the Web Conference, pp. 1–8. Cited by: §I.
[37] S. Wan, J. Chen, P. Zhang, W. Gan, and T. Gu (2022) Discovering top- $k$ profitable patterns for smart manufacturing. In Companion Proceedings of the Web Conference, pp. 1–9. Cited by: §I.
[38] S. Wan, W. Gan, X. Guo, J. Chen, and U. Yun (2021) FUIM: fuzzy utility itemset mining. arXiv preprint arXiv:2111.00307. Cited by: §II-A, §II-B.
[39] C. M. Wang, S. H. Chen, and Y. Huang (2009) A fuzzy approach for mining high utility quantitative itemsets. In Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1909–1913. Cited by: §II-B.
[40] P. Wu, X. Niu, P. Fournier Viger, C. Huang, and B. Wang (2022) UBP-Miner: an efficient bit based high utility itemset mining algorithm. Knowledge-Based Systems, pp. 108865. Cited by: §II-A.
[41] H. Yao, H. J. Hamilton, and C. J. Butz (2004) A foundational approach to mining itemset utilities from databases. In Proceedings of the SIAM International Conference on Data Mining, pp. 482–486. Cited by: §II-A.
[42] Z. Ye, S. Wan, W. Gan, J. Chen, and L. Tang (2022) Fuzzy utility mining on temporal data. In Proceedings of the 4-th International Conference on Data Intelligence and Security, pp. 1–8. Cited by: §I, §II-B, §III, §IV-D, §VI, Definition 11.
[43] L. A. Zadeh (1965) Fuzzy sets. Information and Control 8 (3), pp. 338–353. Cited by: §I.