Temporal Fuzzy Utility Maximization with Remaining Measure

Shicheng Wan, Zhenqiang Ye, Wensheng Gan,  and Jiahui Chen
This research was supported in part by the National Natural Science Foundation of China (Grant Nos. 61902079 and 62002136), Guangzhou Basic and Applied Basic Research Foundation (Grant Nos. 202102020928 and 202102020277), National Key-Research and Development Program of China (Grant No. 2020YFB2104003), and the Young Scholar Program of Pazhou Lab (Grant No. PZL2021KF0023).Shicheng Wan, Zhenqiang Ye, and Jiahui Chen are with the Department of Computer Science, Guangdong University of Technology, Guangzhou 510006, China. (E-mail: scwan1998@gmail.com, yzq66f@gmail.com, csjhchen@gmail.com)Wensheng Gan is with the College of Cyber Security, Jinan University, Guangzhou 510632, China; and also with Pazhou Lab, Guangzhou 510330, China. (E-mail: wsgan001@gmain.com)Corresponding author: Wensheng Gan and Jiahui Chen
Abstract

High utility itemset mining approaches discover hidden patterns from large amounts of temporal data. However, an inescapable problem of high utility itemset mining is that its discovered results hide the quantities of patterns, which causes poor interpretability. The results only reflect the shopping trends of customers, which cannot help decision makers quantify collected information. In linguistic terms, computers use mathematical or programming languages that are precisely formalized, but the language used by humans is always ambiguous. In this paper, we propose a novel one-phase temporal fuzzy utility itemset mining approach called TFUM. It revises temporal fuzzy-lists to maintain less but major information about potential high temporal fuzzy utility itemsets in memory, and then discovers a complete set of real interesting patterns in a short time. In particular, the remaining measure is the first adopted in the temporal fuzzy utility itemset mining domain in this paper. The remaining maximal temporal fuzzy utility is a tighter and stronger upper bound than that of previous studies adopted. Hence, it plays an important role in pruning the search space in TFUM. Finally, we also evaluate the efficiency and effectiveness of TFUM on various datasets. Extensive experimental results indicate that TFUM outperforms the state-of-the-art algorithms in terms of runtime cost, memory usage, and scalability. In addition, experiments prove that the remaining measure can significantly prune unnecessary candidates during mining.

Impact Statement—This article proposes a method to find profitable fuzzy itemsets from temporal databases. The novel algorithm can play an important role in marketing and business, where these discovered patterns are paramount for strategic decision-making. The TFUM algorithm achieves this by integrating fuzzy set theory into itemset mining and then making it possible to transform linguistic terms into quantitative values and making binary machines find both interesting and explicable patterns, such as a young and tall man, beautiful flowers, and sauna days. The proposed remaining measure addresses the “combinatorial explosion” problem in a certain way and achieves state-of-the-art performance on various databases (including sparse and dense). TFUM can provide a contribution to the explainable artificial intelligence system, pattern recognition, communication of information applications, and so on.

artificial intelligence, remaining measure, temporal fuzzy-list, temporal mining, fuzzy set.

I Introduction

Data mining technology can be regarded as an algorithmic process that takes data as input and yields useful results as output. It is meaningful to identify and isolate relationships among the different items that may be hidden in massive data, since the first and well-known association rule mining (ARM) algorithm was proposed [2]. In market basket analysis, the database records items purchased by a customer at a single time as a transaction. The association rule mining task aims to find out the “association” between sets of items with some strong specified confidence. An example of such an association is the statement that 88% of transactions involving men from 30 to 39 years old will buy a car. The number 88% shows the confidence value of the rule. The antecedent of this rule consists of men aged 30 to 39 years old, and the consequent consists of the car alone. This rule means a man has a high probability of purchasing a car when he is between 30 and 39 years old. In the meantime, frequent itemset mining (FIM) [15, 14, 33] started as a phase in the discovery of association rules, but has been generalized independent of these to many other patterns. In recent years, Aggarwal et al. [1] proposed a spatio-temporal FIM algorithm on web data. However, a fundamental limitation of both FIM and ARM is that they both assume that each item cannot appear more than once in each transaction and that all items have the same importance (e.g., weight, unit profit, or risk). In other words, bread will be regarded as interesting, but diamonds will not, because the sale volume of the latter one is far less than that of the prior. The frequency of an itemset cannot be a sufficient indicator of interest in some cases. As a consequence, traditional high-utility itemset mining (HUIM) algorithms discover more realistic and important knowledge than FIM by taking into account non-binary occurrences of items within transactions and various interests in distinct items. The discovered results of traditional HUIM are more interpretable than those of FIM. After decades of development, plenty of algorithms [28, 12, 11, 37] were proposed and have been applied in many applications like cross-marketing [30], click stream analysis [7], and traffic management [13].

However, a critical drawback of previous studies is that few of them consider the temporal aspect of databases. If a customer visited the time of a retail last year, his/her transaction record may be invalid at present, and thus be useless for analysis nowadays [36]. More generally, consider that transactions in a retail store contain the following five items: {apple}, {ice-cream}, {neckerchief}, {sweater}, and {watermelon}. Items such as {watermelon} and {ice-cream} are typically best-selling during the summer, whereas items such as {sweater} and {neckerchief} sales better in the winter than in the summer, and {apple} will be needed all year round. Therefore, in this case, traditional HUIM approaches are unable to deal with temporal transaction databases. In recent years, Chen et al. [5] proposed a novel on-shelf utility mining algorithm and then discovered maximal profitable product combinations by considering on-shelf periods of items. However, all the above data mining algorithms simply reveal that discovered itemsets (i.e., product combinations) are profitable but not capable of reflecting quantization information. The Likert scale is a notable example of this issue. There are five different types of answers: totally disagree, disagree a little, neutral opinion, agree a little, and totally agree. The fact remains that such imprecisely defined ”extents/classes” play an important role in human thinking, particularly in the domains of pattern recognition, communication of information, and abstraction. Hence, how to make a machine learns linguistic representations (e.g., sweet, hot, and beautiful) is an interesting but challenging task. Huang et al. [19] figured out that fuzzy quantifiers can be a fuzzy approximation to linguistic representations. By adopting fuzzy set theory [43] and user-defined membership functions, users can easily comprehend how many products are sold at an expensive, moderate, or cheap price. The characteristic of fuzzy set is that determining whether an element belongs to a set cannot be answered simply by “yes” or “no”, whereas it is gradient. Huang et al. [19] first defined the temporal fuzzy utility itemset mining task. That is, given a predefined minimum threshold and user-defined membership functions, mining a complete set of high temporal fuzzy utility itemset from temporal quantitative transaction databases. The aim of the novel mining task is to find patterns that are easily interpretable by users, and thus can help in understanding the results. However, their proposed algorithm is a “generate-and-test/two-phase” approach and thus performs poorly. Then, Hong et al. continuously proposed several new studies [17, 18] to improve the performance of algorithms in the temporal fuzzy utility itemset mining domain.

Recently, Ye et al. [42] proposed an effective temporal fuzzy utility itemset mining algorithm, which performs better than the algorithm of Hong et al. [18]. Though their approach utilizes list structure to efficiently store heuristic information, the adopted upper-bounds are too loose to consume a large amount of runtime and memory. This issue motivates us to further develop the temporal fuzzy utility itemset mining. In this paper, we propose a new one-phase list-based algorithm called TFUM. The major contributions of this paper are summarized as follows.

  • A novel structure, called RTF-List, is proposed. We first integer maximal remaining fuzzy utility into temporal fuzzy-lists and propose a remaining measure to decide whether the fuzzy itemsets should be pruned or not.

  • Without checking transaction identifications one by one, we propose a TP-table which is capable of fast locating quantitative transactions by period identification.

  • The novel algorithm holds the downward closure property and adopts the temporal fuzzy utility upper-bound ratio to prune the search space as far as possible.

  • Extensive experiments have been conducted on four datasets (including sparse and dense). Under various periods and minimal thresholds, the experimental results reveal that the novel proposed algorithm performs far better than the state-of-the-art algorithms in terms of run time, memory consumption, and scalability.

The rest of the content in this paper is organized as follows. The related work will be briefly reviewed in Section II. Section III introduces some basic definitions and the problem statement. Section IV describes the proposed algorithm, and Section V illustrates a detailed example of our approach. Finally, extensive experiments and responding analysis are shown in Section VI, and future work is finally presented in Section VII.

Ii Related Work

Ii-a Itemset Mining Using List Structure

Based on the way of discovering interesting patterns from databases, many list-based algorithms in itemset mining domain actually belong to a one-phase approach. In high utility itemset mining (HUIM), HUI-Miner [28] is the earliest and most famous list-based algorithm. Compared to the two-phase HUIM algorithms [41, 29], HUI-Miner is a pattern-growth approach and does not need to scan the original database more than twice. It firstly finds promising low-level itemsets (which are potential high utility itemsets) and then constructs utility-lists for these itemsets. A utility-list consists of several tuples, and there are two basic elements (i.e., identifiers of the transaction involving the itemset and the utility of the itemset) in each tuple. Then, the algorithm joins different utility-lists to generate new utility-lists of high-level itemsets until there are no more itemsets generated. After that, numerous optimization algorithms [12, 10, 27, 20, 9, 21, 40] have been proposed in the last decades.

Besides, to meet the requirements of various domains, various list-based itemset mining extensions have been proposed, which take constraints and more complex data types into account. Wan et al. [38] first adopted fuzzy-lists to mine high-fuzzy itemsets from quantitative transaction databases. They proposed that the remaining fuzzy utility of each fuzzy itemset is a tighter upper bound than FUUB [23]. And the extensive experiments reveal the novel upper-bound plays an important role in their algorithm. Furthermore, Cui et al. [8] considered frequency constraint and utilized fuzzy-lists to find valuable and interesting fuzzy rare itemsets. In transaction databases, high utility itemsets do not provide information about the purchase quantities of items. Then, Nouioua et al. [31] designed the utility lists of -itemsets to solve the issue. Chen et al. [6] noticed most existing HUIM algorithms assume that itemsets always occur regardless of the period. However, this assumption is not realistic in most cases, such as the sold volume of ice cream. Hence, they considered the rich information (e.g., quantity and period) in databases and adopted utility-lists to discover on-shelf high-utility quantitative itemsets. In addition, recently, Chen et al. [4] mined high utility-occupancy patterns from uncertain data by probability-utility-occupancy list and probability-frequency-utility table. Considering the existence probability of items will be more realistic than assuming the items must occur in transactions.

Ii-B Itemset Mining Using Fuzzy Theory

Since Srikant and Agrawal [35] first proposed a fuzzy mining approach to discover quantitative association rules, they have opened a novel fuzzy pattern mining domain. Though their proposed approach utilizes a naive method (i.e., the generate-and-test mechanism), which leads to inefficiency, the discovered results of the new algorithm can be more convenient for helping decision makers to understand and use. Then, Chan and Au [3] figured out that the discovered quantitative association rules by discretizing the domains of quantitative attributes into intervals is not concise and meaningful enough. Therefore, they employed linguistic terms to reveal quantitative association rules by setting membership functions in advance. They called the mining rules as fuzzy association rules, because their proposed algorithm adopts the fuzzy set theory. Later, Kuok et al. [22] computed the membership value of a high-level fuzzy itemset by applying a minimum operation to get the overlap value of membership regions in its consisting fuzzy items. Since the user-predefined membership function may divide an item into several membership regions, Hong et al. [16] noticed that using the maximal membership value as the fuzzy value of the item allows the item to be generalized to a certain extent. This idea also affects many subsequent fuzzy pattern mining algorithms. The study [32] employed a heuristic method and tree structure to mine fuzzy association rules. The experimental results showed that their proposed algorithm performs better than previous Apriori-based fuzzy association rule mining algorithms. Then, there are also several variations on the original FP-Growth algorithm [15] proposed to discover fuzzy frequent itemsets from quantitative databases [24, 25, 26]. However, frequent itemsets often stand in front of the barrier of discovering frequent but low profit patterns.

As similar to traditional high utility itemset mining, Wang et al. [39] first proposed fuzzy utility mining (FUM) domain to solve the above issue. They had successfully extended HUIM with fuzzy sets to handle quantitative transaction databases. Subsequently, Lan et al. [23] proposed another different kind of FUM algorithm named TPFU. TPFU assesses the utility of an item based on both its linguistic terms defined by users and the membership values of terms scoped in a user-predefined membership function. Though TPFU belongs to two-phase approach, it has integrated minimum operation and maximal membership value into the fuzzy utility mining well. Recently, Wan et al. [38] proposed a list-based fuzzy utility mining approach called FUIM, and the novel algorithm outperforms TPFU in terms of runtime and memory usage since FUIM is a one-phase approach. However, the time attribute of a discovered pattern plays an important role in many real applications, like weblog analysis, business decisions, and so on. Huang et al. [19] therefore defined a novel task of discovering high temporal fuzzy utility itemsets from temporal quantitative databases (temporal fuzzy utility itemset mining, TFUIM). Then, Hong et al. [17, 18] have successively published two TFUIM algorithms (i.e., FHTFUP and ATTFUM) which are implemented by tree structure. In particular, ATTFUM is a one-phase algorithm and adopts an array-embedded tree structure. It utilizes an array-list to keep key information about interesting fuzzy itemset in each tree node. Recently, Ye et al. [42] adopted TF-Lists to compress the temporal quantitative databases. The experiments revealed that the novel algorithm performs better than previous algorithms on dense databases.

Iii Preliminaries and Problem Statement

This section firstly introduces some basic and commonly used notations and definitions in this paper. Most of them are proposed from previous studies [19, 18, 42]. A finite set = {, , , } is consisted of distinct items. Itemset and transaction are both subsets of . The difference between them is that a transaction can be regarded as an -itemset, which means that it contains different items (1 ). In addition, the length of -itemset is denoted as = . Each transaction is assigned a unique transaction identification (simplified as Tid). A period is a set of spatio-temporal data that may contain zero, one, or more transactions ( = {, , }). We also call a temporal quantitative transaction if it is contained in a period, and a transaction cannot belong to two or more different periods at the same time. The fuzzy set is defined by the membership function, and the membership value range is [0, 1]. Fuzzy itemsets that are contained in temporal quantitative transactions are called temporal fuzzy itemsets. Then, a temporal quantitative database TQD consists of one or more periods.

In this paper, we take a temporal quantitative database (Table I) and a predefined membership function (Fig. 1) as our running example. There are six distinct items (i.e., , , , , , and ), ten transactions, and five periods in the running example database (TQD)111In order to make the expression more concise, we will always use the symbol “TQD” instead of “temporal quantitative database” during discussion.. The number in each row of TQD represents the internal utility (e.g., quantity) of the corresponding item (which is listed in the first row). Furthermore, the external utility (e.g., profit) of each item is shown in Table II. Finally, we will formalize the problem definition for TFUIM.

The predefined membership function.
Fig. 1: The predefined membership function.
Period Tid A B C D E F
1 0 3 0 1 2
0 4 0 3 0 1
0 2 0 1 2 0
2 0 6 1 3 0
0 1 6 3 0 3
0 3 0 0 1 7
2 4 1 5 8 1
1 6 2 0 1 0
7 0 0 3 0 0
3 1 0 6 1 9
TABLE I: A sample temporal quantitative database
Item A B C D E F
Utility 9 5 4 2 1 7
TABLE II: The external utility of each item
Definition 1

As previous content introduced, the number of linguistic terms of an item is depended on the regions of the given membership function. The fuzzy set of an item in a temporal quantitative transaction is defined as = { + + + + }. Fuzzy membership degree is calculated from quantity and region .

For example, consider the TQD in Table I and the membership function shown in Fig. 1, since the quantity of in transaction is 7 that the region of in are Middle and High. Based on the membership function, the region values of and are 0.67 and 0.33, respectively. Therefore, the fuzzy set of item in () is { + }.

Definition 2

The fuzzy utility of -th fuzzy region of an item in a temporal quantitative transaction is defined as fu(, ) = . And the fuzzy utility of in TQD is the summation of fu(, ). That is, fu() = fu(, ). Besides, the fuzzy utility of a fuzzy itemset in is denoted as fu(, ) = [], where = Min{}. Similarly, the fuzzy utility of in TQD is defined as fu = fu(, ). In addition, it should be pointed out that different fuzzy regions of an item cannot occur in a same fuzzy itemset in the meanwhile. In other words, an item cannot both Low and Middle in a fuzzy itemset together.

For instance, the item occurs in transactions and . The quantity of in these transactions are 3, 6, 6, 1 and 2, respectively. Based on the Fig. 1 and above definitions, is 0.67, and then fu(, ) is 0.67 2 4, which is 5.36. By the same way, fu(, ) and fu(, ) are 12 and 1.32, respectively. In TQD, fu is 5.36 + 12 + 1.32 = 18.68 and fu is 48. Consider a fuzzy itemset {A.Low, C.Middle} in . Since the minimum fuzzy membership degree of {A.Low, C.Middle} is 0.67, and thus fu is 0.67 (2 9 + 6 4) = 28.14.

Definition 3

The fuzzy utility of a in TQD is defined as tfu = fu(, ). In addition, the start transaction period STP of is firstly occurring time period in TQD of the transaction (). The last transaction periods (LTP) of an itemset is a set of periods from the latest start transaction periods of all items in to the end period of TQD. Then, the temporal fuzzy utility ratio of a fuzzy itemset is formulated as tfur = .

For example, STP is because B.Low first appears in , and belongs to . Take the fuzzy itemset {B.Low, C.Middle} as an example. STP and STP are and , respectively. Therefore, the LTP of {B.Low, C.Middle} is a set of {, , , }.

Definition 4

Given a user-defined minimal utility threshold , a fuzzy itemset is assumed to be a high temporal fuzzy utility itemset (abbreviated as HTFUI) if and only if its tfur is higher than or equal to . Otherwise, is a low temporal fuzzy utility itemset (LTFUI), which is useless for users.

For example, if = 20%222Unless specifically specified, without loss of generality, we are going to default that is 20% in this paper., the temporal fuzzy utility ratio of {A.Low, F.High} is 52.03% 20%, and thus it is an HTFUI. Table III lists other HTFUIs when is 20%.

HTFUIs tfur
A.Low, F.High 52.03%
A.Middle 24.4%
D.Low, A.Middle 26.73%
D.Middle, A.Low 22.53%
D.Middle, A.Low, F.High 58.97%
D.Middle, F.High 43.36%
F.High 36.42%
TABLE III: A set of high temporal fuzzy utility itemsets

Problem statement. Based on previous introduced definitions, given a user-specified minimal utility threshold and a user-defined membership function, we define the problem of temporal fuzzy utility itemset mining (TFUIM) task as discovering the set of high temporal fuzzy utility itemsets in a temporal quantitative database.

Iv The Proposed Flexible Algorithm

It can be found that {A.Middle} and {D.Low, A.Middle} are HTFUIs, but {D.Low} is not in Table III. Note that the temporal fuzzy utility ratio is neither anti-monotonic nor monotonic. We will discuss how to make our approach hold downward closure property in this section. We also use the temporal remaining fuzzy utility upper-bound to improve TFUM’s performance. The details will be introduced in the following.

Iv-a Downward Closure Property

Definition 5

The maximal fuzzy utility of a fuzzy item in a temporal quantitative transaction is denoted as mfu = Max{fu, fu, , fu}. Then, the summation of the maximal fuzzy utility of all fuzzy items in can be a pre-evaluated value of . That is, mtfu = mfu. Obviously, the maximal fuzzy utility of any in is never higher than the maximal fuzzy utility of (i.e., mfu mtfu).333Specially, mtfu is the upper-bound of all fuzzy items in corresponding transactions, but tfu is not in this paper (as shown in Table IV). Please see the study [18] for more information on the proof.

For example, consider in Table I. The quantity of the item is 5, and then the values for and are 0.33 and 0.67, respectively. Hence, mfu is Max{fu, fu} = Max{0.33 5 2, 0.67 5 2} = 6.7. The mfu of other items in can be calculated with the same method. Therefore, mtfu = fu + fu + fu + fu + fu + fu = 12.06 + 13.4 + 1.32 + 6.7 + 5.36 + 2.31 = 41.15.

Definition 6

We defined the start transaction period (STP) of all fuzzy items as STP = max{STP, STP, , STP}, where is the number of fuzzy items, and max operation has the latest time period (LTP) of the attached parameters. In addition, LTP means all the time periods from STP to the last time period of TQD.

For example, the STP of F.High is and is the last transaction in TQD, that is STP. Then LTP contains only one period .

Definition 7

The temporal fuzzy utility upper-bound ratio of an itemset is defined as tfuubr = 444In this paper, we use itemset instead of fuzzy itemset will get a more loose upper-bound than before, but experiments show this case does not affect obtaining the right final results.. Furthermore, if tfuubr is higher than or equal to the user-defined minimal threshold , it will be assumed as a high temporal fuzzy utility upper-bound itemset (HTFUUI) and should be further checked for its temporal fuzzy utility ratio. Otherwise, it is a LTFUI. Hence, the set of HFUIs is actually a subset of HTFUUIs. The proof details can be referred to the study [19].

Tid
tfu 29.3 28.31 8.7 39.72 52.65
mtfu 24.68 21.71 8.7 39.72 52.65
Tid
tfu 64.33 53.69 38.66 69 103.98
mtfu 48.16 41.15 38.66 48.21 103.98
TABLE IV: A comparison table (tfu vs. mtfu)
Property 1

Considering the definition of the temporal fuzzy utility upper-bound ratio, the denominator always stays the same. Therefore, the value of molecular shows a large fraction of the formula. Given a fuzzy itemset , it is clear that the number of temporal quantitative transactions containing is always greater than the number of its superset . Then, the in-equation tfuubr tfuubr tfur is always true.

Strategy 1

Based on Property 1, if a temporal fuzzy utility upper-bound ratio of a fuzzy itemset is less than the user-defined minimal fuzzy utility threshold , we can directly ignore it and its supersets during the mining process.

Since tfuubr is always higher than or equal to tfuubr, we define the symbol as a global descending order of all single items from to help prune the search space. Thus, a temporal quantitative transaction is “revised” when all the contained items in it are HTFUUIs and all remaining items are sorted by the global descending order (“”). Besides, if a database consists of revised temporal quantitative transactions, the database is considered as TQD.

Iv-B TP-table for Fast Locating

In previous sections, we introduced several basic concepts of temporal fuzzy utility and a revised temporal quantitative database. Since a period may contain zero or one or even more temporal quantitative transactions, it is foreseeable that calculating the temporal fuzzy utility of a fuzzy itemset by scanning over and over again is unacceptable. In this subsection, a transactions-period table (TP-table) structure is proposed to record the relationships between transactions and the periods that include them, which is defined as follows.

Definition 8

Given a temporal quantitative database TQD, a TP-table can be created by scanning the TQD once, where each temporal quantitative transaction is used as an index to indicate the period that contain it. That is, = { }.

Considering Table I, a typical example of a TP-table is given in Table V, and a TP-table consists of two fields, including period and Tids. In general, a naive method adopts a matrix array (5 10), that the first column stores the period, and the rest columns are used to store Tids. Furthermore, bitmaps can also be integrated into the matrix to save memory. What’s more, we implement TP-table by hash-map in our work, which accelerates the locating process effectively. According to the utilization of the TP-table, it is clear that it can save memory and runtime consumption because massive unnecessary times of scanning databases are reduced. As databases tend to be large, the improved effect will be more significant.

Period Tids
,
,
,
,
,
TABLE V: An example of TP-table

Iv-C Tighter Remaining Measure

The mining mechanism of the proposed approach is a depth-first method. The search space is a set-enumeration tree [34] with an empty root. The first layer of the tree is filled with single items (1-itemsets) from . Then, 2-itemsets are the child nodes of these 1-itemsets in the second layer. And there are third, fourth layers, and so on in the same manner. Note that the total number of nodes in the search space exceeds . Hence, performing a downward traversal of every branch by a naive method is unrealistic. We next introduce several definitions related to the exploration of fuzzy itemsets in depth-first search space.

Definition 9

In the search space, the set of extension fuzzy items of a fuzzy itemset is sorted behind in . In this paper, we adopt the global descending order (“”) to obtain large fuzzy itemsets and avoid generating the same fuzzy itemset repeatedly. That is, () = { , }. Then, consider in a revised temporal quantitative transaction , and the set of all fuzzy items after is named remaining fuzzy items of and denoted as / = { , }.

Since the maximal fuzzy utility of an item is the upper-bound in the fuzzy set of the item and a fuzzy itemset does not contain fuzzy items from the same fuzzy set, it is feasible to take the maximal fuzzy utility of an item as the estimation of the extended fuzzy utility of a fuzzy itemset. In other words, the remaining maximal fuzzy utility of a fuzzy itemset represents how much the fuzzy utility of it can increase in the subsequent exploring process.

Definition 10

Given a fuzzy itemset in a temporal quantitative transaction , the utility of remaining fuzzy items / is named remaining maximal temporal fuzzy utility, and defined as rmtfu = mfu.

For example, consider a fuzzy item B.Low in the revised temporal quantitative transaction . / B.Low = {B.Middle, E.Middle, E.High, A.Low, F.Low, C.Low}, since B.Low and B.Middle can not occur in a same fuzzy itemset, thus rmtfu = mtfu + mtfu + mtfu + mtfu = 5.36 + 12.06 + 2.31 + 1.32 = 21.05.

Iv-D Revised Temporal Fuzzy-List Structure

In the FUMT algorithm [42], the temporal fuzzy-list structure (TF-List) was proposed to store HTFUUIs information. The TF-List significantly reduces the database scanning times. However, we notice that FUMT takes the maximal fuzzy utility of fuzzy items in the quantitative transaction as the upper-bound, which is too loose to effectively prune the search space. Thus, we modify the TF-List based on the remaining measure in our work. We discuss and prove the superiority of the remaining measure in this subsection.

The temporal fuzzy-lists of some
Fig. 2: The temporal fuzzy-lists of some HTFUUIs.
Definition 11

A temporal fuzzy-list [42] of a fuzzy itemset is a set of tuples. As shown in Fig. 2, the tuple consists of five elements: 1) transaction identification (Tid) shows how many transactions contain ; 2) fu records the fuzzy utility of in the corresponding temporal quantitative transaction; 3) reveals the fuzzy region values of in different transactions; 4) the latest STP (); and 5) the sum of maximal fuzzy utility of the temporal quantitative transaction (sumMtfu).

After FUMT discovers a complete set of high temporal fuzzy utility upper-bound 1-itemsets (HTFUUIs) by scanning the temporal quantitative database (TQD) for the first time, a revised temporal quantitative database (TQD) is obtained. FUMT then traverses TQD to construct the temporal fuzzy-lists of HTFUUIs (TF-List). Take fuzzy item {F.Low} in Table I as an example. Since the {F.Low} occurs in the transactions , , and , the STP of {F.Low} is . From the Table IV we can know the mtfus of {F.Low} are 24.68, 21.71, 52.65, and 41.15, respectively. Then the sumMtfu can be calculated as (24.68 + 21.71 + 52.65 + 41.15), which is 140.19. During the process of obtaining the mtfu and calculating the sumMtfu, it stores the Tid, fu and R in the list. By the same way, the FUMT algorithm constructs the TF-List. In addition, sumMtfu is an upper-bound that keeps decreasing. Its value reflects how much the fuzzy utility of fuzzy itemset can increase. Considering the fuzzy 2-itemset {B.Low, A.Low}, the itemset {B.Low, A.Low} appears in and . The mtfu of these two transactions is 41.15 and 103.98, respectively. Therefore, the sumMtfu of the itemset {B.Low, A.Low} is 145.13. As for the latest STP, since the STP of B.Low and A.Low are both that the latest STP of {B.Low, A.Low} is . The information for Tid, fu and R is recorded in the same manner.

However, taking {B.Low, A.Low} as the same sample, the remaining maximal fuzzy utility of it is computed as {B.Low, A.Low}.rmtfu = 63, which is far less than its sumMtfu. Fig. 3 lists rmtfu of other HTFUUIs.

Definition 12

Based on the remaining fuzzy utility definition, we modify the temporal fuzzy-list structure as the revised temporal fuzzy-list (RTF-List) structure. An RTF-List consists of a set of tuples, and each tuple contains four elements: 1) transaction identification (Tid); 2) utility of itemset (); 3) remaining maximal temporal fuzzy utility (rmtfu); and 4) fuzzy region value ().

The revised temporal fuzzy-lists of several
Fig. 3: The revised temporal fuzzy-lists of several HTFUUIs.

There is no need for a database scan because the proposed method generates fuzzy 2-itemsets by joining RTF-Lists of different low-level fuzzy itemsets. According to Definition 2, the fuzzy utility of an itemset is utility multiplied by region value. In order to reduce the amount of calculating, the RTF-List structure records the utility of an item. We also take {B.Low, A.Low} as illustrations. The common Tids in two RTF-Lists of B.Low and A.Low are and . In other words, the high-level fuzzy itemset {B.Low, A.Low} occurs in these temporal quantitative transactions. Thus, RTF-List contains two tuples. The u of {B.Low, A.Low} in each tuple is the summation of the utility of B.Low and A.Low. And the {B.Low, A.Low}.rmtfu is equal to that of A.Low because A.Low is a remaining fuzzy item of B.Low. Finally, Fig. 4 lists the details of RTF-List and others.

The revised temporal fuzzy-lists of several fuzzy 2-itemsets.
Fig. 4: The revised temporal fuzzy-lists of several fuzzy 2-itemsets.

Then we consider the RTF-List of fuzzy -itemset () and take the same manner as described previously. The difference between intersecting RTF-Lists of fuzzy (-1)-itemsets and constructing RTF-List of fuzzy 2-itemset is that u of fuzzy -itemset has double-calculated the fuzzy utility of its prefix fuzzy itemset ({, , }). Note that this is a miscalculation. For example, in Fig. 4, the of the element associated with in RTF-List is calculated as {B.Low, A.Low, F.High}.u = {B.Low, A.Low}.u + {B.Low, F.High}.u - {B.Low}.u.

Definition 13

In RTF-List of a fuzzy itemset , the sum of fu elements is denoted as .sumFu = fu(, ), and the sum of rmtfu elements is denoted as .sumRmtfu = rmtfu(, ).

Property 2

Since the RTF-List contains all rmtfu information of a fuzzy itemset , sumRmtfu can reflect how much the fuzzy utility of increases in TQD. Hence, if .sumFu + .sumRmtfu tfu is true, there must exist one or more fuzzy super-itemsets of are HTFUIs.

Strategy 2

Given a fuzzy itemset and its RTF-List. If the condition .sumFu .sumRmtfu tfu is true, there is no need to explore the search space for .

Proof:

Given a fuzzy itemset and its extension (), and for there is () = ( / ).

() (),
fu(, ) = fu(, ) + fu((), )
= fu(, ) + fu((), )
= fu(, ) + fu(, )
fu(, ) + fu(, )
fu(, ) + mfu(, )
= fu + rmtfu.

Suppose a Tid set of in list is denoted as , and a Tid set of in LTP is defined as , then:

, and LTP LTP,
tfur =
= .

The proof is done. \qed

Property 3

Assume that the intersection of two distinct RTF-Lists (RTF-List and RTF-LIst) is not empty. In fact, Strategy 2 considers all elements of RTF-List. Thus, we propose a tighter remaining upper-bound than the last one.

Strategy 3

Given two fuzzy itemsets and , where and RTF-List RTF-List . If .sumFu + .sumRmtfu (.fu + .rmtfu) , all supersets of XY can be pruned safely.

Proof:

Assume the extension of fuzzy itemsets and are and , respectively. Similarly, the extension of XY is XY.

(fu rmtfu)
= (fu rmtfu)
(fu rmtfu),
(fu rmtfu)
= (fu rmtfu)
(fu rmtfu)
= (fu rmtfu).

Assume that , , and , thus .

fu = fu
(fu rmtfu)
= (fu rmtfu)
(fu rmtfu).

Based on the proof details of Property 2, Property 3 is proved. \qed

Iv-E The TFUM Algorithm

The pseudocode of TFUM is presented in Algorithm 1. It takes three input parameters: 1) a quantitative transaction database that is temporal; 2) a user-specified minimum threshold; and 3) a pre-defined membership function. The TFUM algorithm firstly initializes both the TP-table and high temporal fuzzy utility of fuzzy 1-itemsets as empty (Line 1). In Lines 2–10, TFUM aims to prepare for the subsequent mining process, which includes updating the TP-table and HTFUUIs, adding the start time period of each fuzzy item to STP-List, and STP. In order to make the following discussion easier, we use the symbol to represent (Line 11). Since a complete set of high temporal fuzzy utilities of fuzzy 1-itemsets is collected, TFUM will get a global order “” and a revised temporal quantitative database in Lines 12 and 13. Subsequently, in Line 14, the revised temporal fuzzy-lists of HTFUUIs will be generated, and then TFUM keeps relevant information about fuzzy itemsets from the revised database in memory. In Line 15, after TFUM has finished initializing the prefix fuzzy itemset and its list RTF-List, TFUM calls a recursive method until no more high temporal fuzzy utility itemsets are generated (Line 16). Finally, a complete set of HTFUIs will be output when the TFUM algorithm terminates.

Input: TQD: a temporal quantitative transaction database; : a user-specified minimum temporal fuzzy utility threshold; : a pre-defined membership function.
Output: HTFUIs: a complete set of temporal fuzzy high utility itemsets.
1 initialize TP-table, STP-List and HTFUUIs as null; for temporal transaction TQD do
2       update TP-table; for each item  do
3             calculate fu of fuzzy items by ; STP-List the start time period (STP) of ;
4       end for
5      compute tfu and mtfu of ;
6 end for
STP = max{STP-List}; set as ; collect a complete set of HTFUUIs; get the revised database TQD by global order ; scan TQD and construct RTF-Lists of HTFUUIs; initialize prefix fuzzy itemset and RTF-List as null; call Miner(, RTF-List, RTF-List, , ); return HTFUIs
Algorithm 1 The TFUM algorithm
Input: : the prefix fuzzy itemset; RTF-List: a revised temporal fuzzy-list of ; RTF-Lists: a set of revised temporal fuzzy-lists; : a minimum fuzzy utiltiy threshold; : a fuzzy utility upper-bound.
1 for each RTF-List RTF-Lists do
2       if tfur  then
3             is a HTFUI;
4       end if
5      initialize exRTF-Lists as null; if .sumFu + .sumRmtfu  then
6             for each RTF-List RTF-lists do
7                   RTF-List = Construct(RTF-List, RTF-List, RTF-List); if XY.sumRmtfu  then
8                         exRTF-Lists += RTF-List;
9                   end if
10                  
11             end for
12             ; call Miner(, RTF-List, exRTF-Lists, , );
13       end if
14      
15 end for
Algorithm 2 The Miner procedure
Input: RTF-List: a revised temporal fuzzy-list of prefix fuzzy itemset ; RTF-List: a revised temporal fuzzy-list of fuzzy itemset Px; RTF-List: a revised temporal fuzzy-list of fuzzy itemset Py.
Output: a revised temporal fuzzy-list of high-level fuzzy itemset Pxy.
1 initialize RTF-List as null; Pxy.L = Max{Px.L, Py.L}; for each element Px RTF-List do
2       if  and .Tid == .Tid then
3             Pxy.R = Min{Px.R, Py.R}; if RTF-List null then
4                   find RTF-List where .Tid == .Tid;
5                   Pxy.u = Px.u + Py.u - P.u;
6            else
7                   Pxy.u = .u + .u;
8             end if
9            Pxy = (Px.Tid, Pxy.u, Py.rmtfu, Pxy.R); add Pxy into RTF-List;
10       end if
11      
12 end for
return RTF-List
Algorithm 3 The Construct procedure

The mining procedure of the TFUM algorithm is shown in Algorithm 2. The miner procedure takes a prefix fuzzy itemset , a set of revised temporal fuzzy-lists (including RTF-List), the minimum threshold , and a fuzzy utility upper-bound 555 = , which is described in the last paragraph.. The procedure will traversal each RTF-List in the input set RTF-Lists. It firstly will check the current fuzzy itemset is an HTFUI or not (Line 2). If yes, will be added into HTFUIs set (Line 3). Then, the procedure checks whether it can be extended or not in the following steps. The procedure initializes RTF-List of the extension fuzzy itemset of as null, and then assesses the summation of fuzzy utility and remaining maximal temporal fuzzy utility of is higher than or not (Lines 5 and 6). If the condition is true, the miner procedure will examine RTF-List of other fuzzy itemsets in RTF-Lists, where , to determine the super fuzzy itemset XY should be generated or not (Lines 7–12). While there is no more remaining fuzzy itemset of existing, the procedure will add into and recursively call Algorithm 2 itself.

Algorithm 3 shows the procedure of TF-List construction. The procedure joins RTF-Lists of two different fuzzy itemsets and then outputs a novel RTF-List of the fuzzy super-itemset. The construct procedure firstly initials RTF-List and Pxy.L in Lines 1 and 2. Then, it joins elements of RTF-List and RTF-List which contain the same Tids (Lines 3–15). If the prefix fuzzy itemset is null, then the procedure computes the sum of utility of two elements directly (Line 10); otherwise, the summation value should minus the utility of prefix fuzzy itemset (Line 8). In addition, binary search method can greatly accelerate the intersection process. In Line 12, a new element of RTF-List is obtained. And then, the procedure adds it into RTF-List (Line 13). When the construct procedure terminates, a novel revised temporal fuzzy utility of fuzzy super-itemset Pxy is collected (Line 16).

V An Illustrate Example

To more clearly introduce the TFUM algorithm, in this section, a simple example is given to illustrate how to find temporal fuzzy itemsets from a temporal quantitative database (TQD), as shown in Table I. There are ten transactions, and these ten transactions are attributed to five periods. The six items in TQD are denoted to , respectively. In addition, the external utility of all items is listed in Table II. The membership function is shown in Fig. 1. In this example, the minimum threshold = 20%. The detailed process of the TFUM algorithm is as follows.

  1. To discover the temporal fuzzy utility itemsets, TP-table (Table V) stores the period information of each transaction.

  2. While traversing each transaction in TQD, the algorithm calculates the fuzzy utility of each item with the given membership function. Then mtfu is the sum of all maximum fuzzy utilities in this transaction, and tfu is the sum of all fuzzy utilities. For instance, according to the membership function, the transaction can be transformed to {A.Low, B.Low, B.Middle, C.Low, D.Low, D.Middle, E.Middle, E.High, F.Low}. Since the fuzzy utility of B.Low is larger than that of B.Middle, the algorithm takes B.Low but not B.Middle into account when calculating mtfu. The same manner can be done for all transactions in TQD and the results are shown in Table IV.

  3. In addition, in order to find STP of all fuzzy items, a map STP-List records STP of each fuzzy item. From the STP-List, TFUM finds out the latest STP (i.e., STP). Since STP is which belongs to period , LTP is the last period of TQD (i.e., ).

  4. For computing the global order, TFUM utilizes a list structure to record all transactions where the item is located, then according to the tfuubr to confirm the order. For instance, item appears in transactions {, , , , , } and item appears in {, , , , , }. Then the tfuubr of and is 1.71 and 1.69, respectively. Therefore, . The algorithm sorts the items in transactions using the global order. After revising all transaction in TQD, we will get a TQD, as shown in Table VI.

  5. TFUM scans TQD to construct the RTF-List for each fuzzy item. Take A.Low in Table VI as an example. The item appears in six transactions, but A.Low appears in only five transactions , and . According to the positions where A.Low is located, the rmtfu of A.Low can be calculated. Meanwhile, store the utility () and region value () of A.Low in each transaction, respectively. The RTF-List is shown in Fig 3.

From now on, since the RTF-Lists of all fuzzy 1-itemsets have been constructed, thus invoking the Miner procedure to mine all temporal fuzzy utility itemsets. The items in TQD have been sorted, and the mining order is consistent with this order. This will ensure the correctness and completeness of the mining results. Take the temporal fuzzy utility itemsets with the prefix D.Middle as an example. The details of the Miner procedure are described as below.

  1. In the Miner procedure, the algorithm calculates the tfur at first. For D.Middle, the tfur can be calculated as ( + )/( + + + ) = (6.7 + 3.96)/(53.69 + 38.66 + 69 + 103.98), which is 4.02%. Since 4.02% is less than 20%, D.Middle is not a HTFUI.

  2. Next, it is easy to compute the tfuubr of D.Middle since the RTF-List had already stored the needed information. The tfuubr of D.Middle can be calculated as ((6.7 + 34.45) + (12 + 91.98)) / (69 + 103.98) = 83.9%, which is larger than . This means D.Middle can be extended.

  3. Table VII shows the results of constructing the RTF-List with D.Middle as prefix. From the value of tfuubr, we know that {D.Middle, B.Low}, {D.Middle, B.Middle}, {D.Middle, E.Low}, {D.Middle, A.Low} and {D.Middle, F.High} can be extended. Then, TFUM continues to call the Miner procedure for discovering high temporal fuzzy utility itemsets.

  4. Finally, the whole procedure is completed and the seven high temporal fuzzy utility itemsets in Table III are output to users.

Period Tid Revised Transaction (item, quantity)
(, 1) (, 1) (, 2) (, 3)
(, 3) (, 4) (, 1)
(, 1) (, 2) (, 2)
(, 1) (, 3) (, 2) (, 6)
(, 3) (, 1) (, 3) (, 6)
(, 3) (, 1) (, 7)
(, 5) (, 4) (, 8) (, 2) (, 1) (, 1)
(, 6) (, 1) (, 1) (, 2)
(, 3) (, 7)
(, 6) (, 1) (, 1) (, 3) (, 9)
TABLE VI: A revised temporal quantitative database
Pattern tfur tfuubr
D.Middle, B.Low 10.03% 79.25%
D.Middle, B.Middle 6.05% 49.5%
D.Middle, E.Low 2.48% 54.51%
D.Middle, E.Middle 2.24% 12.50%
D.Middle, E.High 4.55% 12.5%
D.Middle, A.Low 21.77% 71.91%
D.Middle, F.Low 2.11% 4.01%
D.Middle, F.High 43.36% 43.36%
D.Middle, C.Low 1.74% 2.67%
TABLE VII: The results of 2-itemsets with D.Middle as prefix

Vi Experimental Results

The experimental algorithms (ATTFUM [18], FUMT [42], and TFUM) are all implemented in the Java language, and we conducted the experiments on a computer with an Intel Core 3.0 GHz processor with 16 GB of RAM and a Windows 10 64-bit operating system.

Dataset description. Four different datasets (BMSPOS, Retail, Mushroom, and T40I10D100k) were used to evaluate the performance of the tested algorithms. All the datasets can be easily downloaded from the open source library666SPMF: http://www.philippe-fournier-viger.com/spmf/index.php. The characteristics of databases are shown in Table VIII. However, two common features of four datasets are that 1) the quantity range of each item is [1, 6]; and 2) they cannot be used for mining high temporal fuzzy itemsets directly since they only offer utilities without periods. Therefore, in order to test the influence of the number of time periods, each dataset was randomly assigned to 1, 2, or 4 time periods, respectively. In addition, all experiments were executed three times independently. The “Runtime” and “Memory” respectively indicate the average running time and memory consumption of each experiment under the various thresholds (). Finally, we suppose the algorithm is terminated if its running time exceeds over 3,600 seconds, and we use the symbol “-” to represent this in tables.

Dataset #Trans #Items #AvgLen Type
BMSPOS 515,366 1,656 6.51 sparse
Retail 88,162 16,470 10.3 sparse
Mushroom 8,124 119 23.0 dense
T40I10D100k 10,000 942 39.6 dense
TABLE VIII: Information of experimental datasets
Dataset ATTFUM FUMT TFUM
BMSPOS 1196 1015 945
Retail 679 579 183
Mushroom 203 198 174
T40I10D100k 2348 1004 1214
TABLE IX: The peak memory of tested algorithms

Membership function. For convenience, as shown in Fig. 5, we supposed all items have the same membership function in our experiments. There are three fuzzy regions (Low, Middle, and High) taken into account.

The experimental membership function.
Fig. 5: The experimental membership function.
The runtime usage on different periods and thresholds.
Fig. 6: The runtime usage on different periods and thresholds.
Dataset Notation # Candidate under various thresholds
test test test test test test
0.6% 0.8% 1% 2% 4% 6%
ATTFUM - - - - 627 242
BMSPOS FUMT - - - 2,510 585 238
TFUM 19,769 10,765 6,945 1,811 522 267
1% 2% 3% 4% 5% 6%
ATTFUM - - 285,203 137,647 64,054 40,047
Mushroom FUMT 4,013,164 616,261 182,093 90,164 41,560 26,468
TFUM 1,181,227 209,004 75,233 35,015 19,769 12,229
TABLE X: The number of candidates generated by three algorithms
The memory consumption on different periods and thresholds.
Fig. 7: The memory consumption on different periods and thresholds.
Dataset Notation # Prune ratio under various thresholds
test test test test test test
0.6% 0.8% 1% 2% 4% 6%
BMSPOS FUMT - - - 95.79% 96.38% 97.37%
TFUM 98.03% 97.98% 97.9% 97.82% 97.1% 96.57%
1% 2% 3% 4% 5% 6%
Mushroom FUMT 72.19% 76.85% 78.16% 79.42% 80.35% 81.47%
TFUM 91.82% 92.75% 93.09% 93.3% 93.38% 93.51%
TABLE XI: The prune ratio of FUMT and TFUM algorithms

Vi-a Runtime Analysis

From Fig. 6 we can clearly see that the mining performance of ATTFUM and FUMT is degraded consequently while decreases or the number of time periods increases. For example, in Figs. 6 from (g) to (i), FUMT consumes nearly 350 seconds, 1100 seconds, and 3300 seconds in different periods, respectively, when = 1%. The runtime cost of ATTFUM increases from 393 seconds to 3584 seconds, while decreases from 3% to 1% in Fig. 6(l). Besides, ATTFUM is always timeout in most cases (especially in Fig. 6(c)). The reason we suppose is that ATTFUM consumes massive runtime in traversing subtrees. In addition, it can be observed that the proposed TFUM algorithm outperforms the other experimental algorithms. For instance, on the BMSPOS dataset, TFUM is up to two orders of magnitude faster than FUMT and ATTFUM. This is reasonable since the TFUM algorithm adopts the TP-table and lists to locate key information in a short time. In the same type of datasets (dense or sparse), the remaining measure is capable of further pruning and then fast discovering HTFUIs, when the average length of transactions are shorter. As shown in Fig. 6(h) (Mushroom dataset, two periods, and = 1%), TFUM takes only 66 seconds, but FUMT requires 1140 seconds. In Fig. 6, there are many sub-graphs that reveal the much larger gap between the novel algorithm and other algorithms.

Vi-B Memory Usage Analysis

As illustrated in Fig. 7, in general, the memory consumption of the proposed algorithm (i.e., the red line) is usually below that of other experimental algorithms. There are a lot of cases where lines have intersected in Fig. 7(a) to (l). For example, on Retail dataset with four time periods, the memory cost of FUMT is higher than that of ATTFUM when = 0.6%, but the opposite is true if is 0.8%. However, it can be observed that the runtime consumption gap between two algorithms is becoming larger. We suppose this is a trade-off case. In particular, FUMT costs less memory than that of ATTFUM on dense datasets (Figs. 7(g)–(l)). In the meantime, the most interesting aspect of Fig. 7 is that the TFUM algorithm is stabler than others. The reason is that the adopted prune strategies by three algorithms cause a great difference in the number of visited nodes. Table IX lists the peak memory of three algorithms on four datasets, and we use bold to highlight the minimal values. As we described in previous content, the more periods, the higher the memory consumption. In fact, the peak memory of ATTFUM and FUMT on the same datasets is comparable, except for T40I10D100k. It is clear that the list-based algorithms perform better than the tree-based algorithms in terms of memory usage. The reason is that the lists compress major information instead of maintaining the dataset in memory.

The scalability evaluation on different dataset sizes.
Fig. 8: The scalability evaluation on different dataset sizes.
The scalability evaluation on different periods.
Fig. 9: The scalability evaluation on different periods.

Vi-C Candidate Comparison Analysis

In this subsection, we compare the amount of candidates generation of experimented algorithms on BMSPOS and Mushroom datasets (Table X). Due to the execution time of ATTFUM exceeds 4900 seconds when is 2% on BMSPOS, we can infer that ATTFUM will be over time following decreases. Considering the Mushroom dataset, the total number of candidates generation of FUMT is about three times more than that of TFUM. We suppose Properties 2 and 3 play an important role during searching high level fuzzy itemsets. Compared to the naive maximal value upper-bound, the remaining upper bounds are tighter and easily computed (i.e., intersection). Moreover, we also evaluate the prune effect of two list-based algorithms. The formulation is defined as prune ratio = . A fuzzy itemset is assumed as a candidate if it meets Properties 1, 2 and 3. Table XI illustrates that the average prune ratio of TFUM on BMSPOS and Mushroom are 97.6% and 92.9%, respectively. In conclusion, the novel algorithm has a higher prune ratio than the benchmark, especially on dense datasets.

Vi-D Scalability Analysis

We also conducted several experiments to evaluate the scalability of our proposed algorithm on the Retail dataset. In Fig. 8, is assumed to be fixed and the time periods are 1, 2, and 4, respectively. Due to the tested results of 100% of datasets have already been illustrated in Figs. 6 and 7, we only conduct 10%, 20%, 40%, 60%, and 80% of Retail as experimental datasets. From Figs. 8(a) to (f), the runtime and memory consumption steadily increase as the tested data size increases. In addition, TFUM also performs better than FUMT under the same conditions. Then, we evaluate the performance of the TFUM algorithm in different time periods. In Fig. 9, we continue to use Retail as experimental dataset and is set to 0.9%. Besides, each transaction in Retail will be randomly assigned a period within the given range. And it is possible that a period does not contain any transactions. As can be seen from Fig. 9, with the increment of the number of periods, the peak memory usage of TFUM never exceeds 250 MB, and the runtime consumption increases linearly. All in all, we can conclude that the novel algorithm has good scalability performance.

Vii Conclusions and Future Works

This paper provided a new fuzzy utility mining method called TFUM for efficiently mining HTFUIs from temporal quantitative databases. TFUM used a novel and compact fuzzy-list and TP-table for storing temporal fuzzy information. The novel fuzzy-lists utilize the remaining measure, which makes for tighter upper-bounds than before. Our experimental evaluation of the novel algorithm reveals promising results over the state-of-the-art algorithms on sparse and dense datasets. Extensive experiments show that the adopted remaining measure can efficiently limit the search space of mining and has a high prune ratio on dense datasets. The runtime performance improvements were observed to be much lower for almost all the baseline algorithms. As the threshold was raised, the runtime consumption decreased steadily. Meanwhile, the peak memory requirements in all experiments were also less than those of other tested algorithms. In addition, we also conducted some experiments in terms of scalability. The final results show the proposed algorithm has good scalability no matter the different data sizes or periods. As a part of future work, we intend to consider further reducing the number of visited nodes during the mining process. Other interesting ideas can also be applied to time series forecasting, spatio-temporal applications, and other interesting work.

References

  • [1] A. Aggarwal and D. Toshniwal (2018) Spatio-temporal frequent itemset mining on web data. In Proceedings of the IEEE International Conference on Data Mining Workshops, pp. 1160–1165. Cited by: §I.
  • [2] R. Agrawal and R. Srikant (1994) Fast algorithms for mining association rules. In Proceedings of the 20-th ACM International Conference on Very Large Data Bases, Vol. 1215, pp. 487–499. Cited by: §I.
  • [3] K. C. Chan and W. H. Au (1997) Mining fuzzy association rules. In Proceedings of the 6-th International Conference on Information and Knowledge Management, pp. 209–215. Cited by: §II-B.
  • [4] C. M. Chen, L. Chen, W. Gan, L. Qiu, and W. Ding (2021) Discovering high utility-occupancy patterns from uncertain data. Information Sciences 546, pp. 1208–1229. Cited by: §II-A.
  • [5] J. Chen, X. Guo, W. Gan, C. M. Chen, W. Ding, and G. Chen (2022) On-shelf utility mining from transaction database. Engineering Applications of Artificial Intelligence 107, pp. 104516. Cited by: §I.
  • [6] L. Chen, W. Gan, Q. Lin, J. Miao, and C. M. Chen (2021) Mining on-shelf high-utility quantitative itemsets. In Proceedings of the 9-th IEEE International Conference on Big Data, pp. 5491–5500. Cited by: §II-A.
  • [7] C. Chu, V. S. Tseng, and T. Liang (2008) An efficient algorithm for mining temporal high utility itemsets from data streams. Journal of Systems and Software 81 (7), pp. 1105–1117. Cited by: §I.
  • [8] Y. Cui, W. Gan, H. Lin, and W. Zheng (2022) FRI-Miner: fuzzy rare itemset mining. Applied Intelligence 52 (3), pp. 3387–3402. Cited by: §II-A.
  • [9] Q. H. Duong, P. Fournier Viger, H. Ramampiaro, K. Nørvåg, and T. L. Dam (2018) Efficient high utility itemset mining using buffered utility-lists. Applied Intelligence 48 (7), pp. 1859–1877. Cited by: §II-A.
  • [10] P. Fournier Viger, C. W. Wu, S. Zida, and V. S. Tseng (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In Proceedings of the International Symposium on Methodologies for Intelligent Systems, pp. 83–92. Cited by: §II-A.
  • [11] W. Gan, J. C. W. Lin, P. Fournier Viger, H. C. Chao, T. P. Hong, and H. Fujita (2018) A survey of incremental high-utility itemset mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8 (2), pp. e1242. Cited by: §I.
  • [12] W. Gan, S. Wan, J. Chen, C. M. Chen, and L. Qiu (2020) TopHUI: top- high-utility itemset mining with negative utility. In Proceedings of the 8-th IEEE International Conference on Big Data, pp. 5350–5359. Cited by: §I, §II-A.
  • [13] P. N. Hai, P. Poncelet, and M. Teisseire (2011) Moving objects: combining gradual rules and spatio-temporal patterns. In Proceedings of the IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services, pp. 131–136. Cited by: §I.
  • [14] J. Han, J. Pei, Y. Yin, and R. Mao (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Mining and Knowledge Discovery 8 (1), pp. 53–87. Cited by: §I.
  • [15] J. Han, J. Pei, and Y. Yin (2000) Mining frequent patterns without candidate generation. ACM SIGMOD Record 29 (2), pp. 1–12. Cited by: §I, §II-B.
  • [16] T. P. Hong, C. S. Kuo, and S. C. Chi (1999) Mining association rules from quantitative data. Intelligent Data Analysis 3 (5), pp. 363–376. Cited by: §II-B.
  • [17] T. P. Hong, C. Y. Lin, W. M. Huang, K. S. M. Li, L. S. L. Wang, and J. C. W. Lin (2020) Using tree structure to mine high temporal fuzzy utility itemsets. IEEE Access 8, pp. 153692–153706. Cited by: §I, §II-B.
  • [18] T. P. Hong, C. Y. Lin, W. M. Huang, S. M. Li, S. L. Wang, and J. C. W. Lin (2022) A one-phase tree-structure method to mine high temporal fuzzy utility itemsets. Applied Sciences 12 (6), pp. 2821. Cited by: §I, §I, §II-B, §III, §VI, footnote 3.
  • [19] W. M. Huang, T. P. Hong, G. C. Lan, M. C. Chiang, and J. C. W. Lin (2017) Temporal-based fuzzy utility mining. IEEE Access 5, pp. 26639–26652. Cited by: §I, §II-B, §III, Definition 7.
  • [20] S. Krishnamoorthy (2017) HMiner: efficiently mining high utility itemsets. Expert Systems with Applications 90, pp. 168–183. Cited by: §II-A.
  • [21] S. Krishnamoorthy (2019) Mining top- high utility itemsets with effective threshold raising strategies. Expert Systems with Applications 117, pp. 148–165. Cited by: §II-A.
  • [22] C. M. Kuok, A. Fu, and M. H. Wong (1998) Mining fuzzy association rules in databases. ACM Special Interest Group on Management of Data Record 27 (1), pp. 41–46. Cited by: §II-B.
  • [23] G. C. Lan, T. P. Hong, Y. H. Lin, and S. L. Wang (2015) Fuzzy utility mining with upper-bound measure. Applied Soft Computing 30, pp. 767–777. Cited by: §II-A, §II-B.
  • [24] C. W. Lin, T. P. Hong, and W. H. Lu (2010) Linguistic data mining with fuzzy fp-trees. Expert Systems with Applications 37 (6), pp. 4560–4567. Cited by: §II-B.
  • [25] C. W. Lin and T. P. Hong (2013) A survey of fuzzy web mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 3 (3), pp. 190–199. Cited by: §II-B.
  • [26] C. W. Lin and T. P. Hong (2014) Mining fuzzy frequent itemsets based on UBFFP trees. Journal of Intelligent & Fuzzy Systems 27 (1), pp. 535–548. Cited by: §II-B.
  • [27] J. Liu, K. Wang, and B. C. Fung (2015) Mining high utility patterns in one phase without generating candidates. IEEE Transactions on Knowledge and Data Engineering 28 (5), pp. 1245–1257. Cited by: §II-A.
  • [28] M. Liu and J. Qu (2012) Mining high utility itemsets without candidate generation. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 55–64. Cited by: §I, §II-A.
  • [29] Y. Liu, W. K. Liao, and A. Choudhary (2005) A two-phase algorithm for fast discovery of high utility itemsets. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 689–695. Cited by: §II-A.
  • [30] Y. C. Liu, C. P. Cheng, and V. S. Tseng (2013) Mining differential top- co-expression patterns from time course comparative gene expression datasets. BMC Bioinformatics 14 (1), pp. 1–13. Cited by: §I.
  • [31] M. Nouioua, P. Fournier Viger, C. W. Wu, J. C. W. Lin, and W. Gan (2021) FHUQI-Miner: fast high utility quantitative itemset mining. Applied Intelligence 51 (10), pp. 6785–6809. Cited by: §II-A.
  • [32] S. Papadimitriou and S. Mavroudi (2005) The fuzzy frequent pattern tree. In The WSEAS International Conference on Computers, pp. 1–7. Cited by: §II-B.
  • [33] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang (2007) H-Mine: fast and space-preserving frequent pattern mining in large databases. IIE Transactions 39 (6), pp. 593–605. Cited by: §I.
  • [34] R. Rymon (1992) Search through systematic set enumeration. In Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning, pp. 539–550. Cited by: §IV-C.
  • [35] R. Srikant and R. Agrawal (1996) Mining quantitative association rules in large relational tables. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1–12. Cited by: §II-B.
  • [36] S. Wan, J. Chen, Z. Qi, W. Gan, and L. Tang (2022) Fast RFM model for customer segmentation. In Companion Proceedings of the Web Conference, pp. 1–8. Cited by: §I.
  • [37] S. Wan, J. Chen, P. Zhang, W. Gan, and T. Gu (2022) Discovering top- profitable patterns for smart manufacturing. In Companion Proceedings of the Web Conference, pp. 1–9. Cited by: §I.
  • [38] S. Wan, W. Gan, X. Guo, J. Chen, and U. Yun (2021) FUIM: fuzzy utility itemset mining. arXiv preprint arXiv:2111.00307. Cited by: §II-A, §II-B.
  • [39] C. M. Wang, S. H. Chen, and Y. Huang (2009) A fuzzy approach for mining high utility quantitative itemsets. In Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1909–1913. Cited by: §II-B.
  • [40] P. Wu, X. Niu, P. Fournier Viger, C. Huang, and B. Wang (2022) UBP-Miner: an efficient bit based high utility itemset mining algorithm. Knowledge-Based Systems, pp. 108865. Cited by: §II-A.
  • [41] H. Yao, H. J. Hamilton, and C. J. Butz (2004) A foundational approach to mining itemset utilities from databases. In Proceedings of the SIAM International Conference on Data Mining, pp. 482–486. Cited by: §II-A.
  • [42] Z. Ye, S. Wan, W. Gan, J. Chen, and L. Tang (2022) Fuzzy utility mining on temporal data. In Proceedings of the 4-th International Conference on Data Intelligence and Security, pp. 1–8. Cited by: §I, §II-B, §III, §IV-D, §VI, Definition 11.
  • [43] L. A. Zadeh (1965) Fuzzy sets. Information and Control 8 (3), pp. 338–353. Cited by: §I.