[1] ZHANG He, CAI Jiang-hui, ZHANG Ji-fu, et al. An outlier mining algorithm based on information entropy[J]. CAAI Transactions on Intelligent Systems, 2010, 5(2): 150-155.
CAAI Transactions on Intelligent Systems[ISSN 1673-4785/CN 23-1538/TP] Volume:
5
Issue:
2010, No. 2
Page number:
150-155
Column:
Academic Papers: Fundamentals of Artificial Intelligence
Publication date:
2010-04-25
- Title:
-
An outlier mining algorithm based on information entropy
- Author(s):
-
ZHANG He1; CAI Jiang-hui1; ZHANG Ji-fu1; QIAO Kan2
-
1.School of Computer Science and Technology, Taiyuan University of Science & Technology, Taiyuan 030024, China;
2. Automation Science and Electrical Engineering College, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
-
- Keywords:
-
outlier; information entropy; outlier measure factor; data mining
- CLC:
-
TP311
- DOI:
-
-
- Abstract:
-
Outlier mining aims to discover patterns that are exceptional, interesting, and sparse or isolated, even though they are concealed within very large volumes of data. Traditional outlier detection methods are easily influenced by human factors. This paper presents an outlier mining algorithm based on information entropy. The algorithm uses information entropy to compute an outlier measure factor for each record and then detects outliers by analyzing the values of this factor, which removes the influence of human factors from outlier mining. Because outliers are defined in terms of the outlier measure factor, the factor also explains what makes a record an outlier. Experimental results on UC Irvine (UCI) data sets and on high-dimensional stellar spectrum data demonstrate the feasibility and effectiveness of the algorithm.
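The abstract does not give the formula for the outlier measure factor, so the following is only a minimal sketch of one way an entropy-based score per record could look, not the authors' definition. It assumes categorical attributes, treats attributes as independent, and uses a hypothetical leave-one-out factor: the drop in total attribute entropy when a record is removed. All function names (`shannon_entropy`, `dataset_entropy`, `outlier_factors`, `top_outliers`) are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def dataset_entropy(records):
    """Sum of per-attribute entropies (assumes independent attributes)."""
    if not records:
        return 0.0
    # zip(*records) turns a list of rows into a sequence of columns.
    return sum(shannon_entropy(column) for column in zip(*records))


def outlier_factors(records):
    """Hypothetical factor: entropy drop when each record is left out.

    A large positive value means the record contributes a lot of
    disorder to the data set, which this sketch reads as "outlying".
    """
    total = dataset_entropy(records)
    return [
        total - dataset_entropy(records[:i] + records[i + 1:])
        for i in range(len(records))
    ]


def top_outliers(records, k):
    """Indices of the k records with the largest factor."""
    factors = outlier_factors(records)
    order = sorted(range(len(records)), key=lambda i: factors[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Nine identical records plus one that differs in both attributes.
    data = [("a", "x")] * 9 + [("b", "y")]
    print(top_outliers(data, 1))
```

In this sketch the ranking depends only on the data, with `k` as the single user choice. The leave-one-out loop recomputes the entropy for every record, so it is quadratic in the number of records and suits small illustrations only.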