Thumbnail Image

Y-Means Clustering Vs N-CP Clustering with Canopies for Intrusion Detection

Sabari Kannan, Sivanadiyan
Intrusions present a very serious security threat in a network environment. It is therefore essential to detect intrusions to prevent compromising the stability of the system or the security of information that is stored on the network. The most difficult problem is detecting new intrusion types, of which intrusion detection systems may not be aware. Many of the signature based methods and learning algorithms generally cannot detect these new intrusions. We propose an optimized algorithm called n-CP clustering algorithm that is capable of detecting intrusions that may be new or otherwise. The algorithm also overcomes two significant shortcomings of K-Means clustering namely dependency and degeneracy on the number of clusters. The proposed clustering method utilizes the concept of canopies to optimize the search by eliminating the pair-wise distance computation of all the data points. The system will also maintain a low false positive rate and high detection rate. The efficiency and the speed of the algorithm are analyzed by comparing with another clustering algorithms used for intrusion detection, called Y-Means clustering. Both the algorithms are tested against the KDD-99 data set to compute the detection rate and false positive rate. The algorithms are also tested for efficiency with varying number of data fields of the dataset. This thesis outlines the technical difficulties of K-means clustering, an algorithm to eliminate those shortcomings and the canopies technique to speed up the intrusion detection process. The results show that our clustering algorithm that uses canopies concept is approximately 40% faster than the Y-Means clustering and overcomes the two main limitations of K-Means clustering. Finally, a comparative analysis of the Y-means clustering and our proposed n-CP clustering with canopies was carried out with the help of ROC Curves showing the respective hit rates to false alarm rates.