Improved Data Analysis Using K-Means Algorithm with Quantum Particle Swarm Optimization Algorithm and PVSFCA
Keywords: Data Mining, Unsupervised Learning, Clustering, Data Mining Techniques, QPSO-K-means Clustering Algorithm, Data Analysis, Intelligent Data Analysis, Error Rate, PVSFCA
Data analysis based on the K-means algorithm has been enhanced by integrating Quantum Particle Swarm Optimization (QPSO) and the proposed Vector Space Function Clustering Algorithm (PVSFCA). Data mining aims to extract valuable insights from large datasets and present them in an understandable form for further use. Clustering groups objects so that objects within the same cluster are more similar to each other than to objects in other clusters; K-means clustering partitions a dataset into K such clusters. This study examines the significance and prevalence of data mining techniques, the role of clustering in data mining, and the characteristics, fundamental principles, and execution process of the K-means algorithm. Although K-means is widely used for its simplicity, efficiency, and empirical success, the classic algorithm has shortcomings that affect its performance, such as requiring K to be fixed in advance and selecting the initial centres at random. Numerous variants of K-means have emerged to address these limitations. K-means seeks to minimise the sum of squared distances between the feature values of each point and the centre of the cluster to which it is assigned. QPSO is an evolutionary computation technique used to find suboptimal solutions in a variety of scenarios. Integrated with K-means (QPSO-K-means), it models candidate cluster centres as particles in order to obtain stable and suitable clusters, leading to effective clustering outcomes. The proposed algorithm (PVSFCA) is evaluated on the UCI healthcare dataset, and the results indicate that it is efficient and accurate. These outcomes provide a reliable foundation for improving clustering strategies by considering factors such as the number of iterations, the error rate, and the optimal creation of cluster centres.
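As an illustration of the QPSO-K-means idea summarised above, the following is a minimal sketch in plain Python. It is not the authors' implementation, and it does not cover PVSFCA, whose details are not given in this abstract. It assumes the commonly used QPSO position update (a local attractor between the personal and global best, a mean-best position, and a contraction-expansion coefficient, here called `beta`), takes the sum of squared errors as the fitness, and refines the best particle with ordinary K-means iterations. All function names, parameter names, and default values are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centres):
    """Sum of squared distances from each point to its nearest centre."""
    return sum(min(sq_dist(pt, c) for c in centres) for pt in points)


def kmeans_step(points, centres):
    """One K-means iteration: assign points, then recompute cluster means."""
    groups = [[] for _ in centres]
    for pt in points:
        dists = [sq_dist(pt, c) for c in centres]
        groups[dists.index(min(dists))].append(pt)
    return [
        tuple(sum(col) / len(grp) for col in zip(*grp)) if grp else ctr
        for ctr, grp in zip(centres, groups)  # an empty cluster keeps its centre
    ]


def qpso_kmeans(points, k, n_particles=10, n_iter=30, beta=0.75, seed=0):
    """Search for k cluster centres with QPSO, then refine with K-means."""
    rng = random.Random(seed)
    dim = len(points[0])
    size = k * dim  # each particle encodes k centres as one flat list

    def unflatten(x):
        return [tuple(x[i * dim:(i + 1) * dim]) for i in range(k)]

    # Assumption: particles start from k randomly sampled data points.
    swarm = [[v for pt in rng.sample(points, k) for v in pt]
             for _ in range(n_particles)]
    pbest = [x[:] for x in swarm]
    pbest_fit = [sse(points, unflatten(x)) for x in swarm]
    g = pbest_fit.index(min(pbest_fit))
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        # Mean of all personal best positions ("mean best").
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(size)]
        for i, x in enumerate(swarm):
            for d in range(size):
                phi = rng.random()
                u = 1.0 - rng.random()  # lies in (0, 1], so log(1/u) is finite
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                step = beta * abs(mbest[d] - x[d]) * math.log(1.0 / u)
                x[d] = attractor + step if rng.random() < 0.5 else attractor - step
            fit = sse(points, unflatten(x))
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = x[:], fit

    # Refine the best particle with standard K-means until it stops changing.
    centres = unflatten(gbest)
    for _ in range(20):
        updated = kmeans_step(points, centres)
        if updated == centres:
            break
        centres = updated
    return centres, sse(points, centres)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    centres, error = qpso_kmeans(data, k=2)
    print(sorted(centres), error)
```

In this sketch the swarm only supplies a starting point for K-means, so the final error can never exceed that of the best particle found; other ways of interleaving the two steps are equally plausible readings of "QPSO-K-means".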