Abstract
Big data applications generate large volumes of evolving, real-time, high-dimensional streaming data, and clustering such streams both efficiently and effectively is a major challenge in data mining. Many clustering techniques have been proposed for stream data, but most handle cluster dynamics in a restricted way. Unlike classical clustering, stream clustering must deal with data that arrive as a continuous sequence, which adds several constraints to the clustering process. The stream is usually unbounded, and every data point must be processed at least once, which increases processing time and memory requirements. Moreover, the clusters and their statistical properties change over time, and streams can be noisy. To address these challenges, this work proposes a novel data stream clustering method built on a hybrid meta-heuristic model. First, a data stream is collected and micro-clusters are formed with the K-Means Clustering (KMC) technique. The micro-clusters are then merged and sorted, and cluster optimization is performed with Hybrid Group Search Pelican Optimization (HGSPO). The objective is to maximize clustering accuracy using radius, distance, and similarity measures, and the thresholds of these measures are optimized. In the training phase, a clustering threshold is fixed for each cluster. When new data enter the stream clustering model, their output is compared with the output of the training data, and each new point is forwarded to the appropriate cluster according to the assigned threshold with minimum similarity.
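The micro-cluster step and the threshold-based forwarding of new points can be illustrated with a minimal pure-Python sketch. It assumes Euclidean distance and a hypothetical rule in which the nearest micro-cluster absorbs a new point when the point lies within `factor` times that cluster's radius; otherwise the point seeds a new micro-cluster. The names `build_micro_clusters`, `assign`, `factor`, and `min_radius` are illustrative, and the fixed `factor` only stands in for the thresholds that the paper tunes with HGSPO. This is not the authors' implementation.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumption; the abstract does not name a metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class MicroCluster:
    centroid: List[float]
    count: int
    radius: float

    def absorb(self, p: Sequence[float]) -> None:
        # Incremental mean update; the radius grows if p falls outside it.
        self.count += 1
        self.centroid = [c + (x - c) / self.count for c, x in zip(self.centroid, p)]
        self.radius = max(self.radius, dist(self.centroid, p))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns centroids and their member lists."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[j].append(p)
        for i, g in enumerate(groups):
            if g:
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids, groups


def build_micro_clusters(points, k, seed=0):
    """Summarize each non-empty k-means group as (centroid, count, radius)."""
    centroids, groups = kmeans(points, k, seed=seed)
    return [MicroCluster(c, len(g), max(dist(c, p) for p in g))
            for c, g in zip(centroids, groups) if g]


def assign(p, clusters, factor=1.5, min_radius=0.5):
    """Hypothetical threshold rule: absorb p into the nearest micro-cluster if it
    lies within factor * radius, otherwise open a new micro-cluster for p."""
    nearest = min(clusters, key=lambda mc: dist(mc.centroid, p))
    if dist(nearest.centroid, p) <= factor * max(nearest.radius, min_radius):
        nearest.absorb(p)
        return nearest
    fresh = MicroCluster(list(p), 1, 0.0)
    clusters.append(fresh)
    return fresh


# Demo on two synthetic Gaussian blobs.
rng = random.Random(1)
data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(50)]
data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(50)]
clusters = build_micro_clusters(data, k=2)
assign((0.1, 0.1), clusters)    # close to an existing blob: absorbed
assign((20.0, 20.0), clusters)  # far from every cluster: seeds a new one
print(len(clusters), "micro-clusters")
```

The `min_radius` floor keeps single-point micro-clusters (radius 0) from rejecting every later point. A full stream method would also merge and sort micro-clusters over time, which this sketch omits.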
A performance analysis on standard metrics, comparing the proposed system with various clustering and heuristic algorithms, confirms its clustering quality.
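The threshold optimization relies on a meta-heuristic search. HGSPO hybridizes the Group Search Optimizer with the Pelican Optimization Algorithm (POA), but the hybridization rule is not given here. The sketch below therefore shows only a plain POA, following our reading of the two-phase update in Trojovský and Dehghani (2022). Phase 1 moves toward a randomly placed prey, or away from it. Phase 2 is a local search whose radius shrinks over iterations. The search runs over a vector of thresholds. The objective `surrogate_loss` and its `TARGET` are hypothetical placeholders for "1 minus clustering accuracy", and `pelican_optimize` is an illustrative name.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pelican_optimize(
    f: Callable[[List[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    pop: int = 20,
    iters: int = 100,
    seed: int = 0,
    R: float = 0.2,
) -> Tuple[List[float], float, List[float]]:
    """Minimize f over the box [lower, upper] with a plain two-phase POA."""
    rng = random.Random(seed)

    def clip(v):
        return [min(max(x, lo), hi) for x, lo, hi in zip(v, lower, upper)]

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]

    X = [random_point() for _ in range(pop)]
    F = [f(x) for x in X]
    history = [min(F)]  # best fitness before any update
    for t in range(1, iters + 1):
        prey = random_point()  # prey position drawn at random each iteration
        f_prey = f(prey)
        for i in range(pop):
            # Phase 1 (exploration): approach the prey if it is better, else move away.
            pull = rng.choice((1, 2))  # the random integer I in {1, 2}
            if f_prey < F[i]:
                cand = [x + rng.random() * (p - pull * x) for x, p in zip(X[i], prey)]
            else:
                cand = [x + rng.random() * (x - p) for x, p in zip(X[i], prey)]
            cand = clip(cand)
            f_cand = f(cand)
            if f_cand < F[i]:  # greedy acceptance
                X[i], F[i] = cand, f_cand
            # Phase 2 (exploitation): local search whose radius shrinks as t -> iters.
            scale = R * (1 - t / iters)
            cand = clip([x + scale * (2 * rng.random() - 1) * x for x in X[i]])
            f_cand = f(cand)
            if f_cand < F[i]:
                X[i], F[i] = cand, f_cand
        history.append(min(F))
    best = min(range(pop), key=F.__getitem__)
    return X[best], F[best], history


# Hypothetical stand-in for "1 - clustering accuracy" as a function of
# (radius factor, distance threshold, similarity threshold).
TARGET = (1.5, 2.0, 0.7)


def surrogate_loss(th: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(th, TARGET))


LOWER, UPPER = [0.0, 0.0, 0.0], [5.0, 5.0, 1.0]
best, loss, hist = pelican_optimize(surrogate_loss, LOWER, UPPER)
print("thresholds:", [round(v, 3) for v in best], "loss:", round(loss, 6))
```

Greedy acceptance keeps each pelican's fitness non-increasing, so the best loss never gets worse. The Group Search Optimizer half of HGSPO, with its producer, scrounger, and ranger roles, is not reproduced here.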
Author information
Contributions
All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Agarwal, S., Reddy, C.R.K. A smart intelligent approach based on hybrid group search and pelican optimization algorithm for data stream clustering. Knowl Inf Syst 66, 2467–2500 (2024). https://doi.org/10.1007/s10115-023-02002-5