A smart intelligent approach based on hybrid group search and pelican optimization algorithm for data stream clustering

  • Regular Paper
Knowledge and Information Systems

Abstract

Big data applications generate large volumes of evolving, real-time, high-dimensional streaming data, and clustering such streams efficiently and effectively is a major challenge in data mining. Several clustering techniques have been implemented for stream data, but most of them handle cluster dynamics only in a restricted way. A data stream is a sequence of arriving data, so stream clustering must account for several factors that classical clustering does not. The stream is mostly unbounded, and every data point has to be evaluated at least once, which increases processing time and memory requirements. In addition, the clusters and their statistical properties vary over time, and streams can be noisy. To address these challenges, this research work implements a novel data stream clustering approach built on a hybrid meta-heuristic model. Initially, a data stream is collected and micro-clusters are formed by the K-Means Clustering (KMC) technique. The micro-clusters are then merged and sorted, with cluster optimization performed by the Hybrid Group Search Pelican Optimization (HGSPO). The objective of the clustering is to maximize accuracy through radius, distance, and similarity measures, and the thresholds of these measures are optimized. In the training phase, a clustering threshold is fixed for each cluster. When new data enters the stream clustering model, its output is compared with the output of the training data, and the data is forwarded to the appropriate cluster according to the assigned threshold on minimum similarity. The performance analysis evaluates the clustering quality of the proposed system on standard performance metrics by comparing it with various clustering and heuristic algorithms.
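The pipeline described above (micro-clusters formed by k-means, then threshold-based routing of new points) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the farthest-first initialization, and the parameters `radius_factor` and `min_radius` are assumptions made for illustration, and the threshold is fixed by hand here, whereas the paper tunes its thresholds with HGSPO.

```python
import math

def farthest_first(points, k):
    # Deterministic initialization (an assumption, not from the paper):
    # start at the first point, then repeatedly add the farthest point.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=10):
    # Plain Lloyd iterations: assign each point to its nearest center, then
    # move each center to the mean of its group.
    centers = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers, groups

def build_micro_clusters(points, k):
    # Summarize each k-means group by its center, radius and point count.
    centers, groups = kmeans(points, k)
    return [
        {"center": c, "radius": max(math.dist(p, c) for p in g), "n": len(g)}
        for c, g in zip(centers, groups) if g
    ]

def assign(point, clusters, radius_factor=1.5, min_radius=0.5):
    # Route a new stream point to the nearest micro-cluster if it falls
    # within the (hypothetical, hand-set) threshold; otherwise open a new one.
    idx = min(range(len(clusters)), key=lambda j: math.dist(point, clusters[j]["center"]))
    c = clusters[idx]
    d = math.dist(point, c["center"])
    if d <= radius_factor * max(c["radius"], min_radius):
        n = c["n"] + 1
        # Incremental mean update of the center.
        c["center"] = tuple(m + (x - m) / n for m, x in zip(c["center"], point))
        c["radius"] = max(c["radius"], d)  # rough radius update
        c["n"] = n
        return idx
    clusters.append({"center": tuple(point), "radius": 0.0, "n": 1})
    return len(clusters) - 1

training = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
clusters = build_micro_clusters(training, k=2)
for new_point in [(0.6, 0.4), (5.0, 5.0)]:
    print(new_point, "-> micro-cluster", assign(new_point, clusters))
```

In this sketch a point near an existing micro-cluster is absorbed and the center shifts slightly, while a point far from every micro-cluster opens a new one. A per-cluster threshold chosen by an optimizer would replace the single `radius_factor` used here.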

Author information

Contributions

All authors have made substantial contributions to conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to Swathi Agarwal.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Agarwal, S., Reddy, C.R.K. A smart intelligent approach based on hybrid group search and pelican optimization algorithm for data stream clustering. Knowl Inf Syst 66, 2467–2500 (2024). https://doi.org/10.1007/s10115-023-02002-5

  • DOI: https://doi.org/10.1007/s10115-023-02002-5
